A little bit more...

Monday, November 13, 2006

Three ways of validating an XML document with Java

With the rollout of Java 5.0 in 2004, JAXP 1.3 became available, and one of its new features is a brand new Schema Validation Framework.

The new framework decouples the validation of an instance document from parsing and makes it an independent process. The validation APIs live in the new package javax.xml.validation and let developers obtain, from a compiled schema, a Validator and/or a ValidatorHandler, which validate XML against that schema. Alternatively, a compiled schema instance can be passed to a parser factory (SAXParserFactory or DocumentBuilderFactory), so that the parsers it creates validate XML while parsing. So the new Schema Validation Framework provides roughly two ways of validating. Besides these two, setting the uncompiled schema source on the parser is still available for backward compatibility. As the first article and the accompanying example code listed in the Resources section show, the new Validation Framework improves performance, efficiency and flexibility.
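The ValidatorHandler mentioned above does not appear in the snippets of this post, so here is a minimal sketch of how it might be wired into a SAX pipeline. The class name, the tiny inline schema and the sample documents are assumptions made up for illustration; only the javax.xml.validation and SAX calls are standard API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidatorHandlerSketch {

    // Hypothetical schema: a single <note> element holding an integer.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='note' type='xsd:int'/>"
      + "</xsd:schema>";

    /** Parses xml with a non-validating SAX reader and returns the schema errors reported. */
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            // Compile the schema once; the resulting Schema object can be reused.
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

            // A ValidatorHandler is a SAX ContentHandler that validates the events it receives.
            ValidatorHandler vh = schema.newValidatorHandler();
            vh.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });

            // The parser only produces SAX events; validation happens in the handler.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(vh);
            reader.parse(new InputSource(new StringReader(xml)));
            return errors;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid:   " + validate("<note>42</note>"));
        System.out.println("invalid: " + validate("<note>forty-two</note>"));
    }
}
```

Collecting the messages in a list rather than printing them is just a choice for the sketch; a real handler would more likely log them or abort on the first error.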

Below are simple code snippets that illustrate each of these three ways of validating XML documents.

1. Set uncompiled schema (since JAXP 1.2):
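import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

private static void saxParseJAXP1_2(String xmlFile, DefaultHandler dh,
        String schemaFile) {
    try {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        // Validation must be switched on for the schema properties to take effect.
        spf.setValidating(true);
        SAXParser sp = spf.newSAXParser();
        // Select W3C XML Schema as the schema language ...
        sp.setProperty(
                "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // ... and hand over the uncompiled schema. A File avoids any
        // ambiguity about how a plain path string is resolved as a URI.
        sp.setProperty(
                "http://java.sun.com/xml/jaxp/properties/schemaSource",
                new File(schemaFile));

        // Validation errors are reported to dh.error(...).
        sp.parse(new File(xmlFile), dh);
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}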

2. Set compiled schema instance (since JAXP 1.3):
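import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

private static void saxParseSetSchemaJAXP1_3(String xmlFile, DefaultHandler dh,
        String schemaFile) {
    try {
        // Compile the schema once; the Schema object can be shared and reused.
        SchemaFactory sf = SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new File(schemaFile));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        // setValidating(true) is not called here: setSchema alone enables
        // schema validation for parsers created by this factory.
        spf.setSchema(schema);
        SAXParser sp = spf.newSAXParser();
        // Validation errors are reported to dh.error(...).
        sp.parse(new File(xmlFile), dh);
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}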

3. Validator (since JAXP1.3)
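import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;

private static void saxParseValidateJAXP1_3(String xmlFile,
        ErrorHandler dh, String schemaFile) {
    try {
        SchemaFactory sf = SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator = sf.newSchema(
                new File(schemaFile)).newValidator();
        validator.setErrorHandler(dh);
        // StreamSource(File) avoids treating a plain path string as a URL.
        validator.validate(new StreamSource(new File(xmlFile)));
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}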

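It's worth noting that none of the three ways is tied to SAX. The first and second ways work with both SAX and DOM parsers, since DocumentBuilderFactory offers the same setSchema method and accepts the same schema properties through setAttribute. The third way is independent of the parser altogether: a Validator takes a javax.xml.transform.Source, so it can validate a DOMSource (an in-memory DOM tree) as well as a SAXSource.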
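As a sketch of what the DOM side can look like: DocumentBuilderFactory also has a setSchema method, and Validator.validate takes a javax.xml.transform.Source, for which DOMSource is one of the accepted kinds. The class name, the Collector helper and the inline schema below are assumptions invented for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DomValidationSketch {

    // Hypothetical schema: a <count> element holding a positive integer.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='count' type='xsd:positiveInteger'/>"
      + "</xsd:schema>";

    /** Error handler that records validation errors instead of printing them. */
    static final class Collector extends DefaultHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) {
            errors.add(e.getMessage());
        }
    }

    static Schema compile() throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** Second way, DOM flavour: the DocumentBuilder validates while it builds the tree. */
    static List<String> validateWhileParsing(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setSchema(compile());
            DocumentBuilder db = dbf.newDocumentBuilder();
            Collector collector = new Collector();
            db.setErrorHandler(collector);
            db.parse(new InputSource(new StringReader(xml)));
            return collector.errors;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Third way on a DOM tree: parse without validation, then validate the finished tree. */
    static List<String> validateTree(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Validator validator = compile().newValidator();
            Collector collector = new Collector();
            validator.setErrorHandler(collector);
            validator.validate(new DOMSource(doc));
            return collector.errors;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validateWhileParsing("<count>-3</count>"));
        System.out.println(validateTree("<count>-3</count>"));
    }
}
```

Validating the finished tree can be handy when the document was built or modified in memory and never existed as a file.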

Update (20061113):

Basics of using Schema

Be aware of the concepts of the XML target namespace and "source namespaces". The names defined in a schema are said to belong to its target namespace. Definitions and declarations in a schema can also refer to names that belong to other namespaces; in the fourth article those namespaces are referred to as "source namespaces". A brief note on simple and complex types: an element that contains neither attributes nor other elements can be defined to be of a simple type, predefined or user-defined, such as string, integer, decimal, time, etc. Elements with attributes or embedded elements must have a complex type. There is a huge amount of detail about XML Schema definition that is not covered here but can be found in the references listed in the Resources section.
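To make the target namespace and the simple/complex distinction a little more concrete, here is a small sketch. The namespace urn:example:track, the element names and the class name are all hypothetical and chosen only for this illustration; the schema declares one complex-typed element (track, with a child and an attribute) and one simple-typed element (title).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAndTypesSketch {

    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        // Names declared below belong to this (hypothetical) target namespace.
      + " targetNamespace='urn:example:track' elementFormDefault='qualified'>"
        // <track> has a child element and an attribute, so it needs a complex type.
      + "<xsd:element name='track'>"
      + "<xsd:complexType>"
      + "<xsd:sequence>"
        // <title> holds only text, so a predefined simple type is enough.
      + "<xsd:element name='title' type='xsd:string'/>"
      + "</xsd:sequence>"
      + "<xsd:attribute name='no' type='xsd:int'/>"
      + "</xsd:complexType>"
      + "</xsd:element>"
      + "</xsd:schema>";

    /** Validates xml against XSD and returns the error messages reported. */
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
            validator.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
            return errors;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In the target namespace: expected to validate.
        System.out.println(validate(
            "<track xmlns='urn:example:track' no='1'><title>Hot Cop</title></track>"));
        // Same element names but no namespace: no matching declaration.
        System.out.println(validate(
            "<track no='1'><title>Hot Cop</title></track>"));
    }
}
```

The second document uses the same element names but places them in no namespace, so the validator finds no declaration for it; the names in the schema exist only in the target namespace.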

Simple example

An XML instance document:
<?xml version="1.0" encoding="utf-8"?>
<SONGS xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
       xsi:noNamespaceSchemaLocation='mySong.xsd'>
  <SONG genre="pop">
    <TITLE>Hot Cop</TITLE>
    <COMPOSER>Jacques Morali</COMPOSER>
    <COMPOSER>Henri Belolo</COMPOSER>
    <COMPOSER>Victor Willis</COMPOSER>
    <PRODUCER>Jacques Morali</PRODUCER>
    <PUBLISHER>PolyGram Records</PUBLISHER>
    <LENGTH>6:20</LENGTH>
    <YEAR>1978</YEAR>
    <ARTIST>Village People</ARTIST>
  </SONG>
</SONGS>

The corresponding schema definition:
<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name="SONGS">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="SONG" minOccurs='1' maxOccurs='unbounded' />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="SONG">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="TITLE" type="xsd:string" />
        <xsd:element name="COMPOSER" type="xsd:string" maxOccurs='unbounded' />
        <xsd:element name="PRODUCER" type="xsd:string" maxOccurs='unbounded' />
        <xsd:element name="PUBLISHER" type="xsd:string" maxOccurs='unbounded' />
        <xsd:element name="LENGTH" type="xsd:string" />
        <xsd:element name="YEAR" type="xsd:gYear" />
        <xsd:element name="ARTIST" type="xsd:string" maxOccurs='unbounded' />
      </xsd:sequence>
      <xsd:attribute name="genre" type="xsd:string" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
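To check that the instance document and the schema above agree, both can be fed to a Validator. The following is a minimal sketch that embeds condensed copies of the two listings as strings; the class name SongSchemaCheck and the song(...) helper, which lets the YEAR value be varied, are assumptions added for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SongSchemaCheck {

    // The schema from the post, condensed onto fewer lines.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='SONGS'><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref='SONG' minOccurs='1' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='SONG'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='TITLE' type='xsd:string'/>"
      + "<xsd:element name='COMPOSER' type='xsd:string' maxOccurs='unbounded'/>"
      + "<xsd:element name='PRODUCER' type='xsd:string' maxOccurs='unbounded'/>"
      + "<xsd:element name='PUBLISHER' type='xsd:string' maxOccurs='unbounded'/>"
      + "<xsd:element name='LENGTH' type='xsd:string'/>"
      + "<xsd:element name='YEAR' type='xsd:gYear'/>"
      + "<xsd:element name='ARTIST' type='xsd:string' maxOccurs='unbounded'/>"
      + "</xsd:sequence><xsd:attribute name='genre' type='xsd:string'/>"
      + "</xsd:complexType></xsd:element></xsd:schema>";

    /** Builds the instance document from the post with a caller-supplied YEAR value. */
    static String song(String year) {
        return "<SONGS xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
             + " xsi:noNamespaceSchemaLocation='mySong.xsd'>"
             + "<SONG genre='pop'><TITLE>Hot Cop</TITLE>"
             + "<COMPOSER>Jacques Morali</COMPOSER><COMPOSER>Henri Belolo</COMPOSER>"
             + "<COMPOSER>Victor Willis</COMPOSER><PRODUCER>Jacques Morali</PRODUCER>"
             + "<PUBLISHER>PolyGram Records</PUBLISHER><LENGTH>6:20</LENGTH>"
             + "<YEAR>" + year + "</YEAR><ARTIST>Village People</ARTIST>"
             + "</SONG></SONGS>";
    }

    /** Validates xml against XSD and returns the error messages reported. */
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
            validator.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
            return errors;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("1978:          " + validate(song("1978")));
        System.out.println("seventy-eight: " + validate(song("seventy-eight")));
    }
}
```

Since YEAR is declared as xsd:gYear, a value such as "seventy-eight" should be reported as an error, while the original "1978" should pass.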

Resources:

1. Easy and Efficient XML Processing: Upgrade to JAXP 1.3

2. Java 2 Platform Standard Edition 5.0 API Specification

3. Java 2 Platform Standard Edition 1.4.2 API Specification

4. The basics of using XML Schema to define elements

5. XML Schema Part 0: Primer Second Edition
