A number of technologies can be used to automate this transformation, such as the Apache Commons Digester (a rules-based entity mapper) or XMLBeans (which provides schema-based bean generation).
Document Structure and Versioning
Because you are using static typing, there is an implicit understanding that the document structure is going to be fixed. For most applications, this is fine, except when a new version of the document is introduced and the code is required to handle the old version as well as the new one. The new structure may be a superset of the old one or it may have structural changes that make it incompatible with the old one. Either way, the programmer needs to determine the version and handle it accordingly.
For real-world examples, we'll use the Java web application deployment descriptors. These can be found in J2EE WAR files (WEB-INF/web.xml) and define servlet mappings, access control and so on. Typical file signatures are listed below, starting at version 2.2 (the version that introduced web.xml). Notice that when version 2.4 was introduced, the platform switched from document type definitions (DTD) to schemas (XSD) for validation.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app PUBLIC
    "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">
<web-app>

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app PUBLIC
    "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>

<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4"
    xmlns="http://java.sun.com/xml/ns/j2ee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://java.sun.com/xml/ns/javaee"
    xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
    id="WebApp_ID" version="2.5">
- The mix of DTD and XSD means that the same mechanism cannot be used to determine the version in all files.
- The file has to be parsed twice - once to determine the version; again when you pass the file off to whatever framework you're using to instantiate the beans.
- Versioning is domain dependent; there is no universal solution to determining the version of an XML document.
Using the SAX Parser to Determine the Version
Because a SAX parser has shipped with the standard Java library since version 1.4, it is probably the best bet for determining the version. The code that follows will print out the information that can be used to determine the file version.
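A minimal sketch of such a handler, assuming a namespace-aware parser. The class name WebXmlSignaturePrinter and the describe helper are hypothetical names chosen for illustration. It records the DOCTYPE identifiers passed to resolveEntity and, for schema-based descriptors, the version attribute, namespace and xsi:schemaLocation of the root element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the details that identify a web.xml version.
public class WebXmlSignaturePrinter extends DefaultHandler {
    private static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    private final StringBuilder report = new StringBuilder();
    private boolean rootSeen;

    // Called when the parser reaches the external DTD named in the DOCTYPE.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        report.append("METHOD=resolveEntity\n")
              .append("publicId=").append(publicId).append('\n')
              .append("systemId=").append(systemId).append('\n');
        // Hand back an empty DTD so nothing is fetched over the network.
        return new InputSource(new StringReader(""));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (rootSeen) {
            return; // only the root element carries version information
        }
        rootSeen = true;
        String version = atts.getValue("", "version");
        if (version == null) {
            return; // DTD-based descriptors (2.2, 2.3) have no version attribute
        }
        report.append("METHOD=startElement\n")
              .append("version=").append(version).append('\n')
              .append("xmlNamespace=").append(uri).append('\n')
              .append("schemaLocation=")
              .append(atts.getValue(XSI_NS, "schemaLocation")).append('\n');
    }

    // Parses the stream and returns the collected report as text.
    static String describe(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        WebXmlSignaturePrinter handler = new WebXmlSignaturePrinter();
        factory.newSAXParser().parse(in, handler);
        return handler.report.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<web-app version=\"2.4\" xmlns=\"http://java.sun.com/xml/ns/j2ee\"/>";
        System.out.print(describe(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Returning an empty InputSource from resolveEntity is a design choice in this sketch: the version check only needs the identifiers, not the DTD contents.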
(EDIT: actually, StAX would be a better option than SAX if you're on Java SE 6, Java EE 5 or above. It lets you avoid parsing the entire document.)
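As a rough illustration of the StAX alternative, the sketch below pulls events only until the root element's start tag and then stops. The class and method names are hypothetical, and it covers only the schema-based case: it returns the root element's version attribute, or null when there is none (as with the DTD-based 2.2 and 2.3 descriptors, where the DOCTYPE would have to be inspected instead). DTD support is switched off here on the assumption that the external DTD should not be loaded.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reads no further than the root element's start tag.
public class StaxVersionProbe {

    static String rootVersion(InputStream in) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: we do not want the parser to process or load any DTD.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // First start tag is the root; a null namespace argument
                    // means the attribute's namespace is not checked.
                    return reader.getAttributeValue(null, "version");
                }
            }
            return null; // no element found
        } finally {
            reader.close();
        }
    }
}
```

Because the reader is pull-based, returning from the loop is enough to stop parsing; no exception is needed to abort.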
Sample output for version 2.3 (DTD) and 2.4 (XSD) files looks like this:
METHOD=resolveEntity
publicId=-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN
systemId=http://java.sun.com/dtd/web-app_2_3.dtd

METHOD=startElement
version=2.4
xmlNamespace=http://java.sun.com/xml/ns/j2ee
schemaLocation=http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd
By matching the strings, we get the version. (EDIT: Note that the schemaLocation attribute is optional for documents.)
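The matching step might look something like the sketch below. The class name, method name and regular expression are assumptions made for illustration. It prefers the root element's version attribute when one is present and otherwise falls back to the DTD public ID, so it does not depend on the optional schemaLocation attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns the captured strings into a version number.
public class WebXmlVersionMatcher {
    // Matches public IDs of the form used by the 2.2 and 2.3 DTDs.
    private static final Pattern DTD_PUBLIC_ID = Pattern.compile(
            "-//Sun Microsystems, Inc\\.//DTD Web Application (\\d+\\.\\d+)//EN");

    // Either argument may be null; returns null if no version can be derived.
    static String versionOf(String publicId, String versionAttribute) {
        if (versionAttribute != null) {
            return versionAttribute.trim(); // schema-based descriptors (2.4 and later)
        }
        if (publicId != null) {
            Matcher m = DTD_PUBLIC_ID.matcher(publicId);
            if (m.matches()) {
                return m.group(1); // DTD-based descriptors (2.2, 2.3)
            }
        }
        return null;
    }
}
```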
The cost of the version check can be minimized by:
- Throwing a SAXException to stop the SAXParser after a match has been found.
- Reusing the InputStream by using the