Thursday, 28 May 2009

Java: using XPath with namespaces and implementing NamespaceContext

XPath is a handy expression language for running queries on XML. This post is about how to use it with XML namespaces in Java (javax.xml.xpath).

This Java code uses an XPath expression to extract the value of the bar attribute from a simple document:

    import java.io.StringReader;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.xml.sax.InputSource;

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();

    String xml = "<data><foo bar=\"hello\" /></data>";
    String value = xpath.evaluate(
        "/data/foo/attribute::bar", new InputSource(
            new StringReader(xml)));
    System.out.println(value);

When run, it prints hello on the console.

XML with namespaces

When the XML uses namespaces, things get a little bit trickier. These two documents are functionally equivalent:

    <?xml version="1.0" encoding="utf-8"?>
    <!-- ns1.xml -->
    <data xmlns:foo="http://foo" xmlns:bar="http://bar"
      xmlns="http://def">
      <foo:value>1</foo:value>
      <bar:value>2</bar:value>
    </data>

    <?xml version="1.0" encoding="utf-8"?>
    <!-- ns2.xml -->
    <data xmlns:bar="http://foo" xmlns:foo="http://bar"
      xmlns="http://def">
      <bar:value>1</bar:value>
      <foo:value>2</foo:value>
    </data>

Note that the namespace prefixes (foo and bar) have been swapped round, but the value element in the namespace http://foo contains the value 1 in both documents. Likewise, the value element in the http://bar namespace contains the number 2 in both documents.

Since the namespace prefixes can vary between documents, namespaced XPath expressions need to map their own prefixes to the URIs. The namespace URIs act as constant identifiers; that's their job! In the Java API, this mapping is performed by implementing the NamespaceContext interface.
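To make the mapping concrete, here is a self-contained sketch that parses both documents from strings and resolves the expression's prefixes through a deliberately minimal anonymous NamespaceContext. It only implements the prefix-to-URI lookup that XPath needs here (getPrefix and getPrefixes return nothing), so it does not honour the full interface contract; the class name PrefixMappingDemo and its helper methods are hypothetical. The last line also illustrates that, under XPath 1.0, an unprefixed step such as /data never matches an element in a default namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical demo class; the two strings mirror ns1.xml and ns2.xml.
public class PrefixMappingDemo {

  public static final String NS1 = "<data xmlns:foo='http://foo' xmlns:bar='http://bar'"
      + " xmlns='http://def'><foo:value>1</foo:value><bar:value>2</bar:value></data>";
  public static final String NS2 = "<data xmlns:bar='http://foo' xmlns:foo='http://bar'"
      + " xmlns='http://def'><bar:value>1</bar:value><foo:value>2</foo:value></data>";

  // Minimal context: only prefix -> URI lookup is implemented.
  // This is NOT a full implementation of the NamespaceContext contract.
  public static NamespaceContext minimalContext(final Map<String, String> prefixToUri) {
    return new NamespaceContext() {
      @Override
      public String getNamespaceURI(String prefix) {
        String uri = prefixToUri.get(prefix);
        return uri == null ? XMLConstants.NULL_NS_URI : uri;
      }

      @Override
      public String getPrefix(String namespaceURI) {
        return null; // reverse lookup not supported in this sketch
      }

      @Override
      public Iterator<String> getPrefixes(String namespaceURI) {
        return Collections.<String>emptyList().iterator();
      }
    };
  }

  // Evaluates the expression; a null context means no prefixes are bound.
  public static String evaluate(String expression, NamespaceContext context, String xml)
      throws XPathExpressionException {
    XPath xpath = XPathFactory.newInstance().newXPath();
    if (context != null) {
      xpath.setNamespaceContext(context);
    }
    return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws XPathExpressionException {
    Map<String, String> prefixes = new HashMap<String, String>();
    prefixes.put("foo", "http://foo");
    prefixes.put("def", "http://def");
    NamespaceContext context = minimalContext(prefixes);

    // The expression's "foo" means http://foo, whatever prefix the document uses.
    System.out.println(evaluate("/def:data/foo:value", context, NS1)); // 1
    System.out.println(evaluate("/def:data/foo:value", context, NS2)); // 1
    // An unprefixed step only matches elements in no namespace, so nothing is found.
    System.out.println("[" + evaluate("/data", null, NS1) + "]"); // []
  }
}
```

Because the context, not the document, decides what foo means, both documents yield 1 for the same compiled expression.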

This code uses a NamespaceContext to extract the value in the http://foo namespace from each of the documents:

    InputSource ns1xml = new InputSource("ns1.xml");
    InputSource ns2xml = new InputSource("ns2.xml");

    NamespaceContext context = new NamespaceContextMap(
        "foo", "http://foo",
        "bar", "http://bar",
        "def", "http://def");

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    xpath.setNamespaceContext(context);
    XPathExpression expression = xpath.compile("/def:data/foo:value");

    System.out.println(expression.evaluate(ns1xml));
    System.out.println(expression.evaluate(ns2xml));

Note that the expression was compiled for reuse. Output:

    1
    1

The prefixes given to the context only need to be consistent with the XPath expressions, not the documents. This code works just as well:

    InputSource ns1xml = new InputSource("ns1.xml");
    InputSource ns2xml = new InputSource("ns2.xml");

    NamespaceContext context = new NamespaceContextMap(
        "abc", "http://foo",
        "pqr", "http://bar",
        "xyz", "http://def");

    XPathFactory factory = XPathFactory.newInstance();
    XPath xpath = factory.newXPath();
    xpath.setNamespaceContext(context);
    XPathExpression expression = xpath
        .compile("/xyz:data/abc:value");

    System.out.println(expression.evaluate(ns1xml));
    System.out.println(expression.evaluate(ns2xml));

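Unfortunately, there are no implementations of NamespaceContext provided in the standard library (well, there is one in StAX but it is of limited utility). If you choose to implement it yourself, take note of the entire contract as defined in the javadoc: for example, the xml and xmlns prefixes must always resolve to their fixed namespace URIs, and a null argument must cause an IllegalArgumentException. A sample implementation is provided below.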
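I am assuming the StAX implementation mentioned above is the context exposed by XMLStreamReader.getNamespaceContext(). This sketch shows why it is of limited use for XPath: it reports the bindings the document itself declares at the reader's current position, so the same prefix resolves differently for ns1.xml and ns2.xml. The class StaxContextDemo and its rootBinding method are hypothetical names.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo: ask StAX's NamespaceContext about the root element's bindings.
public class StaxContextDemo {

  public static final String NS1 = "<data xmlns:foo='http://foo' xmlns:bar='http://bar'"
      + " xmlns='http://def'><foo:value>1</foo:value><bar:value>2</bar:value></data>";
  public static final String NS2 = "<data xmlns:bar='http://foo' xmlns:foo='http://bar'"
      + " xmlns='http://def'><bar:value>1</bar:value><foo:value>2</foo:value></data>";

  /** Returns the URI the document binds to the prefix on its root element. */
  public static String rootBinding(String xml, String prefix) throws XMLStreamException {
    XMLStreamReader reader = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(xml));
    try {
      reader.nextTag(); // advance to the root START_ELEMENT
      // The context reflects the document's declarations in scope at this point.
      return reader.getNamespaceContext().getNamespaceURI(prefix);
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(rootBinding(NS1, "foo")); // http://foo
    System.out.println(rootBinding(NS2, "foo")); // http://bar
  }
}
```

The context is tied to one document's choice of prefixes, which is exactly what a reusable, document-independent XPath expression is trying to avoid.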

Note 2011/07: I've corrected the above listings to remove a mapping of the empty prefix to http://def (that is, ("", "http://def")), which was used with expressions starting with /:data. Such an expression is not legal XPath syntax; see the comments for more details.

Listing: NamespaceContextMap.java

package xml;

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

/**
 * An implementation of <a
 * href="http://java.sun.com/javase/6/docs/api/javax/xml/namespace/NamespaceContext.html">
 * NamespaceContext </a>. Instances are immutable.
 *
 * @author McDowell
 */
public final class NamespaceContextMap implements
    NamespaceContext {

  private final Map<String, String> prefixMap;
  private final Map<String, Set<String>> nsMap;

  /**
   * Constructor that takes a map of XML prefix-namespaceURI values. A defensive
   * copy is made of the map. An IllegalArgumentException will be thrown if the
   * map attempts to remap the standard prefixes defined in the NamespaceContext
   * contract.
   *
   * @param prefixMappings
   *          a map of prefix:namespaceURI values
   */
  public NamespaceContextMap(
      Map<String, String> prefixMappings) {
    prefixMap = createPrefixMap(prefixMappings);
    nsMap = createNamespaceMap(prefixMap);
  }

  /**
   * Convenience constructor.
   *
   * @param mappingPairs
   *          pairs of prefix-namespaceURI values
   */
  public NamespaceContextMap(String... mappingPairs) {
    this(toMap(mappingPairs));
  }

  private static Map<String, String> toMap(
      String... mappingPairs) {
    if (mappingPairs.length % 2 != 0) {
      throw new IllegalArgumentException(
          "expected prefix-namespaceURI pairs");
    }
    Map<String, String> prefixMappings = new HashMap<String, String>(
        mappingPairs.length / 2);
    for (int i = 0; i < mappingPairs.length; i += 2) {
      prefixMappings
          .put(mappingPairs[i], mappingPairs[i + 1]);
    }
    return prefixMappings;
  }

  private Map<String, String> createPrefixMap(
      Map<String, String> prefixMappings) {
    Map<String, String> prefixMap = new HashMap<String, String>(
        prefixMappings);
    addConstant(prefixMap, XMLConstants.XML_NS_PREFIX,
        XMLConstants.XML_NS_URI);
    addConstant(prefixMap, XMLConstants.XMLNS_ATTRIBUTE,
        XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    return Collections.unmodifiableMap(prefixMap);
  }

  private void addConstant(Map<String, String> prefixMap,
      String prefix, String nsURI) {
    String previous = prefixMap.put(prefix, nsURI);
    if (previous != null && !previous.equals(nsURI)) {
      throw new IllegalArgumentException(prefix + " -> "
          + previous + "; see NamespaceContext contract");
    }
  }

  private Map<String, Set<String>> createNamespaceMap(
      Map<String, String> prefixMap) {
    Map<String, Set<String>> nsMap = new HashMap<String, Set<String>>();
    for (Map.Entry<String, String> entry : prefixMap
        .entrySet()) {
      String nsURI = entry.getValue();
      Set<String> prefixes = nsMap.get(nsURI);
      if (prefixes == null) {
        prefixes = new HashSet<String>();
        nsMap.put(nsURI, prefixes);
      }
      prefixes.add(entry.getKey());
    }
    for (Map.Entry<String, Set<String>> entry : nsMap
        .entrySet()) {
      Set<String> readOnly = Collections
          .unmodifiableSet(entry.getValue());
      entry.setValue(readOnly);
    }
    return nsMap;
  }

  @Override
  public String getNamespaceURI(String prefix) {
    checkNotNull(prefix);
    String nsURI = prefixMap.get(prefix);
    return nsURI == null ? XMLConstants.NULL_NS_URI : nsURI;
  }

  @Override
  public String getPrefix(String namespaceURI) {
    checkNotNull(namespaceURI);
    Set<String> set = nsMap.get(namespaceURI);
    return set == null ? null : set.iterator().next();
  }

  @Override
  public Iterator<String> getPrefixes(String namespaceURI) {
    checkNotNull(namespaceURI);
    Set<String> set = nsMap.get(namespaceURI);
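    // the contract requires an empty iterator for an unbound URI
    return set == null ? Collections.<String>emptySet().iterator()
        : set.iterator();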
  }

  private void checkNotNull(String value) {
    if (value == null) {
      throw new IllegalArgumentException("null");
    }
  }

  /**
   * @return an unmodifiable map of the mappings in the form prefix-namespaceURI
   */
  public Map<String, String> getMap() {
    return prefixMap;
  }

}

Listing: NamespaceContextMapTest.java

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

import org.junit.Assert;
import org.junit.Test;

import xml.NamespaceContextMap;

//JUnit 4 test
public class NamespaceContextMapTest {

  @Test
  public void testContext() {
    Map<String, String> mappings = new HashMap<String, String>();
    mappings.put("foo", "http://foo");
    mappings.put("altfoo", "http://foo");
    mappings.put("bar", "http://bar");
    mappings.put(XMLConstants.XML_NS_PREFIX,
        XMLConstants.XML_NS_URI);

    NamespaceContext context = new NamespaceContextMap(
        mappings);
    for (Map.Entry<String, String> entry : mappings
        .entrySet()) {
      String prefix = entry.getKey();
      String namespaceURI = entry.getValue();

      Assert.assertEquals("namespaceURI", namespaceURI,
          context.getNamespaceURI(prefix));
      boolean found = false;
      Iterator<?> prefixes = context
          .getPrefixes(namespaceURI);
      while (prefixes.hasNext()) {
        if (prefix.equals(prefixes.next())) {
          found = true;
          break;
        }
        try {
          prefixes.remove();
          Assert.fail("rw");
        } catch (UnsupportedOperationException e) {
          // expected: the iterator is read-only
        }
      }
      Assert.assertTrue("prefix: " + prefix, found);
      Assert.assertNotNull("prefix: " + prefix, context
          .getPrefix(namespaceURI));
    }

    Map<String, String> ctxtMap = ((NamespaceContextMap) context)
        .getMap();
    for (Map.Entry<String, String> entry : mappings
        .entrySet()) {
      Assert.assertEquals(entry.getValue(), ctxtMap
          .get(entry.getKey()));
    }

    System.out.println(context.toString());
  }

  @Test
  public void testModify() {
    NamespaceContextMap context = new NamespaceContextMap();

    try {
      Map<String, String> ctxtMap = context.getMap();
      ctxtMap.put("a", "b");
      Assert.fail("rw");
    } catch (UnsupportedOperationException e) {
      // expected: the map is unmodifiable
    }

    try {
      Iterator<String> it = context
          .getPrefixes(XMLConstants.XML_NS_URI);
      it.next();
      it.remove();
      Assert.fail("rw");
    } catch (UnsupportedOperationException e) {
      // expected: the prefix set is read-only
    }
  }

  @Test
  public void testConstants() {
    NamespaceContext context = new NamespaceContextMap();
    Assert.assertEquals(XMLConstants.XML_NS_URI, context
        .getNamespaceURI(XMLConstants.XML_NS_PREFIX));
    Assert.assertEquals(
        XMLConstants.XMLNS_ATTRIBUTE_NS_URI, context
            .getNamespaceURI(XMLConstants.XMLNS_ATTRIBUTE));
    Assert.assertEquals(XMLConstants.XML_NS_PREFIX, context
        .getPrefix(XMLConstants.XML_NS_URI));
    Assert.assertEquals(XMLConstants.XMLNS_ATTRIBUTE, context
        .getPrefix(XMLConstants.XMLNS_ATTRIBUTE_NS_URI));
  }

}

11 comments:

  1. Found a great impl: org.apache.ws.commons.util.NamespaceContextImpl.
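    You can use the following Maven dependency for it:

    <dependency>
      <groupId>org.apache.ws.commons</groupId>
      <artifactId>ws-commons-util</artifactId>
      <version>1.0.1</version>
      <scope>test</scope>
    </dependency>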

  2. Thanks for your post, it's a pity we cannot find any implementation of NamespaceContext provided in the standard library.

    Can I use your sample code for the NamespaceContextMap or is it protected by a copyright ?

    Thanks again and seeya

  3. @Anonymous - anyone is free to use the sample code in this post with the caveats noted at the bottom of the page.

  4. Is there any way to use the default namespace without the ":" (as in the standard?), i.e. /data/foo:value instead of /:data/foo:value?
    I am using Xalan 2.7.1 and it doesn't work, and if I use Saxon I get "Unexpected colon at start of token".

  5. @Anonymous - I wasn't aware that implementations varied. I would be inclined to just namespace everything: "/xyz:data/abc:value"

  6. According to http://www.w3.org/2007/01/applets/xpathApplet.html
    /:data/foo:value is an invalid expression.

  7. @Anonymous - thanks for the link to the applet; I was not aware of it.

    Prompted by the comments, I've checked the spec. Namespace prefixes must be at least one character long.

    Here are the relevant parts of the lexical structure:

    PrefixedName ::= Prefix ':' LocalPart

    Prefix ::= NCName

    NCName ::= Name - (Char* ':' Char*) /* An XML Name, minus the ":" */

    Name ::= NameStartChar (NameChar)*


    Support for XPath in the Java 6 runtime is for version 1.0.

    The fact that /:elementName worked as an expression was just an accident of the implementation.

    I shall correct the post.

  8. dang, I just implemented this in Clojure as a function that takes a hash-map of prefixes to URI strings, and returns a full implementation of NamespaceContext, and it's literally 7 lines of code.

    (defn namespace-map
      [mapping]
      (let [prefixes (fn [uri] (map key (filter #(= uri (val %)) mapping)))]
        (proxy [Object NamespaceContext] []
          (getNamespaceURI [prefix] (get mapping prefix))
          (getPrefix [uri] (first (prefixes uri)))
          (getPrefixes [uri] (.iterator (prefixes uri))))))

    1. Nice, but note that your type does not meet the class contract for NamespaceContext as it does not perform the special constant handling required by the API documentation.

    2. Thanks! I fixed it up.

      (defn namespace-map
        "Returns an implementation of NamespaceContext ... actual usefulness TBD"
        [mapping]
        (let [defaults {XMLConstants/XML_NS_PREFIX XMLConstants/XML_NS_URI
                        XMLConstants/XMLNS_ATTRIBUTE XMLConstants/XMLNS_ATTRIBUTE_NS_URI}
              mapping (merge mapping defaults)
              prefixes (fn [uri] (map key (filter #(= uri (val %)) mapping)))]
          (proxy [Object NamespaceContext] []
            (getNamespaceURI [prefix] (get mapping prefix))
            (getPrefix [uri] (first (prefixes uri)))
            (getPrefixes [uri] (.iterator (prefixes uri))))))

  9. I find this article http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html?ca=drs- from IBM very helpful. It provides 3 approaches:
    1) Hard-coded solution: implement the NamespaceContext interface and hard-code the mapping. This only works for the XML you are targeting.
    2) Read namespaces from the document: use Document.lookupNamespaceURI(String prefix) and Document.lookupPrefix(String namespaceURI). This works for all XML files but needs a lookup each time an XPath expression is evaluated.
    3) Read the namespaces from the document and cache them: look up the namespaces only once, in the constructor, then cache them.
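The second approach in this comment can be sketched as a NamespaceContext that asks the DOM for the document's own bindings via Node.lookupNamespaceURI and Node.lookupPrefix on the root element. This is a hypothetical sketch (DocumentNamespaceContext is not a library class) that only consults declarations in scope on the root element and requires a namespace-aware parse. Note the trade-off against the technique in the post: the expression's prefixes are now tied to each document's declarations, so /*/foo:value returns 1 for ns1.xml but 2 for ns2.xml.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch of approach 2: bindings come from the document itself.
public class DocumentNamespaceContext implements NamespaceContext {

  private final Element root;

  public DocumentNamespaceContext(Document document) {
    // Only declarations in scope on the root element are consulted.
    this.root = document.getDocumentElement();
  }

  @Override
  public String getNamespaceURI(String prefix) {
    if (prefix == null) {
      throw new IllegalArgumentException("null");
    }
    if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
      return XMLConstants.XML_NS_URI;
    }
    if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) {
      return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
    }
    // DOM uses null rather than "" to ask for the default namespace.
    String uri = root.lookupNamespaceURI(prefix.length() == 0 ? null : prefix);
    return uri == null ? XMLConstants.NULL_NS_URI : uri;
  }

  @Override
  public String getPrefix(String namespaceURI) {
    if (namespaceURI == null) {
      throw new IllegalArgumentException("null");
    }
    if (XMLConstants.XML_NS_URI.equals(namespaceURI)) {
      return XMLConstants.XML_NS_PREFIX;
    }
    if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(namespaceURI)) {
      return XMLConstants.XMLNS_ATTRIBUTE;
    }
    return root.lookupPrefix(namespaceURI); // null if unbound
  }

  @Override
  public Iterator<String> getPrefixes(String namespaceURI) {
    String prefix = getPrefix(namespaceURI);
    return prefix == null
        ? Collections.<String>emptyList().iterator()
        : Collections.singletonList(prefix).iterator();
  }

  /** Parses the XML namespace-aware and evaluates the expression against it. */
  public static String evaluate(String xml, String expression) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true); // the DOM lookups need namespace-aware nodes
    Document doc = dbf.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    XPath xpath = XPathFactory.newInstance().newXPath();
    xpath.setNamespaceContext(new DocumentNamespaceContext(doc));
    return xpath.evaluate(expression, doc);
  }

  public static void main(String[] args) throws Exception {
    String ns1 = "<data xmlns:foo='http://foo' xmlns:bar='http://bar' xmlns='http://def'>"
        + "<foo:value>1</foo:value><bar:value>2</bar:value></data>";
    String ns2 = "<data xmlns:bar='http://foo' xmlns:foo='http://bar' xmlns='http://def'>"
        + "<bar:value>1</bar:value><foo:value>2</foo:value></data>";
    // "foo" resolves to whatever each document binds it to.
    System.out.println(evaluate(ns1, "/*/foo:value")); // 1
    System.out.println(evaluate(ns2, "/*/foo:value")); // 2
  }
}
```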

