SOA / Web Services / Java

A Technology Blog

What XML parser to use?

Posted by Vivek on February 3, 2009

XML, the lingua franca of web services, is used in many applications, mostly because it is platform-agnostic. Applications that use XML and XML-based artifacts such as SOAP and WSDL frequently need to traverse, customize and transform these artifacts to suit their needs. This is why we see parsing and binding (marshalling and unmarshalling) of XML documents at various critical points in an application. One important thing to note is that XML is verbose, which can lead to performance issues when it is processed, especially in complex environments. Over the years, we have seen the use of DOM (Document Object Model) and SAX (Simple API for XML) parsers, but these parsers have their own limitations and require trade-offs in some situations. For example:

DOM requires an in-memory representation of the entire XML document, which puts pressure on system resources and is therefore considered impractical for large documents. The DOM APIs also introduce complexity when processing XML nodes.
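To make the memory trade-off concrete, here is a minimal sketch using the JDK's built-in DOM API (`javax.xml.parsers`). The class name `DomExample`, the method `countElements` and the sample document are made up for illustration. Note that the whole document is turned into a tree before a single node can be queried.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class DomExample {

    // Builds the complete in-memory tree, then counts elements with the given tag name.
    static int countElements(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() reads the entire input and materializes every node.
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = doc.getElementsByTagName(tagName);
            return matches.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        System.out.println(countElements(xml, "order"));
    }
}
```

Because the tree stays in memory, nodes can be revisited in any order, but the cost grows with the size of the document.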

Other parsers that use tree-based APIs are dom4j, JDOM and XOM. While the dom4j document object model is fast, memory-efficient and highly extensible, JDOM is mostly known for its ease of use. The XOM document object model protects users from common mistakes in the use of XML, while offering good performance and memory efficiency.

SAX, on the other hand, delivers the document as a one-pass stream of events and keeps no tree, so it is inappropriate when reordering or cross-referencing of XML nodes is desired.
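The event-driven style can be sketched with the JDK's SAX API as follows. `SaxExample` and `elementNames` are hypothetical names chosen for this post. The parser pushes each event to the handler exactly once, so anything the application wants to refer back to must be stored by the application itself (here, in a list).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxExample {

    // Collects element names in document order from SAX start-element events.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The parser calls us; once this callback returns, the event is gone.
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><c/></a>"));
    }
}
```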

Woodstox is an XML processor that combines the best of both worlds: it is convenient to use like DOM and efficient like SAX. It is an open source implementation of the StAX pull-parser standard. However, like Xerces2, the Apache implementation of the W3C Document Object Model standard, it runs into performance issues when dealing with small documents, e.g. SOAP messages.
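Pull parsing is easiest to see in code. The sketch below uses only the standard StAX interfaces in `javax.xml.stream`, which a StAX implementation such as Woodstox provides; which implementation `XMLInputFactory.newInstance()` returns depends on the classpath. `StaxExample`, `firstText` and the sample message are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class StaxExample {

    // Returns the text of the first element with the given local name, or null if absent.
    // Assumes the target element contains text only.
    static String firstText(String xml, String elementName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The application drives the parser: it asks for the next event when ready.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        // Returning here means the rest of the document is never parsed.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Envelope><Body><symbol>IBM</symbol><price>1</price></Body></Envelope>";
        System.out.println(firstText(xml, "symbol"));
    }
}
```

Unlike SAX, the loop is in the application's hands, so it can stop as soon as it has what it needs.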

To improve on the performance of such widely used XML parsers, Apache introduced a new object model, the AXIs Object Model (AXIOM). It is considered not just another object model but a useful, performance-enhancing option for CPU- and memory-intensive applications. AXIOM aims to be memory-efficient by deferring creation of the XML tree until it is actually required. AXIOM is a StAX (Streaming API for XML) based object model and uses the “pull-parsing” methodology, unlike SAX and DOM. It also has built-in support for XML-binary Optimized Packaging (XOP) and MTOM, the combination of which allows XML to carry binary data efficiently and transparently. Another important feature of AXIOM is that it provides APIs to parse SOAP (Simple Object Access Protocol) documents effectively. Because these APIs are already defined, much of the complexity and performance overhead of handling SOAP by hand can be avoided. Apache’s SOAP engine, Axis2, is therefore built to work with AXIOM. AXIOM’s ability to support various binding mechanisms helps it optimize the marshalling and unmarshalling of XML fragments, which results in improved performance.
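The idea of deferring work until it is needed can be illustrated with a toy class built on the standard StAX API. This is not AXIOM's API or its implementation; `DeferredElementList`, `elementName` and `builtCount` are hypothetical names, and the sketch only records element names rather than building real nodes. It shows the principle that the underlying pull parser is advanced only as far as the caller's request requires.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical toy class; it is NOT part of AXIOM.
class DeferredElementList {
    private final XMLStreamReader reader;
    // Element names materialized so far, in document order.
    private final List<String> built = new ArrayList<>();

    DeferredElementList(String xml) {
        try {
            // Creating the reader does not walk through the document.
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not open XML", e);
        }
    }

    // Returns the name of the index-th element (0-based), or null if there is none.
    String elementName(int index) {
        try {
            // Pull events only until the requested element has been seen.
            while (built.size() <= index && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    built.add(reader.getLocalName());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return index < built.size() ? built.get(index) : null;
    }

    // Number of elements materialized so far.
    int builtCount() {
        return built.size();
    }
}
```

For a message such as `<Envelope><Header/><Body><payload/></Body></Envelope>`, asking for element 1 materializes only `Envelope` and `Header`; the body is left unread until something asks for it.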

 

While AXIOM has pre-defined APIs for parsing a SOAP message, it should also be possible to define APIs that can extract elements from a WSDL (Web Services Description Language) file.

 


2 Responses to “What XML parser to use?”

  1. barriers said

    you might also want to look at vtd-xml, the latest and most advanced XML processing API available today

    http://vtd-xml.sf.net


  2. jimmyzhang said

    you may also want to check out vtd-xml, the latest and most advanced xml processing model

    vtd-xml

