XML: Parsers

XML parsers, also called XML processors, provide you with a standard API to navigate and manipulate XML documents. In other words, if you have some XML documents and need to get data out of them, change that data, or modify the document structure, you don't have to write code to load and parse the XML files yourself. Instead, you can use an XML parser, which loads the document and gives you access to it as objects. For a complete list of available parsers, check our Software section.
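As a rough illustration of what "access to it as objects" means, here is a minimal sketch using Python's standard xml.dom.minidom module. The sample document, its element and attribute names, and the helper functions read_amounts and add_currency are all invented for this example; they are not taken from any real statement format.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document: the element and attribute names are
# assumptions made up for this illustration.
SAMPLE_XML = """<statement account="12345">
  <transaction><date>2001-03-01</date><amount>-42.50</amount></transaction>
  <transaction><date>2001-03-05</date><amount>100.00</amount></transaction>
</statement>"""


def read_amounts(xml_text):
    """Parse the document and return every <amount> value as a float."""
    doc = parseString(xml_text)
    return [float(node.firstChild.data)
            for node in doc.getElementsByTagName("amount")]


def add_currency(xml_text, currency):
    """Set a 'currency' attribute on the root element and return the XML text."""
    doc = parseString(xml_text)
    doc.documentElement.setAttribute("currency", currency)
    return doc.documentElement.toxml()


amounts = read_amounts(SAMPLE_XML)          # getting data out
updated = add_currency(SAMPLE_XML, "USD")   # changing the document
print(amounts)
print(updated)
```

The parser does the loading and parsing; the application only works with objects such as the document, its elements, and their attributes.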

Some parsers load an entire XML document into memory and give you access to it as a tree of nodes and elements. These are called DOM-based parsers. If you have very large XML files, this method is not preferred: holding the entire document as a tree in memory needs far more memory and other resources, and it's also not possible to cancel the load once it has started. There is another kind of standard API, known as SAX (Simple API for XML), which is based on an event model. SAX-based parsers do not load the entire document at once. They read the document sequentially and send events to the application, and it's up to the application to decide how to respond to those events.
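To make the event model more concrete, here is a minimal sketch of a SAX handler using Python's standard xml.sax module. The sample document and the AmountHandler class are assumptions invented for this example. The parser calls the handler's methods as it reads through the document, and the handler keeps only the values it cares about instead of a full tree.

```python
import xml.sax

# Hypothetical sample document: the element names are assumptions
# made up for this illustration.
SAMPLE_XML = b"""<statement>
  <transaction><date>2001-03-01</date><amount>-42.50</amount></transaction>
  <transaction><date>2001-03-05</date><amount>100.00</amount></transaction>
</statement>"""


class AmountHandler(xml.sax.ContentHandler):
    """Collects the text of every <amount> element as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.amounts = []
        self._in_amount = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Event: an opening tag was read.
        if name == "amount":
            self._in_amount = True
            self._buffer = []

    def characters(self, content):
        # Event: character data was read (it may arrive in several chunks).
        if self._in_amount:
            self._buffer.append(content)

    def endElement(self, name):
        # Event: a closing tag was read.
        if name == "amount":
            self.amounts.append(float("".join(self._buffer)))
            self._in_amount = False


handler = AmountHandler()
xml.sax.parseString(SAMPLE_XML, handler)
total = sum(handler.amounts)
print(handler.amounts, total)
```

Because the handler decides what to keep, memory use depends on what the application stores rather than on the size of the document.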

Coming back to my original problem with "account statements" discussed at the beginning of the article: my credit unions and bank agreed to return the data in XML format when I log on to their Web sites from my Visual Basic application. The credit unions and bank defined the XML structure/schema and promised that they would not change the structure of the XML documents. Now the only thing I had to do was load those documents using an XML parser and get to the data directly, which was very easy (just 5-6 lines of Visual Basic code). The only remaining problem is that all three XML documents have different structures, so I would have to write code to deal with three different XML documents (containing similar data). There was an easy solution to this: XSLT.