XML: Document Structure

An XML document has a hierarchical structure with two sections: the prolog and the body.

The document starts with an optional prolog, which consists of the XML declaration statement (<?xml version...), processing instructions, the document type declaration, and comments. We'll look at processing instructions (PIs) and document type declarations later. Comments in an XML document are exactly the same as in HTML: they are used for descriptions and inline documentation, and they begin with <!-- and end with -->.

If an XML document has a <?xml version="1.0" ...?> declaration statement, it must be the first line in the document, with nothing, not even a space, preceding it. The XML declaration statement identifies the current document as an XML document and has three attributes: version (compulsory), encoding, and standalone (both optional).

XML documents are text documents. Here "text" does not refer to 7-bit ASCII but to Unicode. The use of Unicode allows XML documents to be created in international languages. The encoding attribute in the XML declaration identifies which encoding is used to represent the characters in the document. In the above example, the XML declaration statement defines only the compulsory version attribute, whose value in practice is almost always 1.0 (XML 1.1 exists but is rarely used). The standalone attribute takes the value "yes" or "no", telling the processor whether or not the XML document depends on external references. Here is an example of an XML declaration statement:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

Let's now move our focus to the rest of the XML document, the body. As described earlier, XML documents have a hierarchical (tree-like) structure. An XML document has exactly one root element, also called the document element. All other elements are children or deeper descendants of this element. In the above example, BankAccount is the document element, with Number, Name, etc. as child elements.
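
The prolog and root-element rules can be tried out with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The document contents (the account number, name, and type values) and the helper name parses are made up for illustration and are not taken from the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a prolog (declaration + comment) followed by the body.
# The element values are invented for illustration only.
DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- A comment in the prolog -->
<BankAccount>
  <Number>1234</Number>
  <Name>Jane Doe</Name>
  <Type>Checking</Type>
</BankAccount>"""

# The parser returns the single root (document) element.
root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
print(root.tag)
print(child_tags)


def parses(data):
    """Return True if the parser accepts the data, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# A single space before the XML declaration is enough to make parsing fail.
leading_space_ok = parses(b" " + DOC)
print(leading_space_ok)
```

Passing the document as bytes lets the parser honor the encoding named in the declaration rather than assuming the text has already been decoded.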
Elements are delimited by a start tag (e.g. <Type>) and an end tag (</Type>). In HTML, end tags are not always required (browsers take care of that), but XML is strict here and needs all tags to be properly closed. Tags must also be properly nested: <A><B><C></C></B></A> is legal, but <A><B><C></B></C></A> is not. No overlapping tags are allowed. And once again, remember that XML documents are case-sensitive.

Elements can also have attributes, which are essentially name-value pairs inside the start tag (e.g. <Student RollNo="2">). XML requires all attribute values to be enclosed within quotation marks, either ' or ".

Five characters have special meaning: <, >, ', ", and &. If you wish to use any of these literally (and not for markup), you should escape them using &lt;, &gt;, &apos;, &quot;, and &amp; respectively.

If an XML document adheres to the above rules, it is called a "well-formed" XML document.

A Document Type Definition, or DTD, can be used to describe the structure of an XML document. It lets you specify what the document element should be, the parent-child relationships, element attributes and their default values, etc. An XML document can refer to an external DTD and/or have one inline. If an XML document is well-formed and also conforms to the DTD it refers to, it is said to be a "valid" XML document.

All right, now that we have data stored in an XML document, what next? To navigate or manipulate this data, you can use what are called XML parsers.
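
As a first taste of what a parser does, the well-formedness rules described above can be checked mechanically. This is a minimal sketch, assuming Python's standard library (xml.etree.ElementTree and xml.sax.saxutils). The helper name is_well_formed and the sample strings are hypothetical. Note that this checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Proper nesting is accepted; overlapping tags are rejected.
print(is_well_formed("<A><B><C></C></B></A>"))
print(is_well_formed("<A><B><C></B></C></A>"))

# Tag names are case-sensitive, so <Type> is not closed by </type>.
print(is_well_formed("<Type>Savings</type>"))

# Attribute values must be quoted.
print(is_well_formed('<Student RollNo="2"></Student>'))
print(is_well_formed("<Student RollNo=2></Student>"))

# escape() handles &, < and > by default; the two quote characters are
# added through the optional entities mapping.
raw = "a < b & \"c\" > 'd'"
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<Note>" + escaped + "</Note>").text
print(round_trip == raw)
```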