XML Notes
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
XML tags are not predefined. You must "invent" your own tags.
The tags used to mark up HTML documents and the structure of HTML documents are
predefined. The author of HTML documents can only use tags that are defined in the
HTML standard (like <p>, <h1>, etc.).
XML allows the author to define his own tags and his own document structure.
XML                                            HTML
User-definable tags                            Defined set of tags designed for web display
Content driven                                 Format driven
End tags required for well-formed documents    End tags not required
Quotes required around attribute values        Quotes not required
Slash required in empty tags                   Slash not required
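The well-formedness rules in the comparison above can be checked with any XML parser. The following is a minimal sketch, assuming the JDK's built-in parser; the class name WellFormedCheck and the method isWellFormed are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: missing end tag, unquoted attribute, etc.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser setup or input failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<note><to>Tove</to></note>")); // all tags closed
        System.out.println(isWellFormed("<note><to>Tove</note>"));      // missing </to>
        System.out.println(isWellFormed("<a href=x></a>"));             // unquoted attribute value
        System.out.println(isWellFormed("<br>"));                       // empty tag without slash
    }
}
```

An HTML parser would tolerate the last three inputs; an XML parser is required to reject them.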
XML Does not DO Anything
It may be a little hard to understand, but XML does not DO anything. XML was
created to structure, store, and send information.
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The note has a header and a message body. It also has sender and receiver information.
But still, this XML document does not DO anything. It is just pure information wrapped
in XML tags. Someone must write a piece of software to send, receive or display it.
DOM
The Document Object Model (DOM) is an interface specification maintained by the W3C
DOM Working Group. It defines an application-independent mechanism that lets programs
access and update the structure, content, and style of XML documents. In other words, it
is a hierarchical (tree) model that allows developers to manipulate XML documents
easily.
When we parse an XML document with a DOM parser, we get back a tree structure that
contains all of the elements of the document. The DOM provides a variety of functions
that can be used to examine the contents and structure of the document.
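// create a comment node owned by the document (doc is an org.w3c.dom.Document)
Comment comment = doc.createComment("This is comment");
// append it as the last child of the root element (root is doc.getDocumentElement())
root.appendChild(comment);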
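The fragment above assumes that doc and root already exist. A self-contained sketch of the whole round trip might look like the following, using the JDK's javax.xml.parsers API and the note document from earlier; the class name DomNoteDemo and its helper methods are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNoteDemo {
    // The note example, written without whitespace between elements.
    static final String NOTE = "<note><to>Tove</to><from>Jani</from>"
        + "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";

    // Parses the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Creates a comment node and appends it as the last child of the root element.
    static Comment addComment(Document doc, String text) {
        Element root = doc.getDocumentElement();
        Comment comment = doc.createComment(text);
        root.appendChild(comment);
        return comment;
    }

    public static void main(String[] args) {
        Document doc = parse(NOTE);
        Element root = doc.getDocumentElement();
        addComment(doc, "This is comment");
        // Examine the tree: read one element's text and check the new node's type.
        System.out.println(root.getElementsByTagName("to").item(0).getTextContent());
        System.out.println(root.getLastChild().getNodeType() == Node.COMMENT_NODE);
    }
}
```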
SAX
SAX (Simple API for XML) provides a mechanism for reading data from an XML document.
In Java, a reader can be obtained with:
XMLReader xr = XMLReaderFactory.createXMLReader();
A SAX Parser functions as a stream parser, with an event-driven API. The user defines a
number of callback methods that will be called when events occur during parsing.
The SAX events include:
• XML text nodes,
• XML element nodes,
• XML processing instructions, and
• XML comments.
Events are fired when each of these XML features is encountered, and again when its end
is encountered. XML attributes are provided as part of the data passed to element events.
SAX parsing is unidirectional; previously parsed data cannot be re-read without starting
the parsing operation again.
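The callback style described above can be sketched as follows. This is only an illustration: the class name SaxEventLog is made up, and it obtains its parser through SAXParserFactory, which is one of several ways to get a SAX parser in Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog {
    // Parses the input once, front to back, and records each event as it fires.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Attributes arrive together with the element-start event.
                StringBuilder sb = new StringBuilder("start " + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                log.add(sb.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text node in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<note priority=\"high\"><to>Tove</to></note>").forEach(System.out::println);
    }
}
```

Nothing is kept in memory except what the handler chooses to store, here the log list.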
Benefits
SAX parsers have certain benefits over DOM-style parsers. The quantity of memory that
a SAX parser must use in order to function is typically much smaller than that of a DOM
parser. DOM parsers must have the entire tree in memory before any processing can
begin, so the amount of memory used by a DOM parser depends entirely on the size of
the input data. The memory footprint of a SAX parser, by contrast, is based only on the
maximum depth of the XML file (the maximum depth of the XML tree) and the
maximum data stored in XML attributes on a single XML element. Both of these are
always smaller than the size of the parsed tree itself.
Because SAX is event-driven, processing a document with SAX is often faster than with a
DOM-style parser. Memory allocation takes time, so the larger memory footprint of DOM
is also a performance issue.
Due to the nature of SAX, streamed reading from disk is possible. Processing XML
documents that could never fit into memory is only possible through the use of a SAX
parser (or another kind of stream XML parser).
Drawbacks
The event-driven model of SAX is useful for XML parsing, but it does have certain
drawbacks.
Certain kinds of XML validation require access to the document in full. For example, a
DTD IDREF attribute requires that some element in the document carry the same string
as a DTD ID attribute. To validate this with a SAX parser, one would need to keep track
of every previously encountered ID attribute and every previously encountered IDREF
attribute, to see if any matches are made. Furthermore, if an IDREF does not match an
ID, the user only discovers this after the document has been parsed; if this linkage was
important to building functioning output, then time has been wasted in processing the
entire document only to throw it away.
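The bookkeeping described above can be sketched with a SAX handler. This is a simplification under stated assumptions: no DTD is involved, so attributes named id and ref stand in for real ID and IDREF attributes, and the class name IdRefCheck is made up for illustration.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IdRefCheck {
    // Returns every "ref" value that has no matching "id" anywhere in the document.
    static Set<String> danglingRefs(String xml) {
        Set<String> ids = new HashSet<>();   // every ID seen so far
        Set<String> refs = new TreeSet<>();  // every IDREF seen so far
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Assumption: "id" plays the role of a DTD ID, "ref" of a DTD IDREF.
                String id = atts.getValue("id");
                if (id != null) {
                    ids.add(id);
                }
                String ref = atts.getValue("ref");
                if (ref != null) {
                    refs.add(ref);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        // A reference may point forward, so the verdict is only known after the last event.
        refs.removeAll(ids);
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(danglingRefs("<doc><item id=\"a\"/><link ref=\"a\"/><link ref=\"b\"/></doc>"));
    }
}
```

Both sets grow with the document, which erodes the memory advantage of SAX for this kind of check.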
XSLT
To output the value of an attribute, use xsl:template to match the appropriate XML
element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates
to continue processing the document.
<xsl:template match="element-name">
Attribute Value:
<xsl:value-of select="@attribute"/>
<xsl:apply-templates/>
</xsl:template>
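A template like this can be run from Java through javax.xml.transform. The sketch below is an illustration only: the element name item and the attribute name are hypothetical stand-ins for element-name and attribute, and the class name XsltAttributeDemo is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltAttributeDemo {
    // The template from the notes, wrapped in a complete stylesheet with text output.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"item\">"
      + "Attribute Value: <xsl:value-of select=\"@name\"/>;"
      + "<xsl:apply-templates/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given XML and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<list><item name=\"a\"/><item name=\"b\"/></list>"));
    }
}
```

The list element has no template of its own, so the built-in rules walk down to each item, where the template fires.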
SOAP
SOAP (Simple Object Access Protocol) is a protocol for exchanging XML-based
messages over computer networks, normally using HTTP. SOAP forms the foundation
layer of the Web services stack, providing a basic messaging framework that more
abstract layers can build on.
SOAP uses XML to define a protocol for the exchange of information in distributed
computing environments.
SOAP consists of three components:
• an envelope,
• a set of encoding rules, and
• a convention for representing remote procedure calls.
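To make the envelope and the remote procedure call convention more concrete, here is a rough sketch that builds a SOAP 1.1 request with the DOM API. The envelope namespace is the standard SOAP 1.1 one; the operation namespace, the operation name GetPrice, and the parameter Item are hypothetical, as is the class name SoapEnvelopeSketch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SoapEnvelopeSketch {
    // Namespace of the SOAP 1.1 envelope.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Builds Envelope > Body > operation > parameter, the usual shape of an RPC-style request.
    static Document buildRequest(String operationNs, String operation, String param, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
            doc.appendChild(envelope);
            Element body = doc.createElementNS(SOAP_NS, "soap:Body");
            envelope.appendChild(body);

            // The procedure call is an element named after the operation;
            // each argument is a child element of it.
            Element call = doc.createElementNS(operationNs, "m:" + operation);
            body.appendChild(call);
            Element argument = doc.createElementNS(operationNs, "m:" + param);
            argument.setTextContent(value);
            call.appendChild(argument);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("could not create a DOM builder", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical service namespace and operation, for illustration only.
        Document doc = buildRequest("http://example.com/prices", "GetPrice", "Item", "Apples");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

In practice the resulting document would be serialized and sent as the body of an HTTP POST.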
How would you build a search engine for large volumes of XML data?
The way candidates answer this question may provide insight into their view of XML
data. For those who view XML primarily as a way to denote structure for text files, a
common answer is to build a full-text search and handle the data similarly to the way
Internet portals handle HTML pages. Others consider XML as a standard way of
transferring structured data between disparate systems. These candidates often describe
some scheme of importing XML into a relational or object database and relying on the
database's engine for searching. Lastly, candidates who have worked with vendors
specializing in this area often say that the best way to handle this situation is to use a
third-party software package optimized for XML data.