5 XML (Unit 2)
TECHNOLOGY
Contents
XML Introduction
Differences between HTML and XML
XML Related Technologies
XML Attributes and Comments
XML Validation
DTD and XSD
CSS and XSLT
CDATA vs PCDATA
XML Parsers: SAX and DOM
DHTML
Difference between HTML and DHTML
Extensible Markup Language (XML) is used to describe data.
The XML standard is a flexible way to create information formats and
electronically share structured data via the public Internet, as well as
via corporate networks.
XML
XML stands for eXtensible Markup Language, and it is used for storing and transferring data.
We can use it to take data from a program like Microsoft SQL Server, convert it into XML, and then share that XML with other
programs and platforms.
- It enables communication between two platforms, which is otherwise generally very difficult.
- The main thing that makes XML truly powerful is its international acceptance.
- Many corporations use XML interfaces for databases, programming, office applications, mobile phones and
more. This is due to its platform-independent nature.
Features and Advantages of XML
XML is widely used in web development. It is also used to simplify data storage and data sharing.
Computer systems and databases contain data in incompatible formats while XML data is stored in plain text format. This
provides a software- and hardware-independent way of storing and sharing data by different applications.
Disadvantages of XML
Drawbacks of using XML:
Differences between HTML and XML
4. HTML is static because it is used to display data. XML is dynamic because it is used to transport data.
6. HTML has its own predefined tags. In XML, we can define tags according to our needs.
7. HTML does not preserve white space. XML preserves white space.
XML Related Technologies
4) XQuery (XML Query Language): a language used to query XML-based data.
5) DTD (Document Type Definition): a standard used to define the legal elements of an XML document.
6) XSD (XML Schema Definition): an XML-based alternative to DTD. It is used to describe the structure of an XML
document.
7) SOAP (Simple Object Access Protocol): an XML-based protocol that lets applications exchange information over HTTP.
In simple words, it is a protocol used for accessing web services.
8) WSDL (Web Services Description Language): an XML-based language used to describe web services. It also describes the
functionality offered by a web service.
9) RSS (Really Simple Syndication): an XML-based format for web content syndication. It is used for fast browsing of
news and updates, generally on news-like sites.
(A) XML Example
XML documents form a hierarchical structure that looks like a tree, so it is known as an XML Tree; it starts at "the root" and branches to "the leaves".
Example of an XML document: XML documents use a simple, self-describing syntax:
<?xml version="1.0" encoding="ISO-8859-1"?> <!-- XML declaration: defines the XML version (1.0) and the encoding used (ISO-8859-1 = Latin-1/West European character set) -->
<note> <!-- root element of the document -->
<!-- the next 4 lines describe 4 child elements of the root (to, from, heading, and body) -->
<to>Abhi</to>
<from>Avi</from>
<heading>Wishes</heading>
<body>Have a great life ahead!</body>
</note> <!-- defines the end of the root element -->
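To see the tree that a parser builds from this note, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (the choice of Python is ours; the notes do not name a language). It parses the document and lists the root and its four children.

```python
import xml.etree.ElementTree as ET

# The note document from above, given as bytes so the parser reads the
# ISO-8859-1 encoding named in the XML declaration.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Abhi</to>
  <from>Avi</from>
  <heading>Wishes</heading>
  <body>Have a great life ahead!</body>
</note>"""

root = ET.fromstring(NOTE_XML)   # parse the text and return the root element
print(root.tag)                  # note

for child in root:               # the 4 child elements, in document order
    print(child.tag, "->", child.text)
```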
• XML documents must contain a root element, which is "the parent" of all other elements.
• Elements in an XML document form a document tree.
• The tree starts at the root and branches to the lowest level of the tree.
• All elements can have sub-elements (child elements).
• The terms parent, child, and sibling are used to describe the relationships between elements.
• Parent elements have children. Children on the same level are called siblings (brothers or sisters).
• All elements can have text content and attributes (just like in HTML).
General structure:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
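The parent/child/sibling relationships can be made concrete by walking the tree depth-first. The sketch below uses Python's `xml.etree.ElementTree`; the small document with two `subchild` siblings and the helper name `outline` are hypothetical, chosen only to follow the general structure above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document following the <root>/<child>/<subchild> structure.
DOC = "<root><child><subchild>one</subchild><subchild>two</subchild></child></root>"

def outline(elem, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + elem.tag]
    for child in elem:                    # direct children only
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
print("\n".join(outline(root)))

child = root.find("child")
siblings = [sub.text for sub in child]    # elements sharing one parent are siblings
print(siblings)                           # ['one', 'two']
```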
(B) Example of XML: Books
TechBooks.xml
Data can be stored in attributes or in child elements, but attributes have some limitations compared with child elements.
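As an illustration of that trade-off, here is a sketch with a hypothetical book record (the actual TechBooks.xml content is not reproduced here), read with Python's `xml.etree.ElementTree`. The same value can be read from an attribute or a child element, but only child elements can repeat: an XML parser rejects a duplicated attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical book record: the same data stored two ways.
AS_ATTRIBUTE = '<book category="web"><title>XML Basics</title></book>'
AS_ELEMENTS = ('<book><category>web</category><title>XML Basics</title>'
               '<author>Author One</author><author>Author Two</author></book>')

attr_book = ET.fromstring(AS_ATTRIBUTE)
elem_book = ET.fromstring(AS_ELEMENTS)

print(attr_book.get("category"))                      # value read from an attribute
print(elem_book.findtext("category"))                 # same value read from a child element
print([a.text for a in elem_book.findall("author")])  # child elements can repeat

# One limitation of attributes: the same attribute cannot appear twice,
# so several values cannot be stored this way.
try:
    ET.fromstring('<book author="Author One" author="Author Two"/>')
    duplicate_allowed = True
except ET.ParseError as err:
    duplicate_allowed = False
    print("rejected:", err)
```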
XML comments are just like HTML comments. Although XML is known as self-describing data, comments are sometimes
necessary.
Syntax:
An XML comment should be written as: <!-- Write your comment-->
Don't use a comment before an XML declaration.
Comments can be used anywhere in an XML document except within an attribute value.
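Both comment rules can be checked with Python's `xml.etree.ElementTree` (a sketch; the college/student data is a shortened, hypothetical version of the example below). By default the parser skips comments, and a comment placed before the XML declaration makes the document unparsable.

```python
import xml.etree.ElementTree as ET

GOOD = b"""<?xml version="1.0"?>
<!-- list of students -->
<college><student>Anu</student><!-- more students later --></college>"""

root = ET.fromstring(GOOD)
print(root.tag, [child.tag for child in root])   # college ['student']  (comments are dropped)

BAD = b"""<!-- a comment placed first -->
<?xml version="1.0"?>
<college/>"""

try:
    ET.fromstring(BAD)
    misplaced_rejected = False
except ET.ParseError as err:
    misplaced_rejected = True                     # the declaration must come first
    print("rejected:", err)
```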
<?xml version="1.0"?>
<college>
<student>
<firstname>Anu</firstname>
<lastname>Bhatt</lastname>
<contact>07899044992</contact>
<email>[email protected]</email>
<address>
<city>Haldwani</city>
<state>Uttarakhand</state>
<pin>201206</pin>
</address>
</student>
</college>
Line 1: XML declaration (defines the XML version 1.0).
Line 2: Root element (college).
Line 3: Inside the root element there is one more element: student.
<student> contains 5 branches: <firstname>, <lastname>, <contact>, <email> and <address>.
XML Tree Rules
XML tree rules are used to represent the relationships between elements. They show whether an element is a child or a parent of another element.
(b) Ancestors: An element that contains other elements is called an "ancestor" of those elements.
In the figure below, the root element (college) is the ancestor of all other elements.
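The ancestor rule can be computed directly. `xml.etree.ElementTree` elements do not store a link to their parent, so this sketch (our own helper `ancestors`, not a library function) first builds a child-to-parent map and then walks upward. The college document is shortened: contact and email are omitted.

```python
import xml.etree.ElementTree as ET

COLLEGE_XML = """<college>
<student>
<firstname>Anu</firstname>
<lastname>Bhatt</lastname>
<address><city>Haldwani</city><state>Uttarakhand</state><pin>201206</pin></address>
</student>
</college>"""

def ancestors(root, tag):
    """Return the tags of all ancestors of the first <tag> element, nearest first."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = root.find(f".//{tag}")
    chain = []
    while node in parent_of:            # the root has no parent, so the loop stops there
        node = parent_of[node]
        chain.append(node.tag)
    return chain

root = ET.fromstring(COLLEGE_XML)
print(ancestors(root, "pin"))           # ['address', 'student', 'college']
```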
XML Validation
- A well-formed XML document can be validated against a DTD or an XML Schema.
- A well-formed XML document is an XML document with correct syntax.
DTD (Document Type Definition)
Main purpose:
To define the structure of an XML document. A DTD contains a list of legal elements and attributes.
Both DTD and XML Schema are used to check whether a well-formed XML document is also valid.
An XML document is called "well-formed" if it contains the correct syntax.
A well-formed and valid XML document is one that has been validated against a DTD.
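Well-formedness is the part any XML parser checks. Here is a minimal sketch with Python's `xml.etree.ElementTree`, whose parser is non-validating: it reports syntax errors but does not check a DTD or schema. DTD/XSD validation needs another tool, for example the third-party lxml library. The helper name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text follows XML syntax rules (no DTD or schema check)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<employee><firstname>Abhi</firstname></employee>"))  # True
print(is_well_formed("<employee><firstname>Abhi</employee>"))              # False: tags not properly nested
print(is_well_formed("<a>1</a><b>2</b>"))                                  # False: two root elements
```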
XML schema
Description of the DTD:
<!DOCTYPE employee: the root element of the document is employee.
<!ELEMENT employee: the employee element contains 3 elements: firstname, lastname and email.
An entity reference has three parts:
An ampersand (&)
An entity name
A semicolon (;)
Here, sm is an entity used inside the author element. In such a case, the parser replaces the reference with the value of the sm entity, that is, "Seema Maitrey".
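The original author example is not reproduced here, so the following sketch uses a hypothetical document. It declares the `sm` entity in an internal DTD subset, and Python's `xml.etree.ElementTree` expands the reference `&sm;` to its value while parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "sm" is declared in an internal DTD subset.
AUTHOR_XML = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY sm "Seema Maitrey">
]>
<author>&sm;</author>"""

root = ET.fromstring(AUTHOR_XML)
print(root.text)   # Seema Maitrey
```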
XML CSS Example: XML file using CSS and DTD
CSS file: cssemployee.css
employee
{
background-color: pink;
}
firstname,lastname,email
{
font-size:25px;
display:block;
color: blue;
margin-left: 50px;
}
DTD file: employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
XML file: employee.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cssemployee.css"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>Aviral</firstname>
<lastname>Maitrey</lastname>
<email>[email protected]</email>
</employee>
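Python's standard-library parser does not validate against a DTD. This hand-written sketch only checks the two rules that employee.dtd states, not DTD syntax in general. `matches_employee_dtd` is a hypothetical helper, and the e-mail address in the sample is invented.

```python
import xml.etree.ElementTree as ET

# Hand-written check mirroring two rules from employee.dtd (NOT a general DTD validator):
#   <!ELEMENT employee (firstname,lastname,email)>  -> exactly these children, in this order
#   <!ELEMENT firstname (#PCDATA)> etc.             -> text only, no child elements
EXPECTED_CHILDREN = ["firstname", "lastname", "email"]

def matches_employee_dtd(xml_text):
    root = ET.fromstring(xml_text)
    if root.tag != "employee":
        return False
    # employee has element-only content: only whitespace may appear between children
    if (root.text or "").strip() or any((c.tail or "").strip() for c in root):
        return False
    if [c.tag for c in root] != EXPECTED_CHILDREN:
        return False
    return all(len(c) == 0 for c in root)   # #PCDATA leaves: no nested elements

VALID = """<employee>
<firstname>Aviral</firstname>
<lastname>Maitrey</lastname>
<email>aviral@example.com</email>
</employee>"""   # hypothetical e-mail address

WRONG_ORDER = ("<employee><lastname>Maitrey</lastname>"
               "<firstname>Aviral</firstname><email>x</email></employee>")

print(matches_employee_dtd(VALID))        # True
print(matches_employee_dtd(WRONG_ORDER))  # False
```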
CSS is not generally used to format XML files; the W3C recommends XSLT instead of CSS.
XML Schema – XSD (XML Schema Definition)
The XML Schema language is also referred to as XML Schema Definition (XSD).
XSD is used to define the possible structure and contents of an XML format.
A validating parser can then check whether an XML instance document conforms to an XSD schema or a set of schemas.
• Similar to DTD, XML Schema is also used to check whether the given XML document is “well formed” and “valid”.
• XML schema is an alternative to DTD.
• An XML document is considered “well formed” and “valid” if it is successfully validated against XML Schema.
• The extension of Schema file is .xsd.
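Difference between DTD and XSD
1) DTD stands for Document Type Definition, whereas XSD stands for XML Schema Definition.
5) A DTD gives only limited control over the order and number of child elements (sequence, choice, ?, * and +), whereas an XSD gives finer control (for example xs:sequence, xs:all, minOccurs and maxOccurs).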
CDATA
CDATA (Character Data): text inside a CDATA section will not be parsed by a parser. Tags inside it are not treated as markup and entities are not expanded.
Example:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<![CDATA[
<firstname>Abhi</firstname>
<lastname>Maitrey</lastname>
<email>[email protected]</email>
]]>
</employee>
OP: As the CDATA section is used just after the start tag of the employee element to make the data/text unparsed, it will give the output:
<employee>
<![CDATA[ <firstname>Abhi</firstname>
<lastname>Maitrey</lastname>
<email>[email protected]</email> ]]>
</employee>
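To see that a CDATA section stays unparsed, here is a sketch with Python's `xml.etree.ElementTree`. The DOCTYPE line and the email element are left out to keep it self-contained. No child elements are created; the tags come back as plain text of `<employee>`.

```python
import xml.etree.ElementTree as ET

CDATA_DOC = """<employee><![CDATA[
<firstname>Abhi</firstname>
<lastname>Maitrey</lastname>
]]></employee>"""

root = ET.fromstring(CDATA_DOC)
print(len(root))    # 0: the tags did not become child elements
print(root.text)    # the tags are returned as ordinary text
```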
PCDATA
PCDATA (Parsed Character Data): PCDATA is text that will be parsed by a parser.
Tags inside the PCDATA will be treated as markup and entities will be expanded.
Example:
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>Abhi</firstname>
<lastname>Maitrey</lastname>
<email>[email protected]</email>
</employee>
OP: As the employee element contains 3 more elements, 'firstname', 'lastname', and 'email', the parser goes further to get the data/text of firstname, lastname and email, giving the output:
<employee>
<firstname>Abhi</firstname>
<lastname>Maitrey</lastname>
<email>[email protected]</email>
</employee>
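The same content written as PCDATA parses quite differently. In this sketch, again with Python's `xml.etree.ElementTree` and with the DOCTYPE and email omitted, the tags become child elements. An entity inside PCDATA (the "Maitrey &amp; Co." text is hypothetical) is expanded.

```python
import xml.etree.ElementTree as ET

PCDATA_DOC = """<employee>
<firstname>Abhi</firstname>
<lastname>Maitrey</lastname>
</employee>"""

root = ET.fromstring(PCDATA_DOC)
print(len(root))                                   # 2: the tags became child elements
print({child.tag: child.text for child in root})   # {'firstname': 'Abhi', 'lastname': 'Maitrey'}

# Entities inside PCDATA are expanded as well (hypothetical text).
expanded = ET.fromstring("<lastname>Maitrey &amp; Co.</lastname>").text
print(expanded)                                    # Maitrey & Co.
```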
XML Entities
In simple terms, entities are a way of representing special characters. Entities are also known as entity references.
• For example, the < and > symbols are used for tags, so you cannot type a less-than sign directly into element text, and the greater-than
sign is usually escaped as well. Instead, you need to use entities.
The following table shows some of the popular XML entities.
Character | Description | Entity Name | Usage
" | Quotation mark (double quote) | quot | &quot;
& | Ampersand | amp | &amp;
' | Apostrophe (single quote) | apos | &apos;
< | Less than sign | lt | &lt;
> | Greater than sign | gt | &gt;
Example:
<friend>
<name>My friends are Vinny &amp; Anu.</name>
</friend>
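The entity table can be checked in Python. This sketch uses `xml.etree.ElementTree` to parse, which expands the predefined entities, and `xml.sax.saxutils.escape` to produce them. A bare `&` in text is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

FRIEND_XML = "<friend><name>My friends are Vinny &amp; Anu.</name></friend>"
root = ET.fromstring(FRIEND_XML)
print(root.findtext("name"))       # My friends are Vinny & Anu.  (&amp; expanded to &)

# The other predefined entities are expanded the same way.
quote = ET.fromstring("<q>&quot;&lt;tag&gt;&quot; isn&apos;t markup</q>").text
print(quote)                       # "<tag>" isn't markup

# Going the other way: escape() turns &, < and > into entity references.
print(escape("Vinny & Anu <3"))    # Vinny &amp; Anu &lt;3

# A bare & is not allowed in XML text, so the parser rejects it.
try:
    ET.fromstring("<name>Vinny & Anu</name>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False
print(bare_ampersand_ok)           # False
```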
XML Parsers
- It is a software library or package that provides interfaces for client applications to work with an XML document.
- The XML Parser is designed to read the XML and create a way for programs to use XML.
- An XML parser checks that the document is well-formed and, if it is a validating parser, validates it against a DTD or schema.
DOM (Document Object Model)
Advantages
1) It supports both read and write operations, and the API is very simple to use.
2) It is preferred when random access to widely separated parts of a document is required.
Disadvantages
1) It is memory inefficient (it consumes more memory because the whole XML document needs to be loaded into memory).
2) It is comparatively slower than other parsers.
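For example, consider this table, taken from an HTML document:
<TABLE>
<ROWS>
<TR>
<TD>A</TD>
<TD>B</TD>
</TR>
<TR>
<TD>C</TD>
<TD>D</TD>
</TR>
</ROWS>
</TABLE>
The DOM represents this table as a tree: TABLE is the root node, ROWS is its child, each TR is a child of ROWS, and each TD (holding the text A, B, C or D) is a child of its TR.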
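Here is a DOM-style sketch of the same table using Python's `xml.dom.minidom`. The whole document is loaded as a tree. Any node can then be reached directly, navigation can go upward through `parentNode`, and nodes can be modified, which covers the DOM advantages listed above.

```python
from xml.dom import minidom

TABLE_XML = """<TABLE>
<ROWS>
<TR><TD>A</TD><TD>B</TD></TR>
<TR><TD>C</TD><TD>D</TD></TR>
</ROWS>
</TABLE>"""

doc = minidom.parseString(TABLE_XML)          # the whole document is loaded as a tree
cells = doc.getElementsByTagName("TD")        # random access to any element
print([td.firstChild.data for td in cells])   # ['A', 'B', 'C', 'D']

last = cells[3]
print(last.parentNode.tagName)                # TR    (navigate backwards/up)
print(last.parentNode.parentNode.tagName)     # ROWS

last.firstChild.data = "Z"                    # the DOM also supports writing
print(doc.documentElement.toxml())
```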
SAX (Simple API for XML)
A SAX Parser implements SAX API. This API is an event based API and less intuitive.
Advantages
1) It is simple and memory efficient.
2) It is very fast and works for huge documents.
Disadvantages
1) It is event-based, so its API is less intuitive.
2) Clients never see the full document at once, because the data arrives in pieces.
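For contrast with the DOM, here is an event-based sketch with Python's `xml.sax`. The parser builds no tree. It calls `startElement`, `characters` and `endElement` on a handler as it reads, and the handler (our own hypothetical `CellCollector` class) keeps only the TD text it needs.

```python
import xml.sax

TABLE_XML = b"""<TABLE>
<ROWS>
<TR><TD>A</TD><TD>B</TD></TR>
<TR><TD>C</TD><TD>D</TD></TR>
</ROWS>
</TABLE>"""

class CellCollector(xml.sax.ContentHandler):
    """Reacts to parsing events and keeps only the text found inside <TD> elements."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def startElement(self, name, attrs):
        if name == "TD":
            self.in_cell = True
            self.cells.append("")

    def characters(self, content):
        if self.in_cell:
            self.cells[-1] += content   # text can arrive in several chunks

    def endElement(self, name):
        if name == "TD":
            self.in_cell = False

handler = CellCollector()
xml.sax.parseString(TABLE_XML, handler)   # events are pushed to the handler; no tree is built
print(handler.cells)                      # ['A', 'B', 'C', 'D']
```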
S.No. | SAX Parser | DOM Parser
01. | It is called the Simple API for XML. | It is called the Document Object Model.
03. | SAX Parser is faster than DOM Parser. | DOM Parser is slower than SAX Parser.
04. | Best for larger files. | Best for smaller files.
05. | SAX Parser does not create an internal tree structure. | DOM Parser creates an internal tree structure.
07. | Backward navigation is not possible. | Both backward and forward navigation are possible.
08. | Suitable when memory efficiency matters. | Suitable for small XML documents that need random access.
09. | Only a small part of the XML file is loaded into memory at a time. | The whole XML document is loaded into memory.
XSL and XSLT
• XSL stands for eXtensible Stylesheet Language. It is a styling language for XML, just as CSS is a styling language for HTML.
• XSLT stands for XSL Transformation. It is used to transform XML documents into other formats (like transforming XML into
HTML).
• The World Wide Web Consortium (W3C) developed XSL as an XML-based stylesheet language for understanding and styling
XML documents.
• An XSL document specifies how a browser should render an XML document.
• XSLT: It is a language for transforming XML documents into various other types of documents.
• XPath: It is a language for navigating in XML documents.
• XQuery: It is a language for querying XML documents.
• XSL-FO: It is a language for formatting XML documents.
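Working of XSLT: an XSLT processor takes an XML document and an XSL stylesheet as input and produces the transformed document (for example, an HTML page) as output.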
Advantage of XSLT
• Provides an easy way to merge XML data into presentation because it applies user defined transformations to an XML
document and the output can be HTML, XML, or any other structured document.
• Provides XPath to locate elements/attributes within an XML document, so it is a more convenient way to traverse an XML
document than the traditional way of using a scripting language.
• It is template based, so it is more resilient to changes in documents than low-level DOM and SAX.
• By using XML and XSLT, the application UI script will look clean and will be easier to maintain.
• XSLT can be used as a validation language, as it uses a tree-pattern-matching approach.
• We can change the output simply by modifying the transformations in XSL files.
XSLT <xsl:for-each> Element
The XSLT <xsl:for-each> element is used to apply a template repeatedly for each node.
<xsl:for-each select="Expression">
</xsl:for-each>
Example: the following stylesheet creates a table of <employee> elements with their attribute "id" and their child elements
<firstname>, <lastname>, <nickname> and <salary> by iterating over each employee.
Employee.xsl
<?xml version = "1.0" encoding = "UTF-8"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "/">
<html>
<body>
<h2>Employee</h2>
<table border = "1">
<tr bgcolor = "pink">
<th>ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Nick Name</th>
<th>Salary</th>
</tr>
<!-- for-each instruction:
looks for each element matching the XPath expression
-->
<xsl:for-each select="class/employee">
<tr>
<td>
<!-- value-of instruction:
outputs the value of the node matching the XPath expression
-->
<xsl:value-of select = "@id"/>
</td>
<td><xsl:value-of select = "firstname"/></td>
<td><xsl:value-of select = "lastname"/></td>
<td><xsl:value-of select = "nickname"/></td>
<td><xsl:value-of select = "salary"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Employee.xml
<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "employee.xsl"?>
<class>
<employee id = "001">
<firstname>Aryan</firstname>
<lastname>Gupta</lastname>
<nickname>Raju</nickname>
<salary>30000</salary>
</employee>
<employee id = "024">
<firstname>Sara</firstname>
<lastname>Khan</lastname>
<nickname>Zoya</nickname>
<salary>25000</salary>
</employee>
<employee id = "056">
<firstname>Peter</firstname>
<lastname>Symon</lastname>
<nickname>John</nickname>
<salary>10000</salary>
</employee>
</class>
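Python's standard library has no XSLT processor, so this is not XSLT. It is a plain-Python sketch that mirrors what the stylesheet does, to make the `xsl:for-each` / `xsl:value-of` logic concrete: one table row per `class/employee`, with the `@id` attribute followed by the four child values. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CLASS_XML = """<class>
<employee id="001"><firstname>Aryan</firstname><lastname>Gupta</lastname><nickname>Raju</nickname><salary>30000</salary></employee>
<employee id="024"><firstname>Sara</firstname><lastname>Khan</lastname><nickname>Zoya</nickname><salary>25000</salary></employee>
<employee id="056"><firstname>Peter</firstname><lastname>Symon</lastname><nickname>John</nickname><salary>10000</salary></employee>
</class>"""

FIELDS = ["firstname", "lastname", "nickname", "salary"]
HEADERS = ["ID", "First Name", "Last Name", "Nick Name", "Salary"]

def employee_rows(xml_text):
    """Like <xsl:for-each select="class/employee">: one row per employee."""
    root = ET.fromstring(xml_text)              # the root element is <class>
    rows = []
    for emp in root.findall("employee"):        # same nodes as class/employee
        # value-of "@id", then value-of each child element
        rows.append([emp.get("id")] + [emp.findtext(f) for f in FIELDS])
    return rows

def to_html_table(rows):
    head = "".join(f"<th>{h}</th>" for h in HEADERS)
    lines = ['<table border="1">', f'<tr bgcolor="pink">{head}</tr>']
    for row in rows:
        cells = "".join(f"<td>{escape(value)}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

rows = employee_rows(CLASS_XML)
print(to_html_table(rows))
```

A real deployment would keep the transformation in the .xsl file, as the notes describe. The Python version only shows the iteration the stylesheet performs.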
DHTML
• DHTML stands for Dynamic HyperText Markup Language, i.e., Dynamic HTML.
• Dynamic HTML is not a markup or programming language; it is a term for combining the features of various web
development technologies to make web pages dynamic and interactive.
• DHTML was introduced by Microsoft with the release of the 4th version of IE (Internet Explorer) in 1997.
The technologies combined in DHTML are:
1. HTML 4.0
2. CSS
3. JavaScript
4. DOM
Uses of DHTML
• It is used for designing animated and interactive web pages that update in real time.
• DHTML helps users by animating the text and images in their documents.
• It allows authors to add effects to their pages.
• It also allows page authors to include drop-down menus or rollover buttons.
• This term is also used to create various browser-based action games.
• It is also used to add tickers to websites that need to refresh their content automatically.
Features of DHTML
• Its simplest and main feature is that we can create the web page dynamically.
• Dynamic Style is a feature that allows users to alter the font, size, color, and content of a web page.
• It provides the facility for using events, methods, and properties, and also provides code
reusability.
• It also provides data binding in browsers.
• With the help of DHTML, users can easily change the tags and their properties.
• Web page functionality is enhanced because DHTML effects use little bandwidth.
Difference between HTML and DHTML
HTML | DHTML
1. HTML is simply a markup language. | DHTML is not a language; it is a set of web development technologies.
2. It is used for developing and creating web pages. | It is used for creating and designing animated and interactive websites or pages.
3. This markup language creates static web pages. | This concept creates dynamic web pages.
4. It does not contain any server-side scripting code. | It may contain server-side scripting code.
5. HTML files are stored with the .html or .htm extension. | DHTML files are stored with the .dhtm extension.
6. A simple page created without using scripts or styles is called an HTML page. | A page created using HTML, CSS, DOM, and JavaScript is called a DHTML page.
7. This markup language does not need database connectivity. | This concept needs database connectivity because it interacts with users.