Unit-2 XML
Unit-2 XML
Unit-2 XML
Introduction to XML
o Xml (eXtensible Markup Language) is a markup language.
o XML is designed to store and transport data.
o Xml was released in late 90’s. it was created to provide an easy to use and store self
describing data.
o XML became a W3C Recommendation on February 10, 1998.
o XML is not a replacement for HTML.
o XML is designed to be self-descriptive.
o XML is designed to carry data, not to display data.
o XML tags are not predefined. You must define your own tags.
o XML is platform independent and language independent.
Why xml
Platform Independent and Language Independent: The main benefit of xml is that you can
use it to take data from a program like Microsoft SQL, convert it into XML then share that XML
with other programs and platforms. You can communicate between two platforms which are
generally very difficult.
The main thing which makes XML truly powerful is its international acceptance. Many
corporation use XML interfaces for databases, programming, office application mobile phones
and more. It is due to its platform independent feature.
Features and Advantages of XML
XML is widely used in the era of web development. It is also used to simplify data storage and
data sharing.
The main features or advantages of XML are given below.
1) XML separates data from HTML
If you need to display dynamic data in your HTML document, it will take a lot of work to edit
the HTML each time the data changes.
With XML, data can be stored in separate XML files. This way you can focus on using
HTML/CSS for display and layout, and be sure that changes in the underlying data will not
require any changes to the HTML.
With a few lines of JavaScript code, you can read an external XML file and update the data
content of your web page.
2) XML simplifies data sharing
In the real world, computer systems and databases contain data in incompatible formats.
XML data is stored in plain text format. This provides a software- and hardware-independent
way of storing data.
This makes it much easier to create data that can be shared by different applications.
3) XML simplifies data transport
One of the most time-consuming challenges for developers is to exchange data between
incompatible systems over the Internet.
Exchanging data as XML greatly reduces this complexity, since the data can be read by different
incompatible applications.
4) XML simplifies Platform change
Upgrading to new systems (hardware or software platforms), is always time consuming. Large
amounts of data must be converted and incompatible data is often lost.
XML data is stored in text format. This makes it easier to expand or upgrade to new operating
systems, new applications, or new browsers, without losing data.
5) XML increases data availability
Different applications can access your data, not only in HTML pages, but also from XML data
sources.
With XML, your data can be available to all kinds of "reading machines" (Handheld computers,
voice machines, news feeds, etc), and make it more available for blind people, or people with
other disabilities.
6) XML can be used to create new internet languages
A lot of new Internet languages are created with XML.
Here are some examples:
o XHTML
o WSDL for describing available web services
o WAP and WML as markup languages for handheld devices
o RSS languages for news feeds
o RDF and OWL for describing resources and ontology
o SMIL for describing multimedia for the web
There are three important characteristics of XML that make it useful in a variety of systems and
solutions −
• XML is extensible − XML allows you to create your own self-descriptive tags, or language,
that suits your application.
• XML carries the data, does not present it − XML allows you to store the data irrespective
of how it will be presented.
• XML is a public standard − XML was developed by an organization called the World
Wide Web Consortium (W3C) and is available as an open standard.
A programming language consists of grammar rules and its own vocabulary which is used to
create computer programs. These programs instruct the computer to perform specific tasks.
XML does not qualify to be a programming language as it does not perform any computation or
algorithms. It is usually stored in a simple text file and is processed by special software that is
capable of interpreting XML.
XML key components
The following diagram depicts the syntax rules to write different types of markup and text in an
XML document.
XML Declaration
The XML document can optionally have an XML declaration. It is written as follows −
<?xml version = "1.0" encoding = "UTF-8"?>
Where version is the XML version and encoding specifies the character encoding used in the
document.
Syntax Rules for XML Declaration
• The XML declaration is case sensitive and must begin with "<?xml>" where "xml" is
written in lower-case.
• If document contains XML declaration, then it strictly needs to be the first statement of the
XML document.
• The XML declaration strictly needs be the first statement in the XML document.
• An HTTP protocol can override the value of encoding that you put in the XML declaration.
Tags and Elements
An XML file is structured by several XML-elements, also called XML-nodes or XML-tags. The
names of XML-elements are enclosed in triangular brackets < > as shown below −
<element>
Syntax Rules for Tags and Elements
Element Syntax − Each XML-element needs to be closed either with start or with end elements
as shown below −
<element>....</element>
or in simple-cases, just this way −
<element/>
Nesting of Elements − An XML-element can contain multiple XML-elements as its children,
but the children elements must not overlap. i.e., an end tag of an element must have the same
name as that of the most recent unmatched start tag.
The Following example shows incorrect nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>
The Following example shows correct nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
<contact-info>
Root Element − An XML document can have only one root element. For example, following is
not a correct XML document, because both the x and y elements occur at the top level without a
root element −
<x>...</x>
<y>...</y>
The Following example shows a correctly formed XML document −
<root>
<x>...</x>
<y>...</y>
</root>
XML References
References usually allow you to add or include additional text or markup in an XML document.
References always begin with the symbol "&" which is a reserved character and end with the
symbol ";". XML has two types of references −
• Entity References − An entity reference contains a name between the start and the end
delimiters. For example & where amp is name. The name refers to a predefined string
of text and/or markup.
• Character References − These contain references, such as A, contains a hash mark
(“#”) followed by a number. The number always refers to the Unicode code of a character.
In this case, 65 refers to alphabet "A".
XML Text
The names of XML-elements and XML-attributes are case-sensitive, which means the name of
start and end elements need to be written in the same case. To avoid character encoding
problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters like blanks, tabs and line-breaks between XML-elements and between the
XML-attributes will be ignored.
XML DTD
The XML Document Type Declaration, commonly known as DTD, is a way to describe XML
language precisely. DTDs check vocabulary and validity of the structure of XML documents
against grammatical rules of appropriate XML language.
An XML DTD can be either specified inside the document, or it can be kept in a separate
document and then liked separately.
Syntax
Basic syntax of a DTD is as follows −
<!DOCTYPE element DTD identifier
[
declaration1
declaration2
........
]>
In the above syntax,
• The DTD starts with <!DOCTYPE delimiter.
• An element tells the parser to parse the document from the specified root element.
• DTD identifier is an identifier for the document type definition, which may be the path to a
file on the system or URL to a file on the internet. If the DTD is pointing to external path, it
is called External Subset.
• The square brackets [ ] enclose an optional list of entity declarations called Internal Subset.
1. Internal DTD
A DTD is referred to as an internal DTD if elements are declared within the XML files. To refer
it as internal DTD, standalone attribute in XML declaration must be set to yes. This means, the
declaration works independent of an external source.
Syntax
Following is the syntax of internal DTD −
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of root element and element-declarations is where you declare
the elements.
Example
Following is a simple example of internal DTD −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
Let us go through the above code −
Start Declaration − Begin the XML declaration with the following statement.
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
DTD − Immediately after the XML header, the document type declaration follows, commonly
referred to as the DOCTYPE −
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. The
DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body − The DOCTYPE declaration is followed by body of the DTD, where you declare
elements, attributes, entities, and notations.
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone_no (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <name> document.
<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here
#PCDATA means parse-able text data.
End Declaration − finally, the declaration section of the DTD is closed using a closing bracket
and a closing angle bracket (]>). This effectively ends the definition, and thereafter, the XML
document follows immediately.
Rules
• The document type declaration must appear at the start of the document (preceded only by
the XML header) − it is not permitted anywhere else within the document.
• Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
• The Name in the document type declaration must match the element type of the root
element.
2. External DTD
In external DTD elements are declared outside the XML file. They are accessed by specifying
the system attributes which may be either the legal .dtd file or a valid URL. To refer it as
external DTD, standalone attribute in the XML declaration must be set as no. This means,
declaration includes information from the external source.
Syntax
Following is the syntax for external DTD −
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with .dtd extension.
Example
The following example shows external DTD usage −
<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd is as shown −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Types
You can refer to an external DTD by using either system identifiers or public identifiers.
➢ System Identifiers
A system identifier enables you to specify the location of an external file containing DTD
declarations. Syntax is as follows −
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains keyword SYSTEM and a URI reference pointing to the location of
the document.
➢ Public Identifiers
Public identifiers provide a mechanism to locate DTD resources and is written as follows −
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN">
As you can see, it begins with keyword PUBLIC, followed by a specialized identifier. Public
identifiers are used to identify an entry in a catalog. Public identifiers can follow any format,
however, a commonly used format is called Formal Public Identifiers, or FPIs.
XML Schema
XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe
and validate the structure and the content of XML data. XML schema defines the elements,
attributes and data types. Schema element supports Namespaces. It is similar to a database
schema that describes the data in a database.
Syntax
You need to declare a schema in your XML document as follows −
Example
The following example shows how to use schema −
<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
<xs:element name = "contact">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML Schemas is that they describe the legitimate format that an XML
document can take.
2. Elements
The elements are the building blocks of XML document. An element can be defined within an
XSD as follows −
<xs:element name = "x" type = "y"/>
3. Definition/ Data Types
You can define XML schema elements in the following ways −
a. Simple Type
Simple type element is used only in the context of the text. Some of the predefined simple types
are: xs:integer, xs:boolean, xs:string, xs:date. For example −
<xs:element name = "phone_number" type = "xs:int" />
b. Complex Type
A complex type is a container for other element definitions. This allows you to specify which
child elements an element can contain and to provide some structure within your XML
documents. For example −
<xs:element name = "Address">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
<xs:element name = "phone" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, Address element consists of child elements. This is a container for
other <xs:element> definitions, that allows to build a simple hierarchy of elements in the XML
document.
c. Global Types
With the global type, you can define a single type in your document, which can be used by all
other references. For example, suppose you want to generalize the person and company for
different addresses of the company. In such case, you can define a general type as follows −
<xs:element name = "AddressType">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string" />
<xs:element name = "company" type = "xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
Now let us use this type in our example as follows −
<xs:element name = "Address1">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone1" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name = "Address2">
<xs:complexType>
<xs:sequence>
<xs:element name = "address" type = "AddressType" />
<xs:element name = "phone2" type = "xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Instead of having to define the name and the company twice (once for Address1 and once
for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you
decide to add "Postcode" elements to the address, you need to add them at just one place.
4. Attributes
Attributes in XSD provide extra information within an element. Attributes
have name and type property as shown below −
<xs:attribute name = "x" type = "y"/>
An XML document is always descriptive. The tree structure is often referred to as XML
Tree and plays an important role to describe any XML document easily.
The tree structure contains root (parent) elements, child elements and so on. By using tree
structure, you can get to know all succeeding branches and sub-branches starting from the root.
The parsing starts at the root, then moves down the first branch to an element, take the first
branch from there, and so on to the leaf nodes.
Example
Following example demonstrates simple XML tree structure −
<?xml version = "1.0"?>
<Company>
<Employee>
<FirstName>Tanmay</FirstName>
<LastName>Patil</LastName>
<ContactNo>1234567890</ContactNo>
<Email>[email protected]</Email>
<Address>
<City>Bangalore</City>
<State>Karnataka</State>
<Zip>560212</Zip>
</Address>
</Employee>
</Company>
Following tree structure represents the above XML document −
In the above diagram, there is a root element named as <company>. Inside that, there is one
more element <Employee>. Inside the employee element, there are five branches named
<FirstName>, <LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address>
element, there are three sub-branches, named <City> <State> and <Zip>.
➢ XSL
Before learning XSLT, we should first understand XSL which stands for
Extensible Stylesheet Language. It is similar to XML as CSS is to HTML.
✓ In case of HTML document, tags are predefined such as table, div, and span; and the
browser knows how to add style to them and display those using CSS styles. But in case
of XML documents, tags are not predefined.
✓ In order to understand and style an XML document, World Wide Web Consortium (W3C)
developed XSL which can act as XML based Stylesheet Language.
✓ An XSL document specifies how a browser should render an XML document.