Module 2 - XML

WEB TECHNOLOGIES,S7 R
Module II (12 hours)

XML
Introduction to SGML – features of XML - XML as a subset of SGML – XML Vs HTML – Views of
an XML document - Syntax of XML- XML Document Structure – Namespaces- XML Schemas-
simple XML documents – Different forms of markup that can occur in XML documents - Document
Type declarations – Creating XML DTDs – Displaying XML Data in HTML browser – Converting
XML to HTML with XSL – minimalist XSL style sheets – XML applications
INTRODUCTION
SGML is a meta-markup language, that is, a language for defining markup languages. It can
describe a wide variety of document types.
- Developed in the early 1980s; in 1986 SGML was approved as an ISO standard.
- HTML was developed using SGML in the early 1990s, specifically for Web
documents.
Problems with HTML:
1. HTML is defined to describe the general form and layout of information in web documents without
considering its meaning.
2. HTML has a fixed set of tags and attributes. The given tags must fit every kind of document, and there is no way to
find a particular piece of information by its meaning.
3. There are no restrictions on arrangement or order of tag appearance in document. For example, an
opening tag can appear in the content of an element, but its corresponding closing tag can appear after
the end of the element in which it is nested.
Eg : <strong> Now <em> is </strong> the time </em>
- One solution to the first of these problems is to allow a group of users with common needs to
define their own tags and attributes, and then use the SGML standard to define a new markup language to
meet those needs. Each application area would have its own markup language.
Problem with using SGML:
1. It is too large and complex to use, and it is very difficult to build a parser for it. SGML includes a large
number of capabilities that are only rarely used.
2. A program capable of parsing SGML documents would be very large and costly to develop.
3. SGML requires that a formal definition be provided with each new markup language.
So having area-specific markup languages is a good idea, but basing them on SGML is not.
A better solution: Define a simplified version of SGML and allow users to define their own markup
languages based on it. XML was designed to be that simplified version of SGML.
- XML is not a replacement for HTML. In fact, the two have different goals:
- HTML is a markup language used to describe the layout of any kind of information.
- XML is a meta-markup language that provides a framework for defining specialized markup languages.
- Instead of creating a text file in HTML like
<html>
<head><title>name</title></head>……
the same data can be written in XML syntax:
<name>
<first> nandini </first>
<last> sidnal </last>
</name>
The XML version is much larger than a plain text file, but the structure it gives to the data makes it easier
to write software that accesses the information.
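As a minimal sketch of how that structure eases access, the following Python fragment parses the name record above with the standard library module xml.etree.ElementTree. The variable names are illustrative assumptions, not part of the notes.

```python
import xml.etree.ElementTree as ET

# The <name> record from the notes, held in a string for illustration.
record = """<name>
<first> nandini </first>
<last> sidnal </last>
</name>"""

root = ET.fromstring(record)

# Each piece of data is reached by its tag name rather than by position.
first = root.find("first").text.strip()
last = root.find("last").text.strip()
print(first, last)
```

The program never has to search the raw text; it asks for the "first" and "last" elements by name.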
- XML is a very simple and universal way of storing and transferring any kind of textual data
- XML does not predefine any tags
- An XML tag and its content, together with the closing tag, form an element
- XML has no hidden specifications
- An XML-based markup language is called a tag set
- A document that uses an XML-based markup language is called an XML document
- An XML processor is a program that parses XML documents and provides the parts to
an application
- Both Internet Explorer 7 (IE7) and Firefox 2 (FX2) support basic XML.
XML is a meta language for describing markup languages. It provides a facility to define tags and the
structural relationships between them.
What is XML?
- XML stands for EXtensible Markup Language
- XML is a markup language much like HTML
- XML was designed to carry data, not to display data
- XML tags are not predefined. You must define your own tags
- XML is designed to be self-descriptive
- XML is a W3C Recommendation
- XML is not a replacement for HTML.
- Strictly speaking, XML is not itself a markup language. It is a meta-markup language that specifies rules for creating
markup languages
XML SYNTAX RULES OR THE SYNTAX OF XML

The syntax of XML can be thought of at two distinct levels.
1. There is the general low-level syntax of XML that imposes its rules on all XML
documents.
2. The other syntactic level is specified by either document type definitions or XML
schemas. These two kinds of specifications impose structural syntactic rules on
documents written with specific XML tag sets.
- DTDs and XML schemas specify the set of tags and attributes that can appear in a particular
document or collection of documents, and also the orders and various arrangements in which
they can appear.
- DTDs and XML schemas can be used to define an XML markup language.
An XML document can include several different kinds of statements:
1) Data elements: the most common kind of statement
2) Markup declarations: instructions to the XML parser
3) Processing instructions: instructions for an application program that will process the data described
in the document.
- An XML document should begin with an XML declaration. It identifies the document as being
XML and provides the version number of the XML standard being used. It may also specify the
encoding. When present, it must be the first line of the XML document.
- XML names must begin with a letter or underscore and can include digits, hyphens, and periods.
- XML names are case sensitive: the tag <Letter> is different from the tag <letter>.
- There is no length limitation for names.
- White space is preserved in XML.
HTML collapses multiple white-space characters into one single white space:
HTML: Hello           my name is Tove
Output: Hello my name is Tove.

With XML, the white space in a document is not collapsed.
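The behaviour can be observed with a short Python sketch based on the standard library parser. The <note> element is a hypothetical wrapper chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Text with a run of eleven spaces, wrapped in a hypothetical <note> element.
text = "Hello" + " " * 11 + "my name is Tove"
note = ET.fromstring("<note>" + text + "</note>")

# The parser hands the character data to the application unchanged.
print(repr(note.text))
```

Any collapsing of white space is therefore a decision of the application that displays the data, not of the XML parser.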
- In HTML closing tags are often optional, but in XML closing tags are required.
In HTML, both of the following forms are accepted:
<p>This is a paragraph
<p>This is another paragraph.
<p>This is a paragraph</p>
<p>This is another paragraph</p>
In XML, each opening tag must have a closing tag whose name matches exactly, including case:
<Message>This is incorrect</message>
<message>This is correct</message>
- In HTML, you might see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
<b><i>This text is bold and italic</i></b>
In the example above, "Properly nested" simply means that since the <i> element is opened inside the
<b> element, it must be closed inside the <b> element.
- XML documents must have a root element. Every document contains exactly one element that is the parent of all
other elements. This element is called the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
- XML tags can have attributes, which are specified with name/value assignments. XML attribute
values must be enclosed in single or double quotation marks.
An XML document that strictly adheres to all of these syntax rules is considered well formed.
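The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as the XML processor; the helper name is_well_formed and the sample labels are illustrative. The parser checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "matching tags": "<message>This is correct</message>",
    "case mismatch": "<Message>This is incorrect</message>",
    "bad nesting": "<b><i>bold and italic</b></i>",
    "good nesting": "<b><i>bold and italic</i></b>",
    "unquoted attribute": "<table border=2></table>",
    "two root elements": "<a></a><b></b>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Each rejected sample breaks exactly one of the rules listed in this section.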
Example:
<?xml version = "1.0" encoding = "utf-8"?>
<ad>
<year>1960</year><make>Cessna</make><model>Centurian</model>
<color>Yellow with white trim</color>
<location>
<city>Gulfport</city><state>Mississippi</state>
</location>
</ad>
None of the tags in this document is defined in XHTML; all are designed for the specific content of the
document.
- When designing an XML document, the designer is often faced with the choice between adding
a new attribute to an element or defining a nested element.
- In some cases there is no choice.
- In other cases, it may not matter whether an attribute or a nested element is used.
- Nested tags are preferable when the data has parts that need to be accessed separately, as in the three versions of a patient record below.

<patient name = "Maggie Dee Magpie">
...
</patient>

<patient>
<name> Maggie Dee Magpie </name>
...
</patient>

<patient>
<name>
<first> Maggie </first>
<middle> Dee </middle>
<last> Magpie </last>
</name>
...
</patient>
Here the third version is the better choice because it provides easy access to all of the parts of the data.
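The difference in access can be sketched in Python with the standard library. The variable names are assumptions made for illustration; the record content comes from the examples above.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<patient name="Maggie Dee Magpie"></patient>')
as_nested = ET.fromstring(
    "<patient><name>"
    "<first> Maggie </first><middle> Dee </middle><last> Magpie </last>"
    "</name></patient>"
)

# Attribute form: the whole name is one string that must be split by hand.
full_name = as_attribute.get("name")
last_from_attribute = full_name.split()[-1]

# Nested form: each part is addressed directly by a path of tag names.
last_from_nested = as_nested.find("name/last").text.strip()

print(last_from_attribute, last_from_nested)
```

Splitting the attribute string works here only because the last name is a single word; the nested form does not depend on such an assumption.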
WELL FORMED AND VALID XML DOCUMENTS
- A well-formed document obeys the basic XML syntax rules that apply to all XML documents.
- A valid document is well formed and also conforms to the rules specified by a DTD or an XML schema.
- DTDs or XML schemas specify the set of tags that can appear in a particular document or collection of documents.
- XML with correct syntax is "well formed" XML.
- XML validated against a DTD is "valid" XML.
- The DOCTYPE declaration can be a reference to an external DTD file.
XML DTD
- Defines the legal building blocks of an XML document
- Can be inline in the XML document or given as an external reference
XML schema
- An XML-based alternative to DTD; it is more powerful and supports namespaces and data types.
SGML FEATURES
1. The term SGML stands for Standard Generalized Markup Language

2. It is a system for defining markup languages.
3. SGML is a meta language. It facilitates the creation of other languages.
4. SGML is extensible. It allows the author to define a particular structure by defining the parts that
fit the structure.
5. SGML is a system for organizing and tagging the elements of a document.
6. SGML specifies the rules for tagging elements.
7. It is widely used to manage large documents that are subject to frequent revision and need to be
printed in different formats.
8. Authors can mark up their document by representing structural, presentational and semantic
information along with the content.
9. SGML is intended to be absolutely independent of any application.
10. Closing tags are optional and nothing in the SGML document indicates how the data should
look.
11. HTML is an application of SGML because HTML was created using SGML standards.
12. SGML added provisions for identifying the characters to be used in the document and providing
a way to identify the objects that will be used throughout a document.
EXTENSIBLE MARKUP LANGUAGE (XML)
A markup language specifies the structure and content of a document. Extensible
Markup Language (XML) is a subset of, or restricted form of, Standard Generalized Markup Language
(SGML), which was introduced in the 1980s. XML documents are conforming SGML
documents. Because it is extensible, XML can be used to create a wide variety of document types.
With XML, new markup languages, called XML applications, can be created. Many XML applications
have been developed to work with specific types of documents.
XML FEATURES
1. XML stands for Extensible Markup Language.
2. It is designed to describe data or information and to focus on what the data is.
3. XML shall be straightforwardly usable over the Internet.
4. XML shall be compatible with SGML. XML is a smaller language than SGML (i.e. a subset of
SGML).
5. It is used to format and transfer data in an easy and convenient way.
6. It is a markup language like HTML.
7. XML has the ability to work with HTML for data display and presentation.
8. It is a standard language used to structure and describe data that can be understood by different
applications.
9. XML documents are called self-describing documents.
10. XML tags are not predefined. You must define your own tags.
11. XML is free and extensible. It is a complement to HTML.
12. The XML family includes a specification for a style sheet language called eXtensible Stylesheet
Language (XSL).
13. The XML family includes a specification for a hyperlinking scheme, which is described as a separate
language called eXtensible Link Language (XLL).
14. Every XML document consists of data and markup. You can literally tag up your data with your
own tags.
15. XML can be used as a data interchange format. Since the XML text format is standards based,
data can be converted and then easily read by another system or application.
16. XML shall support a wide variety of applications.
XML AS A SUBSET OF SGML
- SGML is a very powerful and very general standard markup language, but with that power comes
increased complexity.
- XML is a subset of SGML intended to make SGML "light" enough for use on the web.
- As XML is a proper subset of SGML, all XML documents are valid SGML documents, but not all
SGML documents are valid XML documents.
- Relationship of XML to SGML:
[Figure: relationship of XML to SGML]
SGML is intended to be absolutely independent of any application.
XML can be considered as SGML-Lite: 20% of SGML's complexity, 80% of its capacity.
XML is a lightweight, cut-down version of SGML, i.e. XML uses only the most commonly used SGML
features.
- The complexity of implementing SGML's power limits its users to big companies that need all that
power. Hence XML arrived: a simplified SGML that retains most of the inherent power of SGML in a
simple, tidy, easy-to-use and easy-to-implement form.
- Since XML is optimized for use on the World Wide Web, it is designed in such a way that it has some
benefits that are not found in SGML.
- XML is a smaller language than SGML because the designers of XML removed the parts of the SGML
specification that were not needed for web delivery.
- XML will not replace either SGML or HTML; XML is compatible with both.
- HTML and XML are both based on SGML.

COMPARISON OF HTML AND XML / XML VS HTML
No. | HTML | XML
1 | HTML is HyperText Markup Language. | XML is eXtensible Markup Language.
2 | It is used for displaying information and formatting the document, i.e. HTML describes presentation. | It is designed to describe data and to focus on what the data is, i.e. XML describes content.
3 | HTML is not extensible. The user cannot modify the structure or format by adding new tags. | It is extensible; it allows the author to define a particular structure.
4 | HTML tags are predefined. | Tags are not predefined.
5 | Closing tags are mostly optional. | Closing tags are compulsory.
6 | HTML is not case sensitive. | XML is case sensitive.
7 | Attribute values need not always be quoted. | Attribute values must always be quoted, e.g. <file type="gif">
8 | HTML authors do not write a Document Type Definition (DTD); the tag set is fixed. | XML uses a DTD to describe the data elements used in the document.
9 | Document display is direct and easy using any web browser. | XML needs XSL interaction for web browser display of the document.
10 | Cascading Style Sheets (CSS), a style sheet standard for HTML, can be embedded within HTML code. | In XML, presentation and content are kept separate, i.e. the XSL page acts independently.
11 | HTML documents need not have a well-formed structure. | XML documents must have a well-formed structure.
12 | HTML is rarely checked syntactically; browsers accept HTML code that contains errors. | XML code can be validated.
13 | Sample code (see below) | Sample code (see below)

Sample HTML code:
<html>
<head>
<title> This is my home page</title></head>
<body> WELCOME</body>
</html>

Sample XML code:
<?xml version = "1.0"?>
<person>
<name> Aswathy</name>
<age>21</age>
<place>Trivandrum</place>
</person>
TWO VIEWS OF AN XML DOCUMENT
XML describes a class of data objects called XML documents. XML documents have both a logical and
a physical structure, which must nest properly for the document to be well formed. XML documents consist of storage
units (entities). An entity, by reference to other entities, may include them in a document. XML
documents begin with a "root" (or document) entity. The overall structure of any given XML document
can be looked at in two distinct ways: it has a logical structure and, side by side with the
logical structure, a physical structure.
1. Logical structure
Viewed from this angle, an XML document is a hierarchy of information. It lists the elements to be
included in a document and the order in which they have to be included. The elements or character
data of the document hang in individual groups in a tree-like structure created by the markup.
At the very top of the tree is the root element, from which all the further logical structure
develops. Thus the logical structure refers to the organization of the different parts of a document, i.e. it indicates how a
document is built.
Eg:
<PcForSale>
<item type="PC">
<Maker>Acme PC Inc</Maker>
<Brand>Acme Deluxe</Brand>
<storage>
<RAM units="MB">72</RAM>
<HardDisk units="GB">10</HardDisk>
</storage>
<CPU>Speed 500 GHz</CPU>
</item>
</PcForSale>
PcForSale
  item type="PC"
    Maker
    Brand
    storage
      RAM units="MB"
      HardDisk units="GB"
    CPU
Fig: The logical structure of the Acme PC catalog XML document.
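The logical tree can also be produced by a program. The following is a minimal sketch in Python using the standard library; the function name outline and the two-space indentation are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

catalog = """<PcForSale>
<item type="PC">
<Maker>Acme PC Inc</Maker>
<Brand>Acme Deluxe</Brand>
<storage>
<RAM units="MB">72</RAM>
<HardDisk units="GB">10</HardDisk>
</storage>
<CPU>Speed 500 GHz</CPU>
</item>
</PcForSale>"""

def outline(element, depth=0):
    """Return one indented line per element, root first."""
    attrs = "".join(f' {key}="{value}"' for key, value in element.attrib.items())
    lines = ["  " * depth + element.tag + attrs]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(catalog))
print("\n".join(tree_lines))
```

The recursion mirrors the hierarchy: each element contributes one line, followed by the lines of its children one level deeper.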
The logical structure is the layer above the physical structure. At this level an XML document consists
of an optional prolog, a root element, and an optional epilog.
The part of an XML document that precedes the first start tag is collectively known as the
prolog. The prolog is everything that occurs before the root element starts. It can be completely empty
but should at least contain an XML declaration.
The XML declaration identifies the version of the XML specification to which the document conforms.
The sample document begins with the XML declaration <?xml version = "1.0"?>
If the XML document is going to be associated with a Document Type Definition, then the prolog will
contain a Document Type Declaration.
The Document Type Declaration is the area of the prolog used to declare element types, attributes,
entities and so on. It takes the following general form: <!DOCTYPE … >. It consists of markup code
that indicates the grammar rules. It can also point to an external file that contains all or part of the DTD.
The following code adds a Document Type Declaration to the sample document:
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
This statement tells the XML parser that the document is of the class 'catalog' and conforms
to the rules in the DTD file named 'catalog.dtd'.
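How a parser exposes this declaration can be sketched with Python's standard xml.dom.minidom module. The empty <catalog/> root element is an assumption added so that the two prolog lines form a complete document; this parser records the declaration but does not fetch or validate against catalog.dtd.

```python
from xml.dom.minidom import parseString

# Prolog from the notes plus an assumed empty <catalog/> root element.
source = '<?xml version="1.0"?><!DOCTYPE catalog SYSTEM "catalog.dtd"><catalog/>'
document = parseString(source)

# The Document Type Declaration is available as the document's doctype node.
doctype = document.doctype
print(doctype.name, doctype.systemId)
```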
Root Element
The root element of an XML document is the element that contains all other elements in the document.
<hello> Welcome to XML</hello>
Here hello is the root element.
The root element can be empty.
<hello/>
Epilog
The epilog is everything that occurs after the root element ends.
The word epilog is used here to name the area that can contain processing instructions, comments or
white space.
2. Physical structure
The physical structure of an XML document is composed of all the content used in the
document. A single XML document can be made up of a number of distinct physical storage units
known as entities. An entity is a unit of text; entities are the building blocks of an XML document.
The full document is rooted in the entity known as the document entity.
An entity can be part of the XML document or external to the document. Each entity is identified by a
unique name and contains its own content, from a single character inside the document to a large file
that exists outside the document.
Entities are declared in the prolog of the document and referenced in the document element. An entity
can contain references to other entities, which themselves can contain references to other entities.
The previous XML document can be split across five separate entities, typically files or other
storage units.
PCForSale
  Entity A (part1.xml)
    Entity A1 (part11.xml)
    Entity A2 (part12.xml)
  Entity B (part2.xml)
Fig: Physical view of an XML document
An XML processor sees an XML document as a series of characters, which it reads sequentially.
When it sees an entity reference, it reads the name of the entity and replaces the entity
reference with the actual text, graphic or other type of media that is referred to.
Types of Entities
1. Predefined Entity
In XML certain characters (<, > and &) are used specifically for marking up the document. They
cannot be interpreted as character data, so they cannot be used directly as content. You must use an entity reference
to insert such a character into the document (&lt;, &gt;, &amp; etc.):
<myelement>7 &gt; 2</myelement>
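A program that generates XML can apply these references automatically. The following is a minimal Python sketch using escape from the standard xml.sax.saxutils module; the sample text is an assumption chosen to contain all three markup characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "7 > 2 & 1 < 3"

# escape() replaces &, < and > with their predefined entity references.
escaped = escape(raw_text)
print(escaped)

# Parsing the escaped text gives back the original characters.
element = ET.fromstring("<myelement>" + escaped + "</myelement>")
print(element.text)
```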
2. Parsed Entity
It contains text data that becomes part of the XML document once the data is processed. A parsed
entity is intended to be read by the XML processor, which will extract the content. After the content is
extracted, it becomes part of the document at the location of the entity reference.
Eg: a publisher information entity (PUB1) can be declared as
<!ENTITY PUB1 "BPB Publishers">
Whenever the entity is referenced in the document, the reference is replaced by the entity's content.
To write an entity reference, insert an ampersand (&), then the entity name, followed by a semicolon (;).
<publisher>This book is from &PUB1;</publisher>
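The replacement can be observed with a short Python sketch. It assumes the entity is declared in an internal DTD subset and wraps the publisher element in a hypothetical <book> root so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring PUB1; the <book> root is an assumption.
source = """<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY PUB1 "BPB Publishers">
]>
<book><publisher>This book is from &PUB1;</publisher></book>"""

book = ET.fromstring(source)

# The parser has already substituted the entity's content for the reference.
publisher_text = book.find("publisher").text
print(publisher_text)
```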
3. Unparsed Entity
The contents may or may not be text. It is often a binary file or image that is not directly
interpreted by the XML processor. An unparsed entity requires a notation. The notation identifies the format
or type of the resource that the entity refers to.
<!ENTITY myimage SYSTEM "1.gif" NDATA GIF>
Here GIF is the notation. The notation declaration for GIF is
<!NOTATION GIF SYSTEM "utils\gifview.exe">
The above declaration tells the processor that whenever it encounters an entity of type GIF, the application
"gifview.exe" can be used to process it.
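The following Python sketch shows what a parser reports for these two declarations, using the declaration handlers of the standard xml.parsers.expat module. The <doc/> root element and the handler names are assumptions made for illustration; the parser only reports the declarations and never opens 1.gif or runs the viewer.

```python
import xml.parsers.expat

source = r"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!NOTATION GIF SYSTEM "utils\gifview.exe">
<!ENTITY myimage SYSTEM "1.gif" NDATA GIF>
]>
<doc/>"""

entities = {}
notations = {}

def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
    # For an unparsed entity, value is None and notation names its format.
    entities[name] = (system_id, notation)

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity
parser.NotationDeclHandler = on_notation
parser.Parse(source, True)

print(entities, notations)
```

It is left to the application to decide what to do with the notation, for example launching the named viewer.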
4. External Entity
It refers to a storage unit in its declaration by using a SYSTEM or PUBLIC identifier. The identifier provides a
pointer to a location at which the entity can be found.
<!ENTITY myimage SYSTEM "http://www.abc.com/image/1.gif" NDATA GIF>
In this example the content of the entity is found in the file 1.gif at the given location.
VI SIMPLE XML DOCUMENT
Create a test.xml file with the following content.
<greeting> Hello World </greeting>
The one-line document has 3 component parts:
- A start tag (<greeting>)
- An end tag (</greeting>)
- Character data ("Hello World")
By default the XML parser does not produce any output. It only builds a simple tree structure from
the XML document. The document consists of
Element -- greeting
PCDATA -- "Hello World"
WHITESPACE -- 0xa
Here the "Hello World" text has been encapsulated beneath a "greeting" element. At the same level
there is some white space in the form of an end-of-line code added to the file by the text editor. The parser
reports this as a line feed character, denoted by 0xa (line feed in Unicode and ASCII).
GRAPHICAL REPRESENTATION OF SIMPLE XML DOCUMENT
GREETING
  HELLO WORLD
  WHITE SPACE
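The three component parts can be observed as parser events. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module as the parser; the events list is an illustrative name. The document is passed as a string without a trailing line feed, so no separate white-space item is reported here.

```python
import xml.parsers.expat

events = []

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # deliver adjacent character data in one piece
parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
parser.EndElementHandler = lambda name: events.append(("end", name))
parser.CharacterDataHandler = lambda data: events.append(("text", data))

parser.Parse("<greeting> Hello World </greeting>", True)

for event in events:
    print(event)
```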
CREATING XML DOCUMENT
There are seven forms of markup that can occur in an XML document:
- Start and end tags
- Attribute assignments
- Entity references
- Comments
- CDATA sections
- Processing instructions
- Document type declarations
XML DOCUMENT STRUCTURE

An XML document often uses two auxiliary files:
1. One specifies the tag set and the structural syntactic rules.
2. One contains the style sheet that describes how the content of the document is to be printed.
XML documents (and HTML documents) are made up of the following building blocks:
Elements, Tags, Attributes, Entities, PCDATA, and CDATA
Elements are the main building blocks of both XML and HTML documents.
Elements can contain text, other elements, or be empty.
1.TAGS
Tags are used to mark up elements. A starting tag like <element_name> marks the beginning of an
element, and an ending tag like </element_name> marks the end of an element.
Examples:
A body element: <body>body text in between</body>.
A message element: <message>some message in between</message>
Names are case sensitive. XML supports two types of elements: closed elements and empty (open) elements.
Closed elements consist of both an opening (start) tag and a closing (end) tag. The following example
presents a closed element. In the closing tag, a forward slash precedes the element name.
<Month>January</Month>
Elements can be nested, and all elements must be nested within a single root element. Nested elements
are termed child elements. Elements must be nested correctly, with child elements enclosed within their
parent opening and closing element tags, as follows:
<Year>2000
<Month>January</Month>
<Month>February</Month>
</Year>
Empty (open) elements contain no content. An empty element can be used to mark sections of
the document for the processor. Empty elements can contain attributes. An empty element has the
following syntax, in which the element name is followed by a slash. Eg: <Year/>
Tag | Meaning
<greeting> | Starts a greeting element
</introduction> | Ends an introduction element
<Joe Black> | Bad start tag. No space is allowed in an element name
<42> | Bad start tag. An element name cannot begin with a number
</ Product> | Bad end tag. No space is allowed between the slash and the element name
Elements can be nested to an arbitrary depth to describe very rich information structures. An element
that does not have content is an empty element,
e.g. <hello/>. Another example is the <br> (line break) element in HTML, which cannot sensibly have any
content. In XML it is written as an empty element. An empty element can have attributes: <hello
happy="TRUE"/> is valid.
A new line inside the tag, before the closing "/>", is ignored because it occurs within markup. An empty element can also be written with
matching start and end tags, as in <hello></hello>.
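That the two spellings of an empty element are equivalent can be checked with a short Python sketch based on the standard library; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<hello/>")
long_form = ET.fromstring("<hello></hello>")
with_attribute = ET.fromstring('<hello happy="TRUE"/>')

# Both spellings produce an element with no text and no children.
for element in (short_form, long_form):
    print(element.tag, element.text, len(element))

# An empty element can still carry attributes.
print(with_attribute.attrib)
```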
2.ATTRIBUTES
Attributes provide extra information about elements. Attributes are placed inside the start tag of an
element. Attributes come in name/value pairs. The following "img" element carries additional
information about a source file: <img src="computer.gif" /> The name of the element is "img". The
name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is
empty, it is closed by a " /".
Attribute Assignment | Meaning
1) <fruit type="apple"> | The type attribute has the value "apple"
2) <fruit type='apple'> | Single quotes can also be used
3) <table border=2> | Invalid. Attribute values must be quoted
4) <animal leg="4" blood="cold"> | The leg attribute has the value 4 and the blood attribute has the value cold. White space within the start tag is ignored by the parser.
3.PCDATA
PCDATA means parsed character data. Think of character data as the text found between the start tag
and the end tag of an XML element. PCDATA is text that will be parsed by a parser. Tags inside the
text will be treated as markup and entities will be expanded.
4.CDATA
CDATA also means character data. CDATA is text that will NOT be parsed by a parser.
Tags inside the text will NOT be treated as markup and entities will not be expanded.
Most of you know the HTML entity reference "&nbsp;" that is used to insert an extra
space in an HTML document. Entities are expanded when a document is parsed by an
XML parser.
_ An XML document often uses two auxiliary files:
• One to specify the structural syntactic rules ( DTD / XML schema)
• One to provide a style specification ( CSS /XSLT Style Sheets)
5.ENTITIES
- An XML document has a single root element, but often consists of more than one entity.
- The entities of an XML document are logically related collections of information.
- Entities range from a single special character to a book chapter.
- An XML document has one document entity.
- All other entities are referenced in the document entity.
Reasons to break a document into multiple entities:
1. A large document is easier to manage when it is defined as a number of smaller parts.
2. If the same data appears in more than one place in the document, defining it as an entity
allows any number of references to a single copy of the data.
3. Many documents include information that cannot be represented as text, such as
images. Such information units are usually stored as binary data. Binary entities can only
be referenced in the document entity.
Rules of Entity names:
• No length limitation
• Must begin with a letter, an underscore, or a colon
• Can include letters, digits, periods, dashes, underscores, or colons.
• A reference to an entity has the form of the entity name with a prepended ampersand and an appended semicolon:
&entity_name; Eg. &apple_image;
- When the processor parses the document, it replaces the entity reference with the actual characters and does
not interpret those characters as markup.
Five Built-in Entities
Entity Reference | Interpretation
&lt; | <
&gt; | >
&amp; | &
&apos; | '
&quot; | "
- Regardless of the entity type, all entities are referenced in the same way: &name;
For example, the simple entity declared as
<!ENTITY iso "International Organization for Standardization">
can be used within a sentence:
The &iso; sets the standard for character encoding.
When interpreted by an XML parser, the result is: The
International Organization for Standardization sets the standard for character encoding.
External entities are referenced in the same way as
internal text entities, with the content of the external file referenced by the entity declaration
replacing the entity reference. Binary entities can be referenced only as the value of an
attribute that takes an entity value.
- If several predefined entities must appear near each other in a document, it is better to avoid
using entity references. A character data section can be used instead. The content of a character data
section is not parsed by the XML parser, so any tags it includes are not treated as markup.
- The form of a character data section is as follows: <![CDATA[content]]>
For example, instead of
Start &gt;&gt;&gt;&gt; HERE &lt;&lt;&lt;&lt;
use
<![CDATA[Start >>>> HERE <<<<]]>
- The opening keyword of a character data section is not just CDATA; it is in effect [CDATA[.
There cannot be any spaces between [ and C or between A and [.
- The content of a character data section is not parsed by the parser. For example, the content of the line
<![CDATA[The form of a tag is &lt;tag name&gt;]]>
is as follows:
The form of a tag is &lt;tag name&gt;
XML allows a block of text to be insulated from the attention of the parser using a
CDATA section. CDATA stands for character data. You can mark a section as character data
using this syntax:
<![CDATA[content]]>
Ex1: <Document><![CDATA[ if a<b and b<c then a<c]]></Document>
Between the start of the section "<![CDATA[" and the end of the section "]]>", all character
data is passed directly to the application. Comments are not recognized in a CDATA section. Here the
parser has detected the presence of a CDATA section and waved the entire string "if a<b and b<c then
a<c" through to the application directly.
Ex2: <![CDATA[this is not an <apple>start tag]]>
CDATA sections can occur anywhere character data can occur. Because the first occurrence of "]]>"
terminates the CDATA section, CDATA sections cannot be nested.
Here is an example of a valid CDATA section:
Ex3:
<?xml version="1.0"?>
<apples><![CDATA[this is not an </apple>end tag and this is not an &entity; reference]]>
</apples>
Note the shielding effect of the CDATA section ,which protects what looks like an apple end-tag and
what looks like an entity reference.
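The shielding effect can be confirmed with a short Python sketch that parses the same CDATA section with the standard library; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

source = (
    "<apples><![CDATA[this is not an </apple>end tag "
    "and this is not an &entity; reference]]></apples>"
)
apples = ET.fromstring(source)

# The section's content arrives as plain character data, markup characters and all.
print(apples.text)
```

Without the CDATA section the same text would be rejected, because </apple> does not match the open tag and &entity; is not declared.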
In element type models ,the keyword “#PCDATA” denotes character data.
<!ELEMENT para (#PCDATA)>
#PCDATA means ‘zero or more characters’.
6.PROCESSING INSTRUCTION (PI).
PIs are defined as markup that provides information to be used by a software application. A PI begins with
"<?" and ends with "?>". XML itself makes use of the processing instruction syntax in what is known as the
XML declaration. The simplest form, which should head up the entire XML document, is <?xml
version="1.0"?>
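How a PI reaches the application can be sketched with Python's standard xml.dom.minidom module. The target name app-hint, its data and the <report/> root element are made up for illustration and do not belong to any real application.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# "app-hint" is a made-up processing instruction target for illustration.
document = parseString('<?xml version="1.0"?><?app-hint mode="fast"?><report/>')

# The PI appears in the document tree as a node with a target and data.
instruction = document.firstChild
print(instruction.target, instruction.data)
```

The parser passes the data through as an uninterpreted string; its meaning is decided by the application named by the target.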
7.DOCUMENT TYPE DECLARATION
1. A Document Type Declaration is a statement embedded in an XML document whose purpose is to
acknowledge the existence and location of a Document Type Definition (DTD).
2. A Document Type Declaration is a statement that points to the Document Type Definition (DTD).
3. A Document Type Definition is a set of rules that defines the structure of an XML document, whereas
a Document Type Declaration is a statement that tells the parser which DTD to use for checking
and validation.
4. All Document Type Declarations start with the string "<!DOCTYPE ".
5. The DTD referenced by a Document Type Declaration can be external or internal.
6. If external, the DTD must be specified as either SYSTEM or PUBLIC.
7. If PUBLIC, the DTD can be used by anyone by referring to its URL.
8. If SYSTEM, it resides on the local hard disk and may not be available for use by other
applications. For example, suppose there is an XML document called myfile.xml that we want to
parse and validate against a DTD called my-rules.dtd.
DOCUMENT TYPE DEFINITIONS

A DTD is a set of structural rules called declarations. These rules specify a set of elements, along with
how and where they can appear in a document.
• Purpose: to provide a standard form for a collection of XML documents and define a markup language for
them.
• DTDs provide entity definitions.
• With a DTD, application development is simpler.
• Not all XML documents have or need a DTD.
External style sheets are used to impose a uniform style over a collection of documents; DTDs play a similar role for document structure.
When are DTDs used?
• When the same tag set definition is used by a collection of documents or a collection of users,
and the documents must have a consistent and uniform structure.
• A document can be tested against a DTD to determine whether it conforms to the rules
the DTD describes.
• Application programs that process the data in the collection of XML documents can be written
to assume the particular document form. Without such structural restrictions, developing such
applications would be difficult.
• The DTD for a document can be internal (embedded in the XML document) or external (a separate
file). An external DTD can be used with more than one document.
• A DTD with incorrect or inappropriate declarations can have widespread consequences.
WHAT IS DTD?
DTD stands for Document Type Definition. DTDs define an XML document's structure (e.g.,
what elements, attributes, etc. are permitted in the document). An XML document is not required to have
a corresponding DTD. However, DTDs are often recommended to ensure document conformity,
especially in business-to-business (B2B) transactions, where XML documents are exchanged.
CREATING XML DOCUMENT TYPE DEFINITION (DTD)
• A Document Type Declaration is a statement embedded in an XML document whose purpose is to
acknowledge the existence and location of a Document Type Definition (DTD).
• The DTD tells you what tags you can use in a document, what order they should appear in, which tags can
appear inside other ones, which tags have attributes and so on.
• Every Document Type Declaration starts with the string "<!DOCTYPE ".

The DTD associated with an XML file can be:
• Internal: when the rules are inserted directly into the same XML document
• External: when the rules are contained in an external file
• Internal and External: when some rules are inserted directly into the same XML document
and other rules are contained in an external file
Generally, external files containing the DTD rules for an XML document have the extension ".dtd".
INTERNAL DTD
The DTD section contains the rules an XML document must comply with to be a valid document. This
section can be inserted directly into the XML document. If the DTD is internal, the syntax
is
<!DOCTYPE root-element [ internal type definitions ]>
An internal DTD is also known as the internal subset. This is a sample XML document with an internal
DTD:
<?xml version="1.0"?>
<!DOCTYPE mail [ <!ELEMENT mail (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>]>
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>About our parents' Wedding Anniversary</body>
</mail>
The DTD is interpreted as follows:
1) <!DOCTYPE mail : indicates that mail is the root element.
2) <!ELEMENT mail : the root element mail has 4 sub-elements.
3) <!ELEMENT to : the sub-element 'to' is of type #PCDATA.
4) PCDATA : the element contains only text data.
5) # : a reserved character; it marks #PCDATA as a reserved word.
EXTERNAL DTD
The DTD section can be placed in an external file. If the DTD is external, it must be identified
as either SYSTEM or PUBLIC in the Document Type Declaration.
If the DTD is external, the syntax is
<!DOCTYPE root-element SYSTEM "DTD_file_URL">
or
<!DOCTYPE root-element PUBLIC "public_id" "DTD_file_URL">
If SYSTEM, the DTD resides on the local hard disk and may not be available for use by other
applications. If PUBLIC, the DTD can be used by anyone by referring to its URL. The public_id is
unique for standard DTD files on the web.
The external subset (external DTD), if present, is given by a reference to an external entity following
the DOCTYPE keyword, as illustrated here:
//mail.xml
<!DOCTYPE mail SYSTEM "mail.dtd">
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>About our parents' Wedding Anniversary</body>
</mail>
//mail.dtd
<!ELEMENT mail (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
The DTD can be housed entirely in the external subset, entirely in the internal subset, or split between both.
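To see what a parser actually learns from the Document Type Declaration, here is a minimal sketch using Python's standard xml.parsers.expat module. The function name doctype_info is made up for this example. expat is a non-validating parser, so it reports the declaration but does not fetch mail.dtd or check the document against it.

```python
from xml.parsers import expat

def doctype_info(xml_text):
    """Report what the document type declaration says, without loading any DTD."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info.update(root=name, system=system_id, public=public_id,
                    internal=bool(has_internal_subset))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return info

# External subset only: the DTD lives in mail.dtd.
external = doctype_info('<!DOCTYPE mail SYSTEM "mail.dtd"><mail><to>Rani</to></mail>')
# Internal subset only: the rules sit between [ and ].
internal = doctype_info('<!DOCTYPE mail [<!ELEMENT mail (#PCDATA)>]><mail>hi</mail>')
print(external)
print(internal)
```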
INTERNAL AND EXTERNAL DTD
The DTD sections required to validate an XML document can be placed partly in the document
and partly in an external DTD file.
<!DOCTYPE root-element SYSTEM "DTD_file_URL" [
Insert the internal DTD sections here ]>
//apples.dtd
<!ELEMENT apples (#PCDATA)>
//apples.xml
<!DOCTYPE apples SYSTEM "apples.dtd" [
<!ATTLIST apples color CDATA #REQUIRED>]>
<apples color="green">12</apples>
ELEMENT TYPE DECLARATION
• Every element in a valid XML document must have an element type declared in the DTD.
• To validate an XML document, a validating parser needs to know three things about each element:
1) What the element type is named
2) What elements of that type can contain (the content model)
3) What attributes an element of that type has
• Both the element type name and its content model are declared together in what is known as an
Element Type Declaration.
• An Element Type Declaration must start with the string "<!ELEMENT " followed by the name and
the content specification.
• Every element has certain allowed content. There are four general types of content specification:
1) EMPTY content: the element may not have content
2) ANY content: the element may have any combination of elements and character data, in any order
3) Mixed content: the element may have character data or a mix of character data and sub-elements
4) Element content: the element may have only sub-elements, as given by a content model
Element Type Declaration and its Interpretation

<!ELEMENT stock EMPTY>
An element of type stock does not contain anything.
Ex: <stock/>

<!ELEMENT contact (name,address,phone)>
An element of type contact contains 3 sub-elements, name, address and phone, exactly in that order.
Ex: <contact>
<name>aaa</name>
<address>SJCET,pala</address>
<phone>239301</phone>
</contact>

<!ELEMENT contact (name,address?,phone)>
An element of type contact contains a name element followed by an optional address element and a phone
element. address can occur once or not at all.
Ex: <contact>
<name>aaa</name><phone>239301</phone></contact>

<!ELEMENT fruit (apple|orange)>
An element of type fruit contains either a single apple element or a single orange element: a selection
from a list of elements, of which only one is allowed.
Ex: <fruit><apple>---</apple></fruit>

<!ELEMENT fruit (apple|orange)+>
An element of type fruit contains one or more sub-elements that are either apple elements or orange elements.
Ex1: <fruit><apple>---</apple>
<apple>---</apple>
<orange>--</orange>
</fruit>

<!ELEMENT fruit (apple|orange)*>
An element of type fruit contains zero or more sub-elements that are either apple elements or orange elements.
Ex1: <fruit><apple>---</apple>
<orange>---</orange></fruit>
Ex2: <fruit></fruit>

<!ELEMENT para (#PCDATA|list)*>
An element of type para contains a mixture of character data and list elements in any order.
Ex1: <para>Here is my list
<list>---</list></para>
Ex2: <para>aaa,bbb,ccc</para>
Ex3: <para><list>---</list></para>

<!ELEMENT invoice (from,to,item+)>
An invoice element consists of a from element followed by a to element followed by one or more item elements.

<!ELEMENT invoice (from,to*,item+)>
An invoice element consists of a from element followed by zero or more to elements followed by one or more
item elements.

<!ELEMENT invoice ANY>
An invoice element consists of any combination of elements or character data, in any order.
ATTRIBUTE LIST DECLARATION
Attributes provide information that further describes an element. If an element has attributes, they need to be
declared. Attributes are name-value pairs within an element's start tag that
modify certain features of the element. An attribute declaration specifies the type (and default value) of each additional
parameter that the element can carry. In XML, all attribute values must be enclosed in quotation
marks. Either single or double quotation marks are acceptable, as long as each value opens and closes with the same kind. Attributes are
declared in the DTD as part of the definition of the element they belong to.
Attributes need to be declared in the DTD so that a validating XML parser can check that they have been
used properly in an XML document.
SYNTAX:
<!ATTLIST elementname attributename attributetype attributedefault
attributename attributetype attributedefault
... >
where elementname is the name of the element to which the attributes belong and
attributename is the name of an attribute of that element.
So in general an attribute list declaration has four aspects:
1. The element type to which it belongs.
2. What the attribute is named.
3. What type of data the attribute value can contain.
4. What happens if no value is supplied for the attribute (the attribute default).
• Example: <!ELEMENT person (#PCDATA)>
<!ATTLIST person email CDATA #REQUIRED>
• You can declare many attributes in a single attribute list declaration:
<!ATTLIST person email CDATA #REQUIRED
phone CDATA #REQUIRED
fax CDATA #REQUIRED>
• You can also have multiple attribute list declarations for a single element:
<!ATTLIST person email CDATA #REQUIRED>
<!ATTLIST person phone CDATA #REQUIRED>
• Each attribute in a declaration has three parts: a name, a type and a default value. The entries below show
partial attribute list declarations and their interpretation.

<!ATTLIST product name ---->
An element of type product has an attribute known as name.
Ex: <product name="----">

<!ATTLIST product
name ----
color ---->
An element of type product has attributes known as name and color.
Ex: <product name="--" color="---">

<!attlist product name ---->
Error: the keyword "ATTLIST" must always be in uppercase.

The study of attribute list declarations is incomplete without the attribute types and
attribute defaults, which the next two sections discuss.
ATTRIBUTE TYPES
Attribute list declarations specify the name, type and, optionally, the default value of the
attributes associated with an element. Attribute types fall into three main classes:
1. String attribute type
2. Enumerated attribute type
3. Tokenized attribute type
• STRING ATTRIBUTES
The simplest type of attribute is the CDATA or string attribute. CDATA attribute values can be any
string of characters.
<!ATTLIST elementname attributename CDATA ……………>
An element of type 'elementname' has an attribute called 'attributename' whose value can be any
string of characters (letters, numbers or punctuation) except a literal < or &.
For example
<!ATTLIST product name CDATA …… >
An element of type product has an attribute called name whose value can be any string of
characters. A string attribute can contain an arbitrary collection of characters of any length, as
long as any occurrence of "<" or "&" is escaped with an entity reference.
Sample: <product name="Acme"> or
<product name="Profit &amp; Loss">
• ENUMERATED ATTRIBUTES
A list of permissible values can be supplied using an enumerated attribute type. An enumerated
attribute is one that can take on one of a fixed set of values supplied as part of its declaration.
Ex: <!ATTLIST product name CDATA ……. color (red|green) ………>
An element of type product has two attributes known as name and color. The name attribute can have
any string of characters as its value; the color attribute must be either the string "red" or "green".
<product name="acme" color="red">
Remember that enumerated attribute values are case sensitive.
<!ATTLIST apple quality (GOOD|BAD|INDIFFERENT) .....>
The quality attribute is declared to have three permissible values: "GOOD",
"BAD" and "INDIFFERENT". Here is a valid apple element:
<apple quality="GOOD">
and the following is an invalid apple element:
<apple quality="good">
An enumerated attribute takes exactly one of its listed values at a time.
• TOKENIZED ATTRIBUTE TYPE
o ID/IDREF/IDREFS
These three attribute types are treated together as they are strongly interrelated. The values of all ID
attributes occurring in an XML document must be unique in order for the document to be valid.
Attributes of type ID thus provide a handy way of giving a name to an element to uniquely identify it. IDs
must begin with a letter, a "_" or a ":" character. In this example a UniqueName attribute is attached to
the hello element:
<!ATTLIST hello UniqueName ID …….>
The sample element might look like this: <hello UniqueName="P1234">
The value assigned to an IDREF attribute in a valid XML document must match the value assigned to an ID
attribute somewhere in the document.
<!ATTLIST bar Reference IDREF ….>
Here is an example of a bar element using its IDREF attribute to
point to the hello element of the last example:
<bar Reference="P1234">
IDREFS is a variation on IDREF in which an attribute is allowed to contain multiple referenced IDs.
So given this declaration:
<!ATTLIST bar References IDREFS ……>
a bar element might look like this: <bar References="P1234 Q5678">
Thus the value of an IDREFS attribute may contain multiple IDREF values separated by white space.
o ENTITY/ENTITIES
Attributes of type ENTITY or ENTITIES are treated together as they are strongly related. An attribute
of type ENTITY must have a value corresponding to the name of an unparsed entity declared
somewhere in the DTD. In this example an external data entity, bob, is declared
and the salutation attribute of the letter element is declared to be of type ENTITY.
<!ENTITY bob SYSTEM "bob.gif" NDATA gif>
<!ATTLIST letter salutation ENTITY …>
The letter element might look like this: <letter salutation="bob">
Like IDREFS above, ENTITIES is a variation on ENTITY that allows an attribute to contain one or
more entity names in its value.
o NMTOKEN/NMTOKENS
Attributes of type NMTOKEN or NMTOKENS are treated together as they are strongly related. An
NMTOKEN attribute is restricted to contain the characters allowed in a name: any combination of letters,
digits and the punctuation characters ".", "-", "_" and ":". Note that this list does not contain any white
space characters.
An NMTOKENS attribute is one that can contain one or more NMTOKEN values separated by white space.
<!ATTLIST product code NMTOKEN ….>
Product elements might look like this:
<product code="Alpha-123"> or <product code="333">
Here is an example of an invalid NMTOKEN attribute value:
<product code="A 123">
ATTRIBUTE DEFAULTS
An attribute list declaration includes information about whether or not a value must be supplied for an attribute
and, if not, what the XML processor should do.
There are four different variations:
1) A value ---> The quoted value, which is used if none is specified in an element.
2) Required ---> A value must be specified.
3) Implied ---> The XML processor tells the application that no value was supplied. The application can
decide what best to do.
4) Fixed ---> A value is supplied in the declaration. No value need be supplied in the document, and
the XML processor will pass the specified fixed value through to the application. If a value is supplied in
the document, it must exactly match the fixed value.
#REQUIRED - The attribute must have an explicitly specified value on every occurrence of the
element in the document.
<!ATTLIST product name CDATA #REQUIRED>
An element of type product has an attribute called name whose value can be any string of characters.
The value must be supplied whenever the element is used in the document: <product name="Acmepc">
In this example the type attribute of the fruit element is declared to be required.
<!DOCTYPE fruit [ <!ELEMENT fruit EMPTY>
<!ATTLIST fruit type CDATA #REQUIRED>]>
<fruit type="apple"/>
A validating XML parser would thus reject the following document:
<!DOCTYPE fruit [<!ELEMENT fruit EMPTY>
<!ATTLIST fruit type CDATA #REQUIRED>]>
<fruit />
#IMPLIED - These are attributes that can be left unspecified if desired. The XML processor passes the
fact that the attribute was unspecified through to the XML application, which can then choose what
best to do.
Valid document:
<!DOCTYPE fruit [<!ELEMENT fruit EMPTY>
<!ATTLIST fruit type CDATA #IMPLIED>]>
<fruit />
<!ATTLIST product color (red|green) #IMPLIED>
An element of type product has an attribute called color. The color attribute must be either the string "red" or
"green". If the value is not supplied, it is left up to the XML application to decide what to do.
<product color="red">…………</product> is valid, and a product element with no color attribute is also valid.
#FIXED attribute value - An attribute declaration may specify that an attribute has a fixed value.
<!ATTLIST product name CDATA #FIXED "Acmepc">
An element of type product has an attribute called name with the fixed value Acmepc. Any other
value is an error.
<product name="Acmepc">
Default value - If the attribute is given a default value, the value must be specified within quotes.
<!ATTLIST product name CDATA "Acmepc">
Here the default value is used when the document does not supply one.
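The #REQUIRED rule can be checked by hand, as the following sketch suggests. It uses Python's standard xml.parsers.expat module, which reads ATTLIST declarations from the internal subset but does not validate on its own. The function name missing_required is hypothetical, and the check covers only this one rule, not full DTD validation.

```python
from xml.parsers import expat

def missing_required(xml_text):
    """List (element, attribute) pairs where a #REQUIRED attribute is absent."""
    required = {}   # element name -> set of attribute names declared #REQUIRED
    missing = []

    def on_attlist(elname, attname, att_type, default, is_required):
        # #REQUIRED attributes are flagged as required and carry no default value.
        if is_required and default is None:
            required.setdefault(elname, set()).add(attname)

    def on_start(name, attrs):
        for attname in sorted(required.get(name, ())):
            if attname not in attrs:
                missing.append((name, attname))

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return missing

DTD = '<!DOCTYPE fruit [<!ELEMENT fruit EMPTY><!ATTLIST fruit type CDATA #REQUIRED>]>'
ok = missing_required(DTD + '<fruit type="apple"/>')
bad = missing_required(DTD + '<fruit/>')
print(ok, bad)
```

The first document supplies the type attribute, so nothing is reported; the second omits it and would be rejected by a validating parser.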
DTD declarations have the form: <!keyword … >
There are four possible declaration keywords:

ELEMENT, ATTLIST, ENTITY, and NOTATION
1. Declaring Elements:
• Element declarations are similar to BNF (the context-free grammar notation used to define the syntactic
structure of programming languages): a DTD describes the syntactic structure of a particular set of
documents, so its rules resemble BNF rules.
• An element declaration specifies the name of an element and its structure.
• If the element is a leaf node of the document tree, its structure is in terms of
characters.
• If it is an internal node, its structure is a list of child elements (either leaf or
internal nodes).
• General form:
<!ELEMENT element_name (list of child names)>
e.g.,
<!ELEMENT memo (from, to, date, re, body)>
This element structure describes a document tree whose root node, memo, has five children: from, to, date, re and body.
• Child elements can have modifiers:
+ -> One or more occurrences
* -> Zero or more occurrences
? -> Zero or one occurrence
Ex: consider the DTD declaration below

<!ELEMENT person (parent+, age, spouse?, sibling*)>
_ One or more parent elements
_ One age element
_ Possibly a spouse element
_ Zero or more sibling elements
• The leaf nodes of a DTD specify the data types of the content of their parent nodes, which are elements:
1. PCDATA (parsable character data)
2. EMPTY (no content)
3. ANY (can have any content)
Example of a leaf declaration:
<!ELEMENT name (#PCDATA)>
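Because element declarations behave like grammar rules, a content model with the +, * and ? modifiers can be turned into a regular expression over the sequence of child names. The sketch below is one way to do that with Python's standard xml.parsers.expat module, which reports each declaration as a nested tuple. The helper names (to_regex, content_models, children_ok) are invented here, and only element content (sequences and choices) is handled.

```python
import re
from xml.parsers import expat

M = expat.model  # constants that describe content-model nodes

QUANT = {
    M.XML_CQUANT_NONE: "",   # exactly one
    M.XML_CQUANT_OPT: "?",   # zero or one
    M.XML_CQUANT_REP: "*",   # zero or more
    M.XML_CQUANT_PLUS: "+",  # one or more
}

def to_regex(model):
    """Translate an expat content-model tuple into a regex over 'name ' tokens."""
    ctype, quant, name, children = model
    if ctype == M.XML_CTYPE_NAME:
        body = re.escape(name) + " "
    elif ctype == M.XML_CTYPE_SEQ:
        body = "".join(to_regex(child) for child in children)
    elif ctype == M.XML_CTYPE_CHOICE:
        body = "|".join(to_regex(child) for child in children)
    else:
        raise ValueError("only element content is handled in this sketch")
    return "(?:" + body + ")" + QUANT[quant]

def content_models(dtd_text):
    """Collect {element name: regex} from the element declarations in dtd_text."""
    models = {}

    def on_element_decl(name, model):
        models[name] = to_regex(model)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    # Wrap the declarations in a throwaway document so expat will read them.
    parser.Parse("<!DOCTYPE x [" + dtd_text + "]><x/>", True)
    return models

def children_ok(regex, child_names):
    """True if the sequence of child element names satisfies the content model."""
    return re.fullmatch(regex, "".join(n + " " for n in child_names)) is not None

PERSON = content_models("<!ELEMENT person (parent+, age, spouse?, sibling*)>")["person"]
print(children_ok(PERSON, ["parent", "parent", "age"]))
print(children_ok(PERSON, ["age"]))
```

A person with two parent elements and an age is accepted; one with no parent element is not, because parent+ demands at least one.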
2. Declaring Attributes:
• Attributes are declared separately from the element declarations.
• General form:
<!ATTLIST element_name attribute_name attribute_type [default_value]>
For more than one attribute:
<!ATTLIST element_name attribute_name_1 attribute_type default_value_1
attribute_name_2 attribute_type default_value_2
…>
_ Attribute type: there are ten different types, but we will consider only CDATA.
_ Possible default values for attributes:
A value - the value that is used if none is specified
#FIXED value - the value that every element has and that cannot be changed
#REQUIRED - no default value is given; every instance must specify a value
#IMPLIED - no default value is given; the value may or may not be specified
Example:
<!ATTLIST car doors CDATA "4">
<!ATTLIST car engine_type CDATA #REQUIRED>
<!ATTLIST car make CDATA #FIXED "Ford">
<!ATTLIST car price CDATA #IMPLIED>
<car doors = "2" engine_type = "V8">
...
</car>
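The effect of these defaults can be observed with Python's standard xml.etree.ElementTree module, whose underlying parser fills in default and #FIXED values declared in the internal subset even though it does not validate. This is only a sketch: the declarations are placed in an internal subset for convenience, and car_attributes is a made-up helper name.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE car [
  <!ELEMENT car EMPTY>
  <!ATTLIST car doors CDATA "4">
  <!ATTLIST car engine_type CDATA #REQUIRED>
  <!ATTLIST car make CDATA #FIXED "Ford">
  <!ATTLIST car price CDATA #IMPLIED>
]>"""

def car_attributes(car_markup):
    """Parse one car element and return the attributes the application sees."""
    return dict(ET.fromstring(DTD + car_markup).attrib)

explicit = car_attributes('<car doors="2" engine_type="V8"/>')
defaulted = car_attributes('<car engine_type="V6"/>')
print(explicit)
print(defaulted)
```

In both cases make arrives as "Ford" without being written in the document, doors falls back to "4" only when it is omitted, and price is simply absent because #IMPLIED supplies no value.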
3.Declaring Entities :
Two kinds:
• A general entity can be referenced anywhere in the content of an XML document.
Ex: the predefined entities are all general entities.
• A parameter entity can be referenced only in a DTD.
• General form of an entity declaration:
<!ENTITY [%] entity_name "entity_value">
When % is present, the declaration defines a parameter entity.
Example:
<!ENTITY jfk "John Fitzgerald Kennedy">
_ A reference to the entity declared above: &jfk;
• If the entity value is longer than a line, define it in a separate file (an external text entity).
_ General form of an external entity declaration:
<!ENTITY entity_name SYSTEM "file_location">
SYSTEM specifies that the definition of the entity is in a different file.
• Example of a parameter entity:
<!ENTITY % pat "(USN, Name)">
<!ELEMENT student %pat; >
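A short sketch of general entity expansion, using Python's standard xml.etree.ElementTree module: the speech element and its text are invented for the example, while the jfk entity is the one declared above. Only the general entity is shown, because a parameter entity reference inside a declaration (such as %pat; above) is permitted only in an external DTD file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the jfk entity comes from the notes.
DOCUMENT = """<!DOCTYPE speech [
  <!ELEMENT speech (#PCDATA)>
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<speech>Quoted from &jfk;.</speech>"""

speech = ET.fromstring(DOCUMENT)
print(speech.text)
```

The application never sees &jfk;. The parser substitutes the entity value before handing the text over.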
4. Sample DTD:
<?xml version = "1.0" encoding = "utf-8"?>

<!ELEMENT planes_for_sale (ad+)>
<!ELEMENT ad (year, make, model, color, description, price?, seller, location)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT make (#PCDATA)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT color (#PCDATA)>

<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT seller (#PCDATA)>
<!ELEMENT location (city, state)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ATTLIST seller phone CDATA #REQUIRED>
<!ATTLIST seller email CDATA #IMPLIED>
<!ENTITY c "Cessna">
<!ENTITY p "Piper">
<!ENTITY b "Beechcraft">
5. Internal and External DTDs:
• Internal DTDs
<!DOCTYPE planes [
<!-- The DTD for planes -->
]>
XML file
• External DTDs
<!DOCTYPE XML_doc_root_name SYSTEM "DTD_file_name">
For example,
<!DOCTYPE planes_for_sale SYSTEM "planes.dtd">

<!DOCTYPE planes_for_sale SYSTEM "planes.dtd">
<planes_for_sale>
<ad>
<year> 1977 </year>
<make> &c; </make>
<model> Skyhawk </model>
<color> Light blue and white </color>
<description> New paint, nearly new interior,
685 hours SMOH, full IFR King avionics </description>
<price> 23,495 </price>
<seller phone = "555-222-3333"> Skyway Aircraft </seller>
<location>
<city> Rapid City, </city>
<state> South Dakota </state>
</location>
</ad>
</planes_for_sale>
XML NAMESPACES
• XML namespaces provide a method to avoid element name conflicts.
• In XML, element names are defined by the developer. This often results in a conflict when trying
to mix XML documents from different XML applications.
• This XML carries HTML table information:
<table>
<tr>
<td>apples</td>
<td>bananas</td>
</tr>
</table>
• This XML carries information about a table (a piece of furniture):
<table>
<name>african coffee table</name>
<width>80</width>
<length>120</length>
</table>
• If these XML fragments were added together, there would be a name conflict. Both contain a
<table> element, but the elements have different content and meaning.
• An XML parser will not know how to handle these differences.
• DTDs do not support namespaces very well.
• Namespaces can be declared in the elements where they are used or in the XML root element.
• The namespace URI is not used by the parser to look up information.
• The purpose is to give the namespace a unique name.

SOLVING THE NAME CONFLICT USING A PREFIX
• Name conflicts in XML can easily be avoided using a name prefix.
• This XML carries information about an HTML table, and a piece of furniture:
<h:table>
<h:tr>
<h:td>apples</h:td>
<h:td>bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>african coffee table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
• In the example above, there will be no conflict because the two <table> elements have different
names.
XML NAMESPACES - the xmlns ATTRIBUTE
• The namespace declaration has the following syntax:
xmlns[:prefix]="URI"
• The square brackets indicate that what is within them is optional.
• The optional prefix specifies the name to be attached to names in the declared namespace.
Two reasons for using a prefix:
1. It is shorthand for the URI, which is too long to be typed on every occurrence of every name
from the namespace.
2. A URI may include characters that are illegal in XML names.
• Namespaces can be declared in the elements where they are used or in the XML root element:
<root xmlns:h="http://www.w3.org/tr/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>apples</h:td>
<h:td>bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>africancoffeetable</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
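How a namespace-aware parser tells the two table elements apart can be sketched with Python's standard xml.etree.ElementTree module, which replaces each prefix with its URI and writes element names in the form {URI}local-name. The document below is a shortened copy of the example above; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

H = "http://www.w3.org/tr/html4/"
F = "http://www.w3schools.com/furniture"

DOCUMENT = f"""<root xmlns:h="{H}" xmlns:f="{F}">
  <h:table><h:tr><h:td>apples</h:td><h:td>bananas</h:td></h:tr></h:table>
  <f:table><f:name>african coffee table</f:name><f:width>80</f:width></f:table>
</root>"""

root = ET.fromstring(DOCUMENT)

# Prefixes are gone after parsing: each tag carries its namespace URI instead.
tables = [child.tag for child in root]

# Searches take a prefix-to-URI mapping; the prefix chosen here need not match the document.
furniture_name = root.find("f:table/f:name", {"f": F}).text
print(tables)
print(furniture_name)
```

The prefix itself is only shorthand. What distinguishes the elements is the URI it stands for.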
DEFAULT NAMESPACES
Defining a default namespace for an element saves us from using prefixes in all the child
elements. It has the following syntax:
xmlns="namespaceURI"
• This XML carries HTML table information:
<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
• This XML carries information about a piece of furniture:
<table xmlns="http://www.w3schools.com/furniture">
<name>AfricanCoffeeTable</name>
<width>80</width>
<length>120</length>
</table>
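The sketch below, again using Python's standard xml.etree.ElementTree module, suggests that a default namespace applies to the element that declares it and to its unprefixed children alike. The helper namespace_of is a made-up name, and the documents are shortened copies of the two examples above.

```python
import xml.etree.ElementTree as ET

HTML_TABLE = '<table xmlns="http://www.w3.org/TR/html4/"><tr><td>Apples</td></tr></table>'
FURNITURE = '<table xmlns="http://www.w3schools.com/furniture"><name>AfricanCoffeeTable</name></table>'

def namespace_of(element):
    """Split ElementTree's '{uri}local' tag form into (uri, local name)."""
    if not element.tag.startswith("{"):
        return "", element.tag          # element is in no namespace
    uri, _, local = element.tag[1:].partition("}")
    return uri, local

html = ET.fromstring(HTML_TABLE)
furniture = ET.fromstring(FURNITURE)

print(namespace_of(html))        # the declaring element
print(namespace_of(html[0]))     # its unprefixed child tr
print(namespace_of(furniture))
```

Both roots are called table, yet they end up in different namespaces, and the child tr inherits the default namespace of its parent.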
XML schemas
A schema is any type of model document that defines the structure of something, such as a database
structure or documents. Here that something is an XML document. DTDs are in fact a type of schema.
An XML schema is an XML document, so it can be parsed with an XML parser.
The term XML schema is used here to refer specifically to the W3C XML Schema technology.
W3C XML Schemas, like DTDs, allow you to describe the structure of an XML document.
DTDs have several disadvantages:
• Their syntax is different from XML, so they cannot be parsed with an XML parser.
• It is confusing to deal with two different syntactic forms.
• DTDs do not allow restrictions on the form of data that can be the content of an element. For example,
a DTD cannot require that the content of <quantity>5</quantity> be a number; it can only specify that the
content is character data. There is no data type for integers or times, for example; everything is treated as text.
XML Schema is one of the alternatives to DTDs:
• A schema is an XML document, so it can be parsed with an XML parser.
• It also provides far more control over data types than DTDs do.
• Users can define new types with constraints on existing data types.
1. Schema Fundamentals:
• Schemas are related to the idea of a class and an object in an OOP language: a schema corresponds to a class definition,
and an XML document conforming to the schema corresponds to an object.
• Schemas have two primary purposes:
Specify the structure of their instance XML documents
Specify the data type of every element and attribute of their instance XML documents
2. Defining a schema:
Schemas are written using names from a namespace (the schema of schemas):
http://www.w3.org/2001/XMLSchema
element, schema, sequence and string are some names from this namespace.
Every XML schema has a single root element, schema.
• The schema element must specify the namespace for the schema of schemas from
which the schema's elements and attributes will be drawn.
• It often specifies a prefix that will be used for the names in the schema. This namespace
specification appears as
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
Every XML schema itself defines a tag set, like a DTD, which must be named with the targetNamespace
attribute of the schema element. The target namespace is specified by assigning a namespace to the
targetNamespace attribute, as in the following:
targetNamespace = "http://cs.uccs.edu/planeSchema"
Every top-level element places its name in the target namespace. If we want nested elements to be included as well,
we must set the elementFormDefault attribute to qualified:
elementFormDefault = "qualified"
The default namespace, which is the source of the unprefixed names in the schema, is given
with another xmlns specification:
xmlns = "http://cs.uccs.edu/planeSchema"
A complete example of a schema element:
<xsd:schema
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://cs.uccs.edu/planeSchema"
xmlns = "http://cs.uccs.edu/planeSchema"
elementFormDefault = "qualified" >
3.Defining a schema instance:

• An instance of a schema must specify the namespaces it uses.
• These are given as attribute assignments in the tag for its root element:
1. Define the default namespace:
<planes xmlns = "http://cs.uccs.edu/planeSchema"
…>
2. Specify the standard namespace for instances (XMLSchema-instance), which is needed
for the schemaLocation attribute:
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
3. Specify the location where the default namespace is defined, using the schemaLocation
attribute, which is assigned two values, the namespace and the filename:
xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd" >
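To make the role of schemaLocation concrete, the following sketch reads the attribute from an instance document with Python's standard xml.etree.ElementTree module and pairs each namespace with its schema file. The standard library does not validate against XML Schema, so this shows only how a tool could locate the schema. The function name schema_locations is an assumption of this example.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = """<planes xmlns="http://cs.uccs.edu/planeSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://cs.uccs.edu/planeSchema planes.xsd">
</planes>"""

def schema_locations(xml_text):
    """Return {namespace: schema file} taken from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # The attribute value is a whitespace-separated list of namespace/location pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))

locations = schema_locations(INSTANCE)
print(locations)
```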
4. Schema Data types: Two categories of data types

1. Simple (strings only, no attributes and no nested elements)
2. Complex (can have attributes and nested elements)
• XML Schema defines 44 data types
• Primitive: string, boolean, float, …
• Derived: byte, integer, positiveInteger, …
• User-defined (derived) data types specify constraints on an existing
type (then called the base type)
• Constraints are given in terms of facets of the base type
Ex: the integer data type has 8 facets: totalDigits, maxInclusive, …
Both simple and complex types can be either named or anonymous.
DTDs define global elements (the context of reference is irrelevant), but the context of reference is essential in
XML Schema.
Data declarations in an XML schema can be
1. Local, which appear inside an element that is a child of schema
2. Global, which appear as a child of schema
5. Defining a simple type:
• Use the element tag and set the name and type attributes
<xsd:element name = "bird" type = "xsd:string" />
The instance could be:
<bird> Yellow-bellied sap sucker </bird>
• An element can be given a default value using the default attribute
<xsd:element name = "bird" type = "xsd:string" default = "Eagle" />
• An element can have a constant value, using the fixed attribute
<xsd:element name = "bird" type = "xsd:string" fixed = "Eagle" />
Declaring simple User-Defined Types:
• A user-defined type is described in a simpleType element, using facets
• Facets must be specified in the content of a restriction element
• Facet values are specified with the value attribute
For example, the following declares two user-defined types, firstName and phoneNumber
<xsd:simpleType name = "firstName" >
<xsd:restriction base = "xsd:string" >
<xsd:maxLength value = "20" />
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name = "phoneNumber" >
<xsd:restriction base = "xsd:decimal" >
<xsd:totalDigits value = "10" />
</xsd:restriction>
</xsd:simpleType>
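What a facet means in practice can be sketched by applying one by hand. The code below reads the maxLength facet of the firstName type with Python's standard xml.etree.ElementTree module and checks candidate strings against it. This is a hand-rolled illustration, not a schema validator; the wrapping xsd:schema element and the function names are assumptions added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="firstName">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="20"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""

def max_length(schema_text, type_name):
    """Look up the maxLength facet value of a named simple type."""
    root = ET.fromstring(schema_text)
    path = f"xsd:simpleType[@name='{type_name}']/xsd:restriction/xsd:maxLength"
    facet = root.find(path, {"xsd": XSD})
    return int(facet.get("value"))

def is_valid_first_name(value):
    """Apply only the maxLength facet; a real validator checks much more."""
    return len(value) <= max_length(SCHEMA, "firstName")

print(max_length(SCHEMA, "firstName"))
print(is_valid_first_name("Rani"), is_valid_first_name("x" * 21))
```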
6.Declaring Complex Types:
• There are several categories of complex types, but we discuss just one, element-only elements
• Element-only elements are defined with the complexType element
• Use the sequence tag for nested elements that must be in a particular order
• Use the all tag if the order is not important
• Nested elements can include attributes that give the allowed number of occurrences
(minOccurs and maxOccurs; maxOccurs can take the value unbounded)
• For ex:
<xsd:complexType name = "sports_car" >
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element name = "year" type = "xsd:string" />
</xsd:sequence>
</xsd:complexType>
7. Validating Instances of Schemas:
• An XML schema provides a definition of a category of XML documents.
• However, developing a schema is of limited value unless there is some mechanical way to determine
whether a given XML instance document conforms to the schema.
• Several XML schema validation tools are available, e.g. xsv (XML Schema Validator). It can also be used
to validate online.
• The output of xsv is an XML document. When xsv is run from the command line, the output appears without being
formatted.
Output of xsv when run on planes.xml:
<?xml version = '1.0' encoding = 'utf-8'?>
<xsv docElt = '{http://cs.uccs.edu/planesSchema}planes'
instanceAccessed = 'true'
instanceErrors = '0'
schemaErrors = '0'
schemaLocs = 'http://cs.uccs.edu/planesSchema->planes.xsd'
target = 'file:/c:/wbook2/xml/planes.xml'
validation = 'strict'
version = 'XSV 1.197/1.101 of 2001/07/07 12:01:19'
xmlns = 'http://www.w3.org/2000/05/xsv'>
<importAttempt URI = 'file:/c:wbook2/xml/planes.xsd'
namespace = 'http://cs.uccs.edu/planesSchema'
outcome = 'success' />
</xsv>
If the schema is not in the correct format, the validator will report that it could not find the specified
schema.
Displaying RAW XML Documents
An XML-enabled browser, or any other system that can deal with XML documents,
cannot possibly know how to format the tags defined in the document.
Without a style sheet that defines presentation styles for the document's tags, the XML document
cannot be displayed in a formatted manner.
Some browsers, like Firefox 2 (FX2), have default style sheets that are used when no style sheet is
specified.
Eg of planes.xml document.
Refer Page No.307
DISPLAYING XML DOCUMENTS WITH CSS

Style sheet information can be provided to the browser for an XML document in two
ways.
• First, a CSS file that has style information for the elements in the XML document can
be developed.
• Second, the XSLT style sheet technology can be used.
Although using CSS is effective, XSLT provides far more power over the appearance of the
document's display.
A CSS style sheet for an XML document is just a list of its tags and associated styles. The connection between
an XML document and its style sheet is made through an xml-stylesheet processing instruction:
<?xml-stylesheet type = "text/css" href = "planes.css"?>
The display property is used to specify whether an element is to be displayed inline or in a separate
block.
For example: planes.css

ad { display: block; margin-top: 15px; color: blue;}
year, make, model { color: red; font-size: 16pt;}
color {display: block; margin-left: 20px; font-size: 12pt;}
description {display: block; margin-left: 20px; font-size: 12pt;}
seller { display: block; margin-left: 15px; font-size: 14pt;}
location {display: block; margin-left: 40px; }
city {font-size: 12pt;}
state {font-size: 12pt;}

<planes_for_sale>
<ad>
<year> 1977 </year>
<make> Cessna </make>
<model> Skyhawk </model>
<color> Light blue and white </color>
<description> New interior, 685 hours SMOH, full IFR King avionics
</description>
<seller phone = "555-222-3333">
Skyway Aircraft </seller>
<location>
<city> Rapid City, </city>
<state> South Dakota </state>
</location>
</ad>
</planes_for_sale>
With planes.css, planes.xml is displayed as follows:
1977 Cessna Skyhawk

Light blue and white
New interior, 685 hours SMOH, full IFR King avionics
Skyway Aircraft
Rapid City, South Dakota
STORING XML DATA IN HTML DOCUMENT:
Data island:
An XML data island is Extensible Markup Language (XML) embedded in an HTML
document. By themselves, data islands are not that important or helpful, but if employed correctly, they
can be useful for storing client-side information. The data can be located in an external XML file or
coded internally. With an external data island, the data and its presentation are kept separate. The
HTML element xml is used to embed XML in HTML documents. Data islands are created by assigning
an id attribute to the xml tag. The example below creates a data island using an external XML file.
The src attribute specifies the location of the data file.
<xml id="customers" src="customers.xml"></xml>
The following example creates an internal data island in a simplified HTML document. The root
element of the XML data is included inside the xml element.
<html><body>
<xml id="customers">
<customer>
<name>Font Factory</name>
<name>BeBop Grafix</name>
</customer>
</xml></body></html>
IE's XML parser creates a data island from an XML document by storing the data as a Data Source
Object (DSO). Interactions between data islands and the Web page are controlled by the DSO.
DISPLAYING XML DATA IN HTML BROWSER AS HTML TABLES:
Binding tables for display purposes:

One of the most powerful features of data islands is the ability to bind them to HTML tables.
The element attributes datasrc, datafld, dataformatas, and datapagesize allow IE to display bound data.
The datasrc attribute links an HTML element to a DSO. Its value is the unique identifier (id) of the
data island, prefixed by a number sign (#).
The datafld attribute references a specific element in the XML data source. The following syntax is required:
<tag datasrc="#id" datafld="field_name">
The datasrc attribute can be used with the following elements: a, applet, button, div, frame, iframe,
img, label, marquee, select, span, table, textarea, and the input tags (types button, checkbox, hidden,
image, password, radio, and text).
//catalogs.xml
<?xml version="1.0"?>
<PCS>
<PC>
<NAME>Zenith</NAME>
<CAPACITY>100</CAPACITY>
<PRICE>20000</PRICE>
</PC>
<PC>
<NAME>DELL</NAME>
<CAPACITY>200</CAPACITY>
<PRICE>40000</PRICE>
</PC>
<PC>
<NAME>Acer</NAME>
<CAPACITY>300</CAPACITY>
<PRICE>35000</PRICE>
</PC>
</PCS>
//catalog1.html
<html>
<body>
<xml id="PCS" src="catalogs.xml">
</xml>
<table datasrc="#PCS" border="1" align="center">
<thead><tr><th>Name</th><th>Capacity</th><th>Price</th></tr></thead>
<tr><td><div datafld="NAME"></div></td>
<td><div datafld="CAPACITY"></div></td>
<td><div datafld="PRICE"></div></td>
</tr>
</table>
</body></html>
Explanation:
• We need to create an HTML table to lay out the information. The start of the table
element is written to fit the data that is to be displayed.
<table datasrc="#PCS" border="1" align="center">
<thead>
<th>Name</th>
<th>Capacity</th>
<th>Price</th>
</thead>
This is a simple HTML table start tag followed by the table heading information.
• The body of the table should look like this:
<tr><td><div datafld="NAME"></div></td>
<td><div datafld="CAPACITY"></div></td>
<td><div datafld="PRICE"></div></td>
</tr>
This table row is declared to contain three table cells. The datafld attribute is used to specify the
name of the data field each cell will contain. IE repeats this row once for every PC record in the data island.
//output
Name Capacity Price
Zenith 100 20000
DELL 200 40000
Acer 300 35000
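Data binding of this kind is specific to Internet Explorer. As a rough illustration of what the binding does (one table row repeated per record, one cell per field), here is a minimal Python sketch. It is not IE's mechanism; the function names rows_from_xml and to_html_table are made up for this example, and the catalog data is taken from the output shown above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Catalog records, matching the output table shown above.
CATALOG = """<PCS>
  <PC><NAME>Zenith</NAME><CAPACITY>100</CAPACITY><PRICE>20000</PRICE></PC>
  <PC><NAME>DELL</NAME><CAPACITY>200</CAPACITY><PRICE>40000</PRICE></PC>
  <PC><NAME>Acer</NAME><CAPACITY>300</CAPACITY><PRICE>35000</PRICE></PC>
</PCS>"""

# Plays the role of the datafld attributes: which child element fills each cell.
FIELDS = ["NAME", "CAPACITY", "PRICE"]


def rows_from_xml(xml_text, fields):
    """Return one list of cell values per record (child of the root element)."""
    root = ET.fromstring(xml_text)
    return [[(rec.findtext(f) or "").strip() for f in fields] for rec in root]


def to_html_table(rows, headers):
    """Build an HTML table with a heading row and one row per record."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f'<table border="1"><thead><tr>{head}</tr></thead>{body}</table>'


rows = rows_from_xml(CATALOG, FIELDS)
print(to_html_table(rows, ["Name", "Capacity", "Price"]))
```

A missing field simply produces an empty cell here, which is also a reasonable way to picture an unbound column.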
Eg2: Product.xml
<?xml version="1.0" ?>
<Products>
<Item>
<Name>Coffee Cup Warmer</Name>
<Number>0001</Number>
<Price>15.00</Price>
</Item>
<Item>
<Name>42 Cup Coffee Brewing System</Name>
<Number>0015</Number>
<Price>473.00</Price>
</Item></Products>
In the following HTML document (saved as filename.html), Product.xml is used as an external data
island.
<HTML>
<HEAD>
<TITLE>Show XML In Your HTML</TITLE>
</HEAD>
<BODY>
<XML ID="MyProducts" SRC="Product.xml"></XML>
<TABLE DATASRC="#MyProducts" BORDER="1">
<THEAD>
<TH>Item Name</TH>
<TH>Item Number</TH>
<TH>Price</TH>
</THEAD>
<TR>
<TD><SPAN DATAFLD="Name"></SPAN></TD>
<TD><SPAN DATAFLD="Number"></SPAN></TD>
<TD><SPAN DATAFLD="Price"></SPAN></TD>
</TR>
</TABLE>
</BODY>
</HTML>
Output:
Item Name Item Number Price
Coffee Cup Warmer 0001 15.00
42 Cup Coffee Brewing System 0015 473.00
DISPLAY HIERARCHICAL XML AS NESTED HTML TABLES
Step1: create c1.xml
<PCS><PC>
<NAME>Zenith</NAME>
<CAPACITY><RAM>10</RAM>
<DISK>200</DISK></CAPACITY>
</PC>
<PC>
<NAME>DELL</NAME>
<CAPACITY><RAM>20</RAM>
<DISK>300</DISK></CAPACITY>
</PC>
</PCS>
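Step2: create c1.html with the following content. The outer table is bound to the data island, and the inner
table is bound to the CAPACITY field of each record. (c1.xml above has no PRICE element, so the Price
column stays empty unless a PRICE element is added to each PC.)
<html><body>
<xml id="x" src="c1.xml"></xml>
<table datasrc="#x" border="1" align="center" width="300">
<thead>
<th>Name</th>
<th>Capacity</th>
<th>Price</th>
</thead>
<tr>
<td><div datafld="NAME"></div></td>
<td>
<table datasrc="#x" datafld="CAPACITY" border="1" align="center" width="150">
<thead><th>RAM</th>
<th>DISK</th>
</thead>
<tr>
<td><div datafld="RAM"></div></td>
<td><div datafld="DISK"></div></td>
</tr>
</table>
</td><td><div datafld="PRICE"></div></td>
</tr>
</table>
</body></html>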
//output (figure not included)
Eg:
Write an application to create an Address Book in XML and display the address book in the browser
using HTML.
1) Create a valid XML document for the Address Book with the following elements. The address book stores,
for 'N' persons, the Name (fname, lname), the Address (hname, city, state, country) and all the phone
numbers (mob, off, res). The sub-elements are given in the brackets.
2) Store the XML data in an HTML page and display the data in the
HTML browser as HTML tables. As we are storing the XML data in the HTML page, there is only one
HTML file, called addressbook.html.
3) Use the xml tag for storing the content in the HTML page.
//addressbook.html
<html>
<body>
<xml id="addbook">

<addbook>
<person>
<name>
<fname>Ammu </fname>
<lname>Kurian</lname>
</name>
<address>
<hname>Vadakkan</hname>
<city>Pala</city>
<state>Kerala</state>
<country>India</country>
</address>
<email>ammu@yahoo.com</email>
<phone>
<res>647346</res>
<mob>97878473</mob>
<off>38463486</off>
</phone>
</person>
<person>
<name>
<fname>Anju</fname>
<lname>john</lname>
</name>
<address>
<hname>Anju Nivas</hname>
<city>Kavadiyar,Tvm</city>
<state>Kerala</state>
<country>India</country>
</address>
<email>ammu@hotmail.com</email>
<phone>
<res>4447346</res>
<mob>4548473</mob>
<off>3763486</off>
</phone>
</person>
</addbook>
</xml>

<TABLE datasrc="#addbook" cellSpacing="1" align="center" cellPadding="1"
width="85%" border="1">
<THEAD>
<TH>NAME</TH>
<TH>ADDRESS</TH>
<TH>EMAIL</TH>
<TH>PHONE</TH>
</THEAD>
<TR>
<TD valign="top"><table datasrc="#addbook" datafld="name" border="0">
<tr><td><div datafld="fname"></div></td>
<td><div datafld="lname"></div></td>
</tr></table></TD>
<TD valign="top"><table datasrc="#addbook" datafld="address" border="0">
<tr><td><div datafld="hname"></div></td></tr>
<tr><td><div datafld="city"></div></td></tr>
<tr><td><div datafld="state"></div></td></tr>
<tr><td><div datafld="country"></div></td></tr>
</table></TD>
<TD valign="top"><div datafld="email"></div></TD>
<TD valign="top"><table datasrc="#addbook" datafld="phone" border="0" width="150">
<tr><td>Res:<span datafld="res"></span></td></tr>
<tr><td>Mob:<span datafld="mob"></span></td></tr>
<tr><td>Off:<span datafld="off"></span></td></tr>
</table></TD>
</TR></TABLE>
</body></html>
//output
NAME          ADDRESS         EMAIL              PHONE
Ammu Kurian   Vadakkan        ammu@yahoo.com     Res:647346
              Pala                               Mob:97878473
              Kerala                             Off:38463486
              India
Anju john     Anju Nivas      ammu@hotmail.com   Res:4447346
              Kavadiyar,Tvm                      Mob:4548473
              Kerala                             Off:3763486
              India
EXTENSIBLE STYLE SHEET LANGUAGE (XSL)
XSL is a transformation language that enables the conversion of XML into a grammar and structure
suitable for display in the browser. XSL allows direct browsing of an XML file using Internet Explorer or any
other browser that supports it.
The basic tags used in XSL are given below.
• xsl:stylesheet : This is the first element in the style sheet. In this element we can specify the
namespace for the style sheet.
An XML namespace is a collection of names, identified by a URI (Uniform Resource Identifier) reference such as a
URL (Uniform Resource Locator), which are used in the XML document as element type
and attribute names. By using a namespace, developers can qualify element names uniquely on the
web and thus avoid conflicts between elements that have the same name. Important namespaces are given
below.
a) xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
This is the official standard transformation namespace used as a part of the W3C
recommendation. [W3C is the World Wide Web Consortium, which is a forum for information,
commerce, communication and collective understanding.] It is not supported by IE 5.0.
b) xmlns:xsl='http://www.w3.org/TR/WD-xsl'
This is the namespace used for style sheets processed by IE5.
The important xsl:stylesheet attributes are as follows:
1) id: a unique identifier for the style sheet.
2) xmlns:xsl: declares the namespace for the current specification. This is a fixed attribute with the
value 'http://www.w3.org/1999/XSL/Transform'.
• xsl:template : A template is a structured container that manages the way a source tree, or a portion of
the source tree, is transformed. When you build a style sheet, you build a series of templates that
match the elements you would like to attach some styles to. xsl:template defines a set of rules for
transforming nodes in a source document into a result tree. The nodes are selected with the match attribute. The
important xsl:template attributes are as follows:
1) mode: identifies the processing mode; the template is used only by an xsl:apply-templates element that has
a matching mode value.
2) name: gives a name to the template so that it can be called by name.
3) match: a pattern that identifies the source nodes to which the template applies.
• xsl:apply-templates : This element tells the processor to process the children of the current node (or the
nodes chosen with select) by applying the templates, defined with xsl:template, that match them. Possible
attributes are given below:
1) select: identifies the nodes to be processed. If this attribute is not specified, the processor
processes the children of the current node in the order of their appearance in the source document.
2) mode: identifies the processing mode, so that only those template elements that have a matching
mode value are selected.
• xsl:for-each : This element locates a set of elements in the XML data and repeats a portion of the
template for each one. Possible attributes are:
1) select: identifies the nodes to be processed.
2) xml:space: indicates whether or not white space is to be preserved.
• xsl:value-of : Within an <xsl:for-each> element we can drill down further to select children. The <xsl:value-of>
element selects a specific child and inserts the text content of that child into the output.
Its select attribute specifies the node to be processed.
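To make the role of the namespace concrete, the following minimal Python sketch parses a small style sheet with the standard library and shows how a namespace-aware parser reports element names. The style sheet text is a made-up example (its select path hello/msg is an assumption), and nothing here runs an XSLT transformation.

```python
import xml.etree.ElementTree as ET

# The official XSLT namespace listed under (a) above.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A small illustrative style sheet; the prefix xsl is bound to XSL_NS.
SHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:value-of select="hello/msg"/></body></html>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(SHEET)

# ElementTree reports a qualified name as {namespace-URI}local-name,
# so the prefix itself does not matter, only the URI it is bound to.
print(root.tag)

# Separate XSL instructions from plain result elements such as html and body.
xsl_elements = [
    el.tag.split("}")[1]
    for el in root.iter()
    if el.tag.startswith("{" + XSL_NS + "}")
]
plain_elements = [el.tag for el in root.iter() if not el.tag.startswith("{")]
print(xsl_elements)
print(plain_elements)
```

This is how a processor can tell the instructions it must execute (the xsl: elements) apart from the literal result elements that are copied to the output.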
CONVERTING XML INTO HTML WITH XSL
XSL style sheets are themselves XML documents. XSL is based on a set of rules that are triggered
when specified elements are encountered in an XML document. Every XML document should reference an
XSL file if it is to be displayed in a formatted way in the browser.
Example 1: To display a welcome message
Step1: create a hello.xml file with the following content

<!DOCTYPE hello[
<!ELEMENT hello (msg)>
<!ELEMENT msg (#PCDATA)>]>

<?xml-stylesheet href="hello.xsl" type="text/xsl"?>

<hello>
<msg>Welcome to XML world</msg>
</hello>
Step2: create the hello.xsl file
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'></xsl:stylesheet>
Step3: open the hello.xml file in the browser to see the output. Since this style sheet contains no
templates, the built-in template rules of XSLT output just the text content of the document.
The DTD part can also be omitted for a simple file like this; the XSL will take care of the document.
Example 2: To display different messages
//hello1.xml
<?xml-stylesheet href="hello1.xsl" type="text/xsl"?>
<hello><wel>welcome to XML</wel>
<xm><xm1>XML is eXtensible Markup Language</xm1>
<xm2><sg1>SGML is Standard Generalized Markup Language</sg1>
<sg2>XML is a subset of SGML</sg2></xm2></xm></hello>
//hello1.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="hello">
<html><body><table>
<tr><td align="center">
<h1><xsl:value-of select="wel"/></h1></td></tr>
<xsl:for-each select="xm">
<tr><td>
<h2><xsl:value-of select="xm1"/></h2></td></tr>
<xsl:for-each select="xm2">
<tr><td>
<h3><xsl:value-of select="sg1"/></h3></td></tr>
<tr><td>
<h3><xsl:value-of select="sg2"/></h3></td></tr>
</xsl:for-each></xsl:for-each>
</table></body></html>
</xsl:template>
</xsl:stylesheet>
//output
welcome to XML
XML is eXtensible Markup Language

SGML is Standard Generalized Markup Language
XML is a subset of SGML
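The control flow of the hello1.xsl style sheet can be mimicked in ordinary code, which may help when reading nested xsl:for-each elements. The Python sketch below is only an analogy written with the standard library, not an XSLT processor; the helper names text and transform are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Same content as hello1.xml, without the style sheet processing instruction.
HELLO1 = """<hello><wel>welcome to XML</wel>
<xm><xm1>XML is eXtensible Markup Language</xm1>
<xm2><sg1>SGML is Standard Generalized Markup Language</sg1>
<sg2>XML is a subset of SGML</sg2></xm2></xm></hello>"""


def text(node, child):
    """Rough counterpart of xsl:value-of: text of the first matching child."""
    return escape(node.findtext(child) or "")


def transform(xml_text):
    hello = ET.fromstring(xml_text)          # plays the role of match="hello"
    out = ["<html><body><table>"]
    out.append(f"<tr><td><h1>{text(hello, 'wel')}</h1></td></tr>")
    for xm in hello.findall("xm"):           # like xsl:for-each select="xm"
        out.append(f"<tr><td><h2>{text(xm, 'xm1')}</h2></td></tr>")
        for xm2 in xm.findall("xm2"):        # nested xsl:for-each select="xm2"
            for name in ("sg1", "sg2"):
                out.append(f"<tr><td><h3>{text(xm2, name)}</h3></td></tr>")
    out.append("</table></body></html>")
    return "\n".join(out)


page = transform(HELLO1)
print(page)
```

Each loop corresponds to one xsl:for-each, and the current node changes at every level, which is why the inner select values are written relative to the enclosing element.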
Example 3:
//Email.xml

<?xml-stylesheet href="email.xsl" type="text/xsl"?>
<mail>
<Recipient>ab@yahoo.com</Recipient>
<Sender>cd@yahoo.com</Sender>
<Date>Mon, 21 Apr 1997 09:27:55 +0200</Date>
<Subject>XML literature</Subject>
<Textbody>
<sal>Hello </sal>
<content>Please read Jon Bosak's introductory text
"SGML, Java and the Future of the Web"</content>
<thanks> Best wishes, </thanks>
<name>Ingo Macherius </name>
</Textbody>
</mail>
//Email.xsl

<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<HTML>
<BODY><TABLE BORDER="0">
<TR><TD>MAIL DETAILS</TD></TR>
<xsl:for-each select="mail">
<TR><TD><xsl:value-of select="Recipient"/></TD></TR>
<TR><TD><xsl:value-of select="Sender"/></TD></TR>
<TR><TD><xsl:value-of select="Date"/></TD></TR>
<TR><TD><xsl:value-of select="Subject"/></TD></TR>
<TR><TD><xsl:for-each select="Textbody">
<xsl:value-of select="sal"/>
<br/><xsl:value-of select="content"/>
<br/><br/><xsl:value-of select="thanks"/>
<br/><xsl:value-of select="name"/>
</xsl:for-each>
</TD></TR>
</xsl:for-each></TABLE></BODY></HTML>
</xsl:template></xsl:stylesheet>
//output
MAIL DETAILS
ab@yahoo.com
cd@yahoo.com
Mon, 21 Apr 1997 09:27:55 +0200
XML literature
Hello
Please read Jon Bosak's introductory text "SGML, Java and the Future of the Web"
Best wishes,
Ingo Macherius
Example 4:
Write a program which creates a valid book CATALOG document in XML. The book catalog stores
any CATEGORY of books, and each BOOK element stores
BOOKNAME, AUTHORNAME, ISBN, PUBLISHER, PAGES, PRICE etc. The BOOK element
has an enumerated attribute called BESTSELLER, and the PRICE element has an attribute called
CURRENCY. You can further expand the elements if necessary. Write a valid internal DTD and create
an XSL file to display the document in the browser.
//bcatalog.xml
<?xml-stylesheet href="bcatalog.xsl" type="text/xsl"?>
<!DOCTYPE CATALOGS[
<!ELEMENT CATALOGS (CATEGORY)*>
<!ELEMENT CATEGORY (BOOK)>
<!ATTLIST CATEGORY TYPE CDATA #REQUIRED>
<!ELEMENT BOOK (BOOKNAME,AUTHORNAME,ISBN,PUBLISHER,PAGES,PRICE)>
<!ATTLIST BOOK BESTSELLER (YES|NO) #REQUIRED>
<!ELEMENT BOOKNAME (#PCDATA)>
<!ELEMENT AUTHORNAME (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT PAGES (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ATTLIST PRICE CURRENCY CDATA #REQUIRED>
]><CATALOGS>
<CATEGORY TYPE="XML">
<BOOK BESTSELLER="NO">
<BOOKNAME>CLOUDES TO CODE</BOOKNAME>
<AUTHORNAME>JESSE</AUTHORNAME>
<ISBN>111-S223</ISBN><PUBLISHER>WROX</PUBLISHER>
<PAGES>276</PAGES>
<PRICE CURRENCY="usd">42.00</PRICE>
</BOOK></CATEGORY>
<CATEGORY TYPE="XML">
<BOOK BESTSELLER="YES">
<BOOKNAME>XML IN ACTION</BOOKNAME>
<AUTHORNAME>WILLIAM</AUTHORNAME>
<ISBN>222-S223</ISBN>
<PUBLISHER>TECHMEDIA</PUBLISHER><PAGES>476</PAGES>
<PRICE CURRENCY="usd">87.00</PRICE>
</BOOK></CATEGORY></CATALOGS>
//bcatalog.xsl
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<html><body><table border="1">
<tr>
<th>BOOKNAME</th>
<th>AUTHORNAME</th>
<th>ISBN</th>
<th>PUBLISHER</th>
<th>PAGES</th>
<th>PRICE</th>
</tr>
<xsl:for-each select="CATALOGS/CATEGORY">

<xsl:for-each select="BOOK">
<tr><td valign="top"><xsl:value-of select="BOOKNAME"/></td>

<td valign="top"><xsl:value-of select="AUTHORNAME"/></td>
<td valign="top"><xsl:value-of select="ISBN"/></td>
<td valign="top"><xsl:value-of select="PUBLISHER"/></td>
<td valign="top"><xsl:value-of select="PAGES"/></td>
<td valign="top"><xsl:value-of select="PRICE"/></td>
</tr>
</xsl:for-each>
</xsl:for-each>
</table></body></html>
</xsl:template>
</xsl:stylesheet>
//output
BOOKNAME          AUTHORNAME   ISBN       PUBLISHER   PAGES   PRICE
CLOUDES TO CODE   JESSE        111-S223   WROX        276     42.00
XML IN ACTION     WILLIAM      222-S223   TECHMEDIA   476     87.00
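The two attribute declarations in the DTD of this example (BESTSELLER restricted to YES or NO, and a required CURRENCY on PRICE) can be spelled out as explicit checks. The Python sketch below is not a DTD validator, since the standard library parser does not validate; the function name check_catalog is an assumption, and the second book in the sample has been deliberately broken to show what a violation looks like.

```python
import xml.etree.ElementTree as ET

# Shortened catalog; the second BOOK is intentionally invalid (hypothetical data).
CATALOG = """<CATALOGS>
<CATEGORY TYPE="XML">
<BOOK BESTSELLER="NO"><BOOKNAME>CLOUDES TO CODE</BOOKNAME>
<PRICE CURRENCY="usd">42.00</PRICE></BOOK></CATEGORY>
<CATEGORY TYPE="XML">
<BOOK BESTSELLER="MAYBE"><BOOKNAME>XML IN ACTION</BOOKNAME>
<PRICE>87.00</PRICE></BOOK></CATEGORY>
</CATALOGS>"""

# From <!ATTLIST BOOK BESTSELLER (YES|NO) #REQUIRED>
ALLOWED_BESTSELLER = {"YES", "NO"}


def check_catalog(xml_text):
    """Return a list of messages for books that break the two attribute rules."""
    problems = []
    root = ET.fromstring(xml_text)
    for book in root.iter("BOOK"):
        title = book.findtext("BOOKNAME", default="?")
        # Enumerated and required: a missing value (None) is rejected as well.
        if book.get("BESTSELLER") not in ALLOWED_BESTSELLER:
            problems.append(f"{title}: BESTSELLER must be YES or NO")
        # From <!ATTLIST PRICE CURRENCY CDATA #REQUIRED>
        price = book.find("PRICE")
        if price is None or price.get("CURRENCY") is None:
            problems.append(f"{title}: PRICE needs a CURRENCY attribute")
    return problems


problems = check_catalog(CATALOG)
for message in problems:
    print(message)
```

A validating parser performs checks of this kind automatically from the DTD, along with the element content rules that this sketch leaves out.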