Module 2 - XML
INTRODUCTION
SGML (Standard Generalized Markup Language) is a meta-markup language, that is, a language for defining markup languages. It can describe a wide variety of document types.
_ Developed in the early 1980s; approved as an ISO standard in 1986.
_ HTML was developed using SGML in the early 1990s - specifically for Web
documents.
Problems with HTML:
1. HTML is defined to describe the general form and layout of information in web documents without
considering its meaning.
2. It has a fixed set of tags and attributes, so the given tags must fit every kind of document, and there is no way to mark up, and therefore find, particular kinds of information.
3. There are no restrictions on arrangement or order of tag appearance in document. For example, an
opening tag can appear in the content of an element, but its corresponding closing tag can appear after
the end of the element in which it is nested.
Eg : <strong> Now <em> is </strong> the time </em>
_ One solution to the first problem is to allow groups of users with common needs to define their own tags and attributes, and then use the SGML standard to define a new markup language to meet those needs. Each application area would have its own markup language.
Problems with using SGML:
1. It's too large and complex to use, and it is very difficult to build a parser for it. SGML includes a large number of capabilities that are only rarely used.
2. A program capable of parsing SGML documents would be very large and costly to develop.
3. SGML requires that a formal definition be provided with each new markup language.
So having area-specific markup languages is a good idea, but basing them on SGML is not.
A better solution: Define a simplified version of SGML and allow users to define their own markup
languages based on it. XML was designed to be that simplified version of SGML.
_ XML is not a replacement for HTML. In fact, the two have different goals:
WEB TECHNOLOGIES,S7 R
_ HTML is a markup language used to describe the layout of any kind of information
_ XML is a meta-markup language that provides a framework for defining specialized markup languages
_ Instead of creating a text file like
<html>
<head><title>name</title></head>……
XML Syntax
<name>
<first> nandini </first>
<last> sidnal </last>
</name>
The XML version is larger than a plain text file, but by giving structure to the data it makes it easier to write software that accesses the information.
- XML is a very simple and universal way of storing and transferring any kind of textual data
- XML does not predefine any tags
- An XML tag and its content, together with its closing tag, is called an element
- XML has no hidden specifications
- An XML-based markup language is called a tag set
- A document that uses an XML-based markup language is called an XML document
- An XML processor is a program that parses XML documents and provides the parts to an application
- Both IE7 and FX2 support basic XML.
- XML is a meta-language for describing markup languages. It provides a facility to define tags and the structural relationship between them
What is XML?
XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to carry data, not to display data
XML tags are not predefined. You must define your own tags
XML is designed to be self-descriptive
XML is a W3C Recommendation
XML is not a replacement for HTML.
XML is not a markup language. It is a meta markup language that specifies rules for creating
markup languages
</location>
</ad>
None of the tags in this document is defined in XHTML; all are designed for the specific content of the document.
When designing an XML document, the designer is often faced with the choice between adding
a new attribute to an element or defining a nested element.
- In some cases there is no choice.
- In other cases, it may not matter whether an attribute or a nested element is used.
- Nested tags should be used when the data has parts (substructure) that may need to be accessed individually.
<!-- A tag with one attribute -->
<patient name = "Maggie Dee Magpie">
...
</patient>
<!-- A tag with one nested tag -->
<patient>
<name> Maggie Dee Magpie </name>
...
</patient>
<!-- A tag with one nested tag, which contains
three nested tags -->
<patient>
<name>
<first> Maggie </first>
<middle> Dee </middle>
<last> Magpie </last>
</name>
... </patient>
Here the third form is the best choice because it provides easy access to all of the parts of the data.
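To see why the third form is easier to work with, here is a minimal sketch assuming Python 3 and its standard-library xml.etree.ElementTree module (the patient data comes from the examples above). It pulls the last name out of the attribute form and out of the nested form: with the attribute, the program has to guess how to split a string; with nested elements, it simply asks for the <last> child.

```python
import xml.etree.ElementTree as ET

# Form 1: the whole name is packed into a single attribute value.
attr_form = ET.fromstring('<patient name="Maggie Dee Magpie"></patient>')

# Form 3: each part of the name is its own nested element.
nested_form = ET.fromstring(
    "<patient><name>"
    "<first>Maggie</first><middle>Dee</middle><last>Magpie</last>"
    "</name></patient>"
)

# With the attribute we must assume the last word is the last name.
last_from_attr = attr_form.get("name").split()[-1]

# With nested elements the structure says exactly where the last name is.
last_from_nested = nested_form.findtext("name/last")

print(last_from_attr, last_from_nested)
```

The split-on-spaces guess in the first case would break for a name such as "Mary Ann van der Berg", which is exactly the kind of ambiguity nested elements avoid.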
WELL FORMED AND VALID XML DOCUMENTS
A well-formed document follows the basic XML syntax rules; XML with correct syntax is "well-formed" XML.
A valid document is well-formed and also conforms to a DTD or an XML schema.
DTDs and XML schemas specify the set of tags that can appear in a particular document or set of documents, and how those tags may be arranged.
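A quick way to test well-formedness is simply to try parsing. The sketch below assumes Python 3's standard-library xml.etree.ElementTree, which is a non-validating parser: it checks syntax only and never consults a DTD or schema, so it can tell well-formed from malformed but not valid from invalid. The helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (well-formed), False otherwise.

    Syntax only: no DTD or schema validation is performed.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested tags: well-formed.
print(is_well_formed("<strong>Now <em>is</em> the time</strong>"))
# Overlapping tags, as in the HTML example earlier in this module: not well-formed.
print(is_well_formed("<strong>Now <em>is</strong> the time</em>"))
# Two top-level elements (no single root): not well-formed.
print(is_well_formed("<a>1</a><b>2</b>"))
# Names are case sensitive, so <a> is not closed by </A>.
print(is_well_formed("<a></A>"))
```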
SGML FEATURES
XML is derived from the Standard Generalized Markup Language (SGML), which was introduced in the 1980s. XML documents are conforming SGML
documents. XML, because it is extensible, can be used to create a wide variety of document types.
With XML, new markup languages, called XML applications, can be created. Many XML applications
have been developed to work with specific types of documents.
XML FEATURES
1. XML has the ability to work with HTML for data display and presentation.
2. It is a standard language used to structure and describe data that can be understood by different applications.
3. XML tags are not predefined; you must define your own tags.
4. XML includes a specification for a style sheet language called eXtensible Stylesheet Language (XSL).
5. XML includes a specification for a hyperlinking scheme, which is described as a separate language called eXtensible Link Language (XLL).
6. Every XML document consists of data and markup. You can literally tag up your data with your own tags.
7. XML can be used as a data interchange format. Since the XML text format is standards based, data can be converted and then easily read by another system or application.
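As a minimal sketch of XML as an interchange format, assuming Python 3's standard library, one "application" below writes a record as XML text and a second one reads it back. The record and its element names (book, title, price) are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def export_record(record: dict) -> str:
    """Application A: serialize a flat dict as an XML document string."""
    root = ET.Element("book")  # hypothetical root element name
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

def import_record(xml_text: str) -> dict:
    """Application B: rebuild the dict from the XML text.

    All values come back as strings; XML itself carries no data types.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

wire = export_record({"title": "Web Technologies", "price": 450})
print(wire)
print(import_record(wire))
```

Note that the price comes back as the string "450": agreeing on types is left to the two applications (or to a schema).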
SGML is a very powerful, very general and a standard markup language. But with that power comes
the increased complexity.
XML is a subset of SGML intended to make SGML "light" enough for use on the Web.
As XML is a proper subset of SGML, all XML documents are valid SGML documents. But not all SGML documents are valid XML documents.
Figure: XML shown as a subset of SGML
XML can be considered as SGML-Lite: 20% of SGML's complexity, 80% of its capacity.
XML is a lightweight cut-down version of SGML i.e. XML uses only the most commonly-used SGML
features.
The complexity of implementing SGML's power limits its users to big companies that need all that power. Hence XML arrived: a simplified SGML that retains most of the inherent power of SGML in a simple, tidy, easy-to-use and easy-to-implement form.
Since XML is optimized for use on the World Wide Web, it is designed in such a way that it has some
benefits that are not found in SGML.
XML is a smaller language than SGML because its designers removed SGML features that were not needed for Web delivery.
XML will not replace either SGML or HTML; XML is compatible with both.
XML describes a class of data objects called XML documents. XML documents have both a logical and a physical structure, which must nest properly for the document to be well-formed. XML documents consist of storage units called entities, and an entity may include other entities in a document by referring to them. XML documents begin with a "root" (or document) entity. The overall structure of any given XML document can be looked at in two distinct ways: it has a logical structure and, side by side with it, a physical structure.
1. Logical structure
Viewed from this angle, an XML document is a hierarchy of information. The logical structure lists the elements to be included in a document and the order in which they have to be included. The elements and character data of the document hang in individual groups in a tree-like structure created by the markup.
The element at the very top of the tree is called the root element, from which all further logical structure develops. Thus the logical structure refers to the organization of the different parts of a document, i.e. it indicates how a document is built.
Eg: <PcForSale>
<item type="PC">
<Maker>Acme PC Inc</Maker>
<Brand>Acme Deluxe</Brand>
<storage>
<RAM units="MB">72</RAM>
<HardDisk units="GB">10</HardDisk>
</storage>
</item>
</PcForSale>
Figure: tree view of the document, with PCForSale at the root and item type="PC" beneath it
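To make the tree view concrete, the sketch below (Python 3, standard-library xml.etree.ElementTree; the helper name walk is our own) parses the PcForSale example and produces one indented line per element, so the indentation in the output mirrors the logical structure.

```python
import xml.etree.ElementTree as ET

doc = """<PcForSale>
  <item type="PC">
    <Maker>Acme PC Inc</Maker>
    <Brand>Acme Deluxe</Brand>
    <storage>
      <RAM units="MB">72</RAM>
      <HardDisk units="GB">10</HardDisk>
    </storage>
  </item>
</PcForSale>"""

def walk(element, depth=0, lines=None):
    """Collect one indented line per element, depth-first, mirroring the tree."""
    if lines is None:
        lines = []
    text = (element.text or "").strip()  # ignore indentation-only text
    attrs = " ".join(f'{k}="{v}"' for k, v in element.attrib.items())
    label = f"{element.tag} {attrs}".strip()
    lines.append("  " * depth + label + (f": {text}" if text else ""))
    for child in element:
        walk(child, depth + 1, lines)
    return lines

root = ET.fromstring(doc)
print("\n".join(walk(root)))
```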
The logical structure is the layer above the physical structure. At this level an XML document consists of an optional prolog, the root element, and an optional epilog.
Everything in an XML document that precedes the first start tag is collectively known as the prolog; that is, the prolog is everything that occurs before the root element starts. It can be completely empty, but it should at least contain an XML declaration.
The XML declaration identifies the version of the XML specification to which the document conforms. The sample document begins with the XML declaration <?xml version = "1.0"?>
If the XML document is going to be associated with a Document Type Definition then the prolog will
contain a Document Type Declaration.
The Document Type Declaration is the area of the prolog used to declare element types, attributes, entities and so on. It takes the general form <!DOCTYPE … >. It consists of markup code that indicates the grammar rules, and it can also point to an external file that contains all or part of the DTD.
The following code adds a Document Type Declaration to the sample document:
<!DOCTYPE catalog SYSTEM "catalog.dtd">
This statement tells the XML parser that the document is of the class 'catalog' and conforms to the rules given in the DTD file named 'catalog.dtd'.
Root Element :-
The root element of an XML document is the element that contains all other elements in the document. For example, this document consists of nothing but an empty root element:
<hello/>
Epilog
The epilog is everything that occurs after the root element ends. It is the area that can contain processing instructions, comments, or white space.
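The three parts can be seen in a parsed document. This is a minimal sketch assuming Python 3's standard-library xml.dom.minidom, which keeps comments and processing instructions that appear outside the root element as top-level nodes. The processing-instruction target app-hint is hypothetical.

```python
from xml.dom import minidom

text = """<?xml version="1.0"?>
<!-- prolog: a comment before the root element -->
<hello/>
<?app-hint reload?>"""

doc = minidom.parseString(text)

# The XML declaration is consumed by the parser; the remaining top-level
# nodes are the prolog comment, the root element and the epilog PI.
kinds = {
    minidom.Node.COMMENT_NODE: "comment (prolog)",
    minidom.Node.ELEMENT_NODE: "root element",
    minidom.Node.PROCESSING_INSTRUCTION_NODE: "processing instruction (epilog)",
}
for node in doc.childNodes:
    print(kinds.get(node.nodeType, node.nodeType), node.nodeName)

print("root element:", doc.documentElement.tagName)
```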
2. Physical structure
The physical structure of an XML document is composed of all the content used in the document. A single XML document can be made up of a number of distinct physical storage units known as entities. An entity is a unit of text, and entities are the building blocks of an XML document.
An entity can be part of the XML document or external to it. Each entity is identified by a unique name and contains its own content, which can range from a single character inside the document to a large file that exists outside the document.
Entities are declared in the prolog and referenced in the document element. An entity can contain references to other entities, which themselves can contain references to other entities.
The previous XML document is split across five separate entities, typically files or other storage units.
Figure: the PCForSale document entity refers to Entity A (part1.xml) and Entity B (part2.xml); Entity A in turn refers to Entity A1 and Entity A2 (part11.xml and part12.xml).
An XML processor sees an XML document as a series of characters, which it reads sequentially. When it sees an entity reference, it reads the name of the entity and replaces the reference with the actual text, graphic, or other type of media that the entity refers to.
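The replacement step can be watched with a real parser. In this sketch (Python 3, standard-library xml.etree.ElementTree) a hypothetical internal entity named maker is declared in the internal DTD subset; when the parser meets &maker; in the content, it substitutes the entity's text.

```python
import xml.etree.ElementTree as ET

# "maker" is a hypothetical internal entity declared in the internal DTD subset.
text = """<!DOCTYPE item [
  <!ENTITY maker "Acme PC Inc">
]>
<item>Made by &maker;.</item>"""

item = ET.fromstring(text)
print(item.text)  # the reference has been replaced by the entity's text
```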
Types of Entities
1. Predefined Entity
In XML certain characters (<, >, &) are used specifically for marking up the document. They cannot be interpreted as character data, so they cannot be used directly as content. You must use an entity reference (&lt;, &gt;, &amp;, etc.) to insert such a character into the document.
2. Parsed Entity
It contains text data that becomes part of the XML document once the data is processed. A parsed entity is intended to be read by the XML processor, which extracts its content; the extracted content then becomes part of the document at the location of the entity reference.
Whenever the entity is referenced in the document, the reference is replaced by the entity's content. To write an entity reference, type an ampersand (&), then the entity name, then a semicolon (;).
3. Unparsed Entity
Its contents may or may not be text. It is often a binary file or image that is not directly interpreted by the XML processor. An unparsed entity requires a notation, which identifies the format or type of resource for which the entity is declared. For example:
<!NOTATION GIF SYSTEM "gifview.exe">
This declaration tells the processor that whenever it encounters an entity of type GIF it should use "gifview.exe" to process it.
4. External Entity
It refers to a storage unit in its declaration by using a SYSTEM or PUBLIC identifier, which provides a pointer to the location at which the entity can be found.
NDATA GIF>
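In this example the content of this entity is stored in the file 1.gif. Because the entity is declared with NDATA (unparsed), the XML processor passes the file's location and notation on to the application instead of parsing the file itself.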
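The sketch below, assuming Python 3's standard-library xml.dom.minidom, declares the GIF notation and an unparsed entity for 1.gif, then inspects what the parser recorded. The entity name photo and the element names album and picture are hypothetical. minidom does not validate, so the ENTITY-typed attribute is not checked; the point is that the parser keeps the entity's system identifier and notation without reading the image.

```python
from xml.dom import minidom

# Hypothetical names: the entity "photo" and the elements album/picture.
text = """<!DOCTYPE album [
  <!NOTATION GIF SYSTEM "gifview.exe">
  <!ENTITY photo SYSTEM "1.gif" NDATA GIF>
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src ENTITY #REQUIRED>
]>
<album><picture src="photo"/></album>"""

doc = minidom.parseString(text)
entity = doc.doctype.entities.getNamedItem("photo")
notation = doc.doctype.notations.getNamedItem("GIF")

# Where the data lives, which notation handles it, and the helper program.
print(entity.systemId, entity.notationName, notation.systemId)

# The document only names the entity; the application decides what to do with it.
print(doc.documentElement.firstChild.getAttribute("src"))
```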
WHITESPACE
Here the "Hello World" text has been placed beneath a "greeting" element. At the same level, the document also contains some white space in the form of the end-of-line code added to the file by the text editor. The parser reports this as a line feed character, 0xA (line feed in both Unicode and ASCII).
Figure: tree view with DOCUMENT at the root and GREETING beneath it
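The line feed can be observed directly. In this sketch (Python 3, standard-library xml.etree.ElementTree; the element names follow the tree above) the newline after </greeting> sits at the same level as the greeting element, and ElementTree reports it as that element's tail text.

```python
import xml.etree.ElementTree as ET

# The line break after </greeting> is character data at the same level as <greeting>.
doc = ET.fromstring("<document><greeting>Hello World</greeting>\n</document>")
greeting = doc.find("greeting")

# ElementTree stores text that follows an element in its .tail attribute.
print(repr(greeting.tail), hex(ord(greeting.tail)))
```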
There are seven forms of markup that can occur in an XML document.
1. ELEMENTS
Names are case sensitive. XML supports two types of elements: closed elements and empty (open) elements.
Closed elements consist of both opening(start) and closing(ending) tags. The following example
presents a closed element. In the closing tag, a forward slash precedes the element name.
<Month>January</Month>
Elements can be nested, and all elements must be nested within a single root element. Nested elements
are termed child elements. Elements must be nested correctly, with child elements enclosed within their
parent opening and closing element tags, as follows:
<Year>2000
<Month>January</Month>
<Month>February</Month>
</Year>
Empty (open) elements contain no content. An empty element can be used to mark sections of the document for the processor, and it can contain attributes. In an empty element the element name is followed by a slash, e.g. <Year/>
Elements can be nested to an arbitrary depth to describe very rich information structures. An element that does not have content is an empty element, e.g. <hello/>. Another example is the <br> (line break) element in HTML, which cannot sensibly have any content; in XML it would be an empty element. Empty elements can have attributes: <hello happy="TRUE"/> is valid.
Also, in <hello/>, any new line inside the tag is ignored because it occurs within markup. An empty element can also be written with matching start and end tags: <hello></hello>.
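A parser treats the two spellings of an empty element identically. This sketch assumes Python 3's standard-library xml.etree.ElementTree and reuses the <hello happy="TRUE"/> example.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<hello happy="TRUE"/>')
long_form = ET.fromstring('<hello happy="TRUE"></hello>')

# Both spellings give an element with the same attributes, no children, no text.
for elem in (short_form, long_form):
    print(elem.tag, elem.attrib, len(elem), elem.text)

# When re-serialized, ElementTree writes the self-closing form for both.
print(ET.tostring(short_form, encoding="unicode"))
print(ET.tostring(long_form, encoding="unicode"))
```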
2. ATTRIBUTES
Attributes provide extra information about elements. Attributes are placed inside the start tag of an element, and they come in name/value pairs. The following "img" element has additional information about a source file: <img src="computer.gif" />. The name of the element is "img", the name of the attribute is "src", and the value of the attribute is "computer.gif". Since the element itself is empty, it is closed by "/>".
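Reading an attribute back out of a parsed element is a dictionary-style lookup. A minimal sketch, assuming Python 3's standard-library xml.etree.ElementTree; the alt attribute and its fallback text are hypothetical, used only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<img src="computer.gif" />')

print(img.tag)                            # the element name
print(img.get("src"))                     # the value of the src attribute
print(img.get("alt"))                     # None: this attribute is not present
print(img.get("alt", "no description"))   # an application-chosen fallback
```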
3. PCDATA
PCDATA means parsed character data. Think of character data as the text found between the start tag
and the end tag of an XML element. PCDATA is text that will be parsed by a parser. Tags inside the
text will be treated as markup and entities will be expanded.
4. CDATA
CDATA also means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup, and entities will not be expanded.
Most of you know the HTML entity reference "&nbsp;", which is used to insert an extra space in an HTML document. Entities are expanded when a document is parsed by an XML parser.
_ An XML document often uses two auxiliary files:
• One to specify the structural syntactic rules ( DTD / XML schema)
• One to provide a style specification ( CSS /XSLT Style Sheets)
5. ENTITIES
An XML document has a single root element, but often consists of one or more entities
An XML document consists of one or more entities, which are logically related collections of information.
Entities range from a single special character to a book chapter.
An XML document has one document entity.
All other entities are referenced in the document entity.
Reasons to break a document into multiple entities:
1. A large document is easier to manage when it is defined as a number of smaller parts.
2. If the same data appears in more than one place in the document, defining it as an entity allows any number of references to a single copy of the data.
3. Many documents include information that cannot be represented as text, such as images. Such information units are usually stored as binary data. Binary entities can only be referenced in the document entity.
Rules of Entity names:
• No length limitation
• Must begin with a letter, an underscore, or a colon
• Can include letters, digits, periods, dashes, underscores, or colons.
• A reference to an entity has the form name with prepended ampersand and appended semicolon:
&entity_name; Eg. &apple_image;
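XML defines five predefined entities for the characters that have special meaning in markup. When the processor parses the document, it replaces each such entity reference with the actual character and does not interpret that character as markup.
Entity reference   Character
&lt;               <
&gt;               >
&amp;              &
&apos;             '
&quot;             "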
Regardless of the entity type, all entities are referenced in the same way: &name;. For example, the following line includes a simple entity:
The &iso; sets the standard for character encoding.
When interpreted by an XML parser, &iso; is replaced by its text, International Organization for Standardization. External entities are referenced in the same way as internal text entities: the content of the external file named in the entity declaration replaces the entity reference. Binary entities can be referenced only as the value of an element attribute whose type takes an entity value.
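The predefined entities can be produced and consumed programmatically. This sketch assumes Python 3's standard-library xml.sax.saxutils (escape/unescape) and xml.etree.ElementTree; the sample string is made up. escape() handles &, < and > by default, and the optional mapping adds &quot;.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = 'if a < b && b > c say "yes"'

# escape() replaces &, < and >; quotes are added through the optional mapping.
escaped = escape(raw, {'"': "&quot;"})
print(escaped)

# A parser turns the entity references back into the original characters.
element = ET.fromstring(f"<test>{escaped}</test>")
print(element.text == raw)

# unescape() performs the same reverse mapping without a full parse.
print(unescape(escaped, {"&quot;": '"'}) == raw)
```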
If several predefined entities must appear near each other in a document, it is better to avoid entity references and use a character data section instead. The content of a character data section is not parsed by the XML parser, so it can include any tags.
The form of a character data section is <![CDATA[content]]>; because the content is not parsed, any tags inside it are not treated as markup. For example, instead of
Start &gt;&gt;&gt;&gt; HERE &lt;&lt;&lt;&lt;
use
<![CDATA[Start >>>> HERE <<<<]]>
The opening keyword of a character data section is not just CDATA; it is, in effect, [CDATA[. There can be no spaces between [ and C or between A and [.
The content of a character data section is not parsed by the parser. For example, the content of the line
<![CDATA[The form of a tag is <tag name>]]>
is passed on as: The form of a tag is <tag name>
XML allows a block of text to be insulated from the attention of the parser by using a CDATA section (CDATA stands for character data). You can mark a section as character data using this syntax:
<![CDATA[content]]>
Between the start of the section, "<![CDATA[", and the end of the section, "]]>", all character data is passed directly to the application. Comments are not recognized in a CDATA section. For example, in <![CDATA[a<b and b<c then a<c]]> the parser detects the CDATA section and passes the entire string "a<b and b<c then a<c" through to the application directly.
CDATA sections can occur anywhere character data can occur. Because the first occurrence of "]]>" terminates the CDATA section, CDATA sections cannot be nested.
Ex3:<?xml version="1.0"?>
</apples>
Note the shielding effect of the CDATA section ,which protects what looks like an apple end-tag and
what looks like an entity reference.
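The shielding effect can be checked with a parser. In this sketch (Python 3, standard-library xml.etree.ElementTree; the note element is hypothetical) the first CDATA section holds the "a<b" string from above, and the second holds what looks like an apples end-tag and an entity reference. Both come through as plain text.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<apples>"
    "<note><![CDATA[a<b and b<c then a<c]]></note>"
    "<note><![CDATA[</apples> &amp; are not markup here]]></note>"
    "</apples>"
)

for note in doc.findall("note"):
    # Inside CDATA, "<" does not start a tag and "&amp;" is not expanded.
    print(note.text)
```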
Processing instructions (PIs) are markup that provides information to be used by a software application. A PI begins with "<?" and ends with "?>". XML itself uses this same <? ... ?> form for what is known as the XML declaration. The simplest form of it, which should head up the entire XML document, is <?xml version="1.0"?>
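A parser hands each PI to the application as a target plus data. This sketch assumes Python 3's standard-library xml.dom.minidom; the xml-stylesheet PI and the file name style.css are illustrative. Note that the XML declaration itself does not show up as a node.

```python
from xml.dom import minidom

text = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<doc/>"""

doc = minidom.parseString(text)
pi = doc.firstChild  # the XML declaration is not kept as a node

print(pi.nodeType == minidom.Node.PROCESSING_INSTRUCTION_NODE)
print(pi.target)  # the name right after "<?"
print(pi.data)    # everything after the target, up to "?>"
```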
WHAT IS DTD?
DTD stands for Document Type Definition. DTDs define an XML document's structure (e.g., what elements, attributes, etc. are permitted in the document). An XML document is not required to have
a corresponding DTD. However, DTDs are often recommended to ensure document conformity,
especially in business-to-business (B2B) transactions, where XML documents are exchanged.
A DTD can be specified in three ways:
Internal: when the rules are inserted directly into the same XML document
External: when the rules are contained in an external file
Internal and External: when some rules are inserted directly into the same XML document and other rules are contained in an external file
Generally, external files containing the DTD rules for an XML document have the extension ".dtd".
INTERNAL DTD
The DTD section contains the rules an XML document must comply with to be a valid document. This section can be inserted directly into the XML document. If the DTD is internal, the syntax is <!DOCTYPE root-element [element declarations]>
An internal DTD is also known as the internal subset. This is a sample XML document with an internal DTD:
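<?xml version="1.0"?>
<!DOCTYPE mail [
<!ELEMENT mail (to,from,heading)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
]>
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
</mail>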
EXTERNAL DTD
The DTD section can be placed in an external file. If the DTD is external, it must be specified as either SYSTEM or PUBLIC in the Document Type Declaration:
<!DOCTYPE root-element SYSTEM "file-name">
or
<!DOCTYPE root-element PUBLIC "public_id" "URL">
If SYSTEM, the DTD is usually a private file (for example, on the local hard disk) and may not be available for use by other applications. If PUBLIC, the DTD can be used by anyone by referring to the URL. The public_id is unique for standard DTD files on the web.
The external subset (external DTD), if present, consists of a reference to an external entity following the DOCTYPE keyword, as illustrated here:
//mail.xml
<?xml version="1.0"?>
<!DOCTYPE mail SYSTEM "mail.dtd">
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
</mail>
//mail.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT mail (to,from,heading)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
The DTD can be housed exclusively in either the external or the internal subset, or in both.
The DTD sections required to validate an XML document can be inserted partially into the document and partially into an external DTD file:
//apples.dtd
<!ELEMENT apples (#PCDATA)>
//apples.xml
<?xml version="1.0"?>
<!DOCTYPE apples SYSTEM "apples.dtd" [
<!ATTLIST apples color CDATA #REQUIRED>
]>
<apples color="green">12</apples>
Every element in a valid XML document must have an element type declared in the DTD.
To validate an XML document, a validating parser needs to know three things about each element:
1) What the element type is named
2) What elements of that type can contain (the content model)
3) What attributes are associated with elements of that type
Both the element type name and its content model are declared together in what is known as an element type declaration.
An element type declaration must start with the string "<!ELEMENT", followed by the name and the content specification.
Every element has certain allowed content. There are four general types of content specification:
1) EMPTY: the element may not have content.
<!ELEMENT stock EMPTY> declares that an element of type stock does not contain anything, e.g.
<stock/>
<name>aaa</name>
<address>SJCET,pala</address>
<phone>239301</phone>
</contact>
<name>aaa</name><phone>239301</phone></contact>
<fruit><apple>---</apple></fruit>
<apple>---</apple>
<orange>--</orange>
</fruit>
<orange>---</orange><fruit>
Ex2:<fruit></fruit>
<list>---</list></para>
Ex2:<para>aaa,bbb,ccc</para>
Ex3:<para><list>---</list></para>
Attributes are information that further describes an element. If an element has attributes, they need to be declared. An element can contain attributes, which are name-value pairs within a tagged element that modify certain features of the element. An attribute specifies the type (and value) of additional parameters that can be contained in the element. In XML, all attribute values must be enclosed in quotation marks. Either single or double quotation marks are acceptable; just be consistent. Attributes are declared in the DTD separately from, but tied to, the element definition.
Attributes need to be declared in the DTD so that a validating XML parser can check that they have been used properly in an XML document.
SYNTAX:
<!ATTLIST element-name attribute-name attribute-type default-value>
4. What happens if no value is supplied for the attribute (that is, what default value each attribute has).
You can also have multiple attribute list declarations for a single element:
<!ATTLIST person email CDATA #REQUIRED>
<!ATTLIST person phone CDATA #REQUIRED>
Each attribute in a declaration has three parts: a name, a type and a default value. The table below shows the partial attribute list declaration.
The study of attribute list declarations is incomplete without the attribute types and attribute defaults, which the next two sections discuss.
ATTRIBUTE TYPES
Attribute list declarations serve to specify the name, type and, optionally, the default value of the attributes associated with an element. There are mainly three classifications:
STRING ATTRIBUTES
The simplest type of attribute is the CDATA or string attribute. CDATA attribute values can be any string of characters.
An element of type 'elementname' has an attribute called 'attributename' whose values can be any string of characters (letters, numbers or punctuation) except <, >, &.
For example
<!ATTLIST product name CDATA …… >
An element of type product has an attribute called name whose values can be any string of characters except <, >, &. A string attribute can contain an arbitrary collection of characters of any length, as long as any occurrence of "<" or "&" is escaped with an entity reference.
Sample: <product name="Acme">
ENUMERATED ATTRIBUTES
A list of permissible values can be supplied using an enumerated attribute type. An enumerated attribute is one that can take on one of a fixed set of values supplied as part of its declaration.
Ex: <!ATTLIST product name CDATA ……. color (red|green) ………>
An element of type product has two attributes, name and color. The name attribute can have any string of characters except <, >, &. The color attribute must be either the string "red" or the string "green".
<product name="acme" color="red">
The sample element might look like this: <hello UniqueName="P1234">
The value assigned to an IDREF attribute in a valid XML document must match the value assigned to an ID attribute somewhere in the document.
<!ATTLIST bar Reference IDREF ….> Here is an example of a bar element using its IDREF attribute to point to the hello element of the last example:
<bar Reference="P1234">
IDREFS is a variation on IDREF in which an attribute is allowed to contain multiple referenced IDs, separated by white space. So given this declaration:
<!ATTLIST bar References IDREFS ……>
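Python's standard-library XML parsers do not validate against a DTD, so they will not reject a dangling IDREF on their own. The hypothetical helper below sketches the check a validating parser performs, assuming (as in the examples above) that UniqueName is the ID attribute and Reference/References are the IDREF and IDREFS attributes; P5678 and P9999 are made-up IDs.

```python
import xml.etree.ElementTree as ET

def dangling_references(root, id_attr="UniqueName", ref_attrs=("Reference", "References")):
    """Return referenced IDs that no element declares (a validity error)."""
    # Collect every value of the ID attribute in the document.
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    missing = []
    for el in root.iter():
        for attr in ref_attrs:
            value = el.get(attr)
            if value is None:
                continue
            # IDREF holds one ID; IDREFS holds several separated by white space.
            missing.extend(ref for ref in value.split() if ref not in ids)
    return missing

doc = ET.fromstring(
    "<doc>"
    '<hello UniqueName="P1234"/><hello UniqueName="P5678"/>'
    '<bar Reference="P1234"/>'
    '<bar References="P1234 P5678 P9999"/>'
    "</doc>"
)
print(dangling_references(doc))
```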
o ENTITY/ENTITIES
Attributes of type ENTITY or ENTITIES are treated together, as they are strongly related. An attribute of type ENTITY must have a value corresponding to the name of an unparsed entity declared somewhere in the Document Type Declaration. In this example an external data entity, bob, is declared, and the salutation attribute of the letter element is declared to be of type ENTITY.
o NMTOKEN/NMTOKENS
Attributes of type NMTOKEN or NMTOKENS are treated together, as they are strongly related. An NMTOKEN attribute is restricted to characters allowed in a name: any combination of letters, digits and the punctuation characters ".", "-", "_" and ":". Note that this list does not contain any white space characters.
An NMTOKENS attribute is one that can contain one or more NMTOKENs separated by white space.
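As a rough sketch of the NMTOKEN rule just described, the Python below checks values with a regular expression. It is deliberately ASCII-only (an assumption for brevity): the XML specification also admits many non-ASCII letters and a few other characters in names, which this sketch ignores.

```python
import re

# ASCII-only approximation of the characters allowed in an NMTOKEN.
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def is_nmtoken(value: str) -> bool:
    """One token: letters, digits, '.', '-', '_' or ':' and no white space."""
    return NMTOKEN.fullmatch(value) is not None

def is_nmtokens(value: str) -> bool:
    """One or more NMTOKENs separated by white space."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(t) for t in tokens)

print(is_nmtoken("x-1.2_a:b"), is_nmtoken("two words"))
print(is_nmtokens("red green blue"), is_nmtokens("ok n&t"))
```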
ATTRIBUTE DEFAULTS
An attribute list declaration includes information about whether or not a value must be supplied for the attribute and, if not, what the XML processor should do.
1) A value ---> The quoted value, which is used if none is specified in an element.
2) Required ---> A value must be supplied for the attribute in the document.
3) Implied ---> The XML processor tells the application that no value was supplied. The application can decide what best to do.
4) Fixed ---> A value is supplied in the declaration. No value need be supplied in the document, and the XML processor will pass the specified fixed value through to the application. If a value is supplied in the document, it must exactly match the fixed value.
#REQUIRED - The attribute must have an explicitly specified value on every occurrence of the element in the document.
An element of type product has an attribute called name whose value can be any string of characters except <, >, &. The value must be supplied whenever the element is used in the document: <product name="Acmepc">. In the next example, the type attribute of the fruit element is declared to be required, so the first element is valid and the second is not:
<fruit type="apple"/>
<fruit />
#IMPLIED - These are attributes that can be left unspecified if desired. The XML processor passes the fact that the attribute was unspecified through to the XML application, which can then choose what best to do.
<fruit />
An element of type product has an attribute called color. The color attribute must be either the string "red" or the string "green". If the value is not supplied, it is left up to the XML application to decide what to do.
#FIXED attribute value - An attribute declaration may specify that an attribute has a fixed value.
An element of type product has an attribute called name having the fixed value Acmepc. Any other value is an error. <product name="Acmepc">
Default value - If the attribute has a default value, it is specified within quotes.
<!ATTLIST product name CDATA "Acmepc"> Here the default value is used whenever the name attribute is omitted.
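Python's standard-library parsers neither enforce #REQUIRED/#FIXED nor reliably apply DTD defaults for you, so the following is only a hypothetical helper, resolve_attribute, that models the four default rules for one attribute on a plain dictionary of attributes. It is a sketch of the semantics described above, not a DTD processor; the attribute values reuse the product/fruit examples.

```python
def resolve_attribute(attrs: dict, name: str, default) -> dict:
    """Apply one attribute default rule (hypothetical helper).

    default is "#REQUIRED", "#IMPLIED", ("#FIXED", value) or a plain default value.
    Returns a new dict; raises ValueError when the document breaks the rule.
    """
    result = dict(attrs)
    if default == "#REQUIRED":
        if name not in result:
            raise ValueError(f"attribute {name!r} is required")
    elif default == "#IMPLIED":
        pass  # nothing is filled in; the application decides what to do
    elif isinstance(default, tuple) and default[0] == "#FIXED":
        fixed = default[1]
        if result.get(name, fixed) != fixed:
            raise ValueError(f"attribute {name!r} must be {fixed!r}")
        result[name] = fixed
    else:
        result.setdefault(name, default)  # ordinary default value
    return result

print(resolve_attribute({}, "name", "Acmepc"))
print(resolve_attribute({}, "name", ("#FIXED", "Acmepc")))
print(resolve_attribute({}, "color", "#IMPLIED"))
```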
This element structure can describe the document tree structure shown below.
• Child elements can have modifiers,
+ -> One or more occurrences
* -> Zero or more occurrences
? -> Zero or one occurrence
• Leaf nodes specify data types of content of their parent nodes which are elements
1. PCDATA (parsable character data)
2. EMPTY (no content)
3. ANY (can have any content)
Example of a leaf declaration:
<!ELEMENT name (#PCDATA)>
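A content model built from sequences (,), choices (|) and the modifiers +, * and ? is essentially a regular expression over the sequence of child element names. The sketch below is a hypothetical helper, not a real DTD validator: it handles element-only content models (no #PCDATA, EMPTY or ANY), translates them into a Python regular expression, and checks a parsed element's children against it using the mail example's element names.

```python
import re
import xml.etree.ElementTree as ET

def content_model_regex(model: str):
    """Translate an element-only content model such as "(to,from,heading,body?)"
    into a regex over child names, each name followed by one space."""
    parts = []
    for token in re.findall(r"[A-Za-z_:][\w.:-]*|[(),|?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation
        elif token in (")", "|", "?", "*", "+"):
            parts.append(token)
        else:
            parts.append(f"(?:{re.escape(token)} )")
    return re.compile("".join(parts))

def children_match(model: str, child_names: list) -> bool:
    subject = "".join(name + " " for name in child_names)
    return content_model_regex(model).fullmatch(subject) is not None

mail = ET.fromstring(
    "<mail><to>Rani</to><from>Ravi</from><heading>Reminder</heading></mail>"
)
names = [child.tag for child in mail]
print(children_match("(to,from,heading)", names))       # every child present, in order
print(children_match("(to,from,heading,body)", names))  # a required body is missing
```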
2. Declaring Attributes:
• Attributes are declared separately from the element declarations
• General form:
<!ATTLIST element_name attribute_name attribute_type [default_value]>
More than one attribute:
<!ATTLIST element_name attribute_name_1 attribute_type default_value_1
attribute_name_2 attribute_type default_value_2
…>
_ Attribute type: There are ten different types, but we will consider only CDATA
_ Possible default values for attributes:
Value - the value, which is used if none is specified
#FIXED value - the value, which every element has and which cannot be changed
#REQUIRED - no default value is given; every instance must specify a value
#IMPLIED - no default value is given; the value may or may not be specified
Example :
• External DTDs
<!DOCTYPE XML_doc_root_name SYSTEM "DTD_file_name">
For example,
<!DOCTYPE planes_for_sale SYSTEM "planes.dtd">
<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xml - A document that lists ads for used airplanes -->
<!DOCTYPE planes_for_sale SYSTEM "planes.dtd">
<planes_for_sale>
<ad>
<year> 1977 </year>
<make>&c; </make>
<model> Skyhawk </model>
<color> Light blue and white </color>
<description> New paint, nearly new interior,
685 hours SMOH, full IFR King avionics </description>
<price> 23,495 </price>
<seller phone = "555-222-3333"> Skyway Aircraft </seller>
<location>
<city> Rapid City, </city>
<state> South Dakota </state>
</location>
</ad>
</planes_for_sale>
XML NAMESPACES
XML namespaces provide a method to avoid element name conflicts.
In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>apples</td>
<td>bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>african coffee table</name> <width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a <table> element, but the elements have different content and meaning.
An XML parser will not know how to handle these differences.
DTDs do not support namespaces very well.
Namespaces can be declared in the elements where they are used or in the XML root element.
The namespace URI is not used by the parser to look up information.
38
WEB TECHNOLOGIES,S7 R
<h:table>
<h:tr>
<h:td>apples</h:td>
<h:td>bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>africancoffeetable</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
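One way to see that the prefixes really separate the two elements is to parse the example with Python's ElementTree, which replaces each prefix with its namespace URI. A minimal sketch; the prefix dictionary NS is an assumption of the example, since only the URIs have to match the document.

```python
# Sketch: ElementTree replaces each prefix with its namespace URI, so the two
# <table> elements end up with different expanded names of the form {URI}table.
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table><h:tr><h:td>apples</h:td><h:td>bananas</h:td></h:tr></h:table>
  <f:table><f:name>africancoffeetable</f:name><f:width>80</f:width>
    <f:length>120</f:length></f:table>
</root>"""

root = ET.fromstring(DOC)
print([child.tag for child in root])
# ['{http://www.w3.org/TR/html4/}table', '{http://www.w3schools.com/furniture}table']

# The prefixes in this dictionary are local to the search; only the URIs must match.
NS = {"h": "http://www.w3.org/TR/html4/", "f": "http://www.w3schools.com/furniture"}
print([td.text for td in root.findall("h:table/h:tr/h:td", NS)])  # ['apples', 'bananas']
print(root.find("f:table/f:width", NS).text)  # 80
```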
DEFAULT NAMESPACES
Defining a default namespace for an element saves us from using prefixes in all the child
elements. It has the following syntax:
xmlns="namespaceURI"
This XML carries HTML table information:
<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This xml carries information about a piece of furniture:
<table xmlns="http://www.w3schools.com/furniture">
<name>AfricanCoffeeTable</name>
<width>80</width>
<length>120</length>
</table>
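With a default namespace, the children carry no prefix but still belong to the namespace. The following Python sketch (ElementTree, standard library) shows the consequence: a search for the bare local name finds nothing, while the namespace-qualified name matches.

```python
# Sketch: an unprefixed child inherits the default namespace declared on its parent.
import xml.etree.ElementTree as ET

FURNITURE = "http://www.w3schools.com/furniture"
DOC = """<table xmlns="http://www.w3schools.com/furniture">
  <name>AfricanCoffeeTable</name>
  <width>80</width>
  <length>120</length>
</table>"""

table = ET.fromstring(DOC)
print(table.tag)  # {http://www.w3schools.com/furniture}table
print([child.tag.split("}")[1] for child in table])  # ['name', 'width', 'length']
print(table.find("width"))  # None: the bare local name does not match
print(table.find(f"{{{FURNITURE}}}width").text)  # 80
```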
XML schemas
A schema is any type of model document that defines the structure of something, such as a database
structure or a document. Here, that something is an XML document. DTDs are actually a type of schema.
An XML schema is itself an XML document, so it can be parsed with an XML parser.
The term XML Schema is used specifically to refer to the W3C XML Schema technology.
W3C XML Schemas, like DTDs, allow you to describe the structure of an XML document.
The default namespace, which is the source of the unprefixed names in the schema, is given
with another xmlns specification: xmlns = "http://cs.uccs.edu/planeSchema"
A complete example of a schema element:
<!-- xmlns:xsd          : namespace for the schema itself
     targetNamespace    : namespace where elements defined here will be placed
     xmlns              : default namespace for this document
     elementFormDefault : "qualified" specifies non-top-level elements to be in the target namespace -->
<xsd:schema
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://cs.uccs.edu/planeSchema"
xmlns = "http://cs.uccs.edu/planeSchema"
elementFormDefault = "qualified">
• There are several categories of complex types, but we discuss just one: element-only elements.
• Element-only elements are defined with the complexType element.
• Use the sequence tag for nested elements that must be in a particular order.
• Use the all tag if the order is not important.
• Nested elements can include attributes that give the allowed number of occurrences
(minOccurs and maxOccurs; maxOccurs can be set to "unbounded").
• For example:
<xsd:complexType name = "sports_car" >
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element name = "year" type = "xsd:string" />
</xsd:sequence>
</xsd:complexType>
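The difference between sequence and all can be shown with a toy check in Python. This is not a schema validator: it only compares the child element names with the list from the sports_car type, assumes each child occurs exactly once (minOccurs and maxOccurs are ignored), and the instance element and its values are hypothetical.

```python
# Toy model, not a schema validator: compares an element's children with the
# names listed in a complexType. Each child is assumed to occur exactly once
# (minOccurs and maxOccurs are ignored).
import xml.etree.ElementTree as ET

SPORTS_CAR = ["make", "model", "engine", "year"]  # order of the xsd:sequence above


def matches_sequence(elem, names):
    """xsd:sequence: the children must appear in exactly this order."""
    return [child.tag for child in elem] == names


def matches_all(elem, names):
    """xsd:all: the same children, but in any order."""
    return sorted(child.tag for child in elem) == sorted(names)


in_order = ET.fromstring(
    "<sports_car><make>Porsche</make><model>Boxster</model>"
    "<engine>3.2L</engine><year>2005</year></sports_car>")
shuffled = ET.fromstring(
    "<sports_car><year>2005</year><make>Porsche</make>"
    "<model>Boxster</model><engine>3.2L</engine></sports_car>")

print(matches_sequence(in_order, SPORTS_CAR), matches_all(in_order, SPORTS_CAR))  # True True
print(matches_sequence(shuffled, SPORTS_CAR), matches_all(shuffled, SPORTS_CAR))  # False True
```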
7. Validating Instances of Schemas:
• An XML schema provides a definition of a category of XML documents.
• However, developing a schema is of limited value unless there is some mechanical way to determine
whether a given XML instance document conforms to the schema.
• Several XML schema validation tools are available, e.g. XSV (XML Schema Validator), which can also be used
to validate documents online.
• The output of XSV is an XML document. When XSV is run from the command line, the output appears
without formatting.
Output of XSV when run on planes.xml:
<?xml version = '1.0' encoding = 'utf-8'?>
<xsv docElt = '{http://cs.uccs.edu/planesSchema}planes'
instanceAccessed = 'true'
instanceErrors = '0'
schemaErrors = '0'
schemaLocs = 'http://cs.uccs.edu/planesSchema->planes.xsd'
target = 'file:/c:/wbook2/xml/planes.xml'
validation = 'strict'
version = 'XSV 1.197/1.101 of 2001/07/07 12:01:19'
xmlns = 'http://www.w3.org/2000/05.xsv'>
<importAttempt URI = 'file:/c:/wbook2/xml/planes.xsd'
namespace = 'http://cs.uccs.edu/planesSchema'
outcome = 'success' />
</xsv>
If the schema is not in the correct format, the validator will report that it could not find the specified
schema.
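Because the XSV report is itself an XML document, a script can read it like any other XML file. A minimal Python sketch, assuming a shortened copy of the report above and a hypothetical pass/fail rule (both error counts equal to "0"). The {*} wildcard in findall, available in Python 3.8 and later, matches importAttempt whatever namespace the report uses.

```python
# Sketch: read an XSV-style report with ElementTree and decide pass/fail.
# REPORT is a shortened copy of the output shown above; report_ok is a hypothetical rule.
import xml.etree.ElementTree as ET

REPORT = """<xsv docElt='{http://cs.uccs.edu/planesSchema}planes'
     instanceAccessed='true' instanceErrors='0' schemaErrors='0'
     validation='strict' xmlns='http://www.w3.org/2000/05.xsv'>
  <importAttempt URI='file:/c:/wbook2/xml/planes.xsd'
                 namespace='http://cs.uccs.edu/planesSchema' outcome='success'/>
</xsv>"""


def report_ok(root):
    """True if the report counts no instance errors and no schema errors."""
    return root.get("instanceErrors") == "0" and root.get("schemaErrors") == "0"


root = ET.fromstring(REPORT)
# {*} matches importAttempt in any namespace (Python 3.8+).
attempts = [(a.get("URI"), a.get("outcome")) for a in root.findall("{*}importAttempt")]
print(report_ok(root))  # True
print(attempts)         # [('file:/c:/wbook2/xml/planes.xsd', 'success')]
```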
Displaying RAW XML Documents
An XML-enabled browser, or any other system that can deal with XML documents,
cannot possibly know how to format the tags defined in the document.
Without a style sheet that defines presentation styles for the document's tags, the XML document cannot
be displayed in a formatted manner.
Some browsers, like Firefox 2 (FX2), have default style sheets that are used when style sheets are
not defined.
Example: the planes.xml document.
Refer Page No. 307
color {display: block; margin-left: 20px; font-size: 12pt;}
description {display: block; margin-left: 20px; font-size: 12pt;}
seller { display: block; margin-left: 15px; font-size: 14pt;}
location {display: block; margin-left: 40px; }
city {font-size: 12pt;}
state {font-size: 12pt;}
<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xml - A document that lists ads for used airplanes -->
<planes_for_sale>
<ad>
<year> 1977 </year>
<make> Cessna </make>
<model> Skyhawk </model>
<color> Light blue and white </color>
<description> New interior
</description>
<seller phone = "555-222-3333">
Skyway Aircraft </seller>
<location>
<city> Rapid City, </city>
<state> South Dakota </state>
</location>
</ad>
</planes_for_sale>
Data island:
The following example creates an internal data island in a simplified HTML document. The requisite
XML declaration and root element are included in the example.
<html><body>
<xml id="customers">
<customer>
<name>Font Factory</name>
<name>BeBop Grafix</name>
</customer>
</xml></body></html>
IE's XML parser creates a data island from an XML document by storing the data as a Data Source
Object (DSO). Interactions between data islands and the Web page are controlled by the DSO.
The element attributes datasrc, datafld, dataformatas, and datapagesize allow IE to display bound data.
The datasrc attribute specifies the data island source with a URI.
The datasrc attribute links an HTML element to a DSO. Its value, the unique identifier of a DSO,
must be prefixed by a number sign (#), for example datasrc="#x". The datafld attribute references a specific
element in the XML data source.
The datasrc attribute can be used with the following elements: a, applet, button, div, frame, iframe,
img, label, marquee, select, span, table, textarea, and the input tags (types button, checkbox, hidden,
image, password, radio, and text).
//catalogs.xml
<?xml version="1.0"?>
<PCS>
<PC>
<NAME>Zenith</NAME>
<CAPACITY>100</CAPACITY>
<PRICE>20000</PRICE>
</PC>
<PC>
<NAME>DELL</NAME>
<CAPACITY>200</CAPACITY>
<PRICE>40000</PRICE>
</PC>
<PC>
<NAME>Acer</NAME>
<CAPACITY>300</CAPACITY>
<PRICE>35000</PRICE>
</PC>
</PCS>
//catalog1.html
<html>
<body>
<xml id="x" src="catalogs.xml"></xml>
<table datasrc="#x" border="1" align="center">
<thead><th>Name</th><th>Capacity</th><th>Price</th></thead>
<tr><td><div datafld="NAME"></div></td>
<td><div datafld="CAPACITY"></div></td>
<td><div datafld="PRICE"></div></td>
</tr>
</table>
</body></html>
Explanation:
We need to create an HTML table to lay out the information. We can write the start of the table
element to fit the data which is to be displayed.
<table datasrc="#x" border="1" align="center">
<thead>
<th>Name</th>
<th>Capacity</th>
<th>Price</th>
</thead>
<tr><td><div datafld="NAME"></div></td>
<td><div datafld="CAPACITY"></div></td>
<td><div datafld="PRICE"></div></td>
</tr>
</table>
This table row is declared to contain three table cells. The datafld attribute is used to specify the
name of the data field each cell will contain. Because the table is bound to the data island through
datasrc, IE repeats the row once for every PC record.
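Data islands and the DSO are IE-specific and are not supported by current browsers, so the following Python sketch only imitates the result of the binding described above: each PC record becomes one table row and each datafld name selects one cell. ElementTree stands in for the DSO, and bind_table is a hypothetical helper.

```python
# Sketch of what tabular binding produces: one row per repeated record element,
# one cell per datafld name. ElementTree stands in for IE's DSO.
import xml.etree.ElementTree as ET
from html import escape

CATALOG = """<PCS>
  <PC><NAME>Zenith</NAME><CAPACITY>100</CAPACITY><PRICE>20000</PRICE></PC>
  <PC><NAME>DELL</NAME><CAPACITY>200</CAPACITY><PRICE>40000</PRICE></PC>
  <PC><NAME>Acer</NAME><CAPACITY>300</CAPACITY><PRICE>35000</PRICE></PC>
</PCS>"""


def bind_table(xml_text, fields):
    """Return HTML rows: each child of the root is a record, each field a cell."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root:  # the <tr> template is repeated for every record
        cells = "".join(
            f"<td>{escape(record.findtext(f, default=''))}</td>" for f in fields)
        rows.append(f"<tr>{cells}</tr>")
    return rows


for row in bind_table(CATALOG, ["NAME", "CAPACITY", "PRICE"]):
    print(row)
```

A field that is missing from a record simply produces an empty cell, and escape() keeps characters such as & valid in the generated HTML.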
//output
Eg2: Product.xml
<Products>
<Item>
<Number>0001</Number>
<Price>15.00</Price>
</Item>
<Item>
<Number>0015</Number>
<Price>473.00</Price>
</Item></Products>
<HTML>
<HEAD>
</HEAD>
<BODY>
<THEAD>
<TH>Item Name</TH>
<TH>Item Number</TH>
<TH>Price</TH>
</THEAD>
<TR>
<TD><SPAN DATAFLD="Name"></SPAN></TD>
<TD><SPAN DATAFLD="Number"></SPAN></TD>
<TD><SPAN DATAFLD="Price"></SPAN></TD>
</TR>
</TABLE>
</BODY>
</HTML>
Output:
<?xml version="1.0"?>
<PCS><PC>
<NAME>Zenith</NAME>
<CAPACITY><RAM>10</RAM>
<DISK>200</DISK></CAPACITY>
<PRICE>20000</PRICE>
</PC>
<PC>
<NAME>DELL</NAME>
<CAPACITY><RAM>20</RAM>
<DISK>300</DISK></CAPACITY>
<PRICE>40000</PRICE>
</PC>
</PCS>
<html><body>
<thead>
<th>Name</th>
<th>Capacity</th>
<th>Price</th>
</thead>
<tr><td><div datafld="NAME"></div></td>
<td>
<thead><th>RAM</th>
<th>DISK</th>
</thead>
<tr>
<td><div datafld="RAM"></div></td>
<td><div datafld="DISK"></div></td>
</tr>
</table>
</td><td><div datafld="PRICE"></div></td>
</tr>
</table>
</body></html>
//output
Eg:
Write an application to create an Address Book in XML and display the Address Book in the browser
using HTML.
1) Create a valid XML document for an Address Book with the following elements. The address book stores
'N' persons: Name (fname, lname), Address (hname, city, state, country) and all the phone numbers (mob, off, res).
2) The sub-elements are given in brackets. Store the XML data in an HTML page and display the data in the
HTML browser as HTML tables. As we are storing the XML data in the HTML page, there is only one
HTML file, called addressbook.html.
3) Use the XML tag for storing the content in an HTML page.
//addressbook.html
<html>
<body>
<xml id="addbook">
<addbook>
<person>
<name>
<fname>Ammu </fname>
<lname>Kurian</lname>
</name>
<address>
<hname>Vadakkan</hname>
<city>Pala</city>
<state>Kerala</state>
<country>India</country>
</address>
<email>ammu@yahoo.com</email>
<phone>
<res>647346</res>
<mob>97878473</mob>
<off>38463486</off>
</phone>
</person>
<person>
<name>
<fname>Anju</fname>
<lname>john</lname>
</name>
<address>
<hname>Anju Nivas</hname>
<city>Kavadiyar,Tvm</city>
<state>Kerala</state>
<country>India</country>
</address>
<email>ammu@hotmail.com</email>
<phone>
<res>4447346</res>
<mob>4548473</mob>
<off>3763486</off>
</phone>
</person>
</addbook>
</xml>
<TH>NAME</TH>
<TH>ADDRESS</TH>
<TH>EMAIL</TH>
<TH>PHONE</TH>
</THEAD>
<TR>
<tr><td><div datafld="fname"></div></td>
<td><div datafld="lname"></div></td>
</tr></table></TD>
<tr><td><div datafld="hname"></div></td></tr>
<tr><td><div datafld="city"></div></td></tr>
<tr><td><div datafld="state"></div></td></tr>
<tr><td><div datafld="country"></div></td></tr>
</table></TD>
<tr><td>Res:<span datafld="res"></span></td></tr>
<tr><td>Mob:<span datafld="mob"></span></td></tr>
<tr><td>Off:<span datafld="off"></span></td></tr>
</table></TD>
</TR></TABLE>
</body></html>
//output
Pala Mob:97878473
Kerala Off:38463486
India
Kavadiyar,Tvm Mob:4548473
Kerala Off:3763486
India
XSL is a transformation language that enables the conversion of XML into a grammar and structure
suitable for display in the browser. XSL allows direct browsing of an XML file using Internet Explorer or any other
browser.
The basic tags that are used in XSL are given below.
xsl:stylesheet: This is the first element in the style sheet. In this element we can specify the
namespace for the style sheet.
An XML namespace is a collection of names identified by a URI (Uniform Resource Identifier) or a
URL (Uniform Resource Locator) reference, which are used in the XML document as element type
and attribute names. By using a namespace, developers can qualify element names uniquely on the
web and thus avoid conflicts between elements that have the same name. Important namespaces are given
below.
a) xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
This is the official standard transformation namespace used as a part of the W3C
recommendation. [W3C is the World Wide Web Consortium, which is a forum for information,
commerce, communication and collective understanding.] It is not supported by IE5.0.
b) xmlns:xsl='http://www.w3.org/TR/WD-xsl'
This is the namespace used for style sheets processed by IE5.
2) xmlns:xsl: declares the namespace for the current specification. This is a fixed attribute with the
value 'http://www.w3.org/1999/XSL/Transform'.
xsl:template: A template is a structured container that manages the way a source tree or a portion of
the source tree is transformed. When you build a style sheet, you build a series of templates that
match the elements you would like to attach some styles to. xsl:template defines a set of rules for
transforming nodes in a source document into a result tree. This is handled with the match attribute. The
important xsl:template attributes are as follows:
1) mode: identifies the processing mode and matches it against an xsl:apply-templates element that has a
matching mode value.
xsl:apply-templates: This element tells the processor to apply the templates defined with xsl:template
elements to the selected nodes. Possible attributes are given below:
1) select: identifies the nodes to be processed. If this attribute is not specified, the processor
processes the children of the current node in the order of their appearance in the source document.
2) mode: identifies the processing mode and selects only those template elements that have a matching
mode value.
xsl:for-each: This element locates a set of elements in the XML data and repeats a portion of the
template for each one. Possible attributes are:
1) select: identifies the nodes to be processed.
xsl:value-of: Within an <xsl:for-each> element we can further drill down to select children. The
<xsl:value-of> element selects a specific child, using its select attribute, and inserts the text content of
that child into the template. The select attribute specifies the node to be processed.
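The following Python sketch is a rough analogue of xsl:for-each and xsl:value-of, not an XSLT processor: findall plays the role of a for-each select, and findtext with an empty default plays the role of value-of. It uses the Email.xml data from Example 3 below; the wrapper element doc is a hypothetical stand-in for the XSLT root node (/). Note that findtext returns only an element's own text, which coincides with value-of here because these elements contain no child elements.

```python
# Rough analogue of Email.xsl, not an XSLT processor:
#   findall("mail")     ~ <xsl:for-each select="mail">
#   findtext(name, "")  ~ <xsl:value-of select="name"/>
import xml.etree.ElementTree as ET

MAIL = """<mail>
  <Recipient>ab@yahoo.com</Recipient>
  <Sender>cd@yahoo.com</Sender>
  <Subject>XML literature</Subject>
  <Textbody><sal>Hello </sal></Textbody>
</mail>"""


def render_mail(xml_text):
    """Return the cell texts in the order Email.xsl would output them."""
    doc = ET.Element("doc")  # hypothetical stand-in for the root node "/"
    doc.append(ET.fromstring(xml_text))
    cells = []
    for mail in doc.findall("mail"):  # xsl:for-each select="mail"
        for name in ["Recipient", "Sender", "Date", "Subject"]:
            # A missing element (Date) gives "", as value-of gives an empty string.
            cells.append(mail.findtext(name, default=""))
        for body in mail.findall("Textbody"):  # xsl:for-each select="Textbody"
            cells.append(body.findtext("sal", default=""))
    return cells


print(render_mail(MAIL))
# ['ab@yahoo.com', 'cd@yahoo.com', '', 'XML literature', 'Hello ']
```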
CONVERTING XML INTO HTML WITH XSL
XSL style sheets are themselves XML documents. XSL is based on a set of rules that trigger
when specified elements are encountered in an XML document. To be displayed this way in a browser
such as Internet Explorer, an XML page must be linked to an XSL file.
<?xml version="1.0"?>
<!DOCTYPE hello[
<hello>
</hello>
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'></xsl:stylesheet>
Step3: Browse the hello.xml file in the browser and you will get the output.
The DTD part can be omitted for a simple file like this; the XSL style sheet takes care of displaying the document.
//hello1.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="hello1.xsl"?>
<hello><wel>welcome to XML</wel>
</hello>
//hello1.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="hello">
<html><body><table>
<tr><td align="center">
<h1><xsl:value-of select="wel"/></h1></td></tr>
<xsl:for-each select="xm">
<tr><td>
<h2><xsl:value-of select="xm1"/></h2></td></tr>
<xsl:for-each select="xm2">
<tr><td>
<h3><xsl:value-of select="sg1"/></h3></td></tr>
<tr><td>
<h3><xsl:value-of select="sg2"/></h3></td></tr>
</xsl:for-each></xsl:for-each>
</table></body></html>
</xsl:template>
</xsl:stylesheet>
//output
welcome to XML
Example 3:
//Email.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Email.xsl"?>
<mail>
<Recipient>ab@yahoo.com</Recipient>
<Sender>cd@yahoo.com</Sender>
<Subject>XML literature</Subject>
<Textbody>
<sal>Hello </sal>
</Textbody>
</mail>
//Email.xsl
<?xml version="1.0"?>
<!--Since a style sheet is itself an XML document, the file begins with an XML declaration.
The <xsl:stylesheet> element indicates that this document is a style sheet file and provides the location
for declaring the XSL namespace.
Also wrap the entire template with <xsl:template match="/"> to indicate that this template corresponds
to the root (/) of the XML document -->
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<HTML>
<BODY><TABLE BORDER="0">
<TR><TD>MAIL DETAILS</TD></TR>
<xsl:for-each select="mail">
<TR><TD><xsl:value-of select="Recipient"/></TD></TR>
<TR><TD><xsl:value-of select="Sender"/></TD></TR>
<TR><TD><xsl:value-of select="Date"/></TD></TR>
<TR><TD><xsl:value-of select="Subject"/></TD></TR>
<TR><TD><xsl:for-each select="Textbody">
<xsl:value-of select="sal"/>
<br/><xsl:value-of select="content"/>
<br/><br/><xsl:value-of select="thanks"/>
<br/><xsl:value-of select="name"/>
</xsl:for-each>
</TD></TR>
</xsl:for-each></TABLE></BODY></HTML>
</xsl:template></xsl:stylesheet>
//output
MAIL DETAILS
ab@yahoo.com
cd@yahoo.com
XML literature
Hello
Please read Jon Bosak's introductory text "SGML, Java and the Future of the Web"
Best wishes,
Ingo Macherius
Example 4:
Write a program which creates a valid book CATALOG document in XML. The book catalog stores
any CATEGORY of books, and each BOOK element stores
BOOKNAME, AUTHORNAME, ISBN, PUBLISHER, PAGES, PRICE etc. The element BOOK
has an enumerated attribute called BESTSELLER, and the PRICE element has an attribute called
CURRENCY. You can further expand the elements if necessary. Write the valid internal DTD and create an
XSL file to display the document in the browser.
//bcatalog.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="bcatalog.xsl"?>
<!DOCTYPE CATALOGS[
]><CATALOGS>
<CATEGORY TYPE="XML">
<BOOK BESTSELLER="NO">
<BOOKNAME>CLOUDES TO CODE</BOOKNAME>
<AUTHORNAME>JESSE</AUTHORNAME>
<ISBN>111-S223</ISBN><PUBLISHER>WROX</PUBLISHER>
<PAGES>276</PAGES>
<PRICE CURRENCY="usd">42.00</PRICE>
</BOOK></CATEGORY>
<CATEGORY TYPE="XML">
<BOOK BESTSELLER="YES">
<BOOKNAME>XML IN ACTION</BOOKNAME>
<AUTHORNAME>WILLIAM</AUTHORNAME>
<ISBN>222-S223</ISBN>
<PUBLISHER>TECHMEDIA</PUBLISHER><PAGES>476</PAGES>
<PRICE CURRENCY="usd">87.00</PRICE>
</BOOK></CATEGORY></CATALOGS>
//bcatalog.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<html><body><table border="1">
<tr>
<th>BOOKNAME</th>
<th>AUTHORNAME</th>
<th>ISBN</th>
<th>PUBLISHER</th>
<th>PAGES</th>
<th>PRICE</th>
</tr>
<xsl:for-each select="CATALOGS/CATEGORY">
<!--<tr><td colspan="6">
<xsl:apply-templates/></td>
</tr>-->
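<xsl:for-each select="BOOK">
<tr>
<td><xsl:value-of select="BOOKNAME"/></td>
<td><xsl:value-of select="AUTHORNAME"/></td>
<td><xsl:value-of select="ISBN"/></td>
<td><xsl:value-of select="PUBLISHER"/></td>
<td><xsl:value-of select="PAGES"/></td>
<td><xsl:value-of select="PRICE"/></td>
</tr>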
</xsl:for-each>
</xsl:for-each>
</table></body></html>
</xsl:template>
</xsl:stylesheet>
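The nested loops of bcatalog.xsl (first CATEGORY, then BOOK) can be mirrored in Python to check which rows the table should contain. This is a sketch using ElementTree rather than an XSLT processor; catalog_rows and bestsellers are hypothetical helpers, and the [@BESTSELLER='YES'] predicate shows how the enumerated attribute could be used to filter books.

```python
# Sketch mirroring the nested xsl:for-each loops of bcatalog.xsl with ElementTree.
import xml.etree.ElementTree as ET

CATALOG = """<CATALOGS>
  <CATEGORY TYPE="XML"><BOOK BESTSELLER="NO">
    <BOOKNAME>CLOUDES TO CODE</BOOKNAME><AUTHORNAME>JESSE</AUTHORNAME>
    <ISBN>111-S223</ISBN><PUBLISHER>WROX</PUBLISHER>
    <PAGES>276</PAGES><PRICE CURRENCY="usd">42.00</PRICE></BOOK></CATEGORY>
  <CATEGORY TYPE="XML"><BOOK BESTSELLER="YES">
    <BOOKNAME>XML IN ACTION</BOOKNAME><AUTHORNAME>WILLIAM</AUTHORNAME>
    <ISBN>222-S223</ISBN><PUBLISHER>TECHMEDIA</PUBLISHER>
    <PAGES>476</PAGES><PRICE CURRENCY="usd">87.00</PRICE></BOOK></CATEGORY>
</CATALOGS>"""

COLUMNS = ["BOOKNAME", "AUTHORNAME", "ISBN", "PUBLISHER", "PAGES", "PRICE"]


def catalog_rows(xml_text):
    """One list of cell texts per BOOK, in document order."""
    root = ET.fromstring(xml_text)  # the CATALOGS element
    rows = []
    for category in root.findall("CATEGORY"):  # for-each select="CATALOGS/CATEGORY"
        for book in category.findall("BOOK"):  # for-each select="BOOK"
            rows.append([book.findtext(col, default="") for col in COLUMNS])
    return rows


def bestsellers(xml_text):
    """Titles of books whose enumerated BESTSELLER attribute is YES."""
    root = ET.fromstring(xml_text)
    return [b.findtext("BOOKNAME") for b in root.findall("CATEGORY/BOOK[@BESTSELLER='YES']")]


for row in catalog_rows(CATALOG):
    print(row)
print(bestsellers(CATALOG))  # ['XML IN ACTION']
```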
//output