XML


I. What is XML?
• XML and HTML
• Where does it fit in with other markup languages?
II. How does it work?
• Your own private language
• DTDs and schemas
• XSLT: Extensible Stylesheet Language Transformations
• XPath, XLink, XPointer, XForms
III. How will it change the web?
• Examples of XML applications
I. What is XML?
XML is Extensible Markup Language
It is a meta-language
It is a language used to create languages that can
describe data
It is extensible
Authors can define their own tags and attributes
that can be easily processed and displayed across
platforms
XML became a World Wide Web Consortium (W3C)
Recommendation 2/10/98, corrected 10/6/00
http://www.w3.org/TR/REC-xml
Phase 1: Began 6/96, ended in the W3C XML 1.0
Recommendation, 2/98 (revised 10/00)
Phase 2: Began 2/98; Working Groups developed the
Recommendation Namespaces in XML (1/99) and the
Recommendation for style sheet linking (6/99)
Phase 3: Began 9/99, with unfinished work from phase 2
and ended 5/02
Introduced a Working Group on XML Query
XML Protocol Activity was launched in 9/00
Phase 4: Began 5/02, focus on completing work in
progress, cleaning up existing specs, and aligning them
better with each other and with other W3C specifications
http://www.w3.org/XML/Activity
So what’s wrong with HTML?
It’s simple enough for children to use
This is because it is rigid and inflexible
It does a good job representing the structure and format
of documents
It can’t tell us anything about the meaning of
documents
It can be used across platforms
It is rife with proprietary markup
It can be searched
The inability of search engines to capture the meaning
of content leads to poor performance
So what’s right with XML?
It should be easily usable over the Internet
Web servers should require minimal configuration
changes to be able to serve XML documents
It should be easy to write programs that process XML
documents
Experimental XML software is written in Java, with
some XML parsers contained in class files of a few KB
XML documents should be human-legible and clear
Users of XML can create their own tags and attributes
with self-explanatory names
An XML file should be as readable as plain text
The XML standard should be prepared quickly
The design of XML shall be formal and concise
Syntax descriptions in XML specification use a formal
grammar that is concise, easy to understand, and easy
to translate into code
XML documents shall be easy to create
Well-formedness enables you to quickly mark up any
document or translate it from HTML to XML
Terseness in XML markup is of minimal importance
Clear and unambiguous syntax is always given
preference over saving a few keystrokes
XML is used to create specialized markup languages by
defining sets of tags and attributes
It is a subset of SGML and allows “generalized markup”
It is useful for storing structured data that will be
published in a variety of media
By itself, XML does not define any tags
You create your own tags (your own markup language)
CML: Chemical Markup Language
MathML: Mathematical Markup Language
ebXML: Electronic Business Markup Language
Properly done, XML documents can be viewed across
platforms
XML describes data in a human readable and machine
understandable format
This format is intended to capture the meaning of the
data
There is no indication of how the data are to be displayed
It is a database-neutral and device-neutral language
Data marked up in XML can be targeted to different
formats
XML can also be used to publish data on different
platforms
Some relationships among markup languages
[Figure: diagram relating SGML, XML, HTML (3.2 and 4.01), XHTML, CSS, XSLT, CML, MML, and ebXML]
How XML supports other Web markup languages and
applications

http://www.w3.org/XML/Activity
II. How does it work?
An XML document is actually composed of three
different files
1. The raw XML file (.xml)
This file has the basic data marked up with XML tags
It will contain markup that will link the file to both the
DTD (or “schema”) and the XSL stylesheet
It must follow certain rules to be considered “well
formed” and “valid”
This is necessary if the document is to be displayed
by a browser or parser
Here's a simple HTML document:

<html>
<head>
<title>Memo form</title>
</head>
<body>
<b>4.10.01</b><br />
<b>TO:</b> Nitin<br />
<b>CC:</b> Saurabh<br />
<b>FROM:</b> Manisha<br />

<p>Please take note: our phone number has changed.</p>

<p>Yours in clownitude,<br />


Bozo</p>
</body>
</html>
XML reflects the structure of the data by creating tags
identifying:
The type of document as a <memo>
Its content divisions: a <header> and a
<memotext>
When it was sent: <date>
An addressing scheme with two types of actions:
<to> and <cc>
The sender of the message as <sender>
The name of the recipient as <name>
The text of the memo: <memotext>
The signature as an entity called &sig;
Here’s the same document as an XML file:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE memo SYSTEM "http://www.site.com/dtds/memo.dtd">
<memo>
  <header type="informative">
    <date>04.10.01</date>
    <to>
      To:
      <name>Nitin</name>
    </to>
    <cc>
      CC:
      <name> Saurabh </name>
    </cc>
    <from>
      From:
      <sender>Manisha</sender>
    </from>
  </header>
  <memotext>
    Please take note our phone number has changed.
    &sig;
  </memotext>
</memo>
Rules for writing XML
There must be a “root element”
Documents must be “well formed”
Elements must be properly nested
If a DTD is used, documents must be “valid”
Markup on the document must conform to the DTD
Every tag must be closed
Empty tags are closed with a slash <picture />
XML is case sensitive
All attribute values must be in quotation marks
All entity references, other than the five predefined
ones (&lt; &gt; &amp; &apos; &quot;), must be declared in
a DTD before being used in a document
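Several of the rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides. It checks well-formedness only, since this module does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text`, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule from the list above.
samples = {
    "properly nested":      "<memo><to><name>Nitin</name></to></memo>",
    "badly nested":         "<memo><to><name>Nitin</to></name></memo>",
    "unclosed tag":         "<memo><date>04.10.01</memo>",
    "empty tag with slash": "<memo><picture /></memo>",
    "case mismatch":        "<memo><Date>04.10.01</date></memo>",
    "unquoted attribute":   "<memo type=informative></memo>",
    "two root elements":    "<memo></memo><memo></memo>",
    "undeclared entity":    "<memo>&sig;</memo>",
}

for label, text in samples.items():
    print(f"{label:22} {is_well_formed(text)}")
```

Only the "properly nested" and "empty tag with slash" samples should be accepted; each of the others breaks one of the rules listed above.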
2. A Document Type Definition (DTD)
It is a set of rules that defines the elements,
attributes, entities and other constructs that can be used
in XML files
It determines how they can be used
It also specifies how they are logically related
Elements in a DTD are hierarchical and nested
DTDs can be internal (within the document) or external
(.dtd extension)
For the XML document to be “valid,” it must conform
to the rules laid out in the DTD to which it is linked
DTDs have
Elements
These are the basic tags used in the markup
One must be a “root element” and is the most
inclusive container
All other elements are nested within it
An element can be defined by using other elements
It can also be defined as containing text (#PCDATA)
The sequence determines the nesting
Elements required by the content model must appear in
the document
There is special markup that allows choice
The generic form of an element is:
<!ELEMENT element_name rule>
The “rule” is the “content model” of the element
It specifies the nested elements used to define the
main element
It also specifies the order in which the elements must
appear
In our example the root element is <memo>
It is defined in terms of <header> and <memotext>
It is written as:
<!ELEMENT memo (header, memotext)>
DTDs have
Attributes
These contain additional information associated with
the element
The information is a form of metadata
It is “about” the element rather than part of the
element
They are useful for enumerated data (ex: product id #)
There is a small predefined set of attributes that can
be used
Attributes and their values appear in the opening tag
of a paired tag (or in the unpaired tag)
The generic form of an attribute is:
<!ATTLIST element_name
attribute_name attribute_type default_value
attribute_name attribute_type default_value
attribute_name attribute_type default_value>
The element name is required because attributes must
be attached to elements
There is a set of attribute types that can be used to
specify categories of content (for example)
CDATA: Character data (anything except markup)
ID: unique value (may appear only once in a document)
NOTATION: names a declared notation that identifies the
format of non-XML data (how to open a binary file)
In our example there is an attribute called “type” that is
placed in the opening <header> tag
The value is “informative”
Assume this is one of several types of memos that
could be sent
In a DTD, it might look like this:
<!ATTLIST header
type (informative | directive | scheduling) #IMPLIED>
#IMPLIED means the attribute is optional and has no
default value
The “|” (pipe) is a separator
It sets a condition where only one value from the
list may appear in the document markup
Entities provide a type of shorthand in XML markup
They reference text or other elements and call them
when used in the DTD or document
General entities place data into the document
Internal means that the replacement text is given in the
declaration itself
External means that the replacement text is kept in a
separate file and can be reused
Parameter entities are used in the DTD
They can refer to another element or group of
elements and can be reused in the same or different
DTDs
The entity has the generic form:
<!ENTITY entity_name "text string">
In the example, it appears in the DTD as:
<!ENTITY sig "Yours in JIMS, Manisha">
In our example, we represented a text string with an
entity
“Yours in JIMS, Manisha” was represented in the
document with:
&sig;
The entity is expanded when the document is parsed
This is a convenient way to include large blocks of
text that only have to be entered once
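Entity expansion can be observed directly with a parser. This is a small sketch using Python's standard xml.etree.ElementTree module; to keep it self-contained it declares the sig entity in an internal DTD subset rather than in the external memo.dtd, which is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared in an internal DTD subset (assumption for this
# sketch; the slides keep it in the external memo.dtd).
doc = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY sig "Yours in JIMS, Manisha">
]>
<memo>
  <memotext>Please take note our phone number has changed. &sig;</memotext>
</memo>"""

root = ET.fromstring(doc)
memotext = root.find("memotext").text

# The parser has already replaced &sig; with its declared text.
print(memotext)
```

By the time the application reads the element's text, the reference is gone and only the replacement text remains.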
Here’s what a DTD (memo.dtd) would look like for this
memo

<!ENTITY sig "Yours in JIMS, Manisha">

<!ELEMENT memo (header, memotext)>
<!ELEMENT header (date, to, cc?, from)>
<!ATTLIST header
  type (informative | directive | scheduling) #IMPLIED>
<!ELEMENT date (#PCDATA)>
<!ELEMENT to (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT cc (name*)>
<!ELEMENT from (sender+)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT memotext (#PCDATA)>

+ = must appear at least once (once or many times)
? = may be omitted or can appear once
* = may be omitted or can appear many times
| = one or the other but only one may appear
#PCDATA = text
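The occurrence markers in the memo DTD behave much like the same symbols in regular expressions. The sketch below illustrates that idea in Python: the content models are rewritten by hand as regular expressions over child tag names. It is not a DTD validator (Python's standard library does not include one), it ignores text and attributes, and the names CONTENT_MODELS and check are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the memo DTD, rewritten by hand as regular
# expressions over a comma-terminated list of child tag names.
CONTENT_MODELS = {
    "memo":   r"header,memotext,",
    "header": r"date,to,(cc,)?from,",   # cc? = optional
    "to":     r"(name,)+",              # name+ = one or more
    "cc":     r"(name,)*",              # name* = zero or more
    "from":   r"(sender,)+",
}

def check(element, problems=None):
    """Collect elements whose child sequence does not match the model."""
    if problems is None:
        problems = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + "," for child in element)
        if not re.fullmatch(model, children):
            problems.append(f"<{element.tag}> has children: {children or '(none)'}")
    for child in element:
        check(child, problems)
    return problems

good = ET.fromstring(
    "<memo><header><date>04.10.01</date><to><name>Nitin</name></to>"
    "<from><sender>Manisha</sender></from></header>"
    "<memotext>Phone number has changed.</memotext></memo>"
)
# Wrong order inside <header>, and <from> has no <sender>.
bad = ET.fromstring(
    "<memo><header><to><name>Nitin</name></to><date>04.10.01</date>"
    "<from/></header><memotext/></memo>"
)

print(check(good))
print(check(bad))
```

The first memo omits the optional cc element and passes; the second is reported twice, once for the order of the header children and once for the empty from element.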
Schemas
XML Schemas are an alternative to DTDs
DTDs are “global,” so an element can only be defined
once
This is a problem if the element is used differently in
two different contexts
Schemas allow global (the same everywhere) and local
(differ in different contexts) elements
DTDs cannot specify the data type of an element
Schemas can specify data types
DTDs are not written in XML
Schemas are
Schemas divide content into two types
Simple types
These contain only text
In DTDs these are represented by the content model
“#PCDATA” (a name, integer, date…)
Complex types
These elements define the structure of the document
Some will contain other elements
Some will contain elements and text
Some will contain only text
Some will be empty
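Here is the memo DTD as a schema:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="memo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="header" type="headerType"/>
        <xsd:element name="memotext" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="nameList">
    <xsd:sequence>
      <xsd:element ref="name" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="headerType">
    <xsd:sequence>
      <xsd:element name="date" type="xsd:date"/>
      <xsd:element name="to" type="nameList"/>
      <xsd:element name="cc" type="nameList" minOccurs="0"/>
      <xsd:element name="from">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="sender" type="xsd:string" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="type">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="informative"/>
          <xsd:enumeration value="directive"/>
          <xsd:enumeration value="scheduling"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:complexType>
</xsd:schema>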
Here is how the memo calls the schema:

<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY sig "Yours in JIMS, Manisha">
]>
<memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="/xml/ns/memo.xsd">
  <header type="informative">
    <date>04.10.01</date>
    <to>
      To:
      <name>Nitin</name>
    </to>
    <cc>
      CC:
      <name> Saurabh </name>
    </cc>
    <from>
      From:
      <sender>Manisha</sender>
    </from>
  </header>
  <memotext>
    Please take note our phone number has changed.
    &sig;
  </memotext>
</memo>
3. An XSL stylesheet
This file contains transformation rules that determine
how the components of an XML file will be rendered
and displayed in a range of formats (.xsl extension)
With XSL-FO, specific formatting or style rules can be
applied to specific components of a DTD
This language is not supported by any browsers yet
With XSLT, a transformation process can be specified
to convert XML documents into other formats (HTML,
RTF, LaTeX, text)
This can be used
An XSL stylesheet is also an XML document and must
be "well formed"
The process begins with an XML document and an XSLT
style sheet
The XSLT parser translates both into trees
The XML document is the source tree
The XSLT style sheet is the style tree
Trees consist of nodes
Root node
Element nodes
Text nodes
Attribute nodes
Processing instruction nodes
Namespace nodes
The XSLT processor uses these trees to create a result
tree
This becomes the final or result document
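To get a feel for the source tree, a parsed document can be walked node by node. The sketch below uses Python's standard xml.etree.ElementTree module on a shortened, hypothetical memo. ElementTree is not an XSLT processor, and it stores text and attributes on the element object instead of as separate nodes, so the function (named outline here for illustration) prints them as if they were child nodes.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, attribute, and text value."""
    pad = "  " * depth
    lines = [f"{pad}element: {element.tag}"]
    for name, value in element.attrib.items():
        lines.append(f"{pad}  attribute: {name}={value}")
    if element.text and element.text.strip():
        lines.append(f"{pad}  text: {element.text.strip()}")
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# Shortened memo used only for this illustration.
memo = ET.fromstring(
    '<memo><header type="informative"><date>04.10.01</date>'
    "<to><name>Nitin</name></to></header>"
    "<memotext>Phone number has changed.</memotext></memo>"
)

tree_lines = outline(memo)
print("\n".join(tree_lines))
```

The indentation of the printed lines mirrors the nesting of the elements, which is the same hierarchy an XSLT processor works from.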
XML Memo as a source tree

Memo
  Header
    Date
      #PCDATA
    To
      Name
        #PCDATA
    CC
      Name
        #PCDATA
    From
      Sender
        #PCDATA
  Memotext
    #PCDATA

And here’s what the XSL stylesheet might look like

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:template match="/">
    <html> <head> <title>Memo form</title> </head> <body>
      <xsl:apply-templates select="memo/header" />
      <p><xsl:value-of select="memo/memotext" /></p>
    </body></html>
  </xsl:template>
  <xsl:template match="header">
    <b><xsl:value-of select="date" /></b><br />
    <b><xsl:value-of select="to/name" /></b><br />
    <b><xsl:value-of select="cc/name" /></b><br />
    <b><xsl:value-of select="from/sender" /></b>
  </xsl:template>
</xsl:stylesheet>
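The effect of a stylesheet like this one can be imitated by hand to show the idea of building a result tree from a source tree. The following Python sketch does not run XSLT (the standard library has no XSLT processor); it simply constructs an HTML tree from a memo with xml.etree.ElementTree. The function name memo_to_html and the sample memo are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def memo_to_html(memo_xml):
    """Build an HTML result tree from a memo source tree and serialize it."""
    memo = ET.fromstring(memo_xml)              # source tree
    html = ET.Element("html")                   # result tree
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Memo form"
    body = ET.SubElement(html, "body")
    # One bold line per header field, in the order the stylesheet uses.
    for path in ("header/date", "header/to/name",
                 "header/cc/name", "header/from/sender"):
        node = memo.find(path)
        if node is not None:
            ET.SubElement(body, "b").text = (node.text or "").strip()
            ET.SubElement(body, "br")
    ET.SubElement(body, "p").text = (memo.findtext("memotext") or "").strip()
    return ET.tostring(html, encoding="unicode", method="html")

# Hypothetical memo without the literal labels or the entity reference.
sample = (
    '<memo><header type="informative"><date>04.10.01</date>'
    "<to><name>Nitin</name></to><cc><name> Saurabh </name></cc>"
    "<from><sender>Manisha</sender></from></header>"
    "<memotext>Phone number has changed.</memotext></memo>"
)

result = memo_to_html(sample)
print(result)
```

A real XSLT processor does the same kind of work declaratively: the templates say which source nodes to match, and the processor assembles the result tree.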
There are other components of XML that greatly extend
its power and flexibility
XPath
This is a syntax that locates nodes in the hierarchical
structure of an XML document
It is used in XSLT
<xsl:template match="node_name">
This specifies the current node
It uses patterns: these can be repeated throughout
the document
It also uses expressions: these are context specific
This syntax is a sophisticated shorthand to use when
writing processing instructions
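A few location paths can be tried without an XSLT processor. Python's standard xml.etree.ElementTree module supports a limited subset of XPath, which is enough for a sketch; the memo below is a shortened, hypothetical version of the example, and full XPath offers many more functions and axes than are shown here.

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring(
    '<memo><header type="informative"><date>04.10.01</date>'
    "<to><name>Nitin</name></to><cc><name>Saurabh</name></cc>"
    "<from><sender>Manisha</sender></from></header>"
    "<memotext>Phone number has changed.</memotext></memo>"
)

# Path relative to the current node (here the root element <memo>).
to_names = [n.text for n in memo.findall("header/to/name")]

# ".//" searches all descendants, at any depth.
all_names = [n.text for n in memo.findall(".//name")]

# A predicate in square brackets filters on an attribute value.
informative = memo.findall("header[@type='informative']")

print(to_names, all_names, len(informative))
```

Each path is evaluated relative to a current node, which is why the same expression can select different nodes in different contexts.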
XLink
This is the XML Linking Language
It allows more complex types of linking
Here’s a simple link
<logo xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:type="simple"
      xlink:href="../images/logo.gif"
      xlink:role="image"
      xlink:title="logo"
      xlink:show="embed"
      xlink:actuate="onLoad" />
Other values for xlink:show include "replace" and "new"
XLink defines “linksets” or extended links
A set of files can be connected through a chain of links
moving from the first to the last file in the linkset
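XLink attributes live in their own namespace, so an application reads them by namespace URI, not by the xlink: prefix. The sketch below shows one way to collect simple links with Python's standard xml.etree.ElementTree module; the page wrapper element and the simple_links helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"   # the XLink namespace URI

# <page> is a hypothetical wrapper element for this illustration.
doc = ET.fromstring(
    f'<page xmlns:xlink="{XLINK}">'
    '<logo xlink:type="simple" xlink:href="../images/logo.gif"'
    ' xlink:title="logo" xlink:show="embed" xlink:actuate="onLoad"/>'
    "</page>"
)

def simple_links(root):
    """Return (href, show, actuate) for every simple XLink under root."""
    links = []
    for element in root.iter():
        # ElementTree spells namespaced attributes as "{uri}localname".
        if element.get(f"{{{XLINK}}}type") == "simple":
            links.append((
                element.get(f"{{{XLINK}}}href"),
                element.get(f"{{{XLINK}}}show"),
                element.get(f"{{{XLINK}}}actuate"),
            ))
    return links

links = simple_links(doc)
print(links)
```

Because the lookup uses the namespace URI, the same code works whatever prefix a document author chooses.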
XPointer
This is a syntax for linking to specific locations within
XML documents
It uses XPath expressions to define the locations
#xpointer(element_name[position()=1])
This is appended to the end of a URL in an XLink
expression
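Since the pointer travels as the fragment part of a URL, an application can separate the two with ordinary URL handling. This is a rough Python sketch; the URL and the helper name split_xpointer are assumptions, and it only extracts the expression text without evaluating it.

```python
import re
from urllib.parse import urldefrag

def split_xpointer(uri):
    """Split a URI into (document URL, XPointer expression or None)."""
    base, fragment = urldefrag(uri)
    match = re.fullmatch(r"xpointer\((.*)\)", fragment)
    return base, (match.group(1) if match else None)

# Hypothetical link target using the pointer form shown above.
target = "http://www.site.com/memo.xml#xpointer(name[position()=1])"
base, expression = split_xpointer(target)
print(base)
print(expression)
```

The part before the # identifies the document to fetch; the expression inside xpointer(...) is then handed to an XPath-aware processor to find the location within it.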
XForms
This is an XML vocabulary that is going to be used
someday to allow more complex forms to be created in
XHTML
III. How will it change the web?
XML has interesting potential to change a portion of the
web
It is expected to move us closer to write once display
anywhere (XSLT)
It will be an important component of the “semantic
web”
Search engines that can process XML should be much
more precise and return more relevant results
It can improve business processes, particularly if
professions develop their own markup languages
Examples of XML applications
Resource Description Framework (RDF)
This is a framework that allows the description and
interchange of metadata
Because it is designed to be platform independent, it
becomes a hub for metadata activity
RDF provides a model for metadata, and a syntax so
that independent parties can exchange it and use it
RDF makes it possible to use multiple pieces of
software to process the same metadata
It also allows a single piece of software to process (at
least in part) many different metadata vocabularies
Extensible Hypertext Markup Language (XHTML)
Synchronized Multimedia Integration Language (SMIL)
Mathematical Markup Language (MathML)
Chemical Markup Language (CheML)
Commerce Markup Language (CML)
Electronic Business XML (ebXML)
National Library of Medicine XML Data formats
Electronic Component Information Exchange (ECIX)
Geography Markup Language (GML)
Research Information Exchange Markup Language
(RIXML)
MARC to XML conversions
