Web Services: XML Documents Contain The Information Being Exchanged Between Two Parties. It Is
XML documents contain the information being exchanged between two parties. XML is
used to organize documents and business data. XML files can be stored or
transmitted between two applications on a network.
SOAP provides a packaging and routing standard for exchanging XML documents
over a network. A SOAP message is just an XML document. SOAP is specially
designed, however, to contain and transmit other XML documents as well as
information related to routing, processing, security, transactions, and other qualities
of service.
WSDL allows an organization to describe the types of XML documents and SOAP
messages that must be used to interact with their Web services.
When you create a new Web service, you can also create a WSDL
document that describes the type of data you're exchanging.
There are two versions of WSDL today, versions 1.1 and 1.2.
UDDI (Universal Description, Discovery, and Integration) defines a standard set of Web
service operations (methods) that are used to store and look up information about other Web
service applications.
In other words, UDDI defines a standard SOAP-based interface for a Web services registry.
You can use a UDDI registry to find a particular type of Web service, or to find out about the
Web services hosted by a specific organization.
When you look up information about a Web service in a UDDI registry, you can narrow your
search using various categories (technologies used, business types, industry, and so on).
Each entry in a UDDI registry provides information on where the Web service is located and how
to communicate with it. The UDDI registry also provides information about the organization
that hosts a particular Web service.
UDDI can also store data about other types of services, such as a Web site or a phone service.
There are three versions of UDDI at this time, versions 1.0, 2.0, and 3.0.
J2EE Web Service APIs:
JAX-RPC:
SAAJ:
JAXR:
JAXP:
These are the APIs you will need to understand if you want to implement Web service
applications using the J2EE platform.
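Perhaps the most important Web service API is JAX-RPC, which is used to implement J2EE
Web service clients and endpoints (services). JAX-RPC is divided into client-side and
server-side programming models; the client side offers three APIs: generated stubs, dynamic
proxies, and the DII (Dynamic Invocation Interface).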
The client-side APIs allow you to communicate with Web service endpoints hosted on some
other platform. For example, you can use one of the client-side APIs to send SOAP messages to a
VB.NET or an Apache Axis Web service. The client-side APIs can be used from standalone Java
applications or from J2EE components like servlets, JSPs, or EJBs.
The generated stub is the one you will use the most, and its semantics closely resemble those of
Java RMI.
The dynamic proxy API also follows many of the Java RMI semantics, but is used less often.
The DII is a very low-level API used primarily by vendor tools, but can also be employed by
Web services developers if necessary.
The server-side components include the JAX-RPC service endpoint (JSE) and the EJB endpoint.
The JSE component is actually a type of servlet that has been adapted for use as a Web services
component. It's very easy to implement, yet it has access to the full array of services and
interfaces common to servlets.
The EJB endpoint is simply a type of stateless session EJB that has been adapted for use as a
Web service endpoint. The EJB endpoint provides all the transactional and security features of a
normal stateless session bean, but it's specifically designed to process SOAP requests.
SAAJ:
(SOAP with Attachments API for Java) is a low-level SOAP API that complies with SOAP 1.1
and the SOAP Messages with Attachments specification. SAAJ allows you to build SOAP
messages from scratch as well as read and manipulate SOAP messages.
You can use it alone to create, transmit, and process SOAP messages, but you're more likely to
use it in conjunction with JAX-RPC.
In JAX-RPC, SAAJ is used primarily to process SOAP header blocks (the SOAP message
metadata).
JAXR:
(Java API for XML Registries) provides an API for accessing UDDI registries. It simplifies the
process of publishing and searching for Web service endpoints. JAXR was originally intended
for ebXML registries, a standard that competes with UDDI, but was adapted for UDDI and
works pretty well in most cases.
JAXR has a set of business-domain types like Organization, PostalAddress, and Contact as
well as technical-domain types like ServiceBinding, ExternalLink, and Classification.
These domain models map nicely to UDDI data types.
JAXR also defines APIs for publishing and searching for information in a UDDI registry.
JAXP:
JAXP (Java API for XML Processing) provides a framework for using DOM 2 and SAX2,
standard Java APIs that read, write, and modify XML documents.
DOM 2 (Document Object Model, Level 2) is a Java API that models XML documents as trees
of objects. It contains objects that represent elements, attributes, values, and so on. DOM 2 is
used a lot in situations where speed and memory are not factors, but complex manipulation of
XML documents is required. DOM 2 is also the basis of SAAJ 1.2.
SAX2 (Simple API for XML, version 2) is very different in functionality from DOM 2.
When a SAX parser reads an XML document, it fires events as it encounters start and end tags,
attributes, values, etc. You can register listeners for these events, and they will be notified as the
SAX2 parser detects changes in the XML document it is reading.
JAXP comes in several versions including 1.1, 1.2, and 1.3. Version 1.3 is very new and is not
supported by J2EE 1.4 Web Services.
XML BASICS
XML Primer
An XML markup language defines a set of tags that are used to organize and describe text.
Tags are usually paired; together, a start tag, an end tag, and everything between them are called
an element.
For example, you could save the addresses of your friends, family members, and business
associates in a text file using XML.
XML documents are composed of Unicode text (usually UTF-8), so people as well as software
can understand them.
The ability to create an infinite number of new markup languages is why XML is called
eXtensible.
XML only defines the syntax of elements used in text; it is not software and isn't compiled,
interpreted, or executed. It's just plain text.
Data-oriented markup languages focus on how data is organized and typed; they define a
schema for storing and exchanging data between software applications.
Some XML markup languages are industry standards, like SOAP and XHTML, while most are
designed to serve a single application, organization, or individual.
The XML markup languages used in this book, both custom and standard, are decidedly data-
oriented.
Regardless of the source of a markup language, if it's based on XML it must follow the same
syntax and rules defined by the XML specification, which makes XML documents portable.
Portability means you can use any standard XML parsers, editors, and other utilities to process
most, if not all, of the XML documents you will encounter.
An XML parser is a utility that can read and analyze an XML document. In most cases an
XML parser is combined with a parser API (such as SAX2 or DOM 2) that allows a
developer to interact with the XML document while it's being parsed, or after.
A Web page written in XHTML (a variant of HTML), which is a text file, is an XML document.
Similarly, a SOAP message, which is generated and exchanged over a network, is an XML
document.
A business might choose to store address information as an XML document. In this case the text
file might look like
The above example is called an XML document instance, which means it represents one possible
set of data for a particular markup language. It might be saved as a file or sent over the Internet
as the payload of a SOAP message. If you were to create another XML document with the same
tags but different contents (like a different street or Zip code) it would be considered a different
XML document instance.
An XML document is made up of declarations, elements, attributes, text data, comments, and
other components. This section examines an XML document instance in detail and explains its
most important components.
XML Declaration
An XML document may start with an XML declaration, but it's not required. An XML
declaration declares the version of XML used to define the document (there is only one version
at this time, version 1.0). It may also indicate the character encoding used to store or transfer the
document, and whether the document is standalone or not (the standalone attribute is not used
in this book).
Elements
XML markup languages organize data hierarchically, in a tree structure, where each branch of
the tree is called an element and is delimited by a pair of tags. All elements are named and have a
start tag and an end tag. A start tag looks like <tagname> and an end tag looks like </tagname>.
The tagname is a label that usually describes the information contained by the element. Between
the start and end tags, an element may contain text or other elements, which themselves may
contain text or more elements.
There are six elements in this example (address, name, street, city, state, and zip). The
address element uses the start tag <address> and the end tag </address>, and contains the
other five elements. The address element, because it contains all the other elements, is referred
to as the root element. Each XML document must have one root element, and that element must
contain all the other elements and text, except the XML declaration, comments, and certain
processing instructions.
The other elements (name, street, city, state, zip) all contain text. According to the WS-I
Basic Profile 1.0, XML documents used in Web services must use either UTF-8 or UTF-16
encoding. This limitation simplifies things for Web service vendors and makes interoperability
easier, because there is only one character encoding standard to worry about, Unicode. The UTF-8
and UTF-16 encodings allow you to use characters from English, Chinese, French, German,
Japanese, and many other languages.
An element name must always begin with a letter or underscore, but can contain pretty much any
Unicode character you like, including underscores, letters, digits, hyphens, and periods. Some
characters may not be used: /, <, >, ?, ", @, &, and others. Also, an element name must never start
with the string xml, as this is reserved by the XML 1.0 specification. As long as you follow
XML's rules you may name elements anything and your elements may contain any combination
of valid text and other elements.
Elements do not have to contain any data at all. It's perfectly acceptable to use an empty-element
tag, a single tag of the form <tagname/>, which is interpreted as a pair of start and end tags with
no content (<tagname></tagname>). Empty-element tags are typically used when an element
has no data, when it acts like a flag, or when its pertinent data is contained in its attributes
(attributes are described in the next section).
Attributes
An element may have one or more attributes. You use an attribute to supplement the data
contained by an element, to provide information about it not captured by its contents. For
example, we could describe the kind of address in an XML address document by declaring a
category attribute, as in <address category="business">.
Each attribute is a name-value pair. The value must be in single or double quotes.
You can define any number of attributes for an element, but a particular attribute may occur only
once in a single element. Attributes cannot be nested like elements. Attribute names have the
same restrictions as element names. Attributes must be declared in the start tag and never the end
tag of an element.
In many cases, empty-element tags are used when the attributes contain all the data.
Listing 2-4 Using the Empty-Element Tag in XML
<?xml version="1.0" encoding="UTF-8" ?>
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
  <phone countrycode="01" areacode="715" number="55529482" ext="341" />
</address>
Using attributes instead of nested elements is considered a matter of style, rather than
convention. There are no "standard" design conventions for using attributes or elements.
Comments
You can add comments to an XML document just as you can add comments to a Java program.
A comment is considered documentation about the XML document and is not part of the data it
describes. Comments are placed between a <!-- designator and a --> designator, as in HTML:
<!-- comment goes here -->.
CDATA Section
An element may contain other elements, text, or a mixture of both. When an element contains
text, you have to be careful about which characters you use because certain characters have
special meaning in XML. Using quotes (single or double), less-than and greater-than signs (< and
>), the ampersand (&), and other special characters in the contents of an element will confuse
parsers, which consider these characters to be special parsing symbols. To avoid parsing
problems you can use escape sequences like &gt; for greater-than or &amp; for ampersand, but this
technique can become cumbersome.
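The escaping described above can be sketched in Java. This is only a minimal illustration under stated assumptions: the class name XmlEscapeSketch and the method escape are made up for this example and are not part of any API discussed in this chapter, and the method handles only the five predefined XML entity references.

```java
final class XmlEscapeSketch {

    // Replaces the five characters XML treats specially with their
    // predefined entity references. Hypothetical helper for illustration.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped text can be placed safely between a start and end tag.
        System.out.println("<note>" + escape("Profit & Loss: 5 > 3") + "</note>");
    }
}
```

Escaping every special character by hand like this is what makes the technique cumbersome for larger blocks of text.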
A CDATA section allows you to mark a section of text as literal so that it will not be parsed for
tags and symbols, but will instead be considered just a string of characters. For example, if you
want to put HTML in an XML document, but you don't want it parsed, you can embed it in a
CDATA section. The address document below contains a note in HTML format.
Using a CDATA Section in XML
<?xml version="1.0" encoding="UTF-8" ?>
<!-- This document contains address information -->
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
  <note>
    <![CDATA[
    <html>
      <body>
        <p>
        Last time I contacted <b>Amazon.com</b> I spoke to ...
      </body>
    </html>
    ]]>
  </note>
</address>
CDATA sections take the form <![CDATA[ text goes here ]]>. If we include the HTML in
the note element without embedding it in a CDATA section, XML processors will parse it as
Address Markup, instead of treating it as ordinary text, causing two kinds of problems: First,
HTML's syntax isn't as strict as XML's so parsing problems are likely. Second, the HTML is not
actually part of Address Markup; it's simply a part of the text contained by the note element, and
we want it treated as literal text.
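To see the effect of a CDATA section from the application's side, here is a small sketch that parses a note element with the standard JAXP DOM API and reads back its text. The class name CdataSketch and the method noteText are assumptions made for this example; the shortened document in main is also illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataSketch {

    // Parses the XML and returns the text content of the first <note> element.
    static String noteText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("note").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The unclosed <p> would be a syntax error outside the CDATA section.
        String xml = "<address><note><![CDATA[<p>Spoke to <b>Amazon.com</b>]]></note></address>";
        System.out.println(noteText(xml));
    }
}
```

The parser hands the HTML back as a plain string; it does not create element nodes for the tags inside the CDATA section.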
Although XML is just plain text, and can be accessed using a common text editor, it's usually
read and manipulated by software applications and not by people using text editors. A software
application that reads and manipulates XML documents will use an XML parser.
In general, parsers read a stream of data (usually a file or network stream) and break it down into
functional units that can then be processed by a software application. An XML parser can read an
XML document and parse its contents according to the XML syntax.
Parsers usually provide a programming API that allows developers to access elements,
attributes, text, and other constructs in XML documents.
There are basically two standard kinds of XML parser APIs: SAX and DOM.
SAX (Simple API for XML) was the first standard XML parser API and is very popular.
Although several individuals created it, David Brownell currently maintains SAX2, the latest
version, as an open development project at SourceForge.org. SAX2 parsers are available in many
programming languages including Java.
SAX2 is based on an event model. As the SAX2 parser reads an XML document, starting at the
beginning, it fires off events every time it encounters a new element, attribute, piece of text, or
other component. SAX2 parsers are generally very fast because they read an XML document
sequentially and report on the markup as it's encountered.
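The event model can be illustrated with a short sketch that uses the JAXP SAXParserFactory and a DefaultHandler to record events as they arrive. The class name SaxEventsSketch and the event labels ("start", "text", "end") are assumptions made for this example. Note that a parser is allowed to deliver one run of text in several characters() calls, so real handlers usually buffer text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventsSketch {

    // Parses the XML and records one line per SAX event, in document order.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text so indentation does not clutter the log.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<address><city>Seattle</city><state>WA</state></address>")
                .forEach(System.out::println);
    }
}
```

Because the handler only sees one event at a time, nothing is kept in memory unless the application chooses to keep it.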
DOM (Document Object Model) was developed after SAX2 and is maintained by the W3C. DOM
level 2 (DOM 2) is the current version, but there is a DOM level 3 in the works. DOM 2 parsers
are also available for many programming languages, including Java. DOM 2 presents the
programmer with a generic, object-oriented model of an XML document. Elements, attributes,
and text values are represented as objects organized into a hierarchical tree structure that reflects
the hierarchy of the XML document being processed. DOM 2 allows an application to navigate
the tree structure, modify elements and attributes, and generate new XML documents in memory.
It's a very powerful and flexible programming model, but it's also slow compared to SAX2, and
consumes a lot more memory.
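A brief sketch of the DOM programming model follows: it builds the tree with JAXP, navigates to an element, modifies it, and adds a new element in memory. The class name DomTreeSketch, the helper methods, and the replacement values ("98101", "206") are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomTreeSketch {

    // Parses the XML text into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Returns the text of the first element with the given tag name.
    static String textOf(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse("<address><city>Seattle</city><zip>90952</zip></address>");
        System.out.println(textOf(doc, "city"));

        // Modify an existing element in place.
        doc.getElementsByTagName("zip").item(0).setTextContent("98101");

        // Add a new empty element that carries its data in an attribute.
        Element phone = doc.createElement("phone");
        phone.setAttribute("areacode", "206");
        doc.getDocumentElement().appendChild(phone);

        System.out.println(textOf(doc, "zip"));
    }
}
```

The whole document is held in memory as objects, which is what makes random access and modification easy and also what makes DOM heavier than SAX2.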
In addition to providing a programming model for reading and manipulating XML documents,
the parser's primary responsibility is checking that documents are well formed; that is, that their
elements, attributes, and other constructs conform to the syntax prescribed by the XML 1.0
specification.
For example, an element without an end tag, or with an attribute name that contains invalid
characters, will result in a syntax error. A parser may also, optionally, enforce validity of an XML
document. An XML document may be well formed, but invalid because it is not organized
according to its schema.
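The well-formedness check can be sketched as follows. A non-validating JAXP parser reports a syntax error as a SAXException, so catching it is enough to tell well-formed documents from malformed ones. The class name WellFormedSketch and the method isWellFormed are assumptions made for this example; the sketch does not check validity against a schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedSketch {

    // Returns true if the text parses as well-formed XML, false on a syntax error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<address><city>Seattle</city></address>"));
        System.out.println(isWellFormed("<address><city>Seattle</address>")); // missing </city>
    }
}
```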
Two Java parser libraries are commonly used:
1) Crimson
2) Xerces-J
These include both SAX2 and DOM 2, so you can pick the API that better meets your needs.
Crimson is a part of the Java 2 platform (JDK 1.4), which means it's available to you
automatically. Xerces, which some people feel is better, is maintained by the Apache Software
Foundation. You must download Xerces as a JAR file and place it in your class path (or ext
directory) before you can use it. Either parser library is fine for most cases, but Xerces supports
W3C XML Schema validation while Crimson doesn't.
JAXP (Java API for XML Processing), which is part of the J2EE platform, is not a parser. It's a
set of factory classes and wrappers for DOM 2 and SAX2 parsers.
Java-based DOM 2 and SAX2 parsers, while conforming to standard DOM 2 or SAX2
programming models, are instantiated and configured differently, which inhibits their portability.
JAXP eliminates this portability problem by providing a consistent programming model for
instantiating and configuring DOM 2 and SAX2 parsers. JAXP can be used with Crimson or
Xerces-J.
JAXP is a standard Java extension library, so using it will help keep your J2EE applications
portable.
Other non-standard XML APIs are also available to Java developers, including JDOM, dom4j,
and XOM. These APIs are tree-based like DOM 2, and although they are non-standard, they tend
to provide simpler programming models than DOM 2. JDOM and dom4j are actually built on top
of DOM 2 implementations, wrapping DOM 2 with their own object-oriented programming
model. JDOM and dom4j can both be used with either Xerces-J or Crimson. If ease of use is
important, you may want to use one of these non-standard parser libraries, but if J2EE portability
is more important, stick with JAXP, DOM 2, and SAX2.
XML Namespaces
An XML namespace provides a qualified name for an XML element or attribute, the same way that a Java
package provides a qualified name for a Java class.
In most Java programs, classes are imported from other packages (java.io, javax.xml, and the rest). When the
Java program is compiled, every operation performed on every object or class is validated against the class
definition in the appropriate package. If Java didn't have package names, the classes in the Java core libraries
(I/O, AWT, JDBC, etc.) would all be lumped together with developer-defined classes. Java package names allow
us to separate Java classes into distinct namespaces, which improves organization and access control, and helps
us avoid name conflicts (collisions). XML namespaces are similar to Java packages, and serve the same
purposes; an XML namespace provides a kind of package name for individual elements and attributes.
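The package analogy can be made concrete with a namespace-aware DOM parse, which reports each element's namespace URI alongside its local name. This is a sketch under stated assumptions: the class name NamespaceSketch, the prefixes po and addr, and the example.com namespace URIs are all hypothetical and do not come from the listings in this chapter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceSketch {

    // Returns "{namespaceURI}localName" for the first element with the given local name.
    static String qualifiedNameOf(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches the local name in any namespace.
            Element element = (Element) doc.getElementsByTagNameNS("*", localName).item(0);
            return "{" + element.getNamespaceURI() + "}" + element.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical namespaces: one for purchase orders, one for addresses.
        String xml =
              "<po:purchaseOrder xmlns:po=\"http://www.example.com/po\""
            + " xmlns:addr=\"http://www.example.com/addr\">"
            + "<addr:address><addr:name>AMAZON.COM</addr:name></addr:address>"
            + "</po:purchaseOrder>";
        System.out.println(qualifiedNameOf(xml, "purchaseOrder"));
        System.out.println(qualifiedNameOf(xml, "address"));
    }
}
```

Just as two Java classes with the same simple name can coexist in different packages, two elements with the same local name stay distinct when their namespace URIs differ.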
Creating XML documents based on multiple markup languages is often desirable. For example, suppose we are
building a billing and inventory control system for a company called Monson-Haefel Books. We can define a
standard markup language for address information, the Address Markup Language, to be used whenever an XML
document needs to contain address information. An instance of Address Markup is shown in Listing 2-7.
Address Markup has its own schema, defined using either DTD (Document Type Definition) or the W3C XML
Schema Language, which dictates how its elements are organized. Every time we use address information in an
XML document, it should be validated against Address Markup's schema. For example, in Listing 2-8 the
address information is included in the PurchaseOrder XML document.
Listing 2-8 The PurchaseOrder Document Using the Address Markup Language
<?xml version="1.0" encoding="UTF-8" ?>
<purchaseOrder orderDate="2003-09-22" >
<accountName>Amazon.com</accountName>
<accountNumber>923</accountNumber>
<address>
<name>AMAZON.COM</name>
<street>1850 Mercer Drive</street>
<city>Lexington</city>
<state>KY</state>
<zip>40511</zip>
</address>
<book>
<port name="BookPrice_Port" binding="mh:BookPrice_Binding">
<soapbind:address location=
"http://www.Monson-Haefel.com/jwsbook/BookQuote" />
</port>
</service>
</definitions>
The types element uses the XML schema language to declare complex data types and elements that are used
elsewhere in the WSDL document.
The import element is similar to an import element in an XML schema document; it's used to import WSDL
definitions from other WSDL documents.
The message element describes the message's payload using XML schema built-in types, complex types, or
elements that are defined in the WSDL document's types element, or defined in an external WSDL document
the import element refers to.
The portType and operation elements describe a Web service's interface and define its methods. A portType
and its operation elements are analogous to a Java interface and its method declarations. An operation
element uses one or more message types to define its input and output payloads.
The binding element assigns a portType and its operation elements to a particular protocol (for instance,
SOAP 1.1) and encoding style.
The service element is responsible for assigning an Internet address to a specific binding.
The documentation element explains some aspect of the WSDL document to human readers. Any of the other
WSDL elements may contain documentation elements. The documentation element is not critical, so it will
not be mentioned again in this chapter.
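The analogy between a portType and a Java interface can be sketched in code. The names below are assumptions made for illustration: the interface name BookQuote is borrowed from the endpoint address in the listing, while the method getBookPrice, its parameter, and the helper class PortTypeSketch are hypothetical and are not taken from the WSDL document itself.

```java
// Plays the role of a portType: a named group of operations.
interface BookQuote {
    // Plays the role of an operation: the parameter corresponds to the input
    // message and the return value to the output message.
    float getBookPrice(String isbn);
}

final class PortTypeSketch {

    // Stand-in implementation that always answers with the same price;
    // a real endpoint would look the price up somewhere.
    static BookQuote fixedPrice(float price) {
        return isbn -> price;
    }

    public static void main(String[] args) {
        BookQuote quote = fixedPrice(24.99f);
        System.out.println(quote.getBookPrice("0321146182"));
    }
}
```

The interface says nothing about protocol or network address, which mirrors how the binding and service elements supply those details separately from the portType.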
XML Namespaces
Address Markup is used in Address Book Markup (nested in the addresses element) defined
at the start of this chapter, but it will also be reused in about half of Monson-Haefel Books' other
XML markup languages (types of XML documents): Invoice, Purchase Order, Shipping,
Marketing, and others.
Invocation
Server-Side Invocation
There are invocation mechanisms on both the server side and the client
side. On the server side, the invocation mechanism is responsible for:
1. Receiving a SOAP message from a transport (e.g., from an HTTP or
JMS endpoint).
2. Invoking handlers that preprocess the message (e.g., to persist the
message for reliability purposes, or process SOAP headers).
3. Determining the message’s target service—in other words, which
WSDL operation the message is intended to invoke.
4. Given the target WSDL operation, determining which Java class/method
to invoke. I call this the Java target. Determining the Java
target is referred to as dispatching.
5. Handing off the SOAP message to the Serialization subsystem to
deserialize it into Java objects that can be passed to the Java target as
parameters.
6. Invoking the Java target using the parameters generated by the Serialization
subsystem and getting the Java object returned by the target method.
7. Handing off the returned object to the Serialization subsystem to
serialize it into an XML element conformant with the return message
specified by the target WSDL operation.
8. Wrapping the returned XML element as a SOAP message response
conforming to the target WSDL operation.
9. Handing the SOAP response back to the transport for delivery.
At each stage in this process, the invocation subsystem must also handle
exceptions. When an exception occurs, the invocation subsystem often must
package it as a SOAP fault message to be returned to the client. In practice,
the invocation process is more nuanced and complex than this. However,
the steps outlined here offer a good starting point for our discussion of Java
Web Services architecture. Later chapters go into greater detail—particularly
Chapters 6 and 7 where I examine JAX-WS, and Chapter 11 where the
SOA-J invocation mechanism is described.
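The server-side steps above can be compressed into a toy dispatcher. This is a deliberately simplified sketch and not how JAX-WS or SOA-J is implemented: plain strings of the form "operation:argument" stand in for SOAP messages, a map lookup stands in for dispatching, and the class name InvocationSketch and its methods are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class InvocationSketch {

    // Maps an operation name to its "Java target" (step 4, dispatching).
    private final Map<String, Function<String, String>> targets = new HashMap<>();

    void register(String operation, Function<String, String> target) {
        targets.put(operation, target);
    }

    // Takes a request of the simplified form "operation:argument" and
    // returns either a response string or a fault string.
    String invoke(String request) {
        try {
            int separator = request.indexOf(':');
            if (separator < 0) {
                throw new IllegalArgumentException("malformed request");
            }
            String operation = request.substring(0, separator);   // step 3: find the target operation
            String argument = request.substring(separator + 1);   // step 5: stand-in for deserialization
            Function<String, String> target = targets.get(operation);
            if (target == null) {
                throw new IllegalArgumentException("unknown operation " + operation);
            }
            String result = target.apply(argument);                // step 6: invoke the Java target
            return "<response>" + result + "</response>";          // steps 7-8: serialize and wrap
        } catch (RuntimeException e) {
            // Exceptions are packaged as a fault to be returned to the client.
            return "<fault>" + e.getMessage() + "</fault>";
        }
    }

    public static void main(String[] args) {
        InvocationSketch server = new InvocationSketch();
        server.register("echo", argument -> argument.toUpperCase());
        System.out.println(server.invoke("echo:hello"));
        System.out.println(server.invoke("missing:hello"));
    }
}
```

Even in this reduced form, most of the code deals with locating the target and handling failures rather than with the target method itself, which hints at why the real process is nontrivial.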
As you can see, the invocation process is nontrivial. Part of its complexity
results from having to support SOAP. We’ll look at a simpler alternative,
known as REST (Representational State Transfer), in Chapter 3. Even with
REST, however, invocation is complicated. It’s just not that easy to solve the
generalized problem of mapping an XML description of a Web service to a
Java target and invoking that target with an XML message.
Client-Side Invocation
On the client side, the invocation process is similar if you want to invoke
a Web service using a Java interface. This approach may not always be the
most appropriate way to invoke a Web service; a lot depends on the problem
you are solving. If your client is working with XML, it might be easier
to just construct a SOAP message from XML and pass it to the Web service.
On the other hand, if your client is working with Java objects, as JWS
assumes, the client-side invocation subsystem is responsible for:
The W3C XML Schema (WXS) definition represents the Abstract Data Model of W3C XML
Schema in the XML language. By defining an Abstract Data Model of the schema, the W3C
Schema becomes agnostic about the language used to represent that model. The XML representation
is the formal representation specified by WXS, but you are free to represent the Abstract Data
Model any way you want and use it for validation. For example, you can directly create an
in-memory schema using any data structure that adheres to the Abstract Data Model. This
encourages the vendors that develop W3C Schema validators to provide an API that you can use
to create an in-memory schema directly.
There are numerous grammars available for validating XML-instance documents. Some became
obsolete immediately, while others—such as DTD, which is part of W3C XML 1.0 REC—have
passed the test of time. Of the extant grammars,
Here's an example of how you would validate an XML instance against an externally specified
schema:
import java.io.FileInputStream;
import oracle.xml.parser.v2.XMLError;
import oracle.xml.parser.schema.XMLSchema;
import oracle.xml.parser.schema.XSDBuilder;
import oracle.xml.schemavalidator.XSDValidator;
...
//load XML Schema
Oracle XML Developer's Kit (XDK) includes a W3C-compliant XML Schema processor, as well
as several utilities, such as for creating schema datatypes and restricting them programmatically
using the APIs, parsing and validating the XML Schema structure itself, traversing the Abstract
Data Model of an XMLSchema, and so on. Check out the oracle.xml.parser.schema and
oracle.xml.schemavalidator packages.
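For readers without the Oracle XDK, validation against an externally specified schema can also be sketched with the standard javax.xml.validation API that ships with the JDK. This swaps in a different library from the Oracle classes imported above, so it should be read as an illustration of the idea rather than the XDK usage. The class name SchemaValidationSketch, the method isValid, and the one-element schema in main are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SchemaValidationSketch {

    // Returns true if the instance is valid against the schema text.
    // A schema that cannot be compiled also yields false in this sketch.
    static boolean isValid(String xsd, String xml) {
        try {
            // Load the XML Schema.
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // Validate the instance; errors surface as SAXException.
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("could not read input", e);
        }
    }

    public static void main(String[] args) {
        String xsd =
              "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:element name='zip' type='xsd:positiveInteger'/>"
            + "</xsd:schema>";
        System.out.println(isValid(xsd, "<zip>40511</zip>"));
        System.out.println(isValid(xsd, "<zip>unknown</zip>"));
    }
}
```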
Element Content
In an XML document, the content of an element is the content enclosed between its <opening>
and </closing> tag. An element can have only four types of content: TextOnly, ElementOnly,
Mixed, and Empty. Attributes declared on an element are not considered to contribute to the
content of an element. They are just part of the element on which they are declared, and
contribute to the structure of XML.
TextOnly
The content of an element is said to be TextOnly when that element has only character data
(also called text data) between its <opening> and </closing> tag, or in other words, when
that element has no child elements. For example:
<TextOnly>some character data</TextOnly>
ElementOnly
The content of an element is said to be ElementOnly when that element has only child elements
between its <opening> and </closing> tag, optionally separated by whitespace (space, tab,
newline, carriage return). This whitespace is called ignorable whitespace, and is often used
for indenting the XML. Therefore the following:
<ElementOnly>
<child1 .../>
<child2 .../>
</ElementOnly>
Mixed
The content of an element is said to be Mixed when that element has character data interspersed
with child elements between its <opening> and </closing> tag. (In other words, its content has
both character data as well as child elements.) When the content is mixed, so-called
ignorable whitespace is no longer ignorable. Therefore, the following:
<Mixed>
<child1 .../>
some character data
<child1 .../>
</Mixed>
Empty
The content of an element is said to be Empty when that element has absolutely nothing between
the <opening> and </closing> tag, not even whitespaces. For example:
<Empty></Empty>
For ease of use and clarity, an element with empty content can also be
written as a single empty-element tag, as follows:
<Empty />
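The four kinds of element content can be told apart mechanically by inspecting an element's children. The sketch below does this for the root element of a document using DOM. The class name ContentTypeSketch and the method contentTypeOfRoot are assumptions made for this example. Since no schema is involved, the sketch simply assumes that whitespace-only text between child elements is ignorable, and it disregards comments.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ContentTypeSketch {

    // Classifies the content of the root element as TextOnly, ElementOnly, Mixed, or Empty.
    static String contentTypeOfRoot(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        boolean hasElements = false;
        boolean hasText = false;
        boolean hasNonBlankText = false;
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                hasElements = true;
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                hasText = true;
                if (!child.getNodeValue().trim().isEmpty()) {
                    hasNonBlankText = true;
                }
            }
        }
        if (hasElements) {
            // Whitespace between child elements is treated as ignorable here.
            return hasNonBlankText ? "Mixed" : "ElementOnly";
        }
        return hasText ? "TextOnly" : "Empty";
    }

    public static void main(String[] args) {
        System.out.println(contentTypeOfRoot("<TextOnly>some character data</TextOnly>"));
        System.out.println(contentTypeOfRoot("<ElementOnly> <child1/> <child2/> </ElementOnly>"));
        System.out.println(contentTypeOfRoot("<Mixed><child1/>some character data<child1/></Mixed>"));
        System.out.println(contentTypeOfRoot("<Empty />"));
    }
}
```

Attributes play no part in the classification, which matches the point that attributes do not contribute to element content.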
Content Models
In an XML grammar, one declares the content model of an element to specify the type of element
content in the corresponding XML instance document. Therefore, a content model is the
definition of the element content.
The figure below illustrates how to declare the content models in an XML Schema. Trace the
paths in this figure starting from <schema>, to understand how to declare the content model for
the four types of element content, with and without attribute declarations. Let's examine each one
briefly.
TextOnly
In the illustration above, trace the path until simpleType-1 to declare an element with TextOnly
content model:
<xsd:element name="TextOnly">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string" />
  </xsd:simpleType>
</xsd:element>
OR, equivalently:
<xsd:element name="TextOnly" type="xsd:string" />
As mentioned previously, attributes don't contribute to the element content; therefore, another
example of an XML instance with a TextOnly content, and with attributes, is:
<xsd:element name="TextOnly">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="att" type="xsd:string" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
The above schema declares an element named "TextOnly" with TextOnly content
model, whose content must be a string and which must have an attribute named "att" in
the corresponding XML instance.
ElementOnly
Trace the path in Figure 1 until either one of sequence-5, choice-6, or all-7 to declare an element
with ElementOnly content model:
<xsd:element name="ElementOnly">
<xsd:complexType>
<xsd:sequence> <!-- could have used choice or all instead -->
<xsd:element name="child1" type="xsd:string" />
<xsd:element name="child2" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
The above schema declares an element named "ElementOnly" with ElementOnly
content model. The element "ElementOnly" must have the child elements "child1"
and "child2" in the corresponding XML instance document. See the corresponding
XML instance for this schema in the previous section.
To declare an element with ElementOnly content model and with attributes, the
path in Figure 1 is the same as that for declaring the ElementOnly content model. The
attributes are then declared within the complexType as follows:
<xsd:element name="ElementOnly">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="child1" type="xsd:string" />
<xsd:element name="child2" type="xsd:string" />
</xsd:sequence>
<xsd:attribute name="att" type="xsd:string" use="required" />
</xsd:complexType>
</xsd:element>
The corresponding XML instance for the above schema looks like:
<ElementOnly att="val">
<child1 .../>
<child2 .../>
</ElementOnly>
Mixed
Trace the path in Figure 1 until either one of sequence-5, choice-6, or all-7 to declare an element
with Mixed content model, which is identical to declaring the ElementOnly content model, but
this time set the mixed attribute on the complexType to true, as follows:
<xsd:element name="Mixed">
<xsd:complexType mixed="true">
<xsd:sequence>
<xsd:element name="child1" type="xsd:string" />
<xsd:element name="child2" type="xsd:string" />
</xsd:sequence>
<xsd:attribute name="att" type="xsd:string" use="required" />
</xsd:complexType>
</xsd:element>
The corresponding XML instance for the above schema looks like:
<Mixed att="val">
<child1 .../>
some character data
<child2 .../>
</Mixed>
Empty
Trace the path until complexType-2 to declare an element with Empty content model, with or
without attributes:
<xsd:element name="EmptyContentModels">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Empty1">
<xsd:complexType />
</xsd:element>
<xsd:element name="Empty2">
<xsd:complexType>
<xsd:attribute name="att" type="xsd:string" use="required" />
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
The corresponding XML instance for the above schema looks like
<EmptyContentModels>
<Empty1 />
<Empty2 att="val" />
</EmptyContentModels>
Model Groups
When the content model of an element is declared to be ElementOnly (or Mixed), which means
that the element has child elements, you can specify the order and occurrence of the child
elements in more detail using model groups. A model group consists of particles; a particle
can be an element declaration or yet another model group. A model group itself can have a
cardinality, which can be refined using the minOccurs and maxOccurs attributes. These
characteristics make model groups quite powerful.
• Sequence - (a , b)* - means that the child elements declared within the
sequence model group must occur in the corresponding XML instance in the
same order as defined in the schema. The cardinality of a sequence model
group can range from 0 to unbounded. A sequence model group can further
contain a sequence or a choice model group recursively.
• Choice - (a | b)* - means that from the set of child elements declared within
the choice model group exactly one element must occur in the corresponding
XML instance. The cardinality of a choice model group can range from 0 to
unbounded. A choice model group can further contain a sequence or a choice
model group recursively.
• All - {a , b}? - means that the entire set of child elements declared within the
all model group must occur in the corresponding XML instance, but unlike the
sequence model group, the order is not important. The child elements can
therefore occur in any order. The cardinality of an all model group can only be
either 0 or 1. An all model group can only contain element declarations and
not any other model group.
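To make the difference between the three model groups concrete, here is a small sketch that checks a list of child element names against each kind of group. It assumes a single occurrence of the group (cardinality 1) and of each declared element, and it ignores nesting; the function names are illustrative only and do not come from any schema processor.

```python
def matches_sequence(children, declared):
    # sequence: every declared child must appear, in the declared order
    return list(children) == list(declared)

def matches_choice(children, declared):
    # choice: exactly one of the declared children must appear
    return len(children) == 1 and children[0] in declared

def matches_all(children, declared):
    # all: every declared child must appear exactly once, in any order
    return sorted(children) == sorted(declared)

declared = ["a", "b"]
print(matches_sequence(["b", "a"], declared))  # order matters for sequence
print(matches_all(["b", "a"], declared))       # order is irrelevant for all
print(matches_choice(["b"], declared))         # one alternative was chosen
```

A real validator additionally applies minOccurs and maxOccurs to the group and to each particle, which this sketch deliberately leaves out.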
These model groups can be declared either in-line or as a global declaration (an immediate child of
the <schema> construct, with a name for re-usability). A global model group must be declared within
the <group> construct, which you can later refer to by its name. Unlike in-line model
groups, however, globally declared model groups cannot carry the minOccurs/maxOccurs
attributes. When required, you can use the minOccurs/maxOccurs attributes when referencing the
globally declared model group. For example:
<xsd:group name="globalDecl">
<xsd:sequence>
<xsd:element name="child1" type="xsd:string" />
<xsd:element name="child2" type="xsd:string" />
</xsd:sequence>
</xsd:group>
Subsequently, you can reference the globally declared model group using the group
construct along with the minOccurs/maxOccurs attributes, if required, as follows:
<xsd:element name="complexModelGroup">
<xsd:complexType>
<xsd:group ref="tns:globalDecl" maxOccurs="unbounded" />
</xsd:complexType>
</xsd:element>
The complexType story
You now have enough information to write a simple schema for an XML document. But many
advanced concepts in XML Schema remain to be addressed.
complexType is another of the most powerful constructs in XML Schema. Apart from
allowing you to declare all four content models with or without attributes, it lets you derive a new
complexType by inheriting from an already declared complexType. The derived
complexType can either add more declarations to the ones inherited from the base complexType
(using extension) or restrict the declarations from the base complexType (using restriction).
Extending a complexType
simpleContent
Figure 2. A complexType with simpleContent can only be extended to add attributes.
<xsd:complexType name="DerivedType1">
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="att1" type="xsd:string" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
<xsd:complexType name="DerivedType2">
<xsd:simpleContent>
<xsd:extension base="tns:DerivedType1">
<xsd:attribute name="att2" type="xsd:string" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
<xsd:element name="SCExtension">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Derived1" type="tns:DerivedType1" />
<xsd:element name="Derived2" type="tns:DerivedType2" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
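In the above schema, DerivedType1 extends the built-in type xsd:string and adds the attribute
att1; DerivedType2 extends DerivedType1 and adds the attribute att2. A corresponding XML
instance looks like: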
<SCExtension xmlns="http://inheritance-ext-res"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://inheritance-ext-res CTSCExt.xsd">
<Derived1 att1="val">abc</Derived1>
<Derived2 att1="val" att2="val">def</Derived2>
</SCExtension>
complexContent
<xsd:element name="CCExtension">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Base" type="tns:BaseType" />
<xsd:element name="Derived" type="tns:DerivedType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
In the above schema:
1. The DerivedType inherits the sequence model group from the base
complexType and adds a choice model group, thereby making the final
content model of the derived complexType ((child1)+ , (child2 |
child3)).
2. The DerivedType inherits attribute att1 from the BaseType, and adds
attribute att2.
<CCExtension xmlns="http://inheritance-ext-res"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://inheritance-ext-res CTCCExt.xsd">
<Base att1="val">
<child1>This is base</child1>
<child1>This is base</child1>
</Base>
</CCExtension>
Restricting a complexType
simpleContent
Figure 4. A complexType with simpleContent can be used to restrict the datatype and
attributes.
A complexType with simpleContent can only restrict a complexType with simpleContent. As
illustrated in Figure 4, in the derived complexType you can restrict the simpleType of the
base, as well as the type and use (optional, mandatory, etc.) of the attributes from the base.
For example:
<xsd:complexType name="BaseType">
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="att1" type="xsd:string" use="optional" />
<xsd:attribute name="att2" type="xsd:integer" use="optional" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
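<xsd:complexType name="DerivedType">
<xsd:simpleContent>
<xsd:restriction base="tns:BaseType">
<xsd:maxLength value="35" />
<xsd:attribute name="att1" use="prohibited" />
<xsd:attribute name="att2" use="required">
<xsd:simpleType>
<xsd:restriction base="xsd:integer">
<xsd:totalDigits value="2" />
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:restriction>
</xsd:simpleContent>
</xsd:complexType>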
<xsd:element name="SCRestriction">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Base" type="tns:BaseType" />
<xsd:element name="Derived" type="tns:DerivedType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
In the above schema:
1. You restricted the simpleType content of the base (of type string) to a string
of at most 35 characters in the derived type.
2. You blocked the attribute att1 from being inherited from the base.
3. You restricted the type of the attribute att2 to an integer of 2 digits, and
changed it from optional to mandatory.
<SCRestriction xmlns="http://inheritance-ext-res"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://inheritance-ext-res CTSCRes.xsd">
</SCRestriction>
complexContent
<xsd:complexType name="BaseType">
<xsd:sequence>
<xsd:element name="child1" type="xsd:string" maxOccurs="unbounded" />
<xsd:element name="child2" type="xsd:string"/>
</xsd:sequence>
<xsd:attribute name="att1" type="xsd:string" use="optional" />
</xsd:complexType>
<xsd:complexType name="DerivedType">
<xsd:complexContent>
<xsd:restriction base="tns:BaseType">
<xsd:sequence>
<xsd:element name="child1" type="xsd:string" maxOccurs="4" />
<xsd:element name="child2">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:maxLength value="35" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name="att1" type="xsd:string" use="prohibited" />
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
<xsd:element name="CCRestriction">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Base" type="tns:BaseType" />
<xsd:element name="Derived" type="tns:DerivedType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
In the above schema:
1. You restricted the cardinality of child1 in the DerivedType, inherited from the
BaseType, from unbounded to 4.
2. You restricted the type of child2 in the DerivedType, inherited from the
BaseType, to a string of at most 35 characters.
3. You prohibited the attribute att1 from being inherited from the BaseType.
<CCRestriction xmlns="http://inheritance-ext-res"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://inheritance-ext-res CTCCRes.xsd">
<Base att1="val">
<child1>This is base type</child1>
<child2>This is base type</child2>
</Base>
<Derived>
<child1>This is restricted in the derived</child1>
<child2>This is restricted in the derived</child2>
</Derived>
</CCRestriction>
Assembling Schemas
Imports, includes, and chameleon effects
Many Java projects involve multiple classes and packages instead of a single, huge Java
file because modularization makes the code easy to re-use, read, and maintain. As a result, you
have to add the necessary imports to a class before you can use other classes in it. Similarly, in XML
Schema, you have to manage multiple schemas from various namespaces, and
you need to add the necessary imports to a schema before you use them.
XML Schemas can be assembled using the <import/> and <include/> schema constructs, which
must appear at the start of the schema, before any other declarations:
<schema>
<import namespace="foo" schemaLocation="bar.xsd" />
<include schemaLocation="baz.xsd" />
...
</schema>
Usually <import /> is used when the schema being imported has a different
targetNamespace, while <include /> is used when the schema being included has
the same targetNamespace or no targetNamespace declared.
Let's look at an example involving two schemas, A and B, with A referring to items declared
in B.
Case I
When both schemas have a targetNamespace and the targetNamespace of schema A (tnsA) is
different from the targetNamespace of schema B (tnsB), then A must import B.
<import namespace="tnsB" schemaLocation="B.xsd" />
Case II
When both schemas have a targetNamespace and the targetNamespace of schema A (tnsAB)
is the same as the targetNamespace of schema B (tnsAB), then A must include B.
<include schemaLocation="B.xsd" />
It is an error for A to import B.
Case III
When neither schema A nor schema B has a targetNamespace, A must include B.
When schema A has a targetNamespace and schema B has none, A can also include B. In that
case, all the included items from B get the namespace of A. Such an include is
known as a chameleon include.
When you don't want such a chameleon effect to take place, you must use an import without
specifying the namespace. An import without the namespace attribute allows unqualified
reference to components with no target namespace.
<import schemaLocation="B.xsd" />
Importing or including a schema multiple times is not an error, because the schema
processors can detect such a scenario and not load an already loaded schema.
Therefore, it is not an error if A.xsd imports B.xsd and C.xsd; and both B.xsd and
C.xsd individually import A.xsd. Circular references are not errors either but are
strongly discouraged.
By the way, a bare import like <import /> is legal as well. This approach simply allows
unqualified reference to foreign components with no target namespace without giving any hints
as to where to find them. It is up to the Schema processor either to throw an error or to look up
the unknown items using some mechanism, and this behaviour may vary from one Schema processor
to another. A bare <include /> is, however, illegal.
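The cases above can be condensed into a small decision function. This is only a sketch of the rules as described in this section; the function name and its return labels are made up for illustration, and None stands for "no targetNamespace declared".

```python
def assembly_construct(tns_a, tns_b):
    """Pick the construct schema A uses to pull in schema B.

    tns_a and tns_b are the targetNamespace values of A and B,
    or None when a schema declares no targetNamespace.
    """
    if tns_b is None:
        # B has no namespace: a plain include if A has none either,
        # otherwise B's items take on A's namespace (chameleon include).
        return "include" if tns_a is None else "chameleon include"
    if tns_a == tns_b:
        return "include"   # same targetNamespace (Case II)
    return "import"        # different targetNamespace (Case I)

print(assembly_construct("tnsA", "tnsB"))
print(assembly_construct("tnsAB", "tnsAB"))
print(assembly_construct("tnsA", None))
```

The sketch does not cover the import without a namespace attribute, which is the alternative to use when the chameleon effect is not wanted.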
Rules of thumb:
Redefining Schemas
You may not always want to assemble schemas in their original forms. For example, you may
want to modify the components being imported from the schema. In such cases, when we want to
redefine a declaration without changing its name, we use the redefine component to do this, with
the constraint that the schema which is to be redefined must either have (a) the same
targetNamespace as the <redefine>ing schema document, or have (b) no targetNamespace at all,
in which case the <redefine>d schema document is converted to the <redefine>ing schema
document's targetNamespace.
For example:
actual.xsd
<?xml version="1.0" ?>
<xsd:schema targetNamespace="http://inheritance-ext-res"
xmlns:tns="http://inheritance-ext-res"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xsd:complexType name="BaseType">
<xsd:sequence>
<xsd:element name="child1" type="xsd:string" />
</xsd:sequence>
<xsd:attribute name="att1" type="xsd:string" use="required" />
</xsd:complexType>
<xsd:complexType name="DerivedType">
<xsd:complexContent>
<xsd:extension base="tns:BaseType">
<xsd:choice>
<xsd:element name="child2" type="xsd:string" />
<xsd:element name="child3" type="xsd:string" />
</xsd:choice>
<xsd:attribute name="att2" type="xsd:string" use="required" />
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
</xsd:schema>
redefine.xsd
<xsd:redefine schemaLocation="actual.xsd">
<xsd:complexType name="DerivedType">
<xsd:complexContent>
<xsd:extension base="tns:DerivedType">
<xsd:sequence>
<xsd:element name="child4" type="xsd:string" />
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
</xsd:redefine>
<xsd:element name="Redefine">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Base" type="tns:BaseType" />
<xsd:element name="Derived" type="tns:DerivedType" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
In the above schema:
Note that the name of a type is not changed when redefining it. Therefore, redefined types use
themselves as their base types.
In the above example, we redefine a complexType named DerivedType without changing its
name. While redefining DerivedType, any reference to "DerivedType" (for example
base="tns:DerivedType") is supposed to refer to the actual DerivedType. After the type is
redefined, any reference to the DerivedType is supposed to refer to the redefined type.
<Redefine xmlns="http://inheritance-ext-res"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://inheritance-ext-res redefine.xsd">
<Base att1="val">
<child1>This is base type</child1>
</Base>
</Redefine>
Constraints
Identity constraint
XML Schema allows you to enforce uniqueness constraints on the content of elements and
attributes, which guarantees that in the instance document the values of the specified elements or
attributes are unique. To enforce uniqueness, you first identify the item whose value is to be
checked for uniqueness (an ISBN number, for example). You then
identify the set in which the values of those selected items should be checked for
uniqueness (a set of books, for example).
XML Schema provides two constructs, unique and key, to enforce uniqueness constraints.
Unique ensures that if the specified values are not null, then they must be unique in the defined
set; key ensures that the specified values are never null and are unique in the defined set.
There is one more construct, keyref, which points to some key already defined. Keyref then
ensures that the value of the specified item within keyref exists in the set of keys the keyref is
pointing to.
All three constructs have the same syntax (all of them use a selector and fields) but different
meanings. The selector defines the set in which uniqueness is to be enforced, and the field
(multiple fields are used to define a composite item) defines the item whose value is to
be checked for uniqueness. The values of both selector and field are XPath expressions. XPath
expressions do not respect default namespaces; therefore, if the elements/attributes are in a
namespace, it is essential to make the XPath expressions namespace aware by explicitly using
prefixes bound to the appropriate namespace. For example:
<xsd:complexType name="BookType">
<xsd:sequence>
<xsd:element name="title" type="xsd:string" />
<xsd:element name="half-isbn" type="xsd:string" />
<xsd:element name="other-half-isbn" type="xsd:float" />
</xsd:sequence>
</xsd:complexType>
<xsd:element name="Books">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Book" type="tns:BookType" maxOccurs="unbounded" />
</xsd:sequence>
</xsd:complexType>
<xsd:key name="isbn">
<xsd:selector xpath=".//tns:Book" />
<xsd:field xpath="tns:half-isbn" />
<xsd:field xpath="tns:other-half-isbn" />
</xsd:key>
</xsd:element>
</xsd:schema>
In the above schema, we declared a key named "isbn" that says, "The composite
value (half-isbn + other-half-isbn) specified by field must be not null and unique in
the set of books, as specified by the selector."
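To see what such a key asks of an instance document, the check can be imitated by hand. The sketch below uses Python's standard-library ElementTree rather than a schema validator, and the namespace URI http://example.com/books is a made-up stand-in because the schema fragment above does not show its targetNamespace. It also compares the field values as plain strings, which a real validator would not do for an xsd:float field.

```python
import xml.etree.ElementTree as ET

# Hypothetical target namespace; the real one is not shown in the schema above.
NS = {"tns": "http://example.com/books"}

def isbn_key_holds(xml_text):
    """Return True if every Book has both fields and the pair is unique."""
    root = ET.fromstring(xml_text)
    seen = set()
    # Selector: the set of Book elements in which the key must be unique.
    for book in root.findall(".//tns:Book", NS):
        # Fields: the two values that together form the composite key.
        half = book.findtext("tns:half-isbn", default=None, namespaces=NS)
        other = book.findtext("tns:other-half-isbn", default=None, namespaces=NS)
        if half is None or other is None:
            return False          # a key field must always be present
        if (half, other) in seen:
            return False          # duplicate composite value
        seen.add((half, other))
    return True
```

As in the schema, the paths are written with an explicit prefix (tns) because a path without a prefix would look for elements in no namespace.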
The Unique Particle Attribution (UPA) constraint ensures that the content model of every element
is specified in such a way that, while validating an XML instance, there is no ambiguity and the
correct element declaration can be determined for each element. For example, the following
schema violates the UPA constraint:
<xsd:element name="upa">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="a" minOccurs="0"/>
<xsd:element name="b" minOccurs="0"/>
<xsd:element name="a" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
...because in the corresponding XML-instance for the above schema:
<upa>
<a/>
</upa>
It cannot be determined which element declaration in the schema the element "a" in the XML
instance corresponds to: the declaration for "a" that comes before the declaration for "b", or
the declaration for "a" that comes after it. The UPA constraint therefore prevents you from
writing an XML Schema in this form for the type of XML instance you just saw. In this case,
however, if you set the minOccurs of element "b" to anything greater than 0, the UPA
constraint is not violated:
<xsd:element name="upa">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="a" minOccurs="0"/>
<xsd:element name="b" minOccurs="1"/>
<xsd:element name="a" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
...because in the corresponding XML-instance for the above schema:
<upa>
<a/>
<b/>
</upa>
It is quite clear that the element "a" in the XML instance is actually an instance of
the element declaration for "a", which is before the element declaration for "b" in
the schema.
Conclusion
Now that you have completed this series, you should understand:
One of the primary motivations for defining an XML namespace is to avoid naming conflicts
when using and re-using multiple vocabularies.
XML Schema is used to create a vocabulary for an XML instance, and uses namespaces heavily.
Thus, we see that the namespaces in XML concept is not very different from packages in Java.
This correlation is intended to simplify the understanding of namespaces in XML and to help you
visualize the namespaces concept.
It is not mandatory to declare namespaces only at the root element; a namespace can be declared on
any element in the XML document.
The scope of a declared namespace begins at the element where it is declared and applies to the
entire content of that element, unless overridden by another namespace declaration with the same
prefix name. Here, the content of an element is the content between the <opening-tag> and
</closing-tag> of that element. A namespace is declared as follows:
<someElement xmlns:pfx="http://www.foo.com" />
In the attribute xmlns:pfx, xmlns is like a reserved word, which is used only to declare a
namespace. In other words, xmlns is used for binding namespaces, and is not itself bound to any
namespace. Therefore, the above example is read as binding the prefix "pfx" with the namespace
"http://www.foo.com".
It is a convention to use XSD or XS as a prefix for the XML Schema namespace, but that
decision is purely personal. One can choose to use a prefix ABC for the XML Schema
namespace, which is legal, but doesn't make much sense. Using meaningful namespace prefixes
adds clarity to the XML document. Note that the prefixes are used only as placeholders and must
be expanded by the namespace-aware XML parser to use the actual namespace bound to the
prefix. In a Java analogy, a namespace binding can be compared to declaring a variable:
wherever the variable is referenced, it is replaced by the value it was assigned.
In our previous namespace declaration example, wherever the prefix "pfx" is referenced within
the namespace declaration scope, it is expanded to the actual namespace (http://www.foo.com)
to which it was bound.
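A namespace-aware parser makes this expansion visible. The short sketch below uses Python's standard-library ElementTree, which reports element names in the form {namespace}localname; the element name someElement and the second prefix other are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The prefix "pfx" is only a placeholder for the namespace it is bound to.
elem = ET.fromstring('<pfx:someElement xmlns:pfx="http://www.foo.com" />')
print(elem.tag)  # {http://www.foo.com}someElement

# A different prefix bound to the same namespace yields the same expanded name.
same = ET.fromstring('<other:someElement xmlns:other="http://www.foo.com" />')
print(same.tag == elem.tag)  # True
```

The prefix itself does not survive the expansion, which is why the choice of prefix is a matter of readability only.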
Although a namespace usually looks like a URL, that doesn't mean that one must be connected to
the Internet to actually declare and use namespaces. Rather, the namespace is intended to serve
as a virtual "container" for vocabulary and un-displayed content that can be shared in the Internet
space. In the Internet space URLs are unique—hence you would usually choose to use URLs to
uniquely identify namespaces. Typing the namespace URL in a browser doesn't mean it would
show all the elements and attributes in that namespace; it's just a concept.
But here's a twist: although the W3C Namespaces in XML Recommendation declares that the
namespace name should be an IRI, it enforces no such constraint. Therefore, I could also use
an arbitrary string that is not an IRI as a namespace name.
By now it should be clear that to use a namespace, we first bind it with a prefix and then use that
prefix wherever required. But why can't we use the namespaces to qualify the elements or
attributes from the start?
First, because namespaces—being IRIs—are quite long and thus would hopelessly clutter the
XML document.
Second and most important, because it might have a severe impact on the syntax, or to be
specific, on the production rules of XML—the reason being that an IRI might have characters
that are not allowed in XML tags per the W3C XML 1.0 Recommendation.
Below the elements Title and Author are associated with the Namespace
http://www.library.com:
<?xml version="1.0"?>
<Book xmlns:lib="http://www.library.com">
<lib:Title>Sherlock Holmes</lib:Title>
<lib:Author>Arthur Conan Doyle</lib:Author>
</Book>
In the example below, the elements Title and Author of Sherlock Holmes - III and
Sherlock Holmes - I are associated with the namespace http://www.library.com and the
elements Title and Author of Sherlock Holmes - II are associated with the namespace
http://www.otherlibrary.com.
<?xml version="1.0"?>
<Book xmlns:lib="http://www.library.com">
<lib:Title>Sherlock Holmes - I</lib:Title>
<lib:Author>Arthur Conan Doyle</lib:Author>
<purchase xmlns:lib="http://www.otherlibrary.com">
<lib:Title>Sherlock Holmes - II</lib:Title>
<lib:Author>Arthur Conan Doyle</lib:Author>
</purchase>
<lib:Title>Sherlock Holmes - III</lib:Title>
<lib:Author>Arthur Conan Doyle</lib:Author>
</Book>
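The effect of overriding a prefix can be verified with a namespace-aware parser. This sketch, again assuming Python's standard-library ElementTree, parses the document above and records which namespace each Title ended up in; the helper name title_namespaces is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
  <purchase xmlns:lib="http://www.otherlibrary.com">
    <lib:Title>Sherlock Holmes - II</lib:Title>
    <lib:Author>Arthur Conan Doyle</lib:Author>
  </purchase>
  <lib:Title>Sherlock Holmes - III</lib:Title>
  <lib:Author>Arthur Conan Doyle</lib:Author>
</Book>"""

def title_namespaces(xml_text):
    """Map each Title's text to the namespace its prefix resolved to."""
    result = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree stores qualified names as "{namespace}localname".
        if elem.tag.startswith("{") and elem.tag.endswith("}Title"):
            result[elem.text] = elem.tag[1:elem.tag.index("}")]
    return result

for title, namespace in title_namespaces(DOC).items():
    print(title, "->", namespace)
```

The override applies only inside purchase; once that element is closed, the prefix lib refers to http://www.library.com again.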
<?xml version="1.0"?>
<Book xmlns:XmlLibrary="http://www.library.com">
<lib:Title>Sherlock Holmes - I</lib:Title>
<lib:Author>Arthur Conan Doyle</lib:Author>
</Book>
It would be painful to repeatedly qualify an element or attribute you wish to use from a
namespace. In such cases, you can declare a {default namespace} instead. Remember, at any
point in time, there can be only one {default namespace} in existence. Therefore, the term
"Default Namespaces" is inherently incorrect.
Declaring a {default namespace} means that any element within the scope of the {default
namespace} declaration will be qualified implicitly, if it is not already qualified explicitly using a
prefix. As with prefixed namespaces, a {default namespace} can be overridden too. A {default
namespace} is declared as follows:
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
<Title>Sherlock Holmes</Title>
<Author>Arthur Conan Doyle</Author>
</Book>
In this case the elements Book, Title, and Author are associated with the Namespace
http://www.library.com.
Remember, the scope of a namespace begins at the element where it is declared. Therefore, the
element Book is also associated with the {default namespace}, as it has no prefix.
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
<Title>Sherlock Holmes - I</Title>
<Author>Arthur Conan Doyle</Author>
<purchase xmlns="http://www.otherlibrary.com">
<Title>Sherlock Holmes - II</Title>
<Author>Arthur Conan Doyle</Author>
</purchase>
<Title>Sherlock Holmes - III</Title>
<Author>Arthur Conan Doyle</Author>
</Book>
In the above, the elements Book, Title, and Author of Sherlock Holmes - III and
Sherlock Holmes - I are associated with the namespace http://www.library.com and the
elements purchase, Title, and Author of Sherlock Holmes - II are associated with the
namespace http://www.otherlibrary.com.
Default Namespace and Attributes
Default namespaces do not apply to attributes; therefore, to apply a namespace to an attribute the
attribute must be explicitly qualified. Here the attribute isbn has {no namespace} whereas the
attribute cover is associated with the namespace http://www.library.com.
<?xml version="1.0"?>
<Book isbn="1234"
pfx:cover="hard"
xmlns="http://www.library.com"
xmlns:pfx="http://www.library.com">
<Title>Sherlock Holmes</Title>
<Author>Arthur Conan Doyle</Author>
</Book>
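The difference between the two attributes shows up as soon as the document is parsed. The following sketch assumes Python's standard-library ElementTree, which writes a qualified name as {namespace}localname and leaves a name in {no namespace} unchanged.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<Book isbn="1234" pfx:cover="hard" '
    'xmlns="http://www.library.com" '
    'xmlns:pfx="http://www.library.com">'
    '<Title>Sherlock Holmes</Title>'
    '<Author>Arthur Conan Doyle</Author>'
    '</Book>'
)

# The element picks up the default namespace...
print(book.tag)
# ...but the unprefixed attribute isbn does not; only pfx:cover is qualified.
for name, value in book.attrib.items():
    print(name, "=", value)
```

Even though the default namespace and the prefix pfx are bound to the same namespace here, isbn stays in {no namespace} because it carries no prefix.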
Undeclaring Namespace
Unbinding an already-bound prefix is not allowed per the W3C Namespaces in XML 1.0
Recommendation, but is allowed per W3C Namespaces in XML 1.1 Recommendation. There
was no reason why this should not have been allowed in 1.0, but the mistake has been rectified in
1.1. It is necessary to know this difference because not many XML parsers yet support
Namespaces in XML 1.1.
Although there were some differences in unbinding prefixed namespaces, both versions allow
you to unbind or remove the already declared {default namespace} by overriding it with another
{default namespace} declaration, where the namespace in the overriding declaration is empty.
Unbinding a namespace is as good as the namespace not being declared at all. Here the elements
Book, Title, and Author of Sherlock Holmes - III and Sherlock Holmes - I are
associated with the namespace http://www.library.com and the elements purchase, Title,
and Author of Sherlock Holmes - II have {no namespace}:
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
<Title>Sherlock Holmes - I</Title>
<Author>Arthur Conan Doyle</Author>
<purchase xmlns="">
<Title>Sherlock Holmes - II</Title>
<Author>Arthur Conan Doyle</Author>
</purchase>
<Title>Sherlock Holmes - III</Title>
<Author>Arthur Conan Doyle</Author>
</Book>
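Parsing the document above confirms that xmlns="" really removes the default namespace inside purchase. This is a minimal sketch using Python's standard-library ElementTree; a tag without a {namespace} part is in {no namespace}.

```python
import xml.etree.ElementTree as ET

DOC = """<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <Author>Arthur Conan Doyle</Author>
  <purchase xmlns="">
    <Title>Sherlock Holmes - II</Title>
    <Author>Arthur Conan Doyle</Author>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>"""

root = ET.fromstring(DOC)
# purchase and its children carry no namespace, so a plain name finds it.
purchase = root.find("purchase")
purchase_tags = [child.tag for child in purchase]
# The siblings of purchase are still in the default namespace of Book.
outer_tags = [child.tag for child in root if child is not purchase]

print(purchase.tag, purchase_tags)
print(outer_tags)
```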
Here's an invalid example of unbinding a prefix per Namespaces in XML 1.0 spec, but a valid
example per Namespaces in XML 1.1:
<purchase xmlns:lib="">
From this point on, the prefix lib cannot be used in the XML document because it is now
undeclared as long as you are in the scope of element purchase. Of course, you can definitely
re-declare it.
No Namespace
No namespace exists when there is no default namespace in scope. A {default namespace} is one
that is declared explicitly using xmlns. When a {default namespace} has not been declared at all
using xmlns, it is incorrect to say that the elements are in {default namespace}. In such cases, we
say that the elements are in {no namespace}. {no namespace} also applies when an already
declared {default namespace} is undeclared.
In summary:
Thus far we have seen how to declare and use an existing namespace. Now let's examine how to
create a new namespace and add elements and attributes to it using XML Schema.
An XML Schema is an XML document before it's anything else. In other words, like any other XML
document, an XML Schema is built with elements and attributes. This "building material" must
come from the namespace http://www.w3.org/2001/XMLSchema, which is a declared and
reserved namespace that contains elements and attributes as defined in the W3C XML Schema
Structures Specification and the W3C XML Schema Datatypes Specification. You should not add
elements or attributes to this namespace.
Using these building blocks we can create new elements and attributes as required, enforce
the required constraints on them, and keep them in some namespace. (See
Figure 1.) XML Schema calls this namespace the {target namespace}: the
namespace where the newly created elements and attributes will reside.
Figure 1: Elements and attributes in the XML Schema namespace are used to write an XML Schema document, which generates elements and attributes as defined by the user and puts them in the {target namespace}. This {target namespace} is then used to validate the XML instance.
This {target namespace} is referenced from the XML instance to ensure the validity of the instance
document. (See Figure 2.) During validation, the Validator verifies that the elements/attributes
used in the instance exist in the declared namespace, and also checks any other constraints on
their structure and datatype.
Figure 2: From XML Schema to XML Schema instance
Qualified or Unqualified
In XML Schema we can choose to specify whether the instance document must qualify all the
elements and attributes, or must qualify only the globally declared elements and attributes.
Regardless of what we choose, the entire instance would be validated. So why do we have two
choices?
The answer is "manageability." When we choose qualified, we are specifying that all the
elements and attributes in the instance must have a namespace, which in turn adds namespace
complexity to the instance. If, say, the schema is later modified by making some local declarations
global and/or making some global declarations local, then the instance documents are not
affected at all. In contrast, when we choose unqualified, we are specifying that only the globally
declared elements and attributes in the instance must have a namespace, which in turn hides the
namespace complexity from the instance. But in this case, if, say, the schema is modified by
making some local declarations global and/or making some global declarations local, then all
instance documents are affected and are no longer valid. The XML Schema
Validator would report validation errors if we tried to validate such an instance against the modified
XML Schema. Therefore, the namespaces in the instance must be fixed to match the modification
made in the XML Schema to make the instance valid again.
  <complexType name="BookType">
    <sequence>
      <element name="Title" type="string" />
      <element name="Author" type="string" />
    </sequence>
  </complexType>
</schema>
The declarations that are the immediate children of the element <schema> are the global
declarations, and the rest are local declarations. In the above example, Book and BookType are
declared globally whereas Title and Author are local declarations.
We can express the choice between qualified and unqualified by setting the schema element
attributes elementFormDefault and attributeFormDefault to either qualified or unqualified.
When elementFormDefault is set to qualified, it implies that in the instance of this grammar
all the elements must be explicitly qualified, either by using a prefix or setting a {default
namespace}. An unqualified setting means that only the globally declared elements must be
explicitly qualified, and the locally declared elements must not be qualified. Qualifying a local
declaration in this case is an error. Similarly, when attributeFormDefault is set to qualified,
all attributes in the instance document must be explicitly qualified using a prefix.
Remember, {default namespace} doesn't apply to attributes; hence, we can't use a {default
namespace} declaration to qualify attributes. Unqualified seems to imply being in the namespace
by virtue of the containing element. This is interesting, isn't it?
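The effect of elementFormDefault can be checked with a small experiment. The following is a minimal sketch, not the article's own code: it uses the JDK's standard javax.xml.validation API rather than the Oracle XDK, and the namespace http://example.org/books, the class name FormDefaultDemo, and the two sample instances are assumptions made up for illustration. The schema mirrors the Book/BookType example, with Book declared globally and Title and Author declared locally.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FormDefaultDemo {
    // Hypothetical target namespace, chosen only for this sketch.
    static final String NS = "http://example.org/books";

    // Every element is in the namespace (through a default namespace declaration).
    static final String ALL_QUALIFIED =
        "<Book xmlns='" + NS + "'><Title>T</Title><Author>A</Author></Book>";

    // Only the global element Book is qualified; Title and Author are in no namespace.
    static final String LOCALS_UNQUALIFIED =
        "<b:Book xmlns:b='" + NS + "'><Title>T</Title><Author>A</Author></b:Book>";

    // Builds the Book schema with the given elementFormDefault value.
    static String schema(String elementFormDefault) {
        return "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
             + " xmlns:tns='" + NS + "' targetNamespace='" + NS + "'"
             + " elementFormDefault='" + elementFormDefault + "'>"
             + "<element name='Book' type='tns:BookType'/>"
             + "<complexType name='BookType'><sequence>"
             + "<element name='Title' type='string'/>"
             + "<element name='Author' type='string'/>"
             + "</sequence></complexType></schema>";
    }

    // Returns true if the instance validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema compiled = factory.newSchema(new StreamSource(new StringReader(xsd)));
            compiled.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema or instance was rejected
        }
    }

    public static void main(String[] args) {
        for (String form : new String[] {"qualified", "unqualified"}) {
            System.out.println(form + ", all elements qualified: "
                + isValid(schema(form), ALL_QUALIFIED));
            System.out.println(form + ", local elements unqualified: "
                + isValid(schema(form), LOCALS_UNQUALIFIED));
        }
    }
}
```

With the qualified setting only the fully qualified instance should be accepted, and with the unqualified setting only the instance whose local elements carry no namespace should be accepted.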
In the following diagrams, the concept symbol space is similar to the non-normative concept of
namespace partition. For example, if a namespace is like a refrigerator, then the symbol spaces
are the shelves in the refrigerator. Just as shelves partition the entire space in a refrigerator, the
symbol spaces partition the namespace.
There are three primary partitions in a namespace: one for global element declarations, one for
global attribute declarations, and one for global type declarations (complexType/simpleType).
This arrangement means that a global element, a global attribute, and a global type can all
have the same name and still coexist in a {target namespace} without any name collisions.
Further, every global element and every global complexType has its own symbol space to contain
its local declarations.
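The separation of symbol spaces can be observed by compiling two small schemas. This is a rough sketch using the JDK's javax.xml.validation API (not the Oracle XDK); the namespace http://example.org/sym, the name Item, and the class name SymbolSpaceDemo are hypothetical. The first schema reuses one name for a type, an attribute, and an element; the second declares two global elements with the same name, which collide because they share the element symbol space.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SymbolSpaceDemo {
    // Common schema header; the namespace is an assumption for this sketch.
    static final String HEAD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:tns='http://example.org/sym'"
        + " targetNamespace='http://example.org/sym'>";

    // One type, one attribute, and one element, all named Item.
    static final String SAME_NAME_DIFFERENT_KINDS = HEAD
        + "<simpleType name='Item'><restriction base='string'/></simpleType>"
        + "<attribute name='Item' type='tns:Item'/>"
        + "<element name='Item' type='tns:Item'/>"
        + "</schema>";

    // Two global elements named Item: both belong to the element symbol space.
    static final String SAME_NAME_SAME_KIND = HEAD
        + "<element name='Item' type='string'/>"
        + "<element name='Item' type='int'/>"
        + "</schema>";

    // Returns true if the schema document compiles without errors.
    static boolean compiles(String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false; // the schema processor reported an error
        }
    }

    public static void main(String[] args) {
        System.out.println("type, attribute, element named Item: "
            + compiles(SAME_NAME_DIFFERENT_KINDS));
        System.out.println("two elements named Item: "
            + compiles(SAME_NAME_SAME_KIND));
    }
}
```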
Let's examine the four possible combinations of values for the pair of attributes
elementFormDefault and attributeFormDefault.
Now we know that XML Schema creates the new elements and attributes and puts them in a
namespace called the {target namespace}. But what if we don't specify a {target namespace} in
the schema? When we don't specify the targetNamespace attribute at all, no {target namespace}
exists, which is legal. Specifying an empty URI in the targetNamespace attribute, however, is
illegal.
For example, the following is invalid. We can't specify an empty URI for the {target
namespace}:
In this case, when no {target namespace} exists, we say, as described earlier, that the newly
created elements and attributes are kept in {no namespace}. (It would be incorrect to use
the term {default namespace}.) For validation, the XML instance must use the
noNamespaceSchemaLocation attribute from the
http://www.w3.org/2001/XMLSchema-instance namespace to refer to the XML Schema with
no target namespace.
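The {no namespace} case can be sketched as follows. This example is an illustration under stated assumptions: it uses the JDK's javax.xml.validation API instead of the Oracle XDK, it hands the schema to the validator programmatically (so the instance needs no noNamespaceSchemaLocation hint), and the element name Currency, the namespace http://example.org/money, and the class name NoNamespaceDemo are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NoNamespaceDemo {
    // A schema without a targetNamespace attribute: Currency lives in {no namespace}.
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
        + "<element name='Currency' type='string'/></schema>";

    // Instance whose root element is in no namespace.
    static final String IN_NO_NAMESPACE = "<Currency>USD</Currency>";

    // Same element name, but placed in a (hypothetical) namespace.
    static final String IN_SOME_NAMESPACE =
        "<Currency xmlns='http://example.org/money'>USD</Currency>";

    // Returns true if the instance validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            Schema compiled = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            compiled.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema or instance was rejected
        }
    }

    public static void main(String[] args) {
        System.out.println("no namespace:   " + isValid(XSD, IN_NO_NAMESPACE));
        System.out.println("some namespace: " + isValid(XSD, IN_SOME_NAMESPACE));
    }
}
```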
Conclusion
Hopefully, this overview of namespaces should help you move to XML Schema more easily. The
Oracle XML Developer Kit (XDK) supports the W3C Namespaces in the XML 1.0
Recommendation; you can turn on/off the namespace check using the JAXP APIs in the Oracle
XDK by using the setNamespaceAware(boolean) method in the SAXParserFactory and the
DocumentBuilderFactory classes.
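As a rough illustration of the setNamespaceAware(boolean) switch, the sketch below parses the same document twice through the standard JAXP DocumentBuilderFactory. It runs against whichever JAXP parser the JDK supplies, so it is an assumption that the Oracle XDK parser reports the same results; the sample document, its namespace, and the class name NamespaceAwareDemo are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceAwareDemo {
    // Hypothetical sample document with a prefixed root element.
    static final String XML = "<b:Book xmlns:b='http://example.org/books'/>";

    // Parses XML and returns the namespace URI the DOM reports for the root element.
    static String rootNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // the switch under discussion
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
            return root.getNamespaceURI();
        } catch (Exception e) {
            // Parser configuration, SAX, and I/O failures are not expected here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace aware:     " + rootNamespace(true));
        System.out.println("not namespace aware: " + rootNamespace(false));
    }
}
```

When namespace awareness is off, the parser treats b:Book as an ordinary name containing a colon, so the DOM has no namespace URI to report for it.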
Learn which datatypes are supported in XML Schema version 1.0, and how to use them.
The W3C XML Schema Datatype Specification defines numerous datatypes for validating the
element content and the attribute value. These datatypes can be used to validate only the scalar
content of elements, and not the non-scalar or mixed content. The text enclosed between the
<opening> and </closing> element tags, and the value of the attributes are often referred to as
scalar data, but it can also be a list of scalar data. These datatypes are intended for use in XML
Schema definition and other XML-related documents.
Initially, Document Type Definition (DTD) was the only grammar available for validating XML
instances. But DTD has only a handful of datatypes, ensuring coarse validation of the scalar data
in XML via the familiar PCDATA, CDATA, and so on. XML Schema, in contrast, overcomes
this limitation by providing 44 built-in datatypes. Each of these datatypes can be further
customized to ensure fine validation of the scalar data. For example, the built-in datatype string
can be customized to successfully validate strings and ensure they are of length 4.
This article covers the following topics:
• The difference between the value space, lexical space, and canonical lexical
representation of the supported datatypes
• The datatypes supported in XML Schema, their classifications, and their
relationships to each other
• Creation of new datatypes from the built-in datatypes using restriction, list,
and union constructs
• Various constraining facets available for restricting a datatype
• How to use Oracle XDK to programmatically create and use XML Schema
datatypes.
Datatype Fundamentals
Before we dive into the various types of datatypes, their usage, and the relationships between
them, we need to understand datatypes as a general concept. Although the XML Schema
specification explains the following fundamentals about datatypes, these fundamentals are not
specific to XML Schema. Rather, they are general mathematical concepts. Let's examine them in
more detail.
The value space of a datatype is the set of values that the datatype can hold; its lexical
space is the set of literals, or character strings, that represent those values.
Consider this metaphor: In the English language (and in fact in all languages), we
have various words that share the same meaning. A value can be correlated to a
word's meaning, and the corresponding literals can then be correlated to various
different words, all having the same meaning.
For example, 100.0, 200.0, and so on are values in the value space of datatype
float. The value 100.0 can be represented using multiple literals such as 10.0E+1,
1.0E2, 1.0E+2, and so on. Similarly, the value 200.0 can be represented using
multiple literals such as 2.0E2, 2.0E+2, and so on. All such literals for every value in
the value space of float belong to the lexical space of datatype float. (See Figure
1.)
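The many-literals-to-one-value relationship can be imitated in plain Java. This is only an analogy, offered as a sketch: Float.parseFloat follows Java's own floating-point grammar, which overlaps with but is not identical to the lexical space of the XML Schema float datatype, and the class name FloatLexicalDemo is hypothetical.

```java
class FloatLexicalDemo {
    // Literals taken from the text; each should denote the value 100.0.
    static final String[] HUNDRED = {"100.0", "10.0E+1", "1.0E2", "1.0E+2"};

    // Literals taken from the text; each should denote the value 200.0.
    static final String[] TWO_HUNDRED = {"200.0", "2.0E2", "2.0E+2"};

    // Returns true if every literal parses to the same float value.
    static boolean allDenote(float value, String... literals) {
        for (String literal : literals) {
            if (Float.parseFloat(literal) != value) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all literals denote 100.0: " + allDenote(100.0f, HUNDRED));
        System.out.println("all literals denote 200.0: " + allDenote(200.0f, TWO_HUNDRED));
    }
}
```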
A canonical lexical representation is a set of literals from among the valid set of literals for a
datatype such that there is a one-to-one mapping between literals in the canonical lexical
representation and values in the value space. (See Figures 2 and 3.)
Figure 2: Many literals in the lexical space map
to exactly one literal in the canonical lexical
representation.
Canonical representations do not serve any purpose in XML Schema but are useful in other
specifications that use XML Schema datatypes. For example, the XQuery/XPath datamodel uses
XML Schema types as well as the canonical lexical representation to serialize a value. Therefore,
when serializing a value such as 100.0, the corresponding canonical lexical representation is used
—in this case, 1.0E2.
Datatypes in XML Schema
Now that we understand the fundamental concepts of datatypes in general, let's
explore the datatypes available in XML Schema. Broadly speaking, the datatypes in
XML Schema can be categorized as ur-Type, built-in, and user-derived (see Table 1
below) and are related to each other as shown in Figure 4.
Table 1: Categories of XML Schema datatypes
Category       Kinds
ur-Type        anyType, anySimpleType
Built-in       Primitive (Atomic), Derived
User-Derived   Restriction, List, Union
Now, let's examine the major classifications—ur-Type, built-in, and user-derived—more closely.
ur-Type
The ur-Type classification expresses that the entire type hierarchy in XML Schema has a base,
or root. Every datatype in XML Schema has an ur-Type as its parent or ancestor. The ur-Type
plays a role similar to that of java.lang.Object in Java, which is the base class of all
built-in and user-defined classes in that language. anyType and anySimpleType are the two
ur-Types available in XML Schema.
anyType
The anyType datatype is a concrete ur-Type that can serve either as a complex type
(non-scalar data, meaning elements) or as a simple type (scalar data), depending on the
context. For example, here is an XML Schema using the anyType datatype:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency" type="anyType" />
</schema>
Here is the corresponding valid instance using scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mydatatypes.edu ex2.xsd"
xmlns="http://mydatatypes.edu">USD</Currency>
And here is the corresponding valid instance using non-scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mydatatypes.edu ex2.xsd"
xmlns="http://mydatatypes.edu">
<dollars>100</dollars>
</Currency>
anySimpleType
The anySimpleType datatype is also a concrete ur-Type; it is the parent of all built-in
primitive datatypes and an ancestor of every other scalar datatype, built-in or user-derived.
It differs from anyType in that it can hold only scalar data corresponding to any scalar
datatype, whereas anyType can hold scalar as well as non-scalar data. For example, here is an
XML Schema using the anySimpleType datatype:
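<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency" type="anySimpleType" />
</schema>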
Here is the corresponding valid instance using scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mydatatypes.edu ex3.xsd"
xmlns="http://mydatatypes.edu">USD</Currency>
And here is the corresponding invalid instance using non-scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mydatatypes.edu ex3.xsd"
xmlns="http://mydatatypes.edu">
<dollars>100</dollars>
</Currency>
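The difference between the two ur-Types can be checked with a short program. This is a minimal sketch that assumes the schemas declare a single global element Currency of the given type; it uses the JDK's javax.xml.validation API rather than the Oracle XDK, and the class name UrTypeDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class UrTypeDemo {
    static final String NS = "http://mydatatypes.edu";

    // Instance with scalar content.
    static final String SCALAR =
        "<Currency xmlns='" + NS + "'>USD</Currency>";

    // Instance with non-scalar (element) content.
    static final String NON_SCALAR =
        "<Currency xmlns='" + NS + "'><dollars>100</dollars></Currency>";

    // Builds a schema that declares Currency with the given type (anyType or anySimpleType).
    static String schema(String type) {
        return "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
             + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
             + "<element name='Currency' type='" + type + "'/></schema>";
    }

    // Returns true if the instance validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            Schema compiled = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            compiled.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema or instance was rejected
        }
    }

    public static void main(String[] args) {
        for (String type : new String[] {"anyType", "anySimpleType"}) {
            System.out.println(type + ", scalar content:     " + isValid(schema(type), SCALAR));
            System.out.println(type + ", non-scalar content: " + isValid(schema(type), NON_SCALAR));
        }
    }
}
```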
In fact, if you don't specify any type for an element declaration, its type defaults to
anyType, and if you don't specify any type for an attribute declaration, its type
defaults to anySimpleType. In the example below, the type of element Currency
defaults to anyType and the type of attribute MoreCurrency defaults to anySimpleType.
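<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency" />
  <attribute name="MoreCurrency" />
</schema>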
Built-in Datatypes
Built-in datatypes, which are defined in the W3C XML Schema Datatype Specification, must be
supported by all W3C XML Schema-compliant parsers. There are two classifications of built-in
datatypes: primitive and derived. The differences between the two have little relevance for the
user, but we will examine them here anyway to demonstrate the mechanics and utility of
datatype generation. (See the built-in datatype hierarchy diagram in the W3C XML Schema
Datatypes specification.)
Primitive datatypes are indivisible. They are not defined in terms of other datatypes; they exist
independently. For example, decimal is a well-defined mathematical concept that cannot be
defined in terms of any other datatype. There are 19 built-in primitive datatypes supported
by the XML Schema Datatypes Specification:
string
boolean
decimal
float
double
duration
dateTime
time
date
gYearMonth
gYear
gMonthDay
gDay
gMonth
hexBinary
base64Binary
anyURI
QName
NOTATION
For details, see Section 3.2 of XML Schema Part 2.
Built-in Derived Datatypes
Derived datatypes, in contrast, are divisible because they are derived from the built-in primitive
datatypes—in other words, derived datatypes are defined in terms of other datatypes. For
example, an integer is a well-defined mathematical concept that can be defined in terms of
decimal with the restriction of not using the decimal point. There are 25 built-in derived
datatypes supported by XML Schema Datatypes:
normalizedString
token
language
NMTOKEN
NMTOKENS
Name
NCName
ID
IDREF
IDREFS
ENTITY
ENTITIES
integer
nonPositiveInteger
negativeInteger
long
int
short
byte
nonNegativeInteger
unsignedLong
unsignedInt
unsignedShort
unsignedByte
positiveInteger
For details, see Section 3.3 of Part 2 of the XML Schema spec.
User-Derived Datatypes
User-derived datatypes are the ones specified by the user in an XML Schema Definition, and are
created by either restriction, list, or union. The XML Schema construct <simpleType> is
used to create user-derived datatypes. Such a datatype can be named if one wants to re-use it or
can be anonymous if it is to be used only once.
There has been some confusion because the specification currently categorizes list and union
as user-derived datatypes. They should rather be categorized as user-defined datatypes for clarity.
This confusion may be addressed in the next version of XML Schema.
Every built-in datatype has a set of allowed constraining facets, which can be used to constrain
or restrict that datatype, leading to the creation of a new datatype categorized as a user-derived
datatype. A constraining facet is an optional property that can be applied to a datatype to
constrain its "value space." Constraining the "value space" consequently constrains the "lexical
space." Remember, the value space of a datatype can only be restricted and not extended. The
XML Schema construct <restriction> is used to create user-derived datatypes by restricting an
existing datatype with the allowed constraining facets. For example, a string of length 3 can be
expressed as:
  <element name="Currency">
    <simpleType>
      <restriction base="string">
        <length value="3" />
      </restriction>
    </simpleType>
  </element>
</schema>
In the above example, an anonymous user-derived datatype—the base datatype being string—is
defined along with the constraining facet, length. The same example can be written using a
named user-derived datatype for re-usability:
  <simpleType name="currency_type">
    <restriction base="string">
      <length value="3" />
    </restriction>
  </simpleType>
</schema>
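To see the length facet at work, the restriction can be exercised with a validator. The sketch below is an illustration, not the article's code: it uses the JDK's javax.xml.validation API rather than the Oracle XDK, it assumes the http://mydatatypes.edu target namespace seen in the instance documents, and the class name LengthFacetDemo and the helper instance(...) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LengthFacetDemo {
    static final String NS = "http://mydatatypes.edu";

    // Currency is an anonymous restriction of string with length 3.
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<element name='Currency'><simpleType>"
        + "<restriction base='string'><length value='3'/></restriction>"
        + "</simpleType></element></schema>";

    // Wraps a value in a Currency element (hypothetical helper for this sketch).
    static String instance(String value) {
        return "<Currency xmlns='" + NS + "'>" + value + "</Currency>";
    }

    // Returns true if the instance validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            Schema compiled = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            compiled.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema or instance was rejected
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"USD", "US", "USDX"}) {
            System.out.println(value + " -> " + isValid(XSD, instance(value)));
        }
    }
}
```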
Following are the 12 constraining facets in XML Schema, which can be used to
create a user-derived datatype from the available built-in datatypes. Which
facets are allowed, however, depends on the base datatype:
length
minLength
maxLength
pattern
enumeration
whiteSpace
maxInclusive
maxExclusive
minExclusive
minInclusive
totalDigits
fractionDigits
In XML Schema, a list is a sequence of homogeneous items separated by whitespace (spaces,
tabs, carriage returns, or newlines), where all the items in the list have the same datatype.
It is similar to an array in Java, which is self-describing.
The XML Schema construct <list> is used to create a list datatype. For example, a list of float
can be created as follows:
  <element name="Currency">
    <simpleType>
      <list itemType="float" />
    </simpleType>
  </element>
</schema>
A list need not always be of a built-in datatype; it can also be a list of user-derived
datatype. For example, a list of user-derived datatype from float, where the value is
restricted from 10.0 to 20.0, can be expressed as:
  <element name="Currency">
    <simpleType>
      <list>
        <simpleType>
          <restriction base="float">
            <minInclusive value="10.0" />
            <maxInclusive value="20.0" />
          </restriction>
        </simpleType>
      </list>
    </simpleType>
  </element>
</schema>
To re-use the above defined list datatype, we must name the list datatype as
follows:
  <simpleType name="listOfFloat">
    <list>
      <simpleType>
        <restriction base="float">
          <minInclusive value="10.0" />
          <maxInclusive value="20.0" />
        </restriction>
      </simpleType>
    </list>
  </simpleType>
</schema>
A valid instance adhering to the above schema can hold a list of float between the
range 10.0 and 20.0, both inclusive:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mydatatypes.edu ex5.xsd"
xmlns="http://mydatatypes.edu">10.0 12.4
15.0</Currency>
In the above example the items in the list are restricted to have a value from 10.0 to 20.0, but
there is no restriction on the number of items in the list. If we want to restrict the number of
items in the list to say 3, we can do that as follows:
  <element name="Currency">
    <simpleType>
      <restriction base="tns:listOfFloat">
        <length value="3" />
      </restriction>
    </simpleType>
  </element>
  <simpleType name="listOfFloat">
    <list>
      <simpleType>
        <restriction base="float">
          <minInclusive value="10.0" />
          <maxInclusive value="20.0" />
        </restriction>
      </simpleType>
    </list>
  </simpleType>
</schema>
Here we used the length facet to restrict the number of items in the list.
For datatypes derived from a list datatype, regardless of the
datatype of the list's itemType, only the following facets are allowed:
length
minLength
maxLength
pattern
enumeration
whiteSpace
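The restricted list can be tried out as follows. This is a sketch under stated assumptions: it uses the JDK's javax.xml.validation API rather than the Oracle XDK, it assumes that the tns prefix in the schema is bound to the target namespace http://mydatatypes.edu, and the class name ListTypeDemo and the helper instance(...) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ListTypeDemo {
    static final String NS = "http://mydatatypes.edu";

    // Currency: exactly three floats, each between 10.0 and 20.0 inclusive.
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:tns='" + NS + "'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<element name='Currency'><simpleType>"
        + "<restriction base='tns:listOfFloat'><length value='3'/></restriction>"
        + "</simpleType></element>"
        + "<simpleType name='listOfFloat'><list><simpleType>"
        + "<restriction base='float'>"
        + "<minInclusive value='10.0'/><maxInclusive value='20.0'/>"
        + "</restriction></simpleType></list></simpleType>"
        + "</schema>";

    // Wraps a whitespace-separated list in a Currency element (hypothetical helper).
    static String instance(String items) {
        return "<Currency xmlns='" + NS + "'>" + items + "</Currency>";
    }

    // Returns true if the instance validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            Schema compiled = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            compiled.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema or instance was rejected
        }
    }

    public static void main(String[] args) {
        System.out.println("three items in range: " + isValid(XSD, instance("10.0 12.4\n15.0")));
        System.out.println("only two items:       " + isValid(XSD, instance("10.0 12.4")));
        System.out.println("one item out of range: " + isValid(XSD, instance("10.0 25.0 15.0")));
    }
}
```

The first instance splits its items across a line break on purpose: any whitespace separates list items, so the newline counts the same as a space.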
User-Derived Union Datatype
A union datatype is created by taking a union of one or more other datatypes. The XML Schema
construct <union> is used to create union datatypes. For example, a union of int and float
datatypes can be expressed as:
  <element name="Currency">
    <simpleType>
      <union memberTypes="int float" />
    </simpleType>
  </element>
</schema>
When the value of Currency in the instance is validated, it is first matched against
datatype int. If it is not a valid int, it is matched against datatype float. If it is
not a valid float either, an error is raised. As you can see, the order in which
memberTypes are declared is indeed significant, but only from a datatype validator's
perspective. From the user's perspective, the order of memberTypes is not
significant at all.
Similar to list, a union can be of primitive datatypes as well as user-derived datatypes. For
example, a union of user-derived datatypes from int and float can be expressed as follows:
  <simpleType name="UnionOfIntFloat">
    <union>
      <simpleType>
        <restriction base="int">
          <minInclusive value="10" />
          <maxInclusive value="20" />
        </restriction>
      </simpleType>
      <simpleType>
        <restriction base="float">
          <minInclusive value="30.0" />
          <maxInclusive value="40.0" />
        </restriction>
      </simpleType>
    </union>
  </simpleType>
</schema>
A valid instance adhering to the above schema can hold either a single int between
the range 10 and 20 or a single float between the range 30.0 and 40.0, both
inclusive:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mydatatypes.edu ex7.xsd"
xmlns="http://mydatatypes.edu">35.0</Currency>
When restricting a union datatype, regardless of the datatype of the individual
memberTypes, only the following facets are allowed:
pattern
enumeration
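The union of the two restricted datatypes can be exercised in the same way. This is a minimal sketch, assuming that an element Currency is declared with the UnionOfIntFloat type and that the tns prefix is bound to http://mydatatypes.edu; it uses the JDK's javax.xml.validation API rather than the Oracle XDK, and the class name UnionTypeDemo and the helper instance(...) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class UnionTypeDemo {
    static final String NS = "http://mydatatypes.edu";

    // Currency: either an int in [10, 20] or a float in [30.0, 40.0].
    static final String XSD =
        "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:tns='" + NS + "'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<element name='Currency' type='tns:UnionOfIntFloat'/>"
        + "<simpleType name='UnionOfIntFloat'><union>"
        + "<simpleType><restriction base='int'>"
        + "<minInclusive value='10'/><maxInclusive value='20'/>"
        + "</restriction></simpleType>"
        + "<simpleType><restriction base='float'>"
        + "<minInclusive value='30.0'/><maxInclusive value='40.0'/>"
        + "</restriction></simpleType>"
        + "</union></simpleType></schema>";

    // Wraps a value in a Currency element (hypothetical helper for this sketch).
    static String instance(String value) {
        return "<Currency xmlns='" + NS + "'>" + value + "</Currency>";
    }

    // Returns true if the instance validates against the schema.
    static boolean isValid(String xsd, String xml) {
        try {
            Schema compiled = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            compiled.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // schema or instance was rejected
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"15", "35.0", "25", "15.5"}) {
            System.out.println(value + " -> " + isValid(XSD, instance(value)));
        }
    }
}
```

The values 15 and 35.0 each satisfy one member type, while 25 and 15.5 satisfy neither, so the last two should be rejected.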
It is possible to mix and match list, union, and atomic datatypes with restrictions to
define a datatype per specific requirements. For more details about constraining
facets, see Section 4.1.5 of XML Schema Part 2 and Appendix B of XML Schema Part
0.
Datatype Namespaces
The datatypes that we have seen thus far are associated with the XML Schema
namespace http://www.w3.org/2001/XMLSchema, which has other XML Schema
constructs as well, like complexType, complexContent, group, and so on.
Because the W3C XML Schema Datatypes spec was written with the intention of not being used
exclusively within XML Schema definition language, but rather also to be used by other XML-
related languages, it provides a subset namespace of http://www.w3.org/2001/XMLSchema—
http://www.w3.org/2001/XMLSchema-datatypes—which contains only the built-in datatypes,
constraining facets, and so on needed to facilitate the use of XML Schema datatypes in other
languages.
The advantage of this separation shows in XML Schema datatype validator implementations:
a standalone implementation of XML Schema datatypes is possible, without implementing the
entire XML Schema Structures specification as well.
Apart from validating an XML instance against the XML Schema grammar, the Oracle XML
Developer's Kit (XDK) provides APIs to programmatically use the built-in datatypes, restrict
them using the constraining facets, and validate a value against the schema. For example:
import oracle.xml.parser.schema.*;
. . .
XSDSimpleType st = XSDSimpleType.getPrimitiveType(XSDSimpleType.iSTRING);
try {
    // Set a constraining facet on the simple type.
    st.setFacet(XSDSimpleType.LENGTH, "5");
}
catch (XSDException ex1) {
    System.out.println("[ERROR] Facet not supported. " + ex1.getMessage());
}
try {
    // Validate a value against the restricted type.
    st.validateValue("hello");
    System.out.println("[SUCCESS] The value is valid.");
}
catch (XSDException ex2) {
    System.out.println("[ERROR] Invalid Value. " + ex2.getMessage());
}
This code creates an anonymous datatype based on string and restricts it to successfully
validate only strings of length 5. You can use the XDK Schema APIs to create
datatypes and restrict them programmatically. See the XDK javadoc for more
details.
Conclusion
Now that you understand datatypes in XML Schema and their usage, moving to other constructs
of XML Schema, which define complex element content, should be much easier.