The Purpose of XML Schema
According to the World Wide Web Consortium (W3C), which approved XML Schema as an official recommendation in 2001, "XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content, and semantics of XML documents." XML Schema was born out of a need to provide a more powerful and flexible alternative to the standard DTD (Document Type Definition), a language for expressing SGML and XML content models. Though many DTDs are still in use today in legacy document frameworks and industry standards, and are often used in tandem with XSDs, XML Schema offers a long list of advantages for defining XML documents.
XML Schema is an XML-based language used to create XML-based languages and data models. An XML schema defines element and attribute names for a class of XML documents. The schema also specifies the structure that those documents must adhere to and the type of content that each element can hold. XML Schema provides a much richer set of structures, types, and constraints for describing data than DTDs do, and is therefore expected to become the most common method for defining and validating highly structured XML documents.
XML documents that attempt to adhere to an XML schema are said to be instances of that schema. If they correctly adhere to the schema, then they are valid instances. This is not the same as being well formed. A well-formed XML document follows all the syntax rules of XML, but it does not necessarily adhere to any particular schema. So, an XML document can be well formed without being valid, but it cannot be valid unless it is well formed.
A First Look
An XML schema describes the structure of an XML instance document by defining what each element must or may contain. An element is limited by its type. For example, an element of complex type can contain child elements and attributes, whereas a simple-type element can only contain text. The diagram below gives a first look at the types of XML Schema elements.
Schema authors can define their own types or use the built-in types. The following is a high-level overview of Schema types.
1. Elements can be of simple type or complex type.
2. Simple-type elements can only contain text. They cannot have child elements or attributes.
3. All the built-in types are simple types (e.g., xs:string).
4. Schema authors can derive simple types by restricting another simple type. For example, an email type could be derived by limiting a string to a specific pattern.
5. Simple types can be atomic (e.g., strings and integers) or non-atomic (e.g., lists).
6. Complex-type elements can contain child elements and attributes as well as text.
7. By default, complex-type elements have complex content, meaning that they have child elements.
8. Complex-type elements can be limited to having simple content, meaning they only contain text. They are different from simple-type elements in that they have attributes.
9. Complex types can be limited to having no content, meaning they are empty, but they may have attributes.
10. Complex types may have mixed content, a combination of text and child elements.
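Several items in this overview can be sketched as type definitions. The schema below is only an illustration: every type name (EmailType, IntegerList, PriceType, ImageType, ParagraphType) is hypothetical, and the email pattern is deliberately loose rather than a complete address check.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Item 4: a simple type derived by restricting xs:string with a pattern -->
  <xs:simpleType name="EmailType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^@\s]+@[^@\s]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Item 5: a non-atomic (list) simple type: whitespace-separated integers -->
  <xs:simpleType name="IntegerList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>

  <!-- Item 8: complex type with simple content: text plus an attribute -->
  <xs:complexType name="PriceType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Item 9: empty complex type: no content, attributes only -->
  <xs:complexType name="ImageType">
    <xs:attribute name="src" type="xs:anyURI" use="required"/>
  </xs:complexType>

  <!-- Item 10: mixed content: text interleaved with child elements -->
  <xs:complexType name="ParagraphType" mixed="true">
    <xs:sequence>
      <xs:element name="em" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

An element declared with type PriceType could then appear in an instance as, for example, a decimal value carrying a currency attribute, which a simple-type element could not do.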
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Code Explanation
As you can see, an XML schema is an XML document and must follow all the syntax rules of any other XML document; that is, it must be well formed. XML schemas also have to follow the rules defined in the "Schema of schemas," which defines, among other things, the structure of an XML schema and the element and attribute names used in it.
The document element of XML schemas is xs:schema. It takes the attribute xmlns:xs with the value of http://www.w3.org/2001/XMLSchema, indicating that the document should follow the rules of XML Schema. This will be clearer after you learn about namespaces.
In this XML schema, we see an xs:element element within the xs:schema element. xs:element is used to define an element. In this case it defines the element Author as a complex-type element, which contains a sequence of two elements: FirstName and LastName, both of which are of the simple type string.
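To connect the explanation to an instance document, here is a minimal sketch of an XML document that should be valid against the Author schema as described. The names in the text content are placeholders.

```xml
<?xml version="1.0"?>
<!-- Both children are present and appear in the order the sequence declares -->
<Author>
  <FirstName>Mark</FirstName>
  <LastName>Twain</LastName>
</Author>
```

Swapping FirstName and LastName, or omitting one of them, would leave the document well formed but no longer valid, because the sequence requires both children in the declared order.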
Document type declaration
XML Schema is XML, and therefore conforms to the syntax specified in the W3C XML recommendations. This means that XML Schema can be parsed by a standard XML parser, can be accessed programmatically for integration testing and other validation purposes, and is extensible (as demonstrated by standards such as XBRL). The XML document type declaration is not required by XML Schema, though it is implied by the root element, <schema>.
Namespace declaration
Namespaces provide a context for element and attribute names used within an XML document, allowing architects to build and extend upon XML vocabularies using URIs to ensure the creation of unique data tags.
A namespace declaration is not required in XML Schema; namespaces can instead be declared inline on the elements and attributes of the XML instance. For example:
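The original example is not available here, so the following is a minimal sketch of an instance that declares its namespaces inline. The namespace URIs, the addr prefix, and all element and attribute names are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Default namespace declared on the root element -->
<order xmlns="http://example.com/orders">
  <!-- The prefix "addr" is declared inline, on the element that first uses it -->
  <addr:shipTo xmlns:addr="http://example.com/addresses" addr:country="US">
    <addr:city>Mill Valley</addr:city>
  </addr:shipTo>
</order>
```

A declaration is in scope for the element that carries it and for that element's descendants, which is why addr can be used on shipTo itself, on its attribute, and on city.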
Namespaces can play a vital role in any large data integration project or data exchange scenario, where item names can often come into conflict.
Expanded names
Namespaces in XML dictates that once namespaces are declared, they are enforced through the use of expanded names. An expanded name is simply a namespace name (defined in the namespace declaration) combined with a local name to denote an item definition that is unique to the declared namespace. In the example below, <xsd:annotation> tells us that the definition of annotation applies specifically and uniquely to the XML Schema (xsd) vocabulary.
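A minimal sketch of how such an expanded name arises is shown here. The documentation text is a placeholder; the namespace URI is the standard XML Schema namespace.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Prefix "xsd" + local name "annotation" expands to
       {http://www.w3.org/2001/XMLSchema}annotation -->
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema (illustrative text).
    </xsd:documentation>
  </xsd:annotation>
</xsd:schema>
```

The prefix itself carries no meaning. Binding the same URI to xs instead of xsd would yield exactly the same expanded names.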
Type definitions
Type definitions (complexType and simpleType) enable developers to build modular data structures and reuse individual content models without rewriting code every time they need to employ the same data syntax. In the example, line 11 defines a complexType "USAddress", which uses a familiar data structure. This structure is then reused in lines 13 and 14 to describe our shipping and billing addresses, which may contain different content, but will still adhere to the same syntax rules.
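The listing that those line numbers refer to is not reproduced here. The following is a minimal sketch of the same idea, loosely modelled on the purchase-order example in the W3C XML Schema Primer. The child element names and the purchaseOrder, shipTo, and billTo elements are assumptions, and the line numbers cited above do not apply to this sketch.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- One named complex type, defined once -->
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- The same type reused for two differently named elements -->
  <xsd:element name="purchaseOrder">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Because shipTo and billTo share one definition, a change to USAddress applies to both without editing either element declaration.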
Element / Attribute declarations
Element and attribute declarations simply define the names that will be used for tags within the XML instance. Both of these can be further defined with a variety of different constraints including id, type, substitutionGroup, max/minOccurs, etc.
Sequence definition
The sequence element defines the order in which child elements are required to appear in the corresponding XML document.
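Element declarations, attribute declarations, and a sequence can be seen working together in the sketch below. The book element and all of its children and attributes are hypothetical names chosen for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <!-- Children must appear in this order: title, author(s), then note -->
      <xs:sequence>
        <xs:element name="title"  type="xs:string"/>
        <!-- One or more authors (minOccurs defaults to 1) -->
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <!-- Optional note (maxOccurs defaults to 1) -->
        <xs:element name="note"   type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <!-- Attribute declarations follow the content model -->
      <xs:attribute name="id"   type="xs:ID" use="required"/>
      <xs:attribute name="lang" type="xs:language"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance that places author before title would be rejected, even though both elements are declared, because the sequence fixes their order.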
Data Validation
XML Schemas define the structure of elements and attributes within an XML document, and offer a great deal of flexibility in designing and customizing content models for any kind of documentation requirements. XML parsers use XML Schema to validate the following aspects of XML instances:
- Document structure, or syntax
- Datatypes
- Inclusion of required elements/attributes
This enables application designers to automate the control of user input in any of the many situations where XML is used including Web forms, publishing systems, databases and other backend storage mechanisms, data integration applications, Web services, etc.
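The three validated aspects can be pointed out in a single small schema. This is a sketch only: the reading element, its children, and the sensor attribute are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="reading">
    <xs:complexType>
      <!-- Structure: exactly one time element followed by one value element -->
      <xs:sequence>
        <!-- Datatypes: content must be a valid dateTime and a valid decimal -->
        <xs:element name="time"  type="xs:dateTime"/>
        <xs:element name="value" type="xs:decimal"/>
      </xs:sequence>
      <!-- Required attribute: an instance without it is invalid -->
      <xs:attribute name="sensor" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under these assumptions, the instance <reading><value>warm</value></reading> is well formed but fails on all three counts: the time element is missing, "warm" is not a decimal, and the required sensor attribute is absent.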
For example, used in conjunction with other XML technologies, such as XSLT and XML-enabled databases, global elements defined in XSDs can be processed consistently and uploaded to the appropriate database structure or even simultaneously output to HTML, RTF, PDF, and other formats using a methodology called single source publishing.
The data-oriented datatypes provided in XML Schema 1.1, in addition to the document-oriented datatypes in the previous version of the recommendation, facilitate complex document exchange and data integration scenarios, giving it exposure to the B2B and e-commerce architectures that traditionally employ other data formats such as EDI (electronic data interchange).
In addition, XML Schema's support for namespaces enables XML documents to contain unique identifiers, and therefore incorporate more than one commonly used vocabulary at a time. A namespace declaration, or binding, is generally declared in an XML document via an IRI (Internationalized Resource Identifier), and is expressed by applying a prefix to relevant elements and attributes. Namespaces provide enormous opportunities for data exchange and integration, enabling entire XML frameworks to coexist within the same architecture. This is an extremely valuable asset for a global economy, where mergers and acquisitions, supply chain requirements, and industry standards often dictate
Despite these hurdles, the ability to create a flexible and extensible architecture provided by XML Schema and other XML technologies enables early adopters and forward-thinking companies to easily adapt to changing industry mandates with resources such as XSLT, XPath, XQuery, and XML-enabled databases.
Schema Component Details, and Schemas and Schema-validity Assessment. This section of the specification depends on and refers directly to other W3C publications: XML Information Set, XML Namespaces, and XPath, as well as XML Schema: Datatypes.
XML Schema 1.1 Part 2: Datatypes describes and defines the strong datatyping capabilities of the XML Schema recommendation. It is published as a separate document so that it can be used independently and is therefore portable to other XML tools and technologies. Datatyping allows schema designers to constrain the input of end users through the application of recognized abstract concepts such as string, Boolean, integer, etc.
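As a small illustration of datatyping, the sketch below assigns built-in types to the children of a hypothetical shipment element. The element names are assumptions; the types are standard built-ins from the Datatypes part of the recommendation.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shipment">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="recipient" type="xs:string"/>   <!-- any text -->
        <xs:element name="quantity"  type="xs:integer"/>  <!-- e.g. 12 -->
        <xs:element name="express"   type="xs:boolean"/>  <!-- true, false, 1, or 0 -->
        <xs:element name="shipDate"  type="xs:date"/>     <!-- e.g. 2001-05-02 -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With these declarations, a quantity of "twelve" or a shipDate of "May 2nd" would be reported as a datatype error by a validating parser.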
Restrictions on Values
The following example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 120:
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
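To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint. The following example defines an element called "car" with a restriction. The only acceptable values are Audi, Golf, and BMW:
<xs:element name="car">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>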
The example above could also have been written like this:
<xs:element name="car" type="carType"/>
<xs:simpleType name="carType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Audi"/>
    <xs:enumeration value="Golf"/>
    <xs:enumeration value="BMW"/>
  </xs:restriction>
</xs:simpleType>
Note: In this case the type "carType" can be used by other elements because it is not a part of the "car" element.
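The reuse mentioned in the note might look like the following sketch, in which a second element shares the same named type. The rentalCar element is hypothetical and is not part of the original example.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named type, declared once at the top level of the schema -->
  <xs:simpleType name="carType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Two elements restricted to the same set of values -->
  <xs:element name="car"       type="carType"/>
  <xs:element name="rentalCar" type="carType"/>

</xs:schema>
```

With the anonymous form, the enumeration would have to be repeated inside each element declaration.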
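To limit the content of an XML element to a series of numbers or letters, we would use the pattern constraint. The following example defines an element called "letter" with a restriction. The only acceptable value is ONE of the LOWERCASE letters from a to z:
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>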
The next example defines an element called "initials" with a restriction. The only acceptable value is THREE of the UPPERCASE letters from a to z:
<xs:element name="initials">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z][A-Z][A-Z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The next example also defines an element called "initials" with a restriction. The only acceptable value is THREE of the LOWERCASE OR UPPERCASE letters from a to z:
<xs:element name="initials">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z][a-zA-Z][a-zA-Z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The next example defines an element called "choice" with a restriction. The only acceptable value is ONE of the following letters: x, y, OR z:
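<xs:element name="choice">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[xyz]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>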
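The next example defines an element called "prodid" with a restriction. The only acceptable value is FIVE digits in a sequence, and each digit must be in the range from 0 to 9:
<xs:element name="prodid">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:pattern value="[0-9][0-9][0-9][0-9][0-9]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>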
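The next example defines an element called "letter" with a restriction. The acceptable value is zero or more occurrences of lowercase letters from a to z:
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="([a-z])*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>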
The next example also defines an element called "letter" with a restriction. The acceptable value is one or more pairs of letters, each pair consisting of a lower case letter followed by an upper case letter. For example, "sToP" will be validated by this pattern, but not "Stop" or "STOP" or "stop":
<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="([a-z][A-Z])+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The next example defines an element called "gender" with a restriction. The only acceptable value is male OR female:
<xs:element name="gender">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="male|female"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The next example defines an element called "password" with a restriction. There must be exactly eight characters in a row and those characters must be lowercase or uppercase letters from a to z, or a number from 0 to 9:
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z0-9]{8}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
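To specify how white space characters should be handled, we would use the whiteSpace constraint. This example defines an element called "address" with a restriction. The whiteSpace constraint is set to "preserve", which means that the XML processor WILL NOT remove any white space characters:
<xs:element name="address">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="preserve"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>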
This example also defines an element called "address" with a restriction. The whiteSpace constraint is set to "replace", which means that the XML processor WILL REPLACE all white space characters (line feeds, tabs, spaces, and carriage returns) with spaces:
<xs:element name="address">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="replace"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
This example also defines an element called "address" with a restriction. The whiteSpace constraint is set to "collapse", which means that the XML processor WILL REMOVE all white space characters (line feeds, tabs, spaces, carriage returns are replaced with spaces, leading and trailing spaces are removed, and multiple spaces are reduced to a single space):
<xs:element name="address">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Restrictions on Length
To limit the length of a value in an element, we would use the length, maxLength, and minLength constraints. This example defines an element called "password" with a restriction. The value must be exactly eight characters:
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:length value="8"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
This example defines another element called "password" with a restriction. The value must be at least five characters and at most eight characters long:
<xs:element name="password">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="5"/>
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Constraint      Description
maxExclusive    Specifies the upper bound for numeric values (the value must be less than this value)
maxInclusive    Specifies the upper bound for numeric values (the value must be less than or equal to this value)
maxLength       Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero
minExclusive    Specifies the lower bound for numeric values (the value must be greater than this value)
minInclusive    Specifies the lower bound for numeric values (the value must be greater than or equal to this value)
minLength       Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero
pattern         Defines the exact sequence of characters that are acceptable
totalDigits     Specifies the maximum number of digits allowed. Must be greater than zero
whiteSpace      Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled
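The exclusive bounds and totalDigits are not used in the earlier examples, so here is a minimal sketch that combines them. The price element and the specific limits are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <!-- At most six digits in total, counting both sides of the decimal point -->
        <xs:totalDigits value="6"/>
        <!-- Strictly greater than 0 and strictly less than 1000 -->
        <xs:minExclusive value="0"/>
        <xs:maxExclusive value="1000"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Under these limits 999.99 is acceptable, while 0 and 1000 fall outside the exclusive bounds and 12.34567 has too many digits.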