XSD and Schematron Validation - Part 1

Multipass Validation with XSD and Schematron Part 1

If it is important that your XML documents are correct, catching mistakes early is, of course, much
less costly than catching them later. This should not be news to any XML developer.

But "correct" often means more than just a simple validity test at the end of the development
process. Today, no one schema language covers all the bases. Different languages offer different
possible measures of correctness. In combination, multiple schemas may provide the rich structure
an application or organization needs for success.

The most common of the newer schema languages is W3C XML Schema (XSD). Classwell
Learning Group, a division of textbook publisher Houghton Mifflin, has put XSD at the center of
their content-driven applications since 2001. While XSD is far more powerful than DTD for
defining structure and data types, limitations remain. Classwell found that a rules-based schema
language, Schematron, could provide additional validation rules to ensure consistency between
classes of documents and to enforce conventions in the XSD schemas themselves. It has become
particularly evident that, given XSD's notorious complexity, consistent, well-thought-out design
patterns are crucial for the effective use of XSD. Classwell has been working to establish their own
set of best practices for XSD development that can be monitored by a combination of validation
tools.

As a rules-based system, Schematron takes an approach to validation that's distinctly different from
that of XSD. When the two are paired together, Schematron adds significant prescriptive flexibility
that complements XSD's stronger expression of structure. Schematron came out of research by Rick
Jelliffe at Academia Sinica, but is interesting from a practical perspective because it provides
capabilities today that will only be available in XML Schema 1.1 or 2.0 in the future.

This article shows how to build a productive framework for multipass validation within Altova's
popular XMLSPY environment. The example we use is drawn from Classwell's use of Schematron
rules to enforce best practices in XSD design. In designing a new layer for an already complex
development process, we felt the issue of productivity was important. To reduce the chance of error
and speed up complex validation, we leveraged XMLSPY's rich scripting capabilities. The
XMLSPY Scripting Environment enabled us to add UI elements that controlled a flexible and
manageable multilanguage validation process with minimal effort.

Defining Rules for Best Practice


Our goal was to enforce a short list of practices that experience taught us should be required
patterns. For this article we have selected a subset of Classwell's XSD rules that demonstrate the
concept, while keeping the examples simple.
• Ensure the root element is an extension of a base rootType, guaranteeing a certain minimum
tagging of the content. XSD does not define which element is intended to be the document root, but
we can assume it will be the first global element defined.
• Schema must be versioned (XSD allows a version attribute for tracking - enforce its use).
• XSD files that define much of the common tagging must be consistently used - for our example
here, para.xsd and root.xsd.
• Disallow the use of the xs:string data type (xs:string allows text formatting we do not want within
PCDATA - xs:token should be used instead since it rules out new lines, tabs, leading and trailing
spaces, or runs of more than one space between words).
• Some elements are crucial to our system - when they are processed the system is particular about
their use; however, XSD lets you "shadow" global element definitions. So include a rule
disallowing the definition of certain elements locally, for our example, the element "locator".
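
The xs:token rule is easiest to see on sample values. The following is a minimal JavaScript sketch, not part of Classwell's tooling; the function names isTokenSafe and collapseWhitespace are hypothetical. It tests whether a string is already in the whitespace-collapsed form that the rule describes, and shows what collapsing does to a string that is not.

```javascript
// Hypothetical helper: returns true when `text` contains none of the
// formatting the rule above forbids.
function isTokenSafe(text) {
  if (/[\t\n\r]/.test(text)) return false; // tabs or line breaks
  if (/^ | $/.test(text)) return false;    // leading or trailing space
  if (/ {2,}/.test(text)) return false;    // runs of two or more spaces
  return true;
}

// Hypothetical helper: collapse whitespace runs to single spaces and
// trim the ends, which yields a value that isTokenSafe accepts.
function collapseWhitespace(text) {
  return text.replace(/[\t\n\r ]+/g, " ").replace(/^ | $/g, "");
}
```

A value such as "Chapter 1" passes the check, while " Chapter 1" (leading space) or a value containing a tab does not.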

These rules are really just a start. But in the interest of not overwhelming the reader with
information specific to Classwell, we will stop with these five. No doubt every development group
has its own similar rules.

Framework Requirements
Our goal for this article is straightforward: we want to demonstrate an easy path to multiple
language validation within a common IDE using an example that makes sense in day-to-day work.
With that aim in mind we kept our framework's requirements lightweight.
1. A user must be able to easily validate a document using an arbitrary number of validation
commands associated with that document.
2. Each validation command needs to execute an arbitrary command-line process that could give
feedback on any kind of validation.
3. The results of each validation command must be reported to the user.
4. The GUI must provide the following operations:

   • Set up a validation command
   • Remove a validation command
   • Set up a schema
   • Remove a schema
   • Add a command/schema pair to a document
   • Remove a command/schema pair from a document
   • Validate using the set of all command/schema pairs associated with the document
5. Commands must be able to be written with variables that are swapped for the schema file path
and the XML document path before execution.
6. The solution must be reasonably easy to set up.
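
Requirement 5 can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the placeholder spellings %schema% and %document%, the function name expandCommand, and the tool name "mytool" with its flags are all hypothetical, since the variable syntax is not specified here.

```javascript
// Hypothetical placeholder names; the actual variable syntax is an assumption.
var SCHEMA_VAR = "%schema%";
var DOCUMENT_VAR = "%document%";

// Replace every occurrence of each placeholder with a quoted path.
// split/join is used so that characters in the paths are never
// interpreted as regular-expression or replacement patterns.
function expandCommand(template, schemaPath, documentPath) {
  function quote(path) {
    return '"' + path + '"';
  }
  return template
    .split(SCHEMA_VAR).join(quote(schemaPath))
    .split(DOCUMENT_VAR).join(quote(documentPath));
}

// "mytool" and its flags are made up for the example.
var commandLine = expandCommand(
  "mytool -s %schema% -i %document%",
  "C:\\schemas\\rules.xsl",
  "C:\\docs\\lesson.xsd"
);
```

Quoting the substituted paths keeps a path that contains spaces together as one command-line argument.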

Neither of us saw a significant need to guarantee the order of validation at this time, so we left that
off the list. However, our final implementation could easily be modified for strict adherence to an
order of validation commands.

Overview of Tools Used


XMLSPY
Altova's XMLSPY product is in its fifth successful version this year. XMLSPY is the leading XML
IDE and for most people doesn't need much further introduction.

The XMLSPY scripting environment, however, is less well known. Although tightly integrated
with XMLSPY, the scripting environment is a separate Visual Basic-like tool used for forms-based
customizations. Scripting projects have a choice between JavaScript and VBScript - we used
JavaScript for this article, but either language offers a rich set of Windows components in addition
to the XMLSPY COM API.

XMLSPY's API has three main concerns:

• The application object model (projects, views, documents, etc.)
• XML and file-processing functions (validate, save, generate schema, etc.)
• A DOM-friendly XML object model

We'll get into each of these areas and also use the scripting environment's forms and
event-handling abilities. In addition, we used Windows Script Host objects to interface with the
file system and command-line processes.

Schematron
Schematron is a lightweight language wrapped around XPath. There are several Schematron tools
freely available on the Internet. Of these we chose zvonSchematron (available at www.zvon.org).

Simply put, zvonSchematron is an XSLT file. Using zvonSchematron is a two-step process. After
you have created your Schematron schema, first transform it using the zvonSchematron.xsl file and
your favorite XSLT processor - we used Altova's stand-alone XSLT Engine.

Then take the resulting XSLT file and apply it to your XML document. The final outcome is
usually an HTML file that itemizes any lack of conformance to your schema. As you will see,
within our framework it made more sense to dispense with the HTML and just output single lines of
text. To avoid distraction we made that modification to the XSLT output of zvonSchematron, rather
than modifying the generator.
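
The two-step process can be sketched as two calls to an XSLT processor. The sketch below is an illustration, not the authors' script: runXslt is a hypothetical caller-supplied function that wraps whatever XSLT processor is installed, and every file name other than zvonSchematron.xsl is made up.

```javascript
// Step 1 compiles the Schematron schema into a validating stylesheet;
// step 2 applies that stylesheet to the document being checked.
// `runXslt(stylesheet, input, output)` is a hypothetical wrapper around
// an XSLT processor, supplied by the caller.
function schematronPasses(runXslt, paths) {
  runXslt(paths.generator, paths.schematron, paths.compiled);
  runXslt(paths.compiled, paths.document, paths.report);
  return paths.report;
}

// Demonstration with a stand-in that only records the calls it receives.
var calls = [];
function recordOnly(stylesheet, input, output) {
  calls.push(stylesheet + " + " + input + " -> " + output);
}

schematronPasses(recordOnly, {
  generator: "zvonSchematron.xsl",
  schematron: "rules.sch",   // hypothetical Schematron schema
  compiled: "rules.xsl",     // stylesheet produced by step 1
  document: "lesson.xsd",    // hypothetical document to validate
  report: "report.txt"       // where step 2 writes its findings
});
```

Because the compiled stylesheet depends only on the Schematron schema, step 1 needs to be repeated only when the rules change.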

Bridging the XMLSPY-Schematron disconnect


Out of the box, XMLSPY does not have special support for Schematron or multipass validation.
This lack of support is not a big surprise; most XML tools also do not support either concept.
What's good about XMLSPY is that its API makes it simple to add the capability we need using its
scripting environment, as we did, or as an XMLSPY plug-in.

We created a framework for executing a set of command-line processes associated with a file and
capturing the results. We stored the validation commands associated with an XML document in
processing instructions (PI) within that document. Each PI holds a validation command name
(mapped to an actual command by the user) and a schema name mapped to the path to a schema
document.
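
As a rough illustration of this storage scheme, the following JavaScript sketch collects command/schema pairs from a document's text. It assumes the PI is written with XML-style pseudo-attributes, as in <?validate command="..." schema="..."?>; that layout and the function names are assumptions, and a real XMLSPY script would more likely walk the document through the XMLSPY object model than use regular expressions.

```javascript
// Assumed PI layout: <?validate command="..." schema="..."?>
// Returns one { command, schema } object per validate PI found.
function readValidatePIs(xmlText) {
  var pairs = [];
  var piPattern = /<\?validate\s+([\s\S]*?)\?>/g;
  var match;
  while ((match = piPattern.exec(xmlText)) !== null) {
    pairs.push({
      command: readPseudoAttribute(match[1], "command"),
      schema: readPseudoAttribute(match[1], "schema")
    });
  }
  return pairs;
}

// Pull name="value" out of the PI data; returns null when absent.
function readPseudoAttribute(piData, name) {
  var found = new RegExp('(?:^|\\s)' + name + '="([^"]*)"').exec(piData);
  return found ? found[1] : null;
}
```

Both values are names, not paths: each is looked up later in the user's own command and schema mappings.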

Another common approach we did not take is to store Schematron in the appinfo elements of an
XSD schema. We talked about taking this approach but decided that it had at least two drawbacks -
it required a tighter coupling of XSD to Schematron than we wanted (remember that at Classwell
the same Schematron rules validate a class of XSD), and we felt it would be a more difficult
implementation to present in a short article.

The XMLSPY Scripts


We needed the following macros:

• SetupCommand: Sets up a named command in a commands file
• RemoveCommand: Removes a named command from the commands file
• SetupSchema: Sets up a named schema in a schemas file
• RemoveSchema: Removes a named schema from the schemas file
• AddValidation: Adds a command/schema pair to an instance document as a PI
• RemoveValidation: Removes a command/schema pair from an instance document
• Validate: Validates using the set of command/schema pairs found in the validate PI within the
current document and reports the validation results

These macros drive forms and global functions. The user will run a macro by selecting its name on
XMLSPY's Tools menu. The last one, Validate, will not have its own front-end form, but it will
result in one of two simple forms popping up at the end of the validation process to report the
outcome. After our project is complete, under a simple standard configuration which we will
introduce, XMLSPY will offer each macro as a selection on the Tools menu without users having to
do any further setup.

To facilitate our current implementation and future script reuse we located much of the
programmatic work in global functions. This approach is a straightforward means of increasing
readability. What is less well known is that XMLSPY scripting projects (.prj files, usually kept in
the XMLSPY install directory) can be easily shared. So while the code in this article may not be
quite ready for the corporate XMLSPY script library, we have provided it in a format that makes it
easy for you to generalize it for reuse in your own projects.

Creating the macros


The best way to start is by stubbing out the macros. Open XMLSPY's scripting environment by
clicking on the Tools menu and selecting "Switch to Scripting Environment...". In the scripting
environment on the Project menu under the Modules folder you should see three items. Right-click
on the item named "(XMLSpy Macros)" and select "Add Function".

In the New Function dialog type "Validate" and click "OK". If you again right-click on "(XMLSpy
Macros)" and this time select "View Code" you see the code window on the right-hand side of the
application. In the top right-hand corner of the code window there is a pull-down menu that shows
the names of the available macros. "Validate" should be among those you see listed.

Each macro is simple. In general, they do three things: call a setup function, open a form, and call
an error report function. The setup function is responsible for clearing global variables and making
sure the macros are visible on the Tools menu. Each macro's form handles display and invokes
global functions that do the heavy lifting. The error reporting call pops up a dialog if an error
condition is present. In general the code you need to get this functionality is:

showMacros();
Application.ShowForm("form_name");
reportError();

However, the Validate macro you just created is the exception in that it does not have its own form,
so replace "Application.ShowForm("form_name");" with a call to a global function "validate".

Double-click on "(Global Declarations)" to bring up the code view of the global code space. Any
functions or variables declared in this area are available to all macros, forms, and event handlers.

For now, just stub out a validate function like this:

function validate() {
//
// validation code goes here.
//
}

At the same time add a stub of the showMacros and reportError functions.

Now add the other macros we need using the same process. For each one add the three lines using
the form name given here:

• Setup Command: setupValidationCommand
• Remove Command: removeValidationCommand
• Setup Schema: setupValidationSchema
• Remove Schema: removeValidationSchema
• Add Validation: addValidation
• Remove Validation: removeValidation

Building the forms


Creating the six forms you need is straightforward. Click on the Project menu and select "Add
Form". A blank form will appear with a name something like "Form1". If the properties sheet is not
open, click on the Layout menu and select "Properties...".

We kept track of validation commands and schemas in a pair of text files. Commands and schemas
are managed separately so they can be more easily reused, thereby saving the user from having to
retype the same command or file path. Each command or schema is listed on a line as a name-value
pair. The name is used instead of the value in the forms and PI for readability.
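
A minimal JavaScript sketch of reading and writing such a file follows. The line format name=value is an assumption (the separator is not stated here), and parseEntries and serializeEntries are hypothetical names.

```javascript
// Assumed line format: name=value, one entry per line.
// Only the first "=" separates name from value, so a command line that
// itself contains "=" survives intact.
function parseEntries(fileText) {
  var entries = [];
  var lines = fileText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var cut = lines[i].indexOf("=");
    if (cut < 1) continue; // skip blank lines and lines without a name
    entries.push({
      name: lines[i].substring(0, cut),
      value: lines[i].substring(cut + 1)
    });
  }
  return entries;
}

// Turn the entries back into file text, one name=value pair per line.
function serializeEntries(entries) {
  var lines = [];
  for (var i = 0; i < entries.length; i++) {
    lines.push(entries[i].name + "=" + entries[i].value);
  }
  return lines.join("\r\n");
}
```

The same pair of functions would serve both the commands file and the schemas file, since the two share one layout.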

Each of the forms for validation commands is similar to the analogous one for schemas. We will
walk through the command forms together and leave you to create the matching schema forms
independently (of course, you can always download and use our code, available at
www.sys-con.com/xml/sourcec.cfm). Start with the Setup command. Click on the (Form Code) property of
your new form and enter "setupValidationCommand". In the Title property enter "Setup Validation
Command".

Now you need some form elements. To make sure you have access to the widgets you need click on
the View menu, select "Toolbars" and then "Object Bar". Drag two buttons onto the form. Double-
click on their text and change one to "Cancel" and one to "OK". In the property sheet of each set the
Button Type property to "Cancel" and "OK" respectively.

Next drag two text boxes onto the form. One of these will take the name of a command. The other
will take the command itself, the value stored against that name in the commands file. At this point
the form's look and feel is essentially complete (see Figure 1).

To wire the form's elements together you need to add some code. Click on the Property Sheet's
Events tab, and then click on the form's Cancel button. On the Property Sheet, next to "EventClick"
where it says "(None)" click the mouse. A code dialog opens giving you a way to associate code
with a mouse-click event on that button. The only behavior we need from this button is for it to
close the form. Add the following line and click "OK":

TheView.Cancel();

The form's OK button will do a bit more work. Click on that button and then click on the same
place in the Property Sheet's Events tab. The form needs to call an addCommand function with the
Text properties of EditBox1 and EditBox2 as the arguments, as shown in Figure 2.

Let's keep things focused on the forms and macros for now. Just as you did for "validate" stub out
an addCommand function in the global area. You will return to the global functions later.

Moving right along, create a second form and name it "removeValidationCommand". This form
removes a named command from the commands file. Like the add form it needs two buttons, but in
addition to those it only needs a pull-down select element. The select is filled from a global function
that returns an array of command names. Add this code to the element's EventInitalize area:

var cmds = getCommands();
// Fill the pull-down with one entry per command name.
for ( var i = 0; i < cmds.length; i++ ) {
    This.AddItem( cmds[i], i );
}

Then in the OK button's EventClick area add this code:

removeCommand( ComboBox1.GetText( ComboBox1.Selection ) );
TheView.Cancel();

Again stub out the getCommands and removeCommand functions in the global area.
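
For readers who want something slightly fuller than empty stubs while testing the forms, here is a minimal sketch of the three command functions. It keeps the entries in an in-memory array instead of the commands file, and it replaces an existing entry when a name is added twice; both choices are assumptions made for the illustration.

```javascript
// In-memory stand-in for the commands file; each entry is a name-value pair.
var commandStore = [];

// Add a named command, replacing any existing entry with the same name
// (the replace-on-duplicate behavior is an assumption).
function addCommand(name, value) {
  removeCommand(name);
  commandStore.push({ name: name, value: value });
}

// Remove the entry with the given name; returns true if one was removed.
function removeCommand(name) {
  for (var i = 0; i < commandStore.length; i++) {
    if (commandStore[i].name === name) {
      commandStore.splice(i, 1);
      return true;
    }
  }
  return false;
}

// Return just the names, in stored order, for the form's pull-down.
function getCommands() {
  var names = [];
  for (var i = 0; i < commandStore.length; i++) {
    names.push(commandStore[i].name);
  }
  return names;
}
```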

The schema forms are basically the same as these command forms. Create them with the same
steps, remembering to give their functions the appropriate names.

Of the four remaining forms, the two reporting forms are also virtually identical. reportError lists
any error conditions found during a macro run. reportValidation shows the output, positive or
negative, for each validation command. If a document is valid all commands will be listed as
passed, otherwise this form will show the issues.

Starting with the reportError form, add an OK button and a list box. The list box shows each section
of the validationError string on its own line. We decided to add ";" between each section of the
message. In the EventInitalize area of ListBox1 you will break the string on that character with this
code:

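// Nothing to show if no error was recorded.
if ( validationError == null ) {
    return;
}
// Show each ";"-separated section on its own line.
var split = validationError.split( ";" );
for ( var i = 0; i < split.length; i++ ) {
    This.InsertString( i, split[i] );
}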

Create the reportValidation form in the same way. Then add the two report variables to the global
area.
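
One way the global report strings might be declared and filled is sketched below. Only validationError and the ";" separator come from the text above; the name validationReport, the helper appendSection, and the body of printError are assumptions. The sketch also uses the standard message property of Error rather than the JScript number/description form.

```javascript
// Global report strings; null means "nothing to report".
// validationReport is a hypothetical name for the second variable.
var validationError = null;
var validationReport = null;

// Append one section to a ";"-separated message string.
function appendSection(existing, section) {
  return existing === null ? section : existing + ";" + section;
}

// Record an error raised in `source` (a sketch of a printError helper).
function printError(source, error) {
  validationError = appendSection(validationError, source + ": " + error.message);
}

printError("addValidation(onClick)", new Error("no document is open"));
printError("validate", new Error("command not found"));
```

Since the list box splits on ";", a message that itself contains a semicolon would be shown on two lines; a real implementation would need to escape or replace that character.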

The final two forms you need are similar to what you have already created.
addValidation and removeValidation are responsible for adding and removing PIs from the current
document. Each PI holds the name of one validation command and one schema. When the validate
function is called, all of the validate PIs in the current document are collected and each command is
attempted. The forms look as shown in Figure 3.
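
The collect-and-attempt loop can be sketched as follows. This is an illustration under assumptions, not the article's validate function: run is a hypothetical caller-supplied function that executes one command and returns its text output, and an empty output is taken to mean the document passed.

```javascript
// Run every command/schema pair and collect one result line per pair.
// `commands` and `schemas` map names to values; `run` is a hypothetical
// caller-supplied function that executes a command and returns its output.
function runValidations(pairs, commands, schemas, documentPath, run) {
  var results = [];
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i];
    if (!Object.prototype.hasOwnProperty.call(commands, pair.command)) {
      results.push(pair.command + ": unknown command");
      continue;
    }
    var output = run(commands[pair.command], schemas[pair.schema], documentPath);
    // Assumption: empty output means the validation passed.
    results.push(pair.command + ": " + (output === "" ? "passed" : output));
  }
  return results;
}

// Demonstration with a stand-in runner that always reports success.
var results = runValidations(
  [
    { command: "schematron", schema: "xsdRules" },
    { command: "missing", schema: "xsdRules" }
  ],
  { schematron: "check" },
  { xsdRules: "rules.xsl" },
  "lesson.xsd",
  function (command, schemaPath, documentPath) { return ""; }
);
```

An unknown command name is reported and skipped, so one stale PI does not stop the remaining validations.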

The main new features of these forms are the following items. In the list box on removeValidation,
the EventInitalize area holds:

var names = getPIValues("validate");
// List every validate PI found in the current document.
for ( var i = 0; i < names.length; i++ ) {
    This.AddString( names[i] );
}

so you need to stub out a getPIValues function. The OK button's EventClick needs:

var number = ListBox1.Selection;
// Only look up the text when an item is actually selected.
if ( number != -1 ) {
    var text = ListBox1.GetText( number );
    removePIByMatch( text );
}
TheView.Cancel();

Again, stub out the new functions called. Over on the addValidation form, the OK button's
EventClick has:

// Read both selections once.
var command = ComboBox1.GetText( ComboBox1.Selection );
var schema = ComboBox2.GetText( ComboBox2.Selection );

// XMLSPY's native validation cannot be paired with a schema here.
if ( command == xmlspyValidation ) {
    if ( schema != xmlspyValidation && schema != "" && schema != null ) {
        printError( "addValidation(onClick)", new Error( 0,
            command + " can not be matched to: " + schema ) );
        TheView.Cancel();
        return;
    }
}

writePI( "validate", "command", command, "schema", schema );
TheView.Cancel();

What is happening is a check to see if the user is attempting to pair XMLSPY native validation
with a schema. This combination is not permitted because XMLSPY makes that association through
a different channel, and changing that behavior is outside the scope of this article (although it is
quite feasible to programmatically associate a schema with a document for the XMLSPY validator
on the fly). If that mismatch is not present, the form calls writePI. The PI written will have a
name of "validate" and two name-value pairs. The first named value is called "command" and holds
the validation command name. The second is "schema" and holds the schema name.
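
To make the shape of that PI concrete, here is a minimal JavaScript sketch that builds the PI text and places it in a document held as a string. The pseudo-attribute layout and the names buildPI and insertPI are assumptions; this is not the writePI implementation, which would work on the open document in XMLSPY.

```javascript
// Build a PI string from a target name followed by alternating
// pseudo-attribute names and values, mirroring the writePI argument order.
// Assumed layout: <?target name1="value1" name2="value2"?>
function buildPI(target) {
  var text = "<?" + target;
  for (var i = 1; i + 1 < arguments.length; i += 2) {
    text += " " + arguments[i] + '="' + arguments[i + 1] + '"';
  }
  return text + "?>";
}

// Place the PI after the XML declaration, or at the top if there is none.
function insertPI(xmlText, pi) {
  var declaration = /^<\?xml\s[^?]*\?>\s*/.exec(xmlText);
  var at = declaration ? declaration[0].length : 0;
  return xmlText.substring(0, at) + pi + "\n" + xmlText.substring(at);
}
```

The sketch does no escaping, so a name containing a double quote or the sequence "?>" would produce a malformed PI.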

After you finish these forms and stub out all the functions you called, you are done with the visual
side of things. You should be able to call your macros from the Tools menu of XMLSPY and see
your forms appear.

Looking Ahead
In the second part of the article we will give a quick tour of XMLSPY's core XMLData object as
you finish up the global functions. Then we'll turn to the XML side of things. You will see how to
implement the work group XSD design rules we outlined above as a Schematron schema. More
importantly, you will see how the framework you are creating can help you apply those rules as a
part of your regular development process.

© 2008 SYS-CON Media Inc.

http://xml.sys-con.com/read/40656_p.htm 5/24/2008
