Working With XML Data: Optional
Working With XML Data: Optional
Working With XML Data: Optional
Module Objetives
After completing this topic, you should be able to:
■ Describe the XML stage
2
© 2013 IBM Corporation
Information Management
XML Document
■ XML syntax rules
‒ Document has exactly one root element
‒ Each start tag is matched by one end tag
‒ All elements are properly nested
‒ Attribute values are quoted
‒ XML elements are case sensitive
‒ Disallowed characters are not used in tags or values
■ DTD or Schema
‒ Elements and attributes that must or might be included, and their permitted
structure
‒ The structure is specified by a regular expression syntax
■ Well-formed document
‒ Follows XML syntax rules
■ Valid document
‒ Well-formed document
‒ Follows the rules defined in its DTD or schema
3
© 2013 IBM Corporation
Information Management
XML Transformation
4
© 2013 IBM Corporation
Information Management
5
© 2013 IBM Corporation
Information Management
XML Stage
■ Can be used as source, target or transformer
■ Parse
‒ Parse incoming XML files or messages and create hierarchical data
representation
■ Compose
‒ Convert hierarchical or relational records to XML file or message
■ Transformer
‒ Performs transformation on XML data using a set of hierarchical operations
➢ Aggregate, Sort, Regroup, etc.
6
© 2013 IBM Corporation
Information Management
Assembly Editor
■ Performs actual design combining a set of steps or operations on XML
data
Input/Output
Palette links
Input Step
Steps
Output Step
7
© 2013 IBM Corporation
Information Management
8
© 2013 IBM Corporation
Information Management
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name=" department"> Complex types
<xs:complexType> are simplified
<xs:sequence>
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="employee_name" type="xs:string"/>
<xs:element name="age" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType> Primitive Types
</xs:element>
</xs:schema>
9
© 2013 IBM Corporation
Information Management
10
© 2013 IBM Corporation
Information Management
XML Transformation
■ XML job design is XML schema driven
■ It starts with an XML schema document (XSD files) imported in schema
library manager
‒ Schema specifies the basic structure of incoming XML file or message
format, it also specifies the basic constraints on the XML document
‒ Schema is converted to internal hierarchical schema tree representation and
presented to end user for job design
■ Design the corresponding operations that fit business requirements
‒ Parsing, sorting, aggregation, composing, etc.
11
© 2013 IBM Corporation
Information Management
XML Parser
■ Parse incoming XML files or
messages and create hierarchical
data representation based on
given schema
■ The XML file can be read in 3
ways
‒ String Set
➢ Select a column from an input link
of the XML stage or an input
schema item from previous parser
steps
‒ Single File
➢ Specify the file path to be
processed
‒ File Set
➢ Select a column from an input link
of the XML stage. The selected
column contains the path name of
an XML file to be processed
12
© 2013 IBM Corporation
Information Management
Input Step
■ Used to map a relational data structure to a
hierarchical data structure
■ Activated when an Input link is defined
■ Output from the input step is used in further
steps
■ Each input link is transformed into a child list of
the InputLinks node
■ Two views
‒ Links view
➢ Same as columns tab of a
Stage
‒ Tree view
➢ Hierarchical view
13
© 2013 IBM Corporation
Information Management
Click to select
14
© 2013 IBM Corporation
Information Management
XML Validation
■ The XML file is validated against the schema defined in the Document
Root tab using validation rules
■ Validation rules
‒ Value validation
➢ Checks the actual values in the XML document against the defined schema types
‒ Structure validation
➢ Checks that the XML
document conforms to
the schema specified in
the document root
‒ Consists of a condition
and an action
➢ Condition
Validation check
➢ Action
How to handle the
error when the
check fails
15
© 2013 IBM Corporation
Information Management
■ Strict validation
‒ Job aborts for any type of
validation error
‒ The action for the validation
rules are set to Fatal
‒ Default validation rule selection
in the XML Composer
16
© 2013 IBM Corporation
Information Management
chunk
17
© 2013 IBM Corporation
Information Management
18
© 2013 IBM Corporation
Information Management
XML Stage
19
© 2013 IBM Corporation
Information Management
20
© 2013 IBM Corporation
Information Management
Output Step
■ Used to map a hierarchical data structure to a relational data structure
■ Output step is activated when output link defined
■ Simple rules
‒ A list is mapped to other list
‒ Groups are not mapped
‒ Parent item is mapped before child
■ Columns can be viewed in Output Tab
21
© 2013 IBM Corporation
Information Management
Map Source
to target
22
© 2013 IBM Corporation
Information Management
XML Composer
■ Creates an XML file or message based on a pre-defined schema
■ Allows the user to control validation types
■ The Header tab allows the user to add comments, processing
instructions and XML declaration
■ Three target options
‒ Write the output to a file
‒ Write the output as a
XML string
‒ Write the output as Large
object and pass the
reference from this stage
to downstream
23
© 2013 IBM Corporation
Information Management
Select root
element
Composer
elements
24
© 2013 IBM Corporation
Information Management
25
© 2013 IBM Corporation
Information Management
26
© 2013 IBM Corporation
Information Management
27
© 2013 IBM Corporation
Information Management
28
© 2013 IBM Corporation
Information Management
XML Operations
■ Regroup
■ H-Join
■ Sort
■ Switch
■ Aggregation
■ Union
■ H-Pivot
■ V-Pivot
■ OrderJoin
29
© 2013 IBM Corporation
Information Management
Supporting Slides
30
Information Management
Regroup Step
▪ Creates a parent-child hierarchy from a single list
▪ Removes redundancy in data by allowing the user to put the
repeating items into a parent list
▪ For best performance, the
data coming into Regroup
should be pre-sorted
– Unsorted data decreases
performance and adds
significant memory
requirement
– It is recommended to use Regroup
DataStage sort if possible
31
© 2013 IBM Corporation
Information Management
Drag parent
& child items
32
© 2013 IBM Corporation
Information Management
33
© 2013 IBM Corporation
Information Management
H-Join Transformation
▪ Transforms the items from two lists into a single list
▪ The Output will have the child list placed within the parent
list
HJoin
34
© 2013 IBM Corporation
Information Management
H-Join Configuration
▪ Disk based optimization is recommended for large input
data
35
© 2013 IBM Corporation
Information Management
Switch Step
▪ Used to filter the input list into multiple output lists based on the
specified constraints
▪ Supported switch functions:
– between
– compare <Address>
<street>121 Main Street</street>
– contains <Address> <city>San Jose</city>
– equals <street>121 Main Street</street> <state>California</state>
36
© 2013 IBM Corporation
Information Management
Switch Configuration
Select Scope
37
© 2013 IBM Corporation
Information Management
Sort Step
▪ Sorts the items in a list as based on one or more sort keys
▪ For relational records, it is recommended to sort in
DataStage job flow rather than inside XML Stage
<integer>45</integer> <integer>25</integer>
<integer>41</integer> <integer>26</integer>
<integer>32</integer> <integer>29</integer>
<integer>25</integer> <integer>30</integer>
<integer>29</integer> <integer>31</integer>
<integer>40</integer> Sort <integer> in <integer>31</integer>
<integer>38</integer> <integer>32</integer>
ascending order
<integer>35</integer> <integer>35</integer>
<integer>26</integer> <integer>36</integer>
<integer>31</integer> <integer>38</integer>
<integer>30</integer> <integer>40</integer>
<integer>36</integer> <integer>41</integer>
<integer>31</integer> <integer>45</integer>
38
© 2013 IBM Corporation
Information Management
Select Scope
39
© 2013 IBM Corporation
Information Management
Aggregation Step
▪ Performs aggregation on the items in a list
▪ Supported aggregation functions
– average
– concat
– count <root>
– first <integer>1001</integer>
<integer>-3456</integer>
– last <integer>23453</integer> <root>
Aggregate with
– maximum <integer>32767</integer> function “First”
<integer>1001</integer>
</root>
– minimum <integer>-32768</integer>
– sum <integer>-234</integer>
<integer>7932</integer>
– variance </root>
40
© 2013 IBM Corporation
Information Management
Aggregation Configuration
Select List to
aggregate
Select Scope
41
© 2013 IBM Corporation
Information Management
Union Step
▪ The Union step combines 2 different lists
▪ Target list has to be pre-imported in the library (Union Type)
▪ Use two parser steps to read each document (list)
<tns:EmployeeInfo employeeID="B6540" departmentID="A100"> <prn:employee employeeID="B6540" departmentID="A100">
<EMP_Name> <name>
<firstName>Cynthia</firstName> <firstName>Cynthia</firstName>
<middleName>P</middleName> <middleName>P</middleName>
<lastName>Donald</lastName> <lastName>Donald</lastName>
</EMP_Name> </name>
<gender>female</gender> <gender>female</gender>
<DOB>1987-01-17</DOB> <dateOfBirth>1987-01-17</dateOfBirth>
<title>Miss</title> <title>Miss</title>
<hireDate>2000-07-25</hireDate> <hireDate>2000-07-25</hireDate>
</tns:EmployeeInfo> </prn:employee>
Union
<prn:employee employeeID="A8990" departmentID="A100">
<tns:Dept_employee employeeID="A8990" departmentID="A100">
<name>
<name>
<firstName>Zen</firstName>
<Emp_firstName>Zen</Emp_firstName>
<middleName>P</middleName>
<Emp_middleName>P</Emp_middleName>
<lastName>Wright</lastName>
<Emp_lastName>Wright</Emp_lastName>
</name>
</name>
<gender>male</gender>
<gender>male</gender>
<dateOfBirth>1980-04-04</dateOfBirth>
<dateOfBirth>1980-04-04</dateOfBirth>
<title>Mr</title>
<title>Mr</title>
<hireDate>2008-07-11</hireDate>
<hireDate>2008-07-11</hireDate>
</prn:employee>
</tns:Dept_employee>
42
© 2013 IBM Corporation
Information Management
Union Configuration
▪ Select target schema as Union Type
▪ Map Source to target Select
Mappings Tab
Map Source
to target list
43
© 2013 IBM Corporation
Information Management
H-Pivot Step
▪ Converts columns to a single hierarchical column
H-Pivot
List
44
© 2013 IBM Corporation
Information Management
H-Pivot Configuration
Select Scope
Add columns
to convert
45
© 2013 IBM Corporation
Information Management
V-Pivot Step
▪ Spread values from a column to different columns according
to a key value.
New columns
Column
contains “car”
& “Meals”
values
46
© 2013 IBM Corporation
Information Management
V Pivot Configuration
Select column
with types
47
© 2013 IBM Corporation
Information Management
48
© 2013 IBM Corporation
Information Management
49
© 2013 IBM Corporation
Information Management
XML Mapper
▪ Runs on Engine node only,
Menu->Tools Stage
Custom UI no interaction to service and
Web 2.0 Clients
repository tiers
XML Metadata
Importer
Assembly
Editor
▪ Streaming and event driven
UI UI
Restful Services
XML Tables
DataStage jobs PX Engine
E2 Connector
XMeta PX Jobs
50
© 2013 IBM Corporation
Information Management
51
© 2013 IBM Corporation
Information Management
52
© 2013 IBM Corporation