SG4 - IPT 101 Data Mapping and Exchange


Study Guide in IPT 101 (Integrative Programming and Technologies) Module 4 Data Mapping and Exchange

Module No. 4
MODULE TITLE
DATA MAPPING AND EXCHANGE

MODULE OVERVIEW

This module discusses the concepts of integrating and translating real-world data using XML. It provides an
overview of schema mapping and describes the different challenges in integrating information.

LEARNING OBJECTIVES

At the end of the module, the student is expected to:


- Use DTD to create a document definition for a data structure
- Define the term metadata
- Understand how XML and the document object model can be used to integrate and exchange data between
systems.

LEARNING CONTENTS

DATA INTEGRATION, MAPPING AND EXCHANGE CONCEPTS

A. Information Integration
- Data may reside at several different sites in several different formats (relational, XML, …)
- Applications need to access and process all these data.
- Growing market of enterprise information integration tools:
o About $1.5B per year; 17% annual rate of growth
o Information integration consumes 40% of the budget of enterprise information technology
shops.
- The following figure shows the existing Information Integration products.

- The software companies are categorized as Challengers, Leaders, Niche Players, or
Visionaries. You may visit The Top 28 Data Integration Software Solutions, Tools Vendor
Directory (solutionsreview.com) to check the available data integration and application
integration solutions.
- Information Integration has two facets: Data Integration (aka Data Federation)
and Data Exchange (aka Data Translation).

PANGASINAN STATE UNIVERSITY 1



B. Data Integration (Data Federation): refers to the process and technologies for data movement
from Source Data Systems to Target Data Systems. On its way, data are usually transformed to fit
business requirements.

[Figure: Data Producers -> Source -> Data Integration -> Target -> Data Consumers]

- On its way from data producers to data consumers, data can go through multiple “source-to-target”
layers (e.g., from operational sources into staging area into data warehouse, etc.)

Data Integration Scenarios

a. Data Integration Scenarios – ETL. ETL (Extract, Transform, Load) refers to the data
integration approach where data are extracted from the Source Systems, go through a
transformation process, and are then loaded into the Target System. The ETL scenario is
typical for data warehouse systems.
b. Data Integration Scenarios – ELT. ELT (Extract, Load, Transform) refers to the data
integration approach where data are extracted from the Source Systems and loaded into the
Target System without any transformation. The transformation is then performed inside the
Target System. The ELT scenario is typical for big data / Hadoop systems.
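
The difference between the two scenarios is only where the transformation step runs. The following is a minimal Python sketch of that idea, not a real integration tool: the lists and dictionaries stand in for the source and target systems, and the names `source_rows`, `transform`, `run_etl`, and `run_elt` are made up for illustration.

```python
# Hypothetical in-memory "systems": plain Python lists stand in for databases.
source_rows = [
    {"name": " alice ", "amount": "10.50"},
    {"name": "BOB", "amount": "3"},
]


def transform(row):
    """Clean one raw row so that it fits the target's requirements."""
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}


def run_etl(source):
    """ETL: transform on the way, so the target only ever holds clean rows."""
    extracted = list(source)                          # Extract
    transformed = [transform(r) for r in extracted]   # Transform
    target = []
    target.extend(transformed)                        # Load
    return target


def run_elt(source):
    """ELT: load the raw rows first, then transform inside the target."""
    target = {"raw": [], "clean": []}
    target["raw"].extend(source)                              # Extract + Load
    target["clean"] = [transform(r) for r in target["raw"]]   # Transform in target
    return target


etl_target = run_etl(source_rows)
elt_target = run_elt(source_rows)
print(etl_target)
print(elt_target["clean"])
```

Both runs end with the same clean rows; the ELT target additionally keeps the untransformed copy, which is one reason that approach suits systems with cheap storage.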

You may view a short discussion of data integration in the YouTube video "Data Management - Data
Integration".
Data Integration Process
a. Data integration is closely related to application development (applications that move data
from source to target). Hence, the data integration process should be part of the SDLC (System
Development Life Cycle).
b. The SDLC refers to the process of planning, creating, testing, and deploying an
information system.

[Figure: SDLC phases: Planning -> Creating -> Testing -> Deploying]

Data Integration Technology Support


- Data Integration tools key features include:
a. Ability to perform real-time and batch processing


b. Ability to perform change data capture (identify changed records)


c. Ability to perform powerful transformation on structured and unstructured data
d. Ability for comprehensive error handling operations
C. Data Exchange
- Data exchange transforms data structured under a source schema into data structured under a different target schema.
- Data comes in a variety of formats:
o Plain text
o JSON
o XML
o CLR Object
- We can transform these data using XML.

[Figure: data formats that can be exchanged through XML: EDI, plain text, JSON, CLR object]
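
As one illustration of translating between these formats, the sketch below converts a small JSON object into XML using only Python's standard library. It assumes a flat object whose keys are legal XML element names; the function name `json_to_xml` and the sample data are made up for this example.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text, root_tag):
    """Convert a flat JSON object into an XML element with one child per key."""
    data = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)   # assumes key is a legal element name
        child.text = str(value)
    return root


source = '{"to": "Tove", "from": "Jani", "heading": "Reminder"}'
note = json_to_xml(source, "note")
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)
```

Nested objects and arrays would need a recursive version and a decision on how to name repeated elements, which is exactly the kind of choice a mapping specification has to record.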

Data Mapping
• Defines how the source corresponds to the destination.
o E.g., object-relational mapping (ORM) defines how objects correspond to relational models
• Source and destination formats must be defined.
• Data schemas useful for mapping
o CLR object structure
o XSD
o Written specification
• Data Formats:
o Extensible Markup Language (XML)
▪ Widely accepted as a universal data format
▪ Human readable
▪ API Support:
• System.Xml
• System.Xml.Linq
▪ Navigable via XPath
▪ Serializable to / from CLR Objects
o XML Schema Definition (XSD)
▪ Defines the structure of XML file
▪ Typically autogenerated from XML file (VS, XmlSpy) or CLR object
▪ Some hand-tuning may be necessary.
o Extensible Stylesheet Language Transformations (XSLT) and XQuery
▪ XSLT defines transformations of XML files to
• Other XML files
• Text files
▪ XSLT stylesheets are themselves XML.
▪ XSLT 1.0 is supported by .NET, while XSLT 2.0 is not.
▪ XQuery is not built into .NET and is available only through third-party libraries.
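
A mapping can be as simple as a table that says which source field feeds which destination field. The sketch below, in Python rather than .NET, applies such a table to one XML record. The mapping `FIELD_MAP`, the function `map_record`, and the element names `fname` and `lname` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: source element name -> target element name.
FIELD_MAP = {"fname": "firstname", "lname": "lastname"}


def map_record(source_xml, target_root):
    """Build a target element by renaming source children according to FIELD_MAP."""
    source = ET.fromstring(source_xml)
    target = ET.Element(target_root)
    for src_tag, dst_tag in FIELD_MAP.items():
        value = source.findtext(src_tag)
        if value is None:
            raise ValueError(f"source is missing <{src_tag}>")
        ET.SubElement(target, dst_tag).text = value
    return target


source_xml = "<person><fname>John</fname><lname>Smith</lname></person>"
employee = map_record(source_xml, "employee")
mapped_text = ET.tostring(employee, encoding="unicode")
print(mapped_text)
```

Keeping the mapping in data rather than in code means it can be reviewed against the written specification and changed without touching the transformation logic.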

• YouTube Links: Data Exchange in Motion with Demo for Dawex, AWS, and AWS Part 2


XML DTD AND XML SCHEMA

A. How does an XML processor check your XML document?

1. It checks that the document is well-formed (it follows the XML syntax rules).
2. It checks that the document is valid (it follows the structure declared in an XML DTD
[Document Type Definition] or an XSD [XML Schema Definition]).

- Why do we need an XML validator? Errors in XML documents will stop your XML application.
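
The first of the two checks can be tried with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports syntax errors but does not validate against a DTD or XSD, so it covers well-formedness only. The helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<note><to>Tove</to><from>Jani</from></note>"
bad = "<note><to>Tove</to><from>Jani</note>"   # <from> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity needs a validating parser, which is outside the Python standard library.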

B. XML DTD
o An XML document with correct syntax is called "Well Formed".
o An XML document validated against a DTD is "Well Formed" and "Valid".
o The purpose of a DTD is to define the structure of an XML document and a list of legal
elements.
o A DTD can be a separate document, or it can be built into an XML document using the
<!DOCTYPE> declaration.

Example 1. An XML Document with a DTD (example1.xml)


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="css1.css"?>

<!DOCTYPE document
[ <!ELEMENT document (heading, message)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT message (#PCDATA)>
]>

<document>
<heading>Hello from XML</heading>
<message>This is an XML document!</message>
</document>

Example 2. An XML Document with a separate DTD (example2.xml)


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don’t forget me this weekend!</body>
</note>
FILE NAME: Note.dtd
(An external DTD file holds only the declarations; the <!DOCTYPE> line stays in the XML document.)
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Example 2 explanation
The DTD above is interpreted like this:
• !DOCTYPE note defines that the root element of the document is note
• !ELEMENT note defines that the note element contains four elements: "to, from, heading,
body"
• !ELEMENT to defines the to element to be of type "#PCDATA"


• !ELEMENT from defines the from element to be of type "#PCDATA"


• !ELEMENT heading defines the heading element to be of type "#PCDATA"
• !ELEMENT body defines the body element to be of type "#PCDATA"

Note: #PCDATA means parsed character data, that is, text that the XML parser will parse.

When NOT to Use a Document Definition?


• When you are working with small XML files, creating document definitions may be a waste of
time.

C. XML Schema
o XML schemas are another way of validating XML documents.
o The XML Schema language, also referred to as XML Schema Definition (XSD),
describes the structure of an XML document. An XML Schema:
o defines the legal building blocks (elements and attributes) of an XML document,
as a DTD does
o defines which elements are child elements
o defines the number and order of child elements
o defines whether an element is empty or can include text
o defines data types for elements and attributes
o defines default and fixed values for elements and attributes
o XML Schemas are widely used as a replacement for DTDs.
Here are some reasons:
i. XML Schemas are extensible to future additions
ii. XML Schemas are richer and more powerful than DTDs
iii. XML Schemas are written in XML
iv. XML Schemas support data types and namespaces
o The following are the basic building blocks of XML Schema
▪ XSD Simple Elements
▪ XSD Attributes
▪ XSD Complex Elements
▪ XSD Indicators

D. XSD Simple Element


o A simple element is an XML element that can contain only text.
o It cannot contain any other elements or attributes.
o XML Schema has a lot of built-in data types.
o The most common types are:
▪ xs:string
▪ xs:decimal
▪ xs:integer
▪ xs:boolean
▪ xs:date
▪ xs:time
o The syntax for defining a simple element is:
<xs:element name="xxx" type="yyy"/>

Example 3. Using XML Schema for Simple Elements


<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>

<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-02-27</dateborn>
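
To see what the declared types buy you, the sketch below mimics the type checks for the three elements of Example 3 in plain Python. It is only an approximation and not an XSD validator (the Python standard library has none): the real `xs:date` also allows time zones and other forms that this check rejects. The `<person>` wrapper element and the names `DECLARED`, `CHECKS`, and `invalid_elements` are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def check_string(text):
    """Any parsed text is accepted as xs:string in this sketch."""
    return True


def check_integer(text):
    """Approximate xs:integer: an optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def check_date(text):
    """Approximate xs:date: YYYY-MM-DD that is also a real calendar date."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


CHECKS = {"xs:string": check_string, "xs:integer": check_integer, "xs:date": check_date}

# Element name -> declared type, copied from the declarations in Example 3.
DECLARED = {"lastname": "xs:string", "age": "xs:integer", "dateborn": "xs:date"}


def invalid_elements(xml_text):
    """Return the tags of child elements whose text does not fit the declared type."""
    root = ET.fromstring(xml_text)
    return [c.tag for c in root if not CHECKS[DECLARED[c.tag]](c.text or "")]


ok = "<person><lastname>Refsnes</lastname><age>36</age><dateborn>1970-02-27</dateborn></person>"
wrong = "<person><lastname>Refsnes</lastname><age>thirty-six</age><dateborn>1970-02-30</dateborn></person>"
print(invalid_elements(ok))
print(invalid_elements(wrong))
```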


E. XSD Attributes
o Simple elements cannot have attributes.
o If an element has attributes, it is a complex type. But the attribute itself is always declared as
a simple type.
o The syntax for defining an attribute is:
<xs:attribute name="xxx" type="yyy"/>

Example 4. Using XML Schema for XSD Attributes


<lastname lang="EN">Smith</lastname>
<xs:attribute name="lang" type="xs:string"/>

o Attributes may have a default value or a fixed value specified.


<xs:attribute name="lang" type="xs:string" default="EN"/>
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
o Attributes are optional by default. To specify that the attribute is required, use the "use"
attribute:
<xs:attribute name="lang" type="xs:string" use="required"/>

F. XSD Complex Elements


o A complex element is an XML element that contains other elements and/or attributes.
o There are four kinds of complex elements:
▪ A complex XML element, "product", which is empty:
<product pid="1345"/>
▪ A complex XML element, "employee", which contains only other elements:
<employee>
<firstname>John</firstname>
<lastname>Smith</lastname>
</employee>
▪ A complex XML element, "food", which contains only text:
<food type="dessert">Ice cream</food>
▪ A complex XML element, "description", which contains both elements and text:
<description>
It happened on <date lang="EN">03.03.99</date>
</description>

How to Define a Complex Element using XML Schema


o Complex XML element, "employee", which contains only other elements:
<employee>
<firstname>John</firstname>
<lastname>Smith</lastname>
</employee>
o The "employee" element can be declared directly by naming the element:
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
o An empty complex element cannot have contents, only attributes.
<product prodid="1345" />
o It is possible to declare the "product" element more compactly:
<xs:element name="product">
<xs:complexType>
<xs:attribute name="prodid" type="xs:positiveInteger"/>
</xs:complexType>
</xs:element>


G. XSD Indicators
o Indicators control how elements are to be used in documents.
o Order indicators are used to define the order of the elements. They are:
▪ All
▪ Choice
▪ Sequence
o All Indicator:
▪ The <all> indicator specifies that the child elements can appear in any order, and
that each child element must occur only once:
<xs:element name="person">
<xs:complexType>
<xs:all>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:all>
</xs:complexType>
</xs:element>

o Choice Indicator:
▪ The <choice> indicator specifies that either one child element or another can occur:
<xs:element name="person">
<xs:complexType>
<xs:choice>
<xs:element name="employee" type="employee"/>
<xs:element name="member" type="member"/>
</xs:choice>
</xs:complexType>
</xs:element>

o Sequence Indicator:
▪ The <sequence> indicator specifies that the child elements must appear in a specific
order:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

XSL, XSLT, and XPATH


A. DEFINITION
o XSL stands for EXtensible Stylesheet Language.
▪ It is an XML-based Stylesheet Language.
▪ XSL describes how the XML document should be displayed
▪ XSL consists of three parts:
• XSLT - a language for transforming XML documents
• XPath - a language for navigating in XML documents
• XSL-FO - a language for formatting XML documents
B. XSLT
o XSLT stands for XSL Transformations.
o XSLT transforms an XML source-tree into an XML result-tree.
o XSLT can transform an XML document into another document type that a browser
recognizes, such as HTML or XHTML.
o With XSLT you can add or remove elements and attributes to or from the output file.
o You can also rearrange and sort elements, perform tests and make decisions about which
elements to hide and display, and a lot more.
o XSLT uses XPath to find information in an XML document.


o XPath is used to navigate through elements and attributes in XML documents.

- How Does it Work?


o In the transformation process, XSLT uses XPath to define parts of the source document
that should match one or more predefined templates.
o When a match is found, XSLT will transform the matching part of the source document into
the result document.
o All major browsers, such as Internet Explorer, Chrome, Firefox, Safari, and Opera, support
XML, XSLT, and XPath.
- Building Blocks of XSLT
o XSLT <xsl:stylesheet> Element
▪ defines that this document is an XSLT style sheet document (along with the version
number and XSLT namespace attributes).
o XSLT <xsl:template> Element
▪ An XSL style sheet consists of one or more sets of rules that are called templates.
▪ A template contains rules to apply when a specified node is matched.
▪ The <xsl:template> element is used to build templates.
o XSLT match attribute
▪ The match attribute is used to associate a template with an XML element.
▪ The match attribute can also be used to define a template for the entire XML
document.
▪ The value of the match attribute is an XPath expression (e.g., match="/" matches the
whole document).
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<!-- some output -->
</xsl:template>
</xsl:stylesheet>
o XSLT <xsl:value-of> Element
▪ used to extract the value of an XML element and add it to the output stream of the
transformation
o XSLT select attribute
▪ contains an XPath expression. An XPath expression works like navigating a file
system; a forward slash (/) selects subdirectories.
o XSLT <xsl:for-each> and <xsl:sort> Element
▪ The <xsl:for-each> element is used to loop through the XML elements and display all of
the records.
▪ The <xsl:sort> element is used to sort the output.
▪ To sort the output, simply add an <xsl:sort> element inside the <xsl:for-each>
element in the XSL file:
Example 5. XSLT Implementation
FILE 1: XML FILE
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<cd>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
…..
</catalog>
FILE 2: THE XSL STYLESHEET FILE
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>


<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32"> <th>Title</th> <th>Artist</th></tr>
<xsl:for-each select="catalog/cd">
<xsl:sort select="artist"/>
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

o XSLT <xsl:if> Element


▪ The <xsl:if> element is used to put a conditional test against the content of the XML
file.
o Syntax
<xsl:if test="expression">
...some output if the expression is true...
</xsl:if>
o To add a conditional test, add the <xsl:if> element inside the <xsl:for-each> element in the
XSL file.
o The value of the required test attribute contains the expression to be evaluated.

Example 6. Implementation of xsl:if element (use the previous xml file)


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
<th>Price</th>
</tr>
<xsl:for-each select="catalog/cd">
<xsl:if test="price &gt; 10">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
<td><xsl:value-of select="price"/></td>
</tr>
</xsl:if>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
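
The loop, sort, and test used in Examples 5 and 6 can be mirrored in ordinary code, which may help when reading the stylesheets. The sketch below is a Python analogue, not an XSLT processor: `findall` plays the role of `<xsl:for-each>`, `sorted` the role of `<xsl:sort>`, and the `if` the role of `<xsl:if test="price &gt; 10">`. Only the first CD comes from the XML file above; the other two records are made-up sample data.

```python
import xml.etree.ElementTree as ET

catalog_xml = """<catalog>
  <cd><title>Sample Album A</title><artist>Zed Artist</artist><price>12.50</price></cd>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd><title>Sample Album B</title><artist>Amy Artist</artist><price>8.00</price></cd>
</catalog>"""

catalog = ET.fromstring(catalog_xml)   # the <catalog> root element

rows = []
# for-each select="catalog/cd": catalog is already the root here, so select its cd children.
# sort select="artist": order the selected nodes by the text of their artist child.
for cd in sorted(catalog.findall("cd"), key=lambda node: node.findtext("artist")):
    # if test="price > 10": keep only the CDs that cost more than 10.
    if float(cd.findtext("price")) > 10:
        # value-of select="...": pull out the text of each child element.
        rows.append((cd.findtext("title"), cd.findtext("artist"), cd.findtext("price")))

for row in rows:
    print(row)
```

The sort here is a plain string comparison; an XSLT processor may order text differently depending on language and case settings.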

NOTES ON XSL OPERATORS: https://www.w3schools.com/xml/xsl_for_each.asp

C. XML and XPath


o XPath is a syntax for defining parts of an XML document
o XPath uses path expressions to navigate in XML documents


o XPath contains a library of standard functions


o XPath is also used in XSLT, XQuery, XPointer and XLink
o Without XPath knowledge you will not be able to create XSLT documents.
o XPath is a W3C recommendation
o There are various types of legal XPath expressions:
▪ Node sets - indicate what type of node you want to match
▪ Booleans - use the built-in XPath logical operators to produce Boolean results.
• Besides Boolean values, XPath can also work with node sets.
<xsl:template match="state[position() > 3]">
<xsl:value-of select="."/>
</xsl:template>
▪ Numbers - use numbers in expressions
<xsl:apply-templates select="state[population div area > 200]"/>
▪ Strings - XPath has functions that are specially designed to work on strings
▪ Wildcard - to select element nodes
• * -Matches any element node
• @*-Matches any attribute node
• node() -Matches any node of any kind
o XPath Sample Expressions
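
A few path expressions can be tried out with Python's standard `xml.etree.ElementTree`. Note that this module supports only a subset of XPath: child and descendant paths, wildcards, attribute tests, and positional predicates work, but comparisons such as `position() > 3` or arithmetic such as `population div area` do not. The `<states>` document below is made-up sample data.

```python
import xml.etree.ElementTree as ET

states = ET.fromstring("""<states>
  <state code="CA"><name>California</name></state>
  <state code="NY"><name>New York</name></state>
  <state><name>Texas</name></state>
</states>""")

# state/name : every name that is a child of a state child of the root
all_names = [e.text for e in states.findall("state/name")]

# .//name : every name element at any depth below the root
any_names = [e.text for e in states.findall(".//name")]

# state[2] : the second state (positions are 1-based)
second = states.find("state[2]/name").text

# state[last()] : the last state
last = states.find("state[last()]/name").text

# state[@code] : only the states that have a code attribute
with_code = [s.get("code") for s in states.findall("state[@code]")]

# state[@code='NY'] : the state whose code attribute equals NY
ny = states.find("state[@code='NY']/name").text

# * : any child element, whatever its name
children = [e.tag for e in states.findall("*")]

print(all_names, second, last, with_code, ny, children)
```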

LEARNING POINTS


- Information Integration has two facets: Data Integration (aka Data Federation)
and Data Exchange (aka Data Translation).
- Data Integration (Data Federation) refers to the process and technologies for data movement
from Source Data Systems to Target Data Systems.
- Data Integration can be done using either the ETL or the ELT approach.
- Data Exchange is a process that transforms data structured under a source schema into data
structured under a different target schema.
- Mapping, integration, and exchange of data can be done using XML, XSL, and XSLT.

LEARNING ACTIVITIES

Laboratory Exercise. Given the following XML file (menu.xml)


<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real
maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>Light Belgian waffles covered with strawberries and
whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of
fresh berries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>Thick slices made from our homemade sourdough
bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu>

1. Create a DTD file based on the provided data in the XML document. The DTD file will be used to
validate the XML file.
2. Create an XML schema based on the provided data in the XML file.
3. Create an HTML document that will display the following:
a. All contents of the XML document in the format shown below:


b. Food whose price is greater than $5.00


c. Food whose calories do not exceed 700.

REFERENCES

Weblinks:
- https://www.slideshare.net/DmitriNesteruk/data-mapping-tutorial?from_action=save
- https://www2.informatik.hu-berlin.de/logik/events/deis10/downloads/10452.KolaitisPhokion.Slides.pdf
- Chapter 3 and 4 lecture slides of Dr. J. VijiPriya, Assistant Professor in Hawassa University, Ethiopia
- https://www.youtube.com/watch?v=MaNjsbdSDZ4
- https://www.youtube.com/watch?v=IvgcwYisiOQ
- https://www.w3schools.com/xml/