SG4 - IPT 101 DataMapping and Exchange
Module No. 4
MODULE TITLE
DATA MAPPING AND EXCHANGE
MODULE OVERVIEW
This module discusses the concepts of integrating and translating real-world data using XML. It provides an
overview of schema mapping and describes the challenges of integrating information.
LEARNING OBJECTIVES
LEARNING CONTENTS
A. Information Integration
- Data may reside at several different sites in several different formats (relational, XML, …)
- Applications need to access and process all these data.
- Growing market of enterprise information integration tools:
o About $1.5B per year; 17% annual rate of growth
o Information integration consumes 40% of the budget of enterprise information technology
shops.
- The following figure shows the existing Information Integration products.
- The software companies are categorized as Challengers, Leaders, Niche Players, or
Visionaries. You may visit "The Top 28 Data Integration Software Solutions, Tools Vendor
Directory" (solutionsreview.com) to check the available data integration and application
integration solutions.
- Information Integration has two facets: Data Integration (aka Data Federation)
and Data Exchange (aka Data Translation).
B. Data Integration (Data Federation): refers to the process and technologies for data movement
from Source Data Systems to Target Data Systems. On its way, data are usually transformed to fit
business requirements.
- On its way from data producers to data consumers, data can go through multiple “source-to-target”
layers (e.g., from operational sources into staging area into data warehouse, etc.)
a. Data Integration Scenarios – ETL. ETL (Extract, Transform, Load) refers to the data
integration approach where data are extracted from Source Systems, go through a
transformation process, and are finally loaded into the Target System. The ETL scenario is
typical for data warehouse systems.
b. Data Integration Scenarios – ELT. ELT (Extract, Load, Transform) refers to the data
integration approach where data are extracted from Source Systems, then loaded into the
Target System without any transformation. The transformation is performed inside the
target system. The ELT scenario is typical for big data / Hadoop systems.
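The difference between the two scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the table names (sales_etl, sales_raw, sales_elt), and the cleaning rule are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical source rows (name, amount as text), as an operational
# system might deliver them.
SOURCE_ROWS = [(" Alice ", "10.50"), ("BOB", "3.25"), ("carol", "7.00")]


def transform(row):
    """Clean one source row: trim and lower-case the name, parse the amount."""
    name, amount = row
    return name.strip().lower(), float(amount)


def run_etl(conn):
    """ETL: transform outside the target, then load the cleaned rows."""
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount REAL)")
    clean_rows = [transform(row) for row in SOURCE_ROWS]                 # Transform
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean_rows)  # Load
    return conn.execute(
        "SELECT name, amount FROM sales_etl ORDER BY name").fetchall()


def run_elt(conn):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", SOURCE_ROWS)  # Load
    conn.execute(                                                         # Transform
        "CREATE TABLE sales_elt AS "
        "SELECT LOWER(TRIM(name)) AS name, CAST(amount AS REAL) AS amount "
        "FROM sales_raw"
    )
    return conn.execute(
        "SELECT name, amount FROM sales_elt ORDER BY name").fetchall()


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    print(run_etl(target))
    print(run_elt(target))
```

Both functions end with the same cleaned rows in the target. What differs is where the transformation runs: in the integration code for ETL, and in the target system's own SQL engine for ELT.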
You may view a short discussion of data integration in the YouTube video "Data Management - Data
Integration".
Data Integration Process
a. Data integration is closely related to application development (applications that move data
from source to target). Hence, the data integration process should be part of the SDLC
(System Development Life Cycle).
b. The SDLC (System Development Life Cycle) refers to the process of planning, creating,
testing, and deploying an information system.
[Figure: data formats involved in data mapping: EDI, plain text, XML, JSON, CLR object]
Data Mapping
• Defines how the source corresponds to the destination.
o E.g., ORM defines how objects correspond to relational models
• Source and destination formats must be defined.
• Data schemas are useful for mapping:
o CLR object structure
o XSD
o Written specification
• Data Formats:
o Extensible Markup Language (XML)
▪ Widely accepted as a universal data format
▪ Human readable
▪ API Support:
• System.Xml
• System.Xml.Linq
▪ Navigable via XPath
▪ Serializable to / from CLR Objects
o XML Schema Definition (XSD)
▪ Defines the structure of an XML file
▪ Typically autogenerated from an XML file (VS, XmlSpy) or a CLR object
▪ Some hand-tuning may be necessary.
o Extensible Stylesheet Language Transformations (XSLT) and XQuery
▪ Define transformations of XML files into
• Other XML files
• Text files
▪ XSLT stylesheets are themselves XML.
▪ XSLT 1.0 is supported by .NET, while XSLT 2.0 is not.
▪ XQuery is not included in .NET and can only be integrated through third-party libraries.
• YouTube Links: Data Exchange in Motion with Demo for Dawex, AWS, and AWS Part 2
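As a rough illustration of data mapping, the sketch below maps source records to a destination XML format and then navigates the result with path expressions. It uses Python's standard xml.etree.ElementTree module rather than the .NET APIs listed above, and it supports only a limited subset of XPath. The field names, the FIELD_MAP specification, and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping specification: source field name -> destination element name.
FIELD_MAP = {"fname": "firstname", "lname": "lastname", "yrs": "age"}


def map_record(source, field_map=FIELD_MAP):
    """Build a <person> element from one source record using the field map."""
    person = ET.Element("person")
    for src_field, dest_tag in field_map.items():
        child = ET.SubElement(person, dest_tag)
        child.text = str(source[src_field])
    return person


def build_document(records):
    """Wrap all mapped records in a <people> root element."""
    root = ET.Element("people")
    for record in records:
        root.append(map_record(record))
    return root


# Made-up sample source data.
records = [
    {"fname": "Ana", "lname": "Cruz", "yrs": 30},
    {"fname": "Ben", "lname": "Reyes", "yrs": 41},
]

doc = build_document(records)
xml_text = ET.tostring(doc, encoding="unicode")  # serialize to an XML string

# Navigate the result with ElementTree's XPath subset.
lastnames = [e.text for e in doc.findall("./person/lastname")]
ben_age = doc.find("./person[firstname='Ben']/age").text

if __name__ == "__main__":
    print(xml_text)
    print(lastnames, ben_age)
```

Keeping the mapping in a separate table such as FIELD_MAP, instead of hard-coding it in the conversion logic, mirrors the point that source and destination formats must be defined before they can be mapped.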
- Why do we need an XML validator? Errors in XML documents will stop your XML application.
B. XML DTD
o An XML document with correct syntax is called "Well Formed".
o An XML document validated against a DTD is "Well Formed" and "Valid".
o The purpose of a DTD is to define the structure of an XML document and a list of legal
elements.
o A DTD can be a separate document, or it can be built into an XML document using a special
declaration named <!DOCTYPE>.
<!DOCTYPE document
[ <!ELEMENT document (heading, message)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>
<document>
  <heading>Hello from XML</heading>
  <message>This is an XML document!</message>
</document>
Example 2 explanation
A DTD for a note document, with the elements to, from, heading, and body, is interpreted like this:
• !DOCTYPE note defines that the root element of the document is note
• !ELEMENT note defines that the note element contains four elements: "to, from, heading,
body"
• !ELEMENT to defines the to element to be of type "#PCDATA"
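The distinction between "Well Formed" and "Valid" can be approximated in Python, with one caveat: the standard library parser checks well-formedness only and does not validate against a DTD. In the sketch below, matches_content_model is therefore a hand-written stand-in for a single DTD rule, not a real DTD validator. The sample documents and function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE document [
  <!ELEMENT document (heading, message)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>
"""

# GOOD closes <message> correctly; BAD uses the invalid token <!message>.
GOOD = DTD + ("<document><heading>Hello from XML</heading>"
              "<message>This is an XML document!</message></document>")
BAD = DTD + ("<document><heading>Hello from XML</heading>"
             "<message>This is an XML document!<!message></document>")


def is_well_formed(xml_text):
    """Return True if the parser accepts the document's syntax."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def matches_content_model(xml_text, root_tag, child_tags):
    """Check one DTD-style rule by hand: the root's name and the exact
    sequence of its child elements. This is not full DTD validation."""
    root = ET.fromstring(xml_text)
    return root.tag == root_tag and [c.tag for c in root] == list(child_tags)


if __name__ == "__main__":
    print(is_well_formed(GOOD), is_well_formed(BAD))
    print(matches_content_model(GOOD, "document", ["heading", "message"]))
```

A validating parser from a third-party library would be needed to enforce the whole DTD. The sketch only shows why a syntax error stops processing before any structural check can run.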
C. XML Schema
o Another way of validating XML documents is to use XML schemas.
o The XML Schema language, also referred to as XML Schema Definition (XSD),
describes the structure of an XML document.
o Like a DTD, an XML Schema defines the legal building blocks (elements and attributes)
of an XML document. Specifically, it:
▪ defines which elements are child elements
▪ defines the number and order of child elements
▪ defines whether an element is empty or can include text
▪ defines data types for elements and attributes
▪ defines default and fixed values for elements and attributes
o XML Schemas will be used in most Web applications as a replacement for DTDs.
Here are some reasons:
i. XML Schemas are extensible to future additions
ii. XML Schemas are richer and more powerful than DTDs
iii. XML Schemas are written in XML
iv. XML Schemas support data types and namespaces
o The following are the basic building blocks of XML Schema
▪ XSD Simple Elements
▪ XSD Attributes
▪ XSD Complex Elements
▪ XSD Indicators
<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-02-27</dateborn>
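The three elements above hold a string, a number, and a date, which is where schema data types matter. The Python sketch below converts each element's text to a typed value. The mapping of lastname, age, and dateborn to xs:string, xs:integer, and xs:date is an assumption made for illustration, and the CONVERTERS table is hypothetical, not part of any XSD library.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Assumed element-to-type mapping (illustrative only).
CONVERTERS = {
    "lastname": str,                 # assumed xs:string
    "age": int,                      # assumed xs:integer
    "dateborn": date.fromisoformat,  # assumed xs:date (YYYY-MM-DD)
}


def parse_simple_element(xml_fragment):
    """Parse one simple element and convert its text to the assumed type.

    Raises ValueError when the text does not fit the type, which is roughly
    what a schema validator would report as a data type error.
    """
    element = ET.fromstring(xml_fragment)
    return element.tag, CONVERTERS[element.tag](element.text)


if __name__ == "__main__":
    for fragment in ("<lastname>Refsnes</lastname>",
                     "<age>36</age>",
                     "<dateborn>1970-02-27</dateborn>"):
        print(parse_simple_element(fragment))
```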
E. XSD Attributes
o Simple elements cannot have attributes.
o If an element has attributes, it is a complex type. But the attribute itself is always declared as
a simple type.
o The syntax for defining an attribute is:
<xs:attribute name="xxx" type="yyy"/>
G. XSD Indicators
o Indicators control how elements are to be used in documents.
o Order indicators are used to define the order of the elements. They are:
▪ All
▪ Choice
▪ Sequence
o All Indicator:
▪ The <all> indicator specifies that the child elements can appear in any order, and
that each child element must occur only once:
<xs:element name="person">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:all>
  </xs:complexType>
</xs:element>
o Choice Indicator:
▪ The <choice> indicator specifies that either one child element or another can occur:
<xs:element name="person">
  <xs:complexType>
    <xs:choice>
      <xs:element name="employee" type="employee"/>
      <xs:element name="member" type="member"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
o Sequence Indicator:
▪ The <sequence> indicator specifies that the child elements must appear in a specific
order:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
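The meaning of the three order indicators can be illustrated with a few hand-written checks in Python. These functions are not an XSD validator. They are a simplified sketch that assumes the default occurrence rules (each declared child exactly once) and looks only at the names of the child elements. All function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def child_tags(xml_text):
    """Return the names of the root element's children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]


def check_all(tags, expected):
    """<xs:all>: every expected child exactly once, in any order."""
    return sorted(tags) == sorted(expected)


def check_choice(tags, alternatives):
    """<xs:choice>: exactly one child, and it is one of the alternatives."""
    return len(tags) == 1 and tags[0] in alternatives


def check_sequence(tags, expected):
    """<xs:sequence>: the expected children in exactly the declared order."""
    return tags == list(expected)


if __name__ == "__main__":
    swapped = child_tags(
        "<person><lastname>Cruz</lastname><firstname>Ana</firstname></person>")
    print(check_all(swapped, ["firstname", "lastname"]))       # order is free
    print(check_sequence(swapped, ["firstname", "lastname"]))  # order matters
    print(check_choice(child_tags("<person><member/></person>"),
                       ["employee", "member"]))
```

The same document with lastname before firstname passes the "all" check but fails the "sequence" check, which is the practical difference between the two indicators.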
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
  <h2>My CD Collection</h2>
  <table border="1">
    <tr bgcolor="#9acd32"> <th>Title</th> <th>Artist</th></tr>
    <xsl:for-each select="catalog/cd">
      <xsl:sort select="artist"/>
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
    </xsl:for-each>
  </table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
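To see what the stylesheet produces without an XSLT processor, the same transformation can be re-implemented by hand. The Python sketch below is not XSLT. It is a plain re-implementation of the for-each and sort steps using the standard library, and the catalog contents are made-up sample data.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up sample catalog (illustrative data only).
CATALOG_XML = """<catalog>
  <cd><title>Second Album</title><artist>Zed</artist></cd>
  <cd><title>First Album</title><artist>Abe</artist></cd>
</catalog>"""


def catalog_to_html(xml_text):
    """Render every <cd> as a table row, sorted by artist."""
    catalog = ET.fromstring(xml_text)
    # Counterpart of <xsl:for-each select="catalog/cd"> with
    # <xsl:sort select="artist"/>.
    cds = sorted(catalog.findall("cd"),
                 key=lambda cd: cd.findtext("artist", ""))
    rows = "".join(
        f"<tr><td>{escape(cd.findtext('title', ''))}</td>"
        f"<td>{escape(cd.findtext('artist', ''))}</td></tr>"
        for cd in cds
    )
    return (
        "<html><body><h2>My CD Collection</h2>"
        '<table border="1"><tr bgcolor="#9acd32"><th>Title</th><th>Artist</th></tr>'
        f"{rows}</table></body></html>"
    )


if __name__ == "__main__":
    print(catalog_to_html(CATALOG_XML))
```

The XSLT version states the same logic declaratively: the template matches the document, for-each selects the cd elements, and sort orders them before the rows are written.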
LEARNING POINTS
- Information Integration has two facets: Data Integration (aka Data Federation)
and Data Exchange (aka Data Translation).
- Data Integration (Data Federation) refers to the process and technologies for data movement
from Source Data Systems to Target Data Systems
- Data Integration can be done using either the ETL or the ELT approach.
- Data Exchange is a process that transforms data structured under a source schema into data
structured under a different target schema.
- Mapping, Integration and Exchange of data can be done using XML, XSL, and XSLT.
LEARNING ACTIVITIES
1. Create a DTD file based on the provided data in the XML document. The DTD file will be used to
validate the XML file.
2. Create an XML schema based on the provided data in the XML file.
3. Create an HTML document that will display the following:
a. All contents of the XML document in the format shown below:
REFERENCES
Weblinks:
- https://www.slideshare.net/DmitriNesteruk/data-mapping-tutorial?from_action=save
- https://www2.informatik.hu-berlin.de/logik/events/deis10/downloads/10452.KolaitisPhokion.Slides.pdf
- Chapter 3 and 4 lecture slides of Dr. J. VijiPriya, Assistant Professor in Hawassa University, Ethiopia
- https://www.youtube.com/watch?v=MaNjsbdSDZ4
- https://www.youtube.com/watch?v=IvgcwYisiOQ
- https://www.w3schools.com/xml/