IBM InfoSphere Information Server
Version 11 Release 3
Introduction
SC19-4312-00
Note
Before using this information and the product that it supports, read the information in Notices and trademarks on page
87.
Contents
Introduction to InfoSphere Information Server
  Information integration phases
    Plan
    Discover and analyze
    Design
    Develop
    Deliver
  Components in the InfoSphere Information Server suite
    InfoSphere Blueprint Director
    InfoSphere Information Governance Catalog
    InfoSphere DataStage
    InfoSphere Data Architect
    InfoSphere Discovery
    InfoSphere FastTrack
    InfoSphere Information Analyzer
    InfoSphere Information Services Director
    InfoSphere QualityStage
  Additional components in the InfoSphere Information Server portfolio
    Mainframe integration
    Companion components for InfoSphere Information Server
IBM InfoSphere Information Server architecture and concepts
[Figure: The InfoSphere Information Server suite. InfoSphere Blueprint Director,
InfoSphere Information Services Director, InfoSphere Discovery, InfoSphere Data
Architect, InfoSphere Information Governance Catalog, InfoSphere Glossary Anywhere,
InfoSphere DataStage, InfoSphere Information Analyzer, InfoSphere FastTrack, and
InfoSphere QualityStage share the metadata repository within the InfoSphere
Information Server framework.]
Enterprise architects use InfoSphere Blueprint Director to plan and manage your
project vision. After a blueprint of your information project exists, data architects
can use InfoSphere Data Architect to discover the structure of your organization's
data, relate and integrate data assets, and create physical and logical models that
are based on those relationships. This data can be input to InfoSphere Information
Governance Catalog, where business analysts and data analysts define and establish a
common understanding of business concepts.
Data analysts can also use InfoSphere Discovery for Information Integration to
automate the identification and definition of data relationships, feeding that
information to InfoSphere Information Analyzer and InfoSphere FastTrack.
Data quality specialists use InfoSphere Information Analyzer to design, develop, and
manage data quality rules for your organization's data to ensure data quality. As
your organization's data evolves, these rules can be modified in real time so that
trusted information is delivered to InfoSphere Information Governance Catalog,
InfoSphere FastTrack, InfoSphere DataStage, InfoSphere QualityStage, and other
InfoSphere Information Server components.
Data analysts can use InfoSphere FastTrack to create mapping specifications that
translate business requirements into business applications. Data integration
specialists can use these specifications to generate jobs that become the starting
point for complex data transformation in InfoSphere DataStage and InfoSphere
QualityStage. By using the InfoSphere DataStage and QualityStage Designer, data
integration specialists develop jobs that extract, transform, load, and check the
quality of data. SOA architects use InfoSphere Information Services Director to
deploy integration tasks from the suite components as consistent, reusable
information services.
InfoSphere Information Governance Catalog provides end-to-end data flow
reporting and impact analysis of your organization's data assets. Business analysts,
data analysts, data integration specialists, and other users interact with this
component to explore and manage the assets that are produced and used by
InfoSphere Information Server. InfoSphere Information Governance Catalog enables
users to understand and manage the flow of data through your enterprise, and
discover and analyze relationships between information assets in the InfoSphere
Information Server metadata repository. You use InfoSphere Metadata Asset
Manager to import technical information into the metadata repository, such as BI
reports, logical models, physical schemas, and InfoSphere DataStage and
QualityStage jobs.
Plan
InfoSphere Information Server includes capabilities that you can use to manage the
structure of your information project from initial sketches to delivery. By
collaborating on blueprints, your team can connect the business vision for your
project with corresponding business and technical artifacts.
To enhance your blueprint, you can create a business glossary to develop and
share a common vocabulary between your business and IT users. The terms that
you create in your glossary establish a common understanding of business
concepts, further improving communication and efficiency.
As your information landscape evolves, it is crucial to understand how information
assets are connected. To help users understand the origin of data, you can associate
the terms that you create to information assets in your blueprint.
For example, users can view data lineage reports to understand how data flows
between assets. Terms are stored in the metadata repository so that they can be
shared and reused by users of other suite tools. When a user changes a term, the
change is made in every location where that term is used, ensuring that a
vocabulary is standardized throughout your enterprise.
[Figure: InfoSphere Blueprint Director, InfoSphere Information Governance Catalog,
InfoSphere Glossary Anywhere, InfoSphere Discovery, InfoSphere Data Architect,
InfoSphere Information Analyzer, and InfoSphere FastTrack]
Design
InfoSphere Information Server can help you design and create information models
based on specific requirements of your information project. Carefully designing
your physical data models, logical data models, and databases ensures that your
architecture can handle changes as they occur, rather than reacting to changes after
they happen.
New data continuously enters your applications, data warehouses, and business
analytic systems. By using InfoSphere Information Server, you can design
sophisticated data quality rules that you can modify in real time as your data
evolves. In addition, you can scan samples of your data to determine their quality
and structure so that you can correct problems before they affect your project. This
approach ensures reliability and integrity of your data by consistently monitoring
changes and making modifications.
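The idea of scanning a data sample against rules that can be changed as the data evolves can be sketched as follows. This is only an illustration in Python, not the rule language of the product: the rule names, the `profile` helper, and the sample records are all hypothetical.

```python
import re

# Hypothetical rules: each maps a rule name to a check on one record.
RULES = {
    "customer_id is present": lambda r: r.get("customer_id") is not None,
    "postal_code is well formed": lambda r: bool(
        re.fullmatch(r"\d{5}", r.get("postal_code") or "")
    ),
}

def profile(sample, rules):
    """Return, per rule, the fraction of sampled records that pass."""
    return {
        name: sum(1 for record in sample if check(record)) / len(sample)
        for name, check in rules.items()
    }

SAMPLE = [
    {"customer_id": 1, "postal_code": "10001"},
    {"customer_id": None, "postal_code": "94105"},
    {"customer_id": 3, "postal_code": "9410"},
    {"customer_id": 4, "postal_code": None},
]

before = profile(SAMPLE, RULES)

# The data evolves: ZIP+4 codes start to arrive, so the rule is modified
# in place instead of rebuilding the checks around it.
RULES["postal_code is well formed"] = lambda r: bool(
    re.fullmatch(r"\d{5}(-\d{4})?", r.get("postal_code") or "")
)
SAMPLE.append({"customer_id": 5, "postal_code": "60601-1234"})
after = profile(SAMPLE, RULES)
```

Keeping the rules as data, separate from the code that applies them, is what makes this kind of in-place modification cheap in the sketch.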
You can also design your architecture to move large quantities of data in real time
from your source applications to your data warehouse or analytics dashboard. Poor
design requires constant changes to adapt your environment as data
volumes fluctuate. InfoSphere Information Server helps you to design your
architecture to handle these demands from the outset so that the information that
you need in your warehouses and analytic systems is delivered quickly and
reliably.
[Figure: InfoSphere Data Architect, InfoSphere DataStage, InfoSphere Information
Analyzer, InfoSphere FastTrack, and InfoSphere QualityStage]
Develop
InfoSphere Information Server supports information quality and consistency by
standardizing, validating, matching, and merging data. By using the suite of
components, you can certify and enrich common data elements, use trusted data
such as postal records for name and address information, and match records across
or within data sources.
InfoSphere Information Server enables a single record to survive from the best
information across sources for each unique entity, helping you to create a single,
comprehensive, and accurate view of information across source systems.
In addition, InfoSphere Information Server transforms and enriches information to
ensure that it is in the required context for new uses. Hundreds of prebuilt
transformation functions combine, restructure, and aggregate information.
Transformation functions are broad and flexible to meet the requirements of varied
integration scenarios. For example, InfoSphere Information Server provides inline
validation and transformation of complex data types, such as US Health Insurance
Portability and Accountability Act (HIPAA) data, and high-speed joins and sorts of
heterogeneous data. InfoSphere Information Server also provides high-volume,
complex data transformation and movement functions that can be used for
stand-alone extract, transform, and load (ETL) scenarios, or as a real-time engine
for processing applications or processes.
[Figure: InfoSphere DataStage, InfoSphere Information Analyzer, and InfoSphere
QualityStage]
Deliver
InfoSphere Information Server includes the capabilities to virtualize, synchronize,
and move information to the people, processes, and applications that need it.
Information can be delivered by using federation-based, time-based, or event-based
processing, moved in large bulk volumes from location to location, or accessed in
place when it cannot be consolidated.
InfoSphere Information Server provides direct, local access to various information
sources, both mainframe and distributed. It provides access to databases, files,
services, and packaged applications, and to content repositories and collaboration
systems. Companion products allow high-speed replication, synchronization, and
distribution across databases, change data capture, and event-based publishing of
information.
[Figure: InfoSphere Information Services Director, InfoSphere Information
Governance Catalog, InfoSphere Glossary Anywhere, InfoSphere DataStage, InfoSphere
QualityStage, and InfoSphere Information Server Packs share the metadata repository
within the InfoSphere Information Server framework.]
Table 1. Components that are included in each InfoSphere Information Server solution

Solutions (table columns):
  InfoSphere Information Server for Data Integration
  InfoSphere Information Server for Data Quality
  InfoSphere Information Governance Catalog
  InfoSphere Information Server Enterprise Edition

Components (table rows):
  InfoSphere DataStage
  InfoSphere QualityStage
  InfoSphere DataStage and QualityStage Designer
  InfoSphere Blueprint Director
  InfoSphere Data Click
  InfoSphere Data Quality Console
  InfoSphere Discovery for Information Integration
  InfoSphere FastTrack
  InfoSphere Information Analyzer
  InfoSphere Information Governance Catalog
  InfoSphere Glossary Anywhere
  InfoSphere Information Governance Dashboard
  InfoSphere Information Services Director
Assets
InfoSphere Information
Governance Catalog
Data models
InfoSphere DataStage
Jobs
Shared containers
Warehouse databases
Table 2. How software products work with InfoSphere Blueprint Director (continued)
What you can do in InfoSphere
Blueprint Director
Assets
InfoSphere FastTrack
Mapping projects
Examples
Parts of Microsoft Office files:
- Rows of a spreadsheet in a Microsoft Office Excel file
- Slides in a Microsoft Office PowerPoint file
Files
data sources that populated the business intelligence report. The CFO was able to
review financial parameters for each stage of production to analyze where changes
were necessary. While viewing the report, the CFO used IBM InfoSphere Glossary
Anywhere to understand the definitions of the fields that were used in the report
without having to contact the finance department.
You can create data lineage reports, business lineage reports, and impact
analysis reports to visualize relationships. Data lineage reports help you
understand where data comes from and where it goes. Business lineage
reports show a summarized view that omits the detail that
business users do not need.
Manage metadata
You can create and edit descriptions for assets in the InfoSphere
Information Server metadata repository. These changes propagate through
the metadata repository so that other suite users have access to the most
current metadata.
You can also extend data lineage to external processes that do not write to
disk, or ETL tools, scripts, and other programs that do not save their
metadata in the metadata repository. You can create and import extended
assets, and then use these assets in extension mapping documents to track
the flow of information to and from the extended data sources and other
assets.
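Data lineage and impact analysis can both be pictured as walks over a directed graph of assets. The following is a minimal sketch in Python, not the product's API. The asset names, the `FLOWS` edge list, and the `trace` helper are hypothetical. One edge passes through an extended asset (a script whose metadata is not in the repository) to show how an extension mapping closes a gap in the flow.

```python
from collections import defaultdict

# Hypothetical data flows: (source asset, target asset).
# "nightly_script" stands in for an extended asset that is described by an
# extension mapping document rather than by repository metadata.
FLOWS = [
    ("crm.customers", "stage.customers"),
    ("stage.customers", "nightly_script"),
    ("nightly_script", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "report.revenue_by_customer"),
    ("erp.orders", "warehouse.fact_orders"),
    ("warehouse.fact_orders", "report.revenue_by_customer"),
]

def build_graph(flows, reverse=False):
    """Index the flows by source, or by target when reverse=True."""
    graph = defaultdict(set)
    for source, target in flows:
        if reverse:
            graph[target].add(source)
        else:
            graph[source].add(target)
    return graph

def trace(asset, flows, upstream):
    """Return every asset reachable from `asset`.

    upstream=True follows flows backward (data lineage: where data comes from).
    upstream=False follows flows forward (impact analysis: what a change affects).
    """
    graph = build_graph(flows, reverse=upstream)
    seen, stack = set(), [asset]
    while stack:
        for neighbor in graph[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

lineage = trace("report.revenue_by_customer", FLOWS, upstream=True)
impact = trace("crm.customers", FLOWS, upstream=False)
```

Without the two `nightly_script` edges, the walk from the report would stop at `warehouse.dim_customer` and never reach the CRM source, which is the gap that extended assets are meant to fill.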
InfoSphere DataStage
InfoSphere DataStage is a data integration tool that enables users to move and
transform data between operational, transactional, and analytical target systems.
Data transformation and movement is the process by which source data is selected,
converted, and mapped to the format required by target systems. The process
manipulates data to bring it into compliance with business, domain, and integrity
rules, and with other data in the target environment.
InfoSphere DataStage provides direct connectivity to enterprise applications as
sources or targets, ensuring that the most relevant, complete, and accurate data is
integrated into your data integration project.
By using the parallel processing capabilities of multiprocessor hardware platforms,
InfoSphere DataStage enables your organization to solve large-scale business
Balanced Optimization
Balanced Optimization helps to improve the performance of your InfoSphere
DataStage job designs that use connectors to read or write source data. You design
your job and then use Balanced Optimization to redesign the job automatically to
your stated preferences.
For example, you can maximize performance by minimizing the amount of input
and output (I/O) that is used, and by balancing the processing against source,
intermediate, and target environments. You can then examine the new optimized
job design and save it as a new job. Your root job design remains unchanged.
You can use the Balanced Optimization features of InfoSphere DataStage to push
sets of data integration processing and related data I/O into a database
management system (DBMS) or into a Hadoop cluster.
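As a rough illustration of why pushing processing into the database reduces I/O, the sketch below computes the same totals in two ways. It uses Python's built-in `sqlite3` module as a stand-in DBMS. The `sales` table and both functions are hypothetical and are not what Balanced Optimization generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 15), ("west", 7), ("west", 3), ("west", 5)],
)

def totals_in_job(conn):
    """Read every row out of the database and aggregate in the job."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals, len(rows)

def totals_pushed_down(conn):
    """Let the database aggregate so that only summary rows are transferred."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)

job_totals, job_rows_moved = totals_in_job(conn)
db_totals, db_rows_moved = totals_pushed_down(conn)
```

Both functions return the same totals, but the second transfers one row per region instead of one row per sale. On a large table, that difference is the I/O saving that the optimization targets.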
volumes of data, your organization can linearly scale the speed of data
throughput. A scalable platform that includes parallel processing and
incorporates flexible, reusable functions enables users to design logic once,
and then run and scale that logic anywhere.
By using parallel processing capabilities of multiprocessor hardware
platforms, you can scale transformation jobs to address any demands, large
or small. During development, the deployment configuration automatically
adds the degree of parallelism that you specify. By making a simple change
to the configuration file, you can change your application from 2-way
processing to 32-way processing to 128-way processing.
Design reusable transformation jobs
Reusable transformation functions enable data integration specialists to
maximize speed, flexibility, and effectiveness in their designs.
Data integration specialists use the rich user interface for all design work,
including workflow, data integration, and data quality. Prebuilt
transformation functions can be dragged to a design, making it easy to
determine the flow of information and the transformations that occur. Any
portion of the design can be shared and reused across the data integration
landscape, maximizing reuse and productivity.
Extend connectivity to various objects
By using common connectors, any data source that is supported by
InfoSphere Information Server can be used as input to or output from
InfoSphere DataStage, enabling your organization to integrate data
effectively across the enterprise.
A nearly unlimited number of heterogeneous data sources and targets are
supported, including text files, complex data structures in XML, enterprise
resource planning (ERP) systems such as SAP and PeopleSoft, nearly any
database, web services, and business intelligence (BI) tools like SAS.
Manage operations and resources
By operating in real time, your organization can capture messages or
extract data at any moment on the same platform that integrates bulk data
and uses transformation rules. This integration ensures that data can be
used to respond to your data integration needs on demand.
Real-time data integration support captures messages from
message-oriented middleware (MOM) queues by using JMS or WebSphere MQ
adapters to combine data into operational and historical analysis
perspectives. When you use InfoSphere DataStage with InfoSphere Information
Services Director, you can deploy data integration jobs through Java
Message Service (JMS), web services, or other services. This service-oriented
architecture (SOA) enables numerous developers to share complex data
integration processes without having to understand the steps contained in
the services.
You can use the InfoSphere DataStage Operations Console to access
information about your jobs, job activity, and system resources for each of
your InfoSphere Information Server engines. The Operations Console is
useful for troubleshooting failed job runs, improving job run performance,
and actively monitoring your engines.
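The earlier point about scaling a job by changing only its configuration can be sketched as follows. This is an illustration in Python under stated assumptions, not the format of an InfoSphere DataStage configuration file. The `CONFIG` dictionary, `transform`, `partition`, and `run_job` are hypothetical names, and threads are used only to keep the sketch short, whereas a parallel engine distributes partitions across processes and nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a parallel configuration file: the job logic
# below never changes, only this setting does.
CONFIG = {"degree_of_parallelism": 2}

def transform(record):
    """The job logic, designed once: normalize a name and derive its length."""
    name = record["name"].strip().title()
    return {"id": record["id"], "name": name, "name_length": len(name)}

def partition(records, ways):
    """Partition records by key modulus so each partition runs independently."""
    partitions = [[] for _ in range(ways)]
    for record in records:
        partitions[record["id"] % ways].append(record)
    return partitions

def run_job(records, config):
    """Run the same transform over as many partitions as the config asks for."""
    ways = config["degree_of_parallelism"]
    with ThreadPoolExecutor(max_workers=ways) as pool:
        results = list(
            pool.map(
                lambda part: [transform(r) for r in part],
                partition(records, ways),
            )
        )
    # Collect the partitions back into a single, ordered output.
    return sorted((row for part in results for row in part), key=lambda r: r["id"])

records = [{"id": i, "name": f"  customer {i} "} for i in range(10)]
two_way = run_job(records, CONFIG)
eight_way = run_job(records, {"degree_of_parallelism": 8})
```

The two runs produce identical output. Only the number of partitions differs, which mirrors the idea of moving from 2-way to 32-way processing without touching the job design.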
architects use InfoSphere Data Architect to incorporate the terms and definitions
from that vocabulary into existing or new data models to align business and IT.
Data architects can extend the definitions with information about the mapping of
those concepts to the physical databases so that business users can identify
appropriate data sources for ad hoc queries or reports. Those definitions can also
become the basis for InfoSphere Information Server to load data warehouses,
consolidate databases for mergers, or establish and manage master data.
Data architects can use InfoSphere Data Architect to design dimensional models for
InfoSphere Data Warehouse, IBM Netezza, and Cognos BI reporting. This support
for dimensional modeling can help reduce development time of warehouse and
business intelligence systems.
Team members across your organization can use InfoSphere Data Architect as a
plug-in to a shared Eclipse instance, or share assets through standard configuration
management repositories like Rational Team Concert and Subversion. This
integration enables collaboration across your organization and ensures a clear
division of responsibilities.
InfoSphere Discovery
InfoSphere Discovery provides innovative data exploration and analysis techniques
to automatically discover relationships and mappings among structured data in
your enterprise. The analysis is based on actual values in the data, rather than on
just metadata.
This value-driven analysis means that InfoSphere Discovery can detect
relationships between tables and columns whose names or metadata alone do not
suggest any connection. InfoSphere Discovery can identify and generate highly
complex transformations that you can use to describe the locations and formats of
sensitive data, describe the relationships of data elements across applications, or
output as SQL code or extract, transform, and load (ETL) code for use in data
transformation jobs.
Before you implement any data-centric project, you must know what data you
have, where it is located, and how it relates across systems.
InfoSphere Discovery is a data analysis tool that provides a full range of data
analysis capabilities that include single source profiling, cross-system data overlap
analysis, and matching key discovery. It provides a profiling solution with
automatic primary-foreign key discovery and validation. In addition, InfoSphere
Discovery can analyze the data overlap across multiple sources simultaneously.
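Value-based key discovery can be illustrated with a small sketch. The code below is a simplified Python illustration, not the algorithm that InfoSphere Discovery uses. The sample tables and the `is_unique`, `overlap`, and `discover_keys` helpers are hypothetical. It proposes a primary-foreign key pair when a column is unique in one table and fully contains the distinct values of a column in another table.

```python
# Hypothetical sample tables. The column names deliberately give no hint
# that the two tables are related.
TABLES = {
    "patients": {
        "pid": [101, 102, 103, 104],
        "zip": ["10001", "10001", "94105", "60601"],
    },
    "visits": {
        "ref": [101, 101, 103, 104, 104],
        "ward": ["A", "B", "A", "C", "A"],
    },
}

def is_unique(values):
    """A column is a primary key candidate if no value repeats."""
    return len(set(values)) == len(values)

def overlap(child_values, parent_values):
    """Fraction of distinct child values that also occur in the parent column."""
    child, parent = set(child_values), set(parent_values)
    return len(child & parent) / len(child) if child else 0.0

def discover_keys(tables, threshold=1.0):
    """Propose (child, parent) column pairs based on values, not names."""
    candidates = []
    for parent_table, parent_cols in tables.items():
        for parent_col, parent_values in parent_cols.items():
            if not is_unique(parent_values):
                continue
            for child_table, child_cols in tables.items():
                if child_table == parent_table:
                    continue
                for child_col, child_values in child_cols.items():
                    if overlap(child_values, parent_values) >= threshold:
                        candidates.append(
                            (f"{child_table}.{child_col}",
                             f"{parent_table}.{parent_col}")
                        )
    return candidates

found = discover_keys(TABLES)
```

Lowering `threshold` below 1.0 would also surface pairs whose values overlap only partly, which is one way to picture cross-source overlap analysis on data that is not perfectly clean.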
The hospital used InfoSphere Discovery to discover statistics about the columns in
each of the databases. These statistics were used to develop a detailed
understanding of the structure and format of the patient records, which helped to
normalize and standardize records. Using the built-in classification algorithms,
data analysts identified patterns that matched data in each record, such as patient
name, address, and date of birth. By using custom classifications, data analysts
isolated sensitive data elements, and enforced enterprise-wide policies to protect
these elements, masking them from unauthorized users.
InfoSphere FastTrack
InfoSphere FastTrack provides capabilities to automate the workflow of your data
integration project. Users can track and automate multiple data integration tasks,
shortening the time between developing business requirements and implementing
a solution.
Business analysts use InfoSphere FastTrack to translate business requirements into
a set of specifications, which data integration specialists then use to produce a data
integration application that incorporates the business requirements.
Automating the flow of information and increasing collaboration reduces development
time. Linking information in a shared metadata repository keeps data
accessible, current, and integrated across the data integration project.
repository. Other components in the suite can access lineage information directly to
simplify the collection and management of metadata across your organization.
item (finished goods), and material (raw materials). The company plans to migrate
data into a single master SAP environment and a companion SAP business
intelligence reporting platform.
One of the major advantages of this approach is that you can combine data
integration tasks with the leading enterprise messaging, enterprise
application integration (EAI), and business process management (BPM)
products by choosing the protocol binding that you want to use to deploy
information services.
InfoSphere QualityStage
InfoSphere QualityStage provides capabilities to create and maintain an accurate
view of data entities such as customers, locations, vendors, and products throughout
your enterprise.
visited customer portals on the web could not get complete information about their
account status, eligible services, and other details.
Using InfoSphere QualityStage, the company implemented a real-time, in-process
data quality check of all portal inquiries. InfoSphere QualityStage and WebSphere
MQ transactions were combined to retrieve customer data from multiple sources
and return integrated customer views. The new process provides more than 25
million subscribers with a real-time, complete view of their insurance services. A
unique customer ID for each subscriber also helps the insurer move toward a
single customer database for improved customer service and marketing.
conform to load standards are identified and filtered so that only the best
representation of the match data is loaded into the master data record.
Missing values in one record are supplied with values from other records
of the same entity. Missing values can also be populated with values from
corresponding records that have been identified as a group in the matching
stage.
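The way that missing values are supplied from other records of the same entity can be sketched as follows. This is a minimal Python illustration with an assumed survivorship rule (prefer the most recently updated value that is present), not the rule set of InfoSphere QualityStage. The records, the `updated` field, and the `survive` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records that a matching step has already grouped by entity.
MATCHED = [
    {"entity": 1, "name": "J. Smith", "phone": None,
     "email": "js@example.com", "updated": 2019},
    {"entity": 1, "name": "John Smith", "phone": "555-0100",
     "email": None, "updated": 2021},
    {"entity": 2, "name": "Ana Ruiz", "phone": "555-0199",
     "email": None, "updated": 2020},
]

def survive(records, fields):
    """Build one surviving record per entity.

    Assumed rule: for each field, keep the value from the most recently
    updated record that has one, so gaps are filled from older records.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["entity"]].append(record)

    survivors = {}
    for entity, group in groups.items():
        newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
        survivors[entity] = {
            field: next(
                (r[field] for r in newest_first if r[field] is not None), None
            )
            for field in fields
        }
    return survivors

master = survive(MATCHED, ["name", "phone", "email"])
```

For entity 1, the name and phone come from the newer record while the email is filled in from the older one. Entity 2 has no email in any record, so that field stays empty.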
Mainframe integration
InfoSphere Information Server extends its capabilities to the mainframe with
InfoSphere Information Server for System z and InfoSphere Classic. These
solutions enable your organization to deliver trusted information for key business
decisions and provide the most current information to people, processes, and
applications.
and service-oriented architecture (SOA) projects. The Linux software for System z
also provides rich security features, stability, flexibility, interoperability, and
reduced software costs.
InfoSphere Information Server for System z is a fully integrated software platform
that profiles, cleanses, transforms, and delivers information from both mainframe
and distributed data sources to drive greater insight for your business without
added IBM z/OS operational costs. Your organization can derive more value from
the complex, heterogeneous information spread across systems.
With breakthrough productivity and performance for cleansing, transforming, and
moving this information consistently and securely throughout your enterprise,
InfoSphere Information Server for System z helps you access and use information
in new ways to drive innovation, increase operational efficiency, and lower risk.
InfoSphere Information Server for System z uniquely balances the reliability,
scalability, and security of the System z platform with the low-cost processing
environment of the Integrated Facility for Linux (IFL) specialty engine.
In a System z environment, all of the fundamental information components of the
solution are hosted on the System z server, including the operational data store, the
data warehouse, and any data marts utilized in the system. This model is well
suited to address strategic business needs, centered around the demand for
real-time or near real-time access to information to support key decision-making or
business process requirements.
auditing throughout the federation process. Federated queries can scale to run
against any volume of information by using the parallel processing engine of
InfoSphere Information Server.
By using a federated system, you can send distributed requests to multiple data
sources within a single SQL statement. For example, you can join data that is in an
IBM DB2 table, an Oracle table, a web service, and an XML file in a single SQL
statement.
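The idea of joining several sources in a single SQL statement can be illustrated with a small stand-in. The sketch below does not use InfoSphere Federation Server. It uses the `ATTACH DATABASE` statement of SQLite, through Python's built-in `sqlite3` module, so that one statement can join tables that are stored in two separate database files. The file names, tables, and rows are hypothetical.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
hr_path = os.path.join(workdir, "hr.db")
skills_path = os.path.join(workdir, "skills.db")

# Two independent database files stand in for two separate data sources.
hr = sqlite3.connect(hr_path)
hr.execute("CREATE TABLE contacts (emp_id INTEGER, phone TEXT)")
hr.executemany(
    "INSERT INTO contacts VALUES (?, ?)", [(1, "555-0100"), (2, "555-0101")]
)
hr.commit()
hr.close()

skills = sqlite3.connect(skills_path)
skills.execute("CREATE TABLE skills (emp_id INTEGER, skill TEXT)")
skills.executemany(
    "INSERT INTO skills VALUES (?, ?)", [(1, "first aid"), (2, "radio operator")]
)
skills.commit()
skills.close()

# One connection attaches the second source, so one SQL statement can
# join tables from both.
conn = sqlite3.connect(hr_path)
conn.execute("ATTACH DATABASE ? AS skills_db", (skills_path,))
rows = conn.execute(
    """
    SELECT c.emp_id, c.phone, s.skill
    FROM contacts AS c
    JOIN skills_db.skills AS s ON s.emp_id = c.emp_id
    ORDER BY c.emp_id
    """
).fetchall()
conn.close()
```

A federated system generalizes this pattern to heterogeneous sources, so the application sees one query interface and the federation layer handles access to each source.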
Scenarios for data integration:
The following scenarios show how different organizations used InfoSphere
Federation Server to solve their data integration needs.
Financial services: Risk management
A major European bank wanted to improve risk management across its member
institutions and meet deadlines for Basel II compliance. The bank had different
methods of measuring risk among its members. Without a consolidated view of
risk management, the bank had to generate individual reports for each vendor.
The bank used InfoSphere Federation Server as part of their solution to enable
compliance with Basel II by implementing a single mechanism to measure risk. A
database management system stores a historical view of data, handles large
volumes of information, and distributes data in a format that enables analysis and
reporting. The bank can view data in operational systems that are spread across
the enterprise, including vendor information.
Product development: Defect tracking
A major automobile manufacturer needed to quickly identify and fix defects in
several lines of its cars. Traditional methods, such as data queries or reporting,
were too complex and too slow to identify the sources of problems.
By installing InfoSphere Federation Server, the company was able to identify and
fix defects by mining data from multiple databases that store warranty information
and correlating warranty reports with individual components or software in its
vehicles.
Government: Emergency response
A small government needed to eliminate storage of redundant contact information
and simplify maintenance. The department had limited resources for any
improvements, and had only one DBA and one manager.
The government chose InfoSphere Federation Server to join employee contact
information in a human resources database on Oracle with information about
employee skills in an IBM DB2 database. The information was presented to
emergency personnel through a web portal that is implemented with IBM
WebSphere Application Server. By interfacing with existing SQL tools, the small
staff was able to merge employee contact information across multiple databases.
InfoSphere Federation Server tasks:
InfoSphere Federation Server helps your organization to virtualize data and
provide information in a form that applications and users need while hiding the
are in stock. The website is updated to reflect the current quantity, and the retailer
can notify customers when the quantity of an item that they viewed on the website
is running low. The retailer can also use InfoSphere Data Replication to detect
when inventory is low so that more units can be purchased. This solution ensures
that customer satisfaction and loyalty remain high, and that customers return to
the retailer for additional purchases.
Financial services: Supporting continuous operations
A large, organized financial services company wanted to grow their business to
support international operations. The existing trading services were not prepared
for globalization, and the current database performance did not have the capacity
to meet projected load requirements. The company needed a solution that
supported high scalability and constant availability before moving forward
with their plans to expand.
The company included InfoSphere Data Replication for DB2 for z/OS as part of
their solution to enable continuous, integrated management of their data across
time zones, and from multiple data sources. The software delivers high availability,
scalability, and performance with improved security that the company required to
expand their operations. Because of the active-active high availability
configuration, the company can complete daily maintenance operations without
stopping applications or experiencing downtime. If the primary system is affected
by an outage, a secondary system becomes active so that trading can continue
without interruptions.
InfoSphere Data Replication tasks:
InfoSphere Data Replication includes real-time data replication capabilities to help
your organization support high availability, database migration, application
consolidation, dynamic warehousing, master data management (MDM),
service-oriented architecture (SOA), business analytics, and data quality processes.
Your organization uses InfoSphere Data Replication to complete the following
tasks:
Integrate information in real time
Real-time data integration enables your organization to sense and respond
to relevant business data changes throughout your enterprise.
InfoSphere Data Replication provides real-time feeds of changed data for
data warehouse or master data management (MDM) projects, enabling
your organization to make operational and business decisions based on the
latest information. As your organization changes, data and applications can
be consolidated without interruption. Data can be routed to various
message queues to be consumed by multiple applications, ensuring
accurate and reliable data across your enterprise.
Deliver data continuously
Continuous delivery of data ensures that your critical business operations
are always available and contain the most current information.
With InfoSphere Data Replication, you can synchronize data between two
systems to provide continuous availability. If the primary system is
impacted by a planned or unplanned outage, a secondary system is
available so that your business continues to run without interruption.
Publish changed data to multiple targets
InfoSphere Data Replication captures changed-data events from database
Client tier
The client tier consists of the client programs and consoles that are used for
development, administration, and other tasks and the computers where they are
installed.
The following tools are installed as part of the client tier, based on the products
and components that you select:
[Diagram: Client tier. Components shown: InfoSphere DataStage and QualityStage
Administrator, Designer, and Director clients; Multi-client manager; InfoSphere
FastTrack clients; InfoSphere Information Server Manager client; InfoSphere
Metadata Integration Bridges; Administration; InfoSphere Connector Migration
Tool; InfoSphere Information Analyzer client; istool command line.]
Services tier
The services tier consists of the application server, common services, and product
services for the suite and product modules. The services tier provides common
services (such as security) and services that are specific to certain product modules.
On the services tier, IBM WebSphere Application Server hosts the services. The
services tier also hosts InfoSphere Information Server applications that are
Web-based.
Some services are common to all product modules. Other services are specific to
the product modules that you install. The services tier must have access to the
metadata repository tier and the engine tier.
An application server hosts these services. IBM WebSphere Application Server is
included with the suite for supported operating systems. You can choose to use
WebSphere Application Server Liberty Profile or WebSphere Application Server
Network Deployment. Alternatively, you can use an existing instance of WebSphere
Application Server Network Deployment, if the version is supported by InfoSphere
Information Server.
The following diagram shows the services that run on the application server on the
services tier.
[Diagram: Services tier, application server. Product module-specific services
shown: Connector access services, IBM InfoSphere FastTrack services, IBM
InfoSphere QualityStage services, IBM InfoSphere DataStage services. Common
services shown: Scheduling, Directory, Security, Reporting, Core services,
Metadata services.]
Engine tier
The engine tier is the logical group of engine components, communication agents,
and so on. The engine runs jobs and other tasks for product modules.
Several product modules require the engine tier for certain operations. You install
the engine tier components as part of the installation process for these product
modules. The following product modules require the engine tier:
QualityStage. The Resource Tracker logs the processor, memory, and I/O
usage on each computer that runs parallel jobs.
dsrpcd (DSRPC Service)
Allows InfoSphere DataStage clients to connect to the server engine.
AIX, HP-UX, Solaris, Linux: This process runs as a daemon (dsrpcd).
Microsoft Windows: This process runs as the DSRPC Service.
Job monitor
A Java application (JobMonApp) that collects processing information from
parallel engine jobs. The information is routed to the server controller
process for the parallel engine job. The server controller process updates
various files in the metadata repository with statistics such as the number
of inputs and outputs, the external resources that are accessed, operator
start time, and the number of rows processed.
Operational MetaData monitor
A Java application (OMDMonApp) that processes the operational metadata
XML files that are generated by job runs if the collection of operational
metadata is enabled. The information in the XML files is stored in the
metadata repository, and the XML files are deleted.
DataStage engine resource service
Microsoft Windows: Establishes the shared memory structure that is used
by server engine processes.
DataStage Telnet service
Microsoft Windows: Allows users to connect to the server engine by using
Telnet. Useful for debugging issues with the server engine. Does not need
to be started for normal InfoSphere DataStage processing.
MKS Toolkit
Microsoft Windows: Used by the InfoSphere Information Server parallel
engine to run jobs.
The following diagram shows the components that make up the engine tier. Items
marked with asterisks (*) are only present in Microsoft Windows installations.
[Diagram: Engine tier. Components shown: ASB agents; Connectivity (ODBC drivers
and native libraries); Server engine; Parallel engine; Connector access services
agent; InfoSphere Information Analyzer agent; InfoSphere Information Services
Director agent; *MKS Toolkit; DataStage Telnet Service; DataStage Engine
Resource service; istool command line; Job monitor; DSRPC service; Resource
Tracker; Operations Console processes; WLMServer; Source and target data.]
Note: InfoSphere Metadata Integration Bridges are installed only on the client tier,
not on the engine tier.
Repository tier
The repository tier consists of the metadata repository and, if installed, other data
stores to support other product modules. The metadata repository contains the
shared metadata, data, and configuration information for InfoSphere Information
Server product modules. The other data stores store extended data for use by the
product modules they support, such as the operations database, which is a data
store that is used by the engine operations console.
The repository tier includes the metadata repository database for IBM InfoSphere
Information Server. The metadata repository exists as its own schema in this
database. The metadata repository is a shared component that stores design-time,
runtime, glossary, and other metadata for product modules in the InfoSphere
Information Server suite.
The repository tier also includes other data stores. Some of these data stores might
be referred to as databases or repositories throughout the documentation, based on
legacy naming conventions. However, they might exist as either separate database
schemas in a shared database or as schemas in their own separate databases in the
product suite. Some of these data stores can exist on other computers, and in that
sense the repository tier can be thought of as a logical tier. However, when this
documentation refers to the repository tier computer, it is the computer that hosts
the database for the metadata repository. Location and connection information for
the other data stores in the repository tier is stored in the metadata repository.
The repository tier can include these data stores:
v As part of IBM InfoSphere Metadata Asset Manager, a data store called the
staging area is installed as a separate schema in the database for the metadata
repository.
[Diagram: Repository tier. Data stores shown: Metadata repository database;
Metadata repository; InfoSphere Metadata Asset Manager staging area; InfoSphere
Information Analyzer analysis databases; InfoSphere QualityStage
Standardization Rules Designer repository; InfoSphere DataStage operations
database.]
Tier relationships
The tiers provide services, job execution, and storage of metadata and other data
for the product modules that you install.
The following diagram illustrates the tier relationships.
[Diagram: Tier relationships. Elements shown: Client tier (Console); Services
tier (Product module-specific services, Common services); Engine tier (Engine,
ODBC drivers); Data; Repository tier.]
Data pipelining
Data pipelining is the process of pulling records from the source system and moving
them through the sequence of processing functions that are defined in the
data-flow (the job). Because records are flowing through the pipeline, they can be
processed without writing the records to disk, as Figure 8 shows.
Data can be buffered in blocks so that each process is not slowed when other
components are running. This approach avoids deadlocks and speeds performance
by allowing both upstream and downstream processes to run concurrently.
Without data pipelining, the following issues arise:
v Data must be written to disk between processes, degrading performance and
increasing storage requirements and the need for disk management.
v The developer must manage the I/O processing between components.
v The process becomes impractical for large data volumes.
v The application will be slower, as disk use, management, and design
complexities increase.
v Each process must complete before downstream processes can begin, which
limits performance and full use of hardware resources.
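The pipelining idea can be sketched in a few lines of Python, with generators standing in for stages. This is an illustration only: the stage names (read_source, transform, load) and the sample records are hypothetical and are not part of any InfoSphere Information Server API. Generators interleave the stages record by record in a single thread, whereas a parallel engine runs the stages as concurrent, buffered processes.

```python
from typing import Iterable, Iterator


def read_source(rows: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical source stage: hands over one record at a time."""
    for row in rows:
        yield row


def transform(records: Iterator[dict]) -> Iterator[dict]:
    """Hypothetical transform stage: upper-cases the surname as each record arrives."""
    for rec in records:
        yield {**rec, "surname": rec["surname"].upper()}


def load(records: Iterator[dict]) -> list:
    """Hypothetical target stage: collects records in place of a warehouse load."""
    return [rec for rec in records]


# Assumed sample data; no intermediate result set is written to disk.
source_rows = [{"id": 1, "surname": "Ford"}, {"id": 2, "surname": "Dodge"}]
loaded = load(transform(read_source(source_rows)))
```

Each record passes through every stage before the next record is read, so no stage has to wait for an upstream stage to finish its whole input.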
Data partitioning
Data partitioning is an approach to parallelism that involves breaking the record set
into partitions, or subsets of records. If no resource constraints or other data skew
issues exist, data partitioning can provide linear increases in application
performance. Figure 9 shows data that is partitioned by customer surname before it
flows into the Transformer stage.
v Range
v Round-robin
v Random
v Entire
v Modulus
v Database partitioning
InfoSphere Information Server automatically partitions data based on the type of
partition that the stage requires. Typical packaged tools lack this capability and
require developers to manually create data partitions, which results in costly and
time-consuming rewriting of applications or the data partitions whenever the
administrator wants to use more hardware capacity.
In a well-designed, scalable architecture, the developer does not need to be
concerned about the number of partitions that will run, the ability to increase the
number of partitions, or repartitioning data.
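To make the partition types more concrete, the following Python sketch shows how three of them (round-robin, modulus, and range) might assign records to partitions. The function names, the boundary letters, and the sample customer records are assumptions made for illustration; the product performs partitioning automatically and this is not its implementation.

```python
from bisect import bisect_right


def round_robin(records, n):
    """Deal records to n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def modulus(records, n, key):
    """Assign each record by the remainder of an integer key divided by n."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[rec[key] % n].append(rec)
    return parts


def by_range(records, boundaries, key):
    """Assign each record by where the first letter of its key falls among sorted boundaries."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        parts[bisect_right(boundaries, rec[key][0].upper())].append(rec)
    return parts


# Assumed sample data.
customers = [
    {"id": 1, "surname": "Ford"},
    {"id": 2, "surname": "Nash"},
    {"id": 3, "surname": "Smith"},
    {"id": 4, "surname": "Hudson"},
]
rr_parts = round_robin(customers, 2)
mod_parts = modulus(customers, 2, "id")
range_parts = by_range(customers, ["H", "P"], "surname")  # A-G, H-O, P-Z
```

Round-robin balances partition sizes regardless of content, while the keyed methods keep related records together, which matters when a downstream stage groups or matches on that key.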
Dynamic repartitioning
In the examples shown in Figure 9 and Figure 10, data is partitioned
based on customer surname, and then the data partitioning is maintained
throughout the flow.
Figure 10. A less practical approach to data partitioning and parallel execution
Without partitioning and dynamic repartitioning, the developer must take these
steps:
v Create separate flows for each data partition, based on the current hardware
configuration.
v Write data to disk between processes.
v Manually repartition the data.
v Start the next process.
The application will be slower, disk use and management will increase, and the
design will be much more complex. The dynamic repartitioning feature of
InfoSphere Information Server helps you overcome these issues.
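As a rough illustration of repartitioning without an intermediate write to disk, the Python sketch below redistributes records that are already partitioned by surname into new partitions keyed on a postal code. The repartition function, the choice of postal code as the second key, and the sample data are assumptions for this example and do not describe how the product implements the feature.

```python
def repartition(partitions, n, key_fn):
    """Move every record from the existing partitions into n new partitions in memory."""
    new_parts = [[] for _ in range(n)]
    for part in partitions:
        for rec in part:
            new_parts[key_fn(rec) % n].append(rec)
    return new_parts


# Assumed sample data: two partitions split by surname (A-M, N-Z).
by_surname = [
    [{"surname": "Ford", "zip": "10001"}, {"surname": "Hudson", "zip": "10002"}],
    [{"surname": "Nash", "zip": "10003"}, {"surname": "Smith", "zip": "10004"}],
]

# Repartition on the numeric value of the postal code for a downstream stage.
by_zip = repartition(by_surname, 2, lambda rec: int(rec["zip"]))
```

The records move directly from the old partitions to the new ones, so the flow does not need a separate job per partition or a manual repartitioning step.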
For maximum scalability, data integration software must use all available system
resources to accomplish data integration tasks. This capability must extend beyond
Symmetric Multiprocessing (SMP) systems to include both Massively Parallel
Processing (MPP) systems and grid systems.
InfoSphere Information Server components use grid, SMP, and MPP environments
to optimize the use of all available hardware resources.
For example, when you use the IBM InfoSphere DataStage and QualityStage
Designer to create a data-flow graph, the underlying hardware architecture and
number of processors are irrelevant. A separate configuration file defines how
much parallel processing the job uses and where that processing runs. This
configuration is bound to the job at run time and determines the resources
required from the underlying computing system.
As Figure 12 shows, the configuration provides a clean separation between creating
the data-flow graph and the parallel execution of the application. This separation
simplifies the development of scalable data integration systems that run in parallel.
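The separation between the data-flow graph and its configuration can be sketched in Python as follows. The job logic is written once, and a configuration that is supplied at run time decides how many partitions run. The dictionary layout, the node names, and the use of a thread pool are assumptions for illustration; they do not reproduce the product's configuration file format or its parallel engine.

```python
from concurrent.futures import ThreadPoolExecutor


def double_all(partition):
    """The job logic, defined once and unaware of how many partitions exist."""
    return [value * 2 for value in partition]


def run_job(records, config):
    """Bind the job to a configuration at run time and run one worker per node."""
    nodes = config["nodes"]
    partitions = [records[i::len(nodes)] for i in range(len(nodes))]
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(double_all, partitions))
    return sorted(value for part in results for value in part)


# Hypothetical configurations: same job, different degrees of parallelism.
uniprocessor = {"nodes": ["node1"]}
four_way = {"nodes": ["node1", "node2", "node3", "node4"]}

sequential_result = run_job(list(range(10)), uniprocessor)
parallel_result = run_job(list(range(10)), four_way)
```

Changing the configuration changes the degree of parallelism but not the job definition or its result.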
[Figure 12 diagram: Application assembly with one data-flow graph (Source Data,
Match, Transform, Load, Data Warehouse). Hardware labels shown: Sequential
(Uniprocessor: Disk, CPU, Memory); SMP System (Symmetric Multiprocessing: Disk,
multiple CPUs, Shared Memory); MPP System; 64-way parallel.]
ensures that processing capacity does not inhibit project results and allows
solutions to easily expand to new hardware and to fully utilize the processing
power of all available hardware.
Administrative services
IBM InfoSphere Information Server provides administrative services to help you
manage users, roles, sessions, security, logs, and schedules. The Web console
provides global administration capabilities that are based on a common
framework.
The IBM InfoSphere Information Server console provides these services:
v Security services
v Log services
v Scheduling services
Security services
Security services support role-based authorization of users, access-control services,
and encryption that complies with many privacy and security regulations. As
Figure 13 shows, the console helps administrators add users, groups,
and roles and lets administrators perform browse, create, delete, and update
operations within InfoSphere Information Server.
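A minimal Python sketch of role-based authorization follows. The role names, user names, and the is_authorized helper are hypothetical and serve only to show the idea that permissions attach to roles and users acquire them through role membership; they are not the roles or API of InfoSphere Information Server.

```python
# Hypothetical roles and the operations that each role permits.
ROLE_PERMISSIONS = {
    "suite_admin": {"browse", "create", "delete", "update"},
    "suite_user": {"browse"},
}

# Hypothetical users and their assigned roles.
USER_ROLES = {
    "alice": {"suite_admin"},
    "bob": {"suite_user"},
}


def is_authorized(user, operation):
    """Return True if any role assigned to the user permits the operation."""
    roles = USER_ROLES.get(user, set())
    return any(operation in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

For example, is_authorized("bob", "delete") is False because the only role assigned to bob permits browsing, and an unknown user is denied every operation.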
Directory services act as a central authority that can authenticate resources and
manage identities and relationships among identities. You can base directories on
the InfoSphere Information Server internal directory, on external directories that are
based on LDAP and Microsoft Active Directory, or on Microsoft Windows and UNIX
local operating systems.
Users need only one credential to access all the components of InfoSphere
Information Server.
v Session timeout
v Changes to audit logging configuration settings
The creation and removal of users and groups, assignment or removal of a user
from a group, and user password changes can be logged only if the User Registry
Configuration is set to InfoSphere Information Server User Registry. This registry
is also known as the InfoSphere Information Server internal user registry.
You can configure the location, size, name, and number of audit log files, as well
as the events to log.
Log services
Log services help you manage logs across all of the InfoSphere Information Server
suite components. The Web console provides a central place to view logs and
resolve problems. Logs are stored in the common repository, and each InfoSphere
Information Server suite component defines relevant logging categories.
You can configure which categories of logging messages are saved in the
repository. Log views are saved queries that an administrator can create to help
with common tasks. For example, you might want to display all of the IBM
InfoSphere Information Services Director error events that were logged in the last
24 hours.
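The notion of a log view as a saved query can be sketched in Python as follows. The LogEvent fields, the make_log_view helper, and the sample events are assumptions for illustration; the Web console defines its own log view settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    component: str
    severity: str
    logged_at: datetime
    message: str


def make_log_view(component, severity, window):
    """Build a reusable filter: events for one component and severity within a time window."""
    def view(events, now):
        cutoff = now - window
        return [
            e for e in events
            if e.component == component and e.severity == severity and e.logged_at >= cutoff
        ]
    return view


# Saved view: Information Services Director error events from the last 24 hours.
isd_errors_last_day = make_log_view("Information Services Director", "ERROR", timedelta(hours=24))

# Assumed sample data with a fixed reference time.
now = datetime(2014, 6, 1, 12, 0)
events = [
    LogEvent("Information Services Director", "ERROR", now - timedelta(hours=2), "a"),
    LogEvent("Information Services Director", "ERROR", now - timedelta(hours=30), "b"),
    LogEvent("DataStage", "ERROR", now - timedelta(hours=1), "c"),
    LogEvent("Information Services Director", "INFO", now - timedelta(hours=1), "d"),
]
matching = isd_errors_last_day(events, now)
```

Because the view is defined once and applied to whatever events exist at the time, an administrator can rerun it without restating the criteria.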
Figure 14 shows where logging reports can be configured in the IBM InfoSphere
Information Server Web console. Logging is organized by server components. The
Web console displays default and active configurations for each component.
Scheduling services
Scheduling services help plan and track activities such as logging and reporting
and suite component tasks such as data monitoring and trending. Schedules are
maintained by using the IBM InfoSphere Information Server console, which helps
you define schedules; view their status, history, and forecast; and purge them from
the system.
Reporting services
Reporting services manage runtime and administrative aspects of reporting for
IBM InfoSphere Information Server.
You can create product-specific reports for IBM InfoSphere DataStage, IBM
InfoSphere QualityStage, and IBM InfoSphere Information Analyzer, and you can
create cross-product reports for logging, monitoring, scheduling, and security
services.
You can also access, delete, and purge the result contents of an associated
scheduled report execution.
You can set up and run all reporting tasks from the IBM InfoSphere Information
Server Web console. You can retrieve, view, and schedule reports to run at a
specific time and frequency. You can tag reports as favorites and restrict their
access for security purposes.
The following figure shows the IBM InfoSphere Information Server Web console.
You define reports by choosing from a set of templates and setting the parameters
for that template. You can specify a history policy that determines how the report
will be archived. Additionally, you can set a time frame for the report expiration, if
needed. Reports can be formatted as HTML, PDF, RTF, TXT, XLS, and XML.
v You can track and analyze the data flow across departments and processes.
v Metadata is shared automatically among tools.
v Glossary definitions provide business context for metadata that is used in jobs
and reports.
v Data stewards take responsibility for metadata assets such as schemas and tables
that they have authority over.
v By using data lineage, you can focus on the end-to-end integration path, from
the design tool to the business intelligence (BI) report. Or you can drill down to
view any element of the lineage.
v You can eliminate duplicate or redundant metadata to create a single, reliable
version that can be used by multiple tools.
Managing metadata
The metadata repository of IBM InfoSphere Information Server stores metadata
from suite tools and external tools and databases and enables sharing among them.
You can import metadata into the repository from various sources, export metadata
by various methods, and transfer metadata assets between design, test, and
production repositories.
Business analytics
A large, for-profit education provider needed to devise a strategy for better
student retention. Business managers needed to analyze the student life
cycle from application to graduation in order to direct their recruiting
efforts at students with the best chance of success.
To meet this business imperative, the company designed and delivered a
business intelligence solution using a data warehouse. The warehouse
contains a single view of student information that is populated from
operational systems.
The IT organization uses InfoSphere Information Server and its metadata
repository to coordinate metadata throughout the project. Other tools that
are used include Embarcadero ER/Studio for data modeling and IBM
Cognos for business intelligence. The reports that are produced show an
accurate view of student trends over the lifecycle from application to
graduation.
The consumers are able to understand the meaning of the fields in their BI
reports by accessing the business definitions in InfoSphere Information
Governance Catalog. This enables them to identify key factors that
correlate student characteristics and retention. They are also able to
understand the origin of data in the reports by using business lineage,
which enables them to trust the sources and flow of the data that they are
looking at. The net result is the ability to make better decisions with more
confidence, allowing the education provider to design and implement
effective initiatives to retain students.
Accessible documentation
Accessible documentation for InfoSphere Information Server products is provided
in an information center. The information center presents the documentation in
XHTML 1.0 format, which is viewable in most web browsers. Because the
information center uses XHTML, you can set display preferences in your browser.
This also allows you to use screen readers and other assistive technologies to
access the documentation.
The documentation that is in the information center is also provided in PDF files,
which are not fully accessible.
...
Indicates that you can specify multiple values for the previous argument.
|
Indicates mutually exclusive information. You can use the argument to the
left of the separator or the argument to the right of the separator. You
cannot use both arguments in a single use of the command.
{}
Delimits a set of mutually exclusive arguments, one of which is required.
Note:
v The maximum number of characters in an argument is 256.
v Enclose argument values that have embedded spaces with either single or
double quotation marks.
For example:
wsetsrc [-S server] [-l label] [-n name] source
The source argument is the only required argument for the wsetsrc command. The
brackets around the other arguments indicate that these arguments are optional.
wlsac [-l | -f format] [key... ] profile
In this example, the -l and -f format arguments are mutually exclusive and
optional. The profile argument is required. The key argument is optional. The
ellipsis (...) that follows the key argument indicates that you can specify multiple
key names.
wrb -import {rule_pack | rule_set}...
In this example, the rule_pack and rule_set arguments are mutually exclusive, but
one of the arguments must be specified. Also, the ellipsis marks (...) indicate that
you can specify multiple rule packs or rule sets.
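The notation in the wlsac example can be modeled with the Python standard library argparse module, as a way to check a reading of the syntax. This sketch models the notation only: the parse_wlsac function is hypothetical, the meanings of the -l and -f options are not described here, and nothing in it reproduces the behavior of the real command.

```python
import argparse


def parse_wlsac(argv):
    """Parse arguments shaped like: wlsac [-l | -f format] [key... ] profile"""
    parser = argparse.ArgumentParser(prog="wlsac")
    # [-l | -f format]: optional, and at most one of the two may be given.
    exclusive = parser.add_mutually_exclusive_group()
    exclusive.add_argument("-l", action="store_true")
    exclusive.add_argument("-f", dest="format")
    # [key... ] profile: one or more names, where the last one is the required profile.
    parser.add_argument("names", nargs="+")
    args = parser.parse_args(argv)
    *keys, profile = args.names
    return {"l": args.l, "format": args.format, "keys": keys, "profile": profile}


with_keys = parse_wlsac(["-f", "csv", "key1", "key2", "myprofile"])
profile_only = parse_wlsac(["myprofile"])
```

Passing both -l and -f makes argparse report an error and exit, which mirrors the rule that mutually exclusive arguments cannot be used in a single use of the command.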
Software services
My IBM
IBM representatives
If you want to access a particular topic, specify the version number with the
product identifier, the documentation plug-in name, and the topic path in the
URL. For example, the URL for the 11.3 version of this topic is as follows:
http://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.common.doc/common/accessingiidoc.html
Tip:
The knowledge center has a short URL as well:
http://ibm.biz/knowctr
To specify a short URL to a specific product page, version, or topic, use a hash
character (#) between the short URL and the product identifier. For example, the
short URL to all the InfoSphere Information Server documentation is the
following URL:
http://ibm.biz/knowctr#SSZJPZ/
The short URL to the topic above is the following URL:
http://ibm.biz/knowctr#SSZJPZ_11.3.0/com.ibm.swg.im.iis.common.doc/common/accessingiidoc.html
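The URL patterns above can be assembled programmatically. The following Python sketch builds both the full and the short form from the product identifier, version, documentation plug-in name, and topic path. The function names are hypothetical, and the base addresses are taken from the examples in this topic.

```python
BASE_URL = "http://www.ibm.com/support/knowledgecenter"
SHORT_URL = "http://ibm.biz/knowctr"


def topic_url(product_id, version, plugin, topic_path):
    """Full URL: base, product identifier with version, plug-in name, topic path."""
    return f"{BASE_URL}/{product_id}_{version}/{plugin}/{topic_path}"


def short_topic_url(product_id, version, plugin, topic_path):
    """Short URL: a hash character separates the short base from the product identifier."""
    return f"{SHORT_URL}#{product_id}_{version}/{plugin}/{topic_path}"


full = topic_url("SSZJPZ", "11.3.0", "com.ibm.swg.im.iis.common.doc", "common/accessingiidoc.html")
short = short_topic_url("SSZJPZ", "11.3.0", "com.ibm.swg.im.iis.common.doc", "common/accessingiidoc.html")
```

With these inputs, the two results match the full and short URLs that are shown for this topic.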
AIX Linux
Where <host> is the name of the computer where the information center is
installed and <port> is the port number for the information center. The default port
number is 8888. For example, on a computer named server1.example.com that uses
the default port, the URL value would be http://server1.example.com:8888/help/
topic/.
Notices
IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you
any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte character set (DBCS) information,
contact the IBM Intellectual Property Department in your country or send
inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply
to you.
This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it
believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact:
IBM Corporation
J46A/G4
555 Bailey Avenue
San Jose, CA 95141-1003 U.S.A.
Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.
Any performance data contained herein was determined in a controlled
environment. Therefore, the results obtained in other operating environments may
vary significantly. Some measurements may have been made on development-level
systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been
estimated through extrapolation. Actual results may vary. Users of this document
should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of
those products, their published announcements or other publicly available sources.
IBM has not tested those products and cannot confirm the accuracy of
performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
This information is for planning purposes only. The information herein is subject to
change before the products described become available.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
Table 5. Use of cookies by InfoSphere Information Server products and components
Product module: Any (part of InfoSphere Information Server installation)
Component or feature: InfoSphere Information Server web console
Type of cookie that is used: Session, Persistent
Data collected: User name
Purpose of data: Session management, Authentication
Disabling the cookies: Cannot be disabled
Product module: Any (part of InfoSphere Information Server installation)
Component or feature: InfoSphere Metadata Asset Manager
Type of cookie that is used: Session, Persistent
Data collected: No personally identifiable information
Purpose of data: Session management, Authentication, Enhanced user usability,
Single sign-on configuration
Disabling the cookies: Cannot be disabled
Product module: InfoSphere DataStage
Type of cookie that is used: Session, Persistent
Data collected: User name, Digital signature, Session ID
Purpose of data: Session management, Authentication, Single sign-on
configuration
Disabling the cookies: Cannot be disabled
Product module: InfoSphere DataStage
Component or feature: XML stage
Type of cookie that is used: Session
Data collected: Internal identifiers
Purpose of data: Session management, Authentication
Disabling the cookies: Cannot be disabled
Product module: InfoSphere DataStage
Component or feature: IBM InfoSphere DataStage and QualityStage Operations
Console
Type of cookie that is used: Session
Data collected: No personally identifiable information
Purpose of data: Session management, Authentication
Disabling the cookies: Cannot be disabled
Product module: InfoSphere Data Click
Component or feature: InfoSphere Information Server web console
Type of cookie that is used: Session, Persistent
Data collected: User name
Purpose of data: Session management, Authentication
Disabling the cookies: Cannot be disabled
Product module: InfoSphere Data Quality Console
Type of cookie that is used: Session
Data collected: No personally identifiable information
Purpose of data: Session management, Authentication, Single sign-on
configuration
Disabling the cookies: Cannot be disabled
Product module: InfoSphere QualityStage Standardization Rules Designer
Component or feature: InfoSphere Information Server web console
Type of cookie that is used: Session, Persistent
Data collected: User name
Purpose of data: Session management, Authentication
Disabling the cookies: Cannot be disabled
Product module: InfoSphere Information Governance Catalog
Type of cookie that is used: Session, Persistent
Data collected: User name, Internal identifiers
Purpose of data: Session management, Authentication
Disabling the cookies: Cannot be disabled
Product module: InfoSphere Information Analyzer
Type of cookie that is used: Session
Data collected: Session ID
Purpose of data: Session management
Disabling the cookies: Cannot be disabled
If the configurations deployed for this Software Offering provide you as customer
the ability to collect personally identifiable information from end users via cookies
and other technologies, you should seek your own legal advice about any laws
applicable to such data collection, including any requirements for notice and
consent.
For more information about the use of various technologies, including cookies, for
these purposes, see IBM's Privacy Policy at http://www.ibm.com/privacy, IBM's
Online Privacy Statement at http://www.ibm.com/privacy/details (the section
entitled "Cookies, Web Beacons and Other Technologies"), and the "IBM
Software Products and Software-as-a-Service Privacy Statement" at
http://www.ibm.com/software/info/product-privacy.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corp., registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies.
A current list of IBM trademarks is available on the Web at www.ibm.com/legal/
copytrade.shtml.
The following terms are trademarks or registered trademarks of other companies:
Adobe is a registered trademark of Adobe Systems Incorporated in the United
States, and/or other countries.
Intel and Itanium are trademarks or registered trademarks of Intel Corporation or
its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other
countries, or both.
Microsoft, Windows and Windows NT are trademarks of Microsoft Corporation in
the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Oracle and/or its affiliates.
The United States Postal Service owns the following trademarks: CASS, CASS
Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service, USPS
and United States Postal Service. IBM Corporation is a non-exclusive DPV and
LACSLink licensee of the United States Postal Service.
Other company, product or service names may be trademarks or service marks of
others.