Siperian XU Administrator's Guide SP1
Administrator Guide
© 2008 Siperian, Inc.
Copyright 2008 Siperian Inc. [Unpublished - rights reserved under the Copyright Laws of the United States]
Siperian and the Siperian logo are trademarks or registered trademarks of Siperian, Inc. in the US and
other countries. All other products or services mentioned are the trademarks or service marks of their
respective companies or organizations.
THIS DOCUMENTATION CONTAINS CONFIDENTIAL INFORMATION AND TRADE
SECRETS OF SIPERIAN, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED
WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SIPERIAN, INC.
Revised: February 20, 2009
Contents
Preface
Intended Audience .......................................................................................................................................xxv
Organization...................................................................................................................................................xxv
Learning About Siperian Hub ..................................................................................................................xxviii
Contacting Siperian ......................................................................................................................................xxxi
Part 1: Introduction
Chapter 1: Introduction
About Siperian Hub Administrators...............................................................................................................4
Phases in Siperian Hub Administration .........................................................................................................4
Startup Phase............................................................................................................................................4
Configuration Phase................................................................................................................................5
Production Phase.....................................................................................................................................5
Summary of Administration Tasks .................................................................................................................6
Setting Up Security ..................................................................................................................................6
Building the Data Model ........................................................................................................................7
Configuring the Data Flow ....................................................................................................................9
Executing Siperian Hub Processes .....................................................................................................13
Configuring Hierarchies .......................................................................................................................14
Configuring Workflow Integration.....................................................................................................14
Other Administration Tasks ................................................................................................................15
Navigating the Hub Console......................................................................................................................... 24
Toggling Between the Processes and Workbenches Views............................................................ 24
Starting a Tool in the Workbenches View ........................................................................................ 27
Acquiring Locks to Change Settings in the Hub Console .............................................................. 28
Changing the Target Database............................................................................................................ 31
Logging in as a Different User............................................................................................................ 32
Changing the Password for a User ..................................................................................................... 32
Using the Navigation Tree in the Navigation Pane ......................................................................... 33
Adding, Editing and Removing Objects Using Command Buttons............................................. 43
Customizing the Hub Console Interface........................................................................................... 45
Showing Version Details...................................................................................................................... 47
Siperian Hub Workbenches and Tools........................................................................................................ 48
Tools in the Configuration Workbench ............................................................................................ 48
Tools in the Model Workbench.......................................................................................................... 49
Tools in the Security Access Manager Workbench ......................................................................... 50
Tools in the Data Steward Workbench ............................................................................................. 50
Tools in the Utilities Workbench ....................................................................................................... 51
About Dependent Objects ................................................................................................................ 117
How Dependent Objects Are Related to Base Objects and Cross-reference Tables .............. 118
Process Overview for Defining Dependent Objects .................................................................... 119
Dependent Object Columns ............................................................................................................. 120
Creating Dependent Objects............................................................................................................. 121
Editing Dependent Objects............................................................................................................... 123
Deleting Dependent Objects............................................................................................................. 125
Configuring Columns in Tables.................................................................................................................. 125
About Columns ................................................................................................................................... 126
Navigating to the Column Editor..................................................................................................... 131
Adding Columns ................................................................................................................................. 134
Importing Column Definitions From Another Table................................................................... 135
Editing Column Properties................................................................................................................ 137
Changing the Column Display Order .............................................................................................. 139
Deleting Columns ............................................................................................................................... 139
Configuring Foreign-Key Relationships Between Base Objects ........................................................... 140
About Foreign Key Relationships .................................................................................................... 140
Parent and Child Base Objects ......................................................................................................... 142
Process Overview for Defining Foreign-Key Relationships........................................................ 143
Adding Foreign-Key Relationships .................................................................................................. 143
Editing Foreign-Key Relationships .................................................................................................. 145
Configuring Lookups for Foreign-Key Relationships................................................................... 147
Deleting Foreign-Key Relationships ................................................................................................ 147
Viewing Your Schema .................................................................................................................................. 148
Starting the Schema Viewer............................................................................................................... 148
Zooming In and Out of the Schema Diagram ............................................................................... 150
Switching Views of the Schema Diagram ....................................................................................... 152
Navigating to Related Design Objects and Batch Jobs................................................................. 155
Configuring Schema Viewer Options .............................................................................................. 156
Saving the Schema Diagram as a JPG Image ................................................................................. 157
Printing the Schema Diagram ........................................................................................................... 158
Enabling Match on Pending Records .............................................................................................. 214
Enabling Message Queue Triggers for State Changes................................................................... 215
Modifying the State of Records .................................................................................................................. 216
Promoting Records in the Data Steward Tools ............................................................................. 216
Promoting Records Using the Promote Batch Job ....................................................................... 218
Rules for Loading Data ................................................................................................................................ 221
Trust Settings and Validation Rules ................................................................................................. 303
Run-time Execution Flow of the Load Process............................................................................. 304
Other Considerations for the Load Process ................................................................................... 315
Managing the Load Process............................................................................................................... 316
Match Process................................................................................................................................................ 317
About the Match Process .................................................................................................................. 317
Match Data Flow................................................................................................................................. 319
Key Concepts for the Match Process .............................................................................................. 320
Run-Time Execution Flow of the Match Process ......................................................................... 329
Managing the Match Process............................................................................................................. 333
Consolidate Process...................................................................................................................................... 335
About the Consolidate Process......................................................................................................... 335
Consolidation Options ....................................................................................................................... 339
Consolidation and Workflow Integration ....................................................................................... 340
Managing the Consolidate Process................................................................................................... 341
Publish Process.............................................................................................................................................. 342
About the Publish Process ................................................................................................................ 342
Run-time Flow of the Publish Process ............................................................................................ 345
Managing the Publish Process........................................................................................................... 346
Chapter 12: Configuring Data Cleansing
Before You Begin.......................................................................................................................................... 406
About Data Cleansing in Siperian Hub ..................................................................................................... 406
Setup Tasks for Data Cleansing........................................................................................................ 406
Configuring Cleanse Match Servers ........................................................................................................... 407
About the Cleanse Match Server ...................................................................................................... 407
Starting the Cleanse Match Server Tool .......................................................................................... 409
Cleanse Match Server Properties ...................................................................................................... 410
Adding a New Cleanse Match Server .............................................................................................. 411
Editing Cleanse Match Server Properties........................................................................................ 412
Deleting a Cleanse Match Server ...................................................................................................... 413
Testing the Cleanse Match Server Configuration .......................................................................... 413
Using Cleanse Functions.............................................................................................................................. 414
About Cleanse Functions................................................................................................................... 414
Starting the Cleanse Functions Tool................................................................................................ 415
Overview of Configuring Cleanse Functions ................................................................................. 417
Configuring Cleanse Libraries........................................................................................................... 418
Configuring Regular Expression Functions.................................................................................... 422
Configuring Graph Functions........................................................................................................... 424
Testing Functions................................................................................................................................ 437
Using Conditions in Cleanse Functions .......................................................................................... 438
Configuring Cleanse Lists ............................................................................................................................ 440
About Cleanse Lists ............................................................................................................................ 440
Adding Cleanse Lists .......................................................................................................................... 441
Editing Cleanse List Properties......................................................................................................... 442
Configuring Match Columns for Exact-match Base Objects ...................................................... 527
Configuring Match Rule Sets ...................................................................................................................... 531
About Match Rule Sets ...................................................................................................................... 531
Match Rule Set Properties ................................................................................................................. 534
Navigating to the Match Rule Set Tab............................................................................................. 537
Adding Match Rule Sets..................................................................................................................... 538
Editing Match Rule Set Properties ................................................................................................... 539
Renaming Match Rule Sets................................................................................................................ 541
Deleting Match Rule Sets................................................................................................................... 542
Configuring Match Column Rules for Match Rule Sets ......................................................................... 542
About Match Column Rules.............................................................................................................. 542
Match Rule Properties for Fuzzy-match Base Objects Only ....................................................... 544
Match Column Properties for Match Rules.................................................................................... 559
Requirements for Exact-match Columns in Match Column Rules............................................. 563
Command Buttons for Configuring Column Match Rules .......................................................... 564
Adding Match Column Rules............................................................................................................ 565
Editing Match Column Rules............................................................................................................ 570
Deleting Match Column Rules.......................................................................................................... 572
Changing the Execution Sequence of Match Column Rules ....................................................... 573
Specifying Consolidation Options for Match Column Rules....................................................... 574
Configuring the Match Weight of a Column.................................................................................. 575
Configuring Segment Matching for a Column ............................................................................... 576
Configuring Primary Key Match Rules...................................................................................................... 578
About Primary Key Match Rules...................................................................................................... 578
Adding Primary Key Match Rules.................................................................................................... 578
Editing Primary Key Match Rules.................................................................................................... 581
Deleting Primary Key Match Rules.................................................................................................. 582
Investigating the Distribution of Match Keys .......................................................................................... 583
About Match Keys Distribution ....................................................................................................... 583
Navigating to the Match Keys Distribution Tab ........................................................................... 584
Components of the Match Keys Distribution Tab........................................................................ 585
Filtering Match Keys .......................................................................................................................... 587
Excluding Records from the Match Process ............................................................................................ 590
Elements in an XML Message .......................................................................................................... 623
Filtering Messages ............................................................................................................................... 625
Example XML Messages ................................................................................................................... 625
Legacy JMS Message XML Reference ....................................................................................................... 644
Message Fields for Legacy XML....................................................................................................... 644
Filtering Messages for Legacy XML................................................................................................. 645
Example Messages for Legacy XML................................................................................................ 646
Chapter 18: Writing Custom Scripts to Execute Batch Jobs
About Executing Siperian Hub Batch Jobs .............................................................................................. 750
Setting Up Job Execution Scripts............................................................................................................... 750
About Job Execution Scripts............................................................................................................. 750
About the C_REPOS_TABLE_OBJECT_V View ...................................................................... 751
Determining Available Execution Scripts ....................................................................................... 754
Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time ....................... 755
Running Scripts Asynchronously...................................................................................................... 755
Monitoring Job Results and Statistics ........................................................................................................ 755
Error Messages and Return Codes................................................................................................... 755
Job Execution Status .......................................................................................................................... 756
Stored Procedure Reference........................................................................................................................ 758
Alphabetical List of Batch Jobs ........................................................................................................ 758
Accept Non-matched Records As Unique .................................................................................... 760
Autolink Jobs ....................................................................................................................................... 762
Auto Match and Merge Jobs ............................................................................................................. 762
Automerge Jobs................................................................................................................................... 764
BVT Snapshot Jobs ............................................................................................................................ 765
Execute Batch Group Jobs................................................................................................................ 765
External Match Jobs ........................................................................................................................... 766
Generate Match Token Jobs ............................................................................................................. 767
Get Batch Group Status Jobs............................................................................................................ 769
Hub Delete Jobs.................................................................................................................................. 769
Key Match Jobs ................................................................................................................................... 773
Load Jobs.............................................................................................................................................. 775
Manual Link Jobs ................................................................................................................................ 777
Manual Merge Jobs ............................................................................................................................. 777
Manual Unlink Jobs ............................................................................................................................ 779
Manual Unmerge Jobs........................................................................................................................ 779
Match Jobs ........................................................................................................................................... 783
Match Analyze Jobs ............................................................................................................................ 785
Match for Duplicate Data Jobs......................................................................................................... 786
Multi Merge Jobs................................................................................................................................. 788
About the JMS Event Schema Manager Tool ................................................................................ 824
Starting the JMS Event Schema Manager Tool.............................................................................. 825
Generating and Deploying ORS-specific Schemas........................................................................ 827
About Custom Java Cleanse Functions ........................................................................................... 915
How Custom Java Cleanse Functions Are Registered .................................................................. 915
Viewing Registered Custom Java Cleanse Functions .................................................................... 915
Viewing Custom Button Functions............................................................................................................ 916
About Custom Button Functions..................................................................................................... 916
How Custom Button Functions Are Registered ............................................................................ 917
Viewing Registered Custom Button Functions.............................................................................. 917
User Exits for the Unmerge Process................................................................................................ 964
Additional User Exits ......................................................................................................................... 965
Glossary ..............................................................................................................................................................993
Index.....................................................................................................................................................................1041
Welcome to the Siperian Hub™ Administrator Guide. This guide explains how to
administer, manage, and configure Siperian Hub.
Intended Audience
This guide is intended for Siperian Hub administrators: the IT professionals responsible
for configuring and updating a Hub Store so that it provides the rules and
functionality required by data stewards. Administrators should have a strong
working knowledge of database administration.
Organization
This guide contains the following chapters:
Chapter 3, “About the Hub Store”: Describes the key components of the Hub Store: the Master Database and Operational Record Stores (ORS).

Chapter 4, “Configuring Operational Record Stores and Datasources”: Explains how to configure Operational Record Stores (ORS) and datasources.

Chapter 5, “Building the Schema”: Describes the Hub Store schema and provides instructions on building the schema for your Siperian Hub implementation.

Chapter 6, “Configuring Queries and Packages”: Explains how to use and create Siperian Hub queries and packages.

Chapter 7, “State Management”: Describes state management concepts and provides instructions for configuring state management in your Siperian Hub implementation.

Chapter 8, “Configuring Hierarchies”: Explains how to configure Siperian Hierarchy Manager (HM) and describes how to create and configure relationships based on foreign keys.

Part 3, “Configuring the Data Flow”: Describes the flow of data through the Siperian Hub via a series of processes (land, stage, load, match, consolidate, and distribute), and provides instructions for configuring each process using tools in the Hub Console.

Chapter 9, “Siperian Hub Processes”: Describes the flow of data through the Siperian Hub via batch processes, starting with the land process and concluding with the distribution process.

Chapter 10, “Configuring the Land Process”: Describes the data landing process and explains how to configure source systems and landing tables.

Chapter 11, “Configuring the Stage Process”: Describes the data staging process and explains how to configure staging tables, mappings, and other settings that affect Stage jobs.

Chapter 12, “Configuring Data Cleansing”: Explains how to configure data cleansing rules that are run during Stage jobs.

Chapter 13, “Configuring the Load Process”: Explains how to use the load process and how to define trust and validation rules.

Chapter 14, “Configuring the Match Process”: Explains how to configure your Hub Store to match data.

Chapter 15, “Configuring the Consolidate Process”: Explains how to configure your Hub Store to consolidate data.

Chapter 16, “Configuring the Publish Process”: Explains how to configure Siperian Hub to write changes to a message queue.

Part 4, “Executing Siperian Hub Processes”: Describes how to use Hub Console tools to run Siperian Hub processes via batch jobs, and how to use third-party job management tools to schedule and manage Siperian Hub processes via stored procedures.

Chapter 17, “Using Batch Jobs”: Explains how to use Siperian Hub batch jobs and batch groups.

Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”: Explains how to schedule Siperian Hub batch jobs using job execution scripts.

Part 5, “Configuring Application Access”: Describes how to use Hub Console tools to configure Siperian Hub client applications that access Siperian Hub using Services Integration Framework (SIF) requests.

Chapter 19, “Generating ORS-specific APIs and Message Schemas”: Describes how to generate ORS-specific SIF APIs using the SIF Manager tool in the Hub Console.

Chapter 20, “Setting Up Security”: Explains how to set up security for users who will access Siperian Hub resources via the Hub Console or third-party applications.

Chapter 21, “Viewing Registered Custom Code”: Explains how to view custom code registered in Siperian Hub using the User Object Registry tool in the Hub Console.

Chapter 22, “Auditing Siperian Hub Services and Events”: Describes how to set up auditing and debugging in the Hub Console.

Part 6, “Appendixes”: Describes other administration-related topics.

Appendix A, “Configuring International Data Support”: Describes how to configure different character sets for internationalization purposes.

Appendix B, “Backing Up and Restoring Siperian Hub”: Explains how to back up and restore a Siperian Hub implementation.

Appendix C, “Configuring User Exits”: Explains how to configure user exits, which are user-customized, unencrypted stored procedures that execute at specific points during batch job execution.

Appendix D, “Viewing Configuration Details”: Explains how to view details of your Siperian Hub implementation using the Enterprise Manager tool in the Hub Console.

Appendix E, “Implementing Custom Buttons in Hub Console Tools”: Explains how to add custom buttons to tools in the Hub Console that allow users to invoke external services on demand.

Appendix F, “Configuring Access to Hub Console Tools”: Describes how to grant or revoke user access to tools in the Hub Console using the Tool Access tool.

Glossary: Defines Siperian Hub terminology.

Learning About Siperian Hub
What’s New in Siperian Hub describes the new features in this Siperian Hub release.
The Siperian Hub Release Notes contain important information about this Siperian Hub
release. Installers should read the Siperian Hub Release Notes before installing Siperian
Hub.
The Siperian Hub Overview introduces Siperian Hub, describes the product architecture,
and explains core concepts that all users need to understand before using the product.
The Siperian Hub Installation Guide explains to installers how to set up Siperian Hub,
the Hub Store, Cleanse Match Servers, and other components. There is a Siperian Hub
Installation Guide for each supported platform.
The Siperian Hub Cleanse Adapter Guide explains to installers how to configure Siperian
Hub to use the supported adapters and cleanse engines.
The Siperian Hub Data Steward Guide explains to data stewards how to use Siperian Hub
tools to consolidate and manage their organization's data. After reading the Siperian
Hub Overview, data stewards should read the Siperian Hub Data Steward Guide.
The Siperian Hub Administrator Guide explains to administrators how to use Siperian
Hub tools to build their organization’s data model, configure and execute Siperian Hub
data management processes, set up security, provide for external application access to
Siperian Hub functionality and resources, and other customization tasks. After reading
the Siperian Hub Overview, administrators should read the Siperian Hub Administrator
Guide.
The Siperian Services Integration Framework Guide explains to developers how to use
the Siperian Hub Services Integration Framework (SIF) to integrate Siperian Hub
functionality with their applications, and how to create applications using the data
provided by Siperian Hub. SIF allows developers to integrate Siperian Hub smoothly
with their organization's applications. After reading the Siperian Hub Overview,
developers should read the Siperian Services Integration Framework Guide.
The Siperian Hub Metadata Manager Guide explains how to use the Siperian Hub
Metadata Manager tool to validate their organization’s metadata, promote changes
between repositories, import objects into repositories, export repositories, and related
tasks.
The Siperian Hub Resource Kit Guide explains how to install and use the Siperian Hub
Resource Kit, which is a set of utilities, examples, and libraries that assist developers
with integrating the Siperian Hub into their applications and workflows. This
document provides a description of the various sample applications that are included
with the Resource Kit.
The Siperian Hub Insight Manager Guide explains how to install, configure, and use the
Siperian Hub Insight Manager to generate reporting metadata for the data managed
in the Hub Store. It provides a description of how to use this reporting metadata with
third-party reporting tools to create reports and metrics for this data.
Contacting Siperian
Technical support is available to answer your questions and to help you with any
problems encountered using Siperian products. Please contact your local Siperian
representative or distributor as specified in your support agreement. If you have a
current Siperian Support Agreement, you can contact Siperian Technical Support.
We are interested in hearing your comments about this book. Send your comments to:
by Email: [email protected]
by Postal Service: Documentation Manager
Siperian, Inc.
100 Foster City Blvd.
2nd Floor
Foster City, California 94404 USA
Contents
• Chapter 1, “Introduction”
• Chapter 2, “Getting Started with the Hub Console”
Chapter 1: Introduction
Note: This document assumes that you have read the Siperian Hub Overview and have a
basic understanding of Siperian Hub architecture and key concepts.
Chapter Contents
• About Siperian Hub Administrators
• Phases in Siperian Hub Administration
• Summary of Administration Tasks
About Siperian Hub Administrators
For an introduction to using the Hub Console, see Chapter 2, “Getting Started with
the Hub Console.”
Phases in Siperian Hub Administration
This section describes typical phases in Siperian Hub administration. These phases may
vary for your Siperian Hub implementation based on your organization’s methodology.
Startup Phase
The startup phase involves installing and configuring core Siperian Hub components:
Hub Store, Hub Server, Cleanse Match Server(s), and cleanse adapters. For instructions
on installing the Hub Store, Hub Server, and Cleanse Match Servers, see the Siperian
Hub Installation Guide for your platform. For instructions on setting up a cleanse
adapter, see the Siperian Hub Cleanse Adapter Guide.
Note: The instructions in this document assume that you have already completed the
startup phase and are ready to begin configuring your Siperian Hub implementation.
Configuration Phase
After Siperian Hub has been installed and set up, administrators can begin configuring
and testing Siperian Hub functionality—the data model and other objects in the Hub
Store, data management processes, external application access, and so on. This phase
involves a dynamic, iterative process of building and testing Siperian Hub functionality
to meet the stated requirements of an organization. The bulk of the material in this
document refers to tasks associated with the configuration phase.
After a schema has been sufficiently built and the Siperian Hub has been properly
configured, developers can build external applications to access Siperian Hub
functionality and resources. For instructions on developing external applications, see
the Siperian Services Integration Framework Guide.
Production Phase
After a Siperian Hub implementation has been sufficiently configured and tested,
administrators deploy the Siperian Hub in a production environment. In addition to
managing ongoing Siperian Hub operations, this phase can involve performance tuning
to optimize the processing of actual business data.
Summary of Administration Tasks
Setting Up Security
In this document, Chapter 20, “Setting Up Security,” describes the tasks associated
with setting up security in a Siperian Hub implementation. Setup tasks vary depending
on the particular security requirements of your Siperian Hub implementation, as
described in “Security Implementation Scenarios” on page 836. Additional security
tasks are involved if external applications access your Siperian Hub implementation
using Services Integration Framework (SIF) requests. For more information, see
“About Setting Up Security” on page 832, “Summary of Security Configuration Tasks”
on page 838, and “Configuration Tasks For Security Scenarios” on page 839.
To configure security for a Siperian Hub implementation using Siperian Hub’s internal
security framework, you complete the following tasks using tools in the Hub Console:
High-Level Tasks for Setting Up Security
• “Managing the Global Password Policy” on page 877: Required to define global password policies for all users according to your organization’s security policies and procedures.
• “Configuring Siperian Hub Users” on page 866: Required to define user accounts for users to access Siperian Hub resources.
• “Assigning Users to the Current ORS Database” on page 886: Required to provide users with access to the database(s) they need to use.
• “Configuring User Groups” on page 881: Optional. Simplifies security configuration tasks by letting you configure user groups and assign users to them.
• “Securing Siperian Hub Resources” on page 841: Required in order to selectively and securely expose Siperian Hub resources to external applications.
• “Configuring Roles” on page 854: Required to define roles and assign resource privileges to them.
• “Assigning Roles to Users and User Groups” on page 887: Required to assign roles to users and (optionally) user groups.
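The relationships configured by these tasks (users, optional user groups, roles, and resource privileges) form a simple lookup chain: a user is permitted an operation when one of the user’s roles grants the needed privilege on the secured resource. The toy Python model below sketches only that chain; every name in it (users, roles, resources, privileges) is invented, and it is not Siperian Hub code or API.

```python
# Toy model of role-based access (illustrative only; the names below are
# invented examples, not actual Siperian Hub objects or APIs).
ROLE_PRIVILEGES = {
    "data_steward": {("BASE_OBJECT.Customer", "READ"),
                     ("BASE_OBJECT.Customer", "UPDATE")},
    "reader":       {("BASE_OBJECT.Customer", "READ")},
}
USER_ROLES = {"alice": {"data_steward"}, "bob": {"reader"}}

def has_privilege(user, resource, privilege):
    """A user holds a privilege if any of the user's roles grants it."""
    return any((resource, privilege) in ROLE_PRIVILEGES.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(has_privilege("alice", "BASE_OBJECT.Customer", "UPDATE"))  # True
print(has_privilege("bob", "BASE_OBJECT.Customer", "UPDATE"))    # False
```

The point of the chain is that privileges are never granted to users directly: changing what a role may do changes it for every user and group holding that role.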
To configure the land process for a base object, see “Land Process” on page 292,
“Configuring the Land Process” on page 347, and the following topics:
High-Level Tasks for Configuring the Land Process
• “Configuring Source Systems” on page 348: Required to define a unique internal name for each source system (external applications or systems that provide data to Siperian Hub). For more information, see “About Source Systems” on page 348.
• “Configuring Landing Tables” on page 355: Required to create landing tables, which provide intermediate storage in the flow of data from source systems into Siperian Hub. For more information, see “About Landing Tables” on page 355.
To configure the stage process for a base object, see “Stage Process” on page 295,
“Configuring the Stage Process” on page 363, and the following topics:
High-Level Tasks for Configuring the Stage Process
• “Configuring Staging Tables” on page 364: Required to create staging tables, which provide temporary, intermediate storage in the flow of data from landing tables into base objects and dependent objects via load jobs. To learn more, see “About Staging Tables” on page 364.
To configure the load process for a base object, see “Load Process” on page 299,
“Configuring the Load Process” on page 453, and the following topics:
High-Level Tasks for Configuring the Load Process
• “Configuring Trust for Source Systems” on page 455: Used when multiple source systems contribute data to a column in a base object. Required if you want to designate the relative trust level (confidence factor) for each contributing source system. For more information, see “About Trust” on page 455.
• “Configuring Validation Rules” on page 468: Required if you want to use validation rules to downgrade trust scores for cell data based on configured conditions. For more information, see “About Validation Rules” on page 468.
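Trust and validation work together during load: each contributing source system’s value for a cell carries a trust score, validation rules can downgrade that score, and the highest-scoring value survives in the base object. The sketch below is a simplified illustration of that idea only, not the actual Siperian Hub trust algorithm; the system names, scores, and downgrade factor are all invented.

```python
# Toy illustration of trust-based survivorship (not the actual Siperian Hub
# algorithm; system names, trust values, and the 0.5 downgrade are invented).
def pick_cell_value(candidates, validation_rule=None):
    """candidates: list of (source_system, value, trust_score) tuples.
    A validation rule may downgrade a candidate's trust before comparison."""
    scored = []
    for system, value, trust in candidates:
        if validation_rule and not validation_rule(value):
            trust = trust * 0.5          # downgrade trust on failed validation
        scored.append((trust, system, value))
    scored.sort(reverse=True)            # highest trust wins the cell
    return scored[0][2]

candidates = [("CRM", "555-0100", 80), ("Billing", "invalid", 90)]
print(pick_cell_value(candidates, validation_rule=lambda v: v != "invalid"))
# "555-0100": Billing's higher trust (90) is downgraded to 45 by the rule
```

Without the validation rule, Billing’s higher trust would win; the rule changes the outcome by lowering the confidence in the failing value rather than rejecting it outright.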
To configure the match process for a base object, see “Match Process” on page 317,
“Configuring the Match Process” on page 483, and the following topics:
High-Level Tasks for Configuring the Match Process
• “Configuring Match Properties for a Base Object” on page 488: Required for each base object that will be involved in matching. For more information, see “Match Properties” on page 490.
• “Configuring Match Paths for Related Records” on page 497: Required for match column rules involving related records in either separate tables or in the same table. For more information, see “About Match Paths” on page 497.
• “Configuring Match Columns” on page 515: Required to specify the base object columns to use in match column rules. For more information, see “About Match Columns” on page 515.
• “Configuring Match Rule Sets” on page 531: Required if you want to use match rule sets to execute different sets of match column rules at different stages in the match process. For more information, see “About Match Rule Sets” on page 531.
• “Configuring Match Column Rules for Match Rule Sets” on page 542: Required to specify match column rules that determine whether two records for a base object are similar enough to consolidate. For more information, see “About Match Column Rules” on page 542.
• “Configuring Primary Key Match Rules” on page 578: Required to specify the base object columns (primary keys) to use in primary key match rules. For more information, see “About Primary Key Match Rules” on page 578.
• “Investigating the Distribution of Match Keys” on page 583: Useful for investigating the distribution of generated match keys upon completion of the match process. For more information, see “About Match Keys Distribution” on page 583.
• “Configuring Match Settings for Non-US Populations” on page 941: Required for configuring matches involving non-US populations and multiple populations.
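Conceptually, a match column rule compares two records column by column: exact-match columns must agree, while fuzzy-match columns only need to be similar enough. The toy sketch below illustrates the shape of such a rule; real Siperian Hub matching uses configured match columns, populations, and fuzzy-match algorithms, none of which this code reproduces, and the token-overlap similarity here is an invented stand-in.

```python
# Toy sketch of a match column rule (illustrative only; not Siperian Hub's
# matching engine, and the token-overlap similarity is an invented stand-in).
def records_match(rec_a, rec_b, exact_cols, fuzzy_cols, threshold=0.8):
    """True if all exact columns agree and every fuzzy column is similar enough."""
    if any(rec_a[c] != rec_b[c] for c in exact_cols):
        return False
    for c in fuzzy_cols:
        a_tokens = set(rec_a[c].lower().split())
        b_tokens = set(rec_b[c].lower().split())
        overlap = len(a_tokens & b_tokens)
        total = max(len(a_tokens), len(b_tokens), 1)
        if overlap / total < threshold:      # not similar enough to match
            return False
    return True

a = {"zip": "94404", "name": "Acme Corp"}
b = {"zip": "94404", "name": "ACME corp"}
print(records_match(a, b, exact_cols=["zip"], fuzzy_cols=["name"]))  # True
```

Records that pass a match column rule become candidates for the consolidate process described in the next section.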
To configure the consolidation process for a base object, see “Consolidate Process” on
page 335 and “Configuring the Consolidate Process” on page 593.
To configure the publish process for a base object, see “Publish Process” on page 342,
“Configuring the Publish Process” on page 601, and the following topics:
High-Level Tasks for Configuring the Publish Process
• “Configuring Global Message Queue Settings” on page 604: Required to specify global settings for all message queues involving outbound Siperian Hub messages.
• “Configuring Message Queue Servers” on page 605: Required to set up one or more message queue servers that Siperian Hub will use for incoming and outgoing messages. The message queue server must already be defined in your application server environment according to the application server instructions. For more information, see “About Message Queue Servers” on page 605.
• “Configuring Outbound Message Queues” on page 608: Required to set up one or more outbound message queues for a message queue server. For more information, see “About Message Queues” on page 608.
• “Configuring Message Triggers” on page 612: Required for configuring message triggers for a base object. Message queue triggers identify which actions within Siperian Hub are communicated to outside applications via messages in message queues. For more information, see “About Message Triggers” on page 612.
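A message trigger can be pictured as a filter on Hub actions: only the actions the trigger is configured to report produce a message on the outbound queue. The sketch below models just that filtering idea; the action names, payload shape, and in-memory queue are invented stand-ins, not the actual messages or queues Siperian Hub uses.

```python
# Toy model of a message trigger (illustrative only; real triggers are
# configured per base object in the Hub Console, and messages go to real
# message queues, not a Python list).
import json

TRIGGERED_ACTIONS = {"add", "update", "merge"}   # invented example trigger set
outbound_queue = []                              # stand-in for a message queue

def on_hub_action(base_object, action, rowid):
    """Publish a message only for actions the trigger is configured to report."""
    if action in TRIGGERED_ACTIONS:
        outbound_queue.append(json.dumps(
            {"table": base_object, "action": action, "rowid": rowid}))

on_hub_action("C_CUSTOMER", "update", "42")
on_hub_action("C_CUSTOMER", "delete", "43")      # not configured, not published
print(len(outbound_queue))  # 1
```

The filtering keeps queue traffic limited to the events that downstream applications actually care about.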
To execute Siperian Hub processes using tools in the Hub Console, see “About
Siperian Hub Batch Jobs” on page 668, “Using Batch Jobs” on page 667, and the
following topics:
High-Level Tasks for Executing Siperian Hub Processes in the Hub Console
• “Running Batch Jobs Using the Batch Viewer Tool” on page 674: Required if you want to run individual batch jobs from the Hub Console using the Batch Viewer tool. For more information, see “Batch Viewer Tool” on page 674.
• “Running Batch Jobs Using the Batch Group Tool” on page 688: Required if you want to run batch jobs in a group from the Hub Console, allowing you to configure the execution sequence for batch jobs and to execute batch jobs in parallel. For more information, see “About Batch Groups” on page 688.
To execute and manage Siperian Hub stored procedures on a scheduled basis (using job management tools that control IT processes), see “About Executing Siperian Hub Batch Jobs” on page 750, Chapter 18, “Writing Custom Scripts to Execute Batch Jobs,” and the following topics:
High-Level Tasks for Executing Siperian Hub Processes Using Job Management Tools
• “Setting Up Job Execution Scripts” on page 750: Required for writing job execution scripts for job management tools. For more information, see “About Job Execution Scripts” on page 750 and “About the C_REPOS_TABLE_OBJECT_V View” on page 751.
• “Monitoring Job Results and Statistics” on page 755: Required for determining the execution results of job execution scripts. For more information, see “Error Messages and Return Codes” on page 755 and “Job Execution Status” on page 756.
• “Executing Batch Groups Using Stored Procedures” on page 798: Required for executing batch jobs in groups via stored procedures using job scheduling software (such as Tivoli, CA Unicenter, and so on). For more information, see “About Executing Batch Groups” on page 798.
• “Developing Custom Stored Procedures for Batch Jobs” on page 806: Required for creating, registering, and running custom stored procedures for batch jobs. For more information, see “About Custom Stored Procedures” on page 806.
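A job execution script typically runs each batch job in sequence, checks its return code, and stops (or alerts) on the first failure. The Python sketch below models only that control flow; the job names and return codes are invented, and a real script would invoke the documented Siperian Hub stored procedures from your job scheduling software instead.

```python
# Control-flow sketch for a job execution script (illustrative only; a real
# script calls Siperian Hub stored procedures and reads their return codes).
def run_job(name):
    # Stand-in for executing a stored procedure; 0 = success (invented codes).
    return 0 if name != "match" else 1

def run_batch_group(jobs):
    """Run jobs in sequence; stop at the first nonzero return code."""
    for job in jobs:
        code = run_job(job)
        if code != 0:
            return f"{job} failed with return code {code}"
    return "all jobs succeeded"

print(run_batch_group(["stage", "load"]))           # all jobs succeeded
print(run_batch_group(["stage", "load", "match"]))  # match failed with return code 1
```

Stopping at the first failure matters because downstream jobs (load, match, consolidate) depend on the output of the jobs before them.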
Configuring Hierarchies
If your Siperian Hub implementation uses Hierarchy Manager to manage hierarchies,
you need to configure hierarchies and their related objects, including entity icons, entity
objects and entity types, relationship base objects (RBOs) and relationship types,
Hierarchy Manager profiles, and Hierarchy Manager packages. For more information,
see Chapter 8, “Configuring Hierarchies.”
Chapter 2: Getting Started with the Hub Console
This chapter introduces the Hub Console and provides a high-level overview of the
tools involved in configuring your Siperian Hub implementation.
Chapter Contents
• About the Hub Console
• Starting the Hub Console
• Navigating the Hub Console
• Siperian Hub Workbenches and Tools
About the Hub Console
Note: The available tools in the Hub Console depend on your Siperian license
agreement. Therefore, your Hub Console tools might differ from the previous figure.
where YourHubHost is your local Siperian Hub host and port is the port number.
Check with your administrator for the correct port number.
Note: You must use an HTTP connection to start the Hub Console. SSL
connections are not supported.
The Siperian Hub launch screen is displayed.
The first time (only) that you launch Hub Console from a client machine, Java Web
Start downloads application files and displays a progress bar.
After you have logged in with a valid user name and password, Siperian Hub
prompts you to choose a target database—the Master Database or an Operational
Record Store (ORS)—with which to work.
The list of databases to which you can connect is determined by your security
profile.
• The Master Database stores Siperian Hub environment configuration
settings—user accounts, security configuration, ORS registry, message queue
settings, and so on. A given Siperian Hub environment can have only one
Master Database.
• An Operational Record Store (ORS) stores the master data, content
metadata, the rules for processing and managing the master data, and the
auxiliary logic used by Siperian Hub in defining the best version of the truth
(BVT). A Siperian Hub configuration can have one or more ORS databases.
Throughout the Hub Console, an icon next to an ORS indicates whether it has
been validated and, if so, whether the most recent validation resulted in issues.
The possible states are:
• Unknown: The ORS has not been validated since it was initially created, or since the last time it was updated.
• Validated with no issues: No change has been made to the ORS since the last validation.
• Validated with warnings: The most recent validation of the ORS produced warnings.
The Hub Console screen is displayed, as shown in the following example (in which
the Schema Manager is selected from the Model workbench).
When you select a tool from the Workbenches page or start a process from the
Processes page, the window is typically divided into several panes:
• Workbenches / Processes: Displays one of the following: the list of workbenches and tools to which you have access (as shown in the previous figure), or the list of the steps in the process that you are running.
Note: The workbenches and tools that you see depend on what your company has purchased, as well as what your administrator has given you access to. If you do not see a particular workbench or tool when you log into the Hub Console, then your user account has not been assigned permission to access it.
• Navigation Tree: Allows you to navigate items (a list of objects) in the current tool. For example, in the Schema Manager, the middle pane contains a list of schema objects (base objects, landing tables, and so on).
• Properties Panel: Shows details (properties) for the selected item in the navigation tree, and possibly other panels if available in the current tool. Some of the properties might be editable.
Tools are grouped in two ways:
• By Workbenches: Similar tools are grouped together by workbench, a logical collection of related tools.
• By Process: Tools are grouped into a logical workflow that walks you through the tools and steps required for completing a task.
You can click the tabs at the left-most side of the Hub Console window to toggle
between the Processes and Workbenches views.
Note: When you log into Siperian Hub, you see only those workbenches and processes
that contain the tools that your Siperian Hub security administrator has authorized you
to use. The screen shots in this document show the full set of workbenches, processes,
and tools available.
Workbenches View
The workbench names and tool descriptions are metadata-driven, as is the way
in which tools are grouped. It is possible to have customized tool groupings.
Therefore, the arrangement of tools and workbenches that you see after you log in to
Hub Console might differ somewhat from the previous figure.
Processes View
Hub Console displays a list of available processes on the Processes tab. Tools are
organized into common sequences or processes, as shown in the following example.
Processes step you through a logical sequence of tools to complete a specific task.
The same tool can belong to several processes, and can appear many times in one
process.
Types of Locks
In the Hub Console, the Write Lock menu provides two types of locks:
• Write lock: Allows multiple users to make changes to the underlying data at the same time.
• Exclusive lock: Allows only one user at a time to make changes, blocking all other users until the lock is released.
Note: The data steward tools—Data Manager, Merge Manager, and Hierarchy
Manager—do not require write locks. For more information about these tools, see the
Siperian Hub Data Steward Guide. The Audit Manager does not require write locks, either.
The Hub Console takes care of refreshing the lock every 60 seconds on the current
connection. The user can manually release a lock according to the instructions in
“Releasing a Lock” on page 30. If a user switches to a different database while holding
a lock, then the lock is automatically released. If the Hub Console is terminated, then
the lock expires after one minute.
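The lock lifecycle described above (automatic refresh every 60 seconds, expiry one minute after the last refresh) can be modeled with a small sketch. This is an illustration only, not Siperian Hub code; the Hub Console performs the refresh for you, and the class and method names here are invented.

```python
# Toy model of the Hub Console lock lifecycle (illustrative only; the Hub
# Console refreshes and releases locks internally — this is not Siperian code).
class Lock:
    TIMEOUT = 60  # seconds; the lock expires one minute after the last refresh

    def __init__(self, holder, now):
        self.holder = holder
        self.last_refresh = now

    def refresh(self, now):
        # The Hub Console does this automatically every 60 seconds.
        self.last_refresh = now

    def is_expired(self, now):
        # If the Hub Console is terminated, the lock lapses after one minute.
        return now - self.last_refresh > self.TIMEOUT

lock = Lock("alice", now=0)
lock.refresh(now=55)
print(lock.is_expired(now=100))   # False: last refreshed 45 seconds ago
print(lock.is_expired(now=130))   # True: no refresh for 75 seconds
```

The expiry rule is what prevents a crashed Hub Console session from holding a lock forever.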
When no locks are in effect in the Hub Console, the Hub Server caches metadata and
other configuration settings for performance reasons. As soon as a Hub Console user
acquires a write lock or exclusive lock, caching is disabled, the cache is emptied, and
Siperian Hub retrieves this information from the database instead. When all locks are
released, caching is enabled again.
Write locks allow multiple users to edit data in the Hub Console at the same time.
However, write locks do not prevent those users from editing the same data at the
same time. In such cases, the most recently saved changes prevail.
• If the lock has already been acquired by someone else, then the login name and
machine address of that person is displayed.
• If the ORS is in production mode, then a message is displayed explaining that
you cannot acquire the lock.
• If the lock is acquired successfully, then the tools are in read-write mode.
Multiple users can have a write lock per ORS or in the Master Database.
2. When you are finished, you can explicitly release the write lock according to the
instructions in “Releasing a Lock” on page 30.
Releasing a Lock
Clearing Locks
You can force the release of any locks—write or exclusive locks—held by other users.
You might want to do this, for example, to obtain an exclusive lock on the ORS.
Because other users are not warned to save changes before their write locks are
released, you should use this only when necessary.
To change the target database in the Hub Console:
1. On the status bar, click the database name.
Hub Console prompts you to choose a target database with which to work.
For a description of the types of databases that you can select, see “Starting the
Hub Console” on page 19.
2. Select the Master Database or the ORS to which you want to connect.
3. Click Connect.
Each named object is represented as a node in the hierarchy tree. A node that contains
other nodes is called a parent node. A node that belongs to a parent node is called a child
node.
In the following example in the Schema Manager, the Address base object is the parent
node to the associated child nodes (Columns, Cross-Reference, Dependent Objects,
and so on).
Tree Options
The display name is the name of an object as it appears in the navigation tree. You can
change the order in which the objects are displayed in the navigation tree by clicking
Sort By in the tree options area and selecting the appropriate sort option.
Filtering Items
You can filter the items shown in the navigation tree by clicking the Filter area at the
bottom of the left pane and selecting the appropriate filter option. The figures in this
section are from the Schema Manager, but the same principles apply to other Hub
Console tools for which filtering is available.
• If you choose Table type, you click the down arrow to display a list of table types from which to select for your filter.
• If you choose Table, you click the down arrow to display a list of tables from which to select for your filter.
For example, in the Schema Manager, you can choose tables based on either the
table type or table name. When you choose Some Items, the Hub Console displays
the Define Item Filter button above the navigation tree.
• Select the item(s) that you want to include in the filter, and then click OK.
Note: Use the No Filter (All Items) option to remove the filter.
Certain Hub Console tools show a View or View By area below the navigation tree.
• In the Schema Manager, you can show or hide the public Siperian Hub items by
clicking the View area below the navigation tree and choosing the appropriate
command.
• In the Mappings tool, you can view items by mapping, staging table, or landing
table.
• In the Packages tool, you can view items by package or by table.
• In the Users and Groups tool, you can display sub groups and sub users. In the
Batch Viewer, you can group jobs by table, date, or procedure type.
When there is no filter, or when the Some Items filter is selected, Hub Console displays
a Find area above the navigation tree so that you can search for items by name.
For example, in the Schema Manager, you can search for tables and columns.
1. Click anywhere in the Find area to display the Find window.
2. Type the name (or first few letters of the name) that you want to find.
3. Click the F3 - Find button.
The Hub Console highlights the matched item(s). For example, the Schema Manager
displays the list of tables and highlights the tables that match the find criteria.
For example, in the Schema Manager, you can right-click on certain types of objects in
the navigation tree to see a popup menu of the commands available for the selected
object.
Command Buttons
If you have access to create, modify, or delete objects in a Hub Console window, and if
you have acquired a write lock (“Acquiring a Write Lock” on page 30), you might see
some or all of the following command buttons in the Properties panel. There are other
command buttons as well.
• Edit: Edit a property for the selected item in the Properties panel. Indicates that the property is editable.
• Delete: Remove the selected item.
The following figure shows an example of command buttons on the right side of the
properties panel for the Secure Resources tool.
To see a description of what a command button does, hold the mouse over the
button to display a tooltip.
Adding Objects
To add an object:
1. Acquire a write lock.
4. Click OK.
Editing Objects
To edit an object's properties:
1. Acquire a write lock.
2. In the Hub Console tool, select the object whose properties you want to edit.
3. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
4. Click the Save button to save your changes.
Removing Objects
To remove an object:
1. Acquire a write lock.
2. In the Hub Console tool, select the object that you want to remove.
3. Click the Remove button.
4. If prompted to confirm deletion, choose the appropriate option (OK or Yes) to
confirm deletion.
Contents
• Chapter 3, “About the Hub Store”
• Chapter 4, “Configuring Operational Record Stores and Datasources”
• Chapter 5, “Building the Schema”
• Chapter 6, “Configuring Queries and Packages”
• Chapter 7, “State Management”
• Chapter 8, “Configuring Hierarchies”
Chapter 3: About the Hub Store
The Hub Store is where business data is stored and consolidated in Siperian Hub.
The Hub Store contains common information about all of the databases that are part
of your Siperian Hub implementation.
Chapter Contents
• Databases in the Hub Store
• How Hub Store Databases Are Related
• Creating Hub Store Databases
• Version Requirements
Databases in the Hub Store
The Hub Store contains two types of databases:
• Master Database: Contains the Siperian Hub environment configuration settings (user accounts, security configuration, ORS registry, message queue settings, and so on). A given Siperian Hub environment can have only one Master Database. The default name of the Master Database is CMX_SYSTEM. In the Hub Console, the tools in the Configuration workbench (Databases, Users, Security Providers, Tool Access, and Message Queues) manage configuration settings in the Master Database.
• Operational Record Store (ORS): Contains the master data, content metadata, the rules for processing and managing the master data, and the auxiliary logic used by Siperian Hub in defining the best version of the truth (BVT). A Siperian Hub configuration can have one or more ORS databases. The default name of an ORS is CMX_ORS.
Users for Hub Store databases are created globally—within the Master Database—and
then assigned to specific ORSs. The Master Database also stores site-level information,
such as the number of incorrect log-in attempts allowed before a user account is locked
out.
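The lockout behavior can be sketched as a simple counter that resets on a successful login and locks the account once the site-level limit is reached. The sketch below is illustrative only; the limit of 3 is an invented example value, not a documented Siperian Hub default, and the real limit is a Master Database setting.

```python
# Toy model of the site-level lockout policy (illustrative only; the real
# limit is a Master Database setting, and 3 is an invented example value).
MAX_FAILED_ATTEMPTS = 3

def check_login(failed_so_far, password_ok):
    """Return (new_failed_count, status) for one login attempt."""
    if password_ok:
        return 0, "logged in"                     # success resets the counter
    failed = failed_so_far + 1
    if failed >= MAX_FAILED_ATTEMPTS:
        return failed, "account locked"
    return failed, "try again"

print(check_login(0, password_ok=False))  # (1, 'try again')
print(check_login(2, password_ok=False))  # (3, 'account locked')
```

Because the counter lives with the user account in the Master Database, the policy applies across every ORS the user can access.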
You can access and manage multiple ORSs from one Master Database. The Master
Database stores the connection settings and properties for each ORS.
Note: An ORS can be registered in only one Master Database; a single ORS cannot
be shared by, or associated with, multiple Master Databases.
To learn more, see the Siperian Hub Installation Guide for your platform.
Version Requirements
Different versions of the Siperian Hub cannot operate together in the same
environment. All components of your installation must be the same version, including
the Siperian Hub software and the databases in the Hub Store.
If you want to have multiple versions of Siperian Hub at your site, you must install each
version in a separate environment. If you try to work with a different version of a
database, you will receive a message telling you to upgrade the database to the current
version.
This chapter describes how to configure Operational Record Stores (ORSs) and
datasources for the Hub Store using the Databases tool in the Hub Console.
Chapter Contents
• Before You Begin
• About the Databases Tool
• Starting the Databases Tool
• Configuring Operational Record Stores
• Configuring Datasources
Before You Begin
Column Description
Number of databases Number of ORSs currently defined in the Hub Store.
Database List List of registered Siperian Hub ORSs.
Database Properties Database properties for the selected ORS.
Registering an ORS
Note: Registration fails if the database you are registering does not contain the
Siperian Hub repository objects and procedures.
To register an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the button.
The Databases tool displays the Register Database dialog box. By default, the
selected database type is Oracle.
4. If you are registering a DB2 database, select DB2 in the Database type drop-down
list.
The Databases tool displays the Register Database dialog box for a DB2 database.
5. Specify the following settings. Note that Oracle and DB2 have slightly different
settings.
Note: The Schema Name and the User Name are both the name of the ORS that
was specified in the script used to create the ORS. If you need this information,
consult your database administrator.
Property Description
Identity
Database Display Name Display name for this ORS in the Hub Console.
Machine Identifier Prefix given to keys to uniquely identify records from this
instance of the Hub Store.
Connection Properties
Database type One of the following values: Oracle or DB2.
Database hostname Oracle only. IP address or name (if supported on your network)
of the server hosting the Oracle database.
Database server name DB2 only. IP address or name (if supported on your network)
of the database server.
Oracle SID Oracle only. Oracle System Identifier (SID) that refers to the
instance of the Oracle database running on the server.
Database name DB2 only. Name of the DB2 database.
Note: The DB2 database needs to be cataloged via the DB2
client on the application server machine.
Port One of the following settings:
• Oracle: The TCP port of the Oracle listener running on the
Oracle database server. The Oracle installation default is
1521.
• DB2: The TCP port on which the database server listens
for connections. The DB2 installation default is 50000.
Oracle TNS Name Oracle only. Name by which the database is known on your
network as defined in the application server’s TNSNAMES.ORA
file. For example:
mydatabase.mycompany.com
This value is set when you install Oracle. See your Oracle
documentation to learn more about this name.
Schema Name Name of the ORS.
User Name User name for the ORS. By default, this is the user name that
was specified in the script used to create the ORS. This user
owns all of the ORS database objects in the Hub Store.
If a proxy user has been configured for this ORS, then you can
specify the proxy user instead. For instructions on running the
setup_ors.sql script and defining proxy users, see the Siperian
Hub Installation Guide.
Password Password associated with the User Name for the ORS.
• For Oracle, this password is case-insensitive.
• For DB2, this password is case-sensitive.
By default, this is the password associated with the user name
that was specified in the script used to create the ORS.
If a proxy user has been configured for this ORS, then you
specify the password for the proxy user instead. For instructions
on running the setup_ors.sql script and defining proxy users,
see the Siperian Hub Installation Guide.
Create datasource after registration Check (select) to create the datasource on the
application server after registration. For WebLogic users, you will need to specify
the WebLogic username and password.
6. If you want to create the datasource on the application server after registration,
check (select) the Create datasource after registration check box.
Siperian Hub uses the datasources provided by the application server and,
therefore, does not write any data to the ORS at the time of registration.
Note for WebLogic: If you are using WebLogic, a dialog box prompts you for
your username and password. This process writes only to the Master Database.
The ORS and datasource need not be available at registration time.
If you do not check this option, then you will need to manually configure the
datasource, as described in “Configuring Datasources” on page 77.
7. Click OK.
8. Test your database connection settings. To learn more, see “Testing ORS
Connections” on page 71.
Note: If you register an ORS that has been used elsewhere, and that ORS already
has Cleanse Match Servers registered but no other servers registered, then you
need to re-register one of the Cleanse Match Servers. This updates the data in
c_repos_db_release.
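For orientation, the connection settings registered in step 5 correspond to standard JDBC URLs. The sketch below uses the standard Oracle thin and DB2 URL formats; it is an illustration only, since Siperian Hub assembles its datasources from the registered settings for you.

```python
def jdbc_url(db_type, host, port, name):
    """Illustrative JDBC URLs for the two supported database types."""
    if db_type == "Oracle":
        # 'name' is the Oracle SID; 1521 is the Oracle listener default
        return f"jdbc:oracle:thin:@{host}:{port}:{name}"
    # DB2: 'name' is the database name; 50000 is the DB2 default port
    return f"jdbc:db2://{host}:{port}/{name}"
```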
Property Description
Database Type Oracle or DB2
Database ID Identification for the ORS. This ID is used in SIF requests.
The database ID lookup is case-sensitive.
The format for the database ID is:
jdbc/siperian-hostname-sid-databasename-ds
Example:
jdbc/siperian-aiz01-aix01-cmx_ors-ds
When registering a new ORS, the host, server, and database
names are normalized.
• Host name is converted to lowercase.
• Database name is converted to uppercase (the standard for
schemas, tables, etc.).
The normalization of each field can be done on a
database-specific basis so that it can be changed if needed.
JNDI Datasource Name Displays the datasource JNDI name for the selected ORS.
This is the JNDI name that is configured for this JDBC
connection on the application server.
Machine Identifier Prefix given to keys to uniquely identify records from this
instance of the Hub Store.
GETLIST Limit (records) Limits the number of records returned through SIF search
requests, such as searchQuery, searchMatch, getLookupValues,
and so on.
Production Mode Specifies whether this ORS is in production mode.
• If unchecked (the default), production mode is disabled, and
authorized users can edit metadata for this ORS in the Hub
Console.
• If checked, production mode is enabled, and users cannot
make changes to the metadata for this ORS.
If a user attempts to acquire a write lock on an ORS in
production mode, the Hub Console will display a message
explaining that the lock cannot be obtained.
Note: Only Siperian Hub administrator users can change this
setting.
For more information, see “Changing an ORS to Production
Mode” on page 75.
4. To change a property, click the button next to it, and edit the property.
5. Click the Save button to save your changes.
If production mode is enabled for an ORS, then the Databases tool displays a lock
icon next to it in the list.
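The host and database name normalization described for the Database ID property above can be sketched as follows (the function name is ours, and, as noted, the normalization of each field can be adjusted per database):

```python
def normalize_registration(host, database):
    """Apply the default normalization for new ORS registrations:
    host name to lowercase, database name to uppercase."""
    return host.lower(), database.upper()
```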
Changing Passwords
To change passwords for the Master Database or an ORS, you need to make changes
first on your database server and possibly on your application server as well.
To change the password for the Master Database:
1. On your database server, change the password for the CMX_SYSTEM schema.
2. Log into the administration console for your application server and edit the
datasource connection information, specifying the new password for CMX_
SYSTEM, and then save your changes.
Option One
1. On your database server, change the password for the ORS schema.
2. Start the Hub Console and select Master Database as the target database. To learn
more, see “Changing the Target Database” on page 31.
3. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
6. Click the button.
The Databases tool displays the Update Database Registration dialog box for the
selected ORS.
7. Enter the new password in the Password text box.
8. Check (select) the Update datasource after registration check box.
9. Click OK.
10. Test your updated database connection settings. To learn more, see “Testing ORS
Connections” on page 71.
Option Two
1. On your database server, change the password for the ORS schema.
2. Start the Hub Console and select Master Database as the target database. To learn
more, see “Changing the Target Database” on page 31.
3. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
6. In the Database Properties panel, make a note of the JNDI Datasource Name for
the selected ORS.
7. Log into the administration console for your application server and edit the
datasource connection information for this ORS, specifying the new password for
the noted JNDI Datasource Name, and then save your changes.
Encrypting Passwords
To change the schema password, you must change it in the datasources defined on
the application server. This password is not encrypted there, because the
application server protects it. In addition to updating the datasources on the
application server, Siperian requires that the password be encrypted and stored in
various repository tables.
To encrypt the new password, execute the following command from the prompt,
supplying the new password as the final argument:
java -classpath siperian-common.jar
com.siperian.common.security.Blowfish <password>
For example, if admin is your new password, then the command would be:
java -classpath siperian-common.jar
com.siperian.common.security.Blowfish admin
Plaintext Password: admin
Encrypted Password: A75FCFBCB375F229
Execute the following commands to update the passwords for your ORS and Master
Database:
The following user names and passwords can be changed when installing or configuring the MRM:
• The CMX_SYSTEM user should not be changed.
• The CMX_SYSTEM password can be changed after the MRM is installed. You
need to change the password for the CMX_SYSTEM user in Oracle, and you need
to set the same password in the datasource on the application server.
• The CMX_ORS user and password can be changed when the setup_ors.sql is run.
You need to use the same password when registering the ORS in the Hub Console.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
The Databases tool displays the database properties for the selected ORS.
6. Change the setting of the Production Mode check box, as described in “Editing
ORS Properties” on page 69.
Select (check) the check box to enable production mode, or clear (uncheck) it to
disable it.
7. Click the Save button to save your changes.
Unregistering an ORS
Unregistering an ORS removes the connection information to this ORS from the
Master Database and removes the datasource definition from the application server
environment.
To unregister an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to unregister.
4. Click the button.
Note: If you are running WebLogic, enter the WebLogic user name and password
when prompted.
The Databases tool prompts you to confirm unregistering the ORS.
5. Click Yes.
Configuring Datasources
This section describes how to configure datasources for an ORS. Every ORS requires a
datasource definition in the application server environment.
About Datasources
In Siperian Hub, a datasource specifies properties for an ORS, such as the location of the
database server, the name of the database, the database user ID and password, and so
on. A Siperian Hub datasource points to a JDBC resource defined in your application
server environment. To learn more about JDBC datasources, see your application
server documentation.
Creating Datasources
You might need to explicitly create a datasource if, for example, you created an ORS
using a different application server, or if you did not check (select) the Create
datasource after registration check box when registering the ORS.
To create a datasource:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Right-click the ORS in the Databases list, and then choose Create Datasource.
Note: If you are running WebLogic, enter the WebLogic user name and password
when prompted.
The Databases tool creates the datasource and displays a progress message.
4. Click OK.
Removing Datasources
If you have registered an ORS with a configured datasource, you can use the Databases
tool to manually remove its datasource definition from your application server. After
removing the datasource definition, however, the ORS will still appear in the Hub Console.
To completely remove a database from the Hub Console, you need to unregister it (see
“Unregistering an ORS” on page 76).
To remove a datasource:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Right-click an ORS in the Databases list, and then choose Remove Datasource.
Note: If you are running WebLogic, enter the WebLogic user name and password
when prompted.
The Databases tool removes the datasource and displays a progress message.
4. Click OK.
This chapter explains how to design and build your schema in Siperian Hub.
Chapter Contents
• Before You Begin
• About the Schema
• Starting the Schema Manager
• Configuring Base Objects
• Configuring Dependent Objects
• Configuring Columns in Tables
• Configuring Foreign-Key Relationships Between Base Objects
• Viewing Your Schema
Before You Begin
Note: The process of designing the schema for your Siperian Hub implementation is
outside the scope of this document. It is assumed that you have developed a data
model—using industry-standard data modeling methodologies—that is based on a
thorough understanding of your organization’s requirements and in-depth knowledge
of the data you are working with.
The Siperian schema is a flexible, repository-driven model that supports the data
structure of any vertical business sector. The Hub Store is the database that underpins
Siperian Hub and provides the foundation of Siperian Hub’s functionality. Every
Siperian Hub installation has a Hub Store, which includes one Master Database and
one or more Operational Record Store (ORS) databases. Depending on the
configuration of your system, you can have multiple ORS databases in an installation.
For example, you could have a development ORS, a testing ORS, and a production
ORS. For more information, see Chapter 3, “About the Hub Store,” and Chapter 4,
“Configuring Operational Record Stores and Datasources.”
Before you begin to implement the schema, you must understand the basic structure of
the underlying Siperian Hub schema and its components. This section introduces the
most important tables in an ORS and how they work together.
Note: You must use tools in the Hub Console to define and manage the consolidated
schema—you cannot make changes directly to the database. For example, you must use
the Schema Manager to define tables and columns. For details, see “Requirements for
Defining Schema Objects” on page 87.
Configurable Tables
The following types of Siperian Hub tables are used to model business reference data.
You must explicitly create and configure these tables.
Types of Configurable Tables in an ORS
Type of Table Description
base object Used to store data for a central business entity (such as customer,
product, or employee) or a lookup table (such as country or state). In a
base object table (or simply a base object), you can consolidate data from
multiple source systems and use trust settings to determine the most
reliable value of each base object cell. You can define one-to-many
relationships between base objects. Base objects must be explicitly
created and configured according to the instructions in “Process
Overview for Defining Base Objects” on page 94.
dependent object Used to store detailed information about the records in a base object
(for example, supplemental notes). One record in a base object can map
to multiple records in a dependent object table (or simply a dependent
object). Dependent objects must be explicitly created and configured
according to the instructions in “Process Overview for Defining
Dependent Objects” on page 119.
landing table Used to receive batch loads from a source system. Landing tables must
be explicitly created and configured according to the instructions in
“Configuring Landing Tables” on page 355.
staging table Used to load data into base objects and dependent objects. Mappings
are defined between landing tables and staging tables to specify whether
and how data is cleansed and standardized when it is moved from a
landing table to a staging table. Staging tables must be explicitly created
and configured according to the instructions in “Configuring Staging
Tables” on page 364.
Infrastructure Tables
The following types of Siperian Hub infrastructure tables are used to manage and
support the flow of data in the Hub Store. Siperian Hub automatically creates,
configures, and maintains these tables whenever you configure base objects and
dependent objects.
Types of Infrastructure Tables in an ORS
Type of Table Description
cross-reference table Used for tracking the origin of each record in the base object.
Named according to the following pattern:
C_baseObjectName_XREF
where baseObjectName is the root name of the base object (for example,
C_PARTY_XREF). For this reason, this table is sometimes referred to
as the XREF table. When you create a base object, Siperian Hub
automatically creates a cross-reference table to store information about
data coming from source systems. For more information, see
“Cross-Reference Tables” on page 97.
history table Used if history is enabled for a base object (see “Enable History” on
page 102). Named according to the following pattern:
C_baseObjectName_HIST—base object history table, as described in
“Base Object History Tables” on page 101.
C_baseObjectName_HXRF—cross-reference history table, as described in
“Cross-Reference History Tables” on page 101.
where baseObjectName is the root name of the base object (for example,
C_PARTY_HIST and C_PARTY_HXRF).
Siperian Hub creates and maintains several different history tables to
provide detailed change-tracking options, including merge and unmerge
history, history of the pre-cleansed data, history of the base object, and
the cross-reference history.
match key table Contains the match keys that were generated for all base object records.
Named according to the following pattern:
C_baseObjectName_STRP
where baseObjectName is the root name of the base object (for example,
C_PARTY_STRP). For more information, see “Columns in Match Key
Tables” on page 325.
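The naming patterns in the table above can be captured in a small helper (a sketch; the function name is ours, and only the four patterns listed are covered):

```python
def infrastructure_tables(base_object_root):
    """Derive the infrastructure table names Siperian Hub creates
    automatically for a base object, per the patterns above."""
    return {
        "cross_reference": f"{base_object_root}_XREF",
        "base_object_history": f"{base_object_root}_HIST",
        "xref_history": f"{base_object_root}_HXRF",
        "match_key": f"{base_object_root}_STRP",
    }
```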
Siperian Hub supports one:many and many:many relationships among tables, as well as
hierarchical relationships between records in the same base object. In Siperian Hub,
relationships between records can be defined in various ways.
Once these relationships are configured in the Hub Console, you can use these
relationships to configure match column rules by defining match paths between
records. For more information, see “Configuring Match Paths for Related Records” on
page 497.
Siperian Hub maintains schema consistency, provided that all model changes are
done using the Hub Console tools, and that no changes are made directly to the
database. Siperian Hub provides all the tools necessary for maintaining the schema.
Important: Schema changes can involve risk to data and should be approached in a
managed and controlled manner. You should plan the changes to be made and analyze
the impact of the changes before making them. You should also back up the database
before making any changes.
In order to make any changes to the schema, you must have a write lock. For more
information, see “Acquiring a Write Lock” on page 30.
Reserved Suffixes
Note: To understand which Hub processes create which tables and how to best
manage these tables, please refer to the “Transient Tables” technical note found on the
SHARE portal.
Siperian Hub creates metadata objects that use suffixes appended to the names you use
for base objects. In order to avoid confusion and possible data loss, database object
names must not use the following strings as either names or suffixes:
_T _L _D _C _CL _TML0
DLT _MTCH _TUPD _TGVI _TGR _TGVO
OPL _STRP _TSI _TMGA _TGV TBVB_
_TGVI _TMGA _TSU1 _TMGB _TGV1 TBVC_
_TMST _TMG1 _TMG2 _TMG3 _TRLG _TRLT
_TMP0 _EMI _TSU2 _TMG0 _TGC TBVV_
_XREF _EMO _TC0 _TMG1 _TGC1 TBVT_
_VCT _TXCU _TC1 _TMG2 _TGT _BVTB
_TGRP _TVXR TBVT_ _TMG3 _TGN _BVTC
_HIST _TROU TBVN_ _HUID _TGZ _BVTV
_HXRF _TCRV _TBVB _TNKY _TGA TUTR_
_TLA _TSRV _TBVC _TUID _TGA1 TUHM_
_TOU0 _TIND _TBVV _TGF _TGM TUGR_
_TGB1 _TLU _TIRD _TGB _TGD TVXRD_
The following column names are reserved and cannot be used for user-defined
columns.
ROWID_OBJECT CONSOLIDATION_IND
PKEY_SRC_OBJECT DELETED_IND
CREATE_DATE DELETED_BY
LAST_UPDATE_DATE DELETED_DATE
CREATOR LAST_ROWID_SYSTEM
UPDATED_BY DIRTY_IND
HIST_CREATE_DATE INTERACTION_ID
HIST_UPDATE_DATE HUB_STATE_IND
SRC_ROWID ROWID_SYSTEM
ROWID_XREF SRC_LUD
PROMOTE_IND PUT_UPDATE_MERGE_IND
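A pre-flight check of proposed names against these reserved strings can be sketched as below. This is our own illustration, not a Hub feature, and the lists are deliberately partial; consult the full tables above for every reserved suffix and column name.

```python
# Partial lists for illustration; see the full tables above.
RESERVED_SUFFIXES = ("_XREF", "_HIST", "_HXRF", "_STRP", "_MTCH", "_EMI", "_EMO")
RESERVED_COLUMNS = {"ROWID_OBJECT", "CONSOLIDATION_IND", "CREATE_DATE",
                    "LAST_UPDATE_DATE", "HUB_STATE_IND", "INTERACTION_ID"}

def name_problems(name, kind="table"):
    """Return the reasons a proposed object or column name would clash
    with Siperian Hub's reserved names."""
    upper = name.upper()
    problems = []
    if kind == "column" and upper in RESERVED_COLUMNS:
        problems.append("reserved column name")
    problems += [f"reserved suffix {s}" for s in RESERVED_SUFFIXES
                 if upper.endswith(s)]
    return problems
```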
For purely technical reasons, you might want to add columns to a base object.
For example, for a segment match, you must add a segment column. For more
information on adding columns for segment matches, see “Segment Matching” on
page 562.
We recommend that you distinguish columns added to base objects for purely technical
reasons from those added for other business reasons, because you generally do not
want to include these columns in most views used by data stewards. Prefixing these
column names with a specific identifier, such as CSTM_, is one way to easily filter them
out.
Pane Description
Navigation pane Shows (in a tree view) the core schema objects: base objects and landing
tables. Expanding an object in the tree shows you the property groups
available for that object.
Properties pane Shows the properties for the selected object in the left-hand pane.
Clicking any node in the schema tree displays the corresponding
properties page (that you can view and edit) in the right-hand pane.
For general instructions about using the Schema Manager, see “Navigating the Hub
Console” on page 24. You must use the Schema Manager when defining tables in an
ORS, as described in “Requirements for Defining Schema Objects” on page 87.
Each individual entity has a single master record—the best version of the truth—for that
entity. An individual entity might have additional records in the base object (contributing
records) that contain the “multiple versions of the truth” that need to be consolidated
into the master record. Consolidation is the process of merging duplicate records into a
single consolidated record that contains the most reliable cell values from all of the source
records.
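Conceptually, consolidation can be sketched as a toy function. This is an illustration only, not Siperian Hub's actual BVT and trust algorithm; the source names, trust scores, and records are invented:

```python
def consolidate(contributing, trust):
    """Toy cell-level consolidation: for each column, keep the value
    from the source system with the highest trust score."""
    best = {}  # column -> (source, value)
    for source, record in contributing:
        for column, value in record.items():
            if column not in best or trust[source] > trust[best[column][0]]:
                best[column] = (source, value)
    return {column: value for column, (source, value) in best.items()}
```

For example, with trust scores `{"CRM": 80, "ERP": 90}`, the master record takes the phone number from ERP but keeps any column only CRM provides.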
(Figure: a master record assembled from contributing records, with the most
reliable cell value chosen for each column.)
Important: You must use the Schema Manager to define base objects—you cannot
configure them directly in the database. For more information, see “Requirements for
Defining Schema Objects” on page 87.
Cross-Reference Tables
This section describes cross-reference tables in the Hub Store.
Each base object has one associated cross-reference table (or XREF table), which is used
for tracking the lineage (origin) of records in the base object. Siperian Hub
automatically creates a cross-reference table when you create a base object. Siperian
Hub uses cross-reference tables to translate all source system identifiers into the
appropriate ROWID_OBJECT values.
Note: Cross-reference tables are not created or needed for dependent objects, as
dependent objects are not matched and consolidated.
Each row in the cross-reference table represents a separate record from a source
system. If multiple sources provide data for a single column (for example, the phone
number comes from both the CRM and ERP systems), then the cross-reference table
contains separate records from each source system. Each base object record will have
one or more associated cross-reference records.
The load process populates cross-reference tables. During load inserts, new records are
added to the cross-reference table. During load updates, changes are written to the
affected cross-reference record(s).
Cross-reference records are visible in the Merge Manager and can be modified using
the Data Manager. For more information, see the Siperian Hub Data Steward Guide.
The following figure shows an example of the relationships between base objects,
cross-reference tables, and C_REPOS_SYSTEM.
Cross-reference tables have the following system columns. Note that cross-reference
tables have a unique key representing the combination of the PKEY_SRC_OBJECT
and ROWID_SYSTEM columns.
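The keying described above can be pictured with toy data (the identifiers and columns below are invented for illustration):

```python
# A cross-reference row is keyed by (PKEY_SRC_OBJECT, ROWID_SYSTEM); each
# row maps a source record to its consolidated ROWID_OBJECT.
xref = {
    ("CUST-001", "CRM"): {"ROWID_OBJECT": 1, "PHONE": "555-0101"},
    ("A-00042", "ERP"):  {"ROWID_OBJECT": 1, "PHONE": "555-0199"},
}

def rowid_for(pkey_src_object, rowid_system):
    """Translate a source-system identifier into its ROWID_OBJECT,
    the lookup the cross-reference table supports."""
    return xref[(pkey_src_object, rowid_system)]["ROWID_OBJECT"]
```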
History Tables
This section describes history tables in the Hub Store. If history is enabled for a base
object (see “Enable History” on page 102), then Siperian Hub maintains history tables
for base objects and cross-reference tables. History tables are used by Siperian Hub to
provide detailed change-tracking options, including merge and unmerge history, history
of the pre-cleansed data, history of the base object, the cross-reference history, and so
on.
Item Type
The type of table that you are adding. Select Base Object.
Display Name
The name of this base object as it will be displayed in the Hub Console. Enter a
descriptive name.
Physical Name
The actual name of the table in the database. Siperian Hub will suggest a physical name
for the table based on the display name that you enter. Make sure that you do not use
any reserved name suffixes, as described in “Rules for Database Object Names” on
page 88.
Data Tablespace
The name of the data tablespace. Read-only. For more information, see the Siperian
Hub Installation Guide for your platform.
Index Tablespace
The name of the index tablespace. Read-only. For more information, see the Siperian
Hub Installation Guide for your platform.
Description
Enable History
Specifies whether history is enabled for this base object. If enabled, Siperian Hub keeps
a log of records that are inserted, updated, or deleted for this base object. You can use
the information in history tables for audit purposes. For more information, see
“History Tables” on page 100.
When the percentage of the records that have changed is higher than this value, a
complete re-tokenization is performed. If the number of records to be tokenized does
not exceed this threshold, then Siperian Hub deletes the records requiring
re-tokenization from the match key table, calculates the tokens for those records, and
then reinserts them into the match key table. The default value is 60. For more
information, see “Match Keys and the Tokenization Process” on page 322.
Note: Deleting can be a slow process. However, if your Cleanse Match Server is fast
and the network connection between Cleanse Match Server and the database server is
also fast, then you may test with a much lower tokenization threshold (such as 10%).
This will enable you to determine whether there are any gains in performance.
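The re-tokenization decision described above reduces to a simple threshold check (a sketch; the function and parameter names are ours):

```python
def retokenize_strategy(changed_records, total_records, threshold_pct=60):
    """Full re-tokenization when the changed percentage exceeds the
    threshold (default 60); otherwise delete, recalculate, and reinsert
    only the affected match keys."""
    pct_changed = changed_records / total_records * 100
    return "full" if pct_changed > threshold_pct else "incremental"
```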
This parameter is used only with the Match for Duplicate Data job for initial data
loads. The default value is 0. To enable this functionality, this value must be set to 2 or
above. For more information, see “Match for Duplicate Data Jobs” on page 740 and
the Siperian Hub Data Steward Guide.
The load process inserts and updates batches of records in the base object. The load
batch size specifies the number of records to load per batch cycle (the default is 1000000).
For more information, see “Loading Records by Batch” on page 305, and Chapter 13,
“Configuring the Load Process.”
This specifies the execution timeout (in minutes) when executing a match rule. If this
time limit is reached, then the match process (whenever a match rule is executed, either
manually or via a batch job) will exit. If the match process is running as part of a batch
job, the system moves on to the next match rule; a single match process simply stops.
The default value is 20. Increase this value only if the match rule and data are
very complex. Generally, rules are able to complete within 20 minutes (the default).
For more information, see “Match Process” on page 317 and Chapter 14, “Configuring
the Match Process.”
Parallel Degree
Oracle only. This specifies the degree of parallelism set on the base object table and its
related tables. It does not take effect for all batch processes, but can have a beneficial
effect on performance when it is used. However, its use is constrained by the number
of CPUs on the database server machine, as well as the amount of memory available.
The default value is 1.
If this value is greater than zero, then when parents are merged, the related child
records are set as unconsolidated: they are flagged as New again (consolidation
indicator 4; see “Consolidation Status for Base Object Records” on page 289) so
that they can be matched. The default value is 0.
For more information, see “Consolidation Indicator” on page 289 and “Immutable
Rowid Object” on page 594.
If selected (checked), then the tokenization process executes after the completion of
the load process. This is useful for intertable match scenarios in which the parent must
be loaded first, followed by the child match/merge. By not tokenizing the parent, the
child match/merge will not need to update any of the parent records in the match key
table.
Once the child match/merge is complete, you can run the match process on the parent
to force it to tokenize. This is also useful in cases where you have a limited window in
which to perform the load process. Not tokenizing will save time in the load process, at
the cost of tokenizing the data later.
You must tokenize before you match your data. For more information, see “Load
Process” on page 299, “Generating Match Tokens (Optional)” on page 316, and
“Generating Match Tokens During Load Jobs” on page 730.
You can PUT data into a base object using the Data Manager (see the Siperian Hub Data
Steward Guide). If you are using the Data Manager to PUT data, you can enable (check)
this value to tokenize your data later. Performing this operation later allows you to
process PUT requests faster. Use this only when you know that the data will not be
matched immediately. For more information, see “Match Keys and the Tokenization
Process” on page 322.
Note: Do not use the Generate Match Tokens on Put option if you are using the SIF
API. If you have this parameter enabled, your SIF Put and CleansePut requests will
fail. Use the Tokenize request instead. Enable Generate Match Tokens on Put only if
you are not using the SIF API and you want data steward updates from the Hub
Console to be tokenized immediately. For more information, see “Editing Base Object
Properties” on page 108.
If checked (selected), this feature enables row locking of the data during updates, which
allows a higher degree of concurrent access. The default value is 0, signifying that
row locking is disabled during batch processing.
For more information, see “Match Process” on page 317 and Chapter 14, “Configuring
the Match Process.”
Specifies whether Siperian Hub manages the system state for records in this base
object. By default, state management is disabled. Select (check) this check box to enable
state management for this base object in support of approval workflows. If enabled,
this base object is referred to in this document as a state-enabled base object. For more
information, see Chapter 7, “State Management,” and “Enabling State Management”
on page 211.
For state-enabled base objects, specifies whether Siperian Hub maintains the
promotion history for cross-reference records that undergo a state transition from
PENDING (0) to ACTIVE (1). By default, this option is disabled. For more
information, see Chapter 7, “State Management,” and “Enabling the History of
Cross-Reference Promotion” on page 213.
4. Specify the basic base object properties. For more information, see “Basic Base
Object Properties” on page 101.
5. Click OK.
The Schema Manager creates the new base table in the Operational Record Store
(ORS), along with any support tables, and then adds the new base object table to
the schema tree.
4. For each property that you want to edit on the Basic tab, click the Edit button
next to it, and specify the new value. For more information, see “Basic Base
Object Properties” on page 101.
5. If you want, check (select) the Enable History check box to have Siperian Hub
keep a log of records that are inserted, updated, or deleted. You can use a history
table for audit purposes.
7. Specify the advanced properties for this base object. For more information, see
“Advanced Base Object Properties” on page 102.
8. In the left pane, click Match/Merge Setup beneath the base object’s name.
For more information about setting the properties for matching and merging, see
“Configuring Match Properties for a Base Object” on page 488.
When you configure columns for a base object, system indexes are created
automatically for primary keys and unique columns. In addition, Siperian Hub
automatically drops and creates system indexes as needed when executing batch jobs or
stored procedures.
A custom index is an optional, supplemental index for a base object that you can define
and have Siperian Hub maintain automatically. Custom indexes are non-unique.
You might want to add a custom index to a base object for performance reasons. For
example, suppose an external application calls the SIF SearchQuery request to search a
base object by last name. If the base object has a custom index on the last name
column, the last name search is processed more quickly. Custom indexes that are
registered in Siperian Hub are automatically dropped and recreated during batch
execution to improve performance.
You have the option to manually define indexes outside the Hub Console using a
database utility for your database platform. For example, you could create a
function-based index—such as Upper(Last_Name) in the index expression—in
support of some specialized operation. However, if you add a user-defined index that is
not supported by the Schema Manager, then that index is not registered with
Siperian Hub, and you are responsible for maintaining it yourself; Siperian Hub will
not maintain it for you. If you do not properly maintain the index, you risk degrading
batch processing performance.
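The distinction between a registered custom index and a user-defined function-based index can be sketched in SQL. The following is an illustration only: Siperian Hub runs on Oracle or DB2, and sqlite3 is used here purely to demonstrate the SQL concepts; the table and index names (C_CUSTOMER, NI_C_CUSTOMER_LN, FI_C_CUSTOMER_LN) are hypothetical, not actual Siperian Hub objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C_CUSTOMER (ROWID_OBJECT INTEGER PRIMARY KEY, LAST_NAME TEXT)"
)

# A simple, non-unique custom index on the last name column, comparable
# to what the Schema Manager would create and register for you:
conn.execute("CREATE INDEX NI_C_CUSTOMER_LN ON C_CUSTOMER (LAST_NAME)")

# A function-based index defined outside the Hub Console. Because it is
# not registered with Siperian Hub, you must maintain it yourself:
conn.execute("CREATE INDEX FI_C_CUSTOMER_LN ON C_CUSTOMER (UPPER(LAST_NAME))")

conn.execute("INSERT INTO C_CUSTOMER VALUES (1, 'Johnson')")

# A case-insensitive lookup whose predicate matches the index expression,
# so the function-based index can serve it:
rows = conn.execute(
    "SELECT ROWID_OBJECT FROM C_CUSTOMER WHERE UPPER(LAST_NAME) = 'JOHNSON'"
).fetchall()
print(rows)  # [(1,)]
```

In an actual deployment you would issue the equivalent DDL with your database platform's own utility, and remember that Siperian Hub will neither drop nor recreate such an index during batch jobs.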
If an index already exists for the selected column(s), the Schema Manager displays
an error message and does not create the index.
To change a custom index, you must delete the existing custom index and add a new
custom index with the columns that you want.
5. Click Close.
Important: You must use the Schema Manager to define dependent objects—you
cannot configure them directly in the database. For more information, see
“Requirements for Defining Schema Objects” on page 87.
3. Expand the schema tree for the base object on which the new object will depend.
Property Description
Item Type Type of table that you are adding (Dependent Object).
Display Name Name for this dependent object as it will be displayed in the
Hub Console.
Physical Name Actual name of the table in the database. Siperian Hub will
suggest a physical name for the table based on the display name
that you enter.
Data Tablespace Name of the data tablespace. For more information, see the
Siperian Hub Installation Guide for your platform.
Index Tablespace Name of the index tablespace. For more information, see the
Siperian Hub Installation Guide for your platform.
Description Description of this dependent object.
6. Click OK.
The Schema Manager creates the new dependent object table in the Operational
Record Store (ORS) and then adds the new dependent object table to the schema tree.
6. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
7. Expand the tree below the dependent object.
Note: Dependent objects do not have all of the nodes that are available to base
objects.
• To modify columns, select Columns and follow the instructions in
“Configuring Columns in Tables” on page 125.
• To modify the message trigger configuration, select Message Trigger Setup
and follow the instructions in “Adding Message Triggers” on page 615.
• To modify staging tables, select Staging Tables and follow the instructions in
“Configuring Staging Tables” on page 364.
Note: In the Schema Manager, you can also view the columns for cross-reference
tables and history tables, but you cannot edit them.
About Columns
This section provides general information about table columns.
Column Description
system columns Columns that Siperian Hub automatically creates and maintains.
System columns contain metadata.
user-defined columns Any columns in a table that are not system columns. User-defined
columns are added in the Schema Manager and usually contain
business data.
Warning: The system columns contain Siperian Hub metadata. Do not alter Siperian
Hub metadata in any way. Doing so will cause Siperian Hub to behave in unpredictable
ways, and you risk losing data.
For more information about system columns in Hub Store tables, see:
• “Base Object Columns” on page 95
• “Columns in Cross-Reference Tables” on page 99
• “History Tables” on page 100
• “Dependent Object Columns” on page 120
• “Landing Table Columns” on page 356
• “Staging Table Columns” on page 365
Siperian Hub uses a common set of data types for columns that map directly to the
following Oracle and DB2 data types.
Note: For information regarding the available data types, refer to the product
documentation for your database platform.
Siperian Hub Data Type Oracle Data Type DB2 Data Type
CHAR CHAR CHAR
VARCHAR VARCHAR2 VARCHAR
NVARCHAR2 NVARCHAR2
NCHAR NCHAR
DATE DATE DATE
NUMBER NUMBER NUMERIC
INT INTEGER INT or INTEGER
Column Properties
A Global Business Identifier (GBID) column contains common identifiers (key values) that
allow you to uniquely and globally identify a record based on your business needs.
Examples include:
• Identifiers defined by applications external to Siperian Hub, such as ERP (SAP or
Siebel customer numbers) or CRM systems.
• Identifiers defined by external organizations, such as industry-specific codes (AMA
numbers, DEA numbers, and so on), or government-issued identifiers (social
security number, tax ID number, driver’s license number, and so on).
In the Schema Manager, you can define multiple GBID columns in a base object. For
example, an employee table might have columns for social security number and driver’s
license number, or a vendor table might have a tax ID number.
GBIDs do not replace the ROWID_OBJECT. GBIDs provide additional ways to help
you integrate your Siperian Hub implementation with external systems, allowing you to
query and access data through unique identifiers of your own choosing (using SIF
requests, as described in the Siperian Services Integration Framework Guide). In addition, by
configuring GBID columns using already-defined identifiers, you can avoid the need to
custom-define identifiers.
GBIDs help with the traceability of your data. Traceability is keeping track of the data so
that you can determine its lineage—which systems, and which records from those
systems, contributed to consolidated records. When you define GBID columns in a
base object, the Schema Manager creates a separate table for this base object (the table
name ends with _HUID) that tracks the old and new values (current/obsolete value
pairs).
For example, suppose two of your customers (both of which had different tax ID
numbers) merged into a single company, and one tax ID number survived while the
other became obsolete. If you defined the tax ID number column as a GBID,
Siperian Hub could help you track both the current and historical tax ID numbers so
that you could access data (via SIF APIs) using the historical value.
Note: Siperian Hub does not perform any data verification or error detection on
GBID columns. If the source system has duplicate GBID values, then those duplicate
values will be passed into Siperian Hub.
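The current/obsolete pairing that the _HUID table maintains can be sketched as a simple lookup. This is a hypothetical illustration of the concept only; the actual _HUID table structure is internal to Siperian Hub, and the tax ID values shown are invented.

```python
# Hypothetical sketch of how a _HUID-style table pairs an obsolete GBID
# value with the value that superseded it after two records merged.
huid_pairs = {
    # obsolete tax ID -> surviving tax ID after the companies merged
    "95-1111111": "95-2222222",
}

def resolve_gbid(tax_id: str) -> str:
    """Return the current GBID for a possibly obsolete value."""
    return huid_pairs.get(tax_id, tax_id)

print(resolve_gbid("95-1111111"))  # 95-2222222 (historical value still resolves)
print(resolve_gbid("95-2222222"))  # 95-2222222 (current value unchanged)
```

In practice you would not query the _HUID table by hand; SIF requests accept the historical GBID value and resolve it for you.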
The columns for staging tables cannot be defined using the column editor. Staging
table columns are a special case, as they are based on some or all columns in the staging
table’s target object. You use the Add/Edit Staging Table window to select the columns
on the target table that can be populated by the staging table. Siperian Hub then creates
each staging table column with the same data types as the corresponding column in the
target table. See “Configuring Staging Tables” on page 364 for more information on
choosing the columns for staging tables.
A base object cannot have more than 200 user-defined columns if it will have match
rules that are configured for automatic consolidation. For more information, see
“Flagging Matched Records for Automatic or Manual Consolidation” on page 333 and
“Specifying Consolidation Options for Matched Records” on page 543.
Note: In the above example, the schema shows ANSI SQL data types that Oracle
converts to its own data types. For more information, see “Data Types for
Columns” on page 126.
The Properties pane in the Column Editor contains the following command buttons:
You can toggle the Show System Columns check box to show or hide system columns.
For more information, see “Types of Columns in ORS Tables” on page 126.
You can expand the properties pane to display all the column properties in a single
pane. By default, the Schema Manager displays column definitions in a contracted view.
Adding Columns
To add a column:
1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the button.
The Schema Manager displays an empty row.
4. For each column, specify its properties. For more information, see “Column
Properties” on page 127.
5. Click the button to save the columns you have added.
4. Specify the connection properties for the schema that you want to import.
If you need more information about the connection information to specify here,
contact your database administrator.
The settings for the User name / Password fields depend on whether proxy users
are configured for your Siperian Hub implementation.
• If proxy users are not configured (the default), then the user name will be the
same as the schema name.
• If proxy users are configured, then you must specify the custom user name /
password so that Siperian Hub can use those credentials to access the schema.
For more information about proxy user support, see the Siperian Hub Installation
Guide for your platform.
5. Click Next.
Note: The database you enter does not need to be the same as the Siperian ORS
that you’re currently working in, nor does it need to be a Siperian ORS.
The only restriction is that you cannot import from a relational database that is a
different type from the one in which you are currently working. For example, if
your database is an Oracle database, then you can import columns only from
another Oracle database.
The Schema Manager displays a list of the tables that are available for import.
The Schema Manager displays a list of columns for the selected table.
Important: As with any schema changes that are attempted after the tables have been
populated with data, manage changes to columns in a planned and controlled fashion,
and ensure that the appropriate database backups are done before making changes.
1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. For each column, you can change the following properties. Be sure to read about
the implications of changing a property before you make the change. For more
information about each property, see “Column Properties” on page 127.
Deleting Columns
Removing columns should be approached with extreme caution. Any data that has
already been loaded into a column will be lost when the column is removed. It can also
be a slow process due to the number of underlying tables that could be affected. You
must save the changes immediately after removing the existing columns.
To delete a column from base objects, dependent objects, and landing tables:
1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
Type Description
system foreign-key relationships Automatically defined and enforced by Siperian Hub
to protect the referential integrity of your schema.
user-defined foreign-key relationships Custom foreign-key relationships that are
manually defined according to the instructions later in this section.
Foreign-key relationships are implicit between a dependent object and its parent base
object. This relationship is defined according to the instructions in “Configuring
Dependent Objects” on page 117.
If the child table contains generated keys from the parent table, the load process copies
the appropriate primary key value from the parent table into the child table.
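The parent-to-child key copy described above can be sketched with a pair of tables. This is an illustration only: sqlite3 stands in for Oracle/DB2, and the table and column names (C_PARTY, C_ADDRESS, PARTY_FK) are hypothetical, not actual Siperian Hub schema objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent base object and child table with a foreign key:
conn.execute(
    "CREATE TABLE C_PARTY (ROWID_OBJECT INTEGER PRIMARY KEY, NAME TEXT)"
)
conn.execute("""CREATE TABLE C_ADDRESS (
    ROWID_OBJECT INTEGER PRIMARY KEY,
    PARTY_FK INTEGER REFERENCES C_PARTY (ROWID_OBJECT),
    CITY TEXT)""")

conn.execute("INSERT INTO C_PARTY VALUES (10, 'Acme Corp')")

# The load process supplies the parent's primary key value (10) when it
# writes the child record, preserving referential integrity:
conn.execute("INSERT INTO C_ADDRESS VALUES (1, 10, 'Oakland')")

row = conn.execute("""SELECT p.NAME, a.CITY
    FROM C_ADDRESS a
    JOIN C_PARTY p ON p.ROWID_OBJECT = a.PARTY_FK""").fetchone()
print(row)  # ('Acme Corp', 'Oakland')
```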
Note: After you have created a relationship, if you try to create another relationship,
the column is no longer displayed because it is already in use. If you delete the
relationship, the column is displayed again.
To edit the lookup display name for a foreign-key relationship between two base
objects:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand a base object and right-click Relationships.
The Schema Manager displays the Properties tab of the Relationships page.
4. On the Properties tab, click the foreign-key relationship whose properties you want
to view.
The Schema Manager displays the relationship details.
5. Click the Edit button next to the Lookup Display Name and specify the new
value.
6. Click the button to save your changes.
The Hub Console displays the Schema Viewer tool, as shown in the following
example.
Pane Description
Diagram pane Shows a detailed diagram of your schema.
Overview pane Shows an abstract overview of your schema. The gray box highlights the
portion of the overall schema diagram that is currently displayed in the
diagram pane. Drag the gray box to move the display area over a particular
portion of your schema.
The Diagram Pane in the Schema Viewer contains the following command buttons:
Zooming In
Note that the gray highlight box in the Overview Pane has grown smaller to indicate
the portion of the schema that is displayed in the diagram pane.
Zooming Out
Note that the gray box in the Overview Pane has grown larger to indicate a larger
viewing area.
Zooming All
To zoom the view so that the entire schema diagram is displayed in the Diagram
Pane:
• Click the button.
The Schema Viewer zooms out to display the entire schema diagram.
Hierarchic View
The following figure shows an example of the hierarchic view (the default).
Orthogonal View
The following figure shows the same schema in the orthogonal view.
Toggling Views
Command Description
Go to BaseObject Launches the Schema Manager and displays this base object with an
expanded base object node.
Go to Staging Table Launches the Schema Manager and displays the selected staging
table under the associated base object.
Go to Mapping Launches the Mappings tool and displays the properties for the selected
mapping.
Go to Job Launches the Batch Viewer and displays the properties for the selected
batch job.
Go to Batch Groups Launches the Batch Group tool.
Option Description
Show column names Controls whether column names appear in the entity boxes.
• Check (select) this option to display column names in the
entity boxes.
• Uncheck (clear) this option to hide column names and display
only entity names in the entity boxes.
Orientation Controls the orientation of the schema hierarchy. One of the
following values:
• Top to Bottom (default)—Hierarchy goes from top to
bottom, with the highest-level node at the top.
• Bottom to Top—Hierarchy goes from bottom to top, with
the highest-level node at the bottom.
• Left to Right—Hierarchy goes from left to right, with the
highest-level node at the left.
• Right to Left—Hierarchy goes from right to left, with the
highest-level node at the right.
3. Click OK.
2. Navigate to the location on the file system where you want to save the JPG file.
3. Specify a descriptive name for the JPG file.
4. Click Save.
The Schema Viewer saves the file.
Pane Description
Print Area Scope of what to print:
• Print All—Print the entire schema diagram.
• Print viewable—Print only the portion of the schema
diagram that is currently visible in the Diagram Pane.
Page Settings Page output options, such as media, orientation, and margins.
Printer Settings Printer options based on available printers in your environment.
3. Click Print.
The Schema Viewer sends the schema diagram to the printer.
This chapter describes how to configure Siperian Hub to provide queries and packages
that data stewards and applications can use to access data in the Hub Store.
Chapter Contents
• Before You Begin
• About Queries and Packages
• Configuring Queries
• Configuring Packages
About Queries and Packages
Configuring Queries
This section describes how to create and modify queries using the Queries tool in the
Hub Console. The Queries tool allows you to create simple, advanced, and custom
queries.
About Queries
In Siperian Hub, a query is a request to retrieve data from the Hub Store. Just like any
SQL-based query statement, Siperian Hub queries allow you to specify, via the Hub
Console, the criteria used to retrieve that data—tables and columns to include,
conditions for filtering records, and sorting and grouping the results. Queries that you
save in the Queries tool can be used in packages, and data stewards can use them in the
Data Manager and Merge Manager tools.
Query Capabilities
Types of Queries
Type Description
query Created by selecting tables and columns, and configuring query conditions,
sort by, and group by options, according to the instructions in “Configuring
Queries” on page 166.
custom query Created by specifying a SQL statement according to the instructions in
“Configuring Custom Queries” on page 190.
Queries are dependent on the base object columns from which they retrieve data.
If changes are made to the column configuration in the base object associated with a
query, then the queries—including custom queries—are updated automatically.
For example, if a column is renamed, then the name is updated in any dependent
queries. If a column is deleted in the base object, then the consequences depend on the
type of query:
• For a custom query, the query becomes invalid and must be manually fixed in the
Queries tool or the Packages tool. Otherwise, if executed, an invalid query will
return an error.
• For all other queries, the column is removed from the query, as well as from any
packages that depend on the query.
Pane Description
navigation pane Displays a hierarchical list of configured queries and query
groups.
properties pane Displays the properties of the selected query or query group.
A query group is a logical collection of queries: simply a mechanism for organizing
queries in the Queries tool.
3. In the navigation pane, select the query group that you want to configure.
4. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
5. Click the Save button to save your changes.
You can delete an empty query group but not a query group that contains queries.
Configuring Queries
This section describes how to configure queries.
Adding Queries
To add a query:
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the query group to which you want to add the query.
4. Right-click in the Queries pane and choose New Query.
Property Description
Query name Descriptive name for this query.
Description Optional description of this query.
Query Group Select the query group to which this query belongs.
Select primary Primary table from which this query retrieves data.
table
8. Select the query columns from which you want the query to retrieve data.
Note: PUT-enabled packages require the Rowid Object column in the query.
9. Click Finish.
The Queries tool adds the new query to the tree.
10. Refine the query criteria by proceeding to the instructions in “Editing Query
Properties” on page 168.
Once you have created a query, you can modify its properties to refine the criteria it
uses to retrieve data from the ORS.
Tab Description
Tables Tables associated with this query. Corresponds to the SQL FROM clause.
For more information, see “Configuring the Table(s) in a Query” on page
170.
Tab Description
Select Columns associated with this query. Corresponds to the SQL SELECT
clause. For more information, see “Configuring the Column(s) in a
Query” on page 174.
Conditions Conditions associated with this query. Determines selection criteria for
individual records. Corresponds to the SQL WHERE clause. For more
information, see “Configuring Conditions for Selecting Records of Data”
on page 178.
Sort Sort order for the results of this query. Corresponds to the SQL ORDER
BY clause. For more information, see “Specifying the Sort Order for
Query Results” on page 183.
Grouping Grouping for the results of this query. Corresponds to the SQL GROUP
BY clause. “Specifying the Grouping for Query Results” on page 186.
SQL Displays the SQL associated with the selected query settings. “Viewing
the SQL for a Query” on page 190.
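Each tab above corresponds to one clause of the SQL statement that the Queries tool generates. The following sketch annotates a single statement with the tab each clause comes from. It is illustrative only: sqlite3 stands in for Oracle/DB2, and the table and column names (C_CUSTOMER, STATE, BALANCE) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C_CUSTOMER (ROWID_OBJECT INTEGER, STATE TEXT, BALANCE REAL)"
)
conn.executemany(
    "INSERT INTO C_CUSTOMER VALUES (?, ?, ?)",
    [(1, "CA", 100.0), (2, "CA", 50.0), (3, "NY", 75.0)],
)

rows = conn.execute("""
    SELECT STATE, COUNT(*) AS CNT   -- Select tab
    FROM C_CUSTOMER                 -- Tables tab
    WHERE BALANCE > 40              -- Conditions tab
    GROUP BY STATE                  -- Grouping tab
    ORDER BY STATE                  -- Sort tab
""").fetchall()
print(rows)  # [('CA', 2), ('NY', 1)]
```

The SQL tab simply shows the statement assembled from the other five tabs.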
The Tables tab displays the table(s) from which the query will retrieve information.
The information in this tab corresponds to the SQL FROM clause.
The Queries tool prompts you to select the table you want to add.
6. If prompted, select a foreign key relationship (if you want), and then click OK.
The Queries tool displays the added table in the Tables tab.
For multiple tables, the Queries tool displays all added tables in the Tables tab.
If you specified a foreign key between tables, the corresponding key columns are
linked. Also, if tables are linked by foreign key relationships, then the Queries tool
allows you to select the type of join for this query.
A query must have multiple tables in order for you to remove a table. You cannot
remove the last table in a query.
The Select tab displays the list of column(s) in one or more source tables from which
the query will retrieve information, as shown in the following example.
The information in this tab corresponds to the SQL SELECT clause.
The Queries tool prompts you to select from a list of one or more tables.
5. Expand the list for the table containing the column that you want to add.
The Queries tool displays the list of columns for the selected table.
The Queries tool adds the selected column(s) to the list of columns on the Select
tab.
8. Click the Save button.
To change the order in which the columns will appear in the result set (if the list
contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Select tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the button.
• To move the selected column down the list, click the button.
The Queries tool moves the selected column up or down.
Adding Functions
You can add aggregate functions to your queries (such as COUNT, MIN, or MAX).
At run time, these aggregate functions appear in the usual syntax for the SQL
statement used to execute the query—such as:
select col1, count(col2) as c1 from table_name group by col1
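The statement above can be run against sample data to show what the aggregate produces. This sketch uses sqlite3 purely to illustrate the generated SQL (the actual platform is Oracle or DB2); an ORDER BY is added here only so the result order is deterministic, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3)],
)

# The generated statement from the text, with ORDER BY for stable output:
rows = conn.execute(
    "select col1, count(col2) as c1 from table_name group by col1 order by col1"
).fetchall()
print(rows)  # [('a', 2), ('b', 1)]
```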
Adding Constants
The Conditions tab displays a list of condition(s) that the query will use to select
records from the table. A comparison is a query condition that involves one column, one
operator, and either another column or a constant value. The information in this tab
corresponds to the SQL WHERE clause.
Operators
Operator Description
= Equals.
<> Does not equal.
IS NULL Value in the comparison column is null.
IS NOT NULL Value in the comparison column is not null.
LIKE Value in the comparison column must be like the search value (includes
column values that match the search value). For example, if the search value is
%JO% for the last_name column, then the parameter will match column
values like “Johnson”, “Vallejo”, “Major”, and so on.
NOT LIKE Value in the comparison column must not be like the search value (excludes
column values that match the search value). For example, if the search value is
%JO% for the last_name column, then the parameter will omit column values
like “Johnson”, “Vallejo”, “Major”, and so on.
< Less than.
<= Less than or equal to.
Operator Description
> Greater than.
>= Greater than or equal to.
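The LIKE and NOT LIKE behavior described above can be demonstrated with the document's own example values. Note one assumption in this sketch: sqlite3's LIKE is case-insensitive for ASCII by default, so %JO% matches the lowercase "jo" in these names; on a case-sensitive platform such as Oracle, the search value must match the stored case (or the comparison must be wrapped in an UPPER() expression).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (last_name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [("Johnson",), ("Vallejo",), ("Major",), ("Smith",)],
)

# LIKE keeps rows whose column value matches the search value:
like = conn.execute(
    "SELECT last_name FROM names WHERE last_name LIKE '%JO%' ORDER BY last_name"
).fetchall()

# NOT LIKE omits rows whose column value matches the search value:
not_like = conn.execute(
    "SELECT last_name FROM names WHERE last_name NOT LIKE '%JO%'"
).fetchall()

print(like)      # [('Johnson',), ('Major',), ('Vallejo',)]
print(not_like)  # [('Smith',)]
```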
Adding a Comparison
• If you selected Constant, then click the button, specify the constant that
you want to add, and then click OK.
8. Click OK.
The Queries tool adds the comparison to the list on the Conditions tab.
9. Click the Save button.
Editing a Comparison
Removing a Comparison
The Sort By tab displays a list of column(s) containing the values that the query will use
to sort the query results at run time. The information in this tab corresponds to the
SQL ORDER BY clause.
5. Expand the list for the table containing the column(s) that you want to select for
sorting.
The Queries tool displays the list of columns for the selected table.
7. Click OK.
The Queries tool adds the selected column(s) to the list of columns on the Sort By
tab.
8. Do one of the following:
• Enable (check) the Ascending check box to sort records in ascending order for
the specified column.
• Disable (uncheck) the Ascending check box to sort records in descending
order for the specified column.
9. Click the Save button.
To change the order in which the columns will appear in the result set (if the list
contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
The Grouping tab displays a list of column(s) containing the values that the query will
use for grouping the query results at run time. The information in this tab corresponds
to the SQL GROUP BY clause.
5. Expand the list for the table containing the column(s) that you want to select for
grouping.
The Queries tool displays the list of columns for the selected table.
To change the order in which the columns will be grouped in the result set (if the list
contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Grouping tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the button.
• To move the selected column down the list, click the button.
The Queries tool moves the selected column up or down.
6. Click the Save button.
The SQL tab displays the SQL statement that corresponds to the query options you
have specified for the selected query, as shown in the following example.
A custom query is simply a query for which you supply the SQL statement directly, rather
than building it according to the instructions in “Configuring Queries” on page 166.
Custom queries can be used in packages and in the data steward tools.
3. Select the query group to which you want to add the query.
4. Right-click in the Queries pane and choose New Custom Query.
The Queries tool displays the New Custom Query Wizard.
5. If you see a Welcome screen, click Next.
Property Description
Query name Descriptive name for this query.
Description Optional description of this query.
Query Group Select the query group to which this query belongs.
7. Click Finish.
Once you have created a custom query, you can modify its properties to refine the
criteria it uses to retrieve data from the ORS.
You delete a custom query in the same way in which you delete a regular query.
For more information, see “Removing Queries” on page 195.
The Queries tool displays the results of your query, as shown in the following
example.
5. If you want, expand the list next to a table to display the columns associated with
the query.
6. Click Close.
Removing Queries
If a query has multiple packages based on it, remove those packages first before
attempting to remove the query.
To remove a query:
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the query group associated with the query you want to remove.
4. Select the query you want to remove.
5. Right click the query and choose Delete Query from the pop-up menu.
The Queries tool prompts you to confirm deletion.
6. Click Yes.
The Queries tool removes the query from the list.
Configuring Packages
This section describes how to create and modify PUT and display packages. You use
the Packages tool in the Hub Console to define packages.
About Packages
A package is a public view of one or more underlying tables in Siperian Hub. Packages
represent subsets of the columns in those tables, along with any other tables that are
joined to the tables. A package is based on a query. The underlying query can select a
subset of records from the table or from another package. For more information, see
“Configuring Queries” on page 162.
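A package behaving as a public view over an underlying query can be sketched as a database view. This is a conceptual illustration only: actual packages are defined in the Packages tool, not created by hand, and the names here (C_CUSTOMER, PKG_CUSTOMER) are hypothetical; sqlite3 stands in for Oracle/DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE C_CUSTOMER (
    ROWID_OBJECT INTEGER PRIMARY KEY, NAME TEXT, SSN TEXT)""")
conn.execute("INSERT INTO C_CUSTOMER VALUES (1, 'Jane Doe', '123-45-6789')")

# The "package" exposes only a subset of the base object's columns
# (the SSN column is withheld from consumers of the view):
conn.execute("""CREATE VIEW PKG_CUSTOMER AS
    SELECT ROWID_OBJECT, NAME FROM C_CUSTOMER""")

row = conn.execute("SELECT * FROM PKG_CUSTOMER").fetchone()
print(row)  # (1, 'Jane Doe')
```

Because a package is driven by its query, the subset of columns (and, via conditions, the subset of records) that consumers see is controlled entirely by that query.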
You must create a package if you want your Siperian Hub implementation to:
• Merge and update records in the Hub Store using the Merge Manager and Data
Manager tools. For more information, see the Siperian Hub Data Steward Guide.
• Allow an external application user to access Siperian Hub functionality using
Services Integration Framework (SIF) requests. For more information, see the
Siperian Services Integration Framework Guide.
In most cases, you create one set of packages for the Merge Manager and Data
Manager tools, and a different set of packages for external application users.
PUT-enabled packages:
• cannot include joins to other tables
• cannot be based on system tables or other packages
• cannot be based on queries that have constant columns, aggregate functions, or
group by settings
Pane Description
navigation pane Displays a hierarchical list of configured packages.
properties pane Displays the properties of the selected package.
Adding Packages
To add a new package:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the Packages pane and choose New Package.
The Packages tool displays the New Package Wizard.
Note: If the welcome screen is displayed, click Next.
Field Description
Display Name Name of this package as it will be displayed in the Hub Console.
Physical Name Actual name of the package in the database. Siperian Hub will
suggest a physical name for the package based on the display
name that you enter.
Description Description of this package.
Enable PUT Check (select) to create a PUT package, which you can use to insert
or update records in base object tables.
Note: Every package that you use for merging data or updating
data must be PUT-enabled.
If you do not enable PUT, you create a display (read-only)
package.
Secure Resource Check (enable) to make this package a secure resource, which
allows you to control access to this package. Once a package is
designated as a secure resource, you can assign privileges to it in
the Roles tool. For more information, see “Securing Siperian
Hub Resources” on page 841, and “Assigning Resource
Privileges to Roles” on page 859.
5. Click Next.
6. If you want, click New Query Group to add a new query group, as described in
“Configuring Query Groups” on page 164.
7. If you want, click New Query to add a new query, as described in “Configuring
Queries” on page 166.
8. Select a query.
Note: For PUT-enabled packages:
• only queries with ROWID_OBJECT can be used
• custom queries cannot be used
9. Click Finish.
The Packages tool adds the new package to the list.
Refreshing Packages
To refresh a package:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the package that you want to refresh.
Note: If, after a refresh, the query remains out of sync with the package, check (select)
or uncheck (clear) any column for this query. For more information, see
“Configuring the Column(s) in a Query” on page 174.
To create a display package that joins a PUT-enabled base object package to other
tables:
1. Create a PUT-enabled package based on the base object.
2. Create a query to join the PUT-enabled base object package with the other tables.
3. Create a display package based on the query you just created.
Removing Packages
To remove a package:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the package to remove.
4. Right-click the package and choose Delete Package.
The Packages tool prompts you to confirm deletion.
5. Click Yes.
The Packages tool removes the package from the list.
This chapter describes how to configure state management in your Siperian Hub
implementation.
Chapter Contents
• Before You Begin
• About State Management in Siperian Hub
• State Transition Rules for State Management
• Configuring State Management for Base Objects
• Modifying the State of Records
• Rules for Loading Data
Before You Begin
State Description
ACTIVE Default state. Record has been reviewed and approved. Active
records participate in Hub processes by default.
This is a state associated with a base object or cross reference record.
A base object record is active if at least one of its cross reference
records is active. A cross reference record contributes to the
consolidated base object only if it is active.
These are the records that are available to participate in any
operation. If records are required to go through an approval process,
then these records have been through that process and have been
approved.
Note that Siperian Hub allows matches to and from PENDING and
ACTIVE records.
PENDING Pending records are records that have not yet been approved for
general usage in the Hub. These records can have most operations
performed on them, but operations have to specifically request
pending records. If records are required to go through an approval
process, then these records have not yet been approved and are in the
midst of an approval process.
If there are only pending XREF records, then the Best Version of the
Truth (BVT) on the base object is determined through trust on the
PENDING records.
Note that Siperian Hub allows matches to and from PENDING and
ACTIVE records.
DELETED Deleted records are records that should no longer be part of the Hub’s
data. These records are not used in processes (unless
specifically requested). Records can only be deleted explicitly and
once deleted can be restored if desired. When a record that is
pending is deleted, it is physically deleted, does not enter the
DELETED state, and cannot be restored.
In order for a record to be deleted, it must be in either the ACTIVE
state for soft delete or the PENDING state for hard delete.
Note that Siperian Hub does not include records in the DELETED
state for trust and validation rules.
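The rule stated above — a base object record is active if at least one of its cross
reference records is active — can be sketched as follows. This is an illustration of the
documented behavior only, not Siperian Hub code; the function name is hypothetical.

```python
def base_object_is_active(xref_states):
    """Illustrative sketch, not Hub code: a base object record is ACTIVE
    when at least one of its cross-reference (XREF) records is ACTIVE."""
    return any(state == "ACTIVE" for state in xref_states)

# A record whose only XREFs are PENDING or DELETED is not active,
# so it does not contribute to the consolidated base object.
```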
Note: The Interaction ID can be specified through any API; however, it cannot be
specified when performing batch processing. For example, records that are protected
by an Interaction ID cannot be updated by the Load batch process.
The protection provided by interaction IDs is outlined in the following table, in which
the Version A and Version B examples represent the situations where the incoming and
existing interaction IDs do and do not match:
State transition rules determine whether and when a record can change from one state to
another. State transitions for base object and XREF records can be performed using
the following methods:
• Using the Data Manager or Merge Manager tools in the Hub Console; for more
information, see Siperian Hub Data Steward Guide.
• Promote batch job; for more information, see “Promote Jobs” on page 741.
• SiperianClient API; for more information, see Siperian Services Integration Framework
Guide.
State transition rules differ for base object and cross-reference records.
Siperian Hub enables you to trigger message events for base object records when a
pending update occurs. The following message triggers are available for state changes
to base object or XREF records:
To enable the message queue triggers on a pending update for a base object:
1. Open the Model workbench and click Schema.
2. In the Schema tool, check the Trigger on Pending Updates checkbox for the
message queues that you configured in the Message Queues tool.
To learn more about message queues and message triggers, including how to enable
message queue triggers for state changes to base object and XREF records, see
“Configuring Message Triggers” on page 612.
To flag base object or XREF records for promotion at a later time using the Data
Manager:
1. Open the Data Steward workbench and click on the Data Manager tool.
2. In the Data Manager tool, click on the desired base object or XREF record.
3. Click on the Flag for Promote button on the associated panel.
You can now promote these PENDING XREF records using the Promote batch job.
To set up a batch job using the Batch Viewer to promote records flagged for
promotion:
1. Flag the desired PENDING records for promotion.
For more information, see “Modifying the State of Records” on page 216.
2. Open the Utilities workbench and click on the Batch Viewer tool.
3. Click on the Promote batch job under the Base Object node displayed in the
Batch Viewer.
4. Select Promote flagged records abc, where abc represents the records that you
previously flagged for promotion.
5. Click the Execute Batch button to promote the records flagged for promotion.
To add a Promote Batch job using the Batch Group Tool to promote records flagged
for promotion:
1. Flag the desired PENDING records for promotion.
For more information, see “Modifying the State of Records” on page 216.
2. Open the Utilities workbench and click on the Batch Group tool.
3. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
4. Right-click the Batch Groups node in the Batch Group tree and choose Add
Batch Group from the pop-up menu (or select Add Batch Group from the
Batch Group menu). For more information, see “Adding Batch Groups” on page
691.
5. In the batch groups tree, right click on any level, and choose the desired option to
add a new level to the batch group.
The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.
For more information, see “Adding Levels to a Batch Group” on page 694.
6. Expand the base object(s) for the job(s) that you want to add, select the desired
Promote job(s), and click OK.
The Batch Group tool adds the selected job(s) to the batch group.
You can now execute the batch group job. For more information, see “Executing a
Batch Group” on page 704.
The following table describes how input states affect the states of existing XREF
records.
                      Existing XREF State
Incoming
XREF State   ACTIVE           PENDING           DELETED                    No XREF (Load by rowid)   No Base Object Record
ACTIVE       Update           Update + Promote  Update + Restore           Insert                    Insert
PENDING      Pending Update   Pending Update    Pending Update + Restore   Pending Update            Pending Insert
DELETED      Soft Delete      Hard Delete       Hard Delete                Error                     Error
Undefined    Treat as ACTIVE  Treat as PENDING  Treat as DELETED           Treat as ACTIVE           Treat as ACTIVE
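The transition rules in the table above can be sketched as a lookup. This is an
illustration only, not Siperian Hub code; the state labels NO_XREF and
NO_BASE_OBJECT and the function name are hypothetical stand-ins for the last two
table columns.

```python
# Illustrative sketch of the XREF state-transition table above.
# The labels are hypothetical, not actual Siperian Hub identifiers.
ACTIONS = {
    # (incoming XREF state, existing XREF state): resulting action
    ("ACTIVE", "ACTIVE"): "Update",
    ("ACTIVE", "PENDING"): "Update + Promote",
    ("ACTIVE", "DELETED"): "Update + Restore",
    ("ACTIVE", "NO_XREF"): "Insert",
    ("ACTIVE", "NO_BASE_OBJECT"): "Insert",
    ("PENDING", "ACTIVE"): "Pending Update",
    ("PENDING", "PENDING"): "Pending Update",
    ("PENDING", "DELETED"): "Pending Update + Restore",
    ("PENDING", "NO_XREF"): "Pending Update",
    ("PENDING", "NO_BASE_OBJECT"): "Pending Insert",
    ("DELETED", "ACTIVE"): "Soft Delete",
    ("DELETED", "PENDING"): "Hard Delete",
    ("DELETED", "DELETED"): "Hard Delete",
    ("DELETED", "NO_XREF"): "Error",
    ("DELETED", "NO_BASE_OBJECT"): "Error",
}

def resolve_action(incoming, existing):
    """An undefined incoming state is treated as the existing state,
    or as ACTIVE when there is no existing XREF or base object record."""
    if incoming not in ("ACTIVE", "PENDING", "DELETED"):
        incoming = existing if existing in ("ACTIVE", "PENDING", "DELETED") else "ACTIVE"
    return ACTIONS[(incoming, existing)]
```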
This chapter explains how to configure Siperian Hierarchy Manager (HM) using the
Siperian Hierarchies tool in the Hub Console. The chapter describes how to set up
your data and how to configure the components needed by Hierarchy Manager for
your Siperian Hub implementation, including entity types, hierarchies, relationship
types, packages, and profiles. For instructions on using the Hierarchy Manager, see the
Siperian Hub Data Steward Guide. This chapter is recommended for Siperian Hub
administrators and implementers.
Chapter Contents
• About Configuring Hierarchies
• Starting the Hierarchies Tool
• Configuring Hierarchies
• Configuring Relationship Base Objects and Relationship Types
• Configuring Packages for Use by HM
• Configuring Profiles
• Sandboxes
About Configuring Hierarchies
When you have finished defining Hierarchy Manager components, you can use the
package or query manager tools to update the query criteria.
To understand the concepts in this chapter, you must be familiar with the concepts in
the following chapters in this guide (Siperian Hub Administrator Guide):
• Chapter 5, “Building the Schema”
• Chapter 6, “Configuring Queries and Packages”
• Chapter 15, “Configuring the Consolidate Process”
• Chapter 20, “Setting Up Security”
Note: The same options you see on the right-click menu in the Hierarchy Manager are
also available on the Hierarchies menu.
• Created valid schema to work with Siperian Hub and the HM.
For more information on schemas and how to create them, see Chapter 5,
“Building the Schema”.
• Created all relationships between your entities, including:
• Hierarchical relationships:
• All child entities must have a valid parent entity related to them.
Your data cannot have any ‘orphan’ child entities when it enters HM.
• All hierarchies must be validated (see Chapter 9, “Siperian Hub
Processes”).
• Foreign key relationships.
For a general overview of foreign key relationships, see “Process Overview for
Defining Foreign-Key Relationships” on page 143. For more information
about parent-child relationships, see “Configuring Match Paths for Related
Records” on page 497.
• One-hop and multi-hop relationships (direct and indirect relationships
between entities). For more information on these kinds of relationships, see
the Siperian Hub Data Steward Guide.
• Derived HM types.
• Consolidated duplicate entities from multiple source systems.
For example, a group of entities (Source A) might be the same as another group of
entities (Source B), but the two groups of entities might have different group
names. Once the entities are identified as being identical, the two groups can be
consolidated.
For more information on consolidation, see Chapter 9, “Siperian Hub Processes”.
• Grouped your entities into logical categories, such as physician’s names into the
“Physician” category.
For more information on how to group your data, see Chapter 4, “Configuring
Operational Record Stores and Datasources”.
• Made sure that your data complies with the rules for:
• Referential integrity.
• Invalid data.
• Data volatility.
For more information on these database concepts, see a database reference text.
Scenario
John has been tasked with manipulating his company’s data so that it can be viewed
and used within Hierarchy Manager in the most efficient way. To simplify the example,
we are describing a subset of the data that involves product types and products of the
company, which sells computer components.
The company sells three types of products: mice, trackballs, and keyboards. Each of
these product types includes several vendors and different levels of products, such as
the Gaming keyboard and the TrackMan trackball.
Methodology
In this step you organize the data into the Hierarchy that will then be translated into
the HM configuration.
John begins by analyzing the product and product group hierarchy. He organizes the
products by their product group and product groups by their parent product group.
The sheer volume of data and the relationships contained within the data are difficult
to visualize, so John lists the categories and sees if there are relationships between
them.
The following table (which contains data from the Marketing department) shows an
example of how John might organize his data.
The table shows the data that will be stored in the Products BO. This is the BO to
convert (or create) in HM. The table shows Entities, such as Mice or Laser Mouse. The
relationships are shown by the grouping, that is, there is a relationship between Mice
and Laser Mouse. The heading values are the Entity Types: Mice is a Product Group
and Laser Mouse is a Product. This Type is stored in a field on the Product table.
Organizing the data in this manner allows John to clearly see how many entities and
entity types are part of the data, and what relationships those entities have.
The major category is ProdGroup, which can include both a product group (such as
mice and pointers), the category Product, and the products themselves (such as the
Trackman Wheel). The relationships between these items can be encapsulated in a
relationship object, which John calls Product Rel. In the information for the Product
Rel, John has explained the relationships: Product Group is the parent of both Product
and Product Group.
John begins by accessing the Hierarchy Tool. When he accesses the tool, the system
creates the Relationship Base Object Tables (RBO tables). RBO tables are essentially
system base objects that contain the specific columns that HM requires. They
store the HM configuration data, such as the data that you see in the table in Step 1.
The Siperian Hub Administrator Guide explains how to create base objects in detail. This
section describes the choices you would make when you create the example base
objects in the Schema tool.
You must create and configure a base object for each entity object and relationship
object that you identified in the previous step. In the example, you would create a base
object for Product and convert it to an HM Entity Object. The Product Rel BO should
be created directly in HM (an easier process) rather than converted. Each new base
object is displayed in the Schema panel under the category Base Objects. Repeat this
process to create all your base objects.
In the next section, you configure the base objects so that they are optimized for HM
use.
You created the two base objects (Product and Product Rel) in the previous section.
This section describes how to configure them.
Configuring a base object involves filling in the criteria for the object’s properties, such
as the number and type of columns, the content of the staging tables, the name of the
cross-reference tables (if any), and so on. You might also enable the history function,
set up validation rules and message triggers, create a custom index, and configure the
external match table (if any).
Whether or not you choose these options and how you configure them depends on
your data model and base object choices.
In the example, John configures his base objects as the following sections explain.
Note: Not all components of the base-object creation are addressed here, only the
ones that have specific significance for data that will be used in the HM. For more
information on the components not discussed here, see the Schema chapter in this
Guide.
Columns
This table shows the Product BO after conversion to an HM entity object. In this list,
only the Product Type field is an HM field.
Every base object has system columns and user-defined columns. System columns are
created automatically, and include the required column: Rowid Object. This is the
Primary key for each base object table and contains a unique, Hub-generated value.
This value cannot be null because it is the HM lookup for the class code: HM creates a
foreign key constraint in the database, so a ROWID_OBJECT value is required and
cannot be null.
For the user-defined columns, John chose logical names that would effectively include
information about the products, such as Product Number, Product Type, and Product
Description. These same column and column values must appear in the staging tables.
Staging Tables
John makes sure that all the user-defined columns from the staging tables are added as
columns in the base object, as the graphic above shows. The Lookup column shows
the HM-added lookup value.
Notice that several columns in the Staging Table (Status Cd, Product Type, and
Product Type Cd) have references to lookup tables. You can set these references up
when you create the Staging Table. You would use lookups if you do not want to
hardcode a value in your staging table, but would rather have the server look up a value
in the parent table.
Most of the lookups are unrelated to HM and are part of the data model. The Rbo Bo
Class lookup is the exception because it was added by HM. HM adds the lookup on the
Product Type column.
Note: When you are converting entities to entity base objects (entities that are
configured to be used in HM), you must have lookup tables to check the values for the
Status Cd, Product Type, and Product Type Cd.
Warning: HM Entity objects do not require start and end dates. Any start and end
dates would be user defined. However, Rel Objects do use these. Do not create new
Rel Objects with different names for start and end dates. These are already provided.
You create entity types in the Hierarchy Tool. John creates two entity types: ProdGroup
and Product Type. The following graphic shows the completed Product Entity Type
information.
Each entity type has a code that derives from the data analysis and the design. In this
example, John chose to use Product as one type, and Product Group as another.
This code must be referenced in the corresponding RBO base object table. In this
example, the code Product is referenced in the C_RBO_BO_CLASS table. The value
of the BO_CLASS_CODE is ‘Product’.
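The reference could be checked with a query along these lines. The table and column
names (C_RBO_BO_CLASS, BO_CLASS_CODE) and the value ‘Product’ are as cited
above; ROWID_OBJECT is the system primary key described earlier. This is an
illustrative sketch, not a statement of the Hub’s own SQL.

```sql
-- Verify that the entity type code is registered in the RBO class table.
SELECT rowid_object, bo_class_code
FROM c_rbo_bo_class
WHERE bo_class_code = 'Product';
```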
The following graphic shows the relationship between the HM entity objects and HM
relationship objects to the RBO tables:
When John has completed all the steps in this section, he will be ready to create other
HM components, such as packages, and to view his data in the HM. For example, the
following graphic shows the relationships that John has set up in the Hierarchies Tool,
displayed in the Hierarchy Manager. This example shows the hierarchy involving Mice
devices fully. For more information on how to use HM, see the Data Steward Guide.
The Hub Console displays the Hierarchies tool, which contains a navigation pane
and a properties pane.
If you are setting up the Hierarchies tool, see “Creating the HM Repository Base
Objects” on page 235. If you already have RBO tables set up, see “Configuring Entity
Icons” on page 238.
Queries and MRM packages (and their associated queries) will also be created for these
RBO tables.
2. Start the Hierarchies tool. Expand the Model workbench and click Hierarchies.
To learn more, see “Starting the Hierarchies Tool” on page 234.
Note: Any option that you can select by right-clicking in the navigation panel, you can
also choose from the Hierarchies tool menu.
After you start the Hierarchies tool, if an ORS does not have the necessary RBO tables,
then the Hierarchies tool walks you through the process of creating them.
The following steps explain what to select in the dialog boxes that the Hierarchies tool
displays:
1. Choose Yes in the Siperian Hub Console dialog to create the metadata (RBO
tables) for HM in the ORS.
2. Select the tablespace names in the Create RBO tables dialog, and then click OK.
2. Start the Hub Console. To learn more, see “Starting the Hub Console” on page 19.
3. Launch the Hierarchies tool in the Hub Console.
4. Click Yes to add additional columns.
After you upgrade a pre-XU schema to XU, you are reminded to remove obsolete
HM metadata when you open the Hierarchies tool.
2. Start the Hub Console. To learn more, see “Starting the Hub Console” on page
19.
3. Launch the Hierarchies tool in the Hub Console.
Note: If the Rbo Rel Type Usage base object is being used by another non-HM
base object, you are prompted to delete the table manually using the Schema
Manager.
Siperian Hub XU shows relationship and entity types under the base object with which
they are associated. If a type is not associated with a base object (for example, it has
no packages assigned), it is not displayed in the GUI but remains in the
database.
During the ORS upgrade process, the migration script skips over the orphan entity and
relationship types, displays a related warning message, then continues. After the ORS
upgrade, you can delete the orphan types or associate entities and relationship types
with them.
If you want to associate orphan types but have not yet created the corresponding base
objects, create the objects and then click Refresh. The software prompts you to create
the association.
To import your own icons, create a ZIP or JAR file containing your icons. For each
icon, create a 16 x 16 icon for the small icon and a 48 x 48 icon for the large icon.
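The packaging step can be sketched with Python’s zipfile module. The file names here
are hypothetical examples, and the placeholder files stand in for real 16 x 16 and
48 x 48 PNG icons; this is not a Siperian-supplied utility.

```python
import zipfile
from pathlib import Path

# Bundle a small (16 x 16) and a large (48 x 48) icon into a ZIP that
# can be uploaded in the Hierarchies tool. The names are hypothetical.
icon_names = ["physician_16.png", "physician_48.png"]
for name in icon_names:
    Path(name).touch()  # placeholder; use real 16x16 / 48x48 icon images

with zipfile.ZipFile("entity_icons.zip", "w") as archive:
    for name in icon_names:
        archive.write(name)
```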
You cannot modify icons directly from the console. You can download a ZIP or JAR
file, modify its contents, then upload it again into the console.
You can either delete icon groups or make them inactive. If an icon is already
associated with an entity, or if you might use a group of icons in the future, consider
inactivating the group instead of deleting it.
You inactivate a group of icons by marking the icon package Inactive. Inactive icons
are not displayed in the UI and cannot be assigned to an entity type. To reactivate the
icon packet, mark it Active.
Warning: Siperian Hub does not validate icon assignments before deleting. If you
delete an icon that is currently assigned to an Entity Type, you will get an error when
you try to save the edit.
You cannot delete individual icons from a ZIP or JAR file from the console; you can
only delete them as a group or package.
2. Start the Hierarchies tool. To learn more, see “Starting the Hierarchies Tool” on
page 234.
3. Right-click the icon collections in the navigation pane and choose Delete Entity
Icons.
This section describes entities, entity objects, and entity types in Hierarchy Manager.
Entities
In Hierarchy Manager, an entity is any object, person, place, organization, or other thing
that has a business meaning and can be acted upon in your database. Examples include
a specific person’s name, a specific checking account number, a specific company, a
specific address, and so on.
An entity base object is a base object that has been configured in HM, and that is used to
store HM entities. When you create an entity base object using the Hierarchies tool
(instead of the Schema Manager), the Hierarchies tool automatically creates the
columns required for Hierarchy Manager. You can also convert an existing MRM base
object to an entity base object by using the options in the Hierarchies tool.
After adding an entity base object, you use the Schema Manager to view, edit, or delete
it. To learn more, see “Configuring Base Objects” on page 92.
Entity Types
4. Click OK.
The Hierarchies tool prompts you to enter information about the new base object.
Field Description
Item Type Read-only. Already specified.
Display name Name of this base object as it will be displayed in the Hub
Console.
Physical name Actual name of the table in the database. Siperian Hub will
suggest a physical name for the table based on the display name
that you enter.
The RowId is generated and assigned by the system, but the BO
Class Code is created by the user, making it easier to remember.
Data tablespace Name of the data tablespace. To learn more, see the Siperian Hub
Installation Guide for your platform.
Index tablespace Name of the index tablespace. To learn more, see the Siperian
Hub Installation Guide for your platform.
Description Description of this base object.
Foreign Key column Column used as the Foreign Key for this entity type; can be
for Entity Types either ROWID or CODE.
The ability to choose a BO Class CODE column reduces
the complexity by allowing you to define the foreign key
relationship based on a predefined code, rather than the
Siperian generated ROWID.
Display name Descriptive name of the column of the Entity Type Foreign Key
that is displayed in Hierarchy Manager.
Physical name Actual name of the FK column in the table. Siperian Hub will
suggest a physical name for the FK column based on the display
name that you enter.
The base object you created has the columns required by Hierarchy Manager. You
probably require additional columns in the base object, which you can add using the
Schema Manager, as described in “Configuring Columns in Tables” on page 125.
Important: When you modify the base object using the Schema Manager, do not
change any of the columns added by Hierarchy Manager. Modifying any of these
Hierarchy Manager columns will result in unpredictable behavior and possible data loss.
You must convert base objects to entity base objects before you can use them in HM.
Base objects created in MRM do not have the metadata required by Hierarchy
Manager. In order to use these MRM base objects with HM, you must add this
metadata via a conversion process. Once you have done this, you can use these
converted base objects with both MRM and HM.
Note: If you do not see any choices in the Modify Base Object field, then there are
no non-hierarchy base objects available. You must create one in the Schema tool.
4. Click OK.
If the base object already has HM metadata, the Hierarchies tool will display a
message indicating the HM metadata that exists.
5. In the Foreign Key Column for Entity Types field, select the column to be added:
RowId Object or BO Class Code.
This is the descriptive name of the column of the Entity Type Foreign Key that is
displayed in Hierarchy Manager.
The ability to choose a BO Class Code column reduces the complexity by allowing
you to define the foreign key relationship based on a predefined code, rather than
the Siperian generated ROWID.
6. In the Existing BO Column to use, select an existing column or select the Create
New Column option.
If no BO columns exist, only the Create New Column option is available.
7. In the Display Name and Physical Name fields, create display and physical names
for the column, and click OK.
The base object will now have the columns that Hierarchy Manager requires. To add
additional columns, use the Schema Manager (see “Configuring Columns in Tables” on
page 125).
Important: When you modify the base object using the Schema Manager tool, do not
change any of the columns added using the Hierarchies tool. Modifying any of these
columns will result in unpredictable behavior and possible data loss.
The Hierarchies tool displays a new entity type (called New Entity Type) in the
navigation tree under the Entity Object you selected.
2. In the properties panel, specify the following properties for this new entity base
object.
Field Description
Code Unique code name of the Entity Type. Can be used as a foreign
key from HM entity base objects.
Display name Name of this entity type as it will be displayed in the Hub
Console. Specify a unique, descriptive name.
Description Description of this entity type.
Color Color of the entities associated with this entity type as they will
be displayed in the Hub Console in the Hierarchy Manager
Console and Business Data Director.
Small Icon Small icon for entities associated with this entity type as they will
be displayed in the Hub Console in the Hierarchy Manager
Console and Business Data Director.
Large Icon Large icon for entities associated with this entity type as they will
be displayed in the Hub Console in the Hierarchy Manager
Console and Business Data Director.
The color you choose determines how entities of this type are displayed in the
Hierarchy Manager. Select a color and click OK.
4. To select a small icon for the new entity type, click the button next to Small Icon.
The Choose Small Icon window is displayed.
Small icons determine how entities of this type are displayed when the graphic
view window shows many entities. To learn more about adding icon graphics for
your entity types, see “Configuring Entity Icons” on page 238.
Select a small icon and click OK.
5. To select a large icon for the new entity type, click the button next to Large Icon.
Large icons determine how entities of this type are displayed when the graphic
view window shows few entities. To learn more about adding icon graphics for
your entity types, see “Configuring Entity Icons” on page 238.
Select a large icon and click OK.
6. Click the Save button to save the new entity type.
2. For each field that you want to edit, click the Edit button and make the change
that you want.
For more information about these fields, see “Adding Entity Types” on page 246.
3. When you have finished making changes, click the Save button to save your
changes.
Warning: If your entity object uses the code column, you probably do not want to
modify the entity type code if you already have records for that entity type.
You can delete any entity type that is not used by any relationship types. If the entity
type is being used by one or more relationship types, attempting to delete it will
generate an error.
2. In the Hierarchies tool, in the navigation tree, right-click the entity type that you
want to delete, and choose Delete Entity Type.
If the entity type is not used by any relationship types, then the Hierarchies tool
prompts you to confirm deletion.
3. Choose Yes.
The Hierarchies tool removes the selected entity type from the list.
Warning: You probably do not want to delete an entity type if you already have entity
records that use that type. If your entity object uses the code column instead of the
rowid column and you have records in that entity object for the entity type you are
trying to delete, you will get an error.
In addition to configuring color and icons for entities, you can also configure the font
size and maximum width. While color and icons can be specified for each entity type,
the font size and width apply to entities of all types.
To change the font size in HM, use the HM Font Size and Entity Box Size. The default
entity font size (38 pts) and max entity box width (600 pixels) can be overridden by
settings in the cmxserver.properties file. The settings to use are:
sip.hm.entity.font.size=fontSize
sip.hm.entity.max.width=maxWidth
The value for fontSize can be from 6 to 100, and the value for maxWidth can be from
20 to 5000. If a specified value is outside this range, the minimum or maximum value is
used. Default values are used if the specified values are not numbers.
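The fallback-and-clamp behavior just described can be sketched as follows. This is an
illustration of the documented behavior, not Hub source code; the function and
dictionary names are hypothetical.

```python
# Sketch of how the two cmxserver.properties values are interpreted:
# non-numeric values fall back to the defaults (38 pt font, 600 px width),
# and numeric values are clamped to the documented ranges.
DEFAULTS = {"sip.hm.entity.font.size": 38, "sip.hm.entity.max.width": 600}
RANGES = {"sip.hm.entity.font.size": (6, 100), "sip.hm.entity.max.width": (20, 5000)}

def effective_value(prop, raw):
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULTS[prop]          # not a number: use the default
    low, high = RANGES[prop]
    return max(low, min(high, value))  # out of range: clamp to min/max
```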
Note that when you revert the entity object, you are also reverting its corresponding
relationship objects.
Configuring Hierarchies
This section describes how to define hierarchies using the Hierarchies tool.
About Hierarchies
A hierarchy is a set of relationship types (as described in “About Relationships,
Relationship Objects, and Relationship Types” on page 255). These relationship types
are not ranked, nor are they necessarily related to each other. They are merely
relationship types that are grouped together for ease of classification and identification.
The same relationship type can be associated with multiple hierarchies. A hierarchy type is
a logical classification of hierarchies.
Adding Hierarchies
To add a new hierarchy:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click an entity object in the navigation pane and choose Add Hierarchy.
The Hierarchies tool displays a new hierarchy (called New Hierarchy) in the
navigation tree under the Hierarchies node. The default properties are displayed in
the properties pane.
Field Description
Code Unique code name of the hierarchy. Can be used as a foreign key
from HM relationship base objects.
Display name Name of this hierarchy as it will be displayed in the Hub
Console. Specify a unique, descriptive name.
Description Description of this hierarchy.
Editing Hierarchies
To edit a hierarchy:
1. In the Hierarchies tool, acquire a write lock.
Warning: If your relationship object uses the hierarchy code column (instead of the
rowid column), you probably do not want to modify the hierarchy code if you already
have records for that hierarchy in the relationship object.
Deleting Hierarchies
Warning: You do not want to delete a hierarchy if you already have relationship
records that use the hierarchy. If your relationship object uses the hierarchy code
column instead of the rowid column and you have records in that relationship object
for the hierarchy you are trying to delete, you will get an error.
To delete a hierarchy:
1. In the Hierarchies tool, acquire a write lock.
2. In the navigation tree, right-click the hierarchy that you want to delete, and choose
Delete Hierarchy.
The Hierarchies tool prompts you to confirm deletion.
3. Choose Yes.
The Hierarchies tool removes the selected hierarchy from the list.
Note: You are allowed to delete a hierarchy that has relationship types associated with
it. There will be a warning with the list of associated relationship types. If you elect to
delete the hierarchy, all references to it will automatically be removed.
Relationships
A relationship describes the affiliation between two specific entities. Hierarchy Manager
relationships are defined by specifying the relationship type, hierarchy type, attributes
of the relationship, and dates for when the relationship is active.
Relationship Types
A relationship type describes classes of relationships and defines the types of entities that
a relationship of this type can include, the direction of the relationship (if any), and
how the relationship is displayed in the Hub Console.
Note: Relationship Type is a physical construct and can be configuration heavy, while
Hierarchy Type is more of a logical construct and is typically configuration light.
Therefore, it is often easier to have many Hierarchy Types than to have many
Relationship Types. Be sure to understand your data and hierarchy management
requirements prior to defining Hierarchy Types and Relationship Types within Siperian.
A well defined set of Hierarchy Manager relationship types has the following
characteristics:
• It reflects the real-world relationships between your entity types.
• It supports multiple relationship types for each relationship.
The Hierarchies tool prompts you to select the type of base object to create.
5. Specify the following properties for this new entity base object.
Field Description
Item Type Read-only. Already specified.
Display name Name of this base object as it will be displayed in the Hub
Console.
Physical name Actual name of the table in the database. Siperian Hub will
suggest a physical name for the table based on the display name
that you enter.
Data tablespace Name of the data tablespace. To learn more, see the Siperian Hub
Installation Guide for your platform.
Index tablespace Name of the index tablespace. To learn more, see the Siperian
Hub Installation Guide for your platform.
Description Description of this base object.
Entity Base Object 1 Entity base object to be linked via this relationship base object.
Display name Name of the column that is a FK to the entity base object 1.
Physical name Actual name of the column in the database. Siperian Hub will
suggest a physical name for the column based on the display
name that you enter.
Entity Base Object 2 Entity base object to be linked via this relationship base object.
Display name Name of the column that is a FK to the entity base object 2.
Physical name Actual name of the column in the database. Siperian Hub will
suggest a physical name for the column based on the display
name that you enter.
Hierarchy FK Column Column used as the foreign key for the hierarchy; can be either
ROWID or CODE.
The ability to choose a BO Class CODE column reduces the
complexity by allowing you to define the foreign key relationship
based on a predefined code, rather than the Siperian generated
ROWID.
Hierarchy FK Display Name Name of this FK column as it will be displayed in the
Hub Console.
Hierarchy FK Physical Name Actual name of the hierarchy foreign key column in the
table. Siperian Hub will suggest a physical name for the column based on the
display name that you enter.
Rel Type FK Column Column used as the foreign key for the relationship; can be
either ROWID or CODE.
Rel Type Display Name Name of the column that is used to store the Rel Type
CODE or ROWID.
Rel Type Physical Name Actual name of the relationship type FK column in the
table. Siperian Hub will suggest a physical name for the column based on the
display name that you enter.
The relationship base object you created has the columns required by Hierarchy
Manager. You may require additional columns in the base object, which you can add
using the Schema Manager, as described in “Configuring Columns in Tables” on page
125.
Important: When you modify the base object using the Schema Manager, do not
change any of the columns added by Hierarchy Manager. Modifying any of these
columns will result in unpredictable behavior and possible data loss.
A foreign key relationship base object is an entity base object with a foreign key to
another entity base object.
2. Right-click anywhere in the navigation pane and choose Create Foreign Key
Relationship.
The Hierarchies tool displays the Modify Existing Base Object dialog.
3. Specify the base object and the number of Foreign Key columns, then click OK.
The Hierarchies tool displays the Convert to FK Relationship Base Object dialog.
Field Description
FK Constraint Entity BO 1 Select FK entity base object from list.
Existing BO Column to Use Name of existing base object column used for FK, or
choose to create a new column.
FK Column Display Name 1 Name of FK column as it will be displayed in the Hub
Console.
FK Column Physical Name 1 Actual name of FK column in the database. Siperian
Hub will suggest a physical name for the column based on the display name that
you enter.
FK Column Represents Choose Entity1 or Entity2, depending on what the FK
Column represents in the relationship.
The base object you created has the columns required by Hierarchy Manager. You may
require additional columns in the base object, which you can add using the Schema
Manager, as described in “Configuring Columns in Tables” on page 125.
Important: When you modify the base object using the Schema Manager tool, do not
change any of the columns added by the Hierarchies tool. Modifying any of these
columns will result in unpredictable behavior and possible data loss.
For more information about foreign key relationships, see Chapter 5, “Building the
Schema.”
Relationship base objects are tables that contain information about two entity base
objects.
Base objects created in MRM do not have the metadata required by Hierarchy Manager
for relationship information. In order to use these MRM base objects with Hierarchy
Manager, you must add this metadata via a conversion process. Once you have done
this, you can use these converted base objects with both MRM and HM.
3. Click OK.
The Convert to Relationship Base Object screen is displayed.
5. Click OK.
Warning: When you modify the base object using the Schema Manager tool, do not
change any of the columns added by HM. Modifying any of these HM columns will
result in unpredictable behavior and possible data loss.
This removes HM metadata from the relationship object. The relationship object
remains as a base object, but is no longer displayed in the Hierarchy Manager.
Note: You can only save a relationship type if you associate it with a hierarchy.
A Foreign Key Relationship Base Object is an Entity Base Object containing a foreign
key to another Entity Base Object. A Relationship Base Object is a table that relates the
two Entity Base Objects.
3. The properties panel displays the properties you must enter to create the
relationship.
4. In the properties panel, specify the following properties for this new relationship
type.
Field Description
Code Unique code name of the rel type. Can be used as a foreign key
from HM relationship base objects.
Display name Name of this relationship type as it will be displayed in the Hub
Console. Specify a unique, descriptive name.
Description Description of this relationship type.
Color Color of the relationships associated with this relationship type
as they will be displayed in the Hub Console in the Hierarchy
Manager Console and Business Data Director.
Entity Type 1 First entity type associated with this new relationship type.
Any entities of this type will be able to have relationships of this
relationship type.
Entity Type 2 Second entity type associated with this new relationship type.
Any entities of this type will be able to have relationships of this
relationship type.
Direction Select a direction for the new relationship type to allow a
directed hierarchy. The possible directions are:
• Entity 1 to Entity 2
• Entity 2 to Entity 1
• Undirected
• Bi-Directional
• Unknown
An example of a directed hierarchy is an organizational chart,
with the relationship reports to being directed from employee to
supervisor, and so on, up to the head of the organization.
FK Rel Start Date The start date of the foreign key relationship.
FK Rel End Date The end date of the foreign key relationship.
Hierarchies Check the check box next to any hierarchy that you want
associated with this new relationship type. Any selected
hierarchies can contain relationships of this relationship type.
The color you choose determines how relationships of this type are displayed in the
Hierarchy Manager. Select a color and click OK.
6. Click the Calendar button to designate a start and end date for a foreign key
relationship. All relationships of this FK relationship type will have the same start
and end date. If you do not specify these dates, the default values are automatically
added.
7. Select a hierarchy.
8. Click to save the new relationship type.
2. In the navigation tree, click the relationship type that you want to edit.
3. For each field that you want to edit, click and make the change that you want.
To learn more about these fields, see “Adding Relationship Types” on page 265.
4. When you have finished making changes, click to save your changes.
Warning: If your relationship object uses the code column, you probably do not want
to modify the relationship type code if you already have records for that relationship
type.
Warning: You probably do not want to delete a relationship type if you already have
relationship records that use the relationship type. If your relationship object uses the
relationship type code column instead of the rowid column and you have records in
that relationship object for the relationship type you are trying to delete, you will get an
error.
The above warnings do not apply to FK relationship types. You can delete
relationship types that are associated with hierarchies. The confirmation dialog displays
the hierarchies associated with the relationship type being deleted.
2. In the navigation tree, right-click the relationship type that you want to delete, and
choose Delete Relationship Type.
The Hierarchies tool prompts you to confirm deletion.
3. Choose Yes.
The Hierarchies tool removes the selected relationship type from the list.
About Packages
As described in Chapter 6, “Configuring Queries and Packages,” a package is a public
view of one or more underlying tables in Siperian Hub. Packages represent subsets of
the columns in those tables, along with any other tables that are joined to the tables. A
package is based on a query. The underlying query can select a subset of records from
the table or from another package. Packages are used for configuring user views of the
underlying data. For more information, see “Configuring Queries and Packages” on
page 161.
You must first create a package to use with Hierarchy Manager, then you must
associate it with Entity Types or Relationship Types.
Creating Packages
This section describes how to create HM and Relationship packages.
To create an HM package:
1. Acquire a write lock.
2. In the Hierarchies tool, right-click anywhere in the navigation pane and choose
Create New Package.
The Hierarchies tool starts the Create New Package wizard and displays the first
dialog box.
Field Description
Type of Package One of the following types:
• Entity Object
• Relationship Object
• FK Relationship Object
Query Group Select an existing query group or choose to create a new one. In
Siperian Hub, query groups are logical groups of queries. For
more information, see “Configuring Query Groups” on page
164.
Query group name Name of the new query group - only needed if you chose to
create a new group above.
Description Optional description for the new query group you are creating.
4. Click Next.
The Create New Package wizard displays the next dialog box.
Field Description
Query Name Name of the query. In Siperian Hub, a query is a request to
retrieve data from the Hub Store. For more information, see
“Configuring Queries” on page 166.
Description Optional description.
Select Primary Table Primary table for this query.
6. Click Next.
The Create New Package wizard displays the next dialog box.
Field Description
Display Name Display name for this package, which will be used to display this
package in the Hub Console.
Physical Name Physical name for this package. The Hub Console will suggest a
physical name based on the display name you entered.
Description Optional description.
Enable PUT Select to enable records to be inserted or changed. (optional)
If you do not choose this, your package will be read only. If you
are creating a foreign key relationship object package, you have
additional steps in Step 9 of this procedure.
Note: You must have both a PUT and a non-PUT package for
every Foreign Key relationship. Both Put and non-Put packages
that you create for the same foreign key relationship object must
have the same columns.
Secure Resource Select to create a secure resource. (optional)
8. Click Next.
The Create New Package wizard displays a final dialog box. The dialog box you see
depends on the type of package you are creating.
• If you selected to create either a package for entities or relationships or a PUT
package for FK relationships, a dialog box similar to the following dialog box
is displayed. The required columns (shown in grey) are automatically selected
— you cannot deselect them.
Note: You must have both a PUT and a non-PUT package for every Foreign Key
relationship. Both Put and non-Put packages that you create for the same foreign key
relationship object must have the same columns.
• If you selected to create a non-Put enabled package for foreign key
relationships (see Step 7 of this procedure - do not check the Put check box),
the following dialog box is displayed:
9. If you are creating a non-Put enabled package for foreign key relationships, specify
the following information for this new package.
Field Description
Hierarchy Hierarchy associated with this package. For more information,
see “Configuring Hierarchies” on page 253.
Relationship Type Relationship type associated with this package. For more
information, see “Configuring Relationship Base Objects and
Relationship Types” on page 255.
Note: You must have both a PUT and a non-PUT package for every Foreign Key
relationship. Both Put and non-Put packages that you create for the same foreign key
relationship object must have the same columns.
10. Select the columns for this new package.
Use the Packages tool to view, edit, or delete this newly-created package, as described
in “Configuring Packages” on page 196.
You should not remove columns that are needed by Hierarchy Manager. These
columns are automatically selected (and greyed out) when the user creates packages
using the Hierarchies tool.
The numbers in the cells define the sequence in which the attributes are displayed.
3. Configure the package for your entity or relationship type.
Label Columns used to display the label of the entity/relationship you are
viewing in the HM graphical console. These columns are used to
create the Label Pattern in the Hierarchy Manager Console and
Business Data Director.
To edit a label, click the label value to the right of the label. In the
Edit Pattern dialog, enter a new label or double-click a column to use
it in a pattern.
Tooltip Columns used to display the description or comment that appears
when you scroll over the entity/relationship. Used to create the
tooltip pattern in the Hierarchy Manager Console and Business Data
Director.
To edit a tooltip, click the tooltip pattern value to the right of the
Tooltip Pattern label. In the Edit Pattern dialog, enter a new tooltip
pattern or double-click a column to use it in a pattern.
Common Columns used when entities/relationships of different types are
displayed in the same list. The selected columns must be in packages
associated with all Entity/Relationship Types in the Profile.
Search Columns that can be used with the search tool
List Columns to be displayed in a search result
Detail Columns used for the detailed view of an entity/relationship
displayed at the bottom of the screen
Put Columns that are displayed when you want to edit a record
Add Columns that are displayed when you want to create a new record
4. When you have finished making changes, click to save your changes.
Configuring Profiles
This section describes how to configure profiles using the Hierarchies tool.
About Profiles
In Hierarchy Manager, a profile is used to define user access to HM objects—what users
can view and what the HM objects look like to those users. A profile determines what
fields and records an HM user may display, edit, or add. For example, one profile can
allow full read/write access to all entities and relationships, while another profile can be
read-only (no add or edit operations allowed). Once you define a profile, you can
configure it as a secure resource, as described in “Securing Siperian Hub Resources” on
page 841.
Adding Profiles
A new profile (called Default) is created automatically for you before you access the
HM. The default profile can be maintained, and you can also add additional profiles.
Note: The Business Data Director uses the Default Profile to define how Entity
Labels as well as Relationship and Entity Tooltips are displayed. Additional Profiles, as
well as the additional information defined within Profiles, are used only within the
Hierarchy Manager Console and not the Business Data Director.
2. In the Hierarchy tool, right-click anywhere in the navigation pane and choose Add
Profiles.
The Hierarchies tool displays a new profile (called New Profile) in the navigation
tree under the Profiles node. The default properties are displayed in the properties
pane.
When you select these relationship types and click Save, the tree below the Profile
will be populated with Entity Objects, Entity Types, Rel Objects and Rel Types.
When you deselect a Rel type, only the Rel types will be removed from the tree -
not the Entity Types.
3. Specify the following information for this new profile.
Field Description
Name Unique, descriptive name for this profile.
Description Description of this profile.
Relationship Types Select one or more relationship types associated with this profile.
Editing Profiles
To edit a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, in the navigation tree, click the profile that you want to
edit.
3. Configure the profile as needed (specifying the appropriate profile name,
description, and relationship types and assigning packages), according to the
instructions in “Adding Profiles” on page 278 and “Configuring Packages for Use
by HM” on page 269.
4. When you have finished making changes, click to save your changes.
Validating Profiles
To validate a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, in the navigation pane, select the profile to validate.
The Hierarchies tool displays a progress window during the validation process. The
results of the validation appear in the window below the buttons.
Copying Profiles
To copy a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, right-click the profile that you want to copy, and then
choose Copy Profile.
The Hierarchies tool displays a new profile (called New Profile) in the navigation
tree under the Profiles node. This new profile is an exact copy (with a different
name) of the profile that you selected to copy. The default properties are
displayed in the properties pane.
Deleting Profiles
To delete a profile:
1. Acquire a write lock.
2. In the Hierarchies tool, right-click the profile that you want to delete, and choose
Delete Profile.
The Hierarchies tool displays a window that warns that packages will be removed
when you delete this profile.
3. Click Yes.
The Hierarchies tool removes the deleted profile.
2. In the Hierarchy tool, right-click the relationship type and choose Delete Entity
Type/Relationship Type From Profile.
If the profile contains relationship types that use the entity/relationship type that
you want to delete, you will not be able to delete it unless you delete the
relationship type from the profile first.
2. In the Hierarchy tool, right-click the entity type and choose Delete Entity
Type/Relationship Type From Profile.
If the profile contains relationship types that use the entity type that you want to
delete, you will not be able to delete it unless you delete the relationship type from
the profile first.
Configure the package as a secure resource. To learn more, see “Securing Siperian Hub
Resources” on page 841.
Sandboxes
To learn about sandboxes, see the Hierarchy Manager chapter in the Siperian Hub Data
Steward Guide.
Contents
• Chapter 9, “Siperian Hub Processes”
• Chapter 10, “Configuring the Land Process”
• Chapter 11, “Configuring the Stage Process”
• Chapter 12, “Configuring Data Cleansing”
• Chapter 13, “Configuring the Load Process”
• Chapter 14, “Configuring the Match Process”
• Chapter 15, “Configuring the Consolidate Process”
• Chapter 16, “Configuring the Publish Process”
9
Siperian Hub Processes
This chapter provides an overview of the processes associated with batch processing in
Siperian Hub, including key concepts, tasks, and references to related topics in the
Siperian Hub documentation.
Chapter Contents
• About Siperian Hub Processes
• Land Process
• Stage Process
• Load Process
• Match Process
• Consolidate Process
• Publish Process
About Siperian Hub Processes
Note: The publish process is not shown in this figure because it is not a batch process.
Consolidation Indicator
Indicator Value State Name Description
1 CONSOLIDATED Indicates that the record has been through the match and merge
process.
2 UNMERGED Indicates that the record has gone through the match process.
3 QUEUED_FOR_MATCH Indicates that the record is ready to be put through the
match process against the rest of the records in the base object.
4 NEWLY_LOADED Indicates that the record has been newly loaded into the base
object and has not gone through the match process.
9 ON_HOLD Indicates that the Data Steward has put the record on hold, to deal
with later.
Siperian Hub updates the consolidation indicator for base object records in the
following sequence.
1. During the load process, when a new or updated record is loaded into a base
object, Siperian Hub assigns the record a consolidation indicator of 4, indicating
that the record needs to be matched.
2. Near the start of the match process, when a record is selected as a match
candidate, the match process changes its consolidation indicator to 3.
Note: Any change to the match or merge configuration settings will trigger a reset
match dialog, asking whether you want to reset the records in the base object
(change the consolidation indicator to 4, ready for match). For more information,
see Chapter 14, “Configuring the Match Process,” and Chapter 15, “Configuring
the Consolidate Process.”
3. Before completing, the match process changes the consolidation indicator of
match candidate records to 2 (ready for consolidation).
Note: The match process may or may not have found matches for the record.
A record with a consolidation indicator of 2 or 4 is visible in Merge Manager.
For more information, see the Siperian Hub Data Steward Guide.
4. If Accept All Unmatched Rows as Unique is enabled, and a record has undergone
the match process but no matches were found, then Siperian Hub automatically
changes its consolidation indicator to 1 (unique). For more information, see
“Accept All Unmatched Rows as Unique” on page 492.
5. If Accept All Unmatched Rows as Unique is enabled, after the record has
undergone the consolidate process, and once a record has no more duplicates to
merge with, Siperian Hub changes its consolidation indicator to 1, meaning that
this record is unique in the base object, and that it represents the master record
(best version of the truth) for that entity in the base object.
Note: Once a record has its consolidation indicator set to 1, Siperian Hub will
never directly match it against any other record. New or updated records (with a
consolidation indicator of 4) can be matched against consolidated records.
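The sequence above amounts to a small state machine over the consolidation indicator. The following sketch uses the documented indicator values; the function names and record shape are hypothetical, not a Siperian API:

```python
# Sketch of the documented consolidation-indicator lifecycle. Values are from
# the table above: 4 = NEWLY_LOADED, 3 = QUEUED_FOR_MATCH, 2 = UNMERGED,
# 1 = CONSOLIDATED. Functions and record structure are illustrative only.

def load_record(record):
    record["consolidation_ind"] = 4        # loaded; needs to be matched
    return record

def queue_for_match(record):
    record["consolidation_ind"] = 3        # selected as a match candidate
    return record

def finish_match(record):
    record["consolidation_ind"] = 2        # matching done; ready to consolidate
    return record

def consolidate(record, has_remaining_duplicates):
    # A record becomes the master (best version of the truth) once it has
    # no more duplicates to merge with.
    if not has_remaining_duplicates:
        record["consolidation_ind"] = 1
    return record

r = consolidate(finish_match(queue_for_match(load_record({}))),
                has_remaining_duplicates=False)
print(r["consolidation_ind"])  # 1
```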
Survivorship applies to both trust-enabled columns and columns that are not trust
enabled. When comparing cells from two different records, Siperian Hub determines
survivorship based on the following factors, in order of precedence:
1. If the two columns are trust-enabled, then the data with the highest trust score
wins.
2. If there are no trust scores, then the data with the more recent LAST_UPDATE_
DATE wins.
3. If trust scores are the same from both systems, then the data with the more recent
cross-reference SRC_LUD wins.
4. If the SRC_LUD values are equal, then Siperian Hub compares whether the record
is an incoming load update (applies to the load process only).
5. If both records are incoming load updates, then Siperian Hub compares the
LAST_UPDATE_DATE values in the associated cross-reference records and the
one with the more recent LAST_UPDATE_DATE wins.
6. If the LAST_UPDATE_DATE values are equal, then Siperian Hub compares the
ROWID_OBJECT, in numeric descending order. The highest ROWID_OBJECT
has the winning values.
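The order of precedence above can be sketched as a comparison function. The field names (trust, last_update_date, src_lud, rowid_object) and the function itself are illustrative, and steps 4 and 5, which distinguish incoming load updates, are omitted from the sketch:

```python
# Illustrative comparison of two cell candidates using the documented order
# of precedence. Record fields are hypothetical stand-ins for base object
# and cross-reference columns.

def survivor(a, b):
    """Return the winning record between two cell candidates a and b."""
    # 1. If both cells are trust-enabled, the highest trust score wins.
    if a.get("trust") is not None and b.get("trust") is not None:
        if a["trust"] != b["trust"]:
            return a if a["trust"] > b["trust"] else b
    # 2./3. Otherwise (or on a trust tie), the more recent LAST_UPDATE_DATE
    # wins, then the more recent cross-reference SRC_LUD.
    for key in ("last_update_date", "src_lud"):
        if a.get(key, "") != b.get(key, ""):
            return a if a.get(key, "") > b.get(key, "") else b
    # (Steps 4-5, which compare incoming load updates, are omitted here.)
    # 6. Final tie-breaker: the highest ROWID_OBJECT wins.
    return a if a["rowid_object"] > b["rowid_object"] else b

a = {"trust": 80, "last_update_date": "2008-05-01", "rowid_object": 12}
b = {"trust": 60, "last_update_date": "2008-06-01", "rowid_object": 7}
print(survivor(a, b) is a)  # True: higher trust wins despite the older date
```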
Land Process
This section describes concepts and tasks associated with the land process in Siperian
Hub.
Landing data involves the transfer of data from one or more source systems to Siperian
Hub landing tables.
• A source system is an external system that provides data to Siperian Hub. Source
systems can be applications, data stores, and other systems that are internal to your
organization, or obtained or purchased from external sources. For more
information, see “About Source Systems” on page 348.
• A landing table is a table in the Hub Store that contains the data that is initially
loaded from a source system. For more information, see “About Landing Tables”
on page 355.
The following figure shows the land process in relation to other Siperian Hub
processes.
The land process is external to Siperian Hub and is executed using an external batch
process or an external application that directly populates landing tables in the Hub
Store. Subsequent processes for managing data are internal to Siperian Hub.
For any given source system, the approach used depends on whether it is the most
efficient—or perhaps the only—way to transfer data from a particular source system. In
addition, batch processing is often used for the initial data load (the first time that
business data is loaded into the Hub Store), as it can be the most efficient way to
populate the landing table with a large number of records. For more information, see
“Initial Data Loads and Incremental Loads” on page 302.
Note: Data in the landing tables cannot be deleted until after the load process for the
base object has been executed and completed successfully.
Task Topic(s)
Configuration Chapter 10, “Configuring the Land Process”
• “Configuring Source Systems” on page 348
• “Configuring Landing Tables” on page 355
Execution Execution of the land process is external to Siperian Hub and
depends on the approach you are using to populate landing tables, as
described in “Ways to Populate Landing Tables” on page 294.
Task Topic(s)
Application If you are using external application(s) to populate landing tables, see
Development the developer documentation for the API used by your application(s).
Stage Process
This section describes concepts and tasks associated with the stage process in Siperian
Hub.
Data is transferred according to mappings that link a source column in the landing
table with a target column in the staging table. Mappings also define data cleansing, if
any, to perform on the data before it is saved in the target table.
If delta detection is enabled (see “Configuring Delta Detection for a Staging Table” on
page 401), Siperian Hub detects which records in the landing table are new or updated
and then copies only these records, unchanged, to the corresponding RAW table.
Otherwise, all records are copied to the target table. Records with obvious problems in
the data are rejected and stored in a corresponding reject table, which can be inspected
after running the stage process (see “Viewing Rejected Records” on page 685).
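Conceptually, delta detection filters the landing data down to new or changed records before they are copied forward. A minimal sketch, assuming records keyed by primary key; the structures here are illustrative, not the actual audit/RAW-table mechanism:

```python
# Sketch of delta detection: copy only new or updated landing records forward.
# The current and previous snapshots are modeled as dicts keyed by primary
# key; the real process compares against prior landing data in the Hub Store.

def detect_deltas(landing_rows, previous_rows, delta_detection_enabled=True):
    if not delta_detection_enabled:
        return list(landing_rows.values())      # no delta detection: copy everything
    deltas = []
    for pk, row in landing_rows.items():
        if pk not in previous_rows or previous_rows[pk] != row:
            deltas.append(row)                  # new or updated record
    return deltas

prev = {1: {"name": "Acme"}, 2: {"name": "Bolt"}}
curr = {1: {"name": "Acme"}, 2: {"name": "Bolt Inc"}, 3: {"name": "Core"}}
print(detect_deltas(curr, prev))  # record 2 (changed) and record 3 (new)
```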
Data from landing tables can be distributed to multiple staging tables. However, each
staging table receives data from only one landing table.
The stage process prepares data for the load process, described in “Load Process” on
page 299, which subsequently loads data from the staging table into a target
table—either a base object or a dependent object.
The following figure shows the stage process in relation to other Siperian Hub
processes.
The following tables in the Hub Store are associated with the stage process.
Task Topic(s)
Configuration Chapter 11, “Configuring the Stage Process”
• “Configuring Staging Tables” on page 364
• “Mapping Columns Between Landing and Staging Tables” on
page 380
• “Using Audit Trail and Delta Detection” on page 398
Chapter 12, “Configuring Data Cleansing”
• “Configuring Cleanse Match Servers” on page 407
• “Using Cleanse Functions” on page 414
• “Configuring Cleanse Lists” on page 440
Execution Chapter 17, “Using Batch Jobs”
• “Stage Jobs” on page 745
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Stage Jobs” on page 795
Application Siperian Services Integration Framework Guide
Development
Load Process
This section describes concepts and tasks associated with the load process in Siperian
Hub. For related tasks, see “Managing the Load Process” on page 316.
The load process determines what to do with the data in the staging table based on:
• whether the target table is a base object or dependent object
• whether a corresponding record already exists in the target table and, if so, whether
the record in the staging table has been updated since the load process was last run
• whether trust is enabled for certain columns (base objects only); if so, the load
process calculates trust scores for the cell data
• whether the data is valid to load; if not, the load process rejects the record instead
• other configuration settings
During the initial data load, all records in the staging table are inserted into the base
object as new records. For more information, see “Load Inserts” on page 306.
Once the initial data load has occurred for a base object, any subsequent load processes
are called incremental loads because only new or updated data is loaded into the base
object.
Duplicate data is ignored. For more information, see “Run-time Execution Flow of the
Load Process” on page 304.
Trust Settings
If a column in a base object derives its data from multiple source systems, Siperian Hub
uses trust to help with comparing the relative reliability of column data from different
source systems. For example, the Orders system might be a more reliable source of
billing addresses than the Direct Marketing system.
Trust is enabled and configured at the column level. For example, you can specify a
higher trust level for Customer Name in the Orders system and for Phone Number in
the Billing system.
Trust provides a mechanism for measuring the relative confidence factor associated
with each cell based on its source system, change history, and other business rules.
Trust takes into account the quality and age of the cell data, and how its reliability
decays (decreases) over time. Trust is used to determine survivorship (when two
records are consolidated) and whether updates from a source system are sufficiently
reliable to update the master record. For more information, see “Survivorship and
Order of Precedence” on page 291 and “Configuring Trust for Source Systems” on
page 455.
Data stewards can manually override a calculated trust setting if they have direct
knowledge that a particular value is correct. Data stewards can also enter a value
directly into a record in a base object. For more information, see the Siperian Hub Data
Steward Guide.
Validation Rules
Trust is often used in conjunction with validation rules, which might downgrade (reduce)
trust scores according to configured conditions and actions. For more information, see
“Configuring Validation Rules” on page 468.
When data meets the criterion specified by the validation rule, then the trust value for
that data is downgraded by the percentage specified in the validation rule. For example:
Downgrade trust on First_Name by 50% if Length < 3
Downgrade trust on Address Line 1, City, State, Zip and Valid_address_ind
if Valid_address_ind = ‘False’
If the Reserve Minimum Trust flag is enabled (checked) for a column, then the trust
cannot be downgraded below the column’s minimum trust setting.
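As an illustration of how a downgrade interacts with the minimum trust floor, here is a minimal Python sketch. The function names are hypothetical; this models the documented behavior, not Siperian Hub's actual implementation.

```python
def downgraded_trust(trust, downgrade_pct, min_trust, reserve_minimum):
    """Downgrade a cell's trust score by the validation rule's percentage."""
    new_trust = trust * (1 - downgrade_pct / 100.0)
    if reserve_minimum:
        # Reserve Minimum Trust: the score never falls below the
        # column's minimum trust setting.
        new_trust = max(new_trust, min_trust)
    return new_trust

# Rule from the example above: downgrade trust on First_Name by 50% if Length < 3.
def trust_for_first_name(value, trust, min_trust, reserve_minimum=True):
    if len(value) < 3:
        return downgraded_trust(trust, 50, min_trust, reserve_minimum)
    return trust

print(trust_for_first_name("Al", 80, min_trust=50))     # 50 (downgrade floored at minimum)
print(trust_for_first_name("Alice", 80, min_trust=50))  # 80 (rule condition not met)
```

With Reserve Minimum Trust disabled, the same downgrade would drop the score to 40.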
The load process handles staging table records in batches. For each base object, the
load batch size setting (see “Load Batch Size” on page 103) specifies the number of
records to load per batch cycle (default is 1000000).
During execution of the load process for a base object, Siperian Hub creates a
temporary table (_TLL) for each batch as it cycles through records in the staging table.
For example, suppose the staging table contained 250 records to load, and the load
batch size were set to 100. During execution, the load process would:
• create a TLL table and process the first 100 records
• drop and create the TLL table and process the second 100 records
• drop and create the TLL table and process the remaining 50 records
• drop and create the TLL table and stop executing because the TLL table contained
no records
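The batch cycle above can be sketched in Python. This toy loop models only the cycle counts: the list stands in for the staging table, and each iteration stands in for one drop/create of the temporary _TLL table.

```python
def load_in_batches(staging_records, load_batch_size):
    """Sketch of the load process cycling through a staging table in batches.

    Each iteration corresponds to one drop/create of the _TLL table;
    an empty batch means the load process stops executing.
    """
    cycles = []
    offset = 0
    while True:
        batch = staging_records[offset:offset + load_batch_size]
        if not batch:
            break  # empty TLL table: stop executing
        cycles.append(len(batch))
        offset += load_batch_size
    return cycles

# 250 staging records with a load batch size of 100 yield three processing cycles.
print(load_in_batches(list(range(250)), 100))  # [100, 100, 50]
```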
During the load process, Siperian Hub first checks whether each record in the
staging table has the same primary key as an existing record from the same source
system, comparing it with records in the target table to determine whether it
already exists there.
During the load process, load updates are executed first, followed by load inserts.
Load Inserts
What happens during a load insert depends on the target table (base object or
dependent object) and other factors.
• For each new record in the base object, the load process sets its DIRTY_IND to 1
so that match keys can be regenerated during the tokenization process, as
described in “Base Object Records Flagged for Tokenization” on page 323.
• For each new record in the base object, the load process sets its
CONSOLIDATION_IND to 4 (ready for match) so that the new record can be
matched to other records in the base object. For more information, see
“Consolidation Status for Base Object Records” on page 289.
• The load process inserts a record into the cross-reference table associated with the
base object. The load process generates a primary key value for the cross-reference
table, then copies into this new record the generated key, an identifier for the
source system, and the columns in the staging table (including PKEY_SRC_
OBJECT). For more information, see “Cross-Reference Tables” on page 97.
Note: The base object does not contain the primary key value from the source
system. Instead, the base object’s primary key is the generated ROWID_OBJECT
value. The primary key from the source system (PKEY_SRC_OBJECT) is stored
in the cross-reference table instead.
• If history is enabled for the base object (see “History Tables” on page 100), then
the load process inserts a record into its history and cross-reference history tables.
• If trust is enabled for one or more columns in the base object, then the load
process also inserts records into control tables that support the trust algorithms,
populating the elements of trust and validation rules for each trusted cell with the
values used for trust calculations. This information can be used subsequently to
calculate trust when needed. For more information, see “Configuring Trust for
Source Systems” on page 455 and “Control Tables for Trust-Enabled Columns”
on page 457.
• If Generate Match Tokens on Load is enabled for a base object (see “Generate
Match Tokens on Load” on page 104), then the tokenization process is
automatically started after the load process completes.
For load inserts into target dependent objects, the load process:
• inserts the new record into the dependent object
• substitutes any foreign keys required to maintain referential integrity
Load Updates
What happens during a load update depends on the target table (base object or
dependent object) and other factors.
• If the record in the staging table has been updated since the last time the
record was supplied by the source system, then the load process proceeds with
the load update.
• If the record in the staging table is unchanged since the last time the record
was supplied by the source system, then the load process either ignores the
record (no action is taken) when the dates are the same and trust is not
enabled, or rejects the record as a duplicate.
Administrators can change the default behavior so that the load process bypasses
this LAST_UPDATE_DATE check and forces an update of the records regardless
of whether the records might have already been loaded. For more information, see
“Forcing Updates in Load Jobs” on page 730.
• The load process performs foreign key lookups and substitutes any foreign key
value(s) required to maintain referential integrity. For more information, see
“Performing Lookups Needed to Maintain Referential Integrity” on page 312.
• If the target base object has trust-enabled columns, then the load process:
• calculates the trust score for each trust-enabled column in the record to be
updated, based on the configured trust settings for this trusted column (as
described in “Configuring Trust for Source Systems” on page 455)
• applies validation rules, if defined, to downgrade trust scores where applicable
(see “Configuring Validation Rules” on page 468)
The load process updates the target record in the base object according to the
following rules:
• If the trust score for the cell in the staging table record is higher than the trust
score in the corresponding cell in the target base object record, then the load
process updates the cell in the target record.
• If the trust score for the cell in the staging table record is lower than the trust
score in the corresponding cell in the target base object record, then the load
process does not update the cell in the target record.
• If the trust score for the cell in the staging table record is the same as the trust
score in the corresponding cell in the target base object record, or if trust is
not enabled for the column, then the cell value in the record with the most
recent LAST_UPDATE_DATE wins.
• If the staging table record has a more recent LAST_UPDATE_DATE,
then the corresponding cell in the target base object record is updated.
• If the target record in the base object has a more recent LAST_
UPDATE_DATE, then the cell is not updated.
For more information, see “Survivorship and Order of Precedence” on page 291.
• For each updated record in the base object, the load process sets its DIRTY_IND
to 1 so that match keys can be regenerated during the tokenization process. For
more information, see “Base Object Records Flagged for Tokenization” on page
323.
• For each updated record in the base object, the load process sets its
CONSOLIDATION_IND to 4 so that the updated record can be matched to other
records in the base object. For more information, see “Consolidation Status for
Base Object Records” on page 289.
• Whenever the load process updates a record in the base object, it also updates the
associated record in the cross-reference table (“Cross-Reference Tables” on page
97), history tables (if history is enabled, see “History Tables” on page 100), and
other control tables as applicable.
• If Generate Match Tokens on Load is enabled for a base object (see “Generate
Match Tokens on Load” on page 104), then the tokenization process is
automatically started after the load process completes.
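The cell-level survivorship rules above can be modeled with a short Python sketch. This is a simplified illustration under assumed data structures (dicts with hypothetical keys), not the actual load-process code.

```python
from datetime import date

def surviving_cell(staging, target):
    """Decide which cell value wins during a load update.

    Each argument is a dict with keys: value, trust (None when trust is
    not enabled for this column), and last_update_date.
    """
    s_trust, t_trust = staging["trust"], target["trust"]
    if s_trust is not None and t_trust is not None and s_trust != t_trust:
        # Trust-enabled column with differing scores: the higher trust wins.
        return staging["value"] if s_trust > t_trust else target["value"]
    # Equal trust scores, or trust not enabled: the cell with the most
    # recent LAST_UPDATE_DATE wins.
    if staging["last_update_date"] > target["last_update_date"]:
        return staging["value"]
    return target["value"]

incoming = {"value": "12 Oak St", "trust": 70, "last_update_date": date(2008, 6, 1)}
current  = {"value": "12 Oak Street", "trust": 85, "last_update_date": date(2008, 1, 1)}
print(surviving_cell(incoming, current))  # 12 Oak Street (higher trust wins despite older date)
```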
For load updates with target dependent objects, the load process updates the records in
the target dependent object with the values in the staging table without checking the last
update date.
Note: Data in staging tables from different source systems must have unique keys in
order to be loaded into a dependent object. Records coming from different source
systems each have their own key that uniquely identifies the record in that source
system. Siperian Hub considers any records from the same source system with the
same key values to be the same record. Therefore, if a record in the staging table has
the same key value as an existing cross-reference record, Siperian Hub performs a load
update because the record is considered to exist already in the base object.
Undefined Lookups
If a lookup on a child object is not defined (the lookup table and column were not
populated), then before you can successfully load data, you must repeat the stage
process for the child object. For more information, see “Stage Jobs” on page 745
and “Load Jobs” on page 727.
When configuring columns for a staging table in the Schema Manager, you can specify
whether to allow NULL foreign keys for target base objects; this setting does not apply
to dependent objects. In the Schema Manager, the Allow Null Foreign Key check box
(see “Properties for Columns in Staging Tables” on page 370) determines whether
NULL foreign keys are permitted.
• By default, the Allow Null Foreign Key check box is unchecked, which means that
NULL foreign keys are not allowed. The load process:
• accepts records with valid lookup values
• rejects records with NULL foreign keys
• rejects records with invalid foreign key values
• If Allow Null Foreign Key is enabled (selected), then the load process:
• accepts records with valid lookup values
• accepts records with NULL foreign keys (and permits load inserts and load
updates for these records)
• rejects records with invalid foreign key values
The load process permits load inserts and load updates for accepted records only.
Rejected records are inserted into the reject table rather than being loaded into the
target table.
Note: During the initial data load only, when the target base object is empty, the load
process allows null foreign keys. For more information, see “Initial Data Loads and
Incremental Loads” on page 302.
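The foreign-key checks described above can be sketched as follows; the function and its arguments are illustrative only, not a real Siperian Hub API.

```python
def classify_record(fk_value, valid_keys, allow_null_fk, initial_load=False):
    """Accept or reject a staging record based on its foreign key value.

    Rejected records go to the reject table instead of the target table.
    """
    if fk_value is None:
        if allow_null_fk or initial_load:
            return "accept"   # NULL foreign keys permitted
        return "reject"       # NULL foreign keys not allowed
    if fk_value in valid_keys:
        return "accept"       # valid lookup value
    return "reject"           # invalid foreign key value

print(classify_record(None, {"K1"}, allow_null_fk=False))  # reject
print(classify_record(None, {"K1"}, allow_null_fk=True))   # accept
print(classify_record("K9", {"K1"}, allow_null_fk=True))   # reject (invalid value)
```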
During the load process, records in the staging table might be rejected for the
following reasons:
• future date or NULL date in the LAST_UPDATE_DATE column
• NULL value mapped to the PKEY_SRC_OBJECT of the staging table
• duplicates found in PKEY_SRC_OBJECT
• invalid value in the HUB_STATE_IND field (for state-enabled base objects only)
• invalid or NULL foreign keys, as described in “Allowing Null Foreign Keys” on
page 313
Rejected records will not be loaded into base objects or dependent objects. Rejected
records can be inspected after running Load jobs (see “Viewing Rejected Records” on
page 685).
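A subset of these rejection checks can be sketched in Python; this is an illustrative model of a few of the documented conditions, not the load process itself.

```python
from datetime import date

def rejection_reasons(record, seen_pkeys, today):
    """Return the reasons, if any, for which the load process would reject
    this staging-table record (partial list of the documented checks)."""
    reasons = []
    lud = record.get("LAST_UPDATE_DATE")
    if lud is None or lud > today:
        reasons.append("future or NULL LAST_UPDATE_DATE")
    pkey = record.get("PKEY_SRC_OBJECT")
    if pkey is None:
        reasons.append("NULL PKEY_SRC_OBJECT")
    elif pkey in seen_pkeys:
        reasons.append("duplicate PKEY_SRC_OBJECT")
    return reasons

today = date(2009, 2, 20)
ok = {"LAST_UPDATE_DATE": date(2009, 1, 1), "PKEY_SRC_OBJECT": "A"}
print(rejection_reasons(ok, set(), today))   # [] (record is accepted)
print(rejection_reasons(ok, {"A"}, today))   # ['duplicate PKEY_SRC_OBJECT']
```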
For more information about configuring the behavior of delta detection for
duplicates and the retention of records in the REJ and RAW tables for a staging
table, see “Using Audit Trail and Delta Detection” on page 398.
Note: To reject records, the load process requires traceability back to the landing table.
If you are loading a record from a staging table and its corresponding record in the
associated landing table has been deleted, then the load process does not insert it into
the reject table.
If the child table contains generated keys from the parent table, the load process copies
the appropriate primary key value from the parent table into the child table.
For example, suppose you had the following data.
PARENT TABLE:
PARENT_ID FNAME LNAME
101 Joe Smith
102 Jane Smith
The load process has special considerations when processing records for state-enabled
base objects. For more information, see “Rules for Loading Data” on page 221.
Note: The load process rejects any record from the staging table that has an invalid
value in the HUB_STATE_IND column. For more information, see “About the Hub
State Indicator” on page 207.
Tokenizing data prepares it for the match process. In the Schema Manager, when
configuring a base object, you can specify whether to generate match tokens
immediately after the Load job completes, or to delay tokenizing data until the Match
job runs. The setting of the Generate Match Tokens on Load check box determines
when tokenization occurs. For more information, see “Match Process” on page 317
and “Generate Match Tokens on Load” on page 104.
Task Topic(s)
Configuration Chapter 13, “Configuring the Load Process”
• “Configuring Trust for Source Systems” on page 455
• “Configuring Validation Rules” on page 468
Execution Chapter 17, “Using Batch Jobs”
• “Load Jobs” on page 727
• “Synchronize Jobs” on page 747
• “Revalidate Jobs” on page 745
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Load Jobs” on page 775
• “Synchronize Jobs” on page 796
• “Revalidate Jobs” on page 794
Application Development Siperian Services Integration Framework Guide
Match Process
This section describes concepts and tasks associated with the match process in Siperian
Hub.
In Siperian Hub, the match process provides you with two main ways in which to
compare records and determine duplicates:
• Fuzzy matching is the most common means used in Siperian Hub to match records
in base objects. Fuzzy matching looks for sufficient points of similarity between
records and makes probabilistic match determinations that consider likely
variations in data patterns, such as misspellings, transpositions, the combining or
splitting of words, omissions, truncation, phonetic variations, and so on.
• Exact matching is less commonly used because it matches only records with
identical values in the match column(s). An exact strategy is faster, but an
exact match might miss some matches if the data is imperfect.
The best option to choose depends on the characteristics of the data, your knowledge
of the data, and your particular match and consolidation requirements. For more
information, see “Exact-match and Fuzzy-match Base Objects” on page 320.
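The distinction between the two strategies can be illustrated with a toy Python comparison. Here difflib's similarity ratio merely stands in for fuzzy matching; Siperian Hub's actual fuzzy matching is far more sophisticated (phonetics, transpositions, word splits, and so on).

```python
from difflib import SequenceMatcher

def exact_match(a, b):
    # Exact matching: values must be identical.
    return a == b

def fuzzy_match(a, b, threshold=0.8):
    # Probabilistic similarity; tolerates misspellings and variations.
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold

print(exact_match("JON SMITH", "JOHN SMITH"))  # False: exact matching misses the pair
print(fuzzy_match("JON SMITH", "JOHN SMITH"))  # True: fuzzy matching finds the similarity
```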
During the match process, Siperian Hub compares records in the base object for points
of similarity. If the match process finds sufficient points of similarity (identical or
similar matches) between two records, indicating that the two records probably are
duplicates of each other, then the match process:
• populates a match table with ROWID_OBJECT references to matched record
pairs, along with the match rule that identified the match, and whether the
matched records qualify for automatic consolidation
Match Rules
A match rule defines the criteria by which Siperian Hub determines whether two records
in the base object might be duplicates. Siperian Hub supports two types of match rules:
Type Description
Match column rules Used to match base object records based on the values in columns
you have defined as match columns, such as last name, first name,
address1, and address2. This is the most commonly used method
for identifying matches. For more information, see “Configuring
Match Columns” on page 515.
Primary key match rules Used to match records from two systems that use the same
primary keys for records. It is uncommon for two different source
systems to use identical primary keys. However, when this does
occur, primary key matches are quick and very accurate. For more
information, see “Configuring Primary Key Match Rules” on page
578.
Both kinds of match rules can be used together for the same base object.
The type of base object determines the type of match and the type of match columns
you can define. The base object type is determined by the selected match / search
strategy for the base object. For more information, see “Match/Search Strategy” on
page 493.
Table Description
match key table Contains the match keys that were generated for all base object records.
A match key table uses the following naming convention:
C_baseObjectName_STRP
where baseObjectName is the root name of the base object.
Example: C_PARTY_STRP. For more information, see “Columns in
Match Key Tables” on page 325.
match table Contains the pairs of matched records in the base object resulting from
the execution of the match process on this base object.
Match tables use the following naming convention:
C_baseObjectName_MTCH
where baseObjectName is the root name of the base object.
Example: C_PARTY_MTCH. For more information, see “Populating the
Match Table with Match Pairs” on page 330.
Note: Link-style base objects use a link table (*_LNK) instead.
match flag audit Contains the userID of the user who, in Merge Manager, queued a manual
table match record for automerging.
Match flag audit tables use the following naming convention:
C_baseObjectName_FHMA
where baseObjectName is the root name of the base object.
Used only if Match Flag Audit Table is enabled for this base object, as
described in “Match Flag Audit Table” on page 105.
Match keys are strings that encode data in the columns used to identify candidates for
matching. Match keys are fixed length, compressed, and encoded values built from a
combination of the words and numbers in a name or address such that relevant
variations have the same match key value. Match tokens are strings consisting of match
keys plus the flattened data from the match columns.
The process of generating match tokens is called tokenization. Match tokens are stored
in the match key table associated with the base object. For each record in the base
object, tokenization stores one or more generated match keys in the match key table.
In the match key table, match keys are stored in the SSA_KEY column, and match
tokens are the combination of the data stored in the SSA_KEY and SSA_DATA
columns. For more information, see “Columns in Match Key Tables” on page 325.
Match keys are maintained independently of the match process. The match process
depends on the match keys in the match key table being current. Updating match keys
can occur:
• after the load process (see “Generate Match Tokens on Load” on page 104), when
load inserts and load updates occur
• when data is put into the base object using SIF Put or CleansePut requests (see
“Generate Match Tokens on Load” on page 104, as well as the Siperian Services
Integration Framework Guide and the Siperian Hub Javadoc)
• when you run the Generate Match Tokens job (see “Generate Match Tokens Jobs”
on page 725)
• at the start of a match job, as described in “Regenerating Match Keys If Needed”
on page 329
• after consolidating data, as described in “Consolidate Process” on page 335
All base objects have a system column named DIRTY_IND. This dirty indicator
identifies when match keys need to be generated for the base object record. Match keys
are stored in the match key table.
For each record in the base object whose DIRTY_IND is 1, the tokenization process
generates match keys, and then resets the DIRTY_IND to 0.
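The dirty-indicator cycle can be sketched as follows; the key-generation function is passed in as a stand-in for Siperian Hub's real tokenizer, and the data structures are hypothetical.

```python
def tokenize_dirty_records(base_object, generate_keys):
    """Generate match keys for records flagged DIRTY_IND = 1, then reset
    the flag. generate_keys(record) stands in for the real key generator,
    which may produce several keys per record."""
    match_key_table = []
    for record in base_object:
        if record["DIRTY_IND"] == 1:
            for key in generate_keys(record):
                match_key_table.append((record["ROWID_OBJECT"], key))
            record["DIRTY_IND"] = 0  # record is no longer dirty
    return match_key_table

base = [{"ROWID_OBJECT": "R1", "DIRTY_IND": 1},
        {"ROWID_OBJECT": "R2", "DIRTY_IND": 0}]
print(tokenize_dirty_records(base, lambda r: ["K_" + r["ROWID_OBJECT"]]))
# [('R1', 'K_R1')]  -- only the dirty record is tokenized
```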
The following figure shows how the DIRTY_IND flag changes during various batch
processes:
For fuzzy-match base objects, match keys are generated based on the following
settings:
Property Description
key type Identifies the primary type of information being tokenized (Person_Name,
Organization_Name, or Address_Part1) for this base object. The match process
uses its intelligence about name and address characteristics to generate match keys
and conduct searches. Available key types depend on the population set being
used, as described in “Population Sets” on page 326. For more information, see
“Key Types” on page 521.
key width Determines the thoroughness and speed of the search, the number of possible
match candidates returned, and how much disk space the keys consume. Available
key widths are Limited, Standard, Extended, and Preferred. For more
information, see “Key Widths” on page 522.
Because match keys must be able to overcome errors, variations, and word
transpositions in the data, Siperian Hub generates multiple match tokens for each
name, address, or organization. The number of keys generated per base object record
varies, depending on your data and the match key width.
The Match Keys Distribution tab in the Match / Merge Setup Details pane of the
Schema Manager allows you to investigate the distribution of match keys in the match
key table. This tool can assist you with identifying potential hot spots in your data—high
concentrations of match keys that could result in overmatching—where the match
process generates too many matches, including matches that are not relevant. For more
information, see “Investigating the Distribution of Match Keys” on page 583.
The match keys that are generated depend on your configured match settings and
characteristics of the data in the base object. The following example shows match keys
generated from strings using a fuzzy match / search strategy:
In this example, the strings BETH O'BRIEN and LIZ O'BRIEN (keys #3 and 5 in the
example) have the same match key values, so the match process would consider
these records to be match candidates.
Column Name    Data Type (Size)    Description
ROWID_OBJECT   CHAR (14)           Identifies the record for which this match key was
                                   generated.
SSA_KEY        CHAR (8)            Generated match key for this record.
SSA_DATA       VARCHAR2 (500)      Concatenated, plain text string representing the
                                   source data from all of the match columns defined
                                   in the base object, not just the match key stored
                                   in the SSA_KEY column.
Tokenization Ratio
You can configure the match process to repeat the tokenization process whenever the
percentage of changed records exceeds the specified ratio, which is configured as an
advanced property in the base object. For more information, see “Complete Tokenize
Ratio” on page 102.
Population Sets
For base objects with the fuzzy match/search strategy, the match process uses standard
population sets to account for national, regional, and language differences. The
population set affects how the match process handles tokenization, the match / search
strategy, and match purposes. For more information, see “Fuzzy Population” on page
494.
A population set encapsulates intelligence about name, address, and other identification
information that is typical for a given population. For example, different countries use
different address formats, such as the placement of street numbers and street names,
location of postal codes, and so on. Similarly, different regions have different
distributions for surnames—the surname “Smith” is quite common in the United
States population, for example, but not so common for other parts of the world.
Population sets improve match accuracy by accommodating for the variations and
errors that are likely to appear in data for a particular population. For more
information, see “Configuring Match Settings for Non-US Populations” on page 941.
The Match for Duplicate Data functionality generates matches for records that are
complete duplicates across all non-system base object columns. These matches are
generated when there are more than a set number of occurrences of complete
duplicates on the base object columns (see “Duplicate Match Threshold” on page
103). For most data, the optimal value is 2.
Although the matches are generated, the consolidation indicator (see “Consolidation
Indicator” on page 289) remains at 4 (unconsolidated) for those records, so that they
can be later matched using the standard match rules.
Note: The Match for Duplicate Data job is visible in the Batch Viewer if the threshold
is set above 1 and there are no NON_EQUAL match rules defined on the
corresponding base object. For more information, see “Match for Duplicate Data
Jobs” on page 740.
The Build Match Group (BMG) process removes redundant matching in advance of
the consolidate process. For example, suppose a base object had the following match
pairs:
• record 1 matches to record 2
• record 2 matches to record 3
• record 3 matches to record 4
After running the match process and creating build match groups, and before
running the consolidate process, you might see the following records:
• record 2 matches to record 1
• record 3 matches to record 1
• record 4 matches to record 1
In this example, there was no explicit rule that matched 4 to 1. Instead, the match was
made indirectly due to the behavior of other matches (record 1 matched to 2, 2 matched
to 3, and 3 matched to 4). An indirect match is also known as a transitive match. In
the Merge Manager and Data Manager, you can display the complete match history to
expose the details of transitive matches.
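The effect of building match groups can be sketched with a standard union-find pass over the match pairs. This is an illustration of the transitive-match idea, not the BMG implementation itself.

```python
def build_match_groups(pairs):
    """Collapse transitive match pairs so every record points to a single
    group representative (here, the lowest rowid in its group)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lower rowid as the group representative.
            lo, hi = min(ra, rb), max(ra, rb)
            parent[hi] = lo

    return sorted((rec, find(rec)) for rec in parent if find(rec) != rec)

# Chained pairs 1-2, 2-3, 3-4 collapse into one group rooted at record 1.
print(build_match_groups([(1, 2), (2, 3), (3, 4)]))
# [(2, 1), (3, 1), (4, 1)]
```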
You can configure the maximum number of manual matches to process during batch
jobs. Setting a limit helps prevent data stewards from being overwhelmed with
thousands of manual consolidations to process. Once this limit is reached, the match
process stops running until the number of records ready for manual consolidation
has been reduced. For more information, see “Maximum Matches for Manual
Consolidation” on page 490 and “Consolidate Process” on page 335.
Siperian Hub provides a way to match new data with an existing base object without
actually loading the data into the base object. Rather than run an entire Match job, you
can run the External Match job instead to test for matches and inspect the results. For
more information, see “External Match Jobs” on page 719.
For your Siperian Hub implementation, you can increase the throughput of the match
process by running multiple Cleanse Match Servers in parallel. For more information,
see “Configuring Cleanse Match Servers” on page 407 and the material about
distributed Cleanse Match Servers in the Siperian Hub Installation Guide for your
platform.
When running very large Match jobs with large match batch sizes, a failure of the
application server or the database forces you to re-run the entire batch: each match
batch is processed as a unit, with no incremental checkpoints. To address this, if you think there
might be a database or application server failure, set your match batch sizes smaller to
reduce the amount of time that will be spent re-running your match batches. For more
information, see “Number of Rows per Match Job Batch Cycle” on page 491 and
“Match Jobs” on page 734.
The Match job executes the match process for a single match batch (see “Flagging the
Match Batch” on page 329). The Auto Match and Merge job cycles repeatedly until
there are no more records to match (no more base object records with a
CONSOLIDATION_IND = 4).
The following base object records are ignored during the match process:
• Records with a CONSOLIDATION_IND of 9 (on hold).
• Records with a PENDING or DELETED status. PENDING records can be
included if explicitly enabled according to the instructions in “Enabling Match on
Pending Records” on page 214.
When the match process (such as a Match or Auto Match and Merge job) executes, it
first checks to determine whether match keys need to be generated for any records in
the base object and, if so, generates the match keys and updates the match key table.
Match keys will be generated if the c_repos_table.STRIP_INCOMPLETE_IND flag
for the base object is 1, or if any base object records have a DIRTY_IND=1 (see
“Base Object Records Flagged for Tokenization” on page 323). For more information,
see “Match Keys and the Tokenization Process” on page 322.
The match process cycles through a series of batches until there are no more base
object records to process. It matches a subset of base object records (the match batch)
against all the records available for matching in the base object (the match pool). The size
of the match batch is determined by the Number of Rows per Match Job Batch Cycle
setting (“Number of Rows per Match Job Batch Cycle” on page 491).
For the match batch, the match process retrieves, in no specific order, base object
records that meet the following conditions:
• the record has a CONSOLIDATION_IND value of 4 (ready for match)
The load process sets the CONSOLIDATION_IND to 4 for any record that is
new (load insert) or updated (load update).
• the record qualifies based on rule set filtering, if configured (see “Enable Filtering”
on page 536 and “Filtering SQL” on page 536)
In this step, the match process applies the configured match rules to the match
candidates. The match process executes the match rules one at a time, in the
configured order. The match process executes exact-match rules and exact
match-column rules first, then it executes fuzzy-match rules.
The match process continues executing the match rules until there is a match or there
are no more rules to execute.
When all of the records in the match batch have been processed, the match process
adds all of the matches for that group to the match table and sets
CONSOLIDATION_IND to 2 for the records in the match batch.
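The batch cycle just described can be modeled with a short Python sketch; the record dicts and the pluggable find_matches function are illustrative stand-ins for the real match engine.

```python
def run_match_batches(records, match_batch_size, find_matches):
    """Cycle through match batches until no records remain ready for match
    (CONSOLIDATION_IND = 4). find_matches(batch, pool) stands in for
    applying the configured match rules to the batch against the pool."""
    match_table = []
    while True:
        batch = [r for r in records if r["CONSOLIDATION_IND"] == 4][:match_batch_size]
        if not batch:
            break  # no more base object records to process
        match_table.extend(find_matches(batch, records))
        for r in batch:
            r["CONSOLIDATION_IND"] = 2  # batch has been processed
    return match_table

records = [{"CONSOLIDATION_IND": 4} for _ in range(5)]
run_match_batches(records, 2, lambda batch, pool: [])
print([r["CONSOLIDATION_IND"] for r in records])  # [2, 2, 2, 2, 2]
```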
Match Pairs
The match process populates a match table for that base object. Each row in the match
table represents a pair of matched records in the base object. The match table stores
the ROWID_OBJECT values for each pair of matched records, as well as the identifier
for the match rule that resulted in the match, an automerge indicator, and other
information.
Match rules also determine how matched records are consolidated: automatically or
manually.
For more information, see “Specifying Consolidation Options for Matched Records”
on page 543.
Task Topic(s)
Configuration Chapter 14, “Configuring the Match Process”
• “Configuring Match Properties for a Base Object” on page 488
• “Configuring Match Paths for Related Records” on page 497
• “Configuring Match Columns” on page 515
• “Configuring Match Rule Sets” on page 531
• “Configuring Match Column Rules for Match Rule Sets” on
page 542
• “Configuring Primary Key Match Rules” on page 578
• “Investigating the Distribution of Match Keys” on page 583
• “Excluding Records from the Match Process” on page 590
Appendix A, “Configuring International Data Support”
• “Configuring Match Settings for Non-US Populations” on page
941
Execution Chapter 17, “Using Batch Jobs”
• “Auto Match and Merge Jobs” on page 716
• “External Match Jobs” on page 719
• “Generate Match Tokens Jobs” on page 725
• “Key Match Jobs” on page 727
• “Match Jobs” on page 734
• “Match Analyze Jobs” on page 738
• “Match for Duplicate Data Jobs” on page 740
• “Reset Links Jobs” on page 744
• “Reset Match Table Jobs” on page 744
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Auto Match and Merge Jobs” on page 762
• “External Match Jobs” on page 766
• “Generate Match Token Jobs” on page 767
• “Key Match Jobs” on page 773
• “Match Jobs” on page 783
• “Match Analyze Jobs” on page 785
• “Match for Duplicate Data Jobs” on page 786
Application Development Siperian Services Integration Framework Guide
Consolidate Process
This section describes concepts and tasks associated with the consolidate process in
Siperian Hub.
The following figure shows cell data in records from three different source systems
being consolidated into a single master record.
The following figure shows the consolidate process in relation to other Siperian Hub
processes.
Traceability
The goal in Siperian Hub is to identify and eliminate duplicate records by merging or
linking them into a single, consolidated record, while maintaining full traceability.
Traceability is Siperian Hub functionality that maintains knowledge about which
systems—and which records from those systems—contributed to consolidated
records. Siperian Hub maintains traceability using cross-reference and history tables.
Option Description
base object style Determines whether the consolidate process uses merging or
linking. For more information, see “Base Object Style” on page 106 and
“Consolidation Options” on page 339.
immutable sources Allows you to specify source systems as immutable, meaning that
records from that source system will be accepted as unique and, once a record
from that source has been fully consolidated, it will not be changed
subsequently. For more information, see “Immutable Rowid Object” on page 594.
distinct systems Allows you to specify source systems as distinct, meaning that
the data from that system gets inserted into the base object without being
consolidated. For more information, see “Distinct Systems” on page 595.
cascade unmerge for child base objects Allows you to enable cascade unmerging
for child base objects and to specify what happens if records in the parent base
object are unmerged. For more information, see “Unmerge Child When Parent
Unmerges (Cascade Unmerge)” on page 597.
child base object records on parent merge For two base objects in a parent-child
relationship, if enabled on the child base object, child records are resubmitted
for the match process when parent records are consolidated. For more
information, see “Requeue On Parent Merge” on page 104.
Consolidation Options
There are two ways to consolidate matched records:
• Merging (physical consolidation) combines the matched records and updates the
base object. Merging occurs for merge-style base objects (link is not enabled).
• Linking (virtual consolidation) creates a logical link between the matched records.
Linking occurs for link-style base objects (link is enabled).
Merging combines two or more records in a base object table. Depending on the
degree of similarity between the two records, merging is done automatically or
manually.
• Records that are definite matches are automatically merged (automerge process).
For more information, see “Automerge Jobs” on page 717.
• Records that are close but not definite matches are queued for manual review
(manual merge process) by a data steward in the Merge Manager tool. The data
steward inspects the candidate matches and selectively chooses matches that
should be merged. Manual merge match rules are configured to identify close
matches. For more information, see “Manual Merge Jobs” on page 732 and, for
the Merge Manager, see the Siperian Hub Data Steward Guide.
• Siperian Hub queues all other matched records for manual review by a data
steward in the Merge Manager tool.
Match rules are configured to identify definite matches for automerging and close
matches for manual merging.
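The routing decision above can be expressed as a tiny sketch. The field name `automerge_ind` is borrowed from the automerge indicator mentioned under “Match Pairs”; the dictionary shape is an assumption for illustration.

```python
# Illustrative routing of match-table rows: pairs flagged for automerge
# are merged automatically; everything else goes to a data steward's
# manual-review queue in the Merge Manager tool.

def route_matches(match_pairs):
    """Split match pairs into (automerge, manual_review) lists based on
    the automerge indicator set by the matching rule."""
    auto, manual = [], []
    for pair in match_pairs:
        (auto if pair["automerge_ind"] == 1 else manual).append(pair)
    return auto, manual
```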
For a base object, the best version of the truth (sometimes abbreviated as BVT) is a record
that has been consolidated with the best cells of data from the source records.
The precise definition depends on the base object style:
• For merge-style base objects, the base object record is the BVT record, built
by consolidating the most-trustworthy cell values from the corresponding
source records.
• For link-style base objects, the BVT Snapshot job builds the BVT record(s) by
consolidating the most-trustworthy cell values from the corresponding linked
base object records, and returns a snapshot to the requestor for consumption.
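Cell-level consolidation by trust can be sketched as follows. This is a minimal model under stated assumptions: static per-column trust scores (real trust settings also decay over time), and simple dictionaries in place of base object and cross-reference tables.

```python
# Minimal "best version of the truth" sketch: for each column, keep the
# value from whichever source system has the highest trust score for
# that column. Nulls never win a trusted cell.

def build_bvt(source_records, trust):
    """source_records: dict mapping system name -> {column: value}.
    trust: dict mapping system name -> {column: score}.
    Returns the consolidated record as {column: value}."""
    best = {}
    for system, record in source_records.items():
        for column, value in record.items():
            if value is None:
                continue  # skip nulls; another source may supply the cell
            score = trust.get(system, {}).get(column, 0)
            if column not in best or score > best[column][1]:
                best[column] = (value, score)
    return {col: val for col, (val, _) in best.items()}
```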
Task Topic(s)
Configuration Chapter 15, “Configuring the Consolidate Process”
• “About Consolidation Settings” on page 594
• “Changing Consolidation Settings” on page 598
Execution Siperian Hub Data Steward Guide
• “Managing Data”
• “Consolidating Data”
Chapter 17, “Using Batch Jobs”
• “Accept Non-Matched Records As Unique” on page 715
• “Auto Match and Merge Jobs” on page 716
• “Autolink Jobs” on page 715
• “Automerge Jobs” on page 717
• “BVT Snapshot Jobs” on page 719
• “Manual Link Jobs” on page 732
• “Manual Merge Jobs” on page 732
• “Manual Unlink Jobs” on page 733
• “Manual Unmerge Jobs” on page 733
• “Multi Merge Jobs” on page 741
• “Reset Links Jobs” on page 744
• “Reset Match Table Jobs” on page 744
• “Synchronize Jobs” on page 747
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Auto Match and Merge Jobs” on page 762
• “Autolink Jobs” on page 762
• “Automerge Jobs” on page 764
• “BVT Snapshot Jobs” on page 765
• “Manual Link Jobs” on page 777
• “Manual Unlink Jobs” on page 779
• “Manual Unmerge Jobs” on page 779
Application Development Siperian Services Integration Framework Guide
Publish Process
This section describes concepts and tasks associated with the publish process in
Siperian Hub.
Other external systems, processes, or applications can listen on the JMS message
queue, retrieve the XML messages, and process them accordingly.
Siperian Hub implementations use the publish process in support of stated business
and technical requirements. However, not all organizations will take advantage of this
functionality, and its use in Siperian Hub implementations is optional.
The processes previously described in this chapter—land, stage, load, match, and
consolidate—are all associated with reconciliation, which is the main inbound flow for
Siperian Hub. With reconciliation, Siperian Hub receives data from one or more source
systems, cleanses the data if applicable, and then reconciles “multiple versions of the
truth” to arrive at the master record—the best version of the truth—for that entity.
In contrast, the publish process belongs to the main Siperian Hub outbound
flow—distribution. Once the master record is established or updated for a given entity,
Siperian Hub can then (optionally) distribute the master record data to other
applications or databases. For an introduction to reconciliation and distribution, see the
Siperian Hub Overview. In another scenario, data changes can be sent to the Activity
Manager Rules queue so that the data change can be evaluated against user-defined
rules.
The land, stage, load, match, and consolidate processes work with batches of records
and are executed as batch jobs or stored procedures. In contrast, the publish process is
executed as the result of a message trigger that executes when a data change occurs in the
Hub Store. The message trigger creates an XML message that gets published on a JMS
message queue.
In this scenario:
1. A batch load or a real-time SIF API request (SIF put or cleanse_put request) may
result in an insert or update on a base object.
You can configure a message rule to control the data that goes to the
C_REPOS_MQ_DATA_CHANGE table.
2. The Hub Server polls the C_REPOS_MQ_DATA_CHANGE table at regular
intervals.
3. For data that has not been sent, Hub Server constructs an XML message based on
the data and sends it to the outbound queue configured for the message queue.
4. It is the external application's responsibility to retrieve the message from the
outbound queue and process it.
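The polling step of this flow can be sketched in a few lines. The table name follows the text (C_REPOS_MQ_DATA_CHANGE); the row shape, element names, and the `send` callable standing in for the JMS outbound queue are assumptions for illustration.

```python
# Hedged sketch of the publish polling loop: for each change-table row
# not yet sent, build an XML message and hand it to the outbound queue.
import xml.etree.ElementTree as ET

def publish_pending(change_rows, send):
    """change_rows: dicts with 'rowid_object', 'sent_ind', 'payload'
    (hypothetical columns). send: callable that delivers one XML string
    to the outbound message queue. Returns the number published."""
    published = 0
    for row in change_rows:
        if row["sent_ind"]:
            continue  # already delivered
        msg = ET.Element("siperianEvent")  # illustrative element names
        ET.SubElement(msg, "rowidObject").text = str(row["rowid_object"])
        ET.SubElement(msg, "data").text = row["payload"]
        send(ET.tostring(msg, encoding="unicode"))
        row["sent_ind"] = True
        published += 1
    return published
```

Retrieving and processing the message from the queue remains the external application's responsibility, as step 4 notes.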
Task Topic(s)
Configuration Chapter 16, “Configuring the Publish Process”
• “Configuring Global Message Queue Settings” on page 604
• “Configuring Message Queue Servers” on page 605
• “Configuring Outbound Message Queues” on page 608
• “Configuring Message Triggers” on page 612
• “Generating and Deploying ORS-specific Schemas” on page 827
Execution Siperian Hub publishes an XML message to an outbound message
queue whenever a message trigger fires. You do not need to
explicitly execute a batch job from the Batch Viewer or Batch Group
tool.
To monitor run-time activity for message queues using the Audit
Manager tool in the Hub Console, see “Auditing Message Queues”
on page 928.
Application Development Siperian Services Integration Framework Guide
This chapter explains how to configure the land process for your Siperian Hub
implementation. For an introduction, see “Land Process” on page 292.
Chapter Contents
• Before You Begin
• Configuration Tasks for the Land Process
• Configuring Source Systems
• Configuring Landing Tables
Before You Begin
If multiple source systems contribute data for the same column in a base object, you
can configure trust on a column-by-column basis to specify which source system(s) are
more reliable providers of data (relative to other source systems) for that column. Trust
is used to determine survivorship when two records are consolidated, and whether
updates from a source system are sufficiently reliable to update the “best version of the
truth” record. For more information, see “Configuring Trust for Source Systems” on
page 455.
Siperian Hub uses an administration source system for manual trust overrides and data
edits from the Data Manager or Merge Manager tools, which are described in the
Siperian Hub Data Steward Guide. This administration source system can contribute data
to any trust-enabled column. The administration source system is named Admin by
default, but you can optionally change its name according to the instructions in
“Editing Source System Properties” on page 353.
The source systems that you define in the Systems and Trust tool are stored in a special
public Siperian Hub repository table (C_REPOS_SYSTEM, with a display name of
MRM System). This table is visible in the Schema Manager if the Show System Tables
option is selected (for more information, see “Changing the Item View” on page 39).
C_REPOS_SYSTEM can also be used in packages, as described in “Configuring
Packages” on page 196.
The Hub Console displays the Systems and Trust tool, as shown in the following
example.
Pane Description
Navigation Systems: List of every source system that contributes data to
Siperian Hub, including the administration source system described in
“Administration Source System” on page 349.
Trust: Expand the tree to display:
• base objects containing one or more trust-enabled columns
• trust-enabled columns (only)
For more information about configuring trust for base object columns, see
“Configuring Trust for Source Systems” on page 455.
Properties Properties for the selected source system, or trust settings for the
selected base object column.
Property Description
Name Unique, descriptive name for this source system.
Primary Key Primary key for this source system. Unique identifier for this system in the
ROWID_SYSTEM column of C_REPOS_SYSTEM. Read only.
Description Optional description for this source system.
4. Specify the source system properties. For more information, see “Source System
Properties” on page 351.
5. Click OK.
The Systems and Trust tool displays the newly-added source system in the list of
source systems.
Note: When you add a source system, Hub Store uses the first 14 characters of the
system name (in all uppercase letters) as its primary key (ROWID_SYSTEM value
in C_REPOS_SYSTEM).
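The key-derivation rule in the note above can be expressed directly:

```python
# ROWID_SYSTEM is derived from the first 14 characters of the system
# name, uppercased (per the note above).

def rowid_system(system_name):
    return system_name.upper()[:14]
```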
Note: If this source system has already contributed data to your Siperian Hub
implementation, Siperian Hub continues to track the lineage (history) of data from this
source system even after you have renamed it.
4. Change any of the editable properties. For more information, see “Source System
Properties” on page 351.
5. To change trust settings for a source system, see “Configuring Trust for Source
Systems” on page 455.
6. Click the button to save your changes.
Note: Removing a source system deletes only the source system definition in the Hub
Console—it has no effect outside of Siperian Hub.
The manner in which source systems populate landing tables with data is entirely
external to Siperian Hub. The data model you use for collecting data in landing tables
from various source systems is also external to Siperian Hub. One source system could
populate multiple landing tables. A single landing table could receive data from
different source systems. The data model you use is entirely up to your particular
implementation requirements.
Inside Siperian Hub, however, landing tables are mapped to staging tables, as described
in “Mapping Columns Between Landing and Staging Tables” on page 380. It is in the
staging table—mapped to a landing table—where the source system supplying the data
to the base object is identified. During the load process, Siperian Hub copies data from
a landing table to a target staging table, tags the data with the source system
identification, and optionally cleanses data in the process. A landing table can be
mapped to one or more staging tables. A staging table is mapped to only one landing
table.
As described in “Ways to Populate Landing Tables” on page 294, landing tables are
populated using batch or real-time approaches that are external to Siperian Hub.
After a landing table is populated, the stage process pulls data from the landing tables,
further cleanses the data if appropriate, and then populates the appropriate staging
tables. For more information, see “Stage Process” on page 295.
Note: If the source system table has a multiple-column key, concatenate these columns
to produce a single unique VARCHAR value for the primary key column.
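One simple way to follow this note is to join the key columns with a delimiter, assuming the delimiter cannot occur in the key values themselves (the delimiter choice here is illustrative):

```python
# Build a single unique VARCHAR primary key from a multiple-column
# source key by concatenating the parts with a delimiter.

def concat_key(*key_parts, sep="|"):
    return sep.join(str(p) for p in key_parts)
```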
Property Description
Item Type Type of table that you are adding. Select Landing Table.
Display Name Name of this landing table as it will be displayed in the Hub Console.
Physical Name Actual name of the landing table in the database. Siperian Hub will
suggest a physical name for the landing table based on the display name
that you enter.
Data Tablespace Name of the data tablespace for this landing table. For more
information, see the Siperian Hub Installation Guide for your platform.
Index Tablespace Name of the index tablespace for this landing table. For more
information, see the Siperian Hub Installation Guide for your platform.
Description Description of this landing table.
Create Date Date and time when this landing table was created.
Contains Full Data Set Specifies whether this landing table contains the full
data set from the source system, or only updates.
• If selected (default), indicates that this landing table contains the full
set of data from the source system (such as for the initial data load).
When this check box is enabled, you can configure Siperian Hub’s
delta detection feature (see “Configuring Delta Detection for a
Staging Table” on page 401) so that, during the stage process, only
changed records are copied to the staging table.
• If not selected, indicates that this landing table contains only
changed data from the source system (such as for incremental
loads). In this case, Siperian Hub assumes that you filtered out
unchanged records before populating the landing table. Therefore,
the stage process inserts all records from the landing table directly
into the staging table. When this check box is enabled, Siperian
Hub’s delta detection feature is not available.
Note: You can change this property only when editing the source
system properties, as described in “Editing Source System Properties”
on page 353.
5. Specify the properties (described in “Landing Table Properties” on page 357) for
this new landing table.
6. Click OK.
The Schema Manager creates the new landing table in the Operational Record
Store (ORS), along with support tables, and then adds the new landing table to the
schema tree.
7. Configure the columns for your landing table according to the instructions in
“Configuring Columns in Tables” on page 125.
8. If you want this landing table to contain only changed data from the source
system, clear the Contains Full Data Set check box by editing the landing
table properties according to the instructions in “Editing Landing Table
Properties” on page 360.
4. Change the landing table properties you want. For more information, see “Landing
Table Properties” on page 357.
5. Click the button to save your changes.
6. Change the column configuration for your landing table, if you want, according to
the instructions in “Configuring Columns in Tables” on page 125.
This chapter explains how to configure the data staging process for your Siperian Hub
implementation. For an introduction, see “Stage Process” on page 295. In addition, to
learn about cleansing data during the data staging process, see Chapter 12,
“Configuring Data Cleansing.”
Chapter Contents
• Before You Begin
• Configuration Tasks for the Stage Process
• Configuring Staging Tables
• Mapping Columns Between Landing and Staging Tables
• Using Audit Trail and Delta Detection
Before You Begin
The structure of a staging table is directly based on the structure of the target object
that will contain the consolidated data. You use the Schema Manager in the Model
workbench to configure staging tables.
Note: You must have at least one source system defined before you can define a
staging table. For more information, see “Configuring Source Systems” on page 348.
Staging tables must be based on the columns provided by the source system for the
target base object or dependent object for which the staging table is defined, even if the
landing tables are shared across multiple source systems. If you do not make the
columns on staging tables source-specific, you create unnecessary trust and
validation requirements.
Trust is a powerful mechanism, but it carries performance overhead. Use trust where it
is appropriate and necessary, but not where the most recent cell value will suffice for
the surviving record.
If you limit the columns in the staging tables to the columns actually provided by the
source systems, then you can restrict the trust columns to those that come from two or
more staging tables. Use this approach instead of treating every column as if it comes
from every source, which would mean needing to add trust for every column, and then
validation rules to downgrade the trust on null values for all of the sources that do not
provide values for the columns.
More trust columns and validation rules affect the load and merge processes.
Also, the more trusted columns there are, the longer the update statements for
the control table become. Bear in mind that Oracle and DB2 have a 32K limit on
the size of the SQL buffer for SQL statements. For this reason, more than 40
trust columns result in a horizontal split in the update of the control
table—MRM updates only 40 columns at a time.
Property Description
Staging Identity
Display Name Name of this staging table as it will be displayed in the Hub Console.
Physical Name Actual name of the staging table in the database. Siperian Hub will
suggest a physical name for the staging table based on the display
name that you enter.
System Select the source system for this data. For more information, see
“Configuring Source Systems” on page 348.
Preserve Source System Keys Copy key values from the source system rather than
using Siperian Hub’s internally-generated key values. Applies to staging tables
associated with base objects only (not with dependent objects).
To learn more, see “Preserving Source System Keys” on page 368.
Highest Reserved Key Specify the amount by which the key is increased after the first load.
Visible only if the Preserve Source System Key checkbox is selected.
To learn more, see “Specifying the Highest Reserved Key” on page
369.
Data Tablespace Name of the data tablespace for this staging table. For more
information, see the Siperian Hub Installation Guide for your platform.
Index Tablespace Name of the index tablespace for this staging table. For more
information, see the Siperian Hub Installation Guide for your platform.
Description Description of this staging table.
Cell Update Determines whether Siperian Hub updates the cell in the target table
if the value in the incoming record from the staging table is the same.
For more information, see “Enabling Cell Update” on page 369.
Columns Columns in this staging table. For more information, see
“Configuring Columns in Tables” on page 125.
Audit Trail and Delta Detection Configurable after mappings between landing and
staging tables have been defined. For more information, see “Mapping Columns
Between Landing and Staging Tables” on page 380.
Audit Trail If enabled, retains the history of the data in the RAW table based on
the number of loads and timestamps. For more information, see
“Configuring the Audit Trail for a Staging Table” on page 399.
Delta Detection If enabled, Siperian Hub processes only new or changed records and
ignores unchanged records. For more information, see “Configuring
Delta Detection for a Staging Table” on page 401.
By default, this option is not enabled. During Siperian Hub stage jobs (see “Stage Jobs”
on page 745), for each inbound record of data, Siperian Hub generates an internal key
that it inserts in the ROWID_OBJECT column of the target base object.
Enable this option when you want to use the value from the primary key column from
the source system instead of Siperian Hub’s internally-generated key. To enable this
option, when adding a staging table to a base object (see “Adding Staging Tables” on
page 371), check (select) the Preserve Source System Keys check box in the Add
staging to Base Object dialog. Once enabled, during stage jobs, instead of generating an
internal key, Siperian Hub takes the value in the PKEY_SOURCE_OBJECT column
from the staging table and inserts it into the ROWID_OBJECT column in the target
base object.
Note: Once a base object is created, you cannot change this setting.
If the Preserve Source System Keys check box is enabled, then the Schema Manager
displays the Highest Reserved Key field. If you want to insert a gap between the source
key and Siperian Hub’s key, then enter the amount by which the key is increased after
the first load.
Note: Set the Highest Reserved Key to the upper boundary of the source system
keys. To allow a margin, set this number slightly higher, adding a buffer to the
expected range of source system keys. Any record added to the base object
without a preserved source key is given a Siperian Hub-generated key above the
highest reserved value you set.
Enabling this option has the following consequences when the base object is first
loaded:
1. From the staging table, Siperian Hub takes the value in PKEY_SOURCE_
OBJECT and inserts that into the base object’s ROWID_OBJECT—instead of
generating Siperian Hub’s internal key.
2. Siperian Hub then resets the key's starting position to MAX (PKEY_SOURCE_
OBJECT) + the GAP value.
3. On the next load for this staging table, Siperian Hub continues to use the PKEY_
SOURCE_OBJECT. For loads from other staging tables, it uses the Siperian
Hub-generated key.
Note: Only one staging table per base object can have this option enabled (even if it is
from the same system). The reserved key range is set at the initial load only.
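The reserved-key arithmetic in step 2 can be written out explicitly. This is only the arithmetic from the text, under the assumption that source keys are numeric:

```python
# After the first load with Preserve Source System Keys enabled,
# internal key generation resumes at MAX(PKEY_SOURCE_OBJECT) + gap,
# leaving the reserved range below it to the preserved source keys.

def next_key_start(source_keys, gap):
    return max(source_keys) + gap
```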
By default, during the stage process (see “Stage Jobs” on page 745), for each inbound
record of data, Siperian Hub replaces the cell value in the target base object whenever
an incoming record has a higher trust level—even if the value it replaces is identical.
Even though the value has not changed, Siperian Hub updates the last update date for
the cell to the date associated with the incoming record, and assigns to the cell the
same trust level as a new value. For more information, see “Configuring Trust for
Source Systems” on page 455.
You can change this behavior by checking (selecting) the Cell Update check box when
configuring a staging table. If cell update is enabled, then during Stage jobs, Siperian
Hub will compare the cell value with the current contents of the cross-reference table
before it updates the target record in the base object. If the cross-reference record for
this system has an identical value in this cell, then Siperian Hub will not update the cell
in the Hub Store. Enabling cell update can increase performance during Stage jobs if
your Siperian Hub implementation does not require updates to the last update date and
trust value in the target base object record.
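The cell update decision described above reduces to one comparison. A minimal sketch, with the cross-reference value passed in directly:

```python
# With Cell Update enabled, skip the write when the cross-reference
# record for this system already holds the identical value; otherwise
# update the cell (which also refreshes its last update date and trust).

def should_update_cell(incoming_value, xref_value, cell_update_enabled):
    if cell_update_enabled and incoming_value == xref_value:
        return False  # identical value: leave dates and trust untouched
    return True
```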
Property Description
Column Name of this column as defined in the associated base object or
dependent object.
Lookup System Name of the lookup system if the Lookup Table is a cross-reference
table.
Lookup Table For foreign key columns in the staging table, the name of the table
containing the lookup column.
Lookup Column For foreign key columns in the staging table, the name of the lookup
column in the lookup table. For more information, see “Configuring
Lookups For Foreign Key Columns” on page 376.
Allow Null Update Determines whether null updates are allowed when a Load job
specifies a null value for a cell that already contains a non-null value.
• Check (select) this check box to have the Load job update the
cell. Do this if you want Siperian Hub to update the cell value
even though the new value would be null.
• Uncheck (clear, the default) this check box to prevent null
updates and retain the existing non-null value.
Allow Null Foreign Key Determines whether null foreign keys are allowed. Use
this option only if null values are valid for the foreign key relationship—that
is, if the foreign key is an optional relationship.
• Check (select) this check box to allow data to be loaded when
you do not have a value for lookup.
• Uncheck (clear, the default) this check box to prevent null
foreign keys. In this case, records with null values in the lookup
column will be written to the rejects table instead of being
loaded.
The Schema Manager displays the Add staging to Base Object (or Dependent
Object) dialog.
6. Specify the staging table properties. For more information, see “Staging Table
Properties” on page 367.
Note: Some of these settings cannot be changed after the staging table has been
added, so make sure that you specify the settings you want before closing this
dialog.
7. From the list of the columns in the base object or dependent object, select all of
the columns that this source system will provide. For more information, see
“Staging Table Columns” on page 365.
• Click the Select All button to select all of the columns without needing to
click each column individually.
• Click the Clear All button to unselect all selected columns.
These staging table columns inherit the properties of their corresponding columns
in the base object or dependent object. You can select columns, but you cannot
change their inherited data types and column widths.
Note: The Rowid Object and the Last Update Date are automatically selected.
You cannot uncheck these columns or change their properties.
8. Specify column properties. For more information, see “Properties for Columns in
Staging Tables” on page 370.
9. For each column that has an associated foreign key relationship, select the row and
click the button to define the lookup column. For more information, see
“Configuring Lookups For Foreign Key Columns” on page 376.
Note: You will not be able to save this new staging table unless you complete this
step.
10. Click OK.
The Schema Manager creates the new staging table in the Operational Record
Store (ORS), along with any support tables, and then adds the new staging table to
the schema tree.
11. If you want, configure an Audit Trail and Delta Detection for this staging table.
To learn more, see “Using Audit Trail and Delta Detection” on page 398.
The Schema Manager displays the properties for the selected table.
5. Specify the staging table properties. For more information, see “Staging Table
Properties” on page 367.
For each property that you want to edit (Display Name and Description), click the
Edit button next to it, and specify the new value.
6. From the list of the columns in the base object or dependent object, change the
columns that this source system will provide.
• Click the Select All button to select all of the columns without needing to
click each column individually.
• Click the Clear All button to unselect all selected columns.
Note: The Rowid Object and the Last Update Date are automatically selected.
You cannot uncheck these columns or change their properties.
7. If you want, change column properties. For more information, see “Properties for
Columns in Staging Tables” on page 370.
8. If you want, change lookups for foreign key columns. Select the column and click
the button to configure the lookup column. For more information, see
“Configuring Lookups For Foreign Key Columns” on page 376.
9. If you want to change cell updating (see “Enabling Cell Update” on page 369),
click in the Cell update check box.
10. Change the column configuration for your staging table, if you want. For more
information, see “Configuring Columns in Tables” on page 125.
11. If you want, configure an Audit Trail and Delta Detection for this staging table.
To learn more, see “Using Audit Trail and Delta Detection” on page 398.
12. Click the button to save your changes.
The Hub Console launches the Systems and Trust tool and displays the source system
associated with this staging table. For more information, see “Configuring Source
Systems” on page 348.
About Lookups
A lookup is the process of retrieving a data value from a parent table during Load jobs.
In Siperian Hub, when configuring a staging table associated with a base object, if a
foreign key column in the staging table (as the child table) is related to the primary key
in a parent table, you can configure a lookup to retrieve data from that parent table.
The target column in the lookup table must be a unique column (such as the primary
key). For more information, see “Performing Lookups Needed to Maintain Referential
Integrity” on page 312.
For example, suppose your Siperian Hub implementation had two base objects: a
Consumer parent base object and an Address child base object, with the following
relationship between them:
Consumer.Rowid_object = Address.Consumer_Fkey
In this case, the Consumer_Fkey column is included in the Address staging table,
and a lookup is configured on it against the Consumer base object.
Once defined, when the Load job runs on the base object, Siperian Hub looks up
the source system’s Consumer key value in the primary-key-from-source-system
column of the Consumer cross-reference table, and returns the ROWID_OBJECT
value that corresponds to the source Consumer record.
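The lookup behavior can be sketched as follows, combining it with the Allow Null Foreign Key option described earlier. The record shape, the `xref` mapping standing in for the cross-reference table, and the reject reasons are assumptions for illustration:

```python
# Hypothetical load-time foreign key lookup: resolve the source system's
# parent key to a ROWID_OBJECT via the parent's cross-reference table,
# rejecting the record when the lookup fails or when the key is null and
# null foreign keys are not allowed.

def resolve_fk(record, fk_column, xref, allow_null_fk):
    """xref: dict mapping source key -> parent ROWID_OBJECT. Returns
    (record, None) on success or (None, reason) when rejected."""
    source_key = record.get(fk_column)
    if source_key is None:
        if allow_null_fk:
            return record, None  # optional relationship: load with null FK
        return None, "null foreign key"
    if source_key not in xref:
        return None, "lookup failed for %r" % source_key
    record[fk_column] = xref[source_key]  # source key -> ROWID_OBJECT
    return record, None
```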
Configuring Lookups
The Edit Lookup button is enabled only for foreign key columns.
The Define Lookup dialog displays the parent base object and its cross-reference
table, along with their unique columns (only unique columns are shown).
8. Select the target column for the lookup.
• To define the lookup to a base object, expand the base object and select
Rowid_Object (the primary key for this base object).
You can map columns from one landing table to multiple staging tables. However, each
staging table is mapped to only one landing table.
For each column of data in the staging table, the data comes from the landing table in
one of two ways: passed directly from the corresponding landing column, or processed
through a cleanse function.
In the following figure, data in the Name column is cleansed via a cleanse function,
while data from all other columns is passed directly to the corresponding target column
in the staging table.
Note: A staging table does not need to use every column in the landing table or every
output string from a cleanse function. The same landing table can provide input to
multiple staging tables, and the same cleanse function can be reused for multiple
columns in multiple landing tables.
Cleanse functions can also decompose and aggregate data. Either way, your mappings
need to accommodate the required inputs and outputs.
In the following figure, the cleanse function decomposes the name field, breaking the
data into smaller pieces.
This cleanse function has one input string and five output strings. In your mapping,
you need to make sure that the input string is mapped to the cleanse function, and each
output string is mapped to the correct target column in the staging table.
In the following figure, the cleanse function aggregates data from five fields into a
single string.
This cleanse function has five input strings and one output string. In your mapping,
you need to make sure that the input strings are mapped to the cleanse function and
the output string is mapped to the correct target column in the staging table.
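The two shapes can be pictured roughly as follows. These are hypothetical Python stand-ins for cleanse functions, written only to illustrate the input/output cardinality; real cleanse functions are defined in the Cleanse Functions tool.

```python
# Hypothetical stand-ins for cleanse functions (illustration only).

def decompose_name(full_name):
    """One input string -> five output strings (e.g. prefix, first,
    middle, last, suffix), as in the decomposition example."""
    parts = full_name.split()
    parts += [""] * (5 - len(parts))  # pad so every output column gets a value
    return tuple(parts[:5])

def aggregate_address(street, city, state, zip_code, country):
    """Five input strings -> one output string, as in the aggregation example."""
    return ", ".join(p for p in (street, city, state, zip_code, country) if p)

print(decompose_name("Dr. Jane Q. Public III"))
print(aggregate_address("123 Main St", "Sacramento", "CA", "95814", "USA"))
```

In either case, the mapping must connect every required input to the function and every output to the correct target column.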
Column Description
Mappings List List of every defined landing-to-staging mapping.
Properties Properties for the selected mapping.
When you select a mapping in the mappings list, its properties are displayed.
When a mapping is selected, the Mappings tool displays the following tabs.
Column Description
General General properties for this mapping. For more information, see
“Mapping Properties” on page 386.
Diagram Interactive diagram that lets you define mappings between columns
in the landing and staging tables. For more information, see
“Mapping Columns Between Landing and Staging Table Columns”
on page 389.
Query Parameters Allows you to specify query parameters for this mapping. For more
information, see “Configuring Query Parameters for Mappings” on
page 392.
Test Allows you to test the mapping.
Mapping Diagrams
When you click the Diagram tab for a mapping, the Mappings tool displays the current
column mappings.
Mapping lines show the mapping from source columns in the landing table to target
columns in the staging table. Colors in the circles at either end of the mapping lines
indicate data types.
Mapping Properties
Mappings have the following properties.
Field Description
Name Name of this mapping as it will be displayed in the Hub Console.
Description Description of this mapping.
Landing Table Select the landing table that will be the source of the mapping.
Staging Table Select the staging table that will be the target of the mapping.
Secure Resource Check (enable) to make this mapping a secure resource, which allows you to
control access to this mapping. Once a mapping is designated as a secure
resource, you can assign privileges to it in the Secure Resources tool.
To learn more, see “Securing Siperian Hub Resources” on page 841, and
“Assigning Resource Privileges to Roles” on page 859.
Adding Mappings
To create a new mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the area where the mappings are listed and choose Add Mapping.
4. Specify the mapping properties. For more information, see “Mapping Properties”
on page 386.
5. Click OK.
The Mappings tool displays the landing table and staging table on the workspace.
6. Using the workspace tools and the input and output nodes, connect the column in
the landing table to the corresponding column in the staging table.
Tip: If you want to automatically map columns in the landing table to columns
with the same name in the staging table, click the button.
7. Click OK.
8. When you are finished, click the button to save your changes.
Copying Mappings
To create a new mapping by copying an existing one:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click the mapping that you want to copy, and then choose Copy Mapping.
The Mappings tool displays the Mapping dialog.
4. Specify the mapping properties. The landing table is already specified. For more
information, see “Mapping Properties” on page 386.
5. Click OK.
6. Click the button to save your changes.
The workspace and the methods of creating a mapping are the same as for creating
cleanse functions. To learn how to use the workspace to define functions, inputs, and
outputs, see “Configuring Graph Functions” on page 424.
2. Mouse-over the output connector (circle) to the right of the column in the landing
table (the circle outline turns red), drag the line to the input connector (circle) to
the left of the column in the staging table, and then release the mouse button.
Note: If you want to load by RowID, create a mapping between the primary key in
the landing table and the Rowid object in the staging table. For more information,
see “Loading by RowID” on page 394.
To cleanse data during Stage jobs, you can include one or more cleanse functions in
your mapping. This section provides brief instructions for configuring cleanse
functions in mappings. To learn more, see “Using Cleanse Functions” on page 414.
To configure mappings between columns in landing and staging tables via cleanse
functions:
1. Navigate to the Diagrams tab according to the instructions in “Navigate to the
Diagrams Tab” on page 389.
2. Add the cleanse function(s) that you want to configure by right-clicking anywhere
in the workspace and choosing the cleanse function that you want to add.
3. For each input connector on the cleanse function, mouse-over the output
connector from the appropriate column in the landing table, drag the line to its
corresponding input connector, and release the mouse button.
4. Similarly, for each output connector on the cleanse function, mouse-over the
output connector, drag the line to its corresponding column in the staging table,
and release the mouse button.
In the following example, the Titlecase cleanse function will process data that
comes from the Last Name column in the landing table and then populate the Last
Name column in the staging table with the cleansed data.
5. If you want, check or uncheck the Enable Distinct check box, as appropriate, to
configure distinct mapping. For more information, see “Distinct Mapping” on
page 393.
6. If you want, check or uncheck the Enable Condition check box, as appropriate, to
configure conditional mapping. For more information, see “Conditional Mapping”
on page 394.
If enabled, type the SQL WHERE clause (omitting the WHERE keyword), and
then click Validate to validate the clause.
7. Click the button to save your changes.
By default, all records are retrieved from the landing table. Optionally, you can
configure a mapping that filters records in the landing table. There are two types of
filters: distinct and conditional. You configure these settings on the Query Parameters
tab in the Mappings tool. For more information, see “Configuring Query Parameters
for Mappings” on page 392.
Distinct Mapping
If you click the Enable Distinct check box on the Query Parameters tab, the Stage job
selects only the distinct records from the landing table. Siperian Hub populates the
staging table using the following SELECT statement:
Using distinct mapping is useful in situations in which you have a single landing table
feeding multiple staging tables and the landing table is denormalized (for example, it
contains both customer and address data). A single customer could have three
addresses. In this case, using distinct mapping prevents the two extra customer records
from being written to the rejects table.
In the mapping to the customer table, check (select) Enable Distinct to avoid having
duplicate records because only LUD, CUST_ID, and NAME are mapped to the
Customer staging table. With Distinct enabled, only one record would populate your
customer table and no rejects would occur.
Alternatively, for the address mapping, you map ADDR_ID and ADDR with Distinct
disabled so that you get two records and no rejects.
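The effect of Enable Distinct on the denormalized example above can be sketched as follows. This is an illustrative Python simulation (analogous to a SELECT DISTINCT against the landing table); the column names follow the example, but the code is not product behavior.

```python
# Denormalized landing rows: one customer with three addresses.
landing = [
    {"CUST_ID": "C1", "NAME": "Acme", "ADDR_ID": "A1", "ADDR": "1 First St"},
    {"CUST_ID": "C1", "NAME": "Acme", "ADDR_ID": "A2", "ADDR": "2 Second St"},
    {"CUST_ID": "C1", "NAME": "Acme", "ADDR_ID": "A3", "ADDR": "3 Third St"},
]

def stage(rows, mapped_columns, distinct):
    """Project the mapped columns; with distinct enabled, keep only
    unique rows (like SELECT DISTINCT on the landing table)."""
    projected = [tuple(r[c] for c in mapped_columns) for r in rows]
    if not distinct:
        return projected
    seen, out = set(), []
    for row in projected:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

# Customer mapping: only CUST_ID and NAME are mapped -> enable Distinct.
print(stage(landing, ["CUST_ID", "NAME"], distinct=True))   # one row, no rejects
# Address mapping: ADDR_ID and ADDR are mapped -> Distinct disabled.
print(stage(landing, ["ADDR_ID", "ADDR"], distinct=False))  # one row per address
```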
Conditional Mapping
If you select the Enable Condition check box, you can apply a SQL WHERE clause
that restricts which landing records are extracted and cleansed. For example, suppose
the data in your landing table comes from all states in the US. You can use the
WHERE clause to filter the data that is written to the staging tables to include only
data from one state, such as California. To do this, type in a WHERE clause (but omit
the WHERE keyword): STATE = 'CA'. When the cleanse job runs, it extracts and
processes records as SELECT * FROM LANDING WHERE STATE = 'CA'. If you
specify conditional mapping, click the Validate button to validate the SQL clause.
Loading by RowID
You can streamline load, match, and merge processing by explicitly configuring
Siperian Hub to load by RowID. Otherwise, Siperian Hub loads data according to its
default behavior, which is described in “Run-time Execution Flow of the Load
Process” on page 304.
Note: If you clean the base object using the stored procedure, and if you have set up
the TAKE-ON GAP for the particular staging table, the ROWID sequences are reset
to 1.
In the staging table, the Rowid Object column (a nullable column) has a specialized usage.
You can streamline load, match, and merge processing by mapping any column in a
landing table to the Rowid Object column in a staging table. In the following example,
the Address Id column in the landing table is mapped to the Rowid Object column in
the staging table.
Mapping to the Rowid Object column allows for the loading of records by present- or
lineage-based ROWID_OBJECT. During the load, if an incoming record with a
populated ROWID_OBJECT is new (the incoming PKEY_SRC_OBJECT + ROWID_
SYSTEM is checked), then this record bypasses the match and merge process and gets
added to the base object directly—a real-time API PUT(_XREF) by ROWID_
OBJECT. Using this feature enhances lineage and unmerge support, enables
closed-loop integration with downstream systems, and can increase throughput.
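The routing decision described above can be sketched roughly as follows. This is illustrative Python only; the record fields and the newness check are simplified from the description in the text, not actual product code.

```python
# Simplified sketch of the load-by-ROWID_OBJECT decision (illustration only).
# Keys already present as cross-references: (ROWID_SYSTEM, PKEY_SRC_OBJECT).
existing_xref_keys = {("CRM", "PK-1")}

def load_route(record):
    """Decide how an incoming staging record is applied to the base object."""
    key = (record["ROWID_SYSTEM"], record["PKEY_SRC_OBJECT"])
    is_new = key not in existing_xref_keys
    if record.get("ROWID_OBJECT") and is_new:
        # Bypass match and merge: add directly to the base object, like a
        # real-time API PUT by ROWID_OBJECT.
        return "direct insert"
    return "normal load (match/merge path)"

print(load_route({"ROWID_SYSTEM": "CRM", "PKEY_SRC_OBJECT": "PK-2",
                  "ROWID_OBJECT": "ROWID-42"}))  # direct insert
print(load_route({"ROWID_SYSTEM": "CRM", "PKEY_SRC_OBJECT": "PK-1",
                  "ROWID_OBJECT": "ROWID-42"}))  # normal load (match/merge path)
```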
The initial data load for a base object inserts all records into the target base object.
Therefore, enable loading by rowID for incremental loads that occur after the initial
data load. For more information, see “Initial Data Loads and Incremental Loads” on
page 302 and “Run-time Execution Flow of the Load Process” on page 304.
Jumping to a Schema
The Mappings tool allows you to quickly launch the Schema Manager and display the
schema associated with the selected mapping.
Note: The Jump to Schema command is available only in the Workbenches view, not
the Processes view.
5. The Mappings tool displays the schema for the selected mapping.
Testing Mappings
To test a mapping that you have configured:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the mapping that you want to configure.
4. Click the Test tab.
The Mappings tool displays the Test tab for this mapping.
Removing Mappings
To remove a mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click the mapping that you want to remove, and choose Delete Mapping.
The Mappings tool prompts you to confirm deletion.
4. Click Yes.
The Mappings tool drops supporting tables, removes the mapping from the
metadata, and updates the list of mappings.
To configure audit trail and delta detection, click the Settings tab.
Note: The Audit Trail has very different functionality from—and is not to be confused
with—the Audit Manager tool described in Chapter 22, “Auditing Siperian Hub
Services and Events”.
3. If you have not already done so, add a mapping for the staging table. For more
information, see “Adding Mappings” on page 386.
4. Select the staging table that you want to configure.
5. At the bottom of the properties panel, click Preserve an audit trail in the raw
table to enable the raw data audit trail.
The Schema Manager prompts you to select the retention period for the audit
table.
Option Description
Loads Number of batch loads for which to retain data.
Time Period Period of time for which to retain data.
Once configured, the audit trail keeps data for the retention period that you specified.
For example, suppose you configured the audit trail for two loads (Stage job
executions). In this case, the audit trail will retain data for the two most recent loads to
the staging table. If there were ten records in each load in the landing table, then the
total number of records in the RAW table would be 20.
If the Stage job is run multiple times, then the data in the RAW table will be retained
for the most recent two sets based on the ROWID_JOB. Data for older ROWID_
JOBs will be deleted. For example, suppose the value of the ROWID_JOB for the first
Stage job is 1, for the second Stage job is 2, and so on. When you run the Stage job a
third time, then the records in which ROWID_JOB=1 will be discarded.
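Retention by number of loads can be pictured like this. This is an illustrative Python sketch of the purge rule described above; the structures and function name are hypothetical, and the actual purging is performed by the Stage job.

```python
# Illustrative retention purge: keep RAW rows for the N most recent ROWID_JOBs.
raw_table = [
    {"ROWID_JOB": 1, "DATA": "a"}, {"ROWID_JOB": 1, "DATA": "b"},
    {"ROWID_JOB": 2, "DATA": "c"}, {"ROWID_JOB": 2, "DATA": "d"},
    {"ROWID_JOB": 3, "DATA": "e"}, {"ROWID_JOB": 3, "DATA": "f"},
]

def purge_old_loads(rows, retain_loads):
    """Keep only rows belonging to the most recent `retain_loads` Stage jobs."""
    recent = sorted({r["ROWID_JOB"] for r in rows})[-retain_loads:]
    return [r for r in rows if r["ROWID_JOB"] in recent]

# With retention = 2 loads, ROWID_JOB=1 records are discarded after job 3 runs.
kept = purge_old_loads(raw_table, 2)
print(sorted({r["ROWID_JOB"] for r in kept}))  # [2, 3]
```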
Note: If the audit trail is enabled for a staging table and you click the Clear History
button in the Batch Viewer while the associated Stage job is selected, the records in
the RAW and REJ tables are cleared the next time the Stage job runs.
4. Select (check) the Enable delta detection check box to enable delta detection for
the table. You might need to scroll down to see this option.
5. Specify the manner in which you want to have deltas detected. You can choose:
• Detect deltas by comparing all columns in mapping
• Detect deltas via a date column (select the column)
6. Specify whether to allow staging if a prior duplicate was rejected during the stage
process or load process.
• Select (check) this option to allow a duplicate record being staged to bypass
delta detection during the next stage process execution if its previously staged
duplicate was rejected.
Note: If this option is enabled, and a user clicks the Clear History button in
the Batch Viewer while the associated Stage job is selected, the history of
the prior rejection (which this feature relies on) is discarded, because the
records in the REJ table are cleared the next time the Stage job runs.
• Clear (uncheck) this option (the default) to prevent a duplicate record from
bypassing delta detection in the next stage process execution when its
previously staged duplicate was rejected. Delta detection then filters out any
corresponding duplicate landing record that is subsequently processed.
If delta detection is enabled, then the Stage job compares the contents of the landing
table—which is mapped to the selected staging table—against the data set processed in
the previous run of the Stage job. This comparison determines whether the data has
changed since the previous run. Changed records, new records, and previously rejected
records are put into the staging table; unchanged duplicate records are ignored. For
more information, see “Mapping Columns Between Landing and Staging Tables” on
page 380.
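Both detection methods can be sketched as follows. This is illustrative Python, not the actual implementation; the keys, column names, and previous-run snapshot are hypothetical.

```python
# Illustrative delta detection against the previous run's data set.
previous = {"PK-1": {"NAME": "Ann", "LAST_UPDATE_DATE": "2009-01-01"},
            "PK-2": {"NAME": "Bob", "LAST_UPDATE_DATE": "2009-01-01"}}

def is_delta_all_columns(key, row):
    """Detect deltas by comparing all mapped columns."""
    return previous.get(key) != row

def is_delta_by_date(key, row):
    """Detect deltas via a date column: only a changed date (or a new key)
    counts as a delta, even if other column values changed."""
    old = previous.get(key)
    return old is None or old["LAST_UPDATE_DATE"] != row["LAST_UPDATE_DATE"]

changed = {"NAME": "Ann T.", "LAST_UPDATE_DATE": "2009-01-01"}
print(is_delta_all_columns("PK-1", changed))  # True: NAME differs
print(is_delta_by_date("PK-1", changed))      # False: date unchanged
```

This illustrates the caution in the bullets that follow: with date-based detection, updates that do not touch the last update date (or the primary key) go undetected.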
Note: Reject records move from cleanse to load after the second stage run.
• When delta detection is based on the Last Update Date, any changes to the last
update date or the primary key will be detected. Updates to any values that are not
the last update date or part of the concatenated primary key will not be detected.
• Duplicate primary keys are not considered during subsequent stage processes when
using delta detection by mapped columns.
• Reject handling allows you to:
• View all reject records for a given staging table regardless of the batch job
• View all reject records by day across all staging tables
• Query reject tables based on query filters
This chapter describes how to configure your Hub Store to cleanse data during the
stage process. This chapter is a companion to the material provided in Chapter 11,
“Configuring the Stage Process.”
Chapter Contents
• Before You Begin
• About Data Cleansing in Siperian Hub
• Configuring Cleanse Match Servers
• Using Cleanse Functions
• Configuring Cleanse Lists
Before You Begin
Note: Data cleansing that occurs prior to its arrival in the landing tables is outside the
scope of this chapter.
The Cleanse Match Server is multi-threaded so that each instance can process multiple
requests concurrently. It can be deployed on a variety of application servers. See the
Siperian Hub Release Notes for a list of supported application servers. See the Siperian Hub
Installation Guide for your platform for instructions on installing and configuring
Cleanse Match Server(s).
Siperian Hub supports running multiple Cleanse Match Servers for each Operational
Record Store (ORS). Because the cleanse process is generally CPU-bound, this
scalable architecture allows you to scale your Siperian Hub implementation as the
volume of data increases. Deploying Cleanse Match Servers on multiple hosts
distributes the processing load across multiple CPUs and permits cleanse operations to
run in parallel. In addition, some external adapters are inherently single-threaded, so
this architecture allows you to simulate multi-threaded operations by running one
processing thread per application server instance.
For your Siperian Hub implementation, you can increase the throughput of the cleanse
process by running multiple Cleanse Match Servers in parallel. To learn more about
distributed Cleanse Match Servers, see the Siperian Hub Installation Guide.
If proxy users have been configured for your Siperian Hub implementation and you
created proxy_user and cmx_ors with different passwords, then you need to either:
• restart the application server and log in as the proxy user from the Hub Console
or
• register the Cleanse Match Server for the proxy user again
Cleanse Requests
All requests for cleansing are issued by database stored procedures. These stored
procedures package a cleanse request as an XML payload and transmit it to a Cleanse
Match Server. When the Cleanse Match Server receives a request, it parses the XML
and invokes the appropriate code:
Each Cleanse Match Server instance is multi-threaded and can process multiple
requests concurrently. The default timeout for batch requests from Oracle to a Cleanse
Match Server is one year, and the default timeout for online requests is one minute.
For DB2, the default timeout for batch requests or SIF requests is 600 seconds (10
minutes).
When running a Stage or Match job, if more than one Cleanse Match Server is
registered, and if the total number of records to be staged or matched is more than
500, then the job is distributed in parallel among the available Cleanse Match Servers.
The Cleanse Match Server tool displays a list of any configured Cleanse Match Servers.
Property Description
Server Host or machine name of the application server on which you
deployed Siperian Hub Cleanse Match Server.
Port HTTP port of the application server on which you deployed the
Cleanse Match Server.
Cleanse Server Determines whether to use the Cleanse Match Server for cleansing
data.
• Select (check) this check box to use the Cleanse Match Server for
cleansing data.
• Clear (uncheck) this check box if you do not want to use the
Cleanse Match Server for cleansing data.
If an ORS has multiple associated Cleanse Match Servers, you can
enhance performance by configuring each Cleanse Match Server as
either a match-only or a cleanse-only server. Use this option in
conjunction with the Match Server check box to implement this
configuration.
Cleanse Mode Mode that the Cleanse Match Server uses for cleansing data. For
details, see “Modes of Cleanse Operations” on page 407.
Match Server Determines whether to use the Match Server for matching data.
• Check (select) this check box to use the Match Server for
matching data.
• Uncheck (clear) this check box if you do not want to use the
Match Server for matching data.
If an ORS has multiple associated Cleanse Match Servers, you can
enhance performance by configuring each Cleanse Match Server as
either a match-only or a cleanse-only server. Use this option in
conjunction with the Cleanse Server check box to implement this
configuration.
Match Mode Mode that the Match Server uses for matching data. One of the
following values:
For details, see “Cleanse Requests” on page 408.
Property Description
Offline Determines whether the Cleanse Match Server is offline or online.
• Select (check) this check box to take the Cleanse Match Server
offline, making it temporarily unavailable. Once offline, no
cleanse jobs are sent to that Cleanse Match Server (servlet).
• Clear (uncheck) this check box to make an offline Cleanse Match
Server available again so that Siperian Hub can once again send
cleanse jobs to that Cleanse Match Server.
Note: Siperian Hub looks at this field but does not set it. Taking a
Cleanse Match Server offline is an administrative action.
Thread Count Overrides the default thread count. The default (and recommended)
value is 1 thread. Thread counts are defined in the Siperian Hub Console
and can be changed without restarting the server.
Note: You must set this value after migrating from an earlier Hub
version; otherwise all values default to 1 thread.
CPU Rating Specifies a relative CPU performance rating for the host machine on
which this Cleanse Match Server runs. This rating is relevant only in
relation to CPU ratings for other host machines on which Cleanse
Match Servers are also running.
5. Change the properties you want for this Cleanse Match Server. To learn more, see
“Cleanse Match Server Properties” on page 410.
If proxy users have been configured for your Siperian Hub implementation, see
“Cleanse Match Servers and Proxy Users” on page 408.
6. Click OK to apply your changes.
7. Click the Save button to save your changes.
If the test succeeds, the Cleanse Match Server tool displays a window showing the
connection information and a success message.
If there was a problem, Siperian Hub will display a window with information about
the connection problem.
4. Click OK.
Libraries
Functions are organized into libraries (Java libraries and user libraries), which are
folders used to organize the functions that you can use in the Cleanse Functions tool
in the Model workbench. To learn more, see “Configuring Cleanse Libraries” on page 418.
The functions you see in the Hub Console depend on the cleanse engine that you are
using. Siperian Hub shows the cleanse functions that your cleanse engine makes
available. Regardless of which cleanse engine you use, the overall process of data
cleansing in Siperian Hub is the same.
Pane Description
Navigation pane Shows the cleanse functions in a tree view. Clicking on any node in the
tree shows you the appropriate properties page in the right-hand pane.
Properties pane Shows the properties for the selected function. For any of the custom
cleanse functions, you can edit properties in the right-hand pane.
The functions you see in the left pane depend on the cleanse engine you are using.
Your functions may differ from the ones shown in the previous figure.
Cleanse functions are grouped in the tree according to their type. Cleanse function
types are high-level categories that are used to group similar cleanse functions for
easier management and access.
If you expand the list of cleanse function types in the navigation pane, you can select a
cleanse function to display its particular properties.
In addition to specific cleanse functions, the Misc Functions include Read Database
and Reject functions that provide efficiencies in data management.
Field Description
Read Database Allows a map to look up records directly from a database table.
Note: This function is designed to be used when there are many
references to the same limited number of data items.
Reject Allows the creator of a map to identify incorrect data and reject the
record, noting the reason.
7. Add cleanse functions to your graph function. See “Adding Functions to a Graph
Function” on page 427.
8. Test your functions. See “Testing Functions” on page 437.
You can add a User Library when you want to create a customized cleanse function
from existing internal or external Siperian cleanse functions.
Field Description
Name Unique, descriptive name for this library.
Description Optional description of this library.
7. Click OK.
The Cleanse Functions tool displays the new library you added in the list under
Cleanse libraries in the navigation pane.
6. Specify the JAR file for this library. You can click the Browse button to look for
the JAR file.
Field Description
Name Unique, descriptive name for this library.
Description Optional description of this library.
8. If applicable, click the Parameters button to specify any parameters for this
library.
The Cleanse Functions tool displays the parameters dialog.
The name-value pairs that are imported from the file are available to the
user-defined Java function at run time as elements of its Java properties. This
allows you to provide customized values to a generic function, such as “userid”
or “target URL”.
9. Click OK.
The Cleanse Functions tool displays the new library in the list under Cleanse
libraries in the navigation pane.
To learn about adding graph functions to your library, see “Configuring Graph
Functions” on page 424.
In Siperian Hub, a regular expression function allows you to use regular expressions for
cleanse operations. Regular expressions are patterns used to match and manipulate
text data according to commonly used syntactic conventions. To learn more about
regular expressions, including syntax and patterns, refer to the Javadoc for
java.util.regex.Pattern. Alternatively, to define a graph function instead, see
“Configuring Graph Functions” on page 424.
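For instance, a regular expression can normalize free-form phone-number text. The sketch below uses Python's re module purely for illustration; Siperian Hub's regular expression functions follow java.util.regex.Pattern syntax, which is closely related, and the function name here is hypothetical.

```python
import re

# Illustrative only: strip non-digit characters, then reformat a
# 10-digit US number; leave anything else unchanged.
def clean_phone(value):
    digits = re.sub(r"\D", "", value)
    match = re.fullmatch(r"(\d{3})(\d{3})(\d{4})", digits)
    return "({}) {}-{}".format(*match.groups()) if match else value

print(clean_phone("916.555.1234"))  # (916) 555-1234
print(clean_phone("not a number"))  # not a number (unchanged)
```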
Field Description
Name Unique, descriptive name for this regular expression function.
Description Optional description of this regular expression function.
5. Click OK.
The Cleanse Functions tool displays the new regular expression function under the
user library in the list in the left pane, with the properties in the right pane.
7. If you want, specify an input or output expression by clicking the icon to edit
the field, entering a regular expression, and then clicking the icon to apply the
change.
8. Click the icon to save your changes.
In Siperian Hub, a graph function is a cleanse function that you can visualize and
configure graphically using the Cleanse Functions tool in the Hub Console. You can
add any pre-defined functions to a graph function. Alternatively, to define a regular
expression function, see “Configuring Regular Expression Functions” on page 422.
For each graph function, you must configure all required inputs and outputs. Inputs
and outputs have the following properties.
Field Description
Name Unique, descriptive name for this input or output.
Description Optional description of this input or output.
Data Type Data type. Must match exactly. One of the following values:
• Boolean—accepts Boolean values only
• Date—accepts date values only
• Float—accepts float values only
• Integer—accepts integer values only
• String—accepts any data
The Cleanse Functions tool displays the Add Graph Function dialog.
Field Description
Name Unique, descriptive name for this graph function.
Description Optional description of this graph function.
5. Click OK.
The Cleanse Functions tool displays the new graph function under the library in
the list in the left pane, with the properties in the right pane:
This graph function is empty. To configure it and add functions, see “Adding
Functions to a Graph Function” on page 427.
You can add as many functions as you want to a graph function. The example in this
section shows adding only a single function.
If you already have graph functions defined, you can treat them just like any other
function in the cleanse libraries. This means that you can add a graph function inside
another graph function. This approach allows you to reuse functions.
Toolbar
Workspace
The area in this tab is referred to as the workspace. You might need to resize the
window to see both the input and output on the workspace.
By default, graph functions have one input and one output that are of type string
(gray circle). The function that you are defining might require more inputs and/or
outputs and different data types. To learn more, see “Configuring Inputs” on page
434 and “Configuring Outputs” on page 435.
4. Right-click on the workspace and choose Add Function from the pop-up menu.
For more on the other commands on this pop-up menu, see “Workspace
Commands” on page 432. You can also add or delete these functions using the
toolbar buttons.
The Cleanse Functions tool displays the Choose Function to Add dialog.
5. Expand the folder containing the function you want to add, select the function to
add, and then click OK.
Note: The functions that are available for you to add depend on your cleanse
engine and its configuration. Therefore, the functions that you see might differ
from the cleanse functions shown in the previous figure.
The Cleanse Functions tool displays the added function in your workspace.
Note: Although this example shows a single graph function on the workspace, you
can add multiple functions to a cleanse function.
To move a function, click it and drag it wherever you need it on the workspace.
The expanded mode shows the labels for all available inputs and outputs for this
function.
To learn more, see “Configuring Inputs” on page 434 and “Configuring Outputs”
on page 435.
7. Mouse-over the input connector, which is the little circle on the right side of the
input box. It turns red when ready for use.
8. Click the node and draw a line to one of the function input nodes.
9. Draw a line from one of the function output nodes to the output box node.
10. Click the button to save your changes. To learn about testing your new
function, see “Testing Functions” on page 437.
Workspace Commands
Function Modes
Function modes determine how the function is displayed on the workspace. Each
function has the following modes, which are accessible by right-clicking the function:
Option Description
Compact Displays the function as a small box, with just the function name.
Standard Displays the function as a larger box, with the name and the nodes for the
input and output, but the nodes are not labeled. This is the default mode.
Expanded Displays the function as a large box, with the name, the input and output
nodes, and the names of those nodes.
Logging Used for debugging. Choosing this option generates a log file for this
Enabled function when you run a Stage job (see “Stage Jobs” on page 745). The log
file records the input and output each time the function is called
during the Stage job. A new log file is created for each Stage job.
The log file is named <jobID><graph function name>.log and is stored
in:
\Siperian\hub\cleanse\tmp\<ORS>
Note: Do not use this option in production, because it consumes disk space
and incurs the performance overhead associated with disk I/O. To disable
this logging, right-click the function and uncheck Enable Logging.
Delete Object Deletes the function from the graph function.
You can cycle through the display modes (compact, standard, and expanded) by
double-clicking on the function.
Workspace Buttons
The toolbar on the right side of the workspace provides the following buttons.
Button Description
Save changes.
Expand the graph. This makes more room for the workspace on the screen by
hiding the left pane.
Using Constants
Constants are useful in cases where you know that you have standardized input.
For example, if you have a data set that you know consists entirely of doctors, then you
can use a constant to put Dr. in the title. When you use constants in your graph
function, they are differentiated visually from other functions by their grey background
color.
Configuring Inputs
Note: Once you create an input, you cannot later edit the input to change its type.
If you must change the type of an input, create a new one of the correct type and
delete the old one.
6. Click the button to add another input.
The Cleanse Functions tool displays the Add Parameter dialog.
Field Description
Name Unique, descriptive name for this parameter.
Data Type Data type of this parameter.
Description Optional description of this parameter.
8. Click OK.
Add as many inputs as you need for your functions.
Configuring Outputs
Note: Once you create an output, you cannot later edit the output to change its
type. If you must change the type of an output, create a new one of the correct
type and delete the old one.
6. Click the button to add another output.
The Cleanse Functions tool displays the Add Parameter dialog.
Field Description
Name Unique, descriptive name for this parameter.
Data Type Data type of this parameter.
Description Optional description of this parameter.
7. Click OK.
Add as many outputs as you need for your functions.
Testing Functions
Once you have added and configured a graph or regular expression function, it is
recommended that you test it to make sure it is behaving as expected. This test process
mimics a single record coming into the function.
5. For each input, specify the value that you want to test by clicking the cell in the
Value column and typing a value that complies with the data type of the input.
• For Boolean inputs, the Cleanse Functions tool displays a true/false
drop-down list.
• For Calendar inputs, the Cleanse Functions tool displays a Calendar button
that you can click to select a date from the Date dialog.
6. Click Test.
If the test completed successfully, the output is displayed in the output section.
Conditional Execution Components
Conditional execution components are similar to the construct of a case (or switch)
statement in a programming language. The cleanse function evaluates the condition
and, based on this evaluation, applies the appropriate graph function associated with
the case that matches the condition. If no case matches the condition, then the default
case is used—the case flagged with an asterisk (*).
Conditional execution components are useful when, for example, you have segmented
data. Suppose a table has several distinct groups of data (such as customers and
prospects). You could create a column that indicates the group of which each
record is a member. Each group is called a segment. In this example, customers
might have C in this column, while prospects would have P. You could use a
conditional execution
component to cleanse the data differently for each segment. If the conditional value
does not meet any of the conditions you specify, then the default case will be executed.
6. Enter a value for the condition. Using the customer and prospect example, you
would enter C or P. Click OK.
The Cleanse Functions tool displays the new condition in the list of conditions on
the left, as well as in the input box.
Add as many conditions as you require. You do not need to specify a default
condition: the default case is automatically created when you create a new
conditional execution component, and it is flagged with the asterisk (*). The
default case is executed for all cases that are not covered by the cases you
specify.
7. Add as many functions as you require to process all of the conditions. To learn
more, see “Adding Functions to a Graph Function” on page 427.
8. For each condition—including the default condition—draw a link between the
input node to the input of the function. In addition, draw links between the
outputs of the functions and the output of your cleanse function.
Note: You can specify nested processing logic in graph functions. For example, you
can nest conditional components within other conditional components (such as nested
case statements). In fact, you can define an entire complex process containing many
conditional tests, each one of which contains any level of complexity as well.
Field Description
Name Unique, descriptive name for this cleanse list.
Description Optional description of this cleanse list.
6. Click OK.
The Cleanse Functions tool displays the details pane for the new (empty) cleanse
list on the right side of the screen.
The Cleanse Functions tool displays information about the cleanse list in the right
pane.
4. Change the display name and description in the right pane, if you want, by clicking
the Edit button next to a value that you want to change.
5. Click the Details tab.
The Cleanse Functions tool displays the details for the cleanse list.
7. Specify a search string, an output string, a match type, and click OK.
The search string is the input that you want to cleanse, resulting in the output
string.
Important: Siperian Hub will search through the strings in the order in which they
are entered. The order in which you specify the items can therefore affect the
results obtained. To learn more about the types of matches available, see “Types of
String Matches” on page 445.
Note: As soon as you add strings to a cleanse list, the cleanse list is saved.
The strings that you specified are shown in the Cleanse List Details section.
8. You can add and remove strings. You can also move strings forward or backward
in the cleanse list, which affects their order in the run-time execution
sequence and, therefore, the results obtained.
9. You can also specify a default value for every input string that does not
match any of the search strings.
If you do not specify a default value, every input string that does not match a
search string is passed to the output string with no changes.
For the output string, you can specify one of the following match types:
2. Specify the connection properties for the source of the data and click Next.
The Cleanse Functions tool displays a list of tables available for import.
The Cleanse Functions tool displays a list of columns available for import.
You can import the records of the sample data either as phrases (one entry for
each record) or as words (one entry for each word in each record). Choose whether
to import the match strings as words or phrases and then click Finish.
The Cleanse List Details box is now populated with data from the specified source.
Note: The imported match strings are not part of the match list. To add them to
the match list, you need to move them to the Search Strings list on the
right-hand side.
• To add match strings to the match list with the match string value in both the
Search String and Output String, select the strings in the Match Strings list, and
click the button.
• To add match strings to the match list with an Output String value that you
want to define, add the string, then click the record you added and specify new
Search String and Output String values.
• To add all Match Strings to the match list, click the button.
• To clear all Match Strings from the match list, click the button.
• Repeat these steps until you have constructed a complete match list.
5. When you have finished changing the match list properties, click the button
to save your changes.
The Cleanse Functions tool displays a list of tables available for import.
The Cleanse Functions tool displays a list of match strings available for import.
8. Click Finish.
The Cleanse List Details box is now populated with data from the specified source.
9. When you have finished changing the match list properties, click the button
to save your changes.
This chapter explains how to configure the load process in your Siperian Hub
implementation. For an introduction, see “Load Process” on page 299.
Chapter Contents
• Before You Begin
• Configuration Tasks for Loading Data
• Configuring Trust for Source Systems
• Configuring Validation Rules
Before You Begin
For additional configuration settings that can affect the load process, see:
• “Loading by RowID” on page 394
• “Distinct Systems” on page 595
• “Generate Match Tokens on Load” on page 104
• “Load Process” on page 299
About Trust
Several source systems may contain attributes that correspond to the same column in a
base object table. For example, several systems may store a customer’s address.
However, one system might be a more reliable source for that data than others. If these
systems disagree, then Siperian Hub must decide which value is the best one to use.
To help compare the relative reliability of column data from different source
systems, Siperian Hub allows you to configure trust for a column. Trust is a
designation of confidence in the relative accuracy of a particular piece of
data. For each column from each source, you can define a trust level represented
by a number between 0 and 100, with 0 being the least trustworthy and 100 being
the most trustworthy. By itself, this number has no meaning. It becomes
meaningful only when compared with another trust number to determine which is
higher.
Trust takes into account the age of data, how much its reliability has decayed over time,
and the validity of the data. Trust is used to determine survivorship (when two records
are consolidated), and whether updates from a source system are sufficiently reliable to
update the master record.
Trust Levels
A trust level is a number between 0 and 100. By itself, this number has no meaning.
It has meaning only when compared with another trust number.
Trust Decay
The reliability of data from a given source system can decay (diminish) over time. In
order to reflect this fact in trust calculations, Siperian Hub allows you to configure
decay characteristics for trust-enabled columns. The decay period is the amount of time
that it takes for the trust level to decay from the maximum trust level (see “Maximum
Trust” on page 459) to the minimum trust level (see “Minimum Trust” on page 459).
For more information, see “Units” on page 459, “Decay” on page 459, and “Graph
Type” on page 460.
Trust Calculations
The load process calculates trust for trust-enabled columns in the base object. For
records with trust-enabled columns, the load process assigns a trust score to cell data.
This trust score is initially based on the configured trust settings for that column.
The trust score may be subsequently downgraded when the load process applies
validation rules—if configured for a trust-enabled column—after the trust calculations.
For more information, see “Run-time Execution Flow of the Load Process” on page
304.
During the load process, if a record in the staging table will be used for a load update
operation, and if that record contains a changed cell value in a trust-enabled column,
the load process calculates trust scores for:
• the cell data in the source record in the staging table (which contains the updated
information)
• the cell data in the target record in the base object (which contains the existing
information)
If the cell data in the source record has a higher trust score than the cell data in the
target record, then Siperian Hub updates the cell in the base object record with the cell
data in the staging table record.
When two records in a base object are consolidated, Siperian Hub calculates the trust
score for each trusted column in the two records being merged. Cells with the highest
trust scores survive in the final consolidated record. If the trust scores are the same,
then Siperian Hub compares records according to an order of precedence, as described
in “Survivorship and Order of Precedence” on page 291.
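As a sketch of the survivorship behavior just described, the following Python
fragment picks, column by column, the cell with the higher trust score. The
record and trust-score structures are hypothetical (Siperian Hub performs this
comparison internally), and the tie handling is a simplification of the actual
order-of-precedence rules:

```python
# Illustrative sketch only: trust-based cell survivorship during consolidation.
# The data structures here are hypothetical, not Siperian Hub APIs.

def consolidate(record_a, record_b, trust_a, trust_b):
    """Merge two records column by column: the cell with the higher trust
    score survives. On a tie, record_a is assumed to win (a simplification
    of the order-of-precedence rules)."""
    merged = {}
    for column in record_a:
        if trust_b.get(column, 0) > trust_a.get(column, 0):
            merged[column] = record_b[column]
        else:
            merged[column] = record_a[column]  # higher or tied trust survives
    return merged

a = {"phone": "555-1234", "city": "Boston"}
b = {"phone": "555-4321", "city": "Cambridge"}
merged = consolidate(a, b, trust_a={"phone": 40, "city": 80},
                     trust_b={"phone": 70, "city": 80})
print(merged)  # phone from b (70 > 40), city from a (tie)
```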
The following figure shows control tables associated with trust-enabled columns in a
base object.
For each trust-enabled column in a base object record, Siperian Hub maintains a record
in a corresponding control table that contains the last update date and an identifier of
the source system. Based on these settings, Siperian Hub can always calculate the
current trust for the column value.
If history is enabled for a base object, Siperian Hub also maintains a separate history
table for the control table, in addition to history tables for the base object and its
cross-reference table.
The cross-reference table for a base object contains the most recent value from each
source system. By default (without trust settings), the base object contains the most
recent value no matter which source system it comes from.
For trust-enabled columns, the cell value in a base object record might not have the
same value as its corresponding record in the cross-reference table. Validation rules,
which are run during the load process after trust calculations, can downgrade trust for
a cell so that a source that had previously provided the cell value might not update the
cell. For more information about validation rules, see “Configuring Validation Rules”
on page 468.
Data stewards can manually override a calculated trust setting if they have direct
knowledge that a particular value is correct. Data stewards can also enter a value
directly into a record in a base object. For more information, see the Siperian Hub Data
Steward Guide.
For state-enabled base objects, trust is calculated for records with a PENDING or
ACTIVE state, but records with a DELETE state are ignored. For more information,
see Chapter 7, “State Management.”
Synchronize batch jobs can fail for base objects with a large number of
trust-enabled columns. Similarly, Automerge jobs can fail if there is a large
number of trust-enabled or validation-enabled columns. The exact number of
columns that causes a job to fail varies, and depends on the length of the
column names and the number of trust-enabled columns (or, for Automerge jobs,
validation-enabled columns as well). Long column names are at, or close to, the
maximum allowable length of 26 characters. To avoid this problem, keep the
number of trust-enabled columns below 60 and/or keep the column names short. A
workaround is to enable all trust/validation columns before saving the base
object, which avoids running the Synchronize job.
Trust Properties
This section describes the trust properties that you can configure for trust-enabled
columns. Trust properties are configured separately for each source system that could
provide records for trust-enabled columns in a base object.
Maximum Trust
The maximum trust (starting trust) is the trust level that a data value will have if it has
just been changed. For example, if source system X changes a phone number field
from 555-1234 to 555-4321, the new value will be given system X’s maximum trust
level for the phone number field. By setting the maximum trust level relatively high,
you can ensure that changes in the source systems will usually be applied to the base
object.
Minimum Trust
The minimum trust is the trust level that a data value will have when it is old (after the
decay period has elapsed). This value must be less than or equal to the maximum trust.
Note: If the maximum and minimum trust are equal, then the decay curve is a flat line
and the decay period and decay type have no effect.
Units
Specifies the units used in calculating the decay period—day, week, month, quarter, or
year.
Decay
Specifies the number (of days, weeks, months, quarters, or years) used in calculating the
decay period.
Note: For the best graph view, limit the decay period you specify to between 1 and
100.
Graph Type
Decay follows a pattern in which the trust level decreases during the decay
period. The graph type determines the shape of the decay pattern, which can have
any of the following settings:

Rapid Initial Slow Later (RISL)
Most of the decrease occurs toward the beginning of the decay period. Decay
follows a concave curve. If a source system has this graph type, then a new
value from that system will probably be trusted, but the value will soon become
much more likely to be overridden.

Slow Initial Rapid Later (SIRL)
Most of the decrease occurs toward the end of the decay period. Decay follows a
convex curve. If a source system has this graph type, it is relatively unlikely
that any other system will override the value that it sets until the value is
near the end of its decay period.
By default, the start date for trust decay shown in the Trust Decay Graph is the current
system date. To see the impact of trust decay based on a different start date for a given
source system, specify a different test offset date according to the instructions in
“Changing the Offset Date for a Trust-Enabled Column” on page 466.
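To illustrate, the following Python sketch interpolates trust between the
maximum and minimum trust over the decay period. The quadratic curves are
assumptions chosen only to show the concave (RISL) versus convex (SIRL) shapes;
the actual curve equations used by Siperian Hub are not documented here:

```python
# Illustration of trust decay between maximum and minimum trust.
# Quadratic curves are assumptions used only to contrast RISL and SIRL shapes.

def trust_at(elapsed, period, max_trust, min_trust, graph_type="RISL"):
    """Interpolate trust from max_trust down to min_trust over the decay
    period, following the selected decay pattern."""
    if elapsed >= period:
        return min_trust
    f = elapsed / period                  # fraction of decay period elapsed
    if graph_type == "RISL":              # rapid initial, slow later
        decayed = 1 - (1 - f) ** 2
    elif graph_type == "SIRL":            # slow initial, rapid later
        decayed = f ** 2
    else:                                 # linear fallback
        decayed = f
    return max_trust - (max_trust - min_trust) * decayed

# Halfway through the period, RISL has lost more trust than SIRL:
print(trust_at(5, 10, 80, 20, "RISL"))  # 35.0
print(trust_at(5, 10, 80, 20, "SIRL"))  # 65.0
```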
A source system's trust level is meaningful only in relation to the trust levels
of other source systems that contribute data for the trust-enabled column.
Trust is disabled by default. When trust is disabled, Siperian Hub uses the value from
the most recently-executed load process regardless of which source system it comes
from. If column data for a base object comes from only one system, then trust should
remain disabled for that column.
Trust should be enabled, however, for columns in which data can come from multiple
source systems. If you enable trust for a column, you also assign trust levels to specify
the relative reliability of any source systems that could provide records that update the
column.
Before you configure trust for trust-enabled columns, you must have:
• enabled trust for base object columns according to the instructions in “Enabling
Trust for a Column” on page 461
• configured staging tables in the Schema Manager, including associated source
systems and staging table columns that correspond to base object columns,
according to the instructions in “Configuring Staging Tables” on page 364
At a minimum, you must specify trust settings for trust-enabled columns in the
administration source system (called Admin by default). This source system represents
manual updates that you make within Siperian Hub. This source system can contribute
data to any trust-enabled column. Set the trust settings for this source system to high
values (relative to other source systems) to ensure that manual updates override any
existing values from other source systems. For more information, see “Administration
Source System” on page 349.
1. Start the Systems and Trust tool according to the instructions in “Starting the
Systems and Trust Tool” on page 350.
The Systems and Trust tool displays a read-only view of the trust-enabled columns
in the selected base object, indicating with a check mark whether a given source
system supplies data for that column.
For the selected trust-enabled column, the Systems and Trust tool displays the list
of source systems associated with the column, along with editable trust settings to
be configured per source system, and a trust decay graph.
7. Specify the trust properties for each column. For more information, see “Trust
Properties” on page 459.
8. Optionally, you can change the offset date, as described in "Changing the
Offset Date for a Trust-Enabled Column" on page 466.
9. Click the button to save your changes.
The Systems and Trust tool refreshes the Trust Decay Graph based on the trust
settings you specified for each source system for this trust-enabled column.
The X-axis of the graph is the time and the Y-axis is the trust score.
By default, the Trust Decay Graph shows the trust decay across all source systems
from the current system date. You can specify a different date (such as a future date) to
test your current trust settings and see how trust would decay from that date. Note that
offset dates are not saved.
After records have been loaded into a base object, if you enable trust for any column,
or if you change trust settings for any trust-enabled column(s) in that base object, then
you must run the Synchronize batch job (see “Synchronize Jobs” on page 747) before
running the consolidation process. If this batch job is not run, then errors will occur
during the consolidation process.
For example, a validation rule on a First_Name column might consist of:
• Condition: Length < 3
• Action: Downgrade trust on First_Name by 50%
If the Reserve Minimum Trust flag is set for the column, then the trust cannot be
downgraded below the column’s minimum trust. You use the Schema Manager to
configure validation rules for a base object.
Validation rules are executed during the load process, after trust has been calculated for
trust-enabled columns in the base object. If validation rules have been defined, then
the load process applies them to determine the final trust scores, and then uses the
final trust values to determine whether to update records in the base object with cell
data from the updated records. For more information, see “Run-time Execution Flow
of the Load Process” on page 304.
Validation Checks
A validation check can be done on any column in a base object. The downgrade resulting
from the validation check can be applied to the same column, as well as to any other
columns that can be validated. Invalid data in one column can therefore result in trust
downgrades on many columns.
For example, suppose you use an address verification flag in which the flag is
OK if the address is complete and BAD if the address is not complete. You could
configure a validation rule that downgrades the trust on all address fields if
the verification flag is not OK. Note that, in this case, the verification flag
itself should also be downgraded.
Required Columns
Validation rules are applied regardless of the source of the incoming data.
However, validation rules are applied only if the staging table record, or the
input from a Services Integration Framework (SIF) request, contains all of the
required columns. If any required columns are missing, validation rules are not
applied.
If a base object contains existing data and you change validation rules, you must run
the Revalidate job to recalculate trust scores for new and existing data, as described in
“Revalidate Jobs” on page 745.
For state-enabled base objects, validation rules are applied to records with a
PENDING or ACTIVE state, but records with a DELETE state are ignored. For
more information, see Chapter 7, “State Management.”
Validation rules are disabled by default. Validation rules should be enabled, however,
for any trust-enabled columns that will use validation rules for trust downgrades.
For example, with a validation downgrade percentage of 50% and a trust level
calculated at 60:
Final Trust Score = 60 - (60 * 50 / 100) = 30
Validation rules are executed in sequence. If multiple validation rules are
configured for a column, only one validation rule is applied to the column: the
rule with the greatest downgrade percentage. Downgrade percentages are not
cumulative; instead, the "winning" validation rule overwrites any previously
applied changes.
Note: The execution sequence for validation rules differs between the load process
described in this chapter and PUT requests invoked by external applications using the
Services Integration Framework (SIF). For PUT requests, validation rules are executed
in order of decreasing downgrade percentage. For more information, see the Siperian
Services Integration Framework Guide and the Siperian Hub Javadoc.
Pane Description
Number of Rules Number of configured validation rules for the selected base object.
Validation Rules List of configured validation rules for the selected base object.
Properties Pane Properties for the selected validation rule. For more information, see
“Validation Rule Properties” on page 473.
Rule Name
Rule Type
Rule Columns
For each column, you specify the downgrade percentage and whether to reserve
minimum trust.
Downgrade Percentage
Percentage by which the trust level of the specified column will be decreased if this
validation rule condition is met. The larger the percentage, the greater the downgrade.
For example, 0% has no effect on the trust, while 100% downgrades the trust
completely (unless the reserve minimum trust is specified, in which case 100%
downgrades the trust so that it equals minimum trust).
If trust is downgraded by 100% and you have not enabled Reserve Minimum Trust
for the column, then the value of that column will not be populated into the
base object.
Reserve Minimum Trust
Specifies what happens if the downgrade causes the trust level to fall below the
column's minimum trust level. If this box is checked, the minimum trust is
retained, so that the trust level is reduced to the minimum trust but no lower.
If this box is cleared (unchecked), the trust level is reduced by the specified
percentage even if this means going below the minimum trust.
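The downgrade calculation, with and without the minimum-trust floor, can be
sketched as follows; the function is illustrative only, not a Siperian Hub API:

```python
# Sketch of a single validation downgrade, matching the documented formula
# Final Trust Score = trust - (trust * percentage / 100), with an optional
# Reserve Minimum Trust clamp. Illustration only.

def downgrade(trust, percentage, minimum_trust, reserve_minimum=True):
    final = trust - (trust * percentage / 100)
    if reserve_minimum:
        final = max(final, minimum_trust)  # never fall below the minimum trust
    return final

print(downgrade(60, 50, 40, reserve_minimum=True))   # clamped to 40
print(downgrade(60, 50, 40, reserve_minimum=False))  # 30.0, below the minimum
```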
Rule SQL
Specifies the SQL WHERE clause representing the condition for this validation rule.
During the load process, the validation rule is executed. If data meets the criteria
specified in the Rule SQL field, then the trust value is downgraded by the downgrade
percentage configured for this validation rule.
The Validation Rules editor prompts you to configure the SQL WHERE clause based
on the selected Rule Type for this validation rule.
Expression
During the load process, this query is used to check the validity of the data in the
staging table.
The following table provides examples of SQL WHERE clauses based on the selected
rule type.
Examples of WHERE Clauses for Each Rule Type

Existence Check
WHERE clause: WHERE S.ColumnName IS NULL
Example: WHERE S.MIDDLE_NAME IS NULL
Result: Affected columns are downgraded for records with middle names that are
null. Records that do not meet the condition are not affected.

Domain Check
WHERE clause: WHERE S.ColumnName IN ('?', '?', '?')
Example: WHERE S.Gender NOT IN ('M', 'F', 'U')
Result: Affected columns are downgraded if the Gender is any value other than M,
F, or U.

Referential Integrity
WHERE clause: WHERE NOT EXISTS (SELECT 'a' FROM <Ref_Table> WHERE
<Ref_Table>.<Ref_Column> = S.<Column_Name>)
Example: WHERE NOT EXISTS (SELECT DISTINCT 'a' FROM ACCOUNT_TYPE WHERE
ACCOUNT_TYPE.Account_Type = S.Account_Type)
Result: Affected columns are downgraded for records with Account Type values
that are not in the Account Type table.

Pattern Validation
WHERE clause: WHERE S.ColumnName LIKE 'Pattern'
Example: WHERE S.eMail_Address NOT LIKE '%@%'
Result: Downgrade is applied if the e-mail address does not contain an @
character.

Custom
WHERE clause: WHERE <custom condition>
Example: WHERE LENGTH(S.ZIP_CODE) < 4
Result: Downgrade is applied if the length of the zip code column is less than
4.
You can use the wildcard character (*) to reference tables via an alias:
• S.* aliases the staging table.
• I.* aliases a temporary table, which provides ROWID_OBJECT, PKEY_SRC_OBJECT,
and ROWID_SYSTEM information for the records being updated.
For Custom rule types, write SQL statements that are well formed and well tuned. If
you need more information about SQL WHERE clause syntax and wild card patterns,
refer to the product documentation for the database platform used in your Siperian
Hub implementation.
Note: Be sure to specify precedence correctly using parentheses according to the
SQL syntax for your database platform. Incorrect or omitted parentheses can
produce unexpected results and long-running queries. For example, the following
statement is ambiguous and leaves it up to the database server to determine
precedence:
WHERE conditionA OR conditionB AND conditionC
The following two statements will yield very different results when evaluating
records:
WHERE (conditionA OR conditionB) AND conditionC
WHERE conditionA OR (conditionB AND conditionC)
3. Specify the properties for this validation rule. For more information, see
“Validation Rule Properties” on page 473.
4. If you want, select the rule column(s) for this validation rule by clicking the
button.
The Validation Rules editor displays the Select Rule Columns dialog.
The available columns are those that have the Validate flag enabled (see "Column
Properties" on page 127). For more information, see "Configuring Columns in
Tables" on page 125.
Select the column(s) for which the trust level will be downgraded if the condition
specified in the WHERE clause for this validation rule is met, and then click OK.
5. Click OK.
The Schema Manager adds the new rule to the list of validation rules.
Note: If a base object contains existing data and you change validation rules, you
must run the Revalidate job to recalculate trust scores for new and existing data, as
described in “Revalidate Jobs” on page 745.
3. Specify the editable properties for this validation rule. You cannot change the rule
type. For more information, see “Validation Rule Properties” on page 473.
4. If you want, select the rule column(s) for this validation rule by clicking the
button.
The Validation Rules editor displays the Select Rule Columns dialog.
The available columns are those that have the Validate flag enabled (see "Column
Properties" on page 127). For more information, see "Configuring Columns in
Tables" on page 125.
Select the column(s) for which the trust level will be downgraded if the condition
specified in the WHERE clause for this validation rule is met, and then click OK.
5. Click the button to save changes.
Note: If a base object contains existing data and you change validation rules, you
must run the Revalidate job to recalculate trust scores for new and existing data, as
described in “Revalidate Jobs” on page 745.
Use the following buttons to change the sequence of validation rules in the list.
Click To....
Move the selected validation rule higher in the sequence.
This chapter describes how to configure your Hub Store to identify and handle
potential duplicate records. For an introduction to the match process, see “Match
Process” on page 317.
Chapter Contents
• Configuration Tasks for the Match Process
• Navigating to the Match/Merge Setup Details Dialog
• Configuring Match Properties for a Base Object
• Configuring Match Paths for Related Records
• Configuring Match Columns
• Configuring Match Rule Sets
• Configuring Match Column Rules for Match Rule Sets
• Configuring Primary Key Match Rules
• Investigating the Distribution of Match Keys
• Excluding Records from the Match Process
Before You Begin
Duplicate Match Threshold
Used only with the Match for Duplicate Data job for initial data loads. For more
information, see "Duplicate Match Threshold" on page 103.

Max Elapsed Match Minutes
Timeout (in minutes) when executing a match rule. If exceeded, the match process
exits. For more information, see "Max Elapsed Match Minutes" on page 103.

Match Flag audit table
If enabled, an audit table (BusinessObjectName_FMHA) is created and populated
with the userID of the user who, in Merge Manager, queued a manual match record
for automerging. For more information, see "Match Flag Audit Table" on page 105
and the Siperian Hub Data Steward Guide.
If you want to change settings, you need to acquire a write lock according to
the instructions in "Acquiring a Write Lock" on page 30.
For a description of each property, see the next section, “Match Properties” on
page 490.
4. Edit the property settings that you want to change, clicking the Edit button
next to the field if applicable.
5. Click the Save button to save your changes.
Match Properties
This section describes the configuration settings on the Match Properties tab.
Maximum Matches for Manual Consolidation
This setting helps prevent data stewards from being overwhelmed with thousands of
matches for manual consolidation. This sets the limit on the list of possible matches
that must be decided upon by a data steward (default is 1000). Once this limit is
reached, Siperian Hub stops the match process until the number of records for manual
consolidation has been reduced.
Number of Rows per Match Job Batch Cycle
This setting specifies an upper limit on the number of records that Siperian Hub will
process for matching during match process execution (Match or Auto Match and
Merge jobs). When the match process starts executing, it begins by flagging records to
be included in the match job batch. From the pool of new/unconsolidated records that
are ready for match (CONSOLIDATION_IND=4, as described in “Consolidation
Indicator” on page 289), the match process changes CONSOLIDATION_IND to 3.
The number of records flagged is determined by the Number of Rows per Match Job
Batch Cycle. The match process then matches those records in the match job batch
against all of the records in the base object.
The number of records in the match job batch affects how long the match process
takes to execute. The value to specify depends on the size of your data set, the
complexity of your match rules, and the length of the time window you have available
to run the match process. The default match batch size is low (10). You increase this
based on the number of records in the base object, as well as the number of matches
generated for those records based on its match rules.
• The lower your match batch size, the more times you will need to run the match
and consolidation processes.
• The higher your match batch size, the more work each match and consolidation
process does.
For each base object, there is a middle ground where you reach the optimal match
batch size. You need to identify this optimal batch size as part of performance tuning
in your environment. Start with a match batch size of 10% of the volume of records to
be matched and merged, run the match job only, see how many matches are generated
by your match rules, and then adjust upwards or downwards accordingly.
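The tuning guidance above can be sketched as a small calculation. This is an illustration only, not part of any Siperian Hub API; the function name and the 10% starting point simply restate the recommendation in the text.

```python
def initial_match_batch_size(unconsolidated_records: int, floor: int = 10) -> int:
    """Suggest a starting value for Number of Rows per Match Job Batch Cycle.

    Starts at 10% of the records awaiting match (CONSOLIDATION_IND = 4),
    but never below the product default of 10. After running the match job,
    inspect how many matches were generated and adjust up or down.
    """
    return max(floor, unconsolidated_records // 10)

print(initial_match_batch_size(250_000))  # 10% of the pool
print(initial_match_batch_size(50))       # small pools fall back to the default
```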
Accept All Unmatched Rows as Unique
Enable (set to Yes) this feature to have Siperian Hub mark as unique
(CONSOLIDATION_IND=1) any records that have been through the match process,
but for which no matches were identified. If enabled, for such records, Siperian Hub
automatically changes their state to consolidated (changes the consolidation indicator
from 2 to 1). Consolidated records are removed from the data steward’s queue via the
Automerge batch job.
By default, this option is disabled. In a development environment, you might want this
option disabled, for example, while iteratively testing and tuning match rules to
determine which records are found to be unique for a given set of match rules.
This option should always be enabled in a production environment. Otherwise, you can
end up with a large number of records with a consolidation indicator of 2. If this
backlog of records exceeds the Maximum Matches for Manual Consolidation setting
(see “Maximum Matches for Manual Consolidation” on page 490), then you will need
to process these records first before you can continue matching and consolidating
other records.
Match/Search Strategy
Select the match/search strategy to specify the reliability of the match versus the
performance you require. Select one of the following options: Fuzzy (a probabilistic
strategy that tolerates variations and errors in the data) or Exact (a strategy that
matches only identical values).
An exact strategy is faster, but an exact match will miss some matches if the data is
imperfect. The best option to choose depends on the characteristics of the data, your
knowledge of the data, and your particular match and consolidation requirements.
Certain configuration settings on the Match / Merge Setup tab apply to only one type of
base object. In this document, such features are indicated with a graphic that shows
whether it applies to fuzzy-match base objects only (as in the following example), or
exact-match base objects only. No graphic means that the feature applies to both.
Note: The match / search strategy is configured at the base object level. For more
information about the match / search strategy configured at the match rule level, see
“Match / Search Strategy” on page 544.
Fuzzy Population
If the match/search strategy is Fuzzy, then you must select a population, which defines
certain characteristics about the records that you are matching. Data characteristics can
vary from country to country. By default, Siperian Hub comes with the US population,
and Siperian provides standard populations for other countries. If you require another
population, contact Siperian support. If you choose an exact match/search strategy, then
this value is ignored.
Match Only Previous Rowid Objects
If this setting is enabled (checked), then Siperian Hub matches the current records
against records with lower ROWID_OBJECT values. For example, if the current
record has a ROWID_OBJECT value of 100, then the record will be matched only
against other records in the base object with a ROWID_OBJECT value that is less
than 100 (ignoring all records with a ROWID_OBJECT value that is higher than 100).
Using this feature can reduce the number of matches required and speed performance.
However, if PUTs are executed, or if records are inserted out of rowid order, then
records might not be fully matched. You must assess the trade-off between
performance and match quantity based on the characteristics of your data and your
particular match requirements. By default, this option is disabled (unchecked).
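The effect of this setting can be sketched as a simple filter on candidate records. This is an illustration of the behavior described above, not Siperian Hub code; the function name and sample rowid values are invented for the example.

```python
def candidates_for(current_rowid: int, candidate_rowids: list) -> list:
    """With 'Match Only Previous Rowid Objects' enabled, a record is compared
    only against records whose ROWID_OBJECT is lower than its own, so each
    pair of records is examined at most once across the whole match job."""
    return [r for r in candidate_rowids if r < current_rowid]

# Record 100 is compared against 40 and 87 only; 150 will compare
# itself against 100 when its own turn comes.
print(candidates_for(100, [40, 87, 100, 150]))  # [40, 87]
```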
Match Only Once
Available only for fuzzy key matching and only if “Match Only Previous Rowid
Objects” is checked (selected). If Match Only Once is enabled (checked), then once a
record has found a match, Siperian Hub will not match it any further within this search
range (the set of similar match key values). Using this feature can reduce duplicates and
increase performance. Instead of finding every match for a record in a search range,
Siperian Hub can find a single match for each. In subsequent match cycles, the merge
process consolidates these matched records into larger groups of XREF records
associated with the base object.
By default, this option is unchecked (disabled). If this feature is enabled, however, you
can miss matches. For example, suppose record A matches record B, and record A
matches record C, but record B and C do not match. You must assess the trade-off
between performance and match quantity based on the characteristics of your data and
your particular match requirements.
Dynamic Match Analysis Threshold
During the match process, dynamic match analysis determines whether the match
process will take an unacceptably long period of time. This threshold value specifies
the maximum acceptable number of comparisons.
To enable the dynamic match threshold, specify a non-zero value. Enable this feature if
you have data that is very similar (with high concentrations of matches) to reduce the
amount of work expended for a hot spot in your data. A hot spot is a group of records
representing overmatched data—a large intersection of matches. If Dynamic Match
Analysis Threshold is enabled, then records that produce more than the specified
number of potential match candidates will be skipped during the match process. By
default, this option is zero (disabled).
Before conducting a match on a given search range, Siperian Hub calculates the
number of search records (records being searched for matches), and multiplies it by the
number of file records (the number of records returned from the match key table that
need to be compared). If the result is greater than the specified Dynamic Match
Analysis Threshold, then no comparisons are performed on that range of data, and the
range is noted in the application server log for further investigation.
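The check described above reduces to a single multiplication. The sketch below is illustrative only (the function name is invented); it mirrors the documented behavior that a threshold of zero disables the analysis.

```python
def skip_search_range(search_records: int, file_records: int, threshold: int) -> bool:
    """Before comparing a search range, multiply the number of search records
    by the number of match key table records returned for that range. If the
    product exceeds the configured Dynamic Match Analysis Threshold, the
    range is skipped (and, in Siperian Hub, noted in the application server
    log). A threshold of 0 means the feature is disabled."""
    if threshold == 0:
        return False  # disabled: never skip
    return search_records * file_records > threshold

print(skip_search_range(200, 5_000, 500_000))  # True: 1,000,000 comparisons exceeds the threshold
print(skip_search_range(200, 5_000, 0))        # False: analysis disabled
```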
By default, the match process includes only ACTIVE records and ignores PENDING
records. For state management-enabled objects, select this check box to include
PENDING records in the match process. Note that, regardless of this setting,
DELETED records are ignored by the match process. For more information, see
“Enabling Match on Pending Records” on page 214.
For link-style base objects only, you can unlink consolidated records and requeue them
for match. This can be configured to occur automatically on load update, or manually
via the Reset Links batch job. For more information, see “Reset Links Jobs” on
page 744.
For link-style base objects only, the Schema Manager displays the following properties.
Allow prompt for reset of match links when match rules / columns are changed: Specifies whether to prompt for a reset of match links when configuration settings for match rules or match columns are changed.
Allow reset of match links for updated data: Specifies whether the reset links prompt applies to updated data (load updates). This prompt is triggered automatically upon load update.
Allow reset of links to include consolidated records: Specifies whether the reset links process applies to consolidated records. Note: The reset links process always applies to unconsolidated records.
Allow reset of links to include manually linked records: Specifies whether manually-linked records are included by the reset links process. Autolinked records are always included. Note: This setting affects the scope of all other reset links settings.
Match Paths
A match path allows you to traverse the hierarchy between records—whether that
hierarchy exists between base objects (inter-table paths) or within a single base object
(intra-table paths). Match paths are used for configuring match column rules involving
related records in either separate tables or in the same table.
Configuring match paths that point to other records involves two main components:
foreign key relationships: Used to traverse the relationships to other records. Allows you to specify parent-to-child and child-to-parent relationships.
filters (optional): Allow you to selectively include or exclude records based on values in a given column, such as ADDRESS_TYPE or PARTY_TYPE. For more information, see “Configuring Filters for Match Paths” on page 511.
You configure a separate relationship base object for each type of relationship. You can
include additional attributes of the relationship type, such as start date, end date, and
other relationship details. The relationship base object defines a match path that
enables you to configure match column rules.
Important: Do not run the match and consolidation processes on a base object that is
used to define relationships between records in inter-table or intra-table match paths.
Doing so will change the relationship data, resulting in the loss of the associations
between records.
Inter-Table Paths
An inter-table path defines the relationship between records in two different base
objects. In many cases, this relationship can be defined simply by configuring a foreign
key relationship: a key column in the child base object points to the primary key of the
parent base object. For more information, see “Configuring Foreign-Key Relationships
Between Base Objects” on page 140.
In some cases, however, the relationship between records can be more complex,
requiring an intermediary base object that defines the relationship between records in
the two tables.
Consider the following example, in which a Siperian Hub implementation has two base
objects: a Person base object and an Address base object.
In order to configure match rules for this kind of relationship between records in
different base objects, you would create a separate base object (such as PersonAddrRel)
that describes to Siperian Hub the relationships between records in the two base
objects.
To define the relationship between records in the two base objects, the PersonAddrRel
base object would contain, at minimum, two foreign key columns: one pointing to the
primary key (ROWID_OBJECT) of the Person base object, and one pointing to the
primary key of the Address base object. Note that the column type of the foreign key
columns, CHAR(14), matches the primary key to which they point.
After you have configured the relationship base object (PersonAddrRel), you would
complete the following tasks:
1. Configure foreign keys from this base object to the ROWID_OBJECT of the
Person and Address base objects. For more information, see “Configuring
Foreign-Key Relationships Between Base Objects” on page 140.
2. Load the PersonAddrRel base object with data that describes the relationships
between records, as shown in the following example.
In this example, note that Person #786 has two addresses, and that Address #1028
has two persons.
3. Use the PersonAddrRel base object when configuring match column rules for the
related records. For more information, see “Configuring Match Column Rules for
Match Rule Sets” on page 542.
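The relationship data loaded in step 2 can be pictured as a list of (person, address) foreign key pairs. The sketch below is purely illustrative: the rowids 786 and 1028 echo the example in the text, while the other rowids and the helper functions are invented for the illustration.

```python
# Hypothetical PersonAddrRel rows as (PERSON_FK, ADDRESS_FK) pairs, echoing
# the example above: Person #786 has two addresses, and Address #1028 has
# two persons — a many-to-many relationship expressed through the
# relationship base object.
pers_addr_rel = [
    (786, 1028),
    (786, 1311),
    (912, 1028),
]

def addresses_of(person_rowid):
    """All addresses related to a given person."""
    return sorted(a for p, a in pers_addr_rel if p == person_rowid)

def persons_at(address_rowid):
    """All persons related to a given address."""
    return sorted(p for p, a in pers_addr_rel if a == address_rowid)

print(addresses_of(786))  # Person #786 has two addresses
print(persons_at(1028))   # Address #1028 has two persons
```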
Intra-Table Paths
Within a base object, parent/child relationships can exist between individual records.
Siperian Hub allows you to clarify relationships between records in the same base
object, and then use those relationships when configuring column match rules.
The relationships among employees are hierarchical. The CEO is at the top of the
hierarchy, representing what is called the global ultimate parent record.
In order to configure match rules for this kind of object, you would create a separate
base object to describe to Siperian Hub the relationships between records.
For example, you could create and configure an EmplRepRel base object with the
following columns:
Note that the column type of the foreign key columns—CHAR(14)—matches the
primary key to which they point.
After you have configured this base object, you must complete the following tasks:
1. Configure foreign keys from this base object to the ROWID_OBJECT of the
Employee base object. For more information, see “Configuring Foreign-Key
Relationships Between Base Objects” on page 140.
2. Load this base object with data that describes the relationships between records, as
shown in the following example.
Note that you can define many-to-many relationships between records. For
example, the employee whose ROWID_OBJECT is 31 reports to two different
managers (ROWID_OBJECT=82 and ROWID_OBJECT=71).
Note: This example used a REPORTS_TO field to define the relationship, but you
could use any piece of information to associate the records, even something more
generic and flexible like RELATIONSHIP_TYPE.
Path Components: Configure the foreign keys used to traverse the relationships. For more information, see “Configuring Path Components” on page 507.
Filters: Configure filters used to include or exclude records for matching. For more information, see “Configuring Filters for Match Paths” on page 511.
The root base object is displayed automatically in the Path Components section of the
screen and is always available. The root base object represents an entity without child
or parent relationships. If you want to configure match rules that involve parent or
child records, you need to explicitly add path components to the root base object.
Display Name
The name of this path component as it will be displayed in the Hub Console.
Physical Name
Actual name of the path component in the database. Siperian Hub will suggest a
physical name for the path component based on the display name that you enter.
The Check for Missing Children check box instructs Siperian Hub to either allow for
missing child records (enabled, the default) or to require all parent records to have child
records.
Enabled (checked): Use this setting if you might have some missing child records and you have rules that do not include columns in the tables that might be missing records.
Disabled (unchecked): Use this setting if all of your rules use the child columns and do not have null match enabled. In this case, checking for missing children does not add any value, and it can have a negative impact on performance.
If you are certain that your data is complete (parent records have child records), and
you include the parent in the child match rule, then inter-table matching works as
expected. However, if your data tends to contain parent records that are missing child
records, or if you do not include the parent column in the child match rule, you must
check (select) the Check for Missing Children check box in the path component
associated with this match column rule to ensure that an outer join occurs when
Siperian Hub checks for records to match.
Note: If the Check for Missing Children option is enabled, Siperian Hub performs an
outer join between the parent and child tables, which can have a performance impact.
Therefore, when not needed, it is more efficient to disable this option.
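The difference between the two join behaviors can be sketched as follows. This is a simplified illustration of inner versus left outer join semantics, not the SQL that Siperian Hub actually generates; all names and values are invented for the example.

```python
parents = [{"rowid": 1, "name": "Acme"}, {"rowid": 2, "name": "Globex"}]
children = [{"parent_fk": 1, "city": "Boston"}]  # parent 2 has no child record

def join(parents, children, outer):
    """An inner join drops parents that have no children; the outer join
    produced by 'Check for Missing Children' keeps them, paired with an
    empty child, so they remain eligible for matching."""
    rows = []
    for p in parents:
        kids = [c for c in children if c["parent_fk"] == p["rowid"]]
        if kids:
            rows += [(p["name"], c["city"]) for c in kids]
        elif outer:
            rows.append((p["name"], None))  # parent kept despite missing child
    return rows

print(join(parents, children, outer=False))  # Globex disappears from the match
print(join(parents, children, outer=True))   # Globex kept with an empty child
```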
Constraints
Property Description
Table List of tables in the schema.
Direction Direction of the foreign key:
• Parent-to-Child
• Child-to-Parent
• N/A
Foreign Key On Column to which the foreign key points. This column can be either in a
different base object or the same base object.
4. Specify the properties for this path component. For more information, see
“Properties of Path Components” on page 507.
5. Click OK.
6. Click the Save button to save your changes.
5. Specify the properties for this path component. You can change the following
values:
• Display Name (see “Display Name” on page 507)
• Check for Missing Children (see “Check For Missing Children” on page 507)
6. Click OK.
7. Click the Save button to save your changes.
You can delete path components but not the root base object. To delete a path
component:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Path Components tree, select the path component that you want to delete.
4. In the Path Components section, click the Delete button.
The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the Save button to save your changes.
About Filters
In match paths, a filter allows you to selectively determine whether to include or exclude
records for matching based on values in a given column. When you define a filter for a
column, you specify the filter condition with one or more values that determine which
records qualify for match processing. For example, if you have an Address base object
that contains both shipping and billing addresses, you might configure a filter that
includes only billing addresses for matching and ignores the shipping addresses. During
execution, the match process will match records in the match batch with billing address
records only.
Filter Properties
Column: The column to configure in the currently-selected base object.
Operator: The operator to use for this filter. One of the following values:
• IN: Include records that contain the specified values in the column.
• NOT IN: Exclude records that contain the specified values in the column.
Values: One or more values to use for this filter.
Example Filter
For example, if you wanted to match only on mailing addresses in an Address base
object, you could specify a filter on the address type column that uses the IN operator
with the value MAILING. In this example, only mailing addresses would qualify for
matching (records in which the address type column contains “MAILING”). All other
records would be ignored.
Adding Filters
If you add multiple filters, Siperian Hub evaluates the entire expression using the
logical AND operator. For example,
xExpr AND yExpr AND zExpr
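The evaluation described above can be sketched as follows. This is an illustration of the documented semantics (IN / NOT IN filters combined with logical AND), not Siperian Hub code; the function, column names, and values are invented for the example.

```python
def passes_filters(record, filters):
    """Evaluate match path filters: each filter is (column, operator, values)
    with operator 'IN' or 'NOT IN', and multiple filters are combined with
    logical AND — every filter must pass for the record to qualify."""
    for column, op, values in filters:
        present = record.get(column) in values
        if (op == "IN" and not present) or (op == "NOT IN" and present):
            return False
    return True

filters = [("ADDRESS_TYPE", "IN", {"BILLING"}),
           ("COUNTRY", "NOT IN", {"XX"})]

print(passes_filters({"ADDRESS_TYPE": "BILLING", "COUNTRY": "US"}, filters))   # True
print(passes_filters({"ADDRESS_TYPE": "SHIPPING", "COUNTRY": "US"}, filters))  # False
```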
To add a filter:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Filters section, click the Add button.
The Schema Manager displays the Add Filter dialog.
4. Specify the properties for this filter. For more information, see “Filter
Properties” on page 511.
5. Specify the value(s) for this filter according to the instructions in “Editing Values
for a Filter” on page 513.
6. Click the Save button to save your changes.
• Add a filter. For more information, see “Adding Filters” on page 512.
• Edit filter properties. For more information, see “Editing Filter Properties” on
page 513.
2. In either the Add Filter or Edit Filter dialog, click the Edit button next to the
Values field.
The Schema Manager displays the Edit Values dialog.
3. Configure the values for this filter.
• To add a value, click the Add button. When prompted, specify a value and
then click OK.
• To delete a value, select it in the Edit Values dialog, click the Delete button,
and then click Yes when prompted to delete the value.
4. Click OK.
5. Click the Save button to save your changes.
4. Specify the properties for this filter. For more information, see “Filter
Properties” on page 511.
5. Specify the value(s) for this filter according to the instructions in “Editing Values
for a Filter” on page 513.
6. Click the Save button to save your changes.
Deleting Filters
To delete a filter:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Filters section, select the filter that you want to delete, and then click the
Delete button.
The Schema Manager prompts you to confirm deletion.
4. Click Yes.
The types of match columns that you can configure depend on the type of the base
object that you are configuring (see “Exact-match and Fuzzy-match Base Objects” on
page 320). The type of base object is defined by the selected match / search strategy
(see “Match/Search Strategy” on page 493).
Path Component
The path component is either the source table to use for a match column definition, or
the match path used to navigate a hierarchy of records. Match paths are used for
configuring match column rules involving related records in either separate tables or in
the same table. Before you can specify a path component, the match path must be
configured. For more information, see “Configuring Match Paths for Related Records”
on page 497.
Field Types
For fuzzy-match columns, the field name drop-down list displays the following field
types. For more information, see “Adding Exact-match Columns for Fuzzy-match Base
Objects” on page 525.
Address_Part1: Includes the part of the address up to, but not including, the locality last line. The position of the address components should be the normal word order used in your data population. Pass this data in one field. Depending on your base object, you may concatenate these attributes into one field before matching. For example, in the US, an Address_Part1 string includes the following fields: Care-of + Building Name + Street Number + Street Name + Street Type + Apartment Details. Address_Part1 uses methods and options designed specifically for addresses.
Address_Part2: Locality line in an address. For example, in the US, a typical Address_Part2 includes: City + State + Zip (+ Country). Matching on Address_Part2 uses methods and options designed specifically for addresses.
Attribute1, Attribute2: Two general purpose fields. These fields are matched using a general purpose, string matching algorithm that compensates for transpositions and missing characters or digits.
Date: Matches any type of date, such as date of birth, expiry date, date of contract, date of change, creation date, and so on. It expects the date to be passed in Day+Month+Year format. It supports the use or absence of delimiters between the date components. Matching on dates uses methods and options designed specifically for dates. It overcomes the typical error and variation found in this data type.
ID: Matches any type of ID number, such as: Account number, Customer number, Credit Card number, Drivers License number, Passport, Policy number, SSN or other identity code, VIN, and so on. It uses a string matching algorithm that compensates for transpositions and missing characters or digits.
Organization_Name: Matches the names of organizations, such as company names, business names, institution names, department names, agency names, trading names, and so on. This field supports matching on a single name or on a compound name (such as a legal name and its trading style). You may also use multiple names (for example, a legal name and a trading style) in a single Organization_Name column for the match.
Person_Name: Matches the names of people. For example:
Anna Maria Gonzales MD
Fuzzy-match base objects can have both fuzzy and exact-match columns.
For exact-match base objects instead, see “Configuring Match Columns for
Exact-match Base Objects” on page 527.
The Schema Manager displays the Match Columns tab for the fuzzy-match base
object.
The Match Columns tab for a fuzzy-match base object has the following sections.
Fuzzy Match Key: Properties for the fuzzy match key. For more information, see “Configuring Fuzzy Match Key Properties” on page 521.
Match Columns: Match columns and their properties:
• Field Name (see “Field Types” on page 517)
• Column Type (see “Match Column Types” on page 515)
• Path Component (see “Path Component” on page 516)
• Source Table: table referenced in the path component, or the base object (if the path component is root)
Match Column Contents: List of available columns in the base object, as well as columns that have been selected for match.
This section describes how to configure the match column properties for fuzzy-match
base objects (see “Match/Search Strategy” on page 493).
Key Types
The match key type describes important characteristics about a column to Siperian Hub.
Siperian Hub has some intelligence about names and addresses, so this information
helps Siperian Hub generate keys correctly and conduct better searches. This is the
main criterion for the search that builds the initial list of potential match candidates.
This key type should be based on the main type of data that is in physical column(s)
that make up the fuzzy match key.
For a fuzzy-match base object, you can select one of the following key types:
Person_Name, Organization_Name, or Address_Part1.
Note: Key types are based on the population you select. The above list of key types
applies to the default population (US). Other populations might have different key
types. If you require another population, contact Siperian support.
Key Widths
The match key width determines how fast the searches are, the number of possible match
candidates returned, and how much disk space the keys consume. Key widths apply to
fuzzy match objects only.
Key Type: Type of field primarily used in the match. This is the main criterion for the search that builds the initial list of potential match candidates. This key type should be based on the main type of data stored in the base object. For more information, see “Key Types” on page 521.
Key Width: Size of the search range for which keys are generated. For more information, see “Key Widths” on page 522.
Path Component: Path component for this fuzzy match key. This is a table containing the column(s) to designate as the key type: Base Object, Child Base Object table, or Cross-reference table. For more information, see “Path Component” on page 516.
Match Path Component: Match path component for this fuzzy-match column. For a fuzzy-match column, the source table can be the parent table, a parent cross-reference table, or any child base object table. For more information, see “Path Component” on page 516.
Field Name: Name of this field as it will be displayed in the Hub Console. For fuzzy match columns, this is a drop-down list where you can select the type of data in the match column being defined, as described in “Field Types” on page 517.
The Schema Manager adds the match column to the Match Columns list.
7. Click the Save button to save your changes.
Match Path Component: Match path component for this exact-match column. For an exact-match column, the source table can be the parent table and/or child physical columns. For more information, see “Path Component” on page 516.
Field Name: Name of this field as it will be displayed in the Hub Console.
Before you define match column rules, you must define the match columns on which
they will be based. Exact-match base objects can have only exact-match columns. For
more information about configuring match columns for fuzzy-match base objects
instead, see “Configuring Match Columns for Fuzzy-match Base Objects” on page 519.
The Schema Manager displays the Match Columns tab for the exact-match base
object.
The Match Columns tab for an exact-match base object has the following sections.
Match Columns: Match columns and their properties:
• Field Name
• Column Type (see “Match Column Types” on page 515)
• Path Component (see “Path Component” on page 516)
• Source Table: table referenced in the path component, or the base object (if the path component is root)
Match Column Contents: List of available columns and columns selected for matching.
You can add only exact-match columns for exact-match base objects. Fuzzy-match
columns are not allowed.
Match Path Component: Match path component for this exact-match column. For an exact-match column, the source table can be the parent table and/or child physical columns. For more information, see “Path Component” on page 516.
Field Name: Name of this field as it will be displayed in the Hub Console.
Match rule sets allow you to execute different sets of match column rules at different
times. The match process uses only one match rule set per execution. To match using a
different match rule set, the match rule set must be selected and the match process
must be executed again.
Note: Only one match column rule in the match rule set needs to succeed in order to
declare a match between records.
You can configure any number of rule sets. When users want to run the Match batch
job, they select one rule set from the list of rule sets that have been defined for the base
object.
For more information about choosing match rule sets, see “Selecting a Match Rule Set”
on page 737.
In the Schema Manager, you designate one match rule set as the default.
Default (*)
Match rule sets allow you to accommodate different match column rule requirements
at different times. For example, you might use one match rule set for an initial data load
and a different match rule set for subsequent incremental loads. Similarly, you might
use one match rule set to process all records, and another match rule set with a filter to
process just a subset of records (see “Filtering SQL” on page 536).
Before saving any changes to a match rule set (including any changes to match rules in
the match rule set), the Schema Manager analyzes the match rule set and prompts you
with a warning message if the match rule set has any issues, as shown in the following
example.
Note: This is only a warning message. You can choose to ignore the message and save
changes anyway.
Name
Search Levels
Used with fuzzy-match base objects only. When you configure a match rule set, you
define a search level that instructs Siperian Hub on how stringently and thoroughly to
search for candidate matches.
The goal of the match process is to find the optimal number of matches for your data:
• not too few (called undermatching), which misses relevant matches, or
• not too many (called overmatching), which generates too many matches, including
matches that are not relevant
For any name or address in a fuzzy match key, Siperian Hub uses the defined search
level to generate different key ranges for the purpose of determining which records are
possible match candidates—and to which records the match column rules will be
applied.
The search level you choose should be determined by the size of your data set, your
time constraints, and how critical the matches are. Depending on your circumstances
and requirements, it is sometimes more appropriate to undermatch, while at other
times, it is more appropriate to overmatch. Implementations dealing with relatively
reliable and complete data can use the Narrow level, while implementations dealing
with less reliable data or with more critical problems should use Exhaustive or
Extreme.
The search level might also differ depending on the phase of a project. It might be
necessary to use a looser level (Exhaustive or Extreme) for initial matching, and to
tighten the level as the data is deduplicated.
By default, when an application calls the SIF searchMatch request, all possible match
columns are generated from the package or mapping records specified in the request,
and the match is performed by treating all columns with equal weight. You can enable
this option, however, to allow applications to specify input match columns, in which
case the searchMatch API ignores any columns that were not passed as part of the
request. You might use this feature if, for example, you were using a custom population
definition and wanted to call the searchMatch API with a particular set of rules.
Enable Filtering
For example, if you had an Organization base object that contained multiple types of
organizations (customers, vendors, prospects, partners, and so on), you could define
different match rule sets that selectively processed only the type of records you want to
match: MatchAll (no filter), MatchCustomersOnly, MatchVendorsOnly, and so on.
Filtering SQL
By default, when the Match batch job is run (see “Match Jobs” on page 734), the
match rule set processes all records. If the Enable Filtering check box (see “Enable
Filtering” on page 536) is selected (checked), you can specify a filter condition to
restrict processing to only those rules that meet the filter condition. A filter is analogous
to a WHERE clause in a SQL statement. The filter expression can be any expression
that is valid for the WHERE clause syntax used in your database platform.
Note: The match rule set filter is applied to the base object records that are selected
for the match batch only (the records to match from)—not the records in the match pool
(the records to match to). For more information, see “Flagging the Match Batch” on
page 329.
For example, suppose your implementation had an Organization base object that
contained multiple types of organizations (customers, vendors, prospects, partners, and
so on). Using filters, you could define a match rule set (MatchCustomersOnly) that
processed customer data only.
org_type = 'C'
All other, non-customer records would be ignored and not processed by the Match job.
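The effect of a match rule set filter can be sketched in a few lines of Python (an illustration only; Siperian Hub applies the filter as SQL inside the Match batch job, and the record layout here is hypothetical):

```python
# Illustrative sketch of match rule set filtering; not Siperian Hub internals.
# The record layout and column names are hypothetical.

def select_match_batch(records, filter_condition=None):
    """Return only the records a filtered match rule set would process."""
    if filter_condition is None:          # MatchAll: no filter
        return list(records)
    return [r for r in records if filter_condition(r)]

base_object = [
    {"rowid": 1, "org_type": "C"},  # customer
    {"rowid": 2, "org_type": "V"},  # vendor
    {"rowid": 3, "org_type": "C"},  # customer
]

# MatchCustomersOnly: analogous to the SQL filter org_type = 'C'
customers_only = select_match_batch(base_object,
                                    lambda r: r["org_type"] == "C")
```

As with the SQL filter, records outside the filter are simply never selected into the match batch.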
Match Rules
This area of the window displays a list of match column rules that have been
configured for the selected match rule set. For more information, see “Configuring
Match Column Rules for Match Rule Sets” on page 542.
The Schema Manager displays the Match Rule Sets tab for the selected base object.
4. Enter a unique, descriptive name for this new match rule set.
5. Click OK.
The Schema Manager adds the new match rule set to the list.
6. Configure the match rule set according to the instructions in the next section,
“Editing Match Rule Set Properties” on page 539.
The following example shows the properties for an exact-match base object.
4. Configure properties for this match rule set. For more information, see “Match
Rule Set Properties” on page 534.
5. Configure match columns for this match rule set according to the instructions in
“Configuring Match Column Rules for Match Rule Sets” on page 542.
6. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
7. If you are prompted to confirm saving changes, click the OK button to save your
changes.
You can configure match column rules only after you have:
• configured the columns that you intend to use in your match rules, as described in
“Configuring Match Columns” on page 515
• created at least one match rule set, as described in “Configuring Match Rule Sets”
on page 531
The properties for match column rules differ between exact-match and fuzzy-match
base objects (see “Exact-match and Fuzzy-match Base Objects” on page 320).
• For exact-match base objects, you can configure only exact column types.
• For fuzzy-match base objects, you can configure fuzzy or exact column types.
For more information, see “Match Rule Properties for Fuzzy-match Base Objects
Only” on page 544.
For each match column rule, decide whether matched records should be automatically or
manually consolidated. For more information, see “Specifying Consolidation Options
for Match Column Rules” on page 574 and “Consolidating Records Automatically or
Manually” on page 336.
This section describes match rule properties for fuzzy-match base objects.
These properties do not apply to exact-match base objects.
For fuzzy-match base objects, the match / search strategy defines the strategy that Siperian
Hub uses for searching and matching in the match rule. Select one of the following
options:
Certain configuration settings on the Match / Merge Setup tab apply to only one type
of column. In this document, such features are indicated with a graphic that shows
whether it applies to fuzzy-match columns only (as in the following example), or
exact-match columns only. No graphic means that the feature applies to both.
The match / search strategy determines how to match candidate A with candidate B
using fuzzy or exact methods. The match / search strategy can affect both the quantity
and the quality of the match candidates. An exact match / search strategy requires clean and
complete data—it might miss some matches if the data is less clean, incomplete, or full
of duplicates. When defining match rule properties, you must find the optimal balance
between finding all possible candidates and not encumbering the process with too many
irrelevant candidates.
Note: This match / search strategy is configured at the match rule level. For more
information about the match / search strategy configured at the base object level
(which determines whether it is a fuzzy-match base object or exact-match base object),
see “Match/Search Strategy” on page 493.
When specifying the match / search strategy for a fuzzy-match base object, consider
the implications of configuring the following types of match rules:
Match Purpose
For fuzzy-match base objects, the match purpose defines the primary goal behind a match
rule. For example, if you're trying to identify matches for people where address is an
important part of determining whether two records are for the same person, then you
would choose the Match Purpose called Resident.
For every match rule you define, you must choose the purpose of the rule from a list of
predefined match purposes provided by Siperian. Each match purpose contains
knowledge about how best to compare two records to achieve the purpose of the
match. Siperian Hub uses the selected match purpose as a basis for applying the match
rules to determine matched records. The behavior of the rules is dependent on the
selected purpose. The list of available match purposes depends on the population used,
as described in “Fuzzy Population” on page 494.
Two rules that are identical in all attributes except the purpose will return different sets
of matches because of the different purpose.
Each match purpose supports a combination of mandatory and optional fields. Each
field is weighted according to its influence in the match decision. Some fields in some
purposes may be grouped. There are two types of groupings:
• Required—requires at least one of the field members to be non-null
• Best of—contributes only the best score from the fields in the group to the overall
match score
The overall score returned by each purpose is calculated by adding the participating
field scores multiplied by their respective weight and divided by the total of all field
weights. If a field is optional and is not provided, it is not included in the weight
calculation.
Name Formats
Siperian Hub match has the concept of a default name format which tells it where to
expect the last name. The options are:
• Left—the last name is at the start of the full name, for example, Smith Jim
• Right—the last name is at the end of the full name, for example, Jim Smith
The name format that Siperian Hub uses depends on the purpose that you're using.
If you are using Organization, the default is Last name, First name, Middle name.
If you are using Person/Resident, the default is First Middle Last.
Bear this in mind when formatting data for matching. It might not make a big
difference, but there are edge cases where it helps, particularly for names that do not
fall within the selected population.
Match Levels
For fuzzy-match base objects, the match level determines how precise the match is.
You can specify one of the following match levels for a fuzzy-match base object:
Match Levels
Level Description
Typical Appropriate for most matches.
Conservative Produces fewer matches than the Typical level. Some data that actually
matches may pass through the match process without being flagged as a
match. This situation is called undermatching.
Loose Produces more matches than the Typical level. Loose matching may
produce a significant number of match candidates that are not really
matches. This situation is called overmatching. You might choose to use this in
a match rule for manual merges, to make sure that other, tighter match
rules have not missed any potential matches.
Select the level based on your knowledge of the data to be matched: Typical,
Conservative (fewer matches), or Loose (more matches). When in doubt, use Typical.
For fuzzy-match base objects, the accept limit is a number that determines the
acceptability of a match. This setting serves the same purpose as the match level (see
“Match Levels” on page 558), but at a more granular degree. The accept limit is
defined by Siperian within a population in accordance with its match purpose. The
Accept Limit Adjustment allows a coarse adjustment to what is considered a match for
this match rule.
• A positive adjustment results in more conservative matching.
• A negative adjustment results in looser matching.
For example, suppose that, for a given field and a given population, the accept limit for
a typical match level is 80, for a loose match level is 70, and for a conservative match
level is 90. If you specify a positive number (such as 3) for the adjustment, then the
accept level becomes slightly more conservative. If you specify a negative number (such
as -2), then the accept level becomes looser.
Configuring this setting provides an optional refinement to your match settings that
might be helpful in certain circumstances. Adjusting the accept limit even a few points
can have a dramatic effect on your matches, resulting in overmatching or
undermatching. Therefore, it is recommended that you test different settings iteratively,
with small increments, to determine the best setting for your data.
Match Subtype
For base objects containing different types of data, the match subtype option allows you
to apply match rules to specific types of data within the same base object. You have the
option to enable or disable match subtyping for exact-match columns that have
parent/child path components. Match subtype is available only for:
• exact-match column types that are based on a non-root Path Component, and
• match rules that have a fuzzy match / search strategy
To use match subtyping, for each match rule, specify one or more exact-match column(s)
that will serve as the “subtyping” column(s) to use. The subtype indicator can be set
for any of the exact-match columns regardless of whether they are used for segment
match or not. During the match process, evaluation of the subtype column precedes
evaluation of the other match columns. Use match subtyping judiciously, because it can
have a performance impact on the match process.
Match Subtype behaves just like a standard parent/child matching scenario with the
additional requirement that the match column marked as Match Subtype must be the
same across all records being matched. In the following example, the Match Subtype
column is Address Type and the match rule consists of Address Line1, City, and State.
Without Match Subtype, Parent ID 3 would match with 5 and 7. With Match Subtype,
however, Parent ID 3 will not match with either 5 or 7, because the matching rows are
distributed between different Address Types. Parent IDs 5 and 7 will match with each
other, however, because the matching rows all fall within the 'Billing' Address Type.
Non-Equal Matching
Note: Non-Equal Matching and Segment Matching are mutually exclusive. If one is
selected, then the other cannot be selected.
Use non-equal matching in match rules to prevent equal values in a column from
matching each other. Non-equal matching applies only to exact-match columns.
NULL Matching
Note: Null Matching and Segment Matching are mutually exclusive. If one is selected,
then the other cannot be selected.
Use NULL matching to specify how the match process should behave when null values
match other null values. NULL matching applies only to exact-match columns.
By default, null matching is disabled, meaning that Siperian Hub treats nulls as unequal
values when it searches for matches (a null value will not match with anything).
To enable null matching, you must explicitly select a null matching option for the
match columns to allow null matching.
Property Description
Disabled Default setting. A NULL value does not match anything, regardless of
the other value (nulls are treated as unequal values). A NULL is seen
as a placeholder for an unknown value.
NULL Matches NULL If both values are NULL, then it is considered a match.
NULL Matches Non-NULL If one value is NULL and the other value is not NULL,
then it is considered a match.
Once null matching is configured, Build Match Groups will allow only a single “NULL
to non-NULL” match into any group, thereby reducing the possibility of unwanted
transitive matching. For more information, see “Build Match Groups and Transitive
Matches” on page 327.
Note: Null matching is exclusive of exact matching. For example, if you enable NULL
Matches Non-Null, the match rule returns only those matches in which one of the cell
values is NULL. It will not provide exact matches where both cells are equal in
addition to also matching NULL against non-NULL. Therefore, if you need both
behaviors, you must create two exact match rules—one with NULL matching enabled,
and the other with NULL matching disabled.
Segment Matching
Note: Segment Matching and Non-Equal Matching are mutually exclusive. If one is
selected, then the other cannot be selected. Segment Matching and NULL Matching
are also mutually exclusive. If one is selected, then the other cannot be selected.
For exact-match columns only, you can use segment matching to limit match rules to
specific subsets of data. For example, you could define different match rules for
customers in different countries by using segment matching to limit certain rules to
specific country codes. Segment matching applies to both exact-match and
fuzzy-match base objects. For more information, see “Configuring Segment Matching
for a Column” on page 576.
If the Segment Matching check box is checked (selected), you can configure two other
options: Segment Matches All Data and Segment Match Values.
Segment Matches All Data
When this check box is unchecked (the default), Siperian Hub matches only records within the set of
values defined in Segment Match Values. For example, suppose a base object contained
Leads, Partners, Customers, and Suppliers. If Segment Match Values contained the
values Leads and Partners, and Segment Matches All Data were unchecked, then
Siperian Hub would match only records that contain Leads or Partners. All Customer
and Supplier records would be ignored.
With Segment Matches All Data checked (selected), then Leads and Partners would
match with Customers and Suppliers, but Customers and Suppliers would not match
with each other.
Segment Match Values
For segment matching, this property specifies the list of segment values to use. You
must specify one or more values (for a match column) that define segment
matching. For example, for a given match rule, suppose you wanted to define segment
matching by Gender. If you specified a segment match value of M (for male), then, for
that match rule, Siperian Hub searches for matches (based on the other match
columns) only on male records—and can only match to other male records, unless you
also enabled Segment Matches All Data.
Note: Segment match values are case-sensitive. For both fuzzy-match and exact-match
base objects, the Match batch job compares the values that you set case-sensitively.
For exact matches with segment matching enabled on concatenated columns, a space
character must be added to each piece of data present in the concatenated fields.
• Match columns can also be used to match on a column from a child base object,
which in turn can be based on any text column or combination of text columns in
the child base object. Matching on the match columns of a child base object is
called intertable matching.
• When using intertable match and creating match rules for the child table (via a
foreign key), you must include the foreign key from the parent table in each match
rule on the child. If you do not, when the child is merged, the parent records
would lose the child records that had previously belonged to them.
For more information, see “Match Columns Depend on the Search Strategy” on page
515.
Button Description
Adds a match rule. For more information, see “Adding Match Column Rules” on
page 565.
Edits properties for the selected match rule. For more information, see “Editing
Match Column Rules” on page 570.
Deletes the selected match rule. For more information, see “Deleting Match
Column Rules” on page 572.
Moves the selected match rule up in the sequence. For more information, see
“Changing the Execution Sequence of Match Column Rules” on page 573.
Moves the selected match rule down in the sequence. For more information, see
“Changing the Execution Sequence of Match Column Rules” on page 573.
Changes a manual consolidation rule to an automatic consolidation rule. Select a
manual consolidation record and then click the button. For more information, see
“Specifying Consolidation Options for Match Column Rules” on page 574.
Changes an automatic consolidation rule to a manual consolidation rule. Select an
automatic consolidation record and then click the button. For more information,
see “Specifying Consolidation Options for Match Column Rules” on page 574.
Important: If you change your match rules after matching, you are prompted to reset
your matches. When you reset your matches, it deletes everything in the match table
and, in records where the consolidation indicator is 2, resets the consolidation indicator
to 4. For more information, see “About the Consolidate Process” on page 335 and
“Reset Match Table Jobs” on page 744.
The Schema Manager displays the properties for the selected match rule set.
5. In the Match Rules section of the screen, click the plus button .
The Schema Manager displays the Edit Match Rule dialog. This dialog differs
slightly between exact-match and fuzzy-match base objects.
6. For fuzzy-match base objects, configure the match rule properties at the top of the
dialog box. For more information, see “Match Rule Properties for Fuzzy-match
Base Objects Only” on page 544.
7. Configure the match column(s) for this match rule.
Only columns you have previously defined as match columns are shown.
• For exact-match base objects or match rules with an exact match / search
strategy, only exact column types are available.
• For fuzzy-match base objects, you can choose fuzzy or exact column types.
To learn more, see “Match Columns Depend on the Search Strategy” on page 515.
a. Click the Edit button next to the Match Columns list.
b. Check (select) the check box next to any column that you want to include.
c. Uncheck (clear) the check box next to any column that you want to omit.
d. Click OK.
The Schema Manager displays the selected columns in the Match Columns list.
8. Configure the match properties for each match column in the Match Columns list.
For more information, see:
• “Match Column Properties for Match Rules” on page 559
• “Configuring the Match Weight of a Column” on page 575
• “Configuring Segment Matching for a Column” on page 576
• “NULL Matching” on page 561
7. Configure the match column(s) for this match rule, if you want.
Only columns you have previously defined as match columns are shown.
• For exact-match base objects or match rules with an exact match / search
strategy, only exact column types are available.
• For fuzzy-match base objects, you can choose fuzzy or exact column types.
To learn more, see “Match Columns Depend on the Search Strategy” on page 515.
a. Click the Edit button next to the Match Columns list.
The Schema Manager displays the Add/Remove Match Columns dialog.
b. Check (select) the check box next to any column that you want to include.
c. Uncheck (clear) the check box next to any column that you want to omit.
d. Click OK.
The Schema Manager displays the selected columns in the Match Columns list.
8. Change the match properties for any match column that you want to edit. For
more information, see:
• “Match Column Properties for Match Rules” on page 559
• “Configuring the Match Weight of a Column” on page 575
• “Configuring Segment Matching for a Column” on page 576
• “NULL Matching” on page 561
• “Match Subtype” on page 559
9. Click OK.
10. If this is an exact match, specify the match properties for this match rule. For more
information, see “Requirements for Exact-match Columns in Match Column
Rules” on page 563. Click OK.
11. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
12. If you are prompted to confirm saving changes, click the OK button to save your
changes.
Note: A base object cannot have more than 200 user-defined columns if it will have
match rules that are configured for automatic consolidation.
For a fuzzy-match column, you can change its match weight in the Edit Match Rule
dialog box. For each column, Siperian Hub assigns an internal match weight, which is a
number that indicates the importance of this column (relative to other columns in the
table) for matching. The match weight varies according to the selected match purpose
and population. For example, if the match purpose is Person_Name, then Siperian
Hub, when evaluating matches, views a data match in the name column with greater
importance than a data match in a different column (such as the address).
By adjusting the match weight of a column, you give added weight to, and elevate the
significance of, that column (relative to other columns) when Siperian Hub analyzes
values for matches.
1. In the Edit Match Rule dialog box, select an exact-match column in the Match
Columns list.
2. Check (select) the Segment Matching check box to enable this feature.
3. Check (select) the Segment Matches All Data check box, if you want. For more
information, see “Segment Matches All Data” on page 562.
4. Specify the segment match values for segment matching. For more information,
see “Segment Match Values” on page 563.
a. Click the Edit button.
The Schema Manager displays the Edit Values dialog.
For example, two systems might use the same set of customer IDs. If both systems
provide information about customer XYZ123 using identical primary key values, the
two systems are certainly referring to the same customer and the records should be
automatically consolidated.
When you specify a primary key match, you simply specify which source systems have
the same primary key values. You also check the Auto-merge matching records
check box to have Siperian Hub automatically consolidate matching records when a
Merge or Link batch job is run. To learn more, see “Automerge Jobs” on page 717 and
“Autolink Jobs” on page 715.
The Schema Manager displays the Primary Key Match Rules tab.
The Primary Key Match Rules tab has the following columns.
Column Description
Key Combination The two source systems for which this primary key match rule will
be used for matching. These source systems must already be
defined in Siperian Hub (see “Configuring Source Systems” on
page 348), and staging tables for this base object must be
associated with these source systems (see “Configuring Staging
Tables” on page 364).
Auto-Merge Specifies whether this primary key match rule results in
automatic or manual consolidation. For more information, see
“About the Consolidate Process” on page 335.
5. Check (select) the check box next to two source systems for which you want to
match records based on the primary key.
6. Check (select) the Auto-merge matching records check box if you are certain
that records with identical primary keys are matches.
You can change your choice for Auto-merge matching records later, if you want.
7. Click OK.
The Schema Manager displays the new rule in the Primary Key Rule tab.
9. Choose Yes to delete all matches currently stored in the match table, if you want.
4. Scroll to the primary key match rule that you want to edit.
5. Check or uncheck the Auto-merge matching records check box to enable or
disable auto-merging, respectively.
6. Click the Save button to save your changes.
The Schema Manager asks you whether you want to reset existing matches.
7. Choose Yes to delete all matches currently stored in the match table, if you want.
4. Select the primary key match rule that you want to delete.
5. Click the Delete button.
The Schema Manager prompts you to confirm deletion.
6. Choose Yes.
The Schema Manager removes the deleted rule from the Primary Key Match Rules
tab.
8. Choose Yes to delete all matches currently stored in your Match table, if you want.
In the Match / Merge Setup Details pane of the Schema Manager, the Match Keys
Distribution tab allows you to investigate the distribution of match keys in the match
key table. This tool can assist you with identifying potential hot spots in your data—high
concentrations of match keys that could result in overmatching—where the match
process generates too many matches, including matches that are not relevant.
By knowing where hot spots occur in your data, you can refine data cleansing and
match rules to reduce hot spots and generate an optimal distribution of match keys for
use in the match process. Ideally, you want to have a relatively even distribution across
all keys.
Histogram
Match Columns
Histogram
The histogram displays the statistical distribution of match keys in the match key table.
Axis Description
Key (X-axis) Starting character(s) of the match key. If no filter is applied (the default),
this is the starting character of the match key. If a filter is applied, this is the
starting sequence of characters in the match key, beginning with the
left-most character. For more information, see “Filtering Match Keys” on
page 587.
Count (Y-axis) Number of match keys in the match key table that begin with the starting
character(s). Hotspots in the match key table show up as disproportionately
tall spikes (high number of match keys), relative to other characters in the
histogram.
The Match Keys List on the Match Keys Distribution tab displays records in the match
key table. For each record, it displays cell data for the following columns:
Depending on the configured match rules and the nature of the data in a record, a
single record in the base object table can have multiple generated match keys.
Use the following command buttons to navigate the records in the match key table.
Button Description
Displays the first page of records in the match key table.
Match Columns
The Match Columns area on the Match Keys Distribution tab displays match column
data for the selected record in the match keys list. This is the SSA_DATA column in
the match key table. For each match column that is configured for this base object (see
“Configuring Match Columns” on page 515), it displays the column name and cell data.
The filter condition specifies the beginning string sequence for qualified match keys,
evaluated from left to right. For example, to view only match keys beginning with the
letter M, you would select M for the filter. To further restrict match keys and view data
for only the match keys that start with the letters MD, you would add the letter D to the
filter. The longer the filter expression, the more restrictive the display.
Setting a Filter
To set a filter:
• Click the vertical bar in the Histogram associated with the character you want to
add to the filter.
For example, suppose you started with the following default view in the Histogram.
If you click the vertical bar above the M character, the Histogram refreshes and displays
the distribution for all match keys beginning with the character M.
Note that the Match Keys List now displays only those match keys that meet the filter
condition.
Navigating Filters
Button Description
Clears the filter. Displays the default view (no filter).
Displays the previously-selected filter (removes the right-most character from the
filter).
Siperian Hub provides a mechanism for selectively excluding records from the match
process. You might want to do this if, for example, your data contained records that
you wanted the match process to ignore.
To configure this feature, in the Schema Manager, you add a column named
EXCLUDE_FROM_MATCH to a base object. This column must be an integer type
with a default value of zero (0), as described in “Adding Columns” on page 134.
Once the table is populated and before running the Match job, to exclude a record
from matching, change its value in the EXCLUDE_FROM_MATCH column to a one
(1) in the Data Manager. When the Match job runs, only those records with an
EXCLUDE_FROM_MATCH value of zero (0) will be tokenized and processed—all
other records will be ignored. When the cell value is changed, the DIRTY_IND for
this record is set to 1 so that match keys will be regenerated when the tokenization
process is executed, as described in “Match Keys and the Tokenization Process” on
page 322.
This chapter describes how to configure the consolidate process for your Siperian Hub
implementation.
Chapter Contents
• Before You Begin
• About Consolidation Settings
• Changing Consolidation Settings
593
Before You Begin
Note: If the Requeue on Parent Merge setting for a child base object is set to 2, in the
event of a merging parent, the consolidation indicator will be set to 4 for the child
record. For more information, see “Requeue On Parent Merge” on page 104.
Immutable sources are also distinct systems, as described in “Distinct Source Systems”
on page 596. All records are stored in Siperian Hub as master records. For all
source records from an immutable source system, the consolidation indicator for Load
and PUT is always 1 (consolidated record). If the Requeue on Parent Merge setting for
a child base object is set to 2, then in the event of a merging parent, the consolidation
indicator will be set to 4 for the child record. For more information, see
“Consolidation Status for Base Object Records” on page 289.
To specify an immutable source for a base object, click the drop-down list next to
Immutable Rowid Object and select a source system.
This list displays the source system(s) associated with this base object. Only one source
system can be designated an immutable source system. To learn more, see
“Configuring Source Systems” on page 348.
Immutable source systems are applicable when, for example, Siperian Hub is the only
persistent store for the source data. Designating an immutable source system
streamlines the load, match, and merge processes by preventing intra-source matches
and automatically accepting records from immutable sources as unique. If two
immutable records must be merged, then a data steward needs to perform a manual
verification in order to allow that change. At that point, Siperian Hub allows the data
steward to choose the key that remains.
Distinct Systems
A distinct system provides data that gets inserted into the base object without being
consolidated. Records from a distinct system never match other records from the
same system, but they can match records from other systems (their
CONSOLIDATION_IND is set to 4 on load). You can specify distinct source
systems and configure whether, for each source system, records are consolidated
automatically or manually.
You can designate a source system as a distinct source (also known as a golden source),
which means that records from that source will not be merged. For example, if the
ABC source has been designated as a distinct source, then the match rules will never
match (or merge) two records that come from the same source. Records from a distinct
source will not match through a transient match in an Auto Match and Merge process
(see “Auto Match and Merge Jobs” on page 716). Such records can be merged only
manually by flagging them as matches.
The following example shows both options selected for the Billing system.
For distinct systems only, you can enable this option to configure which types of
rules are executed for the associated distinct source system. Check (select) this
check box if you want Siperian Hub to apply only the automatic consolidation rules
(not the manual consolidation rules) for this distinct system. By default, this option is
disabled (unchecked).
For child base objects, Siperian Hub provides a cascade unmerge feature that allows you to
specify what happens if records in the parent base object are unmerged. By default, this
feature is disabled, so that unmerging parent records does not unmerge associated child
records. In the Unmerge Child When Parent Unmerges portion near the bottom of the
Merge Settings tab, if you check (select) the Cascade Unmerge check box for a child
base object, when records in the parent object are unmerged, Siperian Hub also
unmerges affected records in the child base object.
In the Unmerge Child When Parent Unmerges portion near the bottom of the Merge
Settings tab, the Schema Manager displays only those match-enabled columns in the
child base object that are configured with a foreign key. To learn more, see
“Configuring Foreign-Key Relationships Between Base Objects” on page 140.
In situations where a parent base object has multiple child base objects, you can
explicitly enable cascade unmerge for each child base object. Once configured, when
the parent base object is unmerged, then all affected records in all associated child base
objects are unmerged as well.
A full unmerge of affected records is not required in all implementations, and it can
add performance overhead to the unmerge operation because many child records can be
affected. In addition, it does not always make sense to enable this property. One
example is when Customer is a child of Customer Type. In this situation, you might not
want to unmerge Customers if Customer Type is unmerged. However, in most cases, it
is a good idea to unmerge addresses linked to customers if Customer unmerges.
Note: When cascade unmerge is enabled, the child record may not be unmerged if a
previous manual unmerge was done on the child base object.
When you enable the unmerge feature, it applies to the child table and the child
cross-reference table. Once enabled, if you then unmerge the parent cross-reference,
the original child cross-reference should be unmerged as well. This feature has no
impact on the parent; it operates on the child tables to provide additional
flexibility.
The Schema Manager displays the Merge Settings tab for the selected base object.
This chapter describes how to configure the publish process for Siperian Hub data
using message triggers and embedded message queues. For an introduction, see
“Publish Process” on page 342.
Chapter Contents
• Before You Begin
• Configuration Steps for the Publish Process
• Starting the Message Queues Tool
• Configuring Global Message Queue Settings
• Configuring Message Queue Servers
• Configuring Outbound Message Queues
• Configuring Message Triggers
• JMS Message XML Reference
Before You Begin
The Siperian installer automatically sets up message queues and the connection
factory configuration. For more information, see the Siperian Hub Installation Guide
for your platform.
2. Configure global message queue settings. For more information, see “Configuring
Global Message Queue Settings” on page 604.
3. Add at least one message queue server. For more information, see “Configuring
Message Queue Servers” on page 605.
4. Add at least one message queue to the message queue server. For more
information, see “Configuring Outbound Message Queues” on page 608.
5. Generate the JMS event message schema for each ORS that has data that you want
to publish. For more information, see “Generating and Deploying ORS-specific
Schemas” on page 827.
6. Configure message triggers for your message queues. For more information, see
“Configuring Message Triggers” on page 612.
After you have configured message queues, you can review run-time activities using the
Audit Manager according to the instructions in “Auditing Message Queues” on page
928.
Pane Description
Navigation pane Shows (in a tree view) the message queues that are defined for this
Siperian Hub implementation.
Properties pane Shows the properties for the selected message queue.
Click the button next to any property that you want to change.
5. Click the button to save your changes.
Property Description
Connection Factory Name Name of the connection factory for this message queue server.
Display Name Name of this message queue server as it will be displayed in the Hub
Console.
Description Descriptive information for this message queue server.
WebSphere Properties
Property Description
Server Name Name of the server where the message queue is defined.
Channel Channel of the server where the message queue is defined.
Port Port on the server where the message queue is defined.
The Message Queues tool displays the Add Message Queue Server dialog.
4. Specify the properties for this message queue server. For more information, see
“Message Queue Server Properties” on page 605.
Property Description
Queue Name Name of this message queue. This must match the JNDI queue name
as configured on your application server.
Display Name Name of this message queue as it will be displayed in the Hub
Console.
Description Descriptive information for this message queue.
4. Specify the message queue properties. For more information, see “Message Queue
Properties” on page 608.
5. Click OK.
The Message Queues tool prompts you to choose the queue assignment.
Assignment Description
Leave Unassigned Queue is currently unassigned and not in use. Select this option
to use this queue as the outbound queue for Siperian Hub API
responses, or to indicate that the queue is currently unassigned
and is not in use.
Use with Message Queue Triggers Queue is currently assigned and is available for use by message
triggers that are defined in the Schema Manager according to the
instructions in “Configuring Message Triggers” on page 612.
Use Legacy XML Select (check) this option only if your Siperian Hub
implementation requires that you use the legacy XML message
format (Siperian Hub XU version) instead of the current version
of the XML message format. For more information, see “Legacy
JMS Message XML Reference” on page 644.
You can use the same message queue for all triggers, or a different message
queue for each trigger. For an action to fire a message trigger, the message
queues must be configured, and a message trigger must be defined for that base object
and action.
The following types of events can cause a message trigger to be fired and a message
placed in the queue.
Events for Which Message Queue Rules Can Be Defined
Event Description
Add new data • Add the data through the load process
• Add the data through the Data Manager
• Add the data through the API verb using PUT or CLEANSE_PUT
(through HTTP, SOAP, MQ, and so on)
Consider the following issues when setting up message triggers for your
implementation:
• If a message queue is used in any message trigger definition under a base object in
any Hub Store, the Message Queues tool displays the following message: “The message
queue is currently in use by message triggers.” In this case, you cannot edit the
properties of the message queue. Instead, you must create another message queue
to make the necessary changes.
• Message triggers apply to one base object only, and they fire only when a specific
action occurs directly on that base object. If you have two tables that are in a
parent-child relationship, then you need to explicitly define message queues
separately, for each table. Change detection is based on specific changes to each
base object (such as a load INSERT, load UPDATE, MERGE, or PUT). Changes
to a record of the parent table can fire a message trigger for the parent record only.
If changes in the parent record affect one or more associated child records, then a
message trigger for the child table must be explicitly configured to fire when such
an action occurs in the child records.
• In addition to base objects, message triggers can be configured for dependent and
relationship objects. However, only insert and update actions are available for
dependent and relationship objects.
If no message triggers have been set up, then the Schema Manager displays an empty
screen.
8. Select the package that will be used to build the message. For more information,
see “Configuring Packages” on page 196.
9. Click Next.
The Add Message Trigger wizard prompts you to specify the target message queue.
10. Select the message queue to which the message will be written.
11. Click Next.
The Add Message Trigger wizard prompts you to specify the rules for this message
trigger.
For more information, see “Types of Events for Message Triggers” on page 612.
13. Configure the system properties for this message trigger:
Note: You must select at least one Triggering system and one In Message system.
For example, suppose your implementation had three source systems (A, B, and C)
and a base object record had cross-reference records for A and B. Suppose the
cross-reference in system A for this base object record were updated.
The following table shows possible message trigger configurations and the
resulting message:
14. Identify the system to which the event applies, the columns to monitor for changes,
and the package used to construct the message.
All events include the base object record, and all corresponding cross-references
that make up that record, in the message, based on the specified package.
15. Click Next if you selected an Update option. Otherwise, click Finish.
16. If you selected the Update action, the Schema Manager prompts you to select
the columns to monitor for update actions.
• Select the Trigger message if change on any column check box to monitor
all columns for updates.
18. Click Finish.
5. Change the settings you want. For more information, see “Adding Message
Triggers” on page 615 and “Types of Events for Message Triggers” on page 612.
Click the button next to each editable property that you want to change.
6. Click the button to save your changes.
Note: If your Siperian Hub implementation requires that you use the legacy XML
message format (Siperian Hub XU version) instead of the current version of the XML
message format (described in this section), see “Legacy JMS Message XML Reference”
on page 644 instead.
Schema Manager tool in the Hub Console. For more information, see “Generating and
Deploying ORS-specific Schemas” on page 827.
Field Description
Root Node
<siperianEvent> Root node in the XML message.
Event Metadata
<eventMetadata> Root node for event metadata.
<messageId> Unique ID for siperianEvent messages.
<eventType> Type of event, as described in “Types of Events for Message
Triggers” on page 612. One of the following values:
• Insert
• Update
• Update XREF
• Accept as Unique
• Merge
• Unmerge
• Merge Update
<baseObjectUid> UID of the base object affected by this action.
<packageUid> UID of the package associated with this action.
<messageDate> Date/time when this message was generated.
<orsId> ID of the Operational Record Store (ORS) associated with this
event.
<triggerUid> UID of the rule that triggered the event that generated this
message.
Event Details
<eventTypeEvent> Root node for event details.
<sourceSystemName> Name of the source system associated with this event.
<sourceKey> Value of the PKEY_SRC_OBJECT associated with this event.
<eventDate> Date/time when the event was generated.
<rowid> RowID of the base object record that was affected by the
event.
<xrefKey> Root node of a cross-reference record affected by this event.
<systemName> System name of the cross-reference record affected by this
event.
<sourceKey> PKEY_SRC_OBJECT of the cross-reference record affected
by this event.
<packageName> Name of the secure package associated with this event.
<columnName> Each column in the package is represented by an element in
the XML file. Examples: rowidObject and
consolidationInd. Defined in the ORS-specific XSD that
is generated using the JMS Event Schema Manager tool. For
more information, see “Generating and Deploying
ORS-specific Schemas” on page 827.
<mergedRowid> List of ROWID_OBJECT values for the losing records in the
merge. This field is included in messages for Merge events only.
<dependentSourceKey> Applies only to inserts into, or updates of, relationships of
dependent objects.
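To illustrate how a downstream consumer might read these fields, the following Python sketch parses an abbreviated siperianEvent message using only the standard library. The sample payload is condensed from the BoSetToDelete example later in this chapter; the function name and the abbreviation itself are illustrative, not part of Siperian Hub.

```python
import xml.etree.ElementTree as ET

# Abbreviated siperianEvent message, condensed from the BoSetToDelete
# example shown later in this chapter.
MESSAGE = """<siperianEvent>
  <eventMetadata>
    <eventType>BO set to Delete</eventType>
    <baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
    <messageId>319</messageId>
  </eventMetadata>
  <boSetToDeleteEvent>
    <sourceSystemName>Admin</sourceSystemName>
    <rowid>102 </rowid>
    <xrefKey><systemName>CRM</systemName></xrefKey>
    <xrefKey><systemName>Admin</systemName></xrefKey>
  </boSetToDeleteEvent>
</siperianEvent>"""

def parse_event(xml_text):
    """Extract commonly used metadata and detail fields from a message."""
    root = ET.fromstring(xml_text)
    meta = root.find("eventMetadata")
    detail = root[1]  # the event-details node follows the metadata node
    return {
        "eventType": meta.findtext("eventType"),
        "baseObjectUid": meta.findtext("baseObjectUid"),
        # ROWID values are fixed-width CHAR columns, so strip the padding
        "rowid": (detail.findtext("rowid") or "").strip(),
        "xrefSystems": [x.findtext("systemName")
                        for x in detail.findall("xrefKey")],
    }

event = parse_event(MESSAGE)
print(event["eventType"])    # BO set to Delete
print(event["xrefSystems"])  # ['CRM', 'Admin']
```

Note that the name of the event-details node varies by event type (boSetToDeleteEvent, mergeEvent, and so on), which is why the sketch takes the second child rather than looking it up by name.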
Filtering Messages
You can use the custom JMS header named MessageType to filter incoming messages
based on the message type. The following message types are indicated in the message
header.
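As an illustration only, a consumer-side filter on this header might look like the following Python sketch. How headers are exposed depends entirely on your JMS client library, and with many JMS clients you would instead push the filter to the broker with a message selector such as MessageType = 'siperianEvent'; the header value used here is hypothetical.

```python
# Hedged sketch: real header access depends on your JMS client library.
# A received message is represented here as a plain dict with its JMS
# headers under "headers" and the XML payload under "body".

def is_siperian_event(message, wanted_type="siperianEvent"):
    """Return True if the custom MessageType header matches wanted_type."""
    return message.get("headers", {}).get("MessageType") == wanted_type

inbox = [
    {"headers": {"MessageType": "siperianEvent"},
     "body": "<siperianEvent>...</siperianEvent>"},
    {"headers": {"MessageType": "other"}, "body": "<other/>"},
]

# Keep only the messages this consumer should parse.
matched = [m for m in inbox if is_siperian_event(m)]
```

Filtering on the header before parsing avoids the cost of deserializing XML payloads the consumer does not care about.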
<acceptAsUniqueEvent>
<sourceSystemName>Admin</sourceSystemName>
<sourceKey>SVR1.1T1</sourceKey>
<eventDate>2008-09-10T16:33:14.000-07:00</eventDate>
<rowid>2 </rowid>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>SVR1.1T1</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>2 </rowidObject>
<creator>admin</creator>
<createDate>2008-08-13T20:28:02.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-10T16:33:14.000-07:00</lastUpdateDate>
<consolidationInd>1</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>0</dirtyInd>
<firstName>Joey</firstName>
<lastName>Brown</lastName>
</contactPkg>
</acceptAsUniqueEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
AMRule Message
<eventDate>2008-09-19T11:43:42.979-07:00</eventDate>
<contactPkgAmEvent>
<amRuleUid>AM_RULE.RuleSet1|Rule1</amRuleUid>
<contactPkg>
<rowidObject>64 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T16:24:35.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-18T16:26:45.000-07:00</lastUpdateDate>
<consolidationInd>2</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Johnny</firstName>
<lastName>Brown</lastName>
<hubStateInd>1</hubStateInd>
</contactPkg>
<cContact>
<event>
<eventType>Update</eventType>
<system>Admin</system>
</event>
<event>
<eventType>Update XREF</eventType>
<system>Admin</system>
</event>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK1265</sourceKey>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>64</sourceKey>
</xrefKey>
</cContact>
</contactPkgAmEvent>
</amRuleEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
BoDelete Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
BoSetToDelete Message
<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>BO set to Delete</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>319</messageId>
<messageDate>2008-09-19T14:21:03.000-07:00</messageDate>
</eventMetadata>
<boSetToDeleteEvent>
<sourceSystemName>Admin</sourceSystemName>
<eventDate>2008-09-19T14:21:03.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>CRM</systemName>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
</xrefKey>
<xrefKey>
<systemName>WEB</systemName>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-19T14:21:03.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<hubStateInd>-1</hubStateInd>
</contactPkg>
</boSetToDeleteEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Delete Message
</deleteEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Insert Message
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Merge Message
<lastName>Brown</lastName>
</contactPkg>
</mergeEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
<lastName>Jones</lastName>
</contactPkg>
</mergeUpdateEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
No Action Message
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-10T17:25:42.000-07:00</lastUpdateDate>
<consolidationInd>1</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Thomas</firstName>
<lastName>Jones</lastName>
</contactPkg>
</noActionEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
PendingInsert Message
<lastUpdateDate>2008-09-19T13:57:09.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>0</hubStateInd>
</contactPkg>
</pendingInsertEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
PendingUpdate Message
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>sifuser</updatedBy>
<lastUpdateDate>2008-09-19T14:01:36.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>1</hubStateInd>
</contactPkg>
</pendingUpdateEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
PendingUpdateXref Message
<sourceKey>SVR1.2V3</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>sifuser</updatedBy>
<lastUpdateDate>2008-09-19T14:01:36.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>1</hubStateInd>
</contactPkg>
</pendingUpdateXrefEvent>
</siperianEvent>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Unmerge Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Update Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
XRefDelete Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
XRefSetToDelete Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
BO Delete Message
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>-1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
BO set to Delete
<DATA>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>admin</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:21:03</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>SYS0 </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME />
<LAST_NAME />
<HUB_STATE_IND>-1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Delete Message
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>WEB</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>107 </ROWID_OBJECT>
<CREATOR>sifuser</CREATOR>
<CREATE_DATE>19 Sep 2008 14:35:28</CREATE_DATE>
<UPDATED_BY>admin</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:35:53</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>-1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Insert Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Merge Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Update Message
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>74 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>1</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483773</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</SOURCE_XREF>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>74 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>1</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483773</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Unmerge Message
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
</DATA>
</DATAAREA>
</SIP_EVENT>
Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
Contents
• Chapter 17, “Using Batch Jobs”
• Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
17
Using Batch Jobs
This chapter describes how to configure and execute Siperian Hub batch jobs using the
Batch Viewer and Batch Group tools in the Hub Console. For more information about
creating batch jobs using job execution scripts, see Chapter 18, “Writing Custom
Scripts to Execute Batch Jobs.”
Chapter Contents
• Before You Begin
• About Siperian Hub Batch Jobs
• Running Batch Jobs Using the Batch Viewer Tool
• Running Batch Jobs Using the Batch Group Tool
• Batch Jobs Reference
Before You Begin
One of the tasks Siperian Hub batch jobs perform is to move data from landing tables
to the appropriate target location in Siperian Hub. Therefore, before you run Siperian
Hub batch jobs, you must first have your source systems or an ETL tool write data into
the landing tables. The landing tables are Siperian Hub’s interface for batch loads. You
deliver the data to the landing tables, and Siperian Hub batch procedures manipulate
the data and copy it to the appropriate location(s). For more information, see the
description of the Siperian Hub data management process in the Siperian Hub Overview.
Batch jobs need to be executed in a certain sequence. For example, a Match job must
be run for a base object before running the consolidation process. For merge-style base
objects, you can run the Auto Match and Merge job, which executes the Match job and
then the Automerge job repeatedly, until either all records in the base object have been
checked for matches, or the maximum number of records for manual consolidation is
reached (see “Maximum Matches for Manual Consolidation” on page 490).
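The repetition described above can be sketched as the following control loop. This is illustrative only: run_match_job, run_automerge_job, and records_queued_for_manual are hypothetical stand-ins for the actual Siperian Hub batch procedures and their metrics, passed in as callables so the loop itself stays generic.

```python
def auto_match_and_merge(run_match_job, run_automerge_job,
                         records_queued_for_manual, manual_limit):
    """Repeat Match then Automerge until every record has been checked
    for matches, or the manual-consolidation limit is reached.

    run_match_job() is assumed to return the count of records still
    awaiting a match check after this pass.
    """
    while True:
        unchecked = run_match_job()
        run_automerge_job()
        if unchecked == 0:
            return "all records checked"
        if records_queued_for_manual() >= manual_limit:
            return "manual consolidation limit reached"
```

The real job also records metrics per pass (see “Auto Match and Merge Metrics” on page 716); the sketch keeps only the two stopping conditions named in the text.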
The general rule of thumb is that all parent tables (tables that other tables reference)
must be loaded first.
If two tables have a foreign key relationship between them, the table that is being
referenced must be loaded first, and the table doing the referencing loaded
second. The following foreign key relationships can exist in Siperian Hub:
• from one base object (child with foreign key) to another base object (parent with
primary key)
• from a dependent object to the base object that owns it
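The load-ordering rule above amounts to a topological sort over the foreign-key relationships. The following Python sketch derives a valid load order from a set of child-to-parent pairs; the table names reuse examples from this guide and are illustrative only.

```python
from collections import deque

def load_order(tables, fk_edges):
    """Return a load order in which every referenced (parent) table
    is loaded before any table that references it.

    fk_edges: list of (child, parent) foreign-key pairs.
    """
    children_of = {t: [] for t in tables}
    pending_parents = {t: 0 for t in tables}
    for child, parent in fk_edges:
        children_of[parent].append(child)
        pending_parents[child] += 1

    # Tables with no outstanding parents can be loaded immediately.
    ready = deque(t for t in tables if pending_parents[t] == 0)
    order = []
    while ready:
        table = ready.popleft()
        order.append(table)
        for child in children_of[table]:
            pending_parents[child] -= 1
            if pending_parents[child] == 0:
                ready.append(child)
    if len(order) != len(tables):
        raise ValueError("circular foreign-key dependency")
    return order

# Example: Address references Customer; Customer references Customer_Type.
print(load_order(
    ["Address", "Customer", "Customer_Type"],
    [("Address", "Customer"), ("Customer", "Customer_Type")],
))  # ['Customer_Type', 'Customer', 'Address']
```

Any order the function returns satisfies the rule that parent tables load first; cycles are reported rather than silently mis-ordered.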
In most cases, you will schedule these jobs to run on a regular basis.
When you configure your Hub Store, the following types of batch jobs are
automatically created:
• Auto Match and Merge Jobs
• Autolink Jobs
• Automerge Jobs
• BVT Snapshot Jobs
• External Match Jobs
• Generate Match Tokens Jobs
• Load Jobs
• Manual Link Jobs
• Manual Merge Jobs
• Manual Unlink Jobs
• Manual Unmerge Jobs
• Match Jobs
• Match Analyze Jobs
• Migrate Link Style To Merge Style Jobs
• Promote Jobs
• Reset Links Jobs
• Stage Jobs
The following batch jobs are created when you make changes to the match and merge
setup, set properties, or enable trust settings after initial loads:
• Accept Non-Matched Records As Unique
• Key Match Jobs
• Reset Links Jobs
• Reset Match Table Jobs
• Revalidate Jobs (that is, if you enable validation for a column)
• Synchronize Jobs
Note: The Batch Viewer does not provide automated scheduling. For more
information about how to create custom scripts to execute batch jobs and batch
groups, see “About Executing Siperian Hub Batch Jobs” on page 750.
The Hub Console displays the Batch Viewer tool, as shown in the following example.
2. Expand the tree to display the batch job that you want to run, and then click it to
select it.
The Batch Viewer displays a screen for the selected batch job with properties and
command buttons.
Field Description
Identity Identification information for this batch job. Stored in the
C_REPOS_TABLE_OBJECT_V table.
Name Type code for this batch job. For example, Load jobs have
the CMXLD.LOAD_MASTER type code. Stored in the
OBJECT_NAME column of the C_REPOS_TABLE_OBJECT_V table.
Description Description for this batch job in the format:
JobName for | from BaseObjectName
Examples:
• Load from Consumer_Credit_Stg
• Match for Address
This description is stored in the OBJECT_DESC column of the
C_REPOS_TABLE_OBJECT_V table.
Status Status information for this batch job
Current Status Current status of the job. Examples:
• Executing
• Incomplete
• Completed
• Not Executing
• <Batch Job> Successful
• Description of failure
Certain types of batch jobs have additional fields that you can configure before running
the batch job.
After you have selected a batch job, you can click the following command buttons.
Button Description
Executes the selected batch job.
Important: You must have the application server running for the duration of an
executing batch job.
To execute batch jobs in other ways, see “Ways to Execute Batch Jobs” on page 668.
While a batch job is running, you can click Refresh Status to check if the status has
changed.
In very rare circumstances, you might want to change the status of a running job by
clicking Set Status to Incomplete and then execute the job again. Do this only if the
batch job has stopped executing (due to an error, such as a server reboot or crash) but
Siperian Hub has not detected that the job has stopped because of a job application lock
in the metadata. You will know this is the problem if the current status is Executing but
the database, application server, and logs show no activity. If this occurs, click this
button to clear the job application lock so that you can run the batch job again;
otherwise, you will not be able to execute the batch job. Setting the status to
Incomplete simply updates the status of the batch job; it does not abort the job.
Note: This option is available only if your user ID has Siperian Administrator rights.
Each job execution log entry has one of the following status values:
Icon Description
Batch job is currently running.
The Batch Viewer displays a screen for the selected job execution log.
For each job execution log entry, the Batch Viewer displays the following information:
Field Description
Identity Identification information for this batch job. Stored in the
C_REPOS_TABLE_OBJECT_V table.
Name Name of this job execution log: the date/time when the batch job started.
Description Description for this batch job in the format:
JobName for / from BaseObjectName
Examples:
• Load from Consumer_Credit_Stg
• Match for Address
Source system One of the following:
• source system of the processed data
• Admin
Source table Source table of the processed data.
Status Status information for this batch job
Current Status Current status of this batch job. If an error occurred, displays
information about the error. For more information, see “Job Execution
Status” on page 682.
Metrics Metrics for this batch job
[Various] Statistics collected during the execution of the batch job (if applicable).
For more information, see:
• “Auto Match and Merge Metrics” on page 716
• “Automerge Metrics” on page 718
• “Load Job Metrics” on page 731
• “Match Job Metrics” on page 737
• “Match Analyze Job Metrics” on page 739
• “Stage Job Metrics” on page 746
• “Promote Job Metrics” on page 743
Time Timestamp for this batch job
Start Date / time when this batch job started.
Stop Date / time when this batch job ended.
Elapsed time Elapsed time for the execution of this batch job.
For Stage jobs or Load jobs only, if the batch job resulted in records being written to
the rejects table, then the job execution log displays a View Rejects button.
To view the rejected records and the reason why each was rejected:
1. Click the View Rejects button.
2. Click Close.
To copy the current status of a batch to the Windows Clipboard (to paste into a
document or e-mail, for example):
• Click the button.
Note: The actual procedure steps to clear job history will be slightly different
depending on the view (By Table, By Date, or By Procedure Type); the following
procedure assumes you are using the By Table view.
The Batch Viewer does not provide automated scheduling. For more information
about how to create custom scripts to execute batch jobs and batch groups, see
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs.”
For more information about developing custom batch jobs and batch groups that can
be made available in the Batch Group tool, see “Developing Custom Stored
Procedures for Batch Jobs” on page 806.
Note: If you delete an object from the Hub Console (for example, if you delete a
mapping), the Batch Group tool highlights any batch jobs that depend on that object
(for example, a stage job) in red. You must resolve this issue prior to re-executing the
batch group.
Execution Paths
An execution path is the sequence in which batch jobs are executed when the entire batch
group is executed. The execution path begins with the Start node and ends with the
End node. The Batch Group tool does not validate the execution sequence for you—it
is up to you to ensure that the execution sequence is correct. For example, the Batch
Group tool would not notify you of an error if you incorrectly specified the Load job
for a base object ahead of its Stage job, or if you specified the Load job for a
dependent object ahead of the Load job for the base object on which it depends.
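Because the Batch Group tool performs no such validation, it can be worth checking an execution sequence yourself before running it. The following is a minimal sketch of such a check; the job and level structures here are illustrative, not a Siperian Hub API:

```python
def validate_sequence(levels):
    """levels: list of levels; each level is a list of (job_type, base_object).
    Verify that each Load job appears in a later level than the Stage job
    for the same base object."""
    errors = []
    stage_level = {}  # base_object -> index of the level holding its Stage job
    for i, level in enumerate(levels):
        for job_type, obj in level:
            if job_type == "Stage":
                stage_level[obj] = i
            elif job_type == "Load":
                # jobs in the same level run in parallel, so "same level"
                # counts as an ordering error too
                if obj not in stage_level or stage_level[obj] >= i:
                    errors.append(f"Load for {obj} is not after its Stage job")
    return errors

good = [[("Stage", "Consumer")], [("Load", "Consumer")]]
bad = [[("Load", "Consumer")], [("Stage", "Consumer")]]
assert validate_sequence(good) == []
assert validate_sequence(bad) == ["Load for Consumer is not after its Stage job"]
```

A real check might also cover dependent objects, whose Load jobs must follow the Load job of the base object they depend on.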
Levels
In a batch group, the execution path consists of a series of one or more levels that are
executed in sequence (see “Running Batch Jobs in Sequence” on page 670).
[Figure: an execution path in the Batch Group tool, with callouts for the Start node, the levels containing batch jobs, and the End node]
All batch jobs in the level must complete before the batch group proceeds to the next
task in the sequence.
Note: Because all of the batch jobs in a level are executed in parallel, none of the batch
jobs in the same level should have any dependencies. For example, the Stage and Load
jobs for a base object should be in separate levels that are executed in the proper
sequence. For more information, see “Running Batch Jobs in Sequence” on page 670.
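The level semantics described above, with jobs inside a level running in parallel and levels running in sequence, can be sketched as follows. Here run_job is a stand-in for executing a single batch job, not a real Siperian Hub call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job):
    # Stand-in for executing one Siperian Hub batch job.
    return f"{job} done"

def run_group(levels):
    """Execute each level in sequence; jobs within a level run in parallel.
    Every job in a level must finish before the next level starts."""
    results = []
    for level in levels:
        with ThreadPoolExecutor(max_workers=len(level)) as pool:
            # pool.map does not return until every job in the level has
            # completed, so the next level cannot start early.
            results.extend(pool.map(run_job, level))
    return results

out = run_group([["Stage Consumer", "Stage Address"], ["Load Consumer"]])
assert out == ["Stage Consumer done", "Stage Address done", "Load Consumer done"]
```

This is why jobs with dependencies (such as the Stage and Load jobs for one base object) must be placed in separate levels: within a level there is no ordering guarantee at all.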
In addition to using the Batch Group tool, you can execute batch groups in the
following ways:
• Services Integration Framework (SIF) requests—Applications can invoke the
SIF ExecuteBatchGroupRequest request to execute batch groups directly. For
more information, see the Siperian Services Integration Framework Guide.
• Stored procedures—Execute batch groups through stored procedures using any
job scheduling software (such as Tivoli, CA Unicenter, and so on). For more
information, see “Executing Batch Groups Using Stored Procedures” on page 798.
Area Description
Navigation Tree Hierarchical list of batch groups and execution logs.
Properties Pane Properties and command buttons for the selected item.
The Batch Group tool adds a “New Batch Group” to the Batch Group tree.
Batch Group Properties
Field Description
Name Specify a unique, descriptive name for this batch group.
Description Enter a description for this batch group.
As described in “About Batch Groups” on page 688, a batch group contains one or
more levels that are executed in sequence. This section describes how to specify the
execution sequence by configuring the levels in a batch group.
Command Description
Add Level Above Add a level to this batch group above the selected item.
Add Level Below Add a level to this batch group below the selected item.
Move Level Up Move this batch group level above the prior level.
Move Level Down Move this batch group level below the next level.
Remove this Level Remove this batch group level.
The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.
5. Expand the base object(s) for the job(s) that you want to add.
6. To select jobs that you want to execute in parallel, hold down the CTRL key and
click each job that you want to select.
7. Click OK. The Batch Group tool adds the selected job(s) to the batch group.
5. Click Yes.
The Batch Group tool removes the deleted level from the batch group.
4. In the batch groups tree, right click on the level you want to move down, and
choose Move Level Down.
The Batch Group tool moves the level down within the batch group.
In the Batch Group tool, a job is a Siperian Hub batch job. Each level contains one or
more batch jobs. If a level contains multiple batch jobs, then all of those batch jobs are
executed in parallel.
5. Expand the base object(s) for the job(s) that you want to add.
When configuring a batch group, you can configure job options for certain kinds of
batch jobs. For more information about these job options, see “Options to Set Before
Executing Batch Jobs” on page 679.
The Batch Group tool moves the selected job up one level in the batch group.
Important: You must have the application server running for the duration of an
executing batch group.
Note: If you delete an object from the Hub Console (for example, if you delete a
mapping), the Batch Group tool highlights any batch jobs that depend on that object
(for example, a stage job) in red. You must resolve this issue prior to re-executing the
batch group.
The Control & Logs screen is where you can control the execution of a batch group
and view its execution logs.
3. Expand the batch group and click the Control & Logs node.
The Batch Group tool displays the Control & Logs screen for this batch group.
Component Description
Toolbar Command buttons for managing batch group execution.
To learn more, see “Command Buttons for Batch Groups” on
page 703.
Logs for the Batch Group Execution logs for this batch group.
Logs for Batch Jobs Execution logs for individual batch jobs in this batch group.
Button Description
Executes this batch group.
Sets the execution status of a running batch group to
incomplete. To learn more, see “Handling Incomplete Batch
Group Execution” on page 708.
Removes the selected group or job execution log.
For more information, see “Navigating to the Control & Logs Screen” on page
702.
2. Click on the node and then select Batch Group > Execute, or click on the
Execute button.
The Batch Group tool executes the batch group and updates the logs panel with
the status of the batch group execution.
3. Click the Refresh button to see the execution result.
Icon Description
Processing. The batch group is currently running.
Batch group execution completed with additional information. For example, for
Stage and Load jobs, this can indicate that some records were rejected (see “Viewing
Rejected Records” on page 710). For Match jobs, this can indicate that the base
object is empty or that there are no more records to match.
Batch group execution failed. For more information, see “Restarting a Batch Group
That Failed Execution” on page 707.
Batch group execution is incomplete. For more information, see “Handling
Incomplete Batch Group Execution” on page 708.
Batch group execution has been reset to start over. For more information, see
“Restarting a Batch Group That Failed Execution” on page 707.
Each time that it executes a batch group, the Batch Group tool generates a group
execution log entry. Each log entry has the following properties:
Field Description
Status Current status of this batch group execution. If the execution failed,
displays a description of the problem. For more information, see “Group
Execution Status” on page 705.
Start Date / time when the batch group execution started.
End Date / time when the batch group execution ended.
Message Any messages regarding batch group execution.
Each time that it executes a batch job within a batch group, the Batch Group tool
generates a job execution log entry.
Field Description
Job Name Name of this batch job.
Status Current status of this batch job. For more information, see “Job
Execution Status” on page 682.
Start Date / time when this batch job started.
End Date / time when this batch job ended.
Message Any messages regarding batch group execution.
Note: If you want to view the metrics for a completed batch job, you can use the Batch
Viewer. For more information, see “Viewing Job Execution Logs” on page 682.
If batch group execution fails, you can resolve any problems that may have caused
the failure, then restart the batch group from the beginning.
The Batch Group tool changes the status of this batch job to Restart.
3. Resolve any problems that may have caused the failure to occur and execute the
batch group again. For more information, see “Executing a Batch Group” on page
704.
The Batch Group tool executes the batch group and creates a new execution log
entry.
Note: If a batch group fails and you do not click either the Set to Restart button (see
“Restarting a Batch Group That Failed Execution” on page 707) or the Set to
Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708)
in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior
failed level.
In very rare circumstances, you might want to change the status of a running batch
group.
• If the batch group status says it is still executing, you can click Set Status to
Incomplete and execute the batch group again. You do this only if the batch
group has stopped executing (due to an error, such as a server reboot or crash) but
Siperian Hub has not detected that the batch group has stopped due to a job
application lock in the metadata.
You will know this is a problem if the current status is Executing but the
database, application server, and logs show no activity. If this occurs, click this
button to clear the job application lock so that you can run the batch group again;
otherwise, you will not be able to execute the batch group. Setting the status to
Incomplete just updates the status of the batch group (as well as all batch jobs
within the batch group)—it does not terminate processing.
Note that, if the job status is Incomplete, you cannot set the job status to Restart.
• If the job status is Failed, you can click Set to Restart. Note that, if the job status
is Restart, you cannot set the job status to Incomplete.
Changing the status allows you to continue doing something else while the batch group
completes.
3. Execute the batch group again. For more information, see “Executing a Batch
Group” on page 704.
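The status rules above amount to a small state machine; the sketch below models the two manual transitions using the status names shown in the tool:

```python
# Allowed manual status changes, per the rules above:
#   Executing -> Incomplete  (only when the group is actually stuck)
#   Failed    -> Restart
# A status of Incomplete cannot be set to Restart, and a status of
# Restart cannot be set to Incomplete.

ALLOWED = {
    ("Executing", "Incomplete"),
    ("Failed", "Restart"),
}

def can_set(current, target):
    """Return True if the tool permits changing status from current to target."""
    return (current, target) in ALLOWED

assert can_set("Executing", "Incomplete")
assert can_set("Failed", "Restart")
assert not can_set("Incomplete", "Restart")
assert not can_set("Restart", "Incomplete")
```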
If batch group execution resulted in records being written to the rejects table (during
the execution of Stage jobs or Load jobs), then the job execution log enables the View
Rejects button.
3. Click the particular batch group log entry you want to review in the upper half of
the logs panel.
Siperian Hub displays the detailed job execution logs for that batch group in the
lower half of the panel. For additional information, see:
• “Group Execution Status” on page 705
• “Viewing the Group Execution Log for a Batch Group” on page 706
• “Viewing the Job Execution Log for a Batch Job” on page 706
Note: Batch group logs can be deleted by selecting a batch group log and clicking the
Clear Selected button. To delete all logs shown in the panel, click the Clear All
button.
Autolink Jobs
For link-style base objects only, after the Match job has been run, you can run the
Autolink job to automatically link any records that qualified for autolinking during the
match process.
Important: Do not run an Auto Match and Merge job on a base object that is used to
define relationships between records in inter-table or intra-table match paths. Doing so
will change the relationship data, resulting in the loss of the associations between
records. For more information, see “Relationship Base Objects” on page 498.
If you execute an Auto Match and Merge job, it initially completes successfully,
with one job shown in the status. However, if you stop and restart the application
server and return to the Batch Viewer, a few seconds later you see a second job
(listed under Match jobs) with a warning. This second job verifies that either the
base object is empty or there are no more records to match.
After running an Auto Match and Merge job, the Batch Viewer displays the following
metrics (if applicable) in the job execution log:
Metric                      Description
Matched records             Number of records that were matched by the Auto Match and Merge job.
Records tokenized           Number of records that were tokenized prior to the Auto Match and Merge job.
Automerged records          Number of records that were merged by the Auto Match and Merge job.
Accepted as unique records  Number of records that were accepted as unique records by the Auto Match and Merge job. For more information, see “Automerge Jobs” on page 717. Applies only if this base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.
Queued for automerge        Number of records that were queued for automerge by a Match job that was executed by the Auto Match and Merge job. For more information, see “Automerge Jobs” on page 717.
Queued for manual merge     Number of records that were queued for manual merge. Use the Merge Manager in the Hub Console to process these records. For more information, see the Siperian Hub Data Steward Guide.
Automerge Jobs
For merge-style base objects only, after the Match job has been run, you can run the
Automerge job to automatically merge any records that qualified for automerging
during the match process. When an Automerge job is run, it processes all matches in
the MATCH table that are flagged for automerging (Automerge_ind=1).
Note: For state-enabled objects only, records that are PENDING (source and target
records) or DELETED are never automerged. When a record is deleted, it is removed
from the match table and its consolidation_ind is reset to 4. For more information
regarding how to manage the state of base object or XREF records, refer to
“Configuring State Management for Base Objects” on page 211.
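Putting the two rules together (only matches flagged with Automerge_ind=1 are processed, and PENDING or DELETED records are never automerged), the selection of automerge candidates can be sketched as follows. The dict structures are illustrative, not the actual MATCH table schema:

```python
def automerge_candidates(matches, record_state):
    """matches: list of dicts with rowid_object, rowid_object_matched, and
    automerge_ind. record_state: rowid -> 'ACTIVE' | 'PENDING' | 'DELETED'.
    Return the match pairs that qualify for automerging."""
    out = []
    for m in matches:
        if m["automerge_ind"] != 1:
            continue  # not flagged for automerge
        s1 = record_state[m["rowid_object"]]
        s2 = record_state[m["rowid_object_matched"]]
        if "PENDING" in (s1, s2) or "DELETED" in (s1, s2):
            continue  # state-enabled rule: never automerge these records
        out.append(m)
    return out

state = {"1": "ACTIVE", "2": "ACTIVE", "3": "PENDING"}
ms = [
    {"rowid_object": "1", "rowid_object_matched": "2", "automerge_ind": 1},
    {"rowid_object": "1", "rowid_object_matched": "3", "automerge_ind": 1},
    {"rowid_object": "2", "rowid_object_matched": "1", "automerge_ind": 0},
]
assert automerge_candidates(ms, state) == [ms[0]]
```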
Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed
by an Automerge job, until there are no more records to match, or until the maximum
number of records for manual consolidation limit is reached (see “Maximum Matches
for Manual Consolidation” on page 490). For additional information, see “Auto Match
and Merge Jobs” on page 716.
An Automerge job will fail if there is a large number of trust-enabled columns.
The exact number of columns that causes the job to fail varies, depending on the
number of trust-enabled columns and the length of the column names (long column
names are at, or close to, the maximum allowable length of 26 characters). To
avoid this problem, keep the number of trust-enabled columns below 40 and/or keep
the column names short.
Automerge Metrics
After running an Automerge job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log:
Metric                      Description
Automerged records          Number of records that were automerged by the Automerge job.
Accepted as unique records  Number of records that were accepted as unique records by the Automerge job. Applies only if this base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.
Note: For state-enabled base objects only, the BVT logic uses the
HUB_STATE_IND to ignore non-contributing records, that is, records whose
HUB_STATE_IND is 0 or -1 (the PENDING or DELETED state). For the online
BUILD_BVT call, provide the INCLUDE_PENDING_IND parameter:
• If this parameter is 1, the calculation includes ACTIVE and PENDING base
object records.
• If this parameter is 2, the calculation is based on ACTIVE and PENDING XREF
records, providing “what-if ” functionality.
• If this parameter is 3, the calculation is based on ACTIVE XREF records only,
providing the current BVT based on XREFs, which may differ from the result of
the first scenario.
For more information regarding how to manage the state of base object or XREF
records, refer to Chapter 7, “State Management.”
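The effect of the INCLUDE_PENDING_IND values on which records contribute can be modeled as a simple state filter. This sketch only shows the HUB_STATE_IND filtering, not the base object versus XREF distinction, and the record dicts are illustrative rather than the actual call signature:

```python
def bvt_contributors(records, include_pending_ind):
    """Select records that contribute to the BVT calculation.
    hub_state_ind values: 1 = ACTIVE, 0 = PENDING, -1 = DELETED."""
    if include_pending_ind in (1, 2):
        allowed = (1, 0)  # ACTIVE and PENDING contribute
    else:
        allowed = (1,)    # parameter value 3: ACTIVE only
    return [r for r in records if r["hub_state_ind"] in allowed]

recs = [{"hub_state_ind": 1}, {"hub_state_ind": 0}, {"hub_state_ind": -1}]
assert len(bvt_contributors(recs, 2)) == 2  # ACTIVE + PENDING
assert len(bvt_contributors(recs, 3)) == 1  # ACTIVE only
```

Note that DELETED records never contribute, regardless of the parameter value.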
The External Match job executes as a batch job only—there is no corresponding SIF
request that external applications can invoke. For more information, see “Running
External Match Jobs” on page 724.
In addition to the base object and its associated match key table, the External Match
job uses the following input and output tables.
Each base object has an External Match Input (EMI) table for External Match jobs.
This table uses the following naming pattern:
C_BaseObject_EMI
where BaseObject is the name of the base object associated with this External Match job.
When you create a base object, the Schema Manager automatically creates the
associated EMI table, and automatically adds the following system columns:
When populating the EMI table (see “Populating the Input Table” on page 724), at
least one of these columns must contain data. Note that the column names are
non-restrictive—they can contain any identifying data, as long as the composite
three-column primary key is unique.
In addition, when you configure match rules for a particular column (for example,
Person_Name, Address_Part1, or Exact_Cust_ID), the Schema Manager adds that
column automatically to the C_BaseObject_EMI table.
You can view the columns of an external match table in the Schema Manager by
expanding the External Match Table node, as shown in the following example.
The records in the EMI table are analogous to the match batch used in Match jobs. As
described in “Flagging the Match Batch” on page 329, the match batch contains the set
of records that are matched against the rest of records in the base object. The
difference is that, for Match jobs, the match batch records reside in the base object,
while for External Match, these records reside in a separate input table.
Each base object has an External Match Output (EMO) table that contains the output
data for External Match jobs. This table uses the following naming pattern:
C_BaseObject_EMO
where BaseObject is the name of the base object associated with this External Match job.
Before the External Match job is executed, Siperian Hub drops and re-creates this
table.
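The naming patterns for the two external match tables are simple enough to express directly; “Party” below is a hypothetical base object name used only for illustration:

```python
def emi_table(base_object):
    """External Match Input table name for a base object."""
    return f"C_{base_object}_EMI"

def emo_table(base_object):
    """External Match Output table name for a base object."""
    return f"C_{base_object}_EMO"

assert emi_table("Party") == "C_Party_EMI"
assert emo_table("Party") == "C_Party_EMO"
```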
Instead of populating the match table for the base object, the External Match job
populates this EMO table with match pairs. Each row in the EMO represents a pair of
matched records—one from the EMI table and one from the base object:
Before running an External Match job, the EMI table must be populated with records
to match against the records in the base object. The process of loading data into an
EMI table is external to Siperian Hub—you must use a data loading tool that works
with your database platform (such as SQL*Loader).
Important: When you populate this table, you must supply data for at least one of the
system columns (SOURCE_KEY, SOURCE_NAME, and FILE_NAME) to help link
back from the _EMI table. In addition, the C_BaseObject_EMI table must contain flat
records—like the output of a JOIN, with unique source keys and no foreign keys to
other tables.
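Since the loading tool is external to Siperian Hub, these two constraints (at least one populated system column, and a unique composite key) are worth checking before the load. A pre-load validation you might run yourself, sketched here with hypothetical row dicts; Siperian Hub does not provide this function:

```python
def validate_emi_row(row, seen_keys):
    """Check one candidate EMI row before loading. Returns an error
    string, or None if the row is acceptable. seen_keys accumulates
    the composite keys already loaded."""
    key = (row.get("SOURCE_KEY"), row.get("SOURCE_NAME"), row.get("FILE_NAME"))
    if not any(key):
        return "no system column populated"
    if key in seen_keys:
        return "duplicate composite key"
    seen_keys.add(key)
    return None

seen = set()
assert validate_emi_row({"SOURCE_KEY": "K1"}, seen) is None
assert validate_emi_row({}, seen) == "no system column populated"
assert validate_emi_row({"SOURCE_KEY": "K1"}, seen) == "duplicate composite key"
```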
5. Execute the External Match job according to the instructions in “Running Batch
Jobs Manually” on page 677 or “Executing Batch Groups Using the Batch Group
Tool” on page 701.
• The External Match job matches all records in the C_BaseObject_EMI table
against the records in the base object. There is no concept of a consolidation
indicator in the input or output tables.
• The Build Match Group is not run for the results.
6. Inspect the results in the C_BaseObject_EMO table using a data management tool
(external to Siperian Hub).
7. If you want to save the results, make a backup copy of the data before running the
External Match job again.
Note: The C_BaseObject_EMO table is dropped and recreated after every External
Match Job execution.
Note: For state-enabled base objects only, the tokenize batch process skips records
that are in the DELETED state. These records can be tokenized through the Tokenize
API, but will be ignored in batch processing. PENDING records can be matched on a
per base object basis by setting the MATCH_PENDING_IND (default off). For more
information regarding how to manage the state of base object or XREF records, refer
to “Configuring State Management for Base Objects” on page 211.
Before you run a Generate Match Tokens job, you can use the Re-generate All Match
Tokens check box to specify the scope of match token generation.
After the match tokens are generated, you can run the Match job for a base object.
Note: Hub Delete jobs execute as a batch-only stored procedure. You cannot call a
Hub Delete job from the Batch Viewer or Batch Group tools, and there is no
corresponding SIF request that external applications can invoke. For more
information, see “Hub Delete Jobs” on page 769.
Load Jobs
Load jobs move data from a staging table to the corresponding target table (base object
or dependent object) in the Hub Store. Load jobs also calculate trust values for base
objects with defined trusted columns, and they apply validation rules (if defined) to
determine the final trust values. For more information about loading data, including
trust, validation, and delta detection, see “Configuration Tasks for Loading Data” on
page 454.
For state-enabled base objects, the load batch process can load records in any state.
The state is specified as an input column on the staging table. The input state can be
specified in the mapping via a landing table column, or it can be derived. If an input
state is not specified in the mapping, then the state is assumed to be ACTIVE. For
more information regarding how to manage the state of base object or XREF records,
refer to “Configuring State Management for Base Objects” on page 211.
The following table describes how input states affect the states of existing XREFs.
                       Existing XREF State
Incoming
XREF State   ACTIVE            PENDING           DELETED             No XREF           No Base Object
                                                                     (Load by rowid)
ACTIVE       Update            Update + Promote  Update + Restore    Insert            Insert
PENDING      Pending Update    Pending Update    Pending Update +    Pending Update    Pending Insert
                                                 Restore
DELETED      Soft Delete       Hard Delete       Hard Delete         Error             Error
Undefined    Treat as Active   Treat as Pending  Treat as Deleted    Treat as Active   Treat as Active
The following table provides a matrix of how Siperian Hub processes records (for
state-enabled base objects) during Load (and Put) for certain operations based on the
record state:
Additional notes:
• If the incoming state is not specified (for a Load update), then the incoming state
is assumed to be the same as the current state. For example if the incoming state is
null and the existing state of the XREF or base object to update is PENDING,
then the incoming state is assumed to be PENDING instead of null.
• Siperian Hub deletes XREF records using the Hub Delete batch job. The Hub
Delete batch job removes specified data—up to and including an entire source
system—from Siperian Hub based on your base object/XREF input to the
cmxdm.hub_delete_batch stored procedure. For more information, see “Hub
Delete Jobs” on page 769.
For more information regarding how to manage the state of base object or XREF
records, refer to “Configuring State Management for Base Objects” on page 211.
• Run the Load job for a parent base object before you run the Load job for a
dependent object.
• If a lookup on the child object is not defined (the lookup table and column were
not populated), in order to successfully load data, you must repeat the Stage job on
the child object prior to running the Load job.
• Only one Load job at a time can be run for the same base object or dependent
object. Multiple Load jobs for the same base object or dependent object cannot be
run concurrently.
Before you run a Load job, you can use the Force Update check box to configure how
the Load job loads data from the staging table to the target base object or dependent
object. By default, Siperian Hub checks the Last Update Date for each record in the
staging table to ensure that it has not already loaded the record. To override this
behavior, check (select) the Force Update check box, which ignores the Last Update
Date, forces a refresh, and loads each record regardless of whether it might have
already been loaded from the staging table. Use this approach prudently, however.
Depending on the volume of data to load, forcing updates can carry a price in
processing time.
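The Last Update Date check and the Force Update override described above can be sketched as a simple predicate. This is a simplified model of the behavior, not the actual Load job logic; the date values are illustrative:

```python
def should_load(staging_last_update, target_last_update, force_update=False):
    """Decide whether a Load job should process a staging record.
    By default a record is skipped when the target already holds a
    version at least as recent; Force Update loads it regardless."""
    if force_update:
        return True  # ignore Last Update Date entirely
    if target_last_update is None:
        return True  # record has never been loaded
    return staging_last_update > target_last_update

assert should_load(20090115, 20090101) is True
assert should_load(20090101, 20090115) is False
assert should_load(20090101, 20090115, force_update=True) is True
```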
When configuring the advanced properties of a base object in the Schema tool, you can
check (select) the Generate Match Tokens on Load check box to generate match
tokens during Load jobs, after the records have been loaded into the base object. By
default, this check box is unchecked (cleared), and match tokens are generated during
the Match process instead. For more information, see “Editing Base Object
Properties” on page 108 and “Run-time Execution Flow of the Load Process” on page
304.
After running a Load job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log:
Metric Description
Total records Number of records processed by the Load job.
Inserted Number of records inserted by the Load job into the target object.
Updated Number of records updated by the Load job in the target object.
No action Number of records on which no action was taken (the records
already existed in the base object).
Updated XREF Number of records that updated the cross-reference table for this
base object. If you are loading a record during an incremental load,
that record has already been consolidated (exists only in the XREF
and not in the base object).
Records tokenized Number of records tokenized by the Load job. Applies only if the
Generate Match Tokens on Load check box is selected in the Schema
tool. For more information, see “Generating Match Tokens During
Load Jobs” on page 730.
Unmerged source records                        Number of source records that were not merged by the Load job.
Missing Lookup / Invalid rowid_object records  Number of source records that were missing lookup information or had invalid rowid_object records.
In the Schema Manager, you can configure the maximum number of matches ready for
manual consolidation to prevent data stewards from being overwhelmed with
thousands of manual merges for processing. Once this limit is reached, the Match jobs
and the Auto Match and Merge jobs will not run until the number of matches has been
reduced. For more information, see “Maximum Matches for Manual Consolidation” on
page 490.
When you start a Manual Merge job, the Merge Manager displays a dialog with a
progress indicator. A manual merge can take some time to complete. If problems occur
during processing, an error message is displayed on completion. This error also shows
up in the job execution log for the Manual Merge job in the Batch Viewer.
In the Merge Manager, the process dialog includes a button labeled Mark process as
incomplete that updates the status of the Manual Merge job but does not abort the
Manual Merge job. If you click this button, the merge process continues in the
background. At this point, there will be an entry in the Batch Viewer for this process.
When the process completes, the success or failure is reported. For more information
about the Merge Manager, see the Siperian Hub Data Steward Guide.
When you start a Manual Unmerge job, the Data Manager displays a dialog with a
progress indicator. A manual unmerge can take some time to complete, especially when
the record in question is the product of many constituent records. If problems occur
during processing, an error message is displayed on completion. This error also shows
up in the job execution log for the Manual Unmerge in the Batch Viewer.
In the Data Manager, the process dialog includes a button labeled Mark process as
incomplete that updates the status of the Manual Unmerge job but does not abort the
Manual Unmerge job. If you click this button, the unmerge process continues in the
background. At this point, there will be an entry in the Batch Viewer for this process.
When the process completes, the success or failure is reported.
Match Jobs
A Match job generates search keys for a base object, searches through the data for match
candidates (records that are possible matches), applies the match rules to the match
candidates, generates the matches, and then queues the matches for either automatic or
manual consolidation. For an introduction, see “Match Process” on page 317.
When you create a new base object in an ORS, Siperian Hub automatically creates its
Match job. Each Match job compares new or updated records in a base object with all
records in the base object. For a detailed description, see “Run-Time Execution Flow
of the Match Process” on page 329.
After running a Match job, the matched rows are queued for automatic and manual
consolidation. Siperian Hub creates jobs that automatically consolidate the appropriate
records (automerge or autolink). If a record is flagged for manual consolidation
(manual merge or manual link), data stewards must use the Merge Manager to perform
the manual consolidation. For more information about manual consolidation, see the
Siperian Hub Data Steward Guide. For more information about consolidation, see “About
the Consolidate Process” on page 335.
You configure Match jobs in the Match / Merge Setup node in the Schema Manager.
For more information, see “Configuration Tasks for the Match Process” on page 484.
Important: Do not run a Match job on a base object that is used to define
relationships between records in inter-table or intra-table match paths. Doing so will
change the relationship data, resulting in the loss of the associations between records.
For more information, see “Relationship Base Objects” on page 498.
Match Tables
When a Siperian Hub Match job runs for a base object, it populates its match table.
Match tables are usually named as Base_Object_MTCH. For more information, see
“Populating the Match Table with Match Pairs” on page 330.
The following table describes the details of the match batch process behavior given the
incoming states for state-enabled base objects:
Note: The Build Match Group (BMG) process does not build groups that contain
PENDING records; PENDING records are left as individual matches, and PENDING
matches have automerge_ind=2. For more information regarding how to manage the
state of base object or XREF records, refer to “Configuring State Management for
Base Objects” on page 211.
For merge-style base objects only, you can run the Auto Match and Merge job for a
base object. Auto Match and Merge batch jobs execute a continual cycle of a Match
job, followed by an Automerge job, until there are no more records to match, or until
the maximum number of records for manual consolidation limit is reached (see
“Maximum Matches for Manual Consolidation” on page 490). For more information,
see “Auto Match and Merge Jobs” on page 716.
The Match job for a base object does not attempt to match every record in the base
object against every other record in the base object. Instead, you specify (in the Schema
tool):
• how many records the job should match each time it runs. For more information,
see “Number of Rows per Match Job Batch Cycle” on page 491.
• how many matches are allowed for manual consolidation.
This limit helps prevent data stewards from being overwhelmed with manual
merges to process. Once this limit is reached, the Match job will not run until
the number of matches ready for manual consolidation has been reduced. For
more information, see “Maximum Matches for Manual Consolidation” on page
490.
Before executing a Match job, you can select the match rule set to use for evaluating
matches.
The default match rule set for this base object is automatically selected. To choose any
other match rule set, click the drop-down list and select any other match rule set that
has been defined for this base object. For more information, see “Configuring Match
Rule Sets” on page 531.
After running a Match job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log:
Metric Description
Matched records Number of records that were matched by the Match job.
Records tokenized Number of records that were tokenized by the Match job.
Queued for automerge Number of records that were queued for automerge by the Match job. Use the Automerge job to process these records. For more information, see “Automerge Jobs” on page 717.
Queued for manual merge Number of records that were queued for manual merge by the Match job. Use the Merge Manager in the Hub Console to process these records. For more information, see the Siperian Hub Data Steward Guide.
Each Match Analyze job is dependent on new or updated records in the base object
that have been tokenized and are thus queued for matching. For base objects that have
inter-table match enabled, the Match Analyze job is also dependent on the successful
completion of the data tokenization jobs for all child tables, which in turn are
dependent on successful Load jobs for the child tables.
You can limit the number of records that the Match Analyze job moves to the on-hold
status. By default, no limit is set. To configure a limit, edit the cmxcleanse.properties
file and add the following setting:
cmx.server.match.threshold_to_move_range_to_hold = n
where n is the maximum number of records that the Match Analyze job can move to
the on-hold status. For more information about the cmxcleanse.properties file, see the
Siperian Hub Installation Guide for your platform.
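For example, to cap the number of on-hold records at one million (the value shown is illustrative; choose a limit appropriate to your data volumes), add the following line to cmxcleanse.properties:

```
cmx.server.match.threshold_to_move_range_to_hold = 1000000
```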
After running a Match Analyze job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log.
Metric Description
Records moved to Hold Status Number of records moved to the on-hold status.
Records analyzed (to be matched) Number of records analyzed for matching.
Match comparisons required Number of match comparisons that would be required to process this base object.
Statistics
Statistic Description
Top 10 range count Top ten counts of records in a given search range.
Top 10 range comparison count Top ten counts of match comparisons that need to be performed for a given search range.
Total records moved to hold Count of the records moved to the on-hold status.
Total matches moved to hold Total number of match comparisons required by the records moved to the on-hold status.
Total ranges processed Number of ranges required to process all the matches in the base object.
Total candidates Total number of match candidates required to process all matches for this base object.
Time for analyze Amount of time required to run the analysis.
Note: The Match for Duplicate Data job does not display in the Batch Viewer when
the duplicate match threshold is set to 1 and non-equal matches are enabled on the
base object.
2. Once the Match for Duplicate Data job is complete, run the Automerge job to
process the duplicates found by the Match for Duplicate Data job.
3. Once the Automerge job is complete, run the regular match and merge process
(Match job and then Automerge job, or the Auto Match and Merge job).
Promote Jobs
For state-enabled objects, the Promote job reads the PROMOTE_IND column from
an XREF table and changes the system state to ACTIVE for all rows where the
column’s value is 1. Siperian Hub resets PROMOTE_IND after the Promote job has
run.
Here are the behavior details for the Promote batch job:
You can run the Promote job using the following methods:
• Using the Hub Console; for more information, see “Running Promote Jobs Using
the Hub Console”.
• Using the CMXSM.AUTO_PROMOTE stored procedure; for more information,
see “Promote Jobs” on page 790.
• Using the Services Integration Framework (SIF) API (and the associated
SiperianClient Javadoc); for more information, see the Siperian Services Integration
Framework Guide.
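As a sketch of how records become eligible for promotion: the Promote job acts on XREF rows whose PROMOTE_IND column is 1. In practice, records are usually flagged through the Hub tools or SIF requests; the direct SQL below (with an illustrative table name and key) is shown only to make the column the job reads concrete.

```sql
-- Flag one XREF record for promotion.
-- C_CUSTOMER_XREF and the ROWID_XREF value are illustrative.
UPDATE C_CUSTOMER_XREF
   SET PROMOTE_IND = 1
 WHERE ROWID_XREF = 'SVR1.1A2B ';
COMMIT;
-- The next Promote job changes the system state of flagged rows
-- to ACTIVE and then resets PROMOTE_IND.
```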
After running a Promote job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log.
Once the Promote job has run, you can view these statistics on the job summary page
in the Batch Viewer.
Recalculate BO Jobs
There are two versions of Recalculate BO:
• Using the ROWID_OBJECT_TABLE Parameter—Recalculates all base
objects identified by the ROWID_OBJECT column in the supplied table or inline
view (note that brackets are required around an inline view).
• Without the ROWID_OBJECT_TABLE Parameter—Recalculates all records
in the base object, in batches of MATCH_BATCH_SIZE or one quarter of the
number of records in the table, whichever is less.
If you change your match rules after matching, you are prompted to reset your
matches. When you reset matches, everything in the match table is deleted. In addition,
the Reset Match Table job sets CONSOLIDATION_IND to 4 for all records where it is
currently 2. To learn more, see “About the Consolidate Process” on page 335.
When you save changes to the schema match columns, the following message box is
displayed.
Click Yes to reset the existing matches and create a Reset Match Table job in the Batch
Viewer.
Note: If you do not reset the existing matches, your next Match job will take longer to
execute because Siperian Hub will need to regenerate the match tokens before running
the Match job.
Revalidate Jobs
Revalidate jobs execute the validation logic/rules for records that have been modified
since the initial validation during the Load Process. You can run Revalidate if/when
records change post the initial Load process’s validation step. If no records change, no
records are updated. If some records have changed and get caught by the existing
validation rules, the metrics will show the results.
Note: Revalidate jobs can only be run if validation is enabled on a column after an
initial load and prior to merge on base objects that have validate rules setup.
Revalidate is executed manually using the batch viewer for base objects. For more
information, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.
Stage Jobs
Stage jobs move data from a landing table to a staging table, performing any cleansing
that has been configured in the Siperian Hub mapping between the tables (see
“Mapping Columns Between Landing and Staging Tables” on page 380). Stage jobs
have parallel cleanse jobs that you can run (see “About Data Cleansing in Siperian
Hub” on page 406). The stage status indicates which Cleanse Match Server is used
during a stage. For more information about staging data, see “Configuration Tasks for
the Stage Process” on page 364.
For state-enabled base objects, records are rejected if the HUB_STATE_IND value is
not valid. For more information regarding how to manage the state of base object or
XREF records, refer to “About State Management in Siperian Hub” on page 206.
Note: If the Stage job is grayed out, then the mapping has become invalid due to
changes in the staging table, in a column mapping, or in a cleanse function. Open the
specific mapping using the Mappings tool, verify it, and then save it. For more
information, see “Mapping Columns Between Landing and Staging Tables” on page
380.
After running a Stage job, the Batch Viewer displays the following metrics in the job
execution log:
Metric Description
Total records Number of records processed by the Stage job.
Inserted Number of records inserted by the Stage job into the target object.
Rejected Number of records rejected by the Stage job. For more information,
see “Viewing Rejected Records” on page 685.
Synchronize Jobs
You must run the Synchronize job after any changes are made to the schema trust
settings. The Synchronize job is created when any changes are made to the schema
trust settings, as described in “Batch Jobs That Are Created When Changes Occur” on
page 673. For more information, see “Configuring Trust for Source Systems” on page
455.
When you save changes to schema column trust settings in the Systems and Trust tool,
the following message box is displayed.
To run the Synchronize job, navigate to the Batch Viewer, find the correct Synchronize
job for the base object, and run it. Siperian Hub updates the metadata for the base
objects that have trust enabled after initial load has occurred.
The Synchronize job can fail if a base object has a large number of trust-enabled
columns with long column names, that is, names at or close to the maximum allowable
length of 26 characters. To avoid this problem, keep the number of trust-enabled
columns below 48 and/or keep the column names short. A workaround is to enable
all trust/validation columns before saving the base object to avoid running the
Synchronize job.
This chapter explains how to create custom scripts to execute batch jobs and batch
groups in a Siperian Hub implementation. The information in this chapter is intended
for implementation teams and system administrators. For information how to
configure and execute Siperian Hub batch jobs using the Batch Viewer and Batch
Group tools in the Hub Console, see “About Siperian Hub Batch Jobs” on page 668.
Important: You must have the application server running for the duration of a batch
job.
Chapter Contents
• About Executing Siperian Hub Batch Jobs
• Setting Up Job Execution Scripts
• Monitoring Job Results and Statistics
• Stored Procedure Reference
• Executing Batch Groups Using Stored Procedures
• Developing Custom Stored Procedures for Batch Jobs
749
About Executing Siperian Hub Batch Jobs
In the Hub Console, the Siperian Hub Batch Viewer and Batch Group tools provide
simple mechanisms for executing Siperian Hub batch jobs. However, they do not
provide a means for executing and managing jobs on a scheduled basis. To execute and
manage jobs according to a schedule, you need to execute stored procedures that do
the work of batch jobs or batch groups. Most organizations have job management
tools that are used to control IT processes. Any such tool capable of executing Oracle
PL*SQL or DB2 SQL commands can be used to schedule and manage Siperian Hub
batch jobs.
Error-handling code in job execution scripts can check return codes and trap any
associated error messages.
Note: All the input parameters that need a delimited list require a trailing “~”
character.
SELECT ROWID_TABLE
INTO V_ROWID_TABLE
FROM C_REPOS_TABLE
WHERE TABLE_NAME = 'C_CUSTOMER';
SELECT ROWID_USER
INTO V_ROWID_USER
FROM C_REPOS_USER
WHERE USER_NAME = 'ADMIN';
Autolink Jobs
Autolink jobs automatically link records that have qualified for autolinking during the
match process and are flagged for autolinking (Autolink_ind = 1).
Important: Do not run an Auto Match and Merge job on a base object that is used to
define relationships between records in inter-table or intra-table match paths. Doing so
will change the relationship data, resulting in the loss of the associations between
records. For more information, see “Relationship Base Objects” on page 498.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
The Auto Match and Merge jobs for a target base object can either be run on
successful completion of each Load job, or on successful completion of all Load jobs
for the object.
Auto Match and Merge jobs must complete with a RUN_STATUS of 0 (Completed
Successfully) or 1 (Completed with Errors) to be considered successful.
Sample Job Execution Script for Auto Match and Merge Jobs
DECLARE
IN_ROWID_TABLE CHAR(14);
IN_USER_NAME VARCHAR2(200);
IN_MATCH_SET_NAME VARCHAR2(200);
OUT_ERROR_MSG VARCHAR2(2000);
OUT_RETURN_CODE NUMBER;
BEGIN
IN_ROWID_TABLE := 'SVR1.188';
IN_USER_NAME := 'CMX_ORS';
IN_MATCH_SET_NAME := 'MRS2';
OUT_ERROR_MSG := NULL;
OUT_RETURN_CODE := NULL;
CMXMA.MATCH_AND_MERGE ( IN_ROWID_TABLE, IN_USER_NAME,
IN_MATCH_SET_NAME, OUT_ERROR_MSG, OUT_RETURN_CODE );
DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
COMMIT;
END;
Automerge Jobs
Automerge jobs automatically merge records that have qualified for automerging
during the match process and are flagged for automerging (Automerge_ind = 1).
Automerge jobs are used with merge-style base objects only. For more information, see
“Automerge Jobs” on page 717.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Each Automerge job is dependent on the successful completion of the match process,
and the queuing of records for automerge.
BEGIN
IN_ROWID_TABLE := NULL;
IN_USER_NAME := NULL;
OUT_ERROR_MESSAGE := NULL;
OUT_RETURN_CODE := NULL;
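The fragment above can be completed into a runnable script along the same lines as the Auto Match and Merge sample. The procedure name CMXMA.AUTOMERGE and its parameter order are assumed here; confirm them against the stored procedure reference for your release before using this sketch.

```sql
DECLARE
   IN_ROWID_TABLE    CHAR(14);
   IN_USER_NAME      VARCHAR2(200);
   OUT_ERROR_MESSAGE VARCHAR2(2000);
   OUT_RETURN_CODE   NUMBER;
BEGIN
   IN_ROWID_TABLE    := 'SVR1.188';   -- ROWID_TABLE of the base object (illustrative)
   IN_USER_NAME      := 'CMX_ORS';
   OUT_ERROR_MESSAGE := NULL;
   OUT_RETURN_CODE   := NULL;
   -- Procedure name assumed; check the Stored Procedure Reference.
   CMXMA.AUTOMERGE(IN_ROWID_TABLE, IN_USER_NAME,
                   OUT_ERROR_MESSAGE, OUT_RETURN_CODE);
   DBMS_OUTPUT.PUT_LINE('OUT_ERROR_MESSAGE = ' || OUT_ERROR_MESSAGE);
   DBMS_OUTPUT.PUT_LINE('RC = ' || TO_CHAR(OUT_RETURN_CODE));
   COMMIT;
END;
```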
For more information, see “Stored Procedures for Batch Groups” on page 799.
Note: The External Batch job executes as a batch job only—there is no corresponding
SIF request that external applications can invoke.
Schedule Generate Match Tokens jobs if you run the load process without data
tokenization, or if the match process failed during tokenization. The Generate Match
Tokens job generates the match tokens for the entire base object (when
IN_FULL_RESTRIP_IND is set to 1).
Note: Check (select) the Re-generate All Match Tokens check box in the Batch Viewer
to populate the IN_FULL_RESTRIP_IND parameter.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Each Generate Match Tokens job is dependent on the successful completion of the
Load job responsible for loading data into the base object.
For more information, see “Stored Procedures for Batch Groups” on page 799.
Although the Hub Delete job deletes the XREF record, a pointer to the deleted record
(actually to the parent base object of the XREF) can remain in the _HMXR table (in
the ORIG_TGT_ROWID_OBJECT column). The Match Tree tool displays
REMOVED (ID#: xxxx) for the removed records.
Important:
• The Hub Delete batch job will not delete the data if there are records queued for
an Automerge job.
• Do not run a Hub Delete job when there are automerge records in the match
table. Run the Hub Delete job after the automerge matches are processed.
Cascade Delete
The Hub Delete job performs a cascade delete if you set the parameter IN_ALLOW_
CASCADE_DELETE_IND=1 for a base object in the stored procedure. With
cascade delete, when records in the parent object are deleted, Hub Delete also removes
the affected records in the child base object. Hub Delete checks each child BO table
for related data that should be deleted given the removal of the parent BO record.
Important: A cascade delete may remove XREF records that came from other source
systems. To ensure that Hub Delete does not delete XREF records from other
systems, do not use cascade delete. IN_ALLOW_CASCADE_DELETE_IND forces
Hub Delete to delete the child base objects and cross-references (regardless of system)
when the parent base object is deleted.
Notes:
• If you do not set IN_ALLOW_CASCADE_DELETE_IND=1 and child base
objects reference the deleted base object records, Siperian Hub generates an error
message, Hub Delete fails, and Siperian Hub performs a rollback operation for the
associated data.
• IN_CASCADE_CHILD_SYSTEM_XREF=1 is not supported in XU SP1. If you
want to cascade deletes selectively to child records, perform the child deletes first,
and then perform the parent deletes with the cascade delete feature disabled.
Note: Siperian Hub sets HUB_STATE_IND to -9 in the HXRF table when XREFs are
deleted, and in the HIST table when the base object record is deleted.
The Hub Delete job also removes records on hold, that is, records whose
CONSOLIDATION_IND column is set to 9.
Parameters
Parameter Description
IN_BO_TABLE_NAME Name of the table that contains the list of base objects to delete.
IN_XREF_LIST_TO_BE_DELETED Name of the table that contains the list of XREFs to delete.
IN_RECALCULATE_BVT_IND If set to one (1), recalculates the BVT following the BO and/or XREF delete.
IN_ALLOW_CASCADE_DELETE_IND If set to one (1), specifies that when records in the parent object are deleted, Hub Delete also removes the affected records in the child base object. Hub Delete checks each child BO table for related data that should be deleted given the removal of the parent BO record.
IN_CASCADE_CHILD_SYSTEM_XREF Not supported in XU SP1. Leave the value for this parameter as the default (0) when executing the procedure.
IN_OVERRIDE_HISTORY_IND If set to one (1), Hub Delete does not write to history tables when deleting. If you set IN_OVERRIDE_HISTORY_IND=1 and IN_PURGE_HISTORY_IND=1, then Hub Delete removes the history records to delete all traces of the data.
IN_PURGE_HISTORY_IND If set to one (1), and IN_OVERRIDE_HISTORY_IND is also set to one (1), Hub Delete removes the history records to delete all traces of the data.
Returns
Parameter Description
OUT_DELETED_XREF_COUNT Number of deleted XREFs.
OUT_DELETED_BO_COUNT Number of deleted BOs.
OUT_ERROR_MSG Error message text.
OUT_RETURN_CODE Error code. If zero (0), the stored procedure completed successfully; the procedure returns a non-zero value in case of an error.
IN_OVERRIDE_HISTORY_IND := 0;
IN_PURGE_HISTORY_IND := 0;
IN_USER_NAME := 'ADMIN';
IN_ALLOW_COMMIT_IND := 0;
DELETE TMP_DELETE_KEYS;
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Key Match jobs are dependent on the successful completion of the Load job
responsible for loading data into the base object. The Key Match job cannot have been
run after any changes were made to the data.
Load Jobs
Load jobs move data from staging tables to the final target objects, and apply any trust
and validation rules where appropriate. For more information about Load jobs and the
load process, see “Load Jobs” on page 727.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Each Load job is dependent on the success of the Stage job that precedes it.
In addition, each Load job is governed by the demands of referential integrity
constraints and is dependent on the successful completion of all other Load jobs
responsible for populating tables referenced by the table that is the target of the load.
For Run
Base Objects Run the loads for parent tables before the loads for child tables.
Dependent Objects Run the loads for all referenced base objects before the load for the dependent object.
Cascade Unmerge
The Unmerge job performs a cascade unmerge if this feature is enabled for this base
object in the Schema Manager in the Hub Console. With cascade unmerge, when
records in the parent object are unmerged, Siperian Hub also unmerges affected
records in the child base object.
This feature applies to unmerging records across base objects. This is configured per
base object (using the Unmerge Child When Parent Unmerges check box on the
Merge Settings tab in the Schema Manager). Cascade unmerge applies only when a
foreign-key relationship exists between two base objects.
For example: Customer A record (parent) in the Customer base object has multiple
address records (children) in the Address base object. The two tables are linked by a
foreign key (Customer_ID).
• When cascade unmerge is enabled—Unmerging the parent record (Customer
A) in the Customer base object also unmerges Customer A's child address records
in the Address base object.
• When cascade unmerge is disabled—Unmerging the parent record (Customer
A) in the Customer base object has no effect on Customer A's child records in the
Address base object; they are NOT unmerged.
In your job execution script, you can specify the scope of records to unmerge by
setting IN_UNMERGE_ALL_XREFS_IND.
• IN_UNMERGE_ALL_XREFS_IND=0: Default setting. Unmerges the single
record identified in the specified XREF to its state prior to the merge.
• IN_UNMERGE_ALL_XREFS_IND=1: Unmerges all XREFs to their state prior
to the merge. Use this option to quickly unmerge all XREFs for a single
consolidated record in a single operation.
These features apply to unmerging contributing records from within a single base
object. There is a hierarchy of merges consisting of a root (top of the tree, or BVT),
branches (merged records), and leaves (the original contributing records at end of the
branches). This hierarchy can be many levels deep.
In your job execution script, you can specify the type of unmerge (linear or tree
unmerge) by setting IN_TREE_UNMERGE_IND:
• IN_TREE_UNMERGE_IND=0: Default setting. Linear Unmerge
• IN_TREE_UNMERGE_IND=1: Tree Unmerge
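In a job execution script, these two indicators are ordinary PL/SQL variables set before the unmerge procedure is called. The parameter names come from the list above; the values shown are illustrative:

```sql
-- Unmerge all XREFs of the consolidated record in a single operation,
-- using the default linear unmerge behavior.
IN_UNMERGE_ALL_XREFS_IND := 1;
IN_TREE_UNMERGE_IND      := 0;
```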
Linear Unmerge
Linear unmerge is the default behavior. During a linear unmerge, a base object record
is unmerged and taken out of the existing merge tree structure. Only the unmerged
base object record itself comes out of the merge tree structure; all base object records
below it in the merge tree stay in the original merge tree.
Tree Unmerge
• The HMRG table provides a hierarchical view of the merge history (a tree of
merged base object records), as well as an interactive unmerge history.
During a tree unmerge, you unmerge a tree of merged base object records as an intact
sub-structure. A sub-tree with the unmerged base object record as its root comes out
of the original merge tree structure. For example, merge a1 and a2 into a, then merge
b1 and b2 into b, and finally merge a and b into c. If you then perform a tree unmerge
on a, the sub-tree consisting of a, a1, and a2 comes out of the original tree c, with a as
the root of the unmerged sub-tree.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Each Manual Unmerge job is dependent on data having already been merged.
Match Jobs
Match jobs find duplicate records in the base object, based on the current match rules.
For more information about Match jobs and the match process, see “Match Jobs” on
page 734.
Important: Do not run a Match job on a base object that is used to define
relationships between records in inter-table or intra-table match paths. Doing so will
change the relationship data, resulting in the loss of the associations between records.
For more information, see “Relationship Base Objects” on page 498.
For a complete list of the identifiers used to execute the stored procedure associated
with this batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on
page 753.
Each Match job is dependent on new / updated records in the base object that have
been tokenized and are thus queued for matching. For parent base objects that have
children, the Match job is also dependent on the successful completion of the data
tokenization jobs for all child tables, which in turn is dependent on successful Load
jobs for the child tables.
For a complete list of the identifiers used to execute the stored procedure associated
with this batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on
page 753.
Each Match Analyze job is dependent on new / updated records in the BO that have
been tokenized and are thus queued for matching. For parent BOs, the Match Analyze
job is also dependent on the successful completion of the data tokenization jobs for all
child tables, which in turn is dependent on successful Load jobs for the child tables.
BEGIN
IN_ROWID_TABLE := NULL;
IN_USER_NAME := NULL;
OUT_ERROR_MSG := NULL;
OUT_RETURN_CODE := NULL;
IN_VALIDATE_TABLE_NAME := NULL;
IN_MATCH_ANALYZE_IND := 1;
Note: The Match for Duplicate Data batch job has been deprecated.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Match for Duplicate Data jobs require the existence of unconsolidated data in the BO.
Match for Duplicate Data jobs must complete with a RUN_STATUS of 0 (Completed
Successfully).
Sample Job Execution Script for Match for Duplicate Data Jobs
DECLARE
IN_ROWID_TABLE CHAR(14);
IN_USER_NAME VARCHAR2(200);
OUT_ERROR_MSG VARCHAR2(2000);
OUT_RETURN_CODE NUMBER;
BEGIN
IN_ROWID_TABLE := NULL;
IN_USER_NAME := NULL;
OUT_ERROR_MSG := NULL;
OUT_RETURN_CODE := NULL;
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Each Multi Merge job is dependent on the successful completion of the match process
for this base object.
BEGIN
IN_ROWID_TABLE := 'SVR1.CP4 ';
IN_SURVIVING_ROWID := '40 ';
IN_MEMBER_ROWID_LIST := '42 ~44 ~45
~47 ~48 ~49 ~';
IN_ROWID_MATCH_RULE := NULL;
IN_COL_LIST := 'SVR1.CSB ~SVR1.CSE ~SVR1.CSG
~SVR1.CSH ~SVR1.CSA ~';
IN_VAL_LIST := 'INDU~THOMAS~11111111111~F~1000~';
IN_INTERACTION_ID := 0;
IN_USER_NAME := 'INDU';
OUT_ERROR_MESSAGE := NULL;
OUT_RETURN_CODE := NULL;
Promote Jobs
For state-enabled objects, a Promote job reads the PROMOTE_IND column from an
XREF table and changes the system state to ACTIVE for all rows where the column’s
value is 1. Siperian Hub resets PROMOTE_IND after the Promote job has run. For
more information regarding how to manage the state of base object or XREF records,
refer to “About State Management in Siperian Hub” on page 206.
Recalculate BO Jobs
There are two versions of Recalculate BO:
• Using the ROWID_OBJECT_TABLE Parameter—Recalculates all BOs
identified by the ROWID_OBJECT column in the supplied table or inline view
(note that brackets are required around an inline view).
• Without the ROWID_OBJECT_TABLE Parameter—Recalculates all records
in the BO, in batches of MATCH_BATCH_SIZE or one quarter of the number of
records in the table, whichever is less.
For more information, see “Stored Procedures for Batch Groups” on page 799.
Note: This job cannot be run from the Batch Viewer. For more information, see
“Reset Match Table Jobs” on page 744.
Revalidate Jobs
Revalidate jobs execute the validation logic and rules for records that have been
modified since the initial validation performed during the load process. If no records
have changed, no records are updated. If records have changed and are caught by the
existing validation rules, the metrics show the results. Revalidate is executed manually
using the Batch Viewer for base objects. For more information, see “Running Batch
Jobs Using the Batch Viewer Tool” on page 674.
Note: Revalidate can be run only after an initial load and prior to merge on base
objects that have validation rules set up.
CMXUT.REVALIDATE_BO(IN_TABLE_NAME, IN_TABLE_NAME,
OUT_ERROR_MESSAGE, RC);
DBMS_OUTPUT.PUT_LINE('OUT_ERROR_MESSAGE = ' || SUBSTR(OUT_ERROR_MESSAGE,1,200));
DBMS_OUTPUT.PUT_LINE('RC = ' || TO_CHAR(RC));
COMMIT;
END;
Stage Jobs
Stage jobs copy records from a landing table to a staging table. During execution, Stage jobs
optionally cleanse data according to the current cleanse settings. For more information
about Stage jobs and the stage process, see “Stage Jobs” on page 745.
To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.
Each Stage job is dependent on the successful completion of the Extract-Transform-Load
(ETL) process responsible for loading the landing table used by the Stage job.
There are no dependencies between Stage jobs.
IN_STG_ROWID_TABLE VARCHAR2(200);
IN_ROWID_TABLE_OBJECT VARCHAR2(200);
IN_RUN_SYNCH VARCHAR2(200);
OUT_ERROR_MSG VARCHAR2(2000);
OUT_ERROR_CODE NUMBER;
BEGIN
IN_STG_ROWID_TABLE := NULL;
IN_ROWID_TABLE_OBJECT := NULL;
IN_RUN_SYNCH := NULL;
OUT_ERROR_MSG := NULL;
OUT_ERROR_CODE := NULL;
Synchronize Jobs
You must run the Synchronize job after any changes are made to the schema trust
settings. The Synchronize job is created when any changes are made to the schema
trust settings, as described in “Batch Jobs That Are Created When Changes Occur” on
page 673. For more information, see “Configuring Trust for Source Systems” on page
455.
To run the Synchronize job, navigate to the Batch Viewer, find the correct Synchronize
job for the base object, and run it. Siperian Hub updates the metadata for the base
objects that have trust enabled after initial load has occurred. For more information,
see “Synchronize Jobs” on page 747.
This section describes how to execute batch groups using stored procedures and job
scheduling software (such as Tivoli, CA Unicenter, and so on). Siperian Hub provides
stored procedures for managing batch groups, as described in “Stored Procedures for
Batch Groups” on page 799. Siperian Hub also allows you to create and run custom
stored procedures for batch groups, as described in “Developing Custom Stored
Procedures for Batch Jobs” on page 806. You can also create and run stored
procedures using the SIF API (using Java, SOAP, or HTTP/XML).
You can also use the Batch Group tool in the Hub Console to configure and run batch
groups. However, to schedule batch groups, you need to do so using stored procedures,
as described in this section. For more information about the Batch Group tool, see
“Running Batch Jobs Using the Batch Group Tool” on page 688.
Note: If a batch group fails and you do not click either the Set to Restart button (see
“Restarting a Batch Group That Failed Execution” on page 707) or the Set to
Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708)
in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior
failed level.
In addition to using parameters that are associated with the corresponding SIF request,
these stored procedures require the following parameters:
• URL of the Hub Server (for example, http://localhost:7001/cmx/request)
• username and password
• target ORS
Note: These stored procedures construct an XML message, perform an HTTP POST
to a server URL using SIF, and return the results.
CMXBG.EXECUTE_BATCHGROUP
Execute Batch Group jobs execute a batch group. These jobs can optionally execute
asynchronously, but asynchronous execution does not return a JMS response. If you
need asynchronous execution and need to know when execution has finished, poll with
the cmxbg.get_batchgroup_status stored procedure. Alternatively, if you need to
receive a JMS response for asynchronous execution, execute the batch group directly
from an external application (instead of a stored procedure).
Signature
FUNCTION CMXBG.EXECUTE_BATCHGROUP(
IN_MRM_SERVER_URL IN VARCHAR2(500)
, IN_USERNAME IN VARCHAR2(500)
, IN_PASSWORD IN VARCHAR2(500)
, IN_ORSID IN VARCHAR2(500)
, IN_BATCHGROUP_UID IN VARCHAR2(500)
, IN_RESUME IN VARCHAR2(500)
, IN_ASYNCRONOUS IN VARCHAR2(500)
, OUT_ROWID_BATCHGROUP_LOG OUT VARCHAR2(500)
, OUT_ERROR_MSG OUT VARCHAR2(500)
) RETURN NUMBER --Return the error code
Parameters
Name Description
IN_MRM_SERVER_URL Hub Server SIF URL.
IN_USERNAME User account with role-based permissions to execute batch groups.
IN_PASSWORD Password for the user account with role-based permissions to
execute batch groups.
IN_ORSID ORS ID as shown in Console > Configuration > Databases.
To learn more, see “Configuring Operational Record Stores” on
page 62.
IN_BATCHGROUP_UID Siperian Object UID of the batch group to execute.
IN_RESUME One of the following values:
• true: if previous execution failed, resume at that point
• false: regardless of previous execution, start from the beginning
IN_ASYNCRONOUS Specifies whether to execute asynchronously or synchronously. One
of the following values:
• true: start execution and return immediately (asynchronous
execution).
• false: return when group execution is complete (synchronous
execution).
Returns
Parameter Description
OUT_ROWID_BATCHGROUP_LOG c_repos_job_group_control.rowid_job_group_control
OUT_ERROR_MSG Error message text.
NUMBER Error code. If zero (0), then the stored procedure completed
successfully. If one (1), then the stored procedure returns an
explanation in out_error_msg.
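For reference, a minimal synchronous call to this procedure follows the same pattern as the sample job execution script for Get Batch Group Status jobs later in this section. The server URL, credentials, ORS ID, and batch group UID below are example values only; substitute the values for your environment.

```sql
DECLARE
   OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
   OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
   RET_VAL INT;
BEGIN
   RET_VAL := CMXBG.EXECUTE_BATCHGROUP(
      'http://localhost:7001/cmx/request/process/' -- Hub Server SIF URL
      , 'admin'                                    -- user name
      , 'admin'                                    -- password
      , 'LOCALHOST-MRM-XU_3009'                    -- target ORS ID
      , 'BATCH_GROUP.MYBATCHGROUP'                 -- batch group UID
      , 'false'  -- IN_RESUME: start from the beginning
      , 'false'  -- IN_ASYNCRONOUS: return when execution is complete
      , OUT_ROWID_BATCHGROUP_LOG
      , OUT_ERROR_MSG
   );
   CMXLB.DEBUG_PRINT('EXECUTE_BATCHGROUP: CODE=' || RET_VAL
      || ' MESSAGE=' || OUT_ERROR_MSG
      || ' LOG=' || OUT_ROWID_BATCHGROUP_LOG);
END;
/
```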
CMXBG.RESET_BATCHGROUP
Reset Batch Group jobs reset the status of a batch group.
Note: In addition to this stored procedure, equivalent functionality is available through
the Services Integration Framework (SIF) as a Java API request and through the SOAP
and HTTP XML protocols. The Reset Batch Group job is available as the
ResetBatchGroup SIF API request. For more information about this SIF API request,
see the Siperian Services Integration Framework Guide.
Signature
FUNCTION CMXBG.RESET_BATCHGROUP(
IN_MRM_SERVER_URL IN VARCHAR2(500)
, IN_USERNAME IN VARCHAR2(500)
, IN_PASSWORD IN VARCHAR2(500)
, IN_ORSID IN VARCHAR2(500)
, IN_BATCHGROUP_UID IN VARCHAR2(500)
, OUT_ROWID_BATCHGROUP_LOG OUT VARCHAR2(500)
, OUT_ERROR_MSG OUT VARCHAR2(500)
) RETURN NUMBER --Return the error code
Parameters
Name Description
IN_MRM_SERVER_URL Hub Server SIF URL.
IN_USERNAME User account with role-based permissions to execute batch
groups.
IN_PASSWORD Password for the user account with role-based permissions to
execute batch groups.
IN_ORSID ORS ID as specified in the Database tool in the Hub
Console. To learn more, see “Configuring Operational
Record Stores” on page 62.
IN_BATCHGROUP_UID Siperian Object UID of the batch group to reset.
Returns
Parameter Description
OUT_ROWID_BATCHGROUP_LOG c_repos_job_group_control.rowid_job_group_control
OUT_ERROR_MSG Error message text.
NUMBER Error code. If zero (0), then the stored procedure completed successfully. If
one (1), then the stored procedure returns an explanation in out_error_msg.
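A call to this procedure follows the same pattern as the other sample scripts in this section. In the following sketch, the server URL, credentials, ORS ID, and batch group UID are example values only.

```sql
DECLARE
   OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
   OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
   RET_VAL INT;
BEGIN
   RET_VAL := CMXBG.RESET_BATCHGROUP(
      'http://localhost:7001/cmx/request/process/' -- Hub Server SIF URL
      , 'admin'                                    -- user name
      , 'admin'                                    -- password
      , 'LOCALHOST-MRM-XU_3009'                    -- target ORS ID
      , 'BATCH_GROUP.MYBATCHGROUP'                 -- batch group UID
      , OUT_ROWID_BATCHGROUP_LOG
      , OUT_ERROR_MSG
   );
   CMXLB.DEBUG_PRINT('RESET_BATCHGROUP: CODE=' || RET_VAL
      || ' MESSAGE=' || OUT_ERROR_MSG);
END;
/
```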
CMXBG.GET_BATCHGROUP_STATUS
Get Batch Group Status jobs return the batch group status.
Note: In addition to this stored procedure, equivalent functionality is available through
the Services Integration Framework (SIF) as a Java API request and through the SOAP
and HTTP XML protocols. The Get Batch Group Status job is available as the
GetBatchGroupStatus SIF API request. For more information about this SIF API
request, see the Siperian Services Integration Framework Guide.
Signature
FUNCTION CMXBG.GET_BATCHGROUP_STATUS(
IN_MRM_SERVER_URL IN VARCHAR2(500)
, IN_USERNAME IN VARCHAR2(500)
, IN_PASSWORD IN VARCHAR2(500)
, IN_ORSID IN VARCHAR2(500)
, IN_BATCHGROUP_UID IN VARCHAR2(500)
, IN_ROWID_BATCHGROUP_LOG IN VARCHAR2(500)
, OUT_ROWID_BATCHGROUP OUT VARCHAR2(500)
, OUT_ROWID_BATCHGROUP_LOG OUT VARCHAR2(500)
, OUT_START_RUNDATE OUT VARCHAR2(500)
, OUT_END_RUNDATE OUT VARCHAR2(500)
, OUT_RUN_STATUS OUT VARCHAR2(500)
, OUT_STATUS_MESSAGE OUT VARCHAR2(500)
, OUT_ERROR_MSG OUT VARCHAR2(500)
) RETURN NUMBER --Return the error code
Parameters
Name Description
IN_MRM_SERVER_URL Hub Server SIF URL.
IN_USERNAME User account with role-based permissions to execute batch
groups.
IN_PASSWORD Password for the user account with role-based permissions to
execute batch groups.
IN_ORSID ORS ID as specified in the Database tool in the Hub Console.
To learn more, see “Configuring Operational Record Stores”
on page 62.
IN_BATCHGROUP_UID Siperian Object UID of the batch group whose status to
return. If IN_ROWID_BATCHGROUP_LOG is null, the most recent log for this
group will be used.
IN_ROWID_BATCHGROUP_LOG c_repos_job_group_control.rowid_job_group_control.
Either IN_BATCHGROUP_UID or IN_ROWID_BATCHGROUP_LOG is required.
Returns
Parameter Description
OUT_ROWID_BATCHGROUP c_repos_job_group.rowid_job_group
OUT_ROWID_BATCHGROUP_LOG c_repos_job_group_control.rowid_job_group_control
OUT_START_RUNDATE Date / time when this batch job started.
OUT_END_RUNDATE Date / time when this batch job ended.
OUT_RUN_STATUS Job execution status code that is displayed in the
Batch Group tool. For more information, see
“Executing Batch Groups Using the Batch
Group Tool” on page 701.
OUT_STATUS_MESSAGE Job execution status message that is displayed in
the Batch Group tool. For more information, see
“Executing Batch Groups Using the Batch
Group Tool” on page 701.
OUT_ERROR_MSG Error message text for this stored procedure call,
if applicable.
NUMBER Error code. If zero (0), then the stored
procedure completed successfully. If one (1),
then the stored procedure returns an explanation
in out_error_msg.
Sample Job Execution Script for Get Batch Group Status Jobs
DECLARE
   OUT_ROWID_BATCHGROUP CMXLB.CMX_SMALL_STR;
   OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
   OUT_START_RUNDATE CMXLB.CMX_SMALL_STR;
   OUT_END_RUNDATE CMXLB.CMX_SMALL_STR;
   OUT_RUN_STATUS CMXLB.CMX_SMALL_STR;
   OUT_STATUS_MESSAGE CMXLB.CMX_SMALL_STR;
   OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
   RET_VAL INT;
BEGIN
   RET_VAL := CMXBG.GET_BATCHGROUP_STATUS(
      'http://localhost:7001/cmx/request/process/' -- Hub Server SIF URL
      , 'ADMIN'                                    -- user name
      , 'ADMIN'                                    -- password
      , 'LOCALHOST-MRM-XU_3009'                    -- target ORS ID
      , 'BATCH_GROUP.MYBATCHGROUP'                 -- batch group UID
      , NULL                                       -- use the most recent log
      , OUT_ROWID_BATCHGROUP
      , OUT_ROWID_BATCHGROUP_LOG
      , OUT_START_RUNDATE
      , OUT_END_RUNDATE
      , OUT_RUN_STATUS
      , OUT_STATUS_MESSAGE
      , OUT_ERROR_MSG
   );
   CMXLB.DEBUG_PRINT('GET_BATCHGROUP_STATUS: CODE=' || RET_VAL
      || ' MESSAGE=' || OUT_ERROR_MSG
      || ' STATUS=' || OUT_RUN_STATUS
      || ' OUT_ROWID_BATCHGROUP_LOG=' || OUT_ROWID_BATCHGROUP_LOG);
END;
/
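When a batch group has been started with cmxbg.execute_batchgroup in asynchronous mode, the same status call can be repeated until the group finishes. The following sketch polls every 30 seconds; it assumes that a non-null OUT_END_RUNDATE indicates that execution has ended, that the session has execute privilege on DBMS_LOCK, and that the connection details are the same example values used in the sample script above.

```sql
DECLARE
   OUT_ROWID_BATCHGROUP CMXLB.CMX_SMALL_STR;
   OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
   OUT_START_RUNDATE CMXLB.CMX_SMALL_STR;
   OUT_END_RUNDATE CMXLB.CMX_SMALL_STR;
   OUT_RUN_STATUS CMXLB.CMX_SMALL_STR;
   OUT_STATUS_MESSAGE CMXLB.CMX_SMALL_STR;
   OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
   LOG_ID CMXLB.CMX_SMALL_STR;
   RET_VAL INT;
BEGIN
   -- Start the batch group asynchronously (IN_ASYNCRONOUS = 'true').
   RET_VAL := CMXBG.EXECUTE_BATCHGROUP(
      'http://localhost:7001/cmx/request/process/'
      , 'ADMIN', 'ADMIN'
      , 'LOCALHOST-MRM-XU_3009'
      , 'BATCH_GROUP.MYBATCHGROUP'
      , 'false', 'true'
      , LOG_ID, OUT_ERROR_MSG);
   -- Poll the execution log until an end date is reported.
   LOOP
      RET_VAL := CMXBG.GET_BATCHGROUP_STATUS(
         'http://localhost:7001/cmx/request/process/'
         , 'ADMIN', 'ADMIN'
         , 'LOCALHOST-MRM-XU_3009'
         , NULL            -- batch group UID not needed when the log ID is given
         , LOG_ID          -- IN_ROWID_BATCHGROUP_LOG from the execute call
         , OUT_ROWID_BATCHGROUP, OUT_ROWID_BATCHGROUP_LOG
         , OUT_START_RUNDATE, OUT_END_RUNDATE
         , OUT_RUN_STATUS, OUT_STATUS_MESSAGE, OUT_ERROR_MSG);
      EXIT WHEN RET_VAL <> 0 OR OUT_END_RUNDATE IS NOT NULL;
      DBMS_LOCK.SLEEP(30); -- wait 30 seconds between polls
   END LOOP;
   CMXLB.DEBUG_PRINT('FINAL STATUS=' || OUT_STATUS_MESSAGE);
END;
/
```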
Signature
PROCEDURE EXAMPLE_JOB(
IN_ROWID_TABLE_OBJECT IN CHAR(14) --C_REPOS_TABLE_OBJECT.ROWID_TABLE_OBJECT, result of CMXUT.REGISTER_CUSTOM_TABLE_OBJECT
,IN_USER_NAME IN VARCHAR2(50) --User name calling the function
,IN_ROWID_JOB IN CHAR(14) --C_REPOS_JOB_CONTROL.ROWID_JOB, for reference; do not update status
,OUT_ERR_MSG OUT VARCHAR --Message about success or error
,OUT_ERR_CODE OUT INT -- >=0: Completed successfully. <0: Error
)
Parameters
Name Description
in_rowid_table_object IN cmxlb.cmx_rowid. c_repos_table_object.rowid_table_object;
result of cmxut.REGISTER_CUSTOM_TABLE_OBJECT.
in_user_name IN cmxlb.cmx_user_name. User name calling the function.
in_rowid_job IN cmxlb.cmx_rowid. c_repos_job_control.rowid_job; provided for
reference only.
Returns
Parameter Description
out_err_msg Error message text.
out_err_code Error code.
Signature
PROCEDURE REGISTER_CUSTOM_TABLE_OBJECT(
IN_ROWID_TABLE IN CHAR(14)
, IN_OBJ_FUNC_TYPE_CODE IN VARCHAR
, IN_OBJ_FUNC_TYPE_DESC IN VARCHAR
, IN_OBJECT_NAME IN VARCHAR
)
Parameters
Name Description
IN_ROWID_TABLE Foreign key to c_repos_table.rowid_table.
CMXLB.CMX_ROWID
When the Hub Server calls the custom job in a batch
group, this value is passed in.
IN_OBJ_FUNC_TYPE_CODE Job type code. Must be 'A' for batch group custom jobs.
IN_OBJ_FUNC_TYPE_DESC Display name for the custom batch job in the Batch
Groups tool in the Hub Console.
IN_OBJECT_NAME package.procedure name of the custom job.
Example
BEGIN
cmxut.REGISTER_CUSTOM_TABLE_OBJECT (
'SVR1.RS1B ' -- c_repos_table.rowid_table
,'A' -- Job type, must be 'A' for batch group
,'CMXBG_EXAMPLE.UPDATE_TABLE EXAMPLE' -- Display name
,'CMXBG_EXAMPLE.UPDATE_TABLE' -- Package.procedure
);
END;
Example
DECLARE
IN_ROWID_TABLE CHAR(14);
IN_ROWID_COL_LIST VARCHAR2(2000);
IN_USER_NAME VARCHAR2(50);
IN_INDEX_TYPE VARCHAR2(200);
BEGIN
IN_ROWID_TABLE := '<ROWID_TABLE>'; -- rowid_table from c_repos_table where table_name = 'your table name'
-- Notes:
-- 1. Trailing spaces in the rowid_column values are significant
-- 2. Separate each rowid_column with a ~ character and end the list with a ~ character, e.g. '123 ~456 ~'
Example
DECLARE
IN_TABLE_NAME VARCHAR2(30);
OUT_ERROR_MESSAGE VARCHAR2(1024);
RC NUMBER;
BEGIN
IN_TABLE_NAME := 'C_BO_TO_CLEAN'; --Name of the BO table
OUT_ERROR_MESSAGE := NULL; --Return msg; output parameter
RC := NULL; --Return code; output parameter
CMXUT.CLEAN_TABLE ( IN_TABLE_NAME, OUT_ERROR_MESSAGE, RC );
COMMIT;
END;
Example
DECLARE
IN_DEBUG_TEXT VARCHAR2(32000);
BEGIN
IN_DEBUG_TEXT := NULL; --String that you want to print in the log file
CMXLB.DEBUG_PRINT ( IN_DEBUG_TEXT );
COMMIT;
END;
PROCEDURE UPDATE_TABLE(
IN_ROWID_TABLE_OBJECT IN CMXLB.CMX_ROWID
,IN_USER_NAME IN CMXLB.CMX_USER_NAME
,IN_ROWID_JOB IN CMXLB.CMX_ROWID
,OUT_ERR_MSG OUT VARCHAR
,OUT_ERR_CODE OUT INT
);
END CMXBG_EXAMPLE;
/
CREATE OR REPLACE PACKAGE BODY CMXBG_EXAMPLE
AS
PROCEDURE UPDATE_TABLE(
IN_ROWID_TABLE_OBJECT IN CMXLB.CMX_ROWID
,IN_USER_NAME IN CMXLB.CMX_USER_NAME
,IN_ROWID_JOB IN CMXLB.CMX_ROWID
,OUT_ERR_MSG OUT VARCHAR
,OUT_ERR_CODE OUT INT
)
AS
CUTOFF_DATE DATE;
RECORD_COUNT INT;
RUN_STATUS INT;
STATUS_MESSAGE VARCHAR2 (2000);
START_DATE DATE := SYSDATE;
MRM_ROWID_TABLE CMXLB.CMX_ROWID;
OBJ_FUNC_TYPE CHAR (1);
JOB_ID CHAR (14);
SQL_STMT VARCHAR2 (2000);
TABLE_NAME VARCHAR2(30);
RET_CODE INT;
REGISTER_JOB_ERR EXCEPTION;
BEGIN
SQL_STMT := 'ALTER SESSION SET NLS_DATE_FORMAT=''DD MON YYYY HH24:MI:SS''';
EXECUTE IMMEDIATE SQL_STMT;
-- Look up the target table for this custom job.
SELECT ROWID_TABLE
INTO MRM_ROWID_TABLE
FROM C_REPOS_TABLE_OBJECT
WHERE ROWID_TABLE_OBJECT = IN_ROWID_TABLE_OBJECT;
-- Use the job start date as the processing cutoff.
SELECT START_RUN_DATE
INTO CUTOFF_DATE
FROM C_REPOS_JOB_CONTROL
WHERE ROWID_JOB = IN_ROWID_JOB;
OUT_ERR_CODE := 0;
OUT_ERR_MSG := 'COMPLETED SUCCESSFULLY';
EXCEPTION
WHEN OTHERS
THEN
OUT_ERR_CODE := SQLCODE;
OUT_ERR_MSG := SUBSTR (SQLERRM, 1, 200);
END UPDATE_TABLE;
END CMXBG_EXAMPLE;
/
Contents
• Chapter 19, “Generating ORS-specific APIs and Message Schemas”
• Chapter 20, “Setting Up Security”
• Chapter 21, “Viewing Registered Custom Code”
• Chapter 22, “Auditing Siperian Hub Services and Events”
19
Generating ORS-specific APIs and Message Schemas
This chapter describes how to use the SIF Manager tool to generate ORS-specific APIs
and how to use the JMS Event Schema Manager tool to generate ORS-specific JMS
Event Message objects.
Chapter Contents
• Before You Begin
• Generating ORS-specific APIs
• Generating ORS-specific Message Schemas
Before You Begin
Note: Using the ORS-specific APIs does not require the SIF SDK. Alternatively, you
can call the ORS-specific APIs as SOAP web services.
Area Description
SIF ORS-Specific APIs Shows the logical name, java name, WSDL URL, and API
generation time for the SIF ORS-specific APIs.
Use this function to generate and deploy SIF APIs for
packages, remote packages, mappings, and cleanse functions in
an ORS database. Once generated, the ORS-specific APIs will
be available with SiperianClient by using the client jar and also
as a web service. The logical name is used to name the
components of the deployment.
Out of Sync Objects Shows the database objects in the schema that are out of sync
with the generated schema.
Note: The following procedure assumes that you have already configured the base
objects and packages of the ORS. If you subsequently change any of these, regenerate
the ORS-specific APIs.
Note: SIF API generation requires at least one secure package, remote package,
cleanse function or mapping.
Note: To prevent running out of heap space for the associated SIF API Javadocs, you
may need to increase the size of the heap. The default heap size is 256M. You can also
override this default using the SIF.JVM.HEAP.SIZE parameter.
The generated APIs have no dependencies on one another, so you can use any of
them independently of the others.
You can use the resulting URL to access the WSDL descriptions from your
development environment.
You can download the ORS-specific JAR file at any point after the APIs have been
generated.
The SIF Manager Find Out of Sync Objects function compares the last generated
APIs to the defined objects in the ORS and reports any differences between the two.
If differences are found, regenerate the ORS-specific APIs.
Note: Once you have evaluated the impact of the out-of-sync objects, you can then
decide whether or not to re-generate the schema (typically, external components which
interact with the Hub are written to work with a specific version of the generated
schema). If you regenerate the schema, these external components may no longer
work.
Note: If your Siperian Hub implementation requires that you use the legacy XML
message format (Siperian Hub XU version) instead of the current version of the XML
message format (described in this section), see “Legacy JMS Message XML Reference”
on page 644 instead.
Use the JMS Event Schema Manager tool to generate and deploy ORS-specific JMS
Event Messages for the current ORS. The XML schema for these messages can be
downloaded or accessed using a URL. For more information about JMS Event
Messages, see “JMS Message XML Reference” on page 622.
Note: JMS Event Schema generation requires at least one secure package or remote
package.
Important: If two databases have the same schema name (for example, CMX_ORS),
the logical name (which is the same as the schema name) will be duplicated for JMS
Events when the configuration is initially saved. Because the database display name is
unique, use it as the initial logical name instead of the schema name, to be consistent
with the SIF APIs. You must change the logical name before generating the schema.
Additionally, each ORS has an XSD file specific to the ORS that uses the elements
from the common XSD file (siperian-mrm-events.xsd). The ORS-specific XSD is
named as <ors-name>-siperian-mrm-event.xsd. The XSD defines two objects for
each package and remote package in the schema:
Note: If legacy XML event message objects are to be used, ORS-specific message
object generation is not required.
The Hub Console displays the JMS Event Schema Manager tool, as shown in the
following example.
The JMS Event Schema Manager tool displays the following areas:
Area Description
JMS ORS-specific Event Message Schema Shows the event message schema for the
ORS. Use this function to generate and deploy ORS-specific JMS Event Messages for
the current ORS. The logical name is used to name the components of the
deployment. The schema can be downloaded or accessed using a URL.
Note: If legacy XML event message objects are to be used,
ORS-specific message object generation is not required.
Out of Sync Objects Shows the database objects in the schema that are out of sync
with the generated API.
Note: The following procedure assumes that you have already configured the base
objects, packages, and mappings of the ORS. If you subsequently change any of these,
regenerate the ORS-specific schemas.
Note: There must be at least one secure package or remote package configured to
generate the schema. If there are no secure objects to generate, Siperian Hub
generates a runtime error message.
An XSD file defines the structure of an XML file and can also be used to validate the
XML file. For example, if an XML file contains a reference to an XSD, an XML
validation tool can be used to verify that the tags in the XML conform to the
definitions in the XSD.
You use Find Out Of Sync Objects to determine if the event schema needs to be
re-generated to reflect changes in the system. The JMS Event Schema Manager displays
a list of packages and remote packages that have changed since the last schema
generation.
Note: The Out of Sync Objects function compares the generated APIs to the database
objects in the schema so both must be present to find the out-of-sync objects.
In order to make any changes to the schema, you must have a write lock. To learn
more, see “Acquiring a Write Lock” on page 30.
3. Click Find Out of Sync Objects.
The JMS Event Schema Manager displays all out of sync objects in the lower panel.
Note: Once you have evaluated the impact of the out-of-sync objects, you can then
decide whether or not to re-generate the schema (typically, external components which
interact with the Hub are written to work with a specific version of the generated
schema). If you regenerate the schema, these external components may no longer
work.
If the JMS Event Schema Manager returns any out-of-sync objects, click Generate and
Deploy ORS-specific Schema to re-generate the event schema. For more
information, see “Generating and Deploying ORS-specific Schemas” on page 827.
You can configure Siperian Hub to periodically search for out-of-sync objects and
re-generate the schema as needed. This auto-poll feature operates within the data
change monitoring thread, which automatically runs at a specified interval (in
milliseconds) between polls. You specify this interval using the Message Check
Interval in the Message Queues tool. When the monitoring thread is active, this
automatic service first checks whether the out-of-sync interval has elapsed and, if so,
performs the out-of-sync check and then re-generates the event schema as needed.
3. Select the root node Message Queues and set the Out of sync check interval
(milliseconds). For more information, see “Configuring Global Message Queue
Settings” on page 604.
Since the out-of-sync auto-poll feature effectively depends on the Message check
interval, you should set the Out-of-sync check interval to a value greater than or
equal to that of the Message check interval.
Note: You can disable the out-of-sync check by setting the out-of-sync check
interval to 0.
This chapter describes how to set up security for your Siperian Hub implementation
using the Hub Console. To learn how to configure user access to the Hub Console, see
“About User Access to Hub Console Tools” on page 989.
To learn more about configuring security using the Services Integration Framework
(SIF) instead, see the Siperian Services Integration Framework Guide.
Chapter Contents
• About Setting Up Security
• Securing Siperian Hub Resources
• Configuring Roles
• Configuring Siperian Hub Users
• Configuring User Groups
• Assigning Users to the Current ORS Database
• Assigning Roles to Users and User Groups
• Managing Security Providers
About Setting Up Security
Before setting up security for your Siperian Hub implementation, it is important for
you to understand some key concepts.
Note: SAM security applies primarily to users of third-party applications who want to
gain access to Siperian Hub resources. SAM applies only tangentially to Hub Console
users. The Hub Console has its own security mechanisms to authenticate users and
authorize access to Hub Console tools and resources.
Authentication
Authentication is the process of verifying the identity of a user to ensure that they are
who they claim to be. A user is an individual who wants to access Siperian Hub
resources (see “Configuring Siperian Hub Users” on page 866). In Siperian Hub, users
are authenticated based on their supplied credentials—user name / password, security
payload, or a combination of both.
Siperian Hub implementations can use each type of authentication exclusively, or they
can use a combination of them. The type of authentication used in your Siperian Hub
implementation depends on how you configure security, as described in “Security
Implementation Scenarios” on page 836.
Authorization
Siperian Hub implementations can use either type of authorization exclusively, or they
can use a combination of both. The type of authorization used in your Siperian Hub
implementation depends on how you configure security, as described in “Security
Implementation Scenarios” on page 836.
Siperian Hub provides general types of resources that you can configure to be secure
resources: base objects, dependent objects, mappings, packages, remote packages,
cleanse functions, match rule sets, batch groups, metadata, content metadata, Metadata
Manager, HM profiles, the audit table, and the users table. You can configure security
for these resources in a highly granular way, granting access to Siperian Hub resources
according to various privileges (read, create, update, merge, and execute). Resources are
either PRIVATE (the default) or SECURE. Privileges can be granted only to secure
resources. To learn more, see “Securing Siperian Hub Resources” on page 841.
Roles
In Siperian Hub, resource privileges are allocated to roles. A role represents a set of
privileges to access secure Siperian Hub resources (see “Configuring Roles” on page
854). Users and user groups are assigned to roles. A user’s resource privileges are
determined by the roles to which they are assigned, as well as by the roles assigned to
the user group(s) to which the user belongs. Security Access Manager enforces
resource authorization for requests from external application users. Administrators and
data stewards who use the Hub Console to access Siperian Hub resources are less
directly affected by resource privileges (see “Privileges” on page 843).
For users who will be using the Hub Console to access Siperian Hub resources, you
can use the Tool Access tool in the Configuration workbench to control access
privileges to Hub Console tools. For example, data stewards typically have access to
only the Data Manager and Merge Manager tools. To learn more, see “About User
Access to Hub Console Tools” on page 989.
At run time, in order to execute a SIF request, the logged-in user must be assigned a
role that has the required privilege(s) to access the resource(s) involved with the
request. Otherwise, the user’s request will be denied.
Internal-only PDP
The following figure shows a security deployment in which all PDPs are handled
internally by Siperian Hub.
In this scenario, Siperian Hub makes all policy decisions based on how users, groups,
roles, privileges, and resources are configured using the Hub Console.
The following figure shows a security deployment in which Siperian Hub integrates
with an external directory.
In this scenario, the external user directory manages user accounts, groups, and user
profiles. The external user directory is able to authenticate users and provide
information to Siperian Hub about group membership and user profile information.
Users or user groups that are maintained in the external user directory must still be
registered in Siperian Hub. Registration is required so that Siperian Hub roles—and
their associated privileges—can be assigned to these users and groups.
The following figure shows a security deployment where role assignment—in addition
to user accounts, groups, and user profiles—is handled externally to Siperian Hub.
In this scenario, external roles are explicitly mapped to Siperian Hub roles.
The following figure shows a security deployment in which role definition and privilege
assignment—in addition to user accounts, groups, user profiles, and role
assignment—is handled externally to Siperian Hub.
In this scenario, Siperian Hub simply exposes the protected resources using external
proxies, which are synchronized with the internally-protected resources using SIF
requests (RegisterUsers, UnregisterUsers, and ListSiperianObjects). All policy decisions
are external to Siperian Hub.
4. Optionally, configure user groups and assign users to them, if applicable. For
instructions on using the Users and Groups tool to configure user groups, see
“Configuring User Groups” on page 881.
5. Configure secure Siperian Hub resources and (optionally) resource groups. For
instructions on using the Secure Resources tool to configure resources and
resource groups, see “Setting the Status of a Siperian Hub Resource” on page 847.
6. Define roles and assign resource privileges to roles. For instructions on using the
Roles tool to configure roles, see “Configuring Roles” on page 854.
7. Assign roles to users and (optionally) user groups. For instructions on using the
Users and Groups tool to assign roles, see “Assigning Roles to Users and User
Groups” on page 887.
8. For non-administrator users who will interact with Siperian Hub using the Hub
Console, provide them with access to the Hub Console tools that they will need to
use, as described in “Configuring User Access to ORS Databases” on page 875.
For example, data stewards typically need access to the Merge Manager and Data
Manager tools (which are described in the Siperian Hub Data Steward Guide).
If you are using external security providers instead to handle any portion of security in
your Siperian Hub implementation, you must configure them in the Hub Console, as
described in “Managing Security Providers” on page 889.
Note: This document describes how to configure Siperian Hub’s internal security
framework using the Hub Console. If you are using third-party security providers to
handle any portion of security in your Siperian Hub implementation, refer to your
security provider’s configuration instructions instead.
The following types of Siperian Hub resources can be configured as secure resources:
In addition, the Hub Console allows you to protect other resources that are accessible
by SIF requests, including content metadata, match rule sets, metadata, batch groups,
validate metadata, the audit table, and the users table.
In order for external applications to access a Siperian Hub resource using SIF requests,
that resource must be configured as SECURE. Because all Siperian Hub resources are
PRIVATE by default, you must explicitly make a resource SECURE after the resource
has been added.
There are certain Siperian Hub resources that you might not want to expose to external
applications. For example, your Siperian Hub implementation might have mappings or
packages that are used only in batch jobs (not in SIF requests), so these could remain
private.
Note: Package columns are not considered secure resources. They inherit the
secure status and privileges from the parent base object (or dependent object)
columns. If package columns are based on system table columns (for example,
C_REPOS_AUDIT), or on columns of tables that are not based on a base object or
dependent object (for example, landing tables), there is no need to set up security
for them, since they are accessible by default.
Privileges
With Siperian Hub internal authorization, each role is assigned one of the following
privileges.
Privileges determine the access that external application users have to Siperian Hub
resources. For example, a role might be configured to have READ, CREATE,
UPDATE, and MERGE privileges on particular packages.
Note: Each privilege is distinct and must be explicitly assigned. Privileges do not
aggregate other privileges. For example, having UPDATE access to a resource does
not automatically give you READ access to it as well—both privileges must be
individually assigned.
These privileges are not enforced when using the Hub Console, although the settings
still affect the use of Hub Console to some degree. For example, the only packages that
data stewards can see in the Merge Manager and Data Manager tools are those
packages to which they have READ privileges. In order for data stewards to edit and
save changes to data in a particular package, they must have UPDATE and CREATE
privileges to that package (and associated columns). If they do not have UPDATE or
CREATE privileges, then any attempts to change the data in the Data Manager will
fail. Similarly, a data steward must have MERGE privileges to merge or unmerge
records using the Merge Manager. To learn more about the Merge Manager and Data
Manager tools, see the Siperian Hub Data Steward Guide.
Resource Groups
A resource group is a logical collection of secure resources. Using the Secure Resources
tool, you can define resource groups, and then assign related resources to them.
Resource groups simplify privilege assignment, allowing you to assign privileges to
multiple resources at once and to easily assign resource groups to a role.
A resource group can also contain other resource groups—except itself or any resource
group to which it belongs—allowing you to build a hierarchy of resource groups and to
simplify the management of a large collection of resources.
The Hub Console displays the Secure Resources tool, as shown in the following
example.
Column Description
Resources Used to set the status of individual Siperian Hub resources (SECURE
or PRIVATE). Siperian Hub resources are organized in a hierarchy that
shows the relationships among resources. Global resources appear at
the top of the hierarchy. For details, see “Configuring Resources” on
page 846.
Resource Groups Used to configure resource groups. For details, see “Configuring
Resource Groups” on page 849.
Configuring Resources
Use the Resources tab in the Secure Resources tool to browse and configure Siperian
Hub resources.
Resources are organized hierarchically in the navigation tree by resource type, as shown
in the following example.
You can configure the resource status (SECURE or PRIVATE) for any concrete
Siperian Hub resource.
Note: This status setting does not apply to resource groups (which contain only
SECURE resources) or to global resources (for example, BASE_OBJECT.*)—only to
the resources that they contain.
OR
• Click to make all selected resources private.
Filtering Resources
To simplify changing the status of a collection of Siperian Hub resources, especially for
an implementation with a large and complex schema, you can specify a filter that
displays only the resources that you want to change.
4. Do the following:
• Check (select) the resource type(s) that you want to include in the filter.
• Uncheck (clear) the resource type(s) that you want to exclude in the filter.
5. Click OK.
The Secure Resources tool displays the filtered Resources tree.
The Secure Resources tool differentiates visually between resources that belong directly
to the current resource group (explicitly added) and resources that belong indirectly
because they are members of a resource group that belongs to this resource group
(implicitly added). For example, suppose you have two resource groups:
• Resource Group A contains the Consumer base object, which means that the
Consumer base object is a direct member of Resource Group A
• Resource Group B contains the Address base object
• Resource Group A contains Resource Group B, which means that the Address
base object is an indirect member of Resource Group A
While editing Resource Group A, the Address base object is slightly grayed, as shown
in the following example.
Indirect Membership
Direct Membership
In this example, you cannot change the check box for the Address base object when
you are editing Resource Group A. You can change the check box only when editing
Resource Group B.
The Secure Resources tool displays the Add Resources to Resource Group dialog.
9. Check (select) the resources that you want to assign to this resource group.
10. Uncheck (clear) the resources that you want to remove from this resource group.
11. Click OK.
Configuring Roles
This section describes how to configure roles for your Siperian Hub implementation.
Note: If you are using a Comprehensive Centralized PDP security deployment (see
“Comprehensive Centralized PDP” on page 838), in which users are authorized
externally, and your external authorization provider does not require you to define
roles in Siperian Hub, then you can skip this section.
About Roles
In Siperian Hub, a role represents a set of privileges to access secure Siperian Hub
resources. In order for a user to view or manipulate a secure Siperian Hub resource,
that user must be assigned a role that grants them sufficient privileges to access the
resource. Roles determine what a user is authorized to access and do in Siperian Hub.
To learn more, see “Authorization” on page 833 and “Privileges” on page 843.
Siperian Hub roles are highly granular and flexible, allowing you to implement
complex security safeguards according to your organization’s unique security
policies, procedures, and requirements. Some users might be assigned a single role
with access to everything (such as an administrator) or with explicitly-restricted
privileges (such as a data steward), while others might be assigned multiple roles
with varying privileges.
A role can also have other roles assigned to it, thereby inheriting the access privileges
configured for those roles. Privileges are additive, meaning that, when roles are
combined, their privileges are combined as well. For example, suppose Role A has
READ privileges to an Address base object, and Role B has CREATE and UPDATE
privileges to it. If a user account is assigned Role A and Role B, then that user account
will have READ, CREATE, and UPDATE privileges to the Address base object.
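The additive behavior described above amounts to a set union of privileges. The following sketch illustrates the idea only; the enum and method names are ours, not part of the Siperian Hub API.

```java
import java.util.EnumSet;

public class AdditivePrivileges {
    // Hypothetical privilege set for illustration; only the privileges
    // used in the example above are modeled here.
    enum Privilege { READ, CREATE, UPDATE, DELETE }

    // Combining roles combines their privilege sets (set union).
    static EnumSet<Privilege> combine(EnumSet<Privilege> a, EnumSet<Privilege> b) {
        EnumSet<Privilege> result = EnumSet.copyOf(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Role A grants READ on the Address base object;
        // Role B grants CREATE and UPDATE on the same object.
        EnumSet<Privilege> roleA = EnumSet.of(Privilege.READ);
        EnumSet<Privilege> roleB = EnumSet.of(Privilege.CREATE, Privilege.UPDATE);

        // A user assigned both roles gets the union of the two sets.
        System.out.println(combine(roleA, roleB)); // [READ, CREATE, UPDATE]
    }
}
```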
A user account inherits the privileges configured for any role to which the user account
is assigned.
Resource privileges vary depending on the scope of access that is required for users to
do their jobs—ranging from broad and deep access (for example, super-user
administrators) to very narrow, focused access (for example, READ privileges on one
base object). It is generally recommended that you follow the principle of least
privilege: users should be assigned the smallest set of privileges needed to do their work.
Because Siperian Hub provides you with the ability to vary resource privileges per role,
and because resource privileges are additive, you can define roles in a highly-granular
manner for your Siperian Hub implementation. For example, you could define separate
roles to provide different access levels to human resources data (such as
HRAppReadOnly, HRAppCreateOnly, and HRAppUpdateOnly), and then combine
them into another aggregate role (such as HRAppAll). You would then assign to
various users just the role(s) that are appropriate for their job function.
Tab Description
Resource Privileges Used to assign resource privileges to roles. For details, see “Assigning
Resource Privileges to Roles” on page 859.
Roles Used to assign roles to other roles. For details, see “Assigning Roles
to Other Roles” on page 862.
Report Used to generate a distilled report of resource privileges granted to a
given role. For details, see “Generating a Report of Resource
Privileges for Roles” on page 863.
Adding Roles
To add a new role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Point anywhere in the navigation pane, right-click the mouse, and choose Add
Role.
The Roles tool displays the Add Role dialog.
Field Description
Name Name of this role. Enter a unique, descriptive name.
Description Optional description of this role.
External Name External name (alias) of this role. To learn more, see “Mapping
Internal Roles to External Roles” on page 859.
5. Click OK.
The Roles tool adds the new role to the roles list.
Editing Roles
To edit an existing role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role that you want to edit.
4. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
5. Click the Save button to save your changes.
You can also assign and edit resource privileges for roles. To learn more, see “Assigning
Resource Privileges to Roles” on page 859.
Inheriting Privileges
You can also edit the privileges for a specific role to inherit privileges from other roles;
to learn more see “Assigning Roles to Other Roles” on page 862.
Deleting Roles
To delete an existing role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role that you want to delete.
4. Point anywhere in the navigation pane, right-click the mouse, and choose Delete
Role.
The Roles tool prompts you to confirm deletion.
5. Click Yes.
The Roles tool removes the deleted role from the roles list.
Note: There is no predefined format for a configuration file. It might not be an XML
file or even a file at all. The mapping is a part of the custom user profile or
authentication provider implementation. The purpose of the mapping is to populate a
user profile object roles list with internal role IDs (rowids).
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role for which you want to assign resource
privileges.
4. Click the Resource Privileges tab.
Field Description
Resources Hierarchy of secure Siperian Hub resources. Displays only those
Siperian Hub resources whose status has been set to SECURE in
the Secure Resources tool. To learn more, see “Setting the Status
of a Siperian Hub Resource” on page 847.
Privileges Privileges to assign to secure resources. To learn more, see
“Privileges” on page 843.
5. Expand the Resources hierarchy to show the secure resources that you want to
configure for this role.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role to which you want to assign other roles.
4. Click the Roles tab.
The Roles tool displays the Roles tab.
The Roles tool displays any role(s) that can be assigned to the selected role.
5. Check (select) any role that you want to assign to the selected role.
6. Uncheck (clear) any role that you want to remove from this role.
7. Click the Save button to save your changes.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role for which you want to generate a report.
4. Click the Report tab.
The Roles tool displays the Report tab.
5. Click Generate. The Roles tool generates the report and displays it on the tab.
The Roles tool prompts you to specify the target location for the saved report.
The Roles tool saves the current report as an HTML file in the target location.
You can subsequently display this report using a browser.
• you are using Siperian Hub’s external authorization (see “External User Directory”
on page 837)
• multiple users will run the Hub Console using different accounts (for example,
administrators and data stewards).
User Accounts
Users are represented in Siperian Hub by user accounts, which are defined in the master
database in the Hub Store. You use the Users tool in the Configuration workbench to
define and configure user accounts for Siperian Hub users, as well as to change
passwords and enable external authentication. External applications with sufficient
authorization can also register user accounts using SIF requests, as described in the
Siperian Services Integration Framework Guide. A user needs to be defined only once, even if
the same user will access more than one ORS associated with the Master Database.
A user account gains access to Siperian Hub resources using the role(s) assigned to it,
inheriting the privileges configured for each role, as described in “About Roles” on
page 854.
Siperian Hub allows for multiple concurrent SIF requests from the same user account.
For an external application in which granular auditing and user tracking is not required,
multiple users can use the same user account when submitting SIF requests.
Tab Description
User Displays a list of all users that have been defined, except the
default admin user (which is created when Siperian Hub is
installed). To learn more, see “Configuring Users” on page 869.
Target Database Assign users to target databases. To learn more, see “Configuring
User Access to ORS Databases” on page 875.
Global Password Policy Specify global password policies. To learn more, see “Managing
the Global Password Policy” on page 877.
Configuring Users
This section describes how to configure users in the Users tool. It refers to
functionality that is available on the Users tab of the Users tool.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Click the button. The Users tool displays the Add User dialog.
Property Description
First name First name for this user.
Middle name Middle name for this user.
Last name Last name for this user.
User name Name of the user account for this user; this is the name that the
user enters to log into the Hub Console.
Default database Default database for this user. This is the database that is
automatically selected when the user logs into Hub Console, as
described in “Starting the Hub Console” on page 19. If you want
to change this database later, see “Configuring User Access to
ORS Databases” on page 875.
Password Password for this user. If you want to change this password later,
see “Changing Password Settings for User Accounts” on page 874.
Verify password Type the password again to verify.
Use external authentication? One of the following settings:
• Check (select) this option to use external authentication using
a third-party security provider instead of Siperian Hub’s
default authentication. To learn more, see “Managing Security
Providers” on page 889.
• Uncheck (clear) this option to use the default Siperian Hub
authentication.
6. Click OK.
The Users tool adds the new user to the list of users on the Users tab.
For each user, you can update their name and default login database, and specify
other settings, such as whether Siperian Hub retains a log of user logins/logouts,
whether they can log into Siperian Hub, and whether they have administrator-level
privileges.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user account that you want to configure.
Property Description
Administrator One of the following settings:
• Check (select) this option to give this user administrative
access, which allows them to have access to all Hub Console
tools and all databases.
• Uncheck (clear) this option if you do not want to grant
administrative access to this user. This is the default.
Enable One of the following settings:
• Check (select) this option to activate this user account and
allow this user to log in.
• Uncheck (clear) this option to disable this user account and
prevent this user from logging in.
When adding or editing a user account that will be authenticated externally, you need to
check (select) the Use External Authentication check box. If unchecked (cleared),
then Siperian Hub’s default authentication will be used for this user account instead. To
learn more, see “Managing Security Providers” on page 889.
In Siperian Hub implementations that are not tied to an external user directory (see
“External User Directory” on page 837), you can use Siperian Hub to manage
supplemental information for each user, such as their e-mail address and phone
numbers. Siperian Hub does not require that you provide this information, nor does
Siperian Hub use this information in any special way.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user whose properties you want to edit.
5. Click the Edit button.
Property Description
Title User’s title, such as Dr. or Ms. Click the drop-down list and
select a title.
Initials User’s initials.
Suffix User’s suffix, such as MD or Jr.
Job title User’s job title.
Email User’s e-mail address.
Telephone area code Area code for user’s telephone number.
Telephone number User’s telephone number.
Fax area code Area code for user’s fax number.
Fax number User’s fax number.
Mobile area code Area code for user’s mobile phone.
Mobile number User’s mobile phone number.
Login message Message that the Hub Console displays after this user logs in.
7. Click OK.
8. Click the Save button to save your changes.
To remove a user:
1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user that you want to remove.
5. Click the button.
The Users tool prompts you to confirm deletion.
6. Click Yes to confirm deletion.
The Users tool removes the deleted user account from the list of users on the
Users tab.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user whose password you want to change.
6. If you want to change the password, specify the new password in both the
Password and Verify password fields.
7. Do one of the following:
• Check (select) this option to use external authentication using a third-party
security provider instead of Siperian Hub’s default authentication. To learn
more, see “Managing Security Providers” on page 889.
• Uncheck (clear) this option to use the default Siperian Hub authentication.
8. Click OK.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Target Database tab.
4. Expand each database node to see which users can access that database.
5. To change user assignments to a database, right-click on the database name and
choose Assign User.
The Users tool displays the Assign User to Database dialog.
6. Check (select) the names of any users that you want to assign to the selected
database.
7. Uncheck (clear) the names of any users that you want to unassign from the
selected database.
8. Click OK.
The global password policy applies to users who do not have private password policies
specified for them (as described in “Specifying Private Password Policies for Individual
Users” on page 879).
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Global Password Policy tab.
Policy Description
Password Length Minimum and maximum length, in characters.
Password Expiry Do one of the following:
• Check (select) the Password Expires check box and specify the
number of days before the password expires.
• Uncheck (clear) the Password Expires check box so that the
password never expires.
Login Settings Number of grace logins and maximum number of failed logins.
Password History Number of times that a password can be re-used.
Password Requirements Other configuration settings, such as:
• enforce case-sensitivity
• enforce password validation
• enforce a minimum number of unique characters
• password patterns
For any given user, you can specify a private password policy that overrides the global
password policy (see “Managing the Global Password Policy” on page 877).
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user for whom you want to set the private password policy.
5. Click the button.
The Users tool displays the Private Password Policy window for the selected user.
To configure user names and passwords for a secured JDBC data source in the
cmxserver.properties file, use the following parameters:
databaseId.username=username
databaseId.password=encryptedPassword
where databaseId is the unique ID of the JDBC data source. For example:
localhost-jdbc-ds.username=weblogic
localhost-jdbc-ds.password=9C03B113CD8E4BBFD236C56D5FEA56EB
You use the Groups tab in the Users and Groups tool in the Security Access Manager
workbench to configure user groups and assign user accounts to user groups. To use
the Users and Groups tool, you must be connected to an ORS.
2. Expand the Security Access Manager workbench and click Users and Groups.
The Hub Console displays the Users and Groups tool, as shown in the following
example.
Tab Description
Groups Used to define user groups and assign users to user groups. To learn
more, see “Configuring User Groups” on page 881.
Users Assigned to Database Used to associate user accounts with a database. To learn
more, see “Assigning Users to the Current ORS Database” on page 886.
Assign Users/Groups to Role Used to associate users and user groups with roles. To
learn more, see “Assigning Users and User Groups to Roles” on page 887.
Assign Roles to User/Group Used to associate roles with users and user groups. To
learn more, see “Assigning Roles to Users and User Groups” on page 888.
4. Scroll the list of user groups and select the user group that you want to edit.
5. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
6. Click the Save button to save your changes.
6. Check (select) the names of any users and user groups that you want to assign to
the selected user group.
7. Uncheck (clear) the names of any users and user groups that you want to unassign
from the selected user group.
8. Click OK.
5. Check (select) the names of any users that you want to assign to the selected ORS
database.
6. Uncheck (clear) the names of any users that you want to unassign from the
selected ORS database.
7. Click OK.
You can choose the way that is most expedient for your implementation.
4. Select the role to which you want to assign users and user groups.
5. Click the Edit button.
The Users and Groups tool displays the Assign Users to Role dialog.
6. Check (select) the names of any users and user groups that you want to assign to
the selected role.
7. Uncheck (clear) the names of any users and user groups that you want to unassign
from the selected role.
8. Click OK.
4. Select the user or user group to which you want to assign roles.
6. Check (select) the roles that you want to assign to the selected user or user group.
7. Uncheck (clear) the roles that you want to unassign from the selected user or user
group.
8. Click OK.
Service Description
Authentication Authenticates a user by validating their identity. Informs Siperian Hub only
that the user is who they claim to be—not whether they have access to any
Siperian Hub resources.
Authorization Informs Siperian Hub whether a user has the required privilege(s) to access
particular Siperian Hub resources.
User Profile Informs Siperian Hub about individual users, such as user-specific
attributes and the roles to which the user belongs.
Internal Providers
Siperian Hub comes with a set of default internal security providers (labeled Internal
Provider in the Security Providers tool). You can also add your own third-party
security providers. Internal security providers cannot be removed.
The Hub Console displays the Security Providers tool, as shown in the following
example.
In the Security Providers tool, the navigation tree has the following main nodes:
Node Description
Provider Files Expand to display the provider files that have been uploaded in your
Siperian Hub implementation. For more information, see “Managing
Provider Files” on page 892.
Providers Expand to display the list of providers that are defined in your Siperian
Hub implementation. For more information, see “Managing Security
Provider Settings” on page 896.
The Siperian sample installer copies a sample implementation of a provider file into the
SamSample subdirectory under the target samples directory (such as
c:\siperian\oracle\sample\SamSample). To learn more, see the Siperian Hub
Installation Guide for your platform.
The Security Providers tool displays a list of provider files under the Provider Files
node in the left navigation pane. You use right-click menus in the left navigation pane
of the Security Providers tool to upload, delete, and move provider files in the Provider
Files list.
3. In the left navigation pane, right-click Provider Files and choose Upload Provider
File.
The Security Provider tool prompts you to select the JAR file for this provider.
4. Specify the JAR file, navigating the file system as needed and selecting the JAR file
that you want to upload.
5. Click Open.
The Security Provider tool checks the selected file to determine whether it is a
valid provider file.
If the provider name from the manifest is the same as the name of an existing
provider file, then the Security Provider tool asks you whether to overwrite the
existing provider file. Click Yes to confirm.
The Security Provider tool uploads the JAR file to the application server, adds the
provider file to the list, populates the Providers list with the additional provider
information, and refreshes the left navigation pane.
Once the file has been uploaded, the original file can be removed from the file
system, if you want. The Security Provider tool has already imported the
information and does not subsequently refer to the original file.
Note: Internal security providers that are shipped with Siperian Hub cannot be
removed. For internal security providers, there is no separate provider file under the
Provider Files node.
3. In the left navigation pane, right-click the provider file that you want to delete, and
then choose Delete Provider File.
The Security Provider tool prompts you to confirm deletion.
4. Click Yes.
The Security Provider tool removes the deleted provider file from the list.
You use right-click menus in the left navigation pane of the Security Providers tool to
move providers up and down in the Providers list.
The order of providers in the Provider list represents the order in which they are
invoked. For example, when a user attempts to log in and supplies their user name and
password, Siperian Hub submits their login credentials to each authentication provider
in the Authentication list, proceeding sequentially through the list. If authentication
succeeds with one of the providers in the list, then the user is deemed authenticated.
If authentication fails with all available authentication providers, then authentication for
that user fails. To learn about changing the processing order, see “Moving a Security
Provider Up in the Processing Order” on page 906 and “Moving a Security Provider
Down in the Processing Order” on page 907.
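The sequential invocation described above can be sketched as a simple loop over the providers list. This is an illustration only; the interface and class names here are hypothetical and do not reproduce the actual Siperian Hub provider SPI.

```java
import java.util.List;

public class ProviderChain {
    // Hypothetical authentication-provider interface for illustration.
    interface AuthenticationProvider {
        boolean authenticate(String username, String password);
    }

    // Try each provider in list order; the first success authenticates
    // the user. If every provider fails, authentication fails.
    static boolean authenticate(List<AuthenticationProvider> providers,
                                String username, String password) {
        for (AuthenticationProvider p : providers) {
            if (p.authenticate(username, password)) {
                return true; // authenticated by this provider; stop here
            }
        }
        return false; // all providers in the list rejected the credentials
    }

    public static void main(String[] args) {
        AuthenticationProvider internal = (u, p) -> "admin".equals(u) && "secret".equals(p);
        AuthenticationProvider ldap = (u, p) -> u.startsWith("cn=");
        System.out.println(authenticate(List.of(internal, ldap), "admin", "secret")); // true
        System.out.println(authenticate(List.of(internal, ldap), "guest", "x"));      // false
    }
}
```

Because evaluation stops at the first success, moving a provider up in the list (as described in the referenced sections) changes which provider gets to authenticate a given user first.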
The Security Providers tool displays the Provider panel for the selected provider
file, as shown in the following example.
Field Description
Name Name of this security provider.
Description Description of this security provider.
Provider Type Type of security provider. One of the following values:
• Authentication
• Authorization
• User Profile
For more information, see “About Security Providers” on page 889.
Provider File Name of the provider file associated with this security provider, or
Internal Provider for internal providers. For more information, see
“Managing Provider Files” on page 892.
Enabled Indicates whether this security provider is enabled (checked) or not
(unchecked). Note that internal providers cannot be disabled.
Properties Additional properties for this security provider, if defined by the security
provider. Each property is a name-value pair. A security provider might
require or allow unique properties that you can specify here. To learn more,
see “Configuring Provider Properties” on page 898.
A provider property is a name-value pair that a security provider might require in order
to provide its service(s). You can use the Security Providers tool to define these
properties.
3. In the left navigation pane, select the authentication provider for which you want
to edit properties.
4. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
5. Click the Save button to save your changes.
Custom-added Providers
You can also package custom provider classes in the JAR/ZIP file. Specify the settings
for the custom providers in a properties file named providers.properties. You must
place this file within the JAR file in the META-INF directory. These settings (that is, the
name/value pairs) are then read by the loader and translated to what is displayed in the
Hub Console.
Note: The provider archive file (JAR/ZIP) must contain all the classes required for the
custom provider to be functional, as well as all of the required resources. These
resources are specific to your implementation.
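As a rough illustration, a META-INF/providers.properties file might look like the following. The property keys shown here are hypothetical; consult the sample implementation in the SamSample directory for the actual keys expected by the loader.

```properties
# META-INF/providers.properties (hypothetical keys, for illustration only)
provider.name=AcmeLdapAuthProvider
provider.type=Authentication
provider.class=com.acme.security.LdapAuthProvider
# Name-value pairs that the loader surfaces as editable
# properties in the Hub Console Properties panel
provider.property.java.naming.provider.url=ldap://localhost:389/
```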
Siperian Hub supports the use of external authentication for users through the Java
Authentication and Authorization Service (JAAS). Siperian Hub provides templates for
the following types of authentication standards:
• Lightweight Directory Access Protocol (LDAP)
• Microsoft Active Directory
• Network authentication using the Kerberos protocol
These templates provide the settings (protocols, server names, ports, and so on) that
are required for these authentication standards. You can use these templates to add a
new login module and provide the settings you need. To learn more about these
authentication standards, see the applicable vendor documentation.
4. Click the down arrow and select a template for the login module.
5. Click OK.
The Security Providers tool adds the new login module to the list.
6. In the Properties panel, click the Edit button next to any property that you
want to edit, such as its name and description, and change the setting.
For LDAP, you can specify the following settings.
Property Description
java.naming.factory.initial Required. Java class name of the JNDI implementation for
connecting to an LDAP server. Use the following value:
com.sun.jndi.ldap.LdapCtxFactory.
java.naming.provider.url Required. URL of the LDAP server. For example:
ldap://localhost:389/
username.prefix Optional. Tells Siperian Hub how to parse the LDAP
username. An OpenLDAP user name looks like this:
cn=myopenldapuser,dc=siperian,dc=com
where
• myopenldapuser is the user name
• siperian is the domain name
• com is the top-level domain
In this example, the username.prefix is: cn=
username.postfix Optional. Used in conjunction with username.prefix. Using
the previous example, set username.postfix to:
,dc=siperian,dc=com
Note the comma at the beginning of the string.
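The way username.prefix and username.postfix frame the login name can be shown with a short sketch. The values match the OpenLDAP example above; the class and method names are ours, not part of Siperian Hub.

```java
public class LdapName {
    // Build the distinguished name sent to the LDAP server by wrapping
    // the login name with the configured prefix and postfix.
    static String toDn(String prefix, String username, String postfix) {
        return prefix + username + postfix;
    }

    public static void main(String[] args) {
        String dn = toDn("cn=", "myopenldapuser", ",dc=siperian,dc=com");
        System.out.println(dn); // cn=myopenldapuser,dc=siperian,dc=com
    }
}
```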
For Microsoft Active Directory, you can specify the following settings:
Property Description
java.naming.factory.initial Required. Java class name of the JNDI implementation for
connecting to an LDAP server. Use the following value:
com.sun.jndi.ldap.LdapCtxFactory.
java.naming.provider.url Required. URL of the LDAP server. For example:
ldap://localhost:389/
3. Select the security provider whose properties you want to change, as described in
“Selecting a Security Provider” on page 896.
4. In the Properties panel, click the Edit button next to any property that you
want to edit.
5. Click the Save button to save your changes.
As described in “Sequence of the Providers List” on page 896, Siperian Hub processes
security providers in the order in which they appear in the Providers list.
This chapter describes how to use the User Object Registry tool to view registered
custom code.
Chapter Contents
• About User Objects
• Starting the User Object Registry Tool
• Viewing User Exits
• Viewing Custom Stored Procedures
• Viewing Custom Java Cleanse Functions
• Viewing Custom Button Functions
About User Objects
Note: To view custom user code in the User Object Registry tool, you must have
registered the following types of objects:
• Custom Stored Procedures; for more information regarding stored procedures, see
“Developing Custom Stored Procedures for Batch Jobs” on page 806
• Custom Java Cleanse Functions; for more information regarding Java cleanse
functions, see “Using Cleanse Functions” on page 414
• Custom Button Functions; for more information regarding custom buttons, see
“About Custom Buttons in the Hub Console” on page 978
Note: You do not have to pre-configure user exit procedures to view them in the User
Object Registry tool.
Column Description
Registered User Object Types Hierarchical tree of user objects registered in the selected
ORS, organized by the following categories:
• User Exits
• Custom Stored Procedures
• Custom Java Cleanse Functions
• Custom Button Functions
User Object Properties Properties for the selected user object.
Note: The User Object Registry tool displays the types of pre-existing user exits.
Siperian Hub also allows you to create and run custom stored procedures for batch
jobs. For more information, see “Developing Custom Stored Procedures for Batch
Jobs” on page 806. You can also create and run stored procedures using the SIF API
(using Java, SOAP, or HTTP/XML). For more information, see the Siperian Services
Integration Framework Guide.
The User Object Registry tool displays the registered custom Java cleanse
functions, as shown in the following example.
Server-based and client-based custom functions are visible in the User Object Registry. For
more information, see “Server-Based and Client-Based Custom Functions” on page
982.
This chapter describes how to set up auditing and debugging in the Hub Console.
Chapter Contents
• About Integration Auditing
• Starting the Audit Manager
• Auditing SIF API Requests
• Auditing Message Queues
• Auditing Errors
• Using the Audit Log
About Integration Auditing
Auditing is configured separately for each Operational Record Store (ORS) in your
Siperian Hub implementation.
Auditable Events
Integration with external applications often involves complexity. Multiple applications
interact with each other, exchange data synchronously or asynchronously, use data
transformations back and forth, and engage various business rules to execute business
processes across applications.
The Siperian Hub audit mechanism is optional and configurable. It tracks invocations
of SIF requests that are audit-enabled, collects data about what occurred when, and
provides some contextual information as to why certain actions were fired. It stores
audit information in an audit log table (C_REPOS_AUDIT) that you can subsequently
view using TOAD or another compatible, external data management tool.
Note: Auditing is in effect whether metadata caching is enabled (on) or disabled (off).
Pane Description
Navigation pane Shows (in a tree view) the following information:
• auditing types for this Siperian Hub implementation (see “Auditable
API Requests and Message Queues” on page 923)
• the systems to audit (see “Systems to Audit” on page 923)
• message queues to audit (see “Auditing Message Queues” on page
928)
Properties pane Shows the properties for the selected auditing type or system.
Type Description
API Requests Request invocations made by external applications using the Services
Integration Framework (SIF) Software Development Kit (SDK).
Message Queues Message queues used for message triggers. To learn more, see
Chapter 16, “Configuring the Publish Process”.
Note: Message queues are defined at the CMX_SYSTEM level.
These settings apply only to messages for this Operational Record
Store (ORS).
Systems to Audit
For each type of item to audit, the Audit Manager displays the list of systems that can
be audited, along with the SIF requests that are associated with that system.
System Description
No System Services that are not—or not necessarily—associated with a specific system
(such as merge operations).
Admin Services that are associated with the Admin system.
System Description
Defined Source Systems Services that are associated with predefined source systems. To learn
more, see “About the Databases Tool” on page 60.
Note: The same API request or message queue can appear in multiple source systems
if, for example, its use is optional on one of those source systems.
Audit Properties
Note: A write lock is not required to configure auditing.
When you select an item to audit, the Audit Manager displays properties in the
properties pane with the following configurable settings.
Field Description
System Name Name of the selected system. Read-only.
Description Description of the selected system. Read-only.
API Request List of API requests that can be audited.
Message Queue List of message queues that can be audited.
Enable Audit? By default, auditing is not enabled.
• Select (check) to enable auditing for the item.
• Clear (uncheck) to disable auditing for the item.
Field Description
Include XML? This check box is available only if auditing is enabled for this item. By
default, XML is not captured in the log. To learn more, see
“Capturing XML for Requests and Responses” on page 921.
• Check (select) to include XML in the audit log for this item.
• Uncheck (clear) to exclude XML from the audit log for this item.
Note: Passwords are never stored in the audit log. If a password exists in
the XML stream (whether encrypted or not), Siperian Hub replaces the
password with asterisks, as shown in the following example:
...<get>
<username>admin</username>
<password>
<encrypted>false</encrypted>
<password>******</password>
</password>
...
Important: Selecting this option can cause the audit log file to grow very
large rapidly. To learn more, see “Periodically Purging the Audit Log” on
page 935.
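The password-masking behavior described above can be sketched with a small, illustrative Java snippet. This is not Siperian Hub code; the class name and the regex-based approach are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

public class AuditXmlMasker {
    // Masks the text content of any innermost <password> element, i.e. a
    // <password> element whose content contains no nested tags.
    private static final Pattern PWD =
            Pattern.compile("<password>([^<]*)</password>");

    public static String mask(String xml) {
        return PWD.matcher(xml).replaceAll("<password>******</password>");
    }

    public static void main(String[] args) {
        String request = "<get><username>admin</username>"
                + "<password><encrypted>false</encrypted>"
                + "<password>secret</password></password></get>";
        // Prints the request with the inner password value replaced by ******
        System.out.println(mask(request));
    }
}
```

Note that the outer `<password>` wrapper is left intact because its content contains nested tags; only the innermost element carrying the actual value is masked.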
For the Enable Audit? and Include XML? check boxes, you can use the following
buttons.
For more information about the SIF API requests, see the Siperian Services Integration
Framework Guide.
In the edit pane, the Audit Manager displays the configurable API requests for the
selected system. To learn more, see “Audit Properties” on page 924.
3. For each SIF request that you want to audit, select (check) the Enable Audit check
box.
4. If auditing is enabled for a particular API request and you also want to include
XML associated with that API request in the audit log, then select (check) the
Include XML check box.
5. Click the Save button to save your changes.
Note: Your saved settings might not take effect in the Hub Server for up to 60
seconds.
3. For each message queue that you want to audit, select (check) the Enable Audit
check box.
4. If auditing is enabled for a particular message queue and you also want to include
XML associated with that message queue in the audit log, then select (check) the
Include XML check box.
5. Click the Save button to save your changes.
Note: Your saved settings might not take effect in the Hub Server for up to 60
seconds.
Auditing Errors
You can capture error information for any SIF request invocation that triggers the
error mechanism in the Web service—such as syntax errors, run-time errors, and so
on. You can enable auditing for all errors associated with SIF requests.
Auditing errors is a feature that you enable globally. Even when auditing is not
currently enabled for a particular SIF request, if an error occurs during that SIF request
invocation, then the event is captured in the audit log.
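The combined effect of per-request auditing and global error auditing can be summarized in a short sketch. This is illustrative only; the class and method names are hypothetical, and Siperian Hub's internal logic is not published.

```java
public class AuditPolicy {
    // Sketch of the documented decision: an event is written to the audit log
    // if auditing is enabled for the request itself, or if error auditing is
    // enabled globally and the invocation produced an error.
    public static boolean shouldLog(boolean requestAuditEnabled,
                                    boolean errorAuditEnabled,
                                    boolean invocationFailed) {
        return requestAuditEnabled || (errorAuditEnabled && invocationFailed);
    }
}
```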
4. If you select Enable Audit and you also want to include XML associated with
errors in the audit log, then select (check) the Include XML check box.
Note: If you only select Enable Audit, Siperian Hub provides the associated audit
information in C_REPOS_AUDIT.
If you also select Include XML, Siperian Hub populates an additional column in
C_REPOS_AUDIT, named DATA_XML, which contains detailed log data for the audit.
If you select both check boxes, when you run an Insert, Update, or Delete job in
the Data Manager, or run the associated batch job, Siperian Hub includes the audit
data in DATA_XML of C_REPOS_AUDIT.
5. Click the Save button to save your changes.
Note: The SIF Audit request allows an external application to insert new records into
the C_REPOS_AUDIT table. Use this request to report activity involving one or more
records in Siperian Hub that occurs at a higher level, or that carries more information,
than the Hub itself records. For example, you could audit an update to a complex object
before transforming and decomposing it into Hub objects. To learn more, see the Siperian
Services Integration Framework Guide.
If available in the data management tool you use to view the log file, you can focus
your viewing by filtering entries—by audit level (view only debug-level or info-level
entries), by time (view entries within the past hour), by operation success / failure
(show error entries only), and so on.
Here is an example of C_REPOS_AUDIT audit log entries that include the DATA_XML
column. For this example, both the Enable Audit and Include XML check boxes were
selected.
Contents
• Appendix A, “Configuring International Data Support”
• Appendix B, “Backing Up and Restoring Siperian Hub”
• Appendix C, “Configuring User Exits”
• Appendix D, “Viewing Configuration Details”
• Appendix E, “Implementing Custom Buttons in Hub Console Tools”
• Appendix F, “Configuring Access to Hub Console Tools”
A
Configuring International Data Support
This topic explains how to configure character sets in a Siperian Hub implementation.
The database needs to support the character set you want to use, the terminal must be
configured to support the character set you want to use, and the NLS_LANG
environment variable must include the Oracle name for the character set used by your
client terminal.
Appendix Contents
• Configuring Unicode in Siperian Hub
• Configuring the ANSI Code Page (Windows Only)
• Configuring NLS_LANG
Configuring Unicode in Siperian Hub
Notes:
• The NLS_LANG setting should match the database character set.
• The language_territory portion of the NLS_LANG setting (represented as
“AMERICA_AMERICA” in the above example) is locale-specific and might not be
suitable for all Siperian Hub implementations. For example, a Japanese
implementation might need to use the following setting instead:
NLS_LANG=JAPANESE_JAPAN.AL32UTF8
• If you use AL32UTF8 (or even UTF8) as the database character set, then it is
highly recommended that you set NLS_LENGTH_SEMANTICS to CHAR
(in the Oracle init.ora file) when you instantiate the database. Doing so forces
Oracle to default to CHAR (not BYTE) for variable length definitions.
By default, this setting is one (1), which means that column lengths are declared as
byte values. Changing this to zero (0) means that column lengths are declared as
CHAR values in support of Unicode values.
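To see why CHAR semantics matter, the following small Java illustration (not part of Siperian Hub) compares character counts with UTF-8 byte lengths. With BYTE semantics, a column sized in bytes can overflow with fewer characters than expected when the data is multibyte.

```java
import java.nio.charset.StandardCharsets;

public class LengthSemantics {
    // Returns the number of bytes a string needs when encoded as UTF-8
    // (Oracle's AL32UTF8). A string of n characters may need up to 4n bytes.
    public static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Tokyo";           // 5 characters, 5 bytes
        String japanese = "\u6771\u4eac"; // "Tokyo" in Japanese: 2 characters, 6 bytes
        System.out.println(ascii.length() + " chars / " + byteLength(ascii) + " bytes");
        System.out.println(japanese.length() + " chars / " + byteLength(japanese) + " bytes");
    }
}
```

A VARCHAR2(5 BYTE) column would hold the ASCII string but not the two-character Japanese string, whereas VARCHAR2(5 CHAR) holds both.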
Configuring Populations
By default, Siperian Hub supports the population for the United States (provides a
usa.ysp file in the default installation). If your implementation needs to use a
population other than the US population, then additional analysis of the data is
required.
• If the data is exclusively from a different country, and Siperian provides a
population for that country, then use that population. Contact Siperian Support to
obtain the population.ysp file that is appropriate for your implementation, along
with instructions to enable the population.
• If the data is mostly from one country with very small amounts of mixed data from
one or more other populations, consider using the majority population. Contact
Siperian Support to obtain the population.ysp file for the majority population,
along with any instructions.
• If large quantities of data from different countries are mixed, consider whether it is
meaningful to match across such a disparate set of data. If so, contact Siperian Support
to obtain the appropriate means to enable the population you want to use. The
SSA_POPULATION setting defines the Standard Population Set to use
for match purposes. A Standard Population Set contains the rules that define how
the Key Building, Search Strategies, and Match Purposes operate on a particular
population of data. There is one Standard Population set for each supported
country, language, or population.
2. Copy the appropriate population.ysp file obtained from Siperian Support to the
following location.
Windows
SIP_HOME\cleanse\resources\match
For example:
C:\siperian\hub\cleanse\resources\match
Unix
SIP_HOME/hub/cleanse/
Note: Siperian ships the usa.ysp file by default. If you need to use the population
set for a different country, contact Siperian Support to obtain the population.ysp
file that is appropriate for your implementation, along with instructions to enable
the population.
This setting helps with the processing of UTF8 characters during match, ensuring that
all data is represented in UTF16 (although its representation in the database is still
UTF8).
Siperian Hub provides you with the ability to use multiple populations within a single
base object. This is useful if data in a base object comes from different
populations—for example, 70% of the records from the United States and 30% from
China. Populations can vary on a record-by-record basis.
3. Copy the applicable population.ysp file(s) obtained from Siperian Support to the
following location.
Windows
SIP_HOME\cleanse\resources\match
For example:
C:\siperian\hub\cleanse\resources\match
Unix
SIP_HOME/hub/cleanse/
4. Restart the application server.
5. In the Schema Manager, add a column to the base object that will contain the
population to use for each record.
Note: The width of the VARCHAR column must fit the largest population name
in use. A width of 30 is probably sufficient for most implementations.
6. Configure the match column as an exact match column with the name SIP_POP,
according to the instructions in “Configuring Match Columns” on page 515.
7. For each record in the base object that will use a non-default population, provide (in
the SIP_POP column) the name of the population to use instead.
• You can specify values for the SIP_POP column in any number of
ways—adding the data in the landing tables, using cleanse functions that
calculate the values during the stage process, invoking SIF requests from
external applications—even manually editing the cells using the Data Manager
tool. The only requirement is that the SIP_POP cells must contain this data
for all non-default populations just prior to executing the Generate Match
Tokens process.
• The data in the SIP_POP column can be in any case (upper, lower, or mixed)
because all alphabetic characters will be converted to lowercase in the match
key table. For example, Us, US, and us are all valid values for this column.
• Invalid values in this column will be processed using the default population.
Invalid values include NULLs, empty strings, and any string that does not
match a population name as defined in c_repos_ssa_
population.population_name.
8. Execute the Generate Match Tokens process on this base object to update the
match key table.
9. Execute the match process on this base object.
Note: The match process compares only records that share the same population.
For example, it will compare Chinese records with Chinese records, and American
records with American records. Any resulting match pairs will be between records
that share the same population.
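The SIP_POP handling described above (case-insensitive matching, with NULLs, empty strings, and unknown names falling back to the default population) can be sketched as follows. The class, the registered population names, and the default are hypothetical; in an actual implementation the valid names come from c_repos_ssa_population.population_name.

```java
import java.util.Locale;
import java.util.Set;

public class PopulationResolver {
    // Hypothetical set of registered population names (lowercased), standing
    // in for the contents of c_repos_ssa_population.population_name.
    private static final Set<String> REGISTERED = Set.of("us", "japan", "china");
    private static final String DEFAULT_POPULATION = "us";

    // Mirrors the documented rules: matching is case-insensitive, and invalid
    // values (NULL, empty, or unknown names) resolve to the default population.
    public static String resolve(String sipPop) {
        if (sipPop == null) {
            return DEFAULT_POPULATION;
        }
        String normalized = sipPop.trim().toLowerCase(Locale.ROOT);
        return REGISTERED.contains(normalized) ? normalized : DEFAULT_POPULATION;
    }
}
```

For example, "Us", "US", and "us" all resolve to the same population key, matching the behavior of the match key table described above.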
Hub Console
In the Hub Console, menus, warnings, and so on are in English. Current Siperian Hub
UTF support applies only to business data—not metadata or the interface. The Hub
Console will have UTF8 support in a future release.
You can configure the system locale settings (which define settings for the system
language) to use UTF-8 by completing the following steps:
2. Determine whether a locale for your language with a name ending in .utf8 is
available. If it is not, you can create one:
localedef -f UTF-8 -i en_US en_US.utf8
3. Once you have a locale that allows you to use UTF-8, instruct the UNIX system
to use that locale.
export LC_ALL="en_US.utf8"
export LANG="en_US.utf8"
export LANGUAGE="en_US.utf8"
Note: There are many registry entries with very similar names, so be sure to look at the
right place in the registry.
Note: On Windows XP systems, you might need to install support for non-Western
languages.
Configuring NLS_LANG
To specify the locale behavior of your client Oracle software, you need to set your
NLS_LANG setting, which specifies the language, territory, and character set of your client.
This section describes several ways in which to configure the NLS_LANG setting.
where:
Setting Description
LANGUAGE Specifies the language used for Oracle messages, as well as the names of
days and months.
TERRITORY Specifies monetary and numeric formats, as well as territory and
conventions for calculating week and day numbers.
CHARACTERSET Controls the character set used by the client application. It either
matches your Windows code page or is set to UTF8 for a Unicode application.
Note: The character set defined with the NLS_LANG parameter does not change your
client's character set. Instead, it is used to let Oracle know which character set you are
using on the client side so that Oracle can perform the proper conversion.
The character set part of the NLS_LANG parameter is never inherited from the server.
You can modify this subkey using the Windows Registry Editor:
1. From the Start menu, choose Run...
When starting an Oracle tool (such as sqlplusw), the tool will read the contents of the
oracle.key file located in the same directory to determine which registry tree will be
used (therefore, which NLS_LANG subkey will be used).
Because these environment variables take precedence over the parameters specified in
your Windows Registry, you should not set Oracle parameters at this location unless
you have a very good reason. In particular, note that the ORACLE_HOME parameter
is set on Unix but not on Windows.
This appendix explains how to back up and restore a Siperian Hub implementation.
Appendix Contents
• Backing Up Siperian Hub
• Backup and Recovery Strategies for Siperian Hub
Backing Up Siperian Hub
Non-logging operations (such as CTAS, Direct Path SQL Load, and Direct Insert) are
occasionally performed on permanent Hub tables to speed-up batch processes. These
operations are not recorded in the redo logs and, as such, are not generally recoverable.
However, recovery is possible if a backup is made immediately after the operations are
completed.
Backup and recovery strategies are dependent on the value of the GLOBAL_NOLOGGING_IND
column (in the C_REPOS_DB_RELEASE table), which turns non-logging operations
on or off.
To recover changes that the non-logging operations make, you must perform an
immediate back-up procedure.
3. Run the following SQL to disable index creation with the non-logging option:
update c_repos_table set NOLOGGING_IND = 0;
COMMIT;
4. Make sure that the database is running in the archive log mode.
5. Perform a database backup.
6. If recovery is needed, apply redo logs on the backup.
This chapter provides reference information for the various predefined Siperian Hub
user exit procedures.
Appendix Contents
• About User Exits
• Types of User Exits
About User Exits
Note: The POST_LANDING, PRE_STAGE, and POST_STAGE user exits are only
called from the batch Stage process. For more information, see “Stage Jobs” on page
745.
Siperian Hub automatically provides the appropriate input parameter values when it
calls a user exit procedure. In addition, Siperian Hub automatically checks the return
code returned by a user exit procedure. A negative return code causes the Hub process
to terminate with an error condition.
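The return-code convention can be illustrated with a short sketch of how a calling process might enforce it. The class and method names are hypothetical; Siperian Hub performs this check internally.

```java
public class UserExitResult {
    // Per the documented convention, a negative return code from a user exit
    // terminates the calling Hub process with an error condition.
    public static void check(int returnCode, String errorMessage) {
        if (returnCode < 0) {
            throw new IllegalStateException(
                    "User exit failed (code " + returnCode + "): " + errorMessage);
        }
    }
}
```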
A user exit must perform its own transaction handling. COMMITs / ROLLBACKs
must be explicitly issued for any data manipulation operation(s) in a user exit, or in
stored procedures called from user exits. However, this is not true for the Siperian SIF
API requests (for example, Merge, Unmerge, and so on). Transactions for the API
requests are handled by Java code. Any COMMITs / ROLLBACKs in such a case may
cause a Java distributed transaction error.
Note: Dynamic SQL is recommended for all DML/DDL statements, as a user exit
could access objects that only exist at run time.
Note: For Oracle databases, all user exit procedures are located in the cmxue package.
Use a POST_LANDING user exit for custom work on the landing table prior to delta
detection. For example:
• Hard delete detection
• Replace control characters with printable characters
• Perform any special pre-cleansing processes on Addresses
POST_LANDING Parameters
Use a PRE_STAGE user exit for any special handling of delta processes. For example,
use a PRE_STAGE user exit to check delta volumes and determine whether they
exceed pre-defined allowable delta volume limits (for example, “stop process if source
system is System A and the number of deltas is greater than 500,000”).
PRE_STAGE Parameters
Use a POST_STAGE user exit for any special processing at the end of a Stage job. For
example, use a POST_STAGE user exit for special handling of rejected records from
the Stage job (for example, to automatically delete rejects for known, non-critical
conditions).
POST_STAGE Parameters
Use a POST_LOAD user exit after an update or after an insert from Load.
For the Load process, the IN_ACTION_TABLE has the name of the work table
containing the ROWID_OBJECT values to be inserted/updated.
POST_LOAD Parameters
Parameter Name Description
IN_ROWID_JOB Job id for the Load job, as registered in c_repos_job_control
(blank for the PUT request).
IN_TABLE_NAME Name of the target table (Base Object / Relationship Table
/ Dependent Object) for the Load job.
IN_STAGE_TABLE Name of the source table for the Load job.
IN_ACTION_TABLE For the Load job, this is the name of the table containing
the rows to be inserted or updated (staging_table_name_TINS
for inserts, staging_table_name_TOPT for updates).
OUT_ERROR_MESSAGE Error message.
Use a POST_MATCH user exit for custom work on the match table.
For example, use a POST_MATCH user exit to manipulate matches in the match
queue.
POST_MATCH Parameters
Parameter Name Description
IN_ROWID_JOB Job id for the Match job, as registered in c_repos_job_control.
IN_TABLE_NAME Base Object that the Match job is running on.
IN_MATCH_SET_NAME Match ruleset.
OUT_ERROR_MESSAGE Error message.
OUT_RETURN_CODE Return code.
Use a POST_MERGE user exit to perform custom work after the Merge process.
For example, use a POST_MERGE user exit to automatically match and merge child
records affected by the match and merge of a parent record.
POST_MERGE Parameters
Parameter Name Description
IN_ROWID_JOB Job id for the Merge job, as registered in c_repos_job_control.
IN_TABLE_NAME Base Object that the Merge job is running on.
IN_ROWID_OBJECT_TABLE For a bulk merge, the action table; for an on-line merge,
an in-line view.
OUT_ERROR_MESSAGE Error message.
OUT_RETURN_CODE Return code.
Use a POST_UNMERGE user exit for custom work after the Unmerge process.
POST_UNMERGE Parameters
Parameter Name Description
IN_ROWID_JOB Job id for the Unmerge transaction, as registered in c_repos_job_control.
IN_TABLE_NAME Base Object that the Unmerge job is running on.
IN_ROWID_OBJECT Re-instated rowid_object.
OUT_ERROR_MESSAGE Error message.
OUT_RETURN_CODE Return code.
Use this user exit to override or extend user assignment lists. This user exit procedure
runs before the user merge assignment is updated. Note that user assignment lists are
stored in C_REPOS_USER_MERGE_ASSIGNMENTS.
This appendix explains how to view the configuration details in a Siperian Hub
implementation using the Enterprise Manager in the Hub Console.
Appendix Contents
• About the Enterprise Manager
• Starting the Enterprise Manager
• Enterprise Manager Properties
About the Enterprise Manager
In the Enterprise Manager screen, from the Select a hub component menu, choose
the type of information you want to view: Hub Servers, Cleanse Servers, Master
database, or ORS databases. The screen displays properties that are specific for your
choice.
When you click Version History, you see version information for the choice you made
in the Select a hub component field. Version history is sorted in descending order
of install start time. Version history is displayed in a similar format for all hub
components.
To view more information about each property, slide your cursor or mouse over the
property.
The following table describes Hub Server properties that the Enterprise Manager can
display in the Properties tab, depending on your preference. These properties are found
in the cmxserver.properties file (in the hub server installation directory), and are
not configurable.
The following table describes Cleanse Server properties that the Enterprise Manager
can display in the Properties tab, depending on your preference. These properties are
found in the cmxcleanse.properties file.
• cmx.server.datalayer.cleanse.working_files=KEEP
• cmx.server.datalayer.cleanse.execution=LOCAL
Installation directory Installation directory of the Siperian Hub server. cmx.home=C:/siperian/hub/server
Application server type Type of application server: JBoss, WebSphere, WebLogic. cmx.appserver.type=<application_server_name>
Number cmx.server.match.server_encoding=0
The top panel contains a list of ORS databases that are registered with the Master
Database. The bottom panel displays the properties and the version history of the ORS
that is selected in the top panel. Properties of ORS include database vendor and
version, as well as information from the C_REPOS_DB_RELEASE table. Version
history is also kept in the C_REPOS_DB_VERSION table.
Environment Report
When you choose Environment from the Select field, the Enterprise Manager displays
a summary of the properties of all the other choices, along with any associated error
messages. This report can be downloaded in HTML format to a file system.
This chapter explains how, in a Siperian Hub implementation, you can add custom
buttons to tools in the Hub Console that allow you to invoke external services on
demand.
Appendix Contents
• About Custom Buttons in the Hub Console
• Adding Custom Buttons
About Custom Buttons in the Hub Console
Custom buttons can give users the ability to invoke a particular external service (such
as retrieving data or computing results), perform a specialized operation (such as
launching a workflow), and other tasks. Custom buttons can be designed to access data
services by a wide range of service providers, including—but not limited
to—enterprise applications (such as CRM or ERP applications), external service
providers (such as foreign exchange calculators, publishers of financial market indexes,
or government agencies), and even Siperian Hub itself (for more information, see the
Siperian Services Integration Framework Guide).
For example, you could add a custom button that invokes a specialized cleanse
function, offered as a Web service by a vendor, that cleanses data in the customer
record that is currently selected in the Merge Manager screen. When the user clicks the
button, the underlying code would capture the relevant data from the selected record,
create a request (possibly including authentication information) in the format expected
by the Web service, and then submit that request to the Web service for processing.
When the results are returned, the Hub displays the information in a separate Swing
dialog (if you created one and if you implemented this as a client custom function) with
the customer rowid_object from Siperian Hub.
Custom buttons are not installed by default, nor are they required for every Siperian
Hub implementation. For each custom button you need to implement a Java interface,
package the implementation in a JAR file, and deploy it by running a command-line
utility. To control the appearance of the custom button in the Hub Console, you can
supply either text or an icon graphic in any Swing-compatible graphic format (such as
JPG, PNG, or GIF).
The custom code can process the service response as appropriate—log the results,
display the data to the user in a separate Swing dialog (if custom-coded and the custom
function is client-side), allow users to copy and paste the results into a data entry field,
execute real-time PUT statements of the data back into the correct business objects,
and so on.
Custom buttons are displayed to the right of the top panel of the Merge Manager, in
the same location as the regular Merge Manager buttons. This example shows a button
called fx.
Custom buttons are displayed in the top part of the top panel of the Hierarchy
Manager screen, in the same location as other Hierarchy Manager buttons. This
example shows a button called fx.
Once an external service button is visible in the Hub Console, users can click the
button to invoke the service.
com.siperian.mrm.customfunctions.api.CustomFunction
To learn more about this interface, see the Javadoc that accompanies your Siperian
Hub distribution.
Environment Description
Client UI-based custom function—Recommended when you want to display
elements in the user interface, such as a separate dialog that displays
response information. To learn more, see “Example Client-Based Custom
Function” on page 982.
Server Server-based custom button—Recommended when it is preferable to call
the external service from the server for network or performance reasons.
To learn more, see “Example Server-Based Function” on page 984.
This section provides the Java code for two example custom functions that implement
the com.siperian.mrm.customfunctions.api.CustomFunction interface. The
code simply prints (on standard error) information to the server log or the Hub
Console log.
The name of the client function class for the following sample code is
com.siperian.mrm.customfunctions.test.TestFunctionClient.
//=====================================================================
//project: Siperian Master Reference Manager, Hierarchy Manager
//---------------------------------------------------------------------
//copyright: Siperian Inc. (c) 2008-2009. All rights reserved.
//=====================================================================
package com.siperian.mrm.customfunctions.test;
import java.awt.Frame;
import java.util.Properties;
import javax.swing.Icon;
import com.siperian.mrm.customfunctions.api.CustomFunction;
The name of the server function class for the following code is
com.siperian.mrm.customfunctions.test.TestFunction.
//=====================================================================
//project: Siperian Master Reference Manager, Hierarchy Manager
//---------------------------------------------------------------------
//copyright: Siperian Inc. (c) 2008-2009. All rights reserved.
//=====================================================================
package com.siperian.mrm.customfunctions.test;
import java.awt.Frame;
import java.util.Properties;
import javax.swing.Icon;
import com.siperian.mrm.customfunctions.api.CustomFunction;
/**
* This is a sample custom function that is executed on the Server.
* To deploy this function, put it in a jar file and upload the jar file
* to the DB using DeployCustomFunction.
*/
public class TestFunction implements CustomFunction {
public String getActionText() {
return "Test Server";
}
public Icon getGuiIcon() {
return null;
}
Method Description
getActionText Specify the text for the button label. Uses the default visual appearance
for custom buttons.
getGuiIcon Specify the icon graphic in any Swing-compatible graphic format (such as
JPG, PNG, or GIF). This image file can be bundled with the JAR file for
this custom function.
Label Description
(L)ist Displays a list of currently-defined custom buttons.
(A)dd Adds a new custom button. The DeployCustomFunction tool prompts
you to specify:
• the JAR file for your custom button
• the name of the custom function class that implements the
com.siperian.mrm.customfunctions.api.CustomFunction interface
• the type of the custom button: m—Merge Manager, h—Hierarchy
Manager (you can specify one or two letters)
Label Description
(U)pdate Updates the JAR file for an existing custom button.
The DeployCustomFunction tool prompts you to specify:
• the rowID of the custom button to update
• the JAR file for your custom button
• the name of the custom function class that implements the
com.siperian.mrm.customfunctions.api.CustomFunction interface
• the type of the custom button: m—Merge Manager, h—Hierarchy
Manager (you can specify one or two letters)
(C)hange Type Changes the type of an existing custom button. The
DeployCustomFunction tool prompts you to specify:
• the rowID of the custom button to update
• the type of the custom button: m—Merge Manager and/or
h—Hierarchy Manager (you can specify one or two letters)
(S)et Properties Specify a properties file, which defines name/value pairs that the
custom function requires at execution time (name=value).
The DeployCustomFunction tool prompts you to specify the
properties file to use.
(D)elete Deletes an existing custom button. The DeployCustomFunction tool
prompts you to specify the rowID of the custom button to delete.
(Q)uit Exits the DeployCustomFunction tool.
Appendix Contents
• About User Access to Hub Console Tools
• Starting the Tool Access Tool
• Granting User Access to Tools and Processes
• Revoking User Access to Tools and Processes
You use the Tool Access tool in the Configuration workbench to configure access to
Hub Console tools. To use the Tool Access tool, you must be connected to the master
database.
Note: The Tool Access tool applies only to Siperian Hub users who are not configured
as administrators (users who do not have the Administrator check box selected in the
Users tool, as described in “Editing User Accounts” on page 870).
Starting the Tool Access Tool
In the above example, the cmx_global user account exists only to store the global
password policy, which is described in “Managing the Global Password Policy” on page
877.
2. In the Tool Access tool, scroll the User list and select the user that you want to
configure.
3. Do one of the following:
• In the Available processes list, select a process to which you want to grant
access.
• In the Available workbenches list, select a workbench containing the tool(s)
to which you want to grant access.
4. Click the button.
The Tool Access tool adds the selected tool or process to the Accessible tools
and processes list. Granting access to a process automatically grants access to any
tool that the process uses. Granting access to a tool automatically grants access to
any process that uses the tool.
The user will have access to these processes and tools for every ORS to which they
have access. You cannot give a user access to one tool for one ORS and another tool
for a different ORS.
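The grant rules above can be modeled with a small sketch. The class, method names, and example process/tool names are hypothetical; this is not Hub code, only an illustration of the documented transitivity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ToolAccessModel {
    // Maps each process to the set of tools it uses.
    private final Map<String, Set<String>> processTools = new HashMap<>();
    // Items (processes and tools) the user can access.
    private final Set<String> accessible = new HashSet<>();

    public void defineProcess(String process, String... tools) {
        processTools.put(process, new HashSet<>(Arrays.asList(tools)));
    }

    // Granting a process also grants every tool that the process uses.
    public void grantProcess(String process) {
        accessible.add(process);
        accessible.addAll(processTools.getOrDefault(process, Set.of()));
    }

    // Granting a tool also grants every process that uses the tool.
    public void grantTool(String tool) {
        accessible.add(tool);
        for (Map.Entry<String, Set<String>> e : processTools.entrySet()) {
            if (e.getValue().contains(tool)) {
                accessible.add(e.getKey());
            }
        }
    }

    public boolean canAccess(String item) {
        return accessible.contains(item);
    }
}
```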
Note: If you want to grant access to only some of the tools in a workbench, then
expand the associated workbench in the Accessible tools and processes list, select
the tool, and revoke access according to the instructions in the next section, “Revoking
User Access to Tools and Processes” on page 992.
2. In the Tool Access tool, scroll the User list and select the user that you want to
configure.
3. Scroll the Accessible tools and processes list and select the process, workbench, or
tool to which you want to revoke access.
To select a tool, expand the associated workbench.
4. Click the button.
The Tool Access tool prompts you to confirm that you want to remove the access.
5. Click Yes.
The Tool Access tool removes the selected item from the Accessible tools and
processes list. Revoking access to a process automatically revokes access to any
tool that the process uses. Revoking access to a tool automatically revokes access
to any process that uses the tool.
Glossary
accept limit
A number that determines the acceptability of a match. The accept limit is defined by
Siperian within a population in accordance with its match purpose.
active
This is a state associated with a base object or cross reference record. A base object
record is active if at least one of its cross reference records is active. A cross reference
record contributes to the consolidated base object only if it is active.
Active records participate in Hub processes by default. These are the records that are
available to participate in any operation. If records are required to go through an
approval process, then these records have been through that process and have been
approved.
Activity Manager
Siperian Activity Manager (AM) evaluates data events, synchronizes master data, and
delivers unified views of reference and activity data from disparate sources. AM builds
upon the extensible, template-driven schema of Siperian Hub and uses the rules-based,
configurable approach to combining reference, relationship, and activity data.
It conducts a rules evaluation using a combination of reference, transactional, and
analytical data from disparate sources. It also conducts an event-driven, rules-based
orchestration of data write-backs to selected sources, and performs other event-driven,
rules-based actions to centralize data integration and delivery of relevant data to
subscribing users and applications. AM has an intuitive, powerful UI for defining,
designing, delivering, and managing unified views to downstream applications and
systems, as well as built-in lineage, history, and audit functionality.
Admin source system
Default source system. Used for manual trust overrides and data edits from the Data
Manager or Merge Manager tools. See source system.
administrator
Siperian Hub user who has the primary responsibility for configuring the Siperian Hub
system. Administrators access Siperian Hub through the Hub Console, and use
Siperian Hub tools to configure the objects in the Hub Store, and create and modify
Siperian Hub security.
authentication
Process of verifying the identity of a user to ensure that they are who they claim to be.
In Siperian Hub, users are authenticated based on their supplied credentials—user
name / password, security payload, or a combination of both. Siperian Hub provides
an internal authentication mechanism and also supports user authentication via
third-party authentication providers. See credentials, security payload.
authorization
automerge
Process of merging records automatically. For merge-style base objects only. Match
rules can result in automatic merging or manual merging. A match rule that instructs
Siperian Hub to perform an automerge identifies records that are sufficiently similar
for the system to merge them automatically, without review by a data steward. See
manual merge.
base object
A table that contains information about an entity that is relevant to your business, such
as customer or account.
batch group
A collection of individual batch jobs (for example, Stage, Load, and Match jobs) that
can be executed with a single command. Each batch job in a group can be executed
sequentially or in parallel to other jobs. See also batch job.
batch job
batch mode
Way of interacting with Siperian Hub via batch jobs, which can be executed in the Hub
Console or using third-party management tools to schedule and execute batch jobs (in
the form of stored procedures) on the database server. See also real-time mode, batch
job, batch group, stored procedure.
best version of the truth (BVT)
A record that has been consolidated with the best cells of data from the source records.
Sometimes abbreviated as BVT.
• For merge-style base objects, the base object record is the BVT record, and is built
by consolidating the most-trustworthy cell values from the corresponding source
records.
BI vendor
build match group (BMG)
The process for removing redundant matching in advance of the consolidate process.
For example, suppose a base object had the following match pairs:
• record 1 matches to record 2
• record 2 matches to record 3
• record 3 matches to record 4
After running the match process and creating build match groups, and before running
the consolidation process, you might see the following records:
• record 2 matches to record 1
• record 3 matches to record 1
• record 4 matches to record 1
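The before-and-after lists above amount to collapsing transitive match pairs onto a single representative record. The following is a minimal Python sketch of that grouping, an illustration of the concept only, not Siperian's actual BMG implementation:

```python
def build_match_groups(pairs):
    """Collapse transitive match pairs so each record points to one
    group representative (the lowest record id), removing redundant
    intermediate matches before consolidation."""
    parent = {}

    def find(r):
        # Walk up until we reach a group representative.
        while parent.get(r, r) != r:
            r = parent[r]
        return r

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lowest id as the representative.
            lo, hi = min(ra, rb), max(ra, rb)
            parent[hi] = lo
    # Report each non-representative record with its representative.
    members = {r for pair in pairs for r in pair}
    return sorted((r, find(r)) for r in members if find(r) != r)

# Match pairs from the example: 1-2, 2-3, 3-4.
groups = build_match_groups([(1, 2), (2, 3), (3, 4)])
print(groups)  # [(2, 1), (3, 1), (4, 1)]
```

Each of records 2, 3, and 4 ends up matched directly to record 1, mirroring the example lists.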
bulk merge
See automerge.
BVT
cascade delete
When the Delete stored procedure deletes records in the parent object, it also removes
the affected records in the child base object. To enable a cascade delete operation, set
the CASCADE_DELETE_IND parameter to 1. The Delete job checks each child base
object table for related data that should be deleted given the removal of the parent
base object record.
cascade unmerge
When records in a parent object are unmerged, Siperian Hub also unmerges affected
records in the child base object.
cell
Intersection of a column and a record in a table. A cell contains a data value or null.
change list
cleanse
cleanse engine
A cleanse engine is a third-party product used to perform data cleansing with
Siperian Hub.
cleanse function
Code that changes incoming data during Stage jobs, converting each input string to an
output string. Typically, these functions are used to standardize data and thereby
optimize the match process. By combining multiple cleanse functions, you can perform
complex filtering and standardization. See also data cleansing, internal cleanse.
cleanse list
A logical grouping of rules for replacing parts of an input string during the cleanse
process. See cleanse function, data cleansing.
Cleanse Match Server
The Cleanse Match Server run-time component is a servlet that handles cleanse
requests. This servlet is deployed in an application server environment. The servlet
contains two server components:
• a cleanse server that handles data cleansing operations
• a match server that handles match operations
The Cleanse Match Server is multi-threaded so that each instance can process multiple
requests concurrently. It can be deployed on a variety of application servers.
The Cleanse Match Server interfaces with any of the supported cleanse engines, such as
the Trillium Director cleanse engine. The Cleanse Match Server and the cleanse engine
work to standardize the data. This standardization works closely with the Siperian
Consolidation Engine (formerly referred to as the Merge Engine) to optimize the data
for consolidation.
column
In a table, a set of data values of a particular type, one for each row of the table. See
system column, user-defined column.
comparison change list
A change list that is the result of comparing the contents of two repositories and
generating the list of changes to make to the target repository. Comparison change lists
are used in Metadata Manager when promoting or importing design objects. See also
change list, creation change list, Metadata Manager.
The display of the complete or original match chain that caused two records to be
matched through intermediate records.
conditional mapping
A mapping between a column in a landing table and a staging table that uses a SQL
WHERE clause to conditionally select only those records in the landing table that meet
the filter condition. See mapping, distinct mapping.
Configuration workbench
Includes tools for configuring a variety of Hub objects, including the ORS, users,
security, message queues, and metadata validation.
consolidation process
Process of merging or linking duplicate records into a single record. The goal in
Siperian Hub is to identify and eliminate all duplicate data and to merge or link them
together into a single, consolidated record while maintaining full traceability.
consolidation indicator
The consolidation indicator can have the following values (Indicator Value, State
Name, Description):
• 1 CONSOLIDATED: Indicates that the record has been through the match and
merge process.
• 2 UNMERGED: Indicates that the record has gone through the match process.
• 3 QUEUED_FOR_MATCH: Indicates that the record is ready to be put through
the match process against the rest of the records in the base object.
• 4 NEWLY_LOADED: Indicates that the record has been newly loaded into the
base object and has not gone through the match process.
• 9 ON_HOLD: Indicates that the Data Steward has put the record on hold, to deal
with later.
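For illustration, the indicator values above can be held in a simple lookup structure. This is a Python sketch; the constant and function names are hypothetical, not part of any Siperian API:

```python
# Consolidation indicator values, as documented above.
CONSOLIDATION_STATES = {
    1: "CONSOLIDATED",        # through match and merge
    2: "UNMERGED",            # through the match process
    3: "QUEUED_FOR_MATCH",    # ready to be matched
    4: "NEWLY_LOADED",        # loaded, not yet matched
    9: "ON_HOLD",             # held by the Data Steward
}

def describe(indicator):
    """Return the state name for a consolidation indicator value."""
    return CONSOLIDATION_STATES.get(indicator, "UNKNOWN")

print(describe(4))  # NEWLY_LOADED
print(describe(1))  # CONSOLIDATED
```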
control table
A type of system table in an ORS that Siperian Hub automatically creates for a base
object. Control tables are used in support of the load, merge, and unmerge processes.
For each trust-enabled column in a base object, Siperian Hub maintains a record (the
last update date and an identifier of the source system) in a corresponding control
table.
creation change list
A change list that is the result of exporting the contents of a repository. Creation
change lists are used in Metadata Manager for importing design objects. See also
change list, comparison change list, Metadata Manager.
credentials
What a user supplies at login time to gain access to Siperian Hub resources. Credentials
are used during the authentication process to determine whether a user is who they
claim to be. Login credentials might be a user name and password, a security payload
(such as a security token or some other binary data), or a combination of user
name/password and security payload. See authentication, security payload.
cross-reference table
A type of system table in an ORS that Siperian Hub automatically creates for a base
object. For each record of the base object, the cross-reference table contains zero to n
(0-n) records per source system. This record contains the primary key from the source
system and the most recent value that the source system has provided for each cell in
the base object table.
Customer Data Integration (CDI)
A discipline within Master Data Management (MDM) that focuses on customer master
data and its related attributes. See master data.
data access services
These application server level capabilities enable Siperian Hub to support multiple
modes of data access and expose numerous Siperian Hub data services via the Siperian
Services Integration Framework (SIF). This facilitates both real-time synchronous
integration, as well as asynchronous integration.
database
Organized collection of data in the Hub Store. Siperian Hub supports two types of
databases: a Master Database and an Operational Record Store (ORS). See Master
Database, Operational Record Store (ORS), and Hub Store.
data cleansing
The process of standardizing data content and layout, decomposing and parsing text
values into identifiable elements, verifying identifiable values (such as zip codes) against
data libraries, and replacing incorrect values with correct values from data libraries. See
cleanse function.
Data Manager
Use the Data Manager tool to search for records, view their cross-references, unmerge
records, unlink records, view history records, create new records, edit records, and
override trust settings. The Data Manager displays all records that meet the search
criteria you define.
datasource
data steward
Siperian Hub user who has the primary responsibility for data quality. Data stewards
access Siperian Hub through the Hub Console, and use Siperian Hub tools to
configure the objects in the Hub Store.
Data Steward workbench
Part of the Siperian Hub UI used to review consolidated data as well as matched data
queued for exception handling by data analysts or stewards who understand the data
semantics and are guardians of data reliability in an organization.
Includes tools for using the Data Manager, Merge Manager, and Hierarchy Manager.
decay curve
Visually shows the way that trust decays over time. Its shape is determined by the
configured decay type and decay period. See decay period, decay type.
decay period
The amount of time (days, weeks, months, quarters, and years) that it takes for the trust
level to decay from the maximum trust level to the minimum trust level. See decay
curve, decay type.
decay type
The way that the trust level decreases during the decay period. See linear decay, RISL
decay, SIRL decay, decay curve, decay period.
deleted state
Deleted records are records that should no longer be part of the Hub’s data. These
records are not used in processes (unless specifically requested). Records can only be
deleted explicitly and, once deleted, can be restored if desired. When a record that is
Pending is deleted, it is permanently deleted and cannot be restored.
delta detection
During the stage process, Siperian Hub only processes new or changed records when
this feature is enabled. Delta detection can be done either by comparing entire records
or via a date column.
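Both detection styles amount to comparing an incoming record against the previously staged copy. The following is a hedged Python sketch of the idea; the column names are hypothetical:

```python
def is_delta(incoming, previous, date_column=None):
    """Decide whether an incoming record should be processed.
    With a date_column, compare only that column; otherwise compare
    the entire record against the previously staged copy."""
    if previous is None:
        return True  # brand-new record, always processed
    if date_column is not None:
        # Date-column detection: changed only if the date advanced.
        return incoming[date_column] > previous[date_column]
    # Whole-record detection: changed if any field differs.
    return incoming != previous

old = {"id": 7, "name": "Acme", "last_update": "2008-01-01"}
new = {"id": 7, "name": "Acme", "last_update": "2008-06-01"}

print(is_delta(new, old, "last_update"))  # True: date column advanced
print(is_delta(old, old))                 # False: record unchanged
```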
dependent object
A table that is used to store detail information about the records in a base object (for
example, supplemental notes). One record in a base object table can map to multiple
records in a dependent object table.
design object
Parts of the metadata used to define the schema and other configuration settings for an
implementation. Design objects include instances of the following types of Siperian
Hub objects: base objects and columns, landing and staging tables, columns, indexes,
relationships, mappings, cleanse functions, queries and packages, trust settings,
validation and match rules, Security Access Manager definitions, Hierarchy Manager
definitions, and other settings. See metadata, Metadata Manager.
distinct mapping
A mapping between a column in a landing table and a staging table that selects only the
distinct records from the landing table. Using distinct mapping is useful in situations in
which you have a single landing table feeding multiple staging tables and the landing
table is denormalized (for example, it contains both customer and address data). See
mapping, conditional mapping.
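The distinction between conditional and distinct mappings can be illustrated in Python terms. The landing rows here are hypothetical; in Siperian Hub the actual filtering happens in SQL during the stage process:

```python
# Hypothetical denormalized landing rows (customer + address data).
landing = [
    {"cust_id": 1, "name": "Acme", "country": "US"},
    {"cust_id": 1, "name": "Acme", "country": "US"},   # duplicate row
    {"cust_id": 2, "name": "Globex", "country": "DE"},
]

# Distinct mapping: keep only the unique records from the landing table.
distinct = [dict(t) for t in {tuple(sorted(r.items())) for r in landing}]

# Conditional mapping: a WHERE-clause-style filter selecting a subset.
us_only = [r for r in landing if r["country"] == "US"]

print(len(distinct))  # 2 unique records survive deduplication
print(len(us_only))   # 2 rows satisfy the filter condition
```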
distinct source system
A source system that provides data that gets inserted into the base object without being
consolidated. See source system.
distribution
Process of distributing the master record data to other applications or databases after
the best version of the truth has been established via reconciliation. See reconciliation,
publish.
downgrade
Operation that occurs when inserting or updating data using the load process or using
CleansePut & Put APIs when a validation rule reduces the trust for a record by a
percentage.
duplicate
One or more records in which the data in certain columns (such as name, address, or
organization data) is identical or nearly identical. Match rules executed during the
match process determine whether two records are sufficiently similar to be considered
duplicates for consolidation purposes.
entity
In Hierarchy Manager, an entity is a typed object that can be related to other entities.
Examples of entities are: individual, organization, product, and household. See entity
type.
entity base object
An entity base object is a base object used to store information about Hierarchy
Manager entities. See entity type and entity.
entity type
In Hierarchy Manager, entity types define the kinds of objects that can be related using
Hierarchy Manager. Examples are individual, organization, product, and household. All
entities with the same entity type are stored in the same entity base object. In the HM
Configuration tool, entity types are displayed in the navigation tree under the Entity
Object with which the Type is associated. See entity.
exact match
A match / search strategy that matches only records that are identical. If you specify an
exact match, you can define only exact match columns for this base object
(exact-match base objects cannot have fuzzy match columns). A base object that uses
the exact match / search strategy is called an exact-match base object. See also match /
search strategy, fuzzy match.
exclusive lock
In the Hub Console, a lock that is required in order to make exclusive changes to the
underlying schema. An exclusive lock prevents all other Hub Console users from
making changes to the target database at the same time. An exclusive lock must be
released by the user with the exclusive lock; it cannot be cleared by another user. See
write lock.
execution path
The sequence in which batch jobs are executed when the entire batch group is
executed in the Siperian Hub. The execution path begins with the Start node and ends
with the End node. The Batch Group tool does not validate the execution sequence for
you—it is up to you to ensure that the execution sequence is correct.
export process
external application user
Siperian Hub user who accesses Siperian Hub data indirectly via third-party
applications.
external cleanse
The process of cleansing data prior to populating the landing tables. External cleansing
is typically performed outside of Siperian Hub using an extract-transform-load (ETL)
tool or some other data cleansing utility. See also data cleansing, internal cleanse.
external match
Process that allows you to match new data (stored in a separate input table) with
existing data in a fuzzy-match base object, test for matches, and inspect the results—all
without actually changing data in the base object in any way, or changing the match
table associated with the base object.
extract-transform-load (ETL) tool
A software tool (external to Siperian Hub) that extracts data from a source system,
transforms the data (using rules, lookup tables, and other functionality) to convert it to
the desired state, and then loads (writes) the data to a target database. For Siperian Hub
implementations, ETL tools are used to extract data from source systems and populate
the landing tables. See also data cleansing, external cleanse.
foreign key
fuzzy match
A match / search strategy that uses probabilistic matching, which takes into account
spelling variations, possible misspellings, and other differences that can make matching
records non-identical. If selected, Siperian Hub adds a special column (Fuzzy Match
Key) to the base object. This column is the primary field used during searching and
matching to generate match candidates for this base object. All fuzzy base objects have
one and only one Fuzzy Match Key. A base object that uses the fuzzy match / search
strategy is called a fuzzy-match base object. Using fuzzy match requires a selected
population. See also match / search strategy, exact match, and population.
global business identifier (GBID)
A column that contains common identifiers (key values) that allow you to uniquely and
globally identify a record based on your business needs. Examples include:
• identifiers defined by applications external to Siperian Hub, such as ERP or CRM
systems.
• Identifiers defined by external organizations, such as industry-specific codes (AMA
numbers, DEA numbers, and so on), or government-issued identifiers (social
security number, tax ID number, driver’s license number, and so on).
hard delete
A base object or XREF record is physically removed from the database. See soft delete.
Hierarchies Tool
Siperian Hub administrators use the design-time Siperian Hierarchies tool (previously
called the “Hierarchy Manager Configuration Tool”) to set up the structures
required to view and manipulate data relationships in Hierarchy Manager. Use the
Hierarchies tool to define Hierarchy Manager components—such as entity types,
hierarchies, relationship types, packages, and profiles—for your Siperian Hub
implementation. See Hierarchy Manager.
Hierarchy Manager
Part of the Siperian Hub UI used to set up the structures required to view and
manipulate data relationships. Siperian Hierarchy Manager (Hierarchy Manager or HM)
builds on Siperian Master Reference Manager (MRM) and the repository managed by
Siperian Hub for reference and relationship data. Hierarchy Manager gives you visibility
into how relationships correlate between systems, enabling you to discover
opportunities for more effective customer service, to maximize profits, or to enact
compliance with established standards.
The Hierarchy Manager tool is accessible via the Data Steward workbench.
hierarchy
In Hierarchy Manager, a set of relationship types. These relationship types are not
ranked based on the place of the entities of the hierarchy, nor are they necessarily
related to each other. They are merely relationship types that are grouped together for
ease of classification and identification. See hierarchy type, relationship, relationship
type.
hierarchy type
history table
HM package
hotspot
Hub Console
Siperian Hub user interface that comprises a set of tools for administrators and data
stewards. Each tool allows users to perform a specific action, or a set of related actions,
such as building the data model, configuring the data flow, running batch jobs,
configuring external application access to Siperian Hub resources, and other
system configuration and operation tasks.
hub object
A generic term for various types of objects defined in the Hub that contain
information about your business entities. Some examples include: base objects,
dependent objects, cross reference tables, and any object in the hub that you can
associate with reporting metrics.
Hub Server
A run-time component in the middle tier (application server) used for core and
common services, including access, security, and session management.
Hub Store
In a Siperian Hub implementation, the database that contains the Master Database and
one or more Operational Record Store (ORS) databases. See Master Database,
Operational Record Store (ORS).
immutable source
A data source that always provides the best, final version of the truth for a base object.
Records from an immutable source will be accepted as unique and, once a record from
that source has been fully consolidated, it will not be changed—even in the event of a
merge. Immutable sources are also distinct systems. For all source records from an
immutable source system, the consolidation indicator for Load and PUT is always 1
(consolidated record).
implementer
Siperian Hub user who has the primary responsibility for designing, developing, testing,
and deploying Siperian Hub according to the requirements of an organization. Tasks
include (but are not limited to) creating design objects, building the schema, defining
match rules, performance tuning, and other activities.
import process
In Metadata Manager, the process of adding design objects from a library or change list
to a repository. The design object does not already exist in the target repository.
See also Metadata Manager, validation process, promotion process, change list.
incremental load
Any load process that occurs after a base object has undergone its initial data load.
Called incremental loading because only new or updated data is loaded into the base
object. Duplicate data is ignored. See initial data load.
initial data load
The very first time that data is loaded into an empty base object. During the initial
data load, all records in the staging table are inserted into the base object as new
records.
Insight Manager
The Insight Manager is a Siperian Hub product that generates reporting metadata for
data in the Hub Store, including information about data quality, hub performance, and
data steward productivity. Insight Manager uses this reporting metadata to create
reports and metrics for this data. In addition, third-party reporting tools can be
integrated into the Siperian Hub for report generation.
internal cleanse
The process of cleansing data during the stage process, when data is copied from
landing tables to the appropriate staging tables. Internal cleansing occurs inside
Siperian Hub using configured cleanse functions that are executed by the Cleanse
Match Server in conjunction with a supported cleanse engine. See also data cleansing,
cleanse engine, external cleanse.
job execution log
In the Batch Viewer and Batch Group tools, a log that shows job completion status
with any associated messages, such as success, failure, or warning.
job execution script
For Siperian Hub implementations, a script that is used in job scheduling software
(such as Tivoli or CA Unicenter) that executes Siperian Hub batch jobs via stored
procedures.
Key Match job
A Siperian Hub batch job that matches records from two or more sources when these
sources use the same primary key. Key Match jobs compare new records to each other
and to existing records, and then identify potential matches based on the comparison
of source record keys as defined by the primary key match rules. See primary key match
rule, match process.
key type
Identifies important characteristics about the match key to help Siperian Hub generate
keys correctly and conduct better searches. Siperian Hub provides the following match
key types: Person_Name, Organization_Name, and Address_Part1. See match process.
key width
Determines how fast searches are during the match process, the number of possible
match candidates returned, and how much disk space the keys consume. Key width
options are Standard, Extended, Limited, and Preferred. Key widths apply to fuzzy
match objects only. See match process.
land process
Process of populating landing tables from a source system. See source system, landing
table.
landing table
A table where a source system puts data that will be processed by Siperian Hub.
linear decay
The trust level decreases in a straight line from the maximum trust to the minimum
trust. See decay type, trust.
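Given this definition, linear decay is a straight-line interpolation between the maximum and minimum trust levels. Here is a sketch under that definition, not Siperian's internal formula:

```python
def linear_trust(t, period, max_trust, min_trust):
    """Trust at time t (same units as period) under linear decay:
    falls in a straight line from max_trust to min_trust."""
    if t <= 0:
        return max_trust  # value just changed: full trust
    if t >= period:
        return min_trust  # decay period elapsed: floor trust
    return max_trust - (max_trust - min_trust) * t / period

# A value with max trust 90 and min trust 30 over a 60-day decay period.
print(linear_trust(0, 60, 90, 30))   # 90
print(linear_trust(30, 60, 90, 30))  # 60.0
print(linear_trust(60, 60, 90, 30))  # 30
```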
linear unmerge
A base object record is unmerged and taken out of the existing merge tree structure.
Only the unmerged base object record itself comes out of the merge tree structure;
all base object records below it in the merge tree stay in the original merge
tree.
load insert
When records are inserted into the target table (base object or dependent object).
During the load process, if a record in the staging table does not already exist in the
target table, then Siperian Hub inserts the record into the target table. See load process,
load update.
load process
Process of loading data from a staging table into the corresponding base object or
dependent object in the Hub Store. If the new data overlaps with existing data in the
Hub Store, Siperian Hub uses trust settings and validation rules to determine which
value is more reliable. See trust, validation rule, load insert, load update.
load update
When records are updated in the target table (base object or dependent object).
During the load process, if a record in the staging table already exists in the
target table, then Siperian Hub updates the record in the target table, subject to
trust settings and validation rules. See load process, load insert.
lock
lookup
Process of retrieving a data value from a parent table during Load jobs. In Siperian
Hub, when configuring a staging table associated with a base object, if a foreign key
column in the staging table (as the child table) is related to the primary key in a parent
table, you can configure a lookup to retrieve data from that parent table.
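Conceptually, the lookup resolves a foreign key in the child (staging) record against the parent table's primary key, as in this Python sketch; the table and column names are hypothetical:

```python
# Hypothetical parent table keyed by its primary key.
parent_rows = {
    "CUST-100": {"rowid_object": "R1", "name": "Acme"},
    "CUST-200": {"rowid_object": "R2", "name": "Globex"},
}

def lookup_parent(staging_record, fk_column):
    """Resolve a staging record's foreign key against the parent table,
    as a Load job lookup would, returning the parent data (or None)."""
    return parent_rows.get(staging_record.get(fk_column))

child = {"order_id": "ORD-1", "customer_fk": "CUST-200"}
parent = lookup_parent(child, "customer_fk")
print(parent["rowid_object"])  # R2
```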
manual merge
Process of merging records manually. Match rules can result in automatic merging or
manual merging. A match rule that instructs Siperian Hub to perform a manual merge
identifies records that have enough points of similarity to warrant attention from a data
steward, but not enough points of similarity to allow the system to automatically merge
the records. See automerge.
manual unmerge
mapping
Defines a set of transformations that are applied to source data. Mappings are used
during the stage process (or using the SiperianClient CleansePut API request) to
transfer data from a landing table to a staging table. A mapping identifies the source
column in the landing table and the target column to populate in the staging table,
along with any intermediate cleanse functions used to clean the data. See conditional
mapping, distinct mapping.
Master Data Management (MDM)
The controlled process by which the master data is created and maintained as the
system of record for the enterprise. MDM is implemented in order to ensure that the
master data is validated as correct, consistent, and complete,
and—optionally—circulated in context for consumption by internal or external
business processes, applications, or users. See master data, Customer Data Integration
(CDI).
Master Database
Master Reference Manager (MRM)
Master Reference Manager (MRM) is the foundation product of Siperian Hub. Siperian
MRM consists of the following major components: Hierarchy Manager, Security
Access Manager, Metadata Manager, Services Integration Framework (SIF), Insight
Manager, and Activity Manager. Its purpose is to build an extensible and manageable
system-of-record for all master reference data. It provides the platform to consolidate
and manage master reference data across all data sources—internal and external—of
an organization, and acts as a system-of-record for all downstream applications.
match
match candidate
For fuzzy-match base objects only, any record in the base object that is a possible
match.
match column
A column that is used in a match rule for comparison purposes. Each match column is
based on one or more columns from the base object. See match process.
match column rule
Match rule that is used to match records based on the values in columns you have
defined as match columns, such as last name, first name, address1, and address2. See
primary key match rule, match process.
match key
When you specify a match column, Siperian Hub creates a special key called a match
key (also known as a token string) on a special table called the match key table
(formerly referred to as the token table or strip table). Before the Siperian Hub Match
batch job runs, it first ensures that the correct match keys have been generated in the
match key table. The match job compares the match keys according to the match rules
that have been defined to determine which records are duplicates. See also tokenizing.
match list
match path
Allows you to traverse the hierarchy between records—whether that hierarchy exists
between base objects (inter-table paths) or within a single base object (intra-table paths).
Match paths are used for configuring match column rules involving related records in
either separate tables or in the same table.
match process
match purpose
For fuzzy-match base objects, defines the primary goal behind a match rule. For
example, if you're trying to identify matches for people where address is an important
part of determining whether two records are for the same person, then you would use
the Match Purpose called Resident. Each match purpose contains knowledge about
how best to compare two records to achieve the purpose of the match. Siperian Hub
uses the selected match purpose as a basis for applying the match rules to determine
matched records. The behavior of the rules is dependent on the selected purpose. See
match process.
match rule
Defines the criteria by which Siperian Hub determines whether records might be
duplicates. Match columns are combined into match rules to determine the conditions
under which two records are regarded as being similar enough to merge. Each match
rule tells Siperian Hub the combination of match columns it needs to examine for
points of similarity. See match process.
match rule set
A logical collection of match rules that allow users to execute different sets of rules at
different stages in the match process. Match rule sets include a search level that dictates
the search strategy, any number of automatic and manual match rules, and, optionally, a
filter that allows you to selectively include or exclude records during the match process.
Match rule sets are used to execute match column rules but not primary key match
rules. See match process.
match subtype
Used with base objects that contain different types of data, such as an Organization
base object containing customer, vendor, and partner records. Using match subtyping,
you can apply match rules to specific types of data within the same base object. For
each match rule, you specify an exact match column that will serve as the “subtyping”
column to filter out the records that you want to ignore for that match rule. See match
process.
match table
Type of system table, associated with a base object, that supports the match process.
During the execution of a Match job for a base object, Siperian Hub populates its
associated match table with the ROWID_OBJECT values for each pair of matched
records, as well as the identifier for the match rule that resulted in the match, and an
automerge indicator. See match process.
match token
Strings that encode data in the columns used to identify candidates for matching.
Match tokens are fixed length, compressed, and encoded values built from a
combination of the words and numbers in a name or address such that relevant
variations have the same key value. For each record being matched, the match process
stores a generated match token in the tokenization table associated with the base
object. See match process.
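Siperian Hub's actual match-token encoding is proprietary and population-specific, so it cannot be reproduced here. As a rough sketch of the underlying idea (variant spellings of a name collapsing to one fixed-length key), the toy key builder below uses invented normalization rules; it is not the Hub's algorithm.

```java
// Illustrative sketch only: the normalization rules here are invented for
// the example. The point is that similar name variants yield the same
// fixed-length key, so they become candidates for the match comparison.
public class MatchKeySketch {
    /** Builds an 8-character key: uppercase, letters only, vowels (and Y)
     *  dropped after the first character, padded with '0'. */
    public static String key(String name) {
        String letters = name.toUpperCase().replaceAll("[^A-Z]", "");
        if (letters.isEmpty()) return "00000000";
        StringBuilder sb = new StringBuilder();
        sb.append(letters.charAt(0));
        for (int i = 1; i < letters.length(); i++) {
            char c = letters.charAt(i);
            if ("AEIOUY".indexOf(c) < 0) sb.append(c);
        }
        String k = sb.toString();
        if (k.length() > 8) k = k.substring(0, 8);
        while (k.length() < 8) k = k + "0";
        return k;
    }

    public static void main(String[] args) {
        // Variant spellings of the same surname produce the same key.
        System.out.println(key("Meier")); // MR000000
        System.out.println(key("Meyer")); // MR000000
        System.out.println(key("Mayer")); // MR000000
    }
}
```

A real implementation would generate several such keys per record (from different word combinations) and store them in the tokenization table, as described above.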
match type
Each match column has a match type that determines how the match column will be
tokenized in preparation for the match comparison. See match process.
match / search strategy
Specifies the reliability of the match versus the performance you require: fuzzy or exact. An exact match / search strategy is faster, but an exact match will miss some matches if the data is imperfect. See fuzzy match, exact match, match process.
maximum trust
The trust level that a data value will have if it has just been changed. For example, if
source system A changes a phone number field from 555-1234 to 555-4321, the new
value will be given system A’s maximum trust level for the phone number field. By
setting the maximum trust level relatively high, you can ensure that changes in the
source systems will usually be applied to the base object.
merge process
Process of combining two or more records of a base object table because they have the
same value (or very similar values) in the specified match columns. See consolidation
process, automerge, manual merge, manual unmerge.
Merge Manager
Tool used to review and take action on the records that are queued for manual
merging.
message
In Siperian Hub, refers to a Java Message Service (JMS) message. A message queue
server handles two types of JMS messages:
• inbound messages are used for the asynchronous processing of Siperian Hub
service invocations
• outbound messages provide a communication channel to distribute data changes
via JMS to source systems or other systems.
message queue
A mechanism for transmitting data from one process to another (for example, from
Siperian Hub to an external application).
message queue rule
A mechanism for identifying base object events and transferring the affected records to the internal system for update. Message queue rules are supported for updates, merges, and records accepted as unique.
message queue server
In Siperian Hub, a Java Message Service (JMS) server, defined in your application server environment, that Siperian Hub uses to manage incoming and outgoing JMS messages.
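Siperian Hub's message queues are JMS queues managed by the application server, which cannot be exercised without a running broker. As an in-process analogy only, the sketch below uses java.util.concurrent to illustrate the decoupling a queue provides between a producer (the Hub placing an outbound data-change message) and a consumer (a target system reading it); none of these names come from the Hub API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy only: this is not the Hub's JMS integration, just a minimal
// producer/consumer hand-off showing why a queue decouples the two sides.
public class QueueSketch {
    static final BlockingQueue<String> outbound = new ArrayBlockingQueue<>(10);

    public static String roundTrip(String payload) {
        try {
            outbound.put(payload);   // producer side: enqueue the message
            return outbound.take();  // consumer side: dequeue it later
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<event type=\"update\" rowid=\"1001\"/>"));
    }
}
```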
message trigger
A rule that fires when a particular action occurs within Siperian Hub. When an action occurs for which a rule is defined, a JMS message is placed in the outbound message queue. A message trigger identifies the conditions that cause the message to be generated (what action on which object) and the queue on which messages are placed.
metadata
Data that is used to describe other data. In Siperian Hub, metadata is used to describe
the schema (data model) that is used in your Siperian Hub implementation, along with
related configuration settings. See also Metadata Manager, design object, schema.
Metadata Manager
The Metadata Manager tool in the Hub Console is used to validate metadata for a
repository, promote design objects from one repository to another, import design
objects into a repository, and export a repository to a change list. See also metadata,
metadata validation.
minimum trust
The trust level that a data value will have when it is “old” (after the decay period has
elapsed). This value must be less than or equal to the maximum trust. If the maximum
and minimum trust are equal, the decay curve is a flat line and the decay period and
decay type have no effect. See also decay period.
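Maximum trust, minimum trust, and the decay period define the endpoints and span of a trust decay curve; the RISL and SIRL entries later in this glossary describe the curve shapes. The glossary does not give exact formulas, so the curves below are plausible stand-ins chosen to match the described shapes (linear, concave parabolic, convex parabolic), not Siperian Hub's actual implementation.

```java
// Illustrative only: trust decays from max to min over the decay period.
// Linear is a straight line; RISL is concave (rapid initial drop); SIRL is
// convex (slow initial drop). Formulas are assumptions for illustration.
public class TrustDecay {
    /** Fraction of the decay period elapsed, clamped to [0, 1]. */
    static double frac(double t, double period) {
        return Math.max(0.0, Math.min(1.0, t / period));
    }

    public static double linear(double max, double min, double t, double period) {
        double x = frac(t, period);
        return max - (max - min) * x;
    }

    public static double risl(double max, double min, double t, double period) {
        double x = frac(t, period);
        return max - (max - min) * (2 * x - x * x); // concave: most decay early
    }

    public static double sirl(double max, double min, double t, double period) {
        double x = frac(t, period);
        return max - (max - min) * (x * x);         // convex: most decay late
    }

    public static void main(String[] args) {
        // Max trust 80, min trust 20, 100-day decay period, 50 days in:
        System.out.println(linear(80, 20, 50, 100)); // 50.0
        System.out.println(risl(80, 20, 50, 100));   // 35.0 (mostly decayed)
        System.out.println(sirl(80, 20, 50, 100));   // 65.0 (barely decayed)
    }
}
```

Note that when maximum and minimum trust are equal, all three curves reduce to the same flat line, as the definition above states.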
Model workbench
Part of the Siperian Hub UI used by implementers to configure the solution during deployment, and by data architects for ongoing configuration of the various types of metadata and rules in response to changing business needs. It includes tools for creating query groups, defining packages and other schema objects, and viewing the current schema.
non-contributing cross reference
A cross-reference (XREF) record that does not contribute to the BVT (best version of the truth) of the BO record. As a consequence, the values in the XREF record never show up in the BO record. Note that this applies to state-enabled records only.
non-equal matching
When configuring match rules, prevents equal values in a column from matching each
other. Non-equal matching applies only to exact match columns.
null value
The absence of a value in a column of a record. Null is not the same as blank or zero.
operation
Operational Record Store (ORS)
Database that contains the rules for processing the master data and the rules for managing the set of master data objects, along with the processing rules and auxiliary logic used by Siperian Hub in defining the BVT. A Siperian Hub configuration can have one or more ORS databases. The default name of an ORS is CMX_ORS. See also Master Database.
overmatching
For fuzzy-match base objects only, a match that results in too many matches, including
matches that are not relevant. When configuring match, the goal is to find the optimal
number of matches for your data. See undermatching.
package
A package is a public view of one or more underlying tables in Siperian Hub. Packages
represent subsets of the columns in those tables, along with any other tables that are
joined to the tables. A package is based on a query. The underlying query can select a
subset of records from the table or from another package.
password policy
Specifies password characteristics for Siperian Hub user accounts, such as the password
length, expiration, login settings, password re-use, and other requirements. You can
define a global password policy for all user accounts in a Siperian Hub implementation,
and you can override these settings for individual users.
pending record
A record that has not yet been approved for general usage in the Hub. Most operations can be performed on pending records, but an operation must specifically request them. If records are required to go through an approval process, then pending records are still in the midst of that approval process.
policy decision points (PDPs)
Specific security check points that determine, at run time, the validity of a user’s identity (authentication), along with that user’s access to Siperian Hub resources (authorization).
policy enforcement points (PEPs)
Specific security check points that enforce, at run time, security policies for authentication and authorization requests.
population
Defines certain characteristics about the data in the records that you are matching. Siperian Hub comes with the US population by default, and Siperian provides a standard population for each supported country. Populations account for the inevitable variations and errors that are likely to exist in name, address, and other identification data; specify how Siperian Hub builds match tokens; and specify how search strategies and match purposes operate on the population of data to be matched. Used only with the fuzzy match/search strategy.
primary key
In a relational database table, a column (or set of columns) whose value uniquely
identifies a record. For example, the Department_Number column would be the
primary key of the Department table.
primary key match rule
Match rule that is used to match records from two systems that use the same primary keys for records. See also match column rule.
private resource
A Siperian Hub resource that is hidden from the Roles tool, preventing its access via
Services Integration Framework (SIF) operations. When you add a new resource in
Hub Console (such as a new base object), it is designated a PRIVATE resource by
default. See also secure resource, resource.
privilege
Privileges determine the access that external application users have to Siperian Hub
resources. For example, a role might be configured to have READ, CREATE,
UPDATE, and MERGE privileges on particular packages and package columns. These
privileges are not enforced when using the Hub Console, although the settings still
affect the use of Hub Console to some degree. See secure resource, role.
profile
In Hierarchy Manager, describes what fields and records an HM user may display, edit, or add. For example, one profile can allow full read/write access to all entities and relationships, while another profile can be read-only (no add or edit operations allowed).
promotion process
provider
provider property
A name-value pair that a security provider might require in order to provide its service(s).
publish
query
A request to retrieve data from the Hub Store. Siperian Hub allows administrators to
specify the criteria used to retrieve that data. Queries can be configured to return
selected columns, filter the result set with a WHERE clause, use complex query syntax
(such as GROUP BY, ORDER BY, and HAVING clauses), and use aggregate
functions (such as SUM, COUNT, and AVG).
query group
raw table
real-time mode
Way of interacting with Siperian Hub via third-party applications, which invoke
Siperian Hub operations via the Services Integration Framework (SIF) interface. SIF
provides operations for various services, such as reading, cleansing, matching, inserting,
and updating records. See also batch mode, Services Integration Framework (SIF).
reconciliation
For a given entity, Siperian Hub obtains data from one or more source systems, then
reconciles “multiple versions of the truth” to arrive at the master record—the best
version of the truth—for that entity. Reconciliation can involve cleansing the data
beforehand to optimize the process of matching and consolidating records for a base
object. See distribution.
record
regular expression
A computational expression that is used to match and manipulate text data according
to commonly-used syntactic conventions and symbolic patterns. In Siperian Hub, a
regular expression function allows you to use regular expressions for cleanse
operations. To learn more about regular expressions, including syntax and patterns,
refer to the Javadoc for java.util.regex.Pattern.
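Since the entry points to java.util.regex.Pattern, here is a small cleanse-style use of that class: normalizing US phone numbers to a single format. The pattern and output format are illustrative choices, not a built-in Siperian cleanse function.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative cleanse operation using java.util.regex: capture the three
// digit groups of a US phone number, whatever the surrounding punctuation,
// and reassemble them in one canonical format.
public class RegexCleanse {
    private static final Pattern PHONE =
        Pattern.compile("\\(?(\\d{3})\\)?[-. ]?(\\d{3})[-. ]?(\\d{4})");

    public static String cleansePhone(String raw) {
        Matcher m = PHONE.matcher(raw);
        if (m.find()) {
            return m.group(1) + "-" + m.group(2) + "-" + m.group(3);
        }
        return raw; // leave unmatched values untouched
    }

    public static void main(String[] args) {
        System.out.println(cleansePhone("(415) 555-1234")); // 415-555-1234
        System.out.println(cleansePhone("415.555.1234"));   // 415-555-1234
    }
}
```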
reject table
A table that contains records that Siperian Hub could not insert into a target table, such as:
• a staging table (during the stage process), after performing the specified cleansing on a record of the specified landing table
• a Hub Store table (during the load process)
A record could be rejected because the value of a cell is too long, or because the
record’s update date is later than the current date.
relationship
In Hierarchy Manager, describes the affiliation between two specific entities. Hierarchy
Manager relationships are defined by specifying the relationship type, hierarchy type,
attributes of the relationship, and dates for when the relationship is active. See
relationship type, hierarchy.
relationship base object
A base object used to store information about Hierarchy Manager relationships.
relationship type
repository
An Operational Record Store (ORS). The ORS stores metadata about its own schema and related property settings. In Metadata Manager, when copying metadata between repositories, there is always a source repository that contains the design object to copy, and a target repository that is the destination for the design object. See also Metadata Manager, validation process, import process, promotion process, export process, change list.
request
Siperian Hub request (API) that allows external applications to access specific Siperian
Hub functionality using the Services Integration Framework (SIF), a request/response
API model.
resource
Any Siperian Hub object that is used in your Siperian Hub implementation. Certain
resources can be configured as secure resources: base objects, dependent objects,
mappings, packages, remote packages, cleanse functions, HM profiles, the audit table,
and the users table. In addition, you can configure secure resources that are accessible
by SIF operations, including content metadata, match rule sets, metadata, batch groups,
the audit table, and the users table. See private resource, secure resource, resource
group.
Resource Kit
The Siperian Hub Resource Kit is a set of utilities, examples, and libraries that provide
examples of Siperian Hub functionality that can be expanded on and implemented.
RISL decay
Rapid Initial Slow Later decay puts most of the decrease at the beginning of the decay
period. The trust level follows a concave parabolic curve. If a source system has this
decay type, a new value from the system will probably be trusted but this value will
soon become much more likely to be overridden.
role
Defines a set of privileges to access secure Siperian Hub resources. See user, user
group, privilege.
row
See record.
rule
rule set
rule set filtering
Ability to exclude records from being processed by a match rule set. For example, if
you had an Organization base object that contained multiple types of organizations
(customers, vendors, prospects, partners, and so on), you could define a match rule set
that selectively processed only vendors. See match process.
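In Siperian Hub, a match rule set's filter is a SQL condition evaluated against the base object. The same idea is shown below as a simple predicate that admits only "vendor" records (the glossary's own example) into the match process; the class and field names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of rule set filtering: only records whose type matches
// the filter are handed to the match rules; all others are excluded.
public class RuleSetFilterSketch {
    static class OrgRecord {
        final String rowidObject;   // toy stand-in for ROWID_OBJECT
        final String orgType;       // toy stand-in for the filter column
        OrgRecord(String rowidObject, String orgType) {
            this.rowidObject = rowidObject;
            this.orgType = orgType;
        }
    }

    /** Returns only the candidates that satisfy the rule set's filter. */
    public static List<OrgRecord> applyFilter(List<OrgRecord> candidates,
                                              String orgType) {
        List<OrgRecord> kept = new ArrayList<>();
        for (OrgRecord r : candidates) {
            if (r.orgType.equals(orgType)) kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<OrgRecord> all = Arrays.asList(
            new OrgRecord("1", "customer"), new OrgRecord("2", "vendor"),
            new OrgRecord("3", "partner"),  new OrgRecord("4", "vendor"));
        // Only the two vendors enter the match process under this rule set.
        System.out.println(applyFilter(all, "vendor").size()); // 2
    }
}
```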
schema
The data model that is used in a customer’s Siperian Hub implementation. Siperian
Hub does not impose or require any particular schema. The schema is independent of
the source systems.
Schema Manager
The Schema Manager is a design-time component in the Hub Console used to define
the schema, as well as define the staging and landing tables. The Schema Manager is
also used to define rules for match and merge, validation, and message queues.
Schema Viewer
The Schema Viewer tool is a design-time component in the Hub Console used to visualize the schema configured for your Siperian Hub implementation. The Schema Viewer is particularly helpful for visualizing a complex schema.
search levels
Defines how stringently Siperian Hub searches for matches: narrow, typical, exhaustive,
or extreme. The goal is to find the optimal number of matches for your data—not too
few (undermatching), which misses significant matches, or too many (overmatching),
which generates too many matches, including insignificant ones. See overmatching,
undermatching.
secure resource
A protected Siperian Hub resource that is exposed to the Roles tool, allowing the
resource to be added to roles with specific privileges. When a user account is assigned
to a specific role, then that user account is authorized to access the secure resources via
SIF according to the privileges associated with that role. In order for external
applications to access a Siperian Hub resource using SIF operations, that resource must
be configured as SECURE. Because all Siperian Hub resources are PRIVATE by
default, you must explicitly make a resource SECURE after the resource has been
added. See also private resource, resource.
security
security payload
Raw binary data supplied to a Siperian Hub operation request that can contain supplemental data required for further authentication and/or authorization.
security provider
segment matching
Way of limiting match rules to specific subsets of data. For example, you could define
different match rules for customers in different countries by using segment matching
to limit certain rules to specific country codes. Segment matching is configured on a
per-rule basis and applies to both exact-match and fuzzy-match base objects.
Services Integration Framework (SIF)
The part of Siperian Hub that interfaces with client programs. Logically, it serves as a
middle tier in the client/server model. It enables you to implement the
request/response interactions using any of the following architectural variations:
• Loosely coupled Web services using the SOAP protocol.
• Tightly coupled Java remote procedure calls based on Enterprise JavaBeans (EJBs)
or XML.
• Asynchronous Java Message Service (JMS)-based messages.
• XML documents going back and forth via Hypertext Transfer Protocol (HTTP).
SIRL decay
Slow Initial Rapid Later decay puts most of the decrease at the end of the decay period.
The trust level follows a convex parabolic curve. If a source system has this decay type,
it will be relatively unlikely for any other system to override the value that it sets until
the value is near the end of its decay period.
source record
A raw record from a source system. See also record, source system.
source system
An external system that provides data to Siperian Hub. See distinct source system,
source record.
stage process
Process of reading the data from the landing table, performing any configured
cleansing, and moving the cleansed data into the corresponding staging table. If you
enable delta detection, Siperian Hub only processes new or changed records. See
staging table, landing table.
staging table
A table where cleansed data is temporarily stored before being loaded into base objects
and dependent objects via load jobs. See stage process, load process.
state management
The process for managing the system state of base object and XREF records to affect
the processing logic throughout the MRM data flow. You can assign a system state to
base object and XREF records at various stages of the data flow using the Hub tools
that work with records. In addition, you can use the various Hub tools for managing
your schema to enable state management for a base object, or to set user permissions
for controlling who can change the state of a record.
state transition rules
Rules that determine whether and when a record can change from one state to another. State transition rules differ for base object and cross-reference records.
stored procedure
A named set of Structured Query Language (SQL) statements that are compiled and
stored on the database server. Siperian Hub batch jobs are encoded in stored
procedures so that they can be run using job execution scripts in job scheduling
software (such as Tivoli or CA Unicenter).
stripping
strip table
system column
A column in a table that Siperian Hub automatically creates and maintains. System
columns contain metadata. Common system columns for a base object include
ROWID_OBJECT, CONSOLIDATION_IND, and LAST_UPDATE_DATE. See
column, user-defined column.
system state
Describes how base object records are supported by Siperian Hub. The following states are supported: ACTIVE, PENDING, and DELETED. See state management.
Systems and Trust tool
A design-time tool used to name the source systems that can provide data for consolidation in Siperian Hub. You use this tool to define the trust settings associated with each source system for each trust-enabled column in a base object.
table
target database
In the Hub Console, the Master Database or an Operational Record Store (ORS) that is the target of the current tool. Tools that manage data stored in the Master Database, such as the Users tool, require that your target database is the Master Database. Tools that manage data stored in an ORS require that you specify which ORS to use.
tokenizing
token table
traceability
The maintenance of data so that you can determine which systems—and which records
from those systems—contributed to consolidated records.
transactional data
tree unmerge
trust
Mechanism for measuring the confidence factor associated with each cell based on its
source system, change history, and other business rules. Trust takes into account the
age of data, how much its reliability has decayed over time, and the validity of the data.
trust level
For a source system that provides records to Siperian Hub, a number between 0 and 100 that assigns a level of confidence and reliability to that source system, relative to other source systems. The trust level has meaning only when compared with the trust level of another source system.
trust score
The current level of confidence in a given record. During load jobs, Siperian Hub calculates the trust score for each record. If validation rules are defined for the base object, then the Load job applies these validation rules to the data, which might further downgrade trust scores. During the consolidation process, when two records are candidates for merge or link, the values in the record with the higher trust score win. Data stewards can manually override trust scores in the Merge Manager tool.
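The survivorship rule described above (the cell value with the higher trust score wins during consolidation) can be sketched in a few lines. This is a minimal illustration of the comparison, with made-up values and a made-up tie-breaking choice, not the Hub's consolidation engine.

```java
// Sketch of trust-based survivorship: given two candidate cell values and
// their trust scores, keep the one with the higher score. The tie-breaking
// rule (prefer the first value) is an assumption for this example.
public class SurvivorshipSketch {
    public static String winningValue(String valueA, double trustA,
                                      String valueB, double trustB) {
        return trustA >= trustB ? valueA : valueB;
    }

    public static void main(String[] args) {
        // Phone from system A (trust 72) vs. phone from system B (trust 55):
        System.out.println(winningValue("555-1234", 72, "555-4321", 55)); // 555-1234
    }
}
```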
undermatching
For fuzzy-match base objects only, a match that results in too few matches, which
misses relevant matches. When configuring match, the goal is to find the optimal
number of matches for your data. See overmatching.
unmerge
user
An individual (person or application) who can access Siperian Hub resources. Users are
represented in Siperian Hub by user accounts, which are defined in the Master Database.
See user group, Master Database.
user-defined column
Any column in a table that is not a system column. User-defined columns are added in
the Schema Manager and usually contain business data. See column, system column.
user exit
Developers can extend Siperian Hub batch processes by adding custom code to the
appropriate user exit procedure for pre- and post-batch job processing. See stored
procedure.
user group
user object
User-defined functions or procedures that are registered with Siperian Hub to extend its functionality. There are four types of user objects.
Utilities workbench
Includes tools for auditing application events, configuring and running batch groups, and generating the SIF APIs.
validation process
Process of verifying the completeness and integrity of the metadata that describes a
repository. The validation process compares the logical model of a repository with its
physical schema. If any issues arise, the Metadata Manager generates a list of issues
requiring attention. See also Metadata Manager.
validation rule
Rule that tells Siperian Hub the condition under which a data value is not valid. When
data meets the criteria specified by the validation rule, the trust value for that data is
downgraded by the percentage specified in the validation rule. If the Reserve Minimum
Trust flag is set for the column, then the trust cannot be downgraded below the
column’s minimum trust.
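The downgrade arithmetic described above can be made concrete: reduce the trust score by the rule's percentage, and if the Reserve Minimum Trust flag is set for the column, floor the result at the column's minimum trust. The numbers below are invented for illustration.

```java
// Sketch of a validation-rule trust downgrade: the percentage reduction and
// the Reserve Minimum Trust floor, per the glossary definition. Example
// values (trust 80, downgrade 50%, minimum trust 50) are made up.
public class ValidationDowngrade {
    public static double downgrade(double trust, double downgradePct,
                                   boolean reserveMinimumTrust, double minTrust) {
        double result = trust * (1.0 - downgradePct / 100.0);
        if (reserveMinimumTrust && result < minTrust) {
            result = minTrust; // cannot drop below the column's minimum trust
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(downgrade(80, 50, true, 50)); // 50.0 (floored)
        System.out.println(downgrade(80, 25, true, 50)); // 60.0
    }
}
```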
workbench
In the Hub Console, a mechanism for grouping similar tools. A workbench is a logical
collection of related tools. For example, the Cleanse workbench contains
cleanse-related tools: Cleanse Match Server, Cleanse Functions, and Mappings.
write lock
In the Hub Console, a lock that is required in order to make changes to the underlying
schema. All non-data steward tools (except the ORS security tools) are in read-only
mode unless you acquire a write lock. Write locks allow multiple, concurrent users to
make changes to the schema. See exclusive lock.
Index
A events 920
log entries, examples of 934
Accept Non-Matched Records As Unique message queues 928
jobs 715, 760 password changes 922
ACTIVE system state, about 206 purging the audit log 935
Address match purpose 553 systems to audit 923
Address_Part1 key type 521 viewing the audit log 933
Admin source system XML 921
about the Admin source system 349 authentication
renaming 353 about authentication 832
allow null foreign key 371 external authentication providers 833
allow null update 370 external directory authentication 833
ANSI Code Page 946 internal authentication 833
asynchronous batch jobs 755 authorization
audience xxv about authorization 833
Audit Manager external authorization 833
about the Audit Manager 921 internal authorization 833
starting 922 Auto Match and Merge jobs 716, 762
types of items to audit 923 Autolink jobs 715, 762
audit trails, configuring 399
Automerge jobs 717, 764
auditing Auto Match and Merge jobs 718
about integration auditing 920
API requests 926
audit log 930 B
audit log table 931
base object style 106
Audit Manager tool 921
base objects
authentication and 921
adding columns 90
configurable settings 924
converting to entity base objects 244
enabling 921
creating 107
errors 929
1041
defined 995 batch jobs
defining 94 about batch jobs 668
deleting 116 Accept Non-Matched Records As Unique
described 83 715, 760
editing 108 asynchronous execution 755
exact match base objects 320 Auto Match and Merge jobs 716, 762
fuzzy match base objects 320 Autolink jobs 715, 762
history table 101 automatically-created batch jobs 672
impact analysis 115 Automerge jobs 717, 764
load inserts 307 BVT Snapshot jobs 719
load updates 309 C_REPOS_JOB_CONTROL table 757
overview of 94 C_REPOS_JOB_METRIC table 757
record survivorship, state management C_REPOS_JOB_METRIC_TYPE
211 table 757
relationship base objects 498 C_REPOS_JOB_STATUS_TYPEC
reserved suffixes 88 table 757
reverting from relationship base objects C_REPOS_TABLE_OBJECT_V
264 table 754
style 106 clearing history 687
system columns 95 command buttons 680
batch groups configurable options 679
about batch groups 798 configuring 667
adding 691 design considerations 671
cmxbg.execute_batchgroup stored proce- executing 680
dure 799 executing, about 750
cmxbg.get_batchgroup_status stored pro- execution scripts 750
cedure 803 External Match jobs 719, 766
cmxbg.reset_batchgroup stored proce- foreign key relationships and 671
dure 802 Generate Match Token jobs 767
deleting 693 Generate Match Tokens jobs 725
editing 693 Hub Delete jobs 726
executing 701 job execution logs 682
executing with stored procedures 798 job execution status 682
levels, configuring 694 Key Match jobs 727, 773
stored procedures for 799 Load jobs 727, 775
1043
outputs 435 modes 407
properties of 416 on-line operations 408
regular expression functions 422 properties of 410
secure resources 415 testing 413
testing 437 cleansing data
types 416 about cleansing data 406
types of 414 setup tasks 406
user libraries 418 Unicode settings 945
using 414 clearing history of batch jobs 687
workspace buttons 433 cmxbg.execute_batchgroup 799
workspace commands 432 cmxbg.get_batchgroup_status 803
Cleanse Functions tool cmxbg.reset_batchgroup 802
starting 415 cmxue package
workspace buttons 433 user exits for Oracle databases 956
workspace commands 432 CMXUT.CLEAN_TABLE
cleanse lists removing BO data 810
about cleanse lists 440 color choices window 267
adding 441 columns
editing 442 adding to tables 125
exact match 445 data types 126
match output strings, importing 449 properties of 127
match strings, importing 445 reserved names 89
regular expression match 445 command buttons 43
SQL match 445 complete tokenize ratio 102
string matches 445 concepts 3
Cleanse Match Servers conditional execution components
about Cleanse Match Servers 407 about conditional execution
adding 411 components 438
batch jobs 408 adding 439
Cleanse Match Server tool 409 when to use 438
cleanse requests 408 conditional mapping 394
configuring 407 configuration requirements
deleting 413 User Object Registry, for custom code
distributed 408 910
editing 412 Configuration workbench 48, 990
1045
example code 811 decay periods 1003
index, registering 809 decay types
parameters of 807 defined 1003
registering 808 linear 460
viewing 914 RISL 460
SIRL 460
DELETED system state, about 207
D DELETED_BY column 95, 99, 365
data cleansing DELETED_DATE column 96, 99, 365
about data cleansing 406 DELETED_IND column 95, 99, 365
Cleanse Match Servers 407 delta detection
defined 1002 configuring 401
Data Manager 1002 considerations for using 403
Data Steward workbench 50 defined 1003
data stewards how handled 403
tools for 50 landing table configuration 357
data types 126 DEP_PKEY_SRC_OBJECT column 120
Database Debug Log DEP_ROWID_SYSTEM column 120
writing messages to 810 dependent objects
database object name, constraints 88 creating 119, 121
databases defined 1004
database ID 70 deleting 125
selecting 23 described 83, 119
target database 21 editing 123
Unicode, configuring 940 load inserts 308
user access 875 load updates 312
Databases tool 61 system columns 120
about the Databases tool 60 DIRTY_IND column 96
starting 61 display packages 197
datasources distinct mapping 393
about datasources 77 distinct source systems 596
creating 77 Division match purpose 555
JDBC datasources 77 documentation xxviii
removing 78 duplicate data
decay curve 459 eliminating 999
1047
adding 143 H
deleting 147
hierarchies
editing 145
about 253
virtual relationships 144
adding 253
fuzzy matches
configuring 253
fuzzy match / search strategy 544
deleting 254
fuzzy match base objects 320, 519
editing 254
fuzzy match columns 515
Hierarchies tool
fuzzy match strategy 493
configuration overview 225
starting 234
G Hierarchy Manager
configuration overview 225
GBID columns 129
entity icons, uploading 237
Generate Match Tokens jobs 725, 767
prerequisites 224
Generate Match Tokens on Load 730
repository base object tables 235
generating match tokens on load 104
sandboxes 281
global
upgrading from previous versions 237
password policy 877
highest reserved key 369
roles 1008
history
Global Identifier (GBID) columns 129
enabling 102
glossary 993
history tables
graph functions
base object history tables 101
about graph functions 424
cross-reference history tables 101
adding 425
defined 84, 1009
adding functions to 427
enabling 108
conditional execution components 438
HM packages
inputs 425
about HM packages 269
outputs 425 adding 270
group execution logs assigning to entity types 275
status values 705
configuring 269
viewing 706 deleting 275
editing 275
Household match purpose 550
Hub
installation details 47
1049
finding out-of-sync objects 828 LAST_ROWID_SYSTEM column 96
starting 825 LAST_UPDATE_DATE column 95, 121,
JMS Event Schema Manager tool 356, 366
about 824 limited key widths 522
linear decay 460, 1013
linear unmerge 780
K Load jobs 727, 775
Key Match jobs 727, 773 forced updates, about 730
key types 521 Generate Match Tokens on Load 730
key widths 522 load batch size 103
rejected records 685
rules for running 729
L load process
land process data flow 300
C_REPOS_SYSTEM table 349 load inserts 306
configuration tasks 348 load updates 306
data flow 293 overview 299
external batch process 294 steps for managing data 304
extract-transform-load (ETL) tool 294 tables, associated 301
landing tables 292 loading by rowid 394
managing 294 loading data 453
overview 292 incremental loads 302
real-time processing (API calls) 294 initial data loads (IDLs) 302
source systems 292 locking
ways to populate landing tables 294 expiration 29
landing tables locks
about landing tables 355 about locks 28
adding 358 types of 28
columns 356 login
defined 83, 1013 changing 32
editing 360 entering 20
properties of 351, 357 lookups
removing 361 about lookups 376
Unicode 945 configuring 377
1051
string matches in cleanse lists 445 data flow 319
types in match columns 1018 exact match base objects 320
Match Analyze jobs 785 execution sequence 329
match column rules fuzzy match base objects 320
adding 565 managing 333
deleting 572 match key table 321
editing 570 match pairs 331
match columns 1016 match rules 320
about match columns 515 match table 331
exact match base objects 527 match tables 321
exact match columns 515 overview 317
fuzzy match base objects 519 populations 326
fuzzy match columns 515 support tables 321
key widths 522 transitive matches 327
match key types 521 match purposes
missing children 507 field types 517
Match for Duplicate Data jobs 786 match rule sets
Match jobs 734, 783 about match rule sets 531
state-enabled BOs 735 adding 538
match key tables deleting 542
defined 84 editing 539
match link editing the name 541
Autolink jobs 715 filters 536
Manual Link jobs 732 properties of 534
Manual Unlink jobs 733 search levels 534
Migrate Link Style to Merge Style jobs match rules 1017
740 about match rules 320
Reset Links jobs 744 accept limit 558
match paths defining 542
about match paths 497 exact match columns 563
inter-table paths 498 match / search strategy 544
intra-table paths 502 match levels 558
relationship base objects 498 match purposes
match process about 545
build match groups (BMGs) 327 Address match purpose 553
1053
  merge message 632, 652
  merge update message 633
  no action message 634
  PendingInsert message 635
  PendingUpdate message 636
  PendingUpdateXref message 637
  unmerge message 639
  update message 640
  update XREF message 641
  XRefDelete message 642
  XRefSetToDelete message 643
  examples (legacy)
    accept as unique message 646
    bo delete message 647
    bo set to delete message 648
    delete message 649
    insert message 651
    merge update message 653
    pending insert message 654
    pending update message 655
    pending update XREF message 656
    unmerge message 660
    update message 657
    update XREF message 658
    XREF delete message 661
    XREF set to Delete message 662
  filtering 625, 645
  message fields 644
metadata
  synchronizing 138
  trust 138
Migrate Link Style to Merge Style jobs 740
minimum trust 459, 1021
missing children, checking for 507
Model workbench 49
Multi Merge jobs 741

N
narrow search level 534
New Query Wizard 167, 191
NLS_LANG 947
non-equal matching 560
non-exclusive locks 28
NULL matching 561
null values
  allowing null values in a column 127

O
object record stores (ORSs)
  logical names 818, 824
OBJECT_FUNCTION_TYPE_DESC
  list of values 753
Operational Record Stores (ORS)
  about ORSs 56
  assigning users to 886
  configuring 62
  connection testing 71
  creating 58
  editing 69
  editing registration properties 67
  GETLIST limit (rows) 70
  JNDI data source name 70
  password, changing 73
  registering 62
  unregistering 76
Oracle databases
  user exits located in cmxue package 956
Organization match purpose 554
preferred key widths 522
preserving source system keys 368
primary key match rules
  about 578
  adding 578
  deleting 582
  editing 581
primary keys 1024
Private password policy 880
Processes view 26
product support xxxi
profiles
  about profiles 278
  adding 278
  copying 282
  deleting 283
  editing 280
  validating 280
Promote batch job
  promoting records 218
Promote jobs 741
  about 790
providers
  custom-added 899
providers.properties file
  example 901
publish process
  distribution flow 343
  managing 346
  message queues 344
  message triggers 343
  optional 343
  ORS-specific schema file 344
  overview 342
  run-time flow 345
  XSD file 344
purposes, match 545
PUT_UPDATE_MERGE_IND column 99
PUT-enabled packages 197

Q
queries
  about queries 162
  adding 166
  columns 174
  conditions 178
  custom queries 190
  deleting 195
  editing 168
  impact analysis, viewing 194
  join queries 204
  New Query Wizard 167, 191
  overview of 162
  packages and queries 162
  Queries tool 162, 164, 198
  results, viewing 193
  sort order for results 183
  SQL, viewing 190
  tables 170
Queries tool 164, 198
query groups
  about query groups 164
  adding 165
  deleting 166
  editing 165
  context menu 155
  Diagram pane 149
  hierarchic view 153
  options 156
  orientation 156
  orthogonal view 154
  Overview pane 149
  panes 149
  printing 158
  saving as JPG 157
  starting 148
  toggling views 154
  zooming all 152
  zooming in 150
  zooming out 151
schemas
  about schemas 82
search levels for match rule sets 534
secure resources 1031
security
  authentication 832
  authorization 833
  concepts 832
  configuring 831
  defined 1031
  JDBC data sources, configuring 880
  roles 854
  tools 50
Security Access Manager (SAM) 832
Security Access Manager workbench 50
security provider files
  about security provider files 892
  deleting 895
  list of provider files 892
  selecting 893
  uploading 893
security provider, defined 1032
Security Providers tool
  about security providers 889
  provider files 892
  starting 890
segment matching 562
sequencing batch jobs 670
SIF API
  ORS-specific, generating 818
  ORS-specific, removing 823
  ORS-specific, renaming 821
SIF Manager
  generating ORS-specific APIs 818
  out-of-sync objects, finding 823
SIF Manager tool
  about 818
SIRL decay 1032
source systems
  about source systems 348
  adding 352
  Admin source system 349
  defined 1033
  defining 348
  distinct source systems 596
  highest reserved key 369
  immutable source systems 594
  preserving keys 368
  removing 354
  renaming 353
  system repository table (C_REPOS_SYSTEM) 349
  Systems and Trust tool, starting 350
SRC_LUD column 99
SRC_ROWID column 366
system repository table 349
system states
  about 206
system tables, showing 39
Systems and Trust tool 350

T
table columns
  about table columns 126
  adding 134
  deleting 139
  editing 137
  Global Identifier (GBID) columns 129
  importing from another table 135
  staging tables 130
tables
  adding columns to 125
  base objects 83
  C_REPOS_AUDIT table 931
  C_REPOS_JOB_CONTROL table 757
  C_REPOS_JOB_METRIC table 757
  C_REPOS_JOB_METRIC_TYPE table 757
  C_REPOS_JOB_STATUS_TYPE table 757
  C_REPOS_TABLE_OBJECT_V table 751
  control tables 457
  cross-reference tables 84
  dependent objects 83
  history tables 84
  Hub Store 83
  landing tables 83
  match key tables 84
  reject tables 383
  staging tables 83
  supporting tables used by batch process 669
  system repository table (C_REPOS_SYSTEM) 349
target database
  changing 31
  selecting 21
technical support xxxi
tokenization 1035
tokenization process
  about the tokenization process 322
  DIRTY_IND column 323
  match keys 322
  match tokens 322
  when to execute 322
Tool Access tool 990
tools
  Batch Viewer tool 674
  Cleanse Functions tool 415
  Data Steward tools 50
  Databases tool 61
  described 48
  Mappings tool 746
  Merge Manager tool 336
  Queries tool 162, 164, 198
  Schema Manager 90
  security tools 50
  Tool Access tool 990
  user access to 989
  Users tool 868
  utilities tools 51
  write locks 28
traceability 337
  custom Java cleanse functions, viewing 915
  custom stored procedures, viewing 914
  starting 911
  user exits, viewing 913
user objects
  about 910
users
  about users 867
  adding 869
  assigning to Operational Record Stores (ORS) 886
  database access 875
  deleting 874
  editing 870
  external application users 867
  global password policies 877
  password settings 874
  private password policies 879
  properties of 868
  supplemental information 872
  tool access 989
  types of users 867
  user accounts 867
Users and Groups tool 882
Users tool 868
utilities tools 51
Utilities workbench 51

V
validation checks 468
validation rules
  about validation rules 468
  adding 478
  custom validation rules 473, 477
  defined 468, 1039
  defining 468
  domain checks 473
  downgrade percentage 474
  editing 480, 481
  enabling columns for validation 470
  examples of 476
  execution sequence 471
  existence checks 473
  pattern validation 473
  properties of 473
  referential integrity 473
  removing 482
  required columns 469
  reserve minimum trust 474
  rule column properties 474
  rule name 473
  rule SQL 474
  rule types 473
  state-enabled base objects 469
  validation checks 468

W
Web Services Description Language (WSDL)
  ORS-specific APIs 821
Wide_Contact match purpose 557
Wide_Household match purpose 552
Workbenches view 25
workbenches, defined 24
write lock
  acquiring 30
  releasing 30