Testing 2


Data Warehouse

• Typically a relational database that is designed for query and analysis rather than for transaction processing
• A place where historical data is stored for archival, analysis and security purposes
• Contains either raw data or formatted data
• Combines data from multiple sources:
• Sales
• Salaries
• Operational data
• Human resource data
• Inventory data
• Web logs
• Social networks
• Internet text and docs
• Other
Why build a Data Warehouse?
• Data stored in operational systems (OLTP) is not easily accessible

• OLTP systems are not designed for end-user analysis

• The data in OLTP is constantly changing

• OLTP systems may lack historical data


The Data Warehouse Business Solution
• Collects data from different sources (other databases, files, web services, etc.)

• Integrates data into logical business areas

• Provides direct access to data with powerful reporting tools (BI)


Data Warehouse – about the data

The Data Warehouse data

• Subject-oriented

• Integrated

• Non-volatile

• Time-variant
Data Warehouse – the ETL process

[Diagram] Source systems (Legacy DB, CRM/ERP DB, Finance DB) → Extract → Transform → Load → Target DW
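As an illustration only (not part of the original deck), a minimal SQL sketch of the Transform and Load steps, assuming hypothetical staging and target tables stg_sales, dim_customer and dw_sales:

-- Extract: the source rows have already been landed in the staging table stg_sales.
-- Transform + Load: conform the data while inserting it into the target table.
INSERT INTO dw_sales (sale_id, customer_key, sale_date, amount_usd)
SELECT s.sale_id,
       c.customer_key,                  -- surrogate key looked up from a dimension
       CAST(s.sale_date AS DATE),       -- conform the data type
       ROUND(s.amount * s.fx_rate, 2)   -- convert the amount to a common currency
FROM   stg_sales s
JOIN   dim_customer c ON c.customer_id = s.customer_id;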
Testing a DW – Resources Involved

Resources involved

• Business Analysts create requirements

• QA Testers develop and execute test plans and test cases. ***Skill set required: Very strong SQL!!!

• Architects set up test environments

• Developers perform unit tests

• Business Users perform functional User Acceptance Tests
Testing the DW – Manual Testing Flow

[Flow diagram: tools, tasks, timeline]
• Review mapping docs
• Write SQL in your favorite editor
• Run tests
• Dump results to a file
• Compare results manually or with a compare tool
• Report defects and issues
Unit testing involves the following:
 Check the mapping of the fields present at the staging level.
 Check for duplication of the values generated using the sequence generator.
 Check the correctness of the surrogate keys, which uniquely identify rows in the database.
 Check the data type constraints of the fields present at the staging and core levels.
 Check that string columns are left- and right-trimmed (see the sketch below).
Etc.
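A minimal sketch of the trim check, assuming a hypothetical column customer_name in a target table dw_customer; any rows returned still carry leading or trailing spaces:

-- String values that were not left- and right-trimmed during the load.
SELECT customer_id, customer_name
FROM   dw_customer
WHERE  customer_name <> TRIM(customer_name);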
Database Testing vs. Data Warehouse Testing
• Smaller in scale vs. large scale, voluminous data
• Usually used to test data at the source vs. covers several facets, with the extraction, transformation and loading mechanisms being the major ones
• Usually homogeneous data vs. heterogeneous data involved
• Normalized data vs. de-normalized data
• CRUD operations vs. usually read-only operations


List of checks that are treated as essential for validation in ETL testing:
- Verify that data transformation from source to destination works as expected
- Verify that the expected data is added to the target system
- Verify that all DB fields and field data are loaded without any truncation
- Verify data checksums for record count matches (see the sketch below)
- Verify that proper error logs with full details are generated for rejected data
- Verify NULL value fields
- Verify that duplicate data is not loaded
- Verify data integrity
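A minimal sketch of the record-count/checksum check, assuming hypothetical source and target tables stg_orders and dw_orders; the counts (and the sums used as a simple checksum) should match:

-- Compare row counts and a simple amount checksum between source and target.
SELECT 'source' AS side, COUNT(*) AS row_cnt, SUM(amount) AS amount_sum FROM stg_orders
UNION ALL
SELECT 'target' AS side, COUNT(*) AS row_cnt, SUM(amount) AS amount_sum FROM dw_orders;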
 ETL Testing Challenges
 ETL testing is quite different from conventional testing. Common challenges faced while performing data warehouse testing include:
- Incompatible and duplicate data
- Loss of data during the ETL process
- Testers have no privileges to execute ETL jobs on their own
- The volume and complexity of the data are very high
- Missing business-flow information
 Mapping doc validation
 Verify that the mapping doc provides the corresponding ETL information. A change log should be maintained in every mapping doc. Define a default test strategy for cases where the mapping doc omits optional information (e.g. data type lengths).

Ø Structure validation
 1. Validate the source and target table structures against the corresponding mapping doc.
2. Source and target data types should be the same (see the sketch below).
3. The lengths of the data types in source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the names of the columns in the table against the mapping doc.
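A minimal sketch of the structure checks, assuming both tables are visible from one database that exposes the standard INFORMATION_SCHEMA views; the table names stg_customer and dw_customer are hypothetical:

-- Columns whose data type or length differs between source and target.
SELECT s.column_name,
       s.data_type AS src_type, t.data_type AS tgt_type,
       s.character_maximum_length AS src_len,
       t.character_maximum_length AS tgt_len
FROM   information_schema.columns s
JOIN   information_schema.columns t ON t.column_name = s.column_name
WHERE  s.table_name = 'stg_customer'
  AND  t.table_name = 'dw_customer'
  AND (s.data_type <> t.data_type
       OR s.character_maximum_length <> t.character_maximum_length);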

 Ø Constraint Validation
 Ensure that constraints are defined on each table as expected.

Ø Data Consistency Issues


 1. The data type and length of a particular attribute may vary across files or tables even though the semantic definition is the same.
Example: An account number may be defined as Number(9) in one field or table and as Varchar2(11) in another table.
2. Misuse of integrity constraints: when referential integrity constraints are misused, foreign key values may be left "dangling" or inadvertently deleted.
Example: An account record is missing, but its dependent records are not deleted (see the sketch below).
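A minimal sketch of the dangling foreign key check for the account example, assuming hypothetical parent and child tables dw_account and dw_transaction:

-- Dependent records whose parent account row is missing ("dangling" foreign keys).
SELECT t.transaction_id, t.account_id
FROM   dw_transaction t
LEFT JOIN dw_account a ON a.account_id = t.account_id
WHERE  a.account_id IS NULL;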
 Data Completeness Issues
 Ensure that all expected data is loaded into the target table.
1. Compare record counts between source and target; check for any rejected records.
2. Check that data is not truncated in the columns of the target table.
3. Check boundary values (e.g. only data for year >= 2008 has to be loaded into the target; see the sketch below).
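A minimal sketch of the boundary check in item 3, assuming a hypothetical target table dw_sales with a sale_date column; the query should return zero:

-- Rows that violate the loading boundary (only data for year >= 2008 should be loaded).
SELECT COUNT(*) AS out_of_range_rows
FROM   dw_sales
WHERE  EXTRACT(YEAR FROM sale_date) < 2008;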
 Data Correctness Issues
 1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique, or out-of-range data may be stored when the integrity constraints are disabled.
Example: The primary key constraint is disabled during an import function, and rows are added to the existing data with null or non-unique identifiers.
 Null Validation
 Verify that there are no null values in columns where "Not Null" is specified (see the sketch below).
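A minimal sketch, assuming a hypothetical dw_customer table in which customer_name is declared "Not Null" in the mapping doc; any rows returned are defects:

-- Rows that violate the "Not Null" rule for customer_name.
SELECT customer_id
FROM   dw_customer
WHERE  customer_name IS NULL;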

Ø Duplicate check
 1. Validate that the unique key, primary key and any other columns that must be unique per the business requirements do not contain any duplicate rows.
2. Check whether any duplicate values exist in a column that is built by extracting multiple columns from the source and combining them into one column (see the sketch below).
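A minimal sketch of the duplicate check, assuming a hypothetical business key order_number in a target table dw_orders:

-- Business-key values that appear more than once in the target.
SELECT order_number, COUNT(*) AS occurrences
FROM   dw_orders
GROUP BY order_number
HAVING COUNT(*) > 1;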
 Complete Data Validation (using minus and intersect)
 1. To validate the complete data set between the source and target tables, a minus query is the best solution.
2. We need to run source minus target and target minus source.
3. If the minus query returns any rows, those should be considered mismatching rows.
4. We also need to find the matching rows between source and target using an Intersect statement (see the sketch below).
5. The count returned by the intersect should match the individual counts of the source and target tables.
6. If the minus queries return 0 rows and the intersect count is less than the source count or the target count, then we can conclude that duplicate rows exist.
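A minimal sketch of checks 2 and 4, assuming hypothetical tables stg_orders and dw_orders with identical column lists; MINUS is Oracle syntax (most other databases use EXCEPT):

-- Rows present in the source but missing from the target, and vice versa.
SELECT order_id, customer_id, amount FROM stg_orders
MINUS
SELECT order_id, customer_id, amount FROM dw_orders;

SELECT order_id, customer_id, amount FROM dw_orders
MINUS
SELECT order_id, customer_id, amount FROM stg_orders;

-- Matching rows; compare this count with the individual source and target counts.
SELECT COUNT(*) AS matching_rows FROM (
  SELECT order_id, customer_id, amount FROM stg_orders
  INTERSECT
  SELECT order_id, customer_id, amount FROM dw_orders
) m;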

 Some useful test scenarios
 1. Verify that the extraction process did not extract duplicate data from the source. (This usually happens in repeatable processes where at point zero we need to extract all data from the source file, but during the next intervals we only need to capture the modified and new rows.)

 Data cleanness
 Unnecessary columns should be deleted before loading into the staging area.
Example 1: Suppose the telephone number and the STD code are in different columns and the requirement says they should be in one column; then, with the help of an expression transformation, we concatenate the values into one column.
Example 2: If a column holds a name but carries extra spaces, we have to trim the spaces before loading into the staging area; with the help of an expression transformation the spaces are trimmed (see the sketch below).
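A minimal SQL sketch of both examples (the deck frames them as expression transformations in the ETL tool; the equivalent logic is shown here as SQL), using hypothetical columns std_code, phone_number and customer_name from a source table src_customer:

-- Example 1: combine the STD code and the telephone number into one column.
-- Example 2: trim the extra spaces from the name before loading into staging.
SELECT std_code || '-' || phone_number AS full_phone,
       TRIM(customer_name)             AS customer_name
FROM   src_customer;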
