Informatica MR .Reddy
1. Unit Testing –
This type of testing is performed at the developer's end. In unit testing,
each unit/component of the modules is tested separately. Each module of the
whole data warehouse, i.e. program, SQL script, stored procedure, or Unix shell
script, is validated and tested.
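For example, one common unit check on a single mapping is to verify that the
target received every row from the source. A minimal SQL sketch, assuming
hypothetical SRC_CUSTOMER and TGT_CUSTOMER table names, is:

    -- Unit test sketch: confirm a single mapping loaded every source row.
    -- SRC_CUSTOMER and TGT_CUSTOMER are hypothetical table names.
    SELECT s.src_count,
           t.tgt_count,
           CASE WHEN s.src_count = t.tgt_count THEN 'PASS' ELSE 'FAIL' END AS result
    FROM  (SELECT COUNT(*) AS src_count FROM SRC_CUSTOMER) s,
          (SELECT COUNT(*) AS tgt_count FROM TGT_CUSTOMER) t;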
2. Integration Testing –
Integration testing is the process of testing the interface between two software
units or modules. It focuses on determining the correctness of the interface. The
purpose of integration testing is to expose faults in the interaction between
integrated units. Once all the modules have been unit tested, integration testing
is performed.
3. Smoke Testing –
Smoke testing is performed on a new build to check that its critical
functionalities work before the build is taken up for further testing. In other
words, we verify whether the important features are working and there are no
showstoppers in the build that is under testing.
It is a mini and quick regression test of major functionality. Smoke testing shows
whether the product is ready for testing. This helps in determining if the build is
so flawed as to make any further testing a waste of time and resources.
Differences between Black Box Testing and White Box Testing:

Black Box Testing: It is a way of software testing in which the internal
structure, the program, or the code is hidden and nothing is known about it.
White Box Testing: It is a way of testing the software in which the tester has
knowledge about the internal structure, the code, or the program of the software.

Black Box Testing: No knowledge of implementation is needed.
White Box Testing: Knowledge of implementation is required.

Black Box Testing: This testing can be initiated on the basis of the
requirement specifications document.
White Box Testing: This type of testing of software is started after the
detailed design document.

Black Box Testing: No knowledge of programming is required.
White Box Testing: It is mandatory to have knowledge of programming.

Black Box Testing: It is the behavior testing of the software.
White Box Testing: It is the logic testing of the software.

Black Box Testing: It can be done by trial and error ways and methods.
White Box Testing: Data domains along with inner or internal boundaries can be
better tested.
Differences between White Box Testing and Gray Box Testing:
White Box Testing: High programming skills are required to perform white box
testing.
Gray Box Testing: Basic programming skills are enough to perform this testing.
The difference between Alpha and Beta Testing is as follows:

Alpha Testing: Alpha testing may require a long execution cycle.
Beta Testing: Beta testing requires only a few weeks of execution.
The difference between Unit Testing and Integration Testing is as follows:

Unit Testing: In unit testing, each module of the software is tested separately.
Integration Testing: In integration testing, all modules of the software are
tested combined.

Unit Testing: In unit testing, the tester knows the internal design of the
software.
Integration Testing: In integration testing, the tester doesn't know the
internal design of the software.

Unit Testing: Unit testing is a white box testing.
Integration Testing: Integration testing is a black box testing.

Unit Testing: It tests parts of the project without waiting for the others to be
completed.
Integration Testing: It tests only after the completion of all parts.
The difference between an ODS and a Data Warehouse is as follows:

ODS: An ODS is meant for operational reporting and supports current or near
real-time reporting requirements.
Data Warehouse: A data warehouse is intended for historical and trend analysis.

ODS: It is used for detailed decision making and operational reporting.
Data Warehouse: It is used for long-term decision making.

ODS: It serves as a conduit for data between the operational and analytical
systems.
Data Warehouse: It serves as a repository for cleansed and consolidated data.

ODS: It is updated often, as the transaction system generates new data.
Data Warehouse: It is usually updated in batch processing mode.
Additive: Additive facts are facts that can be summed up through all of
the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for
some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up
for any of the dimensions present in the fact table.
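As a hedged illustration, assume a hypothetical ACCOUNT_BALANCE_FACT table with
date_id, account_id, and balance columns; a balance is a classic semi-additive
fact, since it can be summed across accounts but not across dates:

    -- Summing a balance across accounts for ONE date is meaningful...
    SELECT date_id, SUM(balance) AS total_balance
    FROM   ACCOUNT_BALANCE_FACT
    WHERE  date_id = 20180131
    GROUP BY date_id;

    -- ...but summing it across dates is not; across the date dimension
    -- an average (or the last value) is typically used instead.
    SELECT account_id, AVG(balance) AS avg_balance
    FROM   ACCOUNT_BALANCE_FACT
    GROUP BY account_id;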
SCD3: It keeps only the current and previous values in the target.
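A minimal SCD Type 3 sketch, assuming a hypothetical DIM_CUSTOMER table that
tracks a customer's current and previous city in paired columns:

    -- SCD Type 3: paired columns hold the current value and one previous value.
    CREATE TABLE DIM_CUSTOMER (
        customer_key  INT PRIMARY KEY,
        customer_name VARCHAR(100),
        current_city  VARCHAR(50),
        previous_city VARCHAR(50)
    );

    -- On a change, the old current value shifts into previous_city
    -- and the new value overwrites current_city.
    UPDATE DIM_CUSTOMER
    SET    previous_city = current_city,
           current_city  = 'Kochi'
    WHERE  customer_key  = 101;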
Each dimension table maintains a one-to-many relation with the fact table. Every row in a
fact table is associated with its dimension table rows through a foreign key reference.
Due to the above reason, navigation among the tables in this model is easy for querying
aggregated data. An end-user can easily understand this structure. Hence all the Business
Intelligence (BI) tools greatly support the Star schema model.
While designing star schemas, the dimension tables are purposefully de-normalized. They
are wide, with many attributes to store the contextual data for better analysis and reporting.
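As an illustration of such a design, here is a minimal DDL sketch of a star
schema; the Sales fact, the Product, Store, and Date dimensions, and all column
names are assumptions made for this example:

    -- Star schema sketch: one fact table with foreign keys to
    -- de-normalized dimension tables (illustrative names).
    CREATE TABLE DIM_PRODUCT (
        product_id   INT PRIMARY KEY,
        product_type VARCHAR(50),
        brand        VARCHAR(50)
    );

    CREATE TABLE DIM_STORE (
        store_id   INT PRIMARY KEY,
        store_name VARCHAR(50),
        state      VARCHAR(50)
    );

    CREATE TABLE DIM_DATE (
        date_id    INT PRIMARY KEY,
        cal_date   DATE,
        month_name VARCHAR(10),
        cal_year   INT
    );

    CREATE TABLE SALES_FACT (
        product_id    INT REFERENCES DIM_PRODUCT (product_id),
        store_id      INT REFERENCES DIM_STORE (store_id),
        date_id       INT REFERENCES DIM_DATE (date_id),
        quantity_sold INT,
        sales_amount  DECIMAL(12, 2)
    );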
From the above Star schema example, if a business user wants to know how many Novels
and DVDs have been sold in the state of Kerala in January 2018, then you can apply the
query as follows on the Star schema tables:
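A minimal sketch of that query, written against the illustrative tables assumed
above:

    -- How many Novels and DVDs were sold in Kerala in January 2018.
    SELECT p.product_type, SUM(f.quantity_sold) AS quantity_sold
    FROM   SALES_FACT  f
    JOIN   DIM_PRODUCT p ON p.product_id = f.product_id
    JOIN   DIM_STORE   s ON s.store_id   = f.store_id
    JOIN   DIM_DATE    d ON d.date_id    = f.date_id
    WHERE  p.product_type IN ('Novels', 'DVDs')
    AND    s.state      = 'Kerala'
    AND    d.month_name = 'January'
    AND    d.cal_year   = 2018
    GROUP BY p.product_type;

The query would return results like: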
Novels 12,702
DVDs 32,919
Hope you understood how easy it is to query a Star Schema.
While designing SnowFlake schemas, the dimension tables are purposefully normalized.
Foreign keys will be added to each level of the dimension tables to link to their parent
attributes. The complexity of the SnowFlake schema is directly proportional to the hierarchy
levels of the dimension tables.
Different levels of hierarchies from the above diagram can be referred to as follows:
Quarterly id, Monthly id, and Weekly ids are the new surrogate keys that are created
for Date dimension hierarchies and those have been added as foreign keys in the
Date dimension table.
State id is the new surrogate key created for Store dimension hierarchy and it has
been added as the foreign key in the Store dimension table.
Brand id is the new surrogate key created for the Product dimension hierarchy and it
has been added as the foreign key in the Product dimension table.
City id is the new surrogate key created for Customer dimension hierarchy and it has
been added as the foreign key in the Customer dimension table.
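As an illustration, here is a minimal DDL sketch of one such snowflaked
hierarchy, assuming the illustrative Store dimension is normalized into a
separate State table (all names are assumptions for this example):

    -- SnowFlake sketch: the state attributes move out of the Store
    -- dimension into a parent table linked by a surrogate key.
    CREATE TABLE DIM_STATE (
        state_id   INT PRIMARY KEY,
        state_name VARCHAR(50)
    );

    CREATE TABLE DIM_STORE_SF (
        store_id   INT PRIMARY KEY,
        store_name VARCHAR(50),
        state_id   INT REFERENCES DIM_STATE (state_id)
    );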
Querying A Snowflake Schema
We can generate the same kind of reports for end-users as that of star schema structures
with SnowFlake schemas as well. But the queries are a bit complicated here.
From the above SnowFlake schema example, we are going to generate the same query
that we designed in the Star schema query example.
That is, if a business user wants to know how many Novels and DVDs have been sold in the
state of Kerala in January 2018, you can apply the query as follows on the SnowFlake
schema tables.
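A minimal sketch of that query against the snowflaked tables assumed above;
note the extra join needed to reach the state name through the Store hierarchy:

    -- Same report on the snowflaked tables: one additional join walks
    -- the Store -> State hierarchy to reach the state name.
    SELECT p.product_type, SUM(f.quantity_sold) AS quantity_sold
    FROM   SALES_FACT   f
    JOIN   DIM_PRODUCT  p  ON p.product_id = f.product_id
    JOIN   DIM_STORE_SF s  ON s.store_id   = f.store_id
    JOIN   DIM_STATE    st ON st.state_id  = s.state_id
    JOIN   DIM_DATE     d  ON d.date_id    = f.date_id
    WHERE  p.product_type IN ('Novels', 'DVDs')
    AND    st.state_name = 'Kerala'
    AND    d.month_name  = 'January'
    AND    d.cal_year    = 2018
    GROUP BY p.product_type;

The query would return results like: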
Novels 12,702
DVDs 32,919
Points To Remember While Querying Star (or) SnowFlake Schema Tables
Any query can be designed with the below structure:
SELECT Clause:
The attributes specified in the select clause are shown in the query results.
The SELECT statement also uses grouping to find aggregated values, and hence we
must add a GROUP BY clause to the query.
FROM Clause:
All the essential fact tables and dimension tables have to be chosen as per the
context.
WHERE Clause:
Appropriate dimension attributes are mentioned in the where clause by joining with
the fact table attributes. Surrogate keys from the dimension tables are joined with the
respective foreign keys from the fact tables to fix the range of data to be queried.
Please refer to the above-written star schema query example to understand this. You
can also filter data in the FROM clause itself if you are using inner/outer joins
there, as written in the SnowFlake schema example; a sketch of this FROM-clause
filtering is given after this section.
Dimension attributes are also mentioned as constraints on data in the where clause.
By filtering the data with all the above steps, appropriate data is returned for the
reports.
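As a sketch of filtering in the FROM clause (using the illustrative star schema
tables assumed earlier), the predicate on the state is placed inside the join
condition rather than in the WHERE clause:

    -- Filter applied in the join condition instead of the WHERE clause.
    SELECT p.product_type, SUM(f.quantity_sold) AS quantity_sold
    FROM   SALES_FACT  f
    JOIN   DIM_STORE   s ON s.store_id   = f.store_id
                        AND s.state      = 'Kerala'
    JOIN   DIM_PRODUCT p ON p.product_id = f.product_id
    GROUP BY p.product_type;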
As per the business needs, you can add (or) remove the facts, dimensions, attributes, and
constraints to a star schema (or) SnowFlake schema query by following the above
structure. You can also add sub-queries (or) merge different query results to generate data
for any complex reports.
This type of schema is used for sophisticated requirements and for aggregated fact tables
that are too complex to be supported by the Star schema (or) SnowFlake schema. This
schema is difficult to maintain due to its complexity.
Star schema is the base to design a star cluster schema, and a few essential dimension
tables from the star schema are snowflaked; this, in turn, forms a more stable schema
structure.
You can go ahead with the SnowFlake schema either if you want to save some storage
space or if your DW system has optimized tools to design this schema.
Star Schema vs SnowFlake Schema:

2. Star Schema: Storage space for dimension tables is more.
   SnowFlake Schema: Storage space for dimension tables is comparatively less.

4. Star Schema: A single fact table is surrounded by multiple dimension tables.
   SnowFlake Schema: A single fact table is surrounded by multiple hierarchies of
   dimension tables.

5. Star Schema: Queries use direct joins between the fact and dimensions to
   fetch the data.
   SnowFlake Schema: Queries use complex joins between the fact and dimensions
   to fetch the data.

7. Star Schema: Anyone can easily understand and design the schema.
   SnowFlake Schema: It is tough to understand and design the schema.
1. Xplenty –
Xplenty is a cloud-based ETL solution that requires no coding and
provides a simple visual interface for performing ETL activities. It also
connects to a large variety of data sources.
2. IBM – DataStage –
It is a business intelligence tool for integrating data across various enterprise
systems. It is part of the IBM Information Platforms Solutions suite and uses
visual notation for building ETL processes. It is a powerful data integration
tool.
3. Informatica –
Informatica is a market leader in data integration. Informatica's suite of data
integration software includes PowerCenter, which is known for its strong
automation capabilities. Informatica PowerCenter is developed by
Informatica Corporation and can connect to many sources to fetch data for
data integration.
Informatica PowerCenter has four client tools which are used in the development
process:
PowerCenter Designer
Workflow Manager
Workflow Monitor
Repository Manager
4. Microsoft SQL Server SSIS –
Microsoft offers SSIS, a graphical interface for managing ETL using MS SQL
Server. SSIS has a user-friendly interface, allowing users to deploy integrated
data warehousing solutions without having to write lots of code. SSIS is a fast
and flexible data warehousing tool. The graphical interface allows easy
drag-and-drop ETL for multiple data types and warehouse destinations.
5. Talend –
Talend is open-source software that integrates, cleanses, and profiles data and
helps you get business insights easily. Talend has a GUI that enables
managing a large number of source systems. This tool has Master Data
Management (MDM) functionality. It also provides a metadata repository using
which users can easily re-use work.
6. Azure Data Factory –
Microsoft Azure Data Factory is a cloud-based data integration service that
automates the ETL process. We can say it is SSIS in the cloud, because they
share the same idea, although SSIS provides more powerful GUI, debugging, and
intelligence tools.
7. Oracle Data Integrator –
Oracle Data Integrator is based on an Extract, Load, and Transform (ELT)
architecture, which means it performs the load first and then transforms the
data. This tool is produced by Oracle, offers a graphical environment, and is
also very cost-effective.
8. Data Junction
9. Warehouse Builder.
SURROGATE KEY:
A surrogate key is an artificial primary key that is generated automatically
by the system; its value is numeric and is automatically incremented for each
new row.
Generally, a DBMS designer needs a surrogate key when the natural primary key
is unsuitable for use.
Features of the surrogate key :
It is automatically generated by the system.
It holds an anonymous integer value.
It contains unique value for all records of the table.
The value can never be modified by the user or application.
A surrogate key is called a factless key, as it is added just for our ease of
identification of unique values and contains no relevant fact (or information)
that is useful for the fact table.
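A minimal sketch of a surrogate key in DDL, using a standard identity column
(the exact syntax varies by database, e.g. AUTO_INCREMENT in MySQL or
IDENTITY(1,1) in SQL Server); the table and column names are assumptions for
this example:

    -- The system assigns an anonymous, auto-incrementing integer
    -- that carries no business meaning.
    CREATE TABLE DIM_EMPLOYEE (
        employee_key  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key
        employee_code VARCHAR(20),  -- natural/business key from the source system
        employee_name VARCHAR(100)
    );

    -- The application never supplies the key; the system generates it.
    INSERT INTO DIM_EMPLOYEE (employee_code, employee_name)
    VALUES ('E-1001', 'Anil Kumar');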