Teradata Certification Associate Teaser

Download as pdf or txt
Download as pdf or txt
You are on page 1of 46

The Tera-Tom Video Series

Lessons with Tera-Tom


Teradata Architecture and SQL Video Series
These exciting videos make learning and certification much easier

Three ways to view them:


1. Safari (now called O’Reilly) (look up Coffing Studios)
2. CoffingDW.com (sign-up on our website)
3. Your company can buy them all for everyone to see (contact [email protected])
The Tera-Tom Genius Series

The Tera-Tom Genius Series consists of ten Teradata and Aster Data books. Each book design is for a specific
audience level. The books take a building block approach; always starting simple, then each page builds upon the
previous point. Order them all at www.CoffingDW.com or from your favorite book site.
Tera-Tom- Author of over 50 Teradata Books

Who is the number one technical book author in the world? Tera-Tom! He has written over 75 books on all big
data technologies. His books have helped to educate millions of people worldwide. What people love the most
about the Tera-Tom books is that they are so easy that a seven-year-old boy (raised by wolves) can understand!
Tera-Tom - Author of 25 Books on Other Major Platforms

Writing 75 books on so many different technologies has given me insight into how computer systems work.
The Nexus and Nexus Core Server for the Cloud and On-Premises

Some software companies specialize in query tools, while others focus on data movement. Some companies join
data across different platforms, yet others specialize in gathering data from different systems to build analytics
inside the user’s PC. Some vendors focus on business users through visualization and point-and-click capabilities,
while others focus on the IT developers or DBA teams. And some companies choose to build their applications for
the desktop, while others run their software on servers.

We designed and built the Nexus and NexusCore Server to do all of the above, plus more. The brilliance with our
software approach is that we allow the user to do everything from their desktop and then execute on their PC or
from a server. We call the desktop application the Nexus and the server the NexusCore Server. Users can run
queries, data movement, or cross-system joins (Federated queries) and run them from the PC on smaller data or
from any NexusCore Server on larger data. Nobody in the world has built anything like this! Try out a free trial at
www.CoffingDW.com.
Trademarks and Copyrights

Teradata*, NCR, BYNET, and SQL Assistant have registered trademarks of Teradata Corporation, Dayton, Ohio,
U.S.A., IBM, DB2 and Netezza are registered trademarks of IBM Corporation, ANSI is a registered trademark of
the American National Standards Institute. Microsoft Windows, Windows 2003 Server, .NET, PDW, and SQL
Server, are trademarks of Microsoft. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group.
Linux is a trademark of Linus Torvalds. Java and Oracle is a trademark of Oracle. ParAccel is a trademark of
ParAccel. Kognitio is a trademark of Kognitio. Greenplum is a trademark of EMC Corporation. Nexus Query
Chameleon is a trademark of Coffing Data Warehousing.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity concerning any loss
or damages arising from the information contained in this book or from the use of programs or program segments.
The manual is not a publication of Teradata Corporation, nor was it produced in conjunction with Teradata
Corporation.

Copyright © September 2019 by Coffing Publishing

ISBN 978-1-940540-40-5 Teradata Certification Associate Exam

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any
means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the
publisher — no assumption for patent liability concerning the use of the information contained herein. Although
we took seriously every precaution in the preparation of this book, the publisher and author assume no
responsibility for errors or omissions, neither is any liability assumed for damages resulting from the use of the
information contained herein.
About Tom Coffing

Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing, where he has been CEO for the past 20 years.
Tom has written over 50 books on all aspects of Teradata, Netezza, Kognitio, Redshift, ParAccel, Vertica, SQL Server, and
Greenplum. Tom has taught over 1,000 Teradata classes in places such as India, Africa, Europe, China, Malaysia, and throughout
North America.

Tom is also the owner and designer of the Nexus Query Chameleon, the most sophisticated enterprise query tool in the industry. The
Nexus works on all platforms, including Hadoop, converts table structures between all systems, and allows companies to load their
Erwin logical model inside Nexus. The Nexus guides users like GPS. Users point and click on any table or view from any system, and
then shown what joins to what. As users choose the columns they want on their report, the SQL creates automatically.

In High School, Tom was the first athlete from his school to ever place at state. He was selected by his school to represent them at
Buckeye Boys State, and Tom earned his induction into the first class of the Lakota High School Hall of Fame.

At the University of Arizona and University of Nevada Las Vegas, Tom was a two-time All-American wrestler, Sophomore Athlete of
the year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a bachelor’s degree in Speech
Communications.

After college, Tom became a state and national champion speech winner for Toastmasters and won two orchid awards as an actor.
Tom is the proud father of three wonderful children and is married for the past 38 years.

You can contact Tom at 513 300-0341 or [email protected].


Table of Contents

Contents
Chapter 1 – Database Concepts ................................................................................................................................ 3
Teradata Certification Exams, Releases, and Websites .......................................................................................... 4
Teradata Database Certification and Exam Path .................................................................................................... 5
Teradata Associate Exam Objectives and Section Weightings .............................................................................. 6
What is Parallel Processing? ................................................................................................................................. 7
What is a Database? .............................................................................................................................................. 8
What is SQL? ........................................................................................................................................................ 9
How Can You Use a Database?............................................................................................................................10
There are Two Types of Databases ......................................................................................................................11
Relational Database Model and Mathematical Set Theory Concepts ....................................................................12
Relational Term Review.......................................................................................................................................13
Rules for Teradata Databases, Tables, and Columns ............................................................................................14
What is a Constraint? ...........................................................................................................................................15
What is the Difference Between a Primary Key and a Foreign Key? ....................................................................16
Referential Integrity .............................................................................................................................................17
Quiz - Logical Model Primary Key Quiz .............................................................................................................18
Answer - Logical Model Primary Key Quiz .........................................................................................................19
Quiz - Logical Model Foreign Key Quiz ..............................................................................................................20
Answer - Logical Model Foreign Key Quiz .........................................................................................................21
Dimensional Model – One Fact Table and Many Dimension Tables ....................................................................22
Quiz – What is the Primary Key of the Fact Table? .............................................................................................23
Answer – What is the Primary Key of the Fact Table? .........................................................................................24
Quiz – Name the Foreign Keys ............................................................................................................................25
Answer – Name the Foreign Keys ........................................................................................................................26
What is a Star Schema? ........................................................................................................................................27
Table of Contents

Normalization ......................................................................................................................................................28
A Denormalized Table .........................................................................................................................................29
Getting to First Normal Form ...............................................................................................................................30
A Table that is in Second Normal Form ...............................................................................................................31
Tables Reaching Third Normal Form ...................................................................................................................32
Dimensional Modeling .........................................................................................................................................33
What are the Advantages of a Normalized vs. Dimensional Model? ....................................................................34
Logical Data Model vs. Physical Data Model ......................................................................................................35
Logical Data Model vs. Physical Data Model Example ........................................................................................36
Data Warehouses are Designed for Analytics .......................................................................................................37
Chapter 1 Database Concepts

Chapter 1 – Database Concepts

“I find that the harder I work the more luck I seem to have.”
– Thomas Jefferson

Database Concepts - 16%


Relational concepts
Relational database models
Dimensional models
Use cases of deploying models

3
Chapter 1 Database Concepts

Teradata Certification Exams, Releases, and Websites

Associate Exam – Teradata 16.00


DBA Exam – Teradata 16.10
Advanced DBA Exam – Teradata 16.10
Developer Exam – Teradata 16.20
Advanced Developer Exam – Teradata 16.20
Architect Exam – Teradata 16.20 (available late 2019)

Teradata Certification (TCPP) Website:


www.teradata.com/certification
Bypass Exams with Prior Teradata 14 Certification information:
www.teradata.com/products-and-services/Certification/Database#waivers
Registration Information:
www.pearsonvue.com/teradata
TCCP Contact Information:
email: [email protected] phone: 1-800-845-2273

The Certification Exams use the Teradata releases above. The Teradata Certification Exams will be updated
periodically to the latest Teradata release. The websites listed above provide additional information. The first
website listed above explain the concepts and objectives for each exam.

4
Chapter 1 Database Concepts

Teradata Database Certification and Exam Path

Teradata Database
Teradata Database Certified
Certified DBA Advanced DBA

Advanced DBA Exam


DBA Exam

Teradata Database Teradata


Certified Database
Teradata Database
Teradata Database Advanced Developer Certified
Certified Associate
Certified Developer Master
Advanced Developer Exam
Associate Exam
Developer Exam
Teradata Database
Certified
System Architect

System Architect Exam

The current Teradata Certification Exams are listed above. You can earn a wide variety of Teradata Certified
Designations depending upon the tests completed. The exams require knowledge plus experience with Teradata
release 16 and higher. Many of the test questions are scenario-based, and Teradata experience is needed to answer
many of the questions, but the Tera-Tom Certification books will teach you what you need to know to give you the
best chance of passing.

5
Chapter 1 Database Concepts

Teradata Associate Exam Objectives and Section Weightings

Database Concepts - 16% Data Warehouse Architecture -17%


Relational concepts Teradata concepts and components
Relational database models Deployment options
Dimensional models Implementation types
Use cases of deploying models Data flow of data warehouses

Architecture and Components - 21% Data Distribution and Access -19%


Data flow from source to consumption Primary, Secondary, and Join Indexes
Features and benefits Data distribution
Teradata physical components Primary Index vs. Primary Key
Teradata logical components Use cases of all indexing

Security and Privacy - 19% Teradata Tools and Utilities -14%


User privacy mechanisms Teradata access tools and their uses
Security mechanisms Teradata Studio Express
Management of user privileges BTEQ
Functions and types of authentication Teradata App Center

The exam objectives and section weightings above describe the content covered on the Associate Exam.

6
Chapter 1 Database Concepts

What is Parallel Processing?


“After enlightenment, the laundry”
- Zen Proverb

Tera-Tom’s
Parallel
Processing
Wash
and Dry

“After parallel processing the laundry, enlightenment!”


-Teradata Zen Proverb

Two guys were having fun on a Saturday night when one said, “I’ve got to go and do my laundry.” The other said,
“What?!” The man explained that if he went to the laundry mat the next morning, he would be lucky to get one
machine and be there all day. But, if he went on Saturday night, he could get all the machines. Then, he could do
all his wash and dry in two hours. Now that’s parallel processing mixed in with a little dry humor!

7
Chapter 1 Database Concepts

What is a Database?

Customer_Table Order_Table
CustNo
_______ Cust Name
________________ Order_No CustNo
________ _______ Order_Date Order_Total
__________ ___________
1 Benny's Boots 1
1000 2019-01-04 299.50
2 Dave's Dogs 2
1001 2019-01-05 1254.67
3 Hanna's Hamsters 4
1002 2019-01-08 67.00
4 Frank's Frogs 5
1003 2019-01-09 12.50
5 Rachel's Rabbits

A table is made up of columns and rows. The Customer_Table has two columns and five
rows. The Order_Table has four columns and four rows.

Each row (also called a record) contains one instance of all the columns (also called
fields) in a table.

Databases provide a mechanism for organizing data. Relational databases store data in
tables. The storing, managing and retrieving of data are qualities of a database.

A table looks almost like an Excel spreadsheet because it has columns and rows, but a spreadsheet is considered a
flat file. Tables in a relational database have relations with other tables. Notice the CustNo column is in two tables.

8
Chapter 1 Database Concepts

What is SQL?
Order_Table
Order_No CustNo
________ _______ Order_Date Order_Total
__________ ___________
Select all 1000 1 2019-01-04 299.50
columns 1001 2 2019-01-05 1254.67 Select two
and sort the 1002 4 2019-01-08 67.00 columns for
data 1003 4 2019-01-09 12.50 CustNo 4

SELECT * SQL SELECT Order_Date SQL


FROM Order_Table Query ,Order_Total Query
ORDER BY Order_Total FROM Order_Table
WHERE CustNo = 4
Order_No CustNo
________ _______ Order_Date Order_Total
__________ ___________
1003 4 2019-01-09 12.50 Order_Date __________
__________ Order_Total
1002 4 2019-01-08 67.00 2019-01-09 12.50
1000 1 2019-01-04 299.50 2019-01-08 67.00
1001 2 2019-01-05 1254.67
Relational database management systems (RDBMS) use SQL (structured query language), pronounced either
(Sequel or S Q L) as a database query language. Humans communicate with the help of varying languages, but
SQL is a common language that all relational databases understand. The commonality of SQL is because of the
American National Standards Institute (ANSI), which has representatives from most RDBMS Companies. ANSI
compliant database can use a common SQL language, but each database will add extensions to their SQL to make
them special. It is the SQL that is used to create, manage, manipulate, and query database objects, such as tables or
views.

9
Chapter 1 Database Concepts

How Can You Use a Database?

Customer_Table Order_Table
CustNo
_______ CustName
________________ Order_No CustNo Order_Date
________ _______ Order_Total
__________ ___________
1 Benny's Boots 1000 1 2019-01-04 299.50
2 Dave's Dogs 1001 2 2019-01-05 1254.67
3 Hanna's Hamsters 1002 4 2019-01-08 67.00
4 Frank's Frogs 1003 5 2019-01-09 12.50
5 Rachel's Rabbits

Retrieve rows (records) that Update rows Perform complex Cross-reference rows
match a certain criteria. (records) in mass. Aggregation. in different tables.
Update Order_Table SELECT * FROM
SELECT Cust_Name SELECT
Set Order_Total = Order_Table O
FROM Customer_Table AVG(Order_Total)
Order_Total * 1.1 ; Join Customer_Table C
WHERE Cust_No = 1 ; FROM Order_Table ; On O.CustNo = C.CustNo

Not all databases are alike. Teradata is special. No two database systems are the same.

10
Chapter 1 Database Concepts

There are Two Types of Databases

Customer_Table Order_Table Relational databases


are accessed by SQL.
CustNo
______ CustName
______________ Order_No CustNo Order_Date
________ _______ Order_Total
__________ ___________ Tables with
1 Benny's Boots 1000 1 2019-01-04 299.50 relationships have a
2 Dave's Dogs 1001 2 2019-01-05 1254.67 common column for
3 Hanna's Hamsters 1002 4 2019-01-08 67.00 joins. Teradata,
4 Frank's Frogs 1003 5 2019-01-09 12.50 Oracle, and DB2 are
5 Rachel's Rabbits relational databases

The Constitution of the United States Product Report NoSQL databases don't use
50 SQL, but MapReduce or a
We the People of the United States, in 40 proprietary language.
Order to form a more perfect Union, 30 NoSQL only retrieve
establish Justice, insure domestic 20 information. DynamoDB
Tranquility, provide for the common 10 and MongoDB are both
defense, promote the general Welfare, Ice Eggs Tea Ham NoSQL databases.

SQL databases have relations between tables and include constraints on data types and consistency. NoSQL does
away with relational rules for the sake of speed, flexibility, and scale. Relational databases follow strict rules.
NoSQL provides documents, contracts, pictures, and XML or JSON (Java Script Object Notation).

11
Chapter 1 Database Concepts

Relational Database Model and Mathematical Set Theory Concepts

Relational Concept Mathematical Concept


Entity A person, place or thing Relation Table
about which the table
Tuple Row
contains information.
Attribute Column
Cardinality The number of rows
in a table. A Venn diagram

Degree The number of columns A B illustrating the


intersection of
in a table. two sets

Set theory is a branch of mathematical logic that studies sets. In theory, no


two rows of a table can be duplicated, thus the name SET Table. A
Teradata MULTISET table allows duplicate rows.

Set theory is a collections of objects. Although any type of object can be


collected into a set, set theory is applied most often to objects that are
relevant to mathematics.

A Relational database model uses concepts from mathematical set theory. A Union is a set of all objects that are a
member of A, or B, or both. The Intersection is the set of all objects that are members of both A and B.

12
Chapter 1 Database Concepts

Relational Term Review

Logical Operational
Model Relational File
Theory
& RDBMS Systems

Entity or Table Relation File


Row Tuple Record
Column Attribute Field

PK – Primary Key – Uniquely identifies each row in a table


FK – Foreign Key – Identifies the relationship between tables
SA – System Assigned – A unique number assigned by the system
UA – User Assigned – Entry assigned by a user
NN – No NULLS – Primary Keys should never have null values
ND – No Duplicates – Primary Keys should always be unique
NC – No Changes – Primary Keys should never be changed

Above are some review terms that might be good to memorize.

13
Chapter 1 Database Concepts

Rules for Teradata Databases, Tables, and Columns

1. No two databases can have the same name in a teradata system.


2. No two tables within the same database can have the same name.
3. No two columns within the same table can have the same name.

database Tablename

CREATE SET TABLE SQL_Class.Order_Table


(
Order_Number INTEGER NOT NULL Constraint
Column
,Customer_Number INTEGER Data Type
Names
,Order_Date DATE
,Order_Total Decimal(10,2) All tables will have either:
) UNIQUE PRIMARY INDEX (Order_Number) • Unique Primary Index
• Non-Unique Primary Index
This is how a table is created in teradata. • No Primary Index

Teradata has databases and within a database are tables. Teradata doesn't implement schemas like SQL Server, but
a schema in Teradata refers to the organization of data as a logical grouping or blueprint of how the database
divides up into tables. A database schema refers to a set of formulas or sentences, which means integrity constraints
imposed on a database. A NOT NULL constraint, for example, will not allow a NULL value in that column.

14
Chapter 1 Database Concepts

What is a Constraint?

CREATE Table SQL_Class.Employee_Table


( EmpNo INTEGER NOT NULL CONSTRAINT EmpPK PRIMARY KEY
,DeptNo INTEGER CONSTRAINT Ref_1 REFERENCES Dept_Tbl(DeptNo) BETWEEN 100 AND 500
,Last_Name CHAR(20) NOT NULL
,First_Name VARCHAR(20) NOT NULL
,Salary DECIMAL(10,2) CONSTRAINT SalCheck CHECK (salary >=10000 and salary < 1000000)
,Hire_Date DATE
,Soc_Security INTEGER NOT NULL CONSTRAINT NameUniq UNIQUE ) ;

Constraints generally fall This table above has constraints on:


into three areas: Primary Key and Foreign Key
1. Unique Multiple NOT NULL constraints
2. Check A UNIQUE constraint on Soc_Security
3. Referential Integrity CHECK constraint on the salary

At times it is advisable to add restrictions to columns within a table. Defining constraints is done to provide data
integrity, availability, and ease of use to the table and its data. You can define constraints on column values during
the CREATE Table process. Defining constraints can also be done using the ALTER command once the table
already exists. You define constraints at the column or table level.

15
Chapter 1 Database Concepts

What is the Difference Between a Primary Key and a Foreign Key?

Customer_Table Order_Table
CustNo
_______ Cust Name
________________ Order_No CustNo Order_Date Order_Total
________ _______ __________ __________
1 Benny's Boots 1000 1 2019-01-04 299.50
2 Dave's Dogs 1001 2 2019-01-05 1254.67
3 Hanna's Hamsters 1002 4 2019-01-08 67.00
4 Frank's Frogs 1003 4 2019-01-09 12.50

A PK is usually a single column in a table that is a unique identifier for each row (record).

A FK is a normal column in a table that is a Primary Key in another table.

Above, we have a column named CustNo in the Customer_Table that is a Primary Key (PK).
We also have a column named CustNo in the Order_Table that is a Foreign Key (FK).
CustNo is a unique column in the Customer_Table, but in the Order_Table it is not unique.

The two tables can be joined because they have a PK/FK relation on CustNo.

Do you know what the initials NN, NC, ND mean for a Primary Key, which refers to PK? Logically, a Primary
Key should never be NULL (No Nulls (NN)), and never be changed (No Changes (NC)), but most importantly it
should never have a duplicate (No Dups (ND)).

16
Chapter 1 Database Concepts

Referential Integrity
Employee_Table Payroll_Table

EmpNo Name Salary Social_Sec


___________ EmpNo Wk_Sal
______ _________
______ _______ ________
1 John Doe 50000.00 245-55-1234 1 961.53
2 Jane Doe 60000,00 123-43-6789 2 1153.84
3 Lois Lane 80000.00 321-67-8974 3 1538.46
4 Clark Kent 75000.00 426-88-9023 4 1,442.30
PK PK FK
Imagine getting laid off and your company deletes you from the Employee_Table,
but they accidentally forget to delete you from the Payroll_Table.
If you have integrity you would report that within the next decade!
Referential Integrity (RI) means the foreign key must agree with the primary key
that is referenced by the foreign key.
You can delete someone from the Payroll_Table, but you cannot delete someone from
the Employee_Table unless they have already been deleted from the Payroll_Table.

Any primary key field changes must apply to all foreign keys, or not at all. Any foreign key updates (not deletions)
must propagate to the primary parent key. Referential Integrity prevents inconsistencies, which prevents the
database from being corrupted on inserts, updates, merges, and deletes. Referential integrity prevents someone
from being deleted from the Employee_Table and not from the Payroll_Table.

17
Chapter 1 Database Concepts

Quiz - Logical Model Primary Key Quiz

Addresses Subscribers Claims Each family member


has the same
Subscriber_No Subscriber_No Claim_ID Subscriber_No.
Member_No If the Jones family has
Subscriber_No a mom, dad and a
Member_No baby they would get:
Services Member_No 1
Place a PK on Member_No 2
Service_Code Claim_Service
each column Member_No 3
that is a Each family has an
Primary Key address. Each time a
Services
claim is filed a provider
Provider_No who performs the
Provider_Code
service is recorded.

This tricky quiz is going to teach you a lot in your quest to understand logical and physical modeling. You will
soon understand relational modeling and normalization as well as dimensional modeling.

18
Chapter 1 Database Concepts

Answer - Logical Model Primary Key Quiz

Addresses Subscribers Claims

Subscriber_No Subscriber_No *
PK
Claim_ID PK
PK Member_No
Subscriber_No
Member_No
Services

PK Service_Code Claim_Service

Services

PK Provider_No
Provider_Code

* The Subscribers table has a Subscriber_No, Member_No multi-column Primary Key

Above, we have a picture of s simple logical model. The logical modelers theoretically believe that this is the best
way to store, retrieve, and manipulate data for an insurance model. Each PK is unique. A Primary Key is always a
unique identifier for each row in that table, and it should never contain a NULL value.

19
Chapter 1 Database Concepts

Quiz - Logical Model Foreign Key Quiz

Addresses Subscribers Claims

Subscriber_No Subscriber_No *
PK
Claim_ID PK
PK Member_No
Subscriber_No
Member_No
Place a FK on Services
each column
that is a PK Service_Code Claim_Service
Foreign Key,
and draw a line
between the Services
PK and FK
PK Provider_No
relationship Provider_Code

* The Subscribers table has a Subscriber_No, Member_No multi-column Primary Key

A Foreign Key can have duplicate values, can be NULL, and it can be changed. Remember, both a Primary Key
or a Foreign Key can be a single column or a multi-column combination. Also, know that a PK and an FK column
can have different names, while still representing "like" data. Representing “like” data means the columns have
the same domain, which means they represent the same associated data.

20
Chapter 1 Database Concepts

Answer - Logical Model Foreign Key Quiz

Addresses Subscribers Claims


FK *
PK
Subscriber_No Subscriber_No Claim_ID PK
PK Member_No
Subscriber_No
Member_No
*
FK
Each FK
Services
can be
Non-Unique. PK Service_Code FK
Claim_Service
The PK/FK
relatiionship
means these Services
tables can
PK Provider_No FK
be joined. Provider_Code

A Foreign Key (FK) is a normal column(s) in a table that joins to a Primary Key (PK) column(s) in another table.
The PK/FK logical model is the relation between the tables, and that is why it is called a "Relational Model." A
Foreign Key can have duplicate values, can be NULL, and it can be changed. Notice that both a Primary Key or a
Foreign Key can be a single column or a multi-column combination. Also, notice that a PK and an FK column can
have different names, but they represent the same type of data. Since the two columns represent like data that is
joinable, they are from the same domain, which means they have the same range of values and data types.

21
Chapter 1 Database Concepts

Dimensional Model – One Fact Table and Many Dimension Tables


Fact Table Line_Order_Fact_Table
Small Dimension Table
LO_Order_Key
Small Dimension Table
LO_Line_Number Customer_Table
Part_Table LO_Cust_Key C_Cust_Key
P_Part_Key LO_Part_Key
Small Dimension Table
LO_Quantity
LO_Extended_Price Supplier_Table
Every dimensional model is composed of
single-table with a multipart key, called a LO_Supp_Key S_Supp_Key
fact table. The fact table is long, meaning LO_Order_Total_Price
Small Dimension Table
it has fewer columns, but millions of rows. LO_Discount
LO_Tax Date_Table
The fact table joins to a set of smaller LO_Order_Date D_Date_Key
tables called dimension tables. The
dimension tables are wide because they Each dimension table has a single-part primary key
have a lot of columns, but fewer rows. that corresponds exactly to one of the components
of the multipart key from the fact table.

Dimensional model design is best for business user reporting. Navigation paths are modeled and align with the
work process of users. Relational model design is best for business rules, but a Dimensional model emphasizes
usability. Tables in a Dimension define the way that that particular business uses the data. The model organizes
tables in a way that a given user or group of users think about the data.

22
Chapter 1 Database Concepts

Quiz – What is the Primary Key of the Fact Table?

Fact Table Line_Order_Fact_Table


Small Dimension Table
LO_Order_Key
Small Dimension Table
LO_Line_Number Customer_Table
Part_Table LO_Cust_Key C_Cust_Key
P_Part_Key LO_Part_Key
Small Dimension Table
LO_Quantity
A fact table has an M:M (many to LO_Extended_Price Supplier_Table
many) relationship. Fact tables should LO_Supp_Key S_Supp_Key
contain one or more numerical LO_Order_Total_Price
Small Dimension Table
measure, which are deemed "facts" of LO_Discount
the table. Each fact occurs for the LO_Tax Date_Table
combination of keys that define each LO_Order_Date D_Date_Key
column (tuple) in the table.

What is the Primary Key of the Fact table named Line_Order_Fact_Table?

23
Chapter 1 Database Concepts

Answer – What is the Primary Key of the Fact Table?

Fact Table Line_Order_Fact_Table


Small Dimension Table
LO_Order_Key
Small Dimension Table
LO_Line_Number Customer_Table
Part_Table LO_Cust_Key C_Cust_Key
P_Part_Key LO_Part_Key
Small Dimension Table
LO_Quantity
LO_Extended_Price Supplier_Table
Every dimensional LO_Supp_Key S_Supp_Key
model is composed of LO_Order_Total_Price
Small Dimension Table
single-table with a LO_Discount
multipart key, called a LO_Tax Date_Table
fact table. LO_Order_Date D_Date_Key

The Primary Key of the Fact Table is


LO_Cust_Key, LO_Part_Key, LO_Supp_Key, LO_Order_Date

The Primary Key of the Fact Table is a multi-column Primary Key of the four columns in the
Line_Order_Fact_Table colored in blue. They are LO_Cust_Key, LO_Part_Key, LO_Supp_Key, LO_Order_Date.

24
Chapter 1 Database Concepts

Quiz – Name the Foreign Keys

Fact Table Line_Order_Fact_Table


Small Dimension Table
LO_Order_Key
Small Dimension Table
LO_Line_Number Customer_Table
Part_Table LO_Cust_Key C_Cust_Key
P_Part_Key LO_Part_Key
Small Dimension Table
LO_Quantity
LO_Extended_Price Supplier_Table
Place a FK on LO_Supp_Key S_Supp_Key
each column LO_Order_Total_Price
Small Dimension Table
that is a LO_Discount
Foreign Key LO_Tax Date_Table
LO_Order_Date D_Date_Key

Name the Foreign Keys (FK) you see above.

25
Chapter 1 Database Concepts

Answer – Name the Foreign Keys

Fact Table Line_Order_Fact_Table


Small Dimension Table
LO_Order_Key
Small Dimension Table
LO_Line_Number Customer_Table
Part_Table LO_Cust_Key C_Cust_Key FK
FK P_Part_Key LO_Part_Key
Small Dimension Table
LO_Quantity
Each dimension table has LO_Extended_Price Supplier_Table
a single-part primary key LO_Supp_Key S_Supp_Key FK
that corresponds exactly LO_Order_Total_Price
Small Dimension Table
to one of the components LO_Discount
of the multipart Primary LO_Tax Date_Table
Key from the fact table. LO_Order_Date D_Date_Key FK

The Primary Key of the Fact Table is


LO_Cust_Key, LO_Part_Key, LO_Supp_Key, LO_Order_Date

The FK boxes display each Foreign Key column. The fact table has a multi-column Primary Key. Each
Dimension table has a corresponding Foreign Key so the dimension can join to the fact. Notice, each dimension
table joins to the fact table on a different column.

26
Chapter 1 Database Concepts

What is a Star Schema?

Referred to as a
Dimension Snowflake
Table schema.
Dimension Dimension
Table Table

A dimension contains
A star schema model is reference information about
a simple form of a Fact Table the fact, such as date,
dimensional model. product, or customer.

Dimension . A fact is an event Dimension


Table that is counted or Table
measured, such as
a product sale or a
customer login.

A star schema model is for a data warehouse or business intelligence analytics. A star schema model is a simple
form of a dimensional model. Data organizes into a single fact table and multiple dimension tables. A fact is an
event that is counted or measured, such as a product sale or a customer login. A dimension contains reference
information about the fact, such as date, product, or customer. A Star Schema is also referred to as a Snowflake
schema.

27
Chapter 1 Database Concepts

Normalization

The process of placing columns in the correct tables is called normalization. There are three
major forms of normalization. You start by trying to get to first normal form. You then attempt
second normal form, and your end goal is to get to third normal form.

First Normal Form (1NF) means that columns (attributes) must not repeat within a table. No
repeating groups such as Monday_Sales, Tuesday_Sales, Wednesday_Sales, etc.

Second Normal Form (2NF) means that all columns relate to the entire Primary Key and not just
a portion of the Primary Key. Columns can also relate to each other, such as Dept_No and
Department_Name, by both belonging to the same table. A table that has a single column Primary
Key is always said to be in at least 2NF.

Third Normal Form (3NF) means all columns must relate to the Primary Key and not to each
other. In other words, all columns must relate to the primary key, the whole key, and nothing but
the key, so help you, Codd! It was a gentleman that worked for IBM named Codd that came up with
the normalization techniques.

If a table is in 3NF, it is automatically also in 2NF and 1NF. If a table only achieves 2NF, then it automatically
qualifies to also be in 1NF. Normalization can be confusing, but the next couple of pages will explain it well.

28
Chapter 1 Database Concepts

A Denormalized Table

Product_Daily_Sales_Table
If a business
ProdID Mon ______
Tue Wed Thu Fri Sat Sun
performs best _______ ______ ______ _____ _____ _____ _____
when product 1 90.00 95.44 80.01 91.74 97.48 90.61 81.73
sales are 2 75.50 78.34 85.56 73.76 73.64 80.46 73.74
tracked daily, 3 56.37 59.50 59.57 57.55 51.88 52.67 47.65
and then 4 64.90 61.77 63.65 68.66 41.77 67.75 78.67
compared from 5 87.53 77.16 80.63 97.79 70.76 70.13 87.79
day to day, a 6 92.10 82.23 95.22 89.43 72.93 78.42 99.53
denormalized 7 20.45 27.41 30.46 19.80 23.31 40.56 29.40
table if fine. 8 15.23 25.53 19.83 35.48 24.73 29.43 25.68

This table is completely denormalized. It has not even reached


First Normal Form (1NF) because it has repeating groups. There
are seven groups ranging from Mon-Sun.

The table above is completely denormalized. It has not even reached the First Normal Form (1NF) because it has
repeating groups that are ranging from Mon-Sun. The three major forms of normalization are First Normal Form
(1NF), Second Normal Form (2NF), and Third Normal Form (3NF). If a table is in 3NF, it is automatically also in
2NF and 1NF. If a table only achieves 2NF, then it automatically qualifies to also be in 1NF.

29
Chapter 1 Database Concepts

Getting to First Normal Form

Prod_Sizes_Price_Table To bring this table into first normal form, we split the
This table is table into two tables.
not in first ProdID ____________
______ Sizes Price
______
normal form 1 Sm, Med, Lg 80.01 Prod_Price_Table Prod_Sizes_Table
because the 2 Sm, Med 85.56
(sizes) column ProdID Price ProdID ________
______ Sizes
3 Med 59.57
contains
multiple 4 Sm 63.65 1 80.01 1 Sm
values. 5 XLG 80.63 2 85.56 1 Med
3 59.57 1 Lg
4 63.65 2 Sm
5 80.63 2 Med
3 Med
4 Sm
5 XLG

The table on the left is not in first normal form, but once we split it into two tables, we have reached the first
normal form.

30
Chapter 1 Database Concepts

A Table that is in Second Normal Form

Employee_Table

Emp_Nbr First_Name __________


________ __________ Last_Name _________
Salary ________
Dept_No Dept_Name
___________
1 Abbie Gonzales 80000.00 100 Sales
2 Billy Patel 70000.00 100 Sales
3 Carl Smith 76000.00 100 Sales
4 David Chao 120000.00 200 Mrkt
5 Edwin Sidorski 58000.00 300 HR

This table has reached second normal form. This is because of


the column (Dept_Name). The Dept_Name column is
dependent on the Dept_No more than the Emp_Nbr primary key.

Because this table has reached second normal form, it


automatically qualifies as also reaching first normal form.

Second Normal Form (2NF) means that all columns relate to the entire Primary Key and not just a portion of the
Primary Key. Columns can also relate to each other.

31
Chapter 1 Database Concepts

Tables Reaching Third Normal Form

Employee_Table Department_Table

Emp_Nbr First_Name __________


Last_Name _________
Salary ________
Dept_No Dept_No ___________
________ Dept_Name
________ __________
1 Abbie Gonzales 80000.00 100 100 Sales
2 Billy Patel 70000.00 100 200 Mrkt
3 Carl Smith 76000.00 100 300 HR
4 David Chao 120000.00 200
5 Edwin Sidorski 58000.00 300

This tables have reached third normal form. In each table, the columns
all relate to the primary key, the whole key, and nothing but the key.

A Second Normal Form (2NF) means that all columns relate to the entire Primary Key and not just a portion of the
Primary Key. In the Second Normal Form, columns can also relate to each other. Third Normal Form (3NF) means
all columns must relate to only the Primary Key and not to each other. In other words, all columns must relate to
the primary key, the whole key, and nothing but the key.

32
Chapter 1 Database Concepts

Dimensional Modeling
Dimension Table
Dimension Table
Product_Id Product_Name Price
Store_Id Store_Location
_______ ______________ Address
______ _________ ______________ ______

11223344 Dayton, OH Main St. 1000 Nexus 1,000.32


99998888 Phoenix, AZ Apple St. 2000 SmartCompress 5,000.99
3000 Super Join Builder 1,300.00

Fact Table

________ _________
Store_ID Product_ID _________
Sale_Date __________
Daily_Sales
Dimensional models are 11223344 1000 2014-10-16 5240.50
also referred to as Star 11223344 2000 2014-10-16 6023.77
Schema or
11223344 3000 2014-10-16 6787.21
Star Join Schema model.
99998888 1000 2014-10-16 6223.97
99998888 2000 2014-10-16 5543.22
99998888 3000 2014-10-16 6123.85

Dimension Table
Sale_Date ______________
_________ Quarter_of_Year ___________
Day_of_Week _____________
Week_of_Month
2014-10-15 4 Wednesday 2
2014-10-16 4 Thursday 2

Dimensional modeling contains two types of tables. They are fact and dimension tables. A fact table will have
fewer columns, but millions of rows potentially. There is only one fact table because it is the main table that
contains good information such as the products a company sells, sales totals, quantities, and the dates, times, and
places they sold. There will be many dimension tables. Dimension tables generally have more columns, but fewer
rows. Each dimension table that joins back to the main fact table provides detail about the actual place or product
or date the sale took place. Benefits include easy querying, flexibility, and improved performance.

33
Chapter 1 Database Concepts

What are the Advantages of a Normalized vs. Dimensional Model?

Order_Table Dimension
Order_No CustNo Order_Date Order_Total Table
________ _______ __________ ___________
1000 1 2019-01-04 299.50
1001 2 2019-01-05 1254.67 Dimension Dimension
1002 4 2019-01-08 67.00 Table
Table
1003 4 2019-01-09 12.50
Fact Table
Customer_Table
Normalized data
is optimized for CustNo
______ CustName
______________
entity level 1 Benny's Boots De-normalized data is optimized for driving
transactions, 2 Dave's Dogs decision making by answering business questions.
such as row 3 Hanna's Hamsters
inserts, updates, 4 Frank's Frogs Dimension denormalization supports dimensional
and deletes. 5 Rachel's Rabbits modeling's top two priorities; speed and simplicity.

Normalized data optimizes for entity-level transactions, such as row inserts, updates, and deletes. De-normalized
data optimizes for driving decision making by answering business questions. Dimension denormalization supports
dimensional modeling's top two priorities; speed and simplicity.

34
Chapter 1 Database Concepts

Logical Data Model vs. Physical Data Model

A logical model is 100% A physical data model is the technical


technology independent. solution. It is specific to a given
database software and hardware.
Logical models can be
either Relational or Physical models can be either
Dimensional. Relational or Dimensional.

You would use the same You would use a different physical
logical model on a model on Teradata. A different physical
Teradata, Oracle, DB2, model on Oracle. A different physical
SQL Server or Amazon model on DB2. A different physical
Redshift system. model on SQL Server, etc.

Logical models are not Physical models are always


implemented on any implemented on hardware. They are
hardware, because they the CREATE Table statements, including
are theory oriented. the indexing and table type decisions.

Great logical modelers get a room full of a company's business leaders in a conference room, and they use a
whiteboard or flip chart to build the tables with the proper columns for each table. The logical model is merely
theoretically how the business wants to organize their data. When finished with the logical model, a leader gives
the logical model to the IT staff. The physical modelers and the DBA team build the tables on a Teradata system
(or another system) with proper indexing so that reports, Ad Hoc queries, joins and tactical queries run as fast and
efficiently as possible.

35
Chapter 1 Database Concepts

Logical Data Model vs. Physical Data Model Example

CREATE MULTISET Table Claims


Subscribers Claims (Claim_ID INTEGER NOT NULL
FK
,Subscriber_No INTEGER
PK PK
Subscriber_No Claim_ID ,Member_No INTEGER
Member_No ,Claim_Date DATE
Subscriber_No ,Claim_Service SMALLINT
Member_No FK ,Claim_Amt Decimal (10,2)
Services ,Provider_No INTEGER
) Unique Primary Index (Claim_ID)
Claim_Date
PK Service_Code Claim_Service FK CREATE TABLE [DB].[dbo].[CLAIMS]
Service_Desc ([CLAIM_ID] INTEGER NOT NULL,
Claim_Amt [SUBSCRIBER_NO] INTEGER NULL,
Services [MEMBER_NO] SMALLINT NULL,
[CLAIM_DATE] DATE NULL,
PK Provider_Code Provider_No FK [CLAIM_SERVICE] SMALLINT NULL,
[CLAIM_AMT] DECIMAL(12,2) NULL,
Provider_Name
[PROVIDER_NO] SMALLINT NULL)

Same logical model for Teradata and SQL Server Teradata physical (top) SQL Server (bot)

The logical model for how the company wants to organize its tables is the example above on the left. The company
has chosen a relational model that is in third normal form but could have chosen a dimensional model. The
physical model (for the Claims table only) displays above on the right. The top example is the physical model for a
Teradata system. The bottom example is a physical model for Microsoft SQL Server.

36
Chapter 1 Database Concepts

Data Warehouses are Designed for Analytics

Building a date warehouse is a


Oracle, SQL Server process, not a product. A data warehouse is a
and IBM's DB2 are Oracle data repository,
teradata
databases designed to specially constructed,
Data Warehouse
handle transactions. where data is
organized so that it
OLTP databases insert Microsoft can be easily accessed
single records and are SQL Server by end users for a
not designed to variety of applications.
handle, or structured, teradata loads data
A data warehouse starts small,
to do analytics well. prove it's value, and then grows. directly from
IBM Data warehouses answer operational systems
Analytics are designed DB2 business questions not making the data timely.
for a data warehouse. previously possible.

Data warehouses store enterprise-wide detailed data for analysis. This analysis allows for a strategic and tactical
business decision. A data warehouse is a "Decision Support System" or DSS. Teradata allows for transactions
from OLTP systems to immediately load into a Teradata system so that real-time tactical queries can run. A tactical
query is designed to return an answer in seconds. OLTP systems hold days, weeks, or months of data, but a data
warehouse holds years to decades of information.

37

You might also like