Introduction To Teradata and Teradata Architecture

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 66

Introduction to Teradata and Teradata

Architecture

Agenda
Teradata Overview
Teradata Components, Architecture and Teradata Features
Teradata Data Dictionary and Objects
Teradata Database, User and Space
Teradata Data Recovery and Protection

Teradata Overview

Teradata Overview
Teradata is a Relational Database Management System (RDBMS)
Designed specifically for DW and DSS
Open, UNIX-MP-RAS or NT-based system platforms
Compliant with ANSI industry standards
Runs on single (SMP) or multiple (MPP) nodes
Database server to client applications throughout the enterprise
Uses parallelism to manage terabytes of data
Shared nothing Architecture

Teradata Overview
Enormous capacity
Billions of rows
Terabytes of data
High-performance parallel processing
Single database server for multiple clients Single Version of the Truth
Network and mainframe connectivity
Industry standard access language (SQL)
Manageable growth via modularity
Fault tolerance at all levels of hardware and software

Teradata Components, Architecture and Features

Components and Architecture

Components and Architecture

Components and Architecture

Teradata Functional Overview

Teradata Features

Teradata Data Dictionary and Objects

The DD/D ...

is an integrated set of system tables

contains definitions of and information about all objects in the system

is entirely maintained by the RDBMS

is data about the data or metadata

is distributed across all AMPs like all tables

may be queried by administrators or support staff

is accessed via Teradata supplied views

Examples of DD/D views:


DBC.Tables - information about all tables
DBC.Users - information about all users
DBC.AllRights

- information about access rights

DBC.AllSpace

- information about space utilization

HELP Commands Teradata SQL Extension


Databases and Users:
HELP

DATABASE

HELP

USER

Customer_Service ;

Dave_Jones ;

Tables, Views, and Macros:


HELP

TABLE

Employee ;

HELP

VIEW

Emp;

HELP

MACRO Payroll_3;

HELP

COLUMN Employee.*;

Employee.last_name;
Emp.* ;
Emp.last;
HELP

INDEX

Employee;

HELP

STATISTICS

HELP

CONSTRAINT

Employee;
Employee.over_21;

Example of HELP DATABASE


Command:

HELP DATABASE Customer_Service;


*** Help information returned. 15 rows.
*** Total elapsed time was 1 second.
Table/View/Macro name
Contact T
?
Customer T
?
Cust_Comp_Orders V
Cust_Pend_Orders V
Cust_Order_ix I
?
Department
T
?
Employee T
?
Employee_Phone
T
Job T
?
Location T
?
Location_Employee T
Location_Phone
T
Orders
T
?
Set_Ansidate_on
M
Set_Integerdate_on M

Kind Comment
?
?

?
?
?
?
?

SHOW Command Teradata SQL Extension


SHOW commands display how an object was created.
Command
Returns statement
SHOW TABLE tablename;
CREATE TABLE statement
SHOW VIEW
viewname;CREATE VIEW ...
SHOW MACRO macroname;
CREATE MACRO ...
SHOW TRIGGER
triggername;
CREATE TRIGGER
SHOW PROCEDURE procedurename;
CREATE PROCEDURE
SHOW TABLE Employee;
CREATE SET TABLE CUSTOMER_SERVICE.Employee, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
employee_number INTEGER,
manager_employee_number INTEGER,
department_number INTEGER,
job_code INTEGER,
:
salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );

EXPLAIN Facility Teradata SQL Extension


The EXPLAIN modifier in front of any SQL statement generates an English translation of the Parsers
plan.
The request is fully parsed and optimized, but not actually executed.
EXPLAIN returns:
Text showing how a statement will be processed (a plan)
An estimate of how many rows will be involved

A relative cost of the request (in units of time)

This information is useful for:


predicting row counts

predicting performance

testing queries before production

analyzing various approaches to a problem EXPLAIN

EXPLAIN SELECT last_name, department_number FROM Employee ;


Explanation (partial):
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.Employee by way of an all-rows
scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of
Spool 1 is estimated to be 24 rows. The estimated time for this step is 0.15 seconds.

Teradata Database, User and Space

Storing and Accessing Data Rows

Storing Rows
AMP

AMP

AMP

AMP

Table A rows
Table B rows

The rows of every table are distributed among all AMPs

Each AMP is responsible for a subset of the rows of each table.

Ideally, each table will be evenly distributed among all AMPs.

Evenly distributed tables result in evenly distributed workloads.

The uniformity of distribution of the rows of a table depends on the choice of the
Primary Index.

Creating a Primary Index

A Primary Index is defined at table creation.

It may consist of a single column, or a combination of columns

Limit of 16 columns with V2R4.1 and prior releases

Limit of 64 columns with V2R5.

UPI

NUPI

CREATE TABLE sample_1


(col_a
INTEGER
,col_b
INTEGER
,col_c
INTEGER)
UNIQUE PRIMARY INDEX (col_b);

If the index choice of column(s) is unique,


we call this a UPI (Unique Primary Index).

CREATE TABLE sample_2


(col_x
INTEGER
,col_y
INTEGER
,col_z
INTEGER)
PRIMARY INDEX (col_x);

If the index choice of column(s) isnt unique,


we call this a NUPI (Non-Unique Primary
Index).

Note: Changing the choice of Primary Index


requires dropping and recreating the table.

A UPI choice will result in even distribution


of the rows of the table across all AMPs.

A NUPI choice will result in even distribution


of the rows of the table proportional to the
degree of uniqueness of the index.

Primary Index Values

The value of the Primary Index for a specific row determines the AMP assignment for
that row.

This is done using a hashing algorithm.

PE
Row assignment
Row access

AMP

Secondary index access


Full table scans

Hashing
Algorithm

AMP

Accessing the row by its Primary Index value is:

Other table access techniques:

PI Value

always a one-AMP operation


the most efficient way to access a row

AMP

Accessing Via a Unique Primary Index


A UPI access is a one-AMP operation which may access at most a single row.
CREATE TABLE sample_1
(col_a
INTEGER
,col_b
INTEGER
,col_c
INTEGER)
UNIQUE PRIMARY INDEX (col_b);

PE

SELECT col_a
,col_b
,col_c
FROM
sample_1
WHERE col_b = 345;

UPI = 345
Hashing
Algorithm

AMP

AMP

col_a col_b

col_c

AMP

col_a col_b

col_c

col_a col_b

123

345

567

234

456

678

col_c

Accessing Via a Non-Unique Primary Index


A NUPI access is a one-AMP operation which may access multiple rows.
CREATE TABLE sample_2
(col_x
INTEGER
,col_y
INTEGER
,col_z
INTEGER)
PRIMARY INDEX (col_x);

PE

NUPI = 25

SELECT col_x
,col_y
,col_z
FROM
sample_2
WHERE col_x = 25;

Hashing
Algorithm

AMP

Both UPI and NUPI


accesses are one
AMP operations.

AMP

col_x col_y

col_z

AMP

col_x col_y

col_z

col_x col_y

col_z

10

30

20

50

70

10

30

25

55

30

80

35

40

25

60

30

80

Primary Keys and Primary Indexes

Indexes are conceptually different from keys.

A PK is a relational modeling convention which allows each row to be uniquely identified.

A PI is a Teradata convention which determines how the row will be stored and accessed.

A significant percentage of tables may use the same columns for both the PK and the PI.

A well-designed database will use a PI that is different from the PK for some tables.

Primary Key

Primary Index

Logical concept of data modeling

Physical mechanism for access and storage

Teradata doesnt need to recognize

Each table must have exactly one primary index

No limit on number of columns

16 column limit (V2R4.1); 64 column limit (V2R5)

Documented in data model

Defined in CREATE TABLE statement

(Optional in CREATE TABLE)


Must be unique

May be unique or non-unique

Identifies each row

May be unique or non-unique

Values should not change

Values may be changed (Delete + Insert)

May not be NULL requires a value

May be NULL

Does not imply an access path

Defines most efficient access path

Chosen for logical correctness

Chosen for physical performance

Duplicate Rows
A duplicate row is a row of a table
whose column values are all identical to
another row in the same table.

Duplicate Rows

col_a col_b

col_c

20

50

25

50

25

50

Because a PK uniquely identifies each row, ideally a relational table should not have
duplicate rows!
The ANSI standard, however, permits duplicate rows for specialized situations, thus
Teradata permits them as well.
You may select whether your table will or will not allow them.
The Teradata default

The ANSI default

CREATE SET TABLE table_A


:
:

CREATE MULTISET TABLE table_B


:
:

Checks for * and disallows duplicate rows.

Doesnt check for and allows duplicate rows.

* Note: If a UPI is selected on a SET table, the duplicate row check is replaced by a
check for duplicate index values.

Row Distribution Using a UPI Case 1

Order

Notes:
Often, but not always, the PK column(s) will be
used as a UPI.
PI values for Order_Number are known to be
unique (its a PK).
Teradata will distribute different index values
evenly across all AMPs.
Resulting row distribution among AMPs is very
uniform.
Assures maximum efficiency for parallel
operations.

AMP

o_#

c_#

7202
7415

AMP

o_dt o_st

o_#

c_#

4/09

7325

4/13

AMP

o_dt o_st

o_#

c_#

4/13

7188

7103

4/10

7225

7402

4/16

AMP

o_dt o_st

o_#

c_#

o_dt o_st

4/13

7324

4/13

4/15

7384

4/12

Row Distribution Using a NUPI Case 2

Order

Notes:
Customer_Number may be the preferred access
column for ORDER table, thus a good index
candidate.
Values for Customer_Number are somewhat nonunique.
Choice of Customer_Number is therefore a NUPI.
Rows with the same PI value distribute to the same
AMP.
Row distribution is less uniform or skewed.

AMP

o_#

c_#

7325

AMP

o_dt o_st

o_#

c_#

4/13

7384

7202

4/09

7225

4/15

AMP

AMP

o_dt o_st

o_#

c_#

o_dt o_st

4/12

7402

4/16

7103

4/10

7324

4/13

7415

4/13

7188

4/13

Row Distribution Using a Highly Non-Unique Primary Index


(NUPI) Case 3
Order

Notes:
Values for Order_Status are highly nonunique.
Choice of Order_Status column is a NUPI.
Only two values exist, so only two AMPs
will ever be used for this table.
Table will not perform well in parallel
operations.
Highly non-unique columns are poor PI
choices generally.
The degree of uniqueness is critical to
efficiency.

AMP

AMP

AMP

AMP

o_dt o_st

o_#

c_#

4/16

7103

4/10

7202

4/09

7324

4/13

7225

4/15

7325

4/13

7415

4/13

7188

4/13

7384

4/12

o_#

c_#

7402

o_dt o_st

Primary Index
Mechanics

Hashing Primary Index Values


SQL with primary index values
and data.

PARSER

PI value = 38

Hashing
Algorithm

For example:
Assume PI value is 38

Hashing
Algorithm

Row Hash
DSW

PI values
and data

1177 7C3C
DSW

Message Passing Layer (Hash Maps)


AMP 0

AMP 1

...

...

AMP x

AMP n - 1

Hash Maps
AMP n

AMP #

Data Table
Row ID
Row Hash

Row Data

Uniq Value

The MPL uses the DSW of 1177


and uses this value to locate
bucket #1177 in the Hash Map.

x '00000000'

RH
Data

x'1177 7C3C' 0000 0001

x 'FFFFFFFF'

Summary

38

Bucket# 1177 contains the AMP


number that has this hash
value effectively the AMP with
this row.

Hashing Down to the AMPs


Index value(s)

hashing algorithm

Row Hash
DSW or
Hash Bucket #

Hash Map

AMP #

{
{
{
{

The hashing algorithm is designed to insure even distribution of


unique values across all AMPs.
Different hashing algorithms are used for different international
character sets.

A Row Hash is the 32-bit result of applying a hashing algorithm to


an index value.
The DSW or Hash Bucket is represented by the high order 16 bits
of the Row Hash.

A Hash Map is uniquely configured for each system.


It is a array of 65,536 entries (buckets) which associates bucket
numbers with specific AMPs.

Two systems with the same number of AMPs will have the same
Hash Map.
Changing the number of AMPs in a system requires a change to
the Hash Map.

Identifying Rows
A row hash is not adequate to uniquely identify a row.
Consideration #1
1254
A Row Hash = 32 bits = 4.2 billion possible values
Because there is an infinite number of possible
data values, some data values will have to share
the same row hash.

7769

Data values input

Hash Algorithm
10A2 2936

10A2 2936

Hash Synonyms

(Dave)
'Smith'

NUPI Duplicates

Consideration #2
A Primary Index may be non-unique (NUPI).

(John)
'Smith'

Different rows will have the same PI value and thus


the same row hash.

Hash Algorithm

0016 5557

Conclusion
A row hash is not adequate to uniquely identify a row.

0016 5557

Rows have
same hash

The Row ID
TO UNIQUELY IDENTIFY A ROW, WE ADD A 32-BIT UNIQUENESS VALUE.
THE COMBINED ROW HASH AND UNIQUENESS VALUE IS CALLED A ROW ID.
Row ID
Each stored row
has a Row ID as a
prefix.

Rows are logically


maintained in Row
ID sequence.

Row Hash
(32 bits)

Uniqueness Id
(32 bits)

Row ID

Row Data

Row ID

Row Data

Row Hash

Unique ID

Emp_No

3B11 5032
3B11 5032
3B11 5032
3B11 5033
3B11 5034
3B11 5034
:

0000 0001
0000 0002
0000 0003
0000 0001
0000 0001
0000 0002
:

1018
1020
1031
1014
1012
1021
:

Last_Name
Reynolds
Davidson
Green
Jacobs
Chevas
Carnet
:

First_Name
Jane
Evan
Jason
Paul
Jose
Jean
:

Secondary Indexes and Table Scans

Secondary Indexes
There are 3 general ways to access a table:
Primary Index access (one AMP access)
Secondary Index access
Full Table Scan

(two or all AMP access)

(all AMP access)

A secondary Index provides an alternate path to the rows of a table.


A table can have from 0 to 32 secondary indexes.
Secondary Indexes:
Do not effect table distribution.
Add overhead, both in terms of disk space and maintenance.
May be added or dropped dynamically as needed.
Are chosen to improve table performance.

Choosing a Secondary Index


A Secondary Index may be defined ...
at table creation

(CREATE TABLE)

following table creation

(CREATE INDEX)

may be up to 16 columns (V2R4.1); 64 columns (V2R5.0)


NUSI

USI
If the index choice of column(s) is
unique, it is called a USI.
Unique Secondary Index)

If the index choice of column(s) is nonunique, it is called a NUSI.


Non-Unique Secondary Index

Accessing a row via a USI is a 2 AMP


operation.

Accessing row(s) via a NUSI is an all AMP


operation.

CREATE UNIQUE INDEX

CREATE INDEX

(Employee_Number) ON Employee;

(Last_Name) ON Employee;

Notes:
Secondary Indexes cause an internal sub-table to be built.

Dropping the index causes the sub-table to be deleted.

Unique Secondary Index (USI) Access


Message Passing Layer

Create USI
CREATE UNIQUE INDEX
(Cust) ON Customer;

AMP 1

AMP 2

USI Subtable

Access via USI

RowID
244, 1
505, 1
744, 4
757, 1

SELECT *
FROM
Customer
WHERE Cust = 56;

Cust
74
77
51
27

RowID
884, 1
639, 1
915, 9
388, 1

AMP 3

USI Subtable
RowID
135, 1
296, 1
602, 1
969, 1

Cust
98
84
56
49

AMP 1

778

AMP 2

Base Table
Table ID
100

Row Hash USI Value


602

to MPL

56

Cust
31
40
45
95

RowID
638, 1
640, 1
471, 1
778, 3

RowID
175, 1
489, 1
838, 4
919, 1

Cust
37
72
12
62

RowID
107, 1
717, 2
147, 2
822, 1

Message Passing Layer

USI Value = 56
Hashing
Algorithm

RowID
288, 1
339, 1
372, 2
588, 1

USI Subtable

Row Hash Unique Val

100

Customer
Table ID = 100

USI Subtable

RowID
555, 6
536, 5
778, 7
147, 1

Table ID

PE

AMP 4

RowID Cust Name


USI
107, 1 37 White
536, 5 84 Rice
638, 1 31 Adams
640, 1 40 Smith

Phone
NUPI
555-4444
666-5555
111-2222
222-3333

AMP 3

Base Table
RowID Cust Name
USI
471, 1 45 Adams
555, 6 98 Brown
717, 2 72 Adams
884, 1 74 Smith

Phone
NUPI
444-6666
333-9999
666-7777
555-6666

AMP 4

Base Table
RowID Cust Name
USI
147, 1 49 Smith
147, 2 12 Young
388, 1 27 Jones
822, 1 62 Black

Phone
NUPI
111-6666
777-4444
222-8888
444-5555

Base Table
RowID Cust Name
USI
639, 1 77 Jones
778, 3 95 Peters
778, 7 56 Smith
915, 9 51 Marsh

Phone
NUPI
777-6666
555-7777
555-7777
888-2222

Non-Unique Secondary Index (NUSI) Access


Message Passing Layer

Create NUSI
CREATE INDEX (Name) ON
Customer;

AMP 1

AMP 2

NUSI Subtable

Access via NUSI


SELECT *
FROM
Customer
WHERE Name = 'Adams';

RowID
432, 8
448, 1
567, 3
656, 1

Name
Smith
White
Adams
Rice

RowID
640, 1
107, 1
638, 1
536, 5

AMP 3

NUSI Subtable
RowID Name
432, 3 Smith
567, 2 Adams
852, 1

Brown

RowID
884, 1
471, 1
717, 2
555, 6

AMP 4

NUSI Subtable
RowID
432, 1
448, 4
567, 6
770, 1

Name
Smith
Black
Jones
Young

RowID
147, 1
822, 1
338, 1
147, 2

NUSI Subtable
RowID
155, 1
396, 1
432, 5
567, 1

Name
Marsh
Peters
Smith
Jones

RowID
915, 9
778, 3
778, 7
639, 1

PE
Customer
NUSI Value = 'Adams'
Table ID = 100
Hashing
Algorithm

Table ID
100

to MPL

AMP 2

Base Table

Row Hash NUSI Value


567

AMP 1

Adams

RowID Cust Name


NUSI
107, 1 37 White
536, 5 84 Rice
638, 1 31 Adams
640, 1 40 Smith

Phone
NUPI
555-4444
666-5555
111-2222
222-3333

AMP 3

Base Table
RowID Cust Name
NUSI
471, 1 45 Adams
555, 6 98 Brown
717, 2 72 Adams
884, 1 74 Smith

Phone
NUPI
444-6666
333-9999
666-7777
555-6666

AMP 4

Base Table
RowID Cust Name
NUSI
147, 1 49 Smith
147, 2 12 Young
388, 1 27 Jones
822, 1 62 Black

Phone
NUPI
111-6666
777-4444
222-8888
444-5555

Base Table
RowID Cust Name
NUSI
639, 1 77 Jones
778, 3 95 Peters
778, 7 56 Smith
915, 9 51 Marsh

Phone
NUPI
777-6666
555-7777
555-7777
888-2222

Full Table Scans


Every row of the table must be read.
All AMPs scan their portion of the table in parallel.
Fast and efficient on Teradata due to parallelism.
Full table scans typically occur when either:
An index is not used in the query
An index is used in a non-equality test
Customer
Cust_ID
USI

Cust_Name

Cust_Phone
NUPI

Examples of Full Table Scans:


SELECT * FROM Customer WHERE Cust_Phone LIKE '524-_ _ _ _';
SELECT * FROM Customer WHERE Cust_Name = 'Davis';
SELECT * FROM Customer WHERE Cust_ID > 1000;

Teradat Data Recovery and Protection

Data Protection Features


Facilities that provide system-level protection
Disk Arrays
RAID data protection (e.g., RAID 1)
Redundant SCSI buses and array controllers
Cliques and Vproc Migration
SMP or O.S. failures - Vprocs can migrate to other nodes within the clique.
Facilities that provide Teradata DB protection
Locks

provides data integrity

Fallback

provides data access with a down AMP

Down AMP Recovery Journal

fast recovery of fallback rows for AMPs

Transient Journal

automatic rollback of aborted transactions

Permanent Journal

optional before and after-image journaling

RAID Technologies
RAID - Redundant Array of Independent Disks
RAID technology provides data protection at the disk drive level. With RAID 1
and RAID 5 technologies, access to the data is continuous even if a disk
drive fails.
RAID technologies available with Teradata:
RAID 1

For every disk there is a mirror disk available.

RAID 5

For every 3 disks there is a parity disk available.If any failure


happened at the disk level the parity disk will be used.

RAID 1 Mirroring
2 Drive Groups each with 1 mirrored pair of disks
Operating system sees 2 logical disks (LUNs) or volumes
If LUN 0 has more activity , more disk I/Os occur on the first two drives in
the array.
Disk Array Controller
LUN 1

LUN 0
Disk 1

Mirror 1

Disk 3

Mirror 3

Block A0

Block A0

Block B0

Block B0

Block A1

Block A1

Block B1

Block B1

Block A2

Block A2

Block B2

Block B2

Block A3

Block A3

Block B3

Block B3

2 Drive Groups each with 1 pair of


mirrored disks

If physical drives are 36 GB each, then each logical unit


(LUN) or volume is effectively 36 GB.

Characteristics

RAID 1
Summary

data is fully replicated


striped mirroring is possible with multiple pairs of disks in a drive group
transparent to operating system

Advantages

maximum data availability


read performance gains
no performance penalty with write operations
fast recovery and restoration

Disadvantages

50% of disk space is used for mirrored data

Summary

RAID 1 provides high data availability and performance, but storage costs are higher

RAID 5 Data Parity Protection


Sometimes referred to as 3 + 1 RAID 5.
When data is updated, parity is also updated.
new_data XOR current_data XOR current_parity = new_parity
Disk Array Controller
LUN 0
Disk 4

Disk 1

Disk 2

Disk 3

Block 0

Block 1

Block 2

Parity

Block 3

Block 4

Parity

Block 5

Block 6

Parity

Block 7

Block 8

Parity

Block 9

Block 10

Block 11

Block 12

Block 13

Block 14

Parity

If physical drives are 36 GB each, then each logical unit


(LUN) or volume is effectively 108 GB.

RAID 5 Summary
Characteristics
data and parity is striped and interleaved across multiple disks
XOR logic is used to calculate parity
transparent to operating system
Advantages
provides high availability with minimum disk space (e.g., 25%) used for
parity overhead
Disadvantages
write performance penalty
performance degradation during data recovery and reconstruction
Summary
High data availability with minimum storage cost
Good choice when majority of I/Os are reads and storage space is at a
premium

Cliques
Clique a set of SMPs that share a common set of disk arrays.
SMP001-4 AMPs
0

4 . 36

DAC-A

DAC-B

SMP001-5 AMPs
1

5 . 37

DAC-A

DAC-B

SMP002-4 AMPs
2

6 . 38

DAC-A

DAC-B

SMP002-5 AMPs
3

7 . 39

DAC-A

DAC-B

Teradata Vproc Migration


Vproc Migration vprocs in the failed node are started in the remaining
nodes within the clique.
SMP001-4 AMPs

SMP Fails

DAC-A

DAC-B

SMP001-5 AMPs
0

3 36

DAC-A

39

DAC-B

SMP002-4 AMPs
1

4 . 37

DAC-A

DAC-B

SMP002-5 AMPs
2

5 .

DAC-A

38

DAC-B

Fallback
A Fallback table is fully available in the event of an unavailable AMP.
A Fallback row is a copy of a Primary row which is stored on a different AMP.
AMP

Primary
rows
Fallback
rows

AMP

2
3

11

6
8

AMP

3
5

Benefits of Fallback
Permits access to table data during AMP off-line period.
Adds a level of data protection beyond disk array RAID.
Automatic restore of data changed during AMP off-line.
Critical for high availability applications.
Cost of Fallback
Twice the disk space for table storage.
Twice the I/O for Inserts, Updates and Deletes.

AMP

7
1

12
11

1
6

Note:
Loss of any two
AMPs in a cluster
causes RDBMS to
halt!

12

Fallback Clusters

A Fallback cluster is a defined set of AMPs across which fallback is implemented.

All Fallback rows for AMPs in a cluster must reside within the cluster.

Loss of one AMP in the cluster permits continued table access.

Loss of two AMPs in the cluster causes the RDBMS to halt.

Cluster 0

AMP 1

AMP 2

AMP 3

AMP 4

Primary
rows

62

27

34

22

50

78

19

14

38

Fallback
rows

34

14

19

38

22

62

50

27

78

Cluster 1

AMP 5

AMP 6

AMP 7

AMP 8

Primary
rows

41

66

58

93

20

88

45

17

37

72

Fallback
rows

93

72

88

45

17

37

58

41

20

66

Recovery Journal for Down AMPs


Recovery Journal is:

Automatically activated when an AMP is taken off-line.


Maintained by other AMPs in the cluster.
Totally transparent to users of the system.

While AMP is off-line

Journal is active.
Table updates continue as normal.
Journal logs Row IDs of changed rows for down-AMP.

When AMP is back on-line

Restores rows on recovered AMP to current status.


Journal discarded when recovery complete.

AMP 1

AMP 2

AMP 3

AMP 4

Vdisk
Primary
rows

62

27

34

22

50

78

19

14

38

Fallback
rows

34

14

19

38

22

62

50

27

78

Recovery
Journal

Row ID for 34

Row ID for 62

Row ID for 14

Transient Journal
Transient Journal provides transaction integrity

A journal of transaction before images.


Provides for automatic rollback in the event of TXN failure.
Is automatic and transparent.
Before images are reapplied to table if TXN fails.
Before images are discarded upon TXN completion.

Successful TXN
BEGIN TRANSACTION
UPDATE Row A
Before image Row A recorded (Add $100 to checking)
UPDATE Row B
Before image Row B recorded (Subtract $100 from savings)
END TRANSACTION
Discard before images

Failed TXN
BEGIN TRANSACTION
UPDATE Row A
Before image Row A recorded
UPDATE Row B
Before image Row B recorded
(Failure occurs)
(Rollback occurs) Reapply before images
(Terminate TXN)
Discard before images

Permanent Journal
The Permanent Journal is an optional, user-specified, system-maintained
journal which is used for recovery of a database to a specified point in time.
The Permanent Journal:
Is used for recovery from unexpected hardware or software disasters.
May be specified for ...
a.) One or more tables
b.) One or more databases
Permits capture of Before Images for database rollback.
Permits capture of After Images for database rollforward.
Permits archiving change images during table maintenance.
Reduces need for full table backups.
Provides a means of recovering NO FALLBACK tables.
Requires additional disk space for change images.
Requires user intervention for archive and recovery activity.

You might also like