SQL Tuning
SQL Tuning
SQL Tuning
2
Course Objectives
The tuning of SQL is more important than any other area of Oracle tuning,
and proper tuning of SQL statements can dramatically improve the
performance of an entire Oracle database.
This session is an effort for all Oracle professionals who must write SQL
statements in providing some of important guidelines, tips and techniques
for tuning any type of SQL.
3
Session Outline
Introduction
SQL Layout
SQL Writing Techniques
Index Design
Understanding Different Join Methods
Analyzing Query Plan
SQL Over Database Links
Partitioning
Schema Design
Analytical Functions
4
Introduction: Why Tuning?
5
Likely areas of Performance Problems
6
SQL Layout
Proper Layout aids...
Readability
Maintenance
7
Bad Layout…
SELECT /*+ ORDERED */
C.PRODUCT_GRP_ID, ROUND(sum(A.limit_base_amount)/1000000,3) ,
ROUND(sum(NVL(A.pndg_limit_base_amount,A.limit_base_amount))/1000000,3) ,
sum(ROUND(A.GROSS_EXP_BASE_AMOUNT/1000000,3))
FROM
PAR_FACILITY B, Par_Limit A,
PAR_BA_PRODUCTTYPE C
WHERE
B.BP_ID IN (SELECT COR.BP_ID FROM CORPORATE COR WHERE
COR.GROUP_BP_ID=:1.MO_INP_PRODGRP_RMET4021.BP_ID)
and 2= B.PRODUCT_GRP_ID
and C.PRODUCT_GRP_ID =2
and B.MDR_ID is null
and B.FACILITY_STATUS IN (1,2,3,5,6)
and B.PNDG_MDR_ID is null
and C.BA_PRODTYPE_ID = B.FACILITY_TYPE_ID
and A.owner_type=2
and A.owner_id=B.facility_id
and A.tenor_ind in ('L','B')
GROUP BY C.PRODUCT_GRP_ID
8
Improved Layout…
SELECT /*+ ORDERED */
PBPT.PRODUCT_GRP_ID,
ROUND(SUM(PL.LIMIT_BASE_AMOUNT)/1000000,3),
ROUND(SUM(NVL(PL.PNDG_LIMIT_BASE_AMOUNT,PL.LIMIT_BASE_AMOUNT))/1000000,3),
SUM(ROUND(PL.GROSS_EXP_BASE_AMOUNT/1000000,3))
FROM PAR_FACILITY PF,
PAR_LIMIT PL,
PAR_BA_PRODUCTTYPE PBPT
WHERE PF.BP_ID IN
(
SELECT COR.BP_ID
FROM CORPORATE COR
WHERE COR.GROUP_BP_ID=:1.MO_INP_PRODGRP_RMET4021.BP_ID
)
AND PF.PRODUCT_GRP_ID=2
AND PBPT.PRODUCT_GRP_ID=2
AND PF.MDR_ID IS NULL
AND PF.FACILITY_STATUS IN(1,2,3,5,6)
AND PF.PNDG_MDR_ID IS NULL
AND PBPT.BA_PRODTYPE_ID = PF.FACILITY_TYPE_ID
AND PL.OWNER_TYPE=2
AND PL.OWNER_ID=PF.FACILITY_ID
AND PL.TENOR_IND IN('L','B')
GROUP BY PBPT.PRODUCT_GRP_ID;
9
Bad Layout…
SELECT A.BP_ID,
A.SHORT_NAME,
A.LEGAL_NAME,
C.MNMNC,
TO_CHAR(B.RATING_DATE,'YYYYMMDD'),
E.MNMNC,
TO_CHAR(D.RATING_DATE,'YYYYMMDD')
FROM CORPORATE A,
PAR_COUNTERPARTY_RATING B,
SYSTEM_CODES C,
PAR_COUNTERPARTY_RATING D,
SYSTEM_CODES E
WHERE A.BP_ID=B.BP_ID(+)
AND B.RATING_AGENCY_ID(+)=2
AND B.RATING_VALUE=C.CODE_VAL(+)
AND C.CODE_ID(+)=1112
AND A.BP_ID=D.BP_ID(+)
AND D.RATING_AGENCY_ID(+)=6
AND D.RATING_VALUE=E.CODE_VAL(+)
AND E.CODE_ID(+)=1112
10
Improved Layout…
SELECT COR.BP_ID, /*COUNTERPARTY_ID */
COR.SHORT_NAME, /* SHORT_CODE */
COR.LEGAL_NAME, /* LEGAL_NAME */
SC_CRF.MNMNC, /* CRF_RATING */
TO_CHAR(PCR_CRF.RATING_DATE,'YYYYMMDD'), /* CRF_RATING_DATE */
SC_ERF.MNMNC, /* ERF_RATING */
TO_CHAR(PCR_ERF.RATING_DATE,'YYYYMMDD') /* ERF_RATING_DATE */
FROM CORPORATE COR,
PAR_COUNTERPARTY_RATING PCR_CRF,
SYSTEM_CODES SC_CRF,
PAR_COUNTERPARTY_RATING PCR_ERF,
SYSTEM_CODES SC_ERF
WHERE COR.BP_ID=PCR_CRF.BP_ID(+)
AND PCR_CRF.RATING_AGENCY_ID(+)=2 /* CRF (d_RatingAgencyCode)
*/
AND PCR_CRF.RATING_VALUE=SC_CRF.CODE_VAL(+)
AND SC_CRF.CODE_ID(+)=1112 /* d_RatingCode */
AND COR.BP_ID=PCR_ERF.BP_ID(+)
AND PCR_ERF.RATING_AGENCY_ID(+)=6 /* ERF (d_RatingAgencyCode)
*/
AND PCR_ERF.RATING_VALUE=SC_ERF.CODE_VAL(+)
AND SC_ERF.CODE_ID(+)=1112 /* d_RatingCode */
11
Guidelines for Layout
Statements should be indented so that individual lines are neatly aligned.
Each expression in the SELECT list should be on a different line.
Every table in the FROM list should be on a fresh line.
Oracle keywords (SELECT, INTO, FROM, WHERE, AND/OR, GROUP BY,
HAVING, CONNECT BY, FOR UPDATE OF, ORDER BY) should be placed
on a new line.
12
Guidelines for Layout
All keywords within the same select should line up with the previous keywords.
Nested select statements should be indented and line up with the second word
on the previous line.
SELECT PP.PORTFOLIO_ID,
PP.PORTFOLIO_NAME,
PL.LIMIT_AMOUNT
FROM PAR_PORTFOLIO PP,
PAR_LIMIT PL
WHERE PP.PORTFOLIO_ID=PL.OWNER_ID
AND PL.LIMIT_TYPE_ID = ’CCE’
AND EXISTS (
SELECT1
FROM PAR_PORTFOLIO_DETAILS PPD
AND PPD.PORTFOLIO_ID =
PP.PORTFOLIO_ID
);
13
Guidelines for Layout
Constants, program variables etc., should be on the right hand side of a
WHERE or HAVING clause.
SELECTLEGAL_NAME
FROM CORPORATE
WHERE BP_ID = 21;
SELECTLEGAL_NAME
FROM CORPORATE
WHERE BP_ID = : h_BP_ID;
14
Guidelines for Layout
Columns requiring a outer joined should appear on the right hand side of a
WHERE or HAVING clause.
SELECT PF.FACILITY_ID
FROM PAR_FACILITY PF,
PAR_FACILITY_DETAILS PFD
WHERE PF.FACILITY_ID = PFD.FACILITY_ID (+);
15
Guidelines for Layout
Table aliases should be used in all queries that have more than one table in
the FROM clause. Instead of using A, B, C, T1, T2 etc as aliases, it is better
to use some naming convention which would easily identify the table say
COR for CORPORATE, PF for PAR_FACILITY, PM for PAR_MDR etc.
SELECTCOR.BP_ID,
PF.FACILITY_ID
FROM CORPORATE COR,
PAR_FACILITY PF
WHERE COR.BP_ID = PF.BP_ID;
16
Guidelines for Layout
As an aid to understanding, WHERE clauses should be laid out with the join
clauses first, and the restriction clauses second.
SELECT PF.FACILITY_ID,
PFD.MATURITY_DATE,
PFD.MATURITY_PERIOD_TYPE,
PFD.MATURITY_PERIOD_VALUE
FROM PAR_FACILITY PF,
PAR_FACILITY_DETAILS PFD
WHERE PF.FACILITY_ID=PFD.FACILITY_ID
AND PF.PRODUCT_GRP_ID=2;
17
SQL Technical Writing
Unintentionally disabling indexes
Issues with date related queries
Implicit Type Conversions
Investigate Rewriting the Query
Resource intensive operations
Know your data …
Queries without proper joins
Queries on views
Data Model Changes
18
Unintentionally disabling of indexes
Do not modify columns in the WHERE clause. The reason for this is that
Oracle cannot use an index if the column is modified in the where clause.
The following queries would stop usage of indexes.
SELECT *
FROM PAR_LIMIT
WHERE UPPER (LIMIT_TYPE_ID) = ’CCE’;
SELECT *
FROM CORPORATE
WHERE BP_ID + 1 = 21;
Note: At times intentionally disabling of indexes can also be useful. This needs to
decided as per the search condition.
19
SQL Technical Writing
Do not concatenate columns in the WHERE clause, this also would stop usage of
indexes.
Query :
SELECT*
FROM PAR_LIMIT
WHERE PERIOD_VALUE||PERIOD_TYPE=‘0D’
will not use the index. The query need to written as
Query :
SELECT*
FROM PAR_LIMIT
WHERE PERIOD_VALUE=0
AND PERIOD_TYPE=‘D’
20
SQL Technical Writing
If a column has a function-based index the WHERE clause also should use
that function.If there is function-based index on LEGAL_NAME using UPPER
function
SELECT*
FROM CORPORATE
WHERE LEGAL_NAME=’ BASF AKTIENGESELLSCHAFT’;
SELECT*
FROM CORPORATE
WHERE UPPER (LEGAL_NAME) = ‘BASF
AKTIENGESELLSCHAFT’;
21
SQL Technical Writing
For SQL queries involving DATE related queries, do not use TRUNC function and
also BETWEEN in the WHERE clauses. The first one would stop usage of any
index available on that column and second one would result in returning wrong data
The below query would not use the index (on ANNOTATION_DATE).
SELECT*
FROM PAR_ANNOTATION
WHERE TRUNC (ANNOTATION_DATE) = ‘01-MAR-03’;
The below query will use the index but would also include ANY records for midnight
of ‘02-MAR-03’.
SELECT*
FROM PAR_ANNOTATION
WHERE ANNOTATION_DATE BETWEEN ‘01-MAR-03’
AND ‘02-MAR-03’;
22
SQL Technical Writing
The correct way for writing above query would be
SELECT*
FROM PAR_ANNOTATION
WHERE ANNOTATION_DATE >= TO_DATE (‘01-MAR-03
00:00:00’,’DD-MON-YY HH24:MI:SS’)
AND ANNOTATION_DATE < TO_DATE (‘02-
MAR- 03 00:00:00’,’DD-MON-YY HH24:MI:SS’);
23
Implicit Type conversions
Developers need to be aware of the type conversions that Oracle performs
implicitly. Oracle may choose to convert either the column or the constant. If the
column is chosen then the indexes cannot be used.
24
Implicit Type conversions
The below query would not use the index (ANNOTATION_TYPE which is of
VARCHAR2 datatype is part of PRIMARY KEY).
SELECT *
FROM PAR_ANNOTATION
WHERE ANNOTATION_TYPE = 2;
For the query to use the index, we need to rewrite the query as
SELECT*
FROM PAR_ANNOTATION
WHERE ANNOTATION_TYPE = TO_CHAR (2);
25
Implicit Type conversions
In Quartz perspective, while using the MO’s please check the DATATYPE of
the column to which you are associating it in the WHERE clause of the query.
All the MO’s are strings (or CHAR data type) and appropriate function needs
to be chosen in queries to avoid implicit conversions.
26
Investigate Rewriting Of A Query
When doing any referential validations i.e., foreign key validations use EXISTS
operator. The reason for this is that with the EXISTS operator the oracle kernal
knows that once it has found one match it can stop. It doesn’t have to continue the
Full Table Scan.
The below queries could have been rewritten using EXISTS
Query 1:
SELECTPP.PORTFOLIO_ID
FROM PAR_PORTFOLIO PP,
PAR_PORTFOLIO_DETAILS PPD
WHERE PP.PORTFOLIO_ID=PPD.PORTFOLIO_ID;
Query 2:
SELECTPP.PORTFOLIO_ID
FROM PAR_PORTFOLIO PP
WHERE PP.PORTFOLIO_ID IN (
SELECTPPD.PORTFOLIO_ID
FROM PAR_PORTFOLIO_DETAILS
);
27
Query using Existing Operator
SELECT PP.PORTFOLIO_ID
FROM PAR_PORTFOLIO PP
WHERE EXISTS
(SELECT 1
FROM PAR_PORTFOLIO_DETAILS PPD
WHERE PPD.PORTFOLIO_ID=PP.PORTFOLIO_ID
);
28
Implicit Type conversions
Using EXISTS would speed up the query if the referential column has an
index on it, else using IN would faster because the inner query is executed
only once, where as for EXISTS it is executed for each record
fetched.
We also need to check how many records the inner query would return
on using IN. The decision needs to be based on the availability of index
and data distribution.
29
Resource Intensive Operations
Avoid using DISTINCT, UNION, MINUS, INTERSECT, ORDER BY and
GROUP BY operations as they call upon resource intensive sorts.
Other ways of writing the queries should be found. Most queries that use the
operators, UNION, MINUS and INTERSECT, can be rewritten in other ways.
UNION ALL can be used instead of UNION.
DISTINCT has been used in the below query to suppress the multiple records
being fetched.
30
Resource Intensive Operations
Can be rewritten to remove the DISTINCT operator
SELECTPL.OWNER_ID,
PL.LIMIT_AMOUNT,
PL.GROSS_EXP_AMOUNT
FROM PAR_LIMIT PL
AND PL.LIMIT_TYPE_ID=’CCE’
AND EXISTS (
SELECT1
FROM PAR_PORTFOLIO_DETAILS PPD
WHERE PPD.PORTFOLIO_ID = PL.OWNER_ID
AND PPD.BA_PRODTYPE_ID = 146
);
31
Know Your Data
Before using any functions check whether they are really required with
reference to the data on which the queries would be running. The following
cases should be avoided.
32
Know Your Data
For queries, which run only for Single Tile portfolios, which cannot be
clustered, you can add one more clause of
AND
PAR_PORTFOLIO_DETAILS.CLUSTER_PORTFOLIO_LINK IS
NULL
AND PAR_FACILITY.PRODUCT_GRP_ID = 2
33
Queries Without Proper joins
Ensure that queries do not result in Cartesian product. If queries are not properly
joined the query runs into Cartesian Product. This can be at times disastrous also
when you are trying to insert result of a query into a table. The below should be
avoided
The below query has a missing join with PBAP which returns (Result set of join
between PL & PPD) * (Number of records in PBAP).
SELECT PL.OWNER_ID,
PL.GROSS_EXP_AMOUNT,
PL.NETT_EXP_AMOUNT,
PBAP.SHORT_NAME
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD,
PAR_BA_PRODUCTTYPE PBAP
WHERE PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PPD.PORTFOLIO_ID = 5455763
AND PL.LIMIT_TYPE_ID = ‘CCE’
34
Queries Without Proper joins
The below query would return (Number of records in COR) * (Number of records in
PF)
SELECT COUNT (*)
FROM CORPORATE COR,
PAR_FACILITY PF;
SELECT COR.BP_ID
FROM CORPORATE COR,
PAR_PARTICIPATING_BRANCH PPB;
35
Queries Without Proper joins
All above examples can result in drastic performance issues for the
application when used in REPORTS, BATCHES, VIEWS and all UI related
queries.
Such also result in duplicate records, which are normally overlooked by using
DISTINCT in the queries. DISTINCT works well in most of the cases as a
workaround but has very bad impact on performance. Imagine doing a
distinct on few millions of records.
36
Queries On Views
Do not use an existing view as a part of the query. This is because the view
itself needs to be executed first.
37
Queries On Views
CREATE OR REPLACE VIEW V_PAR_CP_TYPE(
BP_ID,
SHORT_NAME,
CP_TYPE)
AS SELECT
COR.BP_ID,
COR.SHORT_NAME,
SC.MNMNC
FROM CORPORATE COR,
SYSTEM_CODES SC
WHERE COR.TYPE=SC.CODE_VAL(+)
AND SC.CODE_ID(+)=126028
38
Instead of Writing a Query
The below query does a Full Table Scan of CORPORATE thrice.
SELECT COR.BP_ID,
COR.SHORT_NAME,
VPCR.RESPONSIBILITY,
VPCT.CP_TYPE
FROM CORPORATE COR,
V_PAR_CP_RESP VPCP,
V_PAR_CP_TYPE VPCT
WHERE COR.BP_ID = VPCP.BP_ID
AND VPCP.BP_ID = VPCT.BP_ID;
39
Instead of Writing a Query
Alternate and efficient way of rewriting the above query
SELECT COR.BP_ID,
COR.SHORT_NAME,
SC_RESP.MNMNC,
SC_TYPE.MNMNC
FROM CORPORATE COR,
SYSTEM_CODES SC_RESP,
SYSTEM_CODES SC_TYPE
WHERE COR. RESPONSIBILITY = SC_RESP.CODE_VAL
(+)
AND SC_RESP. CODE_ID (+) = 126481 /*
d_ParResponsibilityId */
AND COR. TYPE = SC_TYPE.CODE_VAL (+)
AND SC_TYPE.CODE_ID (+) = 126028 /*
d_ParCounterpartytyp */
40
Data Model Changes
In certain cases, we also need to look at the possibility of changes data
model. There can be few cases where changing table attributes can also
result in efficient queries.
PAR_ANNOTATION table also holds the history of annotations (of
counterparty, facility and portfolio). Queries find the latest annotation using
MAX function for ANNOTATION_DATE and ANNOTATION_TIME. Since the
table is a big one, doing a MAX and GROUP BY makes queries slower.
These queries can be made much faster by adding an extra column like
LATEST_FLAG and the UI services update latest annotation with Y and older
ones with N and queries picking up only the Y ones could result in much
better performances.
41
Index Design
Index Selectivity
Implicit Indexes
Concatenated Indexes
Index Merges
NULL Values
Functional Indexes
Foreign Keys
42
Index Selectivity
The selectivity of a column or group of columns is a common measure of the
usefulness of an index. Columns or indexes are selective if they have a large
number of unique values or few duplicate values.
From the below distribution of TYPE of CORPORATE table, TYPE is a non-
selective column.
TYPE COUNT(*)
1 105216
2 69086
3 155940
Selective indexes are more efficient than nonselective indexes because they
point more directly to specific values.
43
Unique Indexes
A unique index is one that prevents any duplicate values for the columns that
make up the index.
A unique index is typically created in order to prevent duplicate values rather
than to improve performance. However, unique index columns are often
efficient-they point exactly one row and are therefore very selective.
In PAR_ANNOTATION table a unique index can be (OWNER,
ANNOTATION_TYPE, ANNOTATION_DATE, ANNOTATION_TIME).
44
Implicit Indexes
Implicit indexes are created automatically by Oracle in order to implement
either a primary key or unique constraint.
BP_ID is PRIMARY KEY for CORPORATE table, hence it is also an implicit
index.
In addition to the PRIMARY KEY, there may be a number of candidate keys
that are also unique. Oracle allows you to define these keys (and also foreign
keys) when you create a table.
In order to implement either a PRIMARY or UNIQUE constraint, Oracle
creates a unique index. In addition to enforcing uniqueness, these indexes
are available to enhance the performance of queries.
45
Concatenated Indexes
A concatenated index is simply an index comprising of more than one
column.
It is often more selective than a single key index.
The combination of columns will point to a smaller number of rows than
indexes composed of the individual columns.
A concatenated index that contains all of the columns referred in an SQL
statement’s WHERE clause will usually be very effective.
For querying any limits associated with a PORTFOLIO or FACILITY, its
common to query OWNER_ID, OWNER_TYPE, PERIOD_TYPE and
PERIOD_VALUE from PAR_LIMIT. Hence all these columns qualify for a
concatenated index.
46
Choosing the Best Column Order
The commonly used column in the queries should be the initial or leading or
earliest columns.
SELECT *
FROM PAR_ANNOTATION
WHERE OWNER = 6378859
AND ANNOTATION_TYPE=1
AND ANNOTATION_DATE=’01-MAR-03’
AND ANNOTATION_TIME= 188322;
47
Choosing the Best Column Order
Will not use the index.
SELECT *
FROM PAR_ANNOTATION
WHERE ANNOTATION_TYPE=1
AND ANNOTATION_DATE=’01-MAR-
03’
AND ANNOTATION_TIME= 188322;
48
Index Merges
If more than one column from a table appears in the WHERE clause and
there is no concatenated index on the columns concerned but there are
indexes on the individual columns, then Oracle may decide to perform an
index merge.
In order to perform an index merge, Oracle retrieves all rows from each
index with matching values and then merges these two lists or result sets and
returns only those that appear in both the lists.
Performing index merges is almost always less efficient than the equivalent
concatenated index. If you see an index merge (shown in execution plans
with AND EQUALS operator), consider creating an appropriate concatenated
index.
49
NULL Values
NULLs are not indexed. When an indexed column is NULL, or when all
columns in a concatenated index are NULL, then the row concerned will not
have entry in the index. If you are using IS NULL in WHERE clause, it would
result in full-table scan.
SELECT *
FROM PAR_PORTFOLIO_DETAILS
WHERE CLUSTER_PORTFOLIO_LINK IS NULL.
50
Functional Indexes
Functional indexes allow you to create an index on a function or expression.
The function or expression must be deterministic, which means that if a given
the same inputs it must always return the same outputs regardless of
changes in the environment or the date and time.
51
Functional Indexes
The following is invalid because the value of the F_CRTD_ON change over
time:
52
Foreign Keys
You can declare referential integrity constraints to prevent rows being
inserted into detail (or child) tables that do not have a matching row in a
master (or parent table). This will also reduce unnecessary searches in
queries over data, which is not valid.
53
Understanding Different Join Methods
Joins
Join Order
How Join Method Affects Join Order
Illustrations
54
Joins
Joins allow rows from two or more tables to be merged, usually based on
common key values.
Oracle supports three join techniques
The SORT-MERGE join
The NESTED Loop Joins
The HASH Join
55
The SORT MERGE Join
The sort-merge join is a join method that does not require indexes. Oracle
sorts each table (or result set from a previous operation) on the column
values used to join the two tables. Oracle then merges the two sorted result
sets into one.
E.g. SELECT PL.OWNER_ID,
PL.LIMIT_AMOUNT,
CURR.NAME
FROM PAR_LIMIT PL,
CURRENCY_CIRECTORY CURR
WHERE
PL.LIMIT_CURR=CURR.CRNCY_CODE_ALPHBTC;
There are no indexes on LIMIT_CURR and CRNCY_CODE_ALPHBTC.
56
The NESTED LOOPS Join
The nested loops join is a join algorithm that usually involves an index on at
least one of the tables.
In a nested loops join, a full-table scan is done on one of the tables
(depends on the join-conditions applied) or result sets (probably either the
smaller table or the table or the table that does not have an index on the join
column). For every row found in the result set, a lookup-usually involving an
index-is performed on the second table and the matching row is retrieved.
For a nested loop join ensure that the index used for the join contains as
many columns as possible from the WHERE clause.
57
The HASH Join
In a hash join, a hash table (a sort of on-the-fly index) is constructed for the
larger of the two tables. The smaller table is then scanned, and the hash
table is used to find the matching rows in the larger table.
This can work very well, especially if the hash table can fit in memory. It
performs well if the two tables are of different sizes.
58
Join Order
The driving table-the first table in the join order-should be the one that has
the most selective and efficient WHERE clause condition. That is, eliminate
any rows from the result sets as early in the join order as possible. This will
reduce the number of rows returned in each step.
If you are joining small subsets of the tables involved, try to use nested loops
for each subsequent join provided there is a supporting index.
Make sure the indexes supporting the nested loops join contain all of the
columns in the WHERE clause for the table being joined.
59
How Join Method Effects Join Order
If joining using the SORT-MERGE method, the join order is irrelevant. Each
table must be fully read, sorted, and merged. The order in which tables
appear in the FROM clause or which table is referenced in the USE_MERGE
hint is irrelevant.Both the queries will have same performance on changing
join order and also table referenced in USE_MERGE hint.
60
How Join Method Effects Join Order
61
How Join Method Effects Join Order
62
ILLUSTRATION
The size of each table is
63
ILLUSTRATION
Query 1:
SELECT PL.OWNER_ID,
PL.PERIOD_VALUE,
PL.PERIOD_TYPE,
PL.LIMIT_TYPE_ID,
PL.LIMIT_AMOUNT,
PBAP.SHORT_NAME
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD,
PAR_BA_PRODUCTTYPE PBAP
WHERE PL.OWNER_TYPE=1
AND PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PPD.CLUSTER_PORTFOLIO_LINK IS
NULL
AND
PPD.BA_PRODTYPE_ID=PBAP.BA_PRODTYPE_ID;
64
ILLUSTRATION
Execution Plan
SELECT STATEMENT
HASH JOIN
TABLE ACCESS (FULL) OF 'PAR_BA_PRODUCTTYPE’
MERGE JOIN
SORT (JOIN)
TABLE ACCESS (FULL) OF
'PAR_PORTFOLIO_DETAILS’
SORT (JOIN)
TABLE ACCESS (FULL) OF 'PAR_LIMIT'
65
Execution Plan
The plan shows that PAR_PORTFOLIO_DETAILS, PAR_LIMIT are having
FULL table scan and the join the optimizer id taking is MERGE-SORT and
the result set is having a HASH-JOIN with PAR_BA_PRODUCTTYPE.
Here the query performance was found to be very bad. It took about 4-5
minutes to get the result set.
The poor performance can be judged as all tables are having a full table
scan and also for PPD and PL are undergoing a MERGE-SORT which is
very much inefficient due to bigger sizes of the tables. HASH JOIN seems
to be OK on PAR_BA_PRODUCTTYPE as this one is a small table and
FULL table scan would be faster and also less memory required for a hash
table.
But we also observe that though joins are all on indexed columns none are
being used. This is a good example where optimizer gave a not really a
optimum plan (All tables were analyzed before taking this plan).
66
ILLUSTRATION
Query 2:
67
ILLUSTRATION
SELECT /*+ ORDERED USE_NL (PPD) */
PL.OWNER_ID,
PL.LIMIT_AMOUNT,
PL.PERIOD_VALUE,
PL.PERIOD_TYPE,
PL.LIMIT_TYPE_ID,
PBAP.SHORT_NAME
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD,
PAR_BA_PRODUCTTYPE PBAP
WHERE PL.OWNER_TYPE=1
AND PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PPD.CLUSTER_PORTFOLIO_LINK IS NULL
AND PPD.BA_PRODTYPE_ID=PBAP.BA_PRODTYPE_ID;
68
Execution plan
SELECT STATEMENT
HASH JOIN
TABLE ACCESS (FULL) OF 'PAR_BA_PRODUCTTYPE’
NESTED LOOPS
TABLE ACCESS (FULL) OF 'PAR_LIMIT’
TABLE ACCESS (BY INDEX ROWID) OF 'PAR_PORTFOLIO_DETAILS’
INDEX (RANGE SCAN) OF 'PAR_PORTFOLIO_DETAILS_PK’
69
Analyzing the Query Plan
EXPLAIN PLAN
How to get the EXECUTION PLAN for the query?
How to read the plan?
70
Explain Plan
Queries can be analyzed using EXPLAIN PLAN Command
The EXPLAIN PLAN command displays the execution plan chosen by the
Oracle optimizer for SELECT, UPDATE, INSERT, and DELETE statements.
A statement's execution plan is the sequence of operations that Oracle
performs to execute the statement. By examining the execution plan, you can
see exactly how Oracle executes your SQL statement.
EXPLAIN PLAN results alone cannot tell you which statements will perform
well, and which badly. For example, just because EXPLAIN PLAN indicates
that a statement will use an index does not mean that the statement will run
quickly. The index might be very inefficient!
Use EXPLAIN PLAN to determine the access plan, and to test modifications
to improve the performance.
71
Explain Plan
EXPLAIN PLAN tells you the execution plan the optimizer would choose if it
were to produce an execution plan for a SQL statement at the current time,
with the current set of initialization and session parameters.
But this is not necessarily the same as the plan that was used at the time the
given statement was actually executed. The optimizer bases its analysis on
many pieces of data--some of which may have changed!
72
How to get the EXECUTION PLAN for the query?
TOAD has the facility to see the query execution plan.
If you are not having the TOAD, then on the SQL Plus session issue the
following command
SQL > SET AUTOTRACE TRACEONLY EXP
Run your query.
73
How to Read the Plan?
It is easier to form a tree structure of the plan where each node is a statement
to be executed.
Then start reading from bottom to top.
74
Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=56132
card=62639 Bytes=2818755)
1 0 HASH JOIN (Cost=56132 Card=62639 Bytes=2818755)
2 1 TABLE ACCESS (FULL) OF 'PAR_BA_PRODUCTTYPE' (Cost=1
Card =110 Bytes=2090)
75
Tree Diagram
0 SELECT STATEMENT
1 HASH JOIN
76
SQL Over Database Links
A database link defines a connection from one database instance to another.
Used in conjunction with synonyms, database links can make remote objects
appear to be local as far as applications and users are concerned.
77
Database links
A database link created in DB_A connecting to DB_B, can allow users in
DB_A to see remote objects as if they were local.
DB_A DB_B
78
Design Guidelines
79
Interdatabase referential integrity
80
Interdatabase referential integrity
If in certain cases this is unavoidable, say CUSTOMERS table exists at
Headquarters database and ORDERS being booked at regional site, then
interdatabase referential integrity can achieved by
Method 1:
Use of BEFORE ROW trigger on the ORDERS table that checks for the
existence of a corresponding CUSTOMER_ID record in the remote
CUSTOMERS table whenever a record is inserted.
Drawbacks:
Dependency on availability of remote database always. To avoid this
dependency to some extent maybe you can postpone the validation process
till the database becomes available using some STATUS columns.
81
Interdatabase referential integrity
Method 2:
Create a local snapshot of the master table using WITH PRIMARY KEY
clause. Snapshots can be refreshed at regular intervals of time.
82
Naming Conventions
Keep names unique across all schema in the distributed environment. This is
not a requirement, but it makes administration less confusing.
Example:
COUNTRY and COUNTRIES can be a very POOR choice.
Why?
Because this name is likely to be common to more than one application and
therefore several schema. When tables are replicated, either as master
tables or snapshots, it may not be possible to create identical public
SYNONYMS in all the databases.
83
Naming Conventions
You can avoid name space collisions by prefixing table names with two or
three characters that associate them with a given application.
e.g., COUNTRIES table that the order entry system uses would become
OE_COUNTRIES.
Check the length of table names. Various replication-related objects have
generated names that are based on a table name such as the
SNAP$_CUSTOMERS, If the generated name ends up exceeding more than
30 characters, then the name needs to be truncated, which may result in
naming conflicts.
Keep table names short – never more than 23 characters
84
Distributed Queries and Transactions
Distributes queries fall into two categories:
Queries whose target tables are all located in single remote site also called
remote queries.
When an application initiates a remote query, the optimizer recognizes that
the tables involved are in a single database, and it passes the entire query to
the remote database, which executes it with the same efficiency as a local
query. The remote database then sends that data back.
There are really NO optimisations that you can make on a remote query,
although if you find that your application is performing a high number of
remote queries you might consider relocating the data to the local database
either in its entirety or as snapshots, thereby reducing the network traffic.
85
Distributed Queries and Transactions
Queries whose target tables are located at multiple sites.
Example:
SELECT CUSTOMER_NAME,
TO_CHAR(ORDER_DATE,’Month’) month,
SUM(ORDER_TOTAL) monthly_sales
FROM CUSTOMERS@DB_LINK_1 C,
ORDERS@DB_LINK_2 O
WHERE C.CUSTOMER_ID = O.CUSTOMER_ID
GROUP BY CUSTOMER_NAME,
TO_CHAR(ORDER_DATE, ‘Month’),
TO_CHAR(ORDER_DATE,’MM’)
ORDER BY CUSTOMER_NAME, TO_CHAR (ORDER_DATE, ‘MM’);
86
Distributed Queries and Transactions
Oracle must first determine which columns are in which tables by querying
the data dictionaries at DB_LINK_1 and DB_LINK_2. Then it retrieves the
data from these two remote databases. These queries return all records from
the remote tables.
87
Distributed Queries and Transactions
In DB_LINK_1
SELECT “CUSTOMER_ID”,”CUSTOMER_NAME” FROM “CUSTOMERS” C
In DB_LINK_2
SELECT “CUSTOMER_ID”,”ORDER_TOTAL” FROM “ORDERS” O
Oracle performs joins and aggregations of this data at the site at which the
query originated, also known as driving site.
88
Control the Driving Site
Be mindful of how much data is being shipped to the driving site. Whenever
possible, originate distributed queries from the database that holds the most target
data, or use the DRIVING_SITE hint to force a specific site to be the driving site.
In previous query, we could force the driving site to be DB_LINK_2 since ORDERS
table is bigger than CUSTOMERS table:
SELECT /*+ DRIVING_SITE (ORDERS) */
CUSTOMER_NAME,
TO_CHAR(ORDER_DATE,’Month’) month,
SUM(ORDER_TOTAL) monthly_sales
FROM CUSTOMERS@DB_LINK_1 C,
ORDERS@DB_LINK_2 O …
This hint forces the site with the ORDERS table to perform the joins and
aggregations the query requires.
89
Control the Driving Site
Another way to force a specific site to be the driving site is to create views of
remote data at the desired site. The views tricks the optimiser into thinking
that the table is at the site.
For same example,
At DB_LINK_2
CREATE VIEW CUSTOMERS_VIEW AS
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM CUSTOMERS@DB_LINK_2
90
Control the Driving Site
And at the site originating the query
SELECT
CUSTOMER_NAME,
TO_CHAR(ORDER_DATE,’Month’) month,
SUM(ORDER_TOTAL) monthly_sales
FROM COUSTOMERS_VIEW@DB_LINK_2 C,
ORDERS@DB_LINK_2 O
91
Control Index Usage
The only way that Oracle will even consider the use of indexes in distributed
queries is if the database uses the cost-based optimizer, if your database is
configured to use the rule-based optimizer, your distributed queries are
performing full table scans.
92
Partitioning
Partitioning is the ability of the database to take very large tables or indexes
and physically break them into smaller, more manageable pieces.
93
Partitioning Schemes
Range-Partition Tables
Hash Partitioning
List Partitioning (New in Oracle 9i)
94
Range Partitioning Tables
Simply breaking up data based on ranges. Performance is best when data is
evenly distributed within each range, so your KNOWLEDGE of DATA is very
important.
CREATE TABLE DEPT (
DEPTNO NUMBER (2),
DEPT_NAME VARCHAR2 (30))
PARTITION BY RANGE (DEPTNO)
(
PARTITION D1 VALUES LESS THAN (10) TABLESPACE DEPT1,
PARTITION D2 VALUES LESS THAN (20) TABLESPACE DEPT2,
PARTITION D1 VALUES LESS THAN (MAXVALUE) TABLESPACE
DEPT3);
95
In KS Project
CREATE TABLE KS_PD_DY_GCO
(
BP_ID NUMBER(12) not null,
BUSINESS_DATE DATE not null,
…..
RAROC_DATE DATE,
PARTITION_KEY NUMBER(2)
)
PARTITION BY RANGE (PARTITION_KEY)
(
PARTITION KS_PD_DY_GCO_P1 VALUES LESS THAN (2),
PARTITION KS_PD_DY_GCO_P2 VALUES LESS THAN (3),
PARTITION KS_PD_DY_GCO_P3 VALUES LESS THAN (4),
PARTITION KS_PD_DY_GCO_P4 VALUES LESS THAN (5),
PARTITION KS_PD_DY_GCO_P5 VALUES LESS THAN (6),
PARTITION KS_PD_DY_GCO_P6 VALUES LESS THAN (7));
96
Range Partitioning (Multicolumn)
In addition to using single columns to range-partition data, you can also use
multi columns that more commonly map to business processes.
The following example shows how to extend simple range partitioning to
multiple columns that make up quarterly data-a perfect application for multi-
column partitioning. The columns sale_yr, sale_mth and sale_day are the
ranges need to specify a given quarter.
97
Range Partitioning (Multicolumn)
CREATE TABLE CUST_SALES (
ACCT_NO NUMBER (5),
CUST_NAME CHAR (30),
SALE_DAY INTEGER NOT NULL,
SALE_MTH INTEGER NOT NULL,
SALE_YR INTEGER NOT NULL)
PARTITION BY RANGE (SALE_YR, SALE_MTH, SALE_DAY)
(
PARTITION CUST_SALES_Q1 VALUES LESS THAN (1998, 04, 01) TABLESPACE USERS1,
PARTITION CUST_SALES_Q2 VALUES LESS THAN (1998, 07, 01) TABLESPACE USERS2,
PARTITION CUST_SALES_Q3 VALUES LESS THAN (1998, 10, 01) TABLESPACE USERS3,
PARTITION CUST_SALES_Q4 VALUES LESS THAN (1999, 01, 01) TABLESPACE
USERS4);
98
Hash Partitioning
For hash partitioning, you need only specify the number of partitions and
where you want them to be physically placed;
Oracle database then automatically inserts rows into partitions based on
hash of the partition key.
It is important to specify a power of 2 for number of hash partitions (define 2
(power n) partitions); otherwise uneven distributions of rows may occur.
You can specify the storage for the whole table and the tablespace for the
partitions.
Local indexes are always equi partitioned (on the same partitioning key and
partition break points).
99
Example
Here’s an example of a hash-partitioned table that splits the table into four
parts based on the hash of the partition key, acct_no.
100
List Partitioning(New in Oracle9i)
List portioning is new with Oracle9i. It allows for a more flexible means of
partitioning when you understand your data very well and can balance it
equally by segmenting data into lists of values. The example below partitions
the table based on lists of department names.
List partitioning is similar to range partitioning, but note that there is no
“maxvalue” option here – if a value other than those specified in the list is
inserted, Oracle database will reject that record.
101
List Partitioning(New in Oracle9i)
CREATE TABLE DEPT_PART (
DEPTNO NUMBER (2),
DNAME VARCHAR2 (14),
LOC VARCHAR2 (13))
PARTITION BY LIST (DNAME)
(
PARTITION D1_EAST VALUES (‘NEW YORK’),
PARTITION D2_WEST VALUES (‘SAN FRANCISCO’, ‘LOS
ANGELES’),
PARTITION D3_SOUTH VALUES (‘ATLANTA’,’DALLAS’,’HOUSTON’),
PARTITION D4_NORTH VALUES (‘CHICAGO’,’DETROIT’));
102
Schema Design
Fundamental Schema Design Principles
Let the Database Enforce Data Integrity
Data Integrity will be compromised at some point in time
Server-enforced integrity is blindingly fast
The database provides more information
103
Schema Design
Data Integrity will be compromised at some point in time.Virtually every
developed system that chooses to enforce foreign key outside the database
has orphaned child rows (child rows without parents).
Check the following query in such systems, you will find it returns some rows
SELECT foreign_key_columns
FROM child_table
MINUS
SELECT primary_key_columns
FROM parent;
You might also find NULL values where none should be and data that does
not conform to your business rules (out-of-range data for example). These
arise from an inconsistent application of business rules.
104
Schema Design
Server-enforced integrity is blindingly fast.Otherwise, imagine code that
needs to be written on client side for every integrity to be maintained.
105
Use the Correct Database
DO NOT
Use a string to store dates or times
Use a string to store numbers
Use VARCHAR2(4000) to store all strings
Use CHAR(2000) to store all strings, wasting tons of space and forcing the
use of a lot of TRIM function calls
Put text in a BLOB (raw) type
106
Simple Rule
Put dates in dates, numbers in numbers, and strings in strings. Never use a
datatype to store something other than what it was designed for, and use the
most specific type possible.
Furthermore, only compare dates to dates, strings to strings, and numbers to
numbers.
107
Simple Rule
When dates and numbers are stored in strings, or stored using inappropriate
lengths, your system suffers:
You lose the edit upon insertion to the database, verifying that your dates are
actual dates and numbers are valid numbers.
You lose performance
You potentially increase storage needs
You definitely decrease data integrity
108
Common Errors
109
How the Data Integrity Suffers?
Systems that use strings for dates or numbers will have some records with
dates that are not valid and numbers that are not numbers.
Functions written to convert strings to dates return NULL when the dates
won’t convert.
01/02/03 is what?
YY/MM/DD or DD/MM/YY
110
How the Performance Suffers?
Dates in strings vs Dates in Dates: The first one during any comparison is a
search of characters and second one oracle understands it is between two
dates.
Numbers in dates needs to be handled using TO_NUMBER functions, else
comparisons are again character comparison.
Oracle can go in for explicit function conversions. Oracle might choose to
either convert a column/ value (We have already talked about it).
111
How to Increase storage Requirements?
112
Bottlenecks
What is going to happen when users pull this in a query that formats each
field based on width of the column in the database?
“They will see one column and need to scroll way over to see the second, the
third, and so on”
Say developers start to build a data-entry screen to put data into the
database.
“That code filed can be 4,000 characters long and that first name can be
4000 characters long!”
“Sorting of data???”
113
Bottlenecks
A code prepares a query that selects 10 columns that are VARCHAR2. Say,
developers like to fetch them into an Array of say 100 rows (very typical).
RAM allocation = 4000 * 10 * 100 = 4 MB (!!!)
Can we calculate for 1000 rows/ 10,000 rows?
A CHAR(2000) will consume 2000 bytes of storage whether you put in a
letter a, the string ‘hello world’ or 2,000 characters. A CHAR is always blank-
padded.
In above case if you happen to use an index – storage-related problems?
114
Analytical Functions
115
Analytical Functions – Details and Summary Data
Counterparty 21 and 23 has 4 and 2 facilities and corresponding
limits as given below
116
Bottlenecks
Requirement: List sum of limits against a counterparty and also display facilities, i.e., sort of summary and detail as
well.
BP_ID SUM(LIMIT_AMOUN
T)
21 1000 Cannot display facilities !
23 1100
117
Advantages
Using analytical functions it becomes easier
SELECT FACILITY_ID, BP_ID,
SUM(LIMIT_AMOUNT) OVER (PARTITION BY
BP_ID) AS SUM_LIM
FROM TEMP
Would give
FACILITY_ID BP_ID SUM_LIM
10000 21 1000
PARTITION identifies rows
10001 21 1000
to aggregate. Rows must
10002 21 1000
have the same BP_ID as
10003 21 1000 the detail row
20000 23 1100
20001 23 1100
118
Analytical Functions – Ranking Results
Ranking cannot be done just by simple sorting
Position in sorted list is different from rank in list
Ties are given different positions but the same rank
BP_ID FACILITY_I LIMIT_AMOUN SORT_ RANK DENSE_
D T ORDER RANK DENSE_RANK
does not skip
21 10000 600 1 1 1 rank due to tie.
21 10001 500 2 2 2
21 10002 300 3 3 3
21 10003 300 4 3 3
23 20000 200 5 5 4
23 20001 100 6 6 5
SELECT BP_ID, FACILITY_ID, LIMIT_AMOUNT,
RANK() OVER (ORDER BY LIMIT_AMOUNT DESC) RANK_VALUES,
DENSE_RANK() OVER (ORDER BY LIMIT_AMOUNT DESC)
DENSE_RANK_VALUES
FROM TEMP 119
Analytical Functions
SELECT BP_ID, FACILITY_ID, LIMIT_AMOUNT,
RANK() OVER (ORDER BY LIMIT_AMOUNT DESC)
RANK_VALUES
FROM TEMP
WHERE RANK_VALUES <=4
Would give
BP_I FACILITY LIMIT_AMOU RAN
D _ID NT K
21 10000 600 1
21 10001 500 2
21 10002 300 3
21 10003 300 3
120
Analytical Functions
SELECT BP_ID, FACILITY_ID, LIMIT_AMOUNT,
DENSE_RANK() OVER (ORDER BY LIMIT_AMOUNT
DESC) DENSE_RANK_VALUES
FROM TEMP
WHERE RANK_VALUES <=4
Would give
BP_I FACILITY_I LIMIT_AMOU RANK
D D NT
21 10000 600 1
21 10001 500 2 RANK doesn’t
21 10002 300 3 return this row
due to tie
21 10003 300 3
23 20000 200 4
121
Analytical Functions
TOP 5 employees in terms of hours worked
SELECT *
FROM ( SELECT EMP_NO, SUM(HOURS) AS SUM_HRS
FROM TIME_SHEETS
GROUP BY EMP_NO )
WHERE 5 >=
( SELECT COUNT(COUNT(*))
FROM TIME_SHEETS
GROUP BY EMP_NO
HAVING SUM(HOURS)>SUM_HRS )
TIME_SHEETS contain 55,00,000 entries (50,000 employees * 22 days * 5 tasks)
Imagine execution time – Try it, probably you will find query takes hell lot of time (MAY NOT
come out in your lifetime as well)
122
Analytical Function
Use RANK function instead
SELECT *
FROM
(
SELECT EMP_NO, SUM(HOURS) AS SUM_HRS,
RANK() OVER (ORDER BY SUM(HOURS) DESC ) AS RNK
FROM TIME_SHEETS
GROUP BY EMP_NO
)
WHERE RNK <=5
Would give result very quickly
123