SQL Tuning

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 123

SQL Tuning

© Tata Consultancy Services ltd. June 9, 2024 1


Course Prerequisites

 A basic understanding of SQL in Oracle


 Understanding of a procedural language

2
Course Objectives
 The tuning of SQL is more important than any other area of Oracle tuning,
and proper tuning of SQL statements can dramatically improve the
performance of an entire Oracle database.

 This session is an effort for all Oracle professionals who must write SQL
statements in providing some of important guidelines, tips and techniques
for tuning any type of SQL.

3
Session Outline
 Introduction
 SQL Layout
 SQL Writing Techniques
 Index Design
 Understanding Different Join Methods
 Analyzing Query Plan
 SQL Over Database Links
 Partitioning
 Schema Design
 Analytical Functions

4
Introduction: Why Tuning?

 Performance is the key to success of an application

 Performance issues are to be dealt with, during all stages of software


development

 AIM : Get best system throughput, with available / minimum resources

 Avoid unnecessary investment in additional hardware

 Better system throughput implies better productivity

 Better productivity implies better service

5
Likely areas of Performance Problems

6
SQL Layout
Proper Layout aids...
 Readability
 Maintenance

7
Bad Layout…
SELECT /*+ ORDERED */
C.PRODUCT_GRP_ID, ROUND(sum(A.limit_base_amount)/1000000,3) ,
ROUND(sum(NVL(A.pndg_limit_base_amount,A.limit_base_amount))/1000000,3) ,
sum(ROUND(A.GROSS_EXP_BASE_AMOUNT/1000000,3))
FROM
PAR_FACILITY B, Par_Limit A,
PAR_BA_PRODUCTTYPE C
WHERE
B.BP_ID IN (SELECT COR.BP_ID FROM CORPORATE COR WHERE
COR.GROUP_BP_ID=:1.MO_INP_PRODGRP_RMET4021.BP_ID)
and 2= B.PRODUCT_GRP_ID
and C.PRODUCT_GRP_ID =2
and B.MDR_ID is null
and B.FACILITY_STATUS IN (1,2,3,5,6)
and B.PNDG_MDR_ID is null
and C.BA_PRODTYPE_ID = B.FACILITY_TYPE_ID
and A.owner_type=2
and A.owner_id=B.facility_id
and A.tenor_ind in ('L','B')
GROUP BY C.PRODUCT_GRP_ID

8
Improved Layout…
SELECT /*+ ORDERED */
PBPT.PRODUCT_GRP_ID,
ROUND(SUM(PL.LIMIT_BASE_AMOUNT)/1000000,3),
ROUND(SUM(NVL(PL.PNDG_LIMIT_BASE_AMOUNT,PL.LIMIT_BASE_AMOUNT))/1000000,3),
SUM(ROUND(PL.GROSS_EXP_BASE_AMOUNT/1000000,3))
FROM PAR_FACILITY PF,
PAR_LIMIT PL,
PAR_BA_PRODUCTTYPE PBPT
WHERE PF.BP_ID IN
(
SELECT COR.BP_ID
FROM CORPORATE COR
WHERE COR.GROUP_BP_ID=:1.MO_INP_PRODGRP_RMET4021.BP_ID
)
AND PF.PRODUCT_GRP_ID=2
AND PBPT.PRODUCT_GRP_ID=2
AND PF.MDR_ID IS NULL
AND PF.FACILITY_STATUS IN(1,2,3,5,6)
AND PF.PNDG_MDR_ID IS NULL
AND PBPT.BA_PRODTYPE_ID = PF.FACILITY_TYPE_ID
AND PL.OWNER_TYPE=2
AND PL.OWNER_ID=PF.FACILITY_ID
AND PL.TENOR_IND IN('L','B')
GROUP BY PBPT.PRODUCT_GRP_ID;

9
Bad Layout…
SELECT A.BP_ID,
A.SHORT_NAME,
A.LEGAL_NAME,
C.MNMNC,
TO_CHAR(B.RATING_DATE,'YYYYMMDD'),
E.MNMNC,
TO_CHAR(D.RATING_DATE,'YYYYMMDD')
FROM CORPORATE A,
PAR_COUNTERPARTY_RATING B,
SYSTEM_CODES C,
PAR_COUNTERPARTY_RATING D,
SYSTEM_CODES E
WHERE A.BP_ID=B.BP_ID(+)
AND B.RATING_AGENCY_ID(+)=2
AND B.RATING_VALUE=C.CODE_VAL(+)
AND C.CODE_ID(+)=1112
AND A.BP_ID=D.BP_ID(+)
AND D.RATING_AGENCY_ID(+)=6
AND D.RATING_VALUE=E.CODE_VAL(+)
AND E.CODE_ID(+)=1112

10
Improved Layout…
SELECT COR.BP_ID, /*COUNTERPARTY_ID */
COR.SHORT_NAME, /* SHORT_CODE */
COR.LEGAL_NAME, /* LEGAL_NAME */
SC_CRF.MNMNC, /* CRF_RATING */
TO_CHAR(PCR_CRF.RATING_DATE,'YYYYMMDD'), /* CRF_RATING_DATE */
SC_ERF.MNMNC, /* ERF_RATING */
TO_CHAR(PCR_ERF.RATING_DATE,'YYYYMMDD') /* ERF_RATING_DATE */
FROM CORPORATE COR,
PAR_COUNTERPARTY_RATING PCR_CRF,
SYSTEM_CODES SC_CRF,
PAR_COUNTERPARTY_RATING PCR_ERF,
SYSTEM_CODES SC_ERF
WHERE COR.BP_ID=PCR_CRF.BP_ID(+)
AND PCR_CRF.RATING_AGENCY_ID(+)=2 /* CRF (d_RatingAgencyCode)
*/
AND PCR_CRF.RATING_VALUE=SC_CRF.CODE_VAL(+)
AND SC_CRF.CODE_ID(+)=1112 /* d_RatingCode */
AND COR.BP_ID=PCR_ERF.BP_ID(+)
AND PCR_ERF.RATING_AGENCY_ID(+)=6 /* ERF (d_RatingAgencyCode)
*/
AND PCR_ERF.RATING_VALUE=SC_ERF.CODE_VAL(+)
AND SC_ERF.CODE_ID(+)=1112 /* d_RatingCode */

11
Guidelines for Layout
 Statements should be indented so that individual lines are neatly aligned.
 Each expression in the SELECT list should be on a different line.
 Every table in the FROM list should be on a fresh line.
 Oracle keywords (SELECT, INTO, FROM, WHERE, AND/OR, GROUP BY,
HAVING, CONNECT BY, FOR UPDATE OF, ORDER BY) should be placed
on a new line.

12
Guidelines for Layout
 All keywords within the same select should line up with the previous keywords.
Nested select statements should be indented and line up with the second word
on the previous line.

SELECT PP.PORTFOLIO_ID,
PP.PORTFOLIO_NAME,
PL.LIMIT_AMOUNT
FROM PAR_PORTFOLIO PP,
PAR_LIMIT PL
WHERE PP.PORTFOLIO_ID=PL.OWNER_ID
AND PL.LIMIT_TYPE_ID = ’CCE’
AND EXISTS (
SELECT1
FROM PAR_PORTFOLIO_DETAILS PPD
AND PPD.PORTFOLIO_ID =
PP.PORTFOLIO_ID
);

13
Guidelines for Layout
 Constants, program variables etc., should be on the right hand side of a
WHERE or HAVING clause.

SELECTLEGAL_NAME
FROM CORPORATE
WHERE BP_ID = 21;

SELECTLEGAL_NAME
FROM CORPORATE
WHERE BP_ID = : h_BP_ID;

14
Guidelines for Layout
 Columns requiring a outer joined should appear on the right hand side of a
WHERE or HAVING clause.

SELECT PF.FACILITY_ID
FROM PAR_FACILITY PF,
PAR_FACILITY_DETAILS PFD
WHERE PF.FACILITY_ID = PFD.FACILITY_ID (+);

15
Guidelines for Layout
 Table aliases should be used in all queries that have more than one table in
the FROM clause. Instead of using A, B, C, T1, T2 etc as aliases, it is better
to use some naming convention which would easily identify the table say
COR for CORPORATE, PF for PAR_FACILITY, PM for PAR_MDR etc.

SELECTCOR.BP_ID,
PF.FACILITY_ID
FROM CORPORATE COR,
PAR_FACILITY PF
WHERE COR.BP_ID = PF.BP_ID;

16
Guidelines for Layout

 As an aid to understanding, WHERE clauses should be laid out with the join
clauses first, and the restriction clauses second.

SELECT PF.FACILITY_ID,
PFD.MATURITY_DATE,
PFD.MATURITY_PERIOD_TYPE,
PFD.MATURITY_PERIOD_VALUE
FROM PAR_FACILITY PF,
PAR_FACILITY_DETAILS PFD
WHERE PF.FACILITY_ID=PFD.FACILITY_ID
AND PF.PRODUCT_GRP_ID=2;

17
SQL Technical Writing
 Unintentionally disabling indexes
 Issues with date related queries
 Implicit Type Conversions
 Investigate Rewriting the Query
 Resource intensive operations
 Know your data …
 Queries without proper joins
 Queries on views
 Data Model Changes

18
Unintentionally disabling of indexes
 Do not modify columns in the WHERE clause. The reason for this is that
Oracle cannot use an index if the column is modified in the where clause.
The following queries would stop usage of indexes.

SELECT *
FROM PAR_LIMIT
WHERE UPPER (LIMIT_TYPE_ID) = ’CCE’;

SELECT *
FROM CORPORATE
WHERE BP_ID + 1 = 21;

Note: At times intentionally disabling of indexes can also be useful. This needs to
decided as per the search condition.
19
SQL Technical Writing
 Do not concatenate columns in the WHERE clause, this also would stop usage of
indexes.
Query :
SELECT*
FROM PAR_LIMIT
WHERE PERIOD_VALUE||PERIOD_TYPE=‘0D’
will not use the index. The query need to written as
Query :
SELECT*
FROM PAR_LIMIT
WHERE PERIOD_VALUE=0
AND PERIOD_TYPE=‘D’

20
SQL Technical Writing
 If a column has a function-based index the WHERE clause also should use
that function.If there is function-based index on LEGAL_NAME using UPPER
function

SELECT*
FROM CORPORATE
WHERE LEGAL_NAME=’ BASF AKTIENGESELLSCHAFT’;

Will not use the index.

SELECT*
FROM CORPORATE
WHERE UPPER (LEGAL_NAME) = ‘BASF
AKTIENGESELLSCHAFT’;

Will use the index.

21
SQL Technical Writing
 For SQL queries involving DATE related queries, do not use TRUNC function and
also BETWEEN in the WHERE clauses. The first one would stop usage of any
index available on that column and second one would result in returning wrong data

 The below query would not use the index (on ANNOTATION_DATE).

SELECT*
FROM PAR_ANNOTATION
WHERE TRUNC (ANNOTATION_DATE) = ‘01-MAR-03’;

 The below query will use the index but would also include ANY records for midnight
of ‘02-MAR-03’.

SELECT*
FROM PAR_ANNOTATION
WHERE ANNOTATION_DATE BETWEEN ‘01-MAR-03’
AND ‘02-MAR-03’;

22
SQL Technical Writing
 The correct way for writing above query would be

SELECT*
FROM PAR_ANNOTATION
WHERE ANNOTATION_DATE >= TO_DATE (‘01-MAR-03
00:00:00’,’DD-MON-YY HH24:MI:SS’)
AND ANNOTATION_DATE < TO_DATE (‘02-
MAR- 03 00:00:00’,’DD-MON-YY HH24:MI:SS’);

23
Implicit Type conversions
 Developers need to be aware of the type conversions that Oracle performs
implicitly. Oracle may choose to convert either the column or the constant. If the
column is chosen then the indexes cannot be used.

The conversion chosen is defined in the following table

Mixed datatype Common Unit Function Chosen


============ =========== ==============
Char with number number TO_NUMBER
Char with rowid rowid TO_ROWID
Char with date date TO_DATE

24
Implicit Type conversions

 The below query would not use the index (ANNOTATION_TYPE which is of
VARCHAR2 datatype is part of PRIMARY KEY).

SELECT *
FROM PAR_ANNOTATION
WHERE ANNOTATION_TYPE = 2;

 For the query to use the index, we need to rewrite the query as

SELECT*
FROM PAR_ANNOTATION
WHERE ANNOTATION_TYPE = TO_CHAR (2);

25
Implicit Type conversions

 In Quartz perspective, while using the MO’s please check the DATATYPE of
the column to which you are associating it in the WHERE clause of the query.
All the MO’s are strings (or CHAR data type) and appropriate function needs
to be chosen in queries to avoid implicit conversions.

26
Investigate Rewriting Of A Query
 When doing any referential validations i.e., foreign key validations use EXISTS
operator. The reason for this is that with the EXISTS operator the oracle kernal
knows that once it has found one match it can stop. It doesn’t have to continue the
Full Table Scan.
The below queries could have been rewritten using EXISTS
Query 1:
SELECTPP.PORTFOLIO_ID
FROM PAR_PORTFOLIO PP,
PAR_PORTFOLIO_DETAILS PPD
WHERE PP.PORTFOLIO_ID=PPD.PORTFOLIO_ID;

Query 2:
SELECTPP.PORTFOLIO_ID
FROM PAR_PORTFOLIO PP
WHERE PP.PORTFOLIO_ID IN (
SELECTPPD.PORTFOLIO_ID
FROM PAR_PORTFOLIO_DETAILS
);

27
Query using Existing Operator
SELECT PP.PORTFOLIO_ID
FROM PAR_PORTFOLIO PP
WHERE EXISTS
(SELECT 1
FROM PAR_PORTFOLIO_DETAILS PPD
WHERE PPD.PORTFOLIO_ID=PP.PORTFOLIO_ID
);

28
Implicit Type conversions

 Using EXISTS would speed up the query if the referential column has an
index on it, else using IN would faster because the inner query is executed
only once, where as for EXISTS it is executed for each record
fetched.

 We also need to check how many records the inner query would return
on using IN. The decision needs to be based on the availability of index
and data distribution.

29
Resource Intensive Operations
 Avoid using DISTINCT, UNION, MINUS, INTERSECT, ORDER BY and
GROUP BY operations as they call upon resource intensive sorts.
 Other ways of writing the queries should be found. Most queries that use the
operators, UNION, MINUS and INTERSECT, can be rewritten in other ways.
UNION ALL can be used instead of UNION.
DISTINCT has been used in the below query to suppress the multiple records
being fetched.

SELECT DISTINCT PL.OWNER_ID,


PL.LIMIT_AMOUNT,
PL.GROSS_EXP_AMOUNT
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD
WHERE PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PL.LIMIT_TYPE_ID = ’CCE’
AND PPD.BA_PRODTYPE_ID = 146;

30
Resource Intensive Operations
Can be rewritten to remove the DISTINCT operator

SELECTPL.OWNER_ID,
PL.LIMIT_AMOUNT,
PL.GROSS_EXP_AMOUNT
FROM PAR_LIMIT PL
AND PL.LIMIT_TYPE_ID=’CCE’
AND EXISTS (
SELECT1
FROM PAR_PORTFOLIO_DETAILS PPD
WHERE PPD.PORTFOLIO_ID = PL.OWNER_ID
AND PPD.BA_PRODTYPE_ID = 146
);

31
Know Your Data

 Before using any functions check whether they are really required with
reference to the data on which the queries would be running. The following
cases should be avoided.

UPPER (PAR_LIMIT.LIMIT_TYPE_ID) = ‘CCE’

UPPER (CORPORATE.COUNTRY_RESIDENT) = ‘GB’


 In both the above cases the data for LIMIT_TYPE_ID and
COUNTRY_RESIDENT are in UPPER CASE.
 Use restriction clauses wherever applicable in order to narrow down the
searches.

32
Know Your Data
 For queries, which run only for Single Tile portfolios, which cannot be
clustered, you can add one more clause of

AND
PAR_PORTFOLIO_DETAILS.CLUSTER_PORTFOLIO_LINK IS
NULL

 For queries, on Commercial Banking Facilities use PRODUCT_GRP_ID

AND PAR_FACILITY.PRODUCT_GRP_ID = 2

33
Queries Without Proper joins
 Ensure that queries do not result in Cartesian product. If queries are not properly
joined the query runs into Cartesian Product. This can be at times disastrous also
when you are trying to insert result of a query into a table. The below should be
avoided
 The below query has a missing join with PBAP which returns (Result set of join
between PL & PPD) * (Number of records in PBAP).

SELECT PL.OWNER_ID,
PL.GROSS_EXP_AMOUNT,
PL.NETT_EXP_AMOUNT,
PBAP.SHORT_NAME
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD,
PAR_BA_PRODUCTTYPE PBAP
WHERE PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PPD.PORTFOLIO_ID = 5455763
AND PL.LIMIT_TYPE_ID = ‘CCE’

34
Queries Without Proper joins
 The below query would return (Number of records in COR) * (Number of records in
PF)
SELECT COUNT (*)
FROM CORPORATE COR,
PAR_FACILITY PF;

 The below query would return (Number of records in COR)

SELECT COUNT (*)


FROM CORPORATE COR,
PAR_FACILITY PF
WHERE PF.FACILITY_ID = 37065;

 The below query would return millions of duplicate records

SELECT COR.BP_ID
FROM CORPORATE COR,
PAR_PARTICIPATING_BRANCH PPB;

35
Queries Without Proper joins
 All above examples can result in drastic performance issues for the
application when used in REPORTS, BATCHES, VIEWS and all UI related
queries.
 Such also result in duplicate records, which are normally overlooked by using
DISTINCT in the queries. DISTINCT works well in most of the cases as a
workaround but has very bad impact on performance. Imagine doing a
distinct on few millions of records.

36
Queries On Views
 Do not use an existing view as a part of the query. This is because the view
itself needs to be executed first.

CREATE OR REPLACE VIEW V_PAR_CP_RESP(


BP_ID,
SHORT_NAME,
RESPONSIBILITY)
AS SELECT
COR.BP_ID,
COR.SHORT_NAME,
SC.MNMNC

FROM CORPORATE COR,


SYSTEM_CODES SC
WHERE COR.RESPONSIBILITY=SC.CODE_VAL(+)
AND SC.CODE_ID(+)=126481

37
Queries On Views
CREATE OR REPLACE VIEW V_PAR_CP_TYPE(
BP_ID,
SHORT_NAME,
CP_TYPE)
AS SELECT
COR.BP_ID,
COR.SHORT_NAME,
SC.MNMNC
FROM CORPORATE COR,
SYSTEM_CODES SC
WHERE COR.TYPE=SC.CODE_VAL(+)
AND SC.CODE_ID(+)=126028

38
Instead of Writing a Query
The below query does a Full Table Scan of CORPORATE thrice.

SELECT COR.BP_ID,
COR.SHORT_NAME,
VPCR.RESPONSIBILITY,
VPCT.CP_TYPE
FROM CORPORATE COR,
V_PAR_CP_RESP VPCP,
V_PAR_CP_TYPE VPCT
WHERE COR.BP_ID = VPCP.BP_ID
AND VPCP.BP_ID = VPCT.BP_ID;

39
Instead of Writing a Query
Alternate and efficient way of rewriting the above query

SELECT COR.BP_ID,
COR.SHORT_NAME,
SC_RESP.MNMNC,
SC_TYPE.MNMNC
FROM CORPORATE COR,
SYSTEM_CODES SC_RESP,
SYSTEM_CODES SC_TYPE
WHERE COR. RESPONSIBILITY = SC_RESP.CODE_VAL
(+)
AND SC_RESP. CODE_ID (+) = 126481 /*
d_ParResponsibilityId */
AND COR. TYPE = SC_TYPE.CODE_VAL (+)
AND SC_TYPE.CODE_ID (+) = 126028 /*
d_ParCounterpartytyp */

40
Data Model Changes
 In certain cases, we also need to look at the possibility of changes data
model. There can be few cases where changing table attributes can also
result in efficient queries.
 PAR_ANNOTATION table also holds the history of annotations (of
counterparty, facility and portfolio). Queries find the latest annotation using
MAX function for ANNOTATION_DATE and ANNOTATION_TIME. Since the
table is a big one, doing a MAX and GROUP BY makes queries slower.

 These queries can be made much faster by adding an extra column like
LATEST_FLAG and the UI services update latest annotation with Y and older
ones with N and queries picking up only the Y ones could result in much
better performances.

41
Index Design

 Index Selectivity
 Implicit Indexes
 Concatenated Indexes
 Index Merges
 NULL Values
 Functional Indexes
 Foreign Keys

42
Index Selectivity
 The selectivity of a column or group of columns is a common measure of the
usefulness of an index. Columns or indexes are selective if they have a large
number of unique values or few duplicate values.
 From the below distribution of TYPE of CORPORATE table, TYPE is a non-
selective column.
TYPE COUNT(*)
1 105216
2 69086
3 155940
 Selective indexes are more efficient than nonselective indexes because they
point more directly to specific values.

43
Unique Indexes
 A unique index is one that prevents any duplicate values for the columns that
make up the index.
 A unique index is typically created in order to prevent duplicate values rather
than to improve performance. However, unique index columns are often
efficient-they point exactly one row and are therefore very selective.
 In PAR_ANNOTATION table a unique index can be (OWNER,
ANNOTATION_TYPE, ANNOTATION_DATE, ANNOTATION_TIME).

44
Implicit Indexes
 Implicit indexes are created automatically by Oracle in order to implement
either a primary key or unique constraint.
 BP_ID is PRIMARY KEY for CORPORATE table, hence it is also an implicit
index.
 In addition to the PRIMARY KEY, there may be a number of candidate keys
that are also unique. Oracle allows you to define these keys (and also foreign
keys) when you create a table.
 In order to implement either a PRIMARY or UNIQUE constraint, Oracle
creates a unique index. In addition to enforcing uniqueness, these indexes
are available to enhance the performance of queries.

45
Concatenated Indexes
 A concatenated index is simply an index comprising of more than one
column.
 It is often more selective than a single key index.
 The combination of columns will point to a smaller number of rows than
indexes composed of the individual columns.
 A concatenated index that contains all of the columns referred in an SQL
statement’s WHERE clause will usually be very effective.
 For querying any limits associated with a PORTFOLIO or FACILITY, its
common to query OWNER_ID, OWNER_TYPE, PERIOD_TYPE and
PERIOD_VALUE from PAR_LIMIT. Hence all these columns qualify for a
concatenated index.

46
Choosing the Best Column Order
 The commonly used column in the queries should be the initial or leading or
earliest columns.

 If a concatenated index is available on (OWNER, ANNOTATION_TYPE,


ANNOTATION_DATE, and ANNOTATION_TIME) then for the following
queries:

Would use the index

SELECT *
FROM PAR_ANNOTATION
WHERE OWNER = 6378859
AND ANNOTATION_TYPE=1
AND ANNOTATION_DATE=’01-MAR-03’
AND ANNOTATION_TIME= 188322;

47
Choosing the Best Column Order
Will not use the index.

SELECT *
FROM PAR_ANNOTATION
WHERE ANNOTATION_TYPE=1
AND ANNOTATION_DATE=’01-MAR-
03’
AND ANNOTATION_TIME= 188322;

Note: OWNER is the leading column

48
Index Merges
 If more than one column from a table appears in the WHERE clause and
there is no concatenated index on the columns concerned but there are
indexes on the individual columns, then Oracle may decide to perform an
index merge.
 In order to perform an index merge, Oracle retrieves all rows from each
index with matching values and then merges these two lists or result sets and
returns only those that appear in both the lists.
 Performing index merges is almost always less efficient than the equivalent
concatenated index. If you see an index merge (shown in execution plans
with AND EQUALS operator), consider creating an appropriate concatenated
index.

49
NULL Values
 NULLs are not indexed. When an indexed column is NULL, or when all
columns in a concatenated index are NULL, then the row concerned will not
have entry in the index. If you are using IS NULL in WHERE clause, it would
result in full-table scan.

 Creating an index on CLUSTER_PORTFOLIO_LINK which is NULL for all


single tile portfolios would not be useful if you have in the WHERE clause for
querying single tile portfolios IS NULL constraint.

SELECT *
FROM PAR_PORTFOLIO_DETAILS
WHERE CLUSTER_PORTFOLIO_LINK IS NULL.

Would result in Full Table Scan.

50
Functional Indexes
 Functional indexes allow you to create an index on a function or expression.
 The function or expression must be deterministic, which means that if a given
the same inputs it must always return the same outputs regardless of
changes in the environment or the date and time.

The following is a legal index definition

CREATE INDEX COR_LEGAL_NAME_INDX ON CORPORATE


(UPPER (LEGAL_NAME));

51
Functional Indexes
 The following is invalid because the value of the F_CRTD_ON change over
time:

CREATE OR REPLACE FUCNTION F_CRTD_ON (argDate DATE) IS


BEGIN
RETURN (SYSDATE – argDate);
END;

CREATE INDEX PL_CRTD_ON_INDX ON PAR_LIMIT


(F_CRTD_ON (CRTD_ON));

52
Foreign Keys
 You can declare referential integrity constraints to prevent rows being
inserted into detail (or child) tables that do not have a matching row in a
master (or parent table). This will also reduce unnecessary searches in
queries over data, which is not valid.

ALTER TABLE CORPORATE


ADD CONSTRAINT CORPORATE_FK
FOREIGN KEY (COUNTRY_RESIDENT)
REFERENCES COUNTRY_DIRECTORY
(CNTRY_CODE_ALPHBTC);

Note: All FOREIGN keys need to be indexed.

53
Understanding Different Join Methods
 Joins
 Join Order
 How Join Method Affects Join Order
 Illustrations

54
Joins
 Joins allow rows from two or more tables to be merged, usually based on
common key values.
 Oracle supports three join techniques
 The SORT-MERGE join
 The NESTED Loop Joins
 The HASH Join

55
The SORT MERGE Join
 The sort-merge join is a join method that does not require indexes. Oracle
sorts each table (or result set from a previous operation) on the column
values used to join the two tables. Oracle then merges the two sorted result
sets into one.
E.g. SELECT PL.OWNER_ID,
PL.LIMIT_AMOUNT,
CURR.NAME
FROM PAR_LIMIT PL,
CURRENCY_CIRECTORY CURR
WHERE
PL.LIMIT_CURR=CURR.CRNCY_CODE_ALPHBTC;
There are no indexes on LIMIT_CURR and CRNCY_CODE_ALPHBTC.

56
The NESTED LOOPS Join
 The nested loops join is a join algorithm that usually involves an index on at
least one of the tables.
 In a nested loops join, a full-table scan is done on one of the tables
(depends on the join-conditions applied) or result sets (probably either the
smaller table or the table or the table that does not have an index on the join
column). For every row found in the result set, a lookup-usually involving an
index-is performed on the second table and the matching row is retrieved.
 For a nested loop join ensure that the index used for the join contains as
many columns as possible from the WHERE clause.

57
The HASH Join
 In a hash join, a hash table (a sort of on-the-fly index) is constructed for the
larger of the two tables. The smaller table is then scanned, and the hash
table is used to find the matching rows in the larger table.
 This can work very well, especially if the hash table can fit in memory. It
performs well if the two tables are of different sizes.

58
Join Order
 The driving table-the first table in the join order-should be the one that has
the most selective and efficient WHERE clause condition. That is, eliminate
any rows from the result sets as early in the join order as possible. This will
reduce the number of rows returned in each step.
 If you are joining small subsets of the tables involved, try to use nested loops
for each subsequent join provided there is a supporting index.
 Make sure the indexes supporting the nested loops join contain all of the
columns in the WHERE clause for the table being joined.

59
How Join Method Effects Join Order
 If joining using the SORT-MERGE method, the join order is irrelevant. Each
table must be fully read, sorted, and merged. The order in which tables
appear in the FROM clause or which table is referenced in the USE_MERGE
hint is irrelevant.Both the queries will have same performance on changing
join order and also table referenced in USE_MERGE hint.

SELECT /*+ USE_MERGE (COUN) USE_MERGE (CURR) */


COUN.CNTRY_CODE_ALPHBTC,
COUN.CNTRY_NAME,
CURR.CRNCY_CODE_ALPHBTC,
CURR.NAME
FROM COUNTRY_DIRECTORY COUN,
CURRENCY_DIRECTORY CURR
WHERE
COUN.CRNCY_CODE_ALPHBTC=CURR.CRNCY_CODE_ALPHBTC;

60
How Join Method Effects Join Order

SELECT /*+ USE_MERGE (COUN) USE_MERGE (CURR) */


COUN.CNTRY_CODE_ALPHBTC,
COUN.CNTRY_NAME,
CURR.CRNCY_CODE_ALPHBTC,
CURR.NAME
FROM CURRENCY_DIRECTORY CURR,
COUNTRY_DIRECTORY COUN
WHERE
COUN.CRNCY_CODE_ALPHBTC=CURR.CRNCY_CODE_ALPHBTC;

61
How Join Method Effects Join Order

 If joining using index-based NESTED loops, the second table should be


smaller of the two. This is because the overhead of performing N index
lookups is higher than the overhead of full-table scanning of N rows.
 If joining using the HASH-JOIN method, the second table should also be the
smaller of the two. The hash-join algorithm works best when the hash table
fits into memory and is on the smaller of the two tables. However, if the
second table is too small, then this advise can be misleading, since the table
might be too small to get any benefit from being used.

62
ILLUSTRATION
The size of each table is

PAR_LIMIT 44,97,342 records


PAR_PORTFOLIO_DETAILS 63,122 records
PAR_BA_PRODUCTTYPE 110 records

63
ILLUSTRATION
Query 1:
SELECT PL.OWNER_ID,
PL.PERIOD_VALUE,
PL.PERIOD_TYPE,
PL.LIMIT_TYPE_ID,
PL.LIMIT_AMOUNT,
PBAP.SHORT_NAME
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD,
PAR_BA_PRODUCTTYPE PBAP
WHERE PL.OWNER_TYPE=1
AND PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PPD.CLUSTER_PORTFOLIO_LINK IS
NULL
AND
PPD.BA_PRODTYPE_ID=PBAP.BA_PRODTYPE_ID;
64
ILLUSTRATION
Execution Plan
SELECT STATEMENT
HASH JOIN
TABLE ACCESS (FULL) OF 'PAR_BA_PRODUCTTYPE’
MERGE JOIN
SORT (JOIN)
TABLE ACCESS (FULL) OF
'PAR_PORTFOLIO_DETAILS’
SORT (JOIN)
TABLE ACCESS (FULL) OF 'PAR_LIMIT'

65
Execution Plan
 The plan shows that PAR_PORTFOLIO_DETAILS, PAR_LIMIT are having
FULL table scan and the join the optimizer id taking is MERGE-SORT and
the result set is having a HASH-JOIN with PAR_BA_PRODUCTTYPE.
Here the query performance was found to be very bad. It took about 4-5
minutes to get the result set.

 The poor performance can be judged as all tables are having a full table
scan and also for PPD and PL are undergoing a MERGE-SORT which is
very much inefficient due to bigger sizes of the tables. HASH JOIN seems
to be OK on PAR_BA_PRODUCTTYPE as this one is a small table and
FULL table scan would be faster and also less memory required for a hash
table.

 But we also observe that though joins are all on indexed columns none are
being used. This is a good example where optimizer gave a not really a
optimum plan (All tables were analyzed before taking this plan).

66
ILLUSTRATION
Query 2:

 To improve the performance, first step to take would be to avoid MERGE-


SORT (HASH is totally ruled out as PPD and PL are too big tables and lot of
memory is required for hash tables). So Nested loop join was used for PPD
along with ORDERED hint.

67
ILLUSTRATION
SELECT /*+ ORDERED USE_NL (PPD) */
PL.OWNER_ID,
PL.LIMIT_AMOUNT,
PL.PERIOD_VALUE,
PL.PERIOD_TYPE,
PL.LIMIT_TYPE_ID,
PBAP.SHORT_NAME
FROM PAR_LIMIT PL,
PAR_PORTFOLIO_DETAILS PPD,
PAR_BA_PRODUCTTYPE PBAP
WHERE PL.OWNER_TYPE=1
AND PL.OWNER_ID = PPD.PORTFOLIO_ID
AND PPD.CLUSTER_PORTFOLIO_LINK IS NULL
AND PPD.BA_PRODTYPE_ID=PBAP.BA_PRODTYPE_ID;

68
Execution plan
SELECT STATEMENT
HASH JOIN
TABLE ACCESS (FULL) OF 'PAR_BA_PRODUCTTYPE’
NESTED LOOPS
TABLE ACCESS (FULL) OF 'PAR_LIMIT’
TABLE ACCESS (BY INDEX ROWID) OF 'PAR_PORTFOLIO_DETAILS’
INDEX (RANGE SCAN) OF 'PAR_PORTFOLIO_DETAILS_PK’

 Now you can see that PRIMARY KEY on PAR_PORTFOLIO_DETAILS is being


used. Hence the query would result in a better performance.
 The HASH JOIN on PBAP is fine as this one is a small table. This can be avoided
using USE_NL on PBAP also.

69
Analyzing the Query Plan

 EXPLAIN PLAN
 How to get the EXECUTION PLAN for the query?
 How to read the plan?

70
Explain Plan
 Queries can be analyzed using EXPLAIN PLAN Command
 The EXPLAIN PLAN command displays the execution plan chosen by the
Oracle optimizer for SELECT, UPDATE, INSERT, and DELETE statements.
 A statement's execution plan is the sequence of operations that Oracle
performs to execute the statement. By examining the execution plan, you can
see exactly how Oracle executes your SQL statement.

 EXPLAIN PLAN results alone cannot tell you which statements will perform
well, and which badly. For example, just because EXPLAIN PLAN indicates
that a statement will use an index does not mean that the statement will run
quickly. The index might be very inefficient!

 Use EXPLAIN PLAN to determine the access plan, and to test modifications
to improve the performance.

71
Explain Plan

 EXPLAIN PLAN tells you the execution plan the optimizer would choose if it
were to produce an execution plan for a SQL statement at the current time,
with the current set of initialization and session parameters.

 But this is not necessarily the same as the plan that was used at the time the
given statement was actually executed. The optimizer bases its analysis on
many pieces of data--some of which may have changed!

72
How to get the EXECUTION PLAN for the query?
 TOAD has the facility to see the query execution plan.
 If you are not having the TOAD, then on the SQL Plus session issue the
following command
 SQL > SET AUTOTRACE TRACEONLY EXP
 Run your query.

73
How to Read the Plan?
 It is easier to form a tree structure of the plan where each node is a statement
to be executed.
 Then start reading from bottom to top.

74
Execution Plan
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=56132
card=62639 Bytes=2818755)
 1 0 HASH JOIN (Cost=56132 Card=62639 Bytes=2818755)
 2 1 TABLE ACCESS (FULL) OF 'PAR_BA_PRODUCTTYPE' (Cost=1
Card =110 Bytes=2090)

 3 1 MERGE JOIN (Cost=56097 Card=62639 Bytes=1628614)


 4 3 SORT (JOIN) (Cost=568 Card=62639 Bytes=626390)
 5 4 TABLE ACCESS (FULL) OF 'PAR_PORTFOLIO_DETAILS'
(Cost =55 Card=62639 Bytes=626390)
 6 3 SORT (JOIN) (Cost=55529 Card=2260835 Bytes=36173360)
 7 6 TABLE ACCESS (FULL) OF 'PAR_LIMIT' (Cost=15481 Card=
2260835 Bytes=36173360)

75
Tree Diagram

0 SELECT STATEMENT

1 HASH JOIN

TABLE ACCESS (FULL) 2 3 MERGE JOIN


OF
PAR_BA_PRODUCTTYPE

SORT (JOIN) 4 6 SORT (JOIN)

TABLE ACCESS (FULL) OF 5 7 TABLE ACCESS (FULL)


PAR_PORTFOLIO_DETAILS OF PAR_LIMIT

76
SQL Over Database Links
 A database link defines a connection from one database instance to another.
 Used in conjunction with synonyms, database links can make remote objects
appear to be local as far as applications and users are concerned.

77
Database links
 A database link created in DB_A connecting to DB_B, can allow users in
DB_A to see remote objects as if they were local.

DB_A DB_B

Table T1 Database Link (unidirectional) Table T1

78
Design Guidelines

 Interdatabase Referential Integrity


 Naming Conventions
 Distributed Queries and Transactions
 Control the Driving Site
 Control Index Usage

79
Interdatabase referential integrity

 Avoid interdatabase referential integrity:


 DDL operations are not allowed on a remote database.
 It makes no sense, for example, to separate tables with parent-child
relationships into two separate databases.

80
Interdatabase referential integrity
 If in certain cases this is unavoidable, say CUSTOMERS table exists at
Headquarters database and ORDERS being booked at regional site, then
interdatabase referential integrity can achieved by
Method 1:
 Use of BEFORE ROW trigger on the ORDERS table that checks for the
existence of a corresponding CUSTOMER_ID record in the remote
CUSTOMERS table whenever a record is inserted.
Drawbacks:
 Dependency on availability of remote database always. To avoid this
dependency to some extent maybe you can postpone the validation process
till the database becomes available using some STATUS columns.

81
Interdatabase referential integrity

Method 2:
 Create a local snapshot of the master table using WITH PRIMARY KEY
clause. Snapshots can be refreshed at regular intervals of time.

82
Naming Conventions
 Keep names unique across all schema in the distributed environment. This is
not a requirement, but it makes administration less confusing.
Example:
COUNTRY and COUNTRIES can be a very POOR choice.
Why?
 Because this name is likely to be common to more than one application and
therefore several schema. When tables are replicated, either as master
tables or snapshots, it may not be possible to create identical public
SYNONYMS in all the databases.

83
Naming Conventions
 You can avoid name space collisions by prefixing table names with two or
three characters that associate them with a given application.
 e.g., COUNTRIES table that the order entry system uses would become
OE_COUNTRIES.
 Check the length of table names. Various replication-related objects have
generated names that are based on a table name such as the
SNAP$_CUSTOMERS, If the generated name ends up exceeding more than
30 characters, then the name needs to be truncated, which may result in
naming conflicts.
 Keep table names short – never more than 23 characters

84
Distributed Queries and Transactions
Distributes queries fall into two categories:
 Queries whose target tables are all located in single remote site also called
remote queries.
 When an application initiates a remote query, the optimizer recognizes that
the tables involved are in a single database, and it passes the entire query to
the remote database, which executes it with the same efficiency as a local
query. The remote database then sends that data back.
 There are really NO optimisations that you can make on a remote query,
although if you find that your application is performing a high number of
remote queries you might consider relocating the data to the local database
either in its entirety or as snapshots, thereby reducing the network traffic.

85
Distributed Queries and Transactions
Queries whose target tables are located at multiple sites.
Example:
SELECT CUSTOMER_NAME,
TO_CHAR(ORDER_DATE,’Month’) month,
SUM(ORDER_TOTAL) monthly_sales
FROM CUSTOMERS@DB_LINK_1 C,
ORDERS@DB_LINK_2 O
WHERE C.CUSTOMER_ID = O.CUSTOMER_ID
GROUP BY CUSTOMER_NAME,
TO_CHAR(ORDER_DATE, ‘Month’),
TO_CHAR(ORDER_DATE,’MM’)
ORDER BY CUSTOMER_NAME, TO_CHAR (ORDER_DATE, ‘MM’);

86
Distributed Queries and Transactions
 Oracle must first determine which columns are in which tables by querying
the data dictionaries at DB_LINK_1 and DB_LINK_2. Then it retrieves the
data from these two remote databases. These queries return all records from
the remote tables.

87
Distributed Queries and Transactions

 In DB_LINK_1
SELECT “CUSTOMER_ID”,”CUSTOMER_NAME” FROM “CUSTOMERS” C
 In DB_LINK_2
SELECT “CUSTOMER_ID”,”ORDER_TOTAL” FROM “ORDERS” O
 Oracle performs joins and aggregations of this data at the site at which the
query originated, also known as driving site.

88
Control the Driving Site
 Be mindful of how much data is being shipped to the driving site. Whenever
possible, originate distributed queries from the database that holds the most target
data, or use the DRIVING_SITE hint to force a specific site to be the driving site.
 In previous query, we could force the driving site to be DB_LINK_2 since ORDERS
table is bigger than CUSTOMERS table:
SELECT /*+ DRIVING_SITE (ORDERS) */
CUSTOMER_NAME,
TO_CHAR(ORDER_DATE,’Month’) month,
SUM(ORDER_TOTAL) monthly_sales
FROM CUSTOMERS@DB_LINK_1 C,
ORDERS@DB_LINK_2 O …
 This hint forces the site with the ORDERS table to perform the joins and
aggregations the query requires.

89
Control the Driving Site
 Another way to force a specific site to be the driving site is to create views of
remote data at the desired site. The views tricks the optimiser into thinking
that the table is at the site.
For same example,
At DB_LINK_2
CREATE VIEW CUSTOMERS_VIEW AS
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM CUSTOMERS@DB_LINK_2

90
Control the Driving Site
And at the site originating the query
SELECT
CUSTOMER_NAME,
TO_CHAR(ORDER_DATE,’Month’) month,
SUM(ORDER_TOTAL) monthly_sales
FROM COUSTOMERS_VIEW@DB_LINK_2 C,
ORDERS@DB_LINK_2 O

91
Control Index Usage

 The only way that Oracle will even consider the use of indexes in distributed
queries is if the database uses the cost-based optimizer, if your database is
configured to use the rule-based optimizer, your distributed queries are
performing full table scans.

92
Partitioning

 Partitioning is the ability of the database to take very large tables or indexes
and physically break them into smaller, more manageable pieces.

93
Partitioning Schemes
 Range-Partition Tables
 Hash Partitioning
 List Partitioning (New in Oracle 9i)

94
Range Partitioning Tables
 Simply breaking up data based on ranges. Performance is best when data is
evenly distributed within each range, so your KNOWLEDGE of DATA is very
important.
CREATE TABLE DEPT (
DEPTNO NUMBER (2),
DEPT_NAME VARCHAR2 (30))
PARTITION BY RANGE (DEPTNO)
(
PARTITION D1 VALUES LESS THAN (10) TABLESPACE DEPT1,
PARTITION D2 VALUES LESS THAN (20) TABLESPACE DEPT2,
PARTITION D1 VALUES LESS THAN (MAXVALUE) TABLESPACE
DEPT3);

95
In KS Project
CREATE TABLE KS_PD_DY_GCO
(
BP_ID NUMBER(12) not null,
BUSINESS_DATE DATE not null,
…..
RAROC_DATE DATE,
PARTITION_KEY NUMBER(2)
)
PARTITION BY RANGE (PARTITION_KEY)
(
PARTITION KS_PD_DY_GCO_P1 VALUES LESS THAN (2),
PARTITION KS_PD_DY_GCO_P2 VALUES LESS THAN (3),
PARTITION KS_PD_DY_GCO_P3 VALUES LESS THAN (4),
PARTITION KS_PD_DY_GCO_P4 VALUES LESS THAN (5),
PARTITION KS_PD_DY_GCO_P5 VALUES LESS THAN (6),
PARTITION KS_PD_DY_GCO_P6 VALUES LESS THAN (7));
96
Range Partitioning (Multicolumn)

 In addition to using single columns to range-partition data, you can also use
multi columns that more commonly map to business processes.
 The following example shows how to extend simple range partitioning to
multiple columns that make up quarterly data-a perfect application for multi-
column partitioning. The columns sale_yr, sale_mth and sale_day are the
ranges need to specify a given quarter.

97
Range Partitioning (Multicolumn)
CREATE TABLE CUST_SALES (
ACCT_NO NUMBER (5),
CUST_NAME CHAR (30),
SALE_DAY INTEGER NOT NULL,
SALE_MTH INTEGER NOT NULL,
SALE_YR INTEGER NOT NULL)
PARTITION BY RANGE (SALE_YR, SALE_MTH, SALE_DAY)
(
PARTITION CUST_SALES_Q1 VALUES LESS THAN (1998, 04, 01) TABLESPACE USERS1,
PARTITION CUST_SALES_Q2 VALUES LESS THAN (1998, 07, 01) TABLESPACE USERS2,
PARTITION CUST_SALES_Q3 VALUES LESS THAN (1998, 10, 01) TABLESPACE USERS3,
PARTITION CUST_SALES_Q4 VALUES LESS THAN (1999, 01, 01) TABLESPACE
USERS4);

98
Hash Partitioning
 For hash partitioning, you need only specify the number of partitions and
where you want them to be physically placed;
 Oracle database then automatically inserts rows into partitions based on
hash of the partition key.
 It is important to specify a power of 2 for number of hash partitions (define 2
(power n) partitions); otherwise uneven distributions of rows may occur.
 You can specify the storage for the whole table and the tablespace for the
partitions.
 Local indexes are always equi partitioned (on the same partitioning key and
partition break points).

99
Example
 Here’s an example of a hash-partitioned table that splits the table into four
parts based on the hash of the partition key, acct_no.

CREATE TABLE CUST_SALES_HASH (


ACCT_NO NUMBER (5),
CUST_NAME CHAR (30))
PARTITION BY HASH (ACCT_NO) PARTITIONS 4
STORE IN (USERS1, USERS2,
USERS3,
USERS4);

100
List Partitioning(New in Oracle9i)
 List portioning is new with Oracle9i. It allows for a more flexible means of
partitioning when you understand your data very well and can balance it
equally by segmenting data into lists of values. The example below partitions
the table based on lists of department names.
 List partitioning is similar to range partitioning, but note that there is no
“maxvalue” option here – if a value other than those specified in the list is
inserted, Oracle database will reject that record.

101
List Partitioning(New in Oracle9i)
CREATE TABLE DEPT_PART (
DEPTNO NUMBER (2),
DNAME VARCHAR2 (14),
LOC VARCHAR2 (13))
PARTITION BY LIST (DNAME)
(
PARTITION D1_EAST VALUES (‘NEW YORK’),
PARTITION D2_WEST VALUES (‘SAN FRANCISCO’, ‘LOS
ANGELES’),
PARTITION D3_SOUTH VALUES (‘ATLANTA’,’DALLAS’,’HOUSTON’),
PARTITION D4_NORTH VALUES (‘CHICAGO’,’DETROIT’));

102
Schema Design
Fundamental Schema Design Principles
Let the Database Enforce Data Integrity
 Data Integrity will be compromised at some point in time
 Server-enforced integrity is blindingly fast
 The database provides more information

103
Schema Design
 Data Integrity will be compromised at some point in time.Virtually every
developed system that chooses to enforce foreign key outside the database
has orphaned child rows (child rows without parents).
 Check the following query in such systems, you will find it returns some rows
SELECT foreign_key_columns
FROM child_table
MINUS
SELECT primary_key_columns
FROM parent;
 You might also find NULL values where none should be and data that does
not conform to your business rules (out-of-range data for example). These
arise from an inconsistent application of business rules.

104
Schema Design
 Server-enforced integrity is blindingly fast.Otherwise, imagine code that
needs to be written on client side for every integrity to be maintained.

 The database provides more information. By keeping the integrity rules in


the database, your system is infinitely more self-documenting. A simple query
tells you what relates to what and how.

105
Use the Correct Database
DO NOT
 Use a string to store dates or times
 Use a string to store numbers
 Use VARCHAR2(4000) to store all strings
 Use CHAR(2000) to store all strings, wasting tons of space and forcing the
use of a lot of TRIM function calls
 Put text in a BLOB (raw) type

106
Simple Rule
 Put dates in dates, numbers in numbers, and strings in strings. Never use a
datatype to store something other than what it was designed for, and use the
most specific type possible.
 Furthermore, only compare dates to dates, strings to strings, and numbers to
numbers.

107
Simple Rule
When dates and numbers are stored in strings, or stored using inappropriate
lengths, your system suffers:
 You lose the edit upon insertion to the database, verifying that your dates are
actual dates and numbers are valid numbers.
 You lose performance
 You potentially increase storage needs
 You definitely decrease data integrity

108
Common Errors

Common Errors due to incorrect use of datatypes


 ORA-01722: invalid number
 ORA-01858: a non-numeric character was found where a numeric was
expected.

109
How the Data Integrity Suffers?

 Systems that use strings for dates or numbers will have some records with
dates that are not valid and numbers that are not numbers.
 Functions written to convert strings to dates return NULL when the dates
won’t convert.
01/02/03 is what?
YY/MM/DD or DD/MM/YY

110
How the Performance Suffers?

 Dates in strings vs Dates in Dates: The first one during any comparison is a
search of characters and second one oracle understands it is between two
dates.
 Numbers in dates needs to be handled using TO_NUMBER functions, else
comparisons are again character comparison.
 Oracle can go in for explicit function conversions. Oracle might choose to
either convert a column/ value (We have already talked about it).

111
How to Increase storage Requirements?

 Use most specific data type and data length.


 VARCHAR2(20) vs VARCHAR2(2000) – what is the difference? How does it
matter?
 Common statements that flow around
“Oracle just stores the length, etc … “
“No Overheads !!!…”

112
Bottlenecks

 What is going to happen when users pull this in a query that formats each
field based on width of the column in the database?
“They will see one column and need to scroll way over to see the second, the
third, and so on”
 Say developers start to build a data-entry screen to put data into the
database.
“That code filed can be 4,000 characters long and that first name can be
4000 characters long!”
“Sorting of data???”

113
Bottlenecks

 A code prepares a query that selects 10 columns that are VARCHAR2. Say,
developers like to fetch them into an Array of say 100 rows (very typical).
 RAM allocation = 4000 * 10 * 100 = 4 MB (!!!)
 Can we calculate for 1000 rows/ 10,000 rows?
 A CHAR(2000) will consume 2000 bytes of storage whether you put in a
letter a, the string ‘hello world’ or 2,000 characters. A CHAR is always blank-
padded.
 In above case if you happen to use an index – storage-related problems?

114
Analytical Functions

 Analytical functions make certain computations possible which used


to be impossible to in SQL or which was very inefficient to perform
using normal SQL.
An example
“We can compute values in a row, as in
SELECT C1+C2 FROM TABLE
Analytics allows us to compute values across rows
Something like aggregate and also show with details like Aggregate
limits for a counterparty but display along with the facilities as well.

115
Analytical Functions – Details and Summary Data
Counterparty 21 and 23 has 4 and 2 facilities and corresponding
limits as given below

BP_ID FACILITY_ID LIMIT_AMOUNT


21 10000 100
21 10001 200
21 10002 300
21 10003 400
23 20000 500
23 20001 600

116
Bottlenecks
 Requirement: List sum of limits against a counterparty and also display facilities, i.e., sort of summary and detail as
well.

SELECT BP_ID, SUM(LIMIT_AMOUNT)


FROM TEMP
GROUP BY BP_ID
Would give

BP_ID SUM(LIMIT_AMOUN
T)
21 1000 Cannot display facilities !

23 1100

117
Advantages
Using analytical functions it becomes easier
SELECT FACILITY_ID, BP_ID,
SUM(LIMIT_AMOUNT) OVER (PARTITION BY
BP_ID) AS SUM_LIM
FROM TEMP
Would give
FACILITY_ID BP_ID SUM_LIM
10000 21 1000
PARTITION identifies rows
10001 21 1000
to aggregate. Rows must
10002 21 1000
have the same BP_ID as
10003 21 1000 the detail row
20000 23 1100
20001 23 1100
118
Analytical Functions – Ranking Results
 Ranking cannot be done just by simple sorting
 Position in sorted list is different from rank in list
 Ties are given different positions but the same rank
BP_ID FACILITY_I LIMIT_AMOUN SORT_ RANK DENSE_
D T ORDER RANK DENSE_RANK
does not skip
21 10000 600 1 1 1 rank due to tie.
21 10001 500 2 2 2
21 10002 300 3 3 3
21 10003 300 4 3 3
23 20000 200 5 5 4
23 20001 100 6 6 5
SELECT BP_ID, FACILITY_ID, LIMIT_AMOUNT,
RANK() OVER (ORDER BY LIMIT_AMOUNT DESC) RANK_VALUES,
DENSE_RANK() OVER (ORDER BY LIMIT_AMOUNT DESC)
DENSE_RANK_VALUES
FROM TEMP 119
Analytical Functions
SELECT BP_ID, FACILITY_ID, LIMIT_AMOUNT,
RANK() OVER (ORDER BY LIMIT_AMOUNT DESC)
RANK_VALUES
FROM TEMP
WHERE RANK_VALUES <=4
Would give
BP_I FACILITY LIMIT_AMOU RAN
D _ID NT K
21 10000 600 1
21 10001 500 2
21 10002 300 3
21 10003 300 3
120
Analytical Functions
SELECT BP_ID, FACILITY_ID, LIMIT_AMOUNT,
DENSE_RANK() OVER (ORDER BY LIMIT_AMOUNT
DESC) DENSE_RANK_VALUES
FROM TEMP
WHERE RANK_VALUES <=4
Would give
BP_I FACILITY_I LIMIT_AMOU RANK
D D NT
21 10000 600 1
21 10001 500 2 RANK doesn’t
21 10002 300 3 return this row
due to tie
21 10003 300 3
23 20000 200 4
121
Analytical Functions
TOP 5 employees in terms of hours worked
SELECT *
FROM ( SELECT EMP_NO, SUM(HOURS) AS SUM_HRS
FROM TIME_SHEETS
GROUP BY EMP_NO )
WHERE 5 >=
( SELECT COUNT(COUNT(*))
FROM TIME_SHEETS
GROUP BY EMP_NO
HAVING SUM(HOURS)>SUM_HRS )
TIME_SHEETS contain 55,00,000 entries (50,000 employees * 22 days * 5 tasks)
Imagine execution time – Try it, probably you will find query takes hell lot of time (MAY NOT
come out in your lifetime as well)

122
Analytical Function
Use RANK function instead
SELECT *
FROM
(
SELECT EMP_NO, SUM(HOURS) AS SUM_HRS,
RANK() OVER (ORDER BY SUM(HOURS) DESC ) AS RNK
FROM TIME_SHEETS
GROUP BY EMP_NO
)
WHERE RNK <=5
Would give result very quickly

123

You might also like