Oracle9i Database Performance Planning
Release 2 (9.2)
March 2002
Part No. A96532-01
Oracle9i Database Performance Planning, Release 2 (9.2)
Contributors: Jorn Bartels, Maria Colgan, Michele Cyran, Bjorn Engsig, Cecilia Gervasio, Connie Dialeris
Green, Mattias Jankowitz, Peter Kilpatrick, Anjo Kolk, JP Polk, Virag Saksena, Sabrina Whitehouse,
Graham Wood
The Programs (which include both the software and documentation) contain proprietary information of
Oracle Corporation; they are provided under a license agreement containing restrictions on use and
disclosure and are also protected by copyright, patent and other intellectual and industrial property
laws. Reverse engineering, disassembly or decompilation of the Programs, except to the extent required
to obtain interoperability with other independently created software or as specified by law, is prohibited.
The information contained in this document is subject to change without notice. If you find any problems
in the documentation, please report them to us in writing. Oracle Corporation does not warrant that this
document is error-free. Except as may be expressly permitted in your license agreement for these
Programs, no part of these Programs may be reproduced or transmitted in any form or by any means,
electronic or mechanical, for any purpose, without the express written permission of Oracle Corporation.
If the Programs are delivered to the U.S. Government or anyone licensing or using the programs on
behalf of the U.S. Government, the following notice is applicable:
Restricted Rights Notice Programs delivered subject to the DOD FAR Supplement are "commercial
computer software" and use, duplication, and disclosure of the Programs, including documentation,
shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement.
Otherwise, Programs delivered subject to the Federal Acquisition Regulations are "restricted computer
software" and use, duplication, and disclosure of the Programs shall be subject to the restrictions in FAR
52.227-19, Commercial Computer Software - Restricted Rights (June, 1987). Oracle Corporation, 500
Oracle Parkway, Redwood City, CA 94065.
The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently
dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup,
redundancy, and other measures to ensure the safe use of such applications if the Programs are used for
such purposes, and Oracle Corporation disclaims liability for any damages caused by such use of the
Programs.
Oracle is a registered trademark, and Oracle Store, Oracle9i, PL/SQL, and SQL*Plus are trademarks or
registered trademarks of Oracle Corporation. Other names may be trademarks of their respective
owners.
Contents

Preface

Application Modeling
Testing, Debugging, and Validating a Design
Deploying New Applications
    Rollout Strategies
    Performance Checklist
Send Us Your Comments
Oracle9i Database Performance Planning, Release 2 (9.2)
Part No. A96532-01
Oracle Corporation welcomes your comments and suggestions on the quality and usefulness of this
document. Your input is an important part of the information used for revision.
■ Did you find any errors?
■ Is the information clearly presented?
■ Do you need more information? If so, where?
■ Are the examples correct? Do you need more examples?
■ What features did you like most?
If you find any errors or have any other suggestions for improvement, please indicate the document
title and part number, and the chapter, section, and page number (if available). You can send com-
ments to us in the following ways:
■ Electronic mail: [email protected]
■ FAX: (650) 506-7227 Attn: Server Technologies Documentation Manager
■ Postal service:
Oracle Corporation
Server Technologies Documentation
500 Oracle Parkway, Mailstop 4op11
Redwood Shores, CA 94065
USA
If you would like a reply, please give your name, address, telephone number, and (optionally) elec-
tronic mail address.
If you have problems with the software, please contact your local Oracle Support Services.
Preface
This book describes ways to improve Oracle performance by starting with good
application design and using statistics to monitor application performance. It
explains the Oracle Performance Improvement Method, as well as emergency
performance techniques for dealing with performance problems.
This preface contains these topics:
■ Audience
■ Organization
■ Related Documentation
■ Conventions
■ Documentation Accessibility
Audience
Oracle9i Database Performance Planning is a high-level aid for people responsible for
the operation, maintenance, and performance of Oracle. To use this book, you could
be a database administrator, application designer, programmer, or manager. You
should be familiar with Oracle9i, the operating system, and application design
before reading this manual.
Organization
This document contains:
Related Documentation
Before reading this manual, you should have already read Oracle9i Database
Concepts, the Oracle9i Application Developer’s Guide - Fundamentals, and the Oracle9i
Database Administrator’s Guide.
For more information about Oracle Enterprise Manager and its optional
applications, see Oracle Enterprise Manager Concepts Guide and Oracle Enterprise
Manager Administrator’s Guide.
For more information about tuning the Oracle Application Server, see the Oracle
Application Server Performance and Tuning Guide.
Many of the examples in this book use the sample schemas of the seed database,
which is installed by default when you install Oracle. Refer to Oracle9i Sample
Schemas for information on how these schemas were created and how you can use
them yourself.
In North America, printed documentation is available for sale in the Oracle Store at
http://oraclestore.oracle.com/
Customers in Europe, the Middle East, and Africa (EMEA) can purchase
documentation from
http://www.oraclebookshop.com/
If you already have a username and password for OTN, then you can go directly to
the documentation section of the OTN Web site at
http://otn.oracle.com/docs/index.htm
Conventions
This section describes the conventions used in the text and code examples of this
documentation set. It describes:
■ Conventions in Text
■ Conventions in Code Examples
Conventions in Text
We use various conventions in text to help you more quickly identify special terms.
The following table describes those conventions and provides examples of their use.
Convention: Bold
Meaning: Bold typeface indicates terms that are defined in the text or terms that
appear in a glossary, or both.
Example: When you specify this clause, you create an index-organized table.

Convention: Italics
Meaning: Italic typeface indicates book titles, emphasis, syntax clauses, or
placeholders.
Examples: Oracle9i Database Concepts
You can specify the parallel_clause.
Run Uold_release.SQL where old_release refers to the release you installed
prior to upgrading.

Convention: UPPERCASE monospace (fixed-width font)
Meaning: Uppercase monospace typeface indicates elements supplied by the
system. Such elements include parameters, privileges, datatypes, RMAN keywords,
SQL keywords, SQL*Plus or utility commands, packages and methods, as well as
system-supplied column names, database objects and structures, user names, and
roles.
Examples: You can specify this clause only for a NUMBER column.
You can back up the database using the BACKUP command.
Query the TABLE_NAME column in the USER_TABLES data dictionary view.
Specify the ROLLBACK_SEGMENTS parameter.
Use the DBMS_STATS.GENERATE_STATS procedure.

Convention: lowercase monospace (fixed-width font)
Meaning: Lowercase monospace typeface indicates executables and sample
user-supplied elements. Such elements include computer and database names, net
service names, and connect identifiers, as well as user-supplied database objects
and structures, column names, packages and classes, user names and roles,
program units, and parameter values.
Examples: Enter sqlplus to open SQL*Plus.
The department_id, department_name, and location_id columns are in the
hr.departments table.
Set the QUERY_REWRITE_ENABLED initialization parameter to true.
Connect as oe user.
The following table describes typographic conventions used in code examples and
provides examples of their use.
Convention: [ ]
Meaning: Brackets enclose one or more optional items. Do not enter the brackets.
Example: DECIMAL (digits [ , precision ])

Convention: { }
Meaning: Braces enclose two or more items, one of which is required. Do not enter
the braces.
Example: {ENABLE | DISABLE}

Convention: |
Meaning: A vertical bar represents a choice of two or more options within brackets
or braces. Enter one of the options. Do not enter the vertical bar.
Examples: {ENABLE | DISABLE}
[COMPRESS | NOCOMPRESS]

Convention: ... (horizontal ellipsis points)
Meaning: Horizontal ellipsis points indicate either that we have omitted parts of the
code that are not directly related to the example, or that you can repeat a portion of
the code.
Examples: CREATE TABLE ... AS subquery;
SELECT col1, col2, ... , coln FROM employees;

Convention: . . . (vertical ellipsis points)
Meaning: Vertical ellipsis points indicate that we have omitted several lines of code
not directly related to the example.
Example:
SQL> SELECT NAME FROM V$DATAFILE;
NAME
------------------------------------
/fs1/dbs/tbs_01.dbf
/fs1/dbs/tbs_02.dbf
.
.
.
/fs1/dbs/tbs_09.dbf
9 rows selected.

Convention: Other notation
Meaning: You must enter symbols other than brackets, braces, vertical bars, and
ellipsis points as shown.
Examples: acctbal NUMBER(11,2);
acct CONSTANT NUMBER(4) := 3;

Convention: Italics
Meaning: Italicized text indicates variables for which you must supply particular
values.
Example: CONNECT SYSTEM/system_password

Convention: UPPERCASE
Meaning: Uppercase typeface indicates elements supplied by the system. We show
these terms in uppercase in order to distinguish them from terms you define. Unless
terms appear in brackets, enter them in the order and with the spelling shown.
However, because these terms are not case sensitive, you can enter them in
lowercase.
Examples: SELECT last_name, employee_id FROM employees;
SELECT * FROM USER_TABLES;
DROP TABLE hr.employees;
Convention: lowercase
Meaning: Lowercase typeface indicates programmatic elements that you supply.
For example, lowercase indicates names of tables, columns, or files.
Examples: SELECT last_name, employee_id FROM employees;
sqlplus hr/hr
Documentation Accessibility
Our goal is to make Oracle products, services, and supporting documentation
accessible, with good usability, to the disabled community. To that end, our
documentation includes features that make information available to users of
assistive technology. This documentation is available in HTML format, and contains
markup to facilitate access by the disabled community. Standards will continue to
evolve over time, and Oracle Corporation is actively engaged with other
market-leading technology vendors to address technical obstacles so that our
documentation can be accessible to all of our customers. For additional information,
visit the Oracle Accessibility Program Web site at
http://www.oracle.com/accessibility/
1 Designing and Developing for Performance
Good system performance begins with design and continues throughout the life of
your system. Carefully consider performance issues during the initial design phase,
and it will be easier to tune your system during production.
This chapter contains the following sections:
■ Oracle’s New Methodology
■ Understanding Investment Options
■ Understanding Scalability
■ System Architecture
■ Application Design Principles
■ Workload Testing, Modeling, and Implementation
■ Deploying New Applications
Understanding Scalability
The word scalability is used in many contexts in development environments. The
following section provides an explanation of scalability that is aimed at application
designers and performance specialists.
What is Scalability?
Scalability is a system’s ability to process more workload, with a proportional
increase in system resource usage. In other words, in a scalable system, if you
double the workload, then the system uses twice as many system resources.
This sounds obvious, but due to conflicts within the system, resource usage
might more than double when the workload doubles.
Examples of bad scalability due to resource conflicts include the following:
■ Applications requiring significant concurrency management as user
populations increase
■ Increased locking activities
■ Increased data consistency workload
■ Increased operating system workload
■ Transactions requiring increases in data access as data volumes increase
■ Poor SQL and index design resulting in a higher number of logical I/Os for the
same number of rows returned
■ Reduced availability, because database objects take longer to maintain
An application is said to be unscalable if it exhausts a system resource to the point
where no more throughput is possible when its workload is increased. Such
applications result in fixed throughputs and poor response times.
Examples of resource exhaustion include the following:
■ Hardware exhaustion
■ Table scans in high-volume transactions causing inevitable disk I/O shortages
■ Excessive network requests, resulting in network and scheduling bottlenecks
■ Memory allocation causing paging and swapping
■ Excessive process and thread allocation causing operating system thrashing
This means that application designers must create a design that uses the same
resources, regardless of user populations and data volumes, and does not put loads
on the system resources beyond their limits.
Internet Scalability
Applications that are accessible through the Internet have more complex
performance and availability requirements. Some applications are designed and
written only for Internet use, but even typical back-office applications, such as a
general ledger application, might require some or all data to be available online.
Characteristics of Internet age applications include the following:
■ Availability 24 hours a day, 365 days a year
■ Unpredictable and imprecise number of concurrent users
■ Difficulty in capacity planning
■ Availability for any type of query
■ Multitier architectures
■ Stateless middleware
■ Rapid development timescale
■ Minimal time for testing
[Figure 1–1: required workload increasing over time]
Figure 1–1 illustrates the classic Internet/e-business and demand growth curve,
with demand growing at an increasing rate. Applications must scale with the
increase of workload and also when additional hardware is added to support
increasing demand. Design errors can cause the implementation to reach its
maximum, regardless of additional hardware resources or re-design efforts.
Internet applications are challenged by very short development timeframes with
limited time for testing and evaluation. However, bad design generally means that
at some point in the future, the system will need to be re-architected or
re-implemented. If an application with known architectural and implementation
limitations is deployed on the Internet, and if the workload exceeds the anticipated
demand, then there is a real chance of failure in the future. From a business
perspective, poor performance can mean a loss of customers. If Web users do not
get a response in seven seconds, then the user’s attention could be lost forever.
In many cases, the cost of re-designing a system with the associated downtime costs
in migrating to new implementations exceeds the costs of properly building the
original system. The moral of the story is simple: design and implement with
scalability in mind from the start.
System Architecture
There are two main parts to a system’s architecture:
■ Hardware and Software Components
■ Configuring the Right System Architecture for Your Requirements
Hardware Components
Today’s designers and architects are responsible for sizing and capacity planning of
hardware at each tier in a multitier environment. It is the architect's responsibility to
achieve a balanced design. This is analogous to a bridge designer who must
consider all the various payload and structural requirements for the bridge. A
bridge is only as strong as its weakest component. As a result, a bridge is designed
in balance, such that all components reach their design limits simultaneously.
CPU There can be one or more CPUs, and they can vary in processing power from
simple CPUs found in hand-held devices to high-powered server CPUs. Sizing of
other hardware components is usually a multiple of the CPUs on the system.
I/O Subsystem The I/O subsystem can vary between the hard disk on a client PC and
high performance disk arrays. Disk arrays can perform thousands of I/Os each
second and provide availability through redundancy in terms of multiple I/O paths
and hot pluggable mirrored disks.
Network All computers in a system are connected to a network, from a modem line
to a high speed internal LAN. The primary concerns with network specifications are
bandwidth (volume) and latency (speed).
Software Components
Just as computers have common hardware components, applications have
common functional components. By dividing software development into functional
components, it is possible to better comprehend the application design and
architecture. Some functions of the system are performed by purchased software,
bought to accelerate application implementation or to avoid re-developing common
components.
The difference between software components and hardware components is that
while hardware components only perform one task, a piece of software can perform
the roles of various software components. For example, a disk drive only stores and
retrieves data, but a client program can manage the user interface and perform
business logic.
Managing the User Interface This component is the most visible to application users.
This includes the following functions:
■ Painting the screen in front of the user
■ Collecting user data and transferring it to business logic
■ Validating data entry
■ Navigating through levels or states of the application
Implementing Business Logic This component implements core business rules that are
central to the application function. Errors made in this component could be very
costly to repair. This component is implemented by a mixture of declarative and
procedural approaches. An example of a declarative activity is defining unique and
foreign keys. An example of procedure-based logic is implementing a discounting
strategy.
Common functions of this component include the following:
■ Moving a data model to a relational table structure
■ Defining constraints in the relational table structure
■ Coding procedural logic to implement business rules
Managing User Requests and Resource Allocation This component is implemented in all
pieces of software. However, there are some requests and resources that can be
influenced by the application design and some that cannot.
In a multiuser application, most resource allocation for user requests is handled by
the database server or the operating system. However, in a large application where
the number of users and their usage pattern is unknown or growing rapidly, the
system architect must be proactive to ensure that no single software component
becomes overloaded and unstable.
Managing Data and Transactions This component is largely the responsibility of the
database server and the operating system.
Common functions of this component include the following:
■ Providing concurrent access to data using locks and transactional semantics
■ Providing optimized access to the data using indexes and memory cache
■ Ensuring that data changes are logged in the event of a hardware failure
■ Enforcing any rules defined for the data
The following questions should stimulate thought on architecture, though they are
not a definitive guide to system architecture. These questions demonstrate how
business requirements can influence the architecture, ease of implementation, and
overall performance and availability of a system. For example:
■ How many users will the system support?
Most applications fall into one of the following categories:
– Very few users on a lightly-used or exclusive machine
For this type of application, there is usually one user. The focus of the
application design is to make the single user as productive as possible by
providing good response time, yet make the application require minimal
administration. Users of these applications rarely interfere with each other
and have minimal resource conflicts.
– A medium to large number of users in a corporation using shared
applications
For this type of application, the users are limited by the number of
employees in the corporation actually transacting business through the
system. Therefore, the number of users is predictable. However, delivering
a reliable service is crucial to the business. The users will be using a shared
resource, so design efforts must address response time under heavy system
load, escalation of resource usage for each session, and room for future
growth.
– An infinite user population distributed on the Internet
For this type of application, extra engineering effort is required to ensure
that no system component exceeds its design limits; exceeding a limit would
create a bottleneck that brings the system to a halt and makes it unstable. These
applications require complex load balancing, stateless application servers,
and efficient database connection management. In addition, statistics and
governors should be used to ensure that the user gets some feedback if their
requests cannot be satisfied due to system overload.
■ What will be the user interaction method?
The choices of user interface range from a simple Web browser to a custom
client program.
■ Where are the users located?
The distance between users influences how the application is engineered to
cope with network latencies. The location also affects which times of the day are
busy, when it is impossible to perform batch or system maintenance functions.
■ What is the network speed?
Network speed affects the amount of data and the conversational nature of the
user interface with the application and database servers. A highly
conversational user interface can communicate with back-end servers on every
key stroke or field level validation. A less conversational interface works on a
screen-sent and a screen-received model. On a slow network, it is impossible to
get good data entry speeds with a highly conversational user interface.
■ How much data will the user access, and how much of that data is largely read
only?
The amount of data queried online influences all aspects of the design, from
table and index design to the presentation layers. Design efforts must ensure
that user response time is not a function of the size of the database. If the
application is largely read only, then replication and data distribution to local
caches in the application servers become a viable option. This also reduces
workload on the core transactional server.
■ What is the user response time requirement?
Data Modeling
Data modeling is important to successful relational application design. This should
be done in a way that quickly represents the business practices. Chances are, there
will be heated debates about the correct data model. The important thing is to apply
the greatest modeling effort to those entities affected by the most frequent business
transactions. In the modeling phase, there is a great temptation to spend too much
time modeling the non-core data elements, which results in increased development
lead times. Use of modeling tools can then rapidly generate schema definitions and
can be useful when a fast prototype is required.
B-Tree Indexes These are the standard index type, and they are excellent for primary
key and highly-selective indexes. Used as concatenated indexes, B-tree indexes can
be used to retrieve data sorted by the index columns.
Bitmap Indexes These are suitable for low cardinality data. Through compression
techniques, they can generate a large number of rowids with minimal I/O.
Combining bitmap indexes on non-selective columns allows efficient AND and OR
operations with a great number of rowids with minimal I/O. Bitmap indexes are
particularly efficient in queries with COUNT(), because the query can be satisfied
within the index.
Partitioned Indexes Partitioning a global index allows partition pruning to take place
within an index access, which results in reduced I/Os. By definition of good range
or list partitioning, fast index scans of the correct index partitions can result in very
fast query times.
Reverse Key Indexes These are designed to eliminate index hot spots on insert
applications. These indexes are excellent for insert performance, but they are limited
in that they cannot be used for index range scans.
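To make these index types concrete, the following is a minimal sketch of how each
might be declared. The orders table and all object names here are hypothetical,
used for illustration only:

-- B-tree index on a selective column
CREATE INDEX ord_customer_ix ON orders (customer_id);

-- Bitmap index on a low-cardinality column
CREATE BITMAP INDEX ord_status_bix ON orders (order_status);

-- Reverse key index, spreading inserts of an ascending key
CREATE INDEX ord_id_rix ON orders (order_id) REVERSE;

-- Global range-partitioned index, enabling partition pruning
CREATE INDEX ord_date_pix ON orders (order_date)
GLOBAL PARTITION BY RANGE (order_date)
(PARTITION orders_2001 VALUES LESS THAN (TO_DATE('01-01-2002', 'DD-MM-YYYY')),
PARTITION orders_max VALUES LESS THAN (MAXVALUE));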
Using Views
Views can speed up and simplify application design. A simple view definition can
mask data model complexity from the programmers whose priorities are to retrieve,
display, collect, and store data.
However, while views provide clean programming interfaces, they can cause
sub-optimal, resource-intensive queries. The worst type of view use is when a view
references other views, and when they are joined in queries. In many cases,
developers can satisfy the query directly from the table without using a view.
Usually, because of their inherent properties, views make it difficult for the
optimizer to generate the optimal execution plan.
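As an illustration of this trade-off, a simple view such as the following sketch
(using the familiar emp and dept tables) hides a join from the programmer, but
every query against the view still pays the cost of that join:

CREATE VIEW emp_dept AS
SELECT e.empno, e.ename, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;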
■ Soft Parsing. A SQL statement is submitted for the first time, and a match is
found in the shared pool. The match can be the result of previous execution
by another user. The SQL statement is shared, which is good for
performance. However, soft parses are not ideal, because they still require
syntax and security checking, which consume system resources.
Because parsing should be minimized as much as possible, application
developers should design their applications to parse SQL statements once and
execute them many times. This is done through cursors. Experienced SQL
programmers should be familiar with the concept of opening and re-executing
cursors.
Application developers must also ensure that SQL statements are shared within
the shared pool. To do this, use bind variables to represent the parts of the query
that change from execution to execution. If this is not done, then the SQL
statement is likely to be parsed once and never re-used by other users. To
ensure that SQL is shared, use bind variables and do not use string literals with
SQL statements. For example:
Statement with string literals:
SELECT * FROM emp
WHERE ename
LIKE ’KING’;
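For contrast, a sketch of the same statement written with a bind variable, so that a
single shared cursor can serve every value supplied at execution time:

Statement with a bind variable:
SELECT * FROM emp
WHERE ename
LIKE :ename;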
The following example shows the results of some tests on a simple OLTP
application:
Test #Users Supported
6. Optimize the interfaces between components, and ensure that all components
are used in the most scalable configuration. This rule requires minimal
explanation and applies to all modules and their interfaces.
7. Use foreign key references. Enforcing referential integrity through an
application is expensive. You can maintain a foreign key reference by selecting
the column value of the child from the parent and ensuring that it exists. The
foreign key constraint enforcement supplied by Oracle (which does not use
SQL) is fast, easy to declare, and does not create network traffic.
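As a sketch, a declarative foreign key on the familiar emp and dept tables looks
like the following; Oracle then enforces the parent-child relationship without any
application SQL:

ALTER TABLE emp
ADD CONSTRAINT emp_dept_fk
FOREIGN KEY (deptno) REFERENCES dept (deptno);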
If you adopt an object-oriented approach to schema design, then make sure that you
do not lose the flexibility of the relational storage model. In many cases, the
object-oriented approach to schema design ends up in a heavily denormalized data
structure that requires considerable maintenance and REF pointers associated with
objects. Often, these designs represent a step backward to the hierarchical and
network database designs that were replaced with the relational storage method.
In summary, if you are storing your data in your database for the long-term and you
anticipate a degree of ad hoc queries or application development on the same
schema, then you will probably find that the relational storage method gives the
best performance and flexibility.
Sizing Data
You could experience errors in your sizing estimates when dealing with variable
length data if you work with a poor sample set. Also, as data volumes grow, your
key lengths could grow considerably, altering your assumptions for column sizes.
When the system becomes operational, it becomes harder to predict database
growth, especially that of indexes. Tables grow over time, and indexes are subject to
the individual behavior of the application in terms of key generation, insertion
pattern, and deletion of rows. The worst case is where you insert using an
ascending key and then delete most rows from the left-hand side but not all the
rows. This leaves gaps and wasted space. If you have index use like this, make sure
that you know how to use the online index rebuild facility.
Most good DBAs monitor space allocation for each object and look for objects that
could grow out of control. A good understanding of the application can highlight
objects that could grow rapidly or unpredictably. This is a crucial part of both
performance and availability planning for any system. When implementing the
production database, the design should attempt to ensure that minimal space
management takes place when interactive users are using the application. This
applies for all data, temp, and rollback segments.
Estimating Workloads
Estimation of workloads for capacity planning and testing purposes is often
described as a black art. Considering the number of variables involved, it is easy to
see why this process is almost impossible to get precisely correct. However,
designers need to specify machines with CPUs, memory, and disk drives, and
eventually roll out an application. There are a number of techniques used for sizing,
and each technique has merit. When sizing, it is best to use at least two methods to
validate your decision-making process and provide supporting documentation.
Benchmarking
The benchmarking process is both resource and time consuming, and it might not
produce correct results. If the application being benchmarked is still in early
development or prototype form, there is a danger of measuring something that has
no resemblance to the actual production system. This sounds strange, but over the
many years of benchmarking customer applications with the database development
organization, we have yet to see good correlation between the benchmark
application and the actual production system. This is mainly due to the number of
application inefficiencies introduced in the development process.
However, benchmarks have been used successfully to size systems to an acceptable
level of accuracy. In particular, benchmarks are very good at determining the actual
I/O requirements and testing recovery processes when a system is fully loaded.
Benchmarks by their nature stress all system components to their limits. As all
components are being stressed, be prepared to see all errors in application design
and implementation manifest themselves while benchmarking. Benchmarks also
test database, operating system, and hardware components. Because most
benchmarks are performed in a rush, expect setbacks and problems when a system
component fails. Benchmarking is a stressful activity, and it takes considerable
experience to get the most out of a benchmarking exercise.
Application Modeling
Modeling the application can range from complex mathematical modeling exercises
to the classic simple calculations performed on the back of an envelope. Both
methods have merit, with one attempting to be very precise and the other making
gross estimates. The downside of both methods is that they do not allow for
implementation errors and inefficiencies.
The estimation and sizing process is an imprecise science. However, by
investigating the process, some intelligent estimates can be made. The whole
estimation process makes no allowances for application inefficiencies introduced by
writing bad SQL, poor index design, or poor cursor management. A good sizing
engineer builds in margin for application inefficiencies. A good performance
engineer discovers the inefficiencies and makes the estimates look realistic. The
process of discovering the application inefficiencies is described in the Oracle
performance method.
Rollout Strategies
When new applications are rolled out, two strategies are commonly adopted:
■ Big Bang Approach - All users migrate to the new system at once.
■ Trickle Approach - Users slowly migrate from existing systems to the new one.
Both approaches have merits and disadvantages. The Big Bang approach relies on
good testing of the application at the required scale, but has the advantage of
minimal data conversion and synchronization with the old system, because it is
simply switched off. The Trickle approach allows debugging of scalability issues as
the workload increases, but might mean that data needs to be migrated to and from
legacy systems as the transition takes place.
It is hard to recommend one approach over the other, because each method has
associated risks that could lead to system outages as the transition takes place.
Certainly, the Trickle approach allows profiling of real users as they are introduced
to the new application, and allows the system to be reconfigured while affecting only the
migrated users. This approach affects the work of the early adopters, but limits the
load on support services. This means that unscheduled outages only affect a small
percentage of the user population.
The decision on how to roll out a new application is specific to each business. The
approach adopted will have its own unique pressures and stresses. The more testing
and knowledge derived from the testing process, the more you will realize what is
best for the rollout.
Performance Checklist
To assist in the rollout process, build a list of tasks that, if performed correctly,
increase the chance of good performance in production and, if there is a problem,
enable rapid debugging of the application. For example:
1. When you create the control file for the production database, allow for growth
by setting MAXINSTANCES, MAXDATAFILES, MAXLOGFILES, MAXLOGMEMBERS,
and MAXLOGHISTORY to values higher than what you anticipate for the rollout.
This results in more disk space usage and bigger control files, but saves time
later should these need extension in an emergency.
2. Set block size and optimizer mode to that used to develop the application.
Export the schema statistics from the development/test environment to the
production database if the testing was done on representative data volumes and
the current SQL execution plans are correct.
3. Set the minimal number of initialization parameters. The most important
parameters to set are those that size the various caches within the SGA. Parameters
that control the archive and dump destinations should also be set for backup
and debugging purposes. Ideally, most other parameters should be left at their
defaults. If there is more tuning to perform, it shows up when the system is
under load. (A sketch of such a minimal parameter file appears after this list.)
5. All SQL statements should be verified to be optimal and their resource usage
understood.
6. Validate that middleware and programs that connect to the database are
efficient in their connection management and do not logon/logoff repeatedly.
7. Validate that the SQL statements use cursors efficiently. Each SQL statement
should be parsed once and then executed multiple times. The most common
reason this does not happen is that bind variables are not used properly and
WHERE clause predicates are sent as string literals. If the precompilers are used
to develop the application, then make sure that the parameters
MAXOPENCURSORS, HOLD_CURSOR, and RELEASE_CURSOR have been reset
from the default values prior to precompiling the application.
8. Validate that all schema objects have been correctly migrated from the
development environment to the production database. This includes tables,
indexes, sequences, triggers, packages, procedures, functions, java objects,
synonyms, grants, and views. Ensure that any modifications made in testing are
made to the production system.
9. As soon as the system is rolled out, establish a baseline set of statistics from the
database and operating system. To do this, use Enterprise Manager or
Statspack. This first set of statistics validates or corrects any assumptions made
in the design and rollout process.
10. Start anticipating the first bottleneck (there will always be one) and follow the
Oracle performance method to make performance improvements.
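The following is the minimal parameter file sketched in item 3. The parameter
names are standard initialization parameters, but all values and paths are
illustrative assumptions that must be sized for your own system:

# Cache sizing within the SGA
db_cache_size = 512M
shared_pool_size = 128M
pga_aggregate_target = 256M
# Archive and dump destinations, for backup and debugging
log_archive_start = TRUE
log_archive_dest_1 = 'LOCATION=/arch'
background_dump_dest = /oracle/admin/prod/bdump
user_dump_dest = /oracle/admin/prod/udump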
Importance of Statistics
Before reacting to a problem, collect all possible statistics and get an overall picture
of the application. Getting a complete landscape of the system may take
considerable effort. But, if data has already been collected and embedded into the
application, then this process is much easier.
After collecting as much initial data as possible, outline issues found from the
statistics, the same way doctors collect symptoms from patients. Reacting to
symptoms too early in the performance analysis process generally results in an
incorrect analysis, which wastes time later. For example, it is extremely risky for a
doctor to prescribe open heart surgery for a patient who complains of chest pains
on the initial consultation.
CPU Statistics CPU utilization is the most important operating system statistic in the
tuning process. Get CPU utilization for the entire system and for each individual
CPU on multi-processor environments. Utilization for each CPU can detect
single-threading and scalability issues.
Most operating systems report CPU usage as time spent in user space or mode and
time spent in kernel space or mode. These additional statistics allow better analysis
of what is actually being executed on the CPU.
On an Oracle data server system, where there is generally only one application
running, the server runs database activity in user space. Activities required to
service database requests (such as scheduling, synchronization, I/O, memory
management, and process/thread creation and tear down) run in kernel mode. In a
system where all CPU is fully utilized, a healthy Oracle system runs between 65%
and 95% in user space.
Note: On UNIX systems, where wait for I/O is derived as part of
the CPU statistics, this value should be treated as idle time.
Virtual Memory Statistics Virtual memory statistics should mainly be used as a check
to validate that there is very little paging or swapping activity on the system.
System performance degrades rapidly and unpredictably when paging or swapping
occurs.
Individual process memory statistics can detect memory leaks due to a
programming failure to deallocate memory taken from the process heap. These
statistics should be used to validate that memory usage does not increase after the
system has reached a steady state after startup. This problem is particularly acute
on shared server applications on middle-tier machines, where session state may
persist across user interactions and state information is not fully deallocated
on completion.
Disk Statistics Because the database resides on a set of disks, the performance of the
I/O subsystem is very important to the performance of the database. Most
operating systems provide extensive statistics on disk performance. The most
important disk statistics are the current response time and the length of the disk
queues. These statistics show if the disk is performing optimally or if the disk is
being overworked. If a disk shows response times over 20 milliseconds, then it is
performing badly or is overworked. This is your bottleneck. If disk queues start to
exceed two, then the disk is a potential bottleneck of the system.
Network Statistics Network statistics can be used in much the same way as disk
statistics to determine if a network or network interface is overloaded or not
performing optimally. In today's networked applications, network latency can be a
large portion of the actual user response time. For this reason, these statistics are a
crucial debugging tool.
Database Statistics
Database statistics provide information on the type of load on the database, as well
as the internal and external resources used by the database. When database
resources become exhausted, it is possible to identify bottlenecks in the application.
Database statistics can be queried directly from the database in a relational manner
using SQL. These statistics can be inserted back into the database with the INSERT
INTO x SELECT ... or CREATE TABLE x AS SELECT ... statements. This is the
basis of most snapshot mechanisms that allow statistical gathering over time. Most
statistics are contained in a series of virtual tables or views known as the V$ tables,
because they are prefixed with V$. These are read only, and they are owned by SYS.
Many of the tables contain identifiers and keys that can be joined to other V$ tables.
In order to get meaningful database statistics, the TIMED_STATISTICS parameter
must be enabled for the database instance. The performance impact of enabling
TIMED_STATISTICS on the instance is minimal, while the performance-improvement
and debugging value of a complete set of statistics makes this parameter crucial to
effective performance analysis.
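For example, timed statistics can be switched on dynamically, and a crude snapshot
can be captured by copying a V$ view into a table. This is only a sketch of the
snapshot idea that tools such as Statspack implement far more completely; the
snapshot table name is hypothetical:

ALTER SYSTEM SET TIMED_STATISTICS = TRUE;

CREATE TABLE stats_snapshot AS
SELECT SYSDATE snap_time, name, value
FROM V$SYSSTAT;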
The core database statistics are:
■ Buffer Cache
■ Shared Pool
■ Wait Events
Buffer Cache The buffer cache manages blocks read from disk into buffers in
memory. It also holds information on the most recently used buffers and those
modified in normal database operation. To get best query performance, a user query
accesses all required data blocks within the buffer cache, thus satisfying the query
from memory. However, this might not always happen, because the database is
usually many times the size of the buffer cache. With this in mind, it is easy to see that
the buffer cache requires management and tuning.
The objective in tuning the buffer cache is to get acceptable user query time by
having as many of the required blocks in the cache as possible. Also, eliminate time
consuming I/Os without inducing any serialization points or performance spikes as
old blocks are aged out of the cache. This process requires a working knowledge of
the buffer cache mechanism, the database writer, and the checkpointing mechanism.
Most information can be extracted from the V$SYSSTAT table.
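A sketch of such a query, retrieving the statistics most often used to assess buffer
cache effectiveness:

SELECT name, value
FROM V$SYSSTAT
WHERE name IN ('db block gets', 'consistent gets', 'physical reads');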
Shared Pool The shared pool contains information about user sessions, shared data
structures used by all database sessions, and the dictionary cache.
Querying the shared pool allows analysis of the SQL statements run in the database.
This is particularly important if you have limited or no knowledge of the
application source code. In addition to the actual SQL, you can determine how
many times it is run and how much CPU time and disk I/O the SQL consumes.
This information can be extracted from the V$SQL table. Analyzing this information
is crucial in objective bottleneck identification when debugging an unknown
application.
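For example, a sketch of a query that ranks the statements in the shared pool by
logical I/O:

SELECT sql_text, executions, buffer_gets, disk_reads
FROM V$SQL
ORDER BY buffer_gets DESC;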
Wait Events In the process of usual database server operations, there are times when
processes need to share resources or synchronize with other processes; for example,
allocating memory in the shared pool or waiting for a lock. Similarly, there are times
when the database process gives control to external code or other processes out of
its control; for example, performing I/O and waiting for the log writer to
synchronize the redo log.
In these cases, the user process stops working and starts waiting. This wait time
becomes part of the eventual user response time. If there are multiple processes
queuing on a shared resource or demanding the same external resource, then the
database starts to single-thread, and scalability is impacted. Performance analysis
should determine why queuing on resources in the database is happening.
The V$SYSTEM_EVENT, V$SESSION_EVENT, and V$SESSION_WAIT tables allow
querying of historical wait events or wait events in real time. The
V$SESSION_WAIT table has additional columns that can be joined to other V$
tables based on the wait event recorded. These additional join columns allow
focused drill-down and analysis of the wait event.
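A minimal sketch of a first-pass query against the historical view, ranking events
by total time waited (reported in hundredths of a second when TIMED_STATISTICS
is enabled):

SELECT event, total_waits, time_waited
FROM V$SYSTEM_EVENT
ORDER BY time_waited DESC;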
Application Statistics
Application statistics are probably the most difficult statistics to get, but they are the
most important statistics in measuring any performance improvements made to the
system. At a minimum, application statistics should provide a daily summary of
user transactions processed for each working period. More complete statistics
provide precise details of what transactions were processed and the response times
for each transaction type. Detailed statistics also decompose each transaction's
response time into the time spent in the application server, the network, the
database, and so on.
Performance Intuition
Database and operating system statistics provide an indication of how well a
system is performing. By correlating statistics with actual throughput, you can see
how the system is performing and determine where future bottlenecks and resource
shortages could exist. This is a skill acquired through the experience of monitoring
systems and working with the Oracle server.
CPU utilization is the easiest system usage statistic to understand and monitor.
Monitoring this statistic over time, you can see how the system is used during the
work day and over a number of weeks. However, this statistic provides no
indication of how many business transactions were executed or what resources
were used for each transaction.
Two other statistics that give a better indication of actual business transactions
executed are the number of commits and the volume of redo generated. These are
found in the V$SYSSTAT view under USER COMMITS and REDO SIZE. These
statistics show the number of actual transactions and the volume of data changed in
the database. If these statistics increase in number over time, and if application and
transaction logic are not altered, then you know that more business transactions
were executed. The number of logical blocks read (V$SYSSTAT statistic ’session
logical reads’) also indicates the query workload on a system. Be careful
interpreting this number. A change in the number of logical blocks read can be a
result of an execution plan change rather than an increase in workload.
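A sketch of a query that retrieves these indicators; note that the statistic names are
stored in lowercase in V$SYSSTAT:

SELECT name, value
FROM V$SYSSTAT
WHERE name IN ('user commits', 'redo size', 'session logical reads');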
With experience, it becomes easy to correlate database statistics with the application
workload. A performance DBA learns to use intuition with database statistics and
the application profile to determine a system's workload characteristics.
Today's systems are so different and complex that hard and fast rules for
performance analysis cannot be made. In essence, the Oracle performance
improvement method defines a way of working, but not a definitive set of rules.
With bottleneck detection, the only rule is that there are no rules! The best
performance engineers use the data provided and think laterally to determine
performance problems.
6. Propose a series of remedy actions and the anticipated behavior of the system,
and apply them in the order that can benefit the application the most. A golden
rule in performance work is that you change only one thing at a time and then
measure the difference.
1. Is the response time/batch run time acceptable for a single user on an empty or
lightly loaded machine?
If it is not acceptable, then the application is probably not coded or designed
optimally, and it will never be acceptable in a multiple user situation when
system resources are shared. In this case, get application internal statistics, and
get SQL Trace and SQL plan information. Work with developers to investigate
problems in data, index, transaction SQL design, and potential deferral of work
to batch/background processing.
2. Is all the CPU being utilized?
If the kernel utilization is over 40%, then investigate the operating system for
network transfers, paging, swapping, or process thrashing. Otherwise, move
on to CPU utilization in user space. Check to see if there are any non-database
jobs consuming CPU on the machine limiting the amount of shared CPU
resources, such as backups, file transforms, print queues, and so on. After
determining that the database is using most of the CPU, investigate the top SQL
by CPU utilization. These statements form the basis of all future analysis. Check
the SQL and the transactions submitting the SQL for optimal execution. In
Oracle Server releases prior to 9i, use buffer gets as the measure for CPU usage.
With release 9i, Oracle provides the actual CPU statistics in V$SQL (a sample
query appears after this list).
If the application is optimal and there are no inefficiencies in the SQL execution,
consider rescheduling some work to off-peak hours or using a bigger machine.
3. At this point, the system performance is unsatisfactory, yet the CPU resources
are not fully utilized.
In this case, you have serialization and unscalable behavior within the server.
Get the WAIT_EVENTS statistics from the server, and determine the biggest
serialization point. If there are no serialization points, then the problem is most
likely outside the database, and this should be the focus of investigation.
Elimination of WAIT_EVENTS involves modifying application SQL and tuning
database parameters. This process is very iterative and requires the ability to
drill down on the WAIT_EVENTS systematically to eliminate serialization
points.
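As referenced in item 2, the following sketch ranks SQL by actual CPU consumption
using the statistics that release 9i exposes in V$SQL (CPU_TIME and ELAPSED_TIME
are reported in microseconds):

SELECT sql_text, cpu_time, elapsed_time, executions
FROM V$SQL
ORDER BY cpu_time DESC;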