DB 2 V 5 R 4
DB 2 V 5 R 4
DB 2 V 5 R 4
www-03.ibm.com/servers/eserver/iseries/db2/
Mike Cain - DB2 for i5/OS Center of Competency Doug Mack - DB2 for i5/OS Product Marketing Kent Milligan - DB2 for i5/OS Technology Team
1992
1996
2000
2004
DB2 for i5/OS Microsoft SQL Server 2000 Oracle 10g 0 10 20 30 40 50 60 70 80 90 100
This chart shows the DB2 UDB for iSeries strategic directions. As you progress through this course, you will be able to relate the V5R4 enhancements back to these strategic initiatives. SQL is the industry standard for database access and programming. While the heritage of application development on the iSeries platform has been to use native interfaces, more and more application developers are using SQL as a programming language as the result of the DB2 UDB for iSeries strategic investment in SQL. You can see the trend toward greater use of SQL in the upper graph on this chart. SQL is the strategic database interface for DB2 UDB for iSeries to accommodate all the porting of third-party applications and tools, a. Almost all of which third-party solutions utilize SQL-based interfaces for their data access. As you can see on the lower right-hand graph, DB2 UDB for iSeries actually is the first DBMS to fully comply with the Core level of the SQL 2003 Standard. The beauty of the IBM DB2 UDB family is that we can all share from this, and DB2 UDB for iSeries will continue to leverage the technology leadership position of IBM and maintain close compatibility with the other DB2 UDB family members. Additionally, DB2 UDB for iSeries will continue to leverage and build on the strengths the iSeries. Those strengths include reliability, /availability, total cost of ownership, scalability, and usability.
Microsoft SQL Server 2005 128 8000 38 38 10 5 53 24 53 1024 1024 1024 8 megabytes 16 900 900 Limited by length 8060 Limited by length 8060 16 256 2 gig 4096
DB2 for i5/OS V5R4 128 32766 63 63 10 5 53 24 53 8000 8000 8000 3.5 gigabytes 120 32K 32K 120 32K 10000 32K 120 1000 No limit 8000
DB2 for LUW V9 128 254 31 31 10 5 53 24 53 1012 1012 1012 2 gigabytes 64 8192 8192 1012 32677 1012 32677 64 No limit No limit 1012
DB2 for z/OS V9 128 255 31 31 10 5 53 21 or 24 53 750 750 750 2 gigabytes 64 2000 2000 750 4000 750 4000 64 225 No limit 750
255
DB_BLOCK_SIZE
255
DB_BLOCK_SIZE
32
1000
Largest limit
2006 IBM Corporation 3
Logical File
Create Physical File (Data Descr. Specifications) Access Data with Native HLL I/O
DB2
64 Bit Power4 RISC Single Level Storage
IOP IOP IOP IOP IOP IOP IOP
Cust #1
Cust #2
Cust #3
Cust #4
Cust #5
Cust #6
Cust #7
DB2 for i5/OS has many programming and operational interfaces available for application developers, DBAs, and operations/support personnel. However, the database engine is still the same regardless of the interface being used. With this architecture, you actually have MORE flexibility in how you use the database and what skills are available in the marketplace to help develop or support. DDS (Data Description Specifications) provide one option to define database tables and indexes. RPG is a common programming language in the System i world. These interfaces, commonly referred to as native interfaces, were in place before SQL became an industry standard and continue to be used heavily in a System i environment. DB2 for i5/OS also supports SQL interfaces, and you can create tables and indexes through SQL, generally considered the open standard for database development. DB2 for i5/OS also supports many different high level languages (e.g., Cobol, JAVA), 4GL and CASE tools, and web or client/server based interfaces (JDBC, ODBC, Microsofts .NET framework) to allow a tremendous amount of flexibility while still leveraging the underlying advantages of the single RDBMS!
AIX JDBC DRDA iSeries Toolbox JDBC DB2 Connect DB2 Information Integrator iSeries Toolbox JDBC DB2 Connect iSeries ODBC iSeries Access DB2 Connect DB2 Information Integrator
Linux JDBC ODBC DRDA Windows JDBC ODBC OLE DB .NET DRDA
* i5/OS PASE only supports CLI
Any interface allowing SQL can be used to interface DB2 for i5/OS
Enterprise Warehouse
DataJoiner FedSVR
Operational Data
Heterogeneous Database Access and Data Replication External Data read, write, update, insert, delete White Paper: http://www-1.ibm.com/servers/enable/site/education/ibo/record.html?hetdata Web site: http://www-306.ibm.com/software/data/integration/
2006 IBM Corporation 6
Data Integration is a MAJOR issue with companies many customers nowadays have data spread across a variety of different format sources. We can provide them with an IBM solution that will make our data mgmt friends happy, and that is WebSphere (DB2) Federation Server. With this product, I could write an RPG application that accesses data in SQLServer or Oracle.i.e., thus deploying the great i5/OS power also towards data originated on other sources/platforms.
Self Healing
Systems Managed Access Path Protection Self Managed Database Logging Auto Index Rebalancing Database Locking Issues
IOP IOP
Cust #1 Cust #2
Cust #3
Cust #4
Cust #5
Cust #6
Cust #7
Self Tuning
Auto Tuner, Sophisticated Cost Based Query Optimizer Automatic Index Advisor, Auto Stats Collection
Self Protecting
Object Based Kernel in i5/OS Refer to Virus got you down?
http://skyviewpartners.com/java-skyviewp/pdf/Virus-Got-You-Down.pdf
"DB2 UDB for iSeries: Autonomic Computing FAQs SystemSales and PartnerInfo DB2 UDB for iSeries Sales Kit
2006 IBM Corporation 7
So what makes DB2 UDB running in i5/OS so unique? The key benefit of this DB2 family member is its ability to leverage the self-managing, or autonomic computing features of i5/OS. Because DB2 is built into i5/OS, it takes advantage of architectural benefits of System i, including Single Level Store, the object based implementation of i5/OS, and strong I/O architecture through the use of I/O processors. It is for these reasons that historically the AS/400 could tout no DBA required. While this is really still true in many cases, even in the most demanding situation, the amount of DBA effort required is 1/3rd of other database management systems. The most demanding situations that will require DBA skills for DB2 for i5/OS includes heavy SQL based
Availability of DBA skills on the marketplace is a subject that requires some further investigation. For instance, keep in mind that (a) many typical DBA tasks required in other database products are not required on DB2 for i5/OS, (b) DBAs have many skills that are transferable to DB2 for i5/OS, and (c) the DBA community has evolved because of the need for this specific skill on other database management systems. As noted, many of the skills of a DBA are common across all relational databases. This includes knowledge of SQL, relational data modeling, indexing, and query optimization. With the exception of query optimization, learning these skills on DB2 for i5/OS is a matter of learning new user interfaces. Online tutorials exist for this via the DB2 for i5/OS home page: ibm.com/iseries/db2. Query optimization on DB2 for i5/OS will require some education for the DBA . IBM has developed a four day class called DB2 for i5/OS SQL Performance Tuning and Monitoring Workshop. For details, refer to: ibm.com/iseries/service/igs/db2performance.html. Lastly, it is important to note that many of the tasks to manage other databases are not required with DB2 for i5/OS. These tasks may include creation and monitoring of tablespaces, logs, and cache area disk usage, re-balancing of indexes, running utilities to gather statistics, and partitioning of data. The System i architecture and sophisticated cost-based optimizer of DB2 for i5/OS automates these processes.
In V5R4, DB2 for i5/OS continues its position as a leader in ease of use and integration. V5R4 contains many new capabilities that continue to make DB2 UDB for iSeries a database that is simple to use for programmers, analysts, and administrators. iSeries Navigator delivers the most simplification with the addition of the DB2 On Demand Performance Center that will dramatically simplify SQL performance analysis for database analysts and administrators. iSeries Programmers will also be able to lead a simpler life with a richer SQL toolbox. SQL programmers will be able to solve a wider range of business problems with additions such as Recursive SQL, Triple DES encryption, and the integration of SQL with Free Format RPG syntax. DB2 for i5/OS, already proven to be a leader in data warehousing through recent benchmarks, is enhancing the SQL capabilities it has to tackle the problems that affect businesses today. With the addition of OLAP functions, Recursive SQL and improved Materialized Query Table support, the database can now take on an even broader range of workloads. Performance and scalability are improved on a number of fronts boosted by the SQL Query Engine (SQE) handling more DB2 requests and its Autonomic Index capability. SQL Procedures, functions and triggers will receive a significant performance boost with V5R4.
http://www-03.ibm.com/servers/eserver/iseries/db2/sqe.html http://www-03.ibm.com/servers/enable/site/education/abstracts/4aea_abs.html
2006 IBM Corporation 10
Extended Dynamic
Prepare once and then reference
The optimizer and database engine are separated at different layers of the operating system
SLIC
11
To fully understand the implementation of query management and processing in DB2 UDB for iSeries on i5/OS V5R2 and subsequent releases, it is important to see how the queries were implemented in releases of i5/OS previous to V5R2. The figure above shows a high-level overview of the architecture of DB2 UDB for iSeries before i5/OS V5R2. The optimizer and database engine are implemented at different layers of the operating system. The interaction between the optimizer and the database engine occurs across the Machine Interface (MI).
Extended Dynamic
Prepare once and then reference
The optimizer and database engine merged to form the SQL Query Engine and much of the work was moved to SLIC
SLIC SLIC
12
This figure shows an overview of the DB2 for i5/OS architecture on i5/OS V5R3 and V5R4, and where each SQE component fits. Some components have been move at a lower level, below the Machine Interface.
Optimizer
Query Dispatcher CQE Optimizer SQE Optimizer
The optimizer and database engine merged to form the SQL Query Engine and much of the work was moved to SLIC
13
The functional separation of each SQE component is clearly evident. In line with design objectives, this division of responsibility enables IBM to more easily deliver functional enhancements to the individual components of SQE, as and when required. Notice that most of the SQE Optimizer components are implemented below the MI. This translates into enhanced performance efficiency. A new component, the Query Dispatcher, has been introduced. This is in charge of dispatching a query to the traditional CQE optimizer or to the new one, SQE, according to the clauses and conditions of the request.
Initial step for all query optimization that occurs on the system The use of SQE is being phased in over several releases
functionalities may depend on level of fixes (DB Group PTFs) installed
V5R2
V5R2+
V5R3
V5R4
14
The function of the Dispatcher is to route the query request to either CQE or SQE, depending on the attributes of the query. All queries are processed by the Dispatcher and you cannot bypass it. Currently, the Dispatcher will route an SQL statement to CQE if it find that the statement references or contains any of the following: INSERT WITH VALUES statement or the target of an INSERT with subselect statement NLSS or CCSID translation between columns Lateral correlation Logical files Datalink columns Tables with Read Triggers User-defined table functions Read-only queries with more than 1000 dataspaces or updateable queries with more than 256 dataspaces. DB2 Multisystem tables non-SQL queries, for example the QQQQry API, Query/400, or OPNQRYF
SQE
Read only SQL (subSELECT of an INSERT supported) Single Table (i.e. no joins) AND, OR, IN predicates (no LIKE support) Joins SMP requested
V5R3
All of the above plus VIEWs UNIONs SubQueries INSERT, UPDATE, DELETE Star Schema Join (STAR_JOIN without QAQQINI support)
15
Goal is to use SQE if at all possible. Conditions where SQE can be used varies according to the OS level.
SQE
16
See previous chart more queries allowed into SQE as of i5/OS V5R4.
Cost to back up and revert to CQE adds about 15% to the total optimization time QAQQINI option to ignore S/O and derived keyed logical files
IGNORE_DERIVED_INDEXES *YES
17
The Dispatcher also has the built-in capability to re-route an SQL query to CQE that was initially routed to SQE. Unless the IGNORE_DERIVED_INDEX option with a parameter value of *YES is specified, a query will typically be reverted back to CQE from SQE whenever the Optimizer processes table objects that have any of the following logical files or indexes defined: Logical files with the SELECT/OMIT DDS keyword specified Non-standard indexes or derived keys, for example logical files specifying the DDS keywords RENAME or Alternate Collating Sequence (ACS) on any field referenced in the key Sort Sequence NLSS specified for the index or logical file As new functionality is added in the future, the Dispatcher will route more queries to SQE and increasingly fewer to CQE. It is possible to implement a QAQQINI option to allow the optimizer to IGNORE logical files with select/omit keyword, hence allowing the query to be processed by SQE even when such a logical file exists.
QAQQINI
IGNORE_DERIVED_INDEX IGNORE_DERIVED_INDEX = = *YES *YES
SQE? SQE?
NO YES!
HLL HLL Native Native I/O I/O Physical Physical File File
Index Index
18
Setting to *YES the IGNORE_DERIVED_INDEX option of the QAQQINI file will allow the SQE optimizer to ignore the derived
index and process the query. The resulting query plan will be created without any regard to the existence of the derived index(s). The index types that are ignored include: Keyed logical files defined with select or omit criteria and with the DYNSLT keyword omitted Keyed logical files built over multiple physical file members (V5R2 restriction, not a restriction for V5R3) Keyed logical files where one or more keys reference an intermediate derivation in the DDS. Exceptions to this are: 1. when the intermediate definition is defining the field in the DDS so that shows up in the logical's format and 2. RENAME of a field (these two exceptions do not make the key derived) Keyed logical files with K *NONE specified. Keyed logical files with Alternate Collating Sequence (ACS) specified SQL indexes created when the sort sequence active at the time of creation requires a weighting (translation) of the key to occur. This is true when any of several non-US language IDs are specified. It also occurs if language ID shared weight is specified, even for language US.
19
Biggest issue with poor plans after upgrading is no stats and no indexes. Statistical information is constructed of single columns of a table and stored internally as part of the table. By default, this information is collected automatically by the system, but you can manually control the collection of statistics. Unlike indexes, however, statistics are not maintained immediately as data in the tables change.
Statistics
All query optimizers rely upon statistics to make plan decisions
DB2 for i5/OS has always relied upon indexes as its source for column statistics Other databases rely upon manual stats collection for their source
20
On many platforms, statistics collection is a manual process that is the responsibility of the database administrator. With iSeries servers, the database statistics collection process is handled automatically, and only rarely is it necessary to update statistics manually. The Statistics Manager does not actually run or optimize the query. It controls the access to the metadata and other information that is required to optimize the query. It uses this information to answer questions posed by the query optimizer. The answers can either be derived from table header information, from existing indexes, or from single-column statistics. The Statistics Manager must always provide an answer to the questions from the Optimizer. It uses the best method available to provide the answers. For example, it may use a single-column statistic or perform a key range estimate over an index. Along with the answer, the Statistics Manager returns a confidence level to the optimizer that the optimizer may use to provide greater latitude for sizing algorithms. If the Statistics Manager provides a low confidence in the number of groups that are estimated for a grouping request, then the optimizer may increase the size of the temporary hash table allocated.
21
More options should allow for better overall query performance Indexes are maintained as the underlying data changes
Column statistics are not maintained, only refreshed
22
An SQL index is a subset of the data in the columns of a table that are logically arranged in either ascending or descending order. Each index contains a separate arrangement. These arrangements are used for ordering (ORDER BY clause), grouping (GROUP BY clause), and joining. An SQL index is a keyed logical file. The index is used by the system for faster data retrieval. Creating an index is optional. You can create any number of indexes. You can create or drop an index at any time. The index is automatically maintained by the system. If you are trying to decide whether to use statistics or indexes to provide information to the Statistics Manager, keep the following differences in mind. One major difference between indexes and column statistics is that indexes are permanent objects that are updated when changes to the underlying table occur, while column statistics are not. If your data is constantly changing, the Statistics Manager may need to rely on stale column statistics. However, maintaining an index after each change to the table might take up more system resources than refreshing the stale column statistics after a group of changes to the table have occurred. Another difference is the effect that the existence of new indexes or column statistics has on the Optimizer. When new indexes become available, the Optimizer will consider them for implementation. If they are candidates, the Optimizer will re-optimize the query and try to find a better implementation. However, this is not true for column statistics. When new or refreshed column statistics are available, the Statistics Manager will interrogate immediately. Reoptimization will occur only if the answers are significantly different from the ones that were given before these refreshed statistics. This means that it is possible to use statistics that are refreshed without causing a reoptimization of an access plan. When trying to determine the selectivity of predicates, the Statistics Manager considers column statistics and indexes as resources for its answers in the following order: Try to use a multi-column keyed index when ANDed or ORed predicates reference multiple columns If there is no perfect index that contains all of the columns in the predicates, it will try to find a combination of indexes that can be used. For single column questions, it will use available column statistics If the answer derived from the column statistics shows a selectivity of less than 2%, indexes are used to verify this answer Accessing column statistics to answer questions is faster than trying to obtain these answers from indexes. Column statistics can only be used by SQE. For CQE, all statistics are retrieved from indexes. Finally, column statistics can be used only for query optimization. They cannot be used for the actual implementation of a query, whereas indexes can be used for both.
Cache is cleared during an IPL New interface to observe and analyze SQE plan information
23
The Plan Cache is a repository that contains the access plans for queries that were optimized by SQE. Access plans generated by CQE are not stored in the Plan Cache; instead, they are stored in SQL Packages, the system-wide statement cache, and job cache). The purposes of the Plan Cache are to: Facilitate the reuse of a query access plan when the same query is re-executed Store runtime information for subsequent use in future query optimizations Once an access plan is created, it is available for use by all users and all queries, regardless of where the query originates. Furthermore, when an access plan is tuned, when creating an index for example, all queries can benefit from this updated access plan. This eliminates the need to reoptimize the query, resulting in greater efficiency. There is a separate plan cache for each IASP for the system, so varying the IASP will also cause the plan cache to be cleared.
Plan Cache
Statement 1
SQE Plan Cache SQE stores all query plans in a centralized system wide plan cache. Along with the plans, Plan Ymeta SQE keeps data and runtime information for the plans (i.e. queries). The plan cache is Plan Z considered temporary storage.
Plan X
Statement 2
24
As shown in this chart, the Plan Cache is interrogated each time a query is executed in order to determine if a valid access plan exists that satisfies the requirements of the query. If a valid access plan is found, it is used to implement the query. Otherwise a new access plan is created and stored in the Plan Cache for future use. The Plan Cache is automatically updated with new query access plans as they are created, or is updated for an existing plan (the next time the query is run) when new statistics or indexes become available. The Plan Cache is also automatically updated by the database with runtime information as the queries are run. Each plan cache entry contains the original query, the optimized query access plan and cumulative runtime information gathered during the runs of the query. In addition, several instances of query runtime objects are stored with a plan cache entry. These runtime objects are the real executables and temporary storage containers (hash tables, sorts, temporary indexes, and so on) used to run the query. All systems are currently configured with the same size Plan Cache, regardless of the server size or the hardware configuration. When the Plan Cache exceeds its designated size, a background task is automatically scheduled to remove plans from the Plan Cache. Access plans are deleted based upon the age of the access plan, how frequently it is being used and how much cumulative resources (CPU/IO) were consumed by the runs of the query. The total number of access plans stored in the Plan Cache depends largely upon the complexity of the SQL statements that are being executed. In certain test environments, there have been typically between 10,000 to 20,000 unique access plans stored in the Plan Cache. The Plan Cache is cleared when a system Initial Program Load (IPL) is performed. Multiple access plans can be maintained for a single SQL statement. Although the SQL statement itself is the primary hash key to the Plan Cache, different environmental settings can cause different access plans to be stored in the Plan Cache. Examples of these environmental settings include: Different SMP Degree settings for the same query Different library lists specified for the query tables Different settings for the job's share of available memory in the current pool Different ALWCPYDTA settings Currently, the Plan Cache can maintain a maximum of 3 different access plans for the same SQL statement. As new access plans are created for the same SQL statement, older access plans are discarded to make room for the new access plans. There are, however, certain conditions that can cause an existing access plan to be invalidated. Examples of these include: Specifying REOPTIMIZE_ACCESS_PLAN(*YES) or (*FORCE) in the QAQQINI table or in Run SQL Scripts Deleting or recreating the table that the access plan refers to Deleting an index that is used by the access plan
Repository of information that is used for feedback and automatic column stats generation Enabling auto stats collection causes the Statistics Manager to interrogate the Plan Cache looking for plans where stats would have been helpful
25
Specific details about the current implementation of the Plan Caches: 1. The plan cache is initially created with a maximum size of 256 million bytes (not 256 Meg). This will contain the original query that was optimized as well as the optimized query plan. There are other objects stored with the query that do not get rolled into this total 2. All systems are currently configured for this same size plan cache regardless of the size of the hardware. 3. When the plan cache exceeds its size, then a background task takes charge to remove old plans. During this window while the background task is processing, the plan cache may exceed it allocated size. 4. Then plan cache can hold many different plans. The total number will depend upon the complexity of the SQL statements that are being run. On the systems that we have been currently monitoring (i.e. single table queries) the plan cache has typically been storing around 6,000 unique plans. 5. The size of the plan varies, but in its compressed mode it is still significantly smaller then the size of the plans that CQE still keeps around. 6. Multiple plans can be maintained for a single SQL statement. The SQL statement is the primary hash key for the plan cache so identical SQL statements will all be stored together, but different environmental setting can cause us to keep different plans in the cache that match the specific environment. Examples of this would include: 1. Two plans where the SMP Degree settings are different 2. Two plans where the library list has changed for the tables 3. Two plans with different settings for this jobs share of memory available in the current pool 4. Two plans with different ALWCPYDTA settings (etc)
Query Optimization
Print SQL Information Messages
26
Not much user interaction can be accomplished for either the validation or execution of the query. The user can have a real effect during the query optimization. Will focus on query optimization phase. The DB2 UDB for AS/400 optimizer is "cost based" with full query rewrite capability Stats provided by the DB are used to come up with the best cost or best plan (least costly, less time) Up to V5R4 we essentially had 4 different sources of information to be used for query optimization:
Indexes are:
Temporary Maintained while any cursor is open
Maintenance is delayed when all cursors are closed
Shared between queries and jobs Primarily created on smaller tables Not used for statistics
Allows additional queries to use SQE (ex. Sensitive cursors) Allows DB2 for i5/OS to tune itself
2006 IBM Corporation 27
Using V5R4 i5/OS means using the new DB2 for i5/OS SQL Query Engine Preparing for the OS upgrade will provide the best experience In other words getting the most out of DB2 for i5/OS enhancements
28
SQE can perform significantly better than CQE, more and more queries can take advantage of it.
Parallel CPU
Fee, requires 5722SS1 Opt.26 DB2 UDB Symmetric Multiprocessing Enabled system wide via sysval QQRYDEGREE (*OPTIMIZE) Enabled for single job via CHGQRYA command or QAQQINI PARALLEL_DEGREE keyword
29
There are two types of parallel processing available. The first is a parallel I/O that is available at no charge. The second is DB2 UDB Symmetric Multiprocessing, a feature that you can purchase. You can turn parallel processing on and off. Even though parallelism has been enabled for a server or given job, the individual queries that run in a job might not actually use a parallel method. This might be because of functional restrictions, or the optimizer might choose a non-parallel method because it runs faster. Because queries being processed with parallel access methods aggressively use main storage, CPU, and disk resources, the number of queries that use parallel processing should be limited and controlled. The QQRYDEGREE system value ca be used to control parallel processing for a server. Query parallel processing can also be cotrolled at the job level using the DEGREE parameter of the Change Query Attributes (CHGQRYA) command or in the QAQQINI file, or using the SET_CURRENT_DEGREE SQL statement.
DB2 for i5/OS introduced parallel database functionality, also known as SMP (Symmetric Multi-Processing) capability in V3R1 (1995), and has enhanced those functions significantly since then. The DB2 for i5/OS SMP feature is an Operating System add-on that can significantly improve performance of database tasks by breaking the processing across multiple CPUs. By doing this, SMP optimizes the available resources to complete the database task in a cost-effective manner. Many other databases use the notion of data partitioning to address scalability
QUERY SQL
M E M O R Y
30
Parallelism within the database is available across a broad range of tasksbut ONLY if the DB2 Symmetric Multi-processing feature of OS/400 is installed and activated (via system value or user profile. Again, this is a huge performance benefit in servers with multiple CPUs configured in the server or partition, because of the ability of the database to spread tasks simultaneously across these processors for any given query/database load/index build/re-org,etc.
Indexing strategy
http://www-03.ibm.com/servers/enable/site/education/abstracts/indxng_abs.html http://www-03.ibm.com/servers/enable/site/education/abstracts/16fa_abs.html
2006 IBM Corporation 31
Each type of index has specific uses and advantages Respective indexing technologies complement each other Indexes can be used for statistics and implementation Indexes can provide RRNs and/or data Indexes are scanned or probed
Probe can only occur on contiguous, leading key columns Scan can occur on any key column Probe and scan can be used together
2006 IBM Corporation 32
...WHERE COLOR = 'BLACK AND ITEM_NO = 003 ...WHERE SIZE = 'MEDIUM' ...WHERE ITEM_NO = 001 AND SIZE = LARGE
2006 IBM Corporation 33
The optimizer is able to perform radix index probe over any combination of the primary key columns, plus one additional secondary key column. Therefore it is important that the first secondary key column be the most selective secondary key column. The optimizer will use radix index scan with any of the remaining secondary key columns. While radix index scan is not as fast as radix index probe it can still reduce the number of keys selected. Hence, secondary key columns that are fairly selective should be included. It is up to the user to determine the true selectivity of any secondary key columns and to determine whether those key columns should be included when creating the index. When building the index the primary key columns should be the leftmost key columns followed by any of the secondary key columns the user chooses and they should be prioritized by selectivity. Note: After creating the suggested index and executing the query again, it is possible that the query optimizer will choose not to use the suggested index. The CQE optimizer when suggesting indexes only considers the selection criteria and does not include join, ordering, and grouping criteria. The SQE optimizer includes selection, join, ordering, and grouping criteria when suggesting indexes. You can access index advisor information in many different ways. These include: The index advisor interface in iSeries Navigator SQL performance monitor Show statements Visual Explain interface Querying the Database monitor view 3020 - Index advised.
Radix Index
Index tree structure Key values are compressed
Common patterns are stored once Unique portion stored in leaf pages Positive impact on size and depth of the index tree
Maintenance
Index data is automatically spread across all available disk units Tree is automatically rebalanced to maintain an efficient structure
Temporary indexes
Considered a temporary data structure to assist the DB engine Maintained temporary indexes available in SQE V5R4
34
An SQL index (or keyed sequence access path) is a permanent object that is created over a table and used by the optimizer to provide a sequenced view of the data for a scan or probe operation. The rows in the tables are sequenced in the index based upon the key columns specified on the creation of the object. When the key columns are matched up by the optimizer to a query, it gives the optimizer the ability to use the radix index to help satisfy any selection, ordering, grouping or join requirements. Typically the use of an index operation will also include a Table Probe operation to provide access to any columns needed to satisfy the query that cannot be found as index keys. If all of the columns necessary to satisfy the query request for a table can be found as keys of an index, then the Table Probe is not required and the query uses Index Only Access. Avoiding the Table Probe can be an important savings for a query. The I/O associated with a Table Probe is typically the more expensive synchronous random I/O.
Radix Index
Database Table 001 002 003 004 005 ARKANSAS MISSISSIPPI MISSOURI IOWA ARIZONA
ROOT ROOT
MISS MISS
ADVANTAGES: Very fast access to a single key value Also fast for small, selected range of key values (low cardinality) Provides order
DISADVANTAGES: Table rows retrieved in order of key values (not physical order) which equates to random I/Os No way to predict which physical index pages are next when traversing the index for large number of key values
35
EMPLOYEE Table
RRN STATE
Perform Perform a a probe probe into into the the range range using using the the local local selection selection value(s) value(s)
... IOWA (004) IOWA (007) IOWA (010) IOWA (017) KANSAS (011) MISSISSIPPI (002) MISSISSIPPI (013) MISSOURI (003) ...
001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017
ARKANSAS MISSISSIPPI MISSOURI IOWA ARIZONA MONTANA IOWA NEBRASKA NEBRASKA IOWA KANSAS WISCONSIN MISSISSIPPI WISCONSIN WISCONSIN ARKANSAS IOWA
36
Table scan (arrival sequence processing) is very efficient on the AS/400 due to the independent I/O subsystem, large memory and single level storage and automatic data stripping. SQLxxxx messages are used to illustrate feedback from the optimization. SQLxxxx messages are used for PRTSQLINF and allow the meaningful 1st level text to be shown in the spool file and the message text to be translatable MRI.
Not a tree structure Can only be created through an SQL interface or iSeries Navigator GUI
CREATE ENCODED VECTOR INDEX SchemaName/IndexName ON SchemaName/TableName (ColumnName) WITH n DISTINCT VALUES;
37
An encoded vector index is a permanent object that provides access to a table by assigning codes to distinct key values and then representing those values in a vector. See next chart for a visual representation of an EVI Even though an encoded vector index is used to represent the values stored in a table, the index itself cannot be used to directly gain access to the table. Instead, the encoded vector index can only be used to generate either a temporary row number list or a temporary row number bitmap. These temporary objects can then be used in conjunction with a Table Probe to specify the rows in the table that the query needs to process. The main difference with the Table Probe associated with an encoded vector index (versus a radix index) is that the paging associated with the table can be asynchronous. The I/O can now be scheduled more efficiently to take advantage of groups of selected rows. Large portions of the table can be skipped over where no rows are selected. EVI can be created via SQL or iNav GUI.
RRN 1 2 3 4 5 6 7 8 9
Symbol table contains information for each distinct key value - Each key value is assigned a unique code (key compression) - Code is 1, 2, or 4 bytes depending on number of distinct key values Rather then a bit array for each distinct key value, use one array of codes
2006 IBM Corporation 38
An encoded vector index is a permanent object that provides access to a table by assigning codes to distinct key values and then representing those values in a vector. The size of the vector will match the number of rows in the underlying table. Each vector entry represents the table row number in the same position. The codes generated to represent the distinct key values can be 1, 2 or 4 bytes in length, depending upon the number of distinct values that need to be represented. Because of their compact size and relative simplicity, the EVI can be used to process large amounts of data very efficiently.
Bitmap
Key
Binary search symbol table for key(s) and code(s)
Code
1 2 3 4 5 6 7 8 ...
1 17 5 9 2 7 49 49 5 . . . . . . .
0 0 1 0 0 0 0 0 1 . . . . . . .
Row 1 2 3 4 5 6 7 8 9 ...
39
EVI
EVI
SELECT * FROM EMPLOYEE WHERE STATE = MINNESOTA' AND WORKDEPT IN ( 'B01', C01, 'E01')
Final RRN list
3 10 1000 3001
40
If the optimizer decides to use an EVI to process the query, the database engine uses the vector to build the dynamic bitmap (or a list of selected row ids) that contains one bit for each row in the table, the bit being turned on for each selected row. Like a bitmap index, these intermediate dynamic bitmaps (or lists) can be AND'ed and OR'ed together to satisfy an ad hoc query.
This picture is used to describe EVIs generated RRN list ANDed to produce a final RRN list that represents all the local selection.
Radix
Intermediate Bitmap
EVI
SELECT * FROM EMPLOYEE WHERE STATE = IOWA' OR WORKDEPT IN ( 'B01', C01, 'E01')
Final Bitmap
Intermediate Bitmap
State
0 0 0 1 0 0 0 0 0 . . . 1 . . .
OR
(Merge)
Workdepts
0 0 1 0 0 0 0 0 1 . . . 1 . . .
0 0 1 1 0 0 0 0 1 . . . 1 . . .
41
Zigzag for ANDing is when the starting (or continuing) point for one bitmap/rrnlist comes from the other rrnlist/bitmap. So it essentially leap frogs through the bitmaps/rrn lists. Consequently ANDing of multiple 'not so selective' indexes can be combined to generate a pretty fast plan. Zigzag for ORing is not nearly as interesting. Its a merge, so the OR processing walks both sides, taking the next (ordered) RRN from the source bitmaps/rrn list. Therefore each side is walked and a merge is done on the 'ordered' rrns to get the outputted ordered. The merge is done so that the subsequent table probes are still skip
High cardinality = large distinct number of values Low cardinality = small distinct number of values
In general
A radix index is best when accessing a small set of rows and the key cardinality is high An encoded vector index is best when accessing a set of rows and the key cardinality is low Understanding the data and query are key
2006 IBM Corporation 42
SQE
Robust advice Radix and EVI indexes Based on all parts of the query Multiple indexes can be advised for the same query Some limitations
43
SQE index advice was enhanced in V5R3 (with latest DB PTFs) and V5R4
System only adds (summary) rows, user must manage the data
Options to clear or prune
44
SQE
New feature in V5R4 Temporary indexes are reused and shared Creation is based on watching the query requests over time Creation is based on optimizers own index advice Temporary index maintenance is delayed when all associated cursors are closed
45
A cooler SQE enhancement in V5R4 is extra intelligence in the SQE query optimizer to improve performance with the automatic creation of indexes known as autonomic indexes. The optimizer sits in the background and looks for frequently executed queries that could benefit from an index being created. If such a query is identified, then an Autonomic Index is created by DB2 UDB. This Autonomic Index (AI) is created as a temporary index that is maintained by DB2 UDB as the underlying table data is changed similar to a permanent SQL index. The AI, however, will disappear once the system is shut down or restarted. i5 shops that make heavy use of commitment control will also be able to recover faster with the parallel rollback option and the ability to customize commitment control disk behavior to get the best performance for their business. ============ Autonomic Indexes can be disabled with CACHE_RESULTS & ALLOW_TEMPORARY_INDEXES INI options Autonomic Index is not used for stats & can disappear if all the owning plans are removed from the Plan Cache. Can be viewed with Index Evaluator.
Accurate statistics means accurate costing Accurate costing means optimal query plan Optimal query plans means best performance
2006 IBM Corporation 46
47
Providing the Optimizer with the correct index can greatly improve performance i.e. attain lower response times
48
It is possible to check what indexes are actually in plan on any table using the iNav GUI.
http://www-03.ibm.com/servers/enable/site/education/abstracts/438a_abs.html http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=/rzajq/rzajqlpg.htm
2006 IBM Corporation 49
I I
V5R4 removes some restrictions on use (i.e. better matching query MQT) MQTs contain:
Query definition Results of the query
SQE optimizer can rewrite the user's query to use the MQT
Query uses MQT's pre-aggregated, pre-calculated, pre-joined data Can significantly increase query performance Can significantly decrease resource utilization
I I I
Default behavior is: do not use MQTs QAQQINI file options to enable/disable support More information at: ibm.com/servers/enable/site/education/ibo/view.html?wp
2006 IBM Corporation 50
T1
T2
T3
Base Tables
MQT
51
Data is inserted into the MQT when it is created Data is not inserted into the MQT when it is created Data in the MQT can be refreshed at any time using the REFRESH TABLE statement Not currently supported
MQT can be used for query optimization MQT cannot be used for query optimization but the MQT can still be queried directly MQT is maintained by the user with INSERT, DELETE, UPDATE, or REFRESH TABLE Not currently supported
2006 IBM Corporation 52
Data is inserted into the MQT when it is created Data is not inserted into the MQT when it is created Data in the MQT can be refreshed at any time using the REFRESH TABLE statement Not currently supported
MQT can be used for query optimization MQT cannot be used for query optimization but the MQT can still be queried directly MQT is maintained by the user with INSERT, DELETE, UPDATE, or REFRESH TABLE Not currently supported
2006 IBM Corporation 53
Optimizer could use the MQT instead of fully executing the following query:
SELECT Geography, Year, SUM(Revenue) AS Total_Revenue, SUM(Quantity) AS Total_Quantity, FROM Example_Table WHERE Year IN (2004, 2005) GROUP BY Geography, Year; white paper: ibm.com/servers/enable/site/education/abstracts/438a_abs.html
User responsible for keeping MQT data current and activating optimizer MQT awareness with QAQQINI options
54
Materialized Query Tables (MQTs) are another feature that only can be used by SQE. Just like Instead of Triggers, MQTs were introduced after the GA of V5R3 via PTF and the capabilities have been enhanced in V5R4. A materialized query table (MQT) is a DB2 table that contains the results of a query, along with the querys definition. Similar to autonomic indexes, they can be used by SQE to automatically improve query performance. When an SQL query is run, the SQE query optimizer checks to see if there are any MQTs on the system that either the match the query being run or partially match. When the query optimizer does find a match or partial match, it will rewrite the query and just use the contents of the MQT as a short cut to speed up the delivery of the query results. MQTs are most beneficial in data warehousing and business intelligence environments since the data does not change as frequently as a transactional system.
55
A special type of transitive closure called look ahead predicate generation (LPG) may be costed for joins. In this case, the optimizer attempts to minimize the random I/O costs of a join by pre-applying the results of the query to a large fact table. LPG will typically be used with a class of queries referred to as star join queries, however it can possibly be used with any join query. Look at the following query: SELECT * FROM EMPLOYEE,EMP_ACT WHERE EMPLOYEE.EMPNO = EMP_ACT.EMPNO AND EMPLOYEE.EMPNO ='000010' The optimizer may decide to internally modify the query to be: WITH HT AS (SELECT * FROM EMPLOYEE WHERE EMPLOYEE.EMPNO='000010') SELECT * FROM HT, EMP_ACT WHERE HT.EMPNO = EMP_ACT.EMPNO AND EMP_ACT.EMPNO IN (SELECT DISTINCT EMPNO FROM HT) The optimizer places the results of the "subquery" into a temporary hash table. The hash table of the subquery can be applied in one of two methods against the EMP_ACT (fact) table:
LPG
Big_Table
Storekey
Small_Table
Storekey
Join Condition
Store
Dial 2
Dial 1
56
Item_Dim keylist
Itemkey
EVI 1
Dynamic Bitmap or RRN list
EVI 2
Dynamic Bitmap or RRN list
EVI 3
Datekey
Date_Dim keylist
Datekey
Sales Quantity
Keylist Keylist used used as as local local selection selection (LPG) (LPG)
Select store, item, date, weather, sum(sales), sum(quantity) from item_dim, store_dim, date_dim, fact_table where itemkey in (...keylist...) and storekey in (...keylist...) and datekey in (...keylist...) and itemkey=itemkey, storekey=storekey, datekey=datekey group by store, item, date, weather
57
CQE: Current maximum for key list is 30,000 keys or 5MB, whichever is smaller. If the keys are not represented in the DLIST then the benefits of the index ANDing and skip sequential access method on the fact table will be minimized. Those keys will not be used for local selection and bitmap optimization.
58
59
As mentioned earlier, the iSeries Navigator V5R4 enhancements contain some of the most notable DB2 enhancements. Ignoring the DB2 OnDemand Performance Center & Health Center, most of the enhancements listed on this chart are pretty minor. With many of these enhancements, just adding a graphical interface for system commands such as Copy File (CPYF). Onto the more significant enhancements ============ Delimiter option on creates allows the user to control the naming mode that iNav uses on Creates since the SQL & System naming mode have differences in object ownership and authorities.
Index Advice, also enhanced (SQE Only) SQE Optimizer will also examine Join, Grouping, Ordering criteria Available on V5R3 with latest DB Group PTF
60
The Database component of iSeries Navigator is the major driver of DB2 UDB simplification in V5R4 with the addition of new tools and streamlining of existing tools. This new combination of tools is known as the DB2 OnDemand Performance Center. The system-wide Index Advisor is one of the new tools in V5R4. Index advice from the DB2 UDB query optimizer is not a new capability in V5R4. Feedback on suggested indexes in past releases could be found in optimizer debug messages within a joblog or a database monitor collection. However, this feedback was only available if someone manually started a database monitor or turned on debug messaging. Even if this was done, an analyst had to deal with the complexities of extracting the advised index details of various joblogs and database monitor files. The new Index Advisor eliminates these hurdles with a click of mouse. No user intervention such as starting a database monitor is needed since the query optimizer automatically logs index advice for all queries to a repository in V5R4. Just right click on your Database name in the iSeries Navigator tree and select the Index Advisor task and youre given the output displayed here. Even better is the fact that index advice provided in V5R4 by the SQL Query Engine (SQE) query optimizer is more intelligent and complete. The Index Advice given in past releases only focused on filtering criteria of a query and ignored the join, grouping and ordering criteria. Again, this design required manual intervention from an analyst to take the DB2 index advice and then review the associated query to supplement the DB2 index advice. Look closely at the Index Advisor output and you will also see that the optimizer also provides advice on the type of index to create. Some queries may benefit from a traditional radix index while others would benefit from an encoded vector index structure. =============== SQE still does not advise indexes for IN & OR predicates
SystemWide Advisor Repository - QSYS2/SYSIXADV Index Advice can be done at Database, Schema & Table level
Another tool that is always up and running like the Index Advisor is the SQL Plan Cache toolset. The SQL Plan Cache is an internal repository used by SQE to store the access plans and associated statistics for SQL statements that are currently running or recently have been executed on your server. The big benefit with this Plan Cache analysis toolset is that for the first time you can perform detailed SQL performance analysis without the overhead of running a database monitor. A detailed database monitor collection can eat up disk space in a hurry and noticeably slow your disk response times. Gone are the days when a user calls about poor performance and your only response for an SQL-based application was starting a database monitor and having the user rerun the report or application. Now, when a performance issue is raised on V5R4, you can just jump right in and analyze the contents of the SQL Plan Cache to see if a long-running SQL statement is the cause of the performance problems. Using the SQL Plan Cache tool is as simple as rightclicking on the SQL Plan Cache Snapshots component and selecting the SQL Plan Cache -> Show Statements task. This action will produce the screen show here. One feature of this Plan Cache viewer not to miss is the statement filters that are available on the left hand-side of the dialog. These filters allow you to intelligently subset the SQL statements that you want to analyze as opposed to scrolling through a random list of thousands of SQL statements. For instance, you can pick the Top 10 statements that are executed most often or take the most time. You can also choose to just analyze the statements currently running or the SQL statements run by a specific user. Furthermore, once you have selected an SQL statement to analyze you can seamlessly launch Visual Explain for that statement from the same window. You will need to remember that the SQL Plan Cache is fixed in size, so some SQL statements may get purged out of the cache if they have not been recently executed. In these situations, the database monitor tool will have to be used. If you want to archive the contents of the cache into a permanent object, you can create an SQL Plan Cache Snapshot. Effectively this operation takes the contents of the SQL plan cache and converts it into a database monitor output file. Once the snapshot is created, you can run reports to analyze the contents just like a database monitor collection.
Exit point (QIBM_QWC_PWRDWNSYS ) can be used to call a program each time PWRDWNSYS and ENDSBS commands are issued. The program is called before the system actually powers down. iSeries customers could automate capturing a plan cache snapshot prior to each IPL by setting up this exit point and having exit program call the stored procedure QSYS2/DUMP_PLAN_CACHE. This way, they do not have to worry about doing it manually. Set it up once and it does it every time.
62
63
64
Database monitors have existed on DB2 UDB for iSeries for many releases, but the usability of these monitors takes a big step forward in V5R4 with addition of both pre- and post-filters. These filters are very similar to the filters that were available with the SQL Plan Cache viewer. Database monitor filters are important for a couple of reasons. First, the filters can minimize the overhead and disk space consumed by a database monitor collection. As a case in point, a customer recently ran a database monitor collection on all jobs on there system for 15 minutes. An SQL-based ERP application that was running on the system at the time resulted in almost 3 million rows of detailed monitor data. Applying one of the filters available in V5R4 (see Fig 3) such as only collecting monitor data for a certain user or only for SQL statements referencing a specified table could have significantly reduced the amount of data that has to be collected by DB2 UDB. These filters are made available when you start a new SQL Performance Monitor collection via the graphical interface or the STRDBMON CL command.
Drill-Thru Analysis
65
The filters discussed on the previous chart eliminate the number of rows in a database monitor collection, but there are still hundreds of fields in a single database monitor row and many monitor row types that need to be sorted thru to identify and fix performance problems. This maze of data made it a challenge to quickly look at a database monitor collection and determine if there were any issues unless you were a DB2 performance expert. Thus, a dashboard summary was added in V5R4 to address this issue. On the first analysis of a database monitor collection, DB2 for iSeries will automatically summarize some of the key performance indicators in the collection and then present the results in the summary window depicted at the top of this chart. This summary will allow an analyst to quickly scroll through some high-level DB2 performance indicators to determine if more detailed analysis is needed. V5R4 also addresses another issue with the database monitor analysis reports by providing drill-through analysis. The reports could be used to find an SQL statement with performance issues. However, if more detailed research of this statement was needed with a tool such as Visual Explain, the user had to exit out of the tool and then manually copy that SQL statement for Visual Explain analysis. As you can see with the bottom figure, an analyst can easily just rightclick to drill into more detail with Visual Explain. This will be a great enhancement for improving the efficiency of your database analysts and administrators.
66
Often times the performance of an application can change over time as the server configuration changes and new versions are installed. Unfortunately neither IBM or its customers are perfect when it comes to making changes sometimes have an unexpected negative impact on performance. With the new monitor comparison utility, users can utilize database monitor collections to help determine what change is causing performance issues. This technique requires that a customer collects and saves database monitor collections when a critical DB2 workload or report is performing and running well on their server. Once performance of this workload or report changes, a new database monitor collection can be compared with the good performance monitor collection to help identify the differences causing performance problems. This can be a long-running process with large database monitor collections, so plan accordingly.
One final iSeries Navigator addition to highlight is the DB2 Health Center. The Health Center allows you to monitor and perform a checkup on how your well database objects are doing compared to different DB2 UDB limits such as the maximum number of rows allowed in a table or maximum size of an index object. In addition, you have the ability to change the thresholds on when to mark a DB2 object as being close to a limit and the ability to save these checkups to a permanent file to enable historical comparisons. This feature is launched by right-clicking on your named database and selecting the Health Center task. The Index & MQT Evaluators allow you to determine how valuable an index or MQT is by allowing you to check how often its used by the DB2 Query Optimizer. This can be helpful for administrators and analysts trying to cleanup unused DB2 objects on their system. To access these evaluators just right-click on the table object and select the appropriate Show task. If you want to determine if an SQL statement is behind a slow-moving application that an end user is complaining about, than iSeries Navigator Show Current SQL is a great tool to use. The tool has been available since V5R2, but in V5R4 the tool returns more attributes about the SQL request such as program name to help you more quickly identify the issue. This tool can be accessed by rightclicking on the Databases icon in the iSeries Navigator tree.
Additional Education
Lecture on DB2 for i5/OS Overview of V5R4 DB2 for i5/OS new features and functions Review "why System i for BI" Introduction to query optimization and execution Review indexing technology and index strategies Review statistics technology and statistics strategies Review SQL interfaces and best practices for performance Review SQL performance monitoring and analysis tools Relatore MIKE CAIN, Rochester DB2 for i5/OS Competency Center assistente Simona Pacchiarini, System i PreSales DATA: 5 giugno CODICE CORSO: OLB04IT DURATA: 2 giorni LINGUA: inglese con affiancamento PREZZO : euro 490,00 + IVA a partecipante SEDE: IBM Segrate (prenotata aula Volta) Iscrizioni alla pagina http://www-05.ibm.com/services/learning/it/ta-iris.nsf/(ExtCourseNr)/OLB04IT Lab with DB2 for i5/OS Brief review of query optimizer and analysis tools Use V5R4 iSeries Navigator monitoring and analysis tools Work through exercises to understand and tune SQL requests Relatore MIKE CAIN, Rochester DB2 for i5/OS Competency Center assistente Simona Pacchiarini, System i PreSales DATA: 7 giugno CODICE CORSO: OLB05IT DURATA: 1 giorno LINGUA: inglese con affiancamento PREZZO : euro 300,00 + IVA a partecipante SEDE: Milano, Via Frigia 27 Iscrizioni alla pagina http://www-05.ibm.com/services/learning/it/ta-iris.nsf/(ExtCourseNr)/OLB05IT
2006 IBM Corporation 68
Additional Information
DB2 for i5/OS home page - ibm.com/iseries/db2 DB2 Navigator lab, downloadable
ftp://www.redbooks.ibm.com/redbooks/2006_System_i5_Forum_Session_files/2006DBL01/
69