
Performance monitor for a relational information system

1976, Proceedings of the annual conference on - ACM 76

Although some relational information systems have recently become available for production use, very few, if any of them, contain facilities to collect performance data. This paper describes a method for implementing a performance monitor and some of the data collected by this performance monitor which was recently installed in the REGIS (RElational General Information System). REGIS is currently being used within General Motors. The performance monitor is used to collect data about the usefulness of the command language, performance improvements following major system upgrades and performance predictions based on past runs. While installing the performance monitor several system deficiencies were uncovered. Correction of the deficiencies has already improved performance by almost an order of magnitude. Future improvements are expected to improve performance by at least another order of magnitude.

PERFORMANCE MONITOR FOR A RELATIONAL INFORMATION SYSTEM

N. N. Oliver and John D. Joyce
Computer Science Department, Research Laboratories
General Motors Corporation
Warren, Michigan, 48090, U.S.A.

1. REGIS

REGIS (RElational General Information System) is an interactive relational information handler which includes a comprehensive set of commands to handle data, perform statistical analyses and produce graphical output [1,2]. After the initial development stages of REGIS had been successfully accomplished, it became clear that a performance monitoring program for the REGIS system would be a valuable asset to further development in the project. Before giving an overview of the performance monitor, a brief description of the system itself is in order.

The main power of REGIS lies in its ability to quickly handle unforeseen queries and to enable the user to utilize intermediate results to determine the future course of his analysis.
An emphasis is placed on providing a view of data in which the relationships are easily understandable and easily manipulated to derive new relationships. Capabilities are integrated into the system for those users whose main interests are to use flexible graphical tools to view their data. Other users may want to place an emphasis on statistical analysis of data. REGIS contains elementary statistical programs and planned interfaces to make reasonably smooth transitions between REGIS and advanced statistical packages. The user can prepare his data for analysis utilizing the (relational) information handling facilities, analyze it with the statistical functions and plot the results all within one package. Fig. 1 gives an overview of the major components of the system. Interfaces to a statistical system and a graph plotting system are transparent to the user giving a view of a unified system with multiple capabilities.

[Figure: COMMAND INTERPRETER, STATISTICAL ANALYSIS FUNCTIONS, RELATIONAL INFORMATION HANDLING FACILITIES, GRAPHICAL PLOTTING OPERATORS]
Fig. 1 - Major System Components

2. PERFORMANCE MONITOR OVERVIEW

In any scientific discipline, measurement is one of the vital keys to understanding a complex system. This is especially true with a new information handling system. Measurements are required to understand better the strengths and weaknesses of the new relational techniques and their interaction with the data. Since a number of major design changes have been planned for the internals of the REGIS system, performance data before and after these changes would give valuable insight into the effects of these and future changes. This was the prime motivating factor in designing and building the performance monitor.

Since REGIS is primarily an interactive system, it is important to determine the characteristics of the command language.
Commonly used sequences of commands, relative frequency of command usage, response times for various types of commands, the capacity of the system in terms of cost and response for data bases of various sizes, and user errors can be monitored. Some of the above information gives a basis to predict costs and response times when new applications are considered for the system.

It is appropriate to indicate the criteria that were used in the design and implementation of the performance monitor. First, one should be able to use REGIS facilities to perform a variety of analyses on its own measurement data. This implied the first ground rule, namely that the output of the performance monitor for REGIS should be in the form of standard REGIS tables so that they can be read in for subsequent analysis with little or no preprocessing necessary. This would eliminate the effort of having to write special analysis programs.

A second ground rule was that since REGIS has convenient and flexible information handling facilities, one should be able to make use of the internal support facilities to do much of the work in actually gathering, recording and saving the measurement data. The objective here was to minimize the effort required for new software for the monitor and to take advantage of imbedded programs in REGIS which were already checked out and fully operational.

The following sections will describe the performance monitor, the selection of benchmarks for controlled testing, various aspects of the monitor in operation and some results obtained thus far.

3. THE MEASUREMENT MECHANISM

Measurement data is stored in standard REGIS tables. The storing is done by adding one row to a table for each command executed. Two kinds of measurement data are collected: Session data pertaining to the whole REGIS session and Command data recorded before and after execution of each command. A user may turn on a brief display of CPU time used and page faults after each command if immediate feedback on relative costs of commands is desired.

Fig. 2 illustrates the measurement operations on the two kinds of data during the three phases of a REGIS session.

Phase            Session data table                            Command data table
initialization   table created; row recorded; saved on disk    table created
execution        -                                             recording: one row added for each command
termination      row recorded; saved on disk                   saved on disk

Fig. 2 - Measurement Operations

Session measurement data: This data is stored in the Session-table which contains only one row. It contains: the session-ID uniquely identifying the session; the date; the starting and ending times; the total CPU time used; and a user-ID. The row of data, including all measurements except the session ending time, is entered into the Session-table at the beginning of the session (and the table is stored in a permanent file on disk). At the end of the session the same row is reentered into the table, but this time it also includes the session ending time (and the table is stored again in the permanent file). Thus whenever a session table without the ending time is found in the permanent file, it indicates that the session did not terminate normally. This helps in monitoring the reliability of REGIS (and of the operating system). The difference between the ending and starting times provides the duration (elapsed time) of the session. (The date, starting and ending time, and the CPU time are obtained from the operating system.)

Command measurement data: This data is stored in the REGIS Command-table which contains as many rows as the number of commands executed during the session (one row for each command). It includes 25 columns which contain different data categories such as: session-ID; command identification information (number and name); sizes (number of rows and columns) of the tables referenced (maximum of three tables with one command); performance data obtained from the operating system, such as response time, CPU time, number of supervisor calls, number of page faults, and the user think time; table I/O information (number of get/put rows, number of get/put columns, number of translations from one type of data into another, etc.); count of sorts requested and how many were performed (a sort is performed only on unsorted tables); and a count of calls to two key utility subroutines which were suspected to be major sources of inefficiency.

The data for both tables described above is stored on a permanent disk file and is periodically analyzed using REGIS.

4. BENCHMARK SELECTION

In selecting benchmarks for a relational information system, many of the criteria are similar to those one would use for testing other database systems or other application systems. These criteria will only be mentioned briefly. Other points which are primarily concerned with the particular features of the system and performance monitor will be described in more detail.

The benchmark operations were chosen so that a wide variety of demands would be placed on the REGIS system. As a consequence a variety of demands are placed on the supporting computer system, ranging from heavy computation to heavy I/O or paging demands. The benchmarks were basically centered around the idea of using the commands required to solve a particular problem which a user might typically have. All benchmarks were designed to be run as a replay of an interactive session. Each problem to be solved was treated as a separate benchmark. This provided the opportunity to use only a small part of the total benchmark for preliminary testing purposes or for measuring the effects of minor changes. Some analysis of the benchmarks was done on a complete interactive session in addition to the analysis on a per command basis. This gave the opportunity to compare various performance factors when different problems were being solved. Fig.
3 gives a summary of the benchmarks actually used.

# OF COMMANDS   DESCRIPTION
 1              Null run for initialization, termination
41              Statistical, graphical, engineering data
36              Information inquiry from multiple tables
29              Various graphical views of data
43              Text and multiple table manipulation
28              Logical and statistical analysis
50              Analysis of hierarchical relationships
78              General logical relationships
60              Heavy computation, logical manipulation

Fig. 3 - Benchmark Characteristics

Five groups of data from actual user problems were used for the nine benchmarks. The data was taken from quality control analysis, machine failures and project control and scheduling applications. The tables had a mixture of both numeric and textual data with a predominance of textual data.

In addition to the standardized performance aspects of benchmarks, these standard applications also provide an additional verification of correct REGIS operation prior to each integration of new modules into the standard user's version of the system.

5. REGIS BENEFITS OF PERFORMANCE MONITORING

The benefits to REGIS from the performance monitoring can be divided into three classes. The first class includes all the immediate benefits which were realized shortly after the installation of the monitoring mechanism. The second and third classes involve short and long term future benefits.

5.1 Immediate Benefits:

Some inefficiencies in the REGIS system were detected during the installation and testing phases of the performance monitor. Some others were discovered when analyzing the performance data of the initial benchmark runs. These inefficiencies were corrected upon discovery. They are:

Excessive Modularity: REGIS has been designed and implemented in a very modular fashion to facilitate future modifications and enhancements. This resulted in a large number of subroutine calls. The performance monitor pointed out that in some cases an excessively large number of calls were executed activating a few utility subroutines, and these calls came from a few other utility subroutines (90000 calls per command in some cases). To correct this situation the few lines of code contained in the excessively called subroutine were put in line in the few utility programs initiating the calls. As a result the overall performance was improved by a factor of 4.

Excessive Hash Coding: In order to avoid character string manipulation the REGIS system used to hash code all the terminal and file input. The performance monitor pointed out that this global hash coding operation is expensive. As a result we changed REGIS to selectively hash code only the necessary input keywords. This modification improved the data reading efficiency by a factor of 2.

High Paging Rate: The REGIS system runs, at present, under the TSS virtual memory operating system [3]. Being a virtual memory operating system it uses paging for data I/O between the main memory and the auxiliary storage devices. The performance monitoring data collected showed higher paging rates than expected for each command execution. A closer look discovered an operating system Link-Editor related error. Once this error was corrected, the size of the REGIS system in virtual memory was decreased by 30%, the real memory utilized decreased by 12%, the paging rate was reduced by 26% and the overall performance was increased 10%.

Correction of the above inefficiency combined with the other improvements already cited resulted in an overall performance improvement of a factor of 8.

5.2 Short term future benefits:

These benefits include elimination of the inefficiencies which have been discovered as of this time but have not been corrected yet due to the relatively complicated nature of these corrections. However, we do intend to eliminate these inefficiencies in the near future. They are:

High Cost of Data Conversion: One of the first steps in the REGIS execution sequence is usually the conversion of the input data from an external (displayable) format into internal format. The performance monitor indicated that the conversion is very expensive. Most of the conversions will be avoided by providing an option to store the data with the internal format on disk.

Expensive Sorting: Although the sorting algorithm utilized in REGIS is considered one of the best (Quicksort), it is nevertheless expensive to perform the sorting operation. The data itself (the row of data) is physically moved with the sort and the implementation was not tailored to optimize sorting. The performance monitor confirmed our earlier suspicions of this sorting inefficiency. An initial improvement will allow multiple sorts of a table by utilizing multiple pointer arrays for each table and avoiding the necessity to move the data itself.

We expect that the implementation of these two enhancements will significantly improve the system performance, especially on large data bases.

5.3 Long term future benefits:

These include all the benefits which we expect to realize in the future by continuously monitoring the execution of the REGIS users.

Performance Change Following Enhancements: Performance data collected before and after major REGIS enhancements will provide us with information about the effectiveness of the modifications. (We already benefited from this facility in recording the performance improvement detailed above under "Immediate Benefits".)

Performance Prediction: Based on the data base size and the nature of the queries which the user intends to answer we expect to be able to predict the performance (mainly response time and the cost) of the expected REGIS execution.

Command Language Information: The REGIS command language is a step by step non-procedural [4] algebraic language believed to encompass the most useful relational functions. With the assistance of the performance monitor we will obtain information about the usefulness of this kind of a command language such as:

The average number of commands required to answer a typical query. This will help us decide if a new higher level language [5,6] will be better suited for our system.

Which commands should be made more powerful and flexible.

The high use commands which use a lot of resources, so that we can concentrate on improving the performance of these commands first.

The potential usefulness of a natural English-like command language. There has been a trend towards investigating natural language as a front end interface to relational data base managers [7]. The performance monitor might provide us with information about the usefulness of such a front end on our system (e.g. by counting the number of syntax related errors, number of typing and spelling errors etc.).

The expected long term gains provided the initial motivation for the installation of the performance monitor. The fact that we could realize some immediate gains from its installation was an unexpected pleasant surprise.

6. PERFORMANCE MONITOR OVERHEAD

Like any software monitor, this monitor disturbs the conditions which are being measured. The significance of this, however, lies in the amount of disturbance which is caused and the purposes of measurements. Since we are using the facilities of REGIS to assist in the data collection and storage of measurement data, it may be instructive to look at the components of the overhead during measurement.

The overhead for each command is expressed in eqn. (1):

OVERHEAD(i) = FIXED + VARIABLE(i)    (1)

The total measurement overhead is expressed in eqn. (2):

OVERHEAD = INITIAL + FINAL + SUM(i=1 to N) of OVERHEAD(i)    (2)

Substituting from eqn. (1) into (2) gives the following:

OVERHEAD = INITIAL + FINAL + N * FIXED + SUM(i=1 to N) of VARIABLE(i)    (3)

For any given test of the system the FIXED overhead is a constant amount independent of the function the command is performing or the amount of data being processed within the command. This FIXED overhead per command consists of the resources required to add one row of data to the table which is accumulating the measurement data for each command, initializing counters for the next command and obtaining the computer system resources used (real time, CPU time, paging data). Although this value is a constant for each command for a given test of the system, this overhead may change as the system changes because the efficiency of the mechanism which adds rows to a table may change.

The variable overhead per command, VARIABLE(i), is both a function of the particular command being executed and a function of the amount of data being processed by the command. For counters which are not incremented to extremely high values, the counting mechanism is modularized within a few subroutines which are called in the appropriate places. Other counters which are frequently incremented are installed right in the appropriate subroutines. Originally all counters were modularized into counting routines, but early tests indicated that the overhead of doing this was exorbitant. Clearly the number of increments to these counters is dependent on the function of the command and the data which drives the internal loops.

The INITIAL overhead is concerned with the creation, setup and storing of the session identification table. INITIAL also includes preparation of the table which will contain the data for each command. The FINAL overhead includes the resources used to update and write out the table which contains the data describing the session itself. In addition, this overhead includes the conversion and writing out of the table which contains the performance data for each command. This latter overhead varies somewhat depending on the number of commands monitored.

Provisions were built into the performance monitor to switch off all monitoring activities. One can also switch off only the command monitors while leaving on the monitor which gathers and displays data about the usage of some of the computer system resources. While the monitor is running, most of the costs of the monitor itself are excluded from the data recorded for each command. The costs of the monitor are borne by the user for the entire session, however.

The total overhead of the full monitor is significant, but not excessive when one is obtaining useful measurement data. It is feasible to turn on the monitor for long periods of time to monitor the usage patterns of all users of the REGIS system. Normally, the monitor is disabled to eliminate the overhead. A user has the option of turning on only a portion of the monitor and having displayed some of the computer resources used by each command. The overhead of this facility, as indicated in the first line of Fig. 4, is small. Few users turn on their own display facility. If they are concerned about costs, their concern is on total job costs rather than individual command costs. Fig. 4 indicates the proportions of various aspects of the overhead costs.

MEASURE OF OVERHEAD              % ADDED TO SESSION   % OF OVERHEAD
ONLY SHOW COMPUTER RESOURCES     1.1                  11.8
INITIAL + FINAL                  3.4                  36.6
COMMAND OVERHEAD                 4.8                  51.6
TOTAL                            9.3                  100.0

Fig. 4 - Measurement Overhead In a Typical Session

7. CONCLUSIONS

The techniques of using existing modules within the REGIS package for data gathering and storing proved to be beneficial. Producing standard output tables which contained the results of measurements is very useful. The analysis of data in a variety of unforeseen ways is easy to do and requires no special programs to be written. The approaches just outlined did save considerable implementation effort.

The effort in designing and installing the performance monitor was well justified. The REGIS users have already gained almost an order of magnitude improvement as a result of correcting some of the problems discovered by using the performance monitor. We expect that removal of other bottlenecks which have already been discovered will improve performance by at least another order of magnitude. And as more of our long term goals, such as performance prediction and evaluation of command languages, are met we expect the REGIS users to benefit even further from the performance monitoring.

Finally, we hope that the data which we are collecting on performance and command usage in an industrial user environment will also be useful to designers of data base hardware and firmware which incorporate relational operations.

ACKNOWLEDGEMENT

We are grateful for the support of G. G. Dodd in the evolution and development of the REGIS project.

REFERENCES

[1] Joyce, John D., Oliver, N. N., Preliminary Users Manual for the REGIS Information System, Research Publication GMR-2008, General Motors Research Laboratories, Warren, Michigan, October 1975.

[2] Joyce, John D., Oliver, N. N., REGIS - A Relational Information System With Graphics and Statistics, AFIPS National Computer Conference Proceedings, vol. 45, 1976, 839-844.

[3] IBM System/360 Time Sharing System, Command System User's Guide, Order No. GC28-2001, IBM Corporation.

[4] McLeod, Dennis, Meldman, Monte, RISS - A Generalized Minicomputer Relational Data Base Management System, AFIPS National Computer Conference Proceedings, vol. 44, 1975, 397-402.

[5] Boyce, Raymond F., Chamberlin, Donald D., A Structured English Query Language, Proc. of ACM SIGFIDET Workshop, May 1974.

[6] McDonald, Nancy, Stonebraker, Michael, CUPID - The Friendly Query Language, Laboratory Memorandum ERL-M487, University of California, Berkeley, October 1974.

[7] Codd, Edgar F., Seven Steps to Rendezvous with the Casual User, Proc. IFIP Working Conference on Data Base Management, American Elsevier Publishing Company, New York, N.Y., 1974, 179-200.
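
As an illustration of the overhead model of eqns. (1) to (3) and of the proportions reported in Fig. 4, the following Python sketch may be helpful. It is not part of REGIS: the function names and the sample resource figures (in arbitrary units) are hypothetical and chosen only for the example, and only the form of the equations and the "% added to session" values are taken from the paper.

```python
def command_overhead(fixed, variable_i):
    """Eqn. (1): overhead of one command."""
    return fixed + variable_i


def total_overhead_eqn2(initial, final, fixed, variable):
    """Eqn. (2): INITIAL + FINAL + sum of the per-command overheads."""
    return initial + final + sum(command_overhead(fixed, v) for v in variable)


def total_overhead_eqn3(initial, final, fixed, variable):
    """Eqn. (3): INITIAL + FINAL + N * FIXED + sum of VARIABLE(i)."""
    n = len(variable)
    return initial + final + n * fixed + sum(variable)


# Hypothetical session of three commands (arbitrary resource units).
variable = [0.5, 1.25, 0.25]
total_2 = total_overhead_eqn2(2.0, 3.0, 1.0, variable)
total_3 = total_overhead_eqn3(2.0, 3.0, 1.0, variable)

# "% added to session" column of Fig. 4.
percent_added = {
    "only show computer resources": 1.1,
    "initial + final": 3.4,
    "command overhead": 4.8,
}
total_added = round(sum(percent_added.values()), 1)

# "% of overhead" column: each component as a share of the total overhead.
shares = {
    name: round(100 * value / total_added, 1)
    for name, value in percent_added.items()
}
```

With these sample values both forms of the total give 10.0, which is the substitution step from eqn. (2) to eqn. (3). The shares computed from the first column of Fig. 4 come out as 11.8, 36.6 and 51.6, matching its second column, and the components sum to the 9.3% total.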