PERFORMANCE MONITOR FOR A RELATIONAL INFORMATION SYSTEM
N. N. Oliver and John D. Joyce
Computer Science Department, Research Laboratories
General Motors Corporation
Warren, Michigan, 48090, U.S.A.
Although some relational information systems have recently become
available for production use, very few, if any of them, contain
facilities to collect performance data. This paper describes a
method for implementing a performance monitor and presents some of
the data collected by this monitor, which was recently installed in
REGIS (RElational General Information System). REGIS is currently
being used within General Motors. The performance monitor is used
to collect data about the usefulness of the command language,
performance improvements following major system upgrades, and
performance predictions based on past runs. While installing the
performance monitor, several system deficiencies were uncovered.
Correction of the deficiencies has already improved performance by
almost an order of magnitude. Future improvements are expected to
improve performance by at least another order of magnitude.
1.
REGIS
REGIS (RElational General Information System) is an interactive
relational information handler which includes a comprehensive set
of commands to handle data, perform statistical analyses and
produce graphical output [1,2]. After the initial development
stages of REGIS had been successfully accomplished, it became clear
that a performance monitoring program for the REGIS system would be
a valuable asset to further development in the project.
Before giving an overview of the performance monitor, a brief
description of the system itself is in order. The main power of
REGIS lies in its ability to quickly handle unforeseen queries and
to enable the user to utilize intermediate results to determine the
future course of his analysis. An emphasis is placed on providing a
view of data in which the relationships are easily understandable
and easily manipulated to derive new relationships.
Capabilities are integrated into the system for those users whose
main interests are to use flexible graphical tools to view their
data. Other users may want to place an emphasis on statistical
analysis of data. REGIS contains elementary statistical programs
and planned interfaces to make reasonably smooth transitions
between REGIS and advanced statistical packages. The user can
prepare his data for analysis utilizing the (relational)
information handling facilities, analyze it with the statistical
functions and plot the results all within one package. Fig. 1
gives an overview of the major components of the system.
Interfaces to a statistical system and a graph plotting system are
transparent to the user, giving a view of a unified system with
multiple capabilities.
Fig. 1 - Major System Components (command interpreter; statistical
analysis functions; relational information handling facilities;
graphical plotting operators)
2.
PERFORMANCE MONITOR OVERVIEW
In any scientific discipline, measurement is one of the vital keys
to understanding a complex system. This is especially true with a
new information handling system. Measurements are required to
understand better the strengths and weaknesses of the new
relational techniques and their interaction with the data. Since a
number of major design changes have been planned for the internals
of the REGIS system, performance data before and after these
changes would give valuable insight into the effects of these and
future changes. This was the prime motivating factor in designing
and building the performance monitor.
Since REGIS is primarily an interactive system, it is important to
determine the characteristics of the command language. Commonly
used sequences of commands, relative frequency of command usage,
response times for various types of commands, the capacity of the
system in terms of cost and response for various size data bases
and user errors can be monitored. Some of the above information
gives a basis to predict costs and response times when new
applications are considered for the system.
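As a rough illustration of the command-language measurements listed above, the sketch below counts relative command usage and commonly used command sequences from a recorded session. It is written in Python purely for illustration and is not REGIS code; the function names and the command names in the usage note are hypothetical.

```python
from collections import Counter
from typing import List


def command_frequencies(log: List[str]) -> Counter:
    """Count how often each command name appears in a session log."""
    return Counter(log)


def common_sequences(log: List[str], length: int = 2) -> Counter:
    """Count every run of `length` consecutive commands in the log."""
    return Counter(
        tuple(log[i:i + length]) for i in range(len(log) - length + 1)
    )
```

For a hypothetical log such as `["READ", "SORT", "PLOT", "READ", "SORT"]`, `command_frequencies` reports that READ and SORT were each used twice, and `common_sequences` reports that the pair (READ, SORT) occurred twice.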
It is appropriate to indicate the criteria that
were used in the design and implementation of the
performance monitor. First, one should be able to
use REGIS facilities to perform a variety of
analyses on its
own measurement data.
This
implied the first ground rule, namely that the
output of the performance monitor for REGIS should
be in the form of standard REGIS tables so that
they can be read in for subsequent analysis with
little or no preprocessing necessary. This would
eliminate the effort of having to write special
analysis programs.
A second ground rule was that
since REGIS has facilities for convenient and
flexible information handling capabilities, one
should be able to make use of the internal support
facilities to do much of the work in actually
gathering, recording and saving the measurement
data. The objective here was to minimize the
effort required for new software for the monitor
and to take advantage of imbedded programs in
REGIS which were already checked out and fully
operational.
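The first ground rule can be illustrated with a short sketch. It is an assumption-laden illustration in Python, not REGIS code: the row layouts, field names (`end_time`, `cpu_time`, and so on) and command names are hypothetical, chosen only to resemble the kinds of measurements the monitor records (Section 3 describes the actual tables). The point is that once measurement data is held as ordinary tables, questions about it become ordinary table queries rather than special analysis programs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SessionRow:
    """One row of a session table (hypothetical field names)."""
    session_id: str
    user_id: str
    start_time: float            # seconds, as reported by the operating system
    end_time: Optional[float]    # None if no ending time was ever recorded


@dataclass
class CommandRow:
    """One row of a command table (hypothetical subset of the columns)."""
    session_id: str
    command_name: str
    cpu_time: float
    page_faults: int


def unterminated_sessions(sessions: List[SessionRow]) -> List[str]:
    """A session row with no ending time did not terminate normally."""
    return [s.session_id for s in sessions if s.end_time is None]


def mean_cpu_by_command(commands: List[CommandRow]) -> Dict[str, Tuple[int, float]]:
    """Return {command name: (times used, mean CPU time)}."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for row in commands:
        totals[row.command_name][0] += 1
        totals[row.command_name][1] += row.cpu_time
    return {name: (int(n), cpu / n) for name, (n, cpu) in totals.items()}
```

Both functions are simple selections and aggregations over rows, which is the kind of work the information handling facilities of a relational system already provide.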
The following sections will describe the performance monitor, the
selection of benchmarks for controlled testing, various aspects of
the monitor in operation and some results obtained thus far.
3.
THE MEASUREMENT MECHANISM
Measurement data is stored in standard REGIS tables. The storing
is done by adding one row to a table for each command executed.
Two kinds of measurement data are collected: Session data
pertaining to the whole REGIS session and Command data recorded
before and after execution of each command. A user may turn on a
brief display of CPU time used and page faults after each command
if immediate feedback on relative costs of commands is desired.
Fig. 2 illustrates the measurement operations on the two kinds of
data during the three phases of a REGIS session.
               initialization        execution             termination
session data   table created,                              row recorded,
               row recorded,                               saved on disk
               table saved on disk
command data   table created         recording: one row    table saved on disk
                                     added for each
                                     command
Fig. 2 - Measurement Operations
Session measurement data:
This data is stored in the Session-table which contains only one
row. It contains: the session-ID, uniquely identifying the session;
the date; the starting and ending times; the total CPU time used;
and a user-ID. The row of data, including all measurements except
the session ending time, is entered into the Session-table at the
beginning of the session (and the table is stored in a permanent
file, on disk). At the end of the session the same row is reentered
into the table but this time it includes also the session ending
time (and the table is stored again in the permanent file). Thus
whenever a session table without the ending time is found in the
permanent file it indicates that the session did not terminate
normally. This helps in monitoring the REGIS (and the Operating
System) reliability. The difference between the ending and starting
times provides the duration (elapsed time) of the session. (The
date, starting and ending time, and the CPU time are obtained from
the operating system.)
Command measurement data:
This data is stored in the REGIS Command-table which contains as
many rows as the number of commands executed during the session
(one row for each command). It includes 25 columns which contain
different data categories such as: session-ID; command
identification information (number and name); sizes (number of rows
and columns) of the tables referenced (maximum of three tables with
one command); performance data which is obtained from the operating
system such as: response time, CPU time, number of supervisor
calls, number of page faults, and the user think time; table I/O
information (number of get/put rows, number of get/put columns,
number of translations from one type of data into another, etc.);
count of sorts requested and how many were performed (a sort is
performed only on unsorted tables); and a count of calls to two key
utility subroutines which were suspected to be major sources of
inefficiency. The data for both tables described above is stored on
a permanent disk file and is periodically analyzed using REGIS.
4.
BENCHMARK SELECTION
In selecting benchmarks for a relational information system, many
of the criteria are similar to those one would use for testing
other database systems or other application systems. These criteria
will only be mentioned briefly. Other points which are primarily
concerned with the particular features of the system and
performance monitor will be described in more detail.
The benchmark operations were chosen so that a wide variety of
demands would be placed on the REGIS system. As a consequence a
variety of demands are placed on the supporting computer system,
ranging from heavy computation to heavy I/O or paging demands. The
benchmarks were basically centered around the idea of using the
commands required to solve a particular problem which a user might
typically have. All benchmarks were designed to be run as a replay
of an interactive session. Each problem to be solved was treated as
a separate benchmark. This provided the opportunity to use only a
small part of the total benchmark for preliminary testing purposes
or for measuring the effects of minor changes. Some analysis of the
benchmarks was done on a complete interactive session in addition
to the analysis on a per command basis. This gave the opportunity
to compare various performance factors when different problems were
being solved. Fig. 3 gives a summary of the benchmarks actually
used.
# OF COMMANDS   DESCRIPTIONS
      1         Null run for initialization, termination
     41         Statistical, graphical, engineering data
     36         Information inquiry from multiple tables
     29         Various graphical views of data
     43         Text and multiple table manipulation
     28         Logical and statistical analysis
     50         Analysis of hierarchical relationships
     78         General logical relationships
     60         Heavy computation, logical manipulation
Fig. 3 - Benchmark Characteristics
Five groups of data from actual user problems were used for the
nine benchmarks. The data was taken from quality control analysis,
machine failures and project control and scheduling applications.
The tables had a mixture of both numeric and textual data with a
predominance of textual data.
In addition to the standardized performance aspects of benchmarks,
these standard applications also provide an additional verification
of correct REGIS operation prior to each integration of new modules
into the standard user's version of the system.
5.
REGIS
BENEFITS OF PERFORMANCE MONITORING
The benefits to REGIS from the performance monitoring can be
divided into three classes: The first class includes all the
immediate benefits which were realized shortly after the
installation of the monitoring mechanism. The second and third
classes involve short and long term future benefits.
5.1
Immediate Benefits:
Some inefficiencies in the REGIS system were detected during the
installation and testing phases of the performance monitor. Some
others were discovered when analysing the performance data of the
initial benchmark runs. These inefficiencies were corrected upon
discovery. They are:
Excessive Modularity:
REGIS has been designed and implemented in a very modular fashion
to facilitate future modifications and enhancements. This resulted
in a large number of subroutine calls. The performance monitor
pointed out that in some cases an excessively large number of calls
were executed activating a few utility subroutines and these calls
came from a few other utility subroutines (90000 calls per command
in some cases).
To correct this situation the few lines of code contained in the
excessively called subroutine were put in line in the few utility
programs initiating the calls. As a result the overall performance
was improved by a factor of 4.
Excessive Hash Coding:
In order to avoid character string manipulation the REGIS system
used to hash code all the terminal and file input. The performance
monitor pointed out that this global hash coding operation is
expensive. As a result we changed REGIS to selectively hash code
only the necessary input keywords. This modification improved the
data reading efficiency by a factor of 2.
High Paging Rate:
The REGIS system runs, at present, under the TSS virtual memory
operating system [3]. Being a virtual memory operating system it
uses paging for data I/O between the main memory and the auxiliary
storage devices. The performance monitoring data collected showed
higher paging rates than expected for each command execution. A
closer look discovered an operating system Link-Editor related
error. Once this error was corrected, the size of the REGIS system
in virtual memory was decreased by 30%, the real memory utilized
decreased by 12%, the paging rate was reduced by 26% and the
overall performance was increased 10%.
Correction of the above inefficiency, combined with the other
improvements already cited, resulted in an overall performance
improvement of a factor of 8.
5.2
Short term future benefits:
These benefits include elimination of the inefficiencies which have
been discovered as of this time but have not been corrected yet due
to the relatively complicated nature of these corrections. However,
we do intend to eliminate these inefficiencies in the near future.
They are:
High Cost of Data Conversion:
One of the first steps in the REGIS execution sequence is usually
the conversion of the input data from an external (displayable)
format into internal format. The performance monitor indicated that
the conversion is very expensive. Most of the conversions will be
avoided by providing an option to store the data with the internal
format on disk.
Expensive Sorting:
Although the sorting algorithm utilized in REGIS is considered one
of the best (Quicksort) it is nevertheless expensive to perform the
sorting operation. The data itself (the row of data) is physically
moved with the sort and the implementation was not tailored to
optimize sorting. The performance monitor confirmed our earlier
suspicions of this sorting inefficiency. An initial improvement
will allow multiple sorts of a table by utilizing multiple pointer
arrays for each table and avoiding the necessity to move the data
itself.
We expect that the implementation of these two enhancements will
significantly improve the system performance especially on large
data bases.
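The pointer-array idea proposed for sorting can be sketched as follows. This is a minimal illustration in Python, not the REGIS implementation: the table contents, the `Row` type and the function names are hypothetical, and Python's built-in `sorted` is used in place of Quicksort. What the sketch shows is that each sort order is kept as an array of row positions, so one table can carry several orderings while its rows are never moved.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, int]  # hypothetical two-column row: (part name, quantity)


def build_pointer_array(rows: Sequence[Row], column: int) -> List[int]:
    """Return row positions ordered by one column; the rows stay in place."""
    return sorted(range(len(rows)), key=lambda i: rows[i][column])


def rows_in_order(rows: Sequence[Row], pointers: Sequence[int]) -> List[Row]:
    """Read the table through a pointer array."""
    return [rows[i] for i in pointers]


table = [("bolt", 30), ("axle", 10), ("cam", 20)]
by_name = build_pointer_array(table, 0)      # one ordering of the table
by_quantity = build_pointer_array(table, 1)  # a second ordering, same rows
```

Only small integer arrays are rearranged here, which is why the cost of a sort no longer grows with the width of the rows being sorted.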
5.3
Long term future benefits:
These include all the benefits which we expect to realize in the
future by continuously monitoring the execution of the REGIS users.
Performance Change Following Enhancements:
Performance data collected before and after major REGIS
enhancements will provide us with information about the
effectiveness of the modifications. (We already benefited from this
facility in recording the performance improvement detailed above
under "Immediate Benefits".)
Performance Prediction:
Based on the data base size and the nature of the queries which the
user intends to answer we expect to be able to predict the
performance (mainly response time and the cost) of the expected
REGIS execution.
Command Language Information:
The REGIS command language is a step by step non-procedural [4]
algebraic language believed to encompass the most useful relational
functions. With the assistance of the performance monitor we will
obtain information about the usefulness of this kind of a command
language such as:
  The average number of commands required to answer a typical
  query. This will help us decide if a new higher level language
  [5,6] will be better suited for our system.
  Which commands should be made more powerful and flexible.
  The high use commands which use a lot of resources so that we can
  concentrate on improving the performance of these commands first.
  The potential usefulness of a natural English-like command
  language. There has been a trend towards investigating natural
  language as a front end interface to relational data base
  managers [7]. The performance monitor might provide us with
  information about the usefulness of such a front end on our
  system (e.g. by counting the number of syntax related errors,
  number of typing and spelling errors etc.).
The expected long term gains provided the initial motivation for
the installation of the performance monitor. The fact that we could
realize some immediate gains from its installation was an
unexpected pleasant surprise.
6.
PERFORMANCE MONITOR OVERHEAD
Like any software monitor, this monitor disturbs the conditions
which are being measured. The significance of this, however, lies
in the amount of disturbance which is caused and the purposes of
measurements. Since we are using the facilities of REGIS to assist
in the data collection and storage of measurement data, it may be
instructive to look at the components of the overhead during
measurement.
The overhead for each command is expressed in eqn. (1):
    OVERHEAD(i) = FIXED + VARIABLE(i)                            (1)
The total measurement overhead for a session of N commands is
expressed in eqn. (2):
    OVERHEAD = INITIAL + FINAL + SUM(i=1 to N) of OVERHEAD(i)    (2)
Substituting from eqn. (1) into (2) gives the following:
    OVERHEAD = INITIAL + FINAL + N * FIXED
               + SUM(i=1 to N) of VARIABLE(i)                    (3)
For any given test of the system the FIXED overhead is a constant
amount independent of the function the command is performing or the
amount of data being processed within the command. This FIXED
overhead per command consists of the resources required to add one
row of data to the table which is accumulating the measurement data
for each command, initializing counters for the next command and
obtaining the computer system resources used (real time, CPU time,
paging data). Although this value is a constant for each command
for a given test of the system, this overhead may change as the
system changes because the efficiency of the mechanism which adds
rows to a table may change.
The variable overhead per command VARIABLE(i) is both a function of
the particular command being executed and a function of the amount
of data being processed by the command. For counters which are not
incremented to extremely high values, the counting mechanism is
modularized within a few subroutines which are called in the
appropriate places. Other counters which are frequently incremented
are installed right in the appropriate subroutines. Originally all
counters were modularized into counting routines, but early tests
indicated that the overhead of doing this was exorbitant. Clearly
the number of increments to these counters is dependent on the
function of the command and the data which drives the internal
loops.
The INITIAL overhead is concerned with the creation, setup and
storing of the session identification table. INITIAL also includes
preparation of the table which will contain the data for each
command. The FINAL overhead includes the resources used to update
and write out the table which contains the data describing the
session itself. In addition, this overhead includes the conversion
and writing out of the table which contains the performance data
for each command. This latter overhead varies somewhat depending on
the number of commands monitored.
Provisions were built into the performance monitor to switch off
all monitoring activities. One can also switch off only the command
monitors while leaving on the monitor which gathers and displays
data about the usage of some of the computer system resources.
While the monitor is running, most of the costs of the monitor
itself are excluded from the data recorded for each command. The
costs of the monitor are borne by the user for the entire session,
however.
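The overhead equations of this section can be checked with a few lines of code. The sketch below, in Python, is only an illustration of the arithmetic: the function names are hypothetical, and the numbers in the usage note are made-up cost units, not measurements from REGIS. It computes the session overhead both as the sum of per-command overheads and in the expanded form obtained by substitution, and the two must agree.

```python
from typing import Sequence


def command_overhead(fixed: int, variable_i: int) -> int:
    """Per-command overhead: OVERHEAD(i) = FIXED + VARIABLE(i)."""
    return fixed + variable_i


def total_overhead(initial: int, final: int, fixed: int,
                   variable: Sequence[int]) -> int:
    """Session overhead as INITIAL + FINAL + the sum of OVERHEAD(i)."""
    return initial + final + sum(command_overhead(fixed, v) for v in variable)


def total_overhead_expanded(initial: int, final: int, fixed: int,
                            variable: Sequence[int]) -> int:
    """Expanded form: INITIAL + FINAL + N * FIXED + the sum of VARIABLE(i)."""
    return initial + final + len(variable) * fixed + sum(variable)
```

With illustrative values INITIAL = 5, FINAL = 7, FIXED = 2 and VARIABLE = [1, 4, 3] (so N = 3), both functions return 26. The expanded form makes it visible that the FIXED part grows linearly with the number of commands, while the VARIABLE part depends on what each command does.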
The total overhead of the full monitor is significant, but not
excessive when one is obtaining useful measurement data. It is
feasible to turn on the monitor for long periods of time to monitor
the usage patterns of all users of the REGIS system. Normally, the
monitor is disabled to eliminate the overhead. A user has the
option of turning on only a portion of the monitor and having
displayed some of the computer resources used by each command. The
overhead of this facility, as indicated in the first line of Fig. 4,
is small. Few users turn on their own display facility. If they are
concerned about costs, their concern is on total job costs rather
than individual command costs. Fig. 4 indicates the proportions of
various aspects of the overhead costs.
MEASURE OF OVERHEAD        % ADDED TO SESSION   % OF OVERHEAD ONLY
SHOW COMPUTER RESOURCES           1.1                  11.8
INITIAL + FINAL                   3.4                  36.6
COMMAND OVERHEAD                  4.8                  51.6
TOTAL                             9.3                 100.0
Fig. 4 - Measurement Overhead In a Typical Session
7.
CONCLUSIONS
The techniques of using existing modules within the REGIS package
for data gathering and storing proved to be beneficial. Producing
standard output tables which contained the results of measurements
is very useful. The analysis of data in a variety of unforeseen
ways is easy to do and requires no special programs to be written.
The approaches just outlined did save considerable implementation
effort. The effort in designing and installing the performance
monitor was well justified. The REGIS users have already gained
almost an order of magnitude improvement as a result of correcting
some of the problems discovered by using the performance monitor.
We expect that removal of other bottlenecks which have already been
discovered will improve performance by at least another order of
magnitude. And as more of our long term goals such as performance
prediction and evaluation of command languages are met we expect
the REGIS users to even further benefit from the performance
monitoring.
Finally, we hope that the data which we are collecting on
performance and command usage in an industrial user environment
will also be useful to designers of data base hardware and firmware
which incorporate relational operations.
ACKNOWLEDGEMENT
We are grateful for the support of G. G. Dodd in the evolution and
development of the REGIS project.
REFERENCES
[1] Joyce, John D., Oliver, N. N., Preliminary Users Manual for the
REGIS Information System, Research Publication GMR-2008, General
Motors Research Laboratories, Warren, Michigan, October 1975.
[2] Joyce, John D., Oliver, N. N., REGIS - A Relational Information
System With Graphics and Statistics, AFIPS National Computer
Conference Proceedings, vol. 45, 1976, 839-844.
[3] IBM System/360 Time Sharing System, Command System User's
Guide, Order No. GC28-2001, IBM Corporation.
[4] McLeod, Dennis, Meldman, Monte, RISS - A Generalized
Minicomputer Relational Data Base Management System, AFIPS National
Computer Conference Proceedings, vol. 44, 1975, 397-402.
[5] Boyce, Raymond F., Chamberlin, Donald D., A Structured English
Query Language, Proc. of ACM SIGFIDET Workshop, May 1974.
[6] McDonald, Nancy, Stonebraker, Michael, CUPID - The Friendly
Query Language, Laboratory Memorandum: ERL-M487, University of
California, Berkeley, October 1974.
[7] Codd, Edgar F., Seven Steps to Rendezvous with the Casual User,
Proc. IFIP Working Conference on Data Base Management, American
Elsevier Publishing Company, New York, N.Y., 1974, 179-200.