Lecture 2 Processor Performance
The characteristics above look like advantages for using the clock rate as a
measure of performance, but in fact it is a nonlinear measure
(doubling the clock rate does not double the resulting
performance) and is therefore an unreliable metric. As many owners of personal
1.3.2.2 Relative MIPS
One attempt to retain the use of MIPS, but to make it useful across
different instruction sets, was to choose a definition of MIPS that is relative
to some agreed-upon reference machine. The term used is Relative MIPS,
and it is defined as follows:
Relative MIPS = (Time-reference / Time-unrated) * MIPS-reference    (1.3)
where:
Time-reference = Execution time of a program on the reference machine
Time-unrated = Execution time of the same program on the machine to be rated
MIPS-reference = Agreed-upon MIPS rating of the reference machine
Relative MIPS tracks execution time only for a given program and a given
input: for that pair, it is inversely proportional to the execution time on the
machine being rated. Even when these are identified, it becomes harder to
find a reference machine on which to run programs as the machine ages.
Moreover, should the older machine be run with the newest release of
the compiler and operating system, or should the software be frozen so that
the reference machine does not become faster over time? There is also
the temptation to generalize from a relative MIPS rating obtained using
one benchmark to a general statement about relative performance, even
though the performance of two machines can vary widely
across a complete set of benchmarks.
In the 1980s the dominant reference machine was the VAX-11/780, which
was called a 1-MIPS machine. If a machine is five times faster than the
VAX-11/780 on a benchmark, its rating for that benchmark is 5 relative MIPS.
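Equation (1.3) can be sketched in a few lines of Python. The function name and the timings below are hypothetical, chosen only to reproduce the "five times faster" case against a 1-MIPS reference machine.

```python
def relative_mips(time_reference: float, time_unrated: float,
                  mips_reference: float) -> float:
    """Relative MIPS as in equation (1.3).

    time_reference: execution time of the program on the reference machine
    time_unrated:   execution time of the same program on the machine being rated
    mips_reference: agreed-upon MIPS rating of the reference machine
    """
    if time_reference <= 0 or time_unrated <= 0:
        raise ValueError("execution times must be positive")
    return (time_reference / time_unrated) * mips_reference


# Hypothetical timings: the reference machine needs 10 s, the rated machine 2 s.
rating = relative_mips(time_reference=10.0, time_unrated=2.0, mips_reference=1.0)
print(rating)  # 5.0
```

The rating holds only for the program and input that produced the two timings; a different benchmark would generally give a different number.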
The 1970s and 1980s [...] which was defined by [...] programs. MIPS was [...] hence the invention [...] second).
1.3.2.3 MFLOPS (Million FLOating-Point Operations Per Second)
Most performance measures look at the processor's ability to process
integer data: whole numbers, text and the like. This is perfectly valid,
because most processing is done on this sort of data. However,
floating-point-intensive applications, such as spreadsheets and some graphics
programs and games, make heavy use of the processor's floating-point
unit. In fact, processors that have similar performance on
integer tasks can have very different performance on floating-point
operations. MFLOPS is another measure of performance (speed) that
takes this type of processing into consideration.
As a performance metric, MFLOPS also tries to correct the primary
shortcoming of the MIPS metric by defining more precisely the type of
operation that is taken as the unit of counting. It defines an
arithmetic operation on two floating-point (i.e. fractional) quantities to be
the basic unit.
MFLOPS, for a specific program running on a specific computer, is a
measure of millions of floating-point operations (megaflops) per second:
MFLOPS = Number of floating-point operations / (Execution time x 10^6)    (1.4)
A floating-point operation is an addition, subtraction, multiplication,
or division operation applied to numbers represented in a single- or
double-precision floating-point representation. Such data items are
heavily used in scientific calculations and are specified in programming
languages using keywords like float, real, double, or double precision.
MFLOPS depends on the program. Different programs require the
execution of different numbers of floating-point operations. Since
MFLOPS was intended to measure floating-point performance, it is
not applicable outside that domain. Compilers, for example, have a
MFLOPS rating near 0 no matter how fast the machine is, because
compilers rarely use floating-point arithmetic.
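As a rough illustration of equation (1.4), the sketch below computes MFLOPS for two hypothetical programs that run for the same time on the same machine. The operation counts and timings are invented for illustration only.

```python
def mflops(fp_operations: int, execution_time_s: float) -> float:
    """MFLOPS as in equation (1.4): FP operations / (execution time x 10^6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return fp_operations / (execution_time_s * 1e6)


# Hypothetical numeric kernel: 200 million floating-point operations in 4 s.
print(mflops(fp_operations=200_000_000, execution_time_s=4.0))  # 50.0

# Hypothetical compiler run: same 4 s, but only 1,000 floating-point operations.
print(mflops(fp_operations=1_000, execution_time_s=4.0))
```

The second rating is close to zero even though the machine and the running time are identical, which is why MFLOPS says little about programs that rarely use floating-point arithmetic.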
1.3.3
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)    (1.6)
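Equation (1.6) multiplies three factors: instructions per program, cycles per instruction (CPI), and seconds per cycle (the reciprocal of the clock rate). A minimal sketch follows; the function name and the sample values are hypothetical.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds, following equation (1.6).

    Seconds per cycle is taken as 1 / clock_rate_hz.
    """
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical program: 1 billion instructions, average CPI of 2, 500 MHz clock.
print(cpu_time(instruction_count=1e9, cpi=2.0, clock_rate_hz=500e6))
```

With these sample values the result is about 4 seconds. Halving any one of the three factors halves the CPU time, provided the other two stay unchanged.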
In the general case, executing a program involves different instruction
types, each with its own frequency of occurrence and its own CPI.
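Let CPIi be the CPI of instruction type i and Fi its frequency of occurrence (i = 1, ..., n).
Then the average CPI is:
CPI = SUM(i = 1..n) (CPIi x Fi)
In such cases the number of total CPU clock cycles will be expressed as:
CPU clock cycles = Instruction count x SUM(i = 1..n) (CPIi x Fi)    (1.7)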
EXAMPLE 1.1:
(1.8)
Operation   Freq, Fi   CPIi   CPIi x Fi   % Time
ALU         50%        1      0.5         23%
Load        20%        5      1.0         45%
Store       10%        3      0.3         14%
Branch      20%        2      0.4         18%
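The table entries can be checked with a short sketch: the average CPI is the sum of the CPIi x Fi products, and each operation's share of execution time is its product divided by that sum. The dictionary layout and function names are illustrative choices; the numbers are the ones in the table.

```python
# Instruction mix from the table: operation -> (frequency Fi, CPIi)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def average_cpi(mix: dict) -> float:
    """Weighted average CPI: sum over all types of CPIi x Fi."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_share(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction type."""
    total = average_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


print(f"average CPI = {average_cpi(mix):.1f}")  # average CPI = 2.2
for op, share in time_share(mix).items():
    print(f"{op}: {share:.0%}")
```

Loads make up only 20% of the instructions but account for about 45% of the time, because their CPI is five times that of ALU operations.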
SOLUTION:
(1.9)
i =1
Instruction class   CPI
A                   1
B                   2
C                   3
Code from:
Compiler 1
Compiler 2
Assume we optimize the compiler of the load-store machine for which the measurements given in
Table 1.1 were made. The optimized compiler discards 50% of the arithmetic logic unit (ALU) instructions,
but it cannot reduce loads, stores, or branches. Ignoring systems issues and assuming a 500 MHz
clock rate and an unoptimized CPI of 1.57, what is the MIPS rating for optimized code versus unoptimized code?
Does the ranking by MIPS agree with the ranking by execution time?
Instruction type   Frequency
ALU                43%
Loads              21%
Stores             12%
Branches           24%
SOLUTION: Given the value CPIunoptimized = 1.57, then from equation (1.1), we get:
6
Instruction type   Frequency
ALU                27.4%
Loads              26.8%
Stores             15.3%
Branches           30.5%
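The comparison asked for in this example can be sketched as follows, under two assumptions that are not stated in the text above: that equation (1.1) is the usual definition MIPS = clock rate / (CPI x 10^6), and that each ALU instruction takes 1 cycle (the per-class CPI values are not listed in the frequency table). The renormalized frequencies depend only on the given instruction mix; the MIPS figures depend on both assumptions.

```python
CLOCK_RATE_HZ = 500e6      # given: 500 MHz clock
CPI_UNOPTIMIZED = 1.57     # given
ALU_CPI = 1.0              # ASSUMPTION: ALU instructions take 1 cycle each

# Unoptimized instruction mix (given frequencies)
freq = {"ALU": 0.43, "Loads": 0.21, "Stores": 0.12, "Branches": 0.24}


def mips(clock_rate_hz: float, cpi: float) -> float:
    """ASSUMED form of equation (1.1): clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# The optimizer discards half of the ALU instructions.
removed = 0.5 * freq["ALU"]        # fraction of original instructions removed
remaining = 1.0 - removed          # fraction of original instructions left

# Renormalize the mix over the remaining instructions.
optimized_freq = {
    op: (f - removed if op == "ALU" else f) / remaining
    for op, f in freq.items()
}

# Cycles per ORIGINAL instruction, before and after optimization.
cycles_unoptimized = CPI_UNOPTIMIZED
cycles_optimized = CPI_UNOPTIMIZED - removed * ALU_CPI
cpi_optimized = cycles_optimized / remaining

mips_unoptimized = mips(CLOCK_RATE_HZ, CPI_UNOPTIMIZED)
mips_optimized = mips(CLOCK_RATE_HZ, cpi_optimized)

print(f"MIPS unoptimized: {mips_unoptimized:.1f}")
print(f"MIPS optimized:   {mips_optimized:.1f}")
print(f"relative execution time (optimized / unoptimized): "
      f"{cycles_optimized / cycles_unoptimized:.3f}")
```

Under these assumptions the optimized code needs fewer total cycles, so it runs faster, yet its MIPS rating is lower because the removed instructions were the cheapest ones and the average CPI rises. The renormalized frequencies come out close to the percentages listed in the table above.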