L5-L6-Performance Issues
Performance Issues
Designing for Performance
The cost of computer systems continues to drop dramatically, while the performance
and capacity of those systems continue to rise equally dramatically
Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
components
Architectural examples include:
Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
[Figure: data rates of I/O devices. Devices shown: graphics display, Wi-Fi modem (max speed), hard disk, optical disc, laser printer, scanner, mouse, keyboard. Axis: Data Rate (bps), logarithmic, 10^1 to 10^11]
RC delay
Speed at which electrons flow on a chip between transistors is limited by the resistance and
capacitance of the metal wires connecting them
Delay increases as the RC product increases
As components on the chip decrease in size, the wire
interconnects become thinner, increasing resistance
Also, the wires are closer together, increasing capacitance
Memory latency
Memory speeds lag processor speeds
[Figure: processor trends, 1970 to 2010. Series: Transistors (Thousands), Frequency (MHz), Power (W), Cores. Vertical axis: logarithmic, 0.1 to 10^7]
Amdahl's Law
in the development of multi-core
machines
Software must be adapted to a highly
parallel execution environment to
exploit the power of parallel
processing
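[Figure: execution time T on a single processor splits into a serial part (1 − f)T and a parallelizable part fT. On N processors the parallelizable part takes fT/N, so the total time becomes [1 − f(1 − 1/N)]T]
[Figure: speedup versus Number of Processors for f = 0.90, f = 0.75, and f = 0.5]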
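The speedup curves for f = 0.90, 0.75, and 0.5 follow from the usual statement of Amdahl's Law, speedup = 1 / ((1 − f) + f/N), where f is the fraction of execution time that can be parallelized and N is the number of processors. The following is a minimal sketch in Python, assuming the fraction f parallelizes perfectly with no scheduling overhead. The function name amdahl_speedup is illustrative, not taken from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when a fraction f of the work is parallelizable.

    Assumes perfect parallelization of f and zero scheduling overhead.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Serial part takes (1 - f) of the original time; parallel part takes f / n.
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    processor_counts = (1, 10, 100, 1000)
    for f in (0.5, 0.75, 0.90):
        speedups = [round(amdahl_speedup(f, n), 2) for n in processor_counts]
        print(f"f = {f:.2f}: {speedups}")
```

As N grows, the speedup approaches 1 / (1 − f), so with f = 0.5 it never reaches 2 no matter how many processors are added.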
Queuing system
If server is idle an item is served immediately, otherwise an
arriving item joins a queue
There can be a single queue for a single server or for multiple
servers, or multiple queues with one being for each of multiple
servers
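The single-queue, single-server case described above can be sketched as a short Python function. This is an illustrative sketch only: the function name, the first-come-first-served order, and the sample arrival and service times are assumptions made for the example, and the arrivals are assumed to be sorted in ascending order.

```python
def fifo_single_server(arrivals, service_times):
    """Return a (start, finish) pair per item for one server with one FIFO queue.

    Assumes `arrivals` is sorted ascending and items are served in arrival order.
    """
    schedule = []
    server_free_at = 0.0
    for arrive, service in zip(arrivals, service_times):
        # Idle server: the item starts at once. Busy server: it waits in the queue.
        start = max(arrive, server_free_at)
        finish = start + service
        schedule.append((start, finish))
        server_free_at = finish
    return schedule


if __name__ == "__main__":
    arrivals = [0.0, 1.0, 2.0, 10.0]      # made-up arrival times
    service_times = [3.0, 3.0, 3.0, 1.0]  # made-up service times
    result = fifo_single_server(arrivals, service_times)
    for arrive, (start, finish) in zip(arrivals, result):
        print(f"arrived {arrive}, waited {start - arrive}, finished {finish}")
```

In this sample the second and third items arrive while the server is busy and wait in the queue, while the fourth finds the server idle and is served immediately.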
[Figure label: analog to digital conversion]
T = Ic × [p + (m × k)] × τ
Compiler technology  X X X
Processor implementation  X X
Cache and memory hierarchy  X X
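As a worked example of the execution-time formula, the sketch below evaluates T = Ic × [p + (m × k)] × τ in Python. It assumes the symbols are read in the usual way: Ic is the instruction count, p the processor cycles needed to decode and execute an instruction, m the memory references per instruction, k the ratio of memory cycle time to processor cycle time, and τ the processor cycle time (1 / clock frequency). The numeric inputs are hypothetical values chosen only to illustrate the arithmetic.

```python
def execution_time(ic: float, p: float, m: float, k: float, tau: float) -> float:
    """Processor time T = Ic * [p + (m * k)] * tau, in seconds."""
    cycles_per_instruction = p + m * k
    return ic * cycles_per_instruction * tau


if __name__ == "__main__":
    # Hypothetical workload: 2 million instructions on a 400 MHz clock.
    tau = 1 / 400e6  # cycle time in seconds
    t = execution_time(ic=2_000_000, p=2, m=0.5, k=4, tau=tau)
    print(f"T = {t * 1e3:.3f} ms")
```

With these inputs each instruction costs 2 + 0.5 × 4 = 4 cycles, so 2 million instructions take 8 million cycles, or about 20 ms at 400 MHz.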
• Arithmetic
• Geometric
• Harmonic
Example: The three means applied to various data sets, each of which has eleven data points and a maximum data point value of 11. The median value is also included in the chart.
[Figure: bar chart of MD, AM, GM, and HM for data sets (a) through (g); horizontal axis from 0 to 11]
MD = median
AM = arithmetic mean
GM = geometric mean
HM = harmonic mean
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
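The median and the three means for data sets (a) through (g) can be recomputed with Python's standard statistics module. This is a sketch for checking the chart, not the method used to produce it. The dictionary name data_sets is illustrative, and geometric_mean requires Python 3.8 or later.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

# The seven data sets listed above, eleven points each.
data_sets = {
    "(a) Constant": [11] * 11,
    "(b) Clustered around a central value": [3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11],
    "(c) Uniform distribution": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "(d) Large-number bias": [1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11],
    "(e) Small-number bias": [1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11],
    "(f) Upper outlier": [11] + [1] * 10,
    "(g) Lower outlier": [1] + [11] * 10,
}

if __name__ == "__main__":
    for name, values in data_sets.items():
        print(
            f"{name:<38}"
            f" MD={median(values):6.2f}"
            f" AM={mean(values):6.2f}"
            f" GM={geometric_mean(values):6.2f}"
            f" HM={harmonic_mean(values):6.2f}"
        )
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, and the three coincide only when all values are equal, as in data set (a).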
SPEC
An industry consortium
Defines and maintains the best known collection of benchmark
suites aimed at evaluating computer systems
Performance measurements are widely used for comparison and
research purposes
Best known SPEC benchmark suite
Table 2.5
Benchmark | Reference time (hours) | Instr count (billion) | Language | Application area | Brief description
403.gcc | 2.24 | 1,064 | C | C Compiler | Based on gcc Version 3.2, generates code for Opteron.
429.mcf | 2.53 | 327 | C | Combinatorial Optimization | Vehicle scheduling algorithm.
445.gobmk | 2.91 | 1,603 | C | Artificial Intelligence | Plays the game of Go, a simply described but deeply complex game.
SPEC CPU2006 Floating-Point Benchmarks
Table 2.6
Benchmark | Reference time (hours) | Instr count (billion) | Language | Application area | Brief description
434.zeusmp | 2.53 | 1,566 | Fortran | Physics / CFD | dynamics simulation of astrophysical phenomena.
435.gromacs | 1.98 | 1,958 | C, Fortran | Biochemistry / Molecular Dynamics | Simulate Newtonian equations of motion for hundreds to millions of particles.
436.cactusADM | 3.32 | 1,376 | C, Fortran | Physics / General Relativity | Solves the Einstein evolution equations.
437.leslie3d | 2.61 | 1,273 | Fortran | Fluid Dynamics | Model fuel injection flows.
444.namd | 2.23 | 2,483 | C++ | Biology / Molecular Dynamics | Simulates large biomolecular systems.
447.dealII | 3.18 | 2,323 | C++ | Finite Element Analysis | Program library targeted at adaptive finite elements and error estimation.
450.soplex | 2.32 | 703 | C++ | Linear Programming, Optimization | Test cases include railroad planning and military airlift models.
453.povray | 1.48 | 940 | C++ | Image Ray-tracing | 3D Image rendering.
454.calculix | 2.29 | 3,04` | C, Fortran | Structural Mechanics | Finite element code for linear and nonlinear 3D structural applications.
459.GemsFDTD | 2.95 | 1,320 | Fortran | Computational Electromagnetics | Solves the Maxwell equations in 3D.
465.tonto | 2.73 | 2,392 | Fortran | Quantum Chemistry | Quantum chemistry package, adapted for crystallographic tasks.
470.lbm | 3.82 | 1,500 | C | Fluid Dynamics | Simulates incompressible fluids in 3D.
481.wrf | 3.10 | 1,684 | C, Fortran | Weather | Weather forecasting model
482.sphinx3 | 5.41 | 2,472 | C | Speech recognition | Speech recognition software.
(Table can be found on page 70 in the textbook.)
Terms Used in SPEC Documentation
Benchmark
A program written in a high-level language that can be compiled and executed on any computer that implements the compiler
System under test
This is the system to be evaluated
Reference machine
This is a system used by SPEC to establish a baseline performance for all benchmarks
Each benchmark is run and measured on this machine to establish a reference time for that benchmark
Base metric
These are required for all reported results and have strict guidelines for compilation
Peak metric
This enables users to attempt to optimize system performance by optimizing the compiler output
Speed metric
This is simply a measurement of the time it takes to execute a compiled benchmark
Used for comparing the ability of a computer to complete single tasks
Rate metric
This is a measurement of how many tasks a computer can accomplish in a certain amount of time
This is called a throughput, capacity, or rate measure
Allows the system under test to execute simultaneous tasks to take advantage of multiple processors
[Flowchart: Start → Get next program → Run program three times → Select median value → Ratio(prog) = Tref(prog) / TSUT(prog) → End]
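The flowchart steps can be sketched in Python as follows. This is an illustration under stated assumptions, not SPEC's own tooling: the function name spec_ratio, the program names, and all timings are made up for the example, and only the steps that appear in the flowchart (three runs, median, ratio) are modeled.

```python
from statistics import median


def spec_ratio(t_ref: float, sut_runs: list[float]) -> float:
    """Ratio(prog) = Tref(prog) / TSUT(prog), with TSUT the median of three runs."""
    if len(sut_runs) != 3:
        raise ValueError("expected exactly three runs on the system under test")
    t_sut = median(sut_runs)  # select the median value of the three runs
    return t_ref / t_sut


if __name__ == "__main__":
    # Hypothetical measurements in seconds: (reference time, three SUT run times).
    measurements = {
        "prog_a": (9000.0, [910.0, 900.0, 905.0]),
        "prog_b": (8000.0, [400.0, 410.0, 395.0]),
    }
    for prog, (t_ref, runs) in measurements.items():  # get next program
        print(f"{prog}: ratio = {spec_ratio(t_ref, runs):.2f}")
```

A ratio above 1 means the system under test ran the program faster than the reference machine did.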