Ece586 Lec5 1


ECE 486/586
Computer Architecture
Lecture # 5
Spring 2015
Portland State University

Lecture Topics

Quantitative Principles of Computer Design
Fallacies and Pitfalls
Instruction Set Principles
Introduction
Classifying Instruction Set Architectures

Reference:
Chapter 1: Sections 1.9, 1.11
Appendix A: Sections A.1, A.2

Key Principles of Computer Architecture

Take Advantage of Parallelism
Principle of Locality
Focus on the Common Case
Amdahl's Law
Processor Performance Equation

Principle #1: Exploit Parallelism

System Level
Multiple processors
Multiple disks
Multiple memory channels
Pipelined buses

Processor Level
Pipelined instruction execution
Multiple functional units

Logic Level
Carry lookahead adders
Multi-banked caches
Multi-ported register files
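As an illustration of processor-level parallelism, the sketch below counts cycles for pipelined versus unpipelined execution. It assumes an ideal pipeline: every stage takes one cycle, there are no hazards or stalls, and the clock period is the same in both cases. The instruction count and stage count are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Without pipelining, each instruction passes through all stages
    # before the next instruction starts.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Ideal pipeline: the first instruction needs n_stages cycles to fill
    # the pipeline, then one instruction completes every cycle.
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    n, k = 1000, 5  # hypothetical: 1000 instructions, 5-stage pipeline
    base = unpipelined_cycles(n, k)
    piped = pipelined_cycles(n, k)
    print(f"{base} cycles vs {piped} cycles, speedup {base / piped:.2f}")
```

As the instruction count grows, the ideal speedup approaches the number of stages. Real pipelines fall short of this because of hazards and stalls.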

Principle #2: Exploit Locality

Temporal
Recently accessed items are likely to be accessed again in the near future
Code
Loops and function calls
Data
Repeated access to the same variable, e.g., a loop counter

Spatial
Items whose addresses are near one another tend to be referenced close together in time
Code
Sequential instruction execution
Data
Array elements, fields in a data structure

Principle #3: Focus on the Common Case

Implication of Amdahl's Law:
Speeding up 90% of the execution by only 10% is roughly as good as speeding up 10% of the execution by 10x

Examples:
The number of add/subtract instructions in a typical program is substantially higher than the number of divide instructions
Focus more on building fast adders than on building fast dividers
Most loop branches are taken
Use branch prediction (fetch the branch target instead of the next sequential instruction)
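The observation that most loop branches are taken can be checked on a synthetic trace. The sketch below assumes a hypothetical loop that runs 10 iterations and is entered 100 times. It compares a static "predict taken" scheme (fetch the branch target) with always fetching the next sequential instruction. Real predictors are usually dynamic, so this shows only the common-case argument.

```python
def loop_branch_trace(iterations: int, runs: int) -> list[bool]:
    # Outcomes of a loop-closing branch: taken on every iteration except the
    # last one, where the loop exits (not taken). The loop is entered `runs` times.
    one_run = [True] * (iterations - 1) + [False]
    return one_run * runs


def predict_taken_accuracy(trace: list[bool]) -> float:
    # Static "predict taken": always fetch the branch target.
    return sum(trace) / len(trace)


def predict_not_taken_accuracy(trace: list[bool]) -> float:
    # Baseline: always fetch the next sequential instruction.
    return trace.count(False) / len(trace)


if __name__ == "__main__":
    trace = loop_branch_trace(iterations=10, runs=100)
    print(f"predict taken: {predict_taken_accuracy(trace):.0%}, "
          f"predict not taken: {predict_not_taken_accuracy(trace):.0%}")
```

With N iterations per loop, "predict taken" mispredicts only once per loop exit, so its accuracy is (N - 1)/N.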

Fallacies and Pitfalls

Fallacy
A falsehood often widely believed to be true

Pitfall
An easily made mistake
Generalizations of principles that are true only in a limited context

Fallacy
The relative performance of two processors with the same ISA can be judged by clock rate or by the performance of a single benchmark suite

Problems with the above argument:
The processors may have the same clock rate but differ considerably in their pipelines and cache subsystems
Same clock rate but different CPIs
A processor may be tuned to one particular benchmark suite while performing poorly on other benchmarks
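The "same clock rate but different CPIs" point follows from the processor performance equation, CPU time = instruction count × CPI / clock rate. The sketch below assumes both processors run the same binary (same ISA, same instruction count) at the same hypothetical 2 GHz clock. Only the average CPI differs, and the CPI values are made up for illustration.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_hz: float) -> float:
    # Processor performance equation: CPU time = IC x CPI / clock rate.
    return instruction_count * cpi / clock_hz


if __name__ == "__main__":
    ic = 2_000_000_000   # hypothetical: same binary on both machines (same ISA)
    clock = 2.0e9        # hypothetical: both clocked at 2 GHz
    time_a = cpu_time_seconds(ic, cpi=1.2, clock_hz=clock)  # e.g., better pipeline/caches
    time_b = cpu_time_seconds(ic, cpi=2.0, clock_hz=clock)
    print(f"A: {time_a:.2f} s, B: {time_b:.2f} s, A is {time_b / time_a:.2f}x faster")
```

Clock rate alone predicts equal performance here, but the CPI difference makes machine A about 1.67x faster.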

Fallacy
Benchmarks remain valid indefinitely

Why not?
Vulnerability to benchmark engineering
Once a benchmark becomes popular, there is tremendous pressure to improve performance by bending the rules for running the benchmark
Kernels that spend the majority of their time in a very small section of code are particularly vulnerable
Example: matrix300 kernel

Fallacy
Peak performance tracks observed performance

Problems with the above argument:
Peak performance is only useful as an upper bound on the performance that a system can deliver
Typical performance can vary 10x or more from peak performance
The difference between typical and peak performance can vary greatly from program to program

Fallacy
Multiprocessors are a silver bullet

Why not?
The switch to multiprocessors happened because of the ILP wall and the power wall, not because parallel programming became dramatically simpler
In the multi-core era, improving performance is now the burden of programmers
Programmers must make their programs more and more parallel, an uphill task

Fallacy
Synthetic benchmarks predict performance for real programs

Why not?
Synthetic benchmarks may not account for the effects of real-world systems (loading, context switching)
A system may not fare as well in practice as it does on the benchmark
Synthetic benchmarks may under-reward performance-enhancing optimizations
Whetstone: loops with few iterations
A system that optimizes loop branch prediction won't fare as well on the benchmark as in practice

Fallacy
MIPS (Millions of Instructions per Second) is an accurate
measure for comparing performance among computers
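$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$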
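A small numeric sketch shows how MIPS can mislead. It uses a hypothetical floating-point-heavy program built in two ways for the same 1 GHz machine. One build uses hardware FP instructions, which are fewer but slower. The other uses software FP routines made of many simple integer instructions. All instruction counts and CPIs are assumptions chosen for illustration.

```python
CLOCK_HZ = 1e9  # hypothetical 1 GHz machine


def execution_time_s(instruction_count: float, cpi: float, clock_hz: float = CLOCK_HZ) -> float:
    # Processor performance equation: time = IC x CPI / clock rate
    return instruction_count * cpi / clock_hz


def mips_rating(instruction_count: float, exec_time_s: float) -> float:
    # MIPS = instruction count / (execution time x 10^6)
    return instruction_count / (exec_time_s * 1e6)


# Hypothetical program compiled two ways for the same machine.
VARIANTS = {
    "HW FP": {"ic": 100e6, "cpi": 5.0},  # fewer, slower FP instructions
    "SW FP": {"ic": 400e6, "cpi": 1.5},  # many simple integer instructions
}

if __name__ == "__main__":
    for name, v in VARIANTS.items():
        t = execution_time_s(v["ic"], v["cpi"])
        print(f"{name}: {t:.2f} s, {mips_rating(v['ic'], t):.0f} MIPS")
```

Under these assumptions the hardware-FP build finishes sooner (0.50 s vs. 0.60 s) yet has the lower MIPS rating (200 vs. about 667), so MIPS moves opposite to performance.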

Problems:
What's an instruction? It depends upon the ISA
One instruction on one ISA may do as much work as ten instructions on another ISA
MIPS can vary among programs on the same computer
MIPS can vary inversely with performance
HW floating-point instructions vs. software routines
HW is faster but executes fewer (more complex) instructions, so it can have a lower MIPS rating

Pitfall
Comparing hand-coded assembly and compiler-generated, high-level language performance
Potential issues:
Hand-coded assembly requires specialized programmers, so it is less likely to be used except in embedded systems
Unless the compiler can perform the same optimizations that an assembly language programmer can, the compiler-generated code will not match the performance of the hand-coded program

Pitfall
Falling prey to Amdahl's Law
Don't forget to assess how much of the execution time a feature will actually affect before embarking on the long journey to implement it
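Amdahl's Law gives overall speedup = 1 / ((1 - F) + F / S), where F is the fraction of execution time that is enhanced and S is the speedup of that fraction. The sketch below checks the Principle #3 claim, reading "speeding up by 10%" as S = 1.1 (an assumption). It also shows the ceiling on a feature that touches only 10% of execution time.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    # Overall speedup = 1 / ((1 - F) + F / S)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    broad = amdahl_speedup(0.9, 1.1)    # 90% of execution made 10% faster
    narrow = amdahl_speedup(0.1, 10.0)  # 10% of execution made 10x faster
    ceiling = amdahl_speedup(0.1, float("inf"))  # 10% of execution made "free"
    print(f"broad {broad:.3f}, narrow {narrow:.3f}, ceiling {ceiling:.3f}")
```

The two options come out at about 1.09x and 1.10x. Even an infinitely fast enhancement of 10% of execution time cannot exceed about 1.11x overall, which is why its likely usage should be estimated first.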

Pitfall
A single point of failure
Dependability is no stronger than the weakest link in the chain
Make every component redundant so that no single component
failure could bring down the whole system
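The "weakest link" point can be made concrete with a simple reliability model. It assumes component failures are independent and that each component has a known probability of surviving a given period; the probabilities below are hypothetical. A chain of components works only if every one works, so its reliability is the product of the parts. A duplicated component fails only if both copies fail.

```python
from math import prod


def series_reliability(parts: list[float]) -> float:
    # A chain of components works only if every component works.
    return prod(parts)


def redundant_pair(r: float) -> float:
    # Two independent copies: the pair fails only if both copies fail.
    return 1.0 - (1.0 - r) ** 2


if __name__ == "__main__":
    parts = [0.999, 0.999, 0.95]  # hypothetical survival probabilities over one period
    plain = series_reliability(parts)
    hardened = series_reliability([redundant_pair(r) for r in parts])
    print(f"no redundancy: {plain:.4f}, every component duplicated: {hardened:.4f}")
```

Without redundancy the system can never be more reliable than its weakest component (0.95 here). Duplicating every component removes each single point of failure under this independence assumption.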

Instruction Set Principles


Reading:
Hennessy and Patterson, Appendix A
RISC paper (Patterson & Sequin): posted on course website

Instruction Set Architecture


Instruction Set Architecture (ISA)
Traditional meaning of computer architecture
What is visible to the programmer/compiler writer
Independent of organization and implementation
E.g., the ISA doesn't include caches and pipelines

Instructions, Operands, Addressing Modes

Instruction Set Architecture


Compiler
Input: high-level language
Output: assembly language for target ISA
Global and local optimizations
Register allocation

Assembler
Input: Assembly language
Output: Machine code (object file)

Linker
Inputs: Object files, library files
Outputs: Executable program

Loader
Reads executable from disk
Passes command line arguments
Optionally fixes absolute addresses

ISA Classification

[Figure: C = A + B on each ISA class; A, B and C are memory locations; R1, R2 and R3 are registers]

ISA Examples
Stack
HP calculator
Pentium FP (x87 co-processor)
8 registers organized as a stack

Accumulator
PDP-8
8051 microcontroller

Load/Store (Register/Register)
RISC: MIPS, Alpha, ARM, PowerPC, SPARC
Itanium

Register/Memory
IA-32 (Intel x86), Motorola 68000, IBM 360
PDP-11
VAX (really Memory/Memory)
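To contrast where operands live in each ISA class, the sketch below runs C = A + B on three toy interpreters: stack, accumulator, and load/store. The mnemonics (PUSH, POP, LOAD, ADD, STORE) and operand orders are hypothetical, loosely modeled on textbook notation rather than on any real ISA. Memory is a Python dict keyed by the names A, B and C. Register/memory is omitted for brevity; there, the ADD could take B directly from memory.

```python
from typing import Dict, List, Tuple

Memory = Dict[str, int]
Program = List[Tuple[str, ...]]


def run_stack(program: Program, mem: Memory) -> None:
    # Stack ISA: ALU operands are implicit (top of the operand stack).
    stack: List[int] = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(mem[args[0]])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")


def run_accumulator(program: Program, mem: Memory) -> None:
    # Accumulator ISA: one implicit operand (the accumulator), one memory operand.
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = mem[addr]
        elif op == "ADD":
            acc += mem[addr]
        elif op == "STORE":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown accumulator op: {op}")


def run_load_store(program: Program, mem: Memory) -> None:
    # Load/store ISA: only LOAD/STORE touch memory; ALU ops use registers only.
    regs: Dict[str, int] = {}
    for op, *args in program:
        if op == "LOAD":      # LOAD rd, addr
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "ADD":     # ADD rd, rs, rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "STORE":   # STORE rs, addr
            rs, addr = args
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown load/store op: {op}")


# C = A + B on each class (hypothetical mnemonics).
STACK_PROG: Program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
ACC_PROG: Program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]
LS_PROG: Program = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                    ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]

if __name__ == "__main__":
    for run, prog in ((run_stack, STACK_PROG), (run_accumulator, ACC_PROG),
                      (run_load_store, LS_PROG)):
        mem = {"A": 2, "B": 3}
        run(prog, mem)
        print(f"{run.__name__}: {len(prog)} instructions, C = {mem['C']}")
```

The load/store version needs the most instructions for this statement. In exchange, every ALU instruction has a simple, uniform register-only format, which is the trade-off RISC designs make.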
