Chapter - 1 DTT 210
Lecture 01
Subject Outline
Assignment 30%
Quiz 20%
Examination 40%
Textbook and other Resources
• William Stallings, Computer Organisation
and Architecture: Designing for
Performance. Prentice Hall, 2000
• Englander, Irv, The Architecture of
Computer Hardware and System
Software. Wiley, 2000
• K.F. Ibrahim, PC Operation and Repair,
Prentice Hall, 2002
• www.intel.com
• www.ibm.com
• www.pcguide.com
DTT 201: Computer Architecture
Chapter 1
Introduction
Architecture & Organization 1
• Architecture is those attributes visible to
the programmer
—Instruction set, number of bits used for data
representation, I/O mechanisms, addressing
techniques.
—e.g. will the computer have a multiply instruction?
• Organization is how features are
implemented
—Control signals, interfaces, memory
technology.
—e.g. is the multiply instruction implemented by
a dedicated multiply unit or by repeated
addition?
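The distinction can be sketched in code. The following is an illustrative Python model, not taken from the textbook: both functions offer the same programmer-visible operation (the architecture), while the bodies stand in for two possible organizations. The function names are hypothetical.

```python
def multiply_unit(a: int, b: int) -> int:
    """Organization A: a dedicated multiplier, modelled here by Python's * operator."""
    return a * b


def multiply_repeated_addition(a: int, b: int) -> int:
    """Organization B: no multiplier; add `a` to a running total |b| times."""
    total = 0
    for _ in range(abs(b)):
        total += a
    # Restore the sign if the repeat count was negative.
    return -total if b < 0 else total


# The programmer sees the same result either way; only cost and speed differ.
print(multiply_unit(6, 7), multiply_repeated_addition(6, 7))
```

A program written against the multiply instruction runs unchanged on either organization, which is why a family of machines can share one architecture.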
Microprocessor
Function of a Microprocessor?
• A microprocessor incorporates most or
all of the functions of a computer's central
processing unit (CPU) on a single
integrated circuit (IC, or microchip).
Architecture & Organization 2
• All members of the Intel x86 family share
the same basic architecture
• All members of the IBM System/370 family
share the same basic architecture
[Figure: top-level structure of a computer. Inside the computer: central processing unit, main memory, input/output and systems interconnection. Outside: peripherals and communication lines.]
Structure – The CPU
[Figure: the CPU within the computer (I/O, memory, system bus). Inside the CPU: registers, arithmetic and logic unit, control unit and internal CPU interconnection.]
Structure - The Control Unit
[Figure: the control unit within the CPU (ALU, registers, internal bus). Inside the control unit: sequencing logic, control unit registers and decoders, and control memory.]
ENIAC - background
• Electronic Numerical Integrator And
Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
—Too late for war effort
• Used until 1955
ENIAC - details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
ENIAC - cont
Vacuum tubes
von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from
memory and executing
• Input and output equipment operated by
control unit
• Institute for Advanced Study, Princeton
—IAS
• Completed 1952
Structure of von Neumann machine
IAS - details
• Memory: 1000 storage locations of 40-bit words
— Binary numbers
— Number word: a sign bit and a 39-bit value
— Instruction word: two 20-bit instructions
• Set of registers (storage in CPU)
— Memory Buffer Register
— Memory Address Register
— Instruction Register
— Instruction Buffer Register
— Program Counter
— Accumulator
— Multiplier Quotient
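The word formats above can be illustrated with a short Python sketch. The split of each 20-bit instruction into an 8-bit opcode and a 12-bit address follows the usual textbook description of the IAS machine and is not stated on the slide; the function names and the sample opcodes are hypothetical.

```python
WORD_BITS = 40


def split_number_word(word: int) -> tuple[int, int]:
    """Return (sign_bit, value) for a 40-bit number word: 1 sign bit + 39-bit value."""
    sign_bit = (word >> 39) & 0x1
    value = word & ((1 << 39) - 1)
    return sign_bit, value


def decode_instruction(instr: int) -> tuple[int, int]:
    """Split a 20-bit instruction into (opcode, address), assuming 8 + 12 bits."""
    opcode = (instr >> 12) & 0xFF
    address = instr & 0xFFF
    return opcode, address


def split_instruction_word(word: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the decoded left and right instructions of a 40-bit instruction word."""
    left = (word >> 20) & 0xFFFFF
    right = word & 0xFFFFF
    return decode_instruction(left), decode_instruction(right)


# Hypothetical word: left = opcode 1, address 250; right = opcode 5, address 251.
sample = (0x01 << 32) | (250 << 20) | (0x05 << 12) | 251
print(split_instruction_word(sample))
```

A 12-bit address field is enough to reach all 1000 storage locations, since 2^12 = 4096.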
Structure of IAS - detail
Commercial Computers
• 1947 - Eckert-Mauchly Computer
Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
—Faster
—More memory
IBM
• Punched-card processing equipment
• 1953 - the 701
—IBM’s first stored program computer
—Scientific calculations
• 1955 - the 702
—Business applications
• Led to 700/7000 series
Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell
Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor
machines
• IBM 7000
• DEC - 1957
—Produced PDP-1
Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory
cells and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
—Up to 100 devices on a chip
• Medium scale integration - to 1971
—100-3,000 devices on a chip
• Large scale integration - 1971-1977
—3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
—100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
—Over 100,000,000 devices on a chip
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every
year
• Since the 1970s development has slowed a little
— Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical
paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability
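The two doubling rates quoted above can be compared with a small calculation. This is an illustrative sketch only: the starting count of 1,000 transistors and the function name are hypothetical, chosen to make the arithmetic easy to follow.

```python
def projected_transistors(initial: float, years: float, doubling_period_years: float) -> float:
    """Project a transistor count that doubles once per doubling period."""
    return initial * 2 ** (years / doubling_period_years)


start = 1_000  # hypothetical starting transistor count
for years in (3, 6, 9):
    yearly = projected_transistors(start, years, 1.0)      # doubling every year
    slower = projected_transistors(start, years, 1.5)      # doubling every 18 months
    print(f"after {years} years: {yearly:,.0f} vs {slower:,.0f}")
```

Because growth is exponential, a seemingly small change in the doubling period produces a large gap within a decade.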
Growth in CPU Transistor Count
IBM 360 series
• 1964
• Replaced (& not compatible with) 7000
series
• First planned “family” of computers
—Similar or identical instruction sets
—Similar or identical O/S
—Increasing speed
—Increasing number of I/O ports (i.e. more
terminals)
—Increased memory size
—Increased cost
DEC PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
—$100k+ for IBM 360
• Embedded applications & OEM
• BUS STRUCTURE – Omnibus (96 separate
signal paths to carry control, address and
data signals)
DEC - PDP-8 Bus Structure
Intel
• 1971 - 4004
—First microprocessor
—All CPU components on a single chip
—4 bit
• Followed in 1972 by 8008
—8 bit
—Both designed for specific applications
• 1974 - 8080
—Intel’s first general purpose microprocessor
Techniques built into processor
• Branch prediction
—Looks ahead and predicts which branches, or
groups of instructions, are likely to be
processed next
—Buffer pre-fetched instructions
• Data flow analysis
—Create an optimised schedule of instructions
which are dependent on each other’s results
—Prevent delay
• Speculative execution
—Execute instructions in advance and hold the
results in temporary locations
—Keep execution engines busy by executing
instructions that are likely to be needed
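As an illustration of branch prediction, the sketch below models a 2-bit saturating counter, one common textbook scheme. The slides do not say which scheme any particular processor uses, so treat this as an assumed example. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counter = 2  # assumed start state: weakly taken

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes: list[bool]) -> float:
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken nine times and then falls through once.
loop_branch = [True] * 9 + [False]
print(accuracy(loop_branch))
```

With this pattern the predictor is wrong only on the final loop exit, which is why loop-heavy code benefits from even simple prediction.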
Performance Balance
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor
speed
Logic and Memory Performance Gap
Solutions
• Increase number of bits retrieved at one
time
—Make DRAM “wider” rather than “deeper”
• Change DRAM interface
—Include cache
• Reduce frequency of memory access
—More complex cache and cache on chip
• Increase interconnection bandwidth
—High speed buses
—Hierarchy of buses
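The effect of a cache on memory access can be estimated with the standard average memory access time formula (hit time plus miss rate times miss penalty). The formula is not on the slide, and the timings below are hypothetical numbers chosen only for illustration.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical timings: 1 ns cache hit, 100 ns penalty to go to DRAM.
no_cache = average_access_time(0.0, 1.0, 100.0)     # every access goes to DRAM
with_cache = average_access_time(1.0, 0.05, 100.0)  # 95% of accesses hit the cache
print(no_cache, with_cache)
```

Under these assumed numbers, a cache that catches most accesses cuts the average access time by more than an order of magnitude, even though DRAM itself is no faster.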
Approaches to increase processor speed
• Increase hardware speed of processor
—Fundamentally due to shrinking logic gate size
– More gates, packed more tightly, increasing clock
rate
– Propagation time for signals reduced
• Increase size and speed of caches
—Dedicating part of processor chip
– Cache access times drop significantly
• Change processor organization and
architecture
—Increase effective speed of execution
—Parallelism
Problems from Clock Speed and Logic Density
• Power
— Power density increases with density of logic and clock
speed
— Dissipating heat
• RC delay
— Speed at which electrons flow between transistors is limited by
the resistance and capacitance of the metal wires connecting them
— Delay increases as RC product increases
— Wire interconnects thinner, increasing resistance
— Wires closer together, increasing capacitance
• Memory latency
— Memory speeds lag processor speeds
• Solution:
— More emphasis on organizational and architectural
approaches to improve performance
Intel Microprocessor Performance
Strategies to increase performance
• Strategy 1: Increased Cache Capacity
—Typically two or three levels of cache
between processor and main memory
—Chip density increased
– More cache memory on chip
+Faster cache access
—Pentium chip devoted about 10% of
chip area to cache
—Pentium 4 devotes about 50%
Strategies to increase performance - cont
• Strategy 2: More Complex Execution Logic
—Enable parallel execution of instructions
—Pipeline works like assembly line
– Different stages of execution of different
instructions at same time along pipeline
—Superscalar allows multiple pipelines
within single processor
– Instructions that do not depend on one
another can be executed in parallel
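The assembly-line analogy can be made concrete with the idealised pipeline timing formula: with k stages and n instructions, an unpipelined processor needs n × k cycles, while a pipeline needs k + (n − 1). The sketch below assumes one cycle per stage and no stalls; the stage count of 5 is a hypothetical value, not a figure from the slides.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    return n_stages + (n_instructions - 1)


n, k = 100, 5  # hypothetical: 100 instructions, 5-stage pipeline
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
print(unpipelined_cycles(n, k), pipelined_cycles(n, k), round(speedup, 2))
```

For long instruction streams the speedup approaches the number of stages. Real pipelines fall short of this because of branches and dependencies between instructions.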
Diminishing Returns
• Internal organization of processors is
exceedingly complex
—Further performance gains in this direction are likely to be small
• Benefits from cache are reaching limit
• Increasing clock rate runs into power
dissipation problem
—Some fundamental physical limits are being
reached
New Approach – Multiple Cores
• Multiple processors on single chip
— With large shared cache
• Within a processor, increase in performance
proportional to square root of increase in
complexity
• If software can use multiple processors, doubling
number of processors almost doubles
performance
• So, use two simpler processors on the chip rather
than one more complex processor
• With two processors, larger caches are justified
— Power consumption of memory logic less than
processing logic
• Example: IBM POWER4
— Two cores based on PowerPC
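The argument for multiple cores can be checked with the two rules of thumb quoted above. This is a rough sketch: the square-root rule is taken from the slide, while the parallel efficiency of 0.95 (standing in for "almost doubles") and the function names are assumptions.

```python
import math


def single_core_speedup(complexity_factor: float) -> float:
    """Rule of thumb from the slide: performance grows with sqrt(complexity)."""
    return math.sqrt(complexity_factor)


def multi_core_speedup(n_cores: int, parallel_efficiency: float = 0.95) -> float:
    """Assumed model: each extra core contributes slightly less than one core's worth."""
    return n_cores * parallel_efficiency


# Spend a doubled transistor budget on one bigger core or on two simpler cores.
print(round(single_core_speedup(2.0), 2), multi_core_speedup(2))
```

Under these assumptions the doubled budget buys about 1.41 times the performance as one complex core, but about 1.9 times as two simpler cores, provided the software can use both.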
Pentium Evolution (1)
• 8080
— first general purpose microprocessor
— 8 bit data path
— Used in first personal computer – Altair
• 8086
— much more powerful
— 16 bit
— instruction cache that prefetches a few instructions
— 8088 (8 bit external bus) used in first IBM PC
• 80286
— 16 Mbyte memory addressable
— up from 1 Mbyte
• 80386
— 32 bit
— Support for multitasking
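The jump in addressable memory follows directly from the number of address bits. The sketch below assumes the commonly documented address widths (20 bits for the 8086, 24 for the 80286, 32 for the 80386); these widths are not stated on the slide, and the function name is hypothetical.

```python
def addressable_bytes(address_bits: int) -> int:
    """A processor with n address bits can address 2**n bytes."""
    return 2 ** address_bits


MBYTE = 1024 * 1024

# Assumed address widths for each processor.
for name, bits in (("8086", 20), ("80286", 24), ("80386", 32)):
    print(f"{name}: {bits} address bits -> {addressable_bytes(bits) // MBYTE} Mbyte")
```

Each extra address bit doubles the addressable memory, so four more bits take the 80286 from 1 Mbyte to 16 Mbyte.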
Pentium Evolution (2)
• 80486
—sophisticated powerful cache and instruction
pipelining
—built in maths co-processor
• Pentium
—Superscalar
—Multiple instructions executed in parallel
• Pentium Pro
—Increased superscalar organization
—Aggressive register renaming
—branch prediction
—data flow analysis
—speculative execution
Pentium Evolution (3)
• Pentium II
— MMX technology
— graphics, video & audio processing
• Pentium III
— Additional floating point instructions for 3D graphics
• Pentium 4
— Note Arabic rather than Roman numerals
— Further floating point and multimedia enhancements
• Itanium
— 64 bit
— see chapter 15
• Itanium 2
— Hardware enhancements to increase speed
• See Intel web pages for detailed information on
processors
PowerPC
• 1975, 801 minicomputer project (IBM) RISC
• Berkeley RISC I processor
• 1986, IBM commercial RISC workstation product,
RT PC.
— Not commercial success
— Many rivals with comparable or better performance
• 1990, IBM RISC System/6000
— RISC-like superscalar machine
— POWER architecture
• IBM alliance with Motorola and Apple
— Resulted in implementing PowerPC architecture
• PowerPC architecture derived from POWER
architecture
• Result from PowerPC architecture
— Superscalar RISC
— Apple Macintosh
— Embedded chip applications
PowerPC Family (1)
• 601:
— Quickly to market. 32-bit machine
• 603:
— Low-end desktop and portable
— 32-bit
— Comparable performance with 601
— Lower cost and more efficient implementation
• 604:
— Desktop and low-end servers
— 32-bit machine
— Much more advanced superscalar design
— Greater performance
• 620:
— High-end servers
— 64-bit architecture
PowerPC Family (2)
• 740/750:
—Also known as G3
—Two levels of cache on chip
• G4:
—Increases parallelism and internal speed
• G5:
—Improvements in parallelism and internal
speed
—64-bit organization