SMT and CMP Architectures
DINESH
Instruction-level parallelism (ILP)
Thread-level parallelism (TLP)
Multiprocessor (MP)
Simultaneous Multithreading
Key idea
Issue multiple instructions from multiple threads each cycle.
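A minimal sketch of the key idea, assuming a hypothetical 4-wide issue stage and made-up counts of ready instructions per thread. In a given cycle one thread alone may fill only one or two slots, while SMT fills the remaining slots from other threads. The function name `smt_issue_cycle` and its round-robin priority are illustrative assumptions, not the issue policy of any particular processor.

```python
from typing import List


def smt_issue_cycle(ready: List[int], width: int, start: int = 0) -> List[int]:
    """Return how many instructions each thread issues in one SMT cycle.

    ready[t] is the number of ready, independent instructions thread t can
    offer this cycle (a hypothetical input). Threads are visited round-robin
    starting at `start`; each takes as many of the remaining slots as it can.
    """
    n = len(ready)
    issued = [0] * n
    slots = width
    for i in range(n):
        t = (start + i) % n
        take = min(ready[t], slots)
        issued[t] = take
        slots -= take
        if slots == 0:  # every issue slot is filled
            break
    return issued


ready = [1, 2, 0, 3]             # made-up ILP available per thread this cycle
alone = min(ready[0], 4)         # a 4-wide superscalar running only thread 0
smt = smt_issue_cycle(ready, 4)  # SMT draws from all four threads
print(alone, smt, sum(smt))
```

This prints `1 [1, 2, 0, 1] 4`: the lone thread fills 1 of 4 slots, while SMT fills all 4 from three different threads.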
Features
[Figure: issue-slot usage over time for fine-grained multithreading, coarse-grained multithreading, and simultaneous multithreading; legend: Thread 1 to Thread 5, idle slot]
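The three policies in the figure can be compared on a toy trace. This sketch assumes a hypothetical 4-wide machine, a made-up 6-cycle trace of ready instructions per thread, and a simplified coarse-grained policy that loses the stalled cycle and then switches threads. Real switch penalties, fetch limits, and long-latency detection are ignored, so the numbers illustrate the idea only.

```python
from typing import List

# ready[c][t]: instructions thread t could issue in cycle c (a made-up trace).
Trace = List[List[int]]


def fine_grained(ready: Trace, width: int) -> int:
    """One thread per cycle, chosen round-robin regardless of whether it stalls."""
    n = len(ready[0])
    return sum(min(row[c % n], width) for c, row in enumerate(ready))


def coarse_grained(ready: Trace, width: int) -> int:
    """Stay on one thread until it stalls, then switch (the stalled cycle is lost)."""
    n = len(ready[0])
    t, used = 0, 0
    for row in ready:
        if row[t] == 0:  # stall: waste this cycle and switch to the next thread
            t = (t + 1) % n
            continue
        used += min(row[t], width)
    return used


def simultaneous(ready: Trace, width: int) -> int:
    """Fill the issue slots from all threads in the same cycle."""
    return sum(min(sum(row), width) for row in ready)


WIDTH = 4
trace = [
    [2, 1, 0, 1],
    [0, 2, 1, 1],
    [1, 0, 2, 0],
    [0, 1, 1, 2],
    [3, 0, 0, 1],
    [1, 1, 1, 1],
]
total = WIDTH * len(trace)
for name, policy in [("fine-grained", fine_grained),
                     ("coarse-grained", coarse_grained),
                     ("SMT", simultaneous)]:
    print(f"{name}: {policy(trace, WIDTH)}/{total} issue slots used")
```

On this particular trace fine-grained interleaving fills 12 of 24 slots, coarse-grained 4, and SMT 23; the gap between the policies depends entirely on the trace chosen.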
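Simultaneous Multithreading: instruction throughput (instructions per cycle)

Threads  SS    MP2   MP4   FGMT  SMT
1        3.3   2.4   1.5   3.3   3.3
2        --    4.3   2.6   4.1   4.7
4        --    --    4.2   4.2   5.6
8        --    --    --    3.5   6.1

SS = superscalar; MP2/MP4 = chip multiprocessor with 2/4 cores; FGMT = fine-grained multithreading; SMT = simultaneous multithreading; -- = the configuration cannot run that many threads.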
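The gains in the throughput table can be read as ratios. This sketch transcribes the values, assuming the rows correspond to 1, 2, 4, and 8 threads, and compares each design's best throughput with the single-thread superscalar (SS). The helper names `best_ipc` and `speedup` are illustrative.

```python
from typing import Dict, Optional

# Throughput values transcribed from the table; None = thread count not supported.
IPC: Dict[int, Dict[str, Optional[float]]] = {
    1: {"SS": 3.3, "MP2": 2.4, "MP4": 1.5, "FGMT": 3.3, "SMT": 3.3},
    2: {"SS": None, "MP2": 4.3, "MP4": 2.6, "FGMT": 4.1, "SMT": 4.7},
    4: {"SS": None, "MP2": None, "MP4": 4.2, "FGMT": 4.2, "SMT": 5.6},
    8: {"SS": None, "MP2": None, "MP4": None, "FGMT": 3.5, "SMT": 6.1},
}


def best_ipc(arch: str) -> float:
    """Peak throughput an architecture reaches over the thread counts it supports."""
    return max(row[arch] for row in IPC.values() if row[arch] is not None)


def speedup(a: str, b: str) -> float:
    """Ratio of peak throughputs of architectures a and b."""
    return best_ipc(a) / best_ipc(b)


for arch in ["MP2", "MP4", "FGMT", "SMT"]:
    print(f"{arch} peak vs SS: {speedup(arch, 'SS'):.2f}x")
```

SMT's best throughput (6.1) is about 1.85 times the superscalar's 3.3, while MP2, MP4, and FGMT peak at roughly 1.27 to 1.30 times. FGMT also loses throughput from 4 to 8 threads (4.2 to 3.5), whereas SMT keeps improving.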
Comparison of SMT vs. Superscalar
SMT processors are compared to base superscalar
processors in several key measures:
Utilization of functional units.
Utilization of fetch units.
Accuracy of branch predictor.
Hit rates of primary caches.
Hit rates of secondary caches.
Performance improvement:
Issue slots.
Functional units.
Renaming registers.
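One reason renaming registers matter for SMT: every hardware thread needs its own architectural registers, and the shared renaming pool sits on top of that. As a first-order sketch with hypothetical sizes ($T$ threads, $R_{\text{arch}}$ architectural registers per thread, $R_{\text{rename}}$ shared renaming registers):

```latex
R_{\text{phys}} = T \cdot R_{\text{arch}} + R_{\text{rename}},
\qquad T = 8,\; R_{\text{arch}} = 32,\; R_{\text{rename}} = 100
\;\Rightarrow\; R_{\text{phys}} = 8 \cdot 32 + 100 = 356
```

With the renaming pool held fixed, each added thread enlarges the register file by a full architectural set.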
CMP Architecture
[Figure: chip multiprocessor with four processor cores on a single chip]
Chip Multithreading
Chip Multithreading = Chip Multiprocessing + Hardware Multithreading.
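The number of hardware strands a CMT chip exposes follows directly from this definition. With hypothetical values of 8 cores and 4 hardware threads per core:

```latex
N_{\text{strands}} = N_{\text{cores}} \cdot T_{\text{core}} = 8 \cdot 4 = 32
```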
CMP Performance
CMPs are now the only way to build high-performance
microprocessors, for a variety of reasons:
o Large uniprocessors are no longer scaling in performance, because only a limited amount of parallelism can be extracted from a typical instruction stream.
o Clock speed cannot simply be ratcheted up on today's processors, or the power dissipation becomes prohibitive.
o CMT processors support many hardware strands through efficient sharing of on-chip resources such as pipelines, caches, and predictors.
o CMT processors are a good match for server workloads, which have high levels of TLP and relatively low levels of ILP.
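The power argument above can be made concrete with the standard first-order model of dynamic CMOS power, where $\alpha$ is the activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. Assuming, as a rough approximation, that $V$ must rise in proportion to $f$:

```latex
P_{\text{dyn}} = \alpha C V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3}
\\
f \to 1.2f \;\Rightarrow\; P_{\text{dyn}} \to 1.2^{3} P_{\text{dyn}} \approx 1.73\,P_{\text{dyn}}
```

Under the same model, a second core at the original frequency roughly doubles power for up to twice the throughput on workloads with enough TLP, which is why extra performance is sought from more cores rather than faster clocks.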
The performance race between SMT and CMP is not yet decided.
CMP is easier to implement, but only SMT has the ability to hide
latencies.
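Why multithreading hides latency can be seen with a toy model. This sketch assumes a hypothetical single-issue core whose threads each run `run` instructions and then wait `latency` cycles on a cache miss. It switches to another ready thread only when the current one stalls, which is the simplest switch-on-miss form of multithreading rather than full SMT, but the principle (other threads fill the cycles a stalled thread would waste) is the one SMT exploits.

```python
def utilization(threads: int, run: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which a single-issue multithreaded core does useful work.

    Hypothetical workload: each thread issues `run` instructions, then misses
    in the cache and is not ready again for `latency` cycles. The core keeps
    issuing from the current thread while it is ready and switches to the next
    ready thread (round-robin) when it stalls; a cycle with no ready thread is lost.
    """
    remaining = [run] * threads  # instructions left before each thread's next miss
    wake = [0] * threads         # first cycle at which each thread is ready again
    current, busy = 0, 0
    for cycle in range(cycles):
        for i in range(threads):
            t = (current + i) % threads
            if wake[t] <= cycle:
                current = t
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:  # miss after this instruction
                    remaining[t] = run
                    wake[t] = cycle + 1 + latency
                break
    return busy / cycles


for n in (1, 2, 4, 8):
    print(n, utilization(n, run=10, latency=30))
```

With `run=10` and `latency=30`, one thread keeps the core busy 25% of the time, two threads 50%, and four or more threads 100%, matching $\min(1, N \cdot run / (run + latency))$.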
Full functional partitioning is not achieved within an SMT
processor, because instruction issue is centralized.
o Separating the thread queues is a possible solution,
although it does not remove the central instruction issue.
o Combining simultaneous multithreading with CMP
may be superior.
Research: combine SMT or CMP organization with the ability to
create threads out of a single thread, either with compiler support
or fully dynamically.
o Thread-level speculation
o Closely related to the Multiscalar approach
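A minimal sketch of thread-level speculation, assuming a simplified two-phase model. All tasks (think of them as loop iterations) first run speculatively against the same initial memory, buffering their writes and recording the addresses they read. They then commit in program order, and any task that read an address an earlier task wrote is squashed and re-executed. `SpecView` and `run_tls` are hypothetical names; real TLS hardware tracks read and write sets in the caches and detects violations while tasks are still running.

```python
from typing import Callable, Dict, List, Set, Tuple

Memory = Dict[str, int]


class SpecView:
    """Hypothetical per-task view of memory: buffers stores, records loads."""

    def __init__(self, mem: Memory) -> None:
        self.mem = mem
        self.reads: Set[str] = set()
        self.writes: Dict[str, int] = {}

    def load(self, addr: str) -> int:
        if addr in self.writes:  # a task sees its own buffered stores
            return self.writes[addr]
        self.reads.add(addr)
        return self.mem.get(addr, 0)

    def store(self, addr: str, value: int) -> None:
        self.writes[addr] = value  # buffered; not visible to other tasks yet


Task = Callable[[SpecView], None]


def run_tls(tasks: List[Task], mem: Memory) -> Tuple[Memory, int]:
    """Run tasks speculatively, commit in order; return final memory and squash count."""
    snapshot = dict(mem)
    views = []
    for task in tasks:  # speculative phase: every task reads the same initial state
        view = SpecView(snapshot)
        task(view)
        views.append(view)

    committed = dict(mem)
    written: Set[str] = set()
    squashes = 0
    for task, view in zip(tasks, views):  # commit phase, in program order
        if view.reads & written:  # read-after-write violation: a stale value was used
            squashes += 1
            view = SpecView(committed)  # re-execute on the up-to-date state
            task(view)
        committed.update(view.writes)
        written.update(view.writes)
    return committed, squashes


def double(i: int) -> Task:
    def task(v: SpecView) -> None:
        v.store(f"x{i}", v.load(f"x{i}") * 2)
    return task


def accumulate(i: int) -> Task:
    def task(v: SpecView) -> None:
        v.store("sum", v.load("sum") + v.load(f"x{i}"))
    return task


mem = {"x0": 1, "x1": 2, "x2": 3, "x3": 4, "sum": 0}
print(run_tls([double(i) for i in range(4)], mem))      # independent iterations
print(run_tls([accumulate(i) for i in range(4)], mem))  # loop-carried dependence on "sum"
```

The independent doubling loop commits with no squashes, while the running sum forces three of the four tasks to re-execute (final `sum` is 10), so that loop gains nothing from speculation.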
[Figure: SMT issue slots filled by Thread 1 and Thread 2, with unutilized slots marked]
THANK YOU