Module 4 MP
The 80386 processor enters real address mode when it is reset.
➔ In this mode the 80386 works like an 8086, but it can also access the 32-bit registers
➔ Paging is disabled in this mode
➔ Like the 8086, the 80386 can access only 1 MB of physical memory in this mode
➔ The 20-bit physical address is computed by shifting the segment address left by 4 bits
and adding the offset address to it
➔ Only some of the 80386 instructions can be executed in this mode
➔ The primary purpose of this mode is to set up the 80386 for protected virtual address mode
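The real-mode address computation described above can be sketched as follows (the segment and offset values are arbitrary examples):

```python
# Sketch of the real-mode 20-bit physical address computation:
# physical = (segment << 4) + offset, truncated to 20 bits.
def real_mode_address(segment: int, offset: int) -> int:
    """Shift the 16-bit segment left by 4 bits and add the 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # result wraps to 20 bits

# Example: segment 0x1234, offset 0x0010 -> physical address 0x12350
print(hex(real_mode_address(0x1234, 0x0010)))  # -> 0x12350
```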
Protected Virtual Address Mode
The 80386 switches from real address mode to PVAM when the PE (Protection
Enable) bit is set
➔ In PVAM the processor can run all 8086 and 80286 programs
➔ In this mode the processor has a 4 GB physical memory address space and a 64 TB
virtual memory address space (virtual memory does not exist physically, but it
appears to be available within the system)
➔ Virtual memory is implemented using physical memory, which the CPU can directly
access, and secondary memory, which serves as storage for data and programs; both
are initially stored in secondary memory
➔ 80386 operates in two memory management modes in PVAM
➔ Non-Paged Mode
➔ Paged Mode
➔ Paging is a memory management scheme by which a computer stores and retrieves
data from secondary storage for use in main memory. In this scheme, memory is
divided into fixed-size pages stored in secondary memory; when a program needs a
page, the OS copies the required pages from secondary memory to main
memory. Paging is an important part of virtual memory implementations in modern
operating systems.
➔ In PVAM a virtual address can be 48 bits or 32 bits
➔ In non-paged mode the physical address is computed using the selector, descriptor and
offset, as in the 80286
➔ In paged mode a linear address is computed using the selector, descriptor and offset;
this linear address is then converted to a physical address by the paging unit
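The paged-mode translation can be sketched as a two-level lookup. In this simplified model the page tables are plain dictionaries and the example mappings are invented; real page-directory and page-table entries also carry present, access and protection bits:

```python
# Illustrative two-level 80386 page translation (simplified sketch).
# A 32-bit linear address splits into: 10-bit directory index,
# 10-bit table index, 12-bit offset within a 4 KB page.
def translate(linear: int, page_directory: dict) -> int:
    dir_index   = (linear >> 22) & 0x3FF   # top 10 bits select a page table
    table_index = (linear >> 12) & 0x3FF   # next 10 bits select a page frame
    offset      = linear & 0xFFF           # low 12 bits: offset in the page
    page_table = page_directory[dir_index]
    frame_base = page_table[table_index]
    return frame_base + offset

# Hypothetical mapping: linear page (dir 0, table 1) -> physical frame 0x9A000
directory = {0: {1: 0x0009A000}}
print(hex(translate(0x00001ABC, directory)))  # -> 0x9aabc
```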
Address Computation in PVAM when paging is disabled
Virtual Mode
➢ The processor switches from PVAM to Virtual Mode when the VM bit is set
➔ Virtual Mode permits the execution of 8086 applications with all the protection features
of the 80386
➔ Virtual Mode is similar to Real Mode
➔ In this mode too, the processor computes a 20-bit address by shifting the segment
register left by 4 bits and adding the offset to it
➔ In Virtual Mode paging can be enabled
ARCHITECTURE OF 80386
The six functional units are: a) Bus Interface Unit
b) Code Prefetch Unit: It prefetches code from memory and stores it in a 16-byte
code queue.
The queue acts as a buffer between the prefetch unit and the instruction decode unit.
c) Instruction Decode Unit: It translates the instructions from the code prefetch unit into
microcode.
d) Segmentation Unit: It produces a translated linear address, which the paging unit translates
into a physical address.
e) Paging Unit: It checks for paging violations before it sends a bus request and the address to
the BIU and the external bus.
➢ Execution Unit: It operates on the decoded instructions, performing the steps needed to
execute them.
➢ It contains the control unit, data unit and protection test unit
➢ The Control Unit contains microcode and parallel hardware for fast multiply, divide and
effective address calculation.
➢ The Data Unit includes the ALU, 8 general-purpose registers and a 64-bit barrel shifter for
performing multiple-bit shifts in one clock.
➢ The Protection Test Unit checks for segmentation violations under the control of microcode
Pentium Microprocessor
The execution unit has two parallel pipelines named U-Pipe and V-Pipe with individual ALU
for each pipe.
➔ Pipeline has 5 stages
➔ They are Prefetch (PF),Decode Stage-1 ( D1),Decode Stage-2 (D2),Execute (E) and
Writeback (WB)
➔ U is the default pipeline and is slightly more capable than the V-pipeline
➔ The U-pipeline can handle any instruction, while the V-pipeline can handle only simple
instructions
➢ Prefetch: Instructions are fetched from the instruction cache and aligned in the prefetch
buffer for decoding
➢ Decode 1: Instructions are decoded into the Pentium's internal instruction format. Branch
prediction also takes place at this stage.
➢ Decode 2: Address computation takes place at this stage.
➢ Execute: The integer hardware executes the instruction.
➢ Write-back: The results of the computation are written back into registers
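The overlap among these five stages can be visualised with a small sketch, assuming an ideal pipeline with no stalls:

```python
# Sketch: which stage each instruction occupies per clock in an ideal
# 5-stage Pentium pipeline (no stalls assumed). With 5 stages,
# n instructions finish in 5 + (n - 1) clocks.
STAGES = ["PF", "D1", "D2", "E ", "WB"]

def timeline(n_instructions: int) -> list:
    """Row i gives the stage of instruction i at each clock (or '--')."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage          # instruction i enters stage s at clock i+s
        rows.append(row)
    return rows

for row in timeline(3):
    print(" ".join(row))
# PF D1 D2 E  WB -- --
# -- PF D1 D2 E  WB --
# -- -- PF D1 D2 E  WB
```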
SUPERSCALAR ARCHITECTURE
The Pentium supports a two-way superscalar architecture with two integer pipelines,
simultaneously executing two consecutive instructions.
The stages PF and D1 are common to both the U and V pipelines
The D1 stage uses various techniques to decide whether two consecutive instructions
can be executed simultaneously, considering the type of instructions and the
dependencies between them.
While executing instructions the processor checks the next two instructions.
If the execution of one instruction does not depend on the other, the first
instruction is issued to the U-pipe and the second instruction to the V-pipe. Thus two
instructions are executed simultaneously.
If it is not possible to execute two instructions simultaneously, the two instructions are
issued to the U-pipe one by one
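The pairing decision can be sketched as below. This is a simplification of the Pentium's actual pairing rules: here an instruction pair issues to U + V only when the second instruction is "simple" and has no register dependency on the first (the instruction records and register names are invented for illustration):

```python
# Simplified sketch of the D1 pairing decision: pair (U + V) only if the
# second instruction is simple and neither reads nor writes any register
# written by the first (covers RAW and WAW between the pair).
def can_pair(first: dict, second: dict) -> bool:
    if not second["simple"]:
        return False
    deps = second["reads"] | second["writes"]
    return not (first["writes"] & deps)

i1 = {"simple": True, "reads": {"ebx"}, "writes": {"eax"}}
i2 = {"simple": True, "reads": {"ecx"}, "writes": {"edx"}}
i3 = {"simple": True, "reads": {"eax"}, "writes": {"esi"}}

print(can_pair(i1, i2))  # True: independent, issued to U and V together
print(can_pair(i1, i3))  # False: i3 reads eax written by i1, so U-pipe only
```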
Pipe Line Hazards
Hazards are situations in pipelining that prevent the next instruction in the instruction
stream from executing during its designated clock cycle
➔ Hazards reduce the ideal speedup gained by pipelining and are classified into three
classes:
a) Structural Hazards
b) Data Hazards
c) Control Hazards
Structural Hazards. They arise from resource conflicts when the hardware cannot
support all possible combinations of instructions in simultaneous overlapped
execution.
Data hazards occur when instructions that exhibit data dependence modify data in
different stages of a pipeline. Ignoring potential data hazards can result in race
conditions (also termed race hazards). There are three situations in which a data
hazard can occur:
read after write (RAW), a true dependency
write after read (WAR), an anti-dependency
write after write (WAW), an output dependency
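The three dependency types can be detected mechanically from the register sets each instruction reads and writes. The following is an illustrative classifier (register names and instruction examples are hypothetical):

```python
# Hypothetical classifier: the data-hazard type between two consecutive
# instructions, given the registers each one reads and writes.
def data_hazard(first_reads: set, first_writes: set,
                second_reads: set, second_writes: set):
    if first_writes & second_reads:
        return "RAW"   # read after write: true dependency
    if first_reads & second_writes:
        return "WAR"   # write after read: anti-dependency
    if first_writes & second_writes:
        return "WAW"   # write after write: output dependency
    return None        # no data hazard

# ADD r1, r2, r3 followed by SUB r4, r1, r5: r1 is written, then read -> RAW
print(data_hazard({"r2", "r3"}, {"r1"}, {"r1", "r5"}, {"r4"}))  # -> RAW
```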
Control Hazards. They arise from the pipelining of branches and other instructions
that change the PC.
Hyperthreading Technology(HTT)
➔ Hyperthreading provides two logical processors in a single physical processor, i.e. a single
processor acts like multiple processors
➔ HTT allows the processor to work more efficiently.
➔ HTT enables different parts of the processor to work on different tasks concurrently.
➔ It divides the workload into processes and threads.
➔ HTT makes use of resources that would otherwise sit idle, getting more work done in the
same amount of time.
➔ It was first introduced in the Intel Xeon processor
Hyper-Threading Technology Architecture
Each logical processor maintains one copy of the architecture state
MMX -(Multi Media Extension) Processor
The MMX technology consists of following improvements over the non-MMX Pentium
microprocessor:
I. 57 new instructions have been added that are designed to handle video, audio, and
graphical data more efficiently.
II. 4 new data types
III. A new processing model, Single Instruction Multiple Data (SIMD), makes it possible for one
instruction to perform the same operation on multiple data items.
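The SIMD idea can be sketched in pure Python. The example below models the behaviour of a packed unsigned-saturating byte add (in the style of MMX's PADDUSB, which operates on a 64-bit MMX register holding eight bytes); the pixel values are invented:

```python
# Pure-Python sketch of an MMX-style SIMD operation: one "instruction"
# adds eight packed unsigned bytes pairwise with saturation, so sums
# clamp at 255 instead of wrapping around.
def paddusb(a: list, b: list) -> list:
    """Pairwise unsigned-saturating byte add on 8-element vectors."""
    return [min(x + y, 255) for x, y in zip(a, b)]

pixels     = [10, 200, 250, 0, 128, 64, 255, 30]
brightness = [50, 100, 100, 5,  50, 50,  50, 30]
print(paddusb(pixels, brightness))
# -> [60, 255, 255, 5, 178, 114, 255, 60]
```

Saturating arithmetic is what makes this useful for image data: brightening a nearly-white pixel clamps at white rather than wrapping around to black.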
SSE
➔ A Multi-core processor is an integrated circuit in which two or more processors have
been attached for enhanced performance, reduced power consumption and more
efficient simultaneous processing of multiple tasks.
➔ The concept of multicore technology is mainly centered on the possibility of parallel
computing.
➔ Architecture of multicore processor enables communication between all available
cores to ensure that processing tasks are divided and assigned accurately.
Advantages
Reduced Cost: Multiple processors share the same resources (such as the power supply and
motherboard).
Increased Reliability: The failure of one processor does not affect the other processors,
though it will slow down the machine, provided there is no master-slave relationship among
the processors.
Increased Throughput: An increase in the number of processors completes the work in less
time.
Major issues in multicore processing
Interconnect issue
• Since there are so many components on chip in a multicore processor (cores, caches,
network controllers, etc.), the interaction between them can hurt performance if
the interconnection issues are not resolved properly
• In the initial processors, a shared bus was used for communication between the components.
• In order to reduce latency, crossbar and mesh topologies are used for the
interconnection of components.
• Also, as parallelism increases at the thread level, off-chip communication for memory
access, I/O, etc. also increases.
• For issues like this, packet-based interconnection is actively used; it has been
adopted by Intel.
Cache Coherence
• Cache coherence is the uniformity of shared-resource data that ends up stored in
multiple local caches.
• When clients in a system maintain caches of a common memory resource,
problems may arise with incoherent data, which is particularly the case
with CPUs in a multiprocessing system.
• Consider two clients that each have a cached copy of a particular memory block from a
previous read.
• Suppose one client updates that memory block; the other client could be left with an
invalid (stale) cached copy, without any notification of the change.
Cache coherence is intended to manage such conflicts by maintaining a coherent
view of the data values in multiple caches.
The bus-snooping protocol is a protocol for maintaining cache coherency in
symmetric multiprocessing environments.
In a directory-based coherence approach, the information about which caches have a copy of a
block is maintained in a structure called the directory.
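A minimal sketch of the directory idea, assuming a simple invalidate-on-write policy (the class, block names and cache ids are illustrative, not a real protocol implementation):

```python
# Minimal sketch of directory-based coherence: the directory records which
# caches hold a copy of each block and invalidates the other sharers when
# one cache writes the block.
class Directory:
    def __init__(self):
        self.sharers = {}  # block -> set of cache ids holding a copy

    def read(self, cache_id, block):
        """A cache reads a block: record it as a sharer."""
        self.sharers.setdefault(block, set()).add(cache_id)

    def write(self, cache_id, block):
        """A cache writes a block: invalidate every other cached copy."""
        invalidated = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}   # writer is now the sole holder
        return invalidated

d = Directory()
d.read(0, "blockA")          # cache 0 caches blockA
d.read(1, "blockA")          # cache 1 caches it too
print(d.write(1, "blockA"))  # cache 1 writes -> cache 0 is invalidated: {0}
```

This mirrors the scenario described above: when one client updates the block, the directory, rather than a broadcast on the bus, notifies exactly the caches that hold stale copies.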