
IBM PowerPC Processor Bus Architecture

A Seminar Report

Submitted
in partial fulfillment of the
requirements for the award of degree of

Bachelor of Technology
(Computer Science & Engineering)

Supervised By Submitted By

Ved Mitra Shringi Omkar Sharma


Coordinator Roll No. – 18EEJIT011

Department of Computer Science & Engineering


GOVERNMENT ENGINEERING COLLEGE, JHALAWAR
Jhalrapatan, Distt. Jhalawar – 326023
November–December, 2024
Certificate

This is to certify that Mr/Ms. ………………………………………, a student of


B.Tech (Computer Science & Engineering) ……. semester, has submitted his/her
Technical Seminar report under my supervision.

Ved Mitra Shringi


(Seminar Coordinator)
Acknowledgement

I would like to express my sincere gratitude to my Supervisor ……………… and to
………………… for their valuable support and guidance throughout the year. Without their help
and support, this seminar would not have taken its present shape.
Abstract
Contents

1 Introduction

2 Major Functional Blocks on Virtex-II Pro FPGA
  2.1 The Processor Block
      2.1.1 PowerPC 405D5 Core
      2.1.2 On-Chip Memory (OCM) Controller
      2.1.3 Clock/Control Interface Logic
      2.1.4 CPU-FPGA Interfaces
  2.2 Block SelectRAM+ Memory
  2.3 Configurable Logic Blocks (CLBs)
  2.4 18 x 18 Bit Multipliers
  2.5 CoreConnect Bus

3 Instruction Level Parallelism (ILP)
  3.1 Temporal Parallelism
      3.1.1 Pipelining
  3.2 Spatial Parallelism
      3.2.1 Superpipelining
      3.2.2 The Superscalar Architecture

4 Pipeline Performance Parameters and Dependencies
  4.1 CPI and MIPS Ratings
  4.2 Speed-up
  4.3 Efficiency
  4.4 Throughput
  4.5 Pipeline Dependencies
      4.5.1 Resource Conflict
      4.5.2 Data Dependency
      4.5.3 Control Dependency

5 Pipeline Performance Measures
  5.1 Pipelining
  5.2 Superpipelining
  5.3 The Superscalar Architecture
  5.4 Simulation Results
      5.4.1 Cycle Time (Frequency) versus Pipeline Depth
      5.4.2 TPI versus Pipeline Depth
      5.4.3 Normalized Efficiency versus Branch Misprediction Latency
      5.4.4 Normalized Performance versus ALU Latency

6 Proposed Work
  6.1 Aim I
  6.2 Aim II
  6.3 Aim III

7 Conclusion

8 References
List of Figures

2.1 Processor Block Architecture
2.2 Embedded PPC 405D5 Core Block Diagram
2.3 CoreConnect Bus System-on-a-Chip

3.1 Instruction Processing Stages
3.2 Space-Time Diagram for a 4-stage Non-Pipelined Processor
3.3 Space-Time Diagram for a 4-stage Pipelined Processor

4.1 Space-Time Diagram for a 4-stage Pipelined Processor
4.2 Space-Time Diagram showing Resource Conflict
4.3 Space-Time Diagram showing RAW Hazard
4.4 Space-Time Diagram showing Branch Conflict

5.1 Space-Time Diagram: R3000 Pipeline
5.2 Space-Time Diagram: Modified R3000 Pipeline
5.3 Space-Time Diagram: R4000 Superpipeline
5.4 Space-Time Diagram: R4000 with Branch Misprediction Latency
5.5 Space-Time Diagram: R5000 Superscalar Architecture
5.6 Dependency of Cycle Time on Pipeline Depth
5.7 TPI as a Function of Pipeline Depth
5.8 Normalized Efficiency versus Branch Misprediction Latency
5.9 Performance versus ALU Latency

6.1 Proposed Design on Virtex-II Pro FPGA
6.2 Block RAM Memory Map
6.3 Block Diagram for Proposed Design
Chapter 1

Introduction



Chapter 2

Major Functional Blocks on Virtex-II Pro FPGA and


the CoreConnect Bus

The Virtex-II Pro family contains user-programmable gate arrays with various configurable
elements and embedded blocks optimized for high-density and high-performance system designs.
The family incorporates Multi-Gigabit Transceivers and PowerPC CPU Blocks in the Virtex-II Pro
Series FPGA architecture, enabling complete solutions for telecommunication, wireless,
networking, video, and DSP applications.

Virtex-II Pro devices have the following Blocks:

2.1 RocketIO Transceiver[1]


The RocketIO Multi-Gigabit Transceivers are flexible parallel-to-serial and serial-to-parallel
Embedded Transceiver Cores used for high-bandwidth interconnection between buses or other
subsystems. Virtex-II Pro devices support up to twenty RocketIO Transceivers.

RocketIO Transceivers have the following features:

• Full-Duplex Serial Transceiver capable of data transfer rates from 600 Mbps to 3.125 Gbps
• Monolithic clock synthesis and clock recovery (CDR)
• 8/16/32-bit selectable FPGA interface
• Optional transmit and receive data inversion
• 50 Ω/75 Ω selectable on-chip transmit and receive terminations
• Cyclic Redundancy Check (CRC) support

2.2 The Processor Block[1]


Virtex-II Pro devices support up to four IBM PowerPC Processor Blocks. Figure 2.1 shows the
internal architecture of the Processor Block.
Figure 2.1 Processor Block Architecture[1]

2.2.1 PowerPC 405D5 Core

The PPC 405D5 is a true RISC Core that can execute one instruction per clock cycle. On-chip
Instruction and Data Caches reduce design complexity and improve system throughput.

Figure 2.2 shows the block diagram of the PowerPC 405D5 Processor Core.
Figure 2.2 Embedded PPC 405D5 Core Block Diagram[1]

The PPC 405D5 has the following features:

• Embedded 300+ MHz Harvard Architecture Core
• Low Power Consumption: 0.9 mW/MHz
• Five-stage Instruction Pipeline with single-cycle execution of most instructions, including Load/Store
• Separate Instruction and Data Cache Units
• Thirty-two 32-bit General Purpose Registers (GPRs)
• Static Branch Prediction
• Unaligned and aligned Load/Store support to Cache, Main Memory and On-Chip Memory
• Hardware Multiply/Divide for faster integer arithmetic
• Enhanced String and Multiple-Word handling
• Big/Little-Endian byte-ordering support
• 16 KB Instruction Cache Unit (ICU) and 16 KB Data Cache Unit (DCU)
• Memory Management Unit (MMU)
• Three timers on a 64-bit time base: Programmable Interval Timer (PIT), Watchdog Timer (WDT) and Fixed Interval Timer (FIT)
• Debug support through JTAG
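The "one instruction per clock cycle" figure is the ideal behaviour of a full five-stage pipeline. Using the standard textbook pipelining arithmetic (not Xilinx-specific; function names are ours), a k-stage pipeline finishes n instructions in k + n - 1 cycles once the stages are balanced and stall-free, so CPI approaches 1 and the speed-up over a non-pipelined design approaches k as n grows:

```python
def pipeline_speedup(k, n):
    """Ideal speed-up of a k-stage pipeline over a non-pipelined unit
    for n instructions: S = n*k / (k + n - 1), assuming equal stage
    delays and no stalls."""
    return (n * k) / (k + n - 1)

def cpi(total_cycles, instructions):
    """Average cycles per instruction."""
    return total_cycles / instructions

# A 5-stage pipeline, as in the PPC405
k = 5
for n in (5, 100, 10_000):
    cycles = k + n - 1  # cycles to drain the pipeline for n instructions
    print(n, round(pipeline_speedup(k, n), 3), round(cpi(cycles, n), 4))
```

For n = 10,000 instructions the CPI is already 1.0004, which is why single-cycle execution is quoted as the sustained rate rather than the start-up rate.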

2.2.2 On-Chip Memory (OCM) Controller

OCM Controllers provide dedicated interfaces between Block SelectRAM+ memory in the FPGA
fabric and Processor Block instruction and data paths for high-speed access. The OCM Controller
provides an interface to both the 64-bit Instruction-Side Block RAM (ISBRAM) and the 32-bit
Data-Side Block RAM (DSBRAM).

The designer can choose to implement –

• ISBRAM only
• DSBRAM only
• Both ISBRAM and DSBRAM
• No ISBRAM and no DSBRAM

Typical applications for DSOCM include scratch-pad memory, as well as use of the dual-port
feature of Block RAM to enable bidirectional data transfer between Processor and FPGA. Typical
applications for ISOCM include storage of Interrupt Service Routines (ISRs). Data transfer is
controlled through DCR Registers.
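Because the instruction-side interface is 64 bits wide and the data-side interface 32 bits wide, the Block RAM word depth required for a given OCM size is simple arithmetic. A minimal sketch (the function name is hypothetical):

```python
def ocm_word_depth(size_bytes, data_width_bits):
    """Number of words of Block RAM needed to hold size_bytes behind
    an OCM interface data_width_bits wide (rounded up)."""
    bytes_per_word = data_width_bits // 8
    # ceiling division in case the size is not word-aligned
    return -(-size_bytes // bytes_per_word)

# 16 KB behind the 64-bit ISBRAM interface vs. the 32-bit DSBRAM interface
print(ocm_word_depth(16 * 1024, 64))  # 2048 words
print(ocm_word_depth(16 * 1024, 32))  # 4096 words
```

The same memory size therefore needs half the depth, but twice the width, on the instruction side, which is why ISBRAM and DSBRAM are configured separately.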

2.2.3 Clock/Control Interface Logic

The Clock/Control interface logic provides proper initialization and connections for PPC405
clock/power management, resets, PLB cycle control, and the OCM interfaces. It also couples user
signals between the FPGA fabric and the embedded PPC405 CPU Core.

The Processor Clock connectivity is similar to that of CLB clock pins: it can connect either to
global clock nets or to general routing resources. The Processor Clock source can therefore come
from a DCM, CLB logic, or a user package pin.
Chapter 5

Conclusion and Future Work


References

[1] “Virtex-II Pro Series FPGA, Xilinx Documentation”, XC4000E/XC4000X Series FPGAs,
Jan. 29, 1999. http://www.xilinx.com

[2] “Xilinx University Program Virtex-II Pro Development System, Hardware Reference
Manual”, UG069 (v1.0), March 8, 2005.

[3] Hayes, J. P., “Computer Architecture and Organization”, 3rd ed., McGraw-Hill, 1998.

[4] Patterson, David A. and Hennessy, John L., “Computer Organization and Design: A
Hardware/Software Approach”, 3rd ed., Morgan Kaufmann/Elsevier.

[5] Kane, G. and Heinrich, J., “MIPS RISC Architecture”, Englewood Cliffs, NJ:
Prentice-Hall, 1992.

[6] Hwang, K. and Briggs, F. A., “Computer Architecture and Parallel Processing”,
McGraw-Hill, 1995.

[7] Yeager, K. C., “The MIPS R10000 Superscalar Microprocessor”, IEEE Micro, vol. 16,
April 1996, pp. 28-40.

[8] Hartstein, A. and Puzak, T. R., “The Optimum Pipeline Depth for a Microprocessor”,
IBM T. J. Watson Research Center, 2002.

[20] Gaonkar, Ramesh, “Microprocessor Architecture, Programming and Applications with
the 8085”, 5th ed., Penram International Publishing (India) Pvt. Ltd.
