
IBM PowerPC Processor Bus Architecture

A Seminar Report

Submitted
in partial fulfillment of the
requirements for the award of degree of

Bachelor of Technology
(Computer Science & Engineering)

Supervised By Submitted By

Ved Mitra Shringi Omkar Sharma


Coordinator Roll No. – 18EEJIT011

Department of Computer Science & Engineering


GOVERNMENT ENGINEERING COLLEGE, JHALAWAR
Jhalrapatan, Distt. Jhalawar – 326023
November–December, 2024
Certificate

This is to certify that Mr/Ms. ………………………………………, a student of


B.Tech (Computer Science & Engineering) ……. semester, has submitted his/her
Technical Seminar report under my supervision.

Ved Mitra Shringi


(Seminar Coordinator)
Acknowledgement

I would like to express my sincere gratitude to my Supervisor ……………… and to
………………… for their valuable support and guidance throughout the year. Without their help
and support, this seminar would not have taken its present shape.
Abstract
Contents

1 Introduction

2 Major Functional Blocks on Virtex-II Pro FPGA
  2.1 The Processor Block
      2.1.1 PowerPC 405D5 Core
      2.1.2 On-Chip Memory (OCM) Controller
      2.1.3 Clock/Control Interface Logic
      2.1.4 CPU-FPGA Interfaces
  2.2 Block SelectRAM+ Memory
  2.3 Configurable Logic Blocks (CLBs)
  2.4 18 x 18 Bit Multipliers
  2.5 CoreConnect Bus

3 Instruction Level Parallelism (ILP)
  3.1 Temporal Parallelism
      3.1.1 Pipelining
  3.2 Spatial Parallelism
      3.2.1 Superpipelining
      3.2.2 The Superscalar Architecture

4 Pipeline Performance Parameters and Dependencies
  4.1 CPI and MIPS Ratings
  4.2 Speed-up
  4.3 Efficiency
  4.4 Throughput
  4.5 Pipeline Dependencies
      4.5.1 Resource Conflict
      4.5.2 Data Dependency
      4.5.3 Control Dependency

5 Pipeline Performance Measures
  5.1 Pipelining
  5.2 Superpipelining
  5.3 The Superscalar Architecture
  5.4 Simulation Results
      5.4.1 Cycle Time (Frequency) versus Pipeline Depth
      5.4.2 TPI versus Pipeline Depth
      5.4.3 Normalized Efficiency versus Branch Misprediction Latency
      5.4.4 Normalized Performance versus ALU Latency

6 Proposed Work
  6.1 Aim I
  6.2 Aim II
  6.3 Aim III

7 Conclusion

8 References
List of Figures

2.1 Processor Block Architecture
2.2 Embedded PPC 405D5 Core Block Diagram
2.3 CoreConnect Bus System-on-a-Chip

3.1 Instruction Processing Stages
3.2 Space-Time Diagram for a 4-stage Non-Pipelined Processor
3.3 Space-Time Diagram for a 4-stage Pipelined Processor

4.1 Space-Time Diagram for a 4-stage Pipelined Processor
4.2 Space-Time Diagram showing Resource Conflict
4.3 Space-Time Diagram showing RAW Hazard
4.4 Space-Time Diagram showing Branch Conflict

5.1 Space-Time Diagram: R3000 Pipeline
5.2 Space-Time Diagram: Modified R3000 Pipeline
5.3 Space-Time Diagram: R4000 Superpipeline
5.4 Space-Time Diagram: R4000 with Branch Misprediction Latency
5.5 Space-Time Diagram: R5000 Superscalar Architecture
5.6 Dependency of Cycle Time on Pipeline Depth
5.7 TPI as a Function of Pipeline Depth
5.8 Normalized Efficiency versus Branch Misprediction Latency
5.9 Performance versus ALU Latency

6.1 Proposed Design on Virtex-II Pro FPGA
6.2 Block RAM Memory Map
6.3 Block Diagram for Proposed Design
Chapter 1

Introduction



Chapter 2

Major Functional Blocks on Virtex-II Pro FPGA and


the CoreConnect Bus

The Virtex-II Pro family contains user-programmable gate arrays with various configurable
elements and embedded blocks optimized for high-density and high-performance system designs.
The family incorporates Multi-Gigabit Transceivers and PowerPC CPU Blocks in the Virtex-II Pro
Series FPGA architecture, enabling complete solutions for telecommunication, wireless,
networking, video, and DSP applications.

Virtex-II Pro devices have the following Blocks:

2.1 RocketIO Transceiver[1]


The RocketIO Multi-Gigabit Transceivers are flexible parallel-to-serial and serial-to-parallel
Embedded Transceiver Cores used for high-bandwidth interconnection between buses or other
subsystems. Virtex-II Pro devices support up to twenty RocketIO Transceivers.

RocketIO Transceivers have the following features:

• Full-Duplex Serial Transceiver capable of data transfer rates from 600 Mbps to 3.125 Gbps
• Monolithic clock synthesis and clock recovery (CDR)
• 8/16/32-bit selectable FPGA interface
• Optional transmit and receive data inversion
• 50 Ω/75 Ω selectable on-chip transmit and receive terminations
• Cyclic Redundancy Check (CRC) support

2.2 The Processor Block[1]


Virtex-II Pro devices support up to four IBM PowerPC Processor Blocks. Figure 2.1 shows the
internal architecture of the Processor Block.
Figure 2.1 Processor Block Architecture[1]

2.2.1 PowerPC 405D5 Core

The PPC 405D5 is a true RISC Core that can execute one instruction per clock cycle. On-chip
Instruction and Data Caches reduce design complexity and improve system throughput.

Figure 2.2 shows the block diagram of the PowerPC 405D5 Processor Core.
Figure 2.2 Embedded PPC 405D5 Core Block Diagram[1]

The PPC 405D5 has the following features:

• Embedded 300+ MHz Harvard Architecture Core
• Low Power Consumption: 0.9 mW/MHz
• Five-stage Instruction Pipeline with single-cycle execution of most instructions, including Load/Store
• Separate Instruction and Data Cache Units
• Thirty-two 32-bit General Purpose Registers (GPRs)
• Static Branch Prediction
• Unaligned and aligned Load/Store support to Cache, Main Memory and On-Chip Memory
• Hardware Multiply/Divide for faster integer arithmetic
• Enhanced String and Multiple-Word handling
• Big/Little-Endian byte-ordering support
• 16 KB Instruction Cache Unit (ICU) and 16 KB Data Cache Unit (DCU)
• Memory Management Unit (MMU)
• Three timers on a 64-bit time base: Programmable Interval Timer (PIT), Watchdog Timer (WDT) and Fixed Interval Timer (FIT)
• Debug support through JTAG
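The "one instruction per clock cycle" figure is the ideal behaviour of a full five-stage pipeline. Using the standard textbook pipelining arithmetic (not Xilinx-specific; function names are ours), a k-stage pipeline finishes n instructions in k + n - 1 cycles once the stages are balanced and stall-free, so CPI approaches 1 and the speed-up over a non-pipelined design approaches k as n grows:

```python
def pipeline_speedup(k, n):
    """Ideal speed-up of a k-stage pipeline over a non-pipelined unit
    for n instructions: S = n*k / (k + n - 1), assuming equal stage
    delays and no stalls."""
    return (n * k) / (k + n - 1)

def cpi(total_cycles, instructions):
    """Average cycles per instruction."""
    return total_cycles / instructions

# A 5-stage pipeline, as in the PPC405
k = 5
for n in (5, 100, 10_000):
    cycles = k + n - 1  # cycles to drain the pipeline for n instructions
    print(n, round(pipeline_speedup(k, n), 3), round(cpi(cycles, n), 4))
```

For n = 10,000 instructions the CPI is already 1.0004, which is why single-cycle execution is quoted as the sustained rate rather than the start-up rate.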

2.2.2 On-Chip Memory (OCM) Controller

OCM Controllers provide dedicated interfaces between Block SelectRAM+ memory in the FPGA
fabric and Processor Block instruction and data paths for high-speed access. The OCM Controller
provides an interface to both the 64-bit Instruction-Side Block RAM (ISBRAM) and the 32-bit
Data-Side Block RAM (DSBRAM).

The designer can choose to implement –

• ISBRAM only
• DSBRAM only
• Both ISBRAM and DSBRAM
• No ISBRAM and no DSBRAM

Typical applications for DSOCM include scratch-pad memory, as well as use of the dual-port
feature of Block RAM to enable bidirectional data transfer between Processor and FPGA. Typical
applications for ISOCM include storage of Interrupt Service Routines (ISRs). Data transfer is
controlled through DCR Registers.
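Because the instruction-side interface is 64 bits wide and the data-side interface 32 bits wide, the Block RAM word depth required for a given OCM size is simple arithmetic. A minimal sketch (the function name is hypothetical):

```python
def ocm_word_depth(size_bytes, data_width_bits):
    """Number of words of Block RAM needed to hold size_bytes behind
    an OCM interface data_width_bits wide (rounded up)."""
    bytes_per_word = data_width_bits // 8
    # ceiling division in case the size is not word-aligned
    return -(-size_bytes // bytes_per_word)

# 16 KB behind the 64-bit ISBRAM interface vs. the 32-bit DSBRAM interface
print(ocm_word_depth(16 * 1024, 64))  # 2048 words
print(ocm_word_depth(16 * 1024, 32))  # 4096 words
```

The same memory size therefore needs half the depth, but twice the width, on the instruction side, which is why ISBRAM and DSBRAM are configured separately.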

2.2.3 Clock/Control Interface Logic

The Clock/Control interface logic provides proper initialization and connections for PPC405
clock/power management, resets, PLB cycle control, and the OCM interfaces. It also couples user
signals between the FPGA fabric and the embedded PPC405 CPU Core.

The Processor Clock connectivity is similar to that of CLB clock pins: it can connect either to
global clock nets or to general routing resources. The Processor Clock source can therefore come
from a DCM, CLB logic, or a user package pin.
Chapter 5

Conclusion and Future Work


References

[1] “Virtex-II Pro Series FPGA, Xilinx Documentation”, XC4000E/XC4000X Series FPGAs,
Jan. 29, 1999. http://www.xilinx.com

[2] “Xilinx University Program Virtex-II Pro Development System, Hardware Reference
Manual”, UG069 (v1.0), March 8, 2005.

[3] Hayes, J. P., “Computer Architecture and Organization”, 3rd ed., McGraw-Hill, 1998.

[4] Patterson, David A. and Hennessy, John L., “Computer Organization and Design: A
Hardware/Software Approach”, 3rd ed., Morgan Kaufmann/Elsevier.

[5] Kane, G. and Heinrich, J., “MIPS RISC Architecture”, Englewood Cliffs, NJ:
Prentice-Hall, 1992.

[6] Hwang, K. and Briggs, F. A., “Computer Architecture and Parallel Processing”,
McGraw-Hill, 1995.

[7] Yeager, K. C., “The MIPS R10000 Superscalar Microprocessor”, IEEE Micro, vol. 16,
April 1996, pp. 28-40.

[8] Hartstein, A. and Puzak, T. R., “The Optimum Pipeline Depth for a Microprocessor”,
IBM T. J. Watson Research Center, 2002.

[20] Gaonkar, Ramesh, “Microprocessor Architecture, Programming and Applications with
the 8085”, 5th ed., Penram International Publishing (India) Pvt. Ltd.
