PDF 1
ASSIGNMENT 1
Implement a 6-stage pipelined processor in Verilog. The processor supports only the load (lw), store (sw),
jump (j), add (add), and (and), and add immediate (addi) instructions, and it should implement
forwarding to resolve data hazards. The processor has Reset and CLK as inputs and no outputs.
It has instruction fetch, decode, register read (with 32 32-bit registers), execution, memory, and
writeback units. It also contains five pipeline registers: IF/ID, ID/RR, RR/EX, EX/MEM, and MEM/WB.
When reset is activated, the PC and the IF/ID, ID/RR, RR/EX, EX/MEM, and MEM/WB registers are
initialized to 0, and the instruction memory and register file are loaded with predefined values.
When the instruction fetch unit starts fetching the first instruction, the pipeline registers
contain unknown values. When the second instruction is being fetched by the IF unit, the IF/ID
register holds the instruction code of the first instruction. When the third instruction is being
fetched by the IF unit, the IF/ID register contains the instruction code of the second instruction,
the ID/RR register contains information related to the first instruction, and so on.
(Assume a 32-bit PC. Also assume that addresses and data are 32 bits wide.)
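The way instructions advance through the pipeline registers can be illustrated with a small behavioral model. This is only a sketch in Python, not Verilog: the instruction labels `I1`..`I4` are hypothetical placeholders, and `None` stands for a register that does not yet hold an instruction.

```python
# Names of the five pipeline registers, ordered from fetch to writeback.
PIPELINE_REGS = ["IF/ID", "ID/RR", "RR/EX", "EX/MEM", "MEM/WB"]

def simulate_fill(program, cycles):
    """Return, for each cycle, (instruction being fetched, register contents).

    `program` is a list of instruction labels (hypothetical placeholders).
    A register holds None until an instruction reaches it.
    """
    regs = {name: None for name in PIPELINE_REGS}
    history = []
    for cycle in range(cycles):
        fetching = program[cycle] if cycle < len(program) else None
        history.append((fetching, dict(regs)))
        # Clock edge: every register takes the value of the one before it,
        # updated from the back so that no value is overwritten early.
        for i in range(len(PIPELINE_REGS) - 1, 0, -1):
            regs[PIPELINE_REGS[i]] = regs[PIPELINE_REGS[i - 1]]
        regs["IF/ID"] = fetching
    return history

history = simulate_fill(["I1", "I2", "I3", "I4"], 4)
for fetching, regs in history:
    print("fetching", fetching, regs)
```

In the third cycle of this model, `I3` is being fetched while IF/ID holds `I2` and ID/RR holds `I1`, which matches the description above.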
The instruction and its 32-bit instruction format are shown below:
lw destinationReg, offset [sourceReg] (Sign-extends the 16-bit offset in instruction field (15:0) to
32 bits, adds it to the register specified by the rs field, and loads the data at the memory
location given by the calculated address into the rt register. The opcode for lw is 100011.)
op rs rt offset
6 bits (31-26) 5-bits (25-21) 5-bits (20-16) 16-bits (15-0)
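The address calculation shared by lw and sw can be sketched as follows. This is an illustrative Python model; the function names `sign_extend16` and `effective_address` are assumptions made for this example.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF
    if value & 0x8000:              # bit 15 set: the offset is negative
        return value | 0xFFFF0000   # fill the upper 16 bits with ones
    return value

def effective_address(rs_value, offset16):
    """Address used by lw/sw: rs + sign-extended offset, wrapped to 32 bits."""
    return (rs_value + sign_extend16(offset16)) & 0xFFFFFFFF

print(hex(sign_extend16(0xFFFC)))       # 0xfffffffc, i.e. -4
print(effective_address(16, 0xFFFC))    # 16 + (-4) = 12
```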
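sw sourceReg, offset [destinationReg] (Sign-extends the 16-bit offset in instruction field (15:0) to
32 bits, adds it to the register specified by the rs field, and stores the contents of the rt
register at the memory location given by the calculated address. The opcode for sw is 101011.)
op rs rt offset
6 bits (31-26) 5-bits (25-21) 5-bits (20-16) 16-bits (15-0)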
j target (Shifts the 26-bit offset field (25:0) left by 2 to form a 28-bit value, then prepends the
upper 4 bits of PC+4 to form the 32-bit jump address. The opcode for j is 000010.)
op offset
6 bits (31-26) 26-bits (25-0)
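The jump-address construction described above can be sketched in a few lines. This is an illustrative Python model; the function name `jump_target` and the example values are assumptions.

```python
def jump_target(pc, target26):
    """Build the 32-bit jump address from the PC and the 26-bit offset field."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000           # bits 31-28 of PC+4
    lower28 = (target26 & 0x03FFFFFF) << 2    # 26-bit field shifted left by 2
    return upper4 | lower28

print(hex(jump_target(0x00000008, 3)))  # 0xc
```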
add (R-type format):
op rs rt rd shamt funct
6 bits (31-26) 5-bits (25-21) 5-bits (20-16) 5-bits (15-11) 5-bits (10-6) 6-bits (5-0)
and (R-type format):
op rs rt rd shamt funct
6 bits (31-26) 5-bits (25-21) 5-bits (20-16) 5-bits (15-11) 5-bits (10-6) 6-bits (5-0)
addi destinationReg, sourceReg, immediate (Sign-extends the 16-bit immediate in instruction field
(15:0) to 32 bits, adds it to the register specified by the rs field, and stores the result
in rt. The opcode for addi is 001000.)
op rs rt offset
6 bits (31-26) 5-bits (25-21) 5-bits (20-16) 16-bits (15-0)
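The field boundaries in the formats above can be checked by slicing a 32-bit instruction word. The following Python sketch is illustrative only: the `decode` helper and the example register numbers and immediate are assumptions, while the addi opcode 001000 comes from the text.

```python
def decode(instr):
    """Split a 32-bit instruction word into the fields used by the formats above."""
    return {
        "op":       (instr >> 26) & 0x3F,   # bits 31-26
        "rs":       (instr >> 21) & 0x1F,   # bits 25-21
        "rt":       (instr >> 16) & 0x1F,   # bits 20-16
        "rd":       (instr >> 11) & 0x1F,   # bits 15-11 (R-type only)
        "shamt":    (instr >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "funct":    instr & 0x3F,           # bits 5-0   (R-type only)
        "imm16":    instr & 0xFFFF,         # bits 15-0  (lw, sw, addi)
        "target26": instr & 0x03FFFFFF,     # bits 25-0  (j)
    }

# Hypothetical example: addi with rs = 1, rt = 2, immediate = 5.
word = (0b001000 << 26) | (1 << 21) | (2 << 16) | 5
fields = decode(word)
print(hex(word), fields["op"], fields["rs"], fields["rt"], fields["imm16"])
```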
Assume the register file contains 32 registers (R0-R31), each holding 32 bits of data. On reset,
the PC and all registers in the register file should be initialized to 0. Ensure that R0 is always
zero. Each location in DMEM holds 8 bits of data, so a 32-bit value occupies 4 consecutive DMEM
locations, stored in big-endian format. Let DMEM[10] have the value 8'd96 on reset. Also ensure
that on reset the instruction memory is initialized with the following instructions, starting at address 0:
A partial block-level representation of the 6-stage pipelined processor is shown below. Please note
that in the register file implementation, writes should occur on the positive edge and reads
on the negative edge of the clock. The write operation is enabled by a control signal.
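The byte-addressed, big-endian DMEM layout described earlier can be modeled as follows. This is a Python sketch: the 64-byte memory size and the helper names `store_word` and `load_word` are assumptions, while the reset value DMEM[10] = 96 comes from the text.

```python
def store_word(dmem, addr, value):
    """Write a 32-bit value into four byte locations, most significant byte first."""
    for i in range(4):
        dmem[addr + i] = (value >> (24 - 8 * i)) & 0xFF

def load_word(dmem, addr):
    """Read four consecutive bytes as one big-endian 32-bit value."""
    word = 0
    for i in range(4):
        word = (word << 8) | dmem[addr + i]
    return word

dmem = [0] * 64   # memory size is an assumption for this sketch
dmem[10] = 96     # reset value from the text: DMEM[10] = 8'd96

# DMEM[10] is the third byte of the word that starts at address 8.
print(hex(load_word(dmem, 8)))   # 0x6000

store_word(dmem, 20, 0x12345678)
print(dmem[20:24])               # [18, 52, 86, 120]
```

Under this layout, a load from address 8 sees the reset byte at DMEM[10] as bits 15-8 of the word.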
1. PDF version of this Document with all the Questions below answered with file name as
IDNO_NAME.pdf.
2. Design Verilog Files for all the Sub-modules (instruction fetch, Register file, forwarding
unit).
3. Design Verilog file for the main processor.
Answer:
2. List the control signals used and also the values of control signals for different
instructions in a tabular format as follows:
Answer:
lw    1  0  1  1  0  1  00
sw    0  X  1  X  1  0  00
j     0  X  0  0  0  0  01
add   1  1  0  0  0  0  10
and   1  1  0  0  0  0  10
addi  1  0  1  0  0  0  11
3. In a program, 25% of the instructions are loads, 1/x of which are immediately
followed by an instruction that uses the result, requiring a stall. 10% are stores,
50% are R-type, 10% are branches (1/y of which are taken), and 5% are jumps. What
is the average CPI of this program? If the number of instructions is 10^9
and the clock cycle is 100 ps, how much time does a MIPS single-cycle pipelined
processor take to execute all instructions? Assume the processor always
predicts branch not-taken.
Here x, y, and z are derived from the last 3 digits of your ID No.
If the ID number is 20XXXXXXABCG, then x = (A % 8) + 1, y = ((B + 2) % 8) + 1,
and z = ((C + 3) % 8) + 1.
Answer:
x = 2, y = 8, z = 8
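CPI = 0.25(1 + 1/2) + 0.10(1) + 0.50(1) + 0.10(1 + 1/8) + 0.05(2) = 1.1875, assuming a one-cycle
penalty for each load-use stall, each taken branch, and each jump.
Execution time = 10^9 × 1.1875 × 100 ps = 118.75 ms.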
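The CPI calculation can be checked with a short script. This is a sketch under stated assumptions: the question does not give the penalty cycles, so the one-cycle load-use stall, one-cycle taken-branch penalty, and one-cycle jump penalty used as defaults here are assumptions, and different penalties give a different CPI.

```python
def average_cpi(x, y, load_use_stall=1, branch_penalty=1, jump_penalty=1):
    """Weighted-average CPI for the instruction mix given in the question.

    The three penalty arguments are assumptions, not values from the question.
    """
    mix = {"load": 0.25, "store": 0.10, "rtype": 0.50, "branch": 0.10, "jump": 0.05}
    cpi = {
        "load":   1 + load_use_stall / x,   # 1/x of loads stall
        "store":  1,
        "rtype":  1,
        "branch": 1 + branch_penalty / y,   # 1/y of branches are taken
        "jump":   1 + jump_penalty,
    }
    return sum(mix[kind] * cpi[kind] for kind in mix)

def execution_time_seconds(instructions, cpi, cycle_time_seconds):
    """Total time = instruction count x CPI x clock period."""
    return instructions * cpi * cycle_time_seconds

cpi_value = average_cpi(x=2, y=8)
time_s = execution_time_seconds(1e9, cpi_value, 100e-12)
print(round(cpi_value, 4))       # 1.1875 under the assumed penalties
print(round(time_s * 1e3, 2))    # 118.75 (milliseconds)
```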
Answer:
5. Implement the Instruction Decode block. Copy the image of Verilog code of
the Instruction decode block here
Answer:
6. Determine the conditions that can be used to detect a data hazard.
EX Hazard Detection
1a. EX/MEM.RegWrite = 1
1b. EX/MEM.RegWrite = 1
2a. MEM/WB.RegWrite = 1
2b. MEM/WB.RegWrite = 1
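A common textbook form of these checks combines each RegWrite condition with a destination-register comparison. The Python sketch below models that form for one ALU operand. It is an assumption-laden illustration: the argument names, the choice of RR/EX as the register feeding the execution stage, and the 2-bit select encoding are all hypothetical, and the assignment's own design may need additional conditions.

```python
def forward_select(rr_ex_src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return a 2-bit mux select for one ALU operand (encoding is an assumption).

    0b10: forward the result held in EX/MEM (most recent producer)
    0b01: forward the value held in MEM/WB
    0b00: no hazard, use the value read from the register file
    """
    # EX hazard: the instruction one stage ahead writes the register we read.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rr_ex_src:
        return 0b10
    # MEM hazard: the instruction two stages ahead writes it, and no newer
    # producer in EX/MEM matched above.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rr_ex_src:
        return 0b01
    return 0b00

print(forward_select(3, True, 3, True, 3))   # 2: EX/MEM takes priority
print(forward_select(3, False, 3, True, 3))  # 1: only MEM/WB writes r3
print(forward_select(0, True, 0, True, 0))   # 0: r0 is never forwarded
```

Checking the EX/MEM case first gives priority to the most recent result, and excluding register 0 keeps r0 reading as zero.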
Answer:
8. Implement the forwarding unit and copy the image of Verilog code of
forwarding unit here.
Answer:
9. Implement a complete processor in Verilog (using all the Datapath blocks).
Copy the image of Verilog code of the processor here. (Use comments to
describe your Verilog implementation)
Answer:
10. Test the processor design by generating the appropriate clock and reset. Copy
the image of your testbench code here.
Answer:
11. Verify if the register file is getting updated according to the set of instructions
(mentioned earlier).
Copy verified Register file waveform here (show only the Registers that get
updated, CLK, and RESET):
12. What is the total number of cycles needed to issue the program given above
on the pipelined MIPS processor? What is the CPI of the program?
Answer: 11 cycles.
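For reference, the usual cycle count for a pipeline can be sketched as follows. This Python model assumes the count runs until the last instruction leaves the final stage; the instruction count and stall count are hypothetical inputs and are not taken from the program above.

```python
def total_cycles(num_instructions, num_stages=6, stall_cycles=0):
    """Cycles until the last instruction completes the final pipeline stage.

    The first instruction needs num_stages cycles; each later instruction
    finishes one cycle after the previous one, plus any stall cycles.
    """
    return num_stages + (num_instructions - 1) + stall_cycles

def program_cpi(cycles, num_instructions):
    """Average cycles per instruction for the whole program."""
    return cycles / num_instructions

# Hypothetical input: six instructions, no stalls, on a 6-stage pipeline.
cycles = total_cycles(6)
print(cycles)                              # 11
print(round(program_cpi(cycles, 6), 3))    # 1.833
```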
13. Make a diagram of the pipelined processor executing each instruction in the
program given above. Also show the cycles and instructions in which stalls and
forwarding are required.
Answer:
Unrelated Questions
What were the problems you faced during the implementation of the processor?
Answer: I faced problems while implementing the forwarding unit and the ALU control unit.
Did you implement the processor on your own? If you took help from someone, whose help
did you take? Which part of the design did you take help for?