Computer Architecture Lab Manual

Download as pdf or txt
Download as pdf or txt
You are on page 1of 108
At a glance
Powered by AI
The key takeaways are an introduction to the MARS simulator and how it can be used to assemble, run and debug MIPS programs.

The main components of the MARS IDE are the Edit window for writing code and the Execute window for running and debugging programs. It also has message areas and tabs for switching between windows.

The main features of the MARS Execute window are the assembly code, data segment, registers, labels, breakpoints, execution speed and ability to view values in different formats like decimal and hexadecimal.

1 Introduction to MARS

1.1 Objectives

After completing this lab, you will:


• Get familiar with the MARS simulator
• Learn how to assemble, run, and debug a MIPS program

1.2 The MARS Simulator

MARS, the MIPS Assembly and Runtime Simulator, is an integrated development environment
(IDE) for programming in MIPS assembly language. It allows editing, assembling, debugging and
simulating the execution of MIPS assembly language programs. MARS is written in Java.

There are two main windows in MARS, as shown in Figure 1.1.


• The Edit window: used to create and modify a MIPS program.
• The Execute window: used to run and debug a MIPS program.

To switch between the Edit and the Execute windows, use the tabs at the top.

The Execute window contains three main panes:


1. Text Segment: shows the machine code and related addresses.
2. Data Segment: shows memory locations that hold variables in the data segment.
3. Labels: shows addresses of labeled items, i.e. variables and jump endpoints.

There are two tabbed message areas at the bottom of Figure 1.1:
1. The Mars Messages tab: Used for messages such as assembly or runtime errors and
informational messages. You can click on assembly error messages to select the
corresponding line of code in the editor.
2. The Run I/O tab: Used at runtime for displaying console output and entering console input as
program execution progresses.

1: Introduction to MARS Page 1


Figure 1.1: The MARS Integrated Development Environment (IDE)

Figure 1.2 shows the MARS Execute window’s panes, and emphasizes the following features:

1. The Execute window’s tab.

2. Assembly code displayed with addresses and machine code.

3. Values stored in the data segment. These are directly editable.

4. Controls for navigating the data memory area. Allows switching to view the stack segment.

5. Switching between decimal and hexadecimal addresses and values in memory and registers.

6. Labels and their corresponding memory addresses.

7. Values stored in registers. These are directly editable.

8. Checkboxes used to setup breakpoints for each MIPS instruction. Useful in debugging.

9. Execution speed selection. Useful in debugging.

1: Introduction to MARS Page 2


Figure 1.2: The MARS Execute Window

At all times, the MIPS register window appears on the right-hand side of the screen, even when you
are editing and not running a program. While writing a program, this serves as a useful reference for
register names and their use. Move the mouse over the register name to see the tool tips.

There are three register tabs:

The Register File: integer registers $0 through $31, HI, LO, and the Program Counter PC.

Coprocessor 0: exceptions, interrupts, and status codes.

Coprocessor 1: floating point registers.

1.3 Assemble, Run, and Debug a MIPS Program


To assemble the file currently in the Edit tab, select Assemble from the Run menu, or use the
Assemble toolbar icon.

If there are syntax errors in the program, they will appear in the Mars Messages tab at the bottom of
the MARS screen. Each error message contains the line and column where the error occurred.

Once a MIPS program assembles successfully, the registers are initialized, and the Text Segment
and the Data Segment are filled, as shown in Figure 1.3.

1: Introduction to MARS Page 3


Figure 1.3: MARS screen after running the Assemble command

After running the Assemble command, you can now execute the program. The Run menu and the
toolbar contain the following execution options:

Menu Item Icon Action

Run > Assemble Assemble the program.

Run > Go Run the program to completion, or until the next breakpoint.

Reset the program and simulator to initial values. Allows


Run > Reset
restarting program execution.

Single-step execution: execute one instruction at a time. Allows


Run > Step debugging the program by inspecting register and memory after
executing each single instruction.

Run > Backstep Single-step backwards: “unexecute” the last executed instruction.

The Run Speed Slider allows running the program at full speed or
slowing it down so you can watch the execution. Affects normal
execution only, not single-step execution.

1: Introduction to MARS Page 4


You can set a breakpoint at any instruction by checking the checkbox in front of the instruction in
the text segment pane.

During execution, the instruction being executed is highlighted in yellow, and the register that was
last modified is highlighted in green. Also, the variable that was last updated in the data segment is
highlighted in blue. It’s usually only possible to see the highlighting when you are stepping or
running at less than full speed.

For more details about the MARS simulator, refer to the MARS documentation at the following
link: http://courses.missouristate.edu/KenVollmar/MARS/

1.4 In-Lab Tasks

1. Test a simple MIPS program. Consider the following program shown below:

a) Type the program shown in the Figure above.

b) Find out how to show and hide line numbers.

c) Assemble and run the program.

d) What output does the program produce? and where does it appear?

2. Explore the MARS simulator:

a) Download and assemble the Fibonacci.asm program from the MARS website.

b) Identify the locations and values of the initialized data.

c) Toggle the display format between decimal and hexadecimal.

d) Run the program at a speed of 3 instructions per second or less.

e) Single-step through the program and watch how register and memory values change.

1: Introduction to MARS Page 5


f) Observe the output of the program in the Run I/O display window.

g) Set a breakpoint at the first instruction that prints results. What is the address of this
instruction?

h) Run the program at full speed and watch how it stops at the breakpoint.

i) Change the line:

space: .asciiz " " # space to insert between numbers

to:

space: .asciiz "\n" # space to insert between numbers

Run the program again. What do you notice?

1: Introduction to MARS Page 6


Introduction to MIPS
2 Assembly Programming
2.1 Objectives

After completing this lab, you will:


• Learn about the MIPS assembly language
• Write simple MIPS programs
• Use system calls for simple input and output

2.2 MIPS Assembly Language Program Template

A MIPS assembly language program template is shown in Figure 2.1.

# Title:
# Author:
# Date:
# Description:
# Input:
# Output:
################### Data segment #####################
.data
. . .
################### Code segment #####################
.text
.globl main
main: # main function entry
. . .
li $v0, 10
syscall # system call to exit program

Figure 2.1: MIPS Assembly Language Program Template

2: Introduction to MIPS Assembly Programming Page 1


There are three types of statements that can be used in assembly language, where each statement
appears on a separate line:
1. Assembler directives: These provide information to the assembler while translating a program.
Directives are used to define segments and allocate space for variables in memory. An
assembler directive always starts with a dot. A typical MIPS assembly language program uses
the following directives:

.data Defines the data segment of the program, containing the program’s variables.

.text Defines the code segment of the program, containing the instructions.

.globl Defines a symbol as global that can be referenced from other files.

2. Executable Instructions: These generate machine code for the processor to execute at runtime.
Instructions tell the processor what to do.
3. Pseudo-Instructions and Macros: Translated by the assembler into real instructions. These
simplify the programmer task.

In addition there are comments. Comments are very important for programmers, but ignored by the
assembler. A comment begins with the # symbol and terminates at the end of the line. Comments
can appear at the beginning of a line, or after an instruction. They explain the program purpose,
when it was written, revised, and by whom. They explain the data and registers used in the program,
input, output, the instruction sequence, and algorithms used.

2.3 The Edit-Assemble-Link-Run Cycle

Before you can run a MIPS program, you must convert the assembly language code into an
executable form. This involves two steps:
1. Assemble: translate the MIPS assembly language code into a binary object file. This is done by
the assembler. If there is more than one assembly language file, then each should be assembled
separately.
2. Link: combine all the object files together (if there is more than one) as well as with libraries.
This is done by the linker. The linker checks if there are any calls to functions in libraries. The
result is an executable file.
Figure 2.2 summarizes the Edit-Assemble-Link-Run cycle of the program development process. If a
program is written in assembly language, the assembler detects any syntax errors and will report
them to the programmer. Therefore, you should edit your program and assemble it again if there any
syntax errors.
It is typical that the first executable version of your program to have some runtime errors. These
errors are not detected by the assembler but occur when you are running your program. For
example, your program might compute erroneous results. Therefore, you should debug your
program to identify the errors at runtime. You can run your program with various inputs and under
different conditions to verify that it is working correctly. You can use the slow execution mode in

2: Introduction to MIPS Assembly Programming Page 2


MARS, the single-step feature, or breakpoints to identify the sources of the errors. Single-step
execution is a standard and essential feature in a debugger. It allows inspecting the effect of each
instruction on CPU registers and main memory.

Figure 2.2: The Edit-Assemble-Link-Run Cycle

2.4 MIPS Instructions, Registers, Format and Syntax

All MIPS instructions are 32-bit wide and occupy 4 bytes in memory. The address of a MIPS
instruction in memory is always a multiple of 4 bytes. There are three basic MIPS instruction
formats: Register (R-Type) format, Immediate (I-Type) format, and Jump (or J-Type) format as
shown in Figure 2.3.

All instructions have a 6-bit opcode that defines the format and sometimes the operation of an
instruction. The R-type format has two source register fields: Rs and Rt, and one destination
register field Rd. All register fields are 5-bit long and address 32 general-purpose registers. The sa
field is used as the shift amount for shift instructions and the funct field defines the ALU function
for R-type instructions.

The I-type format has two register fields only: Rs and Rt, where Rs is always a source register,
while Rt can be a destination register or a second source depending on the opcode. The 16-bit

2: Introduction to MIPS Assembly Programming Page 3


immediate field is used as a constant in arithmetic instructions, or as an offset in load, store, and
branch instructions.

The J-type format has no register field. The 26-bit Immediate field is used as an address in jump
and function call instructions.

R-Type Format
Op6 Rs5 Rt5 Rd5 sa5 funct6

I-Type Format
Op6 Rs5 Rt5 Immediate16

J-Type Format
Op6 Immediate26

Figure 2.3: MIPS Instruction Formats

The MIPS architecture defines 32 general-purpose registers, numbered from $0 to $31. The $ sign
is used to refer to a register. To simplify software development, the assembler can also refer to
registers by name as shown in Table 2.1. The assembler converts a register name to its
corresponding number.

Register Name Number Register Usage by Software


$zero $0 Always zero, forced by hardware
$at $1 Assembler Temporary register, reserved for assembler use
$v0 - $v1 $2 - $3 Results of a function
$a0 - $a3 $4 - $7 Arguments of a function
$t0 - $t7 $8 - $15 Registers for storing temporary values
$s0 - $s7 $16 - $23 Registers that should be saved across function calls
$t8 - $t9 $24 - $25 Registers for storing more temporary values
$k0 - $k1 $26 - $27 Registers reserved for the OS kernel use
$gp $28 Global Pointer register that points to global data
$sp $29 Stack Pointer register that points to top of stack
$fp $30 Frame Pointer register that points to stack frame
$ra $31 Return Address register used to return from a function call

Table 2.1: General-Purpose Registers and their Usage

2: Introduction to MIPS Assembly Programming Page 4


The general assembly language syntax of a MIPS instruction is:

[label:] mnemonic [operands] [# comment]

The label is optional. It marks the memory address of the instruction. It must have a colon. In
addition, a label can be used for referring to the address of a variable in memory.

The mnemonic specifies the operation: add, sub, etc.

The operands specify the data required by the instruction. Different instructions have different
number of operands. Operands can be registers, memory variables, or constants. Most arithmetic
and logical instructions have three operands.

An example of a MIPS instruction is shown below. This example uses the addiu to increment the
$t0 register:

L1: addiu $t0, $t0, 1 # increment $t0

To be able to write programs, a basic set of instructions is needed. Only few instructions are described
in the following tables. Table 2.2 lists the basic arithmetic instructions and Table 2.3 lists basic
control instructions.

Instruction Meaning
add Rd, Rs, Rt Rd = Rs + Rt. Overflow causes an exception.
sub Rd, Rs, Rt Rd = Rs – Rt. Overflow causes an exception.
addi Rt, Rs, Imm Rt = Rs + Imm (16-bit constant). Overflow causes an exception.
li Rt, Imm Rt = Imm (pseudo-instruction).
la Rt, var Rt = address of var (pseudo-instruction).
move Rd, Rs Rd = Rs (pseudo-instruction).

Table 2.2: Basic Arithmetic Instructions.

Instruction Meaning
beq Rs, Rt, label if (Rs == Rt) branch to label.
bne Rs, Rt, label if (Rs != Rt) branch to label.
j label Jump to label.

Table 2.3: Basic Control Instructions.

2: Introduction to MIPS Assembly Programming Page 5


2.5 System Calls
Programs do input and output using system calls. On a real-system, the operating system provides
system call services to application programs. The MIPS architecture provides a special syscall
instruction that generates a system call exception, which is handled by the operating system.

System calls are operating-system specific. Each operating system provides its own set of system
calls. Because MARS is a simulator, there is no operating system involved. The MARS simulator
handles the syscall exception and provides system services to programs. Table 2.1 shows a small
set of services provided by MARS for doing basic I/O.

Before using the syscall instruction, you should load the service number into register $v0, and load
the arguments, if any, into registers $a0, $a1, etc. After issuing the syscall instruction, you should
retrieve return values, if any, from register $v0.

Service Code in $v0 Arguments Result

Print Integer 1 $a0 = integer to print

Print String 4 $a0 = address of null-terminated string

Read Integer 5 $v0 = integer read

$a0 = address of input buffer


Read String 8
$a1 = maximum characters to read

Exit program 10 Terminates program

Print char 11 $a0 = character to print

Read char 12 $v0 = character read

Table 2.4: Basic System Call Services Provided by MARS.

Now, we are ready to write a MIPS assembly language program. A simple program that asks the user
to enter an integer value and then displays the value of this integer is shown in Figure 2.4.

Five system calls are used. The first system call prints string str1. The second system call reads an
input integer. The third system call prints str2. The fourth system call prints the integer value that was
input by the user. The fifth system call exits the program.

2: Introduction to MIPS Assembly Programming Page 6


Figure 2.4: MIPS Program that uses System Calls

2.6 In-Lab Tasks

1. Modify the program shown in Figure 2.4. Ask the user to enter an integer value, and then print
the result of doubling that number. Use the add instruction.
2. Modify again the program shown in Figure 2.4. Ask the user whether he wants to repeat the
program: "\nRepeat [y/n]? ". Use service code 12 to read a character and the branch
instruction to repeat the main function if the user input is character 'y'.
3. Write a MIPS program that asks the user to input his name and then prints "Hello ", followed
by the name entered by the user.

4. Write a MIPS program that executes the statement: s = (a + b) – (c + 101), where a, b, and c are
user provided integer inputs, and s is computed and printed as an output. Answer the following:
a. Suppose the user enters a = 5, b = 10, and c = -30, what is the expected value of s?
b. Which instruction in your program computed the value of s and which register is used?
c. What is the address of this instruction in memory?
d. Put a breakpoint at this instruction and write the value of the register used for computing s in
decimal and hexadecimal.
5. Write a MIPS program that inputs two integer values. The program should output equal if the
two integers are equal. Otherwise, it should output not equal. Use the branch instruction to
check for equality.

2: Introduction to MIPS Assembly Programming Page 7


3 Integer Arithmetic

3.1 Objectives

After completing this lab, you will:


• Get familiar with the basic MIPS integer arithmetic and logic instructions including:
o Integer addition and subtraction instructions
o Bitwise logic instructions
o Shift instructions
• Learn some useful applications of these instructions.

3.2 Integer Add/Subtract Instructions

The MIPS add/subtract instructions are shown in Table 3.1 below. The R-type add/sub instructions
have three registers where the first register is the destination register while the other two registers
are the two registers to be added. Similarly the I-type add/sub instructions have the first register as
the destination register, while the second register and the constant are the operands to be either
added or subtracted.

Instruction Meaning
add $s1, $s2, $s3 $s1 = $s2 + $s3
addu $s1, $s2, $s3 $s1 = $s2 + $s3
sub $s1, $s2, $s3 $s1 = $s2 – $s3
subu $s1, $s2, $s3 $s1 = $s2 – $s3
addi $s1, $s2, 10 $s1 = $s2 + 10
addiu $s1, $s2, 10 $s1 = $s2 + 10

Table 3.1: Add/Subtract Instructions

The difference between add/sub and addu/subu instructions is that in case of overflow
occurrence, the add/sub instructions will cause an arithmetic exception and the result will not be
written to the destination register. However, for the instructions addu/subu, overflow occurrence
is ignored.

3: Integer Arithmetic Page 1


As an example of using the add/sub instructions, consider the translation of the expression:
f = (g+h) – (i+j)
Assuming that f, g, h, i, and j are allocated registers $s0 thru $s4, the following assembly code
performs the translation:
addu $t0, $s1, $s2 # $t0 = g + h

addu $t1, $s3, $s4 # $t1 = i + j

subu $s0, $t0, $t1 # $s0 = f = (g+h)–(i+j)

To illustrate the use of the addiu instruction with constants, let us assume that variables a, b, and c
are allocated in registers $s0, $s1, $s2. Then,

a = b + 5 is translated as: addiu $s0, $s1, 5


c = b – 1 is translated as: addiu $s2, $s1, -1

3.3 Logical Bitwise Instructions

The MIPS logical instructions are given in Table 3.2. These include the: and, or, nor and xor
instructions. The operands for these instructions follow the same convention as the add/sub
instructions. The immediate value for the andi, ori, and xori logical instructions is treated as
unsigned constant, while it is treated as a signed constant for the addi and addiu instructions.

Instruction Meaning
and $s1, $s2, $s3 $s1 = $s2 & $s3
or $s1, $s2, $s3 $s1 = $s2 | $s3
xor $s1, $s2, $s3 $s1 = $s2 ^ $s3
nor $s1, $s2, $s3 $s1 = ~($s2|$s3)
andi $s1, $s2, 10 $s1 = $s2 & 10
ori $s1, $s2, 10 $s1 = $s2 | 10
xori $s1, $s2, 10 $s1 = $s2 ^ 10

Table 3.2: Logical Instructions

The truth tables of the and, or, xor and nor logical operations are given below:

3: Integer Arithmetic Page 2


The and instruction is used to clear bits: x and 0 is equal to 0. The or instruction is used to set
bits: x or 1 is equal to 1. The xor instruction is used to toggle bits: x xor 1 is equal to not x. The
nor instruction can be used as a not since nor $s1,$s2,$s2 is equivalent to not $s1,$s2.

As an example of these instructions, assume that $s1 = 0xabcd1234 and $s2 = 0xffff0000.
Then, the following logical instructions produce the shown resulting values in registers $s3 to $s6:
and $s3,$s1,$s2 # $s3 = 0xabcd0000

or $s4,$s1,$s2 # $s4 = 0xffff1234

xor $s5,$s1,$s2 # $s5 = 0x54321234

nor $s6,$s1,$s2 # $s6 = 0x0000edcb

The sample program to run this code is given below and the resulting register content after
executing the program is shown in Figure 3.1.

.text
.globl main
main: # main program
li $s1, 0xabcd1234 # Pseudo instruction to initialize a register
li $s2, 0xffff0000
and $s3,$s1,$s2 # $s3 = 0xabcd0000
or $s4,$s1,$s2 # $s4 = 0xffff1234
xor $s5,$s1,$s2 # $s5 = 0x54321234
nor $s6,$s1,$s2 # $s6 = 0x0000edcb

li $v0, 10 # Exit program


syscall

Arithmetic and logic instructions have many useful applications. For example, to convert a
character in register $s0 from lower case (i.e. 'a' to 'z') to upper case (i.e. 'A' to 'Z'), we could
use any of the following instructions:

subi $s0, $s0, 0x20 # ASCII code of 'a' = 0x61, of 'A' = 0x41

andi $s0, $s0, 0xfd

Similarly, to convert a character in register $s0 from upper case to lower case, we could use any of
the following instructions:

addi $s0, $s0, 0x20

ori $s0, $s0, 0x20

3: Integer Arithmetic Page 3


Figure 3.1: Register content after sample code execution.

To initialize the content of register $s0 by a 16-bit constant k (i.e., having a value in the range 0 to
215-1), we can use any of the following instructions:

addi $s0, $0, k

ori $s0, $0, k

However, to initialize a register with a 32-bit constant, we need to use the lui instruction.

Suppose that we want to initialize $s1 with the constant 0xAC5165D9 (32-bit constant), then we
can use the following two instructions:

This sequence of instructions is generated by the assembler when we use the pseudo instruction:

li $s1, 0xAC5165D9

3: Integer Arithmetic Page 4


3.4 Shift Instructions

The MIPS shift instructions are given in Table 3.3. The first operand of the shift instructions is the
destination register, the second operand is the register to be shifted while the third operand specifies
the amount of shift. The amount of shift can be specified as a constant value or it can be stored in a
register. For the instructions sll, srl, sra, the shift amount is a 5-bit constant while for the
instructions sllv, srlv, srav, the shift amount is variable and is stored in a register.

Instruction Meaning
sll $s1,$s2,10 $s1 = $s2 << 10
srl $s1,$s2,10 $s1 = $s2>>>10
sra $s1,$s2,10 $s1 = $s2 >> 10
sllv $s1,$s2,$s3 $s1 = $s2 << $s3
srlv $s1,$s2,$s3 $s1 = $s2>>>$s3
srav $s1,$s2,$s3 $s1 = $s2 >> $s3

Table 3.3: Shift Instructions.

Shifting is to move all the bits in a register left or right. sll/srl mean shift left/right logical while
sra means shift right arithmetic for which the sign-bit (rather than 0) is shifted from the left as
illustrated in Figure 3.2.

As an example, let us assume that $s2 = 0xabcd1234 and $s3 = 16. Then, the following shift
instructions produce the shown values in $s1.

sll $s1,$s2,8 $s1 = $s2<<8 $s1 = 0xcd123400

sra $s1,$s2,4 $s1 = $s2>>4 $s1 = 0xfabcd123

srlv $s1,$s2,$s3 $s1 = $s2>>>$s3 $s1 = 0x0000abcd

Figure 3.2: Illustration of shift instructions.

3: Integer Arithmetic Page 5


We can use shift instructions along with either addition or subtraction instructions to multiply the
content of a register by a constant. Shift-left (sll) instruction can be used to perform multiplication
when the multiplier is a power of 2. You can factor any binary number into powers of 2. For
example, to multiply $s1 by 36, factor 36 into (4 + 32) and use distributive property of
multiplication $s2 = $s1*36 = $s1*(4 + 32) = $s1*4 + $s1*32. Thus, this can be
achieved by the following instructions:

sll $t0, $s1, 2 # $t0 = $s1 * 4

sll $t1, $s1, 5 # $t1 = $s1 * 32

addu $s2, $t0, $t1 # $s2 = $s1 * 36

As another example, let us multiply the content of $s1 by 31. We can do that using the following
instructions noting that 31=32-1:

sll $s2, $s1, 5 # $s2 = $s1 * 32

subu $s2, $s2, $s1 # $s2 = $s1 * 31

We can also use the shift right instructions (srl and sra) to divide a number by a power of 2
constant. Shifting register $s0 right by n bits divides its content by 2n. For example, to divide an
unsigned number in register $s0 by 8, we use the instruction srl $s0, 3. However, to divide a
signed number in register $s0 by 8, we use the instruction sra $s0, 3.

3.5 In-Lab Tasks

1. Write a program to ask the user to enter two integers A and B and then display the result of
computing the expression: A + 2B - 5.
2. Assume that $s1 = 0x12345678 and $s2 = 0xffff9a00. Determine the content of registers
$s3 to $s6 after executing the following instructions:

and $s3,$s1,$s2 # $s3 =

or $s4,$s1,$s2 # $s4 =

xor $s5,$s1,$s2 # $s5 =

nor $s6,$s1,$s2 # $s6 =

Write a program to execute these instructions and verify the content of registers $s3 to $s6.

3: Integer Arithmetic Page 6


3. Assume that $s1 = 0x87654321. Determine the content of registers $s2 to $s4 after executing
the following instructions:

sll $s2,$s1, 16 # $s2 =

srl $s3,$s1, 8 # $s3 =

sra $s4,$s1, 12 # $s4 =

Write a program to execute these instructions and verify the content of registers $s2 to $s4.

4. Write a program that asks the user to enter an alphabetic character (either lower or upper case)
and change the case of the character from lower to upper and from upper to lower and display it.

5. Write a program that asks the user to enter and integer number and read it. Then ask him to
enter a bit position (between 0 and 31) and display the value of that bit.

6. Write a program that asks the user to enter a signed number and read it. Then display the
content of multiplying this number by 24.5.

7. Write a program that asks the user to enter an unsigned number and read it. Then swap the bits
at odd positions with those at even positions and display the resulting number. For example, if
the user enters the number 9, which has binary representation of 1001, then bit 0 is swapped
with bit 1, and bit 2 is swapped with bit 3, resulting in the binary number 0110. Thus, the
program should display 6.

3: Integer Arithmetic Page 7


4 Flow Control

4.1 Objectives

After completing this lab, you will:


• Get familiar with MIPS Jump and Branch instructions
• Learn about pseudo instructions in MIPS
• Learn how to translate high-level flow control constructs (if-then-else, for loop, while
loop) to MIPS code

4.2 MIPS Jump and Branch Instructions

Like all processors, MIPS has instructions for implementing unconditional and conditional jumps.
The MIPS Jump and Branch instructions are shown in Table 4.1.

Table 4.1: MIPS Jump and Branch Instructions.

For unconditional jump, the instruction j label is used where label is the address of the target
instruction as shown below:

j label # jump to label

. . .

label:

4: Flow Control Page 1


There are two MIPS conditional branch instructions that branch based on the condition whether two
registers are equal or not as follows:

beq Rs, Rt, label # branch to label if (Rs == Rt)

bne Rs, Rt, label # branch to label if (Rs != Rt)

Four additional MIPS instructions are provided based on comparing the content of a register with 0
as follows:

bltz Rs, label # branch to label if (Rs < 0)

bgtz Rs, label # branch to label if (Rs > 0)

blez Rs, label # branch to label if (Rs <= 0)

bgez Rs, label # branch to label if (Rs >= 0)

Note that MIPS does not provide the instructions beqz and bnez as these can be implemented
using the beq and bne instructions with register $0 used as the second operand.

MIPS also provides four set on less than instructions as follows:

slt rd, rs, rt # if (rs < rt) rd = 1 else rd = 0

sltu rd, rs, rt # unsigned <

slti rt, rs, im16 # if (rs < im16) rt = 1 else rt = 0

sltiu rt, rs, im16 # unsigned <

Note that the instructions slt and slti are used for signed comparison while instructions sltu
and sltiu are used for unsigned comparison.

For example, assume that $s0 = 1 and $s1 = -1 = 0xffffffff, then the following two
instructions produce different results as shown below:

slt $t0, $s0, $s1 # results in $t0 = 0


sltu $t0, $s0, $s1 # results in $t0 = 1

4.3 Pseudo Instructions

Pseudo instructions are instructions introduced by an assembler as if they were real instructions. We
have seen an example of a pseudo instruction before, which is the li instruction. Pseudo
instructions are useful as they facilitate programming in assembly language.

For example, the MIPS processor does not have the following useful conditional branch comparison
instructions:

4: Flow Control Page 2


blt, bltu branch if less than (signed/unsigned)

ble, bleu branch if less or equal (signed/unsigned)

bgt, bgtu branch if greater than (signed/unsigned)

bge, bgeu branch if greater or equal (signed/unsigned)

The reason for not implementing these instructions as part of the MIPS instruction set is that they
can be easily implemented based on a set of two instructions.

For example, the instruction blt $s0, $s1, label can be implemented using the following
sequence of two instructions:

slt $at, $s0, $s1

bne $at, $zero, label

Similarly, the instruction ble $s2, $s3, label can be implemented using the following
sequence of two instructions:

slt $at, $s3, $s2

beq $at, $zero, label

Table 4.2 shows more examples of pseudo instructions. Note that the assembler temporary register
$at=$1 is reserved for its own use.

Table 4.2: Examples of pseudo instructions.

4.4 Translating High-Level Flow Control Constructs

We can translate any high-level flow construct into assembly language using the jump, branch and
set-less-than instructions. For example, let us consider the following if statement:

if (a == b) c = d + e; else c = d – e;

4: Flow Control Page 3


Let us assume that variables a, b, c, d, e are stored in registers $s0 thru $s4 respectively. The
following assembly code implements this IF statement:

bne $s0, $s1, else

addu $s2, $s3, $s4

j exit

else: subu $s2, $s3, $s4

exit: . . .

We can also implement an IF statement with a compound condition involving logical AND
operation. For example, let us consider implementing the following IF statement:

if (($s1 > 0) && ($s2 < 0)) {$s3++;}

The IF statement is implemented efficiently using the following assembly code which uses the fall
through concept which skips the execution of the instruction if the first condition is false otherwise
it continues the execution:

blez $s1, next # skip if false

bgez $s2, next # skip if false

addiu $s3, $s3, 1 # both are true

next:

Similarly, we can translate an IF statement with a compound condition involving logical OR


operation. For example, let us consider implementing the following IF statement:

if (($sl > $s2) || ($s2 > $s3)) {$s4 = 1;}

The IF statement is implemented efficiently using the following assembly code which checks the
first condition and if it is true, it skips the second condition:

bgt $s1, $s2, L1 # yes, execute if part

ble $s2, $s3, next # no: skip if part


L1: li $s4, 1 # set $s4 to 1

next:

We can also implement all types of loops. Let us consider implementing the following for loop:

for (i=0; i<n; i++) {

loop body

4: Flow Control Page 4


Let us assume that variable i is stored in register $s0 and n is stored in register $s1. Then, the for
loop is implemented using the following assembly code:

li $s0, 0 # i = 0

ForLoop:

bge $s0, $s1, EndFor

loop body

addi $s0, #s0, 1 # i++

j ForLoop

EndFor:

Consider the implementing of the following while loop:

i=0;

while (i<n) {

loop body

i++;

We can note that this while loop has identical behavior to the for loop and hence its assembly
code will be identical.

Finally, let us consider implementing the following do-while loop:

i=0;

do {

loop body

i++;

} while (i<n)

The do-while loop can be translated using the following assembly code:

li $s0, 0 # i = 0

WhileLoop:

loop body

addi $s0, $s0, 1 # i++

blt $s0, $s1, WhileLoop

4: Flow Control Page 5


4.5 In-Lab Tasks

1. Write a program that asks the user to enter an integer and then displays the number of 1's in the
binary representation of that integer. For example, if the user enters 9, then the program should
display 2.
2. Write a program that asks the user to enter two integers: n1 and n2 and prints the sum of all
numbers from n1 to n2. For example, if the user enters n1=3 and n2=7, then the program
should display the sum as 25.
3. Write a program that asks the user to enter an integer and then display the hexadecimal
representation of that integer.
4. The Fibonacci sequence are the numbers in the following integer sequence: 0, 1, 1, 2, 3,
5, 8, 13, 21, 34, 55, 89, 144, ...
The first two numbers are 0 and 1 and each subsequent number is the sum of the previous two.
Write a program that asks the user to enter a positive integer number n and then prints the nth
number in the Fibonacci sequence. The following algorithm can be used:
Input: n positive integer
Output: nth Fibonacci number
Fib0 = 0 Fib1 = 1
for (i=2; i <= n; i++) do
temp = fib0
fib0 = fib1
fib1 = temp + fib1
if (n > 0) fib = fib1
else fib = 0

4: Flow Control Page 6


4.6 Bonus Problem

5. One method for computing the greatest common divisor of two positive numbers is the binary
gcd method, which uses only subtraction and division by 2. The algorithm of the binary gcd is
outlined below:

Input: a, b positive integers


Output: g and d such that g is odd and gcd(a, b) = g×2d
d = 0
while (a and b are both even) {
a = a/2
b = b/2
d = d + 1
}
while (a != b) {
if (a is even) a = a/2
else if (b is even) b = b/2
else if (a > b) a = (a – b)/2
else b = (b – a)/2
}
g = a

Write a program that asks the user to enter two positive numbers a and b and outputs the
greatest common divisor of the two numbers by implementing the given algorithm. If the user
enters a=48 and b=18, your program should output the gcd as 6.

4: Flow Control Page 7


5 Arrays and Files

5.1 Objectives

After completing this lab, you will:


• Define and initialize arrays statically in the data segment
• Allocate memory dynamically on the heap
• Compute the memory addresses of array elements
• Write loops in MIPS assembly to traverse arrays
• Write system calls to open, read from, and write to files

5.2 Defining and Initializing Arrays Statically in the Data Segment

Unlike high-level programming languages, assembly language has no special notion for an array.
An array is just a block of memory. In fact, all data structures and objects that exist in a high-level
programming language are simply blocks of memory. The block of memory can be allocated
statically or dynamically, as will be explained shortly.
An array is a homogeneous data structure. It has the following properties:
1. All array elements must be of the same type and size.
2. Once an array is allocated, its size becomes fixed and cannot be changed.
3. The base address of an array is the address of the first element in the array.
4. The address of an element can be computed from the base address and the element index.

An array can be allocated and initialized statically in the data segment. This requires:
1. A label: for the array name.
2. A .type directive for the type and size of each array element.
3. A list of initial values, or a count of the number of elements
A data definition statement allocates memory in the data segment. It has the following syntax:
label: .type value [, value] . . .
Examples of data definition statements are shown below:

5: Arrays and Files Page 1


.data
arr1: .half 5, -1 # array of 2 half words initialized to 5, -1
arr2: .word 1:10 # array of 10 words, all initialized to 1
arr3: .space 20 # array of 20 bytes, uninitialized
str1: .ascii "This is a string"
str2: .asciiz "Null-terminated string"
In the above example, arr1 is an array of 2 half words, as indicated by the .half directive,
initialized to the values 5 and -1. arr2 is an array of 10 words, as indicated by the .word
directive, all initialized to 1. The 1:10 notation indicates that the value 1 is repeated 10 times.
arr3 is an array of 20 bytes. The .space directive allocates bytes without initializing them in
memory. The .ascii directive allocates memory for a string, which is an array of bytes. The
.asciiz directive does the same thing, but adds a NULL byte at the end of the string. In addition
to the above, the .byte directive is used to define bytes, the .float and .double directives are
used to define floating-point numbers.
Every program has three segments when it is loaded into memory by the operating system, as
shown in Figure 5.1. There is the text segment where the machine language instructions are stored,
the data segment where constants, variables and arrays are stored, and the stack segment that
provides an area that can be allocated and freed by functions. Every segment starts at a specific
address in memory. The data segment is divided into a static area and a dynamic area. The
dynamic area of the data segment is called the heap. Data definition statements allocate space for
variables and arrays in the static area.
0x7fffffff
Stack Segment

Heap Area
0x10040000 Data Segment
0x10000000 Static Area

Text Segment
0x00400000

0x00000000 Reserved

Figure 5.1: The text, data, and stack segments of a program


If arrays are allocated in the static area, then one might ask: what is the address of each array? To
answer this question, the assembler constructs a symbol table that maps each label to a fixed
address. To see this table in the MARS simulator, select “Show Labels Window (symbol table)”
from the Settings menu. This will display the Labels window as shown in Figure 5.2. From this
figure, one can obtain the address of arr1 (0x10010000), of arr2 (0x10010004), of arr3
(0x1001002c), etc.
The la pseudo-instruction loads the address of a label into a register. For example, la $t0, arr3
loads the address of arr3 (0x1001002c) into register $t0. This is essential because the
programmer needs the address of an array to process its elements in memory.

5: Arrays and Files Page 2


You can watch the values in the data segment window as shown in Figure 5.3. To watch ASCII
characters, click on the ASCII box in the data segment window as shown in Figure 5.4. Notice that
characters appear in reverse order within a word in Figure 5.4. If the lw instruction is used to load
four ASCII characters into a register, then the first character is loaded into the least significant byte
of a register. This is known as little-endian byte ordering. On the other hand, big-endian orders the
bytes within a word from the most-significant to the least-significant byte.

Figure 5.2: Labels (symbol table) window under MARS

Figure 5.3: Watching Values in hexadecimal in the Data Segment

Figure 5.4: Watching ASCII Characters in the Data Segment

5.3 Allocating Memory Dynamically on the Heap


Defining data in the static area of the data segment might not be convenient. Sometimes, a program
wants to allocate memory dynamically at runtime. One of the functions of the operating system is to
manage memory. During runtime, a program can make requests to the operating system to allocate
additional memory dynamically on the heap. The heap area is a part of the data segment (Figure
5.1), that can grow dynamically at runtime. The program makes a system call ($v0 = 9) to allocate

5: Arrays and Files Page 3


memory on the heap, where $a0 is the number of bytes to allocate. The system call returns the
address of the allocated memory in $v0. The following program allocates two blocks on the heap:
.text
. . .
li $a0, 100 # $a0 = number of bytes to allocate
li $v0, 9 # system call 9
syscall # allocate 100 bytes on the heap
move $t0, $v0 # $t0 = address of first block
li $a0, 200 # $a0 = number of bytes to allocate
li $v0, 9 # system call 9
syscall # allocate 200 bytes on the heap
move $t1, $v0 # $t1 = address of second block
. . .

5.4 Computing the Addresses of Array Elements


In a high-level programming language, arrays are indexed. Typically, array[0] is the first element
in the array, and array[i] is the element at index i. Because all array elements have the same
size, then address of array[i], denoted as &array[i] = &array + i × element_size.
In the above example, arr2 is defined as an array of words (.word directive). Since each word is 4
bytes, then &arr2[i] = &arr2 + i×4. The & is the address operator. Since the address of arr2
is given as 0x10010004 in Figure 6.2, then: &arr2[i] = 0x10010004 + i×4.
A two-dimensional array is stored linearly in memory, similar to a one-dimensional array. To define
matrix[Rows][Cols], one must allocate Rows × Cols elements in memory. For example, one
can define a matrix of 10 rows × 20 columns word elements, all initialized to zero, as follows:
matrix: .word 0:200 # 10 by 20 word elements initialized to 0
In most programming languages, a two-dimensional array is stored row-wise in memory: row 0,
row 1, row 2, … etc. This is known as row-major order. Then, address of matrix[i][j],
denoted as &matrix[i][j] becomes:
& matrix[i][j] = &matrix + (i×Cols + j) × element_size
If the number of columns is 20 and the element size is 4 bytes (.word), then:
&matrix[i][j] = &matrix + (i×20 + j) × 4

For example, to translate matrix[1][5] = 73 into MIPS assembly language, one must compute:
&matrix[1][5] = &matrix + (1×20+5)×4 = &matrix + 100.

la $t0, matrix # load address: $t0 = &matrix


li $t1, 73 # $t1 = 73
sw $t1, 100($t0) # matrix[1][5] = 73

5: Arrays and Files Page 4


0 1 … j … 19
0 Each row has 20 word elements
1

i &matrix[i][j] = &matrix + (i×20 + j) × 4



9

Unlike a high-level programming language, address calculation is essential when programming in


assembly language. One must calculate the addresses of array elements precisely when processing
arrays in assembly language.

5.5 Writing Loops to Traverse Arrays

The following while loop searches an array of n integers linearly for a given target value:

int i=0;
while (arr[i]!=target && i<n) i = i+1;

Given that $a0 = &arr (address of arr), $a1 = n, and $a2 = target, the above loop is
translated into MIPS assembly code as follows:
move $t0, $a0 # $t0 = address of arr
li $t1, 0 # $t1 = index i = 0
while:
lw $t2, 0($t0) # $t2 = arr[i]
beq $t2, $a2, next # branch if (arr[i] == target) to next
beq $t1, $a1, next # branch if (i == n) to next
addi $t1, $t1, 1 # i = i+1
sll $t3, $t1, 2 # $t3 = i×4
add $t0, $a0, $t3 # $t0 = &arr + i×4 = &arr[i]
j while # jump to while loop
next:
. . .

To calculate the address of arr[i], the sll instruction shifts left i by 2 bits (computes i×4) and
then the add instruction computes &arr[i] = &arr + i×4. However, one can also point to the next
array element by incrementing the address in $t0 by 4, as shown below. Using a pointer to traverse
an array sequentially is generally faster than computing the address from the index.

5: Arrays and Files Page 5


while:
lw $t2, 0($t0) # $t2 = arr[i]
beq $t2, $a2, next # branch if (arr[i] == target) to next
beq $t1, $a1, next # branch if (i == n) to next
addi $t1, $t1, 1 # i = i+1
addi $t0, $t0, 4 # $t0 = &arr[i]
j while # jump to while loop
next:
. . .
A second example about a 2-dimensional array is shown below:

High-Level Program MIPS Assembly Language Code


int M[10][5]; .data
int i; M: .word 0:50 # array of 50 words
for (i=0; i<10; i++) { # &M[i][3] = &M + (i×5+3)×4 = &M + i×20 + 12
M[i][3] = i; .text
} la $t0, M # $t0 = &M[0][0]
li $t1, 0 # $t1 = i = 0
li $t2, 10 # $t2 = 10
for:
sll $t3, $t1, 4 # $t3 = i×16
sll $t4, $t1, 2 # $t4 = i×4
add $t5, $t3, $t4 # $t5 = i×20
add $t5, $t0, $t5 # $t5 = &M + i×20
sw $t1, 12($t5) # store: M[i][3] = i
addi $t1, $t1, 1 # i++
bne $t1, $t2, for # branch if (i != 10)

The for loop above sets the elements of column 3 to their row numbers. The first four instructions
used in the above for loop are used for address computation. Element addresses are computed
using the address of the first element in each row ($t5) and a fixed offset of 12. However, using a
pointer to traverse a column is much faster than re-computing the address from the index. Because
each row has 5 integer elements, the distance between two consecutive elements in the same
column is 20 bytes. Below, the MIPS assembly code is rewritten to use register $t0 as a pointer.
The number of instructions in the loop is reduced from 7 down to 4.

5: Arrays and Files Page 6


High-Level Program Using a Pointer to Traverse a Column in a Matrix
int M[10][5]; .data
int i; M: .word 0:50 # array of 50 words
for (i=0; i<10; i++) { # &M[i][3] = &M + (i×5+3)×4 = &M + i×20 + 12
M[i][3] = i; # &M[i+1][3] = &M[i][3] + 20
} .text
la $t0, M # $t0 = &M[0][0]
li $t1, 0 # $t1 = i = 0
li $t2, 10 # $t2 = 10
for: # for loop
sw $t1, 12($t0) # store: M[i][3] = i
addi $t1, $t1, 1 # i++
addi $t0, $t0, 20 # $t0 = &M[i][3]
bne $t1, $t2, for # branch if (i != 10)

5.6 Files
The operating system manages files on the disk storage. It provides system calls to open, read from,
and write to files. The MARS tool simulates some of the services of the operating system and
provides the following system calls:

Service $v0 Arguments Result


$a0 = address of null-terminated string
containing the file name.
$v0 = file descriptor
Open file 13 $a1 = 0 if read only
$v0 is negative if error
$a1 = 1 if write-only with create
$a1 = 9 if write-only with create and append
$v0 = number of
$a0 = file descriptor
characters read
Read from file 14 $a1 = address of input buffer
$v0 = 0 if end-of-file
$a2 = maximum number of characters to read
$v0 is negative if error
$a0 = file descriptor $v0 = number of
Write to file 15 $a1 = address of output buffer characters written
$a2 = number of characters to write $v0 is negative if error

Close file 16 $a0 = file descriptor

5: Arrays and Files Page 7


Here is a MIPS program that writes a string to an output file:
.data
outfile: .asciiz "out.txt" # output file name
msg: .asciiz "This text should be written in file out.txt"
.text
li $v0, 13 # Service 13: open file
la $a0, outfile # Output file name
li $a1, 1 # Write-only with create
syscall # Open file
move $s0, $v0 # $s0 = file descriptor
li $v0, 15 # Service 15: write to file
move $a0, $s0 # $a0 = file descriptor
la $a1, msg # $a1 = address of buffer
li $a2, 43 # $a2 = number of characters to write
syscall # Write to file
li $v0, 16 # Service 16: close file
move $a0, $s0 # $a0 = file descriptor
syscall # Close file

5.7 In-Lab Tasks

1. Given the following data definition statements, compute the addresses of arr2, arr3, str1,
and str2, given that the address of arr1 is 0x10010000. Show your steps for a full mark.
Select “Show Labels Window (symbol table)” from the Settings menu in MARS to check the
values of your computed addresses.

.data
arr1: .word 5:20
arr2: .half 7, -2, 8, -6
arr3: .space 100
str1: .asciiz "This is a message"
str2: .asciiz "Another important string"

2. In problem 1, given that arr1 is a one-dimensional array of integers, what are the addresses of
arr1[5] and arr1[17]?

3. In problem 1, given that arr3 is a two-dimensional array of bytes with 20 rows and 5 columns,
what are the addresses of arr3[7][2], arr3[11][4], and arr3[19][3]?

4. Write a MIPS program that defines a one-dimensional array of 10 integers in the static area of
the data segment, asks the user to input all 10 array elements, computes, and displays their sum.

5: Arrays and Files Page 8


5. Write a MIPS program that allocates an n×n array of integers on the heap, where n is a user
input. The program should compute and print the value of each element as follows:
for (i=0; i<n; i++)
for (j=0; j<n; j++) {
a[i][j] = i+j;
if (i>0) a[i][j] = a[i][j] + a[i-1][j];
if (j>0) a[i][j] = a[i][j] + a[i][j-1];
print_int(a[i][j]);
print_char(' ');
}
print_char('\n');
}
6. Write a MIPS program to copy an input text file into an output file. The input and output file
names should be entered by the user. If the input file cannot be opened, print an error message.

5.8 Bonus Question

7. Write a MIPS program to sort an array of integers in ascending order using the selection sort
algorithm. The array size should be entered by the user. The array should be allocated
dynamically on the heap. The array elements should be generated randomly using the random
number generator. The array elements should be printed before and after sorting.

5: Arrays and Files Page 9


Integer Multiplication
6 and Division
6.1 Objectives

After completing this lab, you will:


• Understand binary multiplication and division
• Understand the MIPS multiply and divide instructions
• Write MIPS programs that use integer multiplication and division

6.2 Binary Multiplication

Sequential binary multiplication is a simple but slow form of multiplication. It is performed using
addition and shift operations as illustrated in Figure 6.1. The multiplication of two 32-bit integers is a
64-bit product stored in two registers: HI and LO.

Figure 6.1: Sequential Binary Multiplication Algorithm

Register HI is initialized with the value 0, and LO is loaded with the value of the multiplier. The
sequential algorithm is repeated 32 times for each bit of the multiplier. Finally, the product is
computed in two registers HI and LO.

6: Integer Multiplication and Division Page 1


Figure 6.2 shows a simple sequential binary multiplier for unsigned integers. It uses a 32-bit ALU, a
register for storing the multiplicand, a shift register (HI and LO) for storing the final product, and
simple control. Instead of shifting the multiplicand to the left, the product is shifted to the right. It has
the same net effect and computes the same result. The control examines each bit of the multiplier
LO[0]. If a bit of the multiplier is 1, then an addition is done: HI = HI + Multiplicand. Then the HI
and LO registers are shifted to the right and the carry bit is inserted. This is repeats 32 times to
compute the 64-bit product in HI and LO.

Figure 6.2: Sequential Binary Multiplier

Signed multiplication can also be performed using the same sequential binary multiplier, but with
minor modification. When adding (HI + Multiplicand) the proper sign bit of the result is computed,
instead of the carry bit. When shifting the HI and LO registers to the right, the sign-bit is inserted to
the left of the product. Additions are used for the first 31 steps. However, the last step should use
subtraction (rather than addition), if the sign-bit of the multiplier is negative.

Sequential binary multiplication is slow because it requires one cycle for each bit of the multiplier.
Faster binary multiplication can be done in hardware, as shown in Figure 6.3. The cost of the binary
multiplier increases because it uses many adders, instead of just one used as in Figure 6.2.

Figure 6.3: Faster Integer Multiplier

6: Integer Multiplication and Division Page 2


6.3 Binary Division

Sequential binary division can be performed using shift and subtract operations. Binary division
produces a quotient and a remainder. It also uses two registers HI and LO. The quotient is computed
in the LO register and the remainder is computed in the HI register. HI is initialized with zero and
LO is initialized with the dividend. At each iteration step, registers HI and LO are shifted left by 1
bit. The difference: (HI – Divisor) is computed. If this difference is ≥ 0, the Remainder (HI register)
is set equal to the difference and the least significant bit of the quotient (LO register) is set to 1. The
sequential binary division algorithm is repeated 32 times, as shown in Figure 6.4.

Figure 6.4: Binary Division Algorithm

A simple but slow sequential binary divider is shown in Figure 5.5. It uses a 32-bit ALU that does
subtraction. It also uses HI and LO registers, a Divisor register, and simple control logic to shift left
the HI and LO registers and set the least-significant bit of LO. The control logic repeats 32 times to
compute each bit of the quotient in LO. The final remainder will be in the HI register.

Figure 6.5: Sequential Binary Divider

6: Integer Multiplication and Division Page 3


6.4 MIPS Integer Multiply and Divide Instructions

Multiplication and division generate results that are larger than 32 bits. Multiplication produces a
64-bit product while division produces a 32-bit quotient and a 32-bit remainder. The MIPS
architecture provides two special 32-bit registers that are the target for integer multiply and divide
instructions. The source operands come from the general-purpose register file. However, the results
are written into HI and LO registers. For multiplication, the HI and LO registers form a 64-bit
product. For division, HI stores the 32-bit remainder, while LO stores the 32-bit quotient.
MIPS defines two multiply instructions: mult for signed multiplication and multu for unsigned
multiplication. Both multiply instructions produce a 64-bit product without overflow. There is also
a third mul instruction that computes the same 64-bit product as a mult instruction in HI and LO
registers. However, the mul instruction also copies the LO register into destination register Rd. The
mul instruction is useful when the product is small and can fit in a 32-bit destination register.

Instruction Meaning Note


mult Rs, Rt [HI, LO] = Rs × Rt Signed multiplication
multu Rs, Rt [HI, LO] = Rs × Rt Unsigned multiplication
mul Rd, Rs, Rt [HI, LO] = Rs × Rt; Rd = LO Rd = Lower 32-bit of the product

Table 6.1: MIPS Integer Multiply Instructions

In addition, MIPS defines two integer divide instructions: div for signed division and divu for
unsigned division. The quotient of the integer division is saved in the LO register, while the
remainder is saved in the HI register as shown in Table 6.2.

Instruction Meaning Note


div Rs, Rt LO = Rs / Rt, HI = Rs % Rt Signed division
divu Rs, Rt LO = Rs / Rt, HI = Rs % Rt Unsigned division

Table 6.2: MIPS Integer Divide Instructions

If the divisor in register Rt is zero, then the MIPS divide instructions do not compute any result in
the HI and LO registers. Division by zero is ignored and no exception is produced. The MIPS
programmer must ensure that the divisor is non-zero when using the integer divide instruction.

Special instructions are used to move data between the HI and LO registers and general-purpose
registers. These are listed in Table 6.3.

6: Integer Multiplication and Division Page 4


Instruction Meaning Instruction Meaning
mfhi Rd Rd = HI mthi Rs HI = Rs
mflo Rd Rd = LO mtlo Rs LO = Rs

Table 6.3: Move Instructions for the HI and LO registers

6.5 Applications of Integer Multiply and Divide Instructions

Raising an integer number x to a power n (xn), may be computed using successive multiplications.
The following code uses integer multiplication to implement the power function. Registers $a0 and
$a1 are used to store x and n, while $v0 contains the result.

li $a0, 7 # number x
li $a1, 5 # number n
li $v0, 1 # $v0 = 1
pow:
mul $v0, $v0, $a0 # $v0 = $v0, $a0
addiu $a1, $a1, -1 # decrement n
bnez $a1, pow # loop if (n != 0)

The greatest common divisor can be computed as follows:


gcd(a, 0) = a
gcd(a, b) = gcd(b, a % b) where % is the remainder operator
For example,
gcd(30, 18) = gcd(18, 30%18) =
gcd(18, 12) = gcd(12, 18%12) =
gcd(12, 6) = gcd(6, 12%6) = gcd(6, 0) = 6

The following MIPS loop computes the gcd of two numbers stored in registers $a0 and $a1. The
final result is computed in register $a0.
li $a0, 30 # number a
li $a1, 18 # number b
gcd:
div $a0, $a1 # HI = remainder, LO = quotient
move $a0, $a1 # $a0 = number b
mfhi $a1 # $a1 = a % b (remainder)
bnez $a1, gcd # loop if (b != 0)

6: Integer Multiplication and Division Page 5


A string is an array of bytes stored in memory. For example, str is defined as a string of digits in
the data segment. The .asciiz directive is used to define an ASCII string stored in memory and
terminated with a NULL byte.
.data
str: .asciiz "512943687"
The string of digits can be read from memory and converted into a number. The lb (load byte)
instruction can read each character of a string from memory into a register. The following MIPS
loop converts the above string str into an integer computed in register $v0:

.text
main:
la $t0, str # load address of str into $t0
li $v0, 0 # Initialize $v0 = 0
li $v1, 10 # Initialize $v1 = 10
lb $t1, ($t0) # load byte: $t1 = Memory($t0)
str2int:
addiu $t1, $t1, -48 # convert character to a number
mul $v0, $v0, $v1 # $v0 = $v0 * 10
addu $v0, $v0, $t1 # $v0 = $v0 + digit
addiu $t0, $t0, 1 # point to next character in memory
lb $t1, ($t0) # load byte: $t1 = Memory($t0)
bnez $t1, str2int # loop back if not NULL character
done:
. . . # $v0 = integer result

An unsigned integer can be converted to a string by successive divisions by 10. The remainder is a
digit between 0 and 9. The remainder digits are computed backwards starting at the least significant
digit. Each remainder is then converted to an ASCII character and stored in memory in a string. For
example, consider converting the integer 5218 into a string. This can be done as follows:
5128 / 10 = 512, 5128 % 10 = 8 Convert 8 into character '8'
512 / 10 = 51, 512 % 10 = 2 Convert 2 into character '2'
51 / 10 = 5, 51 % 10 = 1 Convert 1 into character '1'
5 / 10 = 0, 5 % 10 = 5 Convert 5 into character '0'
Stop when the quotient is zero

The following MIPS code converts an unsigned integer stored in register $a0 into a string stored in
the data segment in memory. The string is initialized with 10 space characters. The string has 10
characters only because a 32-bit unsigned integer can have at most 10 digits.

6: Integer Multiplication and Division Page 6


.data
str: .asciiz " " # str = 10 space characters
.text
main:
li $a0, 5128 # $a0 = unsigned integer to convert
la $v0, str # load address of str into $v0
addiu $v0, $v0, 11 # $v0 = pointer at end of str
li $a1, 10 # Initialize $a1 = 10
int2str:
divu $a0, $a1 # divide $a0 by 10
mflo $a0 # $a0 = quotient
mfhi $t0 # $t0 = remainder (0 to 9)
addiu $t0, $t0, 48 # convert digit into a character
addiu $v0, $v0, -1 # point to previous space character
sb $t0, ($v0) # store byte: Memory($v0) = $t0
bnez $a0, int2str # loop back if quotient is not zero
done:
. . . # $v0 = pointer to string in memory

6.6 In-Lab Tasks

1. Write MIPS code to perform the following integer multiplications. What is the value of the LO
and HI registers?
a) 98765 × 54321 using the multu instruction
b) -98765 × -54321 using the mult instruction

2. Write MIPS code to perform the following integer divisions. What is the value of the LO and HI
registers?
a) 98765 / 54321 using the divu instruction
b) -98765 / -54321 using the div instruction
3. Factorial Calculation: Using the mul instruction, write a MIPS program that computes the
factorial of a number n input by the user, and display the result on the screen. Run your code
and record the maximum 32-bit factorial value that can be computed without errors.

4. The string-to-integer program presented in Section 6.5 converts a string of decimal digits to an
unsigned integer using successive multiplications by 10 and additions. It is also possible to
convert a string of digits in any radix system to an integer, using successive multiplications by
the radix value and additions. Rewrite the string-to-integer program asking the user to input a

6: Integer Multiplication and Division Page 7


radix value between 2 and 10 and a string of digits in the specified radix system. For example,
if the radix value is 8 then the string can only have octal digit characters from '0' to '7'.
Convert the string of digit characters into an unsigned integer and display the value of the
unsigned integer.

5. The integer-to-string program presented in Section 6.5 converts an unsigned integer to string
format using successive division by 10 and storing the remainder digit characters in a string. It
is also possible to convert the unsigned integer to any radix using successive divisions by the
radix value. Rewrite the integer-to-string program asking the user to input an unsigned integer
and a radix value between 2 and 10. Do the radix conversion and then print the string. Make
sure that the string has sufficient space characters, especially when converting to radix 2.

6. Fraction computation: Using successive integer multiplications and divisions, write a MIPS
program that divides an integer x by another integer y that are read as input. The result of the
division should be in the form: a.b, where a is the integer part and b is the fractional part.
Compute the fraction b with 8 digits after the decimal point. Display the result in the form a.b.

6: Integer Multiplication and Division Page 8


MIPS Functions and the
7 Stack Segment
7.1 Objectives

After completing this lab, you will:


• Write MIPS functions, pass parameters, and return results
• Understand the stack segment, allocate, and free stack frames
• Understand the MIPS register usage convention
• Write recursive functions in MIPS

7.2 MIPS Functions

A function (or a procedure) is a tool that programmers use to structure programs, to make them
easier to understand, and to allow the function’s code to be reused. A function is a block of
instructions that can be called and used when required at several different points in the program.
The function that initiates the call to another function is known as the caller. The function that
receives and executes the call is known as the callee. When the callee function finishes execution,
control is transferred back to the caller function.
A function can receive parameters and return results. The parameters and results act as an interface
between a function and the rest of the program.
To execute a function, the program must follow these steps:
1. The caller must put the parameters in a place where the callee function can access them
2. Transfer control to the callee function
3. Execute the callee function
4. The callee function must put the results in a place where the caller can access them
5. Return control to the caller (point of origin) next to where the call was made
Registers are the fastest place to pass parameters and return results. The MIPS architecture follows
the following software conventions for passing parameters and returning results:
• $a0-$a3: four argument registers in which to pass parameters
• $v0-$v1: two value registers in which to return function results
• $ra: one return address register to return back to the caller

7: MIPS Functions and the Stack Segment Page 1


The jal (jump-and-link) instruction initiates the call to a function and the jr (jump register)
instruction returns control back to the caller.
To call a function, use the jal instruction as follows:
jal label
The jal instruction saves the return address in register $ra and jumps to the first instruction in the
function after label. The return address is the address of the next instruction that appears after the
jal instruction in the caller function.
To return from a function, use the jr instruction as follows:
jr $ra
The jr instruction jumps to the address stored in $ra. It modifies the program counter PC register
according to the value stored in register $ra.
An example of a C function that checks whether a character ch is a lowercase letter or not is shown
in Figure 7.1. The function is translated into MIPS assembly language as shown to the right. The
function islower assumes that the parameter ch is passed in register $a0. The function result is
passed in register $v0.

int islower(char ch) { islower:


if (ch>='a' && ch<='z') blt $a0, 'a', else # branch if $a0 < 'a'

return 1; bgt $a0, 'z', else # branch if $a0 > 'z'


li $v0, 1 # $v0 = 1
else
jr $ra # return to caller
return 0;
else:
} li $v0, 0 # $v0 = 0
jr $ra # return to caller

Figure 7.1: Example of a C function and its translation into MIPS assembly code

To call the function islower, the caller must first copy the character ch into register $a0 and then
make the function call. This is shown in Figure 7.2:

move $a0, ... # move into register $a0 the character ch


jal islower # call function islower
. . . # return here after executing function islower

Figure 7.2: Using the jal instruction in MIPS to initiate a function call

Remember that the jal instruction saves the return address in register $ra and that jr jumps into
the return address in register $ra to achieve a function return.
The MIPS architecture provides three instructions to support functions and methods in high-level
programming languages. The jal (jump-and-link) instruction is used to call functions whose
addresses are constants known at compile time, while the jalr (jump-and-link register) instruction

7: MIPS Functions and the Stack Segment Page 2


is used to call methods whose addresses are variables known at runtime. The jr (jump register)
instruction can be used to return from function calls and methods. These instructions, their meaning,
and format are summarized in Figure 7.3.

Figure 7.3: The jal, jr, and jalr instructions in MIPS

7.3 The Stack Segment and the Stack Pointer Register

Every program has three segments when it is loaded into memory by the operating system. There is
the text segment where the machine language code is stored, the data segment where space is
allocated for constants and variables, and the stack segment that provides an area that can be
allocated and freed by functions. The programmer has no control over where these segments are
located in memory. The stack segment can be used by functions for passing many parameters, for
allocating space for local variables, and for saving and preserving registers across calls. Without the
stack segment in memory, it would be impossible to write recursive functions, or pure functions that
have no side effects.
When a program is loaded into memory, the operating system initializes the stack pointer $sp
(register $29) to the base address of the stack segment. The stack segment grows downwards
towards lower memory addresses as shown in Figure 7.4.

0x7fffffff
Stack Segment Stack grows
Downwards

Heap Area
0x10040000 Data Segment
0x10000000 Static Area

Text Segment
0x00400000

0x00000000 Reserved

Figure 7.4: The text, data, and stack segments of a program

When a program starts execution, the operating system initializes the stack pointer $sp register
with a valid address to point to the top of the stack. For example, when executing a MIPS program
under the MARS tool, the initial value of register $sp is 0x7fffeffc.
A function can allocate space on the stack for saving registers and for its local variables. The space
that a function allocates on the stack is called a stack frame (called also an activation record).

7: MIPS Functions and the Stack Segment Page 3


To allocate a stack frame of n bytes, decrement the stack pointer by n at the start of a function:
addiu $sp, $sp, -n # n must be a constant number of bytes
To free a stack frame of n bytes, increment the stack pointer by n just before a function return:
addiu $sp, $sp, n # n must be a constant number of bytes
Figure 7.5 illustrates the stack allocation before calling a function, during the execution of function,
and after returning from a function call. The stack pointer register $sp points to the top of the
caller’s stack frame before making a call to function f. The $sp register points to the stack frame of
function f during its execution. The $sp register points back to the top of the caller’s stack frame
after returning from function f. The stack frame can be used to pass arguments to a function, to
save registers across function calls, and to allocate space for local variables declared inside the
function. In particular, register $ra should be saved before a function can call another function,
because the jal instruction modifies the $ra register. Arguments are typically passed in registers
$a0 thru $a3. However, if a function has more than four arguments then the additional arguments
should be passed on the stack.

High address High address High address

Caller’s Caller’s Caller’s


Stack Frame Stack Frame Stack Frame
$sp $sp
Arguments
(if any)
Saved Regs Stack
(if any) Frame of
Local vars Function f
(if any)
$sp
Low address Low address Low address

a) Before calling f b) While executing f c) After returning from f

Figure 7.5: Stack allocation (a) before (b) while executing, and (c) after returning from function f
An example of a function f that allocates a stack frame is shown in Figure 7.6. The function f is
non-leaf, because it calls functions read, reverse, and print. Therefore, the return address of
function f (register $ra) must be saved on the stack. In addition, the stack frame of function f must
allocate space for the local array (10 integer elements = 40 bytes), as shown in Figure 7.6.

Example function Stack Frame


void f() { saved $ra = 4 bytes
int array[10];
read(array, 10);
int array[10]
reverse(array, 10);
print(array, 10); (40 bytes)
}

Figure 7.6: Example of a function f and its corresponding stack frame

7: MIPS Functions and the Stack Segment Page 4


The translation of function f into MIPS assembly code is shown in Figure 7.7. Function f allocates
a stack frame of 44 bytes. The stack is accessed using the same load and store instructions used to
access the data segment. The base address register is $sp. A displacement is used to access
different elements on the stack.

f: addiu $sp, $sp, -44 # allocate stack frame = 44 bytes


sw $ra, 40($sp) # save $ra on the stack
move $a0, $sp # $a0 = address of array on the stack
li $a1, 10 # $a1 = 10
jal read # call function read
move $a0, $sp # $a0 = address of array on the stack
li $a1, 10 # $a1 = 10
jal reverse # call function reverse
move $a0, $sp # $a0 = address of array on the stack
li $a1, 10 # $a1 = 10
jal print # call function print
lw $ra, 40($sp) # load $ra from the stack
addiu $sp, $sp, 44 # Free stack frame = 44 bytes
jr $ra # return to caller

Figure 7.7: Translation of function f into MIPS assembly code


Some MIPS software uses the frame pointer register $fp (register $30) to point to the base address
of a stack frame. This might be needed if the stack pointer $sp changes during the execution of a
function, or arrays and objects are allocated dynamically on the stack. The frame pointer $fp
register provides a stable address for a stack frame within a function.

7.4 MIPS Register Usage

A convention regarding the usage of registers is necessary because software is written by many
programmers. In this case, each programmer must know how registers are supposed to be used,
such that his piece of the software does not conflict with pieces written by other programmers.
Since programming is done today using high-level programming languages, you may ask why such
a register convention is still needed. Well, it is the compiler who needs to know about it. This is
because a program can be created from different pieces that are compiled separately. To compile a
function, the compiler must know which registers are used to pass parameters, which registers are
used to return results, and which registers must be preserved across function calls. These rules for
register usage are also known as function call conventions.
The MIPS hardware does not prevent you from ignoring these rules, from not preserving registers,
from using any register in passing parameters and returning results. However, if you ignore these
rules, you will easily run into trouble and have software bugs that are difficult to eliminate.

7: MIPS Functions and the Stack Segment Page 5


The following table presents the MIPS register usage convention:

Register Register
Register Usage
Name Number
$zero $0 Always zero. Cannot be modified
$at $1 Reserved for assembler use
$v0 - $v1 $2 - $3 Function results are returned in $v0 and $v1
$a0 - $a3 $4 - $7 Function arguments are passed in $a0 thru $a3
$t0 - $t7 $8 - $15 Temporary registers. Not preserved across function calls
$s0 - $s7 $16 - $23 Saved registers. Must be preserved across function calls
$t8 - $t9 $24 - $25 Additional temporary registers. Not preserved
$k0 - $k1 $26 - $27 Reserved for OS kernel usage
$gp $28 Global pointer to global data. Must be preserved
$sp $29 Stack pointer. Must be preserved
$fp $30 Frame pointer. Must be preserved
$ra $31 Return address register. Must be preserved
Figure 7.8: MIPS register usage convention

A function is free to modify the value registers $v0-$v1, the argument registers $a0-$a3, and the
temporary registers $t0-$t7 and $t8-$t9 without saving their old values. However, it should not
modify registers $s0-$s7, $gp, $sp, $fp, and $ra except after saving their old values in memory
on the stack. A function must restore the value of registers $s0-$s7, $gp, $sp, $fp, and $ra by
loading their old values from the stack, just before returning back to the caller. Registers $sp and
$fp must be preserved if a new stack frame is allocated by a function. Register $ra must be
preserved if a function makes a call to another function, because the jal instruction modifies the
return address register $ra.

7.5 Recursive Functions

A recursive function is a function that calls itself. For example, the recursive function fact
(factorial) and its translation into MIPS assembly code are shown in Figure 7.9. If (n<2) then there
is no need to allocate a stack frame. However, if (n>=2) then the factorial function allocates a
stack frame of 8 bytes to save registers $a0 and $ra.
Register $a0 (argument n) is saved on the stack because its value is changed in the recursive call,
and because it is needed after returning from the recursive call. Register $ra is saved on the stack
because its value is changed by the recursive call (jal fact).

7: MIPS Functions and the Stack Segment Page 6


int fact(int n) {
if (n<2) return 1;
else return (n * fact(n-1));
}
fact:
bge $a0, 2, else # branch if (n >= 2) to else
li $v0, 1 # $v0 = 1
jr $ra # return to caller
else:
addi $sp, $sp, -8 # allocate a stack frame of 8 bytes
sw $a0, 0($sp) # save the argument n
sw $ra, 4($sp) # save the return address
addi $a0, $a0, -1 # argument $a0 = n-1
jal fact # call fact(n-1)
lw $a0, 0($sp) # restore $a0 = n
lw $ra, 4($sp) # restore return address
mul $v0, $a0, $v0 # $v0 = n * fact(n-1)
addi $sp, $sp, 8 # free stack frame
jr $ra # return to the caller

Figure 7.9: A recursive function and its translation into MIPS assembly language code

7.6 In-Lab Tasks

1. The function islower, shown in Figure 7.1, tests whether a character ch is lowercase or not.
Write the main function of a program that reads a character ch, calls the function islower,
and then prints a message to indicate whether ch is a lowercase character or not.
2. Write a function that reads an array of n integers. The function read must receive two
arguments: $a0 = address of the array, and $a1 = number n of elements to read.
3. Write a function that prints an array of n integers. The function print must receive two
arguments: $a0 = address of the array, and $a1 = number n of elements to print.
4. Write a function that reverses the elements of an array of n integers. The function reverse
must receive two arguments: $a0 = address of the array, and $a1 = number n of elements.
5. Suppose we rewrite function f (Figures 7.6) to have an integer parameter n. The local array is
now declared to have n integers (rather than 10). This means that the size of the stack frame size

7: MIPS Functions and the Stack Segment Page 7


of function f will depend on n. Rewrite the function f in MIPS assembly language. Hint: you
may use the $fp register (in addition to $sp) to implement the function f.
void f(int n) {
int array[n];
read(array, n);
reverse(array, n);
print(array, n);
}
6. The function f(n) implemented in problem 5 calls the functions read, reverse, and print
implemented in problems 2 to 4. Write a complete program that includes the main function as
well as functions f, read, reverse, and print. The main function should call function f
twice as: f(5) and f(8).
7. The recursive function fib(n) computes the nth element in the Fibonacci sequence. Implement
this function in MIPS. Write a main function to call fib.
int fib(int n) {
if (n < 2) return n;
return (fib(n-1) + fib(n-2));
}

7.7 Bonus Question

8. The function quick_sort sorts an array recursively. Translate this function into MIPS code.
Write a main function to call and test this function.
void quick_sort(int array[], int low, int high) {
int i = low, j = high; // low and high index
int pivot = array[(low+high)/2]; // pivot = middle value
while (i <= j) {
while (array[i] < pivot) i++;
while (array[j] > pivot) j--;
if (i <= j) {
int temp=array[i];
array[i]=array[j]; // swap array[i]
array[j]=temp; // with array[j]
i++;
j--;
}
}
if (low < j) quick_sort(array, low, j); // Recursive call 1
if (i < high) quick_sort(array, i, high); // Recursive call 2
}

7: MIPS Functions and the Stack Segment Page 8


8 Exceptions and I/O

8.1 Objectives

After completing this lab, you will:


• Understand the exception mechanism in MIPS
• Understand Coprocessor 0 instructions
• Write exception handlers
• Understand Memory Mapped I/O

8.2 Exception Mechanism in MIPS

Branches and jumps change the control flow in a program. Exceptions also change the control flow.
The MIPS architecture calls an exception any unexpected change in control flow, regardless of its
source.
An exception is said to be synchronous if it is caused by an instruction in the running program.
Examples of synchronous exceptions are arithmetic exceptions, invalid memory addresses
generated by load and store instructions, and trap instructions.
An exception is said to be asynchronous if it is caused by an I/O device requesting the processor.
This is also known as a hardware interrupt, which is not related to program execution. Interrupts
can be caused by a variety of I/O devices, such as the keyboard, timer, and disk controller.
When an exception happens, control is transferred to an exception handler, written specifically for
the purpose of dealing with exceptions. After executing the exception handler, control is returned
back to the program. The program continues as if nothing happened. The exception handler appears
as a procedure called suddenly in the program with no parameters and no return value.
The MIPS processor operates either in user or kernel mode. User programs (applications) run in
user mode. The CPU enters the kernel mode when an exception happens. The exception handling
mechanism is implemented by Coprocessor 0, which has several important registers, such as:
vaddr, status, cause, and EPC, which record information about an exception.
1. Vaddr ($8): Contains the invalid memory address caused by load, store, or fetch.
2. Status ($12): Contains the interrupt mask and enable bits (see below).
3. Cause ($13): Contains the type of exception and any pending bits (see below).
4. EPC ($14): Contains the address of the instruction when the exception occurred.

8: Exceptions and I/O Page 1


The MARS tool shows the values of registers: vaddr, status, cause, and epc under the
Coprocessor 0 tab, as shown in Figure 8.1.

Figure 8.1: Coprocessor 0 registers in the MARS tool


Examples of exceptions are shown in Figure 8.2. The first example initializes register $t0 with
0x7fffffff and $t1 with 1. The addu instruction ignores overflow. However, add detects and
causes an arithmetic overflow exception. The second example is about storing data at an illegal
address in memory. The third example is about loading a word from a misaligned address in
memory. The last example is about inserting a breakpoint (break instruction) inside the program.
All of these examples cause exceptions and require special handling.

Examples of Exceptions Coprocessor 0 Registers


# Arithmetic Overflow Exception
li $t0, 0x7fffffff # $t0 = MAX_INT
li $t1, 1 # $t1 = 1
addu $t2, $t0, $t1 # Ignores Overflow
add $t3, $t0, $t1 # Detects Overflow

# Store Address Exception


# Cannot store at address 4
li $t0, 4
li $a0, 5
sw $a0, ($t0)

# Misaligned Load Address Exception


.data
arr: .word 12, 17
. . .
la $t0, arr
lw $t0, 1($t0)

# Breakpoint Exception
# Caused by the break instruction
.text
. . .
break

Figure 8.2: Examples of exceptions and corresponding values of Coprocessor 0 registers

8: Exceptions and I/O Page 2


The second column of Figure 8.2 shows Coprocessor 0 registers when an exception happens. The
Exception Program Counter (EPC) register $14 stores the address of the instruction that caused the
exception. The value of EPC in Figure 8.2 is 0x00400010, which is the address of the add
instruction that caused arithmetic overflow. It is 0x0040001c, which is the address of sw that
attempted to write to an illegal address in memory. It is 0x00400028, which is the address of lw
that generated a misaligned address in memory.
The cause register $13 provides information about the cause of an exception (exception code) and
about pending interrupts, if any. The exception code is stored in bits 2 to 6 of the cause register. The
bit fields of the cause register is shown in Figure 8.3.

15 14 13 12 11 10 9 8 6 5 4 3 2

Pending Interrupts Exception Code


Figure 8.3: The Cause Register $13
The MIPS architecture defines exception codes for different types of exceptions. Some are listed in
Figure 8.4. The MARS tool simulates some of these exception codes.

Code Name Description


0 INT Hardware Interrupt
4 ADDRL Address error exception caused by load or instruction fetch
5 ADDRS Address error exception caused by store
6 IBUS Bus error on instruction fetch
7 DBUS Bus error on data load or store
8 SYSCALL System call exception caused by the syscall instruction
9 BKPT Breakpoint exception caused by the break instruction
10 RI Reserved instruction exception
12 OVF Arithmetic overflow exception
13 TRAP Exception caused by a trap instruction
15 FPE Floating-Point exception cause by a floating-point instruction
Figure 8.4: Some of the MIPS exception codes
If a load, store, jump, or branch instruction generates an exception, then vaddr (register $8)
contains the invalid memory address that caused the exception. Consider again the examples of
Figure 8.2, the illegal data address of sw that caused the exception is vaddr = 0x00000004 and the
misaligned data address of lw that caused the exception is vaddr = 0x10010001.
The status (register $12) is shown in Figure 8.5. Bit 0 is the Interrupt Enable (IE), which enables
or disables interrupts. Bit 1 is the Exception Level (EL). The EL bit is normally 0, but is set to 1
after an exception occurs.
The interrupt mask field contains 8 bits and supports a bit for each of the six hardware and two
software interrupt levels. A mask bit that is 1 allows interrupts at that level to interrupt the
processor. A mask bit that is 0 disables interrupts at that level. When an interrupt arrives, it sets its

8: Exceptions and I/O Page 3


interrupt pending bit in the cause register, even if the mask bit is disabled. When an interrupt is
pending, it will interrupt the processor when its mask bit is subsequently enabled.

15 14 13 12 11 10 9 8 1 0
EL IE
Interrupt Mask
Figure 8.5: The Status Register $12

8.3 Coprocessor 0 Instructions


The MIPS architecture defines special instructions that cause exceptions and switch the processor
from user to kernel mode. These are the trap, break, and syscall instructions:

Instruction Meaning
teq Rs, Rt Raise the trap exception if register Rs is equal to register Rt
tne Rs, Rt Raise the trap exception if register Rs is not equal to register Rt
tlt Rs, Rt Raise the trap exception if register Rs is less than register Rt
. . . There are other trap instructions not listed here (see Appendix B)
break code Raise the breakpoint exception. Code 1 is reserved for the debugger
syscall Raise the system call exception. Service number is specified in $v0

The trap instructions (teq, tne, … etc.) raise the TRAP exception code 13. The break instruction
raises the breakpoint exception code 9. However, the MARS simulator provides services for the
syscall instruction without raising an exception. On a real system, the syscall instruction raises
the system call exception code 8, which is serviced by the operating system.
When an exception occurs, the processor switches to kernel mode. Coprocessor 0 registers can be
accessed only when the processor is servicing an exception in kernel mode. Register values can be
transferred from and to coprocessor 0 using the following instructions. Load and store instructions
that transfer data between coprocessor 0 registers and memory are also available. These instructions
can be used when writing an exception handler.
After an exception is processed, the exception handler can return back and resume the execution of
a program. The eret instruction is used to return from an exception.

Instruction Meaning
mfc0 Rd, C0src Move from Coprocessor 0 register C0src into destination register Rd
mtc0 Rs, C0dst Move to Coprocessor 0 register C0dst the value of register Rs
lwc0 C0dst, addr Load a word from memory into Coprocessor 0 register C0dst
swc0 C0src, addr Store Coprocessor 0 register C0src in memory
eret Reset EL = 0 (back to user mode) and return: PC = EPC

8: Exceptions and I/O Page 4


8.4 Exception Handlers
The layout of a MIPS program is shown in Figure 8.6. The Operating System appears in the upper
half of the address space, which can only be accessed when the processor is running in kernel mode.
The OS kernel text segment starts at address 0x80000000 and the kernel data segment starts at
address 0x90000000. The last segment of the address space is mapped to I/O devices, starting at
address 0xffff0000. This is known as Memory-Mapped I/O (MMIO). The default memory
configuration is shown in Figure 8.6. It is also possible to change the memory configuration under
the MARS tool settings.

0xffff0000
Memory-Mapped I/O

Kernel Data Segment


Operating
0x90000000
System
Kernel Text Segment
0x80000000

Stack Segment Stack grows


Downwards

Dynamic Area
0x10040000 Data Segment
0x10000000 Static Area

Text Segment
0x00400000

0x00000000 Reserved

Figure 8.6: The layout of a MIPS program in memory

When a function is called using jal, control is transferred at the address provided by the instruction
and the return address is saved in register $ra. In the case of an exception there is no explicit call.
In MIPS, when an exception occurs, control is transferred at the fixed address 0x80000180. The
exception handler must be located at that address.
The exception return address cannot be saved in $ra since it will modify a return address that has
been placed in that register before the exception. The Exception Program Counter (EPC) register is
used to store the address of the instruction that was executing when the exception was generated.
An exception handler can be written in the same file as the regular program, or in a separate file.
The exception handler must start at the fixed address 0x80000180. This address is in the kernel
text segment. If there is no instruction at address 0x80000180, MARS will terminate the MIPS
program with an appropriate error message. An example of an exception handler that prints the
address of the instruction that caused the exception and the exception code is shown below:

8: Exceptions and I/O Page 5


# Exception Handler starts here
.ktext 0x80000180

move $k0, $at # $k0 = $at


la $k1, _regs # $k1 = address of _regs
sw $k0, 0($k1) # save $at
sw $v0, 4($k1) # save $v0
sw $a0, 8($k1) # save $a0

la $a0, _msg1 # $a0 = address of _msg1


li $v0, 4 # $v0 = service 4
syscall # Print _msg1
mfc0 $a0, $14 # $a0 = EPC
li $v0, 34 # $v0 = service 34
syscall # print EPC in hexadecimal

la $a0, _msg2 # $a0 = address of _msg2


li $v0, 4 # $v0 = service 4
syscall # Print _msg2
mfc0 $a0, $13 # $a0 = cause
srl $a0, $a0, 2 # shift right by 2 bits
andi $a0, $a0, 31 # $a0 = exception code
li $v0, 1 # $v0 = service 1
syscall # Print exception code

la $a0, _msg3 # $a0 = address of _msg3


li $v0, 4 # $v0 = service 4
syscall # Print _msg3

la $k1, _regs # $k1 = address of _regs


lw $at, 0($k1) # restore $at
lw $v0, 4($k1) # restore $v0
lw $a0, 8($k1) # restore $a0

mtc0 $zero, $8 # clear vaddr


mfc0 $k0, $14 # $k0 = EPC
addiu $k0, $k0, 4 # Increment $k0 by 4
mtc0 $k0, $14 # EPC = point to next instruction

eret # exception return: PC = EPC

# kernel data is stored here


.kdata

_msg1: .asciiz "\nException caused by instruction at address: "


_msg2: .asciiz "\nException Code = "
_msg3: .asciiz "\nIgnore and continue program ...\n"
_regs: .word 0:3 # Space for saved registers

Figure 8.7: Example of an exception handler

8: Exceptions and I/O Page 6


Writing an exception handler is no easy job. Things that should be done are:
• Save registers before modifying them by the exception handler.
• Read the coprocessor registers to determine what exception has occurred.
• Execute the specific handler for the exception code, usually via a jump table.
• Restore all the registers that were modified by the exception handler.
• Restart the user-mode program, or kill the program if it cannot be restarted.
The exception handler must preserve the value of any register it may modify, such that the
execution of the interrupted program can continue at a later time. The MIPS architecture reserves
register $26 and $27 ($k0 and $k1) for the use of the exception handler. The handler can modify
these two registers without having to save them first. Additional registers should be saved in
memory before they can be modified by the exception handler. For example, the exception handler
of Figure 8.7 saves register $at, $a0, and $v0 in the kernel data segment.
The EPC contains the address of the instruction that has generated the software exception. The
return address must be incremented by 4 to avoid executing the same instruction again. The return
sequence from a software exception can be written as follows:

mfc0 $k0, $14 # $k0 = EPC


addiu $k0, $k0, 4 # increment $k0 by 4
mtc0 $k0, $14 # EPC = point to next instruction
eret # exception return: PC = EPC

Rather than writing the exception handler at the end of every MIPS program, it is better to write the
exception handler in a separate file, then open the “Exception Handler …” dialog box in the MARS
settings and include the exception handler file in all your MIPS programs.

8.5 Memory Mapped I/O


In any computer system, input and output devices are outside the processor chip. A MIPS processor
communicates with I/O devices using a technique called memory-mapped I/O. Using memory-
mapped I/O, there is no need to add additional instructions to the MIPS instruction set. Any lw or
sw instruction with an effective address of 0xffff0000 or greater will not access the main
memory. These addresses are reserved to make access to registers in I/O devices. The I/O device
controllers must be connected to the system I/O bus, as shown in Figure 8.8.
Unique address decode logic is associated with each I/O register. When the MIPS processor reads
from or writes to one of these addresses, the processor is actually reading from or writing to a
selected register in one of the I/O device controllers. The processor can read data about the state of
the I/O device and write control data to change the state of the device.
The two registers associated with the keyboard are the receiver control and data registers. These are
mapped to addresses 0xffff0000 and 0xffff0004 respectively. To communicate with the
keyboard, the processor reads the control register. As long as the ready bit is zero, the processor
keeps reading the control register in a loop. This approach is known as polling.

8: Exceptions and I/O Page 7


Figure 8.8: MIPS System I/O Bus
When a key is pressed, the data register stores the character and the ready bit is set. Then the
processor reads the character. The following MIPS code provides an example of memory-mapped
access to the keyboard control and data registers via polling:
li $t0, 0xffff0000 # Address of keyboard control register
li $t1, 0 # Initialize wait_counter = 0
wait_keyboard:
lw $t2, ($t0) # Read the keyboard control register
andi $t2, $t2, 1 # Extract ready bit
addiu $t1, $t1, 1 # wait_counter++ (counts iterations)
beqz $t2, wait_keyboard # loop back while not ready
lw $a0, 4($t0) # Get character from keyboard

The rate that characters are typed on the keyboard is very slow compared to the rate that the MIPS
processor can execute instructions. Typically, millions of instructions are executed until a key is
pressed. Register $t1 keeps track of the number of iterations in the wait_keyboard loop. The lw
instruction outside the loop gets the character from the keyboard data register, which also clears the
ready bit. A MIPS programmer can only read the keyboard data register. Writing to the keyboard
data register has no effect.
Communicating with the display controller can be done via polling in a similar way. The display
registers are mapped to addresses 0xffff0008 and 0xffff000c. As long as the ready bit is zero,
the processor keeps reading it in a loop. We must not store a character in the data register until the
display is ready to receive it. The display controller clears the ready bit when a character is stored in
the data register. It then sets the ready bit again after the character is displayed and it is ready to
receive the next character to display.
li $t0, 0xffff0008 # Address of display control register
wait_display:
lw $t2, ($t0) # Read the display control register
andi $t2, $t2, 1 # Extract ready bit
beqz $t2, wait_display # loop back while not ready
sw $a0, 4($t0) # Send character to display

8: Exceptions and I/O Page 8


The MARS simulator provides a tool to simulate the keyboard and display, as shown in Figure 8.9.
Press “Connect to MIPS” to connect this tool to the MIPS program. You must activate this tool to
communicate character-by-character with the keyboard and display. The code that communicates
with I/O devices at this level is known as a device driver. This is one of the advantages of using a
simulator to learn how to communicate directly with I/O devices.

Figure 8.9: Keyboard and Display MMIO Simulator under the MARS Tools

The main drawback of polling is that it keeps the processor busy, wasting millions of cycles before
the I/O device (such as the keyboard) becomes ready. An alternative to polling is to use interrupts.
Interrupts can be enabled for the keyboard by setting the Interrupt Enable bit in the keyboard
control register as follows:
li $t0, 0xffff0000 # Address of keyboard control register
li $t1, 2
sw $t1, ($t0) # Enable keyboard interrupt

When a key is pressed, the keyboard sends an interrupt signal to the MIPS processor and sets the
cause register $13 to the value 0x00000100. Bit 8 in the cause register is set to indicate that the
keyboard has interrupted the processor. An interrupt handler must be written to read the character
from the keyboard and returning its value to the running program. This requires modifying the code
of the exception handler shown in Figure 8.7 to deal with interrupts.
The Disk and Ethernet I/O device controllers use Direct Memory Access (DMA) to transfer blocks
of data directly between the device and memory. As controllers become more complex, more than
two registers are associated with each device controller. These devices are not part of the MARS
simulator. In general, the supplier of any I/O device must provide programmers with an explanation
of how to properly communicate with the I/O device controller.

8: Exceptions and I/O Page 9


8.6 In-Lab Tasks
1. Just before dividing $s0 by $s1, the programmer might want to check that $s1 is not equal to
zero. Use teq $s1, $zero (trap if equal) to cause an exception. What is the address of the teq
instruction in your program? What is the value of the cause register, the exception code, and
the epc when the exception occurs?
2. Use lb $t1, 5($zero) to cause an exception when attempting to load a byte from address 5.
What is the address of the lb instruction in your program? What is the value of the cause
register, the exception code, the vaddr, and the epc when the exception occurs?
3. Write a complete program that uses the exception handler of Figure 8.7 to generate multiple
exceptions. The exception handler should report the address of the instruction that caused the
exception, the exception code, and should resume the program after handling each exception.
Insert instructions that cause overflow, invalid memory addresses, trap and break instructions.
4. Modify the exception handler of Figure 8.7 to print the invalid vaddr, when it is caused by a
load, store, or instruction fetch (exception code 4 or 5). Test your exception handler by writing
load and store instructions that generate invalid memory addresses.
5. Using memory-mapped I/O and polling, write a function print_string that prints a string on
the display, without using any system call. The address of the string is passed in register $a0
and the string must be null-terminated. Test this function by calling it from the main function.
Make sure to activate the “Keyboard and Display MMIO Simulator”.
6. Using memory-mapped I/O and polling, write a program that reads characters directly from the
keyboard. To demonstrate how slow the keyboard device is, print the character pressed and the
number of iterations after exiting the wait_keyboard loop. Repeat the execution of the
program until the newline character is pressed. Make sure to activate the “Keyboard and
Display MMIO Simulator” and to run the MARS simulator at maximum speed.

7. If the keyboard interrupt enable bit is set then the keyboard will interrupt the processor each
time a key is pressed. Write a simple interrupt handler that returns the character pressed in
register $v0. Rewrite the main function of question 6 using keyboard interrupts.

8.7 Bonus Problem


8. Using memory-mapped I/O and polling, write a function read_string that reads a string
directly from the keyboard. The function will get characters from the keyboard and stores them
in an array pointed by register $a0. The function should continue until n-1 characters are read
or the newline character is pressed. The parameter n should be passed in register $a1. The
function should insert a NULL byte at the end of the string. It should return the actual number
of characters read in register $v0. Make sure to activate the “Keyboard and Display MMIO
Simulator”. Write a main function to test read_string repeatedly, and to print the string after
each call.

8: Exceptions and I/O Page 10


9 Floating-Point

9.1 Objectives

After completing this lab, you will:


• Understand Floating-Point Number Representation (IEEE 754 Standard)
• Understand the MIPS Floating-Point Unit
• Write Programs using the MIPS Floating-Point Instructions
• Write functions that have floating-point parameters and return floating-point results

9.2 Floating-Point Number Representation

Floating-point numbers have the following representation:

S E = Exponent F = Fraction
The Sign bit S is zero (positive) or one (negative).
The Exponent field E is 8 bits for single-precision and 11 bits for double-precision. The exponent
field is biased. The Bias is 127 for single-precision and 1023 for double-precision.
The Fraction field F is 23 bits for single-precision and 52 bits for double-precision. Floating-point
numbers are normalized (except when E is zero). There is an implicit 1. (not stored) before the
fraction F. Therefore, the value of a normalized floating-point number is:

Value = ± (1.F)2 × 2 E – Bias


The MARS simulator has a floating-point representation tool that illustrates single-precision
floating-point numbers. Go to Tools Floating Point Representation, and open the window,
shown in Figure 9.1.
Now use the tool to check the binary format and the decimal value of floating-point numbers.
For example, the decimal value of: 0 10000001 10110100000000000000000 is 6.75.
Similarly, the 32-bit representation of: -2.7531 is 1 10000000 01100000011001011001010.

9: Floating-Point Page 1
Figure 9.1: Floating-Point Representation tool supported by MARS

9.3 MIPS Floating-Point Registers

The floating-point unit (called coprocessor 1) has 32 floating-point registers. These registers are
numbered as $f0, $f1, …, $f31. Each register is 32 bits wide. Thus, each register can hold one
single-precision floating-point number. How can we use these registers to store 64-bit double-
precision floating-point numbers? The answer is that the 32 single-precision registers are grouped
into 16 double-precision registers. The double-precision number is stored in an even-odd pair of
registers, but we only refer to the even-numbered register. For example, when we store a double-
precision number in $f0, it is actually stored in registers $f0 and $f1.
In addition, there are 8 condition flags, numbered from 0 to 7. These condition flags are used by
floating-point compare and branch instructions. These are shown in Figure 9.2.

9: Floating-Point Page 2
Figure 9.2: MIPS Floating-Point Registers and Condition Flags

9.4 MIPS Floating-Point Instructions

The FPU supports several instructions including floating-point load and store, floating-point
arithmetic operations, floating-point data movement instructions, convert, and branch instructions.
We start this section with the floating-point load and store instructions. These instructions load into
or store a floating-point register. However, they use the same base-displacement addressing mode
used with integer instructions. Notice that the base address register is an integer (not a floating-
point) register.

Instruction Example Meaning


lwc1 or l.s lwc1 $f1,0($sp) Load a word from memory to a single-precision
floating-point register: $f1 = MEM[$sp]
ldc1 or l.d ldc1 $f2,8($t1) Load a double word from memory to a double-
precision register: $f2 = MEM[$t1+8]

9: Floating-Point Page 3
Instruction Example Meaning
swc1 or s.s swc1 $f5,4($t2) Store a single-precision floating-point register in
memory: MEM[$t2+4] = $f5

sdc1 or s.d sdc1 $f6,16($t3) Store a double-precision floating-point register in


memory: MEM[$t3+16] = $f6

The floating-point arithmetic instructions are listed next. The .s extension is used for single-
precision arithmetic instructions, while the .d is used for double-precision instructions.

Instruction Example Meaning


add.s add.s $f0,$f2,$f4 $f0 = $f2 + $f4 (single-precision)
add.d add.d $f0,$f2,$f4 $f0 = $f2 + $f4 (double-precision)
sub.s sub.s $f0,$f2,$f4 $f0 = $f2 - $f4 (single-precision)
sub.d sub.d $f0,$f2,$f4 $f0 = $f2 - $f4 (double-precision)
mul.s mul.s $f0,$f2,$f4 $f0 = $f2 × $f4 (single-precision)
mul.d mul.d $f0,$f2,$f4 $f0 = $f2 × $f4 (double-precision)
div.s div.s $f0,$f2,$f4 $f0 = $f2 / $f4 (single-precision)
div.d div.d $f0,$f2,$f4 $f0 = $f2 / $f4 (double-precision)
sqrt.s sqrt.s $f0, $f2 Square root (single-precision)
sqrt.d sqrt.d $f0, $f2 Square root (double-precision)
abs.s abs.s $f0, $f2 Absolute value (single-precision)
abs.d abs.d $f0, $f2 Absolute value (double-precision)
neg.s neg.s $f0, $f2 Negative value (single-precision)
neg.d neg.d $f0, $f2 Negative value (double-precision)

The data movement instructions move data between general-purpose and floating-point registers, or
between floating-point registers.

Instruction Example Meaning


mfc1 mfc1 $t0, $f2 Move data from a floating-point register to a general-
purpose register.
mtc1 mfc1 $t0, $f2 Move data from a general-purpose register to a
floating-point register.
mov.s mov.s $f0, $f1 Move single-precision data between two floating-
point registers.
mov.d mov.d $f0, $f2 Move double-precision data between two floating-
point registers (move even-odd pair of registers).

9: Floating-Point Page 4
The convert instructions convert the format of data in floating-point registers. Three data formats
are supported: .s = single-precision float, .d = double-precision, and .w = integer word.

Instruction Example Meaning


cvt.s.w cvt.s.w $f0,$f2 $f0 = convert $f2 from word to single-precision

cvt.s.d cvt.s.d $f0,$f2 $f0 = convert $f2 from double to single-precision

cvt.d.w cvt.d.w $f0,$f2 $f0 = convert $f2 from word to double-precision

cvt.d.s cvt.d.s $f0,$f2 $f0 = convert $f2 from single to double-precision

cvt.w.s cvt.w.s $f0,$f2 $f0 = convert $f2 from single-precision to word

cvt.w.d cvt.w.d $f0,$f2 $f0 = convert $f2 from double-precision to word

ceil.w.s ceil.w.s $f0,$f2 $f0 = Integer ceiling of single-precision float in $f2

ceil.w.d ceil.w.d $f0,$f2 $f0 = Integer ceiling of double-precision float in $f2

floor.w.s floor.w.s $f0,$f2 $f0 = Integer floor of single-precision float in $f2

floor.w.d floor.w.d $f0,$f2 $f0 = Integer floor of double-precision float in $f2

trunc.w.s trunc.w.s $f0,$f2 $f0 = Truncate single-precision float in $f2

trunc.w.d trunc.w.d $f0,$f2 $f0 = Truncate double-precision float in $f2

The floating-point compare instructions compare floating-point registers for equality, less than, and
less than or equal. The FP compare instructions set the condition flags 0 to 7 to true (1) or false(0).

Instruction Example Meaning


c.eq.s c.eq.s $f2,$f3 if ($f2 == $f3) set flag 0 to true else false

c.eq.d c.eq.s 3,$f4,$f6 Compare equal double-precision. Result in flag 3

c.lt.s c.eq.s 4,$f5,$f8 if ($f5 < $f8) set flag 4 to true else false

c.lt.d c.lt.d 7,$f4,$f6 Compare less-than double. Result in flag 7

c.le.s c.le.s $f10,$f11 if ($f10 <= $f11) set flag 0 to true else false

c.le.d c.le.d $f14,$f16 Compare less or equal double. Result in flag 0

9: Floating-Point Page 5
The floating-point branch instructions (bc1t and bc1f) branch to the target address based on the
value of the specified condition flag (true or false).

Instruction Example Meaning


bc1t bc1t label Branch to label if condition flag 0 is true
bc1t bc1t 1, label Branch to label if condition flag 1 is true
bc1f bc1f label Branch to label if condition flag 0 is false
bc1f bc1f 4, label Branch to label if condition flag 4 is false

9.5 System Call Services for Floating-Point Numbers

The MARS tool provides the following syscall service numbers (passed in $v0) to print and read
single-precision and double-precision floating-point numbers:

Service $v0 Arguments Result


Print Float 2 $f12 = float to print
Print Double 3 $f12 = double to print
Read Float 6 Float is returned in $f0
Read Double 7 Double is returned in $f0

9.6 MIPS Floating-Point Register Usage Convention

Compilers follow the MIPS register usage convention when translating functions and procedures
into MIPS assembly-language code. The following table shows the MIPS software convention for
floating-point registers. Not following the MIPS software usage convention can result in serious
bugs when passing parameters, getting results, or using registers across function calls.

Registers Usage
$f0 - $f3 Floating-point procedure results

$f4 - $f11 Temporary floating-point registers, NOT preserved across procedure calls

$f12 - $f15 Floating-point parameters, NOT preserved across procedure calls. Additional
floating-point parameters should be pushed on the stack.

$f16 - $f19 More temporary registers, NOT preserved across procedure calls.

$f20 - $f31 Saved floating-point registers. Should be preserved across procedure calls.

9: Floating-Point Page 6
9.7 In-Lab Tasks

1. Convert by hand the number -123456789 into its 32-bit single-precision binary representation,
and then use the floating-point representation tool presented in Section 9.2 to verify your
answer. Show your work for a full mark.
2. Convert by hand the floating-point number 1 10010100 10011000001100000000000
(shown in binary) into its corresponding decimal value, and then use the floating-point
representation tool presented in Section 9.2 to verify your answer. Show your work for a full
mark.
3. Trace the following program by hand to determine the values of registers $f0 thru $f9. Notice
that array1 and array2 have the same elements, but in a different order. Comment on the
sums of array1 and array2 elements computed in registers $f4 and $f9, respectively. Now
use the MARS tool to trace the execution of the program and verify your results. What
conclusion can be made from this exercise?
.data
array1: .float 5.6e+20, -5.6e+20, 1.2
array2: .float 1.2, 5.6e+20, -5.6e+20
.text
la $t0, array1
lwc1 $f0, 0($t0)
lwc1 $f1, 4($t0)
lwc1 $f2, 8($t0)
add.s $f3, $f0, $f1
add.s $f4, $f2, $f3
la $t1, array2
lwc1 $f5, 0($t1)
lwc1 $f6, 4($t1)
lwc1 $f7, 8($t1)
add.s $f8, $f5, $f6
add.s $f9, $f7, $f8

4. Write an interactive program that inputs an integer sum and an integer count, computes, and
displays the average = (float) sum / (float) count as a single-precision floating-
point number. Hint: use the proper convert instruction to convert sum and count from integer
word into single-precision float.
5. Write an interactive program that inputs the coefficient of a quadratic equation, computes, and
displays the roots of the quadratic equation. All input, computation, and output should be done
using double-precision floating-point instructions and registers. The program should handle the
case of complex roots and displays the results properly.

9: Floating-Point Page 7
6. Square Root Calculation: Newton’s iterative method can be used to approximate the square root
of a number x. Let the initial guess be 1. Then each new guess can be computed as follows:
guess = ((x/guess) + guess) / 2;
Write a function called square_root that receives a double-precision parameter x, computes,
and returns the approximated value of the square root of x. Write a loop that repeats 20 times
and computes 20 guess values, then returns the final guess after 20 iterations. Use the MIPS
floating-point register convention (Section 9.6) to pass the parameter x and to return the
function result. All computation should be done using double-precision floating-point
instructions and registers. Compare the result of the sqrt.d instruction against the result of
your square_root function. What is the error in absolute value?

9.8 Bonus Problems

7. The sine function can be approximated by the following series:

Write a function that computes the sine of a parameter x. Use the MIPS floating-point register
convention (Section 9.6) to pass the parameter x and to return the function result. All
computation should be done using double-precision floating-point instructions and registers.
Limit your computation to the first 20 terms of the series.
8. Converting a string into a floating-point number.
Write a function to convert a string, such as: "-13.232e-5" into a double-precision floating-
point number. The address of the string should be passed in register $a0. The function should
return the double-precision floating-point number in $f0. Conversion should terminate if the
end of the string is reached (NULL byte), or an invalid character is encountered, such as a
space, comma, etc.

9: Floating-Point Page 8
10 Introduction to Logisim

10.1 Objectives

• Get familiar with Logisim


• Use Logisim to model and simulate the behavior of digital logic circuits

10.2 Logisim Environment Layout

Logisim is an educational and user friendly tool. It mainly consists of an interactive graphical
schematic editor and logic simulator. Logisim has a layout similar to most available software tools (
Figure 1).

Figure 10.1: Logisim Main Window Layout

10: Logisim Tutorial Page 1


The main window consists of the following items:

• Toolbar: contains short cuts to several commonly used items


o The simulation tool: shaped like a hand, is used in simulation mode to alter input pins.
o The design tool: is used while designing the circuit.
o The input pin: green circle surrounded by a square box, is used to send a signal through a
wire. When placing the input on the canvas it initializes the input to logic 1 or 0. The
number of bits can be increased in the Attribute Table.
o The output pin: green circle surrounded by a circular shape, is used to observe the output
from a gate or a block. The output pin toggles in real time as long as the simulation is
enabled from the menu bar: Simulate > Simulation Enabled

• Explorer Pane: This pane contains the list of wiring, gates, multiplexers and other
components that are available for digital design in Logisim.

• Attribute Table: Gives detailed attributes of digital design components (e.g., AND, OR,
XOR gates). The attribute table allows you to alter the number of inputs/outputs that a
digital component may have.

• Canvas: The canvas is the area for you to create your digital circuits. In this area you may
simulate your circuits while designing in real time.

10.3 Circuit Design

Logisim operates in two modes: Design and Simulate. Logisim is first operated in edit mode while
designing a logic circuit, then in simulate mode to see how the circuit would actually work.

10.3.1 Edit Mode


• To use the edit mode, select the Design tool (Figure 10.1).
• Select then a component in the library on the left. To add it in your design, simply drag and
drop the desired component in the canvas:
o To insert a logic gate, expand the folder Gates and click on the desired logic gate.
Once the gate is selected, the mouse will turn into the shape of the selected gate.
Place the gate in the canvas or circuit area on the right.
o Each selected component has modifiable attributes in the attribute area. For example
for an AND gate, the number of input signals, the number of bits per input or output
and the size of the component can all be modified.
o It is also possible to copy and paste of one or more components in the canvas. In this
case, the components retain all the attributes previously defined.
• To connect the gates, select the arrow icon on top. Then drag from the output of one gate to
the input of another gate. You can connect to any of the small blue dots on the gate symbol.
• To create an input to the circuit, select the “Add Pin" icon with an outline of a square.
• Similarly, add an output to the circuit by using the “Add Pin" icon with an outline of a
circle.
• To assign a name to the input/output pin, click on the pin while the arrow icon is selected.
• You may then add the label for the pins.
• While components are added to the circuit, notice how the color of the wire changes (Table
10.1).

10: Logisim Tutorial Page 2


Table 10.1: Logisim Wire Color Conventions

Wire Color Meaning


dark green logical `0'
light green logical `1'
blue unknown value
gray unconnected wire
black wire has more than one bit (bus)
red conflicting values, error
orange incompatible bus width

10.3.2
0.3.2 Full Adder Design

In Logisim, a digital circuit may be designed using either one of two approaches:

• Drawing schematic: the user has to do the design by choosing the necessary components as
per his design, based on the digital schematic.
• Analyzing the circuit, by providing the truth tables with necessary inputs and outputs and let
Logisim do the design.

Schematic Design:

a. Set up the truth table for a simple Full Adder. Let A B and Cin be the inputs, S and Co the
outputs.

A B Ci S Co
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

b. Derive the logic equations for S and Cout:

= ⨁ ⨁ = ∧ + ∧ ⨁

c. Finally, design the circuit based on the above equations (Figure 10.2)

10: Logisim Tutorial Page 3


Figure 10.2: Full Adder Circuit Design

Circuit Analysis:

From the Menu Bar select "Window" then "Combinational Analysis". You may also go to Project
then, select "Analyze Circuit".
a. Create three inputs named Cin, A, and B.
b. Create two outputs named Cout and S.
c. Complete the truth table for a full adder.
d. Look at the "Expression" and "Minimized" tabs.
e. Press the "Build Circuit" button and enter FA0 as the name of the new circuit.

Once your design is complete, you may proceed by testing its functionality:

a. Use the Simulation Tool in the Toolbar.


b. Click on an input to see its state toggle between the values 0 and 1
c. While doing so, check the outputs that they correspond exactly to your design.

10.3.3
10.3.3 Test Mode

Select the Simulation Tool (Figure: 10.1) so you can test the circuit. You can toggle the value of an
input pin by clicking on it. Note that the values of all subsequent wires change instantly.

a. Change the values of the Ci, A, and B inputs


b. Observe the Co and S outputs to verify the correct operation of the circuit

Probes:

As you test your circuit you may also add probes to check the values on wires or buses. These
values may be displayed in different formats.
a. From the explorer pane, select Wiring then Probe.
b. Set a direction for your probe,
c. Then select the numbering system that you would like to use to display your data.
10: Logisim Tutorial Page 4
Figure 10.3: Full Adder Circuit Designed Through Analysis

10.4 Hierarchical Design

A complex circuit is usually broken down into smaller sub-circuits. These sub-circuits are designed
independently, tested, arranged into libraries, and then used to build the big circuit.

10.4
10.4.1 Circuit Appearance

Later on, you may use your circuit as a sub-circuit in another design. For that purpose, you may
want to alter the locations of the inputs and outputs or the appearance of the whole circuit as a
module. To do so, while your design is on the editor, click the Appearance Tool under the toolbar
(Figure 10.4). This will show your circuit as a block with the inputs and outputs; once you click any
of the input/output pins, on the bottom right corner your circuit will appear showing the location of
the pin you selected.

10: Logisim Tutorial Page 5


Figure 10.4: Design and Appearance Tools

You may than modify the following properties of your circuit appearance: pin locations, general
appearance, and color.

10.4
10.4.2 Using Sub-
Sub-Circuits

One of the good features of Logisim is that you can use a circuit you have already built as a
building block in another, more complex circuit.

You can create a new sub-circuit with (Project > Add Circuit)
a. Save all your sub-circuits in one folder that you will make your own library.
b. On the menu bar select load Project > Load Library > Logism-Library then select your
folder and the module you would like to add.
c. Your circuit will appear as a new element in the explorer pane. Select your module then
drag and drop.
d. You may then add pins for input and output, or connect your module to your circuit using
wires.

Within the newly created full adder FA0 circuit designed above (Figures 10.2 or 10.3), change the
orientation of the Ci input so that it is facing south. Change the orientation of the Co output so that
it is facing north (Figure 10.5). Your circuit is ready to be used as a module or sub-circuit in a more
complex digital system design.

Figure 10.5: Full Adder Block

10.4.3
10.4.3 Splitters and Tunnels:

These are components mainly used to simplify wiring. Splitters are used to group or separate bits in
a bus, and tunnels are used to avoid lengthy connections between components in the design.

Splitters:

10: Logisim Tutorial Page 6


Wire splitters split a multi-bit wire into smaller wires (an 8-bit wire into 2 x 4-bit wires, for
example). They can also combine multiple wires into a single wire with a wider bit width.
• To use a splitter from the menu select (Wiring > Splitter)
• Set the fan out and the bit width in properties.
o "Fan Out" controls how many teeth the wire splitter has.
o "Bit Width In" controls the actual size of the data the splitter is splitting.
• If you have more data bits than outputs, some of the fan outs would carry more than one
data bit.

Tunnels:

Tunnels are you to transmit signals without using wires.

• From the menu, select (Wiring > Tunnel)


• Tunnels are mainly used to avoid excessive wiring, but can be used to connect control and
clock signals.

10.5 In-Lab Design


10.5
10.5.1 A 32-
32-Bit Adder Design:

• Using the full adder already designed as a block, we intend to design an 8-bit ripple carry
adder.
a. Add three splitters to the circuit. Each splitter should have an input width of 8 bits and
a fan out of 8.

10: Logisim Tutorial Page 7


b. Attach two east-facing splitters to the 8-bit inputs A and B.
c. Attach a west-facing splitter to the 8-bit output S.
d. Create eight instances of the FA circuit.
e. Connect the S outputs of the eight FA instances to the splitter for the 8-bit S output.
f. Connect the carry inputs and outputs of the eight FA instances so that carries will
propagate appropriately from the Ci input, through the 1-bit adders, to the Co output.
g. Connect the A inputs of the eight FA instances to the splitter for the A input.
h. Connect the B inputs of the eight FA instances to the splitter for the B input.
i. Test your circuit

• Finally save your design as 1 module or sub-circuit


• Using the above adder as a block, design a 32 bit adder then test it.

10.5
10.5.2 A 2-Bit Multiplier Design:

Using the Combinational Analysis option in Logisim, design a 2 bit multiplier. First you need to set
up the truth table of your multiplier by choosing appropriate names for your inputs and outputs.
Derive the logical equations for all your outputs. Use the Design option to let Logisim design the
multiplier. Test your circuit and make it as one complete module. Compare your design to the
multiplier provided in Logisim.

10.5
10.5.3 Bonus Question:
Question:

Make use of your multiplier to design a 4 bit multiplier, test it on compare to the built-in multiplier
provided in Logisim.

10: Logisim Tutorial Page 8


10 Introduction to Logisim

10.1 Revised In-Lab Tasks

1. 2×2 Multiplier Design (2 pts)

Design a 2-bit multiplier in logisim. First you need to set up the truth table of your multiplier by
choosing appropriate names for your inputs and outputs. Derive the logical equations for all your
outputs. Design the 2-bit multiplier using logic gates. Test your circuit and make it as one complete
module. Compare your design to the multiplier provided in Logisim.

2. Register File (3 pts)

Design an 8×32-bit register file. The register file should contain 8 registers: R0 to R7 (each having
32 bits). It should have two read ports to read any two registers and one write port. Model the
register file in Logisim. Test the register file for correct operation by reading and writing different
register combinations.

3. ALU Design (5 pts)

Design a 32-bit ALU to perform the following eight operations: ADD, SUB, SLT, SLTU, AND,
OR, XOR, and NOR. The ALU should have two 32-bit inputs A and B, and a 3-bit input function f.
It should have a 32-bit output result, and four flags: output carry, overflow, negative sign, and zero
output. Model the ALU in Logisim. Test the ALU for correct operation by providing different
inputs and operations.

Bonus: 4×4 Multiplier Design (2 pts)

Design and model a 4×4 multiplier using four 2×2 multipliers and additional adders. Test the 4×4
multiplier using different inputs.

10: Introduction to Logisim Page 1


Data Path Main
11 Components Design

11.1 Objectives

After completing this lab, you will:


• Design a 32x 32 bit register file
• Design a 32 bit arithmetic and logic unit (ALU)

11.2 Register File


The register file consists of 32 x 32-bit registers and has the following interface as shown in Figure
11.1:
BusA and BusB: 32-bit output busses for reading 2 registers
BusW: 32-bit input bus for writing a register when RegWrite is 1
RA selects register to be read on BusA
RB selects register to be read on BusB
RW selects the register to be written

Thus, two registers are read and one register is written in a single cycle. Writing happens on the
rising edge of the clock. During read operation, the register file bahaves as a combinational block
and once the RA or RB have valid data, the content of the read register will appear on BusA or
BusB after a certain access time.

Figure 11.1: 32x32-bit register file interface.

11. Data Path Main Components Design Page 1


The 32 x 32-bit register file design is given in Figure 11.2. It should be observed that register 0 is a
constant. Each of BusA and BusB is connected to 32 tri-state buffers. Each tr-state buffer is
connected to one of the 32 registers. The tri-state buffers enable signals are driven by the outputs of
two 5x32 decoders, one with Ra input and the other with Rb input, to select which register puts its
value on the corresponding bus. The enable signals of the 31 registers (register 1 to 31) are anded
with the output of a 5x32 decoder and RegWrite signal. The 5x32 decoder input is connected to Rw
to select which register should be written.

Figure 11.2: 32x32-bit register file design.

11.3 Arithmetic and Logic Unit (ALU) Design

The Arithmetic and Logic Unit (ALU) is the unit where most instructions are executed. It mainly
performs arithmetic, logic and shift operations. The ALU will have four main blocks: Arithmetic,
Comparator, Logic, and Shifter blocks as illustrated in Figure 11.3.

Arithmetic Block
The arithmetic block is composed of a 32-bit adder that can perform 32-bit addition and subtraction.
Its internal implementation can be designed using a ripple carry adder composed of 32 full-adder
blocks. The inputs A and B are two 32- bit integers and the output F is A+B or A-B. The arithmetic
block has a control signal, ADD/SUB, to determine whether the operation to be performed is
addition or subtraction. If this signal is 0, the adder will perform addition, otherwise it will perform
subtraction. Note that subtraction is performed using 2's complement representation as A-B= A + B'
+ 1. B' is computed by the XOR gates in the arithmetic block. The adder also generates Carry-Out
(Cout) and Overflow signal that can be used to test for correctness of the obtained result and for

11. Data Path Main Components Design Page 2


comparison purposes for unsigned and signed operations. Note that Overflow can be generated by
XORing the last two carry-out signals (i.e. the carry-outs of bits 30 and 31).

Figure 11.3: Arithmetic and Logic Unit (ALU) design.

Comparator Block
This block is mainly used for comparing signed and unsigned numbers. For the MIPS CPU, we just
need to compare if a number is less than another number for implementing the set on less than
instructions (SLT, SLTU, SLTI, SLTIU). For unsigned comparison of two numbers A and B, we
need to perform a subtraction operation, A- B, and then check whether we have a Carry-Out (Cout)
or not. If Carry-Out=0, this means that there was a borrow when B is subtracted from A and thus A
< B. However, if Carry-Out=1 then this implies that there was no borrow and hence A ≥ B.

Similarly, for comparing signed numbers A and B, we perform the subtraction operation A – B and
then we check both the Sign of the result and the Overflow signal. The Sign of the result is the most
significant bit of the result (i.e. but 31). If the Sign value is not equal to the Overflow value, then A
< B, otherwise, A ≥ B. This can be done by XORing the Sign and the Overflow signals. If the result
is 1, this means that A < B.

11. Data Path Main Components Design Page 3


Logical
Logical Block
The logical block performs the logic operations, AND, OR, NOR, and XOR, for implementing the
MIPS logic instructions. Functions in this block are performed using the basic logic gates. Logic
gates can have up to 32 inputs in Logisim, and each input may have up to 32 bits. In our case, the
gates are implemented as gates having 2 inputs, each having 32 bits. Figure 11.4 shows a model of a
2-input 8-bit AND gate in Logisim.

Figure 11.4: A 2-input 8-bit AND Block.

Shifter
Shifter Block
The shifter block is used to implement MIPS shift instructions (SLL, SRL, SRA). A shift operation
takes a binary number and shifts it left or right by a specified number of bits. There are two main
kinds of shift operations: logical,and arithmetic.
• Logical shift: whenever bits are shifted to the left or right, 0's are injected.
• Arithmetic shift: when bits are shifted to the left, 0's are injected, however when bits are
shifted to the right the sign bit (i.e., most significant bit) is injected.

The functionality of logical and arithmetic shit instructions is illustrated in Figure 11.5.

Figure 11.5: Functionality of logical and arithmetic shift instructions.

Logisim provides blocks for performing shift operations that can be used in the design of the Shifter
block.

11. Data Path Main Components Design Page 4


Multiplexor Block

A 32-bit 4x1 Multiplexor is used to select the output from either the Arithmetic block, the
Comparator block, the Logical block or the Shifter block. This is done through a 2-bit ALU
selection signal.

Zero Flag Detector Block


The role of this block is to set the zero-flag bit to 1 whenever the output of any operation is equal to
zero. This could easily be designed using a NOR gate at the output.

11.4 In-Lab Tasks

You are required to design a 32-bit MIPS-like processor with 31 general-purpose registers. The first
building blocks of the CPU are the ALU and the register file.

1. Task 1:

- Model the 32x32-bit register file given in Figure 11.2 as one single module in Logisim
- Test the register file for correct operation by writing to and reading from different register
combinations.

2. Task 2: Arithmetic and Logical Unit (ALU) Design

- Design a 32-bit ALU to perform all the arithmetic, logic and shift operations required by
your data path
- Model the your designed 32-bit ALU in Logisim
- Test the correct functionality of all operations implemented by the ALU.

11. Data Path Main Components Design Page 5


11.5 Bonus Question

One possible implementation of the shifter known as the Barrel Shifter is given in Figure 11.6. This
architecture has the advantage of performing a number of operations using the same hardware. You
are required to design such shifter and adapt it to your design. You need then to use it instead of the
shifter made up of available shifters in Logisim.

The shifter is implemented with multiplexers and wiring (splitters in our design), the shift
operations can be: SLL, SRL, SRA, or ROR. The input data is extended to 63 bits according to Shift
Op, and the 63 bits are shifted right according to S4S3S2S1S0

Figure 11.6: Barrel shifter.

The Input data is extended from 32 to 63 bits as follows:

• If shift op = SRL then ext_data[62:0] = {031 , Data[31:0]}

• If shift op = SRA then ext_data[62:0] = {Data[31]31 , Data[31:0]}

• If shift op = ROR then ext_data[62:0] = {Data[30:0] , Data[31:0]}

• If shift op = SLL then ext_data[62:0] = {Data[31:0] , 031}

- For SRL, the 32-bit input data is zero-extended to 63 bits

- For SRA, the 32-bit input data is sign-extended to 63 bits

- For ROR, 31-bit extension = lower 31 bits of data

- Then, shift right according to the shift amount

11. Data Path Main Components Design Page 6


- As the extended data is shifted right, the upper bits will be: 0 (SRL), sign-bit (SRA), or
lower bits of data (ROR)

11. Data Path Main Components Design Page 7


Single-Cycle CPU
12 Design
12.1 Objectives

After completing this lab, you will:


• Learn how to design a single-cycle CPU
• Verify the correct operation of your single-cycle CPU design

12.2 Subset of the MIPS Instructions included in CPU Design

In this section, we will illustrate the design of a single-cycle CPU for a subset of the MIPS
instructions, shown in Table 12.1. These include the following instructions:

ALU instructions (R-type): add, sub, and, or, xor, slt


Immediate instructions (I-type): addi, slti, andi, ori, xori
Load and Store (I-type): lw, sw
Branch (I-type): beq, bne
Jump (J-type): j

Although this subset does not include all the integer instructions, it is sufficient to illustrate the
design of datapath and control. Concepts used to implement the MIPS subset are used to construct a
broad spectrum of computers. For each instruction to be implemented, you need to identify all the
steps that need to be performed for the execution of each instruction expressed in register transfer
level (RTL) notation. These steps are summarized below for all the instructions to be implemented:

R-type Fetch instruction: Instruction ← MEM[PC]


Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)
Execute operation: ALU_result ← func(data1, data2)
Write ALU result: Reg(Rd) ← ALU_result
Next PC address: PC ← PC + 4

12: Single-Cycle CPU Design Page 1


Table 12.1: MIPS instructions subset implemented in CPU design.

I-type Fetch instruction: Instruction ← MEM[PC]


Fetch operands: data1 ← Reg(Rs), data2 ← Extend(imm16)
Execute operation: ALU_result ← op(data1, data2)
Write ALU result: Reg(Rt) ← ALU_result
Next PC address: PC ← PC + 4
BEQ Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)
Equality: zero ← subtract(data1, data2)
Branch: if (zero) PC ← PC + 4 + 4×sign_ext(imm16)
else PC ← PC + 4
LW Fetch instruction: Instruction ← MEM[PC]
Fetch base register: base ← Reg(Rs)
Calculate address: address ← base + sign_extend(imm16)
Read memory: data ← MEM[address]
Write register Rt: Reg(Rt) ← data
Next PC address: PC ← PC + 4
SW Fetch instruction: Instruction ← MEM[PC]
Fetch registers: base ← Reg(Rs), data ← Reg(Rt)

12: Single-Cycle CPU Design Page 2


Calculate address: address ← base + sign_extend(imm16)
Write memory: MEM[address] ← data
Next PC address: PC ← PC + 4
Jump Fetch instruction: Instruction ← MEM[PC]
Target PC address: target ← PC[31:28] , Imm26 , ‘00’
Jump: PC ← target

12.3 Data Path Design

The first step in designing a datapath is to determine the requirements of the instruction set in terms
of components. These include the following:

Memory
Instruction memory where instructions are stored
Data memory where data is stored
Registers
32 × 32-bit general purpose registers, R0 is always zero
Read source register Rs
Read source register Rt
Write destination register Rt or Rd
Program counter PC register and Adder to increment PC
Sign and Zero extender for immediate constant
ALU for executing instructions

The needed components are summarized below:

Combinational Elements
ALU, Adder
Immediate extender
Multiplexers
Storage Elements
Instruction memory
Data memory
PC register
Register file

12: Single-Cycle CPU Design Page 3


We can now assemble the datapath from its components. For instruction fetching, we need:

Program Counter (PC) register


Instruction Memory
Adder for incrementing PC

The implementation of the instruction fetch process is illustrated in Figure 12.1. Since all the MIPS
instructions are 32-bit instructions (i.e. each instruction is stored in 4 address locations) and since
the instruction memory will be aligned on 4-byte boundary, the least significant 2-bits of instruction
addresses will always be 0. Thus, it is sufficient the update the most significant 30 bits of the PC.

Figure 12.1: Data path component for instruction fetching.

To execute R-type instructions, we need to read the content of registers Rs and Rt, perform an ALU
operation on their contents and then store the result in the register file to register Rd. The datapath
for executing R-type instructions is shown in Figure 12.2.

Figure 12.2: Data path implementation of R-type instructions.

12: Single-Cycle CPU Design Page 4


The control signals needed for the execution of R-type instructions are:

ALUCtrl is derived from the funct field because Op = 0 for R-type


RegWrite is used to enable the writing of the ALU result

The execution of the I-type instructions is similar to the R-type instructions with the difference that
the second operand is an immediate value instead of a register and that the destination register is
determined by Rt instead of Rd. The 16-bit immediate value needs to be extended to a 32-bit value
by either adding 16 0's or by extending the sign bit. The datapath for the execution of I-type
instructions is given in Figure 12.3.

Figure 12.3: Data path implementation of I-type instructions.

The control signals needed for the execution of I-type instructions are:

ALUCtrl is derived from the Op field


RegWrite is used to enable the writing of the ALU result
ExtOp is used to control the extension of the 16-bit immediate

Next we combine the datapath for executing both the R-type and I-type instructions as shown in
Figure 12.4. A multiplexer is added to select between Rd and Rt to be connected to Rw in the
register file to determine the destination register. Another multiplexer is added to select the second
ALU input as either the source register Rt data on BusB or the extended immediate.

The control signals needed for the execution of either R-type or I-type instructions are:

ALUCtrl is derived from either the Op or the funct field


RegWrite enables the writing of the ALU result
ExtOp controls the extension of the 16-bit immediate
RegDst selects the register destination as either Rt or Rd
ALUSrc selects the 2nd ALU source as BusB or extended immediate

12: Single-Cycle CPU Design Page 5


Figure 12.4: Data path implementation of R-type and I-type instructions.

To execute the load and store instructions, we need to add data memory to the datapath. For the load
and store instructions, the ALU will be used to compute the memory address by adding the content
of register Rs coming through BusA and the sign-extended immediate value. For the load
instruction, we need to write the output of the data memory to register file. Thus, a third multiplexer
is added to select between the output of the ALU and the data memory to be written to the register
file. BusB is connected to Datain of Data Memory for store instructions. The updated CPU with the
capability for executing load and store instructions is shown in Figure 12.5.

Figure 12.5: Data path implementation with load/store instructions.

The additional control signals needed for the execution of load and store instructions are:

MemRead for load instructions


MemWrite for store instructions
MemtoReg selects data on BusW as ALU result or Memory Data_out

For executing jump and branch instructions, we need to add a block, called NextPC, to compute the
target address. In addition, we need to add a multiplexer to select the input to the PC register to be
either the incremented PC address or the target address generated by NextPC block. For branch

12: Single-Cycle CPU Design Page 6


instructions, the ALU is used to perform subtract operation to subtract the content of the two
compared registers Rs and Rt. The updated data path to include the execution of the jump and
branch instructions is given in Fig 12.6.

Figure 12.6: Data path implementation with jump/branch instructions.

The additional control signals needed for the execution of jump and branch instructions are:

J, Beq, Bne for jump and branch instructions


Zero condition of the ALU is examined
PCSrc = 1 for Jump & taken Branch

The details of the NextPC block are illustrated in Fug. 12.7. For the jump instruction, the target
address is computed by concatenating the upper 4 bits of PC with Imm26 (i.e. the 26-bit immediate
value). However, for branch instructions the target address is computed by adding the sign-extended
version of the 16-bit immediate value with the incremented value of PC. Note that the immediate
value is computed by the assembler as [Terget – (PC + 4)]/4. Thus, to restore the target address we
need to multiply the immediate value by 4 (i.e. shift it 2 bits to the left) and then add PC+4 to it.
Since we are updating the most significant 30-bits of PC, this is achieved by adding PC+1 to the
immediate value. The PCSrc signal is set when a branch instruction is taken or a jump instruction is
executed, which is implemented by the equation PCSrc = J + (Beq . Zero) + (Bne . Zero').

12: Single-Cycle CPU Design Page 7


Figure 12.7: Implementation of Next PC block.

12.4 Control Unit Design

The control unit of the single-cycle CPU can be decomposed into two parts Main Control and ALU
Control. The Main Control unit receives a 6-input opcode and generates all the needed control
signals other than the ALU control. However, the ALU Control gets a 6-bit function field from the
instruction and ALUCtrl signal from the Main Control. The single cycle CPU including the
datapath and control unit is illustrated in Figure 12.8.

Figure 12.8: Single-cycle CPU.

12: Single-Cycle CPU Design Page 8


To design the Main Control unit, we need to generate the control table which lists for each
instruction, the control values needed to execute the instruction. This is illustrated in Table 12.2.

Table 12.2: Main Control Signal Values.

One we have the Control Table, the control unit can be design easily using a 6x64 decoder that has
the 6-bit opcode as input and a signal for each instruction as output. Then each control signal will
be either an OR gate of the instructions signals that make this signal 1 or a NOR gate of the
instructions signals that make this signal 0, which ever results in a smaller gate size. The decoder
and the logic equations for the Main Control signals are shown in Figure 12.9.

Figure 12.9: Main control unit design.

12: Single-Cycle CPU Design Page 9


Similarly, the ALU control signals equations can be derived based on the 6-bit function field and
the ALUOp signal generated by the Main Control unit.
It should be observed that the control unit signals equation can also be derived using K-map
technique without using a decoder. However, using a decoder makes the design of the control unit
simple.

12.5 In-Lab Tasks

1. For the instructions in the CPU that you are going to design, list all the steps that are needed
for the execution of each instruction in RTL notation.
2. Ensure that you have all the needed components for constructing your datapath.
3. Design the datapath for your CPU and model it using logisim.
4. Apply the needed values for the control signals needed for the execution of each instruction
to ensure correct functionality of the datapath.
5. Design the control unit of your CPU and model it using logisim.
6. Test the correct functionality of the control unit by ensuring that it generates the correct
control signal values for each instruction.
7. Model the single cycle CPU design in logisim by combining the datapath and control units.
8. Test the correct functionality of your CPU by storing all the implemented instructions in the
instruction memory and verifying the correct execution of each instruction.

12: Single-Cycle CPU Design Page 10


Pipelined CPU Design
13 with Data Forwarding
13.1 Objectives

After completing this lab, you will:


• Learn how to design a pipelined CPU
• Learn the different types of pipeline hazards
• Implement Forwarding to handle data hazards
• Verify the correct operation of your pipelined CPU design

13.2 Pipeline Data Path

The single-cycle data path design can be pipelined into a 5-stage pipeline by introducing registers at
the end of each stage as shown in Figure 13.1.

Figure 13.1: Pipelined Data Path.

It should be observed that the destination register number is also pipelined by saving it across stages
as the writing of the content of the register is done at stage 5. In addition, the incremented value of
PC is also pipelined across stage 2 and 3 as it is need by the Next PC block.

13: Pipelined CPU Design with Data Forwarding Page 1


13.3 Pipelined Control

The control signals are generated by the control unit in the second stage (i.e. ID state). In order to
pipeline the control unit, we need to save all the control signals needed by the later stages in
registers. For example, the control signals ExtOp, ALUSrc, ALUCtrl, J, Beq, Bne, MemRead,
MemWrite, Memtoreg and RegWrite need to be saved in the register separating stage 2 and stage 3.
However, only the control signals MemRead, MemWrite, Memtoreg and RegWrite are saved in the
register separating stages 3 and 4 as the remaining control signals are used in stage 3 and are not
needed in stages 4 and 5. The pipelined data path and control unit CPU is shown in Figure 13.2.

Figure 13.2: Pipeline Data Path and Control logic.

13.4 Pipeline Hazards

Hazards are situations that would cause incorrect execution if next instruction were launched during
its designated clock cycle. Hazards can be classified into three main types:

1. Structural hazards
Caused by resource contention

13: Pipelined CPU Design with Data Forwarding Page 2


Using same resource by two instructions during the same cycle

2. Data hazards
An instruction may compute a result needed by next instruction
Hardware can detect dependencies between instructions

3. Control hazards
Caused by instructions that change control flow (branches/jumps)
Delays in changing the flow of control

Hazards complicate pipeline control and limit performance. Dependency between instructions
causes a data hazard. An example of a data hazard is Read After Write – RAW Hazard. An example
of a RAW hazard is given below. Given two instructions I and J, where I comes before J.
Instruction J should read an operand after it is written by I.

I: add $s1, $s2, $s3 # $s1 is written

J: sub $s4, $s1, $s3 # $s1 is read

The RAW Hazard occurs when J reads the operand before I writes it.

Figure 13.3 shows a sample MIPS program with several RAW hazards. The result of sub instruction
is needed by the add, or, and, & sw instructions. The instructions add & or will read old value of
$s2 from reg file as the value of $s has not been updated in the reg file yet. During CC5, $s2 is
written at the end of the cycle, and thus the old value is read. Thus, from this example we can see
that any dependency between an instruction and any of the three following instructions will cause a
RAW hazard.

Figure 13.3: Example of RAW hazard.

13: Pipelined CPU Design with Data Forwarding Page 3


13.5 Handling RAW Data Hazards

One way of handling RAW data hazards is to stall the pipeline until the required destination register
is updated in the reg file. This requires freezing the execution of instructions that have such
dependency for three clock cycles to all the reg file to be updated. Figure 13.4 illustrates an
example of that. Due to the RAW dependency between the add and sub instructions, fetching the
operands of the add instruction has to wait until register $s2 of the sub instruction is updated. This
requires stalling the pipeline for three clock cycles from CC3 to CC5. Stall cycles delay the
execution of the add instruction & fetching of the or instruction. The add instruction cannot read
$s2 until beginning of CC6. The add instruction remains in the Instruction register until CC6 and
the PC register is not modified until the beginning of CC6.

However, instead of stalling the pipeline and wasting clock cycles, RAW hazards can be handled by
observing that the needed data is available in one of the stages 3 to 5 and can be used by forwarding
it to stage 2 instead of waiting until the data is written to the reg file. This idea is illustrated in
Figure 13.5.

Figure 13.4: Pipeline stall due to RAW hazard.

The add instruction takes the content of $s2 from the ALU output. The or instruction takes the
content of $s2 from the output of the DM stage. The and instruction takes the content of $s2 from
the input of the reg file stage 5 and the content of $s6 from the ALU output at stage 2.

To implement forwarding, two multiplexers are added at the inputs of the A & B registers and data
from ALU stage, MEM stage, and WB stage is fed back to these multiplexers as shown in Figure
13.6. Two signals ForwardA and ForwardB control forwarding as shown in Table 13.1.

13: Pipelined CPU Design with Data Forwarding Page 4


Figure 13.5: Example of data forwarding.

It should be observed that current instruction being decoded is in the Decode stage, the previous
instruction is in the Execute stage, the second previous instruction is in the Memory stage and the
third previous instruction in the Write Back stage. Thus, RAW data hazards detection conditions
and the generation of the forwarding control signals can be done as follows:

If ((Rs != 0) and (Rs == Rd2) and (EX.RegWrite)) ForwardA 1


Else if ((Rs != 0) and (Rs == Rd3) and (MEM.RegWrite)) ForwardA 2
Else if ((Rs != 0) and (Rs == Rd4) and (WB.RegWrite)) ForwardA 3
Else ForwardA 0

If ((Rt != 0) and (Rt == Rd2) and (EX.RegWrite)) ForwardB 1


Else if ((Rt != 0) and (Rt == Rd3) and (MEM.RegWrite)) ForwardB 2
Else if ((Rt != 0) and (Rt == Rd4) and (WB.RegWrite)) ForwardB 3
Else ForwardB 0

The hazard detection and forwarding unit is shown in Figure 13.7.

13: Pipelined CPU Design with Data Forwarding Page 5


Table 13.1: Data forwarding signals.

Signal Explanation

ForwardA = 0 First ALU operand comes from register file = Value of (Rs)

ForwardA = 1 Forward result of previous instruction to A (from ALU stage)

ForwardA = 2 Forward result of 2nd previous instruction to A (from MEM stage)

ForwardA = 3 Forward result of 3rd previous instruction to A (from WB stage)

ForwardB = 0 Second ALU operand comes from register file = Value of (Rt)

ForwardB = 1 Forward result of previous instruction to B (from ALU stage)

ForwardB = 2 Forward result of 2nd previous instruction to B (from MEM stage)

ForwardB = 3 Forward result of 3rd previous instruction to B (from WB stage)

Figure 13.6: Implementation of data forwarding.

13: Pipelined CPU Design with Data Forwarding Page 6


Figure 13.7: Hazard detection and forwarding unit.

13.6 In-Lab Tasks

1. Implement RAW hazard detection and the forwarding unit in your pipelined CPU design.
2. Add pipeline registers to your data path CPU design.
3. Add pipeline registers to pipeline the control signals in your CPU design.
4. Verify the correctness of your pipelined CPU design by executing the following instruction
sequence:

ori $s1, $0, 1


addi $s2, $0, 2
xor $s3, $s3, $s3
andi $s4, $0, $0
addi $s5, $s1, 5
add $s6, $s1, $s2
sub $s7, $s1, $s2

How many clock cycles your pipelined CPU takes to execute this program?

13: Pipelined CPU Design with Data Forwarding Page 7


5. Add the two multiplexers needed to implement data forwarding.
6. Implement the forwarding unit and add it to your pipelined CPU.
7. Verify the correctness of your pipelined CPU design including data forwarding unit by
executing the following instruction sequence:

ori $s1, $0, 1


addi $s2, $0, 2
ori $s3, $0, 3
sub $s4, $s3, $s1
add $s5, $s4, $s2
or $s6, $s4 $s5
and $s7, $s3, $s4
sw $s7, 10($s4)

13: Pipelined CPU Design with Data Forwarding Page 8


Pipelined CPU Design
14 with Stall Capability
14.1 Objectives

After completing this lab, you will:


• Implement pipeline stall to handle RAW hazard after Load instruction
• Implement pipeline stall for handling control hazards

14.2 Handling RAW Hazard after Load

Unfortunately, not all data hazards can be forwarded. Load has a delay that cannot be eliminated by
forwarding. In the example shown in Figure 14.1, the LW instruction does not read data until the
end of CC4. Thus, data cannot be forward to ADD instruction at the end of CC3. However, the
Load instruction can forward data to the 2nd next and later instruction.

Figure 14.1: RAW data hazard following a Load instruction.

Whenever a RAW hazard after a Load instruction occurs, we have no choice but to stall the
pipeline. Stalling the pipeline requires that the instruction in the ID stage be replaced by a NOP (No
Operation) instruction and freezing the content of PC and Instruction registers for one clock cycle.
This is achieved by replacing the control signals by 0 signals (called a Bubble) and disabling the PC
and IR registers as illustrated in Figure 14.2. The Condition for stalling the pipeline is as follows:

14: Pipelined CPU Design with Stall Capability Page 1


if ((EX.MemRead == 1) // Detect Load in EX stage
and (ForwardA==1 or ForwardB==1)) Stall // RAW Hazard

Figure 14.2: Pipeline stall due to data hazard following a Load instruction.

14.3 Handling Control Hazards

Jump and Branch can cause great performance loss. Jump instruction needs only the jump target
address while Branch instruction needs two things:
• Branch Result Taken or Not Taken
• Branch Target Address
o PC + 4 If Branch is NOT taken
o PC + 4 + 4 × immediate If Branch is Taken
Jump and Branch targets are computed in EX stage at which point two instructions have already
been fetched. For Jump, the two instructions need to be flushed. For Branch, the two instructions
are flushed if Branch is Taken. Figure 14.3 illustrates an example of handling a control hazard by
replacing the two fetched instructions by bubbles. Control logic detects a Branch instruction in the
2nd Stage. The ALU computes the Branch outcome in the 3rd Stage. Next1 and Next2 instructions
will be fetched anyway. Thus, we need to convert Next1 and Next2 into bubbles if branch is taken.

14: Pipelined CPU Design with Stall Capability Page 2


Figure 14.3: Example of handling a control hazard.

Converting the instruction in the 2nd stage into a bubble can be achieved by replacing the control
signals by a bubble. Converting the instruction in 1st stage into a NOP instruction is achieved by
Resetting the IR register using synchronous reset. The reset is signal is connected to the PCSrc
signal which is set when the branch is taken. Figure 14.4 illustrates the implementation of handling
control hazards.

Figure 14.4: Handling control hazards.

14: Pipelined CPU Design with Stall Capability Page 3


14.4 In-Lab Tasks

1. Implement handling of RAW hazard occurring after LOAD instruction in your pipelined CPU
design.
2. Test the correctness of your implementation by running the MIPS code fragment given below and
verify its correct execution:

I1: ADDI $s0, $0, 10


I2: ADDI $s1, $0, 5
I3: SLL $s0, $s0, 4
I4: LW $s2, 0($s0)
I5: ADD $s2, $s2, $s1
I6: SW $s2, 4($s0)

3. Implement control hazard handling in your pipelined CPU design.


4. Test the correctness of your implementation of handling control hazards by running the following
MIPS code fragment and verify its correct execution:

I1: ADDI $a1, $0, 10


I2: ADD $t5, $0, $0
I3: LW $t2, 0($t5)
I4: ADD $t3, $t2, $t2
I5: SW $t3, 0($t5)
I6: ADDI $a1, $a1, -1
I7: ADDI $t5, $t5, 4
I8: BNE $a1, $0, I3

14: Pipelined CPU Design with Stall Capability Page 4

You might also like