Computer Architecture Lab Manual
Computer Architecture Lab Manual
Computer Architecture Lab Manual
1.1 Objectives
MARS, the MIPS Assembly and Runtime Simulator, is an integrated development environment
(IDE) for programming in MIPS assembly language. It allows editing, assembling, debugging and
simulating the execution of MIPS assembly language programs. MARS is written in Java.
To switch between the Edit and the Execute windows, use the tabs at the top.
There are two tabbed message areas at the bottom of Figure 1.1:
1. The Mars Messages tab: Used for messages such as assembly or runtime errors and
informational messages. You can click on assembly error messages to select the
corresponding line of code in the editor.
2. The Run I/O tab: Used at runtime for displaying console output and entering console input as
program execution progresses.
Figure 1.2 shows the MARS Execute window’s panes, and emphasizes the following features:
4. Controls for navigating the data memory area. Allows switching to view the stack segment.
5. Switching between decimal and hexadecimal addresses and values in memory and registers.
8. Checkboxes used to setup breakpoints for each MIPS instruction. Useful in debugging.
At all times, the MIPS register window appears on the right-hand side of the screen, even when you
are editing and not running a program. While writing a program, this serves as a useful reference for
register names and their use. Move the mouse over the register name to see the tool tips.
The Register File: integer registers $0 through $31, HI, LO, and the Program Counter PC.
If there are syntax errors in the program, they will appear in the Mars Messages tab at the bottom of
the MARS screen. Each error message contains the line and column where the error occurred.
Once a MIPS program assembles successfully, the registers are initialized, and the Text Segment
and the Data Segment are filled, as shown in Figure 1.3.
After running the Assemble command, you can now execute the program. The Run menu and the
toolbar contain the following execution options:
Run > Go Run the program to completion, or until the next breakpoint.
Run > Backstep Single-step backwards: “unexecute” the last executed instruction.
The Run Speed Slider allows running the program at full speed or
slowing it down so you can watch the execution. Affects normal
execution only, not single-step execution.
During execution, the instruction being executed is highlighted in yellow, and the register that was
last modified is highlighted in green. Also, the variable that was last updated in the data segment is
highlighted in blue. It’s usually only possible to see the highlighting when you are stepping or
running at less than full speed.
For more details about the MARS simulator, refer to the MARS documentation at the following
link: http://courses.missouristate.edu/KenVollmar/MARS/
1. Test a simple MIPS program. Consider the following program shown below:
d) What output does the program produce? and where does it appear?
a) Download and assemble the Fibonacci.asm program from the MARS website.
e) Single-step through the program and watch how register and memory values change.
g) Set a breakpoint at the first instruction that prints results. What is the address of this
instruction?
h) Run the program at full speed and watch how it stops at the breakpoint.
to:
# Title:
# Author:
# Date:
# Description:
# Input:
# Output:
################### Data segment #####################
.data
. . .
################### Code segment #####################
.text
.globl main
main: # main function entry
. . .
li $v0, 10
syscall # system call to exit program
.data Defines the data segment of the program, containing the program’s variables.
.text Defines the code segment of the program, containing the instructions.
.globl Defines a symbol as global that can be referenced from other files.
2. Executable Instructions: These generate machine code for the processor to execute at runtime.
Instructions tell the processor what to do.
3. Pseudo-Instructions and Macros: Translated by the assembler into real instructions. These
simplify the programmer task.
In addition there are comments. Comments are very important for programmers, but ignored by the
assembler. A comment begins with the # symbol and terminates at the end of the line. Comments
can appear at the beginning of a line, or after an instruction. They explain the program purpose,
when it was written, revised, and by whom. They explain the data and registers used in the program,
input, output, the instruction sequence, and algorithms used.
Before you can run a MIPS program, you must convert the assembly language code into an
executable form. This involves two steps:
1. Assemble: translate the MIPS assembly language code into a binary object file. This is done by
the assembler. If there is more than one assembly language file, then each should be assembled
separately.
2. Link: combine all the object files together (if there is more than one) as well as with libraries.
This is done by the linker. The linker checks if there are any calls to functions in libraries. The
result is an executable file.
Figure 2.2 summarizes the Edit-Assemble-Link-Run cycle of the program development process. If a
program is written in assembly language, the assembler detects any syntax errors and will report
them to the programmer. Therefore, you should edit your program and assemble it again if there any
syntax errors.
It is typical that the first executable version of your program to have some runtime errors. These
errors are not detected by the assembler but occur when you are running your program. For
example, your program might compute erroneous results. Therefore, you should debug your
program to identify the errors at runtime. You can run your program with various inputs and under
different conditions to verify that it is working correctly. You can use the slow execution mode in
All MIPS instructions are 32-bit wide and occupy 4 bytes in memory. The address of a MIPS
instruction in memory is always a multiple of 4 bytes. There are three basic MIPS instruction
formats: Register (R-Type) format, Immediate (I-Type) format, and Jump (or J-Type) format as
shown in Figure 2.3.
All instructions have a 6-bit opcode that defines the format and sometimes the operation of an
instruction. The R-type format has two source register fields: Rs and Rt, and one destination
register field Rd. All register fields are 5-bit long and address 32 general-purpose registers. The sa
field is used as the shift amount for shift instructions and the funct field defines the ALU function
for R-type instructions.
The I-type format has two register fields only: Rs and Rt, where Rs is always a source register,
while Rt can be a destination register or a second source depending on the opcode. The 16-bit
The J-type format has no register field. The 26-bit Immediate field is used as an address in jump
and function call instructions.
R-Type Format
Op6 Rs5 Rt5 Rd5 sa5 funct6
I-Type Format
Op6 Rs5 Rt5 Immediate16
J-Type Format
Op6 Immediate26
The MIPS architecture defines 32 general-purpose registers, numbered from $0 to $31. The $ sign
is used to refer to a register. To simplify software development, the assembler can also refer to
registers by name as shown in Table 2.1. The assembler converts a register name to its
corresponding number.
The label is optional. It marks the memory address of the instruction. It must have a colon. In
addition, a label can be used for referring to the address of a variable in memory.
The operands specify the data required by the instruction. Different instructions have different
number of operands. Operands can be registers, memory variables, or constants. Most arithmetic
and logical instructions have three operands.
An example of a MIPS instruction is shown below. This example uses the addiu to increment the
$t0 register:
To be able to write programs, a basic set of instructions is needed. Only few instructions are described
in the following tables. Table 2.2 lists the basic arithmetic instructions and Table 2.3 lists basic
control instructions.
Instruction Meaning
add Rd, Rs, Rt Rd = Rs + Rt. Overflow causes an exception.
sub Rd, Rs, Rt Rd = Rs – Rt. Overflow causes an exception.
addi Rt, Rs, Imm Rt = Rs + Imm (16-bit constant). Overflow causes an exception.
li Rt, Imm Rt = Imm (pseudo-instruction).
la Rt, var Rt = address of var (pseudo-instruction).
move Rd, Rs Rd = Rs (pseudo-instruction).
Instruction Meaning
beq Rs, Rt, label if (Rs == Rt) branch to label.
bne Rs, Rt, label if (Rs != Rt) branch to label.
j label Jump to label.
System calls are operating-system specific. Each operating system provides its own set of system
calls. Because MARS is a simulator, there is no operating system involved. The MARS simulator
handles the syscall exception and provides system services to programs. Table 2.1 shows a small
set of services provided by MARS for doing basic I/O.
Before using the syscall instruction, you should load the service number into register $v0, and load
the arguments, if any, into registers $a0, $a1, etc. After issuing the syscall instruction, you should
retrieve return values, if any, from register $v0.
Now, we are ready to write a MIPS assembly language program. A simple program that asks the user
to enter an integer value and then displays the value of this integer is shown in Figure 2.4.
Five system calls are used. The first system call prints string str1. The second system call reads an
input integer. The third system call prints str2. The fourth system call prints the integer value that was
input by the user. The fifth system call exits the program.
1. Modify the program shown in Figure 2.4. Ask the user to enter an integer value, and then print
the result of doubling that number. Use the add instruction.
2. Modify again the program shown in Figure 2.4. Ask the user whether he wants to repeat the
program: "\nRepeat [y/n]? ". Use service code 12 to read a character and the branch
instruction to repeat the main function if the user input is character 'y'.
3. Write a MIPS program that asks the user to input his name and then prints "Hello ", followed
by the name entered by the user.
4. Write a MIPS program that executes the statement: s = (a + b) – (c + 101), where a, b, and c are
user provided integer inputs, and s is computed and printed as an output. Answer the following:
a. Suppose the user enters a = 5, b = 10, and c = -30, what is the expected value of s?
b. Which instruction in your program computed the value of s and which register is used?
c. What is the address of this instruction in memory?
d. Put a breakpoint at this instruction and write the value of the register used for computing s in
decimal and hexadecimal.
5. Write a MIPS program that inputs two integer values. The program should output equal if the
two integers are equal. Otherwise, it should output not equal. Use the branch instruction to
check for equality.
3.1 Objectives
The MIPS add/subtract instructions are shown in Table 3.1 below. The R-type add/sub instructions
have three registers where the first register is the destination register while the other two registers
are the two registers to be added. Similarly the I-type add/sub instructions have the first register as
the destination register, while the second register and the constant are the operands to be either
added or subtracted.
Instruction Meaning
add $s1, $s2, $s3 $s1 = $s2 + $s3
addu $s1, $s2, $s3 $s1 = $s2 + $s3
sub $s1, $s2, $s3 $s1 = $s2 – $s3
subu $s1, $s2, $s3 $s1 = $s2 – $s3
addi $s1, $s2, 10 $s1 = $s2 + 10
addiu $s1, $s2, 10 $s1 = $s2 + 10
The difference between add/sub and addu/subu instructions is that in case of overflow
occurrence, the add/sub instructions will cause an arithmetic exception and the result will not be
written to the destination register. However, for the instructions addu/subu, overflow occurrence
is ignored.
To illustrate the use of the addiu instruction with constants, let us assume that variables a, b, and c
are allocated in registers $s0, $s1, $s2. Then,
The MIPS logical instructions are given in Table 3.2. These include the: and, or, nor and xor
instructions. The operands for these instructions follow the same convention as the add/sub
instructions. The immediate value for the andi, ori, and xori logical instructions is treated as
unsigned constant, while it is treated as a signed constant for the addi and addiu instructions.
Instruction Meaning
and $s1, $s2, $s3 $s1 = $s2 & $s3
or $s1, $s2, $s3 $s1 = $s2 | $s3
xor $s1, $s2, $s3 $s1 = $s2 ^ $s3
nor $s1, $s2, $s3 $s1 = ~($s2|$s3)
andi $s1, $s2, 10 $s1 = $s2 & 10
ori $s1, $s2, 10 $s1 = $s2 | 10
xori $s1, $s2, 10 $s1 = $s2 ^ 10
The truth tables of the and, or, xor and nor logical operations are given below:
As an example of these instructions, assume that $s1 = 0xabcd1234 and $s2 = 0xffff0000.
Then, the following logical instructions produce the shown resulting values in registers $s3 to $s6:
and $s3,$s1,$s2 # $s3 = 0xabcd0000
The sample program to run this code is given below and the resulting register content after
executing the program is shown in Figure 3.1.
.text
.globl main
main: # main program
li $s1, 0xabcd1234 # Pseudo instruction to initialize a register
li $s2, 0xffff0000
and $s3,$s1,$s2 # $s3 = 0xabcd0000
or $s4,$s1,$s2 # $s4 = 0xffff1234
xor $s5,$s1,$s2 # $s5 = 0x54321234
nor $s6,$s1,$s2 # $s6 = 0x0000edcb
Arithmetic and logic instructions have many useful applications. For example, to convert a
character in register $s0 from lower case (i.e. 'a' to 'z') to upper case (i.e. 'A' to 'Z'), we could
use any of the following instructions:
subi $s0, $s0, 0x20 # ASCII code of 'a' = 0x61, of 'A' = 0x41
Similarly, to convert a character in register $s0 from upper case to lower case, we could use any of
the following instructions:
To initialize the content of register $s0 by a 16-bit constant k (i.e., having a value in the range 0 to
215-1), we can use any of the following instructions:
However, to initialize a register with a 32-bit constant, we need to use the lui instruction.
Suppose that we want to initialize $s1 with the constant 0xAC5165D9 (32-bit constant), then we
can use the following two instructions:
This sequence of instructions is generated by the assembler when we use the pseudo instruction:
li $s1, 0xAC5165D9
The MIPS shift instructions are given in Table 3.3. The first operand of the shift instructions is the
destination register, the second operand is the register to be shifted while the third operand specifies
the amount of shift. The amount of shift can be specified as a constant value or it can be stored in a
register. For the instructions sll, srl, sra, the shift amount is a 5-bit constant while for the
instructions sllv, srlv, srav, the shift amount is variable and is stored in a register.
Instruction Meaning
sll $s1,$s2,10 $s1 = $s2 << 10
srl $s1,$s2,10 $s1 = $s2>>>10
sra $s1,$s2,10 $s1 = $s2 >> 10
sllv $s1,$s2,$s3 $s1 = $s2 << $s3
srlv $s1,$s2,$s3 $s1 = $s2>>>$s3
srav $s1,$s2,$s3 $s1 = $s2 >> $s3
Shifting is to move all the bits in a register left or right. sll/srl mean shift left/right logical while
sra means shift right arithmetic for which the sign-bit (rather than 0) is shifted from the left as
illustrated in Figure 3.2.
As an example, let us assume that $s2 = 0xabcd1234 and $s3 = 16. Then, the following shift
instructions produce the shown values in $s1.
As another example, let us multiply the content of $s1 by 31. We can do that using the following
instructions noting that 31=32-1:
We can also use the shift right instructions (srl and sra) to divide a number by a power of 2
constant. Shifting register $s0 right by n bits divides its content by 2n. For example, to divide an
unsigned number in register $s0 by 8, we use the instruction srl $s0, 3. However, to divide a
signed number in register $s0 by 8, we use the instruction sra $s0, 3.
1. Write a program to ask the user to enter two integers A and B and then display the result of
computing the expression: A + 2B - 5.
2. Assume that $s1 = 0x12345678 and $s2 = 0xffff9a00. Determine the content of registers
$s3 to $s6 after executing the following instructions:
or $s4,$s1,$s2 # $s4 =
Write a program to execute these instructions and verify the content of registers $s3 to $s6.
Write a program to execute these instructions and verify the content of registers $s2 to $s4.
4. Write a program that asks the user to enter an alphabetic character (either lower or upper case)
and change the case of the character from lower to upper and from upper to lower and display it.
5. Write a program that asks the user to enter and integer number and read it. Then ask him to
enter a bit position (between 0 and 31) and display the value of that bit.
6. Write a program that asks the user to enter a signed number and read it. Then display the
content of multiplying this number by 24.5.
7. Write a program that asks the user to enter an unsigned number and read it. Then swap the bits
at odd positions with those at even positions and display the resulting number. For example, if
the user enters the number 9, which has binary representation of 1001, then bit 0 is swapped
with bit 1, and bit 2 is swapped with bit 3, resulting in the binary number 0110. Thus, the
program should display 6.
4.1 Objectives
Like all processors, MIPS has instructions for implementing unconditional and conditional jumps.
The MIPS Jump and Branch instructions are shown in Table 4.1.
For unconditional jump, the instruction j label is used where label is the address of the target
instruction as shown below:
. . .
label:
Four additional MIPS instructions are provided based on comparing the content of a register with 0
as follows:
Note that MIPS does not provide the instructions beqz and bnez as these can be implemented
using the beq and bne instructions with register $0 used as the second operand.
Note that the instructions slt and slti are used for signed comparison while instructions sltu
and sltiu are used for unsigned comparison.
For example, assume that $s0 = 1 and $s1 = -1 = 0xffffffff, then the following two
instructions produce different results as shown below:
Pseudo instructions are instructions introduced by an assembler as if they were real instructions. We
have seen an example of a pseudo instruction before, which is the li instruction. Pseudo
instructions are useful as they facilitate programming in assembly language.
For example, the MIPS processor does not have the following useful conditional branch comparison
instructions:
The reason for not implementing these instructions as part of the MIPS instruction set is that they
can be easily implemented based on a set of two instructions.
For example, the instruction blt $s0, $s1, label can be implemented using the following
sequence of two instructions:
Similarly, the instruction ble $s2, $s3, label can be implemented using the following
sequence of two instructions:
Table 4.2 shows more examples of pseudo instructions. Note that the assembler temporary register
$at=$1 is reserved for its own use.
We can translate any high-level flow construct into assembly language using the jump, branch and
set-less-than instructions. For example, let us consider the following if statement:
if (a == b) c = d + e; else c = d – e;
j exit
exit: . . .
We can also implement an IF statement with a compound condition involving logical AND
operation. For example, let us consider implementing the following IF statement:
The IF statement is implemented efficiently using the following assembly code which uses the fall
through concept which skips the execution of the instruction if the first condition is false otherwise
it continues the execution:
next:
The IF statement is implemented efficiently using the following assembly code which checks the
first condition and if it is true, it skips the second condition:
next:
We can also implement all types of loops. Let us consider implementing the following for loop:
loop body
li $s0, 0 # i = 0
ForLoop:
loop body
j ForLoop
EndFor:
i=0;
while (i<n) {
loop body
i++;
We can note that this while loop has identical behavior to the for loop and hence its assembly
code will be identical.
i=0;
do {
loop body
i++;
} while (i<n)
The do-while loop can be translated using the following assembly code:
li $s0, 0 # i = 0
WhileLoop:
loop body
1. Write a program that asks the user to enter an integer and then displays the number of 1's in the
binary representation of that integer. For example, if the user enters 9, then the program should
display 2.
2. Write a program that asks the user to enter two integers: n1 and n2 and prints the sum of all
numbers from n1 to n2. For example, if the user enters n1=3 and n2=7, then the program
should display the sum as 25.
3. Write a program that asks the user to enter an integer and then display the hexadecimal
representation of that integer.
4. The Fibonacci sequence are the numbers in the following integer sequence: 0, 1, 1, 2, 3,
5, 8, 13, 21, 34, 55, 89, 144, ...
The first two numbers are 0 and 1 and each subsequent number is the sum of the previous two.
Write a program that asks the user to enter a positive integer number n and then prints the nth
number in the Fibonacci sequence. The following algorithm can be used:
Input: n positive integer
Output: nth Fibonacci number
Fib0 = 0 Fib1 = 1
for (i=2; i <= n; i++) do
temp = fib0
fib0 = fib1
fib1 = temp + fib1
if (n > 0) fib = fib1
else fib = 0
5. One method for computing the greatest common divisor of two positive numbers is the binary
gcd method, which uses only subtraction and division by 2. The algorithm of the binary gcd is
outlined below:
Write a program that asks the user to enter two positive numbers a and b and outputs the
greatest common divisor of the two numbers by implementing the given algorithm. If the user
enters a=48 and b=18, your program should output the gcd as 6.
5.1 Objectives
Unlike high-level programming languages, assembly language has no special notion for an array.
An array is just a block of memory. In fact, all data structures and objects that exist in a high-level
programming language are simply blocks of memory. The block of memory can be allocated
statically or dynamically, as will be explained shortly.
An array is a homogeneous data structure. It has the following properties:
1. All array elements must be of the same type and size.
2. Once an array is allocated, its size becomes fixed and cannot be changed.
3. The base address of an array is the address of the first element in the array.
4. The address of an element can be computed from the base address and the element index.
An array can be allocated and initialized statically in the data segment. This requires:
1. A label: for the array name.
2. A .type directive for the type and size of each array element.
3. A list of initial values, or a count of the number of elements
A data definition statement allocates memory in the data segment. It has the following syntax:
label: .type value [, value] . . .
Examples of data definition statements are shown below:
Heap Area
0x10040000 Data Segment
0x10000000 Static Area
Text Segment
0x00400000
0x00000000 Reserved
For example, to translate matrix[1][5] = 73 into MIPS assembly language, one must compute:
&matrix[1][5] = &matrix + (1×20+5)×4 = &matrix + 100.
The following while loop searches an array of n integers linearly for a given target value:
int i=0;
while (arr[i]!=target && i<n) i = i+1;
Given that $a0 = &arr (address of arr), $a1 = n, and $a2 = target, the above loop is
translated into MIPS assembly code as follows:
move $t0, $a0 # $t0 = address of arr
li $t1, 0 # $t1 = index i = 0
while:
lw $t2, 0($t0) # $t2 = arr[i]
beq $t2, $a2, next # branch if (arr[i] == target) to next
beq $t1, $a1, next # branch if (i == n) to next
addi $t1, $t1, 1 # i = i+1
sll $t3, $t1, 2 # $t3 = i×4
add $t0, $a0, $t3 # $t0 = &arr + i×4 = &arr[i]
j while # jump to while loop
next:
. . .
To calculate the address of arr[i], the sll instruction shifts left i by 2 bits (computes i×4) and
then the add instruction computes &arr[i] = &arr + i×4. However, one can also point to the next
array element by incrementing the address in $t0 by 4, as shown below. Using a pointer to traverse
an array sequentially is generally faster than computing the address from the index.
The for loop above sets the elements of column 3 to their row numbers. The first four instructions
used in the above for loop are used for address computation. Element addresses are computed
using the address of the first element in each row ($t5) and a fixed offset of 12. However, using a
pointer to traverse a column is much faster than re-computing the address from the index. Because
each row has 5 integer elements, the distance between two consecutive elements in the same
column is 20 bytes. Below, the MIPS assembly code is rewritten to use register $t0 as a pointer.
The number of instructions in the loop is reduced from 7 down to 4.
5.6 Files
The operating system manages files on the disk storage. It provides system calls to open, read from,
and write to files. The MARS tool simulates some of the services of the operating system and
provides the following system calls:
1. Given the following data definition statements, compute the addresses of arr2, arr3, str1,
and str2, given that the address of arr1 is 0x10010000. Show your steps for a full mark.
Select “Show Labels Window (symbol table)” from the Settings menu in MARS to check the
values of your computed addresses.
.data
arr1: .word 5:20
arr2: .half 7, -2, 8, -6
arr3: .space 100
str1: .asciiz "This is a message"
str2: .asciiz "Another important string"
2. In problem 1, given that arr1 is a one-dimensional array of integers, what are the addresses of
arr1[5] and arr1[17]?
3. In problem 1, given that arr3 is a two-dimensional array of bytes with 20 rows and 5 columns,
what are the addresses of arr3[7][2], arr3[11][4], and arr3[19][3]?
4. Write a MIPS program that defines a one-dimensional array of 10 integers in the static area of
the data segment, asks the user to input all 10 array elements, computes, and displays their sum.
7. Write a MIPS program to sort an array of integers in ascending order using the selection sort
algorithm. The array size should be entered by the user. The array should be allocated
dynamically on the heap. The array elements should be generated randomly using the random
number generator. The array elements should be printed before and after sorting.
Sequential binary multiplication is a simple but slow form of multiplication. It is performed using
addition and shift operations as illustrated in Figure 6.1. The multiplication of two 32-bit integers is a
64-bit product stored in two registers: HI and LO.
Register HI is initialized with the value 0, and LO is loaded with the value of the multiplier. The
sequential algorithm is repeated 32 times for each bit of the multiplier. Finally, the product is
computed in two registers HI and LO.
Signed multiplication can also be performed using the same sequential binary multiplier, but with
minor modification. When adding (HI + Multiplicand) the proper sign bit of the result is computed,
instead of the carry bit. When shifting the HI and LO registers to the right, the sign-bit is inserted to
the left of the product. Additions are used for the first 31 steps. However, the last step should use
subtraction (rather than addition), if the sign-bit of the multiplier is negative.
Sequential binary multiplication is slow because it requires one cycle for each bit of the multiplier.
Faster binary multiplication can be done in hardware, as shown in Figure 6.3. The cost of the binary
multiplier increases because it uses many adders, instead of just one used as in Figure 6.2.
Sequential binary division can be performed using shift and subtract operations. Binary division
produces a quotient and a remainder. It also uses two registers HI and LO. The quotient is computed
in the LO register and the remainder is computed in the HI register. HI is initialized with zero and
LO is initialized with the dividend. At each iteration step, registers HI and LO are shifted left by 1
bit. The difference: (HI – Divisor) is computed. If this difference is ≥ 0, the Remainder (HI register)
is set equal to the difference and the least significant bit of the quotient (LO register) is set to 1. The
sequential binary division algorithm is repeated 32 times, as shown in Figure 6.4.
A simple but slow sequential binary divider is shown in Figure 5.5. It uses a 32-bit ALU that does
subtraction. It also uses HI and LO registers, a Divisor register, and simple control logic to shift left
the HI and LO registers and set the least-significant bit of LO. The control logic repeats 32 times to
compute each bit of the quotient in LO. The final remainder will be in the HI register.
Multiplication and division generate results that are larger than 32 bits. Multiplication produces a
64-bit product while division produces a 32-bit quotient and a 32-bit remainder. The MIPS
architecture provides two special 32-bit registers that are the target for integer multiply and divide
instructions. The source operands come from the general-purpose register file. However, the results
are written into HI and LO registers. For multiplication, the HI and LO registers form a 64-bit
product. For division, HI stores the 32-bit remainder, while LO stores the 32-bit quotient.
MIPS defines two multiply instructions: mult for signed multiplication and multu for unsigned
multiplication. Both multiply instructions produce a 64-bit product without overflow. There is also
a third mul instruction that computes the same 64-bit product as a mult instruction in HI and LO
registers. However, the mul instruction also copies the LO register into destination register Rd. The
mul instruction is useful when the product is small and can fit in a 32-bit destination register.
In addition, MIPS defines two integer divide instructions: div for signed division and divu for
unsigned division. The quotient of the integer division is saved in the LO register, while the
remainder is saved in the HI register as shown in Table 6.2.
If the divisor in register Rt is zero, then the MIPS divide instructions do not compute any result in
the HI and LO registers. Division by zero is ignored and no exception is produced. The MIPS
programmer must ensure that the divisor is non-zero when using the integer divide instruction.
Special instructions are used to move data between the HI and LO registers and general-purpose
registers. These are listed in Table 6.3.
Raising an integer number x to a power n (xn), may be computed using successive multiplications.
The following code uses integer multiplication to implement the power function. Registers $a0 and
$a1 are used to store x and n, while $v0 contains the result.
li $a0, 7 # number x
li $a1, 5 # number n
li $v0, 1 # $v0 = 1
pow:
mul $v0, $v0, $a0 # $v0 = $v0, $a0
addiu $a1, $a1, -1 # decrement n
bnez $a1, pow # loop if (n != 0)
The following MIPS loop computes the gcd of two numbers stored in registers $a0 and $a1. The
final result is computed in register $a0.
li $a0, 30 # number a
li $a1, 18 # number b
gcd:
div $a0, $a1 # HI = remainder, LO = quotient
move $a0, $a1 # $a0 = number b
mfhi $a1 # $a1 = a % b (remainder)
bnez $a1, gcd # loop if (b != 0)
.text
main:
la $t0, str # load address of str into $t0
li $v0, 0 # Initialize $v0 = 0
li $v1, 10 # Initialize $v1 = 10
lb $t1, ($t0) # load byte: $t1 = Memory($t0)
str2int:
addiu $t1, $t1, -48 # convert character to a number
mul $v0, $v0, $v1 # $v0 = $v0 * 10
addu $v0, $v0, $t1 # $v0 = $v0 + digit
addiu $t0, $t0, 1 # point to next character in memory
lb $t1, ($t0) # load byte: $t1 = Memory($t0)
bnez $t1, str2int # loop back if not NULL character
done:
. . . # $v0 = integer result
An unsigned integer can be converted to a string by successive divisions by 10. The remainder is a
digit between 0 and 9. The remainder digits are computed backwards starting at the least significant
digit. Each remainder is then converted to an ASCII character and stored in memory in a string. For
example, consider converting the integer 5218 into a string. This can be done as follows:
5128 / 10 = 512, 5128 % 10 = 8 Convert 8 into character '8'
512 / 10 = 51, 512 % 10 = 2 Convert 2 into character '2'
51 / 10 = 5, 51 % 10 = 1 Convert 1 into character '1'
5 / 10 = 0, 5 % 10 = 5 Convert 5 into character '0'
Stop when the quotient is zero
The following MIPS code converts an unsigned integer stored in register $a0 into a string stored in
the data segment in memory. The string is initialized with 10 space characters. The string has 10
characters only because a 32-bit unsigned integer can have at most 10 digits.
1. Write MIPS code to perform the following integer multiplications. What is the value of the LO
and HI registers?
a) 98765 × 54321 using the multu instruction
b) -98765 × -54321 using the mult instruction
2. Write MIPS code to perform the following integer divisions. What is the value of the LO and HI
registers?
a) 98765 / 54321 using the divu instruction
b) -98765 / -54321 using the div instruction
3. Factorial Calculation: Using the mul instruction, write a MIPS program that computes the
factorial of a number n input by the user, and display the result on the screen. Run your code
and record the maximum 32-bit factorial value that can be computed without errors.
4. The string-to-integer program presented in Section 6.5 converts a string of decimal digits to an
unsigned integer using successive multiplications by 10 and additions. It is also possible to
convert a string of digits in any radix system to an integer, using successive multiplications by
the radix value and additions. Rewrite the string-to-integer program asking the user to input a
5. The integer-to-string program presented in Section 6.5 converts an unsigned integer to string
format using successive division by 10 and storing the remainder digit characters in a string. It
is also possible to convert the unsigned integer to any radix using successive divisions by the
radix value. Rewrite the integer-to-string program asking the user to input an unsigned integer
and a radix value between 2 and 10. Do the radix conversion and then print the string. Make
sure that the string has sufficient space characters, especially when converting to radix 2.
6. Fraction computation: Using successive integer multiplications and divisions, write a MIPS
program that divides an integer x by another integer y that are read as input. The result of the
division should be in the form: a.b, where a is the integer part and b is the fractional part.
Compute the fraction b with 8 digits after the decimal point. Display the result in the form a.b.
A function (or a procedure) is a tool that programmers use to structure programs, to make them
easier to understand, and to allow the function’s code to be reused. A function is a block of
instructions that can be called and used when required at several different points in the program.
The function that initiates the call to another function is known as the caller. The function that
receives and executes the call is known as the callee. When the callee function finishes execution,
control is transferred back to the caller function.
A function can receive parameters and return results. The parameters and results act as an interface
between a function and the rest of the program.
To execute a function, the program must follow these steps:
1. The caller must put the parameters in a place where the callee function can access them
2. Transfer control to the callee function
3. Execute the callee function
4. The callee function must put the results in a place where the caller can access them
5. Return control to the caller (point of origin) next to where the call was made
Registers are the fastest place to pass parameters and return results. The MIPS architecture follows
the following software conventions for passing parameters and returning results:
• $a0-$a3: four argument registers in which to pass parameters
• $v0-$v1: two value registers in which to return function results
• $ra: one return address register to return back to the caller
Figure 7.1: Example of a C function and its translation into MIPS assembly code
To call the function islower, the caller must first copy the character ch into register $a0 and then
make the function call. This is shown in Figure 7.2:
Figure 7.2: Using the jal instruction in MIPS to initiate a function call
Remember that the jal instruction saves the return address in register $ra and that jr jumps into
the return address in register $ra to achieve a function return.
The MIPS architecture provides three instructions to support functions and methods in high-level
programming languages. The jal (jump-and-link) instruction is used to call functions whose
addresses are constants known at compile time, while the jalr (jump-and-link register) instruction
Every program has three segments when it is loaded into memory by the operating system. There is
the text segment where the machine language code is stored, the data segment where space is
allocated for constants and variables, and the stack segment that provides an area that can be
allocated and freed by functions. The programmer has no control over where these segments are
located in memory. The stack segment can be used by functions for passing many parameters, for
allocating space for local variables, and for saving and preserving registers across calls. Without the
stack segment in memory, it would be impossible to write recursive functions, or pure functions that
have no side effects.
When a program is loaded into memory, the operating system initializes the stack pointer $sp
(register $29) to the base address of the stack segment. The stack segment grows downwards
towards lower memory addresses as shown in Figure 7.4.
0x7fffffff
Stack Segment Stack grows
Downwards
Heap Area
0x10040000 Data Segment
0x10000000 Static Area
Text Segment
0x00400000
0x00000000 Reserved
When a program starts execution, the operating system initializes the stack pointer $sp register
with a valid address to point to the top of the stack. For example, when executing a MIPS program
under the MARS tool, the initial value of register $sp is 0x7fffeffc.
A function can allocate space on the stack for saving registers and for its local variables. The space
that a function allocates on the stack is called a stack frame (called also an activation record).
Figure 7.5: Stack allocation (a) before (b) while executing, and (c) after returning from function f
An example of a function f that allocates a stack frame is shown in Figure 7.6. The function f is
non-leaf, because it calls functions read, reverse, and print. Therefore, the return address of
function f (register $ra) must be saved on the stack. In addition, the stack frame of function f must
allocate space for the local array (10 integer elements = 40 bytes), as shown in Figure 7.6.
A convention regarding the usage of registers is necessary because software is written by many
programmers. In this case, each programmer must know how registers are supposed to be used,
such that his piece of the software does not conflict with pieces written by other programmers.
Since programming is done today using high-level programming languages, you may ask why such
a register convention is still needed. Well, it is the compiler who needs to know about it. This is
because a program can be created from different pieces that are compiled separately. To compile a
function, the compiler must know which registers are used to pass parameters, which registers are
used to return results, and which registers must be preserved across function calls. These rules for
register usage are also known as function call conventions.
The MIPS hardware does not prevent you from ignoring these rules, from not preserving registers,
from using any register in passing parameters and returning results. However, if you ignore these
rules, you will easily run into trouble and have software bugs that are difficult to eliminate.
Register Register
Register Usage
Name Number
$zero $0 Always zero. Cannot be modified
$at $1 Reserved for assembler use
$v0 - $v1 $2 - $3 Function results are returned in $v0 and $v1
$a0 - $a3 $4 - $7 Function arguments are passed in $a0 thru $a3
$t0 - $t7 $8 - $15 Temporary registers. Not preserved across function calls
$s0 - $s7 $16 - $23 Saved registers. Must be preserved across function calls
$t8 - $t9 $24 - $25 Additional temporary registers. Not preserved
$k0 - $k1 $26 - $27 Reserved for OS kernel usage
$gp $28 Global pointer to global data. Must be preserved
$sp $29 Stack pointer. Must be preserved
$fp $30 Frame pointer. Must be preserved
$ra $31 Return address register. Must be preserved
Figure 7.8: MIPS register usage convention
A function is free to modify the value registers $v0-$v1, the argument registers $a0-$a3, and the
temporary registers $t0-$t7 and $t8-$t9 without saving their old values. However, it should not
modify registers $s0-$s7, $gp, $sp, $fp, and $ra except after saving their old values in memory
on the stack. A function must restore the value of registers $s0-$s7, $gp, $sp, $fp, and $ra by
loading their old values from the stack, just before returning back to the caller. Registers $sp and
$fp must be preserved if a new stack frame is allocated by a function. Register $ra must be
preserved if a function makes a call to another function, because the jal instruction modifies the
return address register $ra.
A recursive function is a function that calls itself. For example, the recursive function fact
(factorial) and its translation into MIPS assembly code are shown in Figure 7.9. If (n<2) then there
is no need to allocate a stack frame. However, if (n>=2) then the factorial function allocates a
stack frame of 8 bytes to save registers $a0 and $ra.
Register $a0 (argument n) is saved on the stack because its value is changed in the recursive call,
and because it is needed after returning from the recursive call. Register $ra is saved on the stack
because its value is changed by the recursive call (jal fact).
Figure 7.9: A recursive function and its translation into MIPS assembly language code
1. The function islower, shown in Figure 7.1, tests whether a character ch is lowercase or not.
Write the main function of a program that reads a character ch, calls the function islower,
and then prints a message to indicate whether ch is a lowercase character or not.
2. Write a function that reads an array of n integers. The function read must receive two
arguments: $a0 = address of the array, and $a1 = number n of elements to read.
3. Write a function that prints an array of n integers. The function print must receive two
arguments: $a0 = address of the array, and $a1 = number n of elements to print.
4. Write a function that reverses the elements of an array of n integers. The function reverse
must receive two arguments: $a0 = address of the array, and $a1 = number n of elements.
5. Suppose we rewrite function f (Figures 7.6) to have an integer parameter n. The local array is
now declared to have n integers (rather than 10). This means that the size of the stack frame size
8. The function quick_sort sorts an array recursively. Translate this function into MIPS code.
Write a main function to call and test this function.
void quick_sort(int array[], int low, int high) {
int i = low, j = high; // low and high index
int pivot = array[(low+high)/2]; // pivot = middle value
while (i <= j) {
while (array[i] < pivot) i++;
while (array[j] > pivot) j--;
if (i <= j) {
int temp=array[i];
array[i]=array[j]; // swap array[i]
array[j]=temp; // with array[j]
i++;
j--;
}
}
if (low < j) quick_sort(array, low, j); // Recursive call 1
if (i < high) quick_sort(array, i, high); // Recursive call 2
}
8.1 Objectives
Branches and jumps change the control flow in a program. Exceptions also change the control flow.
The MIPS architecture calls an exception any unexpected change in control flow, regardless of its
source.
An exception is said to be synchronous if it is caused by an instruction in the running program.
Examples of synchronous exceptions are arithmetic exceptions, invalid memory addresses
generated by load and store instructions, and trap instructions.
An exception is said to be asynchronous if it is caused by an I/O device requesting the processor.
This is also known as a hardware interrupt, which is not related to program execution. Interrupts
can be caused by a variety of I/O devices, such as the keyboard, timer, and disk controller.
When an exception happens, control is transferred to an exception handler, written specifically for
the purpose of dealing with exceptions. After executing the exception handler, control is returned
back to the program. The program continues as if nothing happened. The exception handler appears
as a procedure called suddenly in the program with no parameters and no return value.
The MIPS processor operates either in user or kernel mode. User programs (applications) run in
user mode. The CPU enters the kernel mode when an exception happens. The exception handling
mechanism is implemented by Coprocessor 0, which has several important registers, such as:
vaddr, status, cause, and EPC, which record information about an exception.
1. Vaddr ($8): Contains the invalid memory address caused by load, store, or fetch.
2. Status ($12): Contains the interrupt mask and enable bits (see below).
3. Cause ($13): Contains the type of exception and any pending bits (see below).
4. EPC ($14): Contains the address of the instruction when the exception occurred.
# Breakpoint Exception
# Caused by the break instruction
.text
. . .
break
15 14 13 12 11 10 9 8 6 5 4 3 2
15 14 13 12 11 10 9 8 1 0
EL IE
Interrupt Mask
Figure 8.5: The Status Register $12
Instruction Meaning
teq Rs, Rt Raise the trap exception if register Rs is equal to register Rt
tne Rs, Rt Raise the trap exception if register Rs is not equal to register Rt
tlt Rs, Rt Raise the trap exception if register Rs is less than register Rt
. . . There are other trap instructions not listed here (see Appendix B)
break code Raise the breakpoint exception. Code 1 is reserved for the debugger
syscall Raise the system call exception. Service number is specified in $v0
The trap instructions (teq, tne, … etc.) raise the TRAP exception code 13. The break instruction
raises the breakpoint exception code 9. However, the MARS simulator provides services for the
syscall instruction without raising an exception. On a real system, the syscall instruction raises
the system call exception code 8, which is serviced by the operating system.
When an exception occurs, the processor switches to kernel mode. Coprocessor 0 registers can be
accessed only when the processor is servicing an exception in kernel mode. Register values can be
transferred from and to coprocessor 0 using the following instructions. Load and store instructions
that transfer data between coprocessor 0 registers and memory are also available. These instructions
can be used when writing an exception handler.
After an exception is processed, the exception handler can return back and resume the execution of
a program. The eret instruction is used to return from an exception.
Instruction Meaning
mfc0 Rd, C0src Move from Coprocessor 0 register C0src into destination register Rd
mtc0 Rs, C0dst Move to Coprocessor 0 register C0dst the value of register Rs
lwc0 C0dst, addr Load a word from memory into Coprocessor 0 register C0dst
swc0 C0src, addr Store Coprocessor 0 register C0src in memory
eret Reset EL = 0 (back to user mode) and return: PC = EPC
0xffff0000
Memory-Mapped I/O
Dynamic Area
0x10040000 Data Segment
0x10000000 Static Area
Text Segment
0x00400000
0x00000000 Reserved
When a function is called using jal, control is transferred at the address provided by the instruction
and the return address is saved in register $ra. In the case of an exception there is no explicit call.
In MIPS, when an exception occurs, control is transferred at the fixed address 0x80000180. The
exception handler must be located at that address.
The exception return address cannot be saved in $ra since it will modify a return address that has
been placed in that register before the exception. The Exception Program Counter (EPC) register is
used to store the address of the instruction that was executing when the exception was generated.
An exception handler can be written in the same file as the regular program, or in a separate file.
The exception handler must start at the fixed address 0x80000180. This address is in the kernel
text segment. If there is no instruction at address 0x80000180, MARS will terminate the MIPS
program with an appropriate error message. An example of an exception handler that prints the
address of the instruction that caused the exception and the exception code is shown below:
Rather than writing the exception handler at the end of every MIPS program, it is better to write the
exception handler in a separate file, then open the “Exception Handler …” dialog box in the MARS
settings and include the exception handler file in all your MIPS programs.
The rate that characters are typed on the keyboard is very slow compared to the rate that the MIPS
processor can execute instructions. Typically, millions of instructions are executed until a key is
pressed. Register $t1 keeps track of the number of iterations in the wait_keyboard loop. The lw
instruction outside the loop gets the character from the keyboard data register, which also clears the
ready bit. A MIPS programmer can only read the keyboard data register. Writing to the keyboard
data register has no effect.
Communicating with the display controller can be done via polling in a similar way. The display
registers are mapped to addresses 0xffff0008 and 0xffff000c. As long as the ready bit is zero,
the processor keeps reading it in a loop. We must not store a character in the data register until the
display is ready to receive it. The display controller clears the ready bit when a character is stored in
the data register. It then sets the ready bit again after the character is displayed and it is ready to
receive the next character to display.
li $t0, 0xffff0008 # Address of display control register
wait_display:
lw $t2, ($t0) # Read the display control register
andi $t2, $t2, 1 # Extract ready bit
beqz $t2, wait_display # loop back while not ready
sw $a0, 4($t0) # Send character to display
Figure 8.9: Keyboard and Display MMIO Simulator under the MARS Tools
The main drawback of polling is that it keeps the processor busy, wasting millions of cycles before
the I/O device (such as the keyboard) becomes ready. An alternative to polling is to use interrupts.
Interrupts can be enabled for the keyboard by setting the Interrupt Enable bit in the keyboard
control register as follows:
li $t0, 0xffff0000 # Address of keyboard control register
li $t1, 2
sw $t1, ($t0) # Enable keyboard interrupt
When a key is pressed, the keyboard sends an interrupt signal to the MIPS processor and sets the
cause register $13 to the value 0x00000100. Bit 8 in the cause register is set to indicate that the
keyboard has interrupted the processor. An interrupt handler must be written to read the character
from the keyboard and returning its value to the running program. This requires modifying the code
of the exception handler shown in Figure 8.7 to deal with interrupts.
The Disk and Ethernet I/O device controllers use Direct Memory Access (DMA) to transfer blocks
of data directly between the device and memory. As controllers become more complex, more than
two registers are associated with each device controller. These devices are not part of the MARS
simulator. In general, the supplier of any I/O device must provide programmers with an explanation
of how to properly communicate with the I/O device controller.
7. If the keyboard interrupt enable bit is set then the keyboard will interrupt the processor each
time a key is pressed. Write a simple interrupt handler that returns the character pressed in
register $v0. Rewrite the main function of question 6 using keyboard interrupts.
9.1 Objectives
S E = Exponent F = Fraction
The Sign bit S is zero (positive) or one (negative).
The Exponent field E is 8 bits for single-precision and 11 bits for double-precision. The exponent
field is biased. The Bias is 127 for single-precision and 1023 for double-precision.
The Fraction field F is 23 bits for single-precision and 52 bits for double-precision. Floating-point
numbers are normalized (except when E is zero). There is an implicit 1. (not stored) before the
fraction F. Therefore, the value of a normalized floating-point number is:
9: Floating-Point Page 1
Figure 9.1: Floating-Point Representation tool supported by MARS
The floating-point unit (called coprocessor 1) has 32 floating-point registers. These registers are
numbered as $f0, $f1, …, $f31. Each register is 32 bits wide. Thus, each register can hold one
single-precision floating-point number. How can we use these registers to store 64-bit double-
precision floating-point numbers? The answer is that the 32 single-precision registers are grouped
into 16 double-precision registers. The double-precision number is stored in an even-odd pair of
registers, but we only refer to the even-numbered register. For example, when we store a double-
precision number in $f0, it is actually stored in registers $f0 and $f1.
In addition, there are 8 condition flags, numbered from 0 to 7. These condition flags are used by
floating-point compare and branch instructions. These are shown in Figure 9.2.
9: Floating-Point Page 2
Figure 9.2: MIPS Floating-Point Registers and Condition Flags
The FPU supports several instructions including floating-point load and store, floating-point
arithmetic operations, floating-point data movement instructions, convert, and branch instructions.
We start this section with the floating-point load and store instructions. These instructions load into
or store a floating-point register. However, they use the same base-displacement addressing mode
used with integer instructions. Notice that the base address register is an integer (not a floating-
point) register.
9: Floating-Point Page 3
Instruction Example Meaning
swc1 or s.s swc1 $f5,4($t2) Store a single-precision floating-point register in
memory: MEM[$t2+4] = $f5
The floating-point arithmetic instructions are listed next. The .s extension is used for single-
precision arithmetic instructions, while the .d is used for double-precision instructions.
The data movement instructions move data between general-purpose and floating-point registers, or
between floating-point registers.
9: Floating-Point Page 4
The convert instructions convert the format of data in floating-point registers. Three data formats
are supported: .s = single-precision float, .d = double-precision, and .w = integer word.
The floating-point compare instructions compare floating-point registers for equality, less than, and
less than or equal. The FP compare instructions set the condition flags 0 to 7 to true (1) or false(0).
c.lt.s c.eq.s 4,$f5,$f8 if ($f5 < $f8) set flag 4 to true else false
c.le.s c.le.s $f10,$f11 if ($f10 <= $f11) set flag 0 to true else false
9: Floating-Point Page 5
The floating-point branch instructions (bc1t and bc1f) branch to the target address based on the
value of the specified condition flag (true or false).
The MARS tool provides the following syscall service numbers (passed in $v0) to print and read
single-precision and double-precision floating-point numbers:
Compilers follow the MIPS register usage convention when translating functions and procedures
into MIPS assembly-language code. The following table shows the MIPS software convention for
floating-point registers. Not following the MIPS software usage convention can result in serious
bugs when passing parameters, getting results, or using registers across function calls.
Registers Usage
$f0 - $f3 Floating-point procedure results
$f4 - $f11 Temporary floating-point registers, NOT preserved across procedure calls
$f12 - $f15 Floating-point parameters, NOT preserved across procedure calls. Additional
floating-point parameters should be pushed on the stack.
$f16 - $f19 More temporary registers, NOT preserved across procedure calls.
$f20 - $f31 Saved floating-point registers. Should be preserved across procedure calls.
9: Floating-Point Page 6
9.7 In-Lab Tasks
1. Convert by hand the number -123456789 into its 32-bit single-precision binary representation,
and then use the floating-point representation tool presented in Section 9.2 to verify your
answer. Show your work for a full mark.
2. Convert by hand the floating-point number 1 10010100 10011000001100000000000
(shown in binary) into its corresponding decimal value, and then use the floating-point
representation tool presented in Section 9.2 to verify your answer. Show your work for a full
mark.
3. Trace the following program by hand to determine the values of registers $f0 thru $f9. Notice
that array1 and array2 have the same elements, but in a different order. Comment on the
sums of array1 and array2 elements computed in registers $f4 and $f9, respectively. Now
use the MARS tool to trace the execution of the program and verify your results. What
conclusion can be made from this exercise?
.data
array1: .float 5.6e+20, -5.6e+20, 1.2
array2: .float 1.2, 5.6e+20, -5.6e+20
.text
la $t0, array1
lwc1 $f0, 0($t0)
lwc1 $f1, 4($t0)
lwc1 $f2, 8($t0)
add.s $f3, $f0, $f1
add.s $f4, $f2, $f3
la $t1, array2
lwc1 $f5, 0($t1)
lwc1 $f6, 4($t1)
lwc1 $f7, 8($t1)
add.s $f8, $f5, $f6
add.s $f9, $f7, $f8
4. Write an interactive program that inputs an integer sum and an integer count, computes, and
displays the average = (float) sum / (float) count as a single-precision floating-
point number. Hint: use the proper convert instruction to convert sum and count from integer
word into single-precision float.
5. Write an interactive program that inputs the coefficient of a quadratic equation, computes, and
displays the roots of the quadratic equation. All input, computation, and output should be done
using double-precision floating-point instructions and registers. The program should handle the
case of complex roots and displays the results properly.
9: Floating-Point Page 7
6. Square Root Calculation: Newton’s iterative method can be used to approximate the square root
of a number x. Let the initial guess be 1. Then each new guess can be computed as follows:
guess = ((x/guess) + guess) / 2;
Write a function called square_root that receives a double-precision parameter x, computes,
and returns the approximated value of the square root of x. Write a loop that repeats 20 times
and computes 20 guess values, then returns the final guess after 20 iterations. Use the MIPS
floating-point register convention (Section 9.6) to pass the parameter x and to return the
function result. All computation should be done using double-precision floating-point
instructions and registers. Compare the result of the sqrt.d instruction against the result of
your square_root function. What is the error in absolute value?
Write a function that computes the sine of a parameter x. Use the MIPS floating-point register
convention (Section 9.6) to pass the parameter x and to return the function result. All
computation should be done using double-precision floating-point instructions and registers.
Limit your computation to the first 20 terms of the series.
8. Converting a string into a floating-point number.
Write a function to convert a string, such as: "-13.232e-5" into a double-precision floating-
point number. The address of the string should be passed in register $a0. The function should
return the double-precision floating-point number in $f0. Conversion should terminate if the
end of the string is reached (NULL byte), or an invalid character is encountered, such as a
space, comma, etc.
9: Floating-Point Page 8
10 Introduction to Logisim
10.1 Objectives
Logisim is an educational and user friendly tool. It mainly consists of an interactive graphical
schematic editor and logic simulator. Logisim has a layout similar to most available software tools (
Figure 1).
• Explorer Pane: This pane contains the list of wiring, gates, multiplexers and other
components that are available for digital design in Logisim.
• Attribute Table: Gives detailed attributes of digital design components (e.g., AND, OR,
XOR gates). The attribute table allows you to alter the number of inputs/outputs that a
digital component may have.
• Canvas: The canvas is the area for you to create your digital circuits. In this area you may
simulate your circuits while designing in real time.
Logisim operates in two modes: Design and Simulate. Logisim is first operated in edit mode while
designing a logic circuit, then in simulate mode to see how the circuit would actually work.
10.3.2
0.3.2 Full Adder Design
In Logisim, a digital circuit may be designed using either one of two approaches:
• Drawing schematic: the user has to do the design by choosing the necessary components as
per his design, based on the digital schematic.
• Analyzing the circuit, by providing the truth tables with necessary inputs and outputs and let
Logisim do the design.
Schematic Design:
a. Set up the truth table for a simple Full Adder. Let A B and Cin be the inputs, S and Co the
outputs.
A B Ci S Co
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
= ⨁ ⨁ = ∧ + ∧ ⨁
c. Finally, design the circuit based on the above equations (Figure 10.2)
Circuit Analysis:
From the Menu Bar select "Window" then "Combinational Analysis". You may also go to Project
then, select "Analyze Circuit".
a. Create three inputs named Cin, A, and B.
b. Create two outputs named Cout and S.
c. Complete the truth table for a full adder.
d. Look at the "Expression" and "Minimized" tabs.
e. Press the "Build Circuit" button and enter FA0 as the name of the new circuit.
Once your design is complete, you may proceed by testing its functionality:
10.3.3
10.3.3 Test Mode
Select the Simulation Tool (Figure: 10.1) so you can test the circuit. You can toggle the value of an
input pin by clicking on it. Note that the values of all subsequent wires change instantly.
Probes:
As you test your circuit you may also add probes to check the values on wires or buses. These
values may be displayed in different formats.
a. From the explorer pane, select Wiring then Probe.
b. Set a direction for your probe,
c. Then select the numbering system that you would like to use to display your data.
10: Logisim Tutorial Page 4
Figure 10.3: Full Adder Circuit Designed Through Analysis
A complex circuit is usually broken down into smaller sub-circuits. These sub-circuits are designed
independently, tested, arranged into libraries, and then used to build the big circuit.
10.4
10.4.1 Circuit Appearance
Later on, you may use your circuit as a sub-circuit in another design. For that purpose, you may
want to alter the locations of the inputs and outputs or the appearance of the whole circuit as a
module. To do so, while your design is on the editor, click the Appearance Tool under the toolbar
(Figure 10.4). This will show your circuit as a block with the inputs and outputs; once you click any
of the input/output pins, on the bottom right corner your circuit will appear showing the location of
the pin you selected.
You may than modify the following properties of your circuit appearance: pin locations, general
appearance, and color.
10.4
10.4.2 Using Sub-
Sub-Circuits
One of the good features of Logisim is that you can use a circuit you have already built as a
building block in another, more complex circuit.
You can create a new sub-circuit with (Project > Add Circuit)
a. Save all your sub-circuits in one folder that you will make your own library.
b. On the menu bar select load Project > Load Library > Logism-Library then select your
folder and the module you would like to add.
c. Your circuit will appear as a new element in the explorer pane. Select your module then
drag and drop.
d. You may then add pins for input and output, or connect your module to your circuit using
wires.
Within the newly created full adder FA0 circuit designed above (Figures 10.2 or 10.3), change the
orientation of the Ci input so that it is facing south. Change the orientation of the Co output so that
it is facing north (Figure 10.5). Your circuit is ready to be used as a module or sub-circuit in a more
complex digital system design.
10.4.3
10.4.3 Splitters and Tunnels:
These are components mainly used to simplify wiring. Splitters are used to group or separate bits in
a bus, and tunnels are used to avoid lengthy connections between components in the design.
Splitters:
Tunnels:
• Using the full adder already designed as a block, we intend to design an 8-bit ripple carry
adder.
a. Add three splitters to the circuit. Each splitter should have an input width of 8 bits and
a fan out of 8.
10.5
10.5.2 A 2-Bit Multiplier Design:
Using the Combinational Analysis option in Logisim, design a 2 bit multiplier. First you need to set
up the truth table of your multiplier by choosing appropriate names for your inputs and outputs.
Derive the logical equations for all your outputs. Use the Design option to let Logisim design the
multiplier. Test your circuit and make it as one complete module. Compare your design to the
multiplier provided in Logisim.
10.5
10.5.3 Bonus Question:
Question:
Make use of your multiplier to design a 4 bit multiplier, test it on compare to the built-in multiplier
provided in Logisim.
Design a 2-bit multiplier in logisim. First you need to set up the truth table of your multiplier by
choosing appropriate names for your inputs and outputs. Derive the logical equations for all your
outputs. Design the 2-bit multiplier using logic gates. Test your circuit and make it as one complete
module. Compare your design to the multiplier provided in Logisim.
Design an 8×32-bit register file. The register file should contain 8 registers: R0 to R7 (each having
32 bits). It should have two read ports to read any two registers and one write port. Model the
register file in Logisim. Test the register file for correct operation by reading and writing different
register combinations.
Design a 32-bit ALU to perform the following eight operations: ADD, SUB, SLT, SLTU, AND,
OR, XOR, and NOR. The ALU should have two 32-bit inputs A and B, and a 3-bit input function f.
It should have a 32-bit output result, and four flags: output carry, overflow, negative sign, and zero
output. Model the ALU in Logisim. Test the ALU for correct operation by providing different
inputs and operations.
Design and model a 4×4 multiplier using four 2×2 multipliers and additional adders. Test the 4×4
multiplier using different inputs.
11.1 Objectives
Thus, two registers are read and one register is written in a single cycle. Writing happens on the
rising edge of the clock. During read operation, the register file bahaves as a combinational block
and once the RA or RB have valid data, the content of the read register will appear on BusA or
BusB after a certain access time.
The Arithmetic and Logic Unit (ALU) is the unit where most instructions are executed. It mainly
performs arithmetic, logic and shift operations. The ALU will have four main blocks: Arithmetic,
Comparator, Logic, and Shifter blocks as illustrated in Figure 11.3.
Arithmetic Block
The arithmetic block is composed of a 32-bit adder that can perform 32-bit addition and subtraction.
Its internal implementation can be designed using a ripple carry adder composed of 32 full-adder
blocks. The inputs A and B are two 32- bit integers and the output F is A+B or A-B. The arithmetic
block has a control signal, ADD/SUB, to determine whether the operation to be performed is
addition or subtraction. If this signal is 0, the adder will perform addition, otherwise it will perform
subtraction. Note that subtraction is performed using 2's complement representation as A-B= A + B'
+ 1. B' is computed by the XOR gates in the arithmetic block. The adder also generates Carry-Out
(Cout) and Overflow signal that can be used to test for correctness of the obtained result and for
Comparator Block
This block is mainly used for comparing signed and unsigned numbers. For the MIPS CPU, we just
need to compare if a number is less than another number for implementing the set on less than
instructions (SLT, SLTU, SLTI, SLTIU). For unsigned comparison of two numbers A and B, we
need to perform a subtraction operation, A- B, and then check whether we have a Carry-Out (Cout)
or not. If Carry-Out=0, this means that there was a borrow when B is subtracted from A and thus A
< B. However, if Carry-Out=1 then this implies that there was no borrow and hence A ≥ B.
Similarly, for comparing signed numbers A and B, we perform the subtraction operation A – B and
then we check both the Sign of the result and the Overflow signal. The Sign of the result is the most
significant bit of the result (i.e. but 31). If the Sign value is not equal to the Overflow value, then A
< B, otherwise, A ≥ B. This can be done by XORing the Sign and the Overflow signals. If the result
is 1, this means that A < B.
Shifter
Shifter Block
The shifter block is used to implement MIPS shift instructions (SLL, SRL, SRA). A shift operation
takes a binary number and shifts it left or right by a specified number of bits. There are two main
kinds of shift operations: logical,and arithmetic.
• Logical shift: whenever bits are shifted to the left or right, 0's are injected.
• Arithmetic shift: when bits are shifted to the left, 0's are injected, however when bits are
shifted to the right the sign bit (i.e., most significant bit) is injected.
The functionality of logical and arithmetic shit instructions is illustrated in Figure 11.5.
Logisim provides blocks for performing shift operations that can be used in the design of the Shifter
block.
A 32-bit 4x1 Multiplexor is used to select the output from either the Arithmetic block, the
Comparator block, the Logical block or the Shifter block. This is done through a 2-bit ALU
selection signal.
You are required to design a 32-bit MIPS-like processor with 31 general-purpose registers. The first
building blocks of the CPU are the ALU and the register file.
1. Task 1:
- Model the 32x32-bit register file given in Figure 11.2 as one single module in Logisim
- Test the register file for correct operation by writing to and reading from different register
combinations.
- Design a 32-bit ALU to perform all the arithmetic, logic and shift operations required by
your data path
- Model the your designed 32-bit ALU in Logisim
- Test the correct functionality of all operations implemented by the ALU.
One possible implementation of the shifter known as the Barrel Shifter is given in Figure 11.6. This
architecture has the advantage of performing a number of operations using the same hardware. You
are required to design such shifter and adapt it to your design. You need then to use it instead of the
shifter made up of available shifters in Logisim.
The shifter is implemented with multiplexers and wiring (splitters in our design), the shift
operations can be: SLL, SRL, SRA, or ROR. The input data is extended to 63 bits according to Shift
Op, and the 63 bits are shifted right according to S4S3S2S1S0
In this section, we will illustrate the design of a single-cycle CPU for a subset of the MIPS
instructions, shown in Table 12.1. These include the following instructions:
Although this subset does not include all the integer instructions, it is sufficient to illustrate the
design of datapath and control. Concepts used to implement the MIPS subset are used to construct a
broad spectrum of computers. For each instruction to be implemented, you need to identify all the
steps that need to be performed for the execution of each instruction expressed in register transfer
level (RTL) notation. These steps are summarized below for all the instructions to be implemented:
The first step in designing a datapath is to determine the requirements of the instruction set in terms
of components. These include the following:
Memory
Instruction memory where instructions are stored
Data memory where data is stored
Registers
32 × 32-bit general purpose registers, R0 is always zero
Read source register Rs
Read source register Rt
Write destination register Rt or Rd
Program counter PC register and Adder to increment PC
Sign and Zero extender for immediate constant
ALU for executing instructions
Combinational Elements
ALU, Adder
Immediate extender
Multiplexers
Storage Elements
Instruction memory
Data memory
PC register
Register file
The implementation of the instruction fetch process is illustrated in Figure 12.1. Since all the MIPS
instructions are 32-bit instructions (i.e. each instruction is stored in 4 address locations) and since
the instruction memory will be aligned on 4-byte boundary, the least significant 2-bits of instruction
addresses will always be 0. Thus, it is sufficient the update the most significant 30 bits of the PC.
To execute R-type instructions, we need to read the content of registers Rs and Rt, perform an ALU
operation on their contents and then store the result in the register file to register Rd. The datapath
for executing R-type instructions is shown in Figure 12.2.
The execution of the I-type instructions is similar to the R-type instructions with the difference that
the second operand is an immediate value instead of a register and that the destination register is
determined by Rt instead of Rd. The 16-bit immediate value needs to be extended to a 32-bit value
by either adding 16 0's or by extending the sign bit. The datapath for the execution of I-type
instructions is given in Figure 12.3.
The control signals needed for the execution of I-type instructions are:
Next we combine the datapath for executing both the R-type and I-type instructions as shown in
Figure 12.4. A multiplexer is added to select between Rd and Rt to be connected to Rw in the
register file to determine the destination register. Another multiplexer is added to select the second
ALU input as either the source register Rt data on BusB or the extended immediate.
The control signals needed for the execution of either R-type or I-type instructions are:
To execute the load and store instructions, we need to add data memory to the datapath. For the load
and store instructions, the ALU will be used to compute the memory address by adding the content
of register Rs coming through BusA and the sign-extended immediate value. For the load
instruction, we need to write the output of the data memory to register file. Thus, a third multiplexer
is added to select between the output of the ALU and the data memory to be written to the register
file. BusB is connected to Datain of Data Memory for store instructions. The updated CPU with the
capability for executing load and store instructions is shown in Figure 12.5.
The additional control signals needed for the execution of load and store instructions are:
For executing jump and branch instructions, we need to add a block, called NextPC, to compute the
target address. In addition, we need to add a multiplexer to select the input to the PC register to be
either the incremented PC address or the target address generated by NextPC block. For branch
The additional control signals needed for the execution of jump and branch instructions are:
The details of the NextPC block are illustrated in Fug. 12.7. For the jump instruction, the target
address is computed by concatenating the upper 4 bits of PC with Imm26 (i.e. the 26-bit immediate
value). However, for branch instructions the target address is computed by adding the sign-extended
version of the 16-bit immediate value with the incremented value of PC. Note that the immediate
value is computed by the assembler as [Terget – (PC + 4)]/4. Thus, to restore the target address we
need to multiply the immediate value by 4 (i.e. shift it 2 bits to the left) and then add PC+4 to it.
Since we are updating the most significant 30-bits of PC, this is achieved by adding PC+1 to the
immediate value. The PCSrc signal is set when a branch instruction is taken or a jump instruction is
executed, which is implemented by the equation PCSrc = J + (Beq . Zero) + (Bne . Zero').
The control unit of the single-cycle CPU can be decomposed into two parts Main Control and ALU
Control. The Main Control unit receives a 6-input opcode and generates all the needed control
signals other than the ALU control. However, the ALU Control gets a 6-bit function field from the
instruction and ALUCtrl signal from the Main Control. The single cycle CPU including the
datapath and control unit is illustrated in Figure 12.8.
One we have the Control Table, the control unit can be design easily using a 6x64 decoder that has
the 6-bit opcode as input and a signal for each instruction as output. Then each control signal will
be either an OR gate of the instructions signals that make this signal 1 or a NOR gate of the
instructions signals that make this signal 0, which ever results in a smaller gate size. The decoder
and the logic equations for the Main Control signals are shown in Figure 12.9.
1. For the instructions in the CPU that you are going to design, list all the steps that are needed
for the execution of each instruction in RTL notation.
2. Ensure that you have all the needed components for constructing your datapath.
3. Design the datapath for your CPU and model it using logisim.
4. Apply the needed values for the control signals needed for the execution of each instruction
to ensure correct functionality of the datapath.
5. Design the control unit of your CPU and model it using logisim.
6. Test the correct functionality of the control unit by ensuring that it generates the correct
control signal values for each instruction.
7. Model the single cycle CPU design in logisim by combining the datapath and control units.
8. Test the correct functionality of your CPU by storing all the implemented instructions in the
instruction memory and verifying the correct execution of each instruction.
The single-cycle data path design can be pipelined into a 5-stage pipeline by introducing registers at
the end of each stage as shown in Figure 13.1.
It should be observed that the destination register number is also pipelined by saving it across stages
as the writing of the content of the register is done at stage 5. In addition, the incremented value of
PC is also pipelined across stage 2 and 3 as it is need by the Next PC block.
The control signals are generated by the control unit in the second stage (i.e. ID state). In order to
pipeline the control unit, we need to save all the control signals needed by the later stages in
registers. For example, the control signals ExtOp, ALUSrc, ALUCtrl, J, Beq, Bne, MemRead,
MemWrite, Memtoreg and RegWrite need to be saved in the register separating stage 2 and stage 3.
However, only the control signals MemRead, MemWrite, Memtoreg and RegWrite are saved in the
register separating stages 3 and 4 as the remaining control signals are used in stage 3 and are not
needed in stages 4 and 5. The pipelined data path and control unit CPU is shown in Figure 13.2.
Hazards are situations that would cause incorrect execution if next instruction were launched during
its designated clock cycle. Hazards can be classified into three main types:
1. Structural hazards
Caused by resource contention
2. Data hazards
An instruction may compute a result needed by next instruction
Hardware can detect dependencies between instructions
3. Control hazards
Caused by instructions that change control flow (branches/jumps)
Delays in changing the flow of control
Hazards complicate pipeline control and limit performance. Dependency between instructions
causes a data hazard. An example of a data hazard is Read After Write – RAW Hazard. An example
of a RAW hazard is given below. Given two instructions I and J, where I comes before J.
Instruction J should read an operand after it is written by I.
The RAW Hazard occurs when J reads the operand before I writes it.
Figure 13.3 shows a sample MIPS program with several RAW hazards. The result of sub instruction
is needed by the add, or, and, & sw instructions. The instructions add & or will read old value of
$s2 from reg file as the value of $s has not been updated in the reg file yet. During CC5, $s2 is
written at the end of the cycle, and thus the old value is read. Thus, from this example we can see
that any dependency between an instruction and any of the three following instructions will cause a
RAW hazard.
One way of handling RAW data hazards is to stall the pipeline until the required destination register
is updated in the reg file. This requires freezing the execution of instructions that have such
dependency for three clock cycles to all the reg file to be updated. Figure 13.4 illustrates an
example of that. Due to the RAW dependency between the add and sub instructions, fetching the
operands of the add instruction has to wait until register $s2 of the sub instruction is updated. This
requires stalling the pipeline for three clock cycles from CC3 to CC5. Stall cycles delay the
execution of the add instruction & fetching of the or instruction. The add instruction cannot read
$s2 until beginning of CC6. The add instruction remains in the Instruction register until CC6 and
the PC register is not modified until the beginning of CC6.
However, instead of stalling the pipeline and wasting clock cycles, RAW hazards can be handled by
observing that the needed data is available in one of the stages 3 to 5 and can be used by forwarding
it to stage 2 instead of waiting until the data is written to the reg file. This idea is illustrated in
Figure 13.5.
The add instruction takes the content of $s2 from the ALU output. The or instruction takes the
content of $s2 from the output of the DM stage. The and instruction takes the content of $s2 from
the input of the reg file stage 5 and the content of $s6 from the ALU output at stage 2.
To implement forwarding, two multiplexers are added at the inputs of the A & B registers and data
from ALU stage, MEM stage, and WB stage is fed back to these multiplexers as shown in Figure
13.6. Two signals ForwardA and ForwardB control forwarding as shown in Table 13.1.
It should be observed that current instruction being decoded is in the Decode stage, the previous
instruction is in the Execute stage, the second previous instruction is in the Memory stage and the
third previous instruction in the Write Back stage. Thus, RAW data hazards detection conditions
and the generation of the forwarding control signals can be done as follows:
Signal Explanation
ForwardA = 0 First ALU operand comes from register file = Value of (Rs)
ForwardB = 0 Second ALU operand comes from register file = Value of (Rt)
1. Implement RAW hazard detection and the forwarding unit in your pipelined CPU design.
2. Add pipeline registers to your data path CPU design.
3. Add pipeline registers to pipeline the control signals in your CPU design.
4. Verify the correctness of your pipelined CPU design by executing the following instruction
sequence:
How many clock cycles your pipelined CPU takes to execute this program?
Unfortunately, not all data hazards can be forwarded. Load has a delay that cannot be eliminated by
forwarding. In the example shown in Figure 14.1, the LW instruction does not read data until the
end of CC4. Thus, data cannot be forward to ADD instruction at the end of CC3. However, the
Load instruction can forward data to the 2nd next and later instruction.
Whenever a RAW hazard after a Load instruction occurs, we have no choice but to stall the
pipeline. Stalling the pipeline requires that the instruction in the ID stage be replaced by a NOP (No
Operation) instruction and freezing the content of PC and Instruction registers for one clock cycle.
This is achieved by replacing the control signals by 0 signals (called a Bubble) and disabling the PC
and IR registers as illustrated in Figure 14.2. The Condition for stalling the pipeline is as follows:
Figure 14.2: Pipeline stall due to data hazard following a Load instruction.
Jump and Branch can cause great performance loss. Jump instruction needs only the jump target
address while Branch instruction needs two things:
• Branch Result Taken or Not Taken
• Branch Target Address
o PC + 4 If Branch is NOT taken
o PC + 4 + 4 × immediate If Branch is Taken
Jump and Branch targets are computed in EX stage at which point two instructions have already
been fetched. For Jump, the two instructions need to be flushed. For Branch, the two instructions
are flushed if Branch is Taken. Figure 14.3 illustrates an example of handling a control hazard by
replacing the two fetched instructions by bubbles. Control logic detects a Branch instruction in the
2nd Stage. The ALU computes the Branch outcome in the 3rd Stage. Next1 and Next2 instructions
will be fetched anyway. Thus, we need to convert Next1 and Next2 into bubbles if branch is taken.
Converting the instruction in the 2nd stage into a bubble can be achieved by replacing the control
signals by a bubble. Converting the instruction in 1st stage into a NOP instruction is achieved by
Resetting the IR register using synchronous reset. The reset is signal is connected to the PCSrc
signal which is set when the branch is taken. Figure 14.4 illustrates the implementation of handling
control hazards.
1. Implement handling of RAW hazard occurring after LOAD instruction in your pipelined CPU
design.
2. Test the correctness of your implementation by running the MIPS code fragment given below and
verify its correct execution: