U2 - ARM Processor


UNIT-II

 ARM PROCESSOR AND PERIPHERALS 9

ARM Architecture Versions – ARM Architecture – Instruction Set – Stacks and

04/02/24 1
Instruction sets

• Computer architecture taxonomy.
• Assembly language.

von Neumann architecture

• Memory holds data, instructions.
• Central processing unit (CPU) fetches instructions from memory.
• Separation of CPU and memory distinguishes a programmable computer.
• CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
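CPU + memory

[Figure: the CPU is connected to memory by address and data lines. The PC holds address 200; memory location 200 holds ADD r5,r1,r3, which is fetched into the IR.]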

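The fetch step in the CPU + memory picture can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of the instruction, the word-indexed PC and the name `run_one_step` are assumptions made for the sketch, not part of the ARM architecture.

```python
# Minimal von Neumann sketch: one memory holds both instructions and data.
# The tuple encoding of instructions is a hypothetical simplification.

def run_one_step(memory, regs, pc):
    """Fetch the instruction at pc into the IR, execute it, return (ir, next pc)."""
    ir = memory[pc]                      # fetch: memory[PC] -> IR
    op, rd, rn, rm = ir                  # decode
    if op == "ADD":                      # execute
        regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
    else:
        raise ValueError(f"unsupported opcode: {op}")
    return ir, pc + 1                    # PC counts words here, for simplicity


memory = {200: ("ADD", 5, 1, 3)}         # location 200 holds ADD r5,r1,r3
regs = [0] * 16
regs[1], regs[3] = 7, 35
ir, pc = run_one_step(memory, regs, 200)
print(ir, regs[5], pc)
```
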
Harvard architecture

[Figure: the CPU, with its PC, connects to a data memory and to a separate program memory, each through its own address and data lines.]

von Neumann vs. Harvard

• Harvard can’t use self-modifying code.
• Harvard allows two simultaneous memory fetches.
• Most DSPs use Harvard architecture for streaming data:
  • greater memory bandwidth;
  • more predictable bandwidth.

RISC vs. CISC

• Complex instruction set computer (CISC):
  • many addressing modes;
  • many operations.
• Reduced instruction set computer (RISC):
  • load/store;
  • pipelinable instructions.

Instruction set characteristics

• Fixed vs. variable length.
• Addressing modes.
• Number of operands.
• Types of operands.

Programming model

• Programming model: registers visible to the programmer.
• Some registers are not visible (IR).

Multiple implementations

• Successful architectures have several implementations:
  • varying clock speeds;
  • different bus widths;
  • different cache sizes;
  • etc.

Assembly language

• One-to-one with instructions (more or less).
• Basic features:
  • One instruction per line.
  • Labels provide names for addresses (usually in first column).
  • Instructions often start in later columns.
  • Comments run to end of line.

ARM assembly language example

label1  ADR r4,c
        LDR r0,[r4]   ; a comment
        ADR r4,d
        LDR r1,[r4]
        SUB r0,r0,r1  ; comment

Pseudo-ops

• Some assembler directives don’t correspond directly to instructions:
  • Define current address.
  • Reserve storage.
  • Constants.

ARM FEATURES
Some of the general features of ARM are listed here.
ARM processors have a good ratio of execution speed to power consumption.
They cover a wide range of clock frequencies, from 1 MHz to a few GHz.
Cores with ARM’s Jazelle DBX extension support direct execution of Java bytecode.
ARM processors have built-in hardware for debugging.
They support enhanced instructions for DSP operations.

ARM Processor Family
• ARM has several processors that are grouped into families based on the processor core they implement. The architecture of ARM processors has continued to evolve with every family. Some of the well-known ARM processor families are ARM7, ARM9, ARM10 and ARM11. The following table shows some of the commonly found ARM families along with their architectures.

[Table: ARM families and their architecture versions; contents not recoverable from the source.]

ARM Nomenclature

• The letters or words after “ARM” are used to indicate the features of a processor.
  • x – Family or series
  • y – Memory Management/Protection Unit
  • z – Cache
  • T – 16-bit Thumb decoder
  • D – JTAG Debugger
  • M – Fast Multiplier
  • I – Embedded In-Circuit Emulator (ICE) Macrocell
  • E – Enhanced Instructions for DSP (assumes TDMI)
  • J – Jazelle (for accelerated Java execution)
  • F – Vector Floating-point Unit
  • S – Synthesizable Version

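As a small illustration of the naming scheme, the suffix letters can be decoded with a lookup table. This is a sketch only: the function name `decode_suffix` is hypothetical, and it assumes that everything after the leading digits is a run of the feature letters listed on the slide (it does not try to interpret the x, y and z digits).

```python
# Feature letters taken from the nomenclature list; digits (x, y, z) are skipped.
FEATURES = {
    "T": "16-bit Thumb decoder",
    "D": "JTAG debugger",
    "M": "fast multiplier",
    "I": "EmbeddedICE macrocell",
    "E": "enhanced DSP instructions",
    "J": "Jazelle",
    "F": "vector floating-point unit",
    "S": "synthesizable version",
}


def decode_suffix(name):
    """Return the features named by the letters that follow the digits."""
    upper = name.upper()
    if not upper.startswith("ARM"):
        raise ValueError("expected a name starting with 'ARM'")
    rest = upper[3:].replace("-", "")
    letters = rest.lstrip("0123456789")   # drop family/MMU/cache digits
    return [FEATURES[ch] for ch in letters]


print(decode_suffix("ARM7TDMI"))
print(decode_suffix("ARM926EJ-S"))
```

An unknown letter raises `KeyError`, which seems preferable to guessing at a feature.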
ARM Processors

• ARM processors can be divided into ARM Classic processors, ARM Embedded processors and ARM Application processors.
• ARM Classic processors include the ARM7, ARM9 and ARM11 families, and the ARM7TDMI is still the highest-shipping 32-bit processor. ARM7-based processors are still used in many small and simple 32-bit devices.

• Even though ARM7 and other classic ARM processors can be used for small-scale embedded systems, newer embedded systems are built using the more advanced ARM embedded processors: the Cortex-M and Cortex-R processors.

ARM Embedded Processors
ARM Cortex-M processors have a microcontroller profile, while the Cortex-R processors have a real-time profile.
ARM Cortex-M processors are energy efficient, simple to implement and mainly developed for advanced embedded applications. They are further divided into several processor cores such as Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4 and Cortex-M7.
The ARM Cortex-R series provides solutions for real-time embedded systems. These processors offer high reliability, high fault tolerance and real-time responses, and are used in systems where high performance is required and timing deadlines are important.
The Cortex-R family includes processor cores such as Cortex-R4, Cortex-R5, Cortex-R7 and Cortex-R8.

ARM Application Processors
• The ARM Cortex-A series are the highest-performance processors from ARM. They are used in powerful mobile devices and in products such as network devices, consumer appliances, automation systems, automobiles and other embedded systems.
• The Cortex-A processors are further divided into high-performance, high-efficiency and ultra-high-efficiency types. Each subdivision has several types of processor cores.

ARM instruction set

• ARM versions.
• ARM assembly language.
• ARM programming model.
• ARM memory organization.
• ARM data operations.
• ARM flow of control.

ARM versions

• ARM architecture has been extended over several versions.
• We will concentrate on ARM7.

ARM assembly language

• Fairly standard assembly language:

        LDR r0,[r8]   ; a comment
label   ADD r4,r0,r1

ARM programming model

r0    r8
r1    r9
r2    r10
r3    r11
r4    r12
r5    r13
r6    r14
r7    r15 (PC)

CPSR (current program status register): 32 bits, bit 31 down to bit 0, with the N, Z, C and V flags in the top four bits (31 to 28).


Endianness

• Relationship between bit and byte/word ordering defines endianness:

little-endian (bit 31 ... bit 0):  byte 3 | byte 2 | byte 1 | byte 0
big-endian    (bit 0 ... bit 31):  byte 0 | byte 1 | byte 2 | byte 3

ARM data types

• Word is 32 bits long.
• Word can be divided into four 8-bit bytes.
• ARM addresses can be 32 bits long.
• Address refers to byte.
  • Address 4 starts at byte 4.
• Can be configured at power-up as either little- or big-endian mode.

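Byte addressing and the two byte orders can be checked with Python's standard `struct` module. This is a sketch run on a host machine, not on an ARM core; the word values are arbitrary examples.

```python
import struct

word = 0x0A0B0C0D

# "<I" packs a 32-bit unsigned word little-endian, ">I" big-endian.
little = struct.pack("<I", word)   # least significant byte at the lowest address
big = struct.pack(">I", word)      # most significant byte at the lowest address
print(little.hex(), big.hex())

# Byte-addressed memory holding two little-endian words: the second word
# starts at byte address 4.
memory = struct.pack("<II", 0x11111111, 0x22222222)
second = struct.unpack_from("<I", memory, 4)[0]
print(hex(second))
```
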
ARM status bits

• Every arithmetic, logical, or shifting operation can set CPSR bits (on ARM, when the instruction carries the S suffix, e.g. ADDS):
  • N (negative), Z (zero), C (carry), V (overflow).
• Examples:
  • -1 + 1 = 0: NZCV = 0110.
  • 2^31 - 1 + 1 = -2^31: NZCV = 1001.

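The flag values for a 32-bit addition can be worked out mechanically. The helper below is a sketch (the name `add_flags` is hypothetical) that models what a flag-setting add does to N, Z, C and V; it is not a full CPSR model.

```python
MASK = 0xFFFFFFFF


def add_flags(a, b):
    """Return (32-bit result, 'NZCV' string) for the addition a + b."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    n = result >> 31                      # N: sign bit of the result
    z = int(result == 0)                  # Z: result is zero
    c = int(full > MASK)                  # C: carry out of bit 31
    # V: operands share a sign and the result's sign differs from it
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, f"{n}{z}{c}{v}"


print(add_flags(-1, 1))                   # -1 + 1
print(add_flags(2**31 - 1, 1))            # largest positive value + 1
```

For -1 + 1 the result is zero with a carry out; for 2^31 - 1 + 1 the result is negative and signed overflow is flagged, with no carry.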
ARM data instructions

• Basic format:
        ADD r0,r1,r2
  • Computes r1+r2, stores in r0.
• Immediate operand:
        ADD r0,r1,#2
  • Computes r1+2, stores in r0.

ARM data instructions
• ADD, ADC : add (w. carry)
• SUB, SBC : subtract (w. carry)
• RSB, RSC : reverse subtract (w. carry)
• MUL, MLA : multiply (and accumulate)
• AND, ORR, EOR
• BIC : bit clear
• LSL, LSR : logical shift left/right
• ASL, ASR : arithmetic shift left/right
• ROR : rotate right
• RRX : rotate right extended with C
Data operation varieties

• Logical shift:
  • fills with zeroes.
• Arithmetic shift:
  • right shift fills with copies of the sign bit.
• RRX performs 33-bit rotate, including C bit from CPSR above sign bit.

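The difference between the shift and rotate varieties is easiest to see on concrete 32-bit values. The functions below are a sketch that models the operations on Python integers; the function names mirror the mnemonics but are otherwise hypothetical, and shift amounts are assumed to lie in the range 0 to 32.

```python
MASK = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: vacated low bits are filled with zeroes."""
    return (x << n) & MASK


def lsr(x, n):
    """Logical shift right: vacated high bits are filled with zeroes."""
    return (x & MASK) >> n


def asr(x, n):
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    x &= MASK
    fill = (MASK << (32 - n)) & MASK if x >> 31 else 0
    return (x >> n) | fill


def ror(x, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    x &= MASK
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK


def rrx(x, carry):
    """33-bit rotate right by one through C; returns (result, new carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1


print(hex(lsr(0x80000000, 4)), hex(asr(0x80000000, 4)))
print(rrx(0x2, 1))
```
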
ARM comparison instructions

• CMP : compare
• CMN : negated compare (addition)
• TST : bit-wise test (AND)
• TEQ : bit-wise equivalence test (exclusive OR)
• These instructions set only the NZCV bits of CPSR.

ARM move instructions

• MOV, MVN : move (negated)

        MOV r0, r1   ; sets r0 to r1

ARM load/store instructions

• LDR, LDRH, LDRB : load (half-word, byte)
• STR, STRH, STRB : store (half-word, byte)
• Addressing modes:
  • register indirect : LDR r0,[r1]
  • with second register : LDR r0,[r1,-r2]
  • with constant : LDR r0,[r1,#4]

ARM ADR pseudo-op

• Cannot refer to an address directly in an instruction.
• Generate value by performing arithmetic on PC.
• ADR pseudo-op generates instruction required to calculate address:
        ADR r1,FOO

Example: C assignments

• C:
        x = (a + b) - c;
• Assembler:
        ADR r4,a      ; get address for a
        LDR r0,[r4]   ; get value of a
        ADR r4,b      ; get address for b, reusing r4
        LDR r1,[r4]   ; get value of b
        ADD r3,r0,r1  ; compute a+b
        ADR r4,c      ; get address for c
        LDR r2,[r4]   ; get value of c

C assignment, cont’d.
        SUB r3,r3,r2  ; complete computation of x
        ADR r4,x      ; get address for x
        STR r3,[r4]   ; store value of x

Example: C assignment

• C:
        y = a*(b+c);
• Assembler:
        ADR r4,b      ; get address for b
        LDR r0,[r4]   ; get value of b
        ADR r4,c      ; get address for c
        LDR r1,[r4]   ; get value of c
        ADD r2,r0,r1  ; compute partial result
        ADR r4,a      ; get address for a
        LDR r0,[r4]   ; get value of a

C assignment, cont’d.
        MUL r2,r0,r2  ; compute final value for y (on ARM7, Rd and the first source register must differ)
        ADR r4,y      ; get address for y
        STR r2,[r4]   ; store y

Example: C assignment

• C:
        z = (a << 2) | (b & 15);
• Assembler:
        ADR r4,a          ; get address for a
        LDR r0,[r4]       ; get value of a
        MOV r0,r0,LSL #2  ; perform shift
        ADR r4,b          ; get address for b
        LDR r1,[r4]       ; get value of b
        AND r1,r1,#15     ; perform AND
        ORR r1,r0,r1      ; perform OR

C assignment, cont’d.
        ADR r4,z      ; get address for z
        STR r1,[r4]   ; store value for z

Additional addressing modes

• Base-plus-offset addressing:
        LDR r0,[r1,#16]
  • Loads from location r1+16.
• Auto-indexing increments base register:
        LDR r0,[r1,#16]!
  • Adds 16 to r1, then loads r0 from the new r1.
• Post-indexing fetches, then does offset:
        LDR r0,[r1],#16
  • Loads r0 from r1, then adds 16 to r1.
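The three addressing modes differ only in when the base register changes and which address is used for the load. The sketch below models them on a register list and a dictionary standing in for memory; the function names and the memory contents are hypothetical.

```python
def ldr_offset(regs, mem, rd, rn, off):
    """LDR rd,[rn,#off]: load from rn+off, rn is left unchanged."""
    regs[rd] = mem[regs[rn] + off]


def ldr_pre(regs, mem, rd, rn, off):
    """LDR rd,[rn,#off]!: add off to rn first, then load from the new rn."""
    regs[rn] += off
    regs[rd] = mem[regs[rn]]


def ldr_post(regs, mem, rd, rn, off):
    """LDR rd,[rn],#off: load from rn, then add off to rn."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += off


mem = {0x100: 11, 0x110: 22}
for mode in (ldr_offset, ldr_pre, ldr_post):
    regs = [0] * 16
    regs[1] = 0x100
    mode(regs, mem, 0, 1, 16)
    print(mode.__name__, "r0 =", regs[0], "r1 =", hex(regs[1]))
```
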
ARM flow of control

• All operations can be performed conditionally, testing CPSR:
  • EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
• Branch operation (memory is byte addressable; a branch changes the flow of control):
        B #100
  • Can be performed conditionally.

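Each condition code is a test on the N, Z, C and V bits. The sketch below writes those tests out and pairs them with a model of the flags a compare produces (a subtraction whose result is discarded). The names `cmp_flags` and `holds` are hypothetical helpers for illustration.

```python
MASK = 0xFFFFFFFF

# Condition code -> test on the (n, z, c, v) flag bits.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,   # unsigned higher
    "LS": lambda n, z, c, v: c == 0 or z == 1,    # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,              # signed >=
    "LT": lambda n, z, c, v: n != v,              # signed <
    "GT": lambda n, z, c, v: z == 0 and n == v,   # signed >
    "LE": lambda n, z, c, v: z == 1 or n != v,    # signed <=
}


def cmp_flags(a, b):
    """Flags set by CMP a,b: computed from a - b, result discarded."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                               # C set means no borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1      # signed overflow
    return n, z, c, v


def holds(cond, a, b):
    """True if condition cond passes after comparing a with b."""
    return CONDITIONS[cond](*cmp_flags(a, b))


print(holds("GT", 5, 3), holds("LE", 3, 5), holds("LT", -1, 1), holds("HI", -1, 1))
```

The last two calls show why signed and unsigned conditions are separate: -1 is less than 1 as a signed value, but its bit pattern 0xFFFFFFFF is higher than 1 as an unsigned value.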
Example: if statement

• C:
        if (a > b) { x = 5; y = c + d; } else x = c - d;
• Assembler:
; compute and test condition
        ADR r4,a      ; get address for a
        LDR r0,[r4]   ; get value of a
        ADR r4,b      ; get address for b
        LDR r1,[r4]   ; get value of b
        CMP r0,r1     ; compare a with b
        BLE fblock    ; if a <= b, branch to false block

If statement, cont’d.
; true block
        MOV r0,#5     ; generate value for x
        ADR r4,x      ; get address for x
        STR r0,[r4]   ; store x
        ADR r4,c      ; get address for c
        LDR r0,[r4]   ; get value of c
        ADR r4,d      ; get address for d
        LDR r1,[r4]   ; get value of d
        ADD r0,r0,r1  ; compute y
        ADR r4,y      ; get address for y
        STR r0,[r4]   ; store y
        B after       ; branch around false block

If statement, cont’d.
; false block
fblock  ADR r4,c      ; get address for c
        LDR r0,[r4]   ; get value of c
        ADR r4,d      ; get address for d
        LDR r1,[r4]   ; get value of d
        SUB r0,r0,r1  ; compute c-d
        ADR r4,x      ; get address for x
        STR r0,[r4]   ; store value of x
after   ...

Summary

• Load/store architecture.
• Most instructions are RISC, operate in single cycle.
  • Some multi-register operations take longer.
• All instructions can be executed conditionally.

CPUs

• Input and output.
• Supervisor mode, exceptions, traps.
• Co-processors.

I/O devices

• Usually includes some non-digital component.
• Typical digital interface to CPU:

[Figure: the CPU connects to the device through a status register and a data register, which sit in front of the device mechanism.]

Application: 8251 UART

• Universal asynchronous receiver transmitter (UART): provides serial communication.
• 8251 functions are integrated into standard PC interface chip.
• Allows many communication parameters to be programmed.

Serial communication

• Characters are transmitted separately:

no char | start | bit 0 | bit 1 | ... | bit n-1 | stop      (time →)

Serial communication parameters

• Baud (bit) rate.
• Number of bits per character.
• Parity/no parity.
• Even/odd parity.
• Length of stop bit (1, 1.5, 2 bits).

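How these parameters shape what goes on the wire can be sketched by building the bit sequence for one character. The code assumes the usual asynchronous convention (start bit 0, stop bits 1, data sent least significant bit first) and only handles whole stop bits, so the 1.5-bit option is left out. The function names are hypothetical.

```python
def frame_bits(char_code, data_bits=8, parity=None, stop_bits=1):
    """Return the line levels for one character: start, data, parity, stop."""
    data = [(char_code >> i) & 1 for i in range(data_bits)]   # LSB first
    bits = [0] + data                                         # start bit is 0
    if parity == "even":
        bits.append(sum(data) % 2)        # makes the total number of ones even
    elif parity == "odd":
        bits.append(1 - sum(data) % 2)    # makes the total number of ones odd
    elif parity is not None:
        raise ValueError("parity must be None, 'even' or 'odd'")
    return bits + [1] * stop_bits                             # stop bits are 1


def char_time(bits, baud):
    """Seconds needed to send one framed character at the given bit rate."""
    return len(bits) / baud


frame = frame_bits(0x41, data_bits=8, parity="even", stop_bits=1)
print(frame, char_time(frame, 9600))
```

With 8 data bits, a parity bit and 1 stop bit, each character costs 11 bit times, so the usable character rate is well below the baud rate divided by 8.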
8251 CPU interface

[Figure: the CPU connects to the 8251 through an 8-bit status register and an 8-bit data register; the 8251's transmit/receive lines form the serial port.]

Programming I/O

• Two types of instructions can support I/O:
  • special-purpose I/O instructions;
  • memory-mapped load/store instructions.
• Intel x86 provides in, out instructions. Most other CPUs use memory-mapped I/O.
• I/O instructions do not preclude memory-mapped I/O.
ARM memory-mapped I/O

• Define location for device:
DEV1    EQU 0x1000
• Read/write code:
        LDR r1,=DEV1  ; set up device adrs
        LDR r0,[r1]   ; read DEV1
        MOV r0,#8     ; set up value to write
        STR r0,[r1]   ; write value to device

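The idea behind memory-mapped I/O is that an ordinary load or store reaches a device register when the address falls on the device. The sketch below models that with a small bus object; the `Bus` and `DeviceRegister` classes are hypothetical stand-ins, and only the address 0x1000 for DEV1 comes from the slide.

```python
DEV1 = 0x1000   # device location, as defined by the EQU on the slide


class DeviceRegister:
    """Stand-in for a device data register; it records every write."""

    def __init__(self, value=0):
        self.value = value
        self.writes = []

    def read(self):
        return self.value

    def write(self, value):
        self.value = value
        self.writes.append(value)


class Bus:
    """Routes loads and stores either to a device or to plain RAM."""

    def __init__(self):
        self.ram = {}
        self.devices = {}

    def attach(self, addr, device):
        self.devices[addr] = device

    def load(self, addr):                 # what an LDR would see
        if addr in self.devices:
            return self.devices[addr].read()
        return self.ram.get(addr, 0)

    def store(self, addr, value):         # what an STR would do
        if addr in self.devices:
            self.devices[addr].write(value)
        else:
            self.ram[addr] = value


bus = Bus()
dev = DeviceRegister(value=0x55)
bus.attach(DEV1, dev)
before = bus.load(DEV1)    # read DEV1
bus.store(DEV1, 8)         # write value to device
print(hex(before), dev.writes)
```
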
Interrupt I/O

• Busy/wait is very inefficient.
  • CPU can’t do other work while testing device.
  • Hard to do simultaneous I/O.
• Interrupts allow a device to change the flow of control in the CPU.
  • Causes subroutine call to handle device.

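Interrupt interface

[Figure: the device interface adds an interrupt request line to the CPU and an interrupt acknowledge line from the CPU, alongside the status and data registers reached over the data/address lines; the CPU contains the PC and IR.]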

Interrupt behavior

• Based on subroutine call mechanism.
• Interrupt forces next instruction to be a subroutine call to a predetermined location.
  • Return address is saved to resume executing foreground program.

Priorities and vectors

• Two mechanisms allow us to make interrupts more specific:
  • Priorities determine what interrupt gets CPU first.
  • Vectors determine what code is called for each type of interrupt.
• Mechanisms are orthogonal: most CPUs provide both.

Prioritized interrupts

[Figure: devices 1 through n each drive their own interrupt request line (L1, L2, ..., Ln) into the CPU, which returns an interrupt acknowledge.]

Interrupt prioritization

• Masking: interrupt with priority lower than current priority is not recognized until pending interrupt is complete.
• Non-maskable interrupt (NMI): highest-priority, never masked.
  • Often used for power-down.

Interrupt vectors

• Allow different devices to be handled by different code.
• Interrupt vector table:

[Figure: the interrupt vector table head points to a table whose entries select handler 0, handler 1, handler 2 and handler 3.]

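Priorities and vectors can be combined in a few lines: priority picks which pending request wins, and the vector picks which handler runs. The sketch below assumes that a larger number means a more urgent interrupt, which is a convention chosen for the example; real CPUs differ. The handler functions and `dispatch` are hypothetical.

```python
def dispatch(pending, current_priority, vector_table):
    """Run the handler of the most urgent unmasked request.

    pending maps vector number -> priority. Requests whose priority is not
    above current_priority are masked and stay pending. Returns the
    (vector, handler result) pair, or None when everything is masked.
    """
    eligible = {vec: pri for vec, pri in pending.items() if pri > current_priority}
    if not eligible:
        return None
    vector = max(eligible, key=eligible.get)      # priority chooses who goes first
    del pending[vector]
    return vector, vector_table[vector]()         # vector chooses the code


# Hypothetical handlers, one table entry per vector.
vector_table = {
    0: lambda: "timer handled",
    1: lambda: "uart handled",
    2: lambda: "disk handled",
}

pending = {0: 1, 1: 3, 2: 2}
first = dispatch(pending, current_priority=0, vector_table=vector_table)
masked = dispatch(pending, current_priority=2, vector_table=vector_table)
print(first, masked, pending)
```
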
Interrupt sequence

• CPU acknowledges request.
• Device sends vector.
• CPU calls handler.
• Software processes request.
• CPU restores state to foreground program.

Sources of interrupt overhead

• Handler execution time.
• Interrupt mechanism overhead.
• Register save/restore.
• Pipeline-related penalties.
• Cache-related penalties.

Supervisor mode
• May want to provide protective barriers between programs.
  • Avoid memory corruption.
• Need supervisor mode to manage the various programs.
• C55x does not have a supervisor mode.
• On ARM, the CPU is put into supervisor mode by the SWI instruction.
• The old value of the CPSR just before the SWI is stored in a register called the saved program status register (SPSR).

Exception

• Exception: internally detected error.
• Exceptions are synchronous with instructions but unpredictable.
• Build exception mechanism on top of interrupt mechanism.
• Exceptions are usually prioritized and vectorized.

Trap

• Trap (software interrupt): an exception generated by an instruction.
  • Call supervisor mode.
• ARM uses SWI instruction for traps.
• SHARC offers three levels of software interrupts.
  • Called by setting bits in IRPTL register.

Co-processor

• Co-processor: added function unit that is called by instruction.
  • Floating-point units are often structured as co-processors.
• ARM allows up to 16 designer-selected co-processors.
  • Floating-point co-processor uses units 1, 2.
• C55x uses co-processors as well.
CPUs

• CPU performance.
• CPU power consumption.

Elements of CPU performance

• Cycle time.
• CPU pipeline.
• Memory system.

Pipelining

• Several instructions are executed simultaneously at different stages of completion.
• Various conditions can cause pipeline bubbles that reduce utilization:
  • branches;
  • memory system delays;
  • etc.

Performance measures

• Latency: time it takes for an instruction to get through the pipeline.
• Throughput: number of instructions executed per time period.
• Pipelining increases throughput without reducing latency.

ARM7 pipeline

• ARM7 has 3-stage pipe:
  • fetch instruction from memory;
  • decode opcode and operands;
  • execute.

ARM pipeline execution

time (cycle)    1        2        3        4        5
add r0,r1,#5    fetch    decode   execute
sub r2,r3,r6             fetch    decode   execute
cmp r2,#3                         fetch    decode   execute

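The overlap in the pipeline diagram can be generated rather than drawn. The sketch below assumes an ideal 3-stage pipeline with one cycle per stage and no stalls, so a new instruction starts every cycle; the function names are hypothetical.

```python
STAGES = ("fetch", "decode", "execute")


def schedule(instructions, stages=STAGES):
    """Return [(instruction, {cycle: stage})] with one new instruction per cycle."""
    return [
        (ins, {start + offset: stage for offset, stage in enumerate(stages, start=1)})
        for start, ins in enumerate(instructions)
    ]


def total_cycles(count, depth=len(STAGES)):
    """Cycles needed to finish `count` instructions in an ideal pipeline."""
    return 0 if count == 0 else count + depth - 1


program = ["add r0,r1,#5", "sub r2,r3,r6", "cmp r2,#3"]
for ins, timeline in schedule(program):
    print(f"{ins:14s}", timeline)
print("total cycles:", total_cycles(len(program)))
```

Each instruction still takes three cycles from fetch to execute (latency), but once the pipe is full one instruction completes per cycle (throughput).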
Pipeline stalls

• If every step cannot be completed in the same amount of time, pipeline stalls.
• Bubbles introduced by stall increase latency, reduce throughput.

CPU power consumption

• Most modern CPUs are designed with power consumption in mind to some degree.
• Power vs. energy:
  • heat depends on power consumption;
  • battery life depends on energy consumption.

CMOS power consumption

• Voltage drops: power consumption proportional to V^2.
• Toggling: more activity means more power.
• Leakage: basic circuit characteristics; can be eliminated by disconnecting power.

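The voltage and toggling effects can be put into numbers with the standard first-order model of CMOS dynamic power, P = a · C · V^2 · f, where a is the activity factor, C the switched capacitance and f the clock frequency. The slide states only the V^2 dependence and the effect of activity; the full formula, the function names and the values below are assumptions brought in for the sketch.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """First-order CMOS dynamic power in watts: a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


def energy(power, seconds):
    """Energy in joules used when running at `power` watts for `seconds`."""
    return power * seconds


# Illustrative values only: 1 nF switched capacitance, 1 MHz clock.
full = dynamic_power(1e-9, 1.0, 1e6)
half_voltage = dynamic_power(1e-9, 0.5, 1e6)
half_clock = dynamic_power(1e-9, 1.0, 0.5e6)
print(half_voltage / full, half_clock / full)

# Halving the clock halves the power, but a fixed amount of work then takes
# twice as long, so the energy used for that work does not drop.
print(energy(full, 1.0), energy(half_clock, 2.0))
```

This is the power vs. energy distinction in practice: lowering the voltage saves both, while lowering only the clock reduces heat but not the energy drawn from the battery for a given task.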
CPU power-saving strategies

• Reduce power supply voltage.
• Run at lower clock frequency.
• Disable function units with control signals when not in use.
• Disconnect parts from power supply when not in use.

Power management

• Static and dynamic power management.
  • Static power management: e.g. a power-down mode.
  • Dynamic power management: the CPU may turn off certain sections of the CPU when the instructions being executed do not need them.
