ARM7TDMI
January 1999
is the registered trademark of Atmel Corporation, 2325 Orchard Parkway, San Jose, CA 95131
Document Details
Title: ARM7TDMI (Thumb) Data Sheet
Literature Number: 0673B
Revision: B
Date: January 1999
Printed and distributed by Atmel ES2 in accordance with the license agreement existing between ARM for the ARM7TDMI microprocessor.

Revision History
Revision A: July 1996
Revision B: Reformatting of Revision A (numbering removed) and electrical characteristics removed. From now on, please see one of the following datasheets for electrical characteristics:
ARM7TDMI Embedded Core ATC50 Electrical Characteristics (0.5 micron three-layer-metal CMOS process intended for use with a supply voltage of 3.3V ± 0.3V)
ARM7TDMI Embedded Core ATC50/E2 Electrical Characteristics (0.5 micron three-layer-metal CMOS/NVM process intended for use with a supply voltage of 3.3V ± 0.3V)
ARM7TDMI Embedded Core ATC35 Electrical Characteristics (0.35 micron three-layer-metal CMOS process intended for use with a supply voltage of 3.3V ± 0.3V)

Copyright Advanced RISC Machines Limited (ARM) 1996
ARM, Thumb and ARM Powered are registered trademarks of ARM Limited. The ARM7TDMI EmbeddedICE, BlackICE and ICEbreaker are trademarks of ARM Ltd.
Neither the whole nor any part of the information contained in, or the product described in, this datasheet may be adapted or reproduced in any material form except with the prior written permission of the copyright holder.
The product described in this datasheet is subject to continuous developments and improvements. All particulars of the product and its use contained in this datasheet are given by ARM in good faith. However, all warranties implied or expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.
This datasheet is intended only to assist the reader in the use of the product. ARM Ltd. shall not be liable for any loss or damage arising from the use of any information in this datasheet, or any error or omission in such information, or any incorrect use of the product.
Important Notice
Atmel Corporation makes no warranty for the use of its products, other than those expressly contained in the Company's standard warranty which is detailed in Atmel's Terms and Conditions located on the Company's website. The Company assumes no responsibility for any errors which may appear in this document, reserves the right to change devices or specifications detailed herein at any time without notice, and does not make any commitment to update the information contained herein. No licenses to patents or other intellectual property of Atmel are granted by the Company in connection with the sale of Atmel products, expressly or by implication. Atmel's products are not authorized for use as critical components in life support devices or systems.
Marks bearing ® and/or ™ are registered trademarks and trademarks of Atmel Corporation. Terms and product names in this document may be trademarks of others.

Atmel ES2
Zone Industrielle
13106 Rousset Cedex
France
Tel: (+33) (0)4 42 53 60 00
Fax: (+33) (0)4 42 53 60 01
For other Atmel addresses see back page.
Table of Contents

Architectural Overview .... 1
    Introduction .... 1
    ARM7TDMI Architecture .... 2
    ARM7TDMI Block Diagram .... 3
    ARM7TDMI Core Diagram .... 4
    ARM7TDMI Functional Diagram .... 5
Signal Description .... 7
Programmer's Model .... 15
    Processor Operating States .... 15
    Switching State .... 15
    Memory Formats .... 16
    Instruction Length .... 17
    Data Types .... 17
    Operating Modes .... 17
    Registers .... 17
    The Program Status Registers .... 21
    Exceptions .... 23
    Interrupt Latencies .... 26
    Reset .... 26
ARM Instruction Set .... 27
    Instruction Set Summary .... 28
    The Condition Field .... 30
    Branch and Exchange (BX) .... 30
    Branch and Branch with Link (B, BL) .... 32
    Data Processing .... 34
    PSR Transfer (MRS, MSR) .... 40
    Multiply and Multiply-Accumulate (MUL, MLA) .... 44
    Multiply Long and Multiply-Accumulate Long (MULL, MLAL) .... 46
    Single Data Transfer (LDR, STR) .... 48
    Halfword and Signed Data Transfer (LDRH/STRH/LDRSB/LDRSH) .... 52
    Block Data Transfer (LDM, STM) .... 56
    Single Data Swap (SWP) .... 62
    Software Interrupt (SWI) .... 64
    Coprocessor Data Operations (CDP) .... 66
    Coprocessor Data Transfers (LDC, STC) .... 68
    Coprocessor Register Transfers (MRC, MCR) .... 70
    Undefined Instruction .... 71
    Instruction Set Examples .... 72
Thumb Instruction Set .... 77
    Format Summary .... 78
    Opcode Summary .... 79
    Format 1: move shifted register .... 80
    Format 2: add/subtract .... 81
    Format 3: move/compare/add/subtract immediate .... 83
    Format 4: ALU operations .... 84
    Format 5: Hi register operations/branch exchange .... 86
    Format 6: PC-relative load .... 89
    Format 7: load/store with register offset .... 90
    Format 8: load/store sign-extended byte/halfword .... 92
    Format 9: load/store with immediate offset .... 94
    Format 10: load/store halfword .... 96
    Format 11: SP-relative load/store .... 98
    Format 12: load address .... 100
    Format 13: add offset to Stack Pointer .... 101
    Format 14: push/pop registers .... 102
    Format 15: multiple load/store .... 104
    Format 16: conditional branch .... 105
    Format 17: software interrupt .... 107
    Format 18: unconditional branch .... 108
    Format 19: long branch with link .... 109
    Instruction Set Examples .... 110
Memory Interface .... 117
    Overview .... 117
    Cycle Types .... 118
    Data Transfer Size .... 124
    Instruction Fetch .... 124
    Memory Management .... 126
    Locked Operations .... 126
    Stretching Access Times .... 126
    The ARM Data Bus .... 127
    The External Data Bus .... 129
Coprocessor Interface .... 135
    Overview .... 135
    Interface Signals .... 136
    Register Transfer Cycle .... 137
    Privileged Instructions .... 137
    Idempotency .... 137
    Undefined Instructions .... 137
Debug Interface .... 139
    Overview .... 139
    Debug Systems .... 140
    Debug Interface Signals .... 141
    Scan Chains and JTAG Interface .... 143
    Reset .... 145
    Pullup Resistors .... 145
    Instruction Register .... 145
    Public Instructions .... 145
    Test Data Registers .... 147
    ARM7TDMI Core Clocks .... 151
    Determining the Core and System State .... 152
    The PC's Behaviour During Debug .... 155
    Priorities / Exceptions .... 157
    Scan Interface Timing .... 158
    Debug Timing .... 161
ICEBreaker Module .... 163
    Overview .... 164
    The Watchpoint Registers .... 165
    Programming Breakpoints .... 168
    Programming Watchpoints .... 169
    The Debug Control Register .... 169
    Debug Status Register .... 170
    Coupling Breakpoints and Watchpoints .... 171
    Disabling ICEBreaker .... 172
    ICEBreaker Timing .... 172
    Programming Restriction .... 172
    Debug Communications Channel .... 173
Instruction Cycle Operations .... 175
    Introduction .... 176
    Branch and Branch with Link .... 176
    THUMB Branch with Link .... 177
    Branch and Exchange (BX) .... 177
    Data Operations .... 178
    Multiply and Multiply Accumulate .... 179
    Load Register .... 180
    Store Register .... 180
    Load Multiple Registers .... 181
    Store Multiple Registers .... 182
    Data Swap .... 182
    Software Interrupt and Exception Entry .... 183
    Coprocessor Data Operation .... 183
    Coprocessor Data Transfer (from memory to coprocessor) .... 184
    Coprocessor Data Transfer (from coprocessor to memory) .... 185
    Coprocessor Register Transfer (Load from coprocessor) .... 186
    Coprocessor Register Transfer (Store to coprocessor) .... 186
    Undefined Instructions and Coprocessor Absent .... 187
    Unexecuted Instructions .... 187
    Instruction Speed Summary .... 188
AC/DC Parameters .... 189
    Timing Diagrams .... 190
Architectural Overview
This chapter introduces the ARM7TDMI architecture and shows block, core, and functional diagrams for the ARM7TDMI.
Introduction
The ARM7TDMI is a member of the Advanced RISC Machines (ARM) family of general purpose 32-bit microprocessors, which offer high performance for very low power consumption and price. The ARM architecture is based on Reduced Instruction Set Computer (RISC) principles, and the instruction set and related decode mechanism are much simpler than those of microprogrammed Complex Instruction Set Computers. This simplicity results in a high instruction throughput and impressive real-time interrupt response from a small and cost-effective chip. Pipelining is employed so that all parts of the processing and memory systems can operate continuously. Typically, while one instruction is being executed, its successor is being decoded, and a third instruction is being fetched from memory. The ARM memory interface has been designed to allow the performance potential to be realised without incurring high costs in the memory system. Speed-critical control signals are pipelined to allow system control functions to be implemented in standard low-power logic, and these control signals facilitate the exploitation of the fast local access modes offered by industry standard dynamic RAMs.
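The overlap of fetch, decode, and execute described above can be illustrated with a minimal sketch. It assumes an idealised pipeline in which every instruction spends exactly one cycle in each stage, with no stalls, branches, or multi-cycle instructions; the function name `pipeline_schedule` and the instruction labels are hypothetical and are not part of the ARM7TDMI documentation.

```python
def pipeline_schedule(instructions):
    """Return, per cycle, which instruction occupies each stage of an
    idealised 3-stage pipeline (assumption: one cycle per stage, no stalls)."""
    stages = ("fetch", "decode", "execute")
    count = len(instructions)
    schedule = []
    # The last instruction leaves the execute stage (len(stages) - 1) cycles
    # after it was fetched.
    for cycle in range(count + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction that entered the pipe `depth` cycles ago
            row[stage] = instructions[index] if 0 <= index < count else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["I1", "I2", "I3", "I4"])):
        print(cycle, row)
```

From the third cycle onward all three stages are occupied, which is the steady state the text refers to when it says that all parts of the processing and memory systems can operate continuously.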
Rev. 0673B12/98
ARM7TDMI Architecture
The ARM7TDMI is a 3-stage pipeline, 32-bit RISC processor. The processor architecture is a Von Neumann load/store architecture, which is characterized by a single data and address bus for instructions and data. The CPU has two instruction sets, the ARM and the Thumb instruction set. The ARM instruction set has 32-bit wide instructions and provides maximum performance. Thumb instructions are 16 bits wide and give maximum code density. Instructions operate on 8-, 16-, and 32-bit data types. The CPU has seven operating modes (see Operating Modes on page 17). Each operating mode has dedicated banked registers for fast exception handling. The processor has a total of 37 32-bit registers, including 6 status registers (see Registers on page 17).
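The figure of 37 registers can be checked with a short tally. The sketch below is illustrative only: the breakdown of which registers are banked in which mode is an assumption taken from the standard ARM programmer's model (detailed in the Registers section), and the names `EXCEPTION_MODES` and `count_registers` are hypothetical.

```python
# Assumption: the five exception modes each bank their own R13/R14 and SPSR,
# while User and System mode share one set of registers.
EXCEPTION_MODES = ["FIQ", "IRQ", "Supervisor", "Abort", "Undefined"]


def count_registers():
    """Tally general-purpose and status registers across all modes."""
    general = 8                                  # R0-R7: one copy shared by every mode
    general += 5 + 5                             # R8-R12: shared copy plus a private FIQ copy
    general += 2 * (1 + len(EXCEPTION_MODES))    # R13, R14: User/System pair plus one pair per exception mode
    general += 1                                 # R15 (program counter)
    status = 1 + len(EXCEPTION_MODES)            # CPSR plus one SPSR per exception mode
    return general, status


if __name__ == "__main__":
    general, status = count_registers()
    print(general, status, general + status)     # prints: 31 6 37
```

Under these assumptions the tally gives 31 general-purpose registers and 6 status registers, matching the total of 37 stated above.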
THUMB's Advantages
THUMB instructions operate with the standard ARM register configuration, allowing excellent interoperability between ARM and THUMB states. Each 16-bit THUMB instruction has a corresponding 32-bit ARM instruction with the same effect on the processor model.

The major advantage of a 32-bit (ARM) architecture over a 16-bit architecture is its ability to manipulate 32-bit integers with single instructions, and to address a large address space efficiently. When processing 32-bit data, a 16-bit architecture will take at least two instructions to perform the same task as a single ARM instruction. However, not all the code in a program will process 32-bit data (for example, code that performs character string handling), and some instructions, like Branches, do not process any data at all.

If a 16-bit architecture only has 16-bit instructions, and a 32-bit architecture only has 32-bit instructions, then overall the 16-bit architecture will have better code density, and better than one half the performance of the 32-bit architecture. Clearly 32-bit performance comes at the cost of code density.

THUMB breaks this constraint by implementing a 16-bit instruction length on a 32-bit architecture, making the processing of 32-bit data efficient with a compact instruction coding. This provides far better performance than a 16-bit architecture, with better code density than a 32-bit architecture.

THUMB also has a major advantage over other 32-bit architectures with 16-bit instructions: the ability to switch back to full ARM code and execute at full speed. Thus critical loops for applications such as fast interrupts and DSP algorithms can be coded using the full ARM instruction set, and linked with THUMB code. The overhead of switching from THUMB code to ARM code is folded into sub-routine entry time. Various portions of a system can be optimised for speed or for code density by switching between THUMB and ARM execution as appropriate.
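The trade-off between a pure 16-bit architecture and a pure 32-bit architecture can be made concrete with a small model. It is a sketch under stated assumptions, not a measurement: it assumes that a fraction `f` of the 32-bit program's instructions process 32-bit data, that each of those needs exactly two 16-bit instructions (the text says "at least two"), that all other instructions map one to one, and that both machines execute one instruction per cycle. The function name `sixteen_bit_vs_arm` is hypothetical.

```python
def sixteen_bit_vs_arm(f):
    """Compare a pure 16-bit architecture with a pure 32-bit one.

    f is the (assumed) fraction of 32-bit instructions that process 32-bit data.
    Returns (code_size_ratio, performance_ratio), both relative to the
    32-bit architecture.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    instruction_ratio = 1.0 + f                      # 16-bit instruction count / 32-bit instruction count
    code_size_ratio = instruction_ratio * 16 / 32    # bytes of 16-bit code / bytes of 32-bit code
    performance_ratio = 1.0 / instruction_ratio      # speed relative to the 32-bit machine
    return code_size_ratio, performance_ratio


if __name__ == "__main__":
    for f in (0.0, 0.5, 1.0):
        print(f, sixteen_bit_vs_arm(f))
```

For any `f` below 1 the code size ratio stays below 1 and the performance ratio stays above one half, which is the sense in which the 16-bit architecture has better code density and better than one half the performance.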
ARM7TDMI Block Diagram
Figure 1. ARM7TDMI Block Diagram
[Block diagram labels: Scan Chain 0, Scan Chain 1, Scan Chain 2, ICEBreaker, Core, Bus Splitter, TAP controller. Signals shown: RANGEOUT0, RANGEOUT1, EXTERN1, EXTERN0, nOPC, nRW, MAS[1:0], nTRANS, nMREQ, A[31:0], TDO.]
[Core diagram labels: Address Register, PC bus, Address Incrementer, ALU bus, 32-bit ALU, Instruction Pipeline & Read Data Register & Thumb Instruction Decoder. Signals shown: DBGRQI, BREAKPTI, DBGACK, ECLK, nEXEC, ISYNC, BL[3:0], APE, MCLK, nWAIT, nRW, MAS[1:0], nIRQ, nFIQ, nRESET, ABORT, nTRANS, nMREQ, nOPC, SEQ, LOCK, nCPI, CPA, CPB, nM[4:0], TBE, TBIT, HIGHZ.]
ARM7TDMI Functional Diagram
Figure 3. ARM7TDMI Functional Diagram
[Functional diagram labels, grouped where the grouping is recoverable:
Clocks: MCLK, nWAIT, ECLK
Interrupts: nIRQ, nFIQ, ISYNC
Bus Controls: BUSEN, HIGHZ, BIGEND, nENIN, nENOUT, nENOUTI, ABE, APE, ALE, DBE, TBE
Boundary Scan: TCK, TMS, TDI, nTRST, TDO, TAPSM[3:0], IR[3:0], nTDOEN, TCK1, TCK2, SCREG[3:0]
Power: VDD, VSS
Debug: DBGRQ, BREAKPT, DBGACK, nEXEC, EXTERN1, EXTERN0, DBGEN, RANGEOUT0, RANGEOUT1, DBGRQI, COMMRX, COMMTX
Other signals shown: nRESET, nM[4:0], TBIT, A[31:0], D[31:0], DIN[31:0], BUSDIS, ECAPCLK, nMREQ, SEQ, nRW, MAS[1:0], BL[3:0], LOCK, nTRANS, ABORT, nOPC, nCPI, CPA, CPB
Further group labels: Memory Management Interface, Coprocessor Interface]
Signal Description
This chapter lists and describes the input/output signals for the ARM7TDMI. The following table (Table 1) lists and describes all of the signals for the ARM7TDMI.

Key to signal types
IC    Input with CMOS thresholds
P     Power
O4    Output with INV4 driver
O8    Output with INV8 driver
BREAKPT (Breakpoint)
Type: IC

BUSDIS (Bus Disable)

BUSEN (Data bus configuration)
Type: IC
Table 1. Signal Description (Continued)
COMMTX (Communications Channel Transmit)
Type: O
Description: When HIGH, this signal denotes that the comms channel transmit buffer is empty. This signal changes on the rising edge of MCLK. See Debug Communications Channel for more information on the debug comms channel.

CPA (Coprocessor absent)
Type: IC
Description: A coprocessor which is capable of performing the operation that ARM7TDMI is requesting (by asserting nCPI) should take CPA LOW immediately. If CPA is HIGH at the end of phase 1 of the cycle in which nCPI went LOW, ARM7TDMI will abort the coprocessor handshake and take the undefined instruction trap. If CPA is LOW and remains LOW, ARM7TDMI will busy-wait until CPB is LOW and then complete the coprocessor instruction.

CPB
Type: IC
Description: A coprocessor which is capable of performing the operation which ARM7TDMI is requesting (by asserting nCPI), but cannot commit to starting it immediately, should indicate this by driving CPB HIGH. When the coprocessor is ready to start it should take CPB LOW. ARM7TDMI samples CPB at the end of phase 1 of each cycle in which nCPI is LOW.

D[31:0]
Type: IC O8
Description: These are bidirectional signal paths which are used for data transfers between the processor and external memory. During read cycles (when nRW is LOW), the input data must be valid before the end of phase 2 of the transfer cycle. During write cycles (when nRW is HIGH), the output data will become valid during phase 1 and remain valid throughout phase 2 of the transfer cycle. Note that this bus is driven at all times, irrespective of whether BUSEN is HIGH or LOW. When D[31:0] is not being used to connect to the memory system it must be left unconnected. See Memory Interface on page 117.

DBE (Data Bus Enable)
Type: IC
Description: This is an input signal which, when driven LOW, puts the data bus D[31:0] into the high impedance state. This is included for test purposes, and should be tied HIGH at all times.

DBGACK (Debug acknowledge)
Type: O4
Description: When HIGH, indicates that the ARM is in debug state.

DBGEN (Debug Enable)
Type: IC
Description: This input signal allows the debug features of ARM7TDMI to be disabled. This signal should be driven LOW when debugging is not required.

DBGRQ (Debug request)
Type: IC
Description: This is a level-sensitive input, which when HIGH causes ARM7TDMI to enter debug state after executing the current instruction. This allows external hardware to force ARM7TDMI into the debug state, in addition to the debugging features provided by the ICEBreaker block. See ICEBreaker Module on page 163 for details.

DBGRQI
Type: O4
Description: This signal represents the debug request signal which is presented to the processor. This is the combination of external DBGRQ, as presented to the ARM7TDMI macrocell, and bit 1 of the debug control register. Thus there are two conditions where this signal can change. Firstly, when DBGRQ changes, DBGRQI will change after a propagation delay. Secondly, when bit 1 of the debug control register has been written, this signal will change on the falling edge of TCK when the TAP controller state machine is in the RUN-TEST/IDLE state. See ICEBreaker Module on page 163 for details.

DIN[31:0]
Type: IC
Description: This is the input data bus which may be used to transfer instructions and data between the processor and memory. This data input bus is only used when BUSEN is HIGH. The data on this bus is sampled by the processor at the end of phase 2 during read cycles (i.e. when nRW is LOW).

DOUT[31:0]
Type: O8
Description: This is the data out bus, used to transfer data from the processor to the memory system. Output data only appears on this bus when BUSEN is HIGH. At all other times, this bus is driven to value 0x00000000. When in use, data on this bus changes during phase 1 of store cycles (i.e. when nRW is HIGH) and remains valid throughout phase 2.

ECAPCLK (Extest capture clock)

ECAPCLKBS (Extest capture clock for Boundary Scan)

ECLK (External clock output)
Type: O4

EXTERN0 (External input 0)
Type: IC

EXTERN1 (External input 1)
Type: IC

HIGHZ
Type: O4

ICAPCLKBS (Intest capture clock)
Type: O4
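The CPA/CPB handshake described in Table 1 can be summarised as a small decision function. This is a minimal sketch of the behaviour stated in the signal descriptions, not a model of the hardware: the names `Handshake`, `sample_handshake`, and `busy_wait_cycles` are hypothetical, signal levels are represented as booleans (True for HIGH), and the second helper assumes CPA stays LOW for the whole handshake.

```python
from enum import Enum


class Handshake(Enum):
    UNDEFINED_TRAP = "take undefined instruction trap"
    BUSY_WAIT = "busy-wait"
    COMPLETE = "complete coprocessor instruction"


def sample_handshake(cpa_high, cpb_high):
    """Outcome at the end of phase 1 of the cycle in which nCPI went LOW."""
    if cpa_high:
        # No coprocessor claimed the instruction.
        return Handshake.UNDEFINED_TRAP
    if cpb_high:
        # A coprocessor is present but not yet ready to start.
        return Handshake.BUSY_WAIT
    return Handshake.COMPLETE


def busy_wait_cycles(cpb_samples):
    """Number of cycles spent waiting, given per-cycle CPB levels.

    Assumes CPA is LOW throughout. Returns None if CPB never goes LOW
    within the samples supplied.
    """
    for waited, cpb_high in enumerate(cpb_samples):
        if not cpb_high:
            return waited
    return None


if __name__ == "__main__":
    print(sample_handshake(cpa_high=False, cpb_high=True))
    print(busy_wait_cycles([True, True, False]))
```

The order of the checks mirrors the table: CPA decides whether the instruction is accepted at all, and CPB only matters once CPA is LOW.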
Signal
Signal
Table 1. Signal Description (Continued)
Name MCLK Memory clock input. Type IC Description This clock times all ARM7TDMI memory accesses and internal operations. The clock has two distinct phases - phase 1 in which MCLK is LOW and phase 2 in which MCLK (and nWAIT) is HIGH. The clock may be stretched indefinitely in either phase to allow access to slow peripherals or memory. Alternatively, the nWAIT input may be used with a free running MCLK to achieve the same effect. When ARM7TDMI executes a coprocessor instruction, it will take this output LOW and wait for a response from the coprocessor. The action taken will depend on this response, which the coprocessor signals on the CPA and CPB inputs. This signal may be used in conjunction with nENOUT to control the data bus during write cycles. See Memory Interface on page 117. During a data write cycle, this signal is driven LOW during phase 1, and remains LOW for the entire cycle. This may be used to aid arbitration in shared bus applications. See Memory Interface on page 117. During a coprocessor register transfer C-cycle from the ICEbreaker comms channel coprocessor to the ARM core, this signal goes LOW during phase 1 and stays LOW for the entire cycle. This may be used to aid arbitration in shared bus systems. When HIGH indicates that the instruction in the execution unit is not being executed, because for example it has failed its condition code check. This is an interrupt request to the processor which causes it to be interrupted if taken LOW when the appropriate enable in the processor is active. The signal is level-sensitive and must be held LOW until a suitable response is received from the processor. nFIQ may be synchronous or asynchronous, depending on the state of ISYNC. This signal is generated by the TAP controller when the current instruction is HIGHZ. This is used to place the scan cells of that scan chain in the high impedance state. When a external boundary scan chain is not connected, this output should be left unconnected. 
As nFIQ, but with lower priority. May be taken LOW to interrupt the processor when the appropriate enable is active. nIRQ may be synchronous or asynchronous, depending on the state of ISYNC. These are output signals which are the inverses of the internal status bits indicating the processor operation mode. This signal, when LOW, indicates that the processor requires memory access during the following cycle. The signal becomes valid during phase 1, remaining valid through phase 2 of the cycle preceding that to which it refers. When LOW this signal indicates that the processor is fetching an instruction from memory; when HIGH, data (if present) is being transferred. The signal becomes valid during phase 2 of the previous cycle, remaining valid through phase 1 of the referenced cycle. The timing of this signal may be modified by the use of ALE and APE in a similar way to the address, please refer to the ALE and APE descriptions. This signal may also be driven to a high impedance state by driving ABE LOW. This is a level sensitive input signal which is used to start the processor from a known address. A LOW level will cause the instruction being executed to terminate abnormally. When nRESET becomes HIGH for at least one clock cycle, the processor will re-start from address 0. nRESET must remain LOW (and nWAIT must remain HIGH) for at least two clock cycles. During the LOW period the processor will perform dummy instruction fetches with the address incrementing from the point where reset was activated. The address will overflow to zero if nRESET is held beyond the maximum address limit.
nCPI Not Coprocessor instruction. nENIN NOT enable input. nENOUT Not enable output. nENOUTI Not enable output.
04
IC 04
04 IC
04
nIRQ Not interrupt request. nM[4:0] Not processor mode. nMREQ Not memory request. nOPC Not op-code fetch.
IC
04 04
08
IC
04
08
IC
IC
04
04
RANGEOUT1 ICEbreaker Rangeout1 RSTCLKBS Boundary Scan Reset Clock SCREG[3:0] Scan Chain Register SDINBS Boundary Scan Serial Input Data SDOUTBS Boundary scan serial output data
04 O
IC
O4
Table 1. Signal Description (Continued)
Name / Type / Description

SHCLKBS (Boundary scan shift clock, phase 1). Type: O4.
This control signal is provided to ease the connection of an external boundary scan chain. SHCLKBS is used to clock the master half of the external scan cells. When in the SHIFT-DR state of the state machine and scan chain 3 is selected, SHCLKBS follows TCK1. When not in the SHIFT-DR state or when scan chain 3 is not selected, this clock is LOW. When an external boundary scan chain is not connected, this output should be left unconnected.

SHCLK2BS (Boundary scan shift clock, phase 2). Type: O4.
This control signal is provided to ease the connection of an external boundary scan chain. SHCLK2BS is used to clock the slave half of the external scan cells. When in the SHIFT-DR state of the state machine and scan chain 3 is selected, SHCLK2BS follows TCK2. When not in the SHIFT-DR state or when scan chain 3 is not selected, this clock is LOW. When an external boundary scan chain is not connected, this output should be left unconnected.

TAPSM[3:0]. Type: O4.
This bus reflects the current state of the TAP controller state machine, as shown in The JTAG state machine. These bits change off the rising edge of TCK.

TBE. Type: IC.
When driven LOW, TBE forces the data bus D[31:0], the Address bus A[31:0], plus LOCK, MAS[1:0], nRW, nTRANS and nOPC to high impedance. This is as if both ABE and DBE had been driven LOW. However, TBE does not have an associated scan cell and so allows external signals to be driven high impedance during scan testing. Under normal operating conditions, TBE should be held HIGH at all times.

TBIT. Type: O4.
When HIGH, this signal denotes that the processor is executing the THUMB instruction set. When LOW, the processor is executing the ARM instruction set. This signal changes in phase 2 in the first execute cycle of a BX instruction.

TCK. Type: IC.
Test Clock.

TCK1 (TCK, phase 1). Type: O4.
This clock represents phase 1 of TCK. TCK1 is HIGH when TCK is HIGH, although there is a slight phase lag due to the internal clock non-overlap.

TCK2 (TCK, phase 2). Type: O4.
This clock represents phase 2 of TCK. TCK2 is HIGH when TCK is LOW, although there is a slight phase lag due to the internal clock non-overlap. TCK2 is the non-overlapping complement of TCK1.

TDI. Type: IC.
Test Data Input.

TDO (Test Data Output). Type: O4.
Output from the boundary scan logic.

TMS. Type: IC.
Test Mode Select.

VDD (Power supply). Type: P.
These connections provide power to the device.

VSS (Ground). Type: P.
These connections are the ground reference for all signals.
Programmers Model
Switching State
Entering THUMB state
Entry into THUMB state can be achieved by executing a BX instruction with the state bit (bit 0) set in the operand register. Transition to THUMB state will also occur automatically on return from an exception (IRQ, FIQ, UNDEF, ABORT, SWI etc.), if the exception was entered with the processor in THUMB state.

Entering ARM state
Entry into ARM state happens:
1. On execution of the BX instruction with the state bit clear in the operand register.
2. On the processor taking an exception (IRQ, FIQ, RESET, UNDEF, ABORT, SWI etc.). In this case, the PC is placed in the exception mode's link register, and execution commences at the exception's vector address.
Memory Formats
ARM7TDMI views memory as a linear collection of bytes numbered upwards from zero. Bytes 0 to 3 hold the first stored word, bytes 4 to 7 the second and so on. ARM7TDMI can treat words in memory as being stored either in Big Endian or Little Endian format.

Figure 4. Big Endian Addresses of Bytes within Words

                 Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 | Word Address
Higher Address       8      |     9      |    10     |    11    |      8
                     4      |     5      |     6     |     7    |      4
Lower Address        0      |     1      |     2     |     3    |      0
Most significant byte is at lowest address Word is addressed by byte address of most significant byte
Least significant byte is at lowest address Word is addressed by byte address of least significant byte
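The two byte orders can be illustrated with a short Python sketch. It is not part of the datasheet; the helper name bytes_of_word and the sample value 0x11223344 are chosen here purely for illustration.

```python
import struct

def bytes_of_word(word, big_endian):
    """Return the four bytes of a 32-bit word in increasing address order.

    Hypothetical helper for illustration only.
    """
    fmt = ">I" if big_endian else "<I"
    return list(struct.pack(fmt, word & 0xFFFFFFFF))

word = 0x11223344
big = bytes_of_word(word, big_endian=True)      # most significant byte first
little = bytes_of_word(word, big_endian=False)  # least significant byte first
```

In the Big Endian case the byte at the word's own address is 0x11 (the most significant byte); in the Little Endian case it is 0x44 (the least significant byte).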
Instruction Length
Instructions are either 32 bits long (in ARM state) or 16 bits long (in THUMB state).
Registers
ARM7TDMI has a total of 37 registers - 31 general-purpose 32-bit registers and six status registers - but these cannot all be seen at once. The processor state and operating mode dictate which registers are available to the programmer.
Data Types
ARM7TDMI supports byte (8-bit), halfword (16-bit) and word (32-bit) data types. Words must be aligned to four-byte boundaries and halfwords to two-byte boundaries.
Operating Modes
ARM7TDMI supports seven modes of operation:
- User (usr): The normal ARM program execution state
- FIQ (fiq): Designed to support a data transfer or channel process
- IRQ (irq): Used for general-purpose interrupt handling
- Supervisor (svc): Protected mode for the operating system
- Abort mode (abt): Entered after a data or instruction prefetch abort
- System (sys): A privileged user mode for the operating system
- Undefined (und): Entered when an undefined instruction is executed

Mode changes may be made under software control, or may be brought about by external interrupts or exception processing. Most application programs will execute in User mode. The non-user modes - known as privileged modes - are entered in order to service interrupts or exceptions, or to access protected resources.
ARM state general registers and program counter, by mode (registers with a mode suffix are banked):

System & User | R0-R12, R13, R14, R15 (PC)
FIQ           | R0-R7, R8_fiq-R12_fiq, R13_fiq, R14_fiq, R15 (PC)
Supervisor    | R0-R12, R13_svc, R14_svc, R15 (PC)
Abort         | R0-R12, R13_abt, R14_abt, R15 (PC)
IRQ           | R0-R12, R13_irq, R14_irq, R15 (PC)
Undefined     | R0-R12, R13_und, R14_und, R15 (PC)
The THUMB state register set
The THUMB state register set is a subset of the ARM state set. The programmer has direct access to eight general registers, R0-R7, as well as the Program Counter (PC), a stack pointer register (SP), a link register (LR), and the CPSR. There are banked Stack Pointers, Link Registers and Saved Program Status Registers (SPSRs) for each privileged mode. This is shown in Figure 7.

Figure 7. Register Organization in Thumb State
THUMB state general registers and program counter, by mode (registers with a mode suffix are banked):

System & User | R0-R7, SP, LR, PC
FIQ           | R0-R7, SP_fiq, LR_fiq, PC
Supervisor    | R0-R7, SP_svc, LR_svc, PC
Abort         | R0-R7, SP_abt, LR_abt, PC
IRQ           | R0-R7, SP_irq, LR_irq, PC
Undefined     | R0-R7, SP_und, LR_und, PC
Mapping of THUMB state registers onto ARM state registers:

THUMB state          | ARM state
R0-R7                | R0-R7 (Lo registers)
(no direct mapping)  | R8-R12 (Hi registers)
Stack Pointer (SP)   | Stack Pointer (R13)
Link Register (LR)   | Link Register (R14)
Program Counter (PC) | Program Counter (R15)
CPSR                 | CPSR
SPSR                 | SPSR
The Program Status Registers
The ARM7TDMI contains a Current Program Status Register (CPSR), plus five Saved Program Status Registers (SPSRs) for use by exception handlers. These registers:
- hold information about the most recently performed ALU operation
- control the enabling and disabling of interrupts
- set the processor operating mode
The arrangement of bits is shown in Figure 9.

Figure 9. Program Status Register Format

Bits   | Field            | Group
31     | N                | condition code flags
30     | Z                | condition code flags
29     | C                | condition code flags
28     | V                | condition code flags
27-8   | (reserved)       |
7      | I                | control bits
6      | F                | control bits
5      | T                | control bits
4-0    | M4 M3 M2 M1 M0   | control bits
The mode bits
The M4, M3, M2, M1 and M0 bits (M[4:0]) are the mode bits. These determine the processor's operating mode, as shown in Table 2. Not all combinations of the mode bits define a valid processor mode. Only those explicitly described shall be used. The user should be aware that if any illegal value is programmed into the mode bits, M[4:0], then the processor will enter an unrecoverable state. If this occurs, reset should be applied.

Reserved bits
The remaining bits in the PSRs are reserved. When changing a PSR's flag or control bits, you must ensure that these unused bits are not altered. Also, your program should not rely on them containing specific values, since in future processors they may read as one or zero.
M[4:0] | Mode
10000  | User
10001  | FIQ
10010  | IRQ
10011  | Supervisor
10111  | Abort
11011  | Undefined
11111  | System
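As a rough illustration of the mode-bit rules, the following Python sketch decodes M[4:0] from a PSR value and changes the mode while leaving every other bit (including the reserved bits) untouched. The names MODES, decode_mode and set_mode are invented for this sketch. The User encoding 10000 is the standard ARM value.

```python
# Mode encodings for M[4:0]; any other value is treated as illegal.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_mode(psr):
    """Return the mode name held in bits 4-0 of a PSR value."""
    bits = psr & 0x1F
    if bits not in MODES:
        raise ValueError("illegal mode bits: {:05b}".format(bits))
    return MODES[bits]

def set_mode(psr, mode_bits):
    """Return psr with M[4:0] replaced; all other bits are preserved."""
    if mode_bits not in MODES:
        raise ValueError("illegal mode bits: {:05b}".format(mode_bits))
    return (psr & 0xFFFFFFE0) | mode_bits
```

For example, decode_mode(0x600000D3) reports Supervisor mode, and set_mode(0x600000D3, 0b10010) yields 0x600000D2 with the flag and reserved bits unchanged. Rejecting unknown encodings in software mirrors the warning that an illegal value leaves the processor in an unrecoverable state.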
Exceptions
Exceptions arise whenever the normal flow of a program has to be halted temporarily, for example to service an interrupt from a peripheral. Before an exception can be handled, the current processor state must be preserved so that the original program can resume when the handler routine has finished. It is possible for several exceptions to arise at the same time. If this happens, they are dealt with in a fixed order - see Exception priorities on page 25.

When handling an exception, ARM7TDMI:
1. Preserves the address of the next instruction in the appropriate link register.
2. Copies the CPSR into the appropriate SPSR.
3. Forces the CPSR mode bits to a value which depends on the exception.
4. Forces the PC to fetch the next instruction from the relevant exception vector.

It may also set the interrupt disable flags to prevent otherwise unmanageable nestings of exceptions. If the processor is in THUMB state when an exception occurs, it will automatically switch into ARM state when the PC is loaded with the exception vector address.
Exception             | Return Instruction    | ARM R14_x | Notes
BL                    | MOV PC, R14           | PC + 4    | 1
SWI                   | MOVS PC, R14_svc      | PC + 4    | 1
Undefined instruction | MOVS PC, R14_und      | PC + 4    | 1
FIQ                   | SUBS PC, R14_fiq, #4  | PC + 4    | 2
IRQ                   | SUBS PC, R14_irq, #4  | PC + 4    | 2
Prefetch abort        | SUBS PC, R14_abt, #4  | PC + 4    | 1
Data abort            | SUBS PC, R14_abt, #8  | PC + 8    | 3
Reset                 | NA                    | -         | 4

Notes
1. Where PC is the address of the BL/SWI/Undefined Instruction fetch which had the prefetch abort.
2. Where PC is the address of the instruction which did not get executed since the FIQ or IRQ took priority.
3. Where PC is the address of the Load or Store instruction which generated the data abort.
4. The value saved in R14_svc upon reset is not defined.
FIQ
The FIQ (Fast Interrupt Request) exception is designed to support a data transfer or channel process, and in ARM state has sufficient private registers to remove the need for register saving (thus minimising the overhead of context switching).

FIQ is externally generated by taking the nFIQ input LOW. This input can accept either synchronous or asynchronous transitions, depending on the state of the ISYNC input signal. When ISYNC is LOW, nFIQ and nIRQ are considered asynchronous, and a cycle delay for synchronization is incurred before the interrupt can affect the processor flow.

Irrespective of whether the exception was entered from ARM or Thumb state, a FIQ handler should leave the interrupt by executing

    SUBS PC,R14_fiq,#4

FIQ may be disabled by setting the CPSR's F flag (but note that this is not possible from User mode). If the F flag is clear, ARM7TDMI checks for a LOW level on the output of the FIQ synchroniser at the end of each instruction.

Abort
An abort indicates that the current memory access cannot be completed. It can be signalled by the external ABORT input. ARM7TDMI checks for the abort exception during memory access cycles. There are two types of abort:
- Prefetch abort occurs during an instruction prefetch.
- Data abort occurs during a data access.

If a prefetch abort occurs, the prefetched instruction is marked as invalid, but the exception will not be taken until the instruction reaches the head of the pipeline. If the instruction is not executed - for example because a branch occurs while it is in the pipeline - the abort does not take place.

If a data abort occurs, the action taken depends on the instruction type:
1. Single data transfer instructions (LDR, STR) write back modified base registers: the Abort handler must be aware of this.
2. The swap instruction (SWP) is aborted as though it had not been executed.
3. Block data transfer instructions (LDM, STM) complete. If write-back is set, the base is updated. If the instruction would have overwritten the base with data (i.e. it has the base in the transfer list), the overwriting is prevented. All register overwriting is prevented after an abort is indicated, which means in particular that R15 (always the last register to be transferred) is preserved in an aborted LDM instruction.

The abort mechanism allows the implementation of a demand paged virtual memory system. In such a system the processor is allowed to generate arbitrary addresses. When the data at an address is unavailable, the Memory Management Unit (MMU) signals an abort. The abort handler must then work out the cause of the abort, make the requested data available, and retry the aborted instruction. The application program needs no knowledge of the amount of memory available to it, nor is its state in any way affected by the abort.

After fixing the reason for the abort, the handler should execute the following irrespective of the state (ARM or Thumb):

    SUBS PC,R14_abt,#4    for a prefetch abort, or
    SUBS PC,R14_abt,#8    for a data abort

This restores both the PC and the CPSR, and retries the aborted instruction.
IRQ
The IRQ (Interrupt Request) exception is a normal interrupt caused by a LOW level on the nIRQ input. IRQ has a lower priority than FIQ and is masked out when a FIQ sequence is entered. It may be disabled at any time by setting the I bit in the CPSR, though this can only be done from a privileged (non-User) mode. Irrespective of whether the exception was entered from ARM or Thumb state, an IRQ handler should return from the interrupt by executing
SUBS PC,R14_irq,#4
Software interrupt
The software interrupt instruction (SWI) is used for entering Supervisor mode, usually to request a particular supervisor function. A SWI handler should return by executing the following irrespective of the state (ARM or Thumb):

    MOVS PC,R14_svc

This restores the PC and CPSR, and returns to the instruction following the SWI.

Undefined instruction
When ARM7TDMI comes across an instruction which it cannot handle, it takes the undefined instruction trap. This mechanism may be used to extend either the THUMB or ARM instruction set by software emulation. After emulating the failed instruction, the trap handler should execute the following irrespective of the state (ARM or Thumb):

    MOVS PC,R14_und

This restores the CPSR and returns to the instruction following the undefined instruction.

Exception vectors
The following table shows the exception vector addresses.

Table 4. Exception Vectors

Address    | Exception             | Mode on entry
0x00000000 | Reset                 | Supervisor
0x00000004 | Undefined instruction | Undefined
0x00000008 | Software interrupt    | Supervisor
0x0000000C | Abort (prefetch)      | Abort
0x00000010 | Abort (data)          | Abort
0x00000014 | Reserved              | Reserved
0x00000018 | IRQ                   | IRQ
0x0000001C | FIQ                   | FIQ
Exception priorities
When multiple exceptions arise at the same time, a fixed priority system determines the order in which they are handled:

Highest priority:
1. Reset
2. Data abort
3. FIQ
4. IRQ
5. Prefetch abort
Lowest priority:
6. Undefined Instruction, Software interrupt.

Not all exceptions can occur at once: Undefined Instruction and Software Interrupt are mutually exclusive, since they each correspond to particular (non-overlapping) decodings of the current instruction.

If a data abort occurs at the same time as a FIQ, and FIQs are enabled (i.e. the CPSR's F flag is clear), ARM7TDMI enters the data abort handler and then immediately proceeds to the FIQ vector. A normal return from FIQ will cause the data abort handler to resume execution. Placing data abort at a higher priority than FIQ is necessary to ensure that the transfer error does not escape detection. The time for this exception entry should be added to worst-case FIQ latency calculations.
Interrupt Latencies
The worst case latency for FIQ, assuming that it is enabled, consists of the longest time the request can take to pass through the synchroniser (Tsyncmax if asynchronous), plus the time for the longest instruction to complete (Tldm, the longest instruction is an LDM which loads all the registers including the PC), plus the time for the data abort entry (Texc), plus the time for FIQ entry (Tfiq). At the end of this time ARM7TDMI will be executing the instruction at 0x1C.

Tsyncmax is 3 processor cycles, Tldm is 20 cycles, Texc is 3 cycles, and Tfiq is 2 cycles. The total time is therefore 28 processor cycles. This is 1.4 microseconds in a system which uses a continuous 20 MHz processor clock.

The maximum IRQ latency calculation is similar, but must allow for the fact that FIQ has higher priority and could delay entry into the IRQ handling routine for an arbitrary length of time.

The minimum latency for FIQ or IRQ consists of the shortest time the request can take through the synchroniser (Tsyncmin) plus Tfiq. This is 4 processor cycles.
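The latency arithmetic above can be checked with a few lines of Python. This is only a worked calculation using the cycle counts quoted in the text; the function name cycles_to_us and the constant names are chosen for this sketch.

```python
# Cycle counts quoted in the text above.
T_SYNC_MAX = 3   # longest pass through the synchroniser
T_LDM = 20       # longest instruction (LDM loading all registers and the PC)
T_EXC = 3        # data abort entry
T_FIQ = 2        # FIQ entry
MIN_LATENCY_CYCLES = 4  # Tsyncmin + Tfiq, as stated in the text

def cycles_to_us(cycles, clock_hz):
    """Convert a cycle count to microseconds for a continuous clock."""
    return cycles * 1e6 / clock_hz

worst_cycles = T_SYNC_MAX + T_LDM + T_EXC + T_FIQ
worst_us = cycles_to_us(worst_cycles, 20e6)
best_us = cycles_to_us(MIN_LATENCY_CYCLES, 20e6)
```

With a continuous 20 MHz clock each cycle lasts 50 ns, so the 28-cycle worst case comes to 1.4 microseconds and the 4-cycle minimum to 0.2 microseconds. A stretched MCLK or the use of nWAIT would lengthen both figures.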
Reset
When the nRESET signal goes LOW, ARM7TDMI abandons the executing instruction and then continues to fetch instructions from incrementing word addresses.

When nRESET goes HIGH again, ARM7TDMI:
1. Overwrites R14_svc and SPSR_svc by copying the current values of the PC and CPSR into them. The value of the saved PC and SPSR is not defined.
2. Forces M[4:0] to 10011 (Supervisor mode), sets the I and F bits in the CPSR, and clears the CPSR's T bit.
3. Forces the PC to fetch the next instruction from address 0x00.
4. Resumes execution in ARM state.
ARM instruction set formats. Fields are listed from bit 31 down to bit 0; Cond occupies bits 31-28 in every format.

Cond | 0 0 I | Opcode | S | Rn | Rd | Operand 2                                 Data Processing / PSR Transfer
Cond | 0 0 0 0 0 0 A S | Rd | Rn | Rs | 1 0 0 1 | Rm                           Multiply
Cond | 0 0 0 0 1 U A S | RdHi | RdLo | Rn | 1 0 0 1 | Rm                       Multiply Long
Cond | 0 0 0 1 0 B 0 0 | Rn | Rd | 0 0 0 0 | 1 0 0 1 | Rm                      Single Data Swap
Cond | 0 0 0 1 0 0 1 0 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 0 0 0 1 | Rn            Branch and Exchange
Cond | 0 0 0 P U 0 W L | Rn | Rd | 0 0 0 0 | 1 S H 1 | Rm                      Halfword Data Transfer: register offset
Cond | 0 0 0 P U 1 W L | Rn | Rd | Offset | 1 S H 1 | Offset                   Halfword Data Transfer: immediate offset
Cond | 0 1 I P U B W L | Rn | Rd | Offset                                      Single Data Transfer
Cond | 0 1 1 | (bits 24-5) | 1 | (bits 3-0)                                    Undefined
Cond | 1 0 0 P U S W L | Rn | Register List                                    Block Data Transfer
Cond | 1 0 1 L | Offset                                                        Branch
Cond | 1 1 0 P U N W L | Rn | CRd | CP# | Offset                               Coprocessor Data Transfer
Cond | 1 1 1 0 | CP Opc | CRn | CRd | CP# | CP | 0 | CRm                       Coprocessor Data Operation
Cond | 1 1 1 0 | CP Opc | L | CRn | Rd | CP# | CP | 1 | CRm                    Coprocessor Register Transfer
Cond | 1 1 1 1 | Ignored by processor                                          Software Interrupt
Note: Some instruction codes are not defined but do not cause the Undefined instruction trap to be taken, for instance a Multiply instruction with bit 6 changed to a 1. These instructions should not be used, as their action may change in future ARM implementations.
Instruction Set
Instruction summary
Table 5. The ARM Instruction Set

Mnemonic | Instruction | Action | See Page
ADC | Add with carry | Rd := Rn + Op2 + Carry | 34
ADD | Add | Rd := Rn + Op2 | 34
AND | AND | Rd := Rn AND Op2 | 34
B | Branch | R15 := address | 32
BIC | Bit Clear | Rd := Rn AND NOT Op2 | 34
BL | Branch with Link | R14 := R15, R15 := address | 32
BX | Branch and Exchange | R15 := Rn, T bit := Rn[0] | 31
CDP | Coprocessor Data Processing | (Coprocessor-specific) | 66
CMN | Compare Negative | CPSR flags := Rn + Op2 | 34
CMP | Compare | CPSR flags := Rn - Op2 | 34
EOR | Exclusive OR | Rd := (Rn AND NOT Op2) OR (Op2 AND NOT Rn) | 34
LDC | Load coprocessor from memory | Coprocessor load | 68
LDM | Load multiple registers | Stack manipulation (Pop) | 56
LDR | Load register from memory | Rd := (address) | 48, 52
MCR | Move CPU register to coprocessor register | cRn := rRn {<op>cRm} | 70
MLA | Multiply Accumulate | Rd := (Rm * Rs) + Rn | 44, 46
MOV | Move register or constant | Rd := Op2 | 34
MRC | Move from coprocessor register to CPU register | Rn := cRn {<op>cRm} | 70
MRS | Move PSR status/flags to register | Rn := PSR | 40
MSR | Move register to PSR status/flags | PSR := Rm | 40
MUL | Multiply | Rd := Rm * Rs | 44, 46
MVN | Move negative register | Rd := 0xFFFFFFFF EOR Op2 | 34
ORR | OR | Rd := Rn OR Op2 | 34
RSB | Reverse Subtract | Rd := Op2 - Rn | 34
RSC | Reverse Subtract with Carry | Rd := Op2 - Rn - 1 + Carry | 34
SBC | Subtract with Carry | Rd := Rn - Op2 - 1 + Carry | 34
STC | Store coprocessor register to memory | address := CRn | 68
STM | Store Multiple | Stack manipulation (Push) | 56
STR | Store register to memory | <address> := Rd | 48, 52
SUB | Subtract | Rd := Rn - Op2 | 34
SWI | Software Interrupt | OS call | 64
SWP | Swap register with memory | Rd := [Rn], [Rn] := Rm | 62
TEQ | Test bitwise equality | CPSR flags := Rn EOR Op2 | 34
TST | Test bits | CPSR flags := Rn AND Op2 | 34
language) becomes BEQ for "Branch if Equal", which means the Branch will only be taken if the Z flag is set. In practice, fifteen different conditions may be used: these are listed in Table 6. The sixteenth (1111) is reserved, and must not be used. In the absence of a suffix, the condition field of most instructions is set to "Always" (suffix AL). This means the instruction will always be executed regardless of the CPSR condition codes.
Branch and Exchange (BX)
This instruction is only executed if the condition is true. The various conditions are defined in Table 6.

This instruction performs a branch by copying the contents of a general register, Rn, into the program counter, PC. The branch causes a pipeline flush and refill from the address specified by Rn. This instruction also permits the instruction set to be exchanged. When the instruction is executed, the value of Rn[0] determines whether the instruction stream will be decoded as ARM or THUMB instructions.

Figure 11. Branch and Exchange Instructions

Bits 31-28 | Cond (condition field)
Bits 27-4  | 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1
Bits 3-0   | Rn (operand register)

If bit 0 of Rn = 1, subsequent instructions are decoded as THUMB instructions.
If bit 0 of Rn = 0, subsequent instructions are decoded as ARM instructions.

Assembler syntax
BX - branch and exchange.

    BX{cond} Rn

{cond} is a two-character condition mnemonic. See Table 6.
Rn is an expression evaluating to a valid register number.
Examples
        ADR    R0, Into_THUMB + 1 ; Generate branch target address
                                  ; and set bit 0 high - hence
                                  ; arrive in THUMB state.
        BX     R0                 ; Branch and change to THUMB
                                  ; state.
        CODE16                    ; Assemble subsequent code as
Into_THUMB                        ; THUMB instructions
        .
        .
        ADR    R5, Back_to_ARM    ; Generate branch target to word
                                  ; aligned address - hence bit 0
                                  ; is low and so change back to ARM
                                  ; state.
        BX     R5                 ; Branch and change back to ARM
                                  ; state.
        .
        .
        ALIGN                     ; Word align
        CODE32                    ; Assemble subsequent code as ARM
Back_to_ARM                       ; instructions
        .
        .
Branch instruction format:

Bits 31-28 | Cond (condition field)
Bits 27-25 | 1 0 1
Bit 24     | L (link bit): 0 = Branch, 1 = Branch with Link
Bits 23-0  | offset
Branch instructions contain a signed 2's complement 24 bit offset. This is shifted left two bits, sign extended to 32 bits, and added to the PC. The instruction can therefore specify a branch of +/- 32Mbytes. The branch offset must take account of the prefetch operation, which causes the PC to be 2 words (8 bytes) ahead of the current instruction. Branches beyond +/- 32Mbytes must use an offset or absolute destination which has been previously loaded into a register. In this case the PC should be manually saved in R14 if a Branch with Link type operation is required.

Branch with Link writes the old PC into R14. The PC value written into R14 is adjusted to allow for the prefetch, and contains the address of the instruction following the branch and link instruction. Note that the CPSR is not saved with the PC and R14[1:0] are always cleared. To return from a routine called by Branch with Link use MOV PC,R14 if the link register is still valid or LDM Rn!,{..PC} if the link register has been saved onto a stack pointed to by Rn.
Assembler syntax
Items in {} are optional. Items in <> must be present.
B{L}{cond} <expression>
{L} is used to request the Branch with Link form of the instruction. If absent, R14 will not be affected by the instruction.
{cond} is a two-character mnemonic as shown in Table 6. If absent then AL (ALways) will be used. <expression> is the destination. The assembler calculates the offset.
Examples
here    BAL    here       ; assembles to 0xEAFFFFFE (note effect of
                          ; PC offset).
        B      there      ; Always condition used as default.
        CMP    R1,#0      ; Compare R1 with zero and branch to fred
                          ; if R1 was zero, otherwise continue
        BEQ    fred       ; continue to next instruction.
        BL     sub+ROM    ; Call subroutine at computed address.
        ADDS   R1,#1      ; Add 1 to register 1, setting CPSR flags
                          ; on the result then call subroutine if
        BLCC   sub        ; the C flag is clear, which will be the
                          ; case unless R1 held 0xFFFFFFFF.
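The offset calculation performed by the assembler for B and BL can be sketched in Python as follows. This is an illustrative model, not the assembler's actual implementation; the function names encode_branch and branch_target are invented here, and the default condition value 1110 (AL) is taken from the 0xEAFFFFFE example above.

```python
def encode_branch(instr_addr, target_addr, link=False, cond=0b1110):
    """Encode an ARM state B or BL instruction word (illustrative sketch).

    The PC reads as instr_addr + 8 because of the prefetch, so the
    stored offset is relative to that value, in words.
    """
    offset = target_addr - (instr_addr + 8)
    if offset % 4:
        raise ValueError("branch target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is beyond +/- 32Mbytes")
    imm24 = (offset >> 2) & 0xFFFFFF
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24

def branch_target(instr_addr, instr):
    """Recover the destination address from an encoded branch."""
    imm24 = instr & 0xFFFFFF
    if imm24 & 0x800000:          # sign extend the 24 bit offset
        imm24 -= 1 << 24
    return (instr_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF
```

A branch to its own address needs an offset of -8 bytes, which is -2 words or 0xFFFFFE in the 24 bit field, so encode_branch(0x8000, 0x8000) gives 0xEAFFFFFE, matching the first example. Setting link=True changes bit 24 and gives 0xEBFFFFFE.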
Data Processing
The data processing instruction is only executed if the condition is true. The conditions are defined in Table 6.

Figure 13. Data Processing Instructions

Bits 31-28 | Cond (condition field)
Bits 27-26 | 0 0
Bit 25     | I (immediate operand)
Bits 24-21 | OpCode (operation code)
Bit 20     | S (set condition codes)
Bits 19-16 | Rn
Bits 15-12 | Rd
Bits 11-0  | Operand 2

Immediate Operand
0 = operand 2 is a register: bits 11-4 Shift, bits 3-0 Rm
1 = operand 2 is an immediate value: bits 11-8 Rotate, bits 7-0 Imm

Operation Code
0000 = AND - Rd:= Op1 AND Op2
0001 = EOR - Rd:= Op1 EOR Op2
0010 = SUB - Rd:= Op1 - Op2
0011 = RSB - Rd:= Op2 - Op1
0100 = ADD - Rd:= Op1 + Op2
0101 = ADC - Rd:= Op1 + Op2 + C
0110 = SBC - Rd:= Op1 - Op2 + C - 1
0111 = RSC - Rd:= Op2 - Op1 + C - 1
1000 = TST - set condition codes on Op1 AND Op2
1001 = TEQ - set condition codes on Op1 EOR Op2
1010 = CMP - set condition codes on Op1 - Op2
1011 = CMN - set condition codes on Op1 + Op2
1100 = ORR - Rd:= Op1 OR Op2
1101 = MOV - Rd:= Op2
1110 = BIC - Rd:= Op1 AND NOT Op2
1111 = MVN - Rd:= NOT Op2
The instruction produces a result by performing a specified arithmetic or logical operation on one or two operands. The first operand is always a register (Rn). The second operand may be a shifted register (Rm) or a rotated 8 bit immediate value (Imm) according to the value of the I bit in the instruction. The condition codes in the CPSR may be preserved or updated as a result of this instruction, according to the value of the S bit in the instruction. Certain operations (TST, TEQ, CMP, CMN) do not write the result to Rd. They are used only to perform tests and to set the condition codes on the result and always have the S bit set. The instructions and their effects are listed in Table 7.
CPSR flags
The data processing operations may be classified as logical or arithmetic. The logical operations (AND, EOR, TST, TEQ, ORR, MOV, BIC, MVN) perform the logical action on all corresponding bits of the operand or operands to produce the result. If the S bit is set (and Rd is not R15, see below) the V flag in the CPSR will be unaffected, the C flag will be set to the carry out from the barrel shifter (or preserved when the shift operation is LSL #0), the Z flag will be set if and only if the result is all zeros, and the N flag will be set to the logical value of bit 31 of the result.

Table 7. ARM Data Processing Instructions

Assembler Mnemonic | OpCode | Action
AND | 0000 | operand1 AND operand2
EOR | 0001 | operand1 EOR operand2
SUB | 0010 | operand1 - operand2
RSB | 0011 | operand2 - operand1
ADD | 0100 | operand1 + operand2
ADC | 0101 | operand1 + operand2 + carry
SBC | 0110 | operand1 - operand2 + carry - 1
RSC | 0111 | operand2 - operand1 + carry - 1
TST | 1000 | as AND, but result is not written
TEQ | 1001 | as EOR, but result is not written
CMP | 1010 | as SUB, but result is not written
CMN | 1011 | as ADD, but result is not written
ORR | 1100 | operand1 OR operand2
MOV | 1101 | operand2 (operand1 is ignored)
BIC | 1110 | operand1 AND NOT operand2 (Bit clear)
MVN | 1111 | NOT operand2 (operand1 is ignored)
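The flag update described for the logical operations can be sketched in Python. This is a minimal model for illustration; the function name logical_flags and the dictionary layout are assumptions of this sketch, and the shifter carry is simply passed in rather than computed.

```python
def logical_flags(result, shifter_carry, old_v):
    """Return the N, Z, C and V flags after a logical operation with S set.

    result        - 32-bit result of the logical operation
    shifter_carry - carry out of the barrel shifter (the old C flag
                    when the shift operation is LSL #0)
    old_v         - V flag before the instruction; it is left unaffected
    """
    result &= 0xFFFFFFFF
    return {
        "N": result >> 31,          # bit 31 of the result
        "Z": int(result == 0),      # set if and only if result is all zeros
        "C": shifter_carry,
        "V": old_v,
    }

# ANDS with operands 0xF0000000 and 0x80000000, no shift, C and V clear.
flags = logical_flags(0xF0000000 & 0x80000000, shifter_carry=0, old_v=0)
```

Here the result is 0x80000000, so N is set and Z is clear, while C and V keep the values passed in.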
The arithmetic operations (SUB, RSB, ADD, ADC, SBC, RSC, CMP, CMN) treat each operand as a 32 bit integer (either unsigned or 2's complement signed, the two are equivalent). If the S bit is set (and Rd is not R15) the V flag in the CPSR will be set if an overflow occurs into bit 31 of the result; this may be ignored if the operands were considered unsigned, but warns of a possible error if the operands were 2's complement signed. The C flag will be set to the carry out of bit 31 of the ALU, the Z flag will be set if and only if the result was zero, and the N flag will be set to the value of bit 31 of the result (indicating a negative result if the operands are considered to be 2's complement signed).

Shifts
When the second operand is specified to be a shifted register, the operation of the barrel shifter is controlled by the Shift field in the instruction. This field indicates the type of shift to be performed (logical left or right, arithmetic right or rotate right). The amount by which the register should be shifted may be contained in an immediate field in the instruction, or in the bottom byte of another register (other than R15). The encoding for the different shift types is shown in Figure 14.

Figure 14. ARM Shift Operations

Shift amount specified in the instruction:
Bits 11-7 | Shift amount (5 bit unsigned integer)
Bits 6-5  | Shift type
Bit 4     | 0

Shift amount specified in a register:
Bits 11-8 | Rs, shift register (shift amount specified in bottom byte of Rs)
Bit 7     | 0
Bits 6-5  | Shift type
Bit 4     | 1

Shift type: 00 = logical left, 01 = logical right, 10 = arithmetic right, 11 = rotate right
Instruction specified shift amount
When the shift amount is specified in the instruction, it is contained in a 5 bit field which may take any value from 0 to 31. A logical shift left (LSL) takes the contents of Rm and moves each bit by the specified amount to a more significant position. The least significant bits of the result are filled with zeros, and the high bits of Rm which do not map into the result are discarded, except that the least significant discarded bit becomes the shifter carry output which may be latched into the C bit of the CPSR when the ALU operation is in the logical class (see above). For example, the effect of LSL #5 is shown in Figure 15.

Figure 15. Logical Shift Left
[Figure: contents of Rm, bits 31 to 0. Bit 27 becomes the carry out; bits 26 to 0 become bits 31 to 5 of the value of operand 2; bits 4 to 0 of operand 2 are 0 0 0 0 0.]

Note: LSL #0 is a special case, where the shifter carry out is the old value of the CPSR C flag. The contents of Rm are used directly as the second operand.

A logical shift right (LSR) is similar, but the contents of Rm are moved to less significant positions in the result. LSR #5 has the effect shown in Figure 16.

Figure 16. Logical Shift Right
[Figure: contents of Rm, bits 31 to 0. Bits 31 to 5 become bits 26 to 0 of the value of operand 2; bits 31 to 27 of operand 2 are 0 0 0 0 0; bit 4 becomes the carry out.]

The form of the shift field which might be expected to correspond to LSR #0 is used to encode LSR #32, which has a zero result with bit 31 of Rm as the carry output. Logical shift right zero is redundant as it is the same as logical shift left zero, so the assembler will convert LSR #0 (and ASR #0 and ROR #0) into LSL #0, and allow LSR #32 to be specified.

An arithmetic shift right (ASR) is similar to logical shift right, except that the high bits are filled with bit 31 of Rm instead of zeros. This preserves the sign in 2's complement notation. For example, ASR #5 is shown in Figure 17.
Figure 17. Arithmetic Shift Right
[Figure: contents of Rm, bits 31 to 0. Bits 31 to 5 become bits 26 to 0 of the value of operand 2; bits 31 to 27 of operand 2 are copies of bit 31 of Rm; bit 4 becomes the carry out.]

The form of the shift field which might be expected to give ASR #0 is used to encode ASR #32. Bit 31 of Rm is again used as the carry output, and each bit of operand 2 is also equal to bit 31 of Rm. The result is therefore all ones or all zeros, according to the value of bit 31 of Rm.

Rotate right (ROR) operations reuse the bits which overshoot in a logical shift right operation by reintroducing them at the high end of the result, in place of the zeros used to fill the high end in logical right operations. For example, ROR #5 is shown in Figure 18.

Figure 18. Rotate Right
[Figure: contents of Rm, bits 31 to 0. Bits 31 to 5 become bits 26 to 0 of the value of operand 2; bits 4 to 0 become bits 31 to 27 of operand 2; bit 4 also becomes the carry out.]

Register specified shift amount
When the shift amount is specified in a register, only the bottom byte of Rs is used. If this byte is zero, the unchanged contents of Rm will be used as the second operand, and the old value of the CPSR C flag will be passed on as the shifter carry output.
If the byte has a value between 1 and 31, the shifted result will exactly match that of an instruction specified shift with the same value and shift operation.

If the value in the byte is 32 or more, the result will be a logical extension of the shift described above:
1. LSL by 32 has result zero, carry out equal to bit 0 of Rm.
2. LSL by more than 32 has result zero, carry out zero.
3. LSR by 32 has result zero, carry out equal to bit 31 of Rm.
4. LSR by more than 32 has result zero, carry out zero.
5. ASR by 32 or more has result filled with and carry out equal to bit 31 of Rm.
6. ROR by 32 has result equal to Rm, carry out equal to bit 31 of Rm.
7. ROR by n where n is greater than 32 will give the same result and carry out as ROR by n-32; therefore repeatedly subtract 32 from n until the amount is in the range 1 to 32 and see above.

Note: The zero in bit 7 of an instruction with a register controlled shift is compulsory; a one in this bit will cause the instruction to be a multiply or undefined instruction.
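The register specified shift rules listed above can be modelled with a short Python function. This is a behavioural sketch written for this section rather than a description of the hardware; the function name shift_by_register and the string shift names are chosen here for illustration.

```python
MASK = 0xFFFFFFFF

def shift_by_register(kind, rm, rs, carry_in):
    """Return (operand2, carry_out) for a register specified shift.

    kind     - "LSL", "LSR", "ASR" or "ROR"
    rm       - 32-bit value to be shifted
    rs       - register holding the amount; only its bottom byte is used
    carry_in - current CPSR C flag, passed through when the byte is zero
    """
    rm &= MASK
    n = rs & 0xFF
    if n == 0:
        return rm, carry_in
    if kind == "LSL":
        if n < 32:
            return (rm << n) & MASK, (rm >> (32 - n)) & 1
        return (0, rm & 1) if n == 32 else (0, 0)
    if kind == "LSR":
        if n < 32:
            return rm >> n, (rm >> (n - 1)) & 1
        return (0, rm >> 31) if n == 32 else (0, 0)
    if kind == "ASR":
        sign = rm >> 31
        if n >= 32:
            return (MASK if sign else 0), sign
        fill = ((MASK << (32 - n)) & MASK) if sign else 0
        return (rm >> n) | fill, (rm >> (n - 1)) & 1
    if kind == "ROR":
        n = n % 32 or 32            # reduce the amount to the range 1 to 32
        if n == 32:
            return rm, rm >> 31
        return ((rm >> n) | (rm << (32 - n))) & MASK, (rm >> (n - 1)) & 1
    raise ValueError("unknown shift type: " + kind)
```

For instance, shift_by_register("LSL", 0x80000001, 32, 0) gives a zero result with carry out 1 (bit 0 of Rm), and shift_by_register("ROR", 0x80000001, 33, 0) behaves like ROR by 1.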
Writing to R15
When Rd is a register other than R15, the condition code flags in the CPSR may be updated from the ALU flags as described above. When Rd is R15 and the S flag in the instruction is not set the result of the operation is placed in R15 and the CPSR is unaffected. When Rd is R15 and the S flag is set the result of the operation is placed in R15 and the SPSR corresponding to the current mode is moved to the CPSR. This allows state changes which atomically restore both PC and CPSR. This form of instruction should not be used in User mode.
Instruction cycle times
Data Processing instructions vary in the number of incremental cycles taken as follows:

Table 8. Incremental Cycle Times

Processing Type                                                 Cycles
Normal Data Processing                                          1S
Data Processing with register specified shift                   1S + 1I
Data Processing with PC written                                 2S + 1N
Data Processing with register specified shift and PC written    2S + 1N + 1I
Assembler syntax
1. MOV,MVN (single operand instructions.)
<opcode>{cond}{S} Rd,<Op2>
3. AND,EOR,SUB,RSB,ADD,ADC,SBC,RSC,ORR,BIC
<opcode>{cond}{S} Rd,Rn,<Op2>
{cond} is a two-character condition mnemonic. See Table 6.
{S} set condition codes if S present (implied for CMP, CMN, TEQ, TST).
Rd, Rn and Rm are expressions evaluating to a register number.
<#expression> if this is used, the assembler will attempt to generate a shifted immediate 8-bit field to match the expression. If this is impossible, it will give an error.
<shift> is <shiftname> <register> or <shiftname> #expression, or RRX (rotate right one bit with extend).
<shiftname>s are: ASL, LSL, LSR, ASR, ROR. (ASL is a synonym for LSL, they assemble to the same code.)
Examples
    ADDEQ R2,R4,R5          ; If the Z flag is set make R2:=R4+R5
    TEQS  R4,#3             ; test R4 for equality with 3.
                            ; (The S is in fact redundant as the
                            ; assembler inserts it automatically.)
    SUB   R4,R5,R7,LSR R2   ; Logical right shift R7 by the number in
                            ; the bottom byte of R2, subtract result
                            ; from R5, and put the answer into R4.
    MOV   PC,R14            ; Return from subroutine.
    MOVS  PC,R14            ; Return from exception and restore CPSR
                            ; from SPSR_mode.
Operand restrictions
In User mode, the control bits of the CPSR are protected from change, so only the condition code flags of the CPSR can be changed. In other (privileged) modes the entire CPSR can be changed. Note that the software must never change the state of the T bit in the CPSR. If this happens, the processor will enter an unpredictable state. The SPSR register which is accessed depends on the mode at the time of execution. For example, only SPSR_fiq is accessible when the processor is in FIQ mode. You must not specify R15 as the source or destination register. Also, do not attempt to access an SPSR in User mode, since no such register exists.
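Figure 20. PSR Transfer

MRS (transfer PSR contents to a register)
    bits 31-28   Cond          Condition field
    bits 27-23   00010
    bit  22      Ps            Source PSR
    bits 21-16   001111
    bits 15-12   Rd            Destination register
    bits 11-0    000000000000

MSR (transfer register contents to PSR)
    bits 31-28   Cond          Condition field
    bits 27-23   00010
    bit  22      Pd            Destination PSR: 0=CPSR 1=SPSR_<current mode>
    bits 21-12   1010011111
    bits 11-4    00000000
    bits 3-0     Rm            Source register

MSR (transfer register contents or immediate value to PSR flag bits only)
    bits 31-28   Cond          Condition field
    bits 27-26   00
    bit  25      I             Immediate Operand
    bits 24-23   10
    bit  22      Pd            Destination PSR: 0=CPSR 1=SPSR_<current mode>
    bits 21-12   1010001111
    bits 11-0    Source operand
                 I=0 (source operand is a register): bits 11-4 are 00000000, bits 3-0 are Rm
                 I=1 (source operand is an immediate value): bits 11-8 are Rotate, bits 7-0 are Imm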
Reserved bits
Only twelve bits of the PSR are defined in ARM7TDMI (N,Z,C,V,I,F, T & M[4:0]); the remaining bits are reserved for use in future versions of the processor. Refer to Figure 9 for a full description of the PSR bits.

To ensure the maximum compatibility between ARM7TDMI programs and future processors, the following rules should be observed:
- The reserved bits should be preserved when changing the value in a PSR.
- Programs should not rely on specific values from the reserved bits when checking the PSR status, since they may read as one or zero in future processors.

A read-modify-write strategy should therefore be used when altering the control bits of any PSR register; this involves transferring the appropriate PSR register to a general register using the MRS instruction, changing only the relevant bits and then transferring the modified value back to the PSR register using the MSR instruction.
Example
The following sequence performs a mode change:

    MRS R0,CPSR               ; Take a copy of the CPSR.
    BIC R0,R0,#0x1F           ; Clear the mode bits.
    ORR R0,R0,#new_mode       ; Select new mode
    MSR CPSR,R0               ; Write back the modified
                              ; CPSR.

When the aim is simply to change the condition code flags in a PSR, a value can be written directly to the flag bits without disturbing the control bits. The following instruction sets the N,Z,C and V flags:

    MSR CPSR_flg,#0xF0000000  ; Set all the flags
                              ; regardless of their
                              ; previous state (does not
                              ; affect any control bits).

No attempt should be made to write an 8 bit immediate value into the whole PSR since such an operation cannot preserve the reserved bits.
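The read-modify-write sequence can be modelled on plain integers to show why the reserved bits survive. The following Python sketch is illustrative only; the function names change_mode and set_flags are hypothetical, and the Supervisor mode value 0x13 used in the usage note is taken from the standard ARM mode encodings rather than from this section.

```python
MASK32 = 0xFFFFFFFF
MODE_MASK = 0x1F          # M[4:0], the mode bits cleared by BIC R0,R0,#0x1F
FLAG_MASK = 0xF0000000    # N, Z, C and V

def change_mode(cpsr, new_mode):
    """Model of the MRS / BIC / ORR / MSR sequence."""
    value = cpsr & MASK32                 # MRS R0,CPSR
    value &= ~MODE_MASK & MASK32          # BIC R0,R0,#0x1F
    value |= new_mode & MODE_MASK         # ORR R0,R0,#new_mode
    return value                          # MSR CPSR,R0

def set_flags(cpsr, immediate):
    """Model of MSR CPSR_flg,#immediate: only bits 31 to 28 change."""
    return (cpsr & ~FLAG_MASK & MASK32) | (immediate & FLAG_MASK)
```

With a CPSR of 0x600001D2, change_mode(0x600001D2, 0x13) gives 0x600001D3: only the bottom five bits differ, so any reserved bit that happened to be set is written back unchanged.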
Assembler syntax
1. MRS - transfer PSR contents to a register
MRS{cond} Rd,<psr>
2. MSR - transfer register contents to PSR
MSR{cond} <psr>,Rm
3. MSR - transfer register contents to PSR flag bits only
MSR{cond} <psrf>,Rm
The most significant four bits of the register contents are written to the N,Z,C & V flags respectively.
4. MSR - transfer immediate value to PSR flag bits only
MSR{cond} <psrf>,<#expression>
The expression should symbolise a 32 bit value of which the most significant four bits are written to the N,Z,C and V flags respectively.

Key:
{cond} two-character condition mnemonic. See Table 6.
Rd and Rm are expressions evaluating to a register number other than R15
<psr> is CPSR, CPSR_all, SPSR or SPSR_all. (CPSR and CPSR_all are synonyms as are SPSR and SPSR_all)
<psrf> is CPSR_flg or SPSR_flg
<#expression> where this is used, the assembler will attempt to generate a shifted immediate 8-bit field to match the expression. If this is impossible, it will give an error.
Examples
In User mode the instructions behave as follows:

    MSR CPSR_all,Rm           ; CPSR[31:28] <- Rm[31:28]
    MSR CPSR_flg,Rm           ; CPSR[31:28] <- Rm[31:28]
    MSR CPSR_flg,#0xA0000000  ; CPSR[31:28] <- 0xA
                              ; (set N,C; clear Z,V)
    MRS Rd,CPSR               ; Rd[31:0] <- CPSR[31:0]

In privileged modes the instructions behave as follows:

    MSR CPSR_all,Rm           ; CPSR[31:0] <- Rm[31:0]
    MSR CPSR_flg,Rm           ; CPSR[31:28] <- Rm[31:28]
    MSR CPSR_flg,#0x50000000  ; CPSR[31:28] <- 0x5
                              ; (set Z,V; clear N,C)
    MRS Rd,CPSR               ; Rd[31:0] <- CPSR[31:0]
    MSR SPSR_all,Rm           ; SPSR_<mode>[31:0] <- Rm[31:0]
    MSR SPSR_flg,Rm           ; SPSR_<mode>[31:28] <- Rm[31:28]
    MSR SPSR_flg,#0xC0000000  ; SPSR_<mode>[31:28] <- 0xC
                              ; (set N,Z; clear C,V)
    MRS Rd,SPSR               ; Rd[31:0] <- SPSR_<mode>[31:0]
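The difference between User mode and privileged modes in the listing above comes down to which bits of the PSR an MSR is allowed to change. A minimal Python sketch of that rule follows; msr_cpsr and its parameters are hypothetical names chosen for illustration.

```python
MASK32 = 0xFFFFFFFF
FLAG_MASK = 0xF0000000   # bits 31 to 28: N, Z, C, V

def msr_cpsr(cpsr, value, field, privileged):
    """Return the CPSR after MSR CPSR_<field>,value.

    field is "all" or "flg". In User mode, or when the flg form is
    used, only bits 31 to 28 are written.
    """
    if field == "flg" or not privileged:
        mask = FLAG_MASK
    else:
        mask = MASK32
    return (cpsr & ~mask & MASK32) | (value & mask)
```

For example, writing 0xA00000D3 with the "all" form changes a User mode CPSR of 0x00000010 to 0xA0000010, while the same write in a privileged mode replaces all 32 bits.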
The multiply and multiply-accumulate instructions use an 8 bit Booth's algorithm to perform integer multiplication.
[Figure: multiply instruction encoding, from the high end of the word: Cond (condition field), 000000, A (accumulate: 0 = multiply only, 1 = multiply and accumulate), S, Rd, Rn, Rs, 1001, Rm.]
The multiply form of the instruction gives Rd:=Rm*Rs. Rn is ignored, and should be set to zero for compatibility with possible future upgrades to the instruction set. The multiply-accumulate form gives Rd:=Rm*Rs+Rn, which can save an explicit ADD instruction in some circumstances.

Both forms of the instruction work on operands which may be considered as signed (2's complement) or unsigned integers. The results of a signed multiply and of an unsigned multiply of 32 bit operands differ only in the upper 32 bits; the low 32 bits of the signed and unsigned results are identical. As these instructions only produce the low 32 bits of a multiply, they can be used for both signed and unsigned multiplies.

For example consider the multiplication of the operands:

    Operand A     Operand B     Result
    0xFFFFFFF6    0x00000014    0xFFFFFF38

If the operands are interpreted as signed, operand A has the value -10, operand B has the value 20, and the result is -200, which is correctly represented as 0xFFFFFF38.

If the operands are interpreted as unsigned, operand A has the value 4294967286, operand B has the value 20 and the result is 85899345720, which is represented as 0x13FFFFFF38, so the least significant 32 bits are 0xFFFFFF38.
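The worked example can be checked numerically. The Python sketch below models only the 32 bit result of the two instruction forms; the helper names mul, mla and to_signed32 are illustrative and do not appear in the datasheet.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32 bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rm, rs):
    """Low 32 bits of Rm*Rs, as produced by the multiply form."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """Low 32 bits of Rm*Rs+Rn, as produced by the multiply-accumulate form."""
    return (rm * rs + rn) & MASK32
```

Here mul(0xFFFFFFF6, 0x14) yields 0xFFFFFF38 whether the inputs are read as 4294967286 and 20 or as -10 and 20, because the two interpretations differ only above bit 31.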
Operand restrictions
The destination register Rd must not be the same as the operand register Rm. R15 must not be used as an operand or as the destination register. All other register combinations will give correct results, and Rd, Rn and Rs may use the same register when required.
CPSR flags
Setting the CPSR flags is optional, and is controlled by the S bit in the instruction. The N (Negative) and Z (Zero) flags are set correctly on the result (N is made equal to bit 31 of the result, and Z is set if and only if the result is zero). The C (Carry) flag is set to a meaningless value and the V (oVerflow) flag is unaffected.
Assembler syntax
    MUL{cond}{S} Rd,Rm,Rs
    MLA{cond}{S} Rd,Rm,Rs,Rn

{cond} two-character condition mnemonic. See Table 6.
{S} set condition codes if S present
Rd, Rm, Rs and Rn are expressions evaluating to a register number other than R15.
Examples
    MUL    R1,R2,R3      ; R1:=R2*R3
    MLAEQS R1,R2,R3,R4   ; Conditionally R1:=R2*R3+R4,
                         ; setting condition codes.
The multiply long instructions perform integer multiplication on two 32 bit operands and produce 64 bit results. Signed and unsigned multiplication each with optional accumulate give rise to four variations.
[Figure: multiply long instruction encoding, from the high end of the word: Cond (condition field), 00001, U (unsigned: 0 = unsigned, 1 = signed), A (accumulate: 0 = multiply only, 1 = multiply and accumulate), S, RdHi, RdLo, Rs, 1001, Rm.]
The multiply forms (UMULL and SMULL) take two 32 bit numbers and multiply them to produce a 64 bit result of the form RdHi,RdLo := Rm * Rs. The lower 32 bits of the 64 bit result are written to RdLo, the upper 32 bits of the result are written to RdHi.

The multiply-accumulate forms (UMLAL and SMLAL) take two 32 bit numbers, multiply them and add a 64 bit number to produce a 64 bit result of the form RdHi,RdLo := Rm * Rs + RdHi,RdLo. The lower 32 bits of the 64 bit number to add are read from RdLo. The upper 32 bits of the 64 bit number to add are read from RdHi. The lower 32 bits of the 64 bit result are written to RdLo. The upper 32 bits of the 64 bit result are written to RdHi.

The UMULL and UMLAL instructions treat all of their operands as unsigned binary numbers and write an unsigned 64 bit result. The SMULL and SMLAL instructions treat all of their operands as twos-complement signed numbers and write a twos-complement signed 64 bit result.
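The four variations can be sketched with one Python function. This is a behavioural illustration under stated assumptions: the name multiply_long, the signed flag and the acc tuple are hypothetical, and the flags are not modelled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32 bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def multiply_long(rm, rs, signed=False, acc=None):
    """Return (RdHi, RdLo) for UMULL/SMULL, or UMLAL/SMLAL when acc is given.

    acc is the (RdHi, RdLo) pair holding the 64 bit number to add.
    """
    if signed:
        rm, rs = to_signed32(rm), to_signed32(rs)
    else:
        rm, rs = rm & MASK32, rs & MASK32
    product = rm * rs
    if acc is not None:
        product += ((acc[0] & MASK32) << 32) | (acc[1] & MASK32)
    product &= MASK64                     # keep 64 bits, 2's complement
    return product >> 32, product & MASK32
```

The same bit patterns give different upper words depending on the interpretation: 0xFFFFFFFF times 0xFFFFFFFF is (0xFFFFFFFE, 0x00000001) unsigned, but (0, 1) signed, since the operands are then both -1.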
Operand restrictions
R15 must not be used as an operand or as a destination register. RdHi, RdLo, and Rm must all specify different registers.
CPSR flags
Setting the CPSR flags is optional, and is controlled by the S bit in the instruction. The N and Z flags are set correctly on the result (N is equal to bit 63 of the result, Z is set if and only if all 64 bits of the result are zero). Both the C and V flags are set to meaningless values.
Assembler syntax
Table 9. Assembler Syntax Descriptions
Mnemonic                          Description                            Purpose
UMULL{cond}{S} RdLo,RdHi,Rm,Rs    Unsigned Multiply Long                 32 x 32 = 64
UMLAL{cond}{S} RdLo,RdHi,Rm,Rs    Unsigned Multiply & Accumulate Long    32 x 32 + 64 = 64
SMULL{cond}{S} RdLo,RdHi,Rm,Rs    Signed Multiply Long                   32 x 32 = 64
SMLAL{cond}{S} RdLo,RdHi,Rm,Rs    Signed Multiply & Accumulate Long      32 x 32 + 64 = 64

where:
{cond} two-character condition mnemonic. See Table 6.
{S} set condition codes if S present
RdLo, RdHi, Rm, Rs are expressions evaluating to a register number other than R15.
Examples
    UMULL  R1,R4,R2,R3   ; R4,R1:=R2*R3
    UMLALS R1,R5,R2,R3   ; R5,R1:=R2*R3+R5,R1 also setting
                         ; condition codes
The memory address used in the transfer is calculated by adding an offset to or subtracting an offset from a base register. The result of this calculation may be written back into the base register if auto-indexing is required.
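[Figure: single data transfer instruction encoding, from the high end of the word: Cond (condition field), 01, I, P, U, B, W, L, Rn, Rd, Offset (bits 11 to 0).
    W  Write-back bit: 0 = no write-back, 1 = write address into base
    B  Byte/Word bit: 0 = transfer word quantity, 1 = transfer byte quantity
    U  Up/Down bit: 0 = down; subtract offset from base, 1 = up; add offset to base
    I  Immediate offset: 0 = offset is an immediate value, held in bits 11 to 0; otherwise the offset is a register, with the shift applied to Rm in the upper part of the field and the offset register Rm in the lowest bits.]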
indexed data transfer is in privileged mode code, where setting the W bit forces non-privileged mode for the transfer, allowing the operating system to generate a user address in a system where the memory management hardware makes suitable use of this hardware.

Little endian configuration
A byte load (LDRB) expects the data on data bus inputs 7 through 0 if the supplied address is on a word boundary, on data bus inputs 15 through 8 if it is a word address plus one byte, and so on. The selected byte is placed in the bottom 8 bits of the destination register, and the remaining bits of the register are filled with zeros. Please see Figure 5.

A byte store (STRB) repeats the bottom 8 bits of the source register four times across data bus outputs 31 through 0. The external memory system should activate the appropriate byte subsystem to store the data.

A word load (LDR) will normally use a word aligned address. However, an address offset from a word boundary will cause the data to be rotated into the register so that the addressed byte occupies bits 0 to 7. This means that halfwords accessed at offsets 0 and 2 from the word boundary will be correctly loaded into bits 0 through 15 of the register. Two shift operations are then required to clear or to sign extend the upper 16 bits. This is illustrated in Figure 24.
[Figure 24: memory bytes A (address A+3, data bits 31 to 24), B (A+2, bits 23 to 16), C (A+1, bits 15 to 8) and D (A, bits 7 to 0) and the positions they take in the register, shown for an LDR from a word aligned address and for an LDR from an address offset by 2.]

A word store (STR) should generate a word aligned address. The word presented to the data bus is not affected if the address is not word aligned. That is, bit 31 of the register being stored always appears on data bus output 31.

Big endian configuration
A byte load (LDRB) expects the data on data bus inputs 31 through 24 if the supplied address is on a word boundary, on data bus inputs 23 through 16 if it is a word address plus one byte, and so on. The selected byte is placed in the bottom 8 bits of the destination register and the remaining bits of the register are filled with zeros. Please see Figure 4.

A byte store (STRB) repeats the bottom 8 bits of the source register four times across data bus outputs 31 through 0. The external memory system should activate the appropriate byte subsystem to store the data.

A word load (LDR) should generate a word aligned address. An address offset of 0 or 2 from a word boundary will cause the data to be rotated into the register so that the addressed byte occupies bits 31 through 24. This means that half-words accessed at these offsets will be correctly loaded into bits 16 through 31 of the register. A shift operation is then required to move (and optionally sign extend) the data into the bottom 16 bits. An address offset of 1 or 3 from a word boundary will cause the data to be rotated into the register so that the addressed byte occupies bits 15 through 8.

A word store (STR) should generate a word aligned address. The word presented to the data bus is not affected if the address is not word aligned. That is, bit 31 of the register being stored always appears on data bus output 31.
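The little endian word load behaviour, and the two shifts needed to extract a halfword afterwards, can be sketched as follows. This Python model is an illustration only: the function names are hypothetical, and the rotation amount of 8 bits per byte of offset is inferred from the statement that the addressed byte ends up in bits 0 to 7.

```python
MASK32 = 0xFFFFFFFF

def ldr_little_endian(word, address):
    """Value placed in the register by LDR in little endian configuration.

    word is the aligned 32 bit word that contains the addressed byte.
    """
    rotation = 8 * (address & 3)
    return ((word >> rotation) | (word << (32 - rotation))) & MASK32

def halfword_after_ldr(word, address, signed=False):
    """Apply the two shifts that clear or sign extend the upper 16 bits."""
    value = ldr_little_endian(word, address)
    value = (value << 16) & MASK32        # shift left by 16
    value >>= 16                          # logical shift right by 16
    if signed and value & 0x8000:         # arithmetic shift right instead
        value |= 0xFFFF0000
    return value
```

With memory bytes A, B, C, D taken as 0xAA, 0xBB, 0xCC, 0xDD (so the aligned word is 0xAABBCCDD), an LDR from offset 2 returns 0xCCDDAABB, and the halfword 0xAABB is recovered from bits 0 through 15.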
After an abort, the following example code is difficult to unwind as the base register, Rn, gets updated before the abort handler starts. Sometimes it may be impossible to calculate the initial value. Example:
    LDR R0,[R1],R1
Therefore a post-indexed LDR or STR where Rm is the same register as Rn should not be used.
Data aborts
A transfer to or from a legal address may cause problems for a memory management system. For instance, in a system which uses virtual memory the required data may be absent from main memory. The memory manager can signal a problem by taking the processor ABORT input HIGH whereupon the Data Abort trap will be taken. It is up to the system software to resolve the cause of the problem, then the instruction can be restarted and the original program continued.
Use of R15
Write-back must not be specified if R15 is specified as the base register (Rn). When using R15 as the base register you must remember it contains an address 8 bytes on from the address of the current instruction. R15 must not be specified as the register offset (Rm). When R15 is the source register (Rd) of a register store (STR) instruction, the stored value will be the address of the instruction plus 12.
Assembler syntax
<LDR|STR>{cond}{B}{T} Rd,<Address>
where:
LDR     load from memory into a register
STR     store from a register into memory
{cond}  two-character condition mnemonic. See Table 6.
{B}     if B is present then byte transfer, otherwise word transfer
{T}     if T is present the W bit will be set in a post-indexed instruction, forcing non-privileged mode for the transfer cycle. T is not allowed when a pre-indexed addressing mode is specified or implied.
Rd      is an expression evaluating to a valid register number.
Rn and Rm are expressions evaluating to a register number. If Rn is R15 then the assembler will subtract 8 from the offset value to allow for ARM7TDMI pipelining. In this case base write-back should not be specified.
<Address> can be:
1. An expression which generates an address:
       <expression>
   The assembler will attempt to generate an instruction using the PC as a base and a corrected immediate offset to address the location given by evaluating the expression. This will be a PC relative, pre-indexed address. If the address is out of range, an error will be generated.
2. A pre-indexed addressing specification:
       [Rn]                         offset of zero
       [Rn,<#expression>]{!}        offset of <expression> bytes
       [Rn,{+/-}Rm{,<shift>}]{!}    offset of +/- contents of index register, shifted by <shift>
3. A post-indexed addressing specification:
       [Rn],<#expression>           offset of <expression> bytes
       [Rn],{+/-}Rm{,<shift>}       offset of +/- contents of index register, shifted as by <shift>.
<shift> general shift operation (see data processing instructions) but you cannot specify the shift amount by a register.
{!}     writes back the base register (set the W bit) if ! is present.
Examples
         STR    R1,[R2,R4]!        ; Store R1 at R2+R4 (both of which are
                                   ; registers) and write back address to R2.
         STR    R1,[R2],R4         ; Store R1 at R2 and write back
                                   ; R2+R4 to R2.
         LDR    R1,[R2,#16]        ; Load R1 from contents of R2+16, but
                                   ; don't write back.
         LDR    R1,[R2,R3,LSL#2]   ; Load R1 from contents of R2+R3*4.
         LDREQB R1,[R6,#5]         ; Conditionally load byte at R6+5 into
                                   ; R1 bits 0 to 7, filling bits 8 to 31
                                   ; with zeros.
         STR    R1,PLACE           ; Generate PC relative offset to
                                   ; address PLACE.
    PLACE
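The address arithmetic behind the pre-indexed and post-indexed forms in these examples can be sketched in a few lines. The Python below is a hedged illustration: transfer_address and its parameters are hypothetical, and it assumes, as the second STR example suggests, that a post-indexed transfer always writes the modified base back.

```python
MASK32 = 0xFFFFFFFF

def transfer_address(rn, offset, pre, up, writeback=False):
    """Return (address used for the transfer, value of Rn afterwards).

    offset is the already shifted offset (immediate or Rm after <shift>).
    """
    modified = (rn + offset if up else rn - offset) & MASK32
    if pre:
        # Pre-indexed: the offset is applied before the transfer;
        # the base changes only if ! (write-back) is present.
        return modified, (modified if writeback else rn)
    # Post-indexed: the unmodified base is used, then the base is updated.
    return rn, modified
```

For LDR R1,[R2,R3,LSL#2] with R2 = 0x1000 and R3 = 3, the offset passed in would be 3 << 2, giving a transfer address of 0x100C and leaving R2 unchanged.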
Figure 25. Halfword and Signed Data Transfer with Register Offset
[Figure: instruction encoding by bit position.
    bits 31-28   Cond     Condition field
    bits 27-25   000
    bit  24      P        Pre/Post indexing: 0 = post: add/subtract offset after transfer, 1 = pre: add/subtract offset before transfer
    bit  23      U        Up/Down: 0 = down: subtract offset from base, 1 = up: add offset to base
    bit  22      0
    bit  21      W        Write-back: 0 = no write-back, 1 = write address into base
    bit  20      L
    bits 19-16   Rn
    bits 15-12   Rd
    bits 11-8    0000
    bit  7       1
    bits 6-5     S H      00 = SWP instruction, 01 = Unsigned halfwords, 10 = Signed byte, 11 = Signed halfwords
    bit  4       1
    bits 3-0     Rm       Offset register]
Figure 26. Halfword and Signed Data Transfer With Immediate Offset
[Figure: instruction encoding by bit position.
    bits 31-28   Cond     Condition field
    bits 27-25   000
    bit  24      P        Pre/Post indexing: 0 = post: add/subtract offset after transfer, 1 = pre: add/subtract offset before transfer
    bit  23      U        Up/Down: 0 = down: subtract offset from base, 1 = up: add offset to base
    bit  22      1
    bit  21      W        Write-back: 0 = no write-back, 1 = write address into base
    bit  20      L
    bits 19-16   Rn
    bits 15-12   Rd
    bits 11-8    Offset
    bit  7       1
    bits 6-5     S H
    bit  4       1
    bits 3-0     Offset]
Use of R15
Write-back should not be specified if R15 is specified as the base register (Rn). When using R15 as the base register you must remember it contains an address 8 bytes on from the address of the current instruction. R15 should not be specified as the register offset (Rm). When R15 is the source register (Rd) of a Half-word store (STRH) instruction, the stored value will be the address of the instruction plus 12.
Data aborts
A transfer to or from a legal address may cause problems for a memory management system. For instance, in a system which uses virtual memory the required data may be absent from the main memory. The memory manager can signal a problem by taking the processor ABORT input HIGH whereupon the Data Abort trap will be taken. It is up to the system software to resolve the cause of the problem, then the instruction can be restarted and the original program continued.
Assembler syntax
<LDR|STR>{cond}<H|SH|SB> Rd,<address>
LDR     load from memory into a register
STR     store from a register into memory
{cond}  two-character condition mnemonic. See Table 6.
H       Transfer halfword quantity
SB      Load sign extended byte (Only valid for LDR)
SH      Load sign extended halfword (Only valid for LDR)
Rd      is an expression evaluating to a valid register number.
<address> can be:
1. An expression which generates an address:
       <expression>
   The assembler will attempt to generate an instruction using the PC as a base and a corrected immediate offset to address the location given by evaluating the expression. This will be a PC relative, pre-indexed address. If the address is out of range, an error will be generated.
2. A pre-indexed addressing specification:
       [Rn]                     offset of zero
       [Rn,<#expression>]{!}    offset of <expression> bytes
       [Rn,{+/-}Rm]{!}          offset of +/- contents of index register
3. A post-indexed addressing specification:
       [Rn],<#expression>       offset of <expression> bytes
       [Rn],{+/-}Rm             offset of +/- contents of index register.
Rn and Rm are expressions evaluating to a register number. If Rn is R15 then the assembler will subtract 8 from the offset value to allow for ARM7TDMI pipelining. In this case base write-back should not be specified.
{!}     writes back the base register (set the W bit) if ! is present
Examples
         LDRH    R1,[R2,-R3]!       ; Load R1 from the contents of the
                                    ; halfword address contained in
                                    ; R2-R3 (both of which are registers)
                                    ; and write back address to R2
         STRH    R3,[R4,#14]        ; Store the halfword in R3 at R4+14
                                    ; but don't write back.
         LDRSB   R8,[R2],#-223      ; Load R8 with the sign extended
                                    ; contents of the byte address
                                    ; contained in R2 and write back
                                    ; R2-223 to R2.
         LDRNESH R11,[R0]           ; conditionally load R11 with the sign
                                    ; extended contents of the halfword
                                    ; address contained in R0.
    HERE                            ; Generate PC relative offset to
                                    ; address FRED.
         STRH    R5,[PC,#(FRED-HERE-8)] ; Store the halfword in R5 at address
                                    ; FRED.
         .
    FRED
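What "sign extended" means for the LDRSB and LDRSH examples can be shown on plain integers. This Python sketch is illustrative; the function names mirror the mnemonics but are otherwise hypothetical, and each takes the byte or halfword already fetched from memory.

```python
def ldrh(halfword):
    """Unsigned halfword load: bits 16 to 31 of the register are zero."""
    return halfword & 0xFFFF

def ldrsh(halfword):
    """Signed halfword load: bit 15 is copied into bits 16 to 31."""
    halfword &= 0xFFFF
    return (halfword | 0xFFFF0000) if halfword & 0x8000 else halfword

def ldrsb(byte):
    """Signed byte load: bit 7 is copied into bits 8 to 31."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte
```

A byte of 0x80 therefore loads as 0xFFFFFF80 with the SB form, while 0x7F loads as 0x0000007F.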
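[Figure: block data transfer instruction encoding, from the high end of the word: Cond (condition field), 100, P, U, S, W, L, Rn, Register list.
    W  Write-back bit: 0 = no write-back, 1 = write address into base
    U  Up/Down bit: 0 = down; subtract offset from base, 1 = up; add offset to base]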
Addressing modes
The transfer addresses are determined by the contents of the base register (Rn), the pre/post bit (P) and the up/down bit (U). The registers are transferred in the order lowest to highest, so R15 (if in the list) will always be transferred last. The lowest register also gets transferred to/from the lowest memory address. By way of illustration, consider the transfer of R1, R5 and R7 in the case where Rn=0x1000 and write back of the modified base is required (W=1). Figure 28, Figure 29, Figure 30 and Figure 31 show the sequence of register transfers, the addresses used, and the value of Rn after the instruction has completed. In all cases, had write back of the modified base not been required (W=0), Rn would have retained its initial value of 0x1000 unless it was also in the transfer list of a load multiple register instruction, when it would have been overwritten with the loaded value.
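The four addressing modes in this illustration can be reproduced with a short calculation. The Python sketch below is a model under stated assumptions: block_transfer is a hypothetical name, registers are given as numbers, and the returned base is the value Rn takes when write-back is requested.

```python
def block_transfer(rn, registers, pre, up):
    """Return ([(register, address), ...], Rn after write-back).

    Registers are transferred lowest first, and the lowest register
    always goes to or from the lowest memory address.
    """
    ordered = sorted(registers)
    size = 4 * len(ordered)
    if up:
        start = rn + 4 if pre else rn
        final = rn + size
    else:
        start = rn - size if pre else rn - size + 4
        final = rn - size
    return [(reg, start + 4 * i) for i, reg in enumerate(ordered)], final
```

For R1, R5 and R7 with Rn=0x1000, post-increment uses 0x1000, 0x1004 and 0x1008 and leaves Rn at 0x100C, while pre-decrement uses 0x0FF4, 0x0FF8 and 0x0FFC and leaves Rn at 0x0FF4.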
Address alignment
The address should normally be a word aligned quantity and non-word aligned addresses do not affect the instruction. However, the bottom 2 bits of the address will appear on A[1:0] and might be interpreted by the memory system.

Figure 28. Post-increment Addressing
[Figure: step-by-step diagram of memory between 0x0FF4 and 0x100C, with Rn starting at 0x1000, showing R1, R5 and R7 being transferred in turn and the final position of Rn.]

Figure 29. Pre-increment Addressing
[Figure: step-by-step diagram of memory between 0x0FF4 and 0x100C, with Rn starting at 0x1000, showing R1, R5 and R7 being transferred in turn and the final position of Rn.]

Figure 30. Post-decrement Addressing
[Figure: step-by-step diagram of memory between 0x0FF4 and 0x100C, with Rn starting at 0x1000, showing R1, R5 and R7 being transferred in turn and the final position of Rn.]

Figure 31. Pre-decrement Addressing
[Figure: step-by-step diagram of memory between 0x0FF4 and 0x100C, with Rn starting at 0x1000, showing R1, R5 and R7 being transferred in turn and the final position of Rn.]
Use of the S bit
When the S bit is set in a LDM/STM instruction its meaning depends on whether or not R15 is in the transfer list and on the type of instruction. The S bit should only be set if the instruction is to execute in a privileged mode.

LDM with R15 in transfer list and S bit set (Mode changes)
If the instruction is a LDM then SPSR_<mode> is transferred to CPSR at the same time as R15 is loaded.

STM with R15 in transfer list and S bit set (User bank transfer)
The registers transferred are taken from the User bank rather than the bank corresponding to the current mode. This is useful for saving the user state on process switches. Base write-back should not be used when this mechanism is employed.

R15 not in list and S bit set (User bank transfer)
For both LDM and STM instructions, the User bank registers are transferred rather than the register bank corresponding to the current mode. This is useful for saving the user state on process switches. Base write-back should not be used when this mechanism is employed.

When the instruction is LDM, care must be taken not to read from a banked register during the following cycle (inserting a dummy instruction such as MOV R0, R0 after the LDM will ensure safety).
Data aborts
Some legal addresses may be unacceptable to a memory management system, and the memory manager can indicate a problem with an address by taking the ABORT signal HIGH. This can happen on any transfer during a multiple register load or store, and must be recoverable if ARM7TDMI is to be used in a virtual memory system.

Aborts during STM instructions
If the abort occurs during a store multiple instruction, ARM7TDMI takes little action until the instruction completes, whereupon it enters the data abort trap. The memory manager is responsible for preventing erroneous writes to the memory. The only change to the internal state of the processor will be the modification of the base register if write-back was specified, and this must be reversed by software (and the cause of the abort resolved) before the instruction may be retried.

Aborts during LDM instructions
When ARM7TDMI detects a data abort during a load multiple instruction, it modifies the operation of the instruction to ensure that recovery is possible.
1. Overwriting of registers stops when the abort happens. The aborting load will not take place but earlier ones may have overwritten registers. The PC is always the last register to be written and so will always be preserved.
2. The base register is restored, to its modified value if write-back was requested. This ensures recoverability in the case where the base register is also in the transfer list, and may have been overwritten before the abort occurred.

The data abort trap is taken when the load multiple has completed, and the system software must undo any base modification (and resolve the cause of the abort) before restarting the instruction.
Assembler syntax
<LDM|STM>{cond}<FD|ED|FA|EA|IA|IB|DA|DB> Rn{!},<Rlist>{^}
where:
{cond}  two character condition mnemonic. See Table 6.
Rn      is an expression evaluating to a valid register number
<Rlist> is a list of registers and register ranges enclosed in {} (e.g. {R0,R2-R7,R10}).
{!}     if present requests write-back (W=1), otherwise W=0
{^}     if present set S bit to load the CPSR along with the PC, or force transfer of user bank when in privileged mode

Addressing mode names
There are different assembler mnemonics for each of the addressing modes, depending on whether the instruction is being used to support stacks or for other purposes. The equivalence between the names and the values of the bits in the instruction is shown in the following table.

Table 10. Addressing Mode Names

Name                    Stack    Other    L bit   P bit   U bit
pre-increment load      LDMED    LDMIB    1       1       1
post-increment load     LDMFD    LDMIA    1       0       1
pre-decrement load      LDMEA    LDMDB    1       1       0
post-decrement load     LDMFA    LDMDA    1       0       0
pre-increment store     STMFA    STMIB    0       1       1
post-increment store    STMEA    STMIA    0       0       1
pre-decrement store     STMFD    STMDB    0       1       0
post-decrement store    STMED    STMDA    0       0       0
FD, ED, FA, EA define pre/post indexing and the up/down bit by reference to the form of stack required. The F and E refer to a full or empty stack, i.e. whether a pre-index has to be done (full) before storing to the stack. The A and D refer to whether the stack is ascending or descending. If ascending, a STM will go up and LDM down, if descending, vice-versa.
IA, IB, DA, DB allow control when LDM/STM are not being used for stacks and simply mean Increment After, Increment Before, Decrement After, Decrement Before.
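The equivalence between the stack names, the increment/decrement names and the L, P and U bits can be written down as a lookup. The Python sketch below is illustrative; the names STACK_ALIASES, ADDRESSING_MODES and decode_bits are hypothetical, and the mapping simply restates the bit values given for each addressing mode.

```python
# (P bit, U bit) for each increment/decrement suffix.
ADDRESSING_MODES = {"IB": (1, 1), "IA": (0, 1), "DB": (1, 0), "DA": (0, 0)}

# Stack suffixes depend on whether the instruction is a load or a store.
STACK_ALIASES = {
    ("LDM", "ED"): "IB", ("LDM", "FD"): "IA",
    ("LDM", "EA"): "DB", ("LDM", "FA"): "DA",
    ("STM", "FA"): "IB", ("STM", "EA"): "IA",
    ("STM", "FD"): "DB", ("STM", "ED"): "DA",
}

def decode_bits(mnemonic):
    """Return the L, P and U bits for a mnemonic such as "STMFD"."""
    op, suffix = mnemonic[:3], mnemonic[3:]
    suffix = STACK_ALIASES.get((op, suffix), suffix)
    p_bit, u_bit = ADDRESSING_MODES[suffix]
    return {"L": 1 if op == "LDM" else 0, "P": p_bit, "U": u_bit}
```

A full descending stack shows the pairing: STMFD decodes as a pre-decrement store and LDMFD as a post-increment load, so a push followed by a pop returns the stack pointer to where it started.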
Examples
    LDMFD SP!,{R0,R1,R2}     ; Unstack 3 registers.
    STMIA R0,{R0-R15}        ; Save all registers.
    LDMFD SP!,{R15}          ; R15 <- (SP),CPSR unchanged.
    LDMFD SP!,{R15}^         ; R15 <- (SP), CPSR <- SPSR_mode
                             ; (allowed only in privileged modes).
    STMFD R13,{R0-R14}^      ; Save user mode regs on stack
                             ; (allowed only in privileged modes).

These instructions may be used to save state on subroutine entry, and restore it efficiently on return to the calling routine:

    STMED SP!,{R0-R3,R14}    ; Save R0 to R3 to use as workspace
                             ; and R14 for returning.
    BL    somewhere          ; This nested call will overwrite R14
    LDMED SP!,{R0-R3,R15}    ; restore workspace and return.
[Figure 32: swap instruction encoding, from the high end of the word: Cond (condition field), 00010, B, 00, Rn, Rd, 0000, 1001, Rm.]
The instruction is only executed if the condition is true. The various conditions are defined in Table 6. The instruction encoding is shown in Figure 32.

The data swap instruction is used to swap a byte or word quantity between a register and external memory. This instruction is implemented as a memory read followed by a memory write which are locked together (the processor cannot be interrupted until both operations have completed, and the memory manager is warned to treat them as inseparable). This class of instruction is particularly useful for implementing software semaphores.

The swap address is determined by the contents of the base register (Rn). The processor first reads the contents of the swap address. Then it writes the contents of the source register (Rm) to the swap address, and stores the old memory contents in the destination register (Rd). The same register may be specified as both the source and destination.

The LOCK output goes HIGH for the duration of the read and write operations to signal to the external memory manager that they are locked together, and should be allowed to complete without interruption. This is important in multiprocessor systems where the swap instruction is the only indivisible instruction which may be used to implement semaphores; control of the memory must not be removed from a processor while it is performing a locked operation.
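The read-then-write behaviour, and its use for a semaphore, can be sketched as follows. This Python model is an illustration only: memory is a plain dictionary, swp and try_lock are hypothetical names, the convention that 0 means free and 1 means held is an assumption, and the indivisibility that LOCK provides in hardware is not modelled.

```python
def swp(memory, address, rm_value, byte=False):
    """Model of SWP{B} Rd,Rm,[Rn]: return the old memory contents (for Rd)."""
    mask = 0xFF if byte else 0xFFFFFFFF
    old = memory.get(address, 0) & mask   # read the swap address
    memory[address] = rm_value & mask     # write the source register
    return old                            # old contents go to Rd

def try_lock(memory, lock_address):
    """Claim a semaphore by swapping in 1; succeeds if it held 0 before."""
    return swp(memory, lock_address, 1) == 0
```

A first call to try_lock on a free semaphore returns True and leaves 1 in memory; a second call returns False because the value read back is already 1.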
Use of R15
Do not use R15 as an operand (Rd, Rn or Rm) in a SWP instruction.
Data aborts
If the address used for the swap is unacceptable to a memory management system, the memory manager can flag the problem by driving ABORT HIGH. This can happen on either the read or the write cycle (or both), and in either case, the Data Abort trap will be taken. It is up to the system software to resolve the cause of the problem, then the instruction can be restarted and the original program continued.
Assembler syntax
<SWP>{cond}{B} Rd,Rm,[Rn]
{cond} two-character condition mnemonic. See Table 6. {B} if B is present then byte transfer, otherwise word transfer Rd,Rm,Rn are expressions evaluating to valid register numbers
Examples
    SWP   R0,R1,[R2]   ; Load R0 with the word addressed by R2, and
                       ; store R1 at R2.
    SWPB  R2,R3,[R4]   ; Load R2 with the byte addressed by R4, and
                       ; store bits 0 to 7 of R3 at R4.
    SWPEQ R0,R0,[R1]   ; Conditionally swap the contents of the
                       ; word addressed by R1 with R0.
mode change. The PC is then forced to a fixed value (0x08) and the CPSR is saved in SPSR_svc. If the SWI vector address is suitably protected (by external memory management hardware) from modification by the user, a fully protected operating system may be constructed.
[Figure: SWI instruction format. Fields: Cond (condition field), 1111, comment field (ignored by processor)]
Comment field
The bottom 24 bits of the instruction are ignored by the processor, and may be used to communicate information to the supervisor code. For instance, the supervisor may look at this field and use it to index into an array of entry points for routines which perform the various supervisor functions.
Assembler syntax
SWI{cond} <expression>
{cond} two character condition mnemonic. See Table 6.
<expression> is evaluated and placed in the comment field (which is ignored by ARM7TDMI).
Examples
SWI   ReadC        ; Get next character from read stream.
SWI   WriteI+"k"   ; Output a "k" to the write stream.
SWINE 0            ; Conditionally call supervisor
                   ; with 0 in comment field.
Supervisor code
The previous examples assume that suitable supervisor code exists, for instance:
0x08       B     Supervisor         ; SWI entry point
EntryTable                          ; addresses of supervisor routines
           DCD   ZeroRtn
           DCD   ReadCRtn
           DCD   WriteIRtn
           . . .
Zero       EQU   0
ReadC      EQU   256
WriteI     EQU   512
Supervisor
; SWI has routine required in bits 8-23 and data (if any) in
; bits 0-7.
; Assumes R13_svc points to a suitable stack
           STMFD R13,{R0-R2,R14}    ; Save work registers and return address.
           LDR   R0,[R14,#-4]       ; Get SWI instruction.
           BIC   R0,R0,#0xFF000000  ; Clear top 8 bits.
           MOV   R1,R0,LSR#8        ; Get routine offset.
           ADR   R2,EntryTable      ; Get start address of entry table.
           LDR   R15,[R2,R1,LSL#2]  ; Branch to appropriate routine.
           ; Enter with character in R0 bits 0-7.
           . . .
           ; Restore workspace and return,
           ; restoring processor mode and flags.
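The decoding performed by the supervisor code above (routine number in bits 8-23, data in bits 0-7) can be mirrored in Python as a rough sketch. The function name decode_swi_comment is hypothetical; the constants follow the EQU values in the listing.

```python
def decode_swi_comment(instruction):
    """Split an SWI instruction word the way the example supervisor does."""
    comment = instruction & 0x00FFFFFF  # BIC R0,R0,#0xFF000000: clear top 8 bits
    routine = comment >> 8              # MOV R1,R0,LSR#8: index into EntryTable
    data = comment & 0xFF               # bits 0-7 carry the data, if any
    return routine, data
```

For SWI WriteI+"k" the comment field is 512 plus the character code of "k", so the routine index is 2, which selects the third entry (WriteIRtn) of EntryTable.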
back to ARM7TDMI, and it will not wait for the operation to complete. The coprocessor could contain a queue of such instructions awaiting execution, and their execution can overlap other activity, allowing the coprocessor and ARM7TDMI to perform independent tasks in parallel.
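[Figure: coprocessor data operation instruction format. Fields: Cond, 1110, CP Opc, CRn, CRd, CP#, CP, CRm]
CRm: coprocessor operand register
CP: coprocessor information
CP#: coprocessor number
CRd: coprocessor destination register
CRn: coprocessor operand register
CP Opc: coprocessor operation code
Cond: condition field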
Assembler syntax
CDP{cond} p#,<expression1>,cd,cn,cm{,<expression2>}
{cond} two character condition mnemonic. See Table 6.
p# the unique number of the required coprocessor
<expression1> evaluated to a constant and placed in the CP Opc field
cd, cn and cm evaluate to the valid coprocessor register numbers CRd, CRn and CRm respectively
<expression2> where present is evaluated to a constant and placed in the CP field
Examples
CDP   p1,10,c1,c2,c3     ; Request coproc 1 to do operation 10
                         ; on CR2 and CR3, and put the result in CR1.
CDPEQ p2,5,c1,c2,c3,2    ; If Z flag is set request coproc 2 to do
                         ; operation 5 (type 2) on CR2 and CR3,
                         ; and put the result in CR1.
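As a rough illustration of how the assembler operands map onto the instruction fields, the following Python sketch packs a CDP word. The bit positions (Cond 31-28, CP Opc 23-20, CRn 19-16, CRd 15-12, CP# 11-8, CP 7-5, CRm 3-0) and the condition values (EQ = 0b0000, AL = 0b1110) are the standard ARM encoding and are stated here as assumptions, since the figure and Table 6 are not reproduced in this section. The helper name encode_cdp is hypothetical.

```python
def encode_cdp(cond, cp_num, cp_opc, crd, crn, crm, cp_info=0):
    """Pack CDP{cond} p#,<opc>,cd,cn,cm{,<info>} into a 32-bit word."""
    fields = [(cond, 4), (cp_opc, 4), (crn, 4), (crd, 4),
              (cp_num, 4), (cp_info, 3), (crm, 4)]
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field out of range")
    return ((cond << 28) | (0b1110 << 24) | (cp_opc << 20) | (crn << 16)
            | (crd << 12) | (cp_num << 8) | (cp_info << 5) | crm)
```

Under these assumptions, CDP p1,10,c1,c2,c3 with the always condition encodes as 0xEEA21103.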
[Figure: coprocessor data transfer instruction format. Fields: Cond, 110, P, U, N, W, L, Rn, CRd, CP#, Offset]
Offset: unsigned 8 bit immediate offset
CP#: coprocessor number
CRd: coprocessor source/destination register
Rn: base register
L: load/store bit (0 = store to memory, 1 = load from memory)
W: write-back bit (0 = no write-back, 1 = write address into base)
Cond: condition field
Addressing modes
ARM7TDMI is responsible for providing the address used by the memory system for the transfer, and the addressing modes available are a subset of those used in single data transfer instructions. Note, however, that the immediate offsets are 8 bits wide and specify word offsets for coprocessor data transfers, whereas they are 12 bits wide and specify byte offsets for single data transfers.
The 8 bit unsigned immediate offset is shifted left 2 bits and either added to (U=1) or subtracted from (U=0) the base register (Rn); this calculation may be performed either before (P=1) or after (P=0) the base is used as the transfer address. The modified base value may be overwritten back into the base register (if W=1), or the old value of the base may be preserved (W=0). Note that post-indexed addressing modes require explicit setting of the W bit, unlike LDR and STR which always write-back when post-indexed. The value of the base register, modified by the offset in a pre-indexed instruction, is used as the address for the transfer of the first word. The second word (if more than one is transferred) will go to or come from an address one word (4 bytes) higher than the first transfer, and the address will be incremented by one word for each subsequent transfer.
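The address calculation described above can be sketched in Python as follows. The function name and the tuple it returns are illustrative assumptions; only the first transfer address and the final base value are modelled.

```python
def coproc_transfer_address(base, offset8, p, u, w):
    """Return (first transfer address, base register value afterwards).

    offset8 is the unsigned 8 bit immediate; p, u and w are the P, U
    and W bits of the instruction.
    """
    delta = (offset8 & 0xFF) << 2                # word offset becomes a byte offset
    modified = (base + delta if u else base - delta) & 0xFFFFFFFF
    first = modified if p else base              # pre-indexed uses the modified base
    new_base = modified if w else base           # write-back only when W=1
    return first, new_base
```

For example, an offset of 24 bytes is held as 6 in the offset field; with P=1, U=1, W=1 and a base of 0x1000 both the transfer address and the written-back base are 0x1018.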
Address alignment
The base address should normally be a word aligned quantity. The bottom 2 bits of the address will appear on A[1:0] and might be interpreted by the memory system.
Use of R15
If Rn is R15, the value used will be the address of the instruction plus 8 bytes. Base write-back to R15 must not be specified.
Data aborts
If the address is legal but the memory manager generates an abort, the data trap will be taken. The write-back of the modified base will take place, but all other processor state will be preserved. The coprocessor is partly responsible for ensuring that the data transfer can be restarted after the cause of the abort has been resolved, and must ensure that any subsequent actions it undertakes can be repeated when the instruction is retried.
Assembler syntax
<LDC|STC>{cond}{L} p#,cd,<Address>
LDC load from memory to coprocessor
STC store from coprocessor to memory
{L} when present perform long transfer (N=1), otherwise perform short transfer (N=0)
{cond} two character condition mnemonic. See Table 6.
p# the unique number of the required coprocessor
cd is an expression evaluating to a valid coprocessor register number that is placed in the CRd field
<Address> can be:
1. An expression which generates an address:
   <expression>
   The assembler will attempt to generate an instruction using the PC as a base and a corrected immediate offset to address the location given by evaluating the expression. This will be a PC relative, pre-indexed address. If the address is out of range, an error will be generated.
2. A pre-indexed addressing specification:
   [Rn] offset of zero
   [Rn,<#expression>]{!} offset of <expression> bytes
3. A post-indexed addressing specification:
   [Rn],<#expression> offset of <expression> bytes
{!} write back the base register (set the W bit) if ! is present
Rn is an expression evaluating to a valid ARM7TDMI register number.
Note: If Rn is R15, the assembler will subtract 8 from the offset value to allow for ARM7TDMI pipelining.
Examples
LDC    p1,c2,table       ; Load c2 of coproc 1 from address
                         ; table, using a PC relative address.
STCEQL p2,c3,[R5,#24]!   ; Conditionally store c3 of coproc 2
                         ; into an address 24 bytes up from R5,
                         ; write this address back to R5, and use
                         ; long transfer option (probably to
                         ; store multiple words).
Note: Although the address offset is expressed in bytes, the instruction offset field is in words. The assembler will adjust the offset appropriately.
FLOAT of a 32 bit value in ARM7TDMI register into a floating point value within the coprocessor illustrates the use of ARM7TDMI register to coprocessor transfer (MCR). An important use of this instruction is to communicate control information directly from the coprocessor into the ARM7TDMI CPSR flags. As an example, the result of a comparison of two floating point values within a coprocessor can be moved to the CPSR to control the subsequent flow of execution.
[Figure: coprocessor register transfer instruction format. Fields: Cond, 1110, CP Opc, L, CRn, Rd, CP#, CP, CRm]
CRm: coprocessor operand register
CP: coprocessor information
CP#: coprocessor number
Rd: ARM source/destination register
CRn: coprocessor source/destination register
L: load/store bit (0 = store to coprocessor, 1 = load from coprocessor)
Transfers to R15
When a coprocessor register transfer to ARM7TDMI has R15 as the destination, bits 31, 30, 29 and 28 of the transferred word are copied into the N, Z, C and V flags respectively. The other bits of the transferred word are ignored, and the PC and other CPSR bits are unaffected by the transfer.
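The flag update for a transfer to R15 amounts to picking off the top four bits of the transferred word. A minimal Python sketch, with a hypothetical function name and a dict standing in for the CPSR flags:

```python
def flags_from_mrc_to_r15(word):
    """Return the N, Z, C and V flags taken from bits 31 to 28 of the word."""
    return {
        "N": (word >> 31) & 1,
        "Z": (word >> 30) & 1,
        "C": (word >> 29) & 1,
        "V": (word >> 28) & 1,
    }
```

All lower bits of the word are discarded, which matches the statement that the PC and the other CPSR bits are unaffected.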
Assembler syntax
<MCR|MRC>{cond} p#,<expression1>,Rd,cn,cm{,<expression2>}
MRC move from coprocessor to ARM7TDMI register (L=1)
MCR move from ARM7TDMI register to coprocessor (L=0)
{cond} two character condition mnemonic. See Table 6.
p# the unique number of the required coprocessor
<expression1> evaluated to a constant and placed in the CP Opc field
Rd is an expression evaluating to a valid ARM7TDMI register number
cn and cm are expressions evaluating to the valid coprocessor register numbers CRn and CRm respectively
<expression2> where present is evaluated to a constant and placed in the CP field
Examples
MRC   p2,5,R3,c5,c6     ; Request coproc 2 to perform operation 5
                        ; on c5 and c6, and transfer the (single
                        ; 32 bit word) result back to R3.
MCR   p6,0,R4,c5,c6     ; Request coproc 6 to perform operation 0
                        ; on R4 and place the result in c6.
MRCEQ p3,9,R3,c5,c6,2   ; Conditionally request coproc 3 to
                        ; perform operation 9 (type 2) on c5 and
                        ; c6, and transfer the result back to R3.
Undefined Instruction
The instruction is only executed if the condition is true. The various conditions are defined in Table 6. The instruction format is shown in Figure 37.
Figure 37. Undefined Instruction
Bits 31-28   Cond
Bits 27-25   011
Bits 24-5    xxxxxxxxxxxxxxxxxxxx
Bit 4        1
Bits 3-0     xxxx
If the condition is true, the undefined instruction trap will be taken. Note that the undefined instruction mechanism involves offering this instruction to any coprocessors which may be present, and all coprocessors must refuse to accept it by driving CPA and CPB HIGH.
Assembler syntax
The assembler has no mnemonics for generating this instruction. If it is adopted in the future for some specified use, suitable mnemonics will be added to the assembler. Until such time, this instruction must not be used.
Multiplication by 4, 5 or 6 (run time)
      MOV   Rc,Ra,LSL#2      ; Multiply by 4,
      CMP   Rb,#5            ; test value,
      ADDCS Rc,Rc,Ra         ; complete multiply by 5,
      ADDHI Rc,Rc,Ra         ; complete multiply by 6.
Combining discrete and range tests
      TEQ   Rc,#127          ; Discrete test,
      CMPNE Rc,#" "-1        ; range test
      MOVLS Rc,#"."          ; IF Rc<=" " OR Rc=ASCII(127)
                             ; THEN Rc:="."
Division and remainder
A number of divide routines for specific applications are provided in source form as part of the ANSI C library provided with the ARM Cross Development Toolkit, available from your supplier. A short general purpose divide routine follows.
; Enter with numbers in Ra and Rb.
;
      MOV   Rcnt,#1          ; Bit to control the division.
Div1  CMP   Rb,#0x80000000   ; Move Rb until greater than Ra.
      CMPCC Rb,Ra
      MOVCC Rb,Rb,ASL#1
      MOVCC Rcnt,Rcnt,ASL#1
      BCC   Div1
      MOV   Rc,#0
Div2  CMP   Ra,Rb            ; Test for possible subtraction.
      SUBCS Ra,Ra,Rb         ; Subtract if ok,
      ADDCS Rc,Rc,Rcnt       ; put relevant bit into result
      MOVS  Rcnt,Rcnt,LSR#1  ; shift control bit
      MOVNE Rb,Rb,LSR#1      ; halve unless finished.
      BNE   Div2
;
; Divide result in Rc,
; remainder in Ra.
Overflow detection in the ARM7TDMI
1. Overflow in unsigned multiply with a 32 bit result
      UMULL Rd,Rt,Rm,Rn      ;3 to 6 cycles
      TEQ   Rt,#0            ;+1 cycle and a register
      BNE   overflow
2. Overflow in signed multiply with a 32 bit result
      SMULL Rd,Rt,Rm,Rn      ;3 to 6 cycles
      TEQ   Rt,Rd,ASR#31     ;+1 cycle and a register
      BNE   overflow
3. Overflow in unsigned multiply accumulate with a 32 bit result
      UMLAL Rd,Rt,Rm,Rn      ;4 to 7 cycles
      TEQ   Rt,#0            ;+1 cycle and a register
      BNE   overflow
4. Overflow in signed multiply accumulate with a 32 bit result
      SMLAL Rd,Rt,Rm,Rn      ;4 to 7 cycles
      TEQ   Rt,Rd,ASR#31     ;+1 cycle and a register
      BNE   overflow
5. Overflow in unsigned multiply accumulate with a 64 bit result
      UMULL Rl,Rh,Rm,Rn      ;3 to 6 cycles
      ADDS  Rl,Rl,Ra1        ;lower accumulate
      ADCS  Rh,Rh,Ra2        ;upper accumulate (sets flags for the test)
      BCS   overflow         ;1 cycle and 2 registers
6. Overflow in signed multiply accumulate with a 64 bit result
      SMULL Rl,Rh,Rm,Rn      ;3 to 6 cycles
      ADDS  Rl,Rl,Ra1        ;lower accumulate
      ADCS  Rh,Rh,Ra2        ;upper accumulate (sets flags for the test)
      BVS   overflow         ;1 cycle and 2 registers
Note: Overflow checking is not applicable to unsigned and signed multiplies with a 64-bit result, since overflow does not occur in such calculations.
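The two 32 bit result checks (cases 1 and 2 above) can be sketched in Python. This is an illustrative model only: the helper names are hypothetical, and Python integers stand in for the register pair produced by UMULL and SMULL.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def unsigned_mul32_overflows(rm, rn):
    """UMULL Rd,Rt,Rm,Rn ; TEQ Rt,#0 ; BNE overflow."""
    product = (rm & MASK32) * (rn & MASK32)
    return (product >> 32) != 0          # any bit in the high word means overflow

def signed_mul32_overflows(rm, rn):
    """SMULL Rd,Rt,Rm,Rn ; TEQ Rt,Rd,ASR#31 ; BNE overflow."""
    product = (to_signed32(rm) * to_signed32(rn)) & 0xFFFFFFFFFFFFFFFF
    low, high = product & MASK32, product >> 32
    sign_fill = MASK32 if low & 0x80000000 else 0   # Rd,ASR#31
    return high != sign_fill             # high word must be pure sign extension
```

The signed check works because a product that fits in 32 bits has a high word equal to the sign bit of the low word replicated 32 times.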
; Enter with seed in Ra (32 bits), Rb (1 bit in Rb lsb), uses Rc.
;
; Top bit into carry
; 33 bit rotate right
; carry into lsb of Rb
; (involved!)
; (similarly involved!)
; new seed in Ra, Rb as before
Loading a word from an unknown alignment
; enter with address in Ra (32 bits)
; uses Rb, Rc; result in Rd.
; Note d must be less than c e.g. 0,1
;
      BIC   Rb,Ra,#3         ; get word aligned address
      LDMIA Rb,{Rd,Rc}       ; get 64 bits containing answer
      AND   Rb,Ra,#3         ; correction factor in bytes
      MOVS  Rb,Rb,LSL#3      ; ...now in bits and test if aligned
      MOVNE Rd,Rd,LSR Rb     ; produce bottom of result word
                             ; (if not aligned)
      RSBNE Rb,Rb,#32        ; get other shift amount
      ORRNE Rd,Rd,Rc,LSL Rb  ; combine two halves to get result
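The same sequence can be followed step by step in Python. The sketch assumes a little-endian memory system and a caller-supplied read_word function returning the aligned 32-bit word at a given address; both the function name and that callback are assumptions made for illustration.

```python
def load_unaligned_word(read_word, ra):
    """Load a 32-bit word from any byte address ra (little-endian assumed)."""
    rb = ra & ~3 & 0xFFFFFFFF        # BIC Rb,Ra,#3: word aligned address
    rd = read_word(rb)               # LDMIA Rb,{Rd,Rc}: the two words that
    rc = read_word(rb + 4)           # together contain the answer
    shift = (ra & 3) << 3            # correction factor, converted to bits
    if shift:                        # the NE condition: address was not aligned
        low = rd >> shift                          # bottom of result word
        high = (rc << (32 - shift)) & 0xFFFFFFFF   # top part from the next word
        rd = low | high
    return rd
```

When the address is already aligned the shift is zero and the first word is returned unchanged, mirroring the conditional execution of the last three instructions.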
Format Summary
The THUMB instruction set formats are shown in the following figure.
Figure 38. THUMB Instruction Set Formats
[Figure: bit layouts, bits 15 to 0, of THUMB instruction formats 1 to 19. Formats 7 to 19 are, in order: Load/store with register offset; Load/store sign-extended byte/halfword; Load/store with immediate offset; Load/store halfword; SP-relative load/store; Load address; Add offset to stack pointer; Push/pop registers; Multiple load/store; Conditional branch; Software Interrupt; Unconditional branch; Long branch with link.]
Opcode Summary
The following table summarizes the THUMB instruction set. For further information about a particular instruction please refer to the sections listed in the right-most column.
Table 11. THUMB Instruction Set Opcodes
Mnemonic  Instruction                   See Page
ADC       Add with Carry                84
ADD (1)   Add                           81, 83, 86, 100, 101
AND       AND                           84
ASR       Arithmetic Shift Right        80, 84
B         Unconditional branch          108
Bxx       Conditional branch            105
BIC       Bit Clear                     84
BL        Branch and Link               109
BX        Branch and Exchange           86
CMN       Compare Negative              84
CMP       Compare                       83, 84, 86
EOR       EOR                           84
LDMIA     Load multiple                 104
LDR       Load word                     89, 90, 94, 98
LDRB      Load byte                     90, 94
LDRH      Load halfword                 92, 96
LSL       Logical Shift Left            80, 84
LDSB      Load sign-extended byte       92
LDSH      Load sign-extended halfword   92
LSR       Logical Shift Right           80, 84
MOV (2)   Move register                 83, 86
MUL       Multiply                      84
MVN       Move Negative register        84
NEG       Negate                        84
ORR       OR                            84
POP       Pop registers                 102
PUSH      Push registers                102
ROR       Rotate Right                  84
SBC       Subtract with Carry           84
STMIA     Store Multiple                104
STR       Store word                    90, 94, 98
STRB      Store byte                    90
STRH      Store halfword                92, 96
SWI       Software Interrupt            107
SUB       Subtract                      81, 83
TST       Test bits                     84
Notes:
1. The condition codes are unaffected by the format 5, 12 and 13 versions of this instruction.
2. The condition codes are unaffected by the format 5 version of this instruction.
[Figure: instruction fields Op, Offset5, Rs, Rd]
Operation
These instructions move a shifted value between Lo registers. The THUMB assembler syntax is shown in Table 12.
Table 12. Summary of Format 1 Instructions
OP  THUMB assembler       ARM equivalent             Action
00  LSL Rd, Rs, #Offset5  MOVS Rd, Rs, LSL #Offset5  Shift Rs left by a 5-bit immediate value and store the result in Rd.
01  LSR Rd, Rs, #Offset5  MOVS Rd, Rs, LSR #Offset5  Perform logical shift right on Rs by a 5-bit immediate value and store the result in Rd.
10  ASR Rd, Rs, #Offset5  MOVS Rd, Rs, ASR #Offset5  Perform arithmetic shift right on Rs by a 5-bit immediate value and store the result in Rd.
Note: All instructions in this group set the CPSR condition codes.
Examples
LSR R2, R5, #27   ; Logical shift right the contents
                  ; of R5 by 27 and store the result in R2.
                  ; Set condition codes on the result.
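To show how the assembler operands land in the instruction word, here is a small Python sketch that packs a move-shifted-register instruction. The bit positions used (bits 15-13 = 000, Op in bits 12-11, Offset5 in bits 10-6, Rs in bits 5-3, Rd in bits 2-0) are the standard THUMB layout and are stated as an assumption because the format figure is not reproduced here; the helper name is hypothetical.

```python
SHIFT_OPS = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10}  # Op values from Table 12

def encode_shift_immediate(op, rd, rs, offset5):
    """Pack <op> Rd, Rs, #Offset5 into a 16-bit THUMB opcode."""
    if not (0 <= rd <= 7 and 0 <= rs <= 7):
        raise ValueError("Lo registers only (R0-R7)")
    if not 0 <= offset5 <= 31:
        raise ValueError("Offset5 is a 5-bit value")
    return (SHIFT_OPS[op] << 11) | (offset5 << 6) | (rs << 3) | rd
```

Under these assumptions the example LSR R2, R5, #27 encodes as 0x0EEA.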
Format 2: add/subtract
Figure 40. Format 2
[Figure: Format 2 fields, bits 15 to 0: I, Op, Rn/Offset3, Rs, Rd. Immediate flag (I): 0 - Register operand, 1 - Immediate operand]
Operation
These instructions allow the contents of a Lo register or a 3-bit immediate value to be added to or subtracted from a Lo register. The THUMB assembler syntax is shown in Table 13.
Table 13. Summary of Format 2 Instructions
Op  I  THUMB assembler       ARM equivalent         Action
0   0  ADD Rd, Rs, Rn        ADDS Rd, Rs, Rn        Add contents of Rn to contents of Rs. Place result in Rd.
0   1  ADD Rd, Rs, #Offset3  ADDS Rd, Rs, #Offset3  Add 3-bit immediate value to contents of Rs. Place result in Rd.
1   0  SUB Rd, Rs, Rn        SUBS Rd, Rs, Rn        Subtract contents of Rn from contents of Rs. Place result in Rd.
1   1  SUB Rd, Rs, #Offset3  SUBS Rd, Rs, #Offset3  Subtract 3-bit immediate value from contents of Rs. Place result in Rd.
Note: All instructions in this group set the CPSR condition codes.
Examples
ADD R0, R3, R4   ; R0 := R3 + R4 and set condition codes on
                 ; the result.
SUB R6, R2, #6   ; R6 := R2 - 6 and set condition codes.
Format 3: move/compare/add/subtract immediate
Figure 41. Format 3
[Figure: Format 3 fields, bits 15 to 0: Op, Rd, Offset8]
Operations
The instructions in this group perform operations between a Lo register and an 8-bit immediate value. The THUMB assembler syntax is shown in Table 14.
Table 14. Summary of Format 3 Instructions
Op  THUMB assembler   ARM equivalent         Action
00  MOV Rd, #Offset8  MOVS Rd, #Offset8      Move 8-bit immediate value into Rd.
01  CMP Rd, #Offset8  CMP Rd, #Offset8       Compare contents of Rd with 8-bit immediate value.
10  ADD Rd, #Offset8  ADDS Rd, Rd, #Offset8  Add 8-bit immediate value to contents of Rd and place the result in Rd.
11  SUB Rd, #Offset8  SUBS Rd, Rd, #Offset8  Subtract 8-bit immediate value from contents of Rd and place the result in Rd.
Note: All instructions in this group set the CPSR condition codes.
Examples
MOV R0, #128   ; R0 := 128 and set condition codes
CMP R2, #62    ; Set condition codes on R2 - 62
ADD R1, #255   ; R1 := R1 + 255 and set condition
               ; codes
SUB R6, #145   ; R6 := R6 - 145 and set condition
               ; codes
[Figure: instruction fields Op, Rs, Rd]
Operation
The following instructions perform ALU operations on a Lo register pair.
Table 15. Summary of Format 4 Instructions
OP    THUMB assembler  ARM equivalent        Action
0000  AND Rd, Rs       ANDS Rd, Rd, Rs       Rd := Rd AND Rs
0001  EOR Rd, Rs       EORS Rd, Rd, Rs       Rd := Rd EOR Rs
0010  LSL Rd, Rs       MOVS Rd, Rd, LSL Rs   Rd := Rd << Rs
0011  LSR Rd, Rs       MOVS Rd, Rd, LSR Rs   Rd := Rd >> Rs
0100  ASR Rd, Rs       MOVS Rd, Rd, ASR Rs   Rd := Rd ASR Rs
0101  ADC Rd, Rs       ADCS Rd, Rd, Rs       Rd := Rd + Rs + C-bit
0110  SBC Rd, Rs       SBCS Rd, Rd, Rs       Rd := Rd - Rs - NOT C-bit
0111  ROR Rd, Rs       MOVS Rd, Rd, ROR Rs   Rd := Rd ROR Rs
1000  TST Rd, Rs       TST Rd, Rs            Set condition codes on Rd AND Rs
1001  NEG Rd, Rs       RSBS Rd, Rs, #0       Rd := -Rs
1010  CMP Rd, Rs       CMP Rd, Rs            Set condition codes on Rd - Rs
1011  CMN Rd, Rs       CMN Rd, Rs            Set condition codes on Rd + Rs
1100  ORR Rd, Rs       ORRS Rd, Rd, Rs       Rd := Rd OR Rs
1101  MUL Rd, Rs       MULS Rd, Rs, Rd       Rd := Rs * Rd
1110  BIC Rd, Rs       BICS Rd, Rd, Rs       Rd := Rd AND NOT Rs
1111  MVN Rd, Rs       MVNS Rd, Rs           Rd := NOT Rs
Note: All instructions in this group set the CPSR condition codes.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 15. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
Examples
EOR R3, R4   ; R3 := R3 EOR R4 and set condition codes
ROR R1, R0   ; Rotate Right R1 by the value in R0, store
             ; the result in R1 and set condition codes
NEG R5, R3   ; Subtract the contents of R3 from zero,
             ; store the result in R5. Set condition codes
             ; ie R5 = -R3
CMP R2, R6   ; Set the condition codes on the result of
             ; R2 - R6
MUL R0, R7   ; R0 := R7 * R0 and set condition codes
[Figure: instruction fields Op, H1, H2, Rs/Hs, Rd/Hd]
Operation
There are four sets of instructions in this group. The first three allow ADD, CMP and MOV operations to be performed between Lo and Hi registers, or a pair of Hi registers. The fourth, BX, allows a Branch to be performed which may also be used to switch processor state. The THUMB assembler syntax is shown in Table 16.
Table 16. Summary of Format 5 Instructions
Op  H1  H2  THUMB assembler  ARM equivalent  Action
00  0   1   ADD Rd, Hs       ADD Rd, Rd, Hs  Add a register in the range 8-15 to a register in the range 0-7.
00  1   0   ADD Hd, Rs       ADD Hd, Hd, Rs  Add a register in the range 0-7 to a register in the range 8-15.
00  1   1   ADD Hd, Hs       ADD Hd, Hd, Hs  Add two registers in the range 8-15.
01  0   1   CMP Rd, Hs       CMP Rd, Hs      Compare a register in the range 0-7 with a register in the range 8-15. Set the condition code flags on the result.
01  1   0   CMP Hd, Rs       CMP Hd, Rs      Compare a register in the range 8-15 with a register in the range 0-7. Set the condition code flags on the result.
01  1   1   CMP Hd, Hs       CMP Hd, Hs      Compare two registers in the range 8-15. Set the condition code flags on the result.
10  0   1   MOV Rd, Hs       MOV Rd, Hs      Move a value from a register in the range 8-15 to a register in the range 0-7.
10  1   0   MOV Hd, Rs       MOV Hd, Rs      Move a value from a register in the range 0-7 to a register in the range 8-15.
10  1   1   MOV Hd, Hs       MOV Hd, Hs      Move a value between two registers in the range 8-15.
11  0   0   BX Rs            BX Rs           Perform branch (plus optional state change) to address in a register in the range 0-7.
11  0   1   BX Hs            BX Hs           Perform branch (plus optional state change) to address in a register in the range 8-15.
Note: In this group only CMP (Op = 01) sets the CPSR condition codes. The action of H1 = 0, H2 = 0 for Op = 00 (ADD), Op = 01 (CMP) and Op = 10 (MOV) is undefined, and should not be used.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 16. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
The BX instruction
BX performs a Branch to a routine whose start address is specified in a Lo or Hi register. Bit 0 of the address determines the processor state on entry to the routine:
Bit 0 = 0 causes the processor to enter ARM state.
Bit 0 = 1 causes the processor to enter THUMB state.
Note: The action of H1 = 1 for this instruction is undefined, and should not be used.
Examples
Hi register operations
      ADD PC, R5          ; PC := PC + R5 but don't set the
                          ; condition codes.
      CMP R4, R12         ; Set the condition codes on the
                          ; result of R4 - R12.
      MOV R15, R14        ; Move R14 (LR) into R15 (PC)
                          ; but don't set the condition codes,
                          ; eg. return from subroutine.
Branch and exchange
      ; Switch from THUMB to ARM state.
      ADR R1,outofTHUMB   ; Load address of outofTHUMB
                          ; into R1.
      MOV R11,R1
      BX  R11             ; Transfer the contents of R11 into
                          ; the PC.
                          ; Bit 0 of R11 determines whether
                          ; ARM or THUMB state is entered, ie.
                          ; ARM state here.
Format 6: PC-relative load
Figure 44. Format 6
[Figure: Format 6 fields, bits 15 to 0: Rd, Word8]
Operation
This instruction loads a word from an address specified as a 10-bit immediate offset from the PC.
Table 17. Summary of PC-Relative Load Instruction
THUMB assembler     ARM equivalent       Action
LDR Rd, [PC, #Imm]  LDR Rd, [R15, #Imm]  Add unsigned offset (255 words, 1020 bytes) in Imm to the current value of the PC. Load the word from the resulting address into Rd.
Note: The value specified by #Imm is a full 10-bit address, but must always be word-aligned (ie with bits 1:0 set to 0), since the assembler places #Imm >> 2 in field Word8.
Note: The value of the PC will be 4 bytes greater than the address of this instruction, but bit 1 of the PC is forced to 0 to ensure it is word aligned.
Examples
LDR R3,[PC,#844]   ; Load into R3 the word found at the
                   ; address formed by adding 844 to PC.
                   ; bit[1] of PC is forced to zero.
                   ; Note that the THUMB opcode will contain
                   ; 211 as the Word8 value.
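The two notes above (offset stored as #Imm >> 2, and the PC read as instruction address plus 4 with bit 1 forced to 0) can be combined into a short Python sketch. The function name and the returned tuple are illustrative assumptions.

```python
def pc_relative_load(instruction_address, imm):
    """Return (Word8 field value, address the word is loaded from)."""
    if imm % 4 != 0 or not 0 <= imm <= 1020:
        raise ValueError("#Imm must be word-aligned and at most 1020")
    word8 = imm >> 2                          # value the assembler places in Word8
    pc = (instruction_address + 4) & ~2       # PC reads 4 ahead, bit 1 forced to 0
    return word8, (pc + imm) & 0xFFFFFFFF
```

Because bit 1 is cleared, an instruction at a halfword-aligned address and the one at the preceding word-aligned address see the same PC base.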
[Figure: instruction fields Ro, Rb, Rd. Load/Store flag: 0 - Store to memory, 1 - Load from memory]
Operation
These instructions transfer byte or word values between registers and memory. Memory addresses are pre-indexed using an offset register in the range 0-7. The THUMB assembler syntax is shown in Table 18
Table 18. Summary of Format 7 Instructions (Continued)
L  B  THUMB assembler    ARM equivalent     Action
0  1  STRB Rd, [Rb, Ro]  STRB Rd, [Rb, Ro]  Pre-indexed byte store: Calculate the target address by adding together the value in Rb and the value in Ro. Store the byte value in Rd at the resulting address.
1  0  LDR Rd, [Rb, Ro]   LDR Rd, [Rb, Ro]   Pre-indexed word load: Calculate the source address by adding together the value in Rb and the value in Ro. Load the contents of the address into Rd.
1  1  LDRB Rd, [Rb, Ro]  LDRB Rd, [Rb, Ro]  Pre-indexed byte load: Calculate the source address by adding together the value in Rb and the value in Ro. Load the byte value at the resulting address.
Examples
STR  R3, [R2,R6]   ; Store word in R3 at the address
                   ; formed by adding R6 to R2.
LDRB R2, [R0,R7]   ; Load into R2 the byte found at
                   ; the address formed by adding
                   ; R7 to R0.
[Figure: instruction fields H flag, Ro, Rb, Rd]
Operation
These instructions load optionally sign-extended bytes or halfwords, and store halfwords. The THUMB assembler syntax is shown below.
Table 19. Summary of Format 8 Instructions
S  H  THUMB assembler    ARM equivalent      Action
0  0  STRH Rd, [Rb, Ro]  STRH Rd, [Rb, Ro]   Store halfword: Add Ro to base address in Rb. Store bits 0-15 of Rd at the resulting address.
0  1  LDRH Rd, [Rb, Ro]  LDRH Rd, [Rb, Ro]   Load halfword: Add Ro to base address in Rb. Load bits 0-15 of Rd from the resulting address, and set bits 16-31 of Rd to 0.
1  0  LDSB Rd, [Rb, Ro]  LDRSB Rd, [Rb, Ro]  Load sign-extended byte: Add Ro to base address in Rb. Load bits 0-7 of Rd from the resulting address, and set bits 8-31 of Rd to bit 7.
1  1  LDSH Rd, [Rb, Ro]  LDRSH Rd, [Rb, Ro]  Load sign-extended halfword: Add Ro to base address in Rb. Load bits 0-15 of Rd from the resulting address, and set bits 16-31 of Rd to bit 15.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 19. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
Examples
STRH R4, [R3, R0]   ; Store the lower 16 bits of R4 at the
                    ; address formed by adding R0 to R3.
LDSB R2, [R7, R1]   ; Load into R2 the sign extended byte
                    ; found at the address formed by adding
                    ; R1 to R7.
LDSH R3, [R4, R2]   ; Load into R3 the sign extended halfword
                    ; found at the address formed by adding
                    ; R2 to R4.
[Figure: instruction fields Offset5, Rb, Rd. Byte/Word flag: 0 - Transfer word quantity, 1 - Transfer byte quantity]
Operation
These instructions transfer byte or word values between registers and memory using an immediate 5 or 7-bit offset.
Table 20. Summary of Format 9 Instructions
L  B  THUMB assembler      ARM equivalent       Action
0  0  STR Rd, [Rb, #Imm]   STR Rd, [Rb, #Imm]   Calculate the target address by adding together the value in Rb and Imm. Store the contents of Rd at the address.
1  0  LDR Rd, [Rb, #Imm]   LDR Rd, [Rb, #Imm]   Calculate the source address by adding together the value in Rb and Imm. Load Rd from the address.
0  1  STRB Rd, [Rb, #Imm]  STRB Rd, [Rb, #Imm]  Calculate the target address by adding together the value in Rb and Imm. Store the byte value in Rd at the address.
1  1  LDRB Rd, [Rb, #Imm]  LDRB Rd, [Rb, #Imm]  Calculate source address by adding together the value in Rb and Imm. Load the byte value at the address into Rd.
Note: For word accesses (B = 0), the value specified by #Imm is a full 7-bit address, but must be word-aligned (ie with bits 1:0 set to 0), since the assembler places #Imm >> 2 in the Offset5 field.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 20. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
Examples
LDR  R2, [R5,#116]   ; Load into R2 the word found at the
                     ; address formed by adding 116 to R5.
                     ; Note that the THUMB opcode will
                     ; contain 29 as the Offset5 value.
STRB R1, [R0,#13]    ; Store the lower 8 bits of R1 at the
                     ; address formed by adding 13 to R0.
                     ; Note that the THUMB opcode will
                     ; contain 13 as the Offset5 value.
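The different scaling of #Imm for word and byte transfers can be illustrated with a small Python encoder. The bit positions (bits 15-13 = 011, B in bit 12, L in bit 11, Offset5 in bits 10-6, Rb in bits 5-3, Rd in bits 2-0) are the standard THUMB layout and are an assumption here, since the format figure is not reproduced; the helper name is hypothetical.

```python
def encode_load_store_immediate(load, byte, rd, rb, imm):
    """Pack LDR/STR{B} Rd, [Rb, #Imm] into a 16-bit THUMB opcode."""
    if not (0 <= rd <= 7 and 0 <= rb <= 7):
        raise ValueError("Lo registers only (R0-R7)")
    if byte:
        if not 0 <= imm <= 31:
            raise ValueError("byte offset must fit in 5 bits")
        offset5 = imm                 # byte transfers store #Imm directly
    else:
        if imm % 4 != 0 or not 0 <= imm <= 124:
            raise ValueError("word offset must be word-aligned, 0 to 124")
        offset5 = imm >> 2            # word transfers store #Imm >> 2
    return (0b011 << 13) | (int(byte) << 12) | (int(load) << 11) \
        | (offset5 << 6) | (rb << 3) | rd
```

For the two examples above this gives an Offset5 field of 29 for the word load and 13 for the byte store.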
[Figure: instruction fields Offset5, Rb, Rd]
Operation
These instructions transfer halfword values between a Lo register and memory. Addresses are pre-indexed, using a 6-bit immediate value.
Table 21. Halfword Data Transfer Instructions
L  THUMB assembler      ARM equivalent       Action
0  STRH Rd, [Rb, #Imm]  STRH Rd, [Rb, #Imm]  Add #Imm to base address in Rb and store bits 0-15 of Rd at the resulting address.
1  LDRH Rd, [Rb, #Imm]  LDRH Rd, [Rb, #Imm]  Add #Imm to base address in Rb. Load bits 0-15 from the resulting address into Rd and set bits 16-31 to zero.
Note: #Imm is a full 6-bit address but must be halfword-aligned (ie with bit 0 set to 0) since the assembler places #Imm >> 1 in the Offset5 field.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 21. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
Examples
STRH R6, [R1, #56]   ; Store the lower 16 bits of R6 at
                     ; the address formed by adding 56
                     ; to R1.
                     ; Note that the THUMB opcode will
                     ; contain 28 as the Offset5 value.
LDRH R4, [R7, #4]    ; Load into R4 the halfword found at
                     ; the address formed by adding 4 to R7.
                     ; Note that the THUMB opcode will
                     ; contain 2 as the Offset5 value.
[Figure: instruction fields Rd, Word8]
Operation
The instructions in this group perform an SP-relative load or store. The THUMB assembler syntax is shown in the following table.
Table 22. SP-Relative Load/Store Instructions
L  THUMB assembler     ARM equivalent       Action
0  STR Rd, [SP, #Imm]  STR Rd, [R13, #Imm]  Add unsigned offset (255 words, 1020 bytes) in Imm to the current value of the SP (R13). Store the contents of Rd at the resulting address.
1  LDR Rd, [SP, #Imm]  LDR Rd, [R13, #Imm]  Add unsigned offset (255 words, 1020 bytes) in Imm to the current value of the SP (R13). Load the word from the resulting address into Rd.
Note: The offset supplied in #Imm is a full 10-bit address, but must always be word-aligned (ie bits 1:0 set to 0), since the assembler places #Imm >> 2 in the Word8 field.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 22. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
Examples
STR R4, [SP,#492]   ; Store the contents of R4 at the address
                    ; formed by adding 492 to SP (R13).
                    ; Note that the THUMB opcode will contain
                    ; 123 as the Word8 value.
[Figure: instruction fields SP, Rd, Word8]
Operation
These instructions calculate an address by adding a 10-bit constant to either the PC or the SP, and load the resulting address into a register.
Table 23. Load Address
SP  THUMB assembler   ARM equivalent     Action
0   ADD Rd, PC, #Imm  ADD Rd, R15, #Imm  Add #Imm to the current value of the program counter (PC) and load the result into Rd.
1   ADD Rd, SP, #Imm  ADD Rd, R13, #Imm  Add #Imm to the current value of the stack pointer (SP) and load the result into Rd.
Note: The value specified by #Imm is a full 10-bit value, but this must be word-aligned (ie with bits 1:0 set to 0) since the assembler places #Imm >> 2 in field Word8.
Where the PC is used as the source register (SP = 0), bit 1 of the PC is always read as 0. The value of the PC will be 4 bytes greater than the address of the instruction before bit 1 is forced to 0.
The CPSR condition codes are unaffected by these instructions.
Examples
ADD R2, PC, #572   ; R2 := PC + 572, but don't set the
                   ; condition codes. bit[1] of PC is
                   ; forced to zero.
                   ; Note that the THUMB opcode will
                   ; contain 143 as the Word8 value.
ADD R6, SP, #212   ; R6 := SP (R13) + 212, but don't
                   ; set the condition codes.
                   ; Note that the THUMB opcode will
                   ; contain 53 as the Word8 value.
Format 13: add offset to Stack Pointer
Figure 51. Format 13
[Figure: Format 13 fields, bits 15 to 0: S, SWord7]
Operation
This instruction adds a 9-bit signed constant to the stack pointer. The following table shows the THUMB assembler syntax.

Table 24. The ADD SP Instruction

S   THUMB assembler   ARM equivalent       Action
0   ADD SP, #Imm      ADD R13, R13, #Imm   Add #Imm to the stack pointer (SP).
1   ADD SP, #-Imm     SUB R13, R13, #Imm   Add #-Imm to the stack pointer (SP).
Note: The offset specified by #Imm can be up to +/- 508, but must be word-aligned (ie with bits 1:0 set to 0) since the assembler converts #Imm to an 8-bit sign + magnitude number before placing it in field SWord7.

Note: The condition codes are not set by this instruction.
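The sign + magnitude conversion described in the note can be sketched as follows. The function name encode_add_sp is hypothetical, and the opcode layout (bits 15:8 = 10110000, bit 7 = S, bits 6:0 = SWord7) is assumed from the standard THUMB encoding.

```python
def encode_add_sp(offset):
    """Pack ADD SP, #offset into a 16-bit THUMB opcode (illustrative sketch)."""
    if offset % 4 != 0 or not -508 <= offset <= 508:
        raise ValueError("offset must be word-aligned and within +/- 508")
    s = 1 if offset < 0 else 0            # sign bit
    sword7 = abs(offset) >> 2             # magnitude, in words
    # Assumed layout: 1011 0000 | S | SWord7
    return 0xB000 | (s << 7) | sword7
```

For example, an offset of 268 gives a magnitude of 67 with S=0, and an offset of -104 gives a magnitude of 26 with S=1.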
Examples
        ADD     SP, #268        ; SP (R13) := SP + 268, but don't set
                                ; the condition codes.
                                ; Note that the THUMB opcode will
                                ; contain 67 as the Word7 value and S=0.
        ADD     SP, #-104       ; SP (R13) := SP - 104, but don't set
                                ; the condition codes.
                                ; Note that the THUMB opcode will
                                ; contain 26 as the Word7 value and S=1.

[Figure: instruction format; fields shown: Load/Store bit (0 - Store to memory, 1 - Load from memory), Rlist]
Operation
The instructions in this group allow registers 0-7 and optionally LR to be pushed onto the stack, and registers 0-7 and optionally PC to be popped off the stack. The THUMB assembler syntax is shown in Table 25.

Note: The stack is always assumed to be Full Descending.

Table 25. PUSH and POP Instructions

L  R  THUMB assembler      ARM equivalent               Action
0  0  PUSH { Rlist }       STMDB R13!, { Rlist }        Push the registers specified by Rlist onto the stack. Update the stack pointer.
0  1  PUSH { Rlist, LR }   STMDB R13!, { Rlist, R14 }   Push the Link Register and the registers specified by Rlist (if any) onto the stack. Update the stack pointer.
1  0  POP { Rlist }        LDMIA R13!, { Rlist }        Pop values off the stack into the registers specified by Rlist. Update the stack pointer.
1  1  POP { Rlist, PC }    LDMIA R13!, { Rlist, R15 }   Pop values off the stack and load into the registers specified by Rlist. Pop the PC off the stack. Update the stack pointer.
Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 25. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.
Examples
        PUSH    {R0-R4,LR}      ; Store R0,R1,R2,R3,R4 and R14 (LR) at
                                ; the stack pointed to by R13 (SP) and
                                ; update R13.
                                ; Useful at start of a sub-routine to
                                ; save workspace and return address.
        POP     {R2,R6,PC}      ; Load R2,R6 and R15 (PC) from the stack
                                ; pointed to by R13 (SP) and update R13.
                                ; Useful to restore workspace and return
                                ; from sub-routine.
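The Full Descending stack behaviour of PUSH and POP can be modelled with a short Python sketch. The names push and pop, the list used for the register file and the dictionary used for memory are all illustrative assumptions. The ordering (lowest-numbered register at the lowest address) follows the usual behaviour of the equivalent ARM block transfer instructions.

```python
def push(regs, memory, rlist):
    """Model of PUSH: decrement SP (regs[13]) before each word is stored."""
    for r in sorted(rlist, reverse=True):   # highest register stored first
        regs[13] -= 4
        memory[regs[13]] = regs[r]

def pop(regs, memory, rlist):
    """Model of POP: load each word, then increment SP (regs[13])."""
    for r in sorted(rlist):                 # lowest register loaded first
        regs[r] = memory[regs[13]]
        regs[13] += 4
```

Pushing {R0-R4,LR} moves SP down by 24 bytes. A later POP {R2,R6,PC} reads three words starting at the current SP, so the value loaded into the PC is whatever sits at the third word.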
[Figure: instruction format; fields shown: Rb, Rlist]
Operation
These instructions allow multiple loading and storing of Lo registers. The THUMB assembler syntax is shown in the following table.

Table 26. The Multiple Load/Store Instructions

L   THUMB assembler         ARM equivalent          Action
0   STMIA Rb!, { Rlist }    STMIA Rb!, { Rlist }    Store the registers specified by Rlist, starting at the base address in Rb. Write back the new base address.
1   LDMIA Rb!, { Rlist }    LDMIA Rb!, { Rlist }    Load the registers specified by Rlist, starting at the base address in Rb. Write back the new base address.
Examples
        STMIA   R0!, {R3-R7}    ; Store the contents of registers R3-R7
                                ; starting at the address specified in
                                ; R0, incrementing the addresses for each
                                ; word.
                                ; Write back the updated value of R0.
Format 16: conditional branch
Figure 54. Format 16 (bit positions 15 to 0; fields shown: Cond, SOffset8)
Operation
The instructions in this group all perform a conditional Branch depending on the state of the CPSR condition codes. The branch offset must take account of the prefetch operation, which causes the PC to be 1 word (4 bytes) ahead of the current instruction. The THUMB assembler syntax is shown in the following table.

Table 27. The Conditional Branch Instructions

Cond   THUMB assembler   ARM equivalent   Action
0000   BEQ label         BEQ label        Branch if Z set (equal)
0001   BNE label         BNE label        Branch if Z clear (not equal)
0010   BCS label         BCS label        Branch if C set (unsigned higher or same)
0011   BCC label         BCC label        Branch if C clear (unsigned lower)
0100   BMI label         BMI label        Branch if N set (negative)
0101   BPL label         BPL label        Branch if N clear (positive or zero)
0110   BVS label         BVS label        Branch if V set (overflow)
0111   BVC label         BVC label        Branch if V clear (no overflow)
1000   BHI label         BHI label        Branch if C set and Z clear (unsigned higher)
1001   BLS label         BLS label        Branch if C clear or Z set (unsigned lower or same)
1010   BGE label         BGE label        Branch if N set and V set, or N clear and V clear (greater or equal)
1011   BLT label         BLT label        Branch if N set and V clear, or N clear and V set (less than)
1100   BGT label         BGT label        Branch if Z clear, and either N set and V set or N clear and V clear (greater than)
1101   BLE label         BLE label        Branch if Z set, or N set and V clear, or N clear and V set (less than or equal)
Note: While label specifies a full 9-bit twos complement address, this must always be halfword-aligned (ie with bit 0 set to 0) since the assembler actually places label >> 1 in field SOffset8.

Note: Cond = 1110 is undefined, and should not be used. Cond = 1111 creates the SWI instruction: see Format 17: software interrupt on page 107.
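How the prefetch adjustment and the halfword scaling combine can be sketched in Python. The function name encode_cond_branch and the COND_CODES dictionary are hypothetical helpers, and the opcode layout (bits 15:12 = 1101, bits 11:8 = Cond, bits 7:0 = SOffset8) is assumed from the standard THUMB encoding.

```python
COND_CODES = {"EQ": 0, "NE": 1, "CS": 2, "CC": 3, "MI": 4, "PL": 5, "VS": 6,
              "VC": 7, "HI": 8, "LS": 9, "GE": 10, "LT": 11, "GT": 12, "LE": 13}

def encode_cond_branch(cond, instr_addr, target_addr):
    """Pack B<cond> label into a 16-bit THUMB opcode (illustrative sketch)."""
    offset = target_addr - (instr_addr + 4)   # PC is 4 bytes ahead
    if offset % 2 != 0 or not -256 <= offset <= 254:
        raise ValueError("target must be halfword-aligned and within range")
    soffset8 = (offset >> 1) & 0xFF           # twos complement halfword count
    # Assumed layout: 1101 | Cond | SOffset8
    return 0xD000 | (COND_CODES[cond] << 8) | soffset8
```

A branch to a label 12 bytes after the branch instruction therefore stores 4 in SOffset8, because the PC already points 4 bytes ahead and the remaining 8 bytes are counted in halfwords.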
Examples
        CMP     R0, #45         ; Branch to over if R0 > 45.
        BGT     over            ; Note that the THUMB opcode will contain
        ...                     ; the number of halfwords to offset.
        ...
over    ...
Format 17: software interrupt
Figure 55. Format 17 (bit positions 15 to 0; field shown: Value8, the comment field)
Operation
The SWI instruction performs a software interrupt. On taking the SWI, the processor switches into ARM state and enters Supervisor (SVC) mode.

Table 28. The SWI Instruction

THUMB assembler   ARM equivalent   Action
SWI Value8        SWI Value8       Perform Software Interrupt: Move the address of the next instruction into LR, move CPSR to SPSR, load the SWI vector address (0x8) into the PC. Switch to ARM state and enter SVC mode.
Note: Value8 is used solely by the SWI handler: it is ignored by the processor.
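Instruction cycle times
All instructions in this format have an equivalent ARM instruction as shown in Table 28. The instruction cycle times for the THUMB instruction are identical to that of the equivalent ARM instruction. For more information on instruction cycle times, please refer to Instruction Cycle Operations on page 175.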
Examples
        SWI     18              ; Take the software interrupt exception.
                                ; Enter Supervisor mode with 18 as the
                                ; requested SWI number.

[Figure: instruction format; field shown: Offset11 (immediate value)]
Operation
This instruction performs a PC-relative Branch. The THUMB assembler syntax is shown below. The branch offset must take account of the prefetch operation, which causes the PC to be 1 word (4 bytes) ahead of the current instruction.

Table 29. Summary of Branch Instruction

THUMB assembler   ARM equivalent                Action
B label           BAL label (halfword offset)   Branch PC relative +/- Offset11 << 1, where label is PC +/- 2048 bytes.

Note: The address specified by label is a full 12-bit twos complement address, but must always be halfword aligned (ie bit 0 set to 0), since the assembler places label >> 1 in the Offset11 field.
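The effect of the PC offset on the encoded value can be checked with a small Python sketch. The function name encode_branch is hypothetical, and the opcode layout (bits 15:11 = 11100, bits 10:0 = Offset11) is assumed from the standard THUMB encoding.

```python
def encode_branch(instr_addr, target_addr):
    """Pack B label into a 16-bit THUMB opcode (illustrative sketch)."""
    offset = target_addr - (instr_addr + 4)   # PC is 4 bytes ahead
    if offset % 2 != 0 or not -2048 <= offset <= 2046:
        raise ValueError("target must be halfword-aligned and within range")
    # Assumed layout: 11100 | Offset11 (twos complement halfword count)
    return 0xE000 | ((offset >> 1) & 0x7FF)
```

A branch to its own address has a byte offset of -4, which is -2 halfwords, so the sketch produces 0xE7FE, the value quoted in the example below.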
Examples
here    B       here            ; Branch onto itself. Assembles to 0xE7FE.
                                ; (Note effect of PC offset).
        B       jimmy           ; Branch to jimmy.
        ...                     ; Note that the THUMB opcode will
                                ; contain the number of halfwords
                                ; to offset.
jimmy   ...                     ; Must be halfword aligned.
Format 19: long branch with link
Figure 57. Format 19 (bit positions 15 to 0; field shown: Offset)
Operation
This format specifies a long branch with link. The assembler splits the 23-bit twos complement half-word offset specified by the label into two 11-bit halves, ignoring bit 0 (which must be 0), and creates two THUMB instructions.

Instruction 1 (H = 0)
In the first instruction the Offset field contains the upper 11 bits of the target address. This is shifted left by 12 bits and added to the current PC address. The resulting address is placed in LR.

Instruction 2 (H = 1)
In the second instruction the Offset field contains an 11-bit representation of the lower half of the target address. This is shifted left by 1 bit and added to LR. LR, which now contains the full 23-bit address, is placed in PC, the address of the instruction following the BL is placed in LR and bit 0 of LR is set.

The branch offset must take account of the prefetch operation, which causes the PC to be 1 word (4 bytes) ahead of the current instruction.

Table 30. The BL Instruction

H      THUMB assembler   ARM equivalent
0, 1   BL label          none
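The two-instruction sequence can be sketched in Python as an encoder plus a small model of what the processor does with the pair. The names encode_bl and execute_bl are hypothetical, and the opcode prefixes (11110 for H = 0, 11111 for H = 1) are assumed from the standard THUMB encoding.

```python
def encode_bl(instr_addr, target_addr):
    """Split a BL offset into the two 16-bit THUMB opcodes (illustrative sketch)."""
    offset = target_addr - (instr_addr + 4)       # PC of the first instruction
    if offset % 2 != 0 or not -(1 << 22) <= offset < (1 << 22):
        raise ValueError("target must be halfword-aligned and within range")
    high = (offset >> 12) & 0x7FF                 # upper 11 bits
    low = (offset >> 1) & 0x7FF                   # lower 11 bits, bit 0 dropped
    return 0xF000 | high, 0xF800 | low

def execute_bl(instr_addr, first, second):
    """Model the effect of the instruction pair; returns (new PC, new LR)."""
    high = first & 0x7FF
    if high & 0x400:                              # sign-extend the 11-bit field
        high -= 0x800
    lr = (instr_addr + 4) + (high << 12)          # Instruction 1: LR := PC + (Offset << 12)
    pc = (lr + ((second & 0x7FF) << 1)) & 0xFFFFFFFF  # Instruction 2
    lr = (instr_addr + 4) | 1                     # address after the BL, bit 0 set
    return pc, lr
```

Encoding a target and then running the pair through execute_bl should return the original target address in the PC, which is a convenient sanity check on the split.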
Examples
        BL      faraway         ; Unconditionally Branch to faraway
next    ...                     ; and place following instruction
                                ; address, ie next, in R14, the Link
                                ; Register and set bit 0 of LR high.
                                ; Note that the THUMB opcodes will
                                ; contain the number of halfwords
                                ; to offset.
faraway ...                     ; Must be Half-word aligned.
        Thumb                           ARM

2. Multiplication by 2^n+1 (3, 5, 9, 17, ...)
        LSL     Rt, Rb, #n              ADD     Ra, Rb, Rb, LSL #n
        ADD     Ra, Rt, Rb

3. Multiplication by 2^n-1 (3, 7, 15, ...)
        LSL     Rt, Rb, #n              RSB     Ra, Rb, Rb, LSL #n
        SUB     Ra, Rt, Rb

4. Multiplication by -2^n (-2, -4, -8, ...)
        LSL     Ra, Rb, #n              MOV     Ra, Rb, LSL #n
        NEG     Ra, Ra                  RSB     Ra, Ra, #0

5. Multiplication by -(2^n-1) (-3, -7, -15, ...)
        LSL     Rt, Rb, #n              SUB     Ra, Rb, Rb, LSL #n
        SUB     Ra, Rb, Rt

6. Multiplication by any C = {2^n+1, 2^n-1, -2^n or -(2^n-1)} * 2^n
Effectively this is any of the multiplications in 2 to 5 followed by a final shift. This allows the following additional constants to be multiplied: 6, 10, 12, 14, 18, 20, 24, 28, 30, 34, 36, 40, 48, 56, 60, 62, ...
        (2..5)                          (2..5)
        LSL     Ra, Ra, #n              MOV     Ra, Ra, LSL #n
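The shift-and-add identities behind these sequences can be checked with a short Python sketch that works modulo 2^32, as the registers do. The function names are hypothetical and each one mirrors one of the cases above.

```python
MASK = 0xFFFFFFFF  # results are truncated to 32 bits, as in a register

def mul_2n_plus_1(rb, n):
    """Case 2: (Rb << n) + Rb."""
    return ((rb << n) + rb) & MASK

def mul_2n_minus_1(rb, n):
    """Case 3: (Rb << n) - Rb."""
    return ((rb << n) - rb) & MASK

def mul_neg_2n(rb, n):
    """Case 4: 0 - (Rb << n)."""
    return (-(rb << n)) & MASK

def mul_neg_2n_minus_1(rb, n):
    """Case 5: Rb - (Rb << n)."""
    return (rb - (rb << n)) & MASK
```

Case 6 is then any of these followed by one more left shift of the result.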
General purpose signed divide
This example shows a general purpose signed divide and remainder routine in both Thumb and ARM code.

Thumb code
signed_divide
; Signed divide of R1 by R0: returns quotient in R0,
; remainder in R1

; Get abs value of R0 into R3
        ASR     R2, R0, #31     ; Get 0 or -1 in R2 depending on sign of R0
        EOR     R0, R2          ; EOR with -1 (0xFFFFFFFF) if negative
        SUB     R3, R0, R2      ; and ADD 1 (SUB -1) to get abs value

; SUB always sets flag so go & report division by 0 if necessary
        BEQ     divide_by_zero

; Get abs value of R1 by xoring with 0xFFFFFFFF and adding 1
; if negative
        ASR     R0, R1, #31     ; Get 0 or -1 in R0 depending on sign of R1
        EOR     R1, R0          ; EOR with -1 (0xFFFFFFFF) if negative
        SUB     R1, R0          ; and ADD 1 (SUB -1) to get abs value

; Save signs (0 or -1 in R0 & R2) for later use in determining
; sign of quotient & remainder.
        PUSH    {R0, R2}

; Justification, shift 1 bit at a time until divisor (R0 value)
; is just <= than dividend (R1 value). To do this shift dividend
; right by 1 and stop as soon as shifted value becomes >.
        LSR     R0, R1, #1
        MOV     R2, R3
        B       %FT0
just_l  LSL     R2, #1
0       CMP     R2, R0
        BLS     just_l

        MOV     R0, #0          ; Set accumulator to 0
        B       %FT0            ; Branch into division loop

div_l   LSR     R2, #1
0       CMP     R1, R2          ; Test subtract
        BCC     %FT0
        SUB     R1, R2          ; If successful do a real subtract
0       ADC     R0, R0          ; Shift result and add 1 if subtract succeeded

        CMP     R2, R3          ; Terminate when R2 == R3 (ie we have just
        BNE     div_l           ; tested subtracting the ones value).

; Now fixup the signs of the quotient (R0) and remainder (R1)
        POP     {R2, R3}        ; Get dividend/divisor signs back
        EOR     R3, R2          ; Result sign
        EOR     R0, R3          ; Negate if result sign = -1
        SUB     R0, R3
        EOR     R1, R2
        SUB     R1, R2
        MOV     pc, lr
ARM code
signed_divide
; effectively zero a4 as top bit will be shifted out later
        ANDS    a4, a1, #&80000000
        RSBMI   a1, a1, #0
        EORS    ip, a4, a2, ASR #32
; ip bit 31 = sign of result
; ip bit 30 = sign of a2
        RSBCS   a2, a2, #0

; central part is identical code to udiv
; (without MOV a4, #0 which comes for free as part of signed
; entry sequence)
        MOVS    a3, a1
        BEQ     divide_by_zero

just_l
; justification stage shifts 1 bit at a time
        CMP     a3, a2, LSR #1
        MOVLS   a3, a3, LSL #1  ; NB: LSL #1 is always OK if LS succeeds
        BLO     just_l

div_l
        CMP     a2, a3
        ADC     a4, a4, a4
        SUBCS   a2, a2, a3
        TEQ     a3, a1
        MOVNE   a3, a3, LSR #1
        BNE     div_l
        MOV     a1, a4

        MOVS    ip, ip, ASL #1
        RSBCS   a1, a1, #0
        RSBMI   a2, a2, #0

        MOV     pc, lr
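The algorithm used by the Thumb routine (take absolute values, justify the divisor, then shift and subtract) can be followed more easily in a high-level sketch. The Python below is a model written for illustration, not a translation checked against hardware. The function name signed_divide mirrors the label above and the variable names are assumptions.

```python
def signed_divide(dividend, divisor):
    """Shift-and-subtract divide; quotient and remainder truncate toward zero."""
    if divisor == 0:
        raise ZeroDivisionError("divide_by_zero")
    dividend_negative = dividend < 0
    result_negative = dividend_negative != (divisor < 0)
    remainder = abs(dividend)
    base = abs(divisor)

    # Justification: shift the divisor left while it is <= dividend >> 1.
    shifted = base
    while shifted <= (remainder >> 1):
        shifted <<= 1

    # Division loop: test subtract, shift the result, stop after the ones value.
    quotient = 0
    while True:
        quotient <<= 1
        if remainder >= shifted:
            remainder -= shifted
            quotient |= 1
        if shifted == base:
            break
        shifted >>= 1

    # Fix up the signs: the remainder takes the sign of the dividend.
    if result_negative:
        quotient = -quotient
    if dividend_negative:
        remainder = -remainder
    return quotient, remainder
```

Under these assumptions signed_divide(-7, 2) gives a quotient of -3 and a remainder of -1, so quotient * divisor + remainder always reproduces the dividend.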
Division by a constant
The ARM instruction set was designed following a RISC philosophy. One of the consequences of this is that the ARM core has no divide instruction, so divides must be performed using a subroutine. This means that divides can be quite slow, but this is not a major issue as divide performance is rarely critical for applications.

It is possible to do better than the general divide in the special case when the divisor is a constant. The divc.c example shows how the divide-by-constant technique works by generating ARM assembler code for divide-by-constant.

In the special case when dividing by 2^n, a simple right shift is all that is required. There is a small caveat which concerns the handling of signed and unsigned numbers. For signed numbers, an arithmetic right shift is required, as this performs sign extension (to handle negative numbers correctly). In contrast, unsigned numbers require a 0-filled logical shift right:

        MOV     a2, a1, lsr #5          ; unsigned division by 32
        MOV     a2, a1, asr #10         ; signed division by 1024

Explanation of divide-by-constant ARM code

The divide-by-constant technique basically does a multiply in place of the divide. Given that:

        x/y == x * (1/y)

consider (1/y) as a 0.32 fixed-point number (truncating any bits past the most significant 32). 0.32 means 0 bits before the binary point and 32 after it.

            == (x * (2^32/y)) / 2^32

where (2^32/y) is a 32.0 bit fixed-point number:

            == (x * (2^32/y)) >> 32

This is effectively returning the top 32-bits of the 64-bit product of x and (2^32/y). If y is a constant, then (2^32/y) is also a constant. For certain y, the reciprocal (2^32/y) is a repeating pattern in binary:
y    (2^32/y)
2    10000000000000000000000000000000  #
3    01010101010101010101010101010101  *
4    01000000000000000000000000000000  #
5    00110011001100110011001100110011  *
6    00101010101010101010101010101010  *
7    00100100100100100100100100100100  *
8    00100000000000000000000000000000  #
9    00011100011100011100011100011100  *
10   00011001100110011001100110011001  *
11   00010111010001011101000101110100
12   00010101010101010101010101010101  *
13   00010011101100010011101100010011
14   00010010010010010010010010010010  *
15   00010001000100010001000100010001  *
16   00010000000000000000000000000000  #
17   00001111000011110000111100001111  *
18   00001110001110001110001110001110  *
19   00001101011110010100001101011110
20   00001100110011001100110011001100  *
21   00001100001100001100001100001100  *
22   00001011101000101110100010111010
23   00001011001000010110010000101100
24   00001010101010101010101010101010  *
25   00001010001111010111000010100011
The lines marked with a # are the special cases 2^n, which have already been dealt with. The lines marked with a * have a simple repeating pattern. Note how regular the patterns are for y=2^n+2^m or y=2^n-2^m (for n>m):
n  m  (2^n+2^m)      n  m  (2^n-2^m)
1  0   3             1  0   1
2  0   5             2  1   2
2  1   6             2  0   3
3  0   9             3  2   4
3  1  10             3  1   6
3  2  12             3  0   7
4  0  17             4  3   8
4  1  18             4  2  12
4  2  20             4  1  14
4  3  24             4  0  15
5  0  33             5  4  16
5  1  34             5  3  24
5  2  36             5  2  28
5  3  40             5  1  30
5  4  48             5  0  31
For the repeating patterns, it is a relatively easy matter to calculate the product by using a multiply-by-constant method. The result can be calculated in a small number of instructions by taking advantage of the repetition in the pattern.

The actual multiply is slightly unusual due to the need to return the top 32 bits of the 64-bit result. It is efficient to calculate just the top 32 bits. Consider this fragment of the divide-by-ten code (x is the input dividend as used in the above equations):

        SUB     a1, x, x, lsr #2        ; a1 = x*%0.11000000000000000000000000000000
        ADD     a1, a1, a1, lsr #4      ; a1 = x*%0.11001100000000000000000000000000
        ADD     a1, a1, a1, lsr #8      ; a1 = x*%0.11001100110011000000000000000000
        ADD     a1, a1, a1, lsr #16     ; a1 = x*%0.11001100110011001100110011001100
        MOV     a1, a1, lsr #3          ; a1 = x*%0.00011001100110011001100110011001
The SUB calculates (for example):

        a1 = x - x/4
           = x - x*%0.01
           = x*%0.11

Therefore, just five instructions are needed to perform the multiply.

A small problem is caused by calculating just the top 32 bits, as this ignores any carry from the low 32 bits of the 64-bit product. Fortunately, this can be corrected. A correct divide would round down, so the remainder can be calculated by:

        x - (x/10)*10 = 0..9

By making good use of the ARM's barrel shifter, it takes just two ARM instructions to perform this multiply-by-10 and subtract. In the case when (x/10) is too small by 1 (if carry has been lost), the remainder will be in the range 10..19, in which case corrections must be applied. This test would require a compare-with-10 instruction, but this can be combined with other operations to save an instruction (see below).

When a lost carry is detected, both the quotient and remainder must be fixed up (one instruction each). The following fragments should explain the full divide-by-10 code.

ARM code
div10
; takes argument in a1
; returns quotient in a1, remainder in a2
; cycles could be saved if only divide or remainder is required
        SUB     a2, a1, #10             ; keep (x-10) for later
        SUB     a1, a1, a1, lsr #2
        ADD     a1, a1, a1, lsr #4
        ADD     a1, a1, a1, lsr #8
        ADD     a1, a1, a1, lsr #16
        MOV     a1, a1, lsr #3
        ADD     a3, a1, a1, asl #2
        SUBS    a2, a2, a3, asl #1      ; calc (x-10) - (x/10)*10
        ADDPL   a1, a1, #1              ; fix-up quotient
        ADDMI   a2, a2, #10             ; fix-up remainder
        MOV     pc, lr

The optimisation which eliminates the compare-with-10 instruction is to keep (x-10) for use in the subtraction to calculate the remainder. This means that compare-with-0 is required instead, which is easily achieved by adding an S (to set the flags) to the SUB opcode. This also means that the subtraction has to be undone if no rounding error occurred (which is why the ADDMI instruction is used).
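The behaviour of the div10 routine, including the fix-up step, can be explored with a Python model that keeps every intermediate value within 32 bits. This is a sketch for experimentation only: the function name div10 mirrors the label above, and the variables a1, a2 and a3 stand in for the registers of the same names.

```python
MASK = 0xFFFFFFFF

def div10(x):
    """Model of div10 for an unsigned 32-bit x; returns (quotient, remainder)."""
    a2 = (x - 10) & MASK                  # keep (x-10) for later
    a1 = (x - (x >> 2)) & MASK            # x * %0.11
    a1 = (a1 + (a1 >> 4)) & MASK          # x * %0.110011
    a1 = (a1 + (a1 >> 8)) & MASK
    a1 = (a1 + (a1 >> 16)) & MASK
    a1 >>= 3                              # approximately x/10, possibly 1 too small
    a3 = (a1 + (a1 << 2)) & MASK          # a3 = 5 * a1
    a2 = (a2 - (a3 << 1)) & MASK          # (x-10) - (x/10)*10
    if a2 & 0x80000000:                   # MI: quotient was right, undo the -10
        a2 = (a2 + 10) & MASK
    else:                                 # PL: carry was lost, fix up the quotient
        a1 += 1
    return a1, a2
```

Comparing the result with Python's built-in divmod(x, 10) over a range of inputs is a quick way to see both branches of the fix-up being exercised.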
THUMB code
udiv10
; takes argument in a1
; returns quotient in a1, remainder in a2
        MOV     a2, a1
        LSR     a3, a1, #2
        SUB     a1, a3
        LSR     a3, a1, #4
        ADD     a1, a3
        LSR     a3, a1, #8
        ADD     a1, a3
        LSR     a3, a1, #16
        ADD     a1, a3
        LSR     a1, #3
        ASL     a3, a1, #2
        ADD     a3, a1
        ASL     a3, #1
Memory Interface
Overview
ARM7TDMI's memory interface consists of the following basic elements:

32-bit address bus
    This specifies to memory the location to be used for the transfer.

32-bit data bus
    Instructions and data are transferred across this bus. Data may be word, halfword or byte wide in size. ARM7TDMI includes a bidirectional data bus, D[31:0], plus separate unidirectional data busses, DIN[31:0] and DOUT[31:0]. Most of the text in this chapter describes the bus behaviour assuming that the bidirectional bus is in use. However, the behaviour applies equally to the unidirectional busses.

Control signals
    These specify, for example, the size of the data to be transferred, and the direction of the transfer together with providing privileged information.

This collection of signals allows ARM7TDMI to be simply interfaced to DRAM, SRAM and ROM. To fully exploit page mode access to DRAM, information is provided on whether or not the memory accesses are sequential. In general, interfacing to static memories is much simpler than interfacing to dynamic memory.
Cycle Types
All memory transfer cycles can be placed in one of four categories:

1. Non-sequential cycle. ARM7TDMI requests a transfer to or from an address which is unrelated to the address used in the preceding cycle.
2. Sequential cycle. ARM7TDMI requests a transfer to or from an address which is either the same as the address in the preceding cycle, or is one word or halfword after the preceding address.
3. Internal cycle. ARM7TDMI does not require a transfer, as it is performing an internal function and no useful prefetching can be performed at the same time.
4. Coprocessor register transfer. ARM7TDMI wishes to use the data bus to communicate with a coprocessor, but does not require any action by the memory system.

These four classes are distinguishable to the memory system by inspection of the nMREQ and SEQ control lines (see Table 31). These control lines are generated during phase 1 of the cycle before the cycle whose characteristics they forecast, and this pipelining of the control information gives the memory system sufficient time to decide whether or not it can use a page mode access.

Table 31. Memory Cycle Types

nMREQ   SEQ   Cycle type
0       0     Non-sequential (N-cycle)
0       1     Sequential (S-cycle)
1       0     Internal (I-cycle)
1       1     Coprocessor register transfer (C-cycle)

Figure 58. ARM Memory Cycle Timing (cycles shown: N-cycle, S-cycle, I-cycle, C-cycle)
Figure 58 shows the pipelining of the control signals, and suggests how the DRAM address strobes (nRAS and nCAS) might be timed to use page mode for S-cycles. Note that the N-cycle is longer than the other cycles. This is to allow for the DRAM precharge and row access time, and is not an ARM7TDMI requirement.
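The decoding in Table 31 is simple enough to express as a lookup. The Python below is a sketch of how a memory system model might classify the upcoming cycle. The names CYCLE_TYPES and cycle_type are hypothetical.

```python
# (nMREQ, SEQ) -> cycle type, as listed in Table 31
CYCLE_TYPES = {
    (0, 0): "N-cycle",   # non-sequential
    (0, 1): "S-cycle",   # sequential
    (1, 0): "I-cycle",   # internal
    (1, 1): "C-cycle",   # coprocessor register transfer
}

def cycle_type(nmreq, seq):
    """Classify the next memory cycle from the sampled control lines."""
    return CYCLE_TYPES[(nmreq, seq)]
```

Because the control lines are pipelined, a model like this would sample them one cycle ahead of the access they describe.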
[Figure 58 signal traces: MCLK, A[31:0] (a, a+4, a+8), nMREQ, SEQ, nRAS, nCAS, D[31:0]]
When an S-cycle follows an N-cycle, the address will always be one word or halfword greater than the address used in the N-cycle. This address (marked a in the above diagram) should be checked to ensure that it is not the last in the DRAM page before the memory system commits to the S-cycle. If it is at the page end, the S-cycle cannot be performed in page mode and the memory system will have to perform a full access. The processor clock must be stretched to match the full access.

When an S-cycle follows an I-cycle, the address will be the same as that used in the I-cycle. This fact may be used to start the DRAM access during the preceding cycle, which enables the S-cycle to run at page mode speed whilst performing a full DRAM access. This is shown in Figure 59.

Figure 59. Memory Cycle Optimization (an I-cycle followed by an S-cycle; signals shown: MCLK, A[31:0], nMREQ, SEQ, nRAS, nCAS, D[31:0])
Address Timing
ARM7TDMI's address bus can operate in one of two configurations, pipelined or depipelined, and this is controlled by the APE input signal. The configurability is provided to ease the design-in of ARM7TDMI to both SRAM and DRAM based systems. It is a requirement of SRAMs and ROMs that the address be held stable throughout the memory cycle. In a system containing SRAM and ROM only, APE may be tied permanently LOW, producing the desired address timing. This is shown in Figure 60.

Note: APE affects the timing of the address bus A[31:0], plus nRW, MAS[1:0], LOCK, nOPC and nTRANS.

Figure 60. ARM7TDMI De-Pipelined Addresses (signals shown: MCLK, APE, nMREQ, SEQ, A[31:0], D[31:0])
In a DRAM based system, it is desirable to obtain the address from ARM7TDMI as early as possible. When APE is HIGH, ARM7TDMI's address becomes valid in the MCLK high phase before the memory cycle to which it refers. This timing allows longer for address decoding and the generation of DRAM control signals. Figure 61 shows the effect on the timing when APE is HIGH.
Figure 61. ARM7TDMI Pipelined Addresses (signals shown: MCLK, APE, nMREQ, SEQ, A[31:0], D[31:0])
Many systems will contain a mixture of DRAM and SRAM/ROM. To cater for the different address timing requirements, APE may be safely changed during the low phase of MCLK. Typically, APE would be held at one level during a burst of sequential accesses to one type of memory. When a non-sequential access occurs, the timing of most systems enforces a wait state to allow for address decoding. As a result of the address decode, APE can be driven to the correct value for the particular bank of memory being accessed. The value of APE can be held until the memory control signals denote another non-sequential access.

By way of an example, Figure 62 shows a combination of accesses to a mixed DRAM / SRAM system. Here, the SRAM has zero wait states, and the DRAM has a 2:1 N-cycle / S-cycle ratio. A single wait state is inserted for address decode when a non-sequential access occurs. Typical, externally generated DRAM control signals are also shown.
[Figure 62 signal traces: MCLK, nMREQ, SEQ, A[31:0] (A+4, A+8, B+4, B+8, C+4, C+8), nRW, nWAIT, APE, D[31:0], DBE, nRAS, nCAS]
Previous ARM processors included the ALE signal, and this is retained for backwards compatibility. This signal also allows the address timing to be modified to achieve the same results as APE, but in an asynchronous manner. To obtain clean MCLK low timing of the address bus by this mechanism, ALE must be driven HIGH with the falling edge of MCLK, and LOW with the rising edge of MCLK. ALE can simply be the inverse of MCLK but the delay from MCLK to ALE must be carefully controlled such that the Tald timing constraint is achieved. Figure 63 shows how ALE can be used to achieve SRAM compatible address timing. Refer to Timing Diagrams on page 189 for details of the exact timing constraints.

Figure 63. SRAM Compatible Address Timing (signals shown: MCLK, APE, ALE, nMREQ, SEQ, A[31:0], D[31:0])
Note: If ALE is to be used to change address timing, then APE must be tied HIGH. Similarly, if APE is to be used, ALE must be tied HIGH.
Instruction Fetch
ARM7TDMI will perform 32- or 16-bit instruction fetches depending on whether the processor is in ARM or THUMB state. The processor state may be determined externally by the value of the TBIT signal. When this is LOW, the processor is in ARM state and 32-bit instructions are fetched. When TBIT is HIGH, the processor is in THUMB state and 16-bit instructions are fetched. The size of the data being fetched is also indicated on the MAS[1:0] bits, as described above.

When the processor is in ARM state, 32-bit instructions are fetched on D[31:0]. When the processor is in THUMB state, 16-bit instructions are fetched from either the upper, D[31:16], or the lower D[15:0] half of the bus. This is determined by the endianism of the memory system, as configured by the BIGEND input, and the state of A[1]. Table 32 shows which half of the data bus is sampled in the different configurations.

Table 32. Endianism Effect on Instruction Position

           Little (BIGEND = 0)   Big (BIGEND = 1)
A[1] = 0   D[15:0]               D[31:16]
A[1] = 1   D[31:16]              D[15:0]
When a 16-bit instruction is fetched, ARM7TDMI ignores the unused half of the data bus. Table 32 describes instructions fetched from the bidirectional data bus (i.e. BUSEN is LOW). When the unidirectional data busses are in use (i.e. BUSEN is HIGH), data will be fetched from the corresponding half of the DIN[31:0] bus.
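The selection in Table 32 amounts to an exclusive-OR of BIGEND and A[1]. The following Python sketch shows how a bus model might pick out a THUMB instruction from a 32-bit bus value. The function names thumb_fetch_lanes and extract_thumb_instruction are hypothetical.

```python
def thumb_fetch_lanes(bigend, a1):
    """Return which half of the data bus holds the 16-bit instruction."""
    upper = bool(bigend) != bool(a1)      # upper half when BIGEND xor A[1]
    return "D[31:16]" if upper else "D[15:0]"

def extract_thumb_instruction(bus_word, bigend, a1):
    """Pick the sampled halfword out of a 32-bit bus value."""
    if thumb_fetch_lanes(bigend, a1) == "D[31:16]":
        return (bus_word >> 16) & 0xFFFF
    return bus_word & 0xFFFF
```

For instance, in a little-endian system with A[1] = 1 the instruction is taken from the upper half, so a bus value of 0xE7FE0000 yields 0xE7FE.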
[Figure: signals shown: A[0], A[1], MAS[0], MAS[1], MCLK, CAS, NCAS0, NCAS3]
Memory Management
The ARM7TDMI address bus may be processed by an address translation unit before being presented to the memory, and ARM7TDMI is capable of running a virtual memory system. The ABORT input to the processor may be used by the memory manager to inform ARM7TDMI of page faults. Various other signals enable different page protection levels to be supported:

1. nRW can be used by the memory manager to protect pages from being written to.
2. nTRANS indicates whether the processor is in user or a privileged mode, and may be used to protect system pages from the user, or to support completely separate mappings for the system and the user.

Address translation will normally only be necessary on an N-cycle, and this fact may be exploited to reduce power consumption in the memory manager and avoid the translation delay at other times. The times when translation is necessary can be deduced by keeping track of the cycle types that the processor uses.
Locked Operations
The ARM instruction set of ARM7TDMI includes a data swap (SWP) instruction that allows the contents of a memory location to be swapped with the contents of a processor register. This instruction is implemented as an uninterruptable pair of accesses; the first access reads the contents of the memory, and the second writes the register data to the memory. These accesses must be treated as a contiguous operation by the memory controller to prevent another device from changing the affected memory location before the swap is completed. ARM7TDMI drives the LOCK signal HIGH for the duration of the swap operation to warn the memory controller not to give the memory to another device.
The ARM Data Bus
To ease the connection of ARM7TDMI to sub-word sized memory systems, input data and instructions may be latched on a byte by byte basis. This is achieved by use of the BL[3:0] input signals where BL[3] controls the latching of the data present on D[31:24] of the data bus and so on.

In a memory system containing word wide memory only, BL[3:0] may be tied HIGH. For sub word wide memory systems, BL[3:0] are used to latch the data as it is read out of memory. For example, a word access to halfword wide memory must take place in two memory cycles. In the first cycle, the data for D[15:0] is obtained from the memory and latched into the processor on the falling edge of MCLK when BL[1:0] are both HIGH. In the second cycle, the data for D[31:16] is latched into the processor on the falling edge of MCLK when BL[3:2] are both HIGH.

A memory access like this is shown in Figure 65. Here, a word access is performed from halfword wide memory in two cycles. In the first, the data read is applied to the lower half of the bus; in the second cycle the read data is applied to the upper half of the bus. Since two memory cycles were required, nWAIT is used to stretch the internal processor clock. However, nWAIT does not affect the operation of the data latches. In this way, data may be extracted from memory word, halfword or byte at a time, and the memory may have as many wait states as required. In any multi-cycle memory access, nWAIT is held LOW until the final quantum of data is latched.

In this example, BL[3:0] were driven to value 0x3 in the first cycle so that only the latches on D[15:0] were opened. In fact, BL[3:0] could have been driven to value 0xF and all the latches opened. Since in the second cycle, the latches on D[31:16] were written with the correct data, this would not have affected the processor's operation.

Note: BL[3:0] should all be HIGH during store cycles.

Figure 65. Memory Access (signals shown: MCLK, APE, nMREQ, SEQ, A[31:0], nWAIT, D[31:0], D[31:16], BL[3:0])
As a further example, a halfword load from 2-wait state byte wide memory is shown in Figure 66. Here, each memory access takes two cycles. In the first access, BL[3:0] are driven to value 0xF. The correct data is latched from D[7:0] whilst unknown data is latched from D[31:8]. In the second access, the byte for D[15:8] is latched and so the halfword on D[15:0] has been correctly read from the memory. The fact that internally D[31:16] are unknown does not matter because internally the processor will extract only the halfword it is interested in.
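The byte-lane latching controlled by BL[3:0] can be modelled in a few lines of Python. This is a behavioural sketch only, with no timing: the function name latch and its arguments are hypothetical, and each bit of bl is taken to enable the latch for one byte of the bus, with BL[0] controlling D[7:0].

```python
def latch(latched, bus, bl):
    """Update the latched 32-bit value from the bus for each enabled byte lane."""
    for lane in range(4):
        if bl & (1 << lane):              # BL[lane] HIGH: this byte is latched
            mask = 0xFF << (8 * lane)
            latched = (latched & ~mask) | (bus & mask)
    return latched & 0xFFFFFFFF
```

For the word access from halfword wide memory, calling latch with bl = 0x3 and then bl = 0xC assembles the word from two halves. Opening all latches with 0xF in the first cycle gives the same final word, because the upper half is overwritten in the second cycle.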
Figure 66. Two-Cycle Memory Access (signals shown: MCLK, APE, nMREQ, SEQ, A[31:0], nWAIT, D[7:0], D[15:8], BL[3:0] driven to 0xF then 0x2)
The External Data Bus
ARM7TDMI has a bidirectional data bus, D[31:0]. However, since some ASIC design methodologies prohibit the use of bidirectional buses, unidirectional data in, DIN[31:0], and data out, DOUT[31:0], busses are also provided. The logical arrangement of these buses is shown in Figure 67.

Figure 67. ARM7TDMI External Bus Arrangement (blocks and buses shown: ICEbreaker, ARM7TDMI, DIN[31:0], D[31:0], DOUT[31:0])
When the bidirectional data bus is being used, the unidirectional busses must be disabled by driving BUSEN LOW. The timing of the bus for three cycles, load-store-load, is shown in Figure 68.

Figure 68. Bidirectional Bus Timing (cycles shown: Read Cycle, Store Cycle, Read Cycle; signals shown include MCLK and APE)
value on DOUT[31:0] changes off the falling edge of MCLK. The bus timing of a read-write-read cycle combination is shown in Figure 69. When BUSEN is LOW, the buffer between DIN[31:0] and D[31:0] is disabled. Any data presented on DIN[31:0] is ignored. Also, when BUSEN is LOW, the value on DOUT[31:0] is forced to 0x00000000.

Typically, the unidirectional busses would be used internally in ASIC embedded applications. Externally, most systems still require a bidirectional data bus to interface to external memory. Figure 70 shows how the unidirectional busses may be joined up at the pads of an ASIC to connect to an external bidirectional bus.

[Figures 69 and 70: signals shown: MCLK, DIN[31:0] (D1, D2), DOUT[31:0] (Dout), D[31:0] (D1, Dout, D2); pad connection: PAD, XDATA[31:0]]
The bidirectional data bus
ARM7TDMI has a bidirectional data bus, D[31:0]. Most of the time, the ARM reads from memory and so this bus is configured to input. During write cycles however, the ARM7TDMI must output data. During phase 2 of the previous cycle, the signal nRW is driven HIGH to indicate a write cycle. During the actual cycle, nENOUT is driven LOW to indicate that the ARM7TDMI is driving D[31:0] as an output. Figure 71 shows this bus timing (DBE has been tied HIGH in this example). Figure 73 on page 133 shows the circuit which exists in ARM7TDMI for controlling exactly when the external bus is driven out.

Figure 71. Data Write Bus Cycle (one memory cycle; signals shown: MCLK, A[31:0], nRW, nENOUT, D[31:0])
The ARM7TDMI macrocell has an additional bus control signal, nENIN, which allows the external system to manually tristate the bus. In the simplest systems, nENIN can be tied LOW and nENOUT can be ignored. However, in many applications where the external data bus is a shared resource, greater control may be required. In this situation, nENIN can be used to delay when the external bus is driven. Note that for backwards compatibility, DBE is also included. At the macrocell level, DBE and nENIN have almost identical functionality and in most applications one can be tied off.
The section Example system: The ARM7TDMI Testchip on page 133 describes how ARM7TDMI may be interfaced to an external data bus, using the ARM7TDMI Testchip as an example.
ARM7TDMI has another output control signal called TBE. This signal is normally only used during test and must be tied HIGH when not in use. When driven LOW, TBE forces all three-stateable outputs to high impedance. It is as if both DBE and ABE have been driven LOW, causing the data bus, the address bus, and all other signals normally controlled by ABE to become high impedance. Note, however, that there is no scan cell on TBE. Thus, TBE is completely independent of scan data and may be used to put the outputs into a high impedance state while scan testing takes place.
Table 33 shows the tri-state control of ARM7TDMI's outputs. Signals without a mark in the ABE, DBE or TBE column cannot be driven to the high impedance state.
Table 33. Output Enable Control Summary
ARM7TDMI output: A[31:0], D[31:0], nRW, LOCK, MAS[1:0], nOPC, nTRANS, DBGACK
Control columns: ABE, DBE, TBE
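The effect of the three output enables can be sketched in a few lines of Python. This is only an illustrative model: the function name and the grouping of signals under ABE and DBE are assumptions, taken from the TBE description above and from the list of signals in the HIGHZ instruction description, not from the marks in Table 33. It ignores nENIN, nENOUT and whether the core is actually driving the data bus.

```python
# Assumed grouping (see lead-in): D[31:0] is gated by DBE, the address bus
# and the remaining three-stateable control outputs are gated by ABE.
ABE_CONTROLLED = ("A[31:0]", "nRW", "LOCK", "MAS[1:0]", "nOPC", "nTRANS")
DBE_CONTROLLED = ("D[31:0]",)


def high_impedance_outputs(abe: int, dbe: int, tbe: int) -> set:
    """Return the outputs forced to high impedance (1 = HIGH, 0 = LOW)."""
    hiz = set()
    # TBE LOW behaves as if both ABE and DBE had been driven LOW.
    if tbe == 0 or abe == 0:
        hiz.update(ABE_CONTROLLED)
    if tbe == 0 or dbe == 0:
        hiz.update(DBE_CONTROLLED)
    return hiz


if __name__ == "__main__":
    print(sorted(high_impedance_outputs(abe=1, dbe=0, tbe=1)))
```

With all three enables HIGH the function returns an empty set; with TBE LOW it returns every signal in both groups regardless of ABE and DBE.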
[Figure: circuit diagram showing Scan Cell, Core Control, DBE, nENOUT, nENIN, TBE and D[31:0]]
Example system: The ARM7TDMI Testchip
Connecting ARM7TDMI's data bus, D[31:0], to an external shared bus requires some simple additional logic. This will vary from application to application. As an example, the following describes how the ARM7TDMI macrocell was connected to the bidirectional data bus pads of the ARM7TDMI testchip.
In this application, care must be taken to prevent bus clash on D[31:0] when the data bus drive changes direction. The timing of nENIN and the pad control signals must be arranged so that when the core starts to drive out, the pad drive onto D[31:0] switches off before the core starts to drive. Similarly, when the bus switches back to input, the core must stop driving before the pad switches on.
All this can be achieved using a simple non-overlapping clock generator. The actual circuit implemented in the ARM7TDMI testchip is shown in Figure 73. Note that at the core level, TBE and DBE are tied HIGH (inactive). This is because in a packaged part, there is no need to ever manually force the internal buses into a high impedance state. Note also that at the pad level, the signal EDBE is factored into the bus control logic. This allows the external memory controller to arbitrate the bus and asynchronously disable the ARM7TDMI testchip if required.
Figure 73. The ARM7TDMI Testchip Data Bus Circuit
Figure 74 shows how the various control signals interact. Under normal conditions, when the data bus is configured as input, nENOUT is HIGH, nEN1 is LOW, and nEN2/nENIN is HIGH. Thus the pads drive XD[31:0] onto D[31:0].
When a write cycle occurs, nRW is driven HIGH to indicate a write during phase 2 of the previous cycle (ie, with the address). During phase 1 of the actual cycle, nENOUT is driven LOW to indicate that ARM7TDMI is about to drive the bus. The falling edge of this signal makes nEN1 go HIGH, which disables the input half of the pad from driving D[31:0]. This in turn makes nEN2 go LOW, which enables the output half of the pad so that the ARM7TDMI is now driving the external data bus, XD[31:0]. nEN2 is then buffered and driven back into the core on nENIN, so that finally the ARM7TDMI macrocell drives D[31:0]. The delay between all the signals ensures that there is no clash on the data bus as it changes direction from input to output.
[Figure 74: waveforms for nEN1, nEN2/nENIN and D[31:0]]
When the bus turns around to the other direction at the end of the cycle, the various control signals switch the other way. Again, the non-overlap ensures that there is never a bus clash. This time, nENOUT is driven HIGH to denote that ARM7TDMI no longer needs to drive the bus, and the core's output is immediately switched off. This causes nEN2 to disable the output half of the pad, which in turn causes nEN1 to switch on the input half. Thus, the bus is back to its original input configuration.
Note that the data out time of ARM7TDMI is not directly determined by nENOUT and nENIN, and so delaying exactly when the bus is driven will not affect the propagation delay. Please refer to Timing Diagrams on page 189 for timing details.
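The turnaround sequence described for the testchip can be checked with a small behavioural model. The sketch below is an assumption-laden simplification: the class and function names are invented for illustration, delays are reduced to an ordered list of signal changes, and the core is taken to drive D[31:0] only when both nENOUT and nENIN (the buffered nEN2) are LOW.

```python
from dataclasses import dataclass


@dataclass
class PadBusState:
    nENOUT: int = 1  # HIGH: core does not need to drive D[31:0]
    nEN1: int = 0    # LOW: input half of the pad drives XD[31:0] onto D[31:0]
    nEN2: int = 1    # LOW: output half of the pad on; also fed back as nENIN

    def pad_drives_d(self) -> bool:
        return self.nEN1 == 0

    def core_drives_d(self) -> bool:
        return self.nENOUT == 0 and self.nEN2 == 0

    def clash(self) -> bool:
        return self.pad_drives_d() and self.core_drives_d()


# Order of signal changes taken from the text above.
TURN_TO_OUTPUT = [("nENOUT", 0), ("nEN1", 1), ("nEN2", 0)]
TURN_TO_INPUT = [("nENOUT", 1), ("nEN2", 1), ("nEN1", 0)]


def apply_steps(state: PadBusState, steps) -> list:
    """Apply each change in order and record whether D[31:0] ever clashes."""
    clashes = []
    for signal, level in steps:
        setattr(state, signal, level)
        clashes.append(state.clash())
    return clashes


if __name__ == "__main__":
    bus = PadBusState()
    print(apply_steps(bus, TURN_TO_OUTPUT), apply_steps(bus, TURN_TO_INPUT))
```

Swapping the last two steps of either list makes the pad and the core drive D[31:0] at the same time in this model, which is the clash the non-overlapping arrangement is there to avoid.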
Coprocessor Interface
The functionality of the ARM7TDMI instruction set can be extended by adding external coprocessors. This chapter describes the ARM7TDMI coprocessor interface.
Overview
The functionality of the ARM7TDMI instruction set may be extended by the addition of up to 16 external coprocessors. When the coprocessor is not present, instructions intended for it will trap, and suitable software may be installed to emulate its functions. Adding the coprocessor will then increase the system performance in a software compatible way. Note that some coprocessor numbers have already been assigned. Contact ARM Ltd for up-to-date information.
Interface Signals
Three dedicated signals control the coprocessor interface: nCPI, CPA and CPB. The CPA and CPB inputs should be driven HIGH except when they are being used for handshaking.
Coprocessor present/absent
ARM7TDMI takes nCPI LOW whenever it starts to execute a coprocessor (or undefined) instruction. (This will not happen if the instruction fails to be executed because of the condition codes.) Each coprocessor will have a copy of the instruction, and can inspect the CP# field to see which coprocessor it is for. Every coprocessor in a system must have a unique number, and if that number matches the contents of the CP# field the coprocessor should drive the CPA (coprocessor absent) line LOW. If no coprocessor has a number which matches the CP# field, CPA and CPB will remain HIGH, and ARM7TDMI will take the undefined instruction trap. Otherwise ARM7TDMI observes the CPA line going LOW, and waits until the coprocessor is not busy.
Pipeline following
In order to respond correctly when a coprocessor instruction arises, each coprocessor must have a copy of the instruction. All ARM7TDMI instructions are fetched from memory via the main data bus, and coprocessors are connected to this bus, so they can keep copies of all instructions as they go into the ARM7TDMI pipeline. The nOPC signal indicates when an instruction fetch is taking place, and MCLK gives the timing of the transfer, so these may be used together to load an instruction pipeline within the coprocessor.
Busy-waiting
If CPA goes LOW, ARM7TDMI will watch the CPB (coprocessor busy) line. Only the coprocessor which is driving CPA LOW is allowed to drive CPB LOW, and it should do so when it is ready to complete the instruction. ARM7TDMI will busy-wait while CPB is HIGH, unless an enabled interrupt occurs, in which case it will break off from the coprocessor handshake to process the interrupt. Normally ARM7TDMI will return from processing the interrupt to retry the coprocessor instruction.
When CPB goes LOW, the instruction continues to completion. This will involve data transfers taking place between the coprocessor and either ARM7TDMI or memory, except in the case of coprocessor data operations, which complete as soon as the coprocessor ceases to be busy.
All three interface signals are sampled by both ARM7TDMI and the coprocessor(s) on the rising edge of MCLK. If all three are LOW, the instruction is committed to execution, and if transfers are involved they will start on the next cycle. If nCPI has gone HIGH after being LOW, and before the instruction is committed, ARM7TDMI has broken off from the busy-wait state to service an interrupt. The instruction may be restarted later, but other coprocessor instructions may come sooner, and the instruction should be discarded.
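The handshake on nCPI, CPA and CPB reduces to a small decision made on each rising edge of MCLK. The following sketch shows one way to express it; the function name and the returned strings are illustrative assumptions, and interrupts are not modelled.

```python
def handshake_outcome(nCPI: int, CPA: int, CPB: int) -> str:
    """Classify the sampled interface signals (1 = HIGH, 0 = LOW)."""
    if nCPI:
        # No coprocessor (or undefined) instruction is being offered.
        return "idle"
    if CPA:
        # No coprocessor claimed the instruction.
        return "undefined instruction trap"
    if CPB:
        # A coprocessor is present but not yet ready.
        return "busy-wait"
    # All three signals LOW.
    return "committed"


if __name__ == "__main__":
    for levels in [(1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]:
        print(levels, handshake_outcome(*levels))
```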
Register Transfer Cycle
The coprocessor register transfer cycle is the one case when ARM7TDMI requires the data bus without requiring the memory to be active. The memory system is informed that the bus is required by ARM7TDMI taking both nMREQ and SEQ HIGH. When the bus is free, DBE should be taken HIGH to allow ARM7TDMI or the coprocessor to drive the bus, and an MCLK cycle times the transfer.
Idempotency
A consequence of the implementation of the coprocessor interface, with the interruptible busy-wait state, is that all instructions may be interrupted at any point up to the time when the coprocessor goes not-busy. If so interrupted, the instruction will normally be restarted from the beginning after the interrupt has been processed. It is therefore essential that any action taken by the coprocessor before it goes not-busy must be idempotent, ie must be repeatable with identical results.
For example, consider a FIX operation in a floating point coprocessor which returns the integer result to an ARM7TDMI register. The coprocessor must stay busy while it performs the floating point to fixed point conversion, as ARM7TDMI will expect to receive the integer value on the cycle immediately following that where it goes not-busy. The coprocessor must therefore preserve the original floating point value and not corrupt it during the conversion, because it will be required again if an interrupt arises during the busy period.
The coprocessor data operation class of instruction is not generally subject to idempotency considerations, as the processing activity can take place after the coprocessor goes not-busy. There is no need for ARM7TDMI to be held up until the result is generated, because the result is confined to stay within the coprocessor.
Privileged Instructions
The coprocessor may restrict certain instructions for use in privileged modes only. To do this, the coprocessor will have to track the nTRANS output. As an example of the use of this facility, consider the case of a floating point coprocessor (FPU) in a multi-tasking system. The operating system could save all the floating point registers on every task switch, but this is inefficient in a typical system where only one or two tasks will use floating point operations. Instead, there could be a privileged instruction which turns the FPU on or off. When a task switch happens, the operating system can turn the FPU off without saving its registers. If the new task attempts an FPU operation, the FPU will appear to be absent, causing an undefined instruction trap. The operating system will then realise that the new task requires the FPU, so it will reenable it and save FPU registers. The task can then use the FPU as normal. If, however, the new task never attempts an FPU operation (as will be the case for most tasks), the state saving overhead will have been avoided.
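The FPU switching scheme described above can be sketched as a small simulation. Everything here is hypothetical: the class, its attribute names, the eight-entry register file and the choice to restore a task's previously saved registers are assumptions made for illustration, not part of the ARM7TDMI interface.

```python
class LazyFpuKernel:
    """Toy operating system that saves FPU registers only on demand."""

    def __init__(self):
        self.fpu_enabled = False
        self.fpu_owner = None         # task whose values are in the FPU
        self.current = None
        self.fpu_regs = [0.0] * 8     # hypothetical register file
        self.saved = {}               # task -> saved register contents
        self.saves = 0

    def task_switch(self, task):
        # Privileged operation: turn the FPU off, save nothing.
        self.current = task
        self.fpu_enabled = False

    def fpu_op(self, value):
        if not self.fpu_enabled:
            # The FPU appears absent, so the instruction traps.
            self.undefined_trap()
        self.fpu_regs[0] = value

    def undefined_trap(self):
        self.fpu_enabled = True
        if self.fpu_owner not in (None, self.current):
            # Another task's state is in the FPU: save it now.
            self.saved[self.fpu_owner] = list(self.fpu_regs)
            self.saves += 1
            self.fpu_regs = list(self.saved.get(self.current, [0.0] * 8))
        self.fpu_owner = self.current


if __name__ == "__main__":
    k = LazyFpuKernel()
    k.task_switch("A"); k.fpu_op(1.5)
    k.task_switch("B")            # B never touches the FPU
    k.task_switch("A"); k.fpu_op(2.5)
    print(k.saves)
```

In this run task B never uses the FPU, so no registers are saved when control returns to task A.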
Undefined Instructions
Undefined instructions are treated by ARM7TDMI as coprocessor instructions. All coprocessors must be absent (ie CPA and CPB must be HIGH) when an undefined instruction is presented. ARM7TDMI will then take the undefined instruction trap. Note that the coprocessor need only look at bit 27 of the instruction to differentiate undefined instructions (which all have 0 in bit 27) from coprocessor instructions (which all have 1 in bit 27).
Note that when in THUMB state, coprocessor instructions are not supported but undefined instructions are. Thus, all coprocessors must monitor the state of the TBIT output from ARM7TDMI. When ARM7TDMI is in THUMB state, coprocessors must appear absent (ie they must drive CPA and CPB HIGH) and the instructions seen on the data bus must be ignored. In this way, coprocessors will not erroneously execute THUMB instructions, and all undefined instructions will be handled correctly.
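A coprocessor's decision whether to claim an instruction can be sketched as follows. The function name is hypothetical, and the position of the CP# field (bits 11:8) is taken from the ARM instruction encoding rather than from this chapter, so treat it as an assumption. The function is meant to be applied only to an instruction for which nCPI has gone LOW.

```python
def should_drive_cpa_low(instr: int, tbit: int, my_cp_number: int) -> bool:
    """Decide whether this coprocessor claims the instruction."""
    if tbit:
        # THUMB state: appear absent and ignore the data bus.
        return False
    if ((instr >> 27) & 1) == 0:
        # Bit 27 clear: undefined instruction, every coprocessor stays absent.
        return False
    cp_number = (instr >> 8) & 0xF  # assumed CP# field position
    return cp_number == my_cp_number


if __name__ == "__main__":
    MRC_P15 = 0xEE100F10  # example coprocessor register transfer for CP# 15
    print(should_drive_cpa_low(MRC_P15, tbit=0, my_cp_number=15))
```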
Debug Interface
Overview
The ARM7TDMI debug interface is based on IEEE Std. 1149.1-1990, Standard Test Access Port and Boundary-Scan Architecture. Please refer to this standard for an explanation of the terms used in this chapter and for a description of the TAP controller states.
ARM7TDMI contains hardware extensions for advanced debugging features. These are intended to ease the user's development of application software, operating systems, and the hardware itself.
The debug extensions allow the core to be stopped either on a given instruction fetch (breakpoint) or data access (watchpoint), or asynchronously by a debug-request. When this happens, ARM7TDMI is said to be in debug state. At this point, the core's internal state and the system's external state may be examined. Once examination is complete, the core and system state may be restored and program execution resumed.
ARM7TDMI is forced into debug state either by a request on one of the external debug interface signals, or by an internal functional unit known as ICEBreaker. Once in debug state, the core isolates itself from the memory system. The core can then be examined while all other system activity continues as normal.
ARM7TDMI's internal state is examined via a JTAG-style serial interface, which allows instructions to be serially inserted into the core's pipeline without using the external data bus. Thus, when in debug state, a store-multiple (STM) could be inserted into the instruction pipeline and this would dump the contents of ARM7TDMI's registers. This data can be serially shifted out without affecting the rest of the system.
Debug Systems
The ARM7TDMI forms one component of a debug system that interfaces from the high-level debugging performed by the user to the low-level interface supported by ARM7TDMI. Such a system typically has three parts:
1. The Debug Host
This is a computer, for example a PC, running a software debugger such as ARMSD. The debug host allows the user to issue high level commands such as "set breakpoint at location XX", or "examine the contents of memory from 0x0 to 0x100".
2. The Protocol Converter
The Debug Host will be connected to the ARM7TDMI development system via an interface (an RS232, for example). The messages broadcast over this connection must be converted to the interface signals of the ARM7TDMI, and this function is performed by the protocol converter.
3. ARM7TDMI
ARM7TDMI, with hardware extensions to ease debugging, is the lowest level of the system. The debug extensions allow the user to stall the core from program execution, examine its internal state and the state of the memory system, and then resume program execution.
Figure 75. Typical Debug System
The anatomy of ARM7TDMI is shown in Figure 77. The major blocks are:
- ARM7TDMI: This is the CPU core, with hardware support for debug.
- ICEBreaker: This is a set of registers and comparators used to generate debug exceptions (eg breakpoints). This unit is described in ICEBreaker Module on page 163.
- TAP controller: This controls the action of the scan chains via a JTAG serial interface.
The Debug Host and the Protocol Converter are system dependent. The rest of this chapter describes the ARM7TDMI's hardware debug extensions.
Debug Interface Signals
There are three primary external signals associated with the debug interface:
- BREAKPT and DBGRQ, with which the system requests ARM7TDMI to enter debug state.
- DBGACK, which ARM7TDMI uses to flag back to the system that it is in debug state.
External logic can monitor the address and data bus, and flag breakpoints and watchpoints via the BREAKPT pin. The timing is the same for externally generated breakpoints and watchpoints. Data must always be valid around the falling edge of MCLK. If this data is an instruction to be breakpointed, the BREAKPT signal must be HIGH around the next rising edge of MCLK. Similarly, if the data is for a load or store, this can be marked as watchpointed by asserting BREAKPT around the next rising edge of MCLK.
When a breakpoint or watchpoint is generated, there may be a delay before ARM7TDMI enters debug state. When it does, the DBGACK signal is asserted in the HIGH phase of MCLK. The timing for an externally generated breakpoint is shown in Figure 76.
[Figure 76: waveforms for A[31:0], D[31:0], BREAKPT, DBGACK, nMREQ and SEQ across memory cycles and internal cycles]
Entry into debug state on breakpoint
After an instruction has been breakpointed, the core does not enter debug state immediately. Instructions are marked as being breakpointed as they enter ARM7TDMI's instruction pipeline. Thus ARM7TDMI only enters debug state when (and if) the instruction reaches the pipeline's execute stage.
A breakpointed instruction may not cause ARM7TDMI to enter debug state for one of two reasons:
- A branch precedes the breakpointed instruction. When the branch is executed, the instruction pipeline is flushed and the breakpoint is cancelled.
- An exception has occurred. Again, the instruction pipeline is flushed and the breakpoint is cancelled. However, the normal way to exit from an exception is to branch back to the instruction that would have executed next. This involves refilling the pipeline, and so the breakpoint can be re-flagged.
When a breakpointed conditional instruction reaches the execute stage of the pipeline, the breakpoint is always taken and ARM7TDMI enters debug state, regardless of whether the condition was met.
Breakpointed instructions do not get executed: instead, ARM7TDMI enters debug state. Thus, when the internal state is examined, the state before the breakpointed instruction is seen. Once examination is complete, the breakpoint should be removed and program execution restarted from the previously breakpointed instruction.
Entry into debug state on watchpoint
Watchpoints occur on data accesses. A watchpoint is always taken, but the core may not enter debug state immediately. In all cases, the current instruction will complete. If this is a multi-word load or store (LDM or STM), many cycles may elapse before the watchpoint is taken.
Watchpoints can be thought of as being similar to data aborts. The difference, however, is that if a data abort occurs, although the instruction completes, all subsequent changes to ARM7TDMI's state are prevented. This allows the cause of the abort to be cured by the abort handler, and the instruction re-executed. This is not so in the case of a watchpoint. Here, the instruction completes and all changes to the core's state occur (ie load data is written into the destination registers, and base write-back occurs). Thus the instruction does not need to be restarted.
Watchpoints are always taken. If an exception is pending when a watchpoint occurs, the core enters debug state in the mode of that exception.
Entry into debug state on debug-request
ARM7TDMI may also be forced into debug state on debug request. This can be done either through ICEBreaker programming (see ICEBreaker Module on page 163) or by the assertion of the DBGRQ pin. This pin is an asynchronous input and is thus synchronised by logic inside ARM7TDMI before it takes effect. Following synchronisation, the core will normally enter debug state at the end of the current instruction. However, if the current instruction is a busy-waiting access to a coprocessor, the instruction terminates and ARM7TDMI enters debug state immediately (this is similar to the action of nIRQ and nFIQ).
Action of ARM7TDMI in debug state
Once ARM7TDMI is in debug state, nMREQ and SEQ are forced to indicate internal cycles. This allows the rest of the memory system to ignore ARM7TDMI and function as normal. Since the rest of the system continues operation, ARM7TDMI must be forced to ignore aborts and interrupts.
The BIGEND signal should not be changed by the system during debug. If it changes, not only will there be a synchronisation problem, but the programmer's view of ARM7TDMI will change without the debugger's knowledge. nRESET must also be held stable during debug. If the system applies reset to ARM7TDMI (ie nRESET is driven LOW) then ARM7TDMI's state will change without the debugger's knowledge.
The BL[3:0] signals must remain HIGH while ARM7TDMI is clocked by DCLK in debug state to ensure all of the data in the scan cells is correctly latched by the internal logic.
When instructions are executed in debug state, ARM7TDMI outputs (except nMREQ and SEQ) will change asynchronously to the memory system. For example, every time a new instruction is scanned into the pipeline, the address bus will change. Although this is asynchronous it should not affect the system, since nMREQ and SEQ are forced to indicate internal cycles regardless of what the rest of ARM7TDMI is doing. The memory controller must be designed to ensure that this asynchronous behaviour does not affect the rest of the system.
Scan Chains and JTAG Interface
There are three JTAG style scan chains inside ARM7TDMI. These allow testing, debugging and ICEBreaker programming. The scan chains are controlled from a JTAG style TAP (Test Access Port) controller. For further details of the JTAG specification, please refer to IEEE Std. 1149.1-1990, Standard Test Access Port and Boundary-Scan Architecture.
In addition, support is provided for an optional fourth scan chain. This is intended to be used for an external boundary scan chain around the pads of a packaged device. The control signals provided for this scan chain are described later.
Note: The scan cells are not fully JTAG compliant. The following sections describe the limitations on their use.
Scan chain 0
Scan chain 0 allows access to the entire periphery of the ARM7TDMI core, including the data bus. The scan chain functions allow inter-device testing (EXTEST) and serial testing of the core (INTEST). The order of the scan chain (from SDIN to SDOUTMS) is: data bus bits 0 through 31, the control signals, followed by the address bus bits 31 through 0.
Scan chain 1
Scan chain 1 is a subset of the signals that are accessible through scan chain 0. Access to the core's data bus D[31:0], and the BREAKPT signal is available serially. There are 33 bits in this scan chain, the order being (from serial data in to out): data bus bits 0 through 31, followed by BREAKPT.
Scan chain 2
This scan chain simply allows access to the ICEBreaker registers. Refer to ICEBreaker Module on page 163 for details.
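The 33-bit layout of scan chain 1 can be illustrated with a short helper. This is a sketch under one assumption about interpretation: "from serial data in to out" is read as listing the cell nearest the serial input first, so a bit shifted in first ends up in the cell nearest the output. The function names are hypothetical.

```python
def scan_chain1_cells(data: int, breakpt: int) -> list:
    """Cell contents listed from serial data in to out: D0..D31, BREAKPT."""
    return [(data >> i) & 1 for i in range(32)] + [breakpt & 1]


def shift_in_order(cells: list) -> list:
    """Order in which bits are presented so that they land in `cells`.

    The first bit shifted in travels the full length of the chain, so the
    cell nearest the output (BREAKPT here) must be sent first.
    """
    return list(reversed(cells))


if __name__ == "__main__":
    cells = scan_chain1_cells(0x80000000, breakpt=0)
    print(len(cells), shift_in_order(cells)[:3])
```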
Scan limitations
The three scan paths are referred to as scan chain 0, 1 and 2: these are shown in Figure 77.
[Figure 77: block diagram of ARM7TDMI showing the processor, Scan Chain 0, Scan Chain 1, Scan Chain 2 and the TAP Controller]
The state numbers are also shown on the diagram. These are output from ARM7TDMI on the TAPSM[3:0] bits.
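The TAP controller states referred to in this chapter are those of IEEE 1149.1. As a reading aid, the sketch below encodes the standard state transitions as a table keyed by TMS. State names follow the standard; the table and function names are illustrative, and the TAPSM[3:0] state numbers are deliberately not modelled because they are not reproduced here.

```python
# (next state when TMS=0, next state when TMS=1), per IEEE 1149.1.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_walk(state: str, tms_bits) -> str:
    """Apply one TMS value per rising edge of TCK and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


if __name__ == "__main__":
    print(tap_walk("Run-Test/Idle", [1, 0, 0]))
    print(tap_walk("Run-Test/Idle", [1, 1, 0, 0]))
```

A property of the standard state machine is that five consecutive TCK cycles with TMS HIGH reach Test-Logic-Reset from any state.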
Reset
The boundary-scan interface includes a state-machine controller (the TAP controller). In order to force the TAP controller into the correct state after power-up of the device, a reset pulse must be applied to the nTRST signal. If the boundary scan interface is to be used, nTRST must be driven LOW, and then HIGH again. If the boundary scan interface is not to be used, the nTRST input may be tied permanently LOW. Note that a clock on TCK is not necessary to reset the device.
The action of reset is as follows:
1. System mode is selected (ie the boundary scan chain cells do not intercept any of the signals passing between the external system and the core).
2. The IDCODE instruction is selected. If the TAP controller is put into the Shift-DR state and TCK is pulsed, the contents of the ID register will be clocked out of TDO.
In the descriptions that follow, TDI and TMS are sampled on the rising edge of TCK and all output transitions on TDO occur as a result of the falling edge of TCK.
EXTEST (0000)
The selected scan chain is placed in test mode by the EXTEST instruction. The EXTEST instruction connects the selected scan chain between TDI and TDO. When the instruction register is loaded with the EXTEST instruction, all the scan cells are placed in their test mode of operation. In the CAPTURE-DR state, inputs from the system logic and outputs from the output scan cells to the system are captured by the scan cells. In the SHIFT-DR state, the previously captured test data is shifted out of the scan chain via TDO, while new test data is shifted in via the TDI input. This data is applied immediately to the system logic and system pins.
Pullup Resistors
The IEEE 1149.1 standard effectively requires that TDI and TMS should have internal pullup resistors. In order to minimise static current draw, these resistors are not fitted to ARM7TDMI. Accordingly, the 4 inputs to the test interface (TDI, TMS, nTRST and TCK) must all be driven to good logic levels to achieve normal circuit operation.
SCAN_N (0010)
This instruction connects the Scan Path Select Register between TDI and TDO. During the CAPTURE-DR state, the fixed value 1000 is loaded into the register. During the SHIFT-DR state, the ID number of the desired scan path is shifted into the scan path select register. In the UPDATE-DR state, the scan register of the selected scan chain is connected between TDI and TDO, and remains connected until a subsequent SCAN_N instruction is issued. On reset, scan chain 3 is selected by default. The scan path select register is 4 bits long in this implementation, although no finite length is specified.
Instruction Register
The instruction register is 4 bits in length. There is no parity bit. The fixed value loaded into the instruction register during the CAPTURE-IR controller state is 0001.
INTEST (1100)
The selected scan chain is placed in test mode by the INTEST instruction. The INTEST instruction connects the selected scan chain between TDI and TDO. When the instruction register is loaded with the INTEST instruction, all the scan cells are placed in their test mode of operation. In the CAPTURE-DR state, the value of the data applied from the core logic to the output scan cells, and the value of the data applied from the system logic to the input scan cells is captured. In the SHIFT-DR state, the previously captured test data is shifted out of the scan chain via the TDO pin, while new test data is shifted in via the TDI pin. Single-step operation is possible using the INTEST instruction.
Public Instructions
The following public instructions are supported:
Table 34. Public Instructions
Instruction        Binary Code
EXTEST             0000
SCAN_N             0010
INTEST             1100
IDCODE             1110
BYPASS             1111
CLAMP              0101
HIGHZ              0111
CLAMPZ             1001
SAMPLE/PRELOAD     0011
RESTART            0100
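Host-side software often needs to map between these codes and their names. The sketch below is one way to do it; the dictionary and function names are illustrative. The fallback to BYPASS for codes not in the table follows the statement in the BYPASS description that all unused instruction codes default to the BYPASS instruction.

```python
PUBLIC_INSTRUCTIONS = {
    0b0000: "EXTEST",
    0b0010: "SCAN_N",
    0b1100: "INTEST",
    0b1110: "IDCODE",
    0b1111: "BYPASS",
    0b0101: "CLAMP",
    0b0111: "HIGHZ",
    0b1001: "CLAMPZ",
    0b0011: "SAMPLE/PRELOAD",
    0b0100: "RESTART",
}


def decode_tap_instruction(code: int) -> str:
    """Return the instruction selected by a 4-bit instruction register value."""
    return PUBLIC_INSTRUCTIONS.get(code & 0xF, "BYPASS")


if __name__ == "__main__":
    for code in (0b1110, 0b0010, 0b0001):
        print(format(code, "04b"), decode_tap_instruction(code))
```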
IDCODE (1110)
The IDCODE instruction connects the device identification register (or ID register) between TDI and TDO. The ID register is a 32-bit register that allows the manufacturer, part number and version of a component to be determined through the TAP. See ARM7TDMI device identification (ID) code register on page 147 for the details of the ID register format. When the instruction register is loaded with the IDCODE instruction, all the scan cells are placed in their normal (system) mode of operation. In the CAPTURE-DR state, the device identification code is captured by the ID register. In the SHIFT-DR state, the previously captured device identification code is shifted out of the ID register via the TDO pin, while data is shifted in via the TDI pin into the ID register. In the UPDATE-DR state, the ID register is unaffected.
HIGHZ (0111)
This instruction connects a 1 bit shift register (the BYPASS register) between TDI and TDO. When the HIGHZ instruction is loaded into the instruction register, the Address bus, A[31:0], the data bus, D[31:0], plus nRW, nOPC, LOCK, MAS[1:0] and nTRANS are all driven to the high impedance state and the external HIGHZ signal is driven HIGH. This is as if the signal TBE had been driven LOW. In the CAPTURE-DR state, a logic 0 is captured by the bypass register. In the SHIFT-DR state, test data is shifted into the bypass register via TDI and out via TDO after a delay of one TCK cycle. Note that the first bit shifted out will be a zero. The bypass register is not affected in the UPDATE-DR state.
CLAMPZ (1001)
This instruction connects a 1 bit shift register (the BYPASS register) between TDI and TDO. When the CLAMPZ instruction is loaded into the instruction register, all the 3-state outputs (as described above) are placed in their inactive state, but the data supplied to the outputs is derived from the scan cells. The purpose of this instruction is to ensure that, during production test, each output can be disabled when its data value is either a logic 0 or a logic 1. In the CAPTURE-DR state, a logic 0 is captured by the bypass register. In the SHIFT-DR state, test data is shifted into the bypass register via TDI and out via TDO after a delay of one TCK cycle. Note that the first bit shifted out will be a zero. The bypass register is not affected in the UPDATE-DR state.
BYPASS (1111)
The BYPASS instruction connects a 1 bit shift register (the BYPASS register) between TDI and TDO. When the BYPASS instruction is loaded into the instruction register, all the scan cells are placed in their normal (system) mode of operation. This instruction has no effect on the system pins. In the CAPTURE-DR state, a logic 0 is captured by the bypass register. In the SHIFT-DR state, test data is shifted into the bypass register via TDI and out via TDO after a delay of one TCK cycle. Note that the first bit shifted out will be a zero. The bypass register is not affected in the UPDATE-DR state. Note that all unused instruction codes default to the BYPASS instruction.
CLAMP (0101)
This instruction connects a 1 bit shift register (the BYPASS register) between TDI and TDO. When the CLAMP instruction is loaded into the instruction register, the state of all the output signals is defined by the values previously loaded into the currently loaded scan chain.
Note: This instruction should only be used when scan chain 0 is the currently selected scan chain.
In the CAPTURE-DR state, a logic 0 is captured by the bypass register. In the SHIFT-DR state, test data is shifted into the bypass register via TDI and out via TDO after a delay of one TCK cycle. Note that the first bit shifted out will be a zero. The bypass register is not affected in the UPDATE-DR state.
SAMPLE/PRELOAD (0011)
This instruction is included for production test only, and should never be used.
RESTART (0100)
This instruction is used to restart the processor on exit from debug state. The RESTART instruction connects the bypass register between TDI and TDO and the TAP controller behaves as if the BYPASS instruction had been loaded. The processor will resynchronise back to the memory system once the RUN-TEST/IDLE state is entered.
Test Data Registers
There are 6 test data registers which may be connected between TDI and TDO. They are: Bypass Register, ID Code Register, Scan Chain Select Register, Scan chain 0, 1 or 2. These are now described in detail.
Bypass register
Purpose: Bypasses the device during scan testing by providing a path between TDI and TDO.
Length: 1 bit
Operating mode: When the BYPASS instruction is the current instruction in the instruction register, serial data is transferred from TDI to TDO in the SHIFT-DR state with a delay of one TCK cycle. There is no parallel output from the bypass register. A logic 0 is loaded from the parallel input of the bypass register in the CAPTURE-DR state.
ID register format: Version (bits 31:28), Part Number (bits 27:12), Manufacturer Identity (bit 11 and below).
Please contact your supplier for the correct Device Identification Code. Operating mode: When the IDCODE instruction is current, the ID register is selected as the serial path between TDI and TDO. There is no parallel output from the ID register. The 32-bit device identification code is loaded into the ID register from its parallel inputs during the CAPTURE-DR state.
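Once the 32-bit identification code has been shifted out, its fields can be separated as sketched below. The Version and Part Number boundaries come from the register format above; placing the Manufacturer Identity in bits 11:1 with bit 0 as a fixed marker follows the IEEE 1149.1 device identification register layout and is an assumption here. The function name and the example value are hypothetical and do not represent a real ARM7TDMI code.

```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit device identification code into its fields."""
    return {
        "version": (idcode >> 28) & 0xF,        # bits 31:28
        "part_number": (idcode >> 12) & 0xFFFF,  # bits 27:12
        "manufacturer": (idcode >> 1) & 0x7FF,   # bits 11:1 (assumed)
        "marker": idcode & 0x1,                  # bit 0 (assumed)
    }


if __name__ == "__main__":
    example = 0x1ABCD123  # made-up value for illustration only
    print(decode_idcode(example))
```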
Instruction register
Purpose: Changes the current TAP instruction. Length: 4 bits Operating mode: When in the SHIFT-IR state, the instruction register is selected as the serial path between TDI and TDO. During the CAPTURE-IR state, the value 0001 binary is loaded into this register. This is shifted out during SHIFT-IR (lsb first), while a new instruction is shifted in (lsb first). During the UPDATE-IR state, the value in the instruction register becomes the current instruction. On reset, IDCODE becomes the current instruction.
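The shift behaviour of the instruction register can be illustrated with a small software model. The following Python sketch is not part of the ARM7TDMI design; the function name shift_ir and its parameters are hypothetical, and it only models the capture value 0001, the lsb-first shifting and the update step described above.

```python
def shift_ir(new_instruction: int, width: int = 4, capture_value: int = 0b0001):
    """Model one CAPTURE-IR / SHIFT-IR / UPDATE-IR pass (illustrative only).

    Returns (bits observed on TDO, instruction that becomes current).
    """
    register = capture_value                # CAPTURE-IR loads 0001 binary
    tdo_bits = []
    for i in range(width):                  # SHIFT-IR: one bit per TCK cycle
        tdo_bits.append(register & 1)       # lsb leaves on TDO first
        tdi = (new_instruction >> i) & 1    # lsb of the new instruction enters first
        register = (register >> 1) | (tdi << (width - 1))
    return tdo_bits, register               # UPDATE-IR: register becomes current


RESTART = 0b0100  # instruction code as listed in this datasheet
tdo, current = shift_ir(RESTART)
print(tdo, format(current, "04b"))
```

After four shifts the captured value has been fully scanned out and the register holds the new instruction, which is why the instruction register length (4 bits) fixes the number of SHIFT-IR cycles needed.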
During the CAPTURE-DR state, the value 1000 binary is loaded into the scan chain select register. This is shifted out during SHIFT-DR (lsb first), while a new value is shifted in (lsb first). During the UPDATE-DR state, the value in the register selects a scan chain to become the currently active scan chain. All further instructions such as INTEST then apply to that scan chain. The currently selected scan chain only changes when a SCAN_N instruction is executed, or a reset occurs. On reset, scan chain 3 is selected as the active scan chain. The number of the currently selected scan chain is reflected on the SCREG[3:0] outputs.
The TAP controller may be used to drive external scan chains in addition to those within the ARM7TDMI macrocell. The external scan chain must be assigned a number, and control signals for it can be derived from SCREG[3:0], IR[3:0], TAPSM[3:0], TCK1 and TCK2. The list of scan chain numbers allocated by ARM is shown in Table 35. An external scan chain may take any other number. The serial data stream to be applied to the external scan chain is made present on SDINBS; the serial data back from the scan chain must be presented to the TAP controller on the SDOUTBS input. The scan chain present between SDINBS and SDOUTBS will be connected between TDI and TDO whenever scan chain 3 is selected, or when any of the unassigned scan chain numbers is selected. If there is more than one external scan chain, a multiplexor must be built externally to apply the desired scan chain output to SDOUTBS. The multiplexor can be controlled by decoding SCREG[3:0].
Scan chain 0 and 1
Purpose: Allows access to the processor core for test and debug.
Length: Scan chain 0: 105 bits; Scan chain 1: 33 bits
Each scan chain cell is fairly simple, and consists of a serial register and a multiplexer. The scan cells perform two basic functions, capture and shift. For input cells, the capture stage involves copying the value of the system input to the core into the serial register. During shift, this value is output serially. The value applied to the core from an input cell is either the system input or the contents of the serial register, and this is controlled by the multiplexer.
For output cells, capture involves placing the value of a core output into the serial register. During shift, this value is serially output as before. The value applied to the system from an output cell is either the core output, or the contents of the serial register. All the control signals for the scan cells are generated internally by the TAP controller. The action of the TAP controller is determined by the current instruction, and the state of the TAP state machine. This is described below.
There are three basic modes of operation of the scan chains, INTEST, EXTEST and SYSTEM, and these are selected by the various TAP controller instructions. In SYSTEM mode, the scan cells are idle. System data is applied to inputs, and core outputs are applied to the system. In INTEST mode, the core is internally tested. The data serially scanned in is applied to the core, and the resulting outputs are captured in the output cells and scanned out. In EXTEST mode, data is scanned onto the core's outputs and applied to the external system. System input data is captured in the input cells and then shifted out.
Note: The scan cells are not fully JTAG compliant in that they do not have an Update stage. Therefore, while data is being moved around the scan chain, the contents of the scan cell are not isolated from the output. Thus the output from the scan cell to the core or to the external system could change on every scan clock. This does not affect ARM7TDMI since its internal state does not change until it is clocked. However, the rest of the system needs to be aware that every output could change asynchronously as data is moved around the scan chain. External logic must ensure that this does not harm the rest of the system.
Scan chain 0
Scan chain 0 is intended primarily for inter-device testing (EXTEST), and testing the core (INTEST). Scan chain 0 is selected via the SCAN_N instruction: see SCAN_N (0010) on page 145.
INTEST allows serial testing of the core. The TAP Controller must be placed in INTEST mode after scan chain 0 has been selected. During CAPTURE-DR, the current outputs from the core's logic are captured in the output cells. During SHIFT-DR, this captured data is shifted out while a new serial test pattern is scanned in, thus applying known stimuli to the inputs. During RUN-TEST/IDLE, the core is clocked. Normally, the TAP controller should only spend 1 cycle in RUN-TEST/IDLE. The whole operation may then be repeated. For details of the core's clocks during test and debug, see ARM7TDMI Core Clocks on page 151.
EXTEST allows inter-device testing, useful for verifying the connections between devices on a circuit board. The TAP Controller must be placed in EXTEST mode after scan chain 0 has been selected. During CAPTURE-DR, the current inputs to the core's logic from the system are captured in the input cells. During SHIFT-DR, this captured data is shifted out while a new serial test pattern is scanned in, thus applying known values on the core's outputs. During UPDATE-DR, the value shifted into the data bus D[31:0] scan cells appears on the outputs. For all other outputs, the value appears as the data is shifted round. Note that during RUN-TEST/IDLE, the core is not clocked. The operation may then be repeated.
Scan chain 1
The primary use for scan chain 1 is for debugging, although it can be used for EXTEST on the data bus. Scan chain 1 is selected via the SCAN_N TAP Controller instruction. Debugging is similar to INTEST, and the procedure described above for scan chain 0 should be followed. Note that this scan chain is 33 bits long - 32 bits for the data value, plus the scan cell on the BREAKPT core input. This 33rd bit serves four purposes:
1. Under normal INTEST test conditions, it allows a known value to be scanned into the BREAKPT input.
2. During EXTEST test conditions, the value applied to the BREAKPT input from the system can be captured.
3. While debugging, the value placed in the 33rd bit determines whether ARM7TDMI synchronises back to system speed before executing the instruction. See System speed access on page 156 for further details.
4. After ARM7TDMI has entered debug state, the first time this bit is captured and scanned out, its value tells the debugger whether the core entered debug state due to a breakpoint (bit 33 LOW), or a watchpoint (bit 33 HIGH).
Scan chain 2
Purpose: Allows ICEBreaker's registers to be accessed. The order of the scan chain, from TDI to TDO is: read/write, register address bits 4 to 0, followed by data value bits 31 to 0. See Figure 84.
Length: 38 bits.
To access this serial register, scan chain 2 must first be selected via the SCAN_N TAP controller instruction. The TAP controller must then be placed in INTEST mode. No action is taken during CAPTURE-DR. During SHIFT-DR, a data value is shifted into the serial register. Bits 32 to 36 specify the address of the ICEBreaker register to be accessed. During UPDATE-DR, this register is either read or written depending on the value of bit 37 (0 = read). Refer to ICEBreaker Module on page 163 for further details.
Scan chain 3
Purpose: Allows ARM7TDMI to control an external boundary scan chain.
Length: User defined.
Scan chain 3 is provided so that an optional external boundary scan chain may be controlled via ARM7TDMI. Typically this would be used for a scan chain around the pad ring of a packaged device. The following control signals are provided, and are generated only when scan chain 3 has been selected. These outputs are inactive at all other times.
DRIVEBS: This would be used to switch the scan cells from system mode to test mode. This signal is asserted whenever either the INTEST, EXTEST, CLAMP or CLAMPZ instruction is selected.
PCLKBS: This is an update clock, generated in the UPDATE-DR state. Typically the value scanned into a chain would be transferred to the cell output on the rising edge of this signal.
ICAPCLKBS, ECAPCLKBS: These are capture clocks used to sample data into the scan cells during INTEST and EXTEST respectively. These clocks are generated in the CAPTURE-DR state.
SHCLKBS, SHCLK2BS: These are non-overlapping clocks generated in the SHIFT-DR state, used to clock the master and slave element of the scan cells respectively. When the state machine is not in the SHIFT-DR state, both these clocks are LOW.
nHIGHZ: This signal may be used to drive the outputs of the scan cells to the high impedance state. This signal is driven LOW when the HIGHZ instruction is loaded into the instruction register, and HIGH at all other times.
In addition to these control outputs, the SDINBS output and SDOUTBS input are also provided. When an external scan chain is in use, SDOUTBS should be connected to the serial data output and SDINBS should be connected to the serial data input.
ARM7TDMI Core Clocks
ARM7TDMI has two clocks: the memory clock, MCLK, and an internally generated clock derived from TCK, DCLK. During normal operation, the core is clocked by MCLK, and internal logic holds DCLK LOW. When ARM7TDMI is in the debug state, the core is clocked by DCLK under control of the TAP state machine, and MCLK may free run. The selected clock is output on the signal ECLK for use by the external system. Note that when the CPU core is being debugged and is running from DCLK, nWAIT has no effect. There are two cases in which the clocks switch: during debugging and during testing.
Figure 80. Clock Switching on Entry to Debug State (signals shown: MCLK, DBGACK, DCLK)
ARM7TDMI is forced to use DCLK as the primary clock until debugging is complete. On exit from debug, the core must be allowed to synchronise back to MCLK. This must be done in the following sequence. The final instruction of the debug sequence must be shifted into the data bus scan chain and clocked in by asserting DCLK. At this point, BYPASS must be clocked into the TAP instruction register. ARM7TDMI will now automatically resynchronise back to MCLK and start fetching instructions from memory at MCLK speed. Please refer also to Exit from debug state on page 153.
Entry into test is less automatic than debug and some care must be taken. On the way into test, MCLK must be held LOW. The TAP controller can now be used to serially test ARM7TDMI. If scan chain 0 and INTEST are selected, DCLK is generated while the state machine is in the RUN-TEST/IDLE state. During EXTEST, DCLK is not generated. On exit from test, BYPASS must be selected as the TAP controller instruction. When this is done, MCLK can be allowed to resume. After INTEST testing, care should be taken to ensure that the core is in a sensible state before switching back. The safest way to do this is to either select BYPASS and then cause a system reset, or to insert MOV PC, #0 into the instruction pipeline before switching back.
STR R0, [R0] ; Save R0 before use
MOV R0, PC ; Copy PC into R0
STR R0, [R0] ; Now save the PC in R0
BX PC ; Jump into ARM state
MOV R8, R8 ; NOP
MOV R8, R8 ; NOP
Note: Since all THUMB instructions are only 16 bits long, the simplest course of action when shifting them into scan chain 1 is to repeat the instruction twice. For example, the encoding for BX R0 is 0x4700. Thus if 0x47004700 is shifted into scan chain 1, the debugger does not have to keep track of which half of the bus the processor expects to read the data from.
From this point on, the processor's state can be determined by the sequences of ARM instructions described below. Once the processor is in ARM state, typically the first instruction executed would be:
STM R0, {R0-R15}
This causes the contents of the registers to be made visible on the data bus. These values can then be sampled and shifted out.
Note: The above use of R0 as the base register for the STM is for illustration only, any register could be used. After determining the values in the current bank of registers, it may be desirable to access the banked registers. This can only be done by changing mode. Normally, a mode change may only occur if the core is already in a privileged mode. However, while in debug state, a mode change from any mode into any other mode may occur. Note that the debugger must restore the original mode before exiting debug state. For example, assume that the debugger had been asked to return the state of the USER mode and FIQ mode registers, and debug state was entered in supervisor mode. The instruction sequence could be:
STM R0, {R0-R15} ; Save current registers
MRS R0, CPSR
STR R0, R0 ; Save CPSR to determine current mode
BIC R0, 0x1F ; Clear mode bits
ORR R0, 0x10 ; Select user mode
MSR CPSR, R0 ; Enter USER mode
STM R0, {R13,R14} ; Save register not previously visible
ORR R0, 0x01 ; Select FIQ mode
MSR CPSR, R0 ; Enter FIQ mode
STM R0, {R8-R14} ; Save banked FIQ registers
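The effect of the BIC and ORR steps on the CPSR value can be checked on the debug host before the sequence is scanned in. The following Python sketch is illustrative only: the function name and the example CPSR value are hypothetical, and the mode encodings (User 0x10, FIQ 0x11, Supervisor 0x13) are the standard ARM CPSR mode-bit values, which are not restated in this section.

```python
MODE_MASK = 0x1F        # CPSR mode field, bits 4:0
MODE_USER = 0x10        # assumed standard ARM encodings
MODE_FIQ = 0x11
MODE_SUPERVISOR = 0x13


def mode_switch_values(cpsr: int):
    """Mirror the BIC/ORR steps of the sequence above on a 32-bit CPSR value."""
    r0 = cpsr & ~MODE_MASK & 0xFFFFFFFF   # BIC R0, 0x1F : clear mode bits
    r0 |= 0x10                            # ORR R0, 0x10 : select user mode
    user_cpsr = r0
    r0 |= 0x01                            # ORR R0, 0x01 : select FIQ mode
    fiq_cpsr = r0
    return user_cpsr, fiq_cpsr


# Hypothetical example: debug state entered in supervisor mode with I and F set.
original_cpsr = 0x000000D3
user_cpsr, fiq_cpsr = mode_switch_values(original_cpsr)
print(hex(user_cpsr), hex(fiq_cpsr))
```

Only the mode field changes between the three values; the debugger still has to write the saved original CPSR back before leaving debug state, as noted above.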
All these instructions are said to execute at debug speed. Debug speed is much slower than system speed since between each core clock, 33 scan clocks occur in order to shift in an instruction, or shift out data. Executing instructions more slowly than usual is fine for accessing the core's state since ARM7TDMI is fully static. However, this same method cannot be used for determining the state of the rest of the system.
While in debug state, only the following instructions may legally be scanned into the instruction pipeline for execution:
- all data processing operations, except TEQP
- all load, store, load multiple and store multiple instructions
- MSR and MRS
By the use of system speed load multiples and debug speed store multiples, the state of the system's memory can be fed back to the debug host. There are restrictions on which instructions may have the 33rd bit set. The only valid instructions on which to set this bit are loads, stores, load multiple and store multiple. See also Exit from debug state. When ARM7TDMI returns to debug state after a system speed access, bit 33 of scan chain 1 is set HIGH. This gives the debugger information about why the core entered debug state the first time this scan chain is read.
For example, imagine a fictitious peripheral that simply counts the number of memory cycles. This device should return the same answer after a program has been run both with and without debugging. Figure 81 shows the behaviour of ARM7TDMI on exit from the debug state.
Figure 81. Debug Exit Sequence (signals shown: ECLK, nMREQ, SEQ, A[31:0], D[31:0], DBGACK; internal cycles are followed by accesses to addresses Ab, Ab+4 and Ab+8)
It can be seen from Figure 76 that the final memory access occurs in the cycle after DBGACK goes HIGH, and this is the point at which the cycle counter should be disabled. Figure 81 shows that the first memory access that the cycle counter has not seen before occurs in the cycle after DBGACK goes LOW, and so this is the point at which the counter should be re-enabled. Note that when a system speed access from debug state occurs, ARM7TDMI temporarily drops out of debug state, and so DBGACK can go LOW. If there are peripherals which are sensitive to the number of memory accesses, they must be led to believe that ARM7TDMI is still in debug state. By programming the ICEBreaker control register, the value on DBGACK can be forced to be HIGH. See ICEBreaker Module on page 163 for more details.
The PC's Behaviour During Debug
In order that ARM7TDMI may be forced to branch back to the place at which program flow was interrupted by debug, the debugger must keep track of what happens to the PC. There are five cases: breakpoint, watchpoint, watchpoint when another exception occurs, debug request and system speed access.
Breakpoint
Entry to the debug state from a breakpoint advances the PC by 4 addresses, or 16 bytes. Each instruction executed in debug state advances the PC by 1 address, or 4 bytes. The normal way to exit from debug state after a breakpoint is to remove the breakpoint, and branch back to the previously breakpointed address. For example, if ARM7TDMI entered debug state from a breakpoint set on a given address and 2 debug speed instructions were executed, a branch of -7 addresses must occur (4 for debug entry, +2 for the instructions, +1 for the final branch). The following sequence shows the data scanned into scan chain 1. This is msb first, and so the first digit is the value placed in the BREAKPT bit, followed by the instruction data.
0 E0802000; ADD R2, R0, R0
1 E1826001; ORR R6, R2, R1
0 EAFFFFF9; B -7 (2's complement)
Note that once in debug state, a minimum of two instructions must be executed before the branch, although these may both be NOPs (MOV R0, R0). For small branches, the final branch could be replaced with a subtract with the PC as the destination (SUB PC, PC, #28 in the above example).
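The return-branch arithmetic can be sketched in a few lines of Python. This is an illustration rather than a definitive debugger implementation: the function names are hypothetical, the branch encoding assumes the standard ARM unconditional B format (0xEA in the top byte with a 24-bit two's complement word offset), and the entry offset of 4 is the breakpoint and watchpoint case described above. The same arithmetic with an entry offset of 3 applies to the debug request case described later in this section.

```python
def return_branch_offset(debug_instructions: int, entry_offset: int = 4) -> int:
    """Offset in addresses (words): debug entry + instructions run + the branch itself."""
    return -(entry_offset + debug_instructions + 1)


def encode_branch(offset: int) -> int:
    """Encode an unconditional ARM B with a 24-bit two's complement offset field."""
    return 0xEA000000 | (offset & 0x00FFFFFF)


def scan_chain1_word(breakpt_bit: int, instruction: int) -> int:
    """Form the 33-bit scan chain 1 value: BREAKPT bit above the 32-bit instruction."""
    return ((breakpt_bit & 1) << 32) | (instruction & 0xFFFFFFFF)


offset = return_branch_offset(debug_instructions=2)      # breakpoint example above
branch = encode_branch(offset)
print(offset, hex(branch), hex(scan_chain1_word(0, branch)))
```

For the example above this yields an offset of -7 and the encoding EAFFFFF9, matching the last line of the scanned sequence.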
Watchpoints
Returning to program execution after entering debug state from a watchpoint is done in the same way as the procedure described above. Debug entry adds 4 addresses to the PC, and every instruction adds 1 address. The difference is that since the instruction that caused the watchpoint has executed, the program returns to the next instruction.
This will force a branch back to the abort vector, causing the instruction at that location to be refetched and executed. Note that after the abort service routine, the instruction which caused the abort and watchpoint will be reexecuted. This will cause the watchpoint to be generated and hence ARM7TDMI will enter debug state again.
Debug request
Entry into debug state via a debug request is similar to a breakpoint. However, unlike a breakpoint, the last instruction will have completed execution and so must not be refetched on exit from debug state. Therefore, it can be thought that entry to debug state adds 3 addresses to the PC, and every instruction executed in debug state adds 1. For example, suppose that the user has invoked a debug request, and decides to return to program execution straight away. The following sequence could be used:
0 E1A00000; MOV R0, R0
1 E1A00000; MOV R0, R0
0 EAFFFFFA; B -6
This is similar to an aborted watchpoint except that the problem is much harder to fix, because the abort was not caused by an instruction in the main program, and the PC does not point to the instruction which caused the abort. An abort handler usually looks at the PC to determine the instruction which caused the abort, and hence the abort address. In this case, the value of the PC is invalid, but the debugger should know what location was being accessed. Thus the debugger can be written to help the abort handler fix the memory system.
This restores the PC, and restarts the program from the next instruction.
Priorities / Exceptions
Because the normal program flow is broken when a breakpoint or a debug request occurs, debug can be thought of as being another type of exception. Some of the interaction with other exceptions has been described above. This section summarises the priorities.
Interrupts
When ARM7TDMI enters debug state, interrupts are automatically disabled. If interrupts are disabled during debug, ARM7TDMI will never be forced into an interrupt mode. Interrupts only have this effect on watchpointed accesses. They are ignored at all times on breakpoints. If an interrupt was pending during the instruction prior to entering debug state, ARM7TDMI will enter debug state in the mode of the interrupt. Thus, on entry to debug state, the debugger cannot assume that ARM7TDMI will be in the expected mode of the user's program. It must check the PC, the CPSR and the SPSR to fully determine the reason for the exception. Thus, debug takes higher priority than the interrupt, although ARM7TDMI remembers that an interrupt has occurred.
Data aborts
As described above, when a data abort occurs on a watchpointed access, ARM7TDMI enters debug state in abort mode. Thus the watchpoint has higher priority than the abort, although, as in the case of interrupt, ARM7TDMI remembers that the abort happened.
Boundary scan timing diagram: signals TCK, TMS, TDI, TDO, Data In and Data Out; timing parameters Tbscl, Tbsch, Tbsis, Tbsih, Tbsoh, Tbsod, Tbsss, Tbssh, Tbsdh and Tbsdd.
Notes:
1. For correct data latching, the I/O signals (from the core and the pads) must be set up and held with respect to the rising edge of TCK in the CAPTURE-DR state of the INTEST and EXTEST instructions.
2. Assumes that the data outputs are loaded with the AC test loads (see AC parameter specification).
All delays are provisional and assume a process which achieves 33MHz MCLK maximum operating frequency. In the above table all units are ns.
Table 37. Macrocell Scan Signals and Pins
No   Signal    Type
1    D[0]      I/O
2    D[1]      I/O
3    D[2]      I/O
4    D[3]      I/O
5    D[4]      I/O
6    D[5]      I/O
7    D[6]      I/O
8    D[7]      I/O
9    D[8]      I/O
10   D[9]      I/O
11   D[10]     I/O
12   D[11]     I/O
13   D[12]     I/O
14   D[13]     I/O
15   D[14]     I/O
16   D[15]     I/O
17   D[16]     I/O
18   D[17]     I/O
19   D[18]     I/O
20   D[19]     I/O
21   D[20]     I/O
22   D[21]     I/O
23   D[22]     I/O
24   D[23]     I/O
25   D[24]     I/O
26   D[25]     I/O
27   D[26]     I/O
28   D[27]     I/O
29   D[28]     I/O
30   D[29]     I/O
31   D[30]     I/O
32   D[31]     I/O
33   BREAKPT   I
34   NENIN     I
35   NENOUT    O
36   LOCK      O
37   BIGEND    I
38   DBE       I
39   MAS[0]    O
40   MAS[1]    O
41   BL[0]     I
42   BL[1]     I
43   BL[2]     I
44   BL[3]     I
45   DCTL **   O
46   nRW       O
47   DBGACK    O
Key: I - Input, O - Output, I/O - Input/Output
Note: DCTL is not described in this datasheet. DCTL is an output from the processor used to control the unidirectional data out latch, DOUT[31:0]. This signal is not visible from the periphery of ARM7TDMI.
Debug Timing
Table 38. ARM7TDMI debug interface timing
Symbol   Parameter
Ttdbgd   TCK falling to DBGACK, DBGRQI changing
Ttpfd    TCKf to TAP outputs
Ttpfh    TAP outputs hold time from TCKf
Ttprd    TCKr to TAP outputs
Ttprh    TAP outputs hold time from TCKr
Ttckr    TCK to TCK1, TCK2 rising
Ttckf    TCK to TCK1, TCK2 falling
Tecapd   TCK to ECAPCLK changing
Tdckf    DCLK induced: TCKf to various outputs valid
Tdckfh   DCLK induced: Various outputs hold from TCKf
Tdckr    DCLK induced: TCKr to various outputs valid
Tdckrh   DCLK induced: Various outputs hold from TCKr
Ttrstd   nTRSTf to TAP outputs valid
Ttrsts   nTRSTr setup to TCKr
Tsdtd    SDOUTBS to TDO valid
Tclkbs   TCK to Boundary Scan Clocks
Tshbsr   TCK to SHCLKBS, SHCLK2BS rising
Tshbsf   TCK to SHCLKBS, SHCLK2BS falling
Min Max 13.3 10.0 8.0 2.4 7.8 6.1 8.2 23.8 6.0 26.6 6.0 8.5 2.3 10.0 8.2 5.7 4.0
2.4
Notes: All delays are provisional and assume a process which achieves 33MHz MCLK maximum operating frequency. Assumes that the data outputs are loaded with the AC test loads (see AC parameter specification). All units are ns.
ICEBreaker Module
This chapter describes the ARM7TDMI ICEBreaker module. Note: The name ICEbreaker has changed. It is now known as the EmbeddedICE macrocell. Future versions of the datasheet will reflect this change.
Overview
The ARM7TDMI-ICEBreaker module, hereafter referred to simply as ICEBreaker, provides integrated on-chip debug support for the ARM7TDMI core. ICEBreaker is programmed in a serial fashion using the ARM7TDMI TAP controller. It consists of two real-time watchpoint units, together with a control and status register. One or both of the watchpoint units can be programmed to halt the execution of instructions by the ARM7TDMI core via its BREAKPT signal. Execution is halted when a match occurs between the values programmed into ICEBreaker and the values currently appearing on the address bus, data bus and various control signals. Any bit can be masked so that its value does not affect the comparison.
Figure 83 shows the relationship between the core, ICEBreaker and the TAP controller. Either watchpoint unit can be configured to be a watchpoint (monitoring data accesses) or a breakpoint (monitoring instruction fetches). Watchpoints and breakpoints can be made to be data-dependent. Two independent registers, Debug Control and Debug Status, provide overall control of ICEBreaker's operation.
Figure 83. ARM7TDMI Block Diagram (blocks shown: Processor, ICEBreaker, TAP; signals shown: DBGRQI, A[31:0], D[31:0], nOPC, nRW, TBIT, MAS[1:0], nTRANS, DBGACK, BREAKPT, EXTERN0, EXTERN1, RANGEOUT0, RANGEOUT1, SDIN, SDOUT, TCK, nTRST, TMS, TDI, TDO)
Note: Only those signals that are pertinent to ICEBreaker are shown.
The Watchpoint Registers
The two watchpoint units, known as Watchpoint 0 and Watchpoint 1, each contain three pairs of registers:
1. Address Value and Address Mask
2. Data Value and Data Mask
3. Control Value and Control Mask
Each register is independently programmable, and has its own address: see Table 39.
Table 39. Function and Mapping of ICEBreaker Registers
Address  Width  Function
00000    3      Debug Control
00001    5      Debug Status
00100    6      Debug Comms Control Register
00101    32     Debug Comms Data Register
01000    32     Watchpoint 0 Address Value
01001    32     Watchpoint 0 Address Mask
01010    32     Watchpoint 0 Data Value
01011    32     Watchpoint 0 Data Mask
01100    9      Watchpoint 0 Control Value
01101    8      Watchpoint 0 Control Mask
10000    32     Watchpoint 1 Address Value
10001    32     Watchpoint 1 Address Mask
10010    32     Watchpoint 1 Data Value
10011    32     Watchpoint 1 Data Mask
10100    9      Watchpoint 1 Control Value
10101    8      Watchpoint 1 Control Mask
bit address field and a read/write bit. This is shown in Figure 84.
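Figure 84: scan chain register between TDI and TDO, made up of the read/write bit, the address field (bits 4 to 0) and the data field (bits 31 to 0), feeding the Value and Mask registers of a comparator on A[31:0], D[31:0] and Control that generates BREAKPOINT.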
The data to be written is scanned into the 32-bit data field, the address of the register into the 5-bit address field and a 1 into the read/write bit. A register is read by scanning its address into the address field and a 0 into the read/write bit. The 32-bit data field is ignored. The register addresses are shown in Table 39. Note: A read or write actually takes place when the TAP controller enters the UPDATE-DR state. Setting the mask bit to 0 means that the comparator will only match if the input value matches the value programmed into the value register.
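The layout of the 38-bit value scanned into scan chain 2 can be sketched as follows. This Python fragment is a host-side illustration only, with hypothetical function names; it uses the bit positions given earlier for scan chain 2 (data in bits 31 to 0, address in bits 36 to 32, read/write in bit 37, where 0 = read) and a register address taken from Table 39.

```python
def pack_scan_chain2(address: int, data: int = 0, write: bool = False) -> int:
    """Build the 38-bit scan chain 2 value: r/w bit, 5-bit address, 32-bit data."""
    if not 0 <= address < 32:
        raise ValueError("register address must fit in 5 bits")
    return (int(write) << 37) | (address << 32) | (data & 0xFFFFFFFF)


def unpack_scan_chain2(word: int):
    """Split a 38-bit scan chain 2 value into (write, address, data)."""
    return bool((word >> 37) & 1), (word >> 32) & 0x1F, word & 0xFFFFFFFF


WP0_ADDRESS_VALUE = 0b01000   # Watchpoint 0 Address Value, from Table 39

write_word = pack_scan_chain2(WP0_ADDRESS_VALUE, data=0x00008000, write=True)
read_word = pack_scan_chain2(WP0_ADDRESS_VALUE)   # data field ignored on a read
print(hex(write_word), hex(read_word))
```

The access itself still only takes effect when the TAP controller passes through UPDATE-DR; the sketch covers the bit packing, not the TAP sequencing.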
The control registers
The Control Value and Control Mask registers are mapped identically in the lower eight bits, as shown below. Bit 8 of the control value register is the ENABLE bit, which cannot be masked.
Figure 85. Watchpoint Control Value and Mask Format (bit 8: ENABLE; bit 7: RANGE; bit 3: nOPC; bit 2: MAS[1]; bit 1: MAS[0]; bit 0: nRW)
The bits have the following functions:
nRW compares against the not read/write signal from the core in order to detect the direction of bus activity. nRW is 0 for a read cycle and 1 for a write cycle.
MAS[1:0] compares against the MAS[1:0] signal from the core in order to detect the size of bus activity. The encoding is shown in the following table.
Table 40. MAS [1:0] Signal Encoding
bit 1  bit 0  Data size
0      0      byte
0      1      halfword
1      0      word
1      1      (reserved)
nOPC is used to detect whether the current cycle is an instruction fetch (nOPC = 0) or a data access (nOPC = 1).
nTRANS compares against the not translate signal from the core in order to distinguish between User mode (nTRANS = 0) and non-User mode (nTRANS = 1) accesses.
EXTERN is an external input to ICEBreaker which allows the watchpoint to be dependent upon some external condition. The EXTERN input for Watchpoint 0 is labelled EXTERN0 and the EXTERN input for Watchpoint 1 is labelled EXTERN1.
CHAIN can be connected to the chain output of another watchpoint in order to implement, for example, debugger requests of the form "breakpoint on address YYY only when in process XXX". In the ARM7TDMI-ICEBreaker, the CHAINOUT output of Watchpoint 1 is connected to the CHAIN input of Watchpoint 0. The CHAINOUT output is derived from a latch; the address/control field comparator drives the write enable for the latch and the input to the latch is the value of the data field comparator. The CHAINOUT latch is cleared when the Control Value register is written or when nTRST is LOW.
RANGE can be connected to the range output of another watchpoint register. In the ARM7TDMI-ICEBreaker, the RANGEOUT output of Watchpoint 1 is connected to the RANGE input of Watchpoint 0. This allows the two watchpoints to be coupled for detecting conditions that occur simultaneously, eg for range-checking.
ENABLE If a watchpoint match occurs, the BREAKPT signal will only be asserted when the ENABLE bit is set. This bit only exists in the value register: it cannot be masked.
For each of the bits 7:0 in the Control Value register, there is a corresponding bit in the Control Mask register. This removes the dependency on particular signals.
Programming Breakpoints
Breakpoints can be classified as hardware breakpoints or software breakpoints. Hardware breakpoints: Typically monitor the address value and can be set in any code, even in code that is in ROM or code that is self-modifying. Software breakpoints: Monitor a particular bit pattern being fetched from any address. One ICEBreaker watchpoint can thus be used to support any number of software breakpoints. Software breakpoints can normally only be set in RAM because an instruction has to be replaced by the special bit pattern chosen to cause a software breakpoint.
Software breakpoints
To make a watchpoint unit cause software breakpoints (ie on instruction fetches of a particular bit pattern):
1. Program its Address Mask register to 0xFFFFFFFF (all bits set to 1) so that the address is disregarded.
2. Program the Data Value register with the particular bit pattern that has been chosen to represent a software breakpoint. If a THUMB software breakpoint is being programmed, the 16-bit pattern must be repeated in both halves of the Data Value register. For example, if the bit pattern is 0xDFFF, then 0xDFFFDFFF must be programmed. When a 16-bit instruction is fetched, ICEBreaker only compares the valid half of the data bus against the contents of the Data Value register. In this way, a single Watchpoint register can be used to catch software breakpoints on both the upper and lower halves of the data bus.
3. Program the Data Mask register to 0x00000000.
4. Program the Control Value register with nOPC = 0.
5. Program the Control Mask register with nOPC = 0, all other bits to 1.
6. If you wish to make the distinction between user and non-user mode instruction fetches, program the nTRANS bit in the Control Value and Control Mask registers accordingly.
7. If required, program the EXTERN, RANGE and CHAIN bits in the same way.
Note: The address value register need not be programmed.
Setting the breakpoint
To set the software breakpoint:
1. Read the instruction at the desired address and store it away.
2. Write the special bit pattern representing a software breakpoint at the address.
Clearing the breakpoint
To clear the software breakpoint, restore the instruction to the address.
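The set and clear procedure, together with the THUMB pattern duplication, can be sketched on the debug host as follows. This Python fragment is illustrative only: the function names are hypothetical, target memory is modelled as a plain dictionary rather than real memory accesses, and the ARM breakpoint pattern and the address used in the example are arbitrary placeholders.

```python
def thumb_data_value(pattern16: int) -> int:
    """Repeat a 16-bit THUMB pattern in both halves of the 32-bit Data Value."""
    pattern16 &= 0xFFFF
    return (pattern16 << 16) | pattern16


def set_software_breakpoint(memory: dict, saved: dict, address: int, pattern: int) -> None:
    """Store the original instruction away, then write the breakpoint pattern."""
    saved[address] = memory[address]
    memory[address] = pattern


def clear_software_breakpoint(memory: dict, saved: dict, address: int) -> None:
    """Restore the instruction that the breakpoint pattern replaced."""
    memory[address] = saved.pop(address)


memory = {0x8000: 0xE1A00000}      # hypothetical RAM contents: MOV R0, R0
saved = {}
ARM_PATTERN = 0xDEEEDEEE           # arbitrary example pattern, not a defined value

set_software_breakpoint(memory, saved, 0x8000, ARM_PATTERN)
during = memory[0x8000]
clear_software_breakpoint(memory, saved, 0x8000)
print(hex(thumb_data_value(0xDFFF)), hex(during), hex(memory[0x8000]))
```

Because the watchpoint unit matches on the data pattern alone, the same pair of functions can be applied to any number of addresses without reprogramming ICEBreaker.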
Hardware breakpoints
To make a watchpoint unit cause hardware breakpoints (ie on instruction fetches):
1. Program its Address Value register with the address of the instruction to be breakpointed.
2. For a breakpoint in ARM state, program bits [1:0] of the Address Mask register to 1. For a breakpoint in THUMB state, program bit 0 of the Address Mask to 1. In both cases the remaining bits are set to 0.
3. Program the Data Value register only if you require a data-dependent breakpoint: ie only if the actual instruction code fetched must be matched as well as the address. If the data value is not required, program the Data Mask register to 0xFFFFFFFF (all bits to 1), otherwise program it to 0x00000000.
4. Program the Control Value register with nOPC = 0.
5. Program the Control Mask register with nOPC = 0, all other bits to 1.
6. If you need to make the distinction between user and non-user mode instruction fetches, program the nTRANS Value and Mask bits as above.
7. If required, program the EXTERN, RANGE and CHAIN bits in the same way.
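The value and mask comparison used in the steps above can be modelled in software. The following Python sketch is a behavioural illustration, not a description of the comparator hardware; the function name and the example addresses are hypothetical. It follows the rule that a mask bit of 1 removes that bit from the comparison, while a mask bit of 0 requires the input to equal the programmed value.

```python
def masked_match(observed: int, value: int, mask: int, width: int = 32) -> bool:
    """True when every unmasked bit (mask bit = 0) of observed equals value."""
    care = ~mask & ((1 << width) - 1)       # bits that take part in the comparison
    return ((observed ^ value) & care) == 0


ARM_ADDRESS_MASK = 0x00000003     # bits [1:0] masked for an ARM state breakpoint
THUMB_ADDRESS_MASK = 0x00000001   # bit 0 masked for a THUMB state breakpoint

breakpoint_address = 0x00008000   # hypothetical address
hit = masked_match(0x00008002, breakpoint_address, ARM_ADDRESS_MASK)
miss = masked_match(0x00008004, breakpoint_address, ARM_ADDRESS_MASK)
print(hit, miss)
```

Masking further low-order address bits in the same way widens the match to an aligned block of addresses, which is the basis of the simple range breakpoints mentioned under Programming Watchpoints.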
Programming Watchpoints
To make a watchpoint unit cause watchpoints (i.e. on data accesses):
1. Program its Address Value register with the address of the data access to be watchpointed.
2. Program the Address Mask register to 0x00000000.
3. Program the Data Value register only if you require a data-dependent watchpoint, i.e. only if the actual data value read or written must be matched as well as the address. If the data value is irrelevant, program the Data Mask register to 0xFFFFFFFF (all bits set to 1), otherwise program it to 0x00000000.
4. Program the Control Value register with nOPC = 1, nRW = 0 for a read or nRW = 1 for a write, and MAS[1:0] with the value corresponding to the appropriate data size.
5. Program the Control Mask register with nOPC = 0, nRW = 0, MAS[1:0] = 0, all other bits to 1. Note that nRW or MAS[1:0] may be set to 1 if both reads and writes or data size accesses are to be watchpointed respectively.
6. If you wish to make the distinction between user and non-user mode data accesses, program the nTRANS bit in the Control Value and Control Mask registers accordingly.
7. If required, program the EXTERN, RANGE and CHAIN bits in the same way.
Note: The above are just examples of how to program the watchpoint register to generate breakpoints and watchpoints; many other ways of programming the registers are possible. For instance, simple range breakpoints can be provided by setting one or more of the address mask bits.
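Steps 4 and 5 can be illustrated with named fields rather than raw register bits. In this sketch the function name and the dictionary representation are hypothetical, and the byte encoding (MAS[1:0] = 0) is an assumption; this section only states the halfword (1) and word (2) encodings.

```python
# MAS[1:0] encodings; "byte": 0 is assumed, see the lead-in.
MAS_ENCODING = {"byte": 0, "halfword": 1, "word": 2}


def watchpoint_control(access, size=None):
    """Return (control_value, control_mask) fields for a data watchpoint.

    access is "read", "write" or "both"; size=None matches any data size.
    A mask bit of 1 means the corresponding field is ignored.
    """
    value = {"nOPC": 1, "nRW": 0, "MAS": 0}
    mask = {"nOPC": 0, "nRW": 0, "MAS": 0b00}
    if access == "write":
        value["nRW"] = 1
    elif access == "both":
        mask["nRW"] = 1          # match reads and writes
    elif access != "read":
        raise ValueError("access must be 'read', 'write' or 'both'")
    if size is None:
        mask["MAS"] = 0b11       # match every access size
    else:
        value["MAS"] = MAS_ENCODING[size]
    return value, mask
```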
Debug Control Register format: bit 2 = INTDIS, bit 1 = DBGRQ, bit 0 = DBGACK.
Bits 1 and 0 allow the values on DBGRQ and DBGACK to be forced. As shown in Figure 87, the value stored in bit 1 of the control register is synchronised and then ORed with the external DBGRQ before being applied to the processor. The output of this OR gate is the signal DBGRQI which is brought out externally from the macrocell.
The synchronisation between control bit 1 and DBGRQI is to assist in multiprocessor environments. The synchronisation latch only opens when the TAP controller state machine is in the RUN-TEST/IDLE state. This allows an enter debug condition to be set up in all the processors in the system while they are still running. Once the condition is set up in all the processors, it can then be applied to them simultaneously by entering the RUN-TEST/IDLE state.
In the case of DBGACK, the value of DBGACK from the core is ORed with the value held in bit 0 to generate the external value of DBGACK seen at the periphery of ARM7TDMI. This allows the debug system to signal to the rest of the system that the core is still being debugged even when system-speed accesses are being performed (in which case the internal DBGACK signal from the core will be LOW).
If Bit 2 (INTDIS) is asserted, the interrupt enable signal (IFEN) of the core is forced LOW. Thus all interrupts (IRQ and FIQ) are disabled during debugging (DBGACK = 1) or if the INTDIS bit is asserted. The IFEN signal is driven according to the following table:
Table 41. IFEN Signal Control
DBGACK  INTDIS  IFEN
0       0       1
1       x       0
x       1       0
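The IFEN table reduces to a single logical expression: IFEN is HIGH only when both DBGACK and INTDIS are LOW. A minimal sketch, with a hypothetical function name:

```python
def ifen(dbgack, intdis):
    """Return the IFEN level (0 or 1) for the given DBGACK and INTDIS levels."""
    return 0 if (dbgack or intdis) else 1
```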
Debug Status Register format: bit 4 = TBIT, bit 3 = nMREQ, bit 2 = IFEN, bit 1 = DBGRQ, bit 0 = DBGACK.
The function of each bit in this register is as follows:
Bits 1 and 0 allow the values on the synchronised versions of DBGRQ and DBGACK to be read.
Bit 2 allows the state of the core interrupt enable signal (IFEN) to be read. Since the capture clock for the scan chain may be asynchronous to the processor clock, the DBGACK output from the core is synchronised before being used to generate the IFEN status bit.
Bit 3 allows the state of the nMREQ signal from the core (synchronised to TCK) to be read. This allows the debugger to determine that a memory access from the debug state has completed.
Bit 4 allows TBIT to be read. This enables the debugger to determine what state the processor is in, and hence which instructions to execute.
The structure of the debug status register bits is shown in Figure 87.
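A debugger that has scanned out the Debug Status Register could split it into named fields along these lines. The function name is hypothetical; the bit positions are those described above.

```python
def decode_debug_status(value):
    """Split a Debug Status Register value into its five documented bits."""
    return {
        "TBIT": (value >> 4) & 1,    # 1 = THUMB state, 0 = ARM state
        "nMREQ": (value >> 3) & 1,   # synchronised memory request signal
        "IFEN": (value >> 2) & 1,    # core interrupt enable
        "DBGRQ": (value >> 1) & 1,   # synchronised debug request
        "DBGACK": value & 1,         # synchronised debug acknowledge
    }
```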
Figure 87. Structure of TBIT, NMREQ, DBGACK, DBGRQ and INTDIS Bits
[Diagram not reproduced. Blocks: Debug Control Register, Debug Status Register. Inputs: TBIT, nMREQ and DBGACK from the core. Elements: synchronisers (Synch) and OR gates connecting control register bits 2, 1 and 0 to status register bits 4 to 0.]
Coupling Breakpoints and Watchpoints
Watchpoint units 1 and 0 can be coupled together via the CHAIN and RANGE inputs. The use of CHAIN enables watchpoint 0 to be triggered only if watchpoint 1 has previously matched. The use of RANGE enables simple range checking to be performed by combining the outputs of both watchpoints.
Example
Let
Av[31:0] be the value in the Address Value Register
Am[31:0] be the value in the Address Mask Register
A[31:0] be the Address Bus from the ARM7TDMI
Dv[31:0] be the value in the Data Value Register
Dm[31:0] be the value in the Data Mask Register
D[31:0] be the Data Bus from the ARM7TDMI
Cv[8:0] be the value in the Control Value Register
Cm[7:0] be the value in the Control Mask Register
C[9:0] be the combined Control Bus from the ARM7TDMI, other watchpoint registers and the EXTERN signal.
CHAINOUT signal
The CHAINOUT signal is then derived as follows:
WHEN ((({Av[31:0],Cv[4:0]} XNOR {A[31:0],C[4:0]}) OR {Am[31:0],Cm[4:0]}) == 0xFFFFFFFFF)
CHAINOUT = ((({Dv[31:0],Cv[7:5]} XNOR {D[31:0],C[7:5]}) OR {Dm[31:0],Cm[7:5]}) == 0x7FFFFFFFF)
The CHAINOUT output of watchpoint register 1 provides the CHAIN input to Watchpoint 0. This allows for quite complicated configurations of breakpoints and watchpoints.
Take for example the request by a debugger to breakpoint on the instruction at location YYY when running process XXX in a multiprocess system. If the current process ID is stored in memory, the above function can be implemented with a watchpoint and breakpoint chained together. The watchpoint address is set to a known memory location containing the current process ID, the watchpoint data is set to the required process ID and the ENABLE bit is set to off. The address comparator output of the watchpoint is used to drive the write enable for the CHAINOUT latch, the input to the latch being the output of the data comparator from the same watchpoint. The output of the latch drives the CHAIN input of the breakpoint comparator. The address YYY is stored in the breakpoint register and when the CHAIN input is asserted, and the breakpoint address matches, the breakpoint triggers correctly.
RANGEOUT signal
The RANGEOUT signal is then derived as follows:
RANGEOUT = ((({Av[31:0],Cv[4:0]} XNOR {A[31:0],C[4:0]}) OR {Am[31:0],Cm[4:0]}) == 0xFFFFFFFFF) AND ((({Dv[31:0],Cv[7:5]} XNOR {D[31:0],C[7:5]}) OR {Dm[31:0],Cm[7:5]}) == 0x7FFFFFFFF)
The RANGEOUT output of watchpoint register 1 provides the RANGE input to watchpoint register 0. This allows two breakpoints to be coupled together to form range breakpoints. Note that selectable ranges are restricted to being powers of 2. This is best illustrated by an example.
Example
If a breakpoint is to occur when the address is in the first 256 bytes of memory, but not in the first 32 bytes, the watchpoint registers should be programmed as follows:
1. Watchpoint 1 is programmed with an address value of 0x00000000 and an address mask of 0x0000001F. The ENABLE bit is cleared. All other Watchpoint 1 registers are programmed as normal for a breakpoint. An address within the first 32 bytes will cause the RANGE output to go HIGH but the breakpoint will not be triggered.
2. Watchpoint 0 is programmed with an address value of 0x00000000 and an address mask of 0x000000FF. The ENABLE bit is set and the RANGE bit programmed to match a 0. All other Watchpoint 0 registers are programmed as normal for a breakpoint. If Watchpoint 0 matches but Watchpoint 1 does not (i.e. the RANGE input to Watchpoint 0 is 0), the breakpoint will be triggered.
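The range example can be checked with a small model of the address comparator, using the same "(value XNOR bus) OR mask" form as the RANGEOUT expression. This sketch models only the 32-bit address comparison; the data and control comparisons are assumed to match, and the function names are hypothetical.

```python
ALL_ONES = 0xFFFFFFFF


def address_matches(value, mask, address):
    """True when every unmasked address bit equals the programmed value."""
    xnor = ~(value ^ address) & ALL_ONES
    return (xnor | mask) & ALL_ONES == ALL_ONES


def range_breakpoint(address):
    """Breakpoint on the first 256 bytes of memory, excluding the first 32."""
    range_in = address_matches(0x00000000, 0x0000001F, address)  # Watchpoint 1
    wp0_hit = address_matches(0x00000000, 0x000000FF, address)   # Watchpoint 0
    # Watchpoint 0 is programmed to require RANGE = 0.
    return wp0_hit and not range_in
```

Because a mask can only discard low-order address bits as a block in this scheme, both regions are aligned power-of-2 sizes, which is the restriction noted in the text.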
Disabling ICEBreaker
ICEBreaker may be disabled by wiring the DBGEN input LOW. When DBGEN is LOW, BREAKPT and DBGRQ to the core are forced LOW, DBGACK from the ARM7TDMI is also forced LOW and the IFEN input to the core is forced HIGH, enabling interrupts to be detected by ARM7TDMI. When DBGEN is LOW, ICEBreaker is also put into a low-power mode.
Programming Restriction
The ICEBreaker watchpoint units should only be programmed when the clock to the core is stopped. This can be achieved by putting the core into the debug state. The reason for this restriction is that if the core continues to run at ECLK rates when ICEBreaker is being programmed at TCK rates, it is possible for the BREAKPT signal to be asserted asynchronously to the core. This restriction does not apply if MCLK and TCK are driven from the same clock, or if it is known that the breakpoint or watchpoint condition can only occur some time after ICEBreaker has been programmed. Note: This restriction does not apply in any event to the Debug Control or Status Registers.
ICEBreaker Timing
The EXTERN1 and EXTERN0 inputs are sampled by ICEBreaker on the falling edge of ECLK. Sufficient set-up and hold time must therefore be allowed for these signals.
Debug Communications Channel
ARM7TDMI's ICEbreaker contains a communication channel for passing information between the target and the host debugger. This is implemented as coprocessor 14. The communications channel consists of a 32-bit wide Comms Data Read register, a 32-bit wide Comms Data Write Register and a 6-bit wide Comms Control Register for synchronised handshaking between the processor and the asynchronous debugger. These registers live in fixed locations in ICEbreaker's memory map (as shown in Table 39) and are accessed from the processor via MCR and MRC instructions to coprocessor 14.
The function of each bit of the Debug Comms Control register is described below:
Bits 31:28 contain a fixed pattern which denotes the ICEbreaker version number, in this case 0001.
Bit 1 (W) denotes whether the Comms Data Write register (from the processor's point of view) is free. From the processor's point of view, if the Comms Data Write register is free (W=0) then new data may be written. If it is not free (W=1), then the processor must poll until W=0. From the debugger's point of view, if W=1 then some new data has been written which may then be scanned out.
Bit 0 (R) denotes whether there is some new data in the Comms Data Read register. From the processor's point of view, if R=1, then there is some new data which may be read via an MRC instruction. From the debugger's point of view, if R=0 then the Comms Data Read register is free and new data may be placed there through the scan chain. If R=1, then this denotes that data previously placed there through the scan chain has not been collected by the processor and so the debugger must wait.
From the debugger's point of view, the registers are accessed via the scan chain in the usual way. From the processor, these registers are accessed via coprocessor register transfer instructions. The following instructions should be used:
MRC CP14, 0, Rd, C0, C0   Returns the Debug Comms Control register into Rd
MCR CP14, 0, Rn, C1, C0   Writes the value in Rn to the Comms Data Write register
MRC CP14, 0, Rd, C1, C0   Returns the Comms Data Read register into Rd
Since the THUMB instruction set does not contain coprocessor instructions, it is recommended that these are accessed via SWI instructions when in THUMB state.
Communications via the comms channel
Communication between the debugger and the processor occurs as follows. When the processor wishes to send a message to ICEbreaker, it first checks that the Comms Data Write register is free for use. This is done by reading the Debug Comms Control register to check that the W bit is clear. If it is clear then the Comms Data Write register is empty and a message is written by a register transfer to the coprocessor. The action of this data transfer automatically sets the W bit. If on reading the W bit it is found to be set, then this implies that previously written data has not been picked up by the debugger and thus the processor must poll until the W bit is clear.
As the data transfer occurs from the processor to the Comms Data Write register, the W bit is set in the Debug Comms Control register. When the debugger polls this register it sees a synchronised version of both the R and W bit. When the debugger sees that the W bit is set it can read the Comms Data Write register and scan the data out. The action of reading this data register clears the W bit of the Debug Comms Control register. At this point, the communications process may begin again.
Message transfer from the debugger to the processor is carried out in a similar fashion. Here, the debugger polls the R bit of the Debug Comms Control register. If the R bit is low then the Data Read register is free and so data can be placed there for the processor to read. If the R bit is set, then previously deposited data has not yet been collected and so the debugger must wait. When the Comms Data Read register is free, data is written there via the scan chain. The action of this write sets the R bit in the Debug Comms Control register. When the processor polls this register, it sees an MCLK synchronised version. If the R bit is set then this denotes that there is data waiting to be collected, and this can be read via a CPRT load. The action of this load clears the R bit in the Debug Comms Control register. When the debugger polls this register and sees that the R bit is clear, this denotes that the data has been taken and the process may now be repeated.
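The R and W handshake can be summarised as a small state model. This is a behavioural sketch only: the class and method names are hypothetical, and neither the scan chain nor clock synchronisation is modelled.

```python
ICEBREAKER_VERSION = 0b0001  # fixed pattern in bits 31:28


class CommsChannel:
    """Toy model of the Debug Comms Control R/W handshake."""

    def __init__(self):
        self.r = 0           # 1 while the Comms Data Read register holds unread data
        self.w = 0           # 1 while the Comms Data Write register holds unread data
        self.data_read = 0
        self.data_write = 0

    def control(self):
        """Value of the Debug Comms Control register."""
        return (ICEBREAKER_VERSION << 28) | (self.w << 1) | self.r

    def processor_send(self, value):
        """Processor writes the Comms Data Write register; fails while W=1."""
        if self.w:
            return False
        self.data_write = value & 0xFFFFFFFF
        self.w = 1
        return True

    def debugger_receive(self):
        """Debugger scans out the Comms Data Write register, clearing W."""
        if not self.w:
            return None
        self.w = 0
        return self.data_write

    def debugger_send(self, value):
        """Debugger writes the Comms Data Read register; fails while R=1."""
        if self.r:
            return False
        self.data_read = value & 0xFFFFFFFF
        self.r = 1
        return True

    def processor_receive(self):
        """Processor reads the Comms Data Read register, clearing R."""
        if not self.r:
            return None
        self.r = 0
        return self.data_read
```

A `False` or `None` return corresponds to the "must poll" and "must wait" cases in the description: the caller is expected to retry once the other side has collected the data.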
Introduction
In the following tables nMREQ and SEQ (which are pipelined up to one cycle ahead of the cycle to which they apply) are shown in the cycle in which they appear, so they predict the type of the next cycle. The address, MAS[1:0], nRW, nOPC, nTRANS and TBIT (which appear up to half a cycle ahead) are shown in the cycle to which they apply. The address is incremented for prefetching of instructions in most cases. Since the instruction width is 4 bytes in ARM state and 2 bytes in THUMB state, the increment will vary accordingly. Hence the letter L is used to indicate instruction length (4 bytes in ARM state and 2 bytes in THUMB state). Similarly, MAS[1:0] will indicate the width of the instruction fetch, i=2 in ARM state and i=1 in THUMB state representing word and halfword accesses respectively.
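The L and i conventions can be made concrete with a short sketch that computes the prefetch addresses pc+2L and pc+3L used throughout the tables. The function names are hypothetical.

```python
def fetch_parameters(thumb):
    """Return (L, i): instruction length in bytes and MAS[1:0] for fetches."""
    return (2, 1) if thumb else (4, 2)


def prefetch_addresses(pc, thumb, count=2):
    """Addresses pc+2L, pc+3L, ... for the given processor state."""
    length, _ = fetch_parameters(thumb)
    return [pc + (2 + k) * length for k in range(count)]
```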
pc is the address of the branch instruction
alu is an address calculated by ARM7TDMI
(alu) are the contents of that address
Note: This applies to branches in ARM and THUMB state, and to Branch with Link in ARM state only.
THUMB Branch with Link
A THUMB Branch with Link operation consists of two consecutive THUMB instructions, see Format 19: long branch with link. The first instruction acts like a simple data operation, taking a single cycle to add the PC to the upper part of the offset, storing the result in Register 14 (LR). The second instruction acts in a similar fashion to the ARM Branch with Link instruction, thus its first cycle calculates the final branch destination whilst performing a prefetch from the current PC.
The second cycle of the second instruction performs a fetch from the branch destination and the return address is stored in R14. The third cycle of the second instruction performs a fetch from the destination +2, refilling the instruction pipeline and R14 is modified (2 subtracted from it) to simplify the return to MOV PC, R14. This makes the PUSH {..,LR} ; POP {..,PC} type of subroutine work correctly. The cycle timings of the complete operation are shown in Table 43.
Table 43. THUMB Long Branch with Link
Cycle  Address  MAS[1:0]  nRW  Data       nMREQ  SEQ  nOPC
1      pc + 4   1         0    (pc + 4)   0      1    0
2      pc + 6   1         0    (pc + 6)   0      0    0
3      alu      1         0    (alu)      0      1    0
4      alu + 2  1         0    (alu + 2)  0      1    0
       alu + 4
During the second cycle, a fetch is performed from the branch destination using the new instruction width, dependent on the state that has been selected. The third cycle performs a fetch from the destination +2 or +4 dependent on the new specified state, refilling the instruction pipeline. The cycle timings are shown in Table 44.
Notes:
1. W and w represent the instruction width before and after the BX respectively. In ARM state the width equals 4 bytes and in THUMB state the width equals 2 bytes. For example, when changing from ARM to THUMB state, W would equal 4 and w would equal 2.
2. I and i represent the memory access size before and after the BX respectively. In ARM state, the MAS[1:0] is 2 and in THUMB state MAS[1:0] is 1. When changing from THUMB to ARM state, I would equal 1 and i would equal 2.
3. T and t represent the state of the TBIT before and after the BX respectively. In ARM state TBIT is 0 and in THUMB state TBIT is 1. When changing from ARM to THUMB state, T would equal 0 and t would equal 1.
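The three notes define six symbols whose values depend only on the state before and after the BX. A minimal sketch, with a hypothetical function name, that returns them together:

```python
def bx_symbols(from_thumb, to_thumb):
    """Return W/w, I/i and T/t for a BX between the given states."""
    def state(thumb):
        # (instruction width in bytes, MAS[1:0], TBIT)
        return (2, 1, 1) if thumb else (4, 2, 0)

    W, I, T = state(from_thumb)
    w, i, t = state(to_thumb)
    return {"W": W, "w": w, "I": I, "i": i, "T": T, "t": t}
```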
Data Operations
A data operation executes in a single datapath cycle except where the shift is determined by the contents of a register. A register is read onto the A bus, and a second register or the immediate field onto the B bus. The ALU combines the A bus source and the shifted B bus source according to the operation specified in the instruction, and the result (when required) is written to the destination register. (Compares and tests do not produce results, only the ALU status flags are affected.)
An instruction prefetch occurs at the same time as the above operation, and the program counter is incremented.
When the shift length is specified by a register, an additional datapath cycle occurs before the above operation to copy the bottom 8 bits of that register into a holding latch in the barrel shifter. The instruction prefetch will occur during this first cycle, and the operation cycle will be internal (i.e. will not request memory). This internal cycle can be merged with the following sequential access by the memory manager as the address remains stable through both cycles.
The PC may be one or more of the register operands. When it is the destination, external bus activity may be affected. If the result is written to the PC, the contents of the instruction pipeline are invalidated, and the address for the next instruction prefetch is taken from the ALU rather than the address incrementer. The instruction pipeline is refilled before any further execution takes place, and during this time exceptions are locked out.
PSR Transfer operations exhibit the same timing characteristics as the data operations except that the PC is never used as a source or destination register. The cycle timings are shown below in Table 45.
Table 45. Data Operation Instruction Cycle Operations
Variant            Cycle  Address  MAS[1:0]  nRW  Data     nMREQ  SEQ  nOPC
normal             1      pc+2L    i         0    (pc+2L)  0      1    0
                          pc+3L
dest=pc            1      pc+2L    i         0             0      0    0
                   2      alu      i         0             0      1    0
                   3      alu+L    i         0             0      1    0
                          alu+2L
shift(Rs)          1      pc+2L    i         0    (pc+2L)  1      0    0
                   2      pc+3L    i         0    -        0      1    1
                          pc+3L
shift(Rs) dest=pc  1      pc+8     2         0             1      0    0
                   2      pc+12    2         0             0      0    1
                   3      alu      2         0             0      1    0
                   4      alu+4    2         0             0      1    0
                          alu+8
Note: Shifted register with destination equals PC is not possible in THUMB state.
Multiply and Multiply Accumulate
The multiply instructions make use of special hardware which implements integer multiplication with early termination. All cycles except the first are internal. The cycle timings are shown in the following four tables, where m is the number of cycles required by the multiplication algorithm; see Instruction Speed Summary.
Load Register
The first cycle of a load register instruction performs the address calculation. The data is fetched from memory during the second cycle, and the base register modification is performed during this cycle (if required). During the third cycle the data is transferred to the destination register, and external memory is unused. This third cycle may normally be merged with the following prefetch to form one memory N-cycle. The cycle timings are shown below in Table 50.
Either the base or the destination (or both) may be the PC, and the prefetch sequence will be changed if the PC is affected by the instruction.
The data fetch may abort, and in this case the destination modification is prevented.
Table 50. Load Register Instruction Cycle Operations
Variant  Cycle  Address  MAS[1:0]  nRW  Data     nMREQ  SEQ  nOPC  nTRANS
normal   1      pc+2L    i         0    (pc+2L)  0      0    0     c
         2      alu      b/h/w     0    (alu)    1      0    1     d
         3      pc+3L    i         0             0      1    1     c
                pc+3L
dest=pc  1      pc+8     2         0             0      0    0     c
         2      alu                0             1      0    1     d
         3      pc+12    2         0             0      0    1     c
         4      pc       2         0             0      1    0     c
         5      pc+4     2         0             0      1    0     c
                pc+8
b, h and w are byte, halfword and word as defined in Table 40.
c represents current mode-dependent value.
d will either be 0 if the T bit has been specified in the instruction (e.g. LDRT), or c at all other times.
Note: Destination equals PC is not possible in THUMB state.
Store Register
The first cycle of a store register is similar to the first cycle of load register. During the second cycle the base modification is performed, and at the same time the data is written to memory. There is no third cycle. The cycle timings are shown below in Table 51.
Table 51. Store Register Instruction Cycle Operations
Cycle  Address  MAS[1:0]  nRW  Data     nMREQ  SEQ  nOPC  nTRANS
1      pc+2L    i         0    (pc+2L)  0      0    0     c
2      alu      b/h/w     1    Rd       0      0    1     d
       pc+3L
b, h and w are byte, halfword and word as defined in Table 40.
c represents current mode-dependent value.
d will either be 0 if the T bit has been specified in the instruction (e.g. STRT), or c at all other times.
Load Multiple Registers
The first cycle of LDM is used to calculate the address of the first word to be transferred, whilst performing a prefetch from memory. The second cycle fetches the first word, and performs the base modification. During the third cycle, the first word is moved to the appropriate destination register while the second word is fetched from memory, and the modified base is latched internally in case it is needed to patch up after an abort. The third cycle is repeated for subsequent fetches until the last data word has been accessed, then the final (internal) cycle moves the last word to its destination register. The cycle timings are shown in Table 52. The last cycle may be merged with the next instruction prefetch to form a single memory N-cycle. If an abort occurs, the instruction continues to completion, but all register writing after the abort is prevented. The final cycle is altered to restore the modified base register (which may have been overwritten by the load activity before the abort occurred). When the PC is in the list of registers to be loaded the current instruction pipeline must be invalidated. Note: The PC is always the last register to be loaded, so an abort at any point will prevent the PC from being overwritten. Note: LDM with destination = PC cannot be executed in THUMB state. However POP{Rlist,PC} equates to an LDM with destination=PC.
1 register dest=pc
1 2 3 4 5
i 2 i i i
0 0 0 0 0
0 1 0 0 0
0 0 0 1 1
0 1 1 0 0
n registers (n>1)
1 2 n n+1 n+2
i 2 2 2 2 i
0 0 0 0 0 0
0 0 0 0 1 0
0 1 1 1 0 1
0 1 1 1 1 1
i 2 2 2 2 i i i
0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0
0 1 1 1 0 0 1 1
0 1 1 1 1 1 0 0
forward here, as there is no wholesale overwriting of registers. The cycle timings are shown in Table 53 below.
nRW 0 1 Data (pc+2L) Ra nMREQ 0 0 SEQ 0 0 nOPC 0 1
n registers (n>1)
1 2 n n+1
i 2 2 2 2
0 1 1 1 1
(pc+2L) Ra R R R
0 0 0 0 0
0 1 1 1 0
0 1 1 1 1
Data Swap
This is similar to the load and store register instructions, but the actual swap takes place in cycles 2 and 3. In the second cycle, the data is fetched from external memory. In the third cycle, the contents of the source register are written out to the external memory. The data read in cycle 2 is written into the destination register during the fourth cycle. The cycle timings are shown below in Table 54.
Table 54. Data Swap Instruction Cycle Operations
Cycle  Address  MAS[1:0]  nRW  Data    nMREQ  SEQ  nOPC  LOCK
1      pc+8     2         0    (pc+8)  0      0    0     0
2      Rn       b/w       0    (Rn)    0      0    1     1
3      Rn       b/w       1    Rm      1      0    1     1
4      pc+12    2         0    -       0      1    1     0
       pc+12
The LOCK output of ARM7TDMI is driven HIGH for the duration of the swap operation (cycles 2 and 3) to indicate that both cycles should be allowed to complete without interruption. The data swapped may be a byte or word quantity (b/w). The swap operation may be aborted in either the read or write cycle, and in both cases the destination register will not be affected.
b and w are byte and word as defined in Table 40. Note: Data swap cannot be executed in THUMB state.
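The register-level effect of the swap, including the abort rule, can be sketched as follows. The function name and the dictionary memory are hypothetical, and leaving memory unchanged on a write abort is an assumption of this sketch; the text only states that the destination register is unaffected.

```python
def swap(memory, address, rm, rd, read_abort=False, write_abort=False):
    """Return the destination register value after a data swap.

    rm is the source register value, rd the current destination value.
    """
    if read_abort:
        return rd                # cycle 2 aborted: destination unaffected
    loaded = memory[address]     # cycle 2: read from memory
    if write_abort:
        return rd                # cycle 3 aborted: destination unaffected
    memory[address] = rm         # cycle 3: write the source register
    return loaded                # cycle 4: destination receives the old data
```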
Software Interrupt and Exception Entry
Exceptions (and software interrupts) force the PC to a particular value and refill the instruction pipeline from there. During the first cycle the forced address is constructed, and a mode change may take place. The return address is moved to R14 and the CPSR to SPSR_svc.
During the second cycle the return address is modified to facilitate return, though this modification is less useful than in the case of branch with link.
The third cycle is required only to complete the refilling of the instruction pipeline. The cycle timings are shown below in Table 55.
Table 55. Software Interrupt Instruction Cycle Operations
Cycle  Address  MAS[1:0]  nRW  Data     nMREQ  SEQ  nOPC  nTRANS  Mode            TBIT
1      pc+2L    i         0    (pc+2L)  0      0    0     C       old mode        T
2      Xn       2         0    (Xn)     0      1    0     1       exception mode  0
3      Xn+4     2         0    (Xn+4)   0      1    0     1       exception mode  0
       Xn+8
C represents the current mode-dependent value.
T represents the current state-dependent value.
pc for software interrupts is the address of the SWI instruction.
pc for exceptions is the address of the instruction following the last one to be executed before entering the exception.
pc for prefetch aborts is the address of the aborting instruction.
pc for data aborts is the address of the instruction following the one which attempted the aborted data transfer.
Xn is the appropriate trap address.
not ready
1 2 n
0 0 0 0
2 2 2 2
(pc+8) -
1 1 1 0
0 0 0 0
0 1 1 1
0 0 0 0
0 0 0 0
1 1 1 0
1 2 n n+1
2 2 2 2 2
0 0 0 0 0
(pc+8) (alu)
1 1 1 0 0
0 0 0 0 0
0 1 1 1 1
0 0 0 0 1
0 0 0 0 1
1 1 1 0 1
1 2 n n+1
2 2 2 2 2
0 0 0 0 0
0 0 0 0 0
0 1 1 1 0
0 1 1 1 1
0 1 1 1 1
0 0 0 0 1
0 0 0 0 1
2 2 2 2 2 2 2
0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0
0 0 0 0 1 1 1 0
0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1
1 1 1 0 0 0 0 1
Coprocessor Data Transfer (from coprocessor to memory)
The ARM7TDMI controls these instructions exactly as for memory to coprocessor transfers, with the one exception that the nRW line is inverted during the transfer cycle. The cycle timings are shown in Table 58.
Table 58 values, listed per signal in cycle order:
Cycle: 1 register ready 1 2; 1 register not ready 1 2 n n+1; 1 2 n n+1; 1 2 n n+1 n+m n+m+1
Address: pc+8 alu pc+12 pc+8 pc+8 pc+8 pc+8 alu pc+12 pc+8 alu alu+ alu+ alu+ pc+12 pc+8 pc+8 pc+8 pc+8 alu alu+ alu+ alu+ pc+12
MAS[1:0]: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
nRW: 0 1 0 0 0 0 1 0 1 1 1 1 0 0 0 0 1 1 1 1
Data: (pc+8) CPdata (pc+8) CPdata (pc+8) CPdata CPdata CPdata CPdata (pc+8) CPdata CPdata CPdata CPdata
nMREQ: 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
SEQ: 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0
nOPC: 0 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1
nCPI: 0 1 0 0 0 0 1 0 1 1 1 1 0 0 0 0 1 1 1 1
CPA: 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1
CPB: 0 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1
third cycle may be merged with the following prefetch cycle into one memory N-cycle as with all ARM7TDMI register load instructions. The cycle timings are shown in Table 59.
nMREQ 1 1 0 SEQ 1 0 1 nOPC 0 1 1 nCPI 0 1 1 CPA 0 1 CPB 0 1 -
not ready
1 2 n n+1 n+2
2 2 2 2 2 2
0 0 0 0 0 0
(pc+8) CPdata -
1 1 1 1 1 0
0 0 0 1 0 1
0 1 1 1 1 1
0 0 0 0 1 1
0 0 0 0 1 -
1 1 1 0 1 -
not ready
1 2 n n+1
2 2 2 2 2
0 0 0 0 1
(pc+8) Rd
1 1 1 1 0
0 0 0 1 0
0 1 1 1 1
0 0 0 0 1
0 0 0 0 1
1 1 1 0 1
Undefined Instructions and Coprocessor Absent
When a coprocessor detects a coprocessor instruction which it cannot perform, and this must include all undefined instructions, it must not drive CPA or CPB LOW. These will remain HIGH, causing the undefined instruction trap to be taken. Cycle timings are shown in Table 61.
Table 61. Undefined Instruction Cycle Operations
Cycle  Address  MAS[1:0]  nRW  Data     nMREQ  SEQ  nOPC  nCPI  CPA  CPB  nTRANS  Mode   TBIT
1      pc+2L    i         0    (pc+2L)  1      0    0     0     1    1    C       Old    T
2      pc+2L    i         0    -        0      0    0     1     1    1    C       Old    T
3      Xn       2         0    (Xn)     0      1    0     1     1    1    1       00100  0
4      Xn+4     2         0    (Xn+4)   0      1    0     1     1    1    1       00100  0
       Xn+8
C represents the current mode-dependent value. T represents the current state-dependent value.
Unexecuted Instructions
Any instruction whose condition code is not met will fail to execute. It will add one cycle to the execution time of the code segment in which it is embedded (see Table 62).
Table 62. Unexecuted Instruction Cycle Operations
Cycle  Address  MAS[1:0]  nRW  Data     nMREQ  SEQ  nOPC
1      pc+2L    i         0    (pc+2L)  0      1    0
       pc+3L
Timing Diagrams
This section presents the timing diagrams for the ARM7TDMI Core. The delays shown in these timing diagrams are all process specific. For the corresponding characterized values, refer to one of the following datasheets:
ARM7TDMI Embedded Core ATC50 Electrical Characteristics (0.5 micron three-layer-metal CMOS process intended for use with a supply voltage of 3.3V ± 0.3V, previously known as AT55K)
ARM7TDMI Embedded Core ATC50/E2 Electrical Characteristics (0.5 micron three-layer-metal CMOS/NVM process intended for use with a supply voltage of 3.3V ± 0.3V, previously known as AT55.8K)
ARM7TDMI Embedded Core ATC35 Electrical Characteristics (0.35 micron three-layer-metal CMOS process intended for use with a supply voltage of 3.3V ± 0.3V, previously known as AT56K)
Figure 89. General Timings
[Diagram not reproduced. Signals and timing parameters: MCLK; A[31:0] (Tah, Taddr); nRW (Trwh, Trwd); MAS[1:0], LOCK (Tblh, Tbld); nM[4:0], nTRANS, TBIT (Tmdh, Tmdd); nOPC (Topch, Topcd); nMREQ, SEQ (Tmsh, Tmsd); nEXEC (Texh, Texd).]
Note:
nWAIT, APE, ALE and ABE are all HIGH during the cycle shown. Tcdel is the delay (on either edge) from MCLK changing to ECLK changing.
Figure 90. ALE Address Control
[Diagram not reproduced. Signals: MCLK. Timing parameter: Tald.]
Note: Tald is the time by which ALE must be driven LOW in order to latch the current address in phase 2. If ALE is driven LOW after Tald, then a new address will be latched.
[Diagram not reproduced. Signals: MCLK, ABE, A[31:0], nRW, LOCK, nOPC, nTRANS, MAS[1:0]. Timing parameters: Tabz, Tabe, Taddr.]
Figure 95. Data Bus Control
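[Diagram not reproduced. Signals: MCLK, nENIN. Timing parameter: Tdbe.]
Note: The cycle shown is a data write cycle since nENOUT was driven LOW during phase 1. Here, DBE has first been used to modify the behaviour of the data bus, and then nENIN.
[Diagram not reproduced. Signals: MCLK, TBE. Timing parameters: Ttbz, Ttbe.]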
[Diagram not reproduced. Signals: MCLK, nENOUT. Timing parameters: Tnen, Tdoutu.]
Figure 100. Coprocessor Timing
[Diagram not reproduced. Signals: MCLK, nCPI, CPA, CPB. Timing parameters: Tcpi, Tcpih, Tcps, Tcph.]
Note:
Normally, nMREQ and SEQ become valid Tmsd after the falling edge of MCLK. In this cycle the ARM has been busy-waiting, waiting for a coprocessor to complete the instruction. If CPA and CPB change during phase 1, the timing of nMREQ and SEQ will depend on Tcpms. Most systems should be able to generate CPA and CPB during the previous phase 2, and so the timing of nMREQ and SEQ will always be Tmsd.
Note:
Tis/Trs guarantee recognition of the interrupt (or reset) source by the corresponding clock edge. Tim/Trm guarantee non-recognition by that clock edge. These inputs may be applied fully asynchronously where the exact cycle of recognition is unimportant.
Note:
BREAKPT changing in the LOW phase of MCLK to signal a watchpointed store can affect nCPI, nEXEC, nMREQ, and SEQ in the LOW phase of MCLK.
Figure 105. MCLK Timing
[Diagram not reproduced. Signals: MCLK, nWAIT, ECLK. Timing parameters: Tmckl, Tmckh, Tws, Twh.]
Note:
The ARM core is not clocked by the HIGH phase of MCLK enveloped by nWAIT. Thus, during the cycles shown, nMREQ and SEQ change once, during the first LOW phase of MCLK, and A[31:0] change once, during the second HIGH phase of MCLK. For reference, ph2 is shown. This is the internal clock from which the core times all its activity. This signal is included to show how the high phase of the external MCLK has been removed from the internal core clock.
© Atmel Corporation 1999. Atmel Corporation makes no warranty for the use of its products, other than those expressly contained in the Company's standard warranty which is detailed in Atmel's Terms and Conditions located on the Company's website. The Company assumes no responsibility for any errors which may appear in this document, reserves the right to change devices or specifications detailed herein at any time without notice, and does not make any commitment to update the information contained herein. No licenses to patents or other intellectual property of Atmel are granted by the Company in connection with the sale of Atmel products, expressly or by implication. Atmel's products are not authorized for use as critical components in life support devices or systems.
Marks bearing ® and/or ™ are registered trademarks and trademarks of Atmel Corporation. Printed on recycled paper.
Rev. 0673B-01/99