Assembly Language Fundamentals
Constants and expressions
A numeric literal is a combination of digits with an optional sign, decimal point, and exponent. Examples of numeric literals:
5 5.5 -5.5 26.5E+05
Integer constants can end with an uppercase or lowercase radix (base) suffix: h = hexadecimal, q (or o) = octal, d = decimal (the default), b = binary. A constant expression is a well-defined combination of numeric literals, operators, and defined symbolic constants. Examples of integer constants and constant expressions:
Integer constants: 26 (decimal), 1Ah (hexadecimal), 1101b (binary), 36D (decimal)
Constant expressions: 5, 26.5, 4 * 20, -3 * 4 / 6
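The following sketch shows how the assembler reads these forms. The file name radix.asm is hypothetical. The program writes the value 26 in each radix and loads two constant expressions into registers. If you step through it in a debugger, AL should show 1Ah after each of the four loads. It assumes the same small-model DOS setup as the sample programs later in this section.

```asm
title Radix demo (radix.asm)
; Sketch: the value 26 written in four radixes, plus two constant expressions.
.model small
.stack 100h
.data
val1 db 26              ; decimal (the default radix)
val2 db 1Ah             ; hexadecimal
val3 db 32q             ; octal: 3*8 + 2 = 26
val4 db 11010b          ; binary: 16 + 8 + 2 = 26
val5 db 0FFh            ; a hex constant that begins with a letter needs a leading 0,
                        ; otherwise FFh would be read as a name
.code
main proc
    mov ax, @data
    mov ds, ax              ; Set DS to point to data segment
    mov al, val1            ; AL = 1Ah
    mov al, val2            ; AL = 1Ah
    mov al, val3            ; AL = 1Ah
    mov al, val4            ; AL = 1Ah
    mov ax, 4 * 20          ; evaluated by the assembler, not the CPU: AX = 80 (50h)
    mov bx, -3 * 4 / 6      ; (-3 * 4) / 6 = -2, so BX = 0FFFEh
    mov ax, 4C00h           ; Terminate program function
    int 21h
main endp
end main
```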
String and character constants are enclosed in single or double quotes. A string may contain quotes of the other kind. Examples of string and character constants:
'ABC' 'X' "This is a message" '4042' "This isn't a test" 'Say "hello" to John'
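You can try the quoting rules with a short program like the sketch below. The name strings.asm is hypothetical. It prints two of the example strings using DOS function 9, the same function Sample Program 1 uses, so each printed string ends with 13, 10, '$'. It also shows that '4042' is stored as four character codes, not as a number.

```asm
title String constants (strings.asm)
; Sketch: string and character constants with embedded quotes.
.model small
.stack 100h
.data
Msg1   db "This isn't a test", 13, 10, '$'    ; single quote inside double quotes
Msg2   db 'Say "hello" to John', 13, 10, '$'  ; double quotes inside single quotes
Ch1    db 'X'                                 ; one character: the byte 58h
Digits db '4042'                              ; four bytes 34h 30h 34h 32h, not the number 4042
.code
main proc
    mov ax, @data
    mov ds, ax                  ; Set DS to point to data segment
    mov ah, 9                   ; Print string function
    mov dx, OFFSET Msg1
    int 21h                     ; Display Msg1
    mov ah, 9                   ; Reload the function number before the next call
    mov dx, OFFSET Msg2
    int 21h                     ; Display Msg2
    mov ax, 4C00h               ; Terminate program function
    int 21h
main endp
end main
```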
A variable is a location in a program's data area that has been assigned a name. For example:
count1 db 50 ; count1 is a variable (memory allocation)
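A variable's name stands for its location in memory. In the sketch below (vars.asm is a hypothetical name), using count1 as an operand reads or updates the byte stored there. OFFSET count1 gives the address of that byte instead.

```asm
title Variable demo (vars.asm)
; Sketch: a named memory location versus its address.
.model small
.stack 100h
.data
count1 db 50                    ; one byte, initialized to 50 (32h)
.code
main proc
    mov ax, @data
    mov ds, ax                  ; Set DS to point to data segment
    mov al, count1              ; AL = 50: the contents of count1
    inc count1                  ; the byte in memory becomes 51
    mov bx, OFFSET count1       ; BX = address (offset) of count1, not its value
    mov al, [bx]                ; AL = 51, read through the address in BX
    mov ax, 4C00h               ; Terminate program function
    int 21h
main endp
end main
```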
A label serves as a place marker when a program needs to jump or loop from one location to another. A label can stand on a line by itself or share a line with an instruction. In the following example, Label1 and Label2 are labels identifying locations in a program:
Label1: mov ax, 0
        mov bx, 0
        :
        :
Label2: jmp
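The sketch below (labels.asm is a hypothetical name) shows both label placements. AddLoop sits on a line by itself and serves as the loop target. Done shares a line with an instruction and serves as a jump target. The program adds 5 + 4 + 3 + 2 + 1 with the LOOP instruction, which decrements CX and jumps while CX is not zero. The JCXZ guard is there because LOOP starting with CX = 0 would wrap around and repeat 65,536 times.

```asm
title Label demo (labels.asm)
; Sketch: labels as loop and jump targets.
.model small
.stack 100h
.code
main proc
    mov ax, 0           ; running total
    mov cx, 5           ; loop counter
    jcxz Done           ; skip the loop if the counter starts at 0
AddLoop:                ; label on a line by itself
    add ax, cx          ; add the current counter value to the total
    loop AddLoop        ; CX = CX - 1, then jump to AddLoop while CX is not 0
Done: mov bx, ax        ; label sharing a line with an instruction; BX = 15 (0Fh)
    mov ax, 4C00h       ; Terminate program function
    int 21h
main endp
end main
```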
A keyword (reserved word) always has a predefined meaning to the assembler. It can be an instruction mnemonic, a directive, or a register name. Examples are MOV, ADD, PROC, TITLE, END, and AX. A keyword cannot be reused as a name of your own, such as a label. In the following example, using add as a label is a syntax error:
add: mov ax, 10 ; Error because add cannot be used as label
Statements
An assembly language statement is either an instruction (an executable statement) or a directive (which tells the assembler how to generate code). Its general form is:
[<label:>] <mnemonic> [<operands>] [; <comment>]
Statements are free-form, with white space separating the fields. A statement cannot be longer than 128 characters, but it can be continued on the following line if the last character is a backslash (\). Instructions fall into categories such as:
    transfer of control
    data transfer
    arithmetic
    logical (jump if Zero flag was set)
    input/output (reads from hardware port)
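The sketch below gives one example instruction per category. The procedure MySub, the label Next1, and the port number 21h are illustrative assumptions and not part of the original. Each line follows the [<label:>] <mnemonic> [<operands>] [; <comment>] form shown above.

```asm
title Instruction categories (categories.asm)
; Sketch: one illustrative instruction per category.
.model small
.stack 100h
.code
main proc
    call MySub          ; transfer of control
    mov  ax, 5          ; data transfer
    add  ax, 20         ; arithmetic: AX = 25
    sub  ax, 25         ; arithmetic: AX = 0, which sets the Zero flag
    jz   Next1          ; logical (jump if Zero flag was set); taken here
Next1:
    in   al, 21h        ; input/output (reads from hardware port 21h)
    mov  ax, 4C00h      ; Terminate program function
    int  21h
main endp

MySub proc              ; hypothetical empty subroutine
    ret
MySub endp
end main
```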
Sample Program 1
The following example shows an assembly program that displays the traditional "Hello World" message. The first line contains the TITLE directive. The rest of that line is treated as a comment, as is the next line, which begins with a semicolon. Segments are the building blocks of programs: the code segment holds the program instructions, the data segment contains the variables, and the stack segment contains the program's runtime stack. The stack is a special area in memory that the program uses when calling and returning from subroutines. Example 1: The Hello World program:
title Hello World program (hello.asm)
; This program displays "Hello, World"
.model small                        ; The '.' precedes assembler directives
.stack 100h                         ; Allocate stack size
.data
HelloMess db 'Hello, World',13,10,'$'
.code
main proc
    mov ax, @data
    mov ds, ax                      ; Set DS to point to data segment
    mov ah, 9                       ; Print string function
    mov dx, OFFSET HelloMess        ; Point to "Hello, World"
    int 21h                         ; Display "Hello, World"
    mov ax, 4C00h                   ; Terminate program function (AH = 4Ch, AL = return code 0)
    int 21h                         ; Terminate the program
main endp
end main
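The hypothetical variation of Example 1 below (dollar.asm) shows why the '$' terminator matters. DOS function 9 prints characters until it reaches the first '$', so the second half of Msg is never displayed. For the same reason, function 9 cannot print a '$' itself. The commented-out line shows why the data segment address is loaded in two steps.

```asm
title Terminator demo (dollar.asm)
; Sketch: function 9 stops at the first '$'.
.model small
.stack 100h
.data
Msg db 'Shown', 13, 10, '$', 'Never shown', 13, 10, '$'
.code
main proc
    ; mov ds, @data         ; not allowed: a segment register cannot take an immediate value
    mov ax, @data           ; so load the segment address into AX first
    mov ds, ax              ; then copy it into DS
    mov ah, 9               ; Print string function: displays the string at DS:DX
    mov dx, OFFSET Msg
    int 21h                 ; prints "Shown" and a newline, then stops at the first '$'
    mov ax, 4C00h           ; AH = 4Ch (terminate), AL = 00h (return code)
    int 21h
main endp
end main
```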
Description of important lines in the program: The .model small directive indicates that the program uses no more than 64K of memory for code and 64K for data. The .stack directive sets aside 100h (256) bytes of stack space for the program. The .data directive marks the beginning of the data segment, where variables are stored.
The HelloMess variable holds the string "Hello, World" followed by two bytes containing the carriage return and line feed characters (13, 10), which move the cursor to the start of the next line. The '$' is the string terminator required by the DOS string-output function used later. The .code directive marks the beginning of the code segment, where the executable instructions are located. The proc directive declares the beginning of a procedure; here, the procedure is called main.
The first two statements in the main procedure copy the address of the data segment (@data) into the DS register. Two instructions are needed because a segment register cannot be loaded directly with an immediate value, so the value passes through AX. The mov instruction always has two operands: first the destination, then the source (i.e., mov <destination>, <source>). Next, a character string is written to the screen by calling the DOS function that displays a string whose address is in the DX register. First, the function number (9) is placed in the AH register. Then, the offset address of the start of the string (indicated by the variable HelloMess) is copied to the DX register. Finally, the int 21h instruction calls the DOS services through interrupt vector 21h. The last two statements (mov ax, 4C00h and int 21h) place function number 4Ch in AH and return code 0 in AL, which halts the program and returns control to the operating system.
The statement main endp uses the endp directive to mark the end of the procedure main. Procedures may not overlap. The program ends with the end directive, which is the last line to be assembled. The label main next to it identifies the entry point, that is, the point at which the CPU starts to execute the program.
Standard Assembly Directives
Directive   Description
end         End of program assembly (Required)
endp        End of procedure (Required by proc directive)
page        Set a page format for the listing file (Optional)
proc        Begin procedure (Optional)
title       Title of the listing file (Optional)
.code       Marks the start of the code segment (Required)
.data       Marks the start of the data segment (Required)
.model      Specifies the program's memory model (Highly recommended)
.stack      Sets the size of the stack segment (Required)
A text editor is used to produce the ASCII source file. The assembler reads the source file and produces an object file, which is a machine-language translation of the program. The object file may contain references to subroutines in an external link library. The linker then copies the needed subroutines from the link library, combines them with the object file, creates a special header record at the beginning of the program, and produces an executable program. The assembler can optionally produce a listing file, which is a copy of the program's source file (suitable for printing) with line numbers and the translated machine code. The linker can optionally produce a map file, which contains information about the program's code, data, and stack segments. A link library is a file containing subroutines that are already compiled into machine language. The table below lists the files that would be created if we assembled and linked the program above.
Filename    Description          When/how created
hello.asm   Source program       Text editor
hello.obj   Object program       Assembler
hello.lst   Listing file         Assembler
hello.exe   Executable program   Linker
hello.map   Map file             Linker
With Borland Turbo Assembler (TASM), the command to assemble the program would be:
C:\> tasm /l/n/z hello
The /l (slash el) option produces a listing file, /n suppresses the symbol table in that listing, and /z displays the source line that caused each error. If there are no assembly errors, the screen output during assembly may look like this:
Turbo Assembler  Version 4.1  Copyright (c) 1988, 1996 Borland International
Assembling file:   hello.ASM
Error messages:    None
Warning messages:  None
Passes:            1
Remaining memory:  418k
This will produce the object file hello.obj and the listing file hello.lst. To link the object file, use the command:
C:\> tlink /3/m/v hello
This will produce the executable file hello.exe and the map file hello.map. The /3 option allows the use of 32-bit registers, the /m option creates a map file, and the /v option includes debugging information in the executable program. To run the program, simply type:
C:\> hello
Using Microsoft Assembler (MASM)
The Microsoft Assembler package contains the ML.EXE program, which assembles and links one or more assembly language source files, producing an object file (*.obj) and an executable file (*.exe). The general syntax is:
ML options filename.ASM
Each command-line option must be preceded by at least one space. For example, the following commands assemble and link hello.asm with different options:
ML /Zi hello.asm     ; include debugging information
ML /Fl hello.asm     ; produce a listing file (hello.lst)
ML /Fm hello.asm     ; produce a map file (hello.map)
ML /Zm hello.asm     ; use MASM 5.1 compatibility mode
The following command assembles hello.asm and links hello.obj to the link library linkfile.lib in the C:\MASM directory:
ML /Zi /Zm /Fm /Fl hello.asm /link /co c:\MASM\linkfile
Sample Program 2
Create the source file reverse.asm containing the following program, which displays a user-entered string in reverse. Assemble and link it, then run the executable in the CodeView debugger. Example 2: The Reverse String program:
title Reverse String program (reverse.asm)
; This program displays a user-entered string in reverse
.model small
.stack 100h
.data
MAX_STRING_LENGTH EQU 1000
StringToReverse   DB MAX_STRING_LENGTH DUP(?)
ReverseString     DB MAX_STRING_LENGTH DUP(?)
.code
main proc
    mov ax, @Data
    mov ds, ax                      ; Set DS to point to data segment
    mov ah, 3Fh                     ; Read from handle function #
    mov bx, 0                       ; Standard input handle
    mov cx, MAX_STRING_LENGTH       ; Read up to MAX chars
    mov dx, OFFSET StringToReverse  ; Store string here
    int 21h                         ; Get the string
    and ax, ax                      ; Read any characters?
    jz Done                         ; No, so done
    mov cx, ax                      ; Put string length in CX, where
                                    ; you can use it as a counter
    push cx                         ; Save the string length
    mov bx, OFFSET StringToReverse
    mov si, OFFSET ReverseString
    add si, cx
    dec si                          ; Point to the end of the
                                    ; reverse string buffer
ReverseLoop: mov al, [bx]           ; Get the next character
    mov [si], al                    ; Store the characters in reverse
    inc bx                          ; Point to the next character
    dec si                          ; Point to previous location in reverse buffer
    loop ReverseLoop                ; Move next character, if any
    pop cx                          ; Get back the string length
    mov ah, 40h                     ; Write to handle function #
    mov bx, 1                       ; Standard output handle
    mov dx, OFFSET ReverseString
    int 21h                         ; Print the reversed string
Done: mov ax, 4C00h                 ; Terminate program function (return code 0)
    int 21h                         ; Terminate the program
main endp
end main
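When you type a line at the keyboard, function 3Fh returns the characters followed by a carriage return and line feed (13, 10) and counts both bytes in AX. As a result, Example 2 prints those two bytes (reversed) before the reversed text. The hypothetical variant below (revtrim.asm) is a sketch that assumes the input line ends with CR LF. It checks the carry flag for a read error, drops the CR LF before reversing, and writes its own CR LF afterward.

```asm
title Reverse String, CR/LF trimmed (revtrim.asm)
; Sketch: hypothetical variant of Example 2; assumes input ends with CR LF.
.model small
.stack 100h
.data
MAX_STRING_LENGTH EQU 1000
StringToReverse   DB MAX_STRING_LENGTH DUP(?)
ReverseString     DB MAX_STRING_LENGTH DUP(?)
NewLine           DB 13, 10
.code
main proc
    mov ax, @data
    mov ds, ax                      ; Set DS to point to data segment
    mov ah, 3Fh                     ; Read from handle function #
    mov bx, 0                       ; Standard input handle
    mov cx, MAX_STRING_LENGTH
    mov dx, OFFSET StringToReverse
    int 21h                         ; AX = bytes read, including the CR LF
    jc Done                         ; Carry set means the read failed
    cmp ax, 2                       ; Anything besides CR LF?
    jbe Done                        ; No, so done
    sub ax, 2                       ; Assumption: the last two bytes are CR LF
    mov cx, ax
    push cx                         ; Save the trimmed length
    mov bx, OFFSET StringToReverse
    mov si, OFFSET ReverseString
    add si, cx
    dec si                          ; Point to the end of the reverse buffer
ReverseLoop: mov al, [bx]
    mov [si], al
    inc bx
    dec si
    loop ReverseLoop
    pop cx                          ; CX = number of characters to write
    mov ah, 40h                     ; Write to handle function #
    mov bx, 1                       ; Standard output handle
    mov dx, OFFSET ReverseString
    int 21h                         ; Print the reversed text
    mov ah, 40h                     ; Write the CR LF pair after it
    mov bx, 1
    mov cx, 2
    mov dx, OFFSET NewLine
    int 21h
Done: mov ax, 4C00h                 ; Terminate program function (return code 0)
    int 21h
main endp
end main
```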