SS Material

1 – Overview of System Software
Software
A set of instructions to perform specific tasks is called a program, and a collection of one or more
programs for a specific purpose is termed computer software or, simply, software. These
instructions can be internal commands or external inputs received from devices such as a
mouse or keyboard.
Characteristics:
The software characteristics include performance, portability, and functionality. Developing any
software requires the understanding of the following software quality factors:
Operational characteristics: These include characteristics such as correctness,
usability/learnability, integrity, reliability, efficiency, security, and safety.
Transitional characteristics: These include interoperability, reusability, and portability.
Revision characteristics: These are characteristics related to 'interior quality' of software
such as efficiency, documentation, and structure. Various revision characteristics of
software are maintainability, flexibility, extensibility, scalability, testability, and modularity.
Software Hierarchy
Broadly, computer software includes various computer programs, system libraries and their
associated documentation. Based on the nature of the task and goal, computer software can be
classified into application software, utility software, and system software.
[Figure: Software hierarchy: the user interacts with application software and utility software, which run above system software (including the OS), which in turn controls the hardware.]
Application software: This software acts as a tool and enables the end user to perform
specific and productive tasks. There are different
types of application software based on the range of tasks performed by the computer.
Utility software: This software is designed for users to assist in maintenance and monitoring
activities. These include anti-virus software, firewalls, and disk utilities. They help maintain
and protect the system but do not directly interface with the hardware.
System software: System software can be viewed as software that logically binds
components of a computer to work as a single unit and provides the infrastructure over
which programs can operate. It is responsible for controlling computer hardware and other
resources, and allows the application software to interact with computers to perform their
tasks. System software includes operating system, device drivers, language translators, etc.
Some specific system software are assemblers, linkers, loaders, macro processors, text
editors, compilers, operating system, debugging system, source code control system, etc.
System Programming
System programming is characterized by the fact that it is aimed at producing system software that
provides services to the computer hardware or specialized system services. Often, system
programming deals directly with peripheral devices, with a focus on input, process (storage),
and output.
The essential characteristics of system programming are as follows:
Programmers are expected to know the hardware and internal behavior of the computer
system on which the program will run. System programmers exploit these known hardware
properties and write software for specific hardware using efficient algorithms.
Uses a low-level programming language or some programming dialect.
Requires low runtime overhead and can execute in a resource-constrained environment.
These are very efficient programs with small or no runtime library requirements.
Has access to systems resources, including memory
Can be written in assembly language
The following are the limiting factors of system programming:
Many times, system programs cannot be run in debugging mode.
Limited programming facility is available, which requires high skills for the system
programmer.
Less powerful runtime library (if available at all), with less error-checking capabilities.
Machine Structure
A generic computer system comprises hardware components, collection of system programs, and a
set of application programs.
[Figure: Layers of a computer system: application programs (hospital management system, banking system, web browser); system programs (compilers, editors, interpreters, command interpreter); the operating system; and hardware (machine language, microarchitecture, physical devices).]
The traditional Von Neumann architecture describes a stored-program computer, which does
not allow fetching of instructions and data operations to occur at the same time, because the
architecture uses a commonly shared bus to access both. This limits the performance
of the system. The structure of a typical Von Neumann machine is shown
in Figure:
[Figure: A typical Von Neumann machine: memory, accumulator, ALU and control unit, with input and output units.]
[Figure: A machine organization with separate data memory and instruction memory connected to the ALU, control unit and I/O.]
An interface is defined as a border or an entry point across which distinct components of a digital
computing system interchange data and information. There are three types of interfaces - software,
hardware, and user interfaces.
Types of Interface
1) Software Interface
Software interface comprises a set of statements, predefined functions, user options,
and other methods of conveying instructions and data obtained from a program or
language for programmers.
Access to resources including CPU, memory and storage, etc., is facilitated by software
interfaces for the underlying computer system.
While programming, the interface between software components makes use of program
and language facilities such as constants, various data types, libraries and procedures,
and specifications for exception and method handling.
Operating system provides the interface that allows access to the system resources from
applications. This interface is called Application Programming Interface (API). These APIs
contain the collection of functions, definitions for type, and constants, and also include
some variable definitions. While developing software applications, the APIs can be used
to access and implement functionalities.
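As a small illustration (not part of the original notes), the following Python sketch uses the os module, an API that the operating system exposes through Python's standard library, to access system resources such as files and process information; the file name demo.txt is arbitrary.

import os

# Query process and file-system resources through the OS API.
pid = os.getpid()                    # identity of the current process
cwd = os.getcwd()                    # current working directory

# Create and inspect a file using OS-level calls.
fd = os.open("demo.txt", os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"written via the OS API\n")
os.close(fd)

size = os.stat("demo.txt").st_size   # file metadata via the API
print(f"pid={pid} cwd={cwd} size={size} bytes")
os.remove("demo.txt")                # clean up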
2) Hardware Interface
Hardware interfaces are primarily designed to exchange data and information among
various hardware components of the system, including internal and external devices.
This type of interface is seen between buses, across storage devices, and other I/O and
peripheral devices.
A hardware interface provides access to electrical, mechanical, and logical signals and
implements signaling protocols for reading and sequencing them.
These hardware interfaces may be designed to support either parallel or serial data
transfer or both. Hardware interfaces with parallel implementations allow more than
one connection to carry data simultaneously, while serial allows data to be sent one bit
at a time.
One of the popular standard interfaces is Small Computer System Interface (SCSI) that
defines the standards for physically connecting and communicating data between
peripherals and computers.
3) User Interface
User interface allows interaction between a computer and a user by providing various
modalities of interaction including graphics, sound, position, movement, etc. These
interfaces facilitate transfer of data between the user and the computing system. User
interface is very important for all systems that require user inputs.
Address Space
The amount of space allocated for all possible addresses for data and other computational entities
is called address space. The address space is governed by the architecture and managed by the
operating system. Computational entities such as a device, file, server, or networked computer
are all addressed within this space.
There are two types of address space namely,
1. Physical Address Space
Physical address space is the collection of all physical addresses produced by a computer
program and provided by the hardware. Every machine has its own physical address space
with its valid address range between 0 and some maximum limit supported by the machine.
2. Logical Address Space
Logical address space is generated by the CPU or provided by the OS kernel. It is also
sometimes called virtual address space. In the virtual address space, there is one address
space per process, which may or may not start at zero and extend to the highest address.
Computer Languages
A computer needs language to communicate across its components and devices and to carry out
instructions. A computer acts on a specific sequence of instructions written by a programmer for a
specific job. This sequence of instructions is known as a program.
[Figure: Classification of computer languages.]
1) Machine Language
The disadvantages of machine language are as follows:
Writing code using machine language is time consuming, cumbersome, and
complicated.
Debugging of programs is difficult.
2) Assembly Language
It is a kind of low-level programming language that uses symbolic codes or mnemonics as
instructions. Some examples of mnemonics include ADD, SUB, LDA, and STA, which stand for
addition, subtraction, load accumulator, and store accumulator, respectively. The processing
of an assembly language program is done by using a language translator called assembler
that translates assembly language code into machine code. Assembly language program
needs to be translated into equivalent machine language code (binary code) before
execution. The advantages of assembly language are as follows:
Due to the use of symbolic codes (mnemonics), an assembly program can be written
faster.
It frees the programmer from the burden of remembering the operation codes
and addresses of memory locations.
It is easier to debug.
The disadvantages of assembly language are as follows:
As it is a machine-oriented language, it requires familiarity with machine
architecture and understanding of available instruction set.
Execution of an assembly language program is comparatively time consuming
compared to machine language, because a separate language translator
program is needed to translate the assembly program into binary machine code.
Life Cycle of a Source Program
Every source program goes through a life cycle of several stages.
Edit time: It is the phase where editing of the program code takes place and is also known as
design time. At this stage, the code is in its raw form and may not be in a consistent state.
Compile time: At the compile time stage, the source code after editing is passed to a
translator that translates it into machine code. One such translator is a compiler. This stage
checks the program for inconsistencies and errors and produces an executable file.
Distribution time: It is the stage that sends or distributes the program from the entity
creating it to an entity invoking it. Mostly executable files are distributed.
Installation time: Typically, a program goes through the installation process, which makes it
ready for execution within the system. The installation can also optionally generate calls to
other stages of a program's life cycle.
Link time: At this stage, the specific implementation of the interface is linked and associated
with the program invoking it. System libraries are linked by looking up the name and the
interface of the needed library, either at compile time, at installation time, at program
start, or even during execution.
Load time: This stage takes the executable image from its stored repository and
places it into active memory to initiate the execution. Load time activities are influenced
by the underlying operating system.
Run time: This is the final stage of the life cycle in which the programmed behavior of the
source program is demonstrated.
[Figure: Stages in the life of a source program: source program → object code → load module → loader → binary image in memory.]
[Figure: Classification of system software: operating systems, utility programs, loaders, and language translators (assemblers, compilers, interpreters, preprocessors), whose outputs include target machine code, relocatable machine code, and modified source programs.]
2 – Language Processor
Procedure oriented language: Procedure oriented language provides general purpose facilities
required in most application domains. Such a language is independent of specific application
domains and results in a large specification gap which has to be bridged by an application
designer.
Forward Reference: A forward reference of a program entity is a reference to the entity in
some statement of the program that occurs before the statement containing the definition or
declaration of the entity.
Language processor pass: A Language processor pass is the processing of every statement in a
source program, or in its equivalent representation, to perform a language processing function
(a set of language processing functions).
Intermediate representation (IR): An intermediate representation is a representation of a
source program which reflects the effect of some, but not all analysis and synthesis functions
performed during language processing.
An intermediate representation should have the following three properties:
1. Ease of use: It should be easy to construct the intermediate representation and analyze
it.
2. Processing efficiency: Efficient algorithms should be available for accessing the data
structures used in the intermediate representation.
3. Memory efficiency: The intermediate representation should be compact so that it does
not occupy much memory.
1. Program Generation
[Figure: A program generator bridges the specification gap: program specification → program generator → target program.]
2. Program Execution
Two popular models for program execution are translation and interpretation.
Translation
The program translation model bridges the execution gap by translating a program
written in a programming language (PL), called the source program, into an equivalent
program in the machine or assembly language of the computer system, called the target program.
Interpretation
The interpreter reads the source program and stores it in its memory.
The CPU uses the program counter (PC) to note the address of the next instruction to be
executed.
The statement would be subjected to the interpretation cycle, which could consist of the
following steps:
1. Fetch the instruction
2. Analyze the statement and determine its meaning, the computation to be performed
and its operands.
3. Execute the meaning of the statement.
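To make the cycle concrete, here is a minimal Python sketch of a fetch-analyze-execute loop for a toy statement language; the instruction set (MOVE, ADD, PRINT) is invented for illustration and is not the hypothetical machine used elsewhere in these notes.

# A toy interpreter: fetch a statement, analyze it, execute its meaning.
program = [
    ("MOVE", "a", 5),            # a := 5
    ("ADD",  "a", 3),            # a := a + 3
    ("PRINT", "a", None),
]

data = {}                        # interpreter's memory for program variables
pc = 0                           # program counter: index of the next statement

while pc < len(program):
    stmt = program[pc]                   # 1. fetch the statement
    op, target, operand = stmt           # 2. analyze: operation and operands
    if op == "MOVE":                     # 3. execute the meaning
        data[target] = operand
    elif op == "ADD":
        data[target] += operand
    elif op == "PRINT":
        print(target, "=", data[target])
    pc += 1                              # advance to the next statement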
[Figure: Schematic of interpretation: the interpreter uses a PC to step through the source program held in its memory along with the program's data, reporting errors as they are found.]
[Figure: Two-pass language processing: the first pass (analysis) produces an intermediate representation (IR), which the second pass (synthesis) consumes.]
The first pass performs analysis of the source program and reflects its results in the
intermediate representation.
The second pass reads and analyzes the intermediate representation to perform synthesis of the target program.
[Figure: Syntax tree for the statement a = b + i: '=' at the root with children a and '+', whose children are b and i.]
3. Semantic Analysis
Semantic analysis determines the meaning of a statement by applying the semantic rules to
the structure of the statement.
While processing a declaration statement, it adds information concerning the type, length
and dimensionality of a symbol to the symbol table.
While processing an imperative statement, it determines the sequence of actions that
would have to be performed for implementing the meaning of the statement and
represents them in the intermediate code.
Considering the tree structure for the statement a = b + i
o If a node is an operand, then the type of the operand is added in the description field of the operand.
o While evaluating the expression, the type of b is real and the type of i is int, so the type of i is converted to real (shown as i* in the tree).
[Figure: Stepwise semantic analysis of the tree for a = b + i: the operand i is converted to real, the addition is evaluated, and the result is assigned to a; both a and the result have type real.]
The analysis ends when the tree has been completely processed.
Intermediate representation
IR contains the intermediate code and the symbol table.
[Figure: Intermediate code for a = b + i (ending with MOVEM AREG, A) together with the associated symbol table.]
Symbol tables
An identifier used in the source program is called a symbol. Thus, names of variables,
functions and procedures are symbols.
A language processor uses the symbol table to maintain the information about attributes of
symbols used in a source program.
It performs the following four kinds of operations on the symbol table:
1. Add a symbol and its attributes: Make a new entry in the symbol table.
2. Locate a symbol’s entry: Find a symbol’s entry in the symbol table.
3. Delete a symbol’s entry: Remove the symbol’s information from the table.
4. Access a symbol’s entry: Access the entry and set, modify or copy its attribute
information.
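A minimal Python sketch of these four operations, modelling the table as a dictionary keyed by symbol name (the attribute fields address and length follow the SYMTAB examples later in these notes; the class name is our own):

class SymbolTable:
    # Symbol table supporting add, locate, delete and access operations.
    def __init__(self):
        self.entries = {}                  # name -> attribute dictionary

    def add(self, name, **attributes):     # 1. add a symbol and its attributes
        self.entries[name] = attributes

    def locate(self, name):                # 2. locate a symbol's entry
        return self.entries.get(name)

    def delete(self, name):                # 3. delete a symbol's entry
        self.entries.pop(name, None)

    def access(self, name, **updates):     # 4. access and modify attributes
        self.entries[name].update(updates)

table = SymbolTable()
table.add("AGAIN", address=104, length=1)
table.access("AGAIN", length=2)
print(table.locate("AGAIN"))               # {'address': 104, 'length': 2}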
The symbol table consists of a set of entries organized in memory.
Two kinds of data structures can be used for organizing its entries:
o Linear data structure: Entries in the symbol table occupy adjoining areas of memory.
This property is used to facilitate search.
o Nonlinear data structure: Entries in the symbol table do not occupy contiguous areas
of memory. The entries are searched and accessed using pointers.
Linear List
If the symbol table has 'n' names, the insertion of new name will take effort proportional to 'n' and
to insert 'n' names with 'm' information, the total effort is 'cn(n+m)', where 'c' is a constant
representing the time necessary for a few machine operations.
The advantage of using a list is that it takes the minimum possible space. On the other hand, it may
suffer in performance for larger values of 'n' and 'm'.
Self-Organizing List
Searching the symbol table takes most of the time during the symbol table management process.
The pointer field called 'LINK' is added to each record, and the search is controlled by the order
indicated by the ‘LINK’.
A pointer called 'FIRST' can be used to designate the position of the first record on the linked list,
and each 'LINK' field indicates the next record on the list.
A self-organizing list is advantageous over a simple list implementation in that frequently
referenced names are likely to be near the front of the list.
If accesses are random, however, the self-organizing list costs extra time and space without yielding a benefit.
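One common self-organizing policy is move-to-front: every successful search moves the found record to the head of the list, so frequently referenced names drift toward the front. A Python sketch (the record fields are illustrative):

class Node:
    def __init__(self, name, address):
        self.name, self.address = name, address
        self.link = None                   # the LINK field

class SelfOrganizingList:
    def __init__(self):
        self.first = None                  # the FIRST pointer

    def insert(self, name, address):
        node = Node(name, address)
        node.link, self.first = self.first, node

    def search(self, name):
        prev, cur = None, self.first
        while cur is not None:
            if cur.name == name:
                if prev is not None:       # move-to-front on a hit
                    prev.link = cur.link
                    cur.link, self.first = self.first, cur
                return cur
            prev, cur = cur, cur.link
        return None

lst = SelfOrganizingList()
lst.insert("N", 113)
lst.insert("AGAIN", 104)
lst.search("N")
print(lst.first.name)                      # N is now at the front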
Search Trees
Symbol tables can also be organized as a binary tree, with two pointer fields, namely 'LEFT' and
'RIGHT', in each record that point to the left and right subtrees respectively.
The left subtree of a record contains only records with names less than the current record's name.
The right subtree of the node contains only records with names greater than the current
name.
The advantage of using search tree organization is that it proves efficient in searching operations,
which are the most performed operations over the symbol tables.
A binary search tree gives better performance than a list organization, at the cost of some additional
implementation complexity.
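A sketch of this organization in Python, with LEFT and RIGHT pointer fields; search compares names and descends left or right (only name and address fields are modelled):

class TreeNode:
    def __init__(self, name, address):
        self.name, self.address = name, address
        self.left = self.right = None      # the LEFT and RIGHT pointers

def insert(root, name, address):
    if root is None:
        return TreeNode(name, address)
    if name < root.name:                   # smaller names: left subtree
        root.left = insert(root.left, name, address)
    elif name > root.name:                 # larger names: right subtree
        root.right = insert(root.right, name, address)
    return root

def search(root, name):
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root                            # None if the name is absent

root = None
for sym, addr in [("LOOP", 202), ("A", 217), ("NEXT", 214)]:
    root = insert(root, sym, addr)
print(search(root, "NEXT").address)        # 214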
Stack Allocation
Stack is a linear data structure that satisfies the last-in, first-out (LIFO) policy for its allocation
and deallocation.
This makes only last element of the stack accessible at any time.
Implementing a stack data structure requires the use of a Stack Base (SB) pointer that points to the
first entry of the stack, and a Top of Stack (TOS) pointer that points to the last entry allocated on the stack.
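A Python sketch of stack allocation over a flat memory array, with SB and TOS pointers (the memory size and values are illustrative):

memory = [None] * 100     # flat memory pool
SB = 0                    # Stack Base: first entry of the stack
TOS = SB - 1              # Top of Stack: last allocated entry (empty stack)

def push(value):
    # Allocate one stack entry (LIFO).
    global TOS
    TOS += 1
    memory[TOS] = value

def pop():
    # Deallocate the most recently allocated entry.
    global TOS
    value = memory[TOS]
    TOS -= 1
    return value

push(10); push(20)
print(pop(), pop())       # 20 10 -- last-in, first-out order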
Heap Allocation
Heaps are a kind of non-linear data structure that permits allocation and deallocation of
entities in any (random) order as needed.
For an allocation request, the heap manager returns a pointer to the allocated area in the heap.
Hence, an allocated entity must maintain a pointer to the memory area allocated to it.
Space allocation strategy using heaps is optimized for performance by maintaining list of
free areas and implementing policies such as first-fit and best-fit for new object allocation.
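A sketch of a first-fit allocator over a free-area list in Python; each allocation scans for the first area large enough and splits it (deallocation and merging of free areas are omitted; best-fit would instead pick the smallest adequate area):

free_areas = [(0, 40), (60, 30), (100, 50)]   # (start address, size) pairs

def first_fit(size):
    # Return the start address of the first free area that can hold `size`.
    for i, (start, area_size) in enumerate(free_areas):
        if area_size >= size:
            if area_size == size:
                free_areas.pop(i)              # exact fit: remove the area
            else:                              # split: keep the remainder free
                free_areas[i] = (start + size, area_size - size)
            return start
    raise MemoryError("no free area large enough")

p = first_fit(25)        # allocated at 0; first area shrinks to (25, 15)
q = first_fit(20)        # does not fit in (25, 15); allocated at 60
print(p, q, free_areas)  # 0 60 [(25, 15), (80, 10), (100, 50)]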
3 – Assemblers
Statement format
An assembly language statement has the following format:
[Label] <Opcode> <operand specification>[,<operand specification>..]
where the notation [..] indicates that the enclosed specification is optional. If a label is specified in a
statement, it is associated as a symbolic name with the memory word generated for the statement.
If more than one memory word is generated for a statement, the label would be associated with
the first of these memory words.
<operand specification> has the following syntax:
<symbolic name> [± <displacement> ] [(<index register>)]
Thus, some possible operand forms are as follows:
The operand AREA refers to the memory word with which the name AREA is associated.
The operand AREA+5 refers to the memory word that is 5 words away from the word with
the name AREA. Here '5' is the displacement or offset from AREA.
The operand AREA(4) implies indexing the operand AREA with index register 4—that is, the
operand address is obtained by adding the contents of index register 4 to the address of
AREA.
The operand AREA+5 (4) is a combination of the previous two specifications.
2. Declaration statement
Declaration statements are for reserving memory for variables.
The syntax of declaration statement is as follow:
[Label] DS <constant>
[Label] DC '<value>'
DS: stands for Declare storage, DC: stands for Declare constant.
The DS statement reserves area of memory and associates name with them.
A DS 10
The above statement reserves 10 words of memory for the variable A.
The DC statement constructs memory words containing constants.
ONE DC ‘1’
The above statement associates the name ONE with a memory word containing the value '1'.
Any assembly program can use constants in two ways: as immediate operands and as
literals.
Many machines support immediate operands in machine instructions, e.g., ADD AREG, 5.
But our hypothetical machine does not support immediate operands as part of a machine
instruction; it can still handle literals.
A literal is an operand with the syntax ='<value>', e.g., ADD AREG, ='5'.
It differs from a constant because its location cannot be specified in the assembly program.
3. Assembler Directive
Assembler directives instruct the assembler to perform certain actions during assembly of
the program.
a. START
This directive indicates that the first word of the target program should be placed in the
memory word with address <constant>.
START <Constant>
Ex: START 500
First word of the target program is stored from memory location 500
onwards.
b. END
This directive indicates the end of the source program.
END <operand 2>
The optional operand indicates the address of the instruction where the execution of the
program should begin; by default, it is the first instruction of the program.
Execution control is transferred to the label given in the operand field.
[Figure 3.1: Design of Assembler: the analysis phase uses the mnemonics table and builds the symbol table (e.g., AGAIN 104, N 113), which the synthesis phase uses to generate the target program.]
Analysis Phase
The primary function performed by the analysis phase is the building of the symbol table.
For this purpose it must determine the addresses of symbolic names.
Some addresses can be determined directly; however, others must be inferred, and this
function is called memory allocation.
To implement memory allocation, a data structure called the location counter (LC) is used; it is
initialized to the constant specified in the START statement.
We refer to the processing involved in maintaining the location counter as LC processing.
Tasks of Analysis phase
1. Isolate the label, mnemonic opcode, and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC content>) in a new entry of symbol
table.
3. Check validity of mnemonics opcode.
4. Perform LC processing.
Synthesis Phase
Consider the assembly statement,
MOVER BREG, ONE
We must have following information to synthesize the machine instruction corresponding to
this statement:
1. Address of name ONE
2. Machine operation code corresponding to mnemonics MOVER.
The first item of information depends on the source program; hence it must be made available
by the analysis phase.
The second item of information does not depend on the source program; it depends only on the
assembly language.
Based on the above discussion, we consider the use of two data structures during the synthesis
phase:
1. Symbol table:
Each entry in the symbol table has two primary fields: name and address. This table is
built by the analysis phase.
2. Mnemonics table:
An entry in the mnemonics table has two primary fields: mnemonic and opcode.
Tasks of Synthesis phase
1. Obtain machine opcode through look up in the mnemonics table.
2. Obtain address of memory operand from the symbol table.
3. Synthesize a machine instruction.
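These tasks can be sketched in a few lines of Python; the table contents below follow the worked example later in this chapter, and the output format '+ opcode register address' mimics the listings shown there:

MNEMONIC_TABLE = {"MOVER": "04", "MOVEM": "05", "ADD": "01"}  # mnemonic -> opcode
SYMBOL_TABLE = {"ONE": 115, "TERM": 116}                      # symbol -> address
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}

def synthesize(mnemonic, reg, operand):
    opcode = MNEMONIC_TABLE[mnemonic]     # 1. opcode from the mnemonics table
    address = SYMBOL_TABLE[operand]       # 2. operand address from the symbol table
    return f"+ {opcode} {REGISTERS[reg]} {address}"  # 3. machine instruction

print(synthesize("MOVER", "BREG", "ONE"))  # + 04 2 115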
2. EQU
o The EQU directive has the syntax
<symbol> EQU <address specification>
where <address specification> is either a <constant> or <symbolic name> ± <displacement>.
o The EQU statement simply associates the name <symbol> with the address specified
by <address specification>. However, the address in the location counter is not
affected.
3. LTORG
o The LTORG directive, which stands for 'origin for literals', allows a programmer to
specify where literals should be placed.
o The assembler uses the following scheme for placement of literals: When the use of
a literal is seen in a statement, the assembler enters it into a literal pool unless a
matching literal already exists in the pool.
o At every LTORG statement, as also at the END statement, the assembler allocates
memory to the literals of the literal pool and clears the literal pool.
o This way, a literal pool would contain all literals used in the program since the start
of the program or since the previous LTORG statement.
o Thus, all references to literals are forward references by definition.
o If a program does not use an LTORG statement, the assembler would enter all literals
used in the program into a single pool and allocate memory to them when it
encounters the END statement.
Consider the following assembly program to understand ORIGIN, EQU and LTORG
1 START 200
2 MOVER AREG, =’5’ 200) +04 1 211
3 MOVEM AREG, A 201) +05 1 217
4 LOOP MOVER AREG, A 202) +04 1 217
5 MOVER CREG, B 203) +04 3 218
6 ADD CREG, =’1’ 204) +01 3 212
7 …
21 A DS 1 217)
22 BACK EQU LOOP
23 B DS 1 218)
24 END
25 =’1’ 219) +00 0 001
ORIGIN
Statement number 18 of the above program viz. ORIGIN LOOP + 2 puts the address 204 in the
location counter because symbol LOOP is associated with the address 202. The next statement
MULT CREG, B is therefore given the address 204.
EQU
On encountering the statement BACK EQU LOOP, the assembler associates the symbol BACK with
the address of LOOP i.e. with 202.
LTORG
In assembly program, the literals ='5' and ='1' are added to the literal pool in Statements 2 and 6,
respectively. The first LTORG statement (Statement 13) allocates the addresses 211 and 212 to the
values '5' and '1'. A new literal pool is now started. The literal ='1' is put into this pool in Statement 15.
This value is allocated the address 219 while processing the END statement. The literal ='1' used in
Statement 15 therefore refers to location 219 of the second pool of literals rather than location 212
of the first pool.
OPTAB
A table of mnemonic opcodes and related information; each entry has the fields mnemonic opcode, class and mnemonic info (e.g., the entry for START is AD, R#11).
SYMTAB
A SYMTAB entry contains the fields symbol name, address and length.
Some addresses can be determined directly, e.g., the address of the first instruction in the
program; others must be inferred.
To find the address of a program element we must fix the addresses of all program elements
preceding it. This function is called memory allocation.
Symbol Address Length
LOOP 202 1
NEXT 214 1
LAST 216 1
A 217 1
BACK 202 1
B 218 1
LITTAB
A table of literals used in the program.
A LITTAB entry contains the field literal and address.
The first pass uses LITTAB to collect all literals used in a program.
POOLTAB
Awareness of different literal pools is maintained using the auxiliary table POOLTAB.
This table contains the literal number of the starting literal of each literal pool.
At any stage, the current literal pool is the last pool in the LITTAB.
On encountering an LTORG statement (or the END statement), literals in the current pool
are allocated addresses starting with the current value in LC and LC is appropriately
incremented.
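The interplay of LITTAB, POOLTAB and LTORG can be sketched in Python as follows; the starting LC value is illustrative, and literal matching within only the current pool mirrors the scheme described above:

LITTAB = []        # entries: [literal, address]
POOLTAB = [0]      # index in LITTAB where each pool starts
loc_cntr = 211     # location counter (illustrative value)

def note_literal(literal):
    # Pass I: enter a literal into the current pool when its use is seen.
    for entry in LITTAB[POOLTAB[-1]:]:     # reuse a matching literal in this pool
        if entry[0] == literal:
            return
    LITTAB.append([literal, None])

def ltorg():
    # Allocate addresses to the current pool and start a new pool.
    global loc_cntr
    for entry in LITTAB[POOLTAB[-1]:]:
        entry[1] = loc_cntr
        loc_cntr += 1
    POOLTAB.append(len(LITTAB))

note_literal("='5'"); note_literal("='1'")
ltorg()                 # ='5' -> 211, ='1' -> 212
note_literal("='1'")    # a new pool: the earlier ='1' is not reused
ltorg()
print(LITTAB)           # [["='5'", 211], ["='1'", 212], ["='1'", 213]]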
Algorithm: Pass I of the assembler
1) loc_cntr=0 (default value); pooltab_ptr=1; POOLTAB[1]=1; littab_ptr=1;
2) While next statement is not END statement
a) If a label is present then
this_label=symbol in label field
Enter (this_label, loc_cntr) in SYMTAB
b) If an LTORG statement then
(i) Process literals in the current pool of LITTAB to allocate memory and set their
address fields; update loc_cntr accordingly
(ii) pooltab_ptr= pooltab_ptr+1;
(iii) POOLTAB[ pooltab_ptr]= littab_ptr
c) If a START or ORIGIN statement then
loc_cntr=value specified in operand field;
d) If an EQU statement then
(i) this_address=value specified in <address spec>;
(ii) Correct the symtab entry for this_label to (this_label, this_address);
e) If a declaration
(i) Code= code of the declaration statement
(ii) Size= size of memory area required by DC/DS
(iii) loc_cntr=loc_cntr+size;
(iv) Generate IC '(DL, code)';
f) If an imperative statement then
(i) Code= machine opcode from OPTAB
(ii) loc_cntr=loc_cntr+instruction length from OPTAB;
(iii) if operand is a literal then
this_literal=literal in operand field;
LITTAB[littab_ptr]=this_literal;
littab_ptr= littab_ptr +1;
else
this_entry= SYMTAB entry number of operand
generate IC ‘(IS, code)(S, this_entry)’;
3) (processing END statement)
a) Perform step 2(b)
b) Generate IC ‘(AD,02)’
c) Go to pass II
Intermediate code forms:
Intermediate code consist of a set of IC units, each unit consisting of the following three
fields
1. Address
2. Representation of mnemonics opcode
3. Representation of operands
Mnemonics field
The mnemonics field contains a pair of the form
(statement class, code)
Where statement class can be one of IS, DL, and AD standing for imperative statement,
declaration statement and assembler directive respectively.
For imperative statement, code is the instruction opcode in the machine language.
For declarations and assembler directives, code is an ordinal number within the class.
Thus, (AD, 01) stands for assembler directive number 1 which is the directive START.
The codes for declaration statements are DC = 01 and DS = 02; the codes for assembler
directives are START = 01, END = 02, ORIGIN = 03, EQU = 04 and LTORG = 05.
The first operand, a register or condition code, is represented by a single-digit code:
AREG = 1, BREG = 2, CREG = 3, DREG = 4 for registers, and LT = 1, LE = 2, EQ = 3, GT = 4,
GE = 5, ANY = 6 for condition codes.
The second operand, which is a memory operand, is represented by a pair of the form
(operand class, code)
Where operand class is one of the C, S and L standing for constant, symbol and literal.
For a constant, the code field contains the internal representation of the constant itself. Ex:
the operand descriptor for the statement START 200 is (C,200).
For a symbol or literal, the code field contains the ordinal number of the operand’s entry in
SYMTAB or LITTAB.
Variant II
This variant differs from Variant I of the intermediate code in that symbols,
condition codes and CPU registers are not processed; hence no IC units are generated
for them during pass I.
Variant I Variant II
START 200 (AD,01) (C, 200) (AD,01) (C, 200)
READ A (IS, 09) (S, 01) (IS, 09) A
LOOP MOVER AREG, A (IS, 04) (1)(S, 01) (IS, 04) AREG, A
. . .
. . .
SUB AREG, =’1’ (IS, 02) (1)(L, 01) (IS, 02) AREG,(L, 01)
BC GT, LOOP (IS, 07) (4)(S, 02) (IS, 07) GT, LOOP
STOP (IS, 00) (IS, 00)
A DS 1 (DL, 02) (C,1) (DL, 02) (C,1)
LTORG (AD, 05) (AD, 05)
…..
Comparison of the variants
Variant I:
- IS, DL and AD statements all contain the processed form.
- Extra work in pass I.
- Simplifies tasks in pass II.
- Occupies more memory than Variant II.
Variant II:
- DL and AD statements contain the processed form, while for IS statements the operand field is processed only to identify literal references.
- Extra work in pass II.
- Simplifies tasks in pass I.
- Memory utilization of the two passes is better balanced.
Algorithm: Pass II of the assembler
1. code_area_address = address of code_area; pooltab_ptr = 1; loc_cntr = 0;
2. While next statement is not an END statement
a) Clear machine_code_buffer;
b) If an LTORG statement
i) Process literals in the current pool of LITTAB and assemble them in machine_code_buffer;
ii) Size = size of memory area required for the literals;
iii) pooltab_ptr = pooltab_ptr + 1;
c) If a START or ORIGIN statement
i) Loc_cntr=value specified in operand field;
ii) Size=0;
d) If a declaration statement
i) If a DC statement then assemble the constant in machine_code_buffer;
ii) Size= size of memory area required by DC/DS;
e) If an imperative statement
i) Get operand address from SYMTAB or LITTAB
ii) Assemble instruction in machine_code_buffer;
iii) Size=size of instruction;
f) If size≠ 0 then
i) Move contents of machine_code_buffer to the address
code_area_address+loc_cntr;
ii) Loc_cntr=loc_cntr+size;
3. Processing end statement
a) Perform steps 2(b) and 2(f)
b) Write code_area into output file.
Design
The algorithm for the Intel 8088 assembler is given at the end of this section.
LC processing in this algorithm differs from LC processing in the first pass of a two-pass
assembler in one significant respect.
In Intel 8088, the unit for memory allocation is a byte; however, certain entities require
their first byte to be aligned on specific boundaries in the address space.
While processing declarations and imperative statements, the assembler first aligns the
address contained in the LC on the appropriate boundary. We call this action LC alignment.
Allocation of memory for a statement is performed after LC alignment.
The data structures of the assembler are as follows:
1) Mnemonics Table (MOT)
The mnemonics table (MOT) is hash organized and contains the following fields:
mnemonic opcode, machine opcode, alignment/format info and routine id. The routine
id field of an entry specifies the routine which handles that opcode. Alignment/format
info is specific to a given routine.
Mnemonic opcode (6) | Machine opcode (2) | Alignment/format info (1) | Routine id (4)
JNE | 75H | 00H | R2
2) Symbol Table (SYMTAB)
A SYMTAB entry contains the following fields: symbol (8 bytes), type/defined?/segment name?/EQU? flags (1), offset in segment (2), owner segment (SYMTAB entry #) (2), size (2), length (2), source stmt # (2), and pointer to FRT (2).
3) Segment Register Table (SRTAB)
The segment register table (SRTAB) contains four entries, one for each segment register.
Each entry shows the SYMTAB entry # of the segment whose address is contained in the
segment register. SRTAB_ARRAY is an array of SRTABs.
Segment Register (1) SYMTAB entry # (2)
00 (ES) 23
.
.
The assembler uses SYMTAB, SRTAB_ARRAY, CRT, FRT and ERRTAB, together with the following:
LC : Location Counter
code_area : Area for assembling the target program
code_area_address : Contains address of code_area
srtab_no : Number of the current SRTAB
stmt_no : Number of the current statement
SYMTAB_segment_entry : SYMTAB entry # of current segment
machine_code_buffer : Area for constructing code for one statement
Output of Pass-I
Data Structure of Pass-I of an assembler
OPTAB
Mnemonic OPCODE | Class | Mnemonic info
START AD R#11
READ IS (09,1)
MOVER IS (04,1)
MOVEM IS (05,1)
ADD IS (01,1)
BC IS (07,1)
DC DL R#5
SUB IS (02,1)
STOP IS (00,1)
COMP IS (06,1)
DS DL R#7
PRINT IS (10,1)
END AD
MULT IS (03,1)
SYMTAB
Symbol Address Length
AGAIN 104 1
N 113 1
RESULT 114 1
ONE 115 1
TERM 116 1
LITTAB and POOLTAB are empty as there are no literals defined in the program.
Intermediate Representation
Variant I Variant II
1 START 101 (AD, 01) (C, 101) (AD, 01) (C, 101)
2 READ N (IS, 09) (S, 02) (IS, 09) N
3 MOVER BREG, ONE (IS, 04) (2)(S, 04) (IS, 04) (2) ONE
4 MOVEM BREG, TERM (IS, 05) (2)(S, 05) (IS, 05) (2) TERM
5 AGAIN MULT BREG,TERM (IS, 03) (2)(S, 05) (IS, 03) (2) TERM
6 MOVER CREG, TERM (IS, 04) (3)(S, 05) (IS, 04) (3) TERM
7 ADD CREG, ONE (IS, 01) (3)(S, 04) (IS, 01) (3) ONE
12 MOVEM CREG, TERM (IS, 05) (3)(S, 05) (IS, 05) (3) TERM
13 COMP CREG, N (IS, 06) (3)(S, 02) (IS, 06) (3) N
14 BC LE, AGAIN (IS, 07) (2)(S, 01) (IS, 07) (2) AGAIN
15 MOVEM BREG, RESULT (IS, 05) (2)(S, 03) (IS, 05) (2) RESULT
16 PRINT RESULT (IS, 10) (S, 03) (IS, 10) RESULT
17 STOP (IS, 00) (IS, 00)
18 N DS 1 (DL, 02) (C, 1) (DL, 02) (C, 1)
19 RESULT DS 1 (DL, 02) (C, 1) (DL, 02) (C, 1)
20 ONE DC ‘1’ (DL, 01) (C, 1) (DL, 01) (C, 1)
21 TERM DS 1 (DL, 02) (C, 1) (DL, 02) (C, 1)
22 END (AD, 02) (AD, 02)
4 – Macro and Macro Processor
Macro
Formally, macro instructions (often called macros) are single-line abbreviations for groups of
instructions.
For every occurrence of this one-line macro instruction within a program, the instruction
must be replaced by the entire block.
The advantages of using macro are as follows:
o Simplify and reduce the amount of repetitive coding.
o Reduce the possibility of errors caused by repetitive coding.
o Make an assembly program more readable.
Macro Processors
A preprocessor can be any program that processes its input data to produce output, which is
used as an input to another program.
The output of a macro processor is an assembly program that becomes the input to the
assembler.
The macro processor may exist independently and be called during the assembling process
or be a part of the assembler implementation itself.
Macro Expansion
A macro call in a program leads to macro expansion. To expand a macro, the name of the macro is
placed in the operation field, and no special directives are necessary. During macro expansion, the
macro name statement in the program is replaced by the sequence of assembly statements. Let us
consider the following example:
START 100
A DS 1
B DS 1
INCR A, B, AREG
PRINT A
STOP
END
The preceding example uses a statement that calls the macro. The assembly statement INCR A,
B, AREG is an example of a macro call, with A, B and AREG being the actual parameters.
While passing over the assembly program, the assembler recognizes INCR as the name of the
macro and replaces the call with the macro body, substituting the actual parameters for the
formal parameters. The expanded code is shown below.
START 100
A DS 1
B DS 1
+ MOVER AREG, A
+ ADD AREG, B
+ MOVEM AREG, A
PRINT A
STOP
END
The statements marked with a '+' sign in the label field denote the expanded code,
differentiating them from the original statements of the program.
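The substitution of actual for formal parameters can be sketched in Python; the INCR definition below (formals &MEM_VAL, &INCR_VAL, &REG) is inferred from the expanded code above rather than quoted from the notes:

macros = {                       # macro name -> (formal parameters, body lines)
    "INCR": (["&MEM_VAL", "&INCR_VAL", "&REG"],
             ["MOVER &REG, &MEM_VAL",
              "ADD &REG, &INCR_VAL",
              "MOVEM &REG, &MEM_VAL"]),
}

def expand(line):
    # Replace a macro call by its body; other lines pass through unchanged.
    name, _, rest = line.partition(" ")
    if name not in macros:
        return [line]
    formals, body = macros[name]
    actuals = [a.strip() for a in rest.split(",")]
    out = []
    for stmt in body:
        for formal, actual in zip(formals, actuals):
            stmt = stmt.replace(formal, actual)
        out.append("+ " + stmt)           # mark expanded statements with '+'
    return out

for line in ["INCR A, B, AREG", "PRINT A"]:
    print("\n".join(expand(line)))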
Keyword parameters
For keyword parameters, the specification <parameter kind> is the string '=' in the syntax rule.
The <actual parameter specification> is written as <formal parameter name> = <ordinary
string>. The value of a formal parameter is determined by the rule of keyword association as
follows:
o Find the actual parameter specification which has the form XYZ= <ordinary string>.
o If the <ordinary string> in the specification is some string ABC, the value of formal
parameter XYZ would be ABC.
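A small sketch of this rule in Python, including defaults drawn from a Parameter Default Table (the macro formals and the default value are hypothetical):

defaults = {"&REG": "AREG"}               # Parameter Default Table (PDT)

def keyword_values(actuals, formals):
    # Build the Actual Parameter Table (APT) by keyword association.
    apt = {f: defaults.get(f) for f in formals}
    for spec in actuals:                  # e.g. "&MEM_VAL=A"
        name, _, value = spec.partition("=")
        apt[name.strip()] = value.strip()
    return apt

print(keyword_values(["&MEM_VAL=A", "&INCR_VAL=B"],
                     ["&MEM_VAL", "&INCR_VAL", "&REG"]))
# {'&MEM_VAL': 'A', '&INCR_VAL': 'B', '&REG': 'AREG'}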
AGO <sequencing symbol>
It unconditionally transfers expansion time control to the statement containing <sequencing
symbol> in its label field.
Example
MACRO
CONSTANTS
LCL &A
&A SET 1
DB &A
&A SET &A+1
DB &A
MEND
The local EV &A is created.
The first SET statement assigns the value '1' to it.
The first DB statement thus declares a byte constant ‘1’.
The second SET statement assigns the value '2' to &A, and the second DB statement declares a
constant '2'.
Example
MACRO
DCL_CONST &A
AIF (L'&A EQ 1) .NEXT
--
.NEXT --
--
MEND
Here expansion time control is transferred to the statement having .NEXT in its label field only if the
actual parameter corresponding to the formal parameter &A has a length of '1'.
The design of a macro preprocessor is influenced by the provisions for performing the following
tasks involved in macro expansion:
Recognize macro calls: A table is maintained to store names of all macros defined in a
program. Such a table is called Macro Name Table (MNT) in which an entry is made for
every macro definition being processed. During processing program statements, a match is
done to compare strings in the mnemonic field with entries in the MNT. A successful match
in the MNT indicates that the statement is a macro call.
Determine the values of formal parameters: A table called Actual Parameter Table (APT)
holds the values of formal parameters during the expansion of a macro call. The entry into
this table will be pairs of the form (<formal parameter name>, <value>). A table called
Parameter Default Table (PDT) contains information about default parameters stored as
pairs of the form (<formal parameter name>, <default value>) for each macro defined in the
program. If the programmer does not specify values for some parameters, their
corresponding default values are copied from the PDT to the APT.
Maintain the values of expansion time variables declared in a macro: A table called
Expansion time Variable Table (EVT) maintains information about expansion variables in the
form (<EV name>, <value>). It is used when a preprocessor statement or a model statement
during expansion refers to an EV.
Organize expansion time control flow: A table called Macro Definition Table (MDT) is used
to store the body of a macro. The flow of control determines when a model statement from
the MDT is to be visited for expansion during macro expansion. MEC (Macro Expansion
Counter) is defined and initialized to the first statement of the macro body in the MDT. MEC
is updated after the expansion of each model statement by the macro preprocessor.
Determine the values of sequencing symbols: A table called Sequencing Symbols Table
(SST) maintains information about sequencing symbols in pairs of the form
(<sequencing symbol name>, <MDT entry #>)
where <MDT entry #> denotes the index of the MDT entry containing the model statement
with the sequencing symbol. Entries are made on encountering a statement with the
sequencing symbol in its label field or on reading a reference prior to its definition.
Perform expansion of a model statement: The expansion task has the following steps:
o MEC points to the entry in the MDT table with the model statements.
o APT and EVT provide the values of the formal parameters and EVs, respectively.
o SST enables identifying the model statement and defining sequencing.
Handles expansion time control flow and performs expansion of model statements.
In a one-pass macro processor, the definition of a macro must always appear before any
statements that invoke that macro in the program.
The important data structures required in a one-pass macro processor are:
DEFTAB (Definition Table): It is a definition table that is used to store the macro definition
including macro prototype and macro body. Comment lines are not included here, and
references to the parameters use positional notation for efficiency in substituting
arguments.
NAMTAB (Name Table): This table is used for storing macros names. It serves as an index to
DEFTAB and maintains pointers that point to the beginning and end of the macro definition
in DEFTAB.
ARGTAB (Argument Table): It maintains arguments according to their positions in the
argument list. During expansion, the arguments from this table are substituted for the
corresponding parameters in the macro body.
One-pass Macro Processor scheme is presented as below.
[Figure: One-pass macro processor scheme: GETLINE supplies lines to PROCESSLINE, which invokes DEFINE for a macro definition and EXPAND for a macro call.]
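A Python sketch of DEFINE and EXPAND working over these three tables; positional notation ?1, ?2, ... for parameters follows the convention mentioned above, and the INCR macro is this chapter's running example:

DEFTAB = []      # prototype and body lines, parameters in positional notation ?n
NAMTAB = {}      # macro name -> (begin, end) indices into DEFTAB

def define(name, params, body):
    # Store a macro definition, rewriting parameters in positional notation.
    begin = len(DEFTAB)
    DEFTAB.append(name + " " + ", ".join(params))    # prototype line
    for line in body:
        for i, p in enumerate(params, start=1):
            line = line.replace(p, "?" + str(i))
        DEFTAB.append(line)
    NAMTAB[name] = (begin, len(DEFTAB) - 1)

def expand(name, args):
    # Expand a macro call, substituting arguments from ARGTAB.
    argtab = {str(i): a for i, a in enumerate(args, start=1)}   # ARGTAB
    begin, end = NAMTAB[name]
    out = []
    for line in DEFTAB[begin + 1:end + 1]:           # skip the prototype line
        for pos, actual in argtab.items():
            line = line.replace("?" + pos, actual)
        out.append(line)
    return out

define("INCR", ["&M", "&I", "&R"],
       ["MOVER &R, &M", "ADD &R, &I", "MOVEM &R, &M"])
print(expand("INCR", ["A", "B", "AREG"]))
# ['MOVER AREG, A', 'ADD AREG, B', 'MOVEM AREG, A']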
5 – Linker and Loader
Introduction
1) Translation time address: Translation time address is used at translation time. This
address is assigned by the translator.
2) Link time address: Link time address is used at link time. This address is assigned by the
linker.
3) Load time address: Load time address is used at load time. This address is assigned by the
loader.
4) Translated origin: Address of origin assumed by the translator
5) Linked origin: Address of origin assumed by the linker while producing a binary program
6) Load origin: Address of origin assumed by the loader while loading the program for
execution.
Linking
Consider an application program AP consisting of a set of program units SP = {Pi}.
A program unit Pi interacts with another program unit Pj by using addresses of Pj’s
instructions and data in its own instructions.
To realize such interactions, Pj and Pi must contain public definitions and external
references as defined in the following: (Explain public definition and external reference)
o Public definition: a symbol pub_symb defined in a program unit which may be
referenced in other program units.
o External reference: a reference to a symbol ext_symb which is not defined in the
program unit
Design of a Linker
The design of linker is divided into two parts:
1) Relocation
The linker uses an area of memory called the work area for constructing the binary
program.
It loads the machine language program found in the program component of an
object module into the work area and relocates the address sensitive instructions in
it by processing entries of the RELOCTAB.
For each RELOCTAB entry, the linker determines the address of the word in the work
area that contains the address sensitive instruction and relocates it.
The details of the address computation would depend on whether the linker loads
and relocates one object module at a time, or loads all object modules that are to
be linked together into the work area before performing relocation.
Algorithm: Program Relocation
1. program_linked_origin := <link origin> from the linker command;
2. For each object module mentioned in the linker command
(a) t_origin := translated origin of the object module;
OM_size := size of the object module;
(b) relocation_factor := program_linked_origin – t_origin;
(c) Read the machine language program contained in the program
component of the object module into the work-area.
(d) Read RELOCTAB of the object module.
(e) For each entry in RELOCTAB
i. translated_address := address found in the RELOCTAB
entry;
ii. address_in_work_area := address of work_area +
translated_address – t_origin;
iii. Add relocation_factor to the operand address found in the
word that has the address address_in_work_area.
(f) program_linked_origin := program_linked_origin + OM_size;
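A Python sketch of this relocation step for a single object module; the word format 'sign opcode reg address' is a simplification of the listings used elsewhere in these notes:

def relocate(program, t_origin, linked_origin, reloctab):
    # Relocate the address-sensitive words of one object module.
    relocation_factor = linked_origin - t_origin
    work_area = list(program)                   # copy the code into the work area
    for translated_address in reloctab:
        index = translated_address - t_origin   # offset of the word in the module
        sign, opcode, reg, address = work_area[index].split()
        address = str(int(address) + relocation_factor)
        work_area[index] = " ".join([sign, opcode, reg, address])
    return work_area

# Module translated at origin 200; the words at 201 and 202 hold addresses.
code = ["+ 04 1 211", "+ 05 1 217", "+ 04 1 217"]
print(relocate(code, t_origin=200, linked_origin=900, reloctab=[201, 202]))
# ['+ 04 1 211', '+ 05 1 917', '+ 04 1 917']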
2) Linking
An external reference to a symbol alpha can be resolved only if alpha is declared as
a public definition in some object module.
Using this observation as the basis, program linking can be performed as follows:
The linker would process the linking tables (LINKTABs) of all object modules that are
to be linked and copy the information about public definitions found in them into a
table called the name table (NTAB).
The external reference to alpha would be resolved simply by searching for alpha in
this table, obtaining its linked address, and copying it into the word that contains
the external reference.
Accordingly, each entry of the NTAB would contain the following fields:
Symbol: Symbolic name of an external reference or an object module.
Linked-address: For a public definition, this field contains linked address of the
symbol. For an object module, it contains the linked origin of the object module.
Algorithm: Program Linking
1. program_linked_origin := <link origin> from the linker command.
2. For each object module mentioned in the linker command
(a) t_origin := translated origin of the object module;
OM_size := size of the object module;
(b) relocation_factor := program_linked_origin – t_origin;
(c) Read the machine language program contained in the program
component of the object module into the work_area.
(d) Read LINKTAB of the object module.
(e) Enter (object module name, program_linked_origin) in NTAB.
(f) For each LINKTAB entry with type = PD
name := symbol field of the LINKTAB entry;
linked_address := translated_address + relocation_factor;
Enter (name, linked_address) in a new entry of the NTAB.
(g) program_linked_origin := program_linked_origin + OM_size;
3. For each object module mentioned in the linker command
(a) t_origin := translated origin of the object module;
program_linked_origin := linked_address from NTAB;
(b) For each LINKTAB entry with type = EXT
i. address_in_work_area := address of work_area +
program_linked_origin - <link origin> in linker command +
translated address – t_origin;
ii. Search the symbol found in the symbol field of the LINKTAB
entry in NTAB and note its linked address. Copy this address
into the operand address field in the word that has the
address address_in_work_area.
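The two passes of the linking algorithm can be sketched in Python; each object module is reduced to a dictionary (name, translated origin, size, public definitions, external references), which is an illustrative simplification of the LINKTAB structure:

def link(modules, link_origin):
    # Pass 1: build NTAB from object-module names and public definitions.
    ntab, origin = {}, link_origin
    for m in modules:
        ntab[m["name"]] = origin                  # linked origin of the module
        for symbol, t_addr in m["public"].items():
            ntab[symbol] = t_addr + (origin - m["t_origin"])
        origin += m["size"]
    # Pass 2: resolve each external reference via NTAB.
    patches = []
    for m in modules:
        base = ntab[m["name"]]
        for offset, symbol in m["ext"].items():
            patches.append((base + offset, ntab[symbol]))
    return ntab, patches                          # (linked address, value) pairs

mod_a = {"name": "A", "t_origin": 0, "size": 10,
         "public": {"alpha": 4}, "ext": {}}
mod_b = {"name": "B", "t_origin": 0, "size": 8,
         "public": {}, "ext": {3: "alpha"}}      # word 3 of B refers to alpha
ntab, patches = link([mod_a, mod_b], link_origin=500)
print(ntab)      # {'A': 500, 'alpha': 504, 'B': 510}
print(patches)   # [(513, 504)]: store 504 at linked address 513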
Self-Relocating Programs
A self-relocating program is a program that can perform the relocation of its own address-
sensitive instructions.
It contains the following two provisions for this purpose:
o A table of information concerning the address sensitive instructions exists as a part
of the program.
o Code to perform the relocation of address sensitive instructions also exists as a part
of the program. This is called the relocating logic.
The start address of the relocating logic is specified as the execution start address of the
program.
Thus the relocating logic gains control when the program is loaded in memory for the
execution.
It uses the load address and the information concerning address sensitive instructions to
perform its own relocation.
Execution control is now transferred to the relocated program.
A self-relocating program can execute in any area of the memory.
This is very important in time sharing operating systems where the load address of a
program is likely to be different for different executions.
Linking in MSDOS
We discuss the design of a linker for the Intel 8088/80x86 processors which resembles LINK
of MS DOS in many respects.
It may be noted that the object modules of MS DOS differ from the Intel specifications in
some respects.
Object Module Format (Explain object module of the program)
An Intel 8088 object module is a sequence of object records, each object record describing
specific aspects of the programs in the object module.
There are 14 types of object records containing the following five basic categories of
information:
o Binary image (i.e. code generated by a translator)
o External references
o Public definitions
o Debugging information (e.g. line number in source program).
o Miscellaneous information (e.g. comments in the source program).
We only consider the object records corresponding to first three categories-a total of eight
object record types.
Each object record contains variable length information and may refer to the contents of
previous object records.
Each name in an object record is represented in the following format:
length (1 byte) | name
THEADR, LNAMES and SEGDEF records
THEADR record
80H | length | T-module name | check-sum
The module name in the THEADR record is typically derived by the translator from the
source file name.
This name is used by the linker to report errors.
An assembly programmer can specify the module name in the NAME directive.
LNAMES record
96H | length | name list | check-sum
The LNAMES record lists the names for use by SEGDEF records.
SEGDEF record
98H | length | attributes (1-4) | segment length (2) | name index (1) | check-sum
A SEGDEF record designates a segment name using an index into this list.
The attributes field of a SEGDEF record indicates whether the segment is relocatable or
absolute, whether (and in what manner) it can be combined with other segments, as also
the alignment requirement of its base address (e.g. byte, word or paragraph, i.e. 16 byte,
alignment).
Stack segments with the same name are concatenated with each other, while common
segments with the same name are overlapped with one another.
The attribute field also contains the origin specification for an absolute segment.
EXTDEF and PUBDEF record
8CH | length | external reference list | check-sum
The PUBDEF record (90H) contains a base specification and a list of (name, offset) pairs.
Each (name, offset) pair in the record defines one public name, specifying the name of the
symbol and its offset within the segment designated by the base specification.
LEDATA records
A0H | length | segment index (1-2) | data offset (2) | data | check-sum
An LEDATA record contains the binary image of the code generated by the language
translator.
Segment index identifies the segment to which the code belongs, and offset specifies the
location of the code within the segment.
FIXUPP record
9CH | length | locat (1) | fix dat (1) | frame datum (1) | target datum (1) | target offset (2) | ... | check-sum
A FIXUPP record contains information for one or more relocation and linking fixups to be
performed.
The locat field contains a numeric code called loc code to indicate the type of a fixup.
The meanings of these codes are given in Table
Loc code Meaning
0 Low order byte is to be fixed.
1 Offset is to be fixed.
2 Segment is to be fixed.
3 Pointer (i.e., segment: offset) is to be fixed.
locat also contains the offset of the fixup location in the previous LEDATA record.
The frame datum field, which refers to a SEGDEF record, identifies the segment to which
the fixup location belongs.
The target datum and target offset fields specify the relocation or linking information.
Target datum contains a segment index or an external index, while target offset contains
an offset from the name indicated in target datum.
The fix dat field indicates the manner in which the target datum and target offset fields are
to be interpreted.
The numeric codes used for this purpose are given in below table.
Code contents of target datum and offset fields
0 Segment index and displacement.
2 External index and target displacement.
4 Segment index (offset field is not used).
6 External index (offset field is not used).
MODEND record
8AH | length | type (1) | start addr (5) | check-sum
The MODEND record signifies the end of the module, with the type field indicating whether
it is the main program.
This record also optionally indicates the execution start address.
This has two components: (a) the segment, designated as an index into the list of segment
names defined in SEGDEF record(s), and (b) an offset within the segment.
This interrupt is processed by the overlay manager and the appropriate overlay is loaded
into memory.
When each overlay is structured into a separate binary program, as in IBM mainframe
systems, a call which crosses overlay boundaries leads to an interrupt which is attended by
the OS kernel.
Control is now transferred to the OS loader to load the appropriate binary program.
Dynamic Linking
Static Linking
In static linking, the linker links all modules of a program before its execution begins; it
produces a binary program that does not contain any unresolved external references.
If statically linked programs use the same module from a library, each program will get a
private copy of the module.
If many programs that use the module are in execution at the same time, many copies of
the module might be present in memory.
Dynamic Linking
Dynamic linking is performed during execution of a binary program.
The linker is invoked when an unresolved external reference is encountered during
execution; it resolves the reference and then resumes execution of the program.
This arrangement has several benefits concerning use, sharing and updating of library
modules.
If the module referenced by a program has already been linked to another program that is
in execution, a copy of the module would exist in memory. The same copy of the module
could be linked to this program as well, thus saving memory.
To facilitate dynamic linking, each program is first processed by the static linker.
The static linker links each external reference in the program to a dummy module whose
only task is to invoke the dynamic linker, passing it the name of the external symbol.
Different Loading Schemes
Compile-and-Go Loaders
The assembler is loaded in one part of memory, and it assembles the program directly into
its assigned memory locations.
After the loading process is complete, the assembler transfers the control to the starting
instruction of the loaded program.
Advantages
The user need not be concerned with the separate steps of compilation, assembling, linking,
loading, and executing.
Execution speed is generally much superior to interpreted systems.
They are simple and easier to implement.
[Figure: Compile-and-go loading: the source program is translated by the compile-and-go assembler directly into memory, where it resides alongside the assembler.]
Disadvantages
There is wastage in memory space due to the presence of the assembler.
The code must be reprocessed every time it is run.
Absolute Loaders
An absolute loader loads a binary program in memory for execution.
The binary program is stored in a file that contains the following:
o A Header record showing the load origin, length and load time execution start
address of the program.
o A sequence of binary image records containing the program’s code. Each binary
image record contains a part of the program’s code in the form of a sequence of
bytes, the load address of the first byte of this code and a count of the number of
bytes of code.
The absolute loader notes the load origin and the length of the program mentioned in the
header record.
It then enters a loop that reads a binary image record and moves the code contained in it
to the memory area starting on the address mentioned in the binary image record.
At the end, it transfers control to the execution start address of the program.
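A Python sketch of an absolute loader over this file layout (the in-memory representation of the header and binary image records is a simplification):

def absolute_load(memory, program_file):
    # Copy each binary image record to the load address recorded in it,
    # then return the execution start address.
    load_origin, length, start_addr = program_file["header"]
    for load_address, code in program_file["records"]:
        for i, byte in enumerate(code):
            memory[load_address + i] = byte
    return start_addr

memory = [0] * 64
prog = {"header": (40, 6, 40),
        "records": [(40, [0x04, 0x01, 0x2B]), (43, [0x05, 0x01, 0x2B])]}
start = absolute_load(memory, prog)
print(start, memory[40:46])    # 40 [4, 1, 43, 5, 1, 43]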
Advantages of the absolute loading scheme:
o Simple to implement and efficient in execution.
o Saves the memory (core) because the size of the loader is smaller than that of the
assembler.
o Allows use of multi-source programs written in different languages. In such cases,
each language translator converts its source program into a common object
language, and a common object file is then prepared by address resolution.
o The loader is simpler and just obeys the instruction regarding where to place the
object code in the main memory.
Disadvantages of the absolute loading scheme:
o The programmer must know and clearly specify to the translator (the assembler)
the addresses in memory for inter-linking and loading of the programs. Care
should be taken so that the addresses do not overlap.
o For programs with multiple subroutines, the programmer must remember the
absolute address of each subroutine and use it explicitly in other subroutines to
perform linking.
o If the subroutine is modified, the program has to be assembled again from first to
last.
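The loop performed by an absolute loader can be sketched in C as follows. The record layout here is a simplified assumption made for illustration, not a real object file format.

/* Sketch of an absolute loader: copy each binary image record to
   its load address, then transfer control to the start address. */
#include <stdio.h>
#include <string.h>

struct image_record {
    unsigned load_address;        /* where the code bytes must go     */
    unsigned byte_count;          /* number of code bytes that follow */
    unsigned char code[256];
};

static unsigned char memory[65536];     /* simulated main memory */

void absolute_load(const struct image_record *recs, int n,
                   unsigned start_address) {
    for (int i = 0; i < n; i++)         /* copy records verbatim */
        memcpy(&memory[recs[i].load_address],
               recs[i].code, recs[i].byte_count);
    printf("transferring control to %u\n", start_address);
}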
Relocating Loaders
A relocating loader loads a program in a designated area of memory, relocates it so that it
can execute correctly in that area of memory and passes control to it for execution.
The binary program is stored in a file that contains the following:
o A Header record showing the load origin, length and load time execution start
address of the program.
o A sequence of binary image records containing the program’s code. Each binary
image record contains a part of the program’s code in the form of a sequence of
bytes, the load address of the first byte of this code and a count of the number of
bytes of code.
o A table analogous to the RELOCTAB, giving the linked addresses of the
address-sensitive instructions in the program.
Algorithm: Relocating Loader
1. program_load_origin = load origin specified in the loader command
2. program_linked_origin = linked origin specified in the header record
3. relocation_factor = program_load_origin – program_linked_origin
4. For each binary image record
a. code_linked_address = linked address specified in the record
b. code_load_address = code_linked_address + relocation_factor
c. byte_count = count of the number of bytes in the record
d. Move byte_count bytes from the record to the memory area with start address
code_load_address
5. Read RELOCTAB of the program
6. For each entry in the RELOCTAB
a. instruction_linked_address = address specified in the RELOCTAB entry
b. instruction_load_address = instruction_linked_address + relocation_factor
c. Add relocation_factor to the operand address used in the instruction that has
the address instruction_load_address
In the program shown below, the address of variable X in the instruction ADD AREG, X will be 30.
If this program is loaded from memory location 500 for execution, then the address of
X in the instruction ADD AREG, X must become 530.
[Figure: Program relocation. X is at offset 30 in the program; after the program is loaded at address 500, the operand address in ADD AREG, X becomes 530.]
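Steps 3 and 6 of the algorithm can be sketched in C as follows. The one-word instruction format, with the operand address stored directly in the memory word, is an illustrative assumption; main() reproduces the ADD AREG, X example above.

/* Sketch of the relocation loop of a relocating loader. */
#include <stdio.h>

static unsigned memory[4096];               /* simulated main memory */

void relocate(unsigned load_origin, unsigned linked_origin,
              const unsigned *reloctab, int n) {
    int factor = (int)load_origin - (int)linked_origin;   /* step 3 */
    for (int i = 0; i < n; i++) {                         /* step 6 */
        unsigned load_addr = reloctab[i] + factor;
        memory[load_addr] += (unsigned)factor;  /* fix operand address */
    }
}

int main(void) {
    memory[510] = 30;              /* instruction word holds address of X */
    unsigned reloctab[] = { 10 };  /* address-sensitive instruction at
                                      linked address 10                  */
    relocate(500, 0, reloctab, 1);
    printf("%u\n", memory[510]);   /* prints 530 */
    return 0;
}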
Linking Loaders
A modern program comprises several procedures or subroutines together with the main
program module.
The translator, a compiler in such cases, translates each of them independently into
distinct object modules, usually stored in secondary memory.
Execution of the program in such cases is performed by linking together these independent
object modules and loading them into the main memory.
Linking of various object modules is done by the linker.
A special system program called a linking loader gathers various object modules, links
them together to produce a single executable binary program, and loads it into memory.
This category of loaders leads to a popular class of loaders called direct-linking loaders.
The loaders used in these situations are usually called linking loaders, which link the
necessary library functions and symbolic references.
Essentially, linking loaders accept and link together a set of object programs into a single
file that is loaded into memory (the core).
Linking loaders additionally perform relocation and overcome disadvantages of other
loading schemes.
6 – Scanning and Parsing
Reduction: Let production P1 of grammar G be of the form P1: A → α and let σ be a string
that contains α, i.e., σ = βαγ. Then replacement of α by A in the string σ constitutes a
reduction according to production P1.
Step String
0 the boy
1 <Article> boy
2 <Article> <Noun>
3 <Noun Phrase>
Parse trees: The tree representation of the sequence of derivations that produces a string from
the distinguished (start) symbol is termed as parse tree.
Eg. [Figure: Parse tree for 'the boy'. The root <Noun Phrase> has children <Article> and <Noun>, which derive 'the' and 'boy' respectively.]
Classification of Grammar
Type-0 grammars
These grammars are known as phrase structure grammars. Their productions are of the form
α→β
where both α and β can be strings of terminal and nonterminal symbols. Such productions permit arbitrary
substitution of strings during derivation or reduction, hence they are not relevant to specification of
programming languages.
Type-1 grammars
Productions of Type-1 grammars specify that derivation or reduction of strings can take place only in specific
contexts. Hence these grammars are also known as context sensitive grammars. A production of a Type-1
grammar has the form
αAβ → απβ
Here, a string π can be replaced by 'A' (or vice versa) only when it is enclosed by the strings α and β in a
sentential form. These grammars are also not relevant for programming language specification since
recognition of programming language constructs is not context sensitive in nature.
Type-2 grammars
These grammars do not impose any context requirements on derivations or reductions. A typical Type-2
production is of the form
A→π
which can be applied independent of its context. These grammars are therefore known as context free
grammars (CFG). CFGs are ideally suited for programming language specification.
Type-3 grammars
Type-3 grammars are characterized by productions of the form
A → tB|t or
A → Bt|t
Note that these productions also satisfy the requirements of type-2 grammars. The specific form of the RHS
alternatives—namely a single nonterminal symbol or a string containing a single terminal symbol and a
single nonterminal symbol, gives some practical advantages in scanning. However, the nature of the
|– System Programming 2
6 – Scanning and Parsing
productions restricts the expressive power of these grammars, e.g., nesting of constructs or matching of
parentheses cannot be specified using such productions. Hence the use of Type-3 productions is restricted to
the specification of lexical units, e.g., identifiers, constants, labels, etc. Type-3 grammars are also known as
linear grammars or regular grammars.
Operator grammars
Productions of an operator grammar do not contain two or more consecutive nonterminal symbols in any
RHS alternative. Thus, nonterminal symbols occurring in an RHS string are separated by one or more
terminal symbols.
Eg.
<exp> → <exp> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → <factor> ↑ <primary> | <primary>
<primary> → <id> | <const> | (<exp>)
<id> → <letter> | <id> <letter> | <id> <digit>
<const> → [+|-] <digit> | <const> <digit>
<letter> → a|b|c|…|z
<digit> → 0|1|2|3|4|5|6|7|8|9
Ambiguous Grammar
A grammar is ambiguous if a string can be interpreted in two or more ways by using it.
In natural languages, ambiguity may concern the meaning of a word, the syntax category of
a word, or the syntactic structure of a construct. For example, a word can have multiple
meanings or can be both a noun and a verb, and a sentence can have more than one
syntactic structure.
In a formal language grammar, ambiguity would arise if identical strings can occur on the
RHS of two or more productions. For example, consider a grammar with the productions
N1 → α
N2 → α
The string α can be derived from or reduced to either N1 or N2.
Ambiguity at the level of the syntactic structure of a string would mean that more than one
parse tree can be built for the string.
Eg.
<exp> → <id> | <exp> + <exp> | <exp> * <exp>
<id> → a | b | c
Two parse trees can be built for the source string a+b*c according to this grammar – one in
which a+b is first reduced to <exp> and another in which b*c is first reduced to <exp>.
Scanning
Finite state automaton (FSA): A finite state automaton is a triple (S, Σ, T) where
S is a finite set of states, one of which is the initial state s_init, and one or more of which are the final states;
Σ is the alphabet of source symbols;
T is a finite set of state transitions defining transitions out of states in S on encountering symbols in Σ.
Deterministic finite state automaton: It is a finite state automaton none of whose states has
two or more transitions for the same source symbol. The DFA has the property that it reaches a
unique state for every source string input to it.
Regular Expression: A regular expression is a sequence of characters that define a search
pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and
replace"-like operations.
Give the regular expressions for the following:
1. 0 or 1
0+1
2. 0 or 11 or 111
0+11+111
3. Regular expression over Σ = {a,b,c} that represents all strings of length 3.
(a+b+c)(a+b+c)(a+b+c)
4. String having zero or more a.
a*
5. String having one or more a.
a+
6. All binary string.
(0+1)*
7. 0 or more occurrence of either a or b or both
(a+b)*
8. 1 or more occurrence of either a or b or both
(a+b)+
9. Binary no. end with 0
(0+1)*0
10. Binary no. end with 1
(0+1)*1
11. Binary no. starts and end with 1.
1(0+1)*1
12. String starts and ends with same character.
0(0+1)*0 or a(a+b)*a
1(0+1)*1 b(a+b)*b
13. All string of a and b starting with a
a(a/b)*
14. String of 0 and 1 end with 00.
(0+1)*00
15. String end with abb.
(a+b)*abb
16. String start with 1 and end with 0.
1(0+1)*0
17. All binary string with at least 3 characters and 3rd character should be zero.
(0+1)(0+1)0(0+1)*
18. Language which consists of exactly two b's over the set Σ = {a,b}
a*ba*ba*
19. Σ = {a,b} such that the 3rd character from the right end of the string is always a.
(a+b)*a(a+b)(a+b)
20. Any no. of a followed by any no. of b followed by any no. of c.
a*b*c*
21. It should contain at least 3 one.
(0+1)*1(0+1)*1(0+1)*1(0+1)*
22. String should contain exactly Two 1’s
0*10*10*
23. Length should be at least be 1 and at most 3.
(0+1) + (0+1) (0+1) + (0+1) (0+1) (0+1)
24. No.of zero should be multiple of 3
(1*01*01*01*)*+1*
25. Σ = {a,b,c} where the number of a's is a multiple of 3.
((b+c)*a (b+c)*a (b+c)*a (b+c)*)*
26. Even no. of 0.
(1*01*01*)*
27. Odd no. of 1.
0*(10*10*)*10*
28. String should have odd length.
(0+1)((0+1)(0+1))*
29. String should have even length.
((0+1)(0+1))*
30. String start with 0 and has odd length.
0((0+1)(0+1))*
31. String start with 1 and has even length.
1(0+1)((0+1)(0+1))*
32. Even no of 1
(0*10*10*)*
33. String of length 6 or less
(0+1+ϵ)^6
34. String ending with 1 and not contain 00.
(1+01)+
35. All string begins or ends with 00 or 11.
(00+11)(0+1)*+(0+1)*(00+11)
36. All string not contains the substring 00.
(1+01)*(ϵ+0)
37. Language of all string containing both 11 and 00 as substring.
((0+1)*00(0+1)*11(0+1)*)+ ((0+1)*11(0+1)*00(0+1)*)
38. Language of C identifier.
(_+L)(_+L+D)*
Regular Expression to DFA Conversion
(a+b)*abb
States a b
A B C
B B D
C B C
D B E
E B C
Table 6.1: Transition Table
[Figure 6.2: DFA for (a+b)*abb. States A–E with the transitions of Table 6.1; E is the final state.]
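The transition table of Table 6.1 can be simulated directly in C, which is essentially what a generated scanner does. This is a minimal sketch; the state and symbol encodings are assumptions made for illustration.

/* Simulate the DFA of Table 6.1 for (a+b)*abb.
   States A..E are encoded as 0..4; E (4) is the final state. */
#include <stdio.h>

int accepts(const char *s) {
    /* next[state][symbol], symbol 0 = 'a', 1 = 'b' (Table 6.1) */
    static const int next[5][2] = {
        {1, 2},   /* A */
        {1, 3},   /* B */
        {1, 2},   /* C */
        {1, 4},   /* D */
        {1, 2},   /* E */
    };
    int state = 0;                        /* start state A */
    for (; *s != '\0'; s++)
        state = next[state][*s == 'b'];
    return state == 4;                    /* accept in E */
}

int main(void) {
    printf("%d %d\n", accepts("aabb"), accepts("abab")); /* prints 1 0 */
    return 0;
}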
Note: For more examples, refer to the separate notes attached.
Parsing
This top-down parsing is non-recursive. In LL(1), the first L indicates that the input is
scanned from left to right, the second L means it uses leftmost derivation for the input
string, and 1 means it uses only one input symbol (of lookahead) to predict the parsing action.
The data structures used by the LL(1) parser are an input buffer, a stack, and a parsing table.
The parser works as follows:
The parsing program reads the top of the stack and the current input symbol. With the
help of these two symbols, the parsing action can be determined.
The parser consults the LL(1) parsing table each time while taking a parsing action; hence
this type of parsing method is also called a table driven parsing method.
The input is successfully parsed if the parser reaches the halting configuration: when the
stack is empty and the next token is $, the parse is successful.
Steps to construct LL(1) parser
1. Remove left recursion / Perform left factoring.
2. Compute FIRST and FOLLOW of nonterminals.
3. Construct predictive parsing table.
4. Parse the input string with the help of parsing table.
Example:
E → E+T | T
T → T*F | F
F → (E) | id
Step1: Remove left recursion
E → TE’
E’ → +TE’ | ϵ
T → FT’
T’ → *FT’ | ϵ
F → (E) | id
Step2: Compute FIRST & FOLLOW
FIRST FOLLOW
E {(,id} {$,)}
E’ {+,ϵ} {$,)}
T {(,id} {+,$,)}
T’ {*,ϵ} {+,$,)}
F {(,id} {*,+,$,)}
Table 6.2 First & Follow set
Step3: Predictive Parsing Table
        id          +            *            (           )         $
E       E → TE’                               E → TE’
E’                  E’ → +TE’                             E’ → ϵ    E’ → ϵ
T       T → FT’                               T → FT’
T’                  T’ → ϵ       T’ → *FT’                T’ → ϵ    T’ → ϵ
F       F → id                                F → (E)
Table 6.3: Predictive parsing table
Step4: Parse the string
Stack       Input        Action
$E          id+id*id$
$E’T        id+id*id$    E → TE’
$E’T’F      id+id*id$    T → FT’
$E’T’id     id+id*id$    F → id
$E’T’       +id*id$
$E’         +id*id$      T’ → ϵ
$E’T+       +id*id$      E’ → +TE’
$E’T        id*id$
$E’T’F      id*id$       T → FT’
$E’T’id     id*id$       F → id
$E’T’       *id$
$E’T’F*     *id$         T’ → *FT’
$E’T’F      id$
$E’T’id     id$          F → id
$E’T’       $
$E’         $            T’ → ϵ
$           $            E’ → ϵ
Table 6.4: Moves made by the predictive parser
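The driver loop of the table-driven parser can be sketched in C as follows. The single-character encoding (e for E’, t for T’, i for id) and the rhs() function standing in for Table 6.3 are assumptions made for compactness.

/* Sketch of a table-driven LL(1) driver for the grammar above. */
#include <stdio.h>
#include <string.h>

static const char *rhs(char nt, char a) {           /* Table 6.3 */
    switch (nt) {
    case 'E': return (a=='i' || a=='(') ? "Te" : NULL;
    case 'e': return a=='+' ? "+Te" : (a==')' || a=='$') ? "" : NULL;
    case 'T': return (a=='i' || a=='(') ? "Ft" : NULL;
    case 't': return a=='*' ? "*Ft"
                  : (a=='+' || a==')' || a=='$') ? "" : NULL;
    case 'F': return a=='i' ? "i" : a=='(' ? "(E)" : NULL;
    default:  return NULL;
    }
}

int parse(const char *input) {            /* input must end in '$' */
    char stack[64] = "$E";                /* bottom marker + start symbol */
    int top = 1;
    for (;;) {
        char x = stack[top], a = *input;
        if (x == '$' && a == '$') return 1;           /* success */
        if (x == a) { top--; input++; continue; }     /* match   */
        int is_nt = (x=='E' || x=='e' || x=='T' || x=='t' || x=='F');
        const char *r = is_nt ? rhs(x, a) : NULL;
        if (r == NULL) return 0;                      /* error entry */
        top--;                                        /* pop nonterminal */
        for (int i = (int)strlen(r) - 1; i >= 0; i--)
            stack[++top] = r[i];          /* push RHS in reverse */
    }
}

int main(void) {
    printf("%d\n", parse("i+i*i$"));      /* prints 1 (accepted) */
    return 0;
}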
Bottom Up Parsing
Handle: A “handle” of a string is a substring of the string that matches the right side of a
production, and whose reduction to the nonterminal of the production is one step along the
reverse of rightmost derivation.
Handle pruning: The process of discovering a handle and reducing it to the appropriate
left-hand side nonterminal is known as handle pruning.
Right sentential form    Handle    Reducing production
id1+id2*id3              id1       E → id
E+id2*id3                id2       E → id
E+E*id3                  id3       E → id
E+E*E                    E*E       E → E*E
E+E                      E+E       E → E+E
E
Table 6.5: Handle and Handle Pruning
NT op : Trailing(NT) .> op
E +   : {+, *, id} .> +
T *   : {*, id} .> *
3. $ <. {+, *, id}
4. {+, *, id} .> $
Step-3: Creation of table

        +     *     id    $
+       .>    <.    <.    .>
*       .>    .>    <.    .>
id      .>    .>          .>
$       <.    <.    <.

Table 6.8: Precedence table
Step-4: Parsing of the string <id> + <id> * <id> using precedence table.
We follow these steps to parse the given string:
1. Scan the input string until first .> is encountered.
2. Scan backward until <. is encountered.
3. The handle is string between <. and .>.
$ <. id .> + <. id .> * <. id .> $    Handle id is obtained between <. and .>;
                                      reduce it by F → id
$ F + <. id .> * <. id .> $           Handle id is obtained between <. and .>;
                                      reduce it by F → id
$ F + F * <. id .> $                  Handle id is obtained between <. and .>;
                                      reduce it by F → id
$ F + F * F $                         Perform appropriate reductions of all
                                      nonterminals
$ E + T * F $                         Remove all nonterminals
$ + * $                               Place relations between the operators
$ <. + <. * .> $                      The * operator is surrounded by <. and .>;
                                      * becomes the handle, so reduce T*F
$ <. + .> $                           + becomes the handle; reduce E+T
$ $                                   Parsing done
Table 6.9: Moves for parsing <id>+<id>*<id>
[Figure: Stack snapshots (a)–(e) during operator precedence parsing of |- a + b * c -|, showing SB (stack base) and TOS (top of stack) as the syntax tree for a + b * c is built.]
LEX
The input to LEX consists of two components.
The first component is a specification of strings that represents the lexical units in L.
This specification is in the form of regular expressions.
The second component is a specification of semantic actions that are aimed at building the
intermediate representation.
The intermediate representation produced by a scanner would consist of a set of tables of
lexical units and a sequence of tokens for the lexical units occurring in a source statement.
The scanner generated by LEX would be invoked by a parser whenever the parser needs the
next token.
Accordingly, each semantic action would perform some table building actions and return a
single token.
YACC
Each translation rule input to YACC has a string specification that resembles a production of
a grammar—it has a nonterminal on the LHS and a few alternatives on the RHS.
For simplicity, we will refer to a string specification as a production. From the productions,
YACC generates an LALR(1) parser for language L, which is a bottom-up parser.
The parser would operate as follows: For a shift action, it would invoke the scanner to
obtain the next token and continue the parse by using that token. While performing a
reduce action in accordance with a production, it would perform the semantic action
associated with that production.
The semantic actions associated with productions achieve building of an intermediate
representation or target code as follows: Every nonterminal symbol in the parser has an
attribute. The semantic action associated with a production can access attributes of
nonterminal symbols used in that production—a symbol '$n' in the semantic action, where n
is an integer, designates the attribute of the nth nonterminal symbol in the RHS of the
production and the symbol '$$' designates the attribute of the LHS nonterminal symbol of
the production. The semantic action uses the values of these attributes for building the
intermediate representation or target code. The attribute type can be declared in the
specification input to YACC.
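For illustration, a fragment of a YACC specification using this notation might look as follows. The token NUMBER and the integer-valued attributes (a toy calculator rather than true intermediate code construction) are assumptions made for the example.

%token NUMBER
%left '+'
%left '*'
%%
expr : expr '+' expr   { $$ = $1 + $3; }  /* $$ = attribute of LHS   */
     | expr '*' expr   { $$ = $1 * $3; }  /* $1, $3 = RHS attributes */
     | NUMBER          { $$ = $1; }       /* the token's attribute   */
     ;
%%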
7 – Compilers
If a binding cannot be performed before execution begins, the generated code must obtain
information about the relevant binding during its execution and use it to access the entity
appropriately. This affects the execution efficiency of the target program.
Memory Allocation
Three important tasks of memory allocation are:
1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of
data items.
3. Determine appropriate memory mappings to access the values in a non scalar data
item, e.g. values in an array.
Memory allocation is mainly divided into two types:
1. Static allocation
2. Dynamic allocation
Static memory allocation
In static memory allocation, memory is allocated to a variable before the execution of a
program begins.
Static memory allocation is typically performed during compilation.
No memory allocation or deallocation actions are performed during the execution of a
program. Thus, variables remain permanently allocated.
Dynamic memory allocation
In dynamic memory allocation, memory bindings are established and destroyed during the
execution of a program
Dynamic memory allocation has two flavors: automatic allocation and program controlled
allocation.
In automatic dynamic allocation, memory is allocated to the variables declared in a program
unit when the program unit is entered during execution and is deallocated when the
program unit is exited. Thus the same memory area may be used for the variables of
different program units.
In program controlled dynamic allocation, a program can allocate or deallocate memory at
arbitrary points during its execution.
It is obvious that in both automatic and program controlled allocation, address of the
memory area allocated to a program unit cannot be determined at compilation time.
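All three allocation styles can be seen in a single C fragment; the variable names here are chosen only for illustration.

#include <stdio.h>
#include <stdlib.h>

static int counter = 0;   /* static allocation: allocated before
                             execution begins, lives throughout    */

void demo(void) {
    int local = 42;       /* automatic dynamic allocation: allocated
                             on entry to demo(), freed on exit      */
    int *buf = malloc(10 * sizeof *buf);  /* program controlled:
                             allocated at an arbitrary point        */
    if (buf != NULL) {
        buf[0] = local + counter++;
        printf("%d\n", buf[0]);
        free(buf);        /* deallocated at an arbitrary point      */
    }
}

int main(void) { demo(); demo(); return 0; }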
The delimiters mark the beginning and the end of a block. Blocks can be nested; for
example, block B2 can be defined completely within block B1.
A block structured language uses dynamic memory allocation.
Finding the scope of a variable means checking its visibility within the blocks.
The following rules are used to determine the scope of a variable:
1. Variable X declared in block B1 can be accessed by any statement situated in
block B1.
2. Variable X can also be accessed by any statement in block B2 if block B2 is situated in block B1.
There are two types of variables in a block structured language
1. Local variable
2. Non local variable
To understand local and non-local variables, consider the following example:
Procedure A
{
    int x, y, z;
    Procedure B
    {
        int a, b;
    }
    Procedure C
    {
        int m, n;
    }
}
Here x, y, and z are local to A but non-local to B and C; a and b are local to B, and m and n
are local to C.
Dynamic pointer
The first reserved pointer in a block's AR points to the activation record of its dynamic parent.
This is called the dynamic pointer and is stored at address 0(ARB).
The dynamic pointer is used for de-allocating an AR.
[Figure: Memory allocation for a block structured program, showing activation records and their dynamic pointers.]
Static pointer
Access to non-local variables is implemented using the second reserved pointer in the AR.
This pointer, which is stored at address 1(ARB), is called the static pointer.
Activation record
The activation record is a block of memory used for managing information needed by a
single execution of a procedure.
[Figure: Activation record layout: return value, actual parameters, control link, access link, local variables, temporaries.]
1. Temporary values: The temporary variables are needed during the evaluation of
expressions. Such variables are stored in the temporary field of activation record.
2. Local variables: Data that is local to the executing procedure is stored in this
field of the activation record.
3. Saved machine registers: This field holds the information regarding the status of
machine just before the procedure is called. This field contains the registers and
program counter.
4. Control link: This field is optional. It points to the activation record of the calling
procedure. This link is also called dynamic link.
5. Access link: This field is also optional. It refers to the non local data in other
activation record. This field is also called static link field.
6. Actual parameters: This field holds the information about the actual parameters.
These actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.
Compilation of Expression
Operand Descriptor
An operand descriptor has the following fields:
1. Attributes: Contains the subfields type, length and miscellaneous information
2. Addressability: Specifies where the operand is located, and how it can be accessed. It has
two subfields
Addressability code: Takes the values 'M' (operand is in memory) and 'R' (operand is in a
register). Other addressability codes, e.g. address in register ('AR') and address in memory
('AM'), are also possible.
Address: Address of a CPU register or memory word.
Ex: a*b
MOVER AREG, A
MULT AREG, B
Three operand descriptors are used during code generation. Assuming a, b to be integers
occupying 1 memory word, these are:
      Attributes    Addressability
#1    (int, 1)      Address(a)
#2    (int, 1)      Address(b)
#3    (int, 1)      Address(AREG)
Register descriptors
A register descriptor has two fields
1. Status: Contains the code free or occupied to indicate register status.
2. Operand descriptor #: If status = occupied, this field contains the descriptor for the operand
contained in the register.
Register descriptors are stored in an array called Register_descriptor. One register
descriptor exists for each CPU register.
In the above example, the register descriptor for AREG after generating code for a*b would be
Occupied #3
This indicates that register AREG contains the operand described by descriptor #3.
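The two descriptors can be represented by C structures along the following lines; the exact field encodings are assumptions made for illustration.

struct operand_descriptor {
    char type;           /* e.g. 'i' for int                       */
    int  length;         /* length in memory words                 */
    char addressability; /* 'M' = in memory, 'R' = in register     */
    int  address;        /* memory address or register number      */
};

struct register_descriptor {
    int status;          /* 0 = free, 1 = occupied                 */
    int operand_no;      /* descriptor # of the operand held, if any */
};

/* After generating code for a*b, AREG holds operand #3: */
struct register_descriptor areg = { 1, 3 };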
There are two types of intermediate representation
1. Postfix notation
2. Three address code.
1) Postfix notation
Postfix notation is a linearized representation of a syntax tree.
It is a list of the nodes of the tree in which a node appears immediately after its children.
The postfix notation of x = -a*b + -a*b will be
x a – b * a – b * + =   (here – denotes the unary minus operator)
2) Three address code
In the three address code form, at most three addresses are used to represent a statement.
The general form of a three address code statement is a := b op c,
where a, b, and c are operands that can be names, constants, or compiler-generated temporaries.
For the expression like a = b+c+d the three address code will be
t1=b+c
t2=t1+d
Here t1 and t2 are temporary names generated by the compiler. At most three
addresses are allowed in each statement; hence this representation is called three-address code.
There are three representations used for three code such as quadruples, triples and indirect
triples.
Quadruple representation
The quadruple is a structure with at most four fields: op, arg1, arg2, and result.
The op field is used to represent the internal code for operator, the arg1 and arg2
represent the two operands used and result field is used to store the result of an
expression.
Consider the input statement x := -a*b + -a*b
t1 := -a          (0)  uminus  a          t1
t2 := t1 * b      (1)  *       t1    b    t2
t3 := -a          (2)  uminus  a          t3
t4 := t3 * b      (3)  *       t3    b    t4
t5 := t2 + t4     (4)  +       t2    t4   t5
x := t5           (5)  :=      t5         x
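A quadruple can be represented by a simple C structure; the string-based layout below is an illustrative assumption.

struct quad {
    const char *op, *arg1, *arg2, *result;
};

/* The quadruples for x := -a*b + -a*b */
struct quad code[] = {
    { "uminus", "a",  "",   "t1" },
    { "*",      "t1", "b",  "t2" },
    { "uminus", "a",  "",   "t3" },
    { "*",      "t3", "b",  "t4" },
    { "+",      "t2", "t4", "t5" },
    { ":=",     "t5", "",   "x"  },
};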
Triples
In the triple representation, the use of temporary variables is avoided by referring instead
to the positions of earlier triples (pointers into the triple list) and to symbol table entries
for program variables.
For the expression x := -a*b + -a*b, the triple representation is as given below:
Number Op Arg1 Arg2
(0) uminus a
(1) * (0) b
(2) uminus a
(3) * (2) b
(4) + (1) (3)
(5) := x (4)
Indirect Triples
In the indirect triple representation, a separate listing of triples is maintained, and
pointers into this listing are used in place of the triples themselves, so statements can be
reordered without changing the triples.
Code Optimization
I. Compile Time Evaluation
Compile time evaluation means shifting computations from run time to compile time.
There are two methods used to obtain the compile time evaluation.
1. Folding
In the folding technique, the computation of constants is done at compile time instead
of run time.
example : length = (22/7) * d
Here folding is applied by performing the computation of 22/7 at compile time
2. Constant propagation
In this technique, the value of a variable is replaced by its known constant value, and the
computation of the expression is done at compilation time.
example : pi = 3.14; r = 5;
Area = pi * r * r
Here at the compilation time the value of pi is replaced by 3.14 and r by 5 then
computation of 3.14 * 5 * 5 is done during compilation.
II. Common Sub Expression Elimination
A common sub expression is an expression that appears repeatedly in the program and
whose value has been computed previously.
If the operands of such a sub expression have not changed, the previously computed
result is used instead of recomputing it each time.
Example:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t4 : = 4 * i
t5:= n
t6 := b[t4]+t5
The above code can be optimized using common sub expression elimination
t1=4*i
t2=a[t1]
t3=4*j
t5=n
t6=b[t1]+t5
The common sub expression t4 := 4 * i is eliminated, as its value is already available in t1
and the value of i has not changed between definition and use.
III. Loop invariant computation (Frequency reduction)
Loop invariant optimization can be obtained by moving some amount of code outside the
loop and placing it just before entering in the loop.
This method is also called code motion.
Example:
while(i<=max-1)
{
    sum=sum+a[i];
    i++;
}
can be optimized as
n=max-1;
while(i<=n)
{
    sum=sum+a[i];
    i++;
}
IV. Strength Reduction
Strength of certain operators is higher than others.
For instance strength of * is higher than +.
In this technique the higher strength operators can be replaced by lower strength
operators.
Example:
for(i=1;i<=50;i++)
{
    count = i * 7;
}
Here count takes the values 7, 14, 21, and so on, a new multiple of 7 on each iteration.
This code can be replaced, using strength reduction, as follows:
temp=7;
for(i=1;i<=50;i++)
{
    count = temp;
    temp = temp+7;
}
V. Dead Code Elimination
A variable is said to be live at a point in a program if its value is used subsequently.
On the other hand, a variable is said to be dead at a point in a program if the value
contained in it is never used thereafter. The code that computes only such dead values is
dead code, and an optimization can be performed by eliminating it.
Example :
i=0;
if(i==1)
{
a=x+5;
}
The if statement is dead code, as its condition will never be satisfied; hence the statement
can be eliminated and the optimization performed.
8 – Interpreters and Debuggers
The effort involved in the compilation of a statement by the compiler is of the same order
of magnitude as the effort involved in the interpretation of that statement, i.e., tc ≈ ti,
where tc and ti are the compilation and interpretation times per statement and te is the
execution time of a compiled statement.
If a 400-statement program is executed on test data with only 80 statements being visited
during the test run, the total CPU time for compilation followed by execution of the
program is 400 · tc + 80 · te, while the total CPU time for interpretation of the program is
80 · ti ≈ 80 · tc. This shows that interpretation will be cheaper in such cases. However, if
more than about 400 statements are to be executed, compilation followed by execution will
be cheaper; that is, using an interpreter is advantageous up to the execution of about 400
statements. This clearly indicates that, from the point of view of the CPU time cost,
interpreters are a better choice at least for the program development environment.
Benefits of Interpretation
The distinct benefits of interpretation are as follows:
Executes the source code directly. It translates the source code into some efficient
Intermediate Code (IC) and immediately executes it. The process of execution can be
performed in a single stage without the need of a compilation stage.
Handles certain language features that cannot be compiled.
Ensures portability since it does not produce machine language program.
Suited for development environment where a program is modified frequently. This means
alteration of code can be performed dynamically.
Suited for debugging of the code and facilitates interactive code development.
[Figure 8.1: Ways of executing a Java program. The Java compiler translates the Java source program into Java bytecode; after being checked by the class loader and Java bytecode verifier, the bytecode may be interpreted by the Java Virtual Machine or compiled by the Java just-in-time compiler for mixed-mode execution.]
Another alternative, shown in Figure 8.1, simply compiles the complete Java program into
the machine language of the computer. This scheme provides fast execution of the Java
program; however, it cannot provide any of the benefits of interpretation or just-in-time
compilation.
Types of Errors
Syntax Error
Syntax errors occur due to the fact that the syntax of the programming language is not
followed.
The errors in token formation, missing operators, unbalanced parenthesis, etc., constitute
syntax errors.
These are generally programmer induced due to mistakes and negligence while writing a
program.
Syntax errors are detected early during the compilation process and prevent the compiler
from proceeding to code generation.
Let us see the syntax errors with Java language in the following examples.
Example 1: Missing punctuation-"semicolon"
int age = 50 // note here semicolon is missing
Example 2: Errors in expression syntax
x = ( 30 - 15; // note the missing closing parenthesis ")"
Semantic Error
Semantic errors occur due to improper use of programming language statements.
They include operands whose types are incompatible, undeclared variables, incompatible
arguments to function or procedures, etc.
Semantic errors are mentioned in the following examples.
Example: Type incompatibility between operands
int msg = "hello"; //note the types String and int are incompatible
Logical Error
Logical errors occur due to the fact that the software specification is not followed while
writing the program. Although the program is successfully compiled and executed error
free, the desired results are not obtained.
Let us look into some logical errors with Java language.
Example : Errors in computation
public static int mul(int a, int b) {
    return a + b;
}
// this method returns an incorrect value with respect to the
// specification, which requires it to multiply two integers
Example : Non-terminating loops
String str = br.readLine();
while (str != null) {
    System.out.println(str);
} // the loop does not terminate because str is never updated inside it
Logical errors may cause undesirable effect and program behaviors. Sometimes, these
errors remain undetected unless the results are analyzed carefully.
Debugging Procedures
Whenever there is a gap between an expected output and an actual output of a program,
the program needs to be debugged.
An error in a program is called a bug, and debugging means finding and removing the
errors present in the program.
Debugging involves executing the program in a controlled fashion.
During debugging, the execution of a program can be monitored at every step.
In the debug mode, activities such as starting the execution and stopping the execution are
in the hands of the debugger.
The debugger provides the facility to execute a program up to the specified instruction by
inserting a breakpoint.
It gives a chance to examine the values assigned to the variables present in the program at
any instant and, if required, offers an opportunity to update the program.
Types of debugging procedures:
o Debug Monitors:
A debug monitor is a program that monitors the execution of a program and reports
the state of a program during its execution. It may interfere in the execution process,
depending upon the actions carried out by a debugger (person). In order to initiate
the process of debugging, a programmer must compile the program with the debug
option first. This option, along with other information, generates a table that stores
the information about the variables used in a program and their addresses.
o Assertions:
Assertions are mechanisms used by a debugger to catch the errors at a stage before
the execution of a program. Sometimes, while programming, some assumptions are
made about the data involved in a computation. If these assumptions go wrong
during the execution of the program, erroneous results may follow. For this, a
programmer can make use of an assert statement. Assertions are statements
used in programs, which are always associated with Boolean conditions. If an assert()
statement evaluates to true, nothing happens. But if the statement turns out to be
false, the execution of the program halts.
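C provides this mechanism through the standard assert macro, as in the following minimal example:

#include <assert.h>
#include <stdio.h>

int divide(int a, int b) {
    assert(b != 0);   /* assumption: the divisor is never zero;
                         execution halts if this condition is false */
    return a / b;
}

int main(void) {
    printf("%d\n", divide(10, 2));  /* prints 5                        */
    printf("%d\n", divide(10, 0));  /* assertion fails, program aborts */
    return 0;
}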
Classification of Debuggers
Static Debugging
Static debugging focuses on semantic analysis.
In a certain program, suppose there are two variables: var1 and var2. The type of var1 is an
integer, and the type of var2 is a float. Now, the program assigns the value of var2 to var1;
then, there is a possibility that it may not get correctly assigned to the variable due to
truncation. This type of analysis falls under static debugging.
Static debugging detects errors before the actual execution.
Static code analysis may include detection of the following situations:
o Dereferencing of variable before assigning a value to it
o Truncation of value due to wrong assignment
o Redeclaration of variables
o Presence of unreachable code
Dynamic/Interactive Debugger
Dynamic analysis is carried out during program execution. An interactive debugging system
provides programmers with facilities that aid in testing and debugging programs interactively.
A dynamic debugging system should provide the following facilities:
Execution sequencing: It is nothing but observation and control of the flow of program
execution. For example, the program may be halted after a fixed number of instructions are
executed.
Breakpoints: Breakpoints specify the position within a program till which the program gets
executed without disturbance. Once the control reaches such a position, it allows the user
to verify the contents of variables declared in the program.
Conditional expressions: A debugger can include statements in a program to ensure that
certain conditions are reached in the program. These statements, known as assertions, can
be used to check whether some pre-condition or post-condition has been met in the
program during execution.
Tracing: Tracing monitors step by step the execution of all executable statements present in
a program. The other name for this process is "step into". Another possible variation is "step
over" debugging that can be executed at the level of procedure or function. This can be
implemented by adding a breakpoint at the last executable statement in a program.
Traceback: This gives the user a chance to trace back over the functions; the traceback
utility uses a stack data structure. The traceback utility should show the path by which the
current statement in the program was reached.
Program-display capabilities: While debugging is in progress, the program being debugged
must be made visible on the screen along with the line numbers.
Multilingual capability: The debugging system must also consider the language in which the
debugging is done. Generally, different programming languages involve different user
environments and applications systems.
Optimization: Sometimes, to make a program efficient, programmers may use an optimized
code. Debugging of such statements can be tricky. However, to simplify the debugging
process, a debugger may use an optimizing compiler that deals with the following issues:
o Removing invariant expressions from a loop
o Merging similar loops
o Eliminating unnecessary statements
o Removing branch instructions
NFA to DFA Examples
1. (0+1)*010(0+1)*
[Figure: NFA for (0+1)*010(0+1)*, with states 0–17 and ϵ-moves.]
ϵ-closure (0) = {0, 1, 2, 4, 7} ------- A

States   0   1
A        B   C
B        B   D
C        B   C

[Figure: DFA for (0+1)*010(0+1)* with states A–I.]
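The ϵ-closure computation that drives these subset construction examples can be sketched in C as a fixed-point iteration; the adjacency-matrix representation of the ϵ-moves is an assumption made for simplicity.

#define MAX_STATES 32

/* eps[i][j] != 0 iff there is an ϵ-move from state i to state j.
   in_set[] marks the states of the set; it is enlarged in place. */
void eps_closure(const int eps[MAX_STATES][MAX_STATES],
                 int n, int in_set[MAX_STATES]) {
    int changed = 1;
    while (changed) {                 /* iterate to a fixed point */
        changed = 0;
        for (int i = 0; i < n; i++)
            if (in_set[i])
                for (int j = 0; j < n; j++)
                    if (eps[i][j] && !in_set[j])
                        in_set[j] = changed = 1;
    }
}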
2. (010+00)*(10)*
[Figure: NFA for (010+00)*(10)*, with states 0–14 and ϵ-moves.]
Move (F, 0) = φ
Move (F, 1) = {12}
ϵ-closure (Move(F, 1)) = {12} ------ C
States   0   1
A        B   C
B        D   E
C        F   φ
D        B   C
E        G   φ
F        φ   C
G        B   C
DFA:
[Figure: DFA for (010+00)*(10)* with states A–G.]
3. (a+b)*a(a+b)
[Figure: NFA for (a+b)*a(a+b), with states 0–13 and ϵ-moves.]
DFA:
[Figure: DFA for (a+b)*a(a+b) with states A–E.]
4. (a+b)*abb(a+b)*
[Figure: NFA for (a+b)*abb(a+b)*, with states 0–17 and ϵ-moves.]
ϵ-closure (0) = {0, 1, 2, 4, 7} ------- A
States a b
A B C
B B D
C B C
D B E
E F G
F F H
G F G
H F I
I F G
DFA:
[Figure: DFA for (a+b)*abb(a+b)* with states A–I.]
5. 10(0+1)*1
[Figure: NFA for 10(0+1)*1, with states 0–10 and ϵ-moves.]
Move (A, 0) = φ
ϵ-closure (Move (A, 0)) = φ
Move (A, 1) = {1}
ϵ-closure (Move (A, 1)) = {1} ------- B
Transition Table:
States 0 1
A φ B
B C φ
C D E
D D E
E D E
DFA:
[Figure: DFA for 10(0+1)*1 with states A–E; E is the accepting state.]