Compiler Design: Why Learn About Compilers


Compiler Design

Why learn about compilers?


Few people will ever be required to write a compiler for a general-purpose language like C or Pascal. So why do most computer science institutions offer compiler courses and often make them mandatory?
Some typical reasons are:
a) It is considered a topic that you should know in order to be well-cultured in
computer science.
b) A good craftsman should know his tools, and compilers are important tools
for programmers and computer scientists.
c) The techniques used for constructing a compiler are useful for other purposes
as well.
d) There is a good chance that a programmer or computer scientist will need to
write a compiler or interpreter for a domain-specific language.
Understanding how a compiler is built allows programmers to develop an intuition about what their high-level programs will look like when compiled, and to use this intuition to tune programs for better efficiency. Furthermore, the error reports that compilers provide are often easier to understand for someone who knows the different phases of compilation, for example the difference between lexical errors, syntax errors and type errors.

What is a compiler?

In order to reduce the complexity of designing and building computers, nearly all computers are made to execute relatively simple commands (but do so very quickly).
A program for a computer must be built by combining these very simple commands into a program in what is called machine language. Since this is a tedious and error-prone process, most programming is instead done using a high-level programming language. This language can be very different from the machine language that the computer can execute, so some means of bridging the gap is required. This is where the compiler comes in.
A compiler translates (or compiles) a program written in a high-level programming language
that is suitable for human programmers into the low-level machine language that is required
by computers. During this process, the compiler will also attempt to spot and report obvious
programmer mistakes.
Using a high-level language for programming has a large impact on how fast programs can be developed. The main reasons for this are:
a) Compared to machine language, the notation used by programming languages is closer to the way humans think about problems.
b) The compiler can spot some obvious programming mistakes.
c) Programs written in a high-level language tend to be shorter than equivalent programs written in machine language.
Another advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, run on many different machines.
Analysis of the Source Program
Analysis of the source program consists of three parts:
Linear Analysis
Hierarchical Analysis
Semantic Analysis
Linear Analysis

The stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning.
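As a minimal sketch of linear analysis, the Python function below reads an assignment such as `position = initial + rate * 60` from left to right and groups its characters into `(kind, text)` tokens. The function name `tokenize`, the token kinds `id`, `num` and `op`, and the small operator set are illustrative assumptions, not taken from any particular compiler.

```python
# Hypothetical token set: identifiers, integer literals and these operators.
OPERATORS = set("=+-*/()")


def tokenize(source):
    """Scan source left to right, grouping characters into (kind, text) tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not itself a token
        elif ch.isalpha() or ch == "_":
            start = i  # identifier: a letter or '_' followed by letters, digits, '_'
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("id", source[start:i]))
        elif ch.isdigit():
            start = i  # integer literal: a maximal run of digits
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("num", source[start:i]))
        elif ch in OPERATORS:
            tokens.append(("op", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens
```

For example, `tokenize("position = initial + rate * 60")` returns `[('id', 'position'), ('op', '='), ('id', 'initial'), ('op', '+'), ('id', 'rate'), ('op', '*'), ('num', '60')]`. The spaces do not appear in the result; they only mark where one token ends and the next begins.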
Hierarchical Analysis
Characters or tokens are grouped hierarchically into nested collections with collective meaning.
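Hierarchical analysis can be sketched as a small recursive-descent parser. Assuming the hypothetical grammar given in the docstring (an arithmetic expression assigned to an identifier, with `*` and `/` binding tighter than `+` and `-`), it turns a list of tokens into a nested tuple that mirrors the structure of the statement. The regular-expression `tokenize` helper is a simplification that silently skips characters it does not recognise.

```python
import re


def tokenize(source):
    # Identifiers, integer literals and single-character operators.
    # Simplification: characters matching none of these are silently skipped.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/()]", source)


class Parser:
    """Recursive-descent parser producing nested tuples such as ('+', 'a', 'b').

    Hypothetical grammar:
        assignment -> identifier '=' expr
        expr       -> term (('+' | '-') term)*
        term       -> factor (('*' | '/') factor)*
        factor     -> identifier | number | '(' expr ')'
    """

    OPERATORS = {"=", "+", "-", "*", "/", "(", ")"}

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.advance()

    def assignment(self):
        target = self.advance()
        if target is None or target in self.OPERATORS or target[0].isdigit():
            raise SyntaxError(f"expected an identifier, found {target!r}")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative: (a - b) - c
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token == "(":
            node = self.expr()
            self.expect(")")
            return node
        if token is None or token in self.OPERATORS:
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # leaf: identifier or number
```

For example, `Parser(tokenize("position = initial + rate * 60")).assignment()` yields `('=', 'position', ('+', 'initial', ('*', 'rate', '60')))`. The multiplication is nested inside the addition, reflecting its higher precedence.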

Semantic Analysis
Certain checks are performed to ensure that the components of the program fit together meaningfully.
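A minimal sketch of such checks is the small type checker below. It assumes a hypothetical tuple representation of expression trees and a hypothetical `env` dictionary that maps each declared variable to its type. It rejects a variable that is used but not declared, and a `bool` value that is called as if it were a function.

```python
def check(expr, env):
    """Return the type of expr, raising TypeError if a check fails.

    Hypothetical tree format: ("num", 3), ("bool", True), ("var", name),
    ("+", left, right), ("call", function_expr, argument_expr).
    env maps declared names to "int", "bool" or ("fun", param_type, result_type).
    """
    kind = expr[0]
    if kind == "num":
        return "int"
    if kind == "bool":
        return "bool"
    if kind == "var":
        name = expr[1]
        if name not in env:
            raise TypeError(f"variable {name!r} is used but not declared")
        return env[name]
    if kind == "+":
        for operand in expr[1:]:
            if check(operand, env) != "int":
                raise TypeError("operands of '+' must be int")
        return "int"
    if kind == "call":
        fun_type = check(expr[1], env)
        # Only values of a function type may be called.
        if not (isinstance(fun_type, tuple) and fun_type[0] == "fun"):
            raise TypeError(f"cannot call a value of type {fun_type!r}")
        _, param_type, result_type = fun_type
        if check(expr[2], env) != param_type:
            raise TypeError("argument type does not match parameter type")
        return result_type
    raise ValueError(f"unknown expression kind {kind!r}")
```

With `env = {"x": "int", "flag": "bool", "inc": ("fun", "int", "int")}`, the call `inc(x + 1)` checks as `"int"`. Calling `flag` raises a `TypeError`, as does any use of an undeclared `y`.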
The Phases of a Compiler
Since writing a compiler is a nontrivial task, it is a good idea to structure the work.
A typical way of doing this is to split the compilation into several phases with well-defined interfaces. Conceptually, these phases operate in sequence, each phase (except the first) taking the output from the previous phase as its input. It is common to let each phase be handled by a separate module. Some of these modules are written by hand, while others may be generated from specifications. Often, some of the modules can be shared between several compilers.

A common division into phases is described below. In some
compilers, the ordering of phases may differ slightly, some phases may be
combined or split into several phases or some extra phases may be inserted
between those mentioned below.
Lexical analysis: This is the initial part of reading and analysing the program text. The text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.
Syntax analysis: This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.
Type checking: This phase analyses the syntax tree to determine if the program violates certain consistency requirements, e.g., if a variable is used but not declared, or if it is used in a context that does not make sense given the type of the variable, such as trying to use a boolean value as a function pointer.
Intermediate code generation: The program is translated to a simple, machine-independent intermediate language.
Register allocation: The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
Machine code generation: The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.
Assembly and linking: The assembly-language code is translated into binary representation, and the addresses of variables, functions, etc., are determined.
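To make intermediate code generation concrete, the sketch below flattens an assignment tree into three-address code, a common kind of machine-independent intermediate language in which each instruction applies at most one operator. The nested-tuple tree format `('=', target, expression)`, the function name and the generated temporaries `t1`, `t2` and so on are assumptions of this example.

```python
import itertools


def to_three_address(tree):
    """Translate ('=', target, expr) into a list of three-address instructions.

    Leaves of expr are identifier or number strings; inner nodes are
    (operator, left, right) tuples.
    """
    code = []
    counter = itertools.count(1)  # numbers for fresh temporaries t1, t2, ...

    def gen(node):
        # Leaves are used directly as operands.
        if isinstance(node, str):
            return node
        op, left, right = node
        left_operand = gen(left)
        right_operand = gen(right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} = {left_operand} {op} {right_operand}")
        return temp

    _, target, value = tree
    result = gen(value)
    code.append(f"{target} = {result}")
    return code
```

For `('=', 'position', ('+', 'initial', ('*', 'rate', '60')))` this produces `['t1 = rate * 60', 't2 = initial + t1', 'position = t2']`: the innermost operation is computed first, exactly as the tree's nesting requires.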
THE COUSINS OF THE COMPILER
The cousins of the compiler are the preprocessor, the assembler, the loader and the link-editor.
A preprocessor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by subsequent programs such as compilers. The preprocessor is executed before the actual compilation of code begins, so it processes all preprocessor directives (such as #include and #define in C) before any code is generated from the program's statements. A preprocessor may perform the following functions:
1. Macro processing
2. File inclusion
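Both functions can be illustrated with a toy preprocessor. This sketch handles only object-like `#define NAME text` macros and the quoted form `#include "file"`. The in-memory `files` dictionary stands in for the file system and, like the function name, is hypothetical. Unlike a real C preprocessor, it ignores function-like macros and conditional compilation, and it also substitutes inside string literals.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def preprocess(name, files, macros=None):
    """Return the preprocessed lines of the file called `name`.

    files:  hypothetical in-memory mapping from file name to file text.
    macros: object-like macro definitions seen so far (shared with includes).
    """
    if macros is None:
        macros = {}
    output = []
    for line in files[name].splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            # File inclusion: splice in the preprocessed text of the named file.
            included = stripped.split('"')[1]
            output.extend(preprocess(included, files, macros))
        elif stripped.startswith("#define"):
            # Macro definition: '#define NAME replacement text'.
            parts = stripped.split(maxsplit=2)
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        else:
            # Macro processing: replace each whole identifier that names a macro.
            output.append(IDENTIFIER.sub(
                lambda m: macros.get(m.group(0), m.group(0)), line))
    return output
```

Given `files = {"main.c": '#include "defs.h"\nint buf[SIZE];', "defs.h": "#define SIZE 100"}`, the call `preprocess("main.c", files)` returns `['int buf[100];']`. Because only whole identifiers are matched, a name such as `SIZE2` is left untouched.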
The Grouping of Phases:
Phases deal with the logical organisation of a compiler.
In an implementation, activities from more than one phase are often grouped
together.
Front and Back Ends:
The phases are collected into a front end and a back end.
The front end consists of the phases, or parts of phases, that depend primarily on the source language and are largely independent of the target machine. These normally include lexical and syntactic analysis, the creation of the symbol table, semantic analysis and intermediate code generation.
The back end includes the portions of the compiler that depend on the target machine. These include parts of code optimization and code generation.
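The target dependence of the back end shows up clearly in code generation. The sketch below translates three-address instructions, written as `(dest, op, left, right)` tuples, into assembly for a hypothetical single-register machine. The mnemonics `LOAD`, `STORE`, `ADD`, `SUB`, `MUL` and `DIV`, the register name `R1` and the `#` prefix for constants are invented for this example. Retargeting the compiler would mean replacing this function while leaving the front end unchanged.

```python
# Hypothetical target: one register R1, memory operands by name, '#' for constants.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def operand(x):
    # Constants become immediate operands (#60); names refer to memory locations.
    return f"#{x}" if x.isdigit() else x


def gen_assembly(instructions):
    """Translate (dest, op, left, right) tuples into hypothetical assembly.

    op is None for a plain copy 'dest = left' (right is then ignored).
    """
    asm = []
    for dest, op, left, right in instructions:
        asm.append(f"LOAD R1, {operand(left)}")
        if op is not None:
            asm.append(f"{MNEMONIC[op]} R1, {operand(right)}")
        asm.append(f"STORE R1, {dest}")
    return asm
```

The output is deliberately naive: every result is stored to memory and then reloaded by the next instruction. A real back end would keep values in registers and optimise away such redundant loads and stores.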
Passes:
Several phases of compilation are usually implemented in a single pass, consisting of reading an input file and writing an output file.
Reducing the number of passes:
It is desirable to have relatively few passes, since it takes time to read and write intermediate files. On the other hand, grouping several phases into one pass may force the entire program to be kept in memory, because one phase may need information in a different order than the previous phase produces it. Intermediate code generation and code generation are often merged into one pass using a technique called backpatching.
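A minimal sketch of backpatching follows, assuming a hypothetical instruction format of Python lists such as `['JUMP_IF_FALSE', target]`. When the jump over the body of an `if` statement is emitted, its target address is not yet known. The generator therefore emits a placeholder, records its address, and fills in the target once the body has been generated, so no second pass over the output is needed.

```python
class CodeBuffer:
    """Emits instructions in one pass; forward jump targets are filled in later."""

    def __init__(self):
        self.code = []

    def emit(self, instruction):
        self.code.append(instruction)
        return len(self.code) - 1  # address of the emitted instruction

    def emit_jump(self, opcode):
        # Target not known yet: emit a placeholder and return its address.
        return self.emit([opcode, None])

    def backpatch(self, holes, target):
        # Fill in the jump target of every recorded placeholder.
        for address in holes:
            self.code[address][1] = target


def gen_if(buf, condition, body):
    """Generate code for 'if condition then body' in a single pass."""
    buf.emit(["EVAL", condition])
    hole = buf.emit_jump("JUMP_IF_FALSE")
    for statement in body:
        buf.emit(["EXEC", statement])
    # The jump must land on the first instruction after the body.
    buf.backpatch([hole], len(buf.code))
```

After `gen_if(buf, "x > 0", ["y = 1", "z = 2"])` on an empty buffer, the placeholder at address 1 holds the target 4, the address of whatever instruction is emitted next.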

Compiler-Construction Tools:

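Like other software developers, compiler writers use general software tools such as debuggers, version managers and profilers. In addition, more specialised tools have been created to help implement the various phases of a compiler.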
The following is a list of some useful compiler-construction tools:
1. Parser generators
2. Scanner generators
3. Syntax-directed translation engines
4. Automatic code generators
5. Data-flow engines
1. Parser generators: These automatically produce a syntax analyser from a grammatical description of the language, typically a context-free grammar.
2. Scanner generators: These automatically produce a lexical analyser from a specification based on regular expressions.
3. Syntax-directed translation engines: These produce collections of routines that walk a parse tree and generate intermediate code.
4. Automatic code generators: These take a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine.
5. Data-flow engines: These facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Such data-flow analysis is a key part of good code optimization.
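As an illustration of the kind of analysis a data-flow engine supports, the sketch below computes live variables (variables whose current value may still be read later) by iterating the standard backward equations until nothing changes: the live-out set of a block is the union of the live-in sets of its successors, and live-in is `use | (live_out - defs)`. Representing the control-flow graph as dictionaries keyed by hypothetical block names is an assumption of this example.

```python
def liveness(blocks, successors):
    """Iterative backward data-flow analysis computing live variables.

    blocks:     block name -> (use, defs), where use holds variables read before
                being written in the block and defs the variables written in it.
    successors: block name -> list of successor block names.
    Returns (live_in, live_out), each mapping block names to sets.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for b, (use, defs) in blocks.items():
            out = set().union(*(live_in[s] for s in successors[b]))
            new_in = use | (out - defs)
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out
```

For a loop whose entry block sets `i` and `s` to 0, whose test block checks `i < n`, whose body computes `s = s + i` and `i = i + 1`, and whose exit block returns `s`, the analysis finds that only `n` is live on entry. This tells an optimiser that `n` must be defined before the loop runs.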
