Lec3-CompilerConstruction 2
Compiler Construction
Lecture 3
Topics Covered in Lecture 2
Source Code -> Lexical Analyzer -> Syntax Analyzer -> Code Optimizer -> Code Generator -> Object Code
Lexical Analyzer (Part One)
Lexical Analysis
INPUT: sequence of characters
OUTPUT: sequence of tokens
Input --(character, via Next_char())--> Scanner --(token, via Next_token())--> Parser
Symbol Table (shared by the Scanner and the Parser)
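The Next_char() / Next_token() interface above can be sketched as follows. This is a minimal illustration, not a full scanner: the class name `Scanner` and the rule that any run of letters and digits forms one token are assumptions made for the example.

```python
class Scanner:
    """Hypothetical scanner: reads characters, hands out tokens."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_char(self):
        """Return the next input character, or '' at end of input."""
        if self.pos >= len(self.text):
            return ""
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def next_token(self):
        """Return the next token as a string, or None at end of input."""
        ch = self.next_char()
        while ch.isspace():              # skip blanks, tabs, new lines
            ch = self.next_char()
        if ch == "":
            return None
        if ch.isalnum():
            lexeme = ch
            # keep reading while the following character continues the word
            while self.pos < len(self.text) and self.text[self.pos].isalnum():
                lexeme += self.next_char()
            return lexeme
        return ch                        # any other character is its own token


# The "parser" side: keep asking the scanner for tokens until input ends.
scanner = Scanner("A = B + 33")
tokens = []
tok = scanner.next_token()
while tok is not None:
    tokens.append(tok)
    tok = scanner.next_token()
print(tokens)
```

The parser never sees individual characters; it only calls `next_token()`.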
1. Removal of white space
• By white space we mean
– Blanks
– Tabs
– New lines
• Why?
– White space is generally used only for
formatting source code.
A = B + C is equivalent to A=B+C
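A minimal sketch of white space removal, assuming a simplified notion of lexeme (a run of letters, digits, and underscores, or a single other symbol). The function name `strip_white_space` is hypothetical.

```python
import re

def strip_white_space(source):
    """Split source into lexemes, discarding blanks, tabs and new lines."""
    # \w+ matches a run of letters/digits/underscores; [^\w\s] matches any
    # single other non-space character. White space matches neither pattern.
    return re.findall(r"\w+|[^\w\s]", source)

print(strip_white_space("A = B + C"))
print(strip_white_space("A=B+C"))
```

Both calls give the same list of lexemes. Note that the sketch splits the text into lexemes rather than deleting the white space characters outright, because blind deletion would fuse `int A` into `intA`.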
1. Removal of white space
Learn by Example
// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is
end of
my code
*/
1. Removal of white space
Learn by Doing
// This is beginning of my code
int A ;
A = A
*
A
;
/* This is
end of
my code
*/
2. Removal of comments
Why?
– Comments are user-added strings that
do not contribute to the program
Example in Java
// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is
end of
my code
*/
Both comments (the // line and the /* ... */ block) mean nothing to the program.
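Comment removal can be sketched with one regular expression. The assumptions here: only Java/C-style `//` and `/* ... */` comments exist, and string literals (which may contain `//`) are not handled. The name `remove_comments` is hypothetical.

```python
import re

# "//" up to the end of the line, or "/*" up to the nearest "*/".
# re.DOTALL lets "." cross new lines inside a block comment.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def remove_comments(source):
    """Replace each comment with one space so its neighbours stay separated."""
    return COMMENT.sub(" ", source)

source = """// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C;
/* This is
end of
my code
*/"""
print(remove_comments(source))
```

Replacing a block comment with a single space also drops the new lines inside it, so a scanner that reports line numbers would need to count those new lines before discarding the comment.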
3. Recognizes constants/numbers
• How is recognition done?
– A run of consecutive digits in the source code
is recognized as a constant.
Example in Java
// This is beginning of my code
int A;
int B = 2 ;
int C = 33 ;
A = B + C;
/* This is
end of
my code
*/
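The digit-run rule can be sketched as a small helper. The name `read_constant` and its (lexeme, next index) return value are assumptions for the example; only unsigned integer constants are covered.

```python
DIGITS = "0123456789"

def read_constant(text, start):
    """Return (lexeme, next_index) for the run of digits beginning at start."""
    end = start
    while end < len(text) and text[end] in DIGITS:
        end += 1                  # extend the run while digits keep coming
    return text[start:end], end

line = "int C = 33 ;"
lexeme, next_index = read_constant(line, line.index("3"))
print(lexeme, next_index)
```

Scanning resumes at `next_index`, the first character after the constant.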
4. Recognizes keywords
• Keywords in C and Java
– if, else, for, while, do, return, etc.
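One common way to recognize keywords is to scan a word first and then look it up in a fixed set. The sketch below assumes that approach; `KEYWORDS` holds only the keywords listed above, and `classify_word` is a hypothetical name.

```python
# Only the keywords named on the slide; C and Java define many more.
KEYWORDS = {"if", "else", "for", "while", "do", "return"}

def classify_word(word):
    """Label a scanned word as a keyword or as an identifier ('id')."""
    return "keyword" if word in KEYWORDS else "id"

for word in ["while", "rate", "While"]:
    print(word, classify_word(word))
```

Keywords in C and Java are case-sensitive, so `While` is not a keyword and falls through to the identifier class.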
5. Recognizes identifiers
• What are identifiers?
– Names of variables, functions, arrays, etc.
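Recognized identifiers are typically recorded in the symbol table. The following is a minimal sketch under that assumption; the class `SymbolTable` and its `enter` method are hypothetical, and a real table would also store type and scope information.

```python
class SymbolTable:
    """Hypothetical symbol table: one numbered entry per distinct identifier."""

    def __init__(self):
        self.entries = {}

    def enter(self, name):
        """Return the entry index for name, creating the entry on first sight."""
        # setdefault stores a new index only the first time a name is seen
        return self.entries.setdefault(name, len(self.entries))


table = SymbolTable()
indices = [table.enter(name) for name in ["A", "B", "C", "A"]]
print(indices)
```

The second occurrence of `A` maps to the same entry as the first, so later phases can refer to an identifier by its index.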
6. Correlates error messages with the source program
• How?
– Keeps track of the number of new line characters seen
in the source code
– Reports the line number when an error message is to be
generated, e.g. "Error Message at line 1"
• Example in Java
1. This is beginning of my code
2. int A;
3. int B2 = 2 ;
4. int C4R = 33 ;
5. A = B + C;
6. /* This is
7. end of
8. my code
9. */
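Line tracking can be sketched by counting new line characters while scanning. Everything named here is an assumption for illustration: `LEGAL_SYMBOLS` is a small hypothetical symbol set, and the input deliberately contains an `@` that is not in it.

```python
# Hypothetical set of symbols the sketch accepts; a real language has more.
LEGAL_SYMBOLS = set("=+-*/;()[]{}<>")

def illegal_char_messages(source):
    """Return one message per illegal character, tagged with its line number."""
    messages = []
    line_number = 1
    for ch in source:
        if ch == "\n":
            line_number += 1      # one more new line character seen
        elif not (ch.isspace() or ch.isalnum() or ch in LEGAL_SYMBOLS):
            messages.append(
                f"Error Message at line {line_number}: illegal symbol {ch!r}"
            )
    return messages

source = "int A;\nint B2 = 2 ;\nint C4R = 33 @;"
print(illegal_char_messages(source))
```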
Errors generated by Lexical Analyzer
1. Illegal symbols
• =>
2. Illegal identifiers
• 2ab
3. Unterminated comments
• /* This is beginning of my code
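The three error kinds above can be sketched as simple checks. This is a rough illustration under stated assumptions: `LEGAL_SYMBOLS` is a hypothetical, incomplete symbol set, and each run of symbol characters is compared as a whole, so symbols are assumed to be separated by white space or words (a run such as `=;` would be misreported).

```python
import re

# Hypothetical subset of legal symbols; "=>" is deliberately absent.
LEGAL_SYMBOLS = {"=", "==", "<", "<=", ">", ">=", "+", "-", "*", "/", ";"}

def lexical_errors(source):
    """Return a list of lexical error messages found in source."""
    errors = []
    # 3. Unterminated comment: strip closed comments, then look for a
    #    "/*" that is still open.
    rest = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    if "/*" in rest:
        errors.append("unterminated comment")
        rest = rest[: rest.index("/*")]   # ignore text swallowed by it
    # 2. Illegal identifier: a word that starts with digits but goes on
    #    with a letter, such as 2ab.
    for word in re.findall(r"\b\d+[A-Za-z_]\w*", rest):
        errors.append("illegal identifier: " + word)
    # 1. Illegal symbol: a run of symbol characters that is not known.
    for run in re.findall(r"[^\w\s]+", rest):
        if run not in LEGAL_SYMBOLS:
            errors.append("illegal symbol: " + run)
    return errors

print(lexical_errors("a => b"))
print(lexical_errors("int 2ab ;"))
print(lexical_errors("/* This is beginning of my code"))
```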
• Learn by example
– // Beginning of Code
– int a char } switch b[2] =;
– // end of code
• No error is generated
• Why?
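The reason is that the lexical analyzer checks each lexeme on its own; whether the tokens appear in a sensible order is left to the syntax analyzer. The sketch below illustrates this with a hypothetical tokenizer (the names `TOKEN`, `KEYWORDS`, and `token_classes` and the small symbol set are assumptions): every lexeme in the example line falls into a valid token class.

```python
import re

KEYWORDS = {"int", "char", "switch"}
# Alternatives are tried left to right; "error" catches any other
# non-space character.
TOKEN = re.compile(
    r"(?P<word>[A-Za-z_]\w*)|(?P<integer>\d+)|(?P<symbol>[=;{}\[\]])|(?P<error>\S)"
)

def token_classes(line):
    """Return (class, lexeme) pairs for every lexeme on the line."""
    pairs = []
    for match in TOKEN.finditer(line):
        kind = match.lastgroup            # name of the group that matched
        if kind == "word":
            kind = "keyword" if match.group() in KEYWORDS else "id"
        pairs.append((kind, match.group()))
    return pairs

pairs = token_classes("int a char } switch b[2] =;")
for pair in pairs:
    print(pair)
```

No pair has the class `error`, even though the line is not a valid statement.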
• Lexeme
– The actual sequence of characters that matches a pattern and
belongs to a given token class.
– Examples:
Identifier: Name, Data, x
Integer: 345, 2, 0, 629
• Pattern
– The rules that characterize the set of strings for a token
– Examples:
Integer: a digit optionally followed by more digits
Identifier: a letter optionally followed by letters or
digits
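The two patterns can be written as regular expressions. A minimal sketch, assuming "character" in the identifier pattern means a letter (underscores and other characters allowed by real languages are left out); `token_class` is a hypothetical helper.

```python
import re

INTEGER = re.compile(r"[0-9][0-9]*")              # a digit, then more digits or none
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # a letter, then letters or digits

def token_class(lexeme):
    """Return the token class whose pattern matches the whole lexeme."""
    if INTEGER.fullmatch(lexeme):
        return "Integer"
    if IDENTIFIER.fullmatch(lexeme):
        return "Identifier"
    return None                                   # matches neither pattern

for lexeme in ["Name", "Data", "x", "345", "2", "0", "629", "2ab"]:
    print(lexeme, token_class(lexeme))
```

`fullmatch` requires the pattern to cover the entire lexeme, which is why `2ab` matches neither pattern.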
Learn by Example:
Input string: size := r * 32 + c
Identify the <token, lexeme> pairs
1. <id, size>
2. <assign, :=>
3. <id, r>
4. <arith_symbol, *>
5. <integer, 32>
6. <arith_symbol, +>
7. <id, c>
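The pairs above can be reproduced with a small pattern-driven tokenizer. This is a sketch, not the lecture's method: the token names come from the example, while `TOKEN_PATTERNS`, `MASTER`, and `token_lexeme_pairs` are hypothetical names.

```python
import re

# One (token, pattern) entry per token class used in the example.
TOKEN_PATTERNS = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("integer", r"[0-9]+"),
    ("assign", r":="),
    ("arith_symbol", r"[+\-*/]"),
]
# Combine the patterns into one expression with a named group per token.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)

def token_lexeme_pairs(text):
    """Return (token, lexeme) pairs; unmatched characters are skipped."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]

pairs = token_lexeme_pairs("size := r * 32 + c")
for token, lexeme in pairs:
    print(f"<{token}, {lexeme}>")
```

Because `finditer` silently skips characters that match no pattern, white space disappears here, but so would illegal symbols; a fuller scanner would report those instead.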
Learn by Doing
Input string:
position = initial + rate * 60
Let's Revise!
Lexical Analysis
Input --(character, via Next_char())--> Scanner --(token, via Next_token())--> Parser
Symbol Table (shared by the Scanner and the Parser)
Role of Lexical Analyzer
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
6. Correlates error messages with the source program
Terminologies
• Token
– Identifier, Integer, Float, LeftParen
• Lexeme
– Identifier: Name, Data, x
– Integer: 345, 2, 0, 629
• Pattern
– Examples:
Integer: a digit optionally followed by more digits
Identifier: a letter optionally followed by letters or digits
Homework
Identify the <token, lexeme> pairs
1. for ( int x= 0; x<=5; x++)
2. B= (( c + a) * d ) / f
3. while ( a < 5 )
a= a+1
4. char MyCourse[5];
5. if ( a< b)
a=a*a;
else
b=b*b;
Assignment-1
Write a program in C++ or Java that reads a
source file and performs the following
operations:
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes Keywords
5. Recognizes identifiers
Due Date: 5th October, 2010