Lexical Analyser Parser
COMPILER
A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language.
source program -> COMPILER -> target program (the compiler also produces error messages)
source program: normally a program written in a high-level programming language
target program: normally the equivalent program in machine code (relocatable object file)
PHASES OF A COMPILER
Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Code Optimizer -> Code Generator -> Target Program
Each phase transforms the source program from one representation into another representation.
The phases communicate with the error handlers and with the symbol table.
LEXICAL ANALYZER
INTRODUCTION
A lexical analyzer breaks an input stream of characters into tokens. Programs that perform lexical analysis are called lexical analyzers or lexers. A lexer consists of a scanner and a tokenizer.
Writing lexical analyzers by hand can be a tedious process, so software tools have been developed to ease this task. Perhaps the best known such utility is Lex. Lex is a lexical analyzer generator for the UNIX operating system, targeted to the C programming language.
TOKEN: A classification for a common set of strings. Examples include <Identifier>, <number>, etc.
PATTERN: The rules that characterize the set of strings for a token. Recall file and OS wildcards ([A-Z]*.*).
LEXEME: The actual sequence of characters that matches a pattern and is classified by a token. Identifiers: x, count, name, etc.
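The relationship between the three terms can be sketched in C. This is a minimal illustration, not part of the original slides: the names `LexemeClass` and `classify_lexeme` are hypothetical, and the two patterns (an identifier is a letter or underscore followed by letters, digits, or underscores; a number is one or more digits) are assumed for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token classes, chosen only for this illustration. */
typedef enum { LC_IDENTIFIER, LC_NUMBER, LC_UNKNOWN } LexemeClass;

/* Pattern for <Identifier>: letter or '_' followed by letters, digits, or '_'. */
static int matches_identifier(const char *s) {
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) return 0;
    return 1;
}

/* Pattern for <number>: one or more decimal digits. */
static int matches_number(const char *s) {
    if (s[0] == '\0') return 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    return 1;
}

/* Given a lexeme, report which token's pattern it matches. */
LexemeClass classify_lexeme(const char *lexeme) {
    assert(lexeme != NULL);
    if (matches_identifier(lexeme)) return LC_IDENTIFIER;
    if (matches_number(lexeme)) return LC_NUMBER;
    return LC_UNKNOWN;
}
```

Here the lexemes `count` and `x` are both classified under the token <Identifier>, while the lexeme `10` is classified under <number>.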
The input program as you see it.

    main ()
    {
        int i, sum;
        sum = 0;
        for (i=1; i<=10; i++);
        sum = sum + i;
        printf("%d\n",sum);
    }
Tasks of the lexical analyzer [Scanner] on its input:
Remove white spaces, tabs, and newline characters
Remove comments
Manufacture tokens
Generate lexical errors
Pass tokens to the parser
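The scanner tasks listed above can be sketched as a small hand-written C routine. This is an illustration under stated assumptions, not a complete C lexer: the names `ScanKind`, `ScanToken`, and `next_token` are hypothetical, only `/* ... */` comments are recognized, punctuation is limited to a few single characters (so `<=` and `++` would come out as two tokens each), and string literals are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_IDENT, T_NUMBER, T_PUNCT, T_ERROR, T_EOF } ScanKind;

typedef struct {
    ScanKind kind;
    const char *start; /* first character of the lexeme */
    size_t length;     /* number of characters in the lexeme */
} ScanToken;

/* Remove white space and block comments. Returns 0 if a comment is never closed. */
static int skip_blanks_and_comments(const char **p) {
    for (;;) {
        while (isspace((unsigned char)**p)) (*p)++;
        if ((*p)[0] == '/' && (*p)[1] == '*') {
            const char *q = *p + 2;
            while (q[0] != '\0' && !(q[0] == '*' && q[1] == '/')) q++;
            if (q[0] == '\0') return 0; /* unterminated comment */
            *p = q + 2;
        } else {
            return 1;
        }
    }
}

/* Manufacture the next token and advance the cursor past it. */
ScanToken next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    ScanToken tok;
    if (!skip_blanks_and_comments(cursor)) {
        /* Lexical error: report the rest of the input as one error token. */
        tok.kind = T_ERROR;
        tok.start = *cursor;
        tok.length = strlen(*cursor);
        *cursor += tok.length;
        return tok;
    }
    const char *p = *cursor;
    tok.start = p;
    if (*p == '\0') {
        tok.kind = T_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        tok.kind = T_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        tok.kind = T_NUMBER;
    } else if (strchr("(){};,=+<", *p) != NULL) {
        p++;
        tok.kind = T_PUNCT;
    } else {
        p++; /* Lexical error: character belongs to no token. */
        tok.kind = T_ERROR;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

A parser would call `next_token` repeatedly with the same cursor until it receives `T_EOF`, which corresponds to the "pass tokens to the parser" step.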
LEX INTRODUCTION
Lex is a compiler-writing tool used to generate a lexical analyzer (scanner) from a description of the tokens of the programming language to be implemented. Lex takes a specially formatted specification file containing the details of a lexical analyzer and creates a C source file for the associated table-driven lexer.
LEX SPECIFICATION
The input to Lex is a text file containing regular expressions along with the actions to be taken by the generated scanner when each regular expression is matched.
The output is a file that contains C source code defining the procedure yylex(), which implements a DFA corresponding to the regular expressions given in the input file.
The output file is usually called lex.yy.c or lexyy.c. When compiled and linked with the main program, it acts as a scanner or lexical analyzer that recognizes the tokens specified by the regular expressions of the input file.
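To give a feel for what "table-driven" and "implements a DFA" mean, here is a small hand-written sketch in C. It is not the table layout that Lex actually emits in lex.yy.c. The states, character classes, and the function name `dfa_longest_match` are assumptions made for this example, which recognizes only identifiers and unsigned integers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* DFA states and input character classes (hypothetical, for illustration). */
enum { S_START, S_IDENT, S_NUMBER, S_DEAD, NUM_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* transition[state][class] gives the next state. */
static const int transition[NUM_STATES][NUM_CLASSES] = {
    /* S_START  */ { S_IDENT, S_NUMBER, S_DEAD },
    /* S_IDENT  */ { S_IDENT, S_IDENT,  S_DEAD },
    /* S_NUMBER */ { S_DEAD,  S_NUMBER, S_DEAD },
    /* S_DEAD   */ { S_DEAD,  S_DEAD,   S_DEAD },
};

/* accepting[state] is 1 when reaching that state completes a token. */
static const int accepting[NUM_STATES] = { 0, 1, 1, 0 };

static int char_class(char c) {
    if (isalpha((unsigned char)c) || c == '_') return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA from the start of input and return the length of the longest
   prefix that ends in an accepting state; *final_state receives that state,
   or S_DEAD if no prefix matches. */
size_t dfa_longest_match(const char *input, int *final_state) {
    assert(input != NULL && final_state != NULL);
    int state = S_START;
    size_t matched = 0;
    *final_state = S_DEAD;
    for (size_t i = 0; input[i] != '\0'; i++) {
        state = transition[state][char_class(input[i])];
        if (state == S_DEAD) break;
        if (accepting[state]) {
            matched = i + 1;
            *final_state = state;
        }
    }
    return matched;
}
```

The recognizer itself is a short loop. All knowledge of the token patterns lives in the two tables, which is the part a generator can compute from regular expressions.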
LEX SPECIFICATIONS
A Lex input file consists of three parts: a collection of definitions, a collection of rules, and a collection of user subroutines. These three sections are separated by double-percent directives ("%%").
A proper Lex specification has the following format.
{definitions}
%%
{rules}
%%
{user subroutines}
The definitions and the user subroutines are often omitted. The second %% is optional, but the first is required to mark the beginning of the rules.
MAIN FEATURES
Simple
PARSER
PARSING
Parsing (syntactic analysis) is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given (more or less) formal grammar.
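As a small illustration of checking a token sequence against a grammar, the following C sketch uses hand-written recursive descent. This technique is chosen here for brevity and is not the table-driven bottom-up method that Yacc generates. The grammar, the one-character token encoding, and the name `is_valid_expr` are all assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed token encoding, one character per token:
   'n' = number, 'i' = identifier, and '+', '(', ')' stand for themselves.

   Assumed grammar:
     expr -> term ('+' term)*
     term -> 'n' | 'i' | '(' expr ')'                                      */

static int parse_expr(const char **p);

/* term -> 'n' | 'i' | '(' expr ')' */
static int parse_term(const char **p) {
    if (**p == 'n' || **p == 'i') {
        (*p)++;
        return 1;
    }
    if (**p == '(') {
        (*p)++;
        if (!parse_expr(p)) return 0;
        if (**p != ')') return 0; /* missing closing parenthesis */
        (*p)++;
        return 1;
    }
    return 0; /* no alternative of term starts with this token */
}

/* expr -> term ('+' term)* */
static int parse_expr(const char **p) {
    if (!parse_term(p)) return 0;
    while (**p == '+') {
        (*p)++;
        if (!parse_term(p)) return 0;
    }
    return 1;
}

/* Returns 1 if the whole token sequence is a valid expr, 0 otherwise. */
int is_valid_expr(const char *tokens) {
    assert(tokens != NULL);
    const char *p = tokens;
    return parse_expr(&p) && *p == '\0';
}
```

With this encoding, the statement fragment `sum + i` becomes the token sequence `i+i`, which the grammar accepts, while `i+` is rejected because a term is missing after the operator.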
YACC SPECIFICATION
Yacc (yet another compiler compiler) is a parser generator: a program that takes as its input a specification of the syntax of a programming language and produces as its output a parse procedure for that language, named yyparse().
The input to yacc is a specification file, usually with a .y suffix, containing the grammar rules that specify the structure of the language to be implemented. The output is C source code for the parser, usually in a file named y.tab.c or ytab.c.
CREDITS
THANK YOU!