Literature Review LEX
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue III Mar 2022- Available at
Abstract: This literature review compiles information from various sources on the topic of lex. We explore different facets of the
topic, such as its process, specification, variables and sample code.
Keywords: Lex, Yacc, Compiler Design, Lexical Analysis
In the lexical analysis phase, tokens must be recognized. Lex accomplishes this task using regular expressions.
Prior to 1975, designing a compiler was a tedious ordeal. In that year Lesk and Johnson [1975] published their work on lex
and yacc. These utilities greatly simplified compiler writing [1].
Compilation is a long and complicated process. When source code is entered, it goes through three layers of logical
treatment. The first layer is the lexical analysis phase, where a tool called lex is used to convert the given string into tokens. The
second layer is the syntax analysis phase, where the tokens are converted into a syntax tree using the yacc tool. The third layer is the
code generation phase, where the syntax tree is converted into the generated code [2].
In this article, the lexical analysis phase is of primary interest to us. The lexical analysis phase converts strings into
tokens. In technical terms, lex converts regular expression specifications into a C implementation of a corresponding finite state
machine; the C program is then compiled to produce the lexical analyzer [3].
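To make the idea of converting a string into tokens concrete, the following is a minimal hand-written sketch in C. It is not the code that lex generates; the token kinds (TOK_NUMBER, TOK_IDENT, TOK_OTHER, TOK_END), the Token struct and the function next_token are hypothetical names chosen only for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, chosen only for this illustration. */
typedef enum { TOK_END, TOK_NUMBER, TOK_IDENT, TOK_OTHER } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                                  /* skip whitespace */

    tok.start = p;
    if (*p == '\0') {
        tok.kind = TOK_END;                   /* end of input */
    } else if (isdigit((unsigned char)*p)) {  /* [0-9]+ */
        while (isdigit((unsigned char)*p))
            p++;
        tok.kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*p)) {  /* [a-zA-Z][a-zA-Z0-9]* */
        while (isalnum((unsigned char)*p))
            p++;
        tok.kind = TOK_IDENT;
    } else {                                  /* any other single character */
        p++;
        tok.kind = TOK_OTHER;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling next_token repeatedly on the string "x1 = 42" would yield an identifier, a single-character token for '=', a number, and finally the end marker. Lex removes the need to write such a function by hand: the comments beside each branch show the regular expression that a lex rule would state instead.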
Here a ".l" file (e.g. file.l) is given as input to lex, which converts it into a C program (a .c file). Once compiled, this program
reads its input and produces a stream of tokens as output [4]. Tokens are uniquely identified using a token name, which is essentially
an abstract representation of a certain kind of lexical unit. The parser processes these input symbols [5].
A lex program comprises a "pattern" part (the regular expressions used) and an action part (C code) [6]. An
action typically returns a token so that it may be used by the parser [7]. A regular expression can be expressed as a finite state
automaton (FSA), which can be represented by states and the transitions between them [8]. Lex translates regular expressions into
computer programs that mimic an FSA [9]. Using the next input character and the current state, the next state is determined by
looking it up in a computer-generated state table [8]. With this information we can understand some of lex's limitations: because an
FSA has no stack, lex cannot handle nested structures such as parentheses [11].
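The state-table lookup described above can be sketched in C as follows. This is a hand-built table for the single pattern [a-zA-Z][a-zA-Z0-9]*, not the table that lex itself would emit; the state names, the character classes and the function matches_identifier are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical states for the pattern [a-zA-Z][a-zA-Z0-9]* */
enum { S_START, S_IDENT, S_REJECT, NUM_STATES };
/* Input characters are grouped into classes to keep the table small. */
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /*              letter    digit     other    */
    /* S_START  */ { S_IDENT,  S_REJECT, S_REJECT },
    /* S_IDENT  */ { S_IDENT,  S_IDENT,  S_REJECT },
    /* S_REJECT */ { S_REJECT, S_REJECT, S_REJECT },
};

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Return true if the whole string matches the pattern. */
bool matches_identifier(const char *s)
{
    int state = S_START;
    for (; *s != '\0'; s++)
        state = next_state[state][char_class(*s)];  /* index by state and input */
    return state == S_IDENT;   /* S_IDENT is the only accepting state */
}
```

The only memory the machine has is the current state, one of a fixed number of values. It therefore has no way of counting how many opening parentheses are still unmatched, which illustrates why nested structures fall outside what such a table can recognize.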
A lex program contains three parts: Declarations, Rules and Auxiliary functions [12].
Example 1:
Declarations
%%
Rules
%%
Auxiliary functions
The parts of a lex file are separated by the '%%' symbol. The shortest lex file [13] is:
Example 2:
%% [14]
With this file, characters are copied from input to output one at a time. The first "%%" is required because there must always be a rules section [15].
Declarations: The declarations section is divided into auxiliary declarations and regular definitions [16]. Auxiliary declarations are
used to declare functions, include header files, define global variables, etc. They are written in C, bracketed with '%{' and '%}',
and copied into the C code by lex. Shorthand representations are allowed in lex: a regular definition may be written in the
form D R [17], where the name D represents the regular expression R [18].
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 148
Rules: Each rule in a lex program has two parts: the pattern to be matched and the action to be executed [19]. The yylex() function
checks the input for a match of the pattern and executes the code in the action part [20]. Auxiliary functions: Lex generates C code
for the rules and places it in the yylex() function. Using auxiliary functions, programmers may add their own code to the C file [21].
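The pairing of patterns with actions can be sketched as a small dispatch loop in C. This is only an illustration of the idea, not the code lex generates: the Rule struct, the hand-written matchers and the function scan are hypothetical. The sketch follows the usual lex conventions that the longest match is chosen, that the earlier rule wins a tie, and that unmatched characters are copied to the output.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* A rule pairs a pattern (here a hand-written matcher) with an action. */
typedef struct {
    size_t (*match)(const char *s);               /* length matched at s, 0 if none */
    void (*action)(const char *text, size_t len); /* code run on a match */
} Rule;

static int digit_runs = 0, word_runs = 0;         /* counters updated by the actions */

static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

static size_t match_alnum(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static void on_digits(const char *text, size_t len) { (void)text; (void)len; digit_runs++; }
static void on_word(const char *text, size_t len)   { (void)text; (void)len; word_runs++; }

static const Rule rules[] = {
    { match_digits, on_digits },   /* rule 1: [0-9]+       */
    { match_alnum,  on_word   },   /* rule 2: [a-zA-Z0-9]+ */
};

/* Scan the whole input, running the action of the best rule at each position. */
void scan(const char *input)
{
    while (*input != '\0') {
        size_t best_len = 0, best_rule = 0;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            size_t len = rules[r].match(input);
            if (len > best_len) {   /* strictly longer only, so the earlier rule wins ties */
                best_len = len;
                best_rule = r;
            }
        }
        if (best_len == 0) {        /* no rule matched: copy one character to output */
            putchar(*input++);
            continue;
        }
        rules[best_rule].action(input, best_len);
        input += best_len;
    }
}
```

For the input "42 abc 7x", the text "42" matches both rules with the same length, so rule 1 runs. For "7x", rule 2 matches two characters against one for rule 1, so rule 2 runs.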
A. Variables
1) yyin: It is of the type FILE* and is defined by lex. It points to the input file that the lexical analyzer reads [22]. A programmer
can associate yyin with a file of their choice, after which yyin points to that file [23]. By default lex assigns it to stdin.
Example 3 [24] :
/* Declarations */
%%
 /* Rules */
%%
int main(int argc, char *argv[])
{
    if (argc > 1) {
        FILE *fp = fopen(argv[1], "r");
        if (fp != NULL)
            yyin = fp;   /* read from the given file instead of stdin */
    }
    yylex();
    return 0;
}
2) yytext: It is of the type char* and holds the lexeme that was matched. After each invocation of yylex(), yytext carries a pointer
to the lexeme found within the input stream [25].
3) yyleng: An int type variable that stores the length of the lexeme.
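%{
#include <stdio.h>
int i = 0;   /* number of words on the current line */
%}
/* Rules Section*/
%%
[a-zA-Z0-9]+ {i++;} /* Rule for counting number of words*/
"\n" {printf("%d\n", i); i = 0;}
%%
int yywrap(void) { return 1; }
int main()
{
    // The function that starts the analysis
    yylex();
    return 0;
}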
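Since yytext points at the current lexeme, its contents change as soon as the next pattern is matched, so an action that needs to keep a lexeme usually copies it. A minimal sketch of such a copy is given below. The helper save_lexeme is hypothetical and not part of lex; its parameters text and leng stand in for yytext and yyleng.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of a lexeme, or NULL on failure.
   `text` and `leng` stand in for lex's yytext and yyleng. */
char *save_lexeme(const char *text, int leng)
{
    if (text == NULL || leng < 0)
        return NULL;
    char *copy = malloc((size_t)leng + 1);   /* one extra byte for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, (size_t)leng);
    copy[leng] = '\0';
    return copy;   /* the caller is responsible for calling free() */
}
```

Inside an action, a call such as save_lexeme(yytext, yyleng) would give the action its own copy that stays valid after yylex() moves on to the next lexeme.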
We have looked at several sources to articulate an introduction to the topic of lex and its role in compiler design, operating
systems, etc. We have provided a programmatic basis for how lex may be used, how it may be specified, how its variables may be
used and finally how it all comes together in code. Further scope for research includes a more thorough computer science or
computing-based approach to lex, and not just a programmatic one.
We wish to acknowledge the contributions of Prof. Shamik Palit of the School of Engineering and IT, Manipal Academy of
Higher Education, for helping us gain a clear understanding of compiler design and allowing us to research this topic.
[1] [1], [2], [7]-[11], [13]-[15] T. Niemann, Lex and Yacc Tutorial,
[2] [3]-[6] Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, & Tools. 2007.
[3] [12], [17]-[25] Vpn, Nachi, EXPL NIT Calicut, 20