CD File - Merged
SESSION: 2024-2025
LAB FILE
OF
COMPILER DESIGN LAB
(CSP-012)
PRACTICAL -01
Aim-
To design and implement a lexical analyzer in C that processes an input
source code, recognizes tokens like identifiers, keywords, numbers, and
operators, and ignores redundant spaces, tabs, newlines, and comments.
The identifiers have a restricted maximum length.
Theory-
A lexical analyzer (lexer) is a crucial part of the front-end of a compiler.
It scans the source code character by character, groups them into
meaningful sequences, and classifies them as tokens such as identifiers,
keywords, numbers, operators, or delimiters. In this implementation,
the lexer also ignores comments (both single-line and multi-line) and
whitespace characters (spaces, tabs, and newlines). Although identifiers
can be arbitrarily long in the specification, they are limited to a
maximum length for practical purposes.
Algorithm-
1. Read the source code from an input file.
2. For each character, perform the following checks:
- If the character is a space, tab, or newline, skip it.
- If the character starts a comment (// or /*), skip the entire comment.
- If the character starts an identifier or keyword (alphabetic character
or _), read the sequence of characters, classify it as an identifier or
keyword.
- If the character is a digit, read the sequence of digits to recognize a
number.
- If the character is an operator or delimiter (like +, -, *, /, etc.), classify
it as an operator.
3. Output the recognized tokens.
4. Stop when the end of the file is reached.
Code-
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LENGTH 100
#define MAX_IDENTIFIER_LENGTH 10 /* restricted identifier length */

/* Returns true if the character is a delimiter (whitespace,
   operator, or punctuation). */
bool isDelimiter(char chr)
{
    return (chr != '\0' && strchr(" \t\n+-*/,;%<>=()[]{}", chr) != NULL);
}

/* Returns true if the character is an operator. */
bool isOperator(char chr)
{
    return (chr == '+' || chr == '-' || chr == '*' || chr == '/'
            || chr == '<' || chr == '>' || chr == '=');
}

/* Returns true if the string is a valid identifier within the length limit. */
bool isValidIdentifier(char* str)
{
    if (strlen(str) > MAX_IDENTIFIER_LENGTH
        || (!isalpha(str[0]) && str[0] != '_'))
        return false;
    for (int i = 1; str[i] != '\0'; i++)
        if (!isalnum(str[i]) && str[i] != '_')
            return false;
    return true;
}

/* Returns true if the string is a C keyword. */
bool isKeyword(char* str)
{
    const char* keywords[] = { "int", "float", "char", "double", "void",
        "if", "else", "for", "while", "do", "return", "break", "continue" };
    for (int i = 0; i < 13; i++) {
        if (strcmp(str, keywords[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns true if the string consists only of digits. */
bool isInteger(char* str)
{
    int i = 0;
    while (isdigit(str[i])) {
        i++;
    }
    return (i > 0 && str[i] == '\0');
}

/* Extracts input[left..right] into a newly allocated string. */
char* getSubstring(char* input, int left, int right)
{
    int subLength = right - left + 1;
    char* subStr
        = (char*)malloc((subLength + 1) * sizeof(char));
    strncpy(subStr, input + left, subLength);
    subStr[subLength] = '\0';
    return subStr;
}

int lexicalAnalyzer(char* input)
{
    int left = 0, right = 0;
    int len = strlen(input);

    while (right <= len && left <= right) {
        /* Skip single-line and multi-line comments. */
        if (left == right && input[right] == '/' && input[right + 1] == '/') {
            while (input[right] != '\n' && input[right] != '\0')
                right++;
            left = right;
            continue;
        }
        if (left == right && input[right] == '/' && input[right + 1] == '*') {
            right += 2;
            while (input[right] != '\0'
                   && !(input[right] == '*' && input[right + 1] == '/'))
                right++;
            if (input[right] != '\0')
                right += 2;
            left = right;
            continue;
        }
        if (!isDelimiter(input[right]))
            right++;
        if (isDelimiter(input[right]) && left == right) {
            if (isOperator(input[right]))
                printf("Token: Operator, Value: %c\n", input[right]);
            right++;
            left = right;
        } else if ((isDelimiter(input[right]) && left != right)
                   || (right == len && left != right)) {
            char* subStr = getSubstring(input, left, right - 1);
            if (isKeyword(subStr))
                printf("Token: Keyword, Value: %s\n", subStr);
            else if (isInteger(subStr))
                printf("Token: Integer, Value: %s\n", subStr);
            else if (isValidIdentifier(subStr))
                printf("Token: Identifier, Value: %s\n", subStr);
            else
                printf("Token: Unidentified, Value: %s\n", subStr);
            free(subStr);
            left = right;
        }
    }
    return 0;
}

int main()
{
    char lex_input[MAX_LENGTH] = "int a = b + 1c; // sum";
    lexicalAnalyzer(lex_input);
    printf(" \n");
    char lex_input01[MAX_LENGTH] = "int max = 100;";
    lexicalAnalyzer(lex_input01);
    return (0);
}
OUTPUT-
PRACTICAL -02
Aim
To write a C program that checks if a given line is a comment and
verifies if a given identifier is valid or not.
Theory
In C programming, comments are used to annotate the code without
affecting the program's execution. Comments can be either single-line
(starting with //) or multi-line (enclosed within /* and */). Identifiers
are names used to identify variables, functions, arrays, etc. A valid
identifier in C must start with a letter or an underscore, followed by
letters, digits, or underscores.
Algorithm
1. Start
2. Take input from the user for a line of code.
3. Check if the line is a comment:
a. If the line starts with //, it is a single-line comment.
b. If the line starts with /* and ends with */, it is a multi-line comment.
4. Display if the line is a comment or not.
5. Take input from the user for an identifier.
6. Check if the identifier is valid:
a. The first character must be a letter or an underscore.
b. Subsequent characters can be letters, digits, or underscores.
7. Display if the identifier is valid or not.
8. End
Code
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Returns 1 if the line is a single-line (//) or multi-line comment. */
int isComment(char line[]) {
    int len = strlen(line);
    if (strncmp(line, "//", 2) == 0) {
        return 1;
    }
    if (len >= 4 && strncmp(line, "/*", 2) == 0
        && strcmp(line + len - 2, "*/") == 0) {
        return 1;
    }
    return 0;
}

/* Returns 1 if the string is a valid C identifier. */
int isValidIdentifier(char id[]) {
    if (!isalpha((unsigned char)id[0]) && id[0] != '_')
        return 0;
    for (int i = 1; id[i] != '\0'; i++)
        if (!isalnum((unsigned char)id[i]) && id[i] != '_')
            return 0;
    return 1;
}

int main() {
    char line[200], identifier[100];

    // Input line for comment check
    printf("Enter a line of code: ");
    fgets(line, sizeof(line), stdin);
    line[strcspn(line, "\n")] = '\0';
    if (isComment(line)) {
        printf("The line is a comment.\n");
    } else {
        printf("The line is not a comment.\n");
    }

    // Input identifier for validation check
    printf("Enter an identifier: ");
    scanf("%99s", identifier);
    if (isValidIdentifier(identifier)) {
        printf("'%s' is a valid identifier.\n", identifier);
    } else {
        printf("'%s' is not a valid identifier.\n", identifier);
    }
    return 0;
}
OUTPUT-
Practical-03(A)
Aim-
To implement a lexer in C language that reads a text file and counts the
number of characters, words, and lines.
Theory-
A lexer, also known as a lexical analyzer, is the first phase of a compiler.
It takes input as a stream of characters and divides it into meaningful
sequences called tokens. In this case, the lexer will read characters from
a text file and count the number of characters, words, and lines. Words
are identified by spaces, newlines, and other whitespace characters.
Algorithm-
1. Start.
2. Prompt the user for the filename.
3. Open the specified file in read mode.
4. Initialize counters for characters, words, and lines.
5. Read the file character by character until EOF:
a. Increment the character counter for each character.
b. If the character is a newline, increment the line counter.
c. If the character is a space, newline, or tab, and we are inside a word,
increment the word counter.
6. Close the file.
7. Display the character, word, and line count.
8. Stop.
Code
#include <stdio.h>
#include <ctype.h>

int main() {
    FILE *file;
    char filename[100];
    int ch, characters = 0, words = 0, lines = 0;
    int in_word = 0;

    printf("Enter the filename: ");
    scanf("%99s", filename);
    file = fopen(filename, "r");
    if (file == NULL) {
        printf("Could not open file %s.\n", filename);
        return 1;
    }
    /* ch must be int so that EOF can be distinguished from characters */
    while ((ch = fgetc(file)) != EOF) {
        characters++;
        if (ch == '\n')
            lines++;
        if (isspace(ch))
            in_word = 0;      /* whitespace ends the current word */
        else if (!in_word) {
            in_word = 1;      /* first character of a new word */
            words++;
        }
    }
    fclose(file);
    printf("Number of characters: %d\n", characters);
    printf("Number of words: %d\n", words);
    printf("Number of lines: %d\n", lines);
    return 0;
}
OUTPUT-
Number of characters: 30
Number of words: 6
Number of lines: 2
PRACTICAL-03(B)
Aim
To write a Lex program that counts the number of vowels and
consonants in a given input string.
Theory
Lex (or Flex) is a tool for generating scanners, also known as lexical
analyzers. It transforms a series of regular expressions and
corresponding actions into C code, which can recognize patterns in the
input. In this program, we use Lex to differentiate between vowels and
consonants in an input string and count their occurrences. The Lex tool
is particularly useful when working with text processing, as it handles
pattern matching efficiently.
Algorithm-
1. Start.
2. Define counters for vowels and consonants and initialize them to 0.
3. In the Lex rules section, define a pattern for vowels ([aeiouAEIOU])
and increment the vowel counter for each match.
4. Define a pattern for consonants ([a-zA-Z]) and increment the
consonant counter for each match; because Lex prefers the earlier rule
when two patterns match the same text, the vowel rule above takes
precedence and this rule counts only consonants.
5. Ignore all other characters (spaces, punctuation, digits, etc.).
6. In the main function, invoke the Lex function yylex() to analyze the
input string.
7. Print the counts of vowels and consonants.
8. End.
Code-
%{
#include <stdio.h>
int vowels = 0, consonants = 0;
%}
%%
[aeiouAEIOU]   { vowels++; }
[a-zA-Z]       { consonants++; }
\n             { return 0; /* stop scanning at end of line */ }
.              { /* ignore digits, spaces, punctuation */ }
%%
int main() {
printf("Enter a string: ");
yylex(); /* Call the lexical analyzer */
printf("Number of vowels: %d\n", vowels);
printf("Number of consonants: %d\n", consonants);
return 0;
}
int yywrap() {
return 1;
}
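All the Lex programs in this file are built the same way. A typical build-and-run sequence is sketched below, assuming the specification above is saved as vowels.l (the filename is illustrative) and that flex and gcc are installed:

```shell
flex vowels.l              # generate lex.yy.c from the Lex specification
gcc lex.yy.c -o vowels     # compile the generated scanner
echo "Hello World" | ./vowels
```

Because the program supplies its own main() and yywrap(), linking against the flex library with -lfl is not required.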
OUTPUT-
PRACTICAL-04(a)
Aim
To write a Lex program that reads a file and prints out all the numbers
present in that file.
Theory
Lex is a lexical analyzer generator that transforms an input file
describing a set of lexical patterns into a C program that scans input text
for occurrences of the patterns. A Lex program contains three sections:
definitions, rules, and user code. The rules section contains regular
expressions that define patterns in the input. When a pattern matches,
associated C code is executed. This program specifically identifies and
prints numbers from the input text.
Algorithm
1. Start the Lex program by defining the required libraries.
2. In the rules section, write a regular expression to match numbers (a
sequence of digits).
3. For each matched pattern, print the number.
4. In the main function, call the yylex() function to start scanning.
5. Compile and run the Lex program by providing a file as input.
6. The program will print all the numbers from the file.
Code
%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("Number: %s\n", yytext); }
.|\n    { /* Ignore everything else */ }
%%
int main(void)
{
    yylex();
    return 0;
}
int yywrap() { return 1; }
OUTPUT-
Practical-04(b)
Aim
To implement a Lex program that reads an HTML file and prints all the
HTML tags present in the file.
Theory
Lex is a powerful tool for generating lexical analyzers based on regular
expressions. HTML files contain tags that are enclosed in angle brackets
'<' and '>'. These tags are used to format the structure of a webpage. In
this program, we use Lex to extract all HTML tags from a given file by
identifying patterns that match the structure of an HTML tag. This
program captures any sequence of characters enclosed between the
angle brackets, which is a typical structure of HTML tags.
Algorithm
1. Start.
2. Define a regular expression to match any HTML tag (anything
between '<' and '>').
3. Define rules to ignore whitespace and any other characters.
4. Open the given file as input.
5. Process the file using Lex rules to identify and print HTML tags.
6. Close the file.
7. Stop.
Code
%{
#include <stdio.h>
%}
%%
"<"[^>]+">" { printf("HTML Tag: %s\n", yytext); } /* Match HTML tags:
anything between '<' and '>' */
[ \t\n]+ ; /* Ignore whitespace */
. ; /* Ignore any other characters */
%%
int main(int argc, char* argv[])
{
    if (argc > 1)
        yyin = fopen(argv[1], "r"); /* take the HTML file name from the command line */
    yylex();
    return 0;
}
int yywrap() { return 1; }
OUTPUT-
Practical-05
Aim-
To write a C program that computes the FIRST and FOLLOW sets of a
given grammar, constructs the LL(1) parsing table, and parses an input
string using the table.
Theory-
LL(1) parsers are a type of top-down parsers for context-free grammars. The
'L' refers to reading the input from left to right, and the '1' refers to using one
lookahead token to make parsing decisions. The construction of an LL(1)
parsing table involves calculating the FIRST and FOLLOW sets for each
non-terminal in the grammar.
• FIRST Set: The set of terminals that begin the strings derivable from
a non-terminal.
• FOLLOW Set: The set of terminals that can appear immediately to
the right of a non-terminal in some sentential form.
The parsing table is constructed using these sets, allowing the parser to
decide which production to apply based on the current input symbol.
Algorithm-
1. Input the grammar: Read the productions for the grammar.
2. Calculate the FIRST set: For each non-terminal, recursively compute
the FIRST set.
3. Calculate the FOLLOW set: For each non-terminal, compute the
FOLLOW set based on the FIRST sets.
4. Construct the parsing table: Use the FIRST and FOLLOW sets to fill
the parsing table.
5. Parse the input: Use the parsing table to parse a given input string.
Code-
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
void FIND_FIRST(char[], char);
void FIND_FOLLOW(char[], char);
void add_symbol(char[], char);
void FIRST_SHOW();
void FOLLOW_SHOW();
int CREATE_LL1_TABLE();
void PARSING_TABLE_SHOW(int);
void LL1_PARSER(char[]);
int top=0;
int t,nt,ch,cr,count;
char FIRST[100][100],FOLLOW[100][100];
char T[100],NT[100],G[100][100],STACK[100];
int LL1[100][100];
int main()
{
    int i,j,flag=1,fl=1,ch1;
    char STR[100];
    while(flag==1)
    {
        printf("\n1. Add production  2. Show grammar  3. Build LL(1) table and parse\nEnter choice: ");
        scanf("%d",&ch1);
        switch(ch1)
        {
        case 1:
            printf("Enter production (form A=>xyz, use ! for epsilon): ");
            scanf("%s",G[cr++]);
            /* record the left-hand non-terminal, if not already seen */
            for(i=0;i<nt;i++)
                if(NT[i]==G[cr-1][0])
                    fl=0;
            if(fl==1)
                NT[nt++]=G[cr-1][0];
            fl=1;
            /* record new terminals appearing on the right-hand side */
            for(i=3;G[cr-1][i]!='\0';i++)
            {
                if(isupper(G[cr-1][i]) || G[cr-1][i]=='!')
                    continue;
                for(j=0;j<t;j++)
                    if(T[j]==G[cr-1][i])
                        fl=0;
                if(fl==1)
                    T[t++]=G[cr-1][i];
                fl=1;
            }
            break;
        case 2:
            if(cr>0)
            {
                printf("\nGrammar\nNon-terminals: ");
                for(i=0;i<nt;i++)
                    printf("%c ",NT[i]);
                printf("\nTerminals: ");
                for(i=0;i<t;i++)
                    printf("%c ",T[i]);
                printf("\nProductions: ");
                for(i=0;i<cr;i++)
                    printf("%s ",G[i]);
                printf("\n");
            }
            else
                printf("\nNo productions entered yet.\n");
            break;
        case 3:
            flag=0;
            FIRST_SHOW();
            FOLLOW_SHOW();
            T[t++]='$';
            T[t]='\0';
            flag=CREATE_LL1_TABLE();
            PARSING_TABLE_SHOW(flag);
            if(flag==0)
            {
                printf("Enter string for parsing: ");
                scanf("%s",STR);
                LL1_PARSER(STR);
            }
            break;
        }
    }
    return 0;
}
void FIRST_SHOW()
{
    int i,j;
    char arr[100];
    for(i=0;i<nt;i++)
    {
        count=0;
        arr[0]='\0';
        FIND_FIRST(arr,NT[i]);
        for(j=0;arr[j]!='\0';j++)
            FIRST[i][j]=arr[j];
        FIRST[i][j]='\0';
    }
    printf("\nFIRST:\n\n");
    for(i=0;i<nt;i++)
    {
        printf("FIRST( %c ): { ",NT[i]);
        for(j=0;FIRST[i][j+1]!='\0';j++)
            printf(" %c,",FIRST[i][j]);
        printf(" %c }",FIRST[i][j]);
        printf("\n");
    }
}
void FOLLOW_SHOW()
{
    int i,j;
    char arr[100];
    for(i=0;i<nt;i++)
    {
        count=0;
        arr[0]='\0';
        FIND_FOLLOW(arr,NT[i]);
        for(j=0;arr[j]!='\0';j++)
            FOLLOW[i][j]=arr[j];
        FOLLOW[i][j]='\0';
    }
    printf("\nFOLLOW:\n\n");
    for(i=0;i<nt;i++)
    {
        printf("FOLLOW( %c ): { ",NT[i]);
        for(j=0;FOLLOW[i][j+1]!='\0';j++)
            printf(" %c,",FOLLOW[i][j]);
        printf(" %c }",FOLLOW[i][j]);
        printf("\n");
    }
}
void PARSING_TABLE_SHOW(int flag)
{
    int i,j;
    if(flag==0)
        printf("\nLL(1) PARSING TABLE:\n\n\t|");
    else
        printf("\nConflict found: the grammar is not LL(1).\n\nPartial table:\n\n\t|");
    for(j=0;j<t;j++)
        printf("\t%c\t",T[j]);
    printf("\n----------------------------------------------------------------------------------------");
    printf("\n\n");
    for(i=0;i<nt;i++)
    {
        printf("%c\t|\t",NT[i]);
        for(j=0;j<t;j++)
        {
            if(LL1[i][j]!=0)
                printf("%s\t\t",G[LL1[i][j]-1]);
            else
                printf("%c\t\t",'_');
        }
        printf("\n\n");
    }
}
void FIND_FIRST(char arr[], char ch)
{
    int i;
    if(!isupper(ch))
        add_symbol(arr,ch);   /* a terminal (or '!') is its own FIRST */
    else
    {
        for(i=0;i<cr;i++)
        {
            if(ch==G[i][0])
            {
                if(G[i][3]=='!')   /* '!' stands for epsilon */
                    add_symbol(arr,G[i][3]);
                else
                    FIND_FIRST(arr,G[i][3]);
            }
        }
    }
}
void FIND_FOLLOW(char arr[], char ch)
{
    int i,j,k,l,fl=1;
    if(ch==G[0][0])
        add_symbol(arr,'$');   /* '$' follows the start symbol */
    for(i=0;i<cr;i++)
    {
        for(j=3;G[i][j]!='\0';j++)
        {
            if(ch==G[i][j])
            {
                if(G[i][j+1]=='\0')
                {
                    /* ch is at the end of the production: add FOLLOW of the LHS */
                    if(ch!=G[i][0])
                        FIND_FOLLOW(arr,G[i][0]);
                }
                else if(isupper(G[i][j+1]))
                {
                    /* next symbol is a non-terminal: add its FIRST set */
                    for(k=0;k<nt;k++)
                        if(NT[k]==G[i][j+1])
                            break;
                    for(l=0;FIRST[k][l]!='\0';l++)
                    {
                        if(FIRST[k][l]=='!')
                            fl=0;   /* epsilon in FIRST: also add FOLLOW of LHS */
                        else
                            add_symbol(arr,FIRST[k][l]);
                    }
                    if(fl==0 && ch!=G[i][0])
                        FIND_FOLLOW(arr,G[i][0]);
                    fl=1;
                }
                else
                    add_symbol(arr,G[i][j+1]);   /* next symbol is a terminal */
            }
        }
    }
}
void add_symbol(char arr[], char ch)
{
    int i,flag=0;
    for(i=0;arr[i]!='\0';i++)
    {
        if(ch==arr[i])
        {
            flag=1;
            break;
        }
    }
    if(flag==0)
    {
        arr[count++]=ch;
        arr[count]='\0';
    }
}
int CREATE_LL1_TABLE()
{
    int i,j,k,pos=0,flag=0;
    char arr[100];
    for(i=0;i<cr;i++)
    {
        arr[0]='\0';
        count=0;
        FIND_FIRST(arr,G[i][3]);
        /* if epsilon is in FIRST, use FOLLOW of the left-hand side */
        for(j=0;j<count;j++)
        {
            if(arr[j]=='!')
            {
                FIND_FOLLOW(arr,G[i][0]);
                break;
            }
        }
        for(k=0;k<nt;k++)
        {
            if(NT[k]==G[i][0])
            {
                pos=k;
                break;
            }
        }
        for(j=0;j<count;j++)
        {
            if(arr[j]!='!')
            {
                for(k=0;k<t;k++)
                {
                    if(arr[j]==T[k])
                    {
                        if(LL1[pos][k]>0)
                        {
                            flag=1;   /* conflict: grammar is not LL(1) */
                            return flag;
                        }
                        else
                        {
                            LL1[pos][k]=i+1;
                            break;
                        }
                    }
                }
            }
        }
    }
    return flag;
}
void LL1_PARSER(char STR[])
{
    int i=0,j,pos=0,pos1=0,n,k=3;
    int len=strlen(STR);
    STR[len]='$';
    STR[len+1]='\0';
    STACK[top++]='$';
    STACK[top]=G[0][0];
    STACK[top+1]='\0';
    printf("STACK\t\t\tINPUT\t\t\tACTION");
    printf("\n------------------------------------------------------------------------------------\n");
    while(STACK[top]!='$')
    {
        for(j=0;STACK[j]!='\0';j++)
            printf("%c ",STACK[j]);
        printf("\t\t");
        for(j=i;STR[j]!='\0';j++)
            printf("%c ",STR[j]);
        if(STR[i]==STACK[top])
        {
            /* top of stack matches the lookahead: pop and advance */
            printf("\t\tReduced: %c",STACK[top]);
            STACK[top]='\0';
            top=top-1;
            i=i+1;
        }
        else
        {
            for(j=0;j<nt;j++)
                if(STACK[top]==NT[j])
                {
                    pos=j;
                    break;
                }
            for(j=0;j<t;j++)
                if(STR[i]==T[j])
                {
                    pos1=j;
                    break;
                }
            n=LL1[pos][pos1];
            if(n==0)
            {
                printf("\n\nParsing failed: no table entry for ( %c, %c )\n",STACK[top],STR[i]);
                return;
            }
            if(G[n-1][3]=='!')
            {
                /* epsilon production: just pop the non-terminal */
                STACK[top]='\0';
                top--;
            }
            else
            {
                /* replace the non-terminal with the RHS, pushed in reverse */
                for(j=3;G[n-1][j]!='\0';j++)
                    k=j;
                STACK[top]='\0';
                for(j=k;j>2;j--)
                    STACK[top++]=G[n-1][j];
                top--;
                STACK[top+1]='\0';
            }
            printf("\t\tShift: %s",G[n-1]);
        }
        printf("\n");
    }
    for(j=0;STACK[j]!='\0';j++)
        printf("%c ",STACK[j]);
    printf("\t\t");
    for(j=i;STR[j]!='\0';j++)
        printf("%c ",STR[j]);
    printf("\n");
    if(STR[i]=='$')
        printf("\nParsing successful\n");
    else
        printf("\nParsing failed: input not fully consumed\n");
}
OUTPUT-
Practical -06
Aim-
To implement a LALR (Look-Ahead LR) parser in C to parse a given
input according to a specified grammar.
Theory-
LALR (Look-Ahead LR) parsers are a type of bottom-up parser used for
syntax analysis in compilers. LALR parsers combine the strengths of SLR
(Simple LR) and canonical LR parsers, providing a balance between
parsing power and memory usage. The parser uses a parsing table to
determine the next action (shift, reduce, or accept) based on the current
state and the input symbol. Shift moves the symbol onto the stack, while
reduce applies a grammar rule to replace symbols in the stack with a
non-terminal. LALR parsers are efficient for parsing deterministic
context-free languages.
Algorithm-
1. Define a grammar and construct the LALR parsing table.
2. Initialize a stack with the starting state (typically state 0).
3. Read the input string and append an end-of-input marker ('$').
4. Repeat until the input is parsed or an error is detected:
a. Get the current state from the top of the stack.
b. Read the next input symbol.
c. Use the parsing table to determine the action (shift, reduce, or
accept):
i. Shift: Push the new state onto the stack and read the next symbol.
ii. Reduce: Apply the specified production rule, pop states, and push
the resulting non-terminal.
iii. Accept: If the input symbol is '$' and the stack is in the accepting
state, accept the input.
iv. Error: If no valid action exists, display a parsing error.
5. Display the stack contents and parsing actions at each step for
debugging.
Code-
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX 100
#define ACCEPT 1000
#define ERROR -1000

int stack[MAX];
int top = -1;

// Function declarations
void push(int state);
void pop();
int getTop();
void displayStack();
int getAction(int state, char symbol);

/* Action table for the toy grammar used here for demonstration:
     rule 1) S -> a S b     rule 2) S -> c
   A positive entry shifts to that state, a negative entry reduces by
   that rule number, ACCEPT accepts, and ERROR rejects. */
int getAction(int state, char symbol) {
    switch (state) {
    case 0: case 2:
        if (symbol == 'a') return 2;
        if (symbol == 'c') return 3;
        break;
    case 1: if (symbol == '$') return ACCEPT; break;
    case 3: if (symbol == 'b' || symbol == '$') return -2; break;
    case 4: if (symbol == 'b') return 5; break;
    case 5: if (symbol == 'b' || symbol == '$') return -1; break;
    }
    return ERROR;
}

int main() {
    char input[MAX] = "aacbb";   // sample input; strings a^n c b^n are accepted
    int state, action;
    int i = 0;
    char symbol;

    strcat(input, "$");          // append the end-of-input marker
    push(0);                     // start in state 0

    printf("\nParsing steps:\n");
    while (1) {
        state = getTop();
        symbol = input[i];
        action = getAction(state, symbol);
        if (action == ACCEPT) {
            printf("Input accepted!\n");
            break;
        } else if (action == ERROR) {
            printf("Parsing error at symbol '%c'.\n", symbol);
            break;
        } else if (action > 0) {                // shift
            printf("Shift '%c'\n", symbol);
            push(action);
            i++;
        } else {                                // reduce
            int rule = -action;
            int rhsLen = (rule == 1) ? 3 : 1;   // |aSb| = 3, |c| = 1
            printf("Reduce by rule %d\n", rule);
            for (int k = 0; k < rhsLen; k++)
                pop();
            push(getTop() == 0 ? 1 : 4);        // GOTO on S
        }
    }
    return 0;
}

void push(int state) {
    if (top < MAX - 1) {
        stack[++top] = state;
        displayStack();
    } else {
        printf("Stack overflow!\n");
    }
}

void pop() {
    if (top >= 0) {
        top--;
        displayStack();
    } else {
        printf("Stack underflow!\n");
    }
}

int getTop() {
    if (top >= 0) {
        return stack[top];
    }
    return -1;
}

void displayStack() {
    printf("Stack: ");
    for (int i = 0; i <= top; i++) {
        printf("%d ", stack[i]);
    }
    printf("\n");
}