Lesson 16
Lesson 16
Lesson 16
Overview
of
Previous Lesson(s)
Over View
For a deterministic finite automaton M, the minimum number of
states in any equivalent deterministic finite automaton is the same
as the number of equivalence groups of M's states.
3
Over View..
The following table shows the result of applying the inputs a & b to
these states.
The input a leads from s1 to s5 in group B and input b leads to s2 in group
A.
5
Minimizing DFA States...
6
Over View...
Transition Diagrams:
7
Over View...
Finite Automata:
Accepting states indicate that the lexeme for some token has been
found.
8
Over View...
DFA:
A DFA is a special kind of finite automaton that has exactly one transition
out of each state for each input symbol.
NFA:
Automata that are not DFA's are called nondeterministic.
Allow ɛ transitions.
9
Over View...
Lex:
There is a family of software systems, including Lex and Flex, that are
lexical-analyzer generators.
The user specifies the patterns for tokens using an extended regular-
expression notation. Lex converts these expressions into a lexical
analyzer that is essentially a DFA that recognizes any of the patterns.
10
TODAY’S LESSON
11
Contents
Syntax Analysis
Introduction
The Role of the Parser
Representative Grammars
Syntax Error Handling
Error-Recovery Strategies
Context-Free Grammars
Formal Definition of a CFG
Notational Conventions
Derivations
Parse Trees and Derivations
Ambiguity
12
Syntax Analysis
In this section we will see parsing methods that are typically used
in compilers.
15
Role of the Parser..
There are 03 general types of parsers for grammars: universal, top-
down, and bottom-up.
Universal parsing methods such as the Cocke-Younger-Kasami
algorithm and Earley's algorithm can parse any grammar. These
general methods are, however, too inefficient to use in production
compilers.
Top-down methods build parse trees from the top (root) to the bottom
(leaves), while bottom-up methods start from the leaves and work
their way up to the root.
16
Representative Grammars
Expressions with + and *
E→E+T|T
T→T*F|F
F → ( E ) | id
17
Representative Grammars
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Following ambiguous grammar will be used for illustration, but in
general we try to avoid ambiguity.
E → E + E | E * E | ( E ) | id
This grammar does not enforce precedence and it does not specify
left vs right associativity.
For example, id + id + id and id * id + id each have two parse trees.
18
Syntax Error Handling
Syntactic errors and general strategies for error recovery.
20
Syntax Error Handling...
The error handler in a parser has goals that are simple to state but
challenging to realize:
21
Error Recovery Strategies
Once an error is detected, how should the parser recover?
Error Productions
Include productions for common errors.
Global Correction
Change the input I to the closest correct input I' and produce the
parse tree for I'.
23
CFG
Grammars used to systematically describe the syntax of programming
language constructs like expressions and statements.
Other productions then define precisely what an expr is and what else a
stmt can be.
Terminals:
The basic components found by the lexer.
They are sometimes called token names, i.e., the first component of
the token as produced by the lexer.
Non-terminals:
Syntactic variables that denote sets of strings.
The sets of strings denoted by non-terminals help define the language
generated by the grammar
25
CFG Definition..
Start Symbol:
A non-terminal that forms the root of the parse tree.
Conventionally, the productions for the start symbol are listed first.
Productions:
The productions of a grammar specify the manner in which the
terminals and non-terminals can be combined to form strings.
26
CFG Definition..
2. The symbol Sometimes ::= has been used in place of the arrow.
27
CFG Definition..
Ex Grammar
Terminals: id + - * / ( )
Non-Terminals: expression, term, factor
Start Symbol: expression
28
Notational Conventions
Notational conventions for grammars: (used in this course)
29
Notational Conventions..
These symbols are non-terminals:
30
Notational Conventions…
Uppercase letters late in the alphabet, such as X, Y, Z, represent
grammar symbols, that is, either non-terminals or terminals.
32
Derivations
Assume we have a production A → α.
We would then say that A derives α and write A ⇒ α
33
Derivations
Formal definition of zero or more definitions:
1. α ⇒* α, for any string α.
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
34
Derivations
Ex: E → E + E | E * E | ( E ) | id
35
Thank You