Chapter 3 - Context Free Languages

Download as pdf or txt
Download as pdf or txt
You are on page 1of 59

Chapter 3: Context-free Languages

3.1. Context-free languages


3.2. Sentential forms
3.3. Derivation tree or parse tree
3.3.1. Left most and right most derivations
3.4. Simplification of context-free grammar
3.4.1. Methods for transforming grammar
3.4.2. Chomsky’s Hierarchy of grammars
3.5. Parsing and ambiguity

1
Context Free Languages
• Context Free languages are languages generated by Context Free
Grammar.

• Context Free languages are the higher level of regular languages.

• Context Free Languages are accepted by Pushdown automata (PDA).

• Context-Free Grammar (CFG): It is a formal grammar that is used to


generate all possible patterns of strings in a given formal language.

• Context-free grammar G can be defined by four tuples:

G = {V, ∑, P, S} 2
Cont…

• ∑ is the final set of a terminal symbol. It is denoted by lowercase letters.

• V is the final set of a non-terminal symbol. It is denoted by capital letters.

• P is a set of production rules, which is used for replacing non-terminal


symbols(on the left side of the production) in a string with other terminal
or non-terminal symbols(on the right side of the production).

• S is the start symbol that is used to derive the string. We can derive the
string by repeatedly replacing a non-terminal by the right-hand side of the
production until all non-terminals have been replaced by terminal
symbols. 3
Cont…

• Context Free Grammar has a production rule of the form:

A → a, where a = {V ∪ ∑}* and A ∈V

Example 1: Construct a CFG for the language L=anbn


V={S}
∑={a,b}
S={S}
P = { S → aSb , S → aSb , S → ɛ (epsilon) }
S → aSb (if we substitute S by ɛ, S = {ab})
S → a aSb b (substitute S by aSb), if we substitute S by ɛ, S = {aabb}
4
Cont…

P = { S → aSb , S → aSb , S → ɛ }
S → aSb
S → a aSb b (substitute S by aSb),
S → aa aSb bb (substitute S by aSb), if we substitute S by ɛ, S = {aaabbb}
S → aaa aSb bbb, (substitute S by aSb), if we substitute S by ɛ, S =
{aaaabbbb}
Generally, the Above production rules generate the strings having an equal
number of a’s and b’s.

S= {ab, aabb, aaabbb, aaaabbbb, …}


5
Example 2: Construct a CFG for the language L=anb2n

The production rule is:


P = ({S}, {a, b}, S, {S → aSbb | abb, S → ɛ})
S → aSbb (if we substitute S → ɛ, S = {abb}
S → aabbbb (substitute S → abb)
Or S → aSbb
S → a aSbb bb (Substitute S by aSbb)
S → aaS bbbb (substitute S → abb) , if we substitute S → ɛ, S = {aabbbb}
Generally, the above grammar can generate strings having a double number
of b’s over the number of a’s.
S = {abb, aabbbb, aaabbbbbb....}. 6
Properties of Context Free Grammar
• Context Free languages (CFG) are accepted by pushdown automata (PDA)
but not by finite automata (FA).

• Context free grammar generates the Context free languages. Let discuss
some Closure Properties of CFG.

CFG Closure Properties


Consider L1 and If L2 are two context free languages where:
L1 = { anbn | n >= 0 } and L2 = { cmdm | m >= 0 }
1. CFL are closed under Union
As L3 = L1 ∪ L2 = { anbn ∪ cmdm | n >= 0, m >= 0 } is also context free. 7
Cont…
2. CFL are closed under Concatenation
As L3 = L1.L2 = { anbncmdm | n >= 0, m >= 0 } is also context free.
3. CFL are closed under Kleen Closure
L1* = { anbn | n >= 0 }* is also context free.
4. CFL are not closed under Complementation
Complementation of context free language L1 which is ∑* – L1 is not a CFL.
5. CFL are not closed under Intersection
Suppose L1 = { anbncm | n >= 0 and m >= 0 } and L2 = (ambncn | n >= 0 and m >= 0 } are
CFL’s. Now apply intersection on given CFL’s.
L3 = L1 ∩ L2 = { anbncn | n >= 0 } is not a context free language because there are two
comparisons in derived language after intersection. 8
Applications of Context Free Grammar
• CFG has great practical importance:

o For defining programming languages.

o For the construction of compilers.

o For describing Arithmetic expressions.

o For the transition of programming languages.

9
How context-free language named
• Context-free languages are so named because they can be generated by
context-free grammars (CFGs), which are a type of formal grammar that
generate strings of symbols without taking into account any context or
surrounding symbols.

• In other words, the rules of a CFG depend only on the nonterminal symbols
and do not depend on the context in which they appear. This makes CFGs
simpler and more flexible than regular grammars and context-sensetive
languages, which have more complex rules that depend on the context of
the symbols. 10
Sentential form
• In context-free grammar (CFG), a sentential form is a sequence of
symbols that can be derived from the start symbol of the grammar by
applying a series of production rules.

• In other words, it is a string of terminals and non-terminals that can be


generated by the grammar. Sentential forms can be used to represent the
structure of a sentence or other string of symbols in a language.

11
Cont…
• For example, in the CFG
S -> NP VP,
NP -> Det Noun,
VP -> Verb,
Det -> "the",
Noun -> "cat",
Verb -> "sits",
• The sentential form can be derived by applying the production rules in the
following order: S -> NP VP -> Det Noun VP -> the cat VP -> the cat Verb
-> the cat sits. The sentential form derived from the above production
rule is: "the cat sits". 12
Cont…
Example 2:
S -> NP VP
NP -> Pronoun
VP -> Verb NP
Pronoun -> "I"
Verb -> "saw"
NP -> Det Noun
Det -> "a"
Noun -> "dog"
• Sentential form: "I saw a dog"

13
Derivation tree or parse tree
• The process of deriving or generating a string from a given grammar is
called derivation.

• The geometrical representation of a derivation is known as a derivation


tree or parse tree.

• A derivation or parse tree is an ordered rooted tree that graphically


representation of strings that can be derived from Context Free Grammar.

14
Cont…

• In the Derivation Tree, the deepest sub-tree from leftmost is traversed first by
following the rule of In-order traversal. In this way, the original input string is
obtained but

✓ Root node/vertex must be labeled by the Start symbol.

✓ All leaf nodes must be terminals or epsilon(∈).

✓ All interior nodes must be Non-terminals.

✓ The derivation is read from left to right.

15
Types derivation trees
• There are two types of derivation trees:

1. Leftmost or left derivation tree

• A tree obtained by applying a production rule to the leftmost variable (non-


terminal) in each step(getting a string by expanding the leftmost non-terminal at
each step).

2. Rightmost derivation tree

• a tree obtained by applying the production rule to the rightmost variable or no-
terminal in each step (getting a string by expanding the rightmost non-terminal at
16
each step)
Examples
• Consider a string W = xxxyyxyyyx using the grammar

G = {S → xB | yA, S → xS | yAA , A → x, B → yS | xBB | y}

let’s draw the derivation tree of each step as given below

1. Leftmost Derivation and Tree

Step 01: S → xB (W=xB)


The production rules which give the 1st x are
S → xB , S → xS , A → x, and S → xB.
Now choose which production rule gives the 1st symbol
x and the next symbol also x (S → xB).
17
The first leftmost non-terminal
Cont…

Step 02: W= xxBB (Using B → xBB)

Considering the 2nd symbol in the string is x


The leftmost non-terminal
Step 03: W= xxxBBB (Using B → xBB)

Considering the 3rd symbol in the string is x


18
Cont…

Step 04: W= xxxyBB (Using B → y)

Considering the 4th symbol in the string is y

19
Cont…

Step 05: W= xxxyyB (Using B → y)

Considering the 5th symbol in the string is y

Now we have finished the move and we got xxxyy. But there are remaining
symbols xyyyx. To represent these symbols we have to go back to the previous
step.
20
Cont…

Step 06: W= xxxyyxBB (Using B → xBB)

The remaining leftmost non-terminal

Considering the 6th symbol in the string is x

21
Cont…

Step 07: W= xxxyyxyB (Using B → y)

Considering the 7th symbol in the string is y

22
Cont…

Step 08: W= xxxyyxyyS (Using B → yS)

Considering the 8th symbol in the string is y

The remaining leftmost non-terminal


23
Cont…

Step 09: W= xxxyyxyyyA (Using S → yA)

Considering the 9th symbol in the string is y

The remaining leftmost non-terminal


24
Cont…

Step 10: W= xxxyyxyyyx (Using A → x)

x Considering the 10th symbol in the string is x


Note: the strings are taken from left to right in both leftmost and rightmost derivation trees
25
Cont…

2. Rightmost derivation

Step 01: S → xB (Using S → xB)

Considering the 1st symbol in the string is x

The first rightmost non-terminal


26
Cont…

Step 02: W= xxBB (Using B → xBB)

Considering the 2nd symbol in the string is x

The rightmost non-terminal

27
Cont…

Step 03: W= xxBxBB (Using B → xBB)

The rightmost non-terminal

28
Cont…

Step 04: W= xxBxByS (Using B → yS)

29
Cont…

Step 05: W= xxBxByyA (Using S → yA)

30
Cont…

Step 06: W= xxBxByyx (Using A → x)

Now we have finished the move and we


got xx..x..yyx. But there are remaining
symbols at the middle. To represent these
symbols we have to go back to the
previous step.

31
Cont…

Step 07: W= xxBxyyyx (Using B → y)

32
Cont…

Step 08: W= xxxBBxyyyx (Using B → xBB)

Considering the 3rd symbol in the string is x

33
Cont…

Step 09: W= xxxByxyyyx (Using B → y)

Considering the 4th symbol in the string is y

34
Cont…

Step 10: W= xxxyyxyyyx (Using B → y)

Considering the 5th symbol in the string is y

We will take the strings from left to right


35
Exercise
1. Consider a string W = “aabaa” using the grammar

G = {S → aAS | aSS| ɛ, A → SbA | ba}

2. Consider the CFG for the string “baababa”.

S → bB | aA

A → b | bS | aAA

B → a |aS | bBB

and draw the derivation tree (both leftmost and rightmost derivation trees)

36
Simplification of CFG
• In CFG(Context Free Grammar), all production rules and symbols are not
necessary for the derivation of a string.

• Besides this, there may be “null” productions or “unit” productions.

• Elimination of these symbols and productions is called simplification of CFG.

• Simplification consists of the following ways:

1. Reduction of CFG

2. Removal of Unit productions

3. Removal of Null productions


37
1. Reduction of CFG
• CFGs are reduced in two phases

Phase 1: Derivation of an equivalent Grammar G' from the Context Free Grammar
G, such that each variables (non-terminal) derives some terminal symbol.

Derivation Procedure:

1. Include all symbols Wi, that derives some terminal and initialize i = 1

2. Include symbols Wi+1, that derives Wi

3. Increment i and repeat step 2, until Wi+1 = Wi

4. Include all production rules that have Wi in it.


38
Cont…

Phase 2: Derivation of an equivalent Grammar G'' from the Context Free Grammar
G', such that each symbol appears as sentential.

Derivation Procedure:

1. Include Start symbol in Yi, that derives some terminal and initialize i = 1

2. Include symbols Yi+1, which can be derived from Yi, and include all production
rules that have been applied.

3. Increment i and repeat step 2, until Yi+1 = Yi

39
Cont…

Example: find a reduced equivalent grammar G, having a production rule as:

P: S → AC|B, A → a, C→ c|BC, E → aA|e

Phase 1:
T = {a, c, e}, V = {S, A, C, B, E}, S = {S}
Step 1: include all symbols that can derive terminal symbols
W1 = {A, C, E}, S and B don’t drive any terminal symbols
Step 2: include all symbols that can derive the symbols in W1
W2 = {A, C, E, S}, S (included) can derive symbols A and C
Step 3: include all symbols that can derive the symbols in W2
W3 = {A, C, E, S} 40
Cont…
W2 = {A, C, E, S}
W3 = {A, C, E, S}
Now W3 = W2, satisfies Wi+1 = Wi
G' = ({A, C, E, S}, {a, c, e}, P, {S})
{A, C, E, S}: non-terminal symbols found in W3
{a, c, e}: are terminal symbols derived by non-terminal symbols in W3, A derives
‘a’, C derives ‘c’, E derives ‘e’, and S derives no terminal symbol (checked from the
production rule given.
Step 4: Production rule: we have to start from Start symbol
P1: S → AC, A → a, C → c, E → aA|e
In S → AC (why B is not included, because B is not found in G' .
Let’s proceed to phase 2
41
Cont…

Phase 2:
Step 1: Start from Start symbol
Y1 = {S}
Step 2: include all symbols that can be derived from Y1(use a production P1 rule of
G ’)
Y2 = {S, A, C}, A and C can be derived from S in P1
Step 3: include all symbols that can be derived from Y2
Y3 = {S, A, C, a, c}, a derived by A, and c derived by C
Step 4: include all symbols that can be derived from Y3
Y4 = {S, A, C, a, c}
Now Y4 = Y3 , now Yi+1 = Yi is satisfied.
42
Cont…

G" = {(A, C, S), (a, c), P, (S)}


(A, C, S): Non-terminal symbols found in G"
(a, c): terminal symbols that can be derived by terminal symbols in G",
S: start symbol
P: production rule as: Let’s start from start symbol
S → AC, A → a, C →c
So that the reduced grammar G" is found as:
G" = {(A, C, S), (a, c), (S), (S → AC, A → a, C →c)}

43
2. Removal of Unit productions
• Any production rule of the form A → B, where A, B ∈ V (non-terminal symbols)
is called Unit production

Procedure for removal Unit productions:

1. To remove A → B, add production A → x to the grammar rule whenever B → x


occurs in the grammar. x ∈ T ( terminal symbols), x can be Null.

2. Delete A → B from the grammar.

3. Repeat step 1 until all unit productions are removed.

44
Cont…

Example : Remove a Unit production from the grammar G having the production
rule as:
P: S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
Unit productions (in the from form if A → B) are Y → Z , Z → M, and M → N
Steps to remove these unit productions:
1. Since N → a, add M → a to the production (because M → N)
P: S → XY, X → a, Y → Z | b, Z → M, M → a, N → a
2. Since M → a, add Z → a to the production (because Z → M)
P: S → XY, X → a, Y → Z | b, Z → a, M → a, N → a
3. Since Z → a, add Y → a to the production (because Y → Z)
P: S → XY, X → a, Y → a | b, Z → a, M → a, N → a
45
Cont…
3. Since Z → a, add Y → a to the production (because Y → Z)
P: S → XY, X → a, Y → a | b, Z → a, M → a, N → a
• In the production rule S → XY, X gives a, and Y gives a | b. but Z → a, M → a, N
→ a can’t be reached from the start symbol. So we have to remove them.

• The new production rule after the unit production is removed will be:

P: S → XY, X → a, Y → a | b

• If a symbol does not exist on the right-hand side of the production rule and does
not participate in the derivation of any string, that symbol is considered a useless
symbol. Similarly, if a variable does not participate in the derivation of any
string, it is useless. 46
3. Removal of Null productions
• In CFG, a non-terminal symbol A is a nullable if there is a production in the form
A → ɛ or there is a derivation that starts at A and leads to ɛ. Like (A→ … → ɛ)

Procedure for removal Null productions:

1. To remove A → ɛ look for the productions whose right-side contains A.

2. Replace each occurrence of A in each of these productions with ɛ.

3. Add resultant productions to the grammar.

47
Cont…
Example: Remove the null production from the grammar G, having the production
rule as follows:
P: S → ABAC, A → aA | ɛ, B → bB | ɛ, C → C
• The null production in the form of (Non-terminal → ɛ), are A → ɛ, B → ɛ. We
have to remove these null productions.
• To remove A → ɛ:
1. Replace A by ɛ in the production rule
S → ABAC, A → aA | ɛ
S → ABAC | ABC | BAC | BC (replacing A by ɛ in different cases)
A→ a
The new production rule is:
S → ABAC | ABC | BAC | BC , A → aA | a, B → bB | ɛ, C → C
48
Cont…
• To remove B → ɛ:
2. Replace A by ɛ in the production rule
The production rule is:
S → ABAC | ABC | BAC | BC , A → aA | a, B → bB | ɛ, C → C
Replace B by ɛ
S → AAC | AC | C, B → b
The new production rule is:
S → AAC | AC | C, A → aA | a B → bB | b, C → C
As you ca see all the null productions are removed

49
Parsing and ambiguity
• Ambiguity refers to a situation where a given sentence or string of
symbols can be generated by the grammar in more than one way.

• In other words, there are multiple possible parse trees or derivations that
can lead to the same sentence. This can lead to confusion or uncertainty
about the intended meaning of the sentence.

50
Cont…

• Ambiguity can arise in CFGs when there are multiple production rules that
can be applied to the same nonterminal symbol, or when there are
overlapping or conflicting rules.

• One way to address ambiguity in CFGs is to use techniques such as


disambiguation rules or semantic constraints to ensure that each sentence
has a unique interpretation.

51
Ambiguous grammar
• A grammar or a Context-Free Grammar(CFG) is said to be ambiguous if
there exists more than one leftmost derivation(LMDT) or more than one
rightmost derivation(RMDT), or more than one parse tree for a given input
string.

• A context-free grammar(CFG) represented by G = (N, T, P, S) is said to be


ambiguous grammar if there exists more than one string in L(G).
Otherwise, the grammar will be unambiguous.

52
Cont…

• Since Ambiguous Grammar can produce two Parse trees for the same
expression, it's often confusing for a compiler to find out which one among
all available Parse Trees is the correct one according to the context of the
work.

• This is the reason ambiguity is not suitable for compiler construction.

53
Cont…

• Example 1: Let the production rule is given as:


S -> AB|aaB
A -> a|Aa
B -> b
• Let us generate the string “aab” from the given grammar. Parse trees for
generating string “aab” are as follows :

Here for the same string, we are getting more than


one parse tree for the same string. Hence,
grammar is ambiguous grammar.
54
• Example 2: Check whether the given production is ambiguous or not.
S → aSb | SS
S→ε
• For the string "aabb" the above grammar can generate two parse trees.

Since there are two parse trees for a


single string, "aabb", the grammar
G is ambiguous.

55
Unambiguous grammar
• A grammar is unambiguous if there is no ambiguity in it. This means it
should contain only one leftmost derivation (LMD), one rightmost
derivation (RMD), and one parse tree for the given input string.

• Any parsers do not parse this Grammar.

Example of Unambiguous Grammar:


X -> AB
A -> a
B -> b
56
Chomsky’s Hierarchy of grammars
• According to Noam Chomsky, there are four types of grammars:
Grammar Grammar Language Accepted Automaton Type
type Accepted
Type -0 Unrestricted Recursively Turing Machine
grammar enumerable language
Type-1 Context sensitive Context sensitive Linear bounded
grammar language automaton
Type-2 Context free Context free Pushdown
grammar language automaton
Type-3 Regular grammar Regular language Finite state
automaton
57
58
THE END!

59

You might also like