Session 7 - Syntax Parsing


HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

Syntax Parsing

Lecturer: Dr. Bùi Thanh Hùng


Data Science Department
Faculty of Information Technology
Industrial University of Ho Chi Minh City
Email: [email protected]
Website: https://sites.google.com/site/hungthanhbui1980/
Examples

Build a parse tree for the following sentence:

I gave him the book

Examples

If (b==0) a=b;
How it works
(Diagram: Rule Table, Parser)
Syntax Parser

A parse tree typically has the following components:
- Tree structure (there are many data structure options)
- Structure of a tree node:
  • Symbol at the current node
  • The associated word (if it is a leaf node)
  • List of child nodes
  • Additional information, for further analysis
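A minimal sketch of such a node structure in Python (the class, field, and symbol names are illustrative assumptions, not part of the lecture):

class ParseNode:
    """One node of a parse tree."""
    def __init__(self, symbol, word=None, children=None, info=None):
        self.symbol = symbol             # grammar symbol at this node
        self.word = word                 # associated word/token if this is a leaf
        self.children = children or []   # list of child ParseNode objects
        self.info = info or {}           # extra annotations for further analysis

    def is_leaf(self):
        return not self.children

# Example tree for "I gave him the book"
tree = ParseNode("S", children=[
    ParseNode("NP", children=[ParseNode("PRP", word="I")]),
    ParseNode("VP", children=[
        ParseNode("V", word="gave"),
        ParseNode("NP", children=[ParseNode("PRP", word="him")]),
        ParseNode("NP", children=[ParseNode("DT", word="the"),
                                  ParseNode("N", word="book")]),
    ]),
])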
Steps to build a Syntax Parser
Describe the source language grammar rules.
These descriptions may initially be in natural language:
- Specify the meaning of non-terminal symbols
- Translate them into rigorous grammar rules
- Analyze the grammar to select the most appropriate parsing method:
  + Is the grammar ambiguous?
  + Is the grammar left-recursive?
  + How complex is the grammar?
Steps to build a Syntax Parser
Selecting the appropriate parsing method:
Building a direct parser (for languages with simple complexity)
Building a two-stage parser:
- Based on the input grammar, build a prediction automaton
- Use the automaton to process the token sequence coming from the lexical analyzer
Building a universal parser: in case the grammar is too complex, universal parsing methods can be used to build the parser
Steps to build a Syntax Parser
Choosing the right parsing method
Choosing how to handle syntactic errors, generating correction suggestions, and handling situations that require semantic composition
Inference

Definition: αAβ ⇒ αγβ (we say that αAβ directly derives αγβ) if A → γ is a production and α, β are strings of symbols belonging to the language L.
For example, with the production A → ab we have xAy ⇒ xaby.
Inference

Notation:
⇒    direct derivation
⇒*   derivation through 0 or more steps
⇒+   derivation through 1 or more steps
Some properties:
- if α ⇒* β and β ⇒* γ, then α ⇒* γ
Left and right inference
The problem of parsing is essentially the problem of finding the derivation string S ⇒ α ⇒ … ⇒ β, where:
- S: the start symbol
- α: an intermediate string, still containing non-terminal symbols
- β: the string consisting only of terminal symbols
Left and right inference
It is easy to see in the above derivation process:
▪ There are many derivation options from S to β
▪ An intermediate symbol appearing in α must sooner or later be transformed by some production rule
▪ If the intermediate symbol chosen for transformation is always the leftmost one in α, then we call this option a left derivation
▪ The right derivation is defined similarly
Left and right inference

Given a grammar G with its productions and an input string w, find a left derivation from S into w.
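For illustration (a toy grammar, not the one from the slide): with the productions E → E + E | id and the input string w = id + id,

Left derivation:  E ⇒ E + E ⇒ id + E ⇒ id + id
Right derivation: E ⇒ E + E ⇒ E + id ⇒ id + id

In the left derivation the leftmost non-terminal is rewritten at every step; in the right derivation, the rightmost one.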


Left and right inference

What encoding should we use to store the derivation, and how should we use that information to recover the derivation?
Parse tree
The parse tree represents the structure of an inference:
- The root node is the start symbol
- The leaf nodes are always terminal symbols
- The internal nodes are always intermediate (non-terminal) symbols
The tree does not represent the order in which the direct inferences are executed:
- Traversing the tree recovers an order in which the inferences can be executed
- Left inference is equivalent to traversing the tree in root-left-right (pre-order) order
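A small follow-up sketch (assuming the illustrative ParseNode class above): a root-left-right traversal reads off the order in which a leftmost derivation expands the non-terminals, while the leaves read off the input sentence.

def preorder(node):
    # Visit the node itself first, then its children from left to right
    yield node
    for child in node.children:
        yield from preorder(child)

expansion_order = [n.symbol for n in preorder(tree) if not n.is_leaf()]
sentence = [n.word for n in preorder(tree) if n.is_leaf()]
print(expansion_order)   # ['S', 'NP', 'VP', 'NP', 'NP']
print(sentence)          # ['I', 'gave', 'him', 'the', 'book']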
Inference vs Tree structure
• A derivation is a one-dimensional representation of the information
• A tree structure is a two-dimensional representation of the information
• The tree structure illustrates the relationships between components in a spatial structure
• A parse tree most fully describes the transformation from the start symbol to the string being analyzed, and is suitable for most purposes
• A syntax tree eliminates the intermediate components and focuses on describing the relationships between the terminal symbols; this structure is suitable for information aggregation
Ambiguous grammar
A loosely specified grammar leads to many different parse trees for a given input string
Ambiguous grammar

A grammar in which some string w has two or more corresponding parse trees is called an ambiguous grammar.
The biggest problem with an ambiguous grammar is the polysemy of the string w (there are many ways to understand it), which makes it impossible to compute the exact semantics of w.
Eliminate Ambiguity
Disambiguation actually creates a new grammar based on the old grammar, but the two grammars are not exactly equivalent.
There are many disambiguation strategies, such as:
- Adding intermediate variables
- Introducing constraints outside the grammar (e.g. specifying the precedence of operators)
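As an illustration (a standard textbook example, not taken from these slides): the grammar E → E + E | E * E | id is ambiguous, because the string id + id * id has two parse trees, one grouping id + id first and one grouping id * id first. Adding intermediate variables that encode operator precedence removes the ambiguity:

E → E + T | T
T → T * F | F
F → id

With the rewritten grammar, id + id * id has exactly one parse tree, in which * binds more tightly than +.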
Parsing strategies
Parsing strategies are divided into 3 groups:
• Trial-and-error (backtracking) strategies: top-down, bottom-up
• Dynamic programming strategies: CYK, Earley, …
• Deterministic strategies: LL, LR, …
Parsing strategies

There are also other methods that rely on the characteristics of the language to apply effective techniques (e.g., operator-precedence parsing).
Some methods are general (e.g., Earley), but most methods only work with grammars that have specific characteristics.
Trial and error strategy
• Trial-and-error (backtracking) strategies are the first to come to mind when solving the parsing problem.
• These strategies simply involve trying out the production rules until the target derivation sequence is achieved.
Trial and error strategy
Divided into 2 opposite approaches:
Top-down method:
• Looking at the parse tree, go from top to bottom
• Try to gradually transform S into w
Bottom-up method:
• Looking at the parse tree, go from bottom to top
• Try to reduce from w back to S
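A minimal sketch of the top-down trial-and-error idea in Python, for an illustrative non-left-recursive toy grammar (the grammar and names are assumptions, not from the slides):

# Toy grammar:  S -> 'a' S 'b' | 'a' 'b'   (the language a^n b^n, n >= 1)
GRAMMAR = {"S": [["a", "S", "b"], ["a", "b"]]}

def derives(symbols, tokens):
    """Backtracking recognizer: can the symbol list derive exactly the token list?"""
    if not symbols:
        return not tokens                 # success only if the whole input is consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                  # non-terminal: try each alternative in turn (backtracking)
        return any(derives(prod + rest, tokens) for prod in GRAMMAR[first])
    # terminal: must match the next input token
    return bool(tokens) and tokens[0] == first and derives(rest, tokens[1:])

print(derives(["S"], list("aabb")))   # True
print(derives(["S"], list("abb")))    # False

Note that a left-recursive rule such as S -> S 'a' would make this procedure loop forever, which is exactly the restriction mentioned on the next slide.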
Trial and error strategy
Both methods have restrictions on the input grammar:
• Top-down requires the input grammar to be non-left-recursive
• Bottom-up requires the input grammar to have no empty productions (A → ε) and no derivations of the form A ⇒+ A
Trial and error strategy
Both methods have input grammar constraints.
These strategies are only theoretically meaningful, because they are slow and grammar-constrained, but trial-and-error provides many suggestions for other algorithms:
• Removing grammar constraints: universal methods
• Removing backtracking: deterministic methods
Dynamic strategy
The idea of dynamic programming aims to:
• Build methods that have no input grammar restrictions
• Store analyzed substrings to avoid backtracking
Dynamic strategy
CYK Algorithm:
• Transform the grammar to Chomsky normal form
• The grammar has no empty (ε) productions
• No backtracking
Dynamic strategy
CYK Algorithm
Let w be the string of length n to be parsed, and let G be the set of rules in our grammar with start symbol S.

Time and Space Complexity:
• Time complexity: O(n³ · |G|)
• Space complexity: O(n²)
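A compact sketch of the CYK recognizer in Python; the grammar is assumed to be already in Chomsky normal form, and the dictionary encoding is an illustrative choice:

from itertools import product

# Illustrative CNF grammar: S -> A B, A -> 'a', B -> 'b'
UNARY  = {"a": {"A"}, "b": {"B"}}        # terminal -> non-terminals that produce it
BINARY = {("A", "B"): {"S"}}             # (B, C)   -> non-terminals X with X -> B C

def cyk(words, start="S"):
    n = len(words)
    # table[i][j] = set of non-terminals that derive words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(UNARY.get(w, set()))
    for span in range(2, n + 1):          # length of the substring
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):         # split point between the two halves
                for B, C in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((B, C), set())
    return start in table[0][n - 1]

print(cyk(list("ab")))   # True
print(cyk(list("ba")))   # False

The three nested loops over (span, i, k) give the O(n³) time bound; the table itself gives the O(n²) space bound.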
Earley

▪ Problems with left recursion in top-down parsing
  – VP → VP PP
▪ Background
  – Developed by Jay Earley in 1970
  – No need to convert the grammar to CNF
  – Left to right
▪ Complexity
  – Faster than O(n³) in many cases
  – More versatile: no grammar constraints, no backtracking
Earley

Looks for both full and partial constituents
• When reading word k, it has already identified all hypotheses that are consistent with words 1 to k-1
• Example:
  – S [i,j] → Aux . NP VP
  – NP [j,k] → N
  – S [i,k] → Aux NP . VP
Earley

• It uses a dynamic programming table, just like CKY
• Example entry in column 1:
  – [0:1] VP -> VP . PP
  – Created when processing word 1
  – Corresponds to words 0 to 1 (these words correspond to the VP part of the RHS of the rule)
  – The dot separates the completed (known) part from the incomplete (and possibly unattainable) part
Earley

Three types of entries:
– ‘scan’ – for words
– ‘predict’ – for non-terminals
– ‘complete’ – otherwise
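A compact sketch of an Earley recognizer in Python built from the three operations above (predict, scan, complete); the toy grammar for "take this book" is an illustrative assumption, not the lecture's grammar:

from collections import namedtuple

# A chart entry: rule lhs, rule rhs, dot position, start index of the span
Item = namedtuple("Item", "lhs rhs dot start")

GRAMMAR = {
    "S":   [("VP",)],
    "VP":  [("V", "NP")],
    "NP":  [("Det", "N")],
    "V":   [("take",)],
    "Det": [("this",)],
    "N":   [("book",)],
}

def earley(words, start="S"):
    # chart[k] = set of items ending at position k (grammar assumed to have no ε productions)
    chart = [set() for _ in range(len(words) + 1)]
    chart[0] = {Item(start, rhs, 0, 0) for rhs in GRAMMAR[start]}
    for k in range(len(words) + 1):
        agenda = list(chart[k])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in GRAMMAR:                        # PREDICT: expand a non-terminal
                    for rhs in GRAMMAR[sym]:
                        new = Item(sym, rhs, 0, k)
                        if new not in chart[k]:
                            chart[k].add(new); agenda.append(new)
                elif k < len(words) and words[k] == sym:  # SCAN: match the next word
                    chart[k + 1].add(Item(item.lhs, item.rhs, item.dot + 1, item.start))
            else:                                         # COMPLETE: advance items waiting on this symbol
                for old in list(chart[item.start]):
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == item.lhs:
                        new = Item(old.lhs, old.rhs, old.dot + 1, old.start)
                        if new not in chart[k]:
                            chart[k].add(new); agenda.append(new)
    return any(it.lhs == start and it.dot == len(it.rhs) and it.start == 0
               for it in chart[len(words)])

print(earley(["take", "this", "book"]))   # True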
Earley
Example, for the input "take this book" (the completer combines the first two entries to produce the third):
  S [i,j] → Aux . NP VP
  NP [j,k] → N
  S [i,k] → Aux NP . VP
Deterministic strategy
The idea of determinism goes the other way: sacrificing richness of grammar for speed.
Deterministic strategy
Common features:
• Grammars have certain constraints
• Based on pre-analysis of the grammar to predict possible situations
• Building tables of options, which indicate what to do when encountering specific situations
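A minimal sketch of the table-driven idea in Python (LL(1) style); the toy grammar and the hand-written table are illustrative assumptions:

# Illustrative LL(1) grammar:  S -> 'a' S 'b' | 'c'
# In practice the table is pre-computed from FIRST/FOLLOW sets; here it is written by hand.
TABLE = {
    ("S", "a"): ["a", "S", "b"],
    ("S", "c"): ["c"],
}
NONTERMINALS = {"S"}

def ll_parse(tokens, start="S"):
    tokens = list(tokens) + ["$"]        # append an end-of-input marker
    stack = ["$", start]                 # parse stack with the start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look)) # the table decides the production, no backtracking
            if rhs is None:
                return False             # syntax error: no applicable rule
            stack.extend(reversed(rhs))  # push the RHS with its leftmost symbol on top
        elif top == look:
            pos += 1                     # match a terminal (or the '$' marker)
        else:
            return False
    return pos == len(tokens)

print(ll_parse("aacbb"))   # True
print(ll_parse("aacb"))    # False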
Deterministic strategy
This is the strategy that all compilers use
due to its speed advantage (no
backtracking).
