13-Dependency Grammar-03-09-2024

Download as pdf or txt
Download as pdf or txt
You are on page 1of 31

Syntax Analysis and CFG

Dr. Hiteshwar Kumar Azad


Email ID: [email protected]
SCOPE
Syntactic Analysis
• Syntactic analysis, also referred to as syntax analysis or parsing, is the process
of analyzing natural language with the rules of a formal grammar.
• Grammatical rules are applied to categories and groups of words, not individual
words.
• Syntactic analysis basically assigns a semantic structure to text.

• For example,
A sentence includes a subject and a predicate where the subject is a noun phrase
and the predicate is a verb phrase. Take a look at the following sentence:
“The dog (noun phrase) went away (verb phrase).”

Processing Stages in NLP


Processing Stages in NLP System

Syntactic Analysis
• Syntactic analysis is concerned with the construction of sentences.

• It involves analysing the meaning of words and phrases to understand


the purpose of a sentence or paragraph.

• Syntactic structure indicates how the words are related to each other.

• Syntax tree is assigned by a grammar and a lexicon. Lexicon indicates


syntactic category of words.

• Grammar (typically Context Free Grammer) specifies legitimate


concatenations of constituents.

NLP 3
Syntactic Ambiguity
• Syntactic ambiguity is about confusion in how words are grouped together in a
sentence. The same words can be put together in different ways to mean different
things.
For example, a sentence could be confusing, and you might not know if a word is
describing a noun before it or after it.

Example: The phrase “I saw the man with a telescope,” which could mean that
the observer used a telescope to see the man or that the man had a telescope.
Syntactic Ambiguity
Syntactic Ambiguity
• Syntactic analysis or parsing may be defined as the process of analyzing the strings
of symbols in natural language conforming to the rules of formal grammar.
• The origin of the word ‘parsing’ is from Latin word ‘pars’ which means ‘part’.

Concept of Parser :
• It is used to implement the task of parsing. It may be defined as the software
component designed for taking input data (text) and giving structural
representation of the input after checking for correct syntax as per formal
grammar.
• It also builds a data structure generally in the form of parse tree or abstract
syntax tree or other hierarchical structure.
Types of Parsing

Derivation divides parsing into the followings two types −


• Top-down Parsing
• Bottom-up Parsing

Top-down Parsing
• In this kind of parsing, the parser starts constructing the parse tree from the start
symbol and then tries to transform the start symbol to the input.
• The most common form of topdown parsing uses recursive procedure to process
the input.
• The main disadvantage of recursive descent parsing is backtracking.

Bottom-up Parsing
• In this kind of parsing, the parser starts with the input symbol and tries to
construct the parser tree up to the start symbol.
Context Free Grammar
• A Context Free Grammar (CFG) consists of set of rules (productions) that groups
and organise the words and lexicons in a language.
• CFG is a type of formal grammar that can represent complex relations and can be
efficiently implemented.
• CFG also called as Phrase-Structure Grammar which follows Backus-Naur Form (BNF).
• Mathematically, CFG is defined as four tuple (T, N, S, R).

8
An example context-free grammar

9
Role of CFG in language processing

10
Application of grammar rewrite rules

Parse tree

11
13
14
CFG and evaluation

• If a sentence in a language is accepted, it should be possible to construct a


parse tree.
• The leaf node represents the words (terminals) in the language.
• Root node specifies the production rules.
• Sometimes, there could be more than one parse tree possible for a sentence.
This leads to an ambiguity in the grammar.

15
Chomsky Normal Form (CNF)

16
Chomsky Normal Form Conversion

17
CFN to CNF Conversion: Example

18
19
20
Issues with Context-Free Grammar (CFG)-Based Parsing
• Some limitations that can make parsing complex languages
challenging or inefficient:
1. Ambiguity:
– Multiple parse trees: A CFG can generate multiple parse trees for a
given input string, leading to ambiguity in the interpretation of the
sentence.
– Semantic disambiguation: Resolving ambiguity often requires
semantic information or context-dependent rules, which are beyond
the scope of CFGs
2. Context Sensitivity:
– Context-dependent rules: Some languages require context-dependent
rules to be parsed correctly, which CFGs cannot directly handle.
– Workarounds: While techniques like left factoring and semantic
actions can be used to address some context-sensitive aspects, they
have limitations.
21
Issues with Context-Free Grammar (CFG)-Based Parsing
3. Left Recursion:
– Infinite loops: Left-recursive rules can cause infinite loops during
parsing, leading to non-termination.
– Efficiency issues: Even if termination is guaranteed, left recursion can
significantly impact parsing efficiency.
– Transformations: To avoid left recursion, CFGs often need to be
transformed, which can introduce complexity.
4. Efficiency:
– Exponential time: In the worst case, CFG parsing can have exponential
time complexity, making it impractical for large inputs or complex
languages.
– Parsing techniques: While techniques like LL(k) and LR(k) parsing can
improve efficiency, they still have limitations.

22
Issues with Context-Free Grammar (CFG)-Based Parsing

• To address these issues, various parsing techniques and


extensions have been developed, including:
– Semantic parsing: Incorporating semantic information to resolve
ambiguity and handle context-dependent rules.
– Parsing expression grammars (PEGs): A more expressive alternative to
CFGs that avoids ambiguity and left recursion.
– Combinator parsing: A declarative approach to parsing that can handle
complex languages efficiently.

➢ The choice of parsing technique depends on the specific requirements of


the language being parsed, including its complexity, efficiency needs, and
the desired level of expressiveness.

23
Example: Ambiguity in CFG Parsing
Consider the CFG:

Input string: “the dog chased the cat with the bone”
• This input string can be parsed in two different ways:

This example illustrates the common issue of ambiguity in CFG parsing, where a
single input string can have multiple valid parse trees, leading to different
interpretations. 24
Example: Ambiguity in CFG Parsing

In Parse 1, "with the bone" modifies "dog". In Parse 2, "with the bone" modifies "chased".
Both parses are syntactically correct, but the semantic interpretations are different.
25
Shallow parsing
• Shallow parsing, also known as syntactic chunking or tokenization, is a
technique in Natural Language Processing (NLP) that involves identifying
the grammatical structure of a sentence at a basic level.
• Unlike deep parsing, which aims to analyze the entire syntactic structure
of a sentence (producing a full parse tree), shallow parsing provides a
more focused, yet less detailed analysis.
• Shallow parsing focuses on identifying phrases (noun phrases, verb
phrases, prepositional phrases, etc.) without delving into the
relationships between them.

Key Tasks in Shallow Parsing:


• Tokenization: Breaking a sentence into individual words or tokens.
• Part-of-Speech (POS) Tagging: Assigning a grammatical category
(e.g., noun, verb, adjective) to each word.
• Chunking: Identifying phrases within the sentence.

26
Shallow parsing
Example:
Sentence: The quick brown fox jumps over the lazy dog.

Tokenization: POS Tagging: Chunking:


✓ The ✓ The (DT) ✓ NP: The quick brown fox
✓ quick ✓ quick (JJ) ✓ VP: jumps over
✓ brown ✓ brown (JJ) ✓ NP: the lazy dog
✓ fox ✓ fox (NN)
✓ jumps ✓ jumps (VBZ)
✓ over ✓ over (IN)
✓ the ✓ the (DT)
✓ lazy ✓ lazy (JJ)
✓ dog ✓ dog (NN)

27
Top-Down Shallow Parsing Example
Sentence: The quick brown fox jumps over the lazy dog.

Parsing Process:
1.Start with the start symbol: S (sentence).
2.Expand the start symbol: S -> NP VP.
3.Expand the NP (noun phrase): S -> Det N VP.
4.Expand the Det (determiner): S -> the N VP.
5.Expand the N (noun): S -> the quick N VP.
6.Expand the N (noun): S -> the quick brown N VP.
7.Expand the N (noun): S -> the quick brown fox VP.
8.Expand the VP (verb phrase): S -> the quick brown fox V NP.
9.Expand the V (verb): S -> the quick brown fox jumps NP.
10.Expand the NP (noun phrase): S -> the quick brown fox jumps Det N.
11.Expand the Det (determiner): S -> the quick brown fox jumps the N.
12.Expand the N (noun): S -> the quick brown fox jumps the lazy N.
13.Expand the N (noun): S -> the quick brown fox jumps the lazy dog.

28
Top-Down Shallow Parsing Example

Sentence: The quick brown fox jumps over the lazy dog.
"The cat chased the mouse."
Applications of Shallow Parsing

• Shallow parsing, a technique in natural language processing (NLP) that


focuses on identifying the main syntactic elements of a sentence without
delving into deep syntactic details, has various practical applications:
Information Extraction
•Named entity recognition (NER): Identifying named entities like persons, organizations,
and locations.
•Relation extraction: Identifying relationships between entities, such as "works for" or
"is located in.“

Text Summarization
•Extractive summarization: Selecting the most important sentences or phrases from a
document.
•Abstractive summarization: Generating a new summary that captures the main ideas of
the original text.

Question Answering
•Question understanding: Analyzing the syntactic structure of a question to determine
its intent.
•Answer retrieval: Finding relevant information from a knowledge base or document.
Thanks

31

You might also like