Unit 2
Lexical Semantics
What is Parsing?
Parsing in natural language means analyzing an input sentence in terms of its
grammatical constituents: identifying the parts of speech and the syntactic
relations between them.
A parser uses the grammar rules to verify whether the input text is
syntactically valid.
The main functions of a parser are as follows:
o The parser reports any syntax errors in the text.
o The parser helps recover from the most commonly occurring errors so
that processing of the rest of the input does not halt.
o The parser generates the parse tree (a sketch follows this list).
o The parser helps create the symbol table that is used by various
stages of the NLP pipeline, which makes it quite important.
o The parser also helps in the production of Intermediate
Representations (IR).
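
As a minimal sketch (not part of the original notes), parse-tree generation
with NLTK's chart parser over a toy context-free grammar; the grammar and the
sentence are invented for this example:

import nltk

# A toy context-free grammar; both the grammar and the sentence
# are made-up examples for illustration only.
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N | 'John'
    VP -> V NP
    Det -> 'the'
    N  -> 'pizza'
    V  -> 'likes'
""")

parser = nltk.ChartParser(grammar)

# Parsing yields zero or more parse trees; zero trees means the
# sentence is not valid under the grammar.
for tree in parser.parse(["John", "likes", "the", "pizza"]):
    tree.pretty_print()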
Linguistic Constituents and Constituency Tests
Linguistic constituents are the building blocks of sentences and phrases in natural language.
They are smaller units that combine to form larger structures. Constituency tests are methods
used in linguistics to determine the boundaries and structure of these constituents within a
sentence or phrase.
There are several tests used to identify constituents:
o Substitution Test: Replace a group of words with a single word, such as a pronoun. If the
substitution is grammatical and makes sense, then the group of words is likely a
constituent. Example: "Samantha quickly drank her coffee." -> "Samantha quickly drank it."
o Movement Test: Move a group of words within the sentence. If the resulting sentence is
grammatical and retains the same meaning, then the group of words is likely a
constituent. Example: "Samantha quickly drank her coffee." -> "Her coffee, Samantha
quickly drank."
o Coordination Test: Combine two similar phrases using a conjunction like "and" or "or." If
the resulting phrase is grammatical, then the original phrases are constituents. Example:
"John likes pizza. Mary likes pasta." -> "John likes pizza and Mary likes pasta."
Partial or Shallow Parsing
Partial or shallow parsing, also known as chunking, is a technique used in natural language
processing (NLP) to identify and group together certain types of linguistic elements within a
sentence, typically without analyzing their internal syntactic structure as deeply as in full
parsing.
Partial parsing is often used in applications where a less detailed analysis of the text is
sufficient, such as information extraction, named entity recognition, or shallow semantic
analysis.
Partial parsing can be implemented using various techniques, including rule-based
approaches, regular expressions, or machine learning algorithms such as sequence labeling
models (e.g., Hidden Markov Models, Conditional Random Fields) or neural network-based
models (e.g., LSTM, Transformer-based models).
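
As a brief sketch, NP chunking with NLTK's rule-based RegexpParser; the tag
pattern and the POS-tagged sentence are illustrative assumptions (the sentence
reuses the earlier constituency-test example):

import nltk

# A POS-tagged sentence (Penn Treebank tags), assumed for illustration
sentence = [("Samantha", "NNP"), ("quickly", "RB"),
            ("drank", "VBD"), ("her", "PRP$"), ("coffee", "NN")]

# Rule-based chunk grammar: an NP is an optional determiner or
# possessive pronoun, any number of adjectives, then a noun
chunker = nltk.RegexpParser(r"NP: {<DT|PRP\$>?<JJ>*<NN.*>}")

# The result is a shallow tree: NP chunks are grouped, but their
# internal structure is not analyzed any further
print(chunker.parse(sentence))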
Dependency Parsing
Dependency parsing is a technique in natural language processing (NLP) that aims to analyze the
grammatical structure of a sentence by identifying the relationships between words.
The head of a dependency relation is the word that governs the relationship,
while the dependent is the word that is governed by the head.
Dependency parsing can be performed using various algorithms, including deterministic algorithms,
graph-based algorithms, transition-based algorithms, and neural network-based models.
Dependency parsing is widely used in NLP for various tasks, including:
o Syntax Analysis: Dependency parsing provides valuable insights into the syntactic structure of
sentences, helping to identify relationships between words and their roles in the sentence.
o Information Extraction: Dependency parsing can be used to extract structured information
from unstructured text by identifying relationships between entities and attributes.
o Machine Translation: Dependency parsing can assist in aligning the syntactic structures of
sentences in different languages, which is useful for machine translation systems.
o Question Answering: Dependency parsing can aid in understanding the syntactic structure of
questions and identifying the relationships between question words and their corresponding
answers.
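
As an illustration, a dependency parse with spaCy; this assumes the small
English model en_core_web_sm is installed, and reuses the earlier example
sentence:

import spacy

# Assumes the small English model has been installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Samantha quickly drank her coffee.")

# Each token depends on a head token via a labeled relation;
# the root of the sentence is its own head
for token in doc:
    print(f"{token.text:10} --{token.dep_}--> {token.head.text}")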
Word Senses
In natural language processing (NLP), the concept of word senses refers to the different
meanings that a word can have in different contexts.
Words often have multiple senses, and understanding these senses is crucial for accurately
interpreting and processing natural language text.
Here are some key aspects and techniques related to word senses in NLP:
o Polysemy: Many words in natural language are polysemous, meaning they have multiple
senses. For example, the word "bank" can refer to a financial institution or the side of a
river.
o WordNet: WordNet is a lexical database of the English language that organizes words
into synsets (sets of synonyms) and provides information about their semantic
relationships, including hypernyms (more general terms), hyponyms (more specific
terms), meronyms (part-whole relationships), and holonyms (whole-part relationships).
o Context Window: In many word sense disambiguation (WSD) approaches, the context of a
word is defined by a fixed-size window of surrounding words in the text.
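
A small sketch of context-window-based disambiguation using NLTK's
implementation of the simplified Lesk algorithm; the example sentence is an
assumption for illustration, and the WordNet data must be downloaded first:

from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

# Requires the NLTK data packages "wordnet" and "punkt",
# e.g. via nltk.download("wordnet")
context = word_tokenize("I deposited the cheque at the bank")

# The simplified Lesk algorithm chooses the WordNet sense whose
# gloss overlaps most with the context window
sense = lesk(context, "bank", pos="n")
print(sense, "-", sense.definition())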
WordNet
WordNet is a lexical database and resource for the English language that is widely used in
natural language processing (NLP) and computational linguistics.
Here are some key features and aspects of WordNet:
o Organization: WordNet organizes words into sets called synsets, each of which
groups words that are synonymous in a particular sense. Each synset
corresponds to a specific word sense or meaning. For example, the word "bank" in
WordNet has multiple synsets corresponding to its different meanings, such as "financial
institution" and "sloping land."
o Semantic Relationships: WordNet includes various semantic relationships between words
and synsets, such as:
▪ Hypernyms: More general terms (e.g., "animal" is a hypernym of "dog").
▪ Hyponyms: More specific terms (e.g., "dog" is a hyponym of "animal").
▪ Meronyms: Part-whole relationships (e.g., "wheel" is a meronym of "car").
▪ Holonyms: Whole-part relationships (e.g., "car" is a holonym of "wheel").
o Part-of-Speech Tags: WordNet organizes words and synsets by their part of speech
(POS), including nouns, verbs, adjectives, and adverbs. This allows users to access
specific sets of synsets based on the POS of the word.
o Availability: WordNet is freely available for research and educational purposes and can
be accessed through various interfaces and programming libraries, such as NLTK (Natural
Language Toolkit) in Python.
o Applications: WordNet is used in a wide range of NLP tasks and applications, including
word sense disambiguation, information retrieval, machine translation, text
summarization, sentiment analysis, and more. It provides a valuable resource for
understanding the semantic relationships between words and disambiguating word
senses in text.
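
A short sketch of querying WordNet through NLTK, showing the synsets of "bank"
and two of the semantic relations described above; exact synset names and
glosses may vary across WordNet versions:

from nltk.corpus import wordnet as wn

# Requires the WordNet data, e.g. via nltk.download("wordnet")

# All synsets (senses) of "bank", each with its gloss
for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())

# Semantic relations for the "financial institution" sense
# (this synset name is from WordNet 3.0 and may differ elsewhere)
bank = wn.synset("depository_financial_institution.n.01")
print(bank.hypernyms())  # more general terms
print(bank.hyponyms())   # more specific terms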