Semantic Analysis
2018/19 Sem I
• Semantic Analysis involves extraction of context-independent aspects of a sentence's
meaning, including the semantic roles of entities mentioned in the sentence, and
quantification information, such as cardinality, iteration, and dependency.
Syntactic Analysis → Parse tree → Semantic Analysis → Context-independent meaning + other relevant information
• There is a close link between the life of a society and the lexicon of the language it speaks.
For example:
words for ice and snow in languages of communities where these matter in daily life
politeness distinctions that are encoded in Amharic but not in English
• This theory is based on the notion of semantic primitives which are claimed to have the
following properties:
Universally meaningful concepts
Do not need definition in terms of other words
Represent basic concepts
Reduce a language into a core that enables the full development of the
language (Natural Semantic Metalanguage, NSM)
Every language has essentially the same NSM though the syntax may differ.
Substantives: I, YOU, SOMEONE, PEOPLE, SOMETHING/THING, BODY
Determiners: THIS, THE SAME, OTHER
Quantifiers: ONE, TWO, SOME, ALL, MUCH/MANY
Evaluators: GOOD, BAD
Descriptors: BIG, SMALL
Mental predicates: THINK, KNOW, WANT, FEEL, SEE, HEAR
Speech: SAY, WORDS, TRUE
Actions, events and movement: DO, HAPPEN, MOVE
Existence and possession: THERE IS, HAVE
Life and death: LIVE, DIE
Time: WHEN/TIME, NOW, BEFORE, AFTER, A LONG TIME, A SHORT TIME, FOR SOME TIME
Space: WHERE/PLACE, HERE, ABOVE, BELOW, FAR, NEAR, SIDE, INSIDE
Logical concepts: NOT, MAYBE, CAN, BECAUSE, IF
Intensifier, augmentor: VERY, MORE
Taxonomy, partonomy: KIND OF, PART OF
Similarity: LIKE
• First-Order Predicate Calculus (FOPC) is not ideal as a meaning representation and doesn't do everything we want, but it comes close:
Supports the determination of truth
Supports compositionality of meaning
Supports question-answering (via variables)
Supports inference
Connective → ∨ | ∧ | ⇒
Quantifier → ∃ | ∀
Constant → A | Abebe | Car1
Variable → x | y | z | ...
Predicate → Red | Owns | Serves | Near | ...
Function → FatherOf | Plus | LocationOf | ...
• Terms are names used to represent objects.
Constants
Refer to a specific object in the world being described.
Conventionally depicted as single capitalized letters such as A and B or
single capitalized words like Abebe.
For example: Abebe, Car1
Functions
Correspond to concepts that are often expressed as possessives.
Provide a convenient way to refer to specific objects without having to
associate a named constant with them.
For example: LocationOf(AAU), FatherOf(Abebe), Plus(1,2)
Variables
Used to make assertions and draw inferences about objects without having
to make any reference to any particular named object.
Making statements about a particular unknown object;
Making statements about all the objects in some world of objects.
• A predicate represents a property of or relation between terms that can be true or false.
In a given interpretation, an n-ary predicate can be defined as a function from tuples
of n terms to {True, False}
For example: Brother(Abebe, Kebede), Left-of(Square1, Square2),
GreaterThan(plus(1,1), plus(0,1))
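To make the distinction between terms and predicates concrete, the following minimal Python sketch (an illustration, not part of the notes) interprets the constants, a FatherOf-style function, and the Owns predicate from the examples above against a tiny hand-built model; the underlying objects such as person-1 and car-42 are made up.

# Minimal sketch: interpreting FOPC constants, functions and predicates
# over a tiny hand-built model. Names follow the slide examples.

# Constants denote specific objects in the world being described.
constants = {"Abebe": "person-1", "Kebede": "person-2", "Car1": "car-42"}

# Functions map objects to objects, e.g. FatherOf(Abebe).
def father_of(obj):
    return {"person-1": "person-2"}.get(obj)   # assume Kebede is Abebe's father

# An n-ary predicate maps n-tuples of objects to True/False.
owns = {("person-1", "car-42")}                # Owns(Abebe, Car1) holds

def Owns(x, y):
    return (x, y) in owns

# Atomic sentences are predicates applied to terms:
print(Owns(constants["Abebe"], constants["Car1"]))              # True
print(Owns(father_of(constants["Abebe"]), constants["Car1"]))   # False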
P | Q | ¬P | P∧Q | P∨Q | P⇒Q
F | F | T  | F   | F   | T
F | T | T  | F   | T   | T
T | F | F  | F   | T   | F
T | T | F  | T   | T   | T
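As a quick check of the table, this short Python sketch (illustrative only) enumerates every truth assignment for P and Q and prints the same columns; material implication P ⇒ Q is computed as (not P) or Q.

from itertools import product

# Enumerate all truth assignments for P and Q and print the connectives.
print("P      Q      not P  P and Q  P or Q  P => Q")
for P, Q in product([False, True], repeat=2):
    implies = (not P) or Q          # material implication
    print(P, Q, not P, P and Q, P or Q, implies, sep="  ")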
• An atomic sentence is simply a predicate applied to a set of terms.
For example: Owns(Abebe, Car1)
Sold(Abebe, Car1, Kebede)
The semantics of an atomic sentence is True or False, depending on the interpretation.
Existential quantifier: ∃
Asserts that a sentence is true for at least one value of a variable x.
∃x Loves(x, FOPC)
∃x(Cat(x) ∧ Color(x,Black) ∧ Owns(Abebe,x))
• Universal and existential quantification are logically related to each other:
∀x ¬Loves(x,Abebe) ⇔ ¬∃x Loves(x,Abebe)
∀x Loves(x,Kebede) ⇔ ¬∃x ¬Loves(x,Kebede)
• General Identities
∀x ¬P ⇔ ¬∃x P
¬∀x P ⇔ ∃x ¬P
∀x P ⇔ ¬∃x ¬P
∃x P ⇔ ¬∀x ¬P
∀x (P(x)∧Q(x)) ⇔ ∀x P(x) ∧ ∀x Q(x)
∃x (P(x)∨Q(x)) ⇔ ∃x P(x) ∨ ∃x Q(x)
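Over a finite domain, ∀ behaves like Python's all() and ∃ like any(), so these identities can be sanity-checked mechanically. A minimal sketch, assuming a made-up domain and made-up predicates P and Q:

# Sanity check of the quantifier identities over a small finite domain.
# The domain and the predicates P and Q are made up for illustration.
domain = range(10)
P = lambda x: x % 2 == 0      # "x is even"
Q = lambda x: x > 3

# forall x. not P(x)  <=>  not (exists x. P(x))
assert all(not P(x) for x in domain) == (not any(P(x) for x in domain))

# not (forall x. P(x))  <=>  exists x. not P(x)
assert (not all(P(x) for x in domain)) == any(not P(x) for x in domain)

# forall x. (P(x) and Q(x))  <=>  (forall x. P(x)) and (forall x. Q(x))
assert all(P(x) and Q(x) for x in domain) == (
    all(P(x) for x in domain) and all(Q(x) for x in domain))

print("All identities hold on this domain.")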
• Sheraton Addis is a hotel
Hotel(SheratonAddis)
The universal statement ∀x EthiopianHotel(x) ⇒ Serves(x,EthiopianFood) (every Ethiopian hotel serves Ethiopian food) is true only if every substitution of a known object for x results in a sentence that is true:
EthiopianHotel(SheratonAddis) ⇒ Serves(SheratonAddis,EthiopianFood)
EthiopianHotel(HiltonAddis) ⇒ Serves(HiltonAddis,EthiopianFood)
EthiopianHotel(Ghion) ⇒ Serves(Ghion,EthiopianFood)
...
What happens when we consider a substitution from a set of objects that are
not Ethiopian hotels?
EthiopianHotel(KampalaSheraton) ⇒ Serves(KampalaSheraton, EthiopianFood)
EthiopianHotel(NairobiHilton) ⇒ Serves(NairobiHilton, EthiopianFood)
...
Because the antecedent is false for these objects, the implication is vacuously true, so the universal sentence still holds.
• Limitations:
If you are interested in football, Ethiopian Coffee is playing today
HaveInterestIn(Hearer, Football) ⇒ Playing(EthiopianCoffee, Today)
This translation is flawed: whenever the antecedent is false, the material implication comes out true, which does not match what the sentence means.
One more beer and I will fall off this stool
A simple-minded translation of this sentence might consist of a
conjunction of two clauses.
The use of the word and obscures the fact that this sentence instead has
an implication underlying it.
Your money or your life!
Here or does not express a simple logical disjunction of the two noun phrases.
• A semantic network is a network which represents semantic relations among
concepts.
This is often used as a form of knowledge representation.
It is a directed or undirected graph consisting of vertices, which represent
concepts, and edges, which represent semantic relations between them.
Example: a small network whose vertices include Cat, Fur and Vertebrae, connected by edges labelled "is a" and "has" (e.g. Cat -has-> Fur).
• A semantic network is used when one has knowledge that is best understood as a
set of concepts that are related to one another.
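As a small illustration of this kind of representation, the sketch below (not from the notes) stores a semantic network as a set of (source, relation, target) triples and follows "is a" links for simple inheritance; the Mammal node is an assumed addition used to complete the Cat/Fur/Vertebrae fragment above.

# Minimal semantic network: a set of (source, relation, target) edges.
# The "Mammal" node is an assumption added to complete the example.
edges = {
    ("Cat", "is a", "Mammal"),
    ("Cat", "has", "Fur"),
    ("Mammal", "has", "Vertebrae"),
}

def related(concept):
    """Return every (relation, target) pair leaving a concept vertex."""
    return [(rel, dst) for (src, rel, dst) in edges if src == concept]

def has_property(concept, prop):
    """Follow 'is a' links upward so that Cat inherits 'has Vertebrae'."""
    for rel, dst in related(concept):
        if rel == "has" and dst == prop:
            return True
        if rel == "is a" and has_property(dst, prop):
            return True
    return False

print(related("Cat"))                     # e.g. [('is a', 'Mammal'), ('has', 'Fur')] (order may vary)
print(has_property("Cat", "Vertebrae"))   # True, inherited via the 'is a' edge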
• The lexicon has a highly systematic structure that governs what words can mean and how
they can be used.
• The structure of a lexicon consists of relations among words and their meanings, as
well as the internal structure of individual words.
• Lexical Semantics is the linguistic study of the systematic, meaning-related
structure of the lexicon.
• Homonymy: refers to the relation that holds between words that have the same form
but different meaning.
• Example: 1. A bank can hold the investments in a custodial account in the client’s
name.
2. As the agriculture burgeons on the east bank, the river will shrink more.
This relationship is traditionally denoted by placing a superscript on the
orthographic form of the word as bank1 and bank2.
This notation indicates that these are two separate lexemes, with distinct and
unrelated meanings, that happen to share an orthographic form.
• Homophones: words with the same pronunciation but different spellings.
Example: to, two, too [too]
• Homographs: words with the same orthographic forms but different pronunciation.
Example: desert [dizurt/dezurt]
• Homonymy usually forces an application to deal with ambiguity.
• In Spelling Correction, homophones can lead to real-word spelling errors.
Example: weather and whether are erroneously interchanged.
• In Speech Recognition, homophones pose serious problems for language models.
Example: to, two and too cause obvious problems.
• Text-to-Speech Systems are particularly vulnerable to homographs with distinct
pronunciations.
The problem can be avoided through the use of part of speech tagging.
Example: desert (V) => [dizurt]
desert (N) => [dezurt]
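A minimal sketch of that idea (illustrative only): once a tagger has decided whether desert is a verb or a noun, the pronunciation can be looked up by the (word, POS) pair, using the slide's informal transcriptions.

# Illustrative lookup: choose a homograph's pronunciation from its POS tag.
# The transcription strings follow the slide's informal notation.
PRONUNCIATIONS = {
    ("desert", "V"): "dizurt",
    ("desert", "N"): "dezurt",
}

def pronounce(word, pos_tag):
    return PRONUNCIATIONS.get((word.lower(), pos_tag), word)

print(pronounce("desert", "V"))   # dizurt
print(pronounce("desert", "N"))   # dezurt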
• Polysemy: refers to the relation that holds between multiple related senses within a single lexeme.
Example: A bank can hold the investments in a custodial account in the client’s name.
• To distinguish homonymy from polysemy, two criteria are used: history and etymology
of the lexeme in question.
For example, etymology reveals that bank1 and bank2 have Italian and
Scandinavian origins, respectively.
However, the use of bank in blood bank is related to the sense of bank1,
and therefore there is no need to treat this usage as a homonym of bank1.
• Synonymy: refers to the relation that holds between different lexemes with the same
meaning.
• Examples:
human, hominid
teacher, educator
• WordNet is a freely and publicly available lexical database of English, in which lexemes
are interlinked by means of conceptual-semantic and lexical relations.
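As a concrete illustration (not part of the original notes), WordNet can be queried through NLTK's interface; this assumes nltk is installed and downloads the wordnet data on first use.

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)   # fetch the WordNet data on first use

# Each Synset groups lexemes that share one sense (a set of synonyms).
for synset in wn.synsets("teacher"):
    print(synset.name(), "-", synset.definition())
    print("  lemmas:", [lemma.name() for lemma in synset.lemmas()])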
• Latent Semantic Analysis (LSA) aims to discover something about the meaning
behind the words; about the topics in the documents.
• What is the difference between topics and words?
Words are observable
Topics are not observable; they are latent.
• How to find out topics from the words in an automatic way?
We can imagine them as:
a compression of words
a combination of words
• Implements the idea that the meaning of a passage is the sum of the meanings of
its words:
meaning(word_1) + meaning(word_2) + … + meaning(word_n) = meaning(passage)
• Under this “bag of words” view, a passage is treated as an unordered collection of word tokens whose meanings combine additively.
• By creating an equation of this kind for every passage of language that a learner
observes, we get a large system of linear equations.
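One way to read the additivity assumption: each word contributes a vector of meaning, and the passage's representation is the sum of those vectors, independent of word order. A tiny sketch with made-up 3-dimensional word vectors:

import numpy as np

# Made-up 3-dimensional "meaning" vectors, purely for illustration.
word_vectors = {
    "cats":  np.array([1.0, 0.2, 0.0]),
    "chase": np.array([0.1, 1.0, 0.3]),
    "mice":  np.array([0.9, 0.1, 0.1]),
}

# Bag-of-words additivity: passage meaning = sum of word meanings,
# regardless of word order.
passage = ["cats", "chase", "mice"]
meaning = sum(word_vectors[w] for w in passage)
print(meaning)   # same result for any permutation of the words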
• Represent the document as a vector where each entry corresponds to a different
word and the number at that entry corresponds to how many times that word was
present in the document (or some function of it)
Number of words is huge.
Select and use a smaller set of words that are of interest.
E.g. uninteresting words: ‘and’, ‘the’, ‘at’, ‘is’, etc. These are called stop-words.
Stemming: remove endings. E.g. ‘learn’, ‘learning’, ‘learnable’, ‘learned’
could be substituted by the single stem ‘learn’
Other simplifications can also be invented and used
The set of different remaining words is called the dictionary or vocabulary. Fix an
ordering of the terms in the dictionary so that you can refer to each term by its
index.
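Putting these steps together, the sketch below (illustrative; the toy corpus, stop-word list, and crude suffix-stripping stemmer are all made up) builds a term-document count matrix over a fixed vocabulary ordering and then applies an SVD to obtain a low-dimensional latent "topic" space in the spirit of LSA.

import re
import numpy as np

documents = [
    "The student is learning semantics and lexical semantics.",
    "Learners learned the meaning of words and passages.",
    "The bank holds investments in the client's account.",
]

STOP_WORDS = {"the", "and", "at", "is", "of", "in", "a"}

def stem(word):
    # Crude suffix stripping, only for illustration (not a real stemmer).
    for suffix in ("ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokens(text):
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS and len(w) > 1]

# Fix an ordering of the remaining terms: the vocabulary.
vocab = sorted({w for doc in documents for w in tokens(doc)})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix: entry (i, j) counts term i in document j.
A = np.zeros((len(vocab), len(documents)))
for j, doc in enumerate(documents):
    for w in tokens(doc):
        A[index[w], j] += 1

# LSA: a truncated SVD compresses words into a few latent "topics".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T   # each row: a document in topic space
print(vocab)
print(doc_topics)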