Natural Language Processing: N-Gram Language Models
Language Models
Formal grammars (e.g., regular, context-free) give a hard, binary model of the legal sentences in a language.
For NLP, a probabilistic model of a language, one that gives the probability that a string is a member of the language, is more useful.
To specify a correct probability distribution, the probabilities of all sentences in the language must sum to 1.
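Written as a constraint, where $L$ is the set of all sentences in the language:

$$\sum_{s \in L} P(s) = 1$$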
Uses of Language Models
Speech recognition
“I ate a cherry” is a more likely sentence than “Eye eight uh Jerry”
OCR & Handwriting recognition
More probable sentences are more likely correct readings.
Machine translation
More likely sentences are probably better translations.
Generation
More likely sentences are probably better NL generations.
Context sensitive spelling correction
“Their are problems wit this sentence.”
Completion Prediction
N-Gram Models
More formally, we will assess the conditional probability of candidate words.
We will also assess the probability of an entire sequence of words.
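The probability of a whole sequence can be built from conditional probabilities with the chain rule:

$$P(w_1 w_2 \ldots w_n) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})$$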
Being able to predict the next word (or any other linguistic unit) in a sequence is needed in many applications.
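A minimal sketch of what "predicting the next word" looks like computationally; the tiny probability table below is purely illustrative:

# Toy table of conditional probabilities P(next | previous); values are made up.
bigram_prob = {
    "i": {"want": 0.6, "ate": 0.2, "saw": 0.1},
    "want": {"to": 0.6, "a": 0.2, "some": 0.1},
    "to": {"eat": 0.4, "go": 0.3, "spend": 0.1},
}

def predict_next(word):
    """Return the most probable next word after `word`, or None if the context is unseen."""
    candidates = bigram_prob.get(word)
    return max(candidates, key=candidates.get) if candidates else None

print(predict_next("want"))  # -> "to"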
Applications
Next-word prediction lies at the core of the following applications:
Automatic speech recognition
Handwriting and character recognition
Spelling correction
Machine translation
Augmentative communication
Word similarity, generation, POS tagging, etc.
Counting
He stepped out into the hall, was delighted to encounter a water brother.
13 tokens; 15 if we include “,” and “.” as separate tokens.
Assuming we include the comma and period, how many bigrams are there?
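A sequence of N tokens has N-1 adjacent bigrams, so 15 tokens give 14 bigrams. A quick sketch that counts them; the naive punctuation splitting is only illustrative:

# Count tokens and bigrams in the example sentence.
sentence = "He stepped out into the hall, was delighted to encounter a water brother."
# Naively split punctuation off as separate tokens; a real tokenizer would be more careful.
tokens = sentence.replace(",", " ,").replace(".", " .").split()
bigrams = list(zip(tokens, tokens[1:]))  # adjacent pairs of tokens

print(len(tokens))   # 15 tokens, counting "," and "."
print(len(bigrams))  # 14 bigrams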
Issues in deciding what counts as a token:
•Numbers
•Misspellings
•Names
•Acronyms
•etc.
The n-gram approximation (Markov assumption):
$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1})$$
Bigram version:
$$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1})$$
Estimating bigram probabilities from counts (maximum likelihood):
$$P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i)}{\mathrm{count}(w_{i-1})}$$
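A minimal sketch of estimating these maximum-likelihood bigram probabilities from counts (the toy corpus and function name are illustrative):

from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigram_counts.update(tokens[:-1])             # each token except the last is a context w_{i-1}
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs (w_{i-1}, w_i)
    return {
        (prev, cur): count / unigram_counts[prev]
        for (prev, cur), count in bigram_counts.items()
    }

model = train_bigram_model(["i want to eat", "i want chinese food"])
print(model[("want", "to")])  # 0.5: "want to" occurs once out of two "want" contexts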
Example bigram probabilities, and the kind of knowledge they reflect:
P(english | want) = .0011   (world knowledge)
P(chinese | want) = .0065   (world knowledge)
P(to | want) = .66          (syntax)
P(eat | to) = .28           (syntax)
P(food | to) = 0            (syntax)
P(want | spend) = 0         (syntax)
P(i | <s>) = .25            (discourse)
A bad language model
(example figures omitted)
Evaluation
How do we know if one model is better than another?
Shannon’s game gives us an intuition: the generated texts from the higher-order models sound more like the text the model was obtained from.
For bigrams:
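A minimal sketch of Shannon-style generation for bigrams, assuming a bigram model such as the one estimated above (all names are illustrative):

import random

def generate(model, max_len=20):
    """Sample a sentence word by word from bigram probabilities until </s>."""
    word, output = "<s>", []
    for _ in range(max_len):
        # Candidate next words and their conditional probabilities given `word`.
        candidates = {cur: p for (prev, cur), p in model.items() if prev == word}
        if not candidates:
            break
        word = random.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

# `model` is the probability dictionary from the training sketch above.
print(generate(model))  # e.g. "i want chinese food"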