CSCI 5832 Natural Language Processing: Jim Martin
Jim Martin
Lecture 9
Today 2/19
Statistical Sequence Classification
• Given an input sequence, assign a label (or tag) to each element of the tape
  Or: given an input tape, write a tag out to an output tape for each cell on the input tape
• Can be viewed as a classification task if we view
  the individual cells on the input tape as things to be classified, and
  the tags written on the output tape as the class labels (see the sketch below)
POS Tagging as Sequence Classification
Road to HMMs
Likelihood and Prior
Transition Probabilities
Observation Probabilities
An Example: the verb “race”
Disambiguating “race”
Example
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO)P(NR|VB)P(race|VB) = .00000027
• P(NN|TO)P(NR|NN)P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
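A quick check of the arithmetic, multiplying out the two competing products from the probabilities listed above:

```python
# Scores for the two readings of "race" in the TO ... race ... NR context.
p_vb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)

print(f"verb reading: {p_vb:.2e}")  # ~2.7e-07
print(f"noun reading: {p_nn:.2e}")  # ~3.2e-10
print("verb reading wins:", p_vb > p_nn)
```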
Markov chain for words
Markov chain = “First-order Observable Markov Model”
• A set of states
  Q = q_1, q_2, …, q_N; the state at time t is q_t
• Transition probabilities:
  A set of probabilities A = a_01, a_02, …, a_n1, …, a_nn.
  Each a_ij represents the probability of transitioning from state i to state j.
  The set of these is the transition probability matrix A.
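As an illustration of these definitions, a minimal sketch with an invented state set and transition matrix A (the numbers are placeholders, not corpus estimates), scoring a state sequence by multiplying the a_ij along it:

```python
import numpy as np

# Toy first-order Markov chain over a few POS-like states.
states = ["<s>", "DT", "NN", "VB"]
A = np.array([
    # <s>   DT    NN    VB
    [0.0,  0.6,  0.2,  0.2],   # from <s>
    [0.0,  0.0,  0.9,  0.1],   # from DT
    [0.0,  0.1,  0.3,  0.6],   # from NN
    [0.0,  0.4,  0.4,  0.2],   # from VB
])
idx = {s: i for i, s in enumerate(states)}

def sequence_probability(tags):
    # Probability of a state sequence: the product of a_ij for each
    # transition, starting from the <s> state.
    p, prev = 1.0, "<s>"
    for t in tags:
        p *= A[idx[prev], idx[t]]
        prev = t
    return p

print(sequence_probability(["DT", "NN", "VB"]))  # 0.6 * 0.9 * 0.6 = 0.324
```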
B observation likelihoods for the POS HMM
The A matrix for the POS HMM
The B matrix for the POS HMM
Viterbi intuition: we are looking for the best ‘path’
[Figure: Viterbi lattice over word positions S1–S5, with candidate tags (NNP, VB, VBD, VBN, TO, NN, RB, JJ, DT) at each position; the chosen tag sequence is the best path through the lattice]
Viterbi example
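A compact Viterbi sketch, assuming invented toy values for the start probabilities and the A (transition) and B (observation likelihood) matrices described above; the numbers are stand-ins, not the lecture's actual counts:

```python
import numpy as np

states = ["DT", "NN", "VB"]
pi = np.array([0.6, 0.3, 0.1])          # start probabilities (invented)
A = np.array([[0.1, 0.8, 0.1],          # DT -> {DT, NN, VB}
              [0.1, 0.3, 0.6],          # NN -> ...
              [0.4, 0.5, 0.1]])         # VB -> ...
vocab = {"the": 0, "race": 1, "ended": 2}
B = np.array([[0.9, 0.05, 0.05],        # P(word | DT)
              [0.1, 0.6,  0.3],         # P(word | NN)
              [0.05, 0.45, 0.5]])       # P(word | VB)

def viterbi(words):
    obs = [vocab[w] for w in words]
    T, N = len(obs), len(states)
    v = np.zeros((T, N))                # best path probability ending in each state
    back = np.zeros((T, N), dtype=int)  # backpointers
    v[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            back[t, j] = scores.argmax()
            v[t, j] = scores.max()
    # Follow backpointers from the best final state.
    path = [int(v[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi(["the", "race", "ended"]))  # ['DT', 'NN', 'VB']
```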
Information Theory
Information Theory
Aside on logs
Entropy
Entropy cont.
Entropy cont.
log P(S) = 4·log P(fish) + 1·log P(one) + 1·log P(two) + 1·log P(red) + 1·log P(blue)
Entropy cont.
(1/N) log P(S) = (1/N) ∑_{w∈V} Count(w) log P(w)
Entropy
= ∑_{w∈V} (Count(w)/N) log P(w)
• Rewriting Count(w)/N as P(w) and flipping the sign (each log P(w) is negative) gives the entropy:
H(S) = −∑_{w∈V} P(w) log P(w)
Entropy
H(S) = −∑_{w∈V} P(w) log P(w)
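A worked check of this formula, using the word counts implied by the “fish” example above (4 × fish and one each of one, two, red, blue, so N = 8), and assuming base-2 logs so the result is in bits:

```python
from math import log2

# Relative frequencies stand in for P(w).
counts = {"fish": 4, "one": 1, "two": 1, "red": 1, "blue": 1}
N = sum(counts.values())

H = -sum((c / N) * log2(c / N) for c in counts.values())
print(f"H(S) = {H:.3f} bits per word")   # 0.5*1 + 4*(0.125*3) = 2.0 bits
```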
Model Evaluation
Model Evaluation
Viterbi
Forward
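A minimal sketch of the forward computation, reusing the same invented toy matrices as the Viterbi sketch above; the only change is summing over previous states (to get the total probability of the observation sequence) rather than taking the max:

```python
import numpy as np

states = ["DT", "NN", "VB"]
pi = np.array([0.6, 0.3, 0.1])
A = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.3, 0.6],
              [0.4, 0.5, 0.1]])
vocab = {"the": 0, "race": 1, "ended": 2}
B = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.6,  0.3],
              [0.05, 0.45, 0.5]])

def forward(words):
    obs = [vocab[w] for w in words]
    alpha = pi * B[:, obs[0]]          # alpha_1(j) = pi_j * b_j(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # sum over previous states, then weight by b_j(o_t)
    return alpha.sum()                 # total probability of the observation sequence

print(forward(["the", "race", "ended"]))  # likelihood of the word sequence under the HMM
```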
Who cares?
Learning Models
EM
Generative vs. Discriminative Models
Disambiguating “race”
Discriminative Models
MaxEnt Tagging
MaxEnt Tagging
Statistical Sequence Classification