Week 8 Semantics Lab
a) Use cosine similarity (the normalized dot product) and Euclidean distance to determine
which of the words ‘cherry’ and ‘digital’ has a vector closer to that of ‘information’. Discuss
the strengths and weaknesses of this way of calculating word similarity.
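A minimal sketch of both measures, assuming small toy count vectors — the context dimensions and counts below are illustrative placeholders, not the lab's actual data:

```python
import math

def cosine(u, v):
    """Normalized dot product of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Placeholder co-occurrence count vectors -- substitute the counts given
# in the lab; the three context dimensions here are assumed.
cherry      = [442, 8, 2]
digital     = [5, 1683, 1670]
information = [5, 3982, 3325]

print("cosine(cherry, information):   ", cosine(cherry, information))
print("cosine(digital, information):  ", cosine(digital, information))
print("euclidean(cherry, information):", euclidean(cherry, information))
print("euclidean(digital, information):", euclidean(digital, information))
```

Note that cosine ignores vector length (only direction matters), while Euclidean distance is dominated by large raw counts — a point worth raising in the discussion.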
b) One of the most important concepts in NLP is Pointwise Mutual Information (PMI). It is
a measure of how often two events x and y occur together, compared with what we would
expect if they occurred independently. The pointwise mutual information between a target
word w and a context word c is defined as:
PMI(w, c) = log2 [ P(w, c) / (P(w) P(c)) ]
Use the PMI values to recalculate cosine similarity and Euclidean distance.
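One way to set this up is to compute a PMI value for every (word, context) cell of the co-occurrence table and then reuse the similarity measures on the PMI vectors. A sketch, with a hypothetical toy count table in place of the lab's data:

```python
import math

def pmi_table(counts):
    """counts: dict mapping (word, context) -> co-occurrence count.
    Returns a dict mapping (word, context) -> PMI(w, c),
    where PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) )."""
    total = sum(counts.values())
    word_totals, ctx_totals = {}, {}
    for (w, c), n in counts.items():
        word_totals[w] = word_totals.get(w, 0) + n
        ctx_totals[c] = ctx_totals.get(c, 0) + n
    pmi = {}
    for (w, c), n in counts.items():
        p_wc = n / total
        p_w = word_totals[w] / total
        p_c = ctx_totals[c] / total
        pmi[(w, c)] = math.log2(p_wc / (p_w * p_c))
    return pmi

# Toy counts (hypothetical) -- replace with the lab's co-occurrence table,
# then feed the resulting PMI vectors to the cosine/Euclidean functions.
counts = {
    ("cherry", "pie"): 442, ("cherry", "computer"): 8,
    ("digital", "pie"): 5, ("digital", "computer"): 1683,
}
print(pmi_table(counts))
```

PMI is positive when w and c co-occur more often than chance, negative when less often, and zero when they are independent; zero counts must be handled separately, since log2(0) is undefined.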
I absolutely love this place. The 360 degree glass windows with the Yerba buena
garden view transports you to what feels like a different zen zone within the city.
[...]
a) Using WordNet, determine how many senses each of the open-class words in each
sentence has. How many distinct combinations of senses are there for each sentence?
How does this number vary with sentence length? What pre-processing steps are
necessary for this kind of sense assignment?
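The combination count is just the product of the per-word sense counts, so it grows multiplicatively with every open-class word added. A sketch, using placeholder sense counts — in practice these would come from WordNet, e.g. `len(nltk.corpus.wordnet.synsets(word))`:

```python
from math import prod

# Placeholder sense counts per open-class word (illustrative values only;
# look up the real counts with NLTK's WordNet interface).
sense_counts = {"love": 10, "place": 16, "degree": 8, "glass": 9}

def sense_combinations(open_class_words, counts):
    """Number of distinct sense assignments for a sentence: the product of
    each open-class word's sense count. Because each word multiplies the
    total, the count rises roughly exponentially with sentence length."""
    return prod(counts[w] for w in open_class_words)

print(sense_combinations(["love", "place"], sense_counts))
```

Running this on each sentence of the corpus makes the length effect in the question directly observable.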
b) Using WordNet, tag each open-class word in your corpus with its correct sense. Was
choosing the correct sense always a straightforward task? Report on any difficulties you
encountered.
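A classic automatic baseline for comparison with your manual choices is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence context. The two-sense gloss inventory below is a hypothetical toy example; with NLTK one would use the real glosses via `synset.definition()`:

```python
def simplified_lesk(context, glosses):
    """Return the sense whose gloss overlaps most with the context words.
    glosses: dict mapping sense name -> gloss string (toy inventory here)."""
    ctx = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical miniature sense inventory for "game"
game_glosses = {
    "game.n.01": "a contest with rules played for amusement",
    "game.n.04": "animal hunted for food or sport",
}
print(simplified_lesk("this crap game is played with dice", game_glosses))
```

Its failure cases (short glosses, little overlap, ties) tend to coincide with exactly the words that are hard to tag by hand, which is useful material for the report.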
Word    NN   JJ   DT   PRP   VBZ
This     5    0    5     5     0
crap     7    8    0     0     0
game    10    5    0     0     0
is       0    0    0     0    15
over     3    3    3     3     3
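Per-POS counts like those in the table can be gathered by querying WordNet with a POS restriction, which first requires mapping Penn Treebank tags onto WordNet's four categories. A sketch of that standard mapping (the lookup call in the comment assumes NLTK):

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter
    ('n', 'v', 'a', 'r'), or None for tags WordNet does not cover."""
    if tag.startswith("NN"):
        return "n"   # nouns
    if tag.startswith("VB"):
        return "v"   # verbs
    if tag.startswith("JJ"):
        return "a"   # adjectives
    if tag.startswith("RB"):
        return "r"   # adverbs
    return None      # closed-class tags (DT, PRP, ...) have no WordNet entries

# With NLTK, the per-POS sense count would then be, e.g.:
#   len(nltk.corpus.wordnet.synsets(word, pos=penn_to_wordnet(tag)))
print(penn_to_wordnet("NNS"), penn_to_wordnet("VBZ"), penn_to_wordnet("DT"))
```

Note that closed-class tags map to None, so any sense counts reported under DT or PRP must come from POS-ambiguous surface forms rather than from WordNet entries for those tags.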