Computational Linguistics, University of Passau
Prof. Dr. Annette Hautli-Janisz

Lab session, Summer 2022

Week 8: Lexical and distributional semantics

Task 1: Word similarity


Consider the frequencies in the following word co-occurrence matrix:

              pie   data   computer
cherry        442      8          2
digital         5   1683       1670
information     5   3982       3325

a) Use cosine similarity (the normalized dot product) and Euclidean distance to determine
which of the vectors for the words ‘cherry’ and ‘digital’ is closer to the vector for
‘information’. Discuss this way of calculating word similarity.
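
As a sanity check for the hand calculation, here is a minimal Python sketch (assuming
NumPy is available) that computes both measures for the count vectors from the table:

```python
import numpy as np

# Count vectors from the co-occurrence table (columns: pie, data, computer).
cherry      = np.array([442.,    8.,    2.])
digital     = np.array([  5., 1683., 1670.])
information = np.array([  5., 3982., 3325.])

def cosine(u, v):
    # Normalized dot product: u.v / (|u| |v|).
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def euclidean(u, v):
    return np.linalg.norm(u - v)

for name, vec in [("cherry", cherry), ("digital", digital)]:
    print(f"{name:8s} cosine: {cosine(vec, information):.4f}  "
          f"Euclidean: {euclidean(vec, information):.1f}")
```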

b) One of the most important concepts in NLP is Pointwise Mutual Information (PMI). It is
a measure of how often two events x and y occur together, compared with what we would
expect if they were independent. The pointwise mutual information between a target word
w and a context word c is defined as:

PMI(w, c) = log2 [ P(w, c) / (P(w) P(c)) ]

Use the PMI values to recalculate cosine similarity and Euclidean distance.
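
A minimal sketch of the PMI recomputation, assuming the probabilities are estimated from
this 3x3 matrix alone (in practice PMI is computed over a full corpus, and negative values
are often clipped to zero, giving PPMI):

```python
import numpy as np

# Same counts as above; rows = target words, columns = context words.
counts = np.array([[442.,    8.,    2.],
                   [  5., 1683., 1670.],
                   [  5., 3982., 3325.]])

p_wc = counts / counts.sum()               # joint probabilities P(w, c)
p_w  = p_wc.sum(axis=1, keepdims=True)     # marginals P(w)
p_c  = p_wc.sum(axis=0, keepdims=True)     # marginals P(c)

pmi  = np.log2(p_wc / (p_w * p_c))         # PMI(w,c) = log2 P(w,c) / (P(w) P(c))
ppmi = np.maximum(pmi, 0)                  # PPMI: clip negative values to zero

print(np.round(pmi, 2))
```

The cosine and Euclidean helpers from the previous sketch can then be applied to the rows
of pmi (or ppmi) instead of the raw counts.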

Task 2: Word sense assignment


This is a customer review taken from Kaggle (https://www.kaggle.com/code/econdata/
exercise-word-vectors/notebook):

I absolutely love this place. The 360 degree glass windows with the Yerba buena
garden view transports you to what feels like a different zen zone within the city.
[...]

a) Using WordNet, determine how many senses there are for each of the open-class words
in each sentence. How many distinct combinations of senses are there for each sentence?
How does this number vary with sentence length? What pre-processing steps are necessary
for this way of assigning meaning to sentences?
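
One way to approach this programmatically is NLTK's WordNet interface. The sketch below
(assuming the nltk package and its punkt, averaged_perceptron_tagger and wordnet data are
installed) counts the senses of each open-class word in one sentence and multiplies them
to get the number of sense combinations:

```python
import math
import nltk
from nltk.corpus import wordnet as wn

sentence = "I absolutely love this place."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# Map Penn Treebank tag prefixes to WordNet POS (open-class words only).
tag_map = {'N': wn.NOUN, 'V': wn.VERB, 'J': wn.ADJ, 'R': wn.ADV}

counts = []
for word, tag in tagged:
    pos = tag_map.get(tag[0])
    if pos:
        n = len(wn.synsets(word, pos=pos))
        print(f"{word} ({tag}): {n} senses")
        if n:
            counts.append(n)

print("possible sense combinations:", math.prod(counts))
```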

b) Using WordNet, tag each open-class word in your corpus with its correct sense. Was
choosing the correct sense always a straightforward task? Report on any difficulties you
encountered.
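
As a baseline to compare your manual choices against, NLTK also ships a simplified Lesk
disambiguator (nltk.wsd.lesk), which picks the synset whose gloss overlaps most with the
sentence context:

```python
from nltk import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("I absolutely love this place.")
sense = lesk(context, "love", pos='v')   # simplified Lesk algorithm
print(sense.name(), "-", sense.definition())
```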

Task 3: Fleiss’ kappa


Fleiss’ κ measures inter-annotator agreement for more than two annotators and indicates
the extent of agreement above the level that would be achieved if the annotators made
their judgements randomly. It is defined as

κ = (P̄ − P̄e) / (1 − P̄e),

with P̄ as the actual (observed) agreement and P̄e as the expected agreement by chance.
Use the judgements on POS labeling in the agreement table below to calculate Fleiss’ κ.

       NN   JJ   DT   PRP   VBZ
This    5    0    5     5     0
crap    7    8    0     0     0
game   10    5    0     0     0
is      0    0    0     0    15
over    3    3    3     3     3
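
To check the hand calculation, here is a minimal NumPy sketch of Fleiss’ κ for this table;
since every row sums to 15, it assumes 15 annotators judged each token:

```python
import numpy as np

# Rows: tokens, columns: POS labels (NN, JJ, DT, PRP, VBZ).
table = np.array([[ 5, 0, 5, 5,  0],
                  [ 7, 8, 0, 0,  0],
                  [10, 5, 0, 0,  0],
                  [ 0, 0, 0, 0, 15],
                  [ 3, 3, 3, 3,  3]], dtype=float)

n = table.sum(axis=1)[0]                               # annotators per item
P_i   = ((table ** 2).sum(axis=1) - n) / (n * (n - 1)) # per-item agreement
P_bar = P_i.mean()                                     # observed agreement
p_j   = table.sum(axis=0) / table.sum()                # label proportions
P_e   = (p_j ** 2).sum()                               # chance agreement

kappa = (P_bar - P_e) / (1 - P_e)
print(f"P_bar = {P_bar:.3f}, P_e = {P_e:.3f}, kappa = {kappa:.3f}")
```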
