Chapter 6 - Exercises

Introduction to Information Retrieval
Chapter 6
Exercise 6.10
Consider the table of term frequencies for 3 documents denoted Doc1, Doc2, Doc3 in Figure 6.9.
Compute the tf-idf weights for the terms car, auto, insurance, best, for each document, using the idf
values from Figure 6.8.
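Figure 6.9 tf values N=806,791

term        Doc1    Doc2    Doc3
car         27      4       24
auto        3       33      0
insurance   0       33      29
best        14      0       17

Figure 6.8 idf values

term        idf
car         1.65
auto        2.08
insurance   1.62
best        1.5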
Solution
term        Doc1    Doc2    Doc3
car         44.55   6.6     39.6
auto        6.24    68.64   0
insurance   0       53.46   46.98
best        21      0       25.5
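The table above is the product tf x idf for each term and document. A minimal sketch of that arithmetic follows; the tf values are the ones that appear in the tf columns of the ntc.ltn tables later in this document, and the names TF, IDF and tf_idf are only illustrative.

```python
# Term frequencies per document, ordered [Doc1, Doc2, Doc3].
TF = {
    "car":       [27, 4, 24],
    "auto":      [3, 33, 0],
    "insurance": [0, 33, 29],
    "best":      [14, 0, 17],
}
# idf values used in the solution.
IDF = {"car": 1.65, "auto": 2.08, "insurance": 1.62, "best": 1.5}


def tf_idf(tf, idf):
    """Return {term: [tf * idf for each document]}, rounded to two decimals."""
    return {term: [round(f * idf[term], 2) for f in freqs] for term, freqs in tf.items()}


weights = tf_idf(TF, IDF)
for term, row in weights.items():
    print(term, row)
```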
==================================================================================
Exercise 6.15
Recall the tf-idf weights computed in Exercise 6.10. Compute the Euclidean normalized document
vectors for each of the documents, where each vector has four components, one for each of the four
terms.
Solution
term          Doc1    Doc2    Doc3
car           44.55   6.6     39.6
auto          6.24    68.64   0
insurance     0       53.46   46.98
best          21      0       25.5
length of di  49.65   87.25   66.52
doc1 = [0.8974, 0.1257, 0, 0.4230]
doc2 = [0.0756, 0.7867, 0.6127, 0]
doc3 = [0.5953, 0, 0.7062, 0.3833]
==================================================================================
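Each normalized vector is the tf-idf vector divided by its Euclidean length. The sketch below recomputes the lengths and unit vectors from the tf-idf weights of Exercise 6.10; the helper name euclidean_normalize is an illustrative choice. For Doc2 the length works out to sqrt(6.6^2 + 68.64^2 + 53.46^2), about 87.25, which is the value that yields the doc2 components listed above.

```python
import math

# tf-idf weights from Exercise 6.10, ordered [car, auto, insurance, best].
TFIDF = {
    "doc1": [44.55, 6.24, 0.0, 21.0],
    "doc2": [6.6, 68.64, 53.46, 0.0],
    "doc3": [39.6, 0.0, 46.98, 25.5],
}


def euclidean_normalize(vec):
    """Return (length, unit vector) for a weight vector."""
    length = math.sqrt(sum(w * w for w in vec))
    return length, [w / length for w in vec]


for name, vec in TFIDF.items():
    length, unit = euclidean_normalize(vec)
    print(name, round(length, 2), [round(w, 4) for w in unit])
```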
Exercise 6.17
With term weights as computed in Exercise 6.15, rank the three documents by computed score for the
query "car insurance", for each of the following cases of term weighting in the query:
1. The weight of a term is 1 if present in the query, 0 otherwise.
2. Euclidean normalized idf.
Solution
1. q = [1, 0, 1, 0] //[car, auto, insurance, best]
score(q, doc1)= 0.8974 //[0.8974*1 + 0.1257*0 + 0*1 + 0.4230*0]
score(q, doc2) = 0.6883 //[0.0756*1 + 0.7867*0 + 0.6127*1 + 0*0]
score(q, doc3) = 1.3015 //[0.5953*1 + 0*0 + 0.7062*1 + 0.3833*0]
Ranking: doc3, doc1, doc2
2. q = [0.4778, 0.6024, 0.4692, 0.4344] //[car, auto, insurance, best]

term        tf(t,q)  idf    norm idf
car         1        1.65   0.4778
auto        0        2.08   0.6024
insurance   1        1.62   0.4692
best        0        1.5    0.4344
length of the idf vector = 3.453
Note: this query vector normalizes the idf of all four terms, so auto and best carry weight even though tf(t,q) = 0 for them.
score(q, doc1) = 0.6883 //[0.8974*0.4778 + 0.1257*0.6024 + 0*0.4692 + 0.4230*0.4344]
score(q, doc2) = 0.7975 //[0.0756*0.4778 + 0.7867*0.6024 + 0.6127*0.4692 + 0*0.4344]
score(q, doc3) = 0.7823 //[0.5953*0.4778 + 0*0.6024 + 0.7062*0.4692 + 0.3833*0.4344]
Ranking: doc2, doc3, doc1
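Both query weightings reduce to a dot product between the query vector and each normalized document vector. The following sketch reproduces the two cases under the same choices as the solution (in particular, case 2 normalizes the idf of all four terms); the helper names score_all and ranking are illustrative.

```python
import math

# Normalized document vectors from Exercise 6.15, ordered [car, auto, insurance, best].
DOCS = {
    "doc1": [0.8974, 0.1257, 0.0, 0.4230],
    "doc2": [0.0756, 0.7867, 0.6127, 0.0],
    "doc3": [0.5953, 0.0, 0.7062, 0.3833],
}
IDF = [1.65, 2.08, 1.62, 1.5]


def score_all(query):
    """Dot product of the query vector with every document vector."""
    return {name: sum(q * d for q, d in zip(query, vec)) for name, vec in DOCS.items()}


def ranking(scores):
    """Document names sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


# Case 1: weight 1 for terms present in "car insurance", 0 otherwise.
binary_scores = score_all([1, 0, 1, 0])

# Case 2: idf of all four terms, Euclidean normalized, as in the solution.
idf_length = math.sqrt(sum(w * w for w in IDF))
idf_scores = score_all([w / idf_length for w in IDF])

print(ranking(binary_scores), ranking(idf_scores))
```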
==================================================================================
Exercise 6.19
Compute the vector space similarity between the query “digital cameras” and the document “digital
cameras and video cameras” by filling out the empty columns in Table 6.1. Assume N = 10,000,000,
logarithmic term weighting (wf columns) for query and document, idf weighting for the query only and
cosine normalization for the document only. Treat "and" as a stop word. Enter term counts in the tf
columns. What is the final similarity score?
Solution
Similarity score = 1.56+1.56 = 3.12
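Table 6.1 is not reproduced in this document, so the sketch below is only a reconstruction of how the two products of 1.56 could arise. It assumes the document frequencies digital = 10,000, video = 100,000 and cameras = 50,000, recalled from the textbook's table rather than taken from this text; all names are illustrative.

```python
import math

N = 10_000_000
# ASSUMPTION: document frequencies recalled from the textbook's Table 6.1,
# which is not reproduced in this document.
DF = {"digital": 10_000, "video": 100_000, "cameras": 50_000}

query = ["digital", "cameras"]
document = ["digital", "cameras", "video", "cameras"]  # "and" dropped as a stop word


def log_tf(count):
    """Logarithmic term weighting: 1 + log10(tf) for tf > 0, else 0."""
    return 1 + math.log10(count) if count > 0 else 0.0


def similarity(query, document):
    terms = sorted(DF)
    # Query side: wf times idf, no normalization.
    q = [log_tf(query.count(t)) * math.log10(N / DF[t]) for t in terms]
    # Document side: wf only, cosine normalized.
    d = [log_tf(document.count(t)) for t in terms]
    d_length = math.sqrt(sum(w * w for w in d))
    return sum(qw * dw / d_length for qw, dw in zip(q, d))


print(round(similarity(query, document), 2))
```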

==================================================================================
Exercise 6.23
Refer to the tf and idf values for four terms and three documents in Exercise 6.10. Compute the two top
scoring documents on the query "best car insurance" for each of the following weighing schemes: (i)
nnn.atc; (ii) ntc.atc.
Figure 6.9 tf values N=806,791

Figure 6.8 idf values
Solution
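(i) nnn.atc
nnn weights for documents: the raw tf values of Figure 6.9.
atc weights for the query: q = [0.56, 0.353, 0.55, 0.51] //[car, auto, insurance, best]
(augmented tf times idf, cosine normalized; auto does not occur in the query, so its augmented tf is 0.5)
Score(q, doc1) = 15.12 + 1.06 + 0 + 7.14 = 23.32
Score(q, doc2) = 2.24 + 11.65 + 18.15 + 0 = 32.04
Score(q, doc3) = 13.44 + 0 + 15.95 + 8.67 = 38.06
Top two: doc3, doc2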
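The nnn.atc scores can be checked with a short sketch: documents keep their raw tf, and the query "best car insurance" gets augmented tf times idf, cosine normalized. The tf values are those in the tf columns of the ntc.ltn tables in this document, and the function name atc_query is illustrative. Because the code keeps the query weights unrounded, its scores differ from the hand-computed ones in the second decimal place.

```python
import math

# Raw term frequencies, ordered [car, auto, insurance, best].
TF = {"doc1": [27, 3, 0, 14], "doc2": [4, 33, 33, 0], "doc3": [24, 0, 29, 17]}
IDF = [1.65, 2.08, 1.62, 1.5]
QUERY_TF = [1, 0, 1, 1]  # "best car insurance"


def atc_query(query_tf, idf):
    """Augmented tf (0.5 + 0.5 * tf / max tf) times idf, cosine normalized."""
    max_tf = max(query_tf)
    # A term absent from the query still gets 0.5, matching the solution above.
    weighted = [(0.5 + 0.5 * tf / max_tf) * w for tf, w in zip(query_tf, idf)]
    length = math.sqrt(sum(w * w for w in weighted))
    return [w / length for w in weighted]


q = atc_query(QUERY_TF, IDF)
scores = {name: sum(qw * tf for qw, tf in zip(q, tfs)) for name, tfs in TF.items()}
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print([round(w, 3) for w in q], top_two)
```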
..................................................................................................................
(ii) ntc.atc
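ntc weights for doc1, doc2, doc3: the Euclidean normalized tf-idf vectors from Exercise 6.15.
ntc.atc
Score(q, doc1) = 0.762
Score(q, doc2) = 0.657
Score(q, doc3) = 0.917
Top two: doc3, doc1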
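For ntc.atc both sides are cosine normalized, so each score is the cosine of the angle between the document's tf-idf vector and the atc query vector. The sketch below recomputes the three scores from the tf-idf weights of Exercise 6.10; it is a check under the same assumptions as the solution (auto receives augmented tf 0.5 in the query), and the helper name normalize is illustrative.

```python
import math

# tf-idf weights from Exercise 6.10, ordered [car, auto, insurance, best].
TFIDF = {
    "doc1": [44.55, 6.24, 0.0, 21.0],
    "doc2": [6.6, 68.64, 53.46, 0.0],
    "doc3": [39.6, 0.0, 46.98, 25.5],
}
IDF = [1.65, 2.08, 1.62, 1.5]
QUERY_TF = [1, 0, 1, 1]  # "best car insurance"


def normalize(vec):
    """Divide a vector by its Euclidean length."""
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec]


# atc query: augmented tf times idf, then cosine normalization.
max_tf = max(QUERY_TF)
q = normalize([(0.5 + 0.5 * tf / max_tf) * w for tf, w in zip(QUERY_TF, IDF)])

# ntc documents: tf-idf, then cosine normalization.
scores = {name: sum(a * b for a, b in zip(q, normalize(vec))) for name, vec in TFIDF.items()}
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print({name: round(s, 3) for name, s in scores.items()}, top_two)
```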

..................................................................................................................
tf-idf weights
term        Doc1    Doc2    Doc3
car         44.55   6.6     39.6
auto        6.24    68.64   0
insurance   0       53.46   46.98
best        21      0       25.5
ntc.ltn weight for doc1
            query                 doc1
Term        w(tf)  idf   tf-idf   tf   idf   tf-idf  norm. w   Product
car         1      1.65  1.65     27   1.65  44.55   0.8974    1.4807
auto        0      2.08  0        3    2.08  6.24    0.1257    0
insurance   1      1.62  1.62     0    1.62  0       0         0
best        1      1.5   1.5      14   1.5   21      0.4230    0.6345
length of doc1 = 49.65

ntc.ltn weight for doc2
            query                 doc2
Term        w(tf)  idf   tf-idf   tf   idf   tf-idf  norm. w   Product
car         1      1.65  1.65     4    1.65  6.6     0.0756    0.1247
auto        0      2.08  0        33   2.08  68.64   0.7867    0
insurance   1      1.62  1.62     33   1.62  53.46   0.6127    0.9926
best        1      1.5   1.5      0    1.5   0       0         0
length of doc2 = 87.25

ntc.ltn weight for doc3
            query                 doc3
Term        w(tf)  idf   tf-idf   tf   idf   tf-idf  norm. w   Product
car         1      1.65  1.65     24   1.65  39.6    0.5953    0.9822
auto        0      2.08  0        0    2.08  0       0         0
insurance   1      1.62  1.62     29   1.62  46.98   0.7062    1.144
best        1      1.5   1.5      17   1.5   25.5    0.3833    0.575
length of doc3 = 66.52
ntc.ltn products
Term        doc1     doc2     doc3
car         1.4807   0.1247   0.9822
auto        0        0        0
insurance   0        0.9926   1.144
best        0.6345   0        0.575
Score       2.1152   1.1173   2.7012
Score(q, doc1) = 1.4807 + 0.6345 = 2.1152
Score(q, doc2) = 0.1247 + 0.9926 = 1.1173
Score(q, doc3) = 0.9822 + 1.144 + 0.575 = 2.7012
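The ntc.ltn products above pair the normalized document weights with an unnormalized query weight of (1 + log10 tf) x idf. A minimal sketch of that computation, using the four-decimal document vectors from Exercise 6.15 (the function name ltn_query is illustrative):

```python
import math

# Normalized (ntc) document vectors, ordered [car, auto, insurance, best].
DOCS = {
    "doc1": [0.8974, 0.1257, 0.0, 0.4230],
    "doc2": [0.0756, 0.7867, 0.6127, 0.0],
    "doc3": [0.5953, 0.0, 0.7062, 0.3833],
}
IDF = [1.65, 2.08, 1.62, 1.5]
QUERY_TF = [1, 0, 1, 1]  # "best car insurance"


def ltn_query(query_tf, idf):
    """Logarithmic tf times idf, with no normalization."""
    return [(1 + math.log10(tf)) * w if tf > 0 else 0.0 for tf, w in zip(query_tf, idf)]


q = ltn_query(QUERY_TF, IDF)
scores = {name: sum(qw * dw for qw, dw in zip(q, vec)) for name, vec in DOCS.items()}
for name, s in scores.items():
    print(name, round(s, 4))
```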
==================================================================================
