Introduction to
Information Retrieval
CS276: Information Retrieval and Web Search
Christopher Manning and Pandu Nayak
Today’s topic
Latent Semantic Indexing
Linear Algebra
Background
Example
Matrix-vector multiplication
Thus a matrix-vector multiplication such as Sx (S, x as in the previous slide) can be rewritten in terms of the eigenvalues/vectors:
Matrix-vector multiplication
Suggestion: the effect of “small” eigenvalues is small.
If we ignored the smallest eigenvalue (1), then
instead of
we would get
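A minimal numpy sketch of this effect, using an assumed diagonal matrix and vector (illustrative values, not necessarily the slide's):

```python
import numpy as np

# Illustrative symmetric matrix with eigenvalues of very different sizes,
# and an illustrative vector x (assumed values, not necessarily the slide's).
S = np.diag([30.0, 20.0, 1.0])
x = np.array([2.0, 4.0, 6.0])

# Eigen decomposition: Sx = sum_i lambda_i (v_i . x) v_i
lam, V = np.linalg.eigh(S)   # eigenvalues ascending, columns of V are orthonormal eigenvectors

exact = S @ x
# Drop the term belonging to the smallest eigenvalue (lambda = 1 here).
approx = sum(lam[i] * (V[:, i] @ x) * V[:, i] for i in range(1, len(lam)))

print(exact)   # [60. 80.  6.]
print(approx)  # [60. 80.  0.] -- ignoring the small eigenvalue changes the result only slightly
```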
Example
Let S be real and symmetric. Then:
Eigen/diagonal Decomposition
Let S be a square matrix with m linearly independent eigenvectors (a “non-defective” matrix)
Theorem: there exists an eigen decomposition S = UΛU–1, with Λ diagonal
Unique for distinct eigenvalues
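A quick numerical check of the theorem (the matrix below is an arbitrary illustration, not from the slides):

```python
import numpy as np

# Arbitrary non-defective square matrix (illustration only).
S = np.array([[2.0, 1.0],
              [0.0, 3.0]])

lam, U = np.linalg.eig(S)          # columns of U are eigenvectors, lam the eigenvalues
Lambda = np.diag(lam)

# Verify S = U Lambda U^{-1}
print(np.allclose(S, U @ Lambda @ np.linalg.inv(U)))   # True
```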
Recall
Then, S=UΛU–1 =
Example continued
Let’s divide U (and multiply U–1) by
Then, S = QΛQT, where Q–1 = QT
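For real symmetric S, eigh gives an orthonormal Q directly; a small check with an illustrative matrix:

```python
import numpy as np

# Illustrative real symmetric matrix (not necessarily the slide's example).
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eigh(S)         # for symmetric input, eigh returns orthonormal eigenvectors
Lambda = np.diag(lam)

print(np.allclose(Q.T, np.linalg.inv(Q)))    # True: Q^{-1} = Q^T
print(np.allclose(S, Q @ Lambda @ Q.T))      # True: S = Q Lambda Q^T
```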
Exercise
Examine the symmetric eigen decomposition, if any, for each of the following matrices:
Time out!
I came to this class to learn about text retrieval and mining, not to have my linear algebra past dredged up again …
But if you want to dredge, Strang’s Applied Mathematics is a good place to start.
What do these matrices have to do with text?
Similarity Clustering
We can compute the similarity between two document vector representations xi and xj by xi xjT
Let X = [x1 … xN]
Then XXT is a matrix of similarities
XXT is symmetric
So XXT = QΛQT
So we can decompose this similarity space into a set of orthonormal basis vectors (given in Q) scaled by the eigenvalues in Λ
This leads to PCA (Principal Components Analysis)
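A sketch of this step with made-up document vectors (any small example works):

```python
import numpy as np

# Made-up document vectors: one row per document, one column per term.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

sims = X @ X.T                  # sims[i, j] = x_i . x_j, a symmetric doc-doc similarity matrix
lam, Q = np.linalg.eigh(sims)   # symmetric, so orthonormal eigenvectors and real eigenvalues

# The similarity space decomposes as sims = Q diag(lam) Q^T;
# projecting docs onto the leading eigenvectors is the PCA view.
print(np.allclose(sims, Q @ np.diag(lam) @ Q.T))   # True
```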
Singular values
For an M×N matrix A of rank r there is a factorization (Singular Value Decomposition = SVD) A = UΣVT, where the columns of U and V are orthonormal and Σ = diag(σ1, …, σr) with σ1 ≥ σ2 ≥ … ≥ σr > 0 (the singular values).
SVD example
Let
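The slide’s matrix isn’t reproduced here; an SVD of an arbitrary small matrix with numpy shows the factorization:

```python
import numpy as np

# Arbitrary small matrix (illustration only; not the slide's example).
A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)    # s holds the singular values, largest first

print(s)                                            # the singular values of A
print(np.allclose(A, U @ np.diag(s) @ Vt))          # True: A = U Sigma V^T
```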
Low-rank Approximation
SVD can be used to compute optimal low-rank
approximations.
Approximation problem: Find Ak of rank k such that
‖A − Ak‖F = min over rank-k X of ‖A − X‖F (‖·‖F is the Frobenius norm)
Low-rank Approximation
Solution via SVD: set the r − k smallest singular values to zero, Ak = U diag(σ1, …, σk, 0, …, 0) VT
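A sketch of the construction, assuming an arbitrary input matrix:

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm, built from the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_trunc = np.zeros_like(s)
    s_trunc[:k] = s[:k]                  # keep the k largest singular values, zero the rest
    return U @ np.diag(s_trunc) @ Vt

A = np.random.default_rng(0).random((5, 4))   # illustrative matrix
A2 = low_rank_approx(A, 2)
print(np.linalg.matrix_rank(A2))              # 2
```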
Reduced SVD
If we retain only k singular values, and set the rest to 0, then we don’t need the matrix parts in color
Then Σ is k×k, U is M×k, VT is k×N, and Ak is M×N
This is referred to as the reduced SVD
It is the convenient (space-saving) and usual form for computational applications
It’s what Matlab gives you
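In code, the reduced SVD just keeps the first k columns of U, the first k singular values, and the first k rows of VT (sketch, illustrative matrix):

```python
import numpy as np

A = np.random.default_rng(1).random((6, 5))     # illustrative M x N matrix
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]     # M x k, length k, k x N: the reduced SVD

A_k = U_k @ np.diag(s_k) @ Vt_k                 # still M x N, but rank k
print(U_k.shape, s_k.shape, Vt_k.shape, A_k.shape)   # (6, 2) (2,) (2, 5) (6, 5)
```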
Approximation error
How good (bad) is this approximation?
It’s the best possible, measured by the Frobenius norm of the error:
min over rank-k X of ‖A − X‖F = ‖A − Ak‖F = √(σk+1² + … + σr²), where the σi are the singular values of A
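A numerical check of this error identity (illustrative matrix):

```python
import numpy as np

A = np.random.default_rng(2).random((6, 5))     # illustrative matrix
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error equals the root of the sum of squared discarded singular values.
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # True
```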
Latent Semantic
Indexing via the SVD
What it is
From term-doc matrix A, we compute the approximation Ak.
There is a row for each term and a column for each doc in Ak
Thus docs live in a space of k << r dimensions
These dimensions are not the original axes
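One common way to work in the reduced space (a sketch of standard LSI practice, not necessarily the exact formulation used later in the lecture): represent doc j by column j of ΣkVkT and fold a query q in via qk = Σk–1UkTq.

```python
import numpy as np

# Made-up term-document matrix A (rows = terms, columns = docs); illustrative data only.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

doc_coords = S_k @ Vt_k                        # column j = doc j in the k-dimensional space

# Fold a query into the same space: q_k = Sigma_k^{-1} U_k^T q (standard LSI fold-in).
q = np.array([1.0, 1.0, 0.0, 0.0, 0.0])        # a query that uses terms 0 and 1
q_k = np.linalg.inv(S_k) @ U_k.T @ q

# Rank docs by cosine similarity in the reduced space.
cos = (doc_coords.T @ q_k) / (np.linalg.norm(doc_coords, axis=0) * np.linalg.norm(q_k))
print(np.argsort(-cos))                        # docs ordered by similarity to q in LSI space
```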
But why?
Goals of LSI
LSI takes documents that are semantically similar (= talk about the same topics), but are not similar in the vector space (because they use different words), and re-represents them in a reduced vector space in which they have higher similarity.
LSA Example
A simple example term-document matrix (binary)
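The slide’s matrix isn’t reproduced here; a made-up binary term-document matrix of the same flavor, decomposed with numpy:

```python
import numpy as np

# Made-up binary term-document matrix (rows = terms, columns = documents).
# NOT the matrix from the slide -- just an illustration of the same setup.
C = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 0, 1, 0]], dtype=float)

U, sigma, Vt = np.linalg.svd(C, full_matrices=False)
print(np.round(sigma, 2))                        # singular values of C
print(np.allclose(C, U @ np.diag(sigma) @ Vt))   # True: C = U Sigma V^T
```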
LSA Example
Example of C = UΣVT: The matrix U
LSA Example
Example of C = UΣVT: The matrix Σ
LSA Example
Example of C = UΣVT: The matrix VT
LSA Example
Original matrix C vs. reduced C2 = UΣ2VT
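To mirror this comparison with runnable code (again on a made-up matrix, not the slide’s data): keep only the two largest singular values and rebuild C2.

```python
import numpy as np

# Made-up binary term-document matrix (not the slide's data).
C = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 0, 1, 0]], dtype=float)

U, sigma, Vt = np.linalg.svd(C, full_matrices=False)

sigma2 = sigma.copy()
sigma2[2:] = 0.0                      # keep only the two largest singular values
C2 = U @ np.diag(sigma2) @ Vt         # rank-2 approximation C2 = U Sigma_2 V^T

np.set_printoptions(precision=2, suppress=True)
print(C)     # original binary matrix
print(C2)    # its rank-2 "smoothed" version, as used by LSI
```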
Empirical evidence
Experiments on TREC 1/2/3 – Dumais
Lanczos SVD code (available on netlib) due to Berry used in these experiments
Running times of ~ one day on tens of thousands of docs [still an obstacle to use!]
Dimensions – various values 250-350 reported.
Reducing k improves recall.
(Under 200 reported unsatisfactory)
Generally expect recall to improve – what about precision?
Empirical evidence
Precision at or above median TREC precision
Top scorer on almost 20% of TREC topics
Slightly better on average than straight vector spaces
Effect of dimensionality:
Dimensions Precision
250 0.367
300 0.371
346 0.374
Failure modes
Negated phrases
TREC topics sometimes negate certain query terms/phrases – this precludes simple automatic conversion of topics to latent semantic space.
Boolean queries
As usual, freetext/vector space syntax of LSI queries precludes (say) “Find any doc having to do with the following 5 companies”
See Dumais for more.
Simplistic picture
[Figure: Topic 1, Topic 2, Topic 3]
Resources
IIR 18
Scott Deerwester, Susan Dumais, George Furnas, Thomas Landauer, Richard Harshman. 1990. Indexing by latent semantic analysis. JASIS 41(6):391–407.