Information Retrieval 1
Information Retrieval
ISiM Syllabus: MISM 623 – Information Retrieval Systems
Course Objectives - This course examines information retrieval in the
context of full-text datasets. Students should be able to understand and
critique existing information retrieval systems and to design and build
such systems themselves. The course introduces traditional methods as well as
recent advances in information retrieval (IR), including the handling and
querying of textual data. The focus is on newer techniques for processing and
retrieving textual information, including hypertext documents available on the
World Wide Web.
Course Outline
Topics covered include:
• IR Models
o Boolean Model
o Vector Space Model
o Relational DBMS
o Probabilistic Models
o Language Models
• Term Indexing
o Zipf's Law
o Term Weighting
ISiM
IR – Introduction Created by Chethan.M
Example of IR: Just getting a credit card out of your wallet so that you can type
in the card number is a form of information retrieval.
Motivation:
Information retrieval deals with the representation, storage, and organization
of information items, and with access to them. The representation and
organization of the information items should give users easy access to the
information they are interested in. Unfortunately, characterizing the user's
information need is not a simple problem.
Given the user query, the key goal of an IR system (search engine) is to retrieve
information that might be useful or relevant to the user. The emphasis is on the
retrieval of information as opposed to the retrieval of data.
Data Retrieval:
Determines which documents contain a given set of keywords
Well-defined semantics
A single erroneous object implies failure
Information Retrieval:
Finds information about a subject or topic
Semantics are frequently loose
Small errors are tolerated
Natural-language retrieval over unstructured data
Ranking and relevance
IR System:
Interprets the contents of information items.
Generates a ranking that reflects relevance.
The notion of relevance is central.
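The contrast between data retrieval and information retrieval can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the helper names (tokenize, data_retrieval, ranked_retrieval), the sample documents, and the scoring rule (count of query-term occurrences) are hypothetical choices made for the example, not part of any particular system.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (real systems do much more)."""
    return text.lower().split()

def data_retrieval(docs, keywords):
    """Exact match: ids of documents that contain every keyword."""
    wanted = {k.lower() for k in keywords}
    return [doc_id for doc_id, text in docs.items()
            if wanted <= set(tokenize(text))]

def ranked_retrieval(docs, query):
    """Score each document by the number of query-term occurrences and
    return (doc_id, score) pairs, best first, dropping zero scores."""
    terms = set(tokenize(query))
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample documents.
docs = {
    "d1": "boolean model of retrieval",
    "d2": "vector space model for ranking documents",
    "d3": "ranking documents by relevance in a retrieval system",
}

exact = data_retrieval(docs, ["ranking", "retrieval"])
ranked = ranked_retrieval(docs, "ranking retrieval")
print(exact)
print(ranked)
```

The exact-match function returns only documents that satisfy the whole condition, while the ranked version also returns partial matches, ordered by score. This mirrors the point that small errors are tolerated in IR as long as the ranking reflects relevance.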
History (Past)
• 1960-70’s:
– Initial exploration of text retrieval systems for “small” corpora of
scientific abstracts, and law and business documents.
– Development of the basic Boolean and vector-space models of
retrieval.
– Prof. Salton and his students at Cornell University were the leading
researchers in the area.
• 1980’s:
– Large document database systems, many run by companies:
• Lexis-Nexis
• Dialog
• MEDLINE
• 1990’s:
– Searching FTPable documents on the Internet
• Archie
• WAIS
– Searching the World Wide Web
• Lycos
• Yahoo
• AltaVista
• 1990’s continued:
– Organized Competitions
• NIST TREC
– Recommender Systems
• Ringo
• Amazon
• NetPerceptions
– Automated Text Categorization & Clustering
• 2000’s
– Link analysis for Web Search
• Google
– Automated Information Extraction
• Whizbang
• Fetch
• Burning Glass
– Question Answering
• TREC Q/A track
• 2000’s continued:
– Multimedia IR
• Image
• Video
• Audio and music
– Cross-Language IR
• DARPA Tides
– Document Summarization
Present
Sources of data
Electronic libraries
University documents
Online data (web sites)
Examples
AltaVista
Google
Etc.
Typical IR Task
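• Given:
– A corpus of textual natural-language documents.
– A user query in the form of a text string.
• Find:
– A ranked set of documents that are relevant to the query.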
Fig: A query string and the document corpus are input to the IR system, which returns a ranked list of documents (1. Doc1, 2. Doc2, 3. Doc3, ...).
Relevance
• Relevance is a subjective judgment and may include:
– Being on the proper subject.
– Being timely (recent information).
– Being authoritative (from a trusted source).
– Satisfying the goals of the user and his/her intended use of the
information (information need).
• Much of IR depends on the idea that
– Documents with similar vocabulary are relevant to the same queries
• Systems usually look for documents that match the query words
• “Similar” can be measured in many ways
– String matching/comparison
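One simple way to measure vocabulary similarity between a query and a document is the Jaccard coefficient over their word sets. The sketch below is an illustrative assumption, not the measure used by any specific system; the function name jaccard and the whitespace tokenization are hypothetical choices.

```python
def jaccard(query, document):
    """Jaccard coefficient of the two word sets: |A & B| / |A | B|.

    Returns 0.0 when both texts are empty, to avoid dividing by zero.
    """
    q = set(query.lower().split())
    d = set(document.lower().split())
    if not q and not d:
        return 0.0
    return len(q & d) / len(q | d)

# Two shared words out of three distinct words overall.
sim_close = jaccard("information retrieval", "information retrieval systems")
# No shared vocabulary at all.
sim_far = jaccard("information retrieval", "credit card wallet")
print(sim_close, sim_far)
```

A score near 1 means the two texts use almost the same vocabulary, and a score of 0 means they share no words. Measures of this kind ignore word meaning and word order.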
Intelligent IR
• Taking into account the meaning of the words used.
• Taking into account the order of words in the query.
• Adapting to the user based on direct or indirect feedback.
• Taking into account the authority of the source.
IR Basic Concepts
Fig: Interaction of the user with the retrieval system through distinct tasks (retrieval and browsing over a database).
IR System Architecture
Fig: IR system architecture. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Database Manager. Data: User Need, Text, Logical View, User Feedback, Query, Index (Inverted File), Text Database, Retrieved Docs, Ranked Docs.
IR System Components
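The indexing, searching, and ranking components can be illustrated with a toy inverted index. This is a minimal sketch under stated assumptions: the function names (build_inverted_index, search), the sample documents, and the ranking rule (number of distinct query terms matched) are hypothetical, and text operations are reduced to lowercasing and whitespace splitting.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Indexing: map each term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # stand-in for text operations
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Searching and ranking: look up each query term in the index and
    order documents by how many distinct query terms they match."""
    hits = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, []):
            hits[doc_id] += 1
    # Descending match count, then ascending id for deterministic ties.
    return sorted(hits, key=lambda d: (-hits[d], d))

# Hypothetical text database.
docs = {
    1: "information retrieval systems",
    2: "database systems",
    3: "web information retrieval",
}

index = build_inverted_index(docs)
results = search(index, "information systems")
print(index["systems"])
print(results)
```

Because the index maps terms to documents, a query only touches the posting lists of its own terms rather than scanning every document in the text database.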
References:
1. Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2001
2. Introduction to Information Retrieval, Christopher D. Manning, Prabhakar
3. Intelligent Information Retrieval and Web Search, Raymond Mooney, University of Texas at Austin
4. Introduction to Information Retrieval (IR), T. Keerati Boonchote