Gujarat Technological University: Page 1 of 2
Bachelor of Engineering
Subject Code: 3170718
INFORMATION RETRIEVAL
7th Semester
Prerequisite: A basic mathematics background is required. Students should be familiar with the basic concepts of
probability (e.g., Bayes' theorem) and linear algebra (e.g., vectors, matrices and inner products).
Rationale: Information Retrieval (IR) systems give access to large amounts of online information stored as text,
images, speech or video, e.g., Web documents. An IR system should retrieve only those documents that are relevant to a
user's interest, but it has to deal with uncertainty in describing both what a document is about and what a user is
actually interested in.
Syllabus:
Sr. No. | Content | Total Hrs
1 | Introduction to Information Retrieval: The nature of unstructured and semi-structured text. Inverted index and Boolean queries. | 5
2 | Text Indexing, Storage and Compression: Text encoding: tokenization, stemming, stop words, phrases, index optimization. Index compression: lexicon compression and postings lists compression. Gap encoding, gamma codes, Zipf's Law. Index construction. Postings size estimation, merge sort, dynamic indexing, positional indexes, n-gram indexes, real-world issues. | 7
3 | Retrieval Models: Boolean, vector space, TF-IDF, Okapi, probabilistic, language modeling, latent semantic indexing. Vector space scoring. The cosine measure. Efficiency considerations. Document length normalization. Relevance feedback and query expansion. Rocchio. | 7
4 | Performance Evaluation: Evaluating search engines. User happiness, precision, recall, F-measure. Creating test collections: kappa measure, interjudge agreement. | 4
5 | Text Categorization and Filtering: Introduction to text classification. Naive Bayes models. Spam filtering. Vector space classification using hyperplanes; centroids; k-nearest neighbors. Support vector machine classifiers. Kernel functions. Boosting. | 5
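Illustration (not part of the official syllabus text): a minimal Python sketch of three topics named in units 1 and 2, namely an inverted index, a Boolean AND query, and gap encoding with gamma codes. The function names, the whitespace tokenizer and the four sample documents are assumptions made only for this example. Document IDs are assumed to start at 1 so that every gap is a positive integer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to a sorted list of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, term_a, term_b):
    """Intersect two sorted postings lists with a linear merge."""
    a, b = index.get(term_a, []), index.get(term_b, [])
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def gaps(postings):
    """Keep the first docID, then store differences between neighbours."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def gamma_code(n):
    """Gamma code of a positive integer, returned as a string of bits."""
    if n < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    offset = bin(n)[3:]  # binary digits of n without the leading 1
    return "1" * len(offset) + "0" + offset  # unary length, then offset


# Hypothetical sample collection, keyed by document ID.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_inverted_index(docs)
hits = boolean_and(index, "new", "july")
july_gaps = gaps(index["july"])
july_bits = "".join(gamma_code(g) for g in july_gaps)
```

Here `boolean_and(index, "new", "july")` answers the query "new AND july", and `july_bits` is the gamma-coded gap sequence for the postings list of "july". Real systems replace the whitespace tokenizer with the tokenization, stemming and stop-word steps listed in unit 2.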
w.e.f. AY 2018-19
Note: This specification table shall be treated as a general guideline for students and teachers. The actual
distribution of marks in the question paper may vary slightly from the above table.
Reference Books:
1. Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze,
Cambridge University Press, 2008.
2. Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, and Trevor Strohman, Pearson
Education, 2009.
3. Modern Information Retrieval. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2nd edition, Addison-Wesley,
2011.
4. Information Retrieval: Implementing and Evaluating Search Engines. Stefan Büttcher, Charles L. A. Clarke, and
Gordon V. Cormack, MIT Press, 2010.
Course Outcome: