Introduction to Information Retrieval

NPTEL via Swayam

Overview

ABOUT THE COURSE: This course offers a comprehensive introduction to Information Retrieval (IR), the ‘science behind search engines’ and document retrieval systems. It covers fundamental concepts such as indexing, ranking, retrieval models, evaluation metrics, and relevance feedback. Learners will also be briefly introduced to modern IR applications, including web search, recommender systems, and semantic retrieval techniques. The course balances theory with hands-on components to prepare students for both academic research and industry roles.

PREREQUISITES: Basic knowledge of Java/Python. Knowledge of data structures and algorithms.

INDUSTRY SUPPORT: Information Retrieval is a foundational skill for multiple technology-driven industries, particularly those focusing on search engines, recommender systems, natural language processing, and large-scale data analytics. This course will be valued by companies involved in developing search infrastructure, enterprise solutions, and AI applications. Companies that will recognize the value of this course include:
● Google – for its core work in web search, document retrieval, and question answering systems.
● Microsoft – particularly in Bing, Azure Cognitive Search, and Office 365 search features.
● Amazon – in both product search and AWS services like Amazon Kendra.

Syllabus

Week 1: Introduction: Library search and Information Retrieval (IR), Search in Unstructured Data, Comparison to DBMS.

Document Representation: text processing, controlled and free-text vocabulary, term filtering (tokenization, down-casing, stopword removal), linguistic processing (stemming, lemmatization).

Text Processing.

Incidence Matrix: term-document incidence matrix, incidence vector; Query Processing; Boolean Retrieval; problems with larger collections.
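To make the Week 1 Boolean model concrete, here is a minimal Python sketch (not course material) of a term-document incidence matrix. The three-document collection and the helper names tokenize, build_incidence and to_doc_ids are hypothetical. Each row of the matrix is stored as an integer bit vector, so the Boolean operators AND and NOT become bitwise operations.

```python
# Hypothetical toy collection, for illustration only.
docs = [
    "Brutus killed Caesar",
    "Caesar was ambitious",
    "Brutus and Calpurnia",
]

def tokenize(text):
    # Minimal term filtering: whitespace split plus down-casing.
    return [t.lower() for t in text.split()]

def build_incidence(docs):
    """Map each term to a bit vector; bit i is set if the term occurs in doc i."""
    incidence = {}
    for i, text in enumerate(docs):
        for term in set(tokenize(text)):
            incidence[term] = incidence.get(term, 0) | (1 << i)
    return incidence

def to_doc_ids(bits, n_docs):
    """Turn an incidence vector back into a list of matching doc IDs."""
    return [i for i in range(n_docs) if (bits >> i) & 1]

incidence = build_incidence(docs)
n = len(docs)
all_docs = (1 << n) - 1  # vector with every document's bit set

# Query: brutus AND NOT caesar
result = incidence.get("brutus", 0) & (all_docs & ~incidence.get("caesar", 0))
print(to_doc_ids(result, n))  # → [2]
```

Even this toy matrix is mostly zeros; with many thousands of terms and millions of documents it becomes impractically large and sparse, which is the problem with larger collections that motivates the inverted index.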
Week 2: Inverted Index, Dictionary and Postings; Query Optimization and Merging, Basics of making an Inverted Index.
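The Week 2 topics can be sketched as follows, assuming a toy in-memory index rather than anything the course itself provides. The dictionary maps each term to a sorted postings list of doc IDs, intersect is the standard linear-time merge of two sorted lists, and and_query applies the usual optimization of processing the shortest postings lists first. All function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Dictionary: term -> sorted postings list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """Merge two sorted postings lists in O(len(p1) + len(p2)) (AND query)."""
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

def and_query(index, terms):
    """Query optimization: intersect in order of increasing postings length."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        if not result:  # early exit: nothing left to match
            break
        result = intersect(result, p)
    return result

docs = ["new home sales", "home sales rise in july", "new home in july"]
index = build_inverted_index(docs)
print(and_query(index, ["home", "july"]))  # → [1, 2]
print(and_query(index, ["new", "july"]))   # → [2]
```

Starting from the rarest term keeps every intermediate result no larger than the shortest postings list.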
Week 3: Index Construction, Statistical Properties of Text: Zipf’s and Heaps’ Laws and their implications. Dictionary and Postings compression.
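One widely used postings compression scheme is to store gaps between consecutive doc IDs and encode each gap with variable-byte (VB) codes. The sketch below assumes this particular scheme for illustration; the course may cover other codes as well. Each byte carries 7 payload bits, and the high bit marks the last byte of a number. Function names are hypothetical.

```python
def vb_encode_number(n):
    """Variable-byte code: 7 payload bits per byte, high bit set on the last byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # mark the final byte of this number
    return out

def vb_encode_postings(doc_ids):
    """Encode gaps between sorted doc IDs instead of the IDs themselves."""
    encoded = []
    prev = 0
    for d in doc_ids:
        encoded.extend(vb_encode_number(d - prev))
        prev = d
    return bytes(encoded)

def vb_decode_postings(data):
    """Rebuild the gaps from the byte stream, then prefix-sum them back into doc IDs."""
    doc_ids = []
    n = 0
    prev = 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            n = 128 * n + (byte - 128)
            prev += n
            doc_ids.append(prev)
            n = 0
    return doc_ids

postings = [824, 829, 215406]
print(vb_encode_number(824))                 # → [6, 184]
print(list(vb_encode_postings(postings)))    # → [6, 184, 133, 13, 12, 177]
print(vb_decode_postings(vb_encode_postings(postings)))  # → [824, 829, 215406]
```

Small gaps (typical for frequent terms, which Zipf’s law says dominate the collection) fit in a single byte, which is where most of the savings come from.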
Week 4: Getting Started with Lucene - basics of Lucene, analyzing texts and indexing.

Some hands-on examples with real data.
Week 5: Introduction to Ranked Retrieval, Exact matching model, notion of relevance, best-match model, Jaccard coefficient and its drawbacks.

Introduction to Term Weighting, Term Frequency (TF), Document Frequency (DF), TF-IDF weighting.
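A minimal sketch of the two Week 5 scoring ideas, assuming one common formulation: raw term frequency multiplied by idf = log10(N / df). The function names are hypothetical. The Jaccard coefficient works on term sets, so it ignores how often a term occurs and how rare it is in the collection, which are the drawbacks that TF-IDF weighting addresses.

```python
import math

def jaccard(query_terms, doc_terms):
    """|A ∩ B| / |A ∪ B| over term sets; term frequency and rarity are ignored."""
    a, b = set(query_terms), set(doc_terms)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def tf_idf(tf, df, n_docs):
    """Raw term frequency times idf = log10(N / df)."""
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log10(n_docs / df)

print(jaccard("ides of march".split(), "caesar died in march".split()))  # 1/6
print(tf_idf(3, 10, 1_000_000))  # 3 * log10(100000) = 15
```

A term that appears in every document gets idf = log10(1) = 0, so it contributes nothing regardless of its frequency.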
Week 6: Vector Space Model, Length Normalization, Similarity Computation: Cosine Similarity.
Normalized Term Frequency, Discriminative term weights, sub-linear TF scaling, SMART notation for setting term weights during retrieval.
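The Week 6 ideas combine into a short Python sketch: sub-linear tf scaling (1 + log10 tf), idf weighting, and cosine similarity, which divides by vector lengths and so performs length normalization. Using this weighting on both the query and the document side roughly corresponds to what SMART notation calls ltc.ltc. The document frequencies below are invented for illustration, and weight_vector and cosine are hypothetical helper names.

```python
import math
from collections import Counter

def weight_vector(terms, df, n_docs):
    """Sparse vector: (1 + log10 tf) * log10(N / df); terms without a df entry are dropped."""
    counts = Counter(terms)
    return {
        t: (1 + math.log10(tf)) * math.log10(n_docs / df[t])
        for t, tf in counts.items()
        if df.get(t, 0) > 0
    }

def cosine(u, v):
    """Dot product divided by the product of Euclidean lengths (length normalization)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented document frequencies for a hypothetical 1000-document collection.
df = {"car": 50, "insurance": 20, "auto": 100, "best": 500}
n_docs = 1000
q = weight_vector("best car insurance".split(), df, n_docs)
d1 = weight_vector("car insurance auto insurance".split(), df, n_docs)
print(round(cosine(q, d1), 3))
```

With sub-linear scaling, "insurance" occurring twice in d1 gets weight 1 + log10 2 (about 1.3) times its idf rather than double it.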
Week 7: Probability Ranking Principle, Binary Independence Model, Okapi BM25 Model, BM25 vs. VSM.
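Okapi BM25 scoring for a single document can be sketched in Python. This assumes the typical defaults k1 = 1.2 and b = 0.75, and an idf variant with +1 inside the logarithm that keeps idf non-negative. Other idf forms exist, and bm25_score is a hypothetical name. The k1 term saturates term frequency, and b controls how strongly long documents are penalized relative to the average length.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, df, n_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 with idf = ln((N - df + 0.5) / (df + 0.5) + 1)."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf or term not in df:
            continue  # term absent from the document or unknown to the collection
        n_t = df[term]
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
        f = tf[term]
        # Document-length normalization of the saturation constant.
        norm = k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * f * (k1 + 1) / (f + norm)
    return score

df = {"retrieval": 5}
short_doc = ["retrieval"] + ["x"] * 9    # length 10 (equal to the average)
long_doc = ["retrieval"] + ["x"] * 19    # length 20
print(bm25_score(["retrieval"], short_doc, df, n_docs=100, avg_doc_len=10))
print(bm25_score(["retrieval"], long_doc, df, n_docs=100, avg_doc_len=10))
```

The same single occurrence scores lower in the longer document, and repeating a term ten times raises its contribution by far less than a factor of ten. Both properties set BM25 apart from plain TF-IDF in the vector space model.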
Week 8:
● Language Model for IR: Zero frequency problem, smoothing, Jelinek-Mercer Smoothing and its problem, Dirichlet Smoothing, comparing smoothing with the IDF factor
● Divergence and how it can be used for ranking: Kullback–Leibler divergence, Jensen–Shannon divergence.
● Retrieval using Lucene - some hands-on examples with real data.
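The zero-frequency problem and its two classic fixes can be shown in a short query-likelihood sketch. Jelinek-Mercer interpolates the document model with the collection model using a fixed λ, while Dirichlet smoothing uses (tf + μ·p(t|C)) / (|d| + μ), so longer documents are smoothed less. The parameter values and function names below are assumptions for illustration, and query terms unseen in the entire collection are simply skipped in this sketch.

```python
import math
from collections import Counter

def collection_model(docs_terms):
    """Maximum-likelihood collection language model p(t | C)."""
    counts = Counter(t for terms in docs_terms for t in terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def ql_jelinek_mercer(query_terms, doc_terms, p_coll, lam=0.1):
    """log p(q | d) with p(t | d) = (1 - lam) * tf / |d| + lam * p(t | C)."""
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_c = p_coll.get(t, 0.0)
        if p_c == 0.0:
            continue  # unseen in the whole collection: skipped in this sketch
        p_ml = tf[t] / dl if dl else 0.0
        score += math.log((1 - lam) * p_ml + lam * p_c)
    return score

def ql_dirichlet(query_terms, doc_terms, p_coll, mu=2000.0):
    """log p(q | d) with p(t | d) = (tf + mu * p(t | C)) / (|d| + mu)."""
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_c = p_coll.get(t, 0.0)
        if p_c == 0.0:
            continue
        score += math.log((tf[t] + mu * p_c) / (dl + mu))
    return score

docs_terms = [["information", "retrieval", "information"], ["retrieval", "search"]]
p_coll = collection_model(docs_terms)
for d in docs_terms:
    print(ql_jelinek_mercer(["search"], d, p_coll), ql_dirichlet(["search"], d, p_coll, mu=10))
```

Without smoothing, the first document would get p(search | d) = 0 and a score of log 0. With either method it receives a finite score that still ranks below the document that actually contains "search".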
Week 9: Retrieval System Evaluation Metrics: Cranfield method; set-based metrics: Recall, Precision, F-measure, Accuracy; rank-based metrics: precision at K, R-precision, non-interpolated average precision, Mean Average Precision (MAP), 11-point interpolated average precision, graded relevance, cumulated gain, normalized discounted cumulated gain (NDCG), reciprocal rank, Geometric-MAP, Kappa measure, pooling, role of different evaluation fora.
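Several of the rank-based metrics can be written in a few lines of Python for a single query. This sketch uses binary relevance for precision at K, average precision and reciprocal rank. For NDCG it assumes one common formulation: graded gains with a log2(rank + 1) discount, normalized by the DCG of the ideal ordering. The ranking and judgments below are invented. MAP is then the mean of average_precision over a set of queries.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Non-interpolated AP: precision at each relevant hit, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)  # unretrieved relevant docs contribute 0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked, gains, k):
    """DCG with gain / log2(rank + 1), divided by the ideal DCG."""
    def dcg(gain_list):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gain_list, start=1))
    actual = dcg([gains.get(d, 0) for d in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
gains = {"d1": 3, "d3": 2, "d6": 1}   # graded relevance judgments (invented)
print(precision_at_k(ranked, relevant, 3))   # 2/3
print(average_precision(ranked, relevant))   # (1/1 + 2/3) / 3
print(reciprocal_rank(ranked, relevant))     # → 1.0
print(ndcg_at_k(ranked, gains, 3))
```

Note that d6 is relevant but never retrieved, so it lowers average precision even though it does not appear in the ranking.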
Week 10: Relevance Feedback: explicit, implicit and pseudo feedback, Rocchio’s feedback algorithm, estimating the relevance based language model with independent and identically distributed (IID) sampling, RM3 model.
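Rocchio’s algorithm moves the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones: q_m = α·q0 + β·centroid(relevant) − γ·centroid(non-relevant). In the minimal sketch below, the weights α = 1.0, β = 0.75 and γ = 0.15 are commonly cited defaults assumed here, and negative term weights are clipped to zero, a common convention. Vectors are sparse dicts and rocchio is a hypothetical name. For pseudo feedback, the top-k retrieved documents would be passed as rel_docs with an empty nonrel_docs list.

```python
from collections import defaultdict

def rocchio(query_vec, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """q_m = alpha * q0 + beta * centroid(rel) - gamma * centroid(nonrel)."""
    new_q = defaultdict(float)
    for t, w in query_vec.items():
        new_q[t] += alpha * w
    for d in rel_docs:
        for t, w in d.items():
            new_q[t] += beta * w / len(rel_docs)
    for d in nonrel_docs:
        for t, w in d.items():
            new_q[t] -= gamma * w / len(nonrel_docs)
    # Clip negative weights to zero by dropping those terms.
    return {t: w for t, w in new_q.items() if w > 0}

q0 = {"jaguar": 1.0}
rel = [{"jaguar": 1.0, "car": 1.0}, {"car": 1.0, "engine": 1.0}]
nonrel = [{"jaguar": 1.0, "cat": 1.0}]
print(rocchio(q0, rel, nonrel))
```

Here the expanded query gains "car" and "engine" from the relevant documents, while "cat", which appears only in the non-relevant one, ends up with a negative weight and is dropped.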
Week 11:
● An Introduction to Embeddings. Word, Sentence and Document embeddings.
● Semantic retrieval: Some applications of embeddings for improving retrieval performance.
● Transformers and their applications for improving retrieval performance; ColBERT, RoBERTa, Sentence-BERT.
Week 12:
● Web search: crawling basics and architecture, shingling, Link analysis: HITS and PageRank algorithm. Basics of Search Engine Optimisation.
● A brief introduction to LLMs and their applications in Information Retrieval and Web Search.
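PageRank, from Week 12, can be computed by power iteration on a small link graph. The sketch assumes a damping factor of 0.85 (a commonly used value) and the convention that a page with no out-links spreads its rank uniformly over all pages. The graph and the function name are hypothetical, and real crawled graphs would need sparse data structures.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power iteration over {page: [out-links]}; dangling pages spread rank uniformly."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Teleportation share that every page receives.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new_rank[v] += share
            else:
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # converged
            break
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(links).items()):
    print(page, round(score, 4))
```

C receives rank from both A and B, so it ends up above B, while the scores always sum to 1 because each iteration redistributes the full probability mass.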

Taught by

Prof. Dwaipayan Roy
