BM25 Ranking

Text Representation & Classical NLP DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding BM25 Ranking, Term Frequency Saturation, Document Length Normalization, Inverse Document Frequency, BM25 Hyperparameters, Query Likelihood Estimation, Information Retrieval, Natural Language Processing, Probability and Statistics, Data Structures and Algorithms, Evaluation Metrics, Inverted Indexing, Probabilistic Ranking Models, Text Preprocessing, Vector Space Modeling, Corpus Statistics.

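Implement the BM25 ranking function to score each document in a corpus against a query in an information retrieval setting. BM25 refines TF-IDF in two ways: term frequency saturation, so that repeated occurrences of a query term yield diminishing gains (controlled by a parameter usually denoted k1), and document length normalization, whose strength is configurable (usually denoted b, where 0 disables the length penalty and 1 applies it fully).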
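One possible shape for a solution is sketched below. The problem statement does not fix the exact formula or defaults, so several things here are assumptions: the function name `bm25_scores`, pre-tokenized input (lists of strings), the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)`. Other BM25 variants differ mainly in the IDF term.

```python
import math
from collections import Counter


def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Return one BM25 score per document in `corpus` for `query`.

    `corpus` is a list of token lists and `query` is a token list.
    The name, defaults, and IDF variant are illustrative assumptions.
    """
    n_docs = len(corpus)
    if n_docs == 0:
        return []

    doc_lens = [len(doc) for doc in corpus]
    avgdl = sum(doc_lens) / n_docs
    term_freqs = [Counter(doc) for doc in corpus]

    # Document frequency: number of documents containing each query term.
    df = {t: sum(1 for tf in term_freqs if t in tf) for t in set(query)}

    scores = []
    for tf, dl in zip(term_freqs, doc_lens):
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue  # an absent term contributes nothing
            # Non-negative IDF variant (assumed, not given by the problem).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: b=0 ignores length, b=1 applies it fully.
            norm = 1 - b + b * dl / avgdl
            # Saturation: the fraction approaches (k1 + 1) as f grows.
            score += idf * f * (k1 + 1) / (f + k1 * norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["the", "cat", "sat"],
        ["the", "dog", "barked", "loudly"],
        ["cat", "cat", "cat", "and", "more", "cat"],
    ]
    for doc, s in zip(docs, bm25_scores(docs, ["cat"])):
        print(f"{s:.4f}  {' '.join(doc)}")
```

Computing the length factor only for documents that contain the term also sidesteps division by zero when every document is empty. A production version would normally precompute document frequencies in an inverted index instead of rescanning the corpus for each query.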