BLEU Score for Text Generation

Text Generation & NLP Evaluation DS practice problem on Onlearn.

Difficulty: medium.

Topics: BLEU Score for Text Generation, N-gram Precision, Brevity Penalty, Modified Unigram Count, Geometric Mean, Reference Corpus, Natural Language Processing, Information Theory, Statistical Learning, Software Engineering, Data Science, Sequence Modeling, Evaluation Metrics, Corpus Linguistics, Performance Benchmarking, Text Preprocessing.

Implement the BLEU (Bilingual Evaluation Understudy) score, a metric widely used to evaluate the quality of machine-generated text by comparing it against one or more reference texts. Given a candidate sentence (as a list of tokens), a list of reference sentences (each as a list of tokens), and a maximum n-gram order, compute the BLEU score. Your function should:

1. Calculate the modified n-gram precision for each n from 1 to the maximum order, clipping each candidate n-gram count at the highest count found in any single reference so that repetition cannot inflate the score.
2. Apply a brevity penalty to discourage overly short translations.
3. Combine the precisions using a geometric mean.
4. Return 0.0 if any n-gram precision is zero or if the candidate is empty.
5. When selecting the reference length for the brevity penalty with multiple references, choose the length closest to the candidate length (if tied, choose the shorter one).
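One possible way to meet these requirements is sketched below in Python. It is a minimal illustration, not a reference solution: the names `ngram_counts` and `bleu_score` and the default `max_n=4` are assumptions made for this example, the precisions are weighted uniformly, and no smoothing is applied. The brevity penalty follows the standard definition, 1 when the candidate is longer than the chosen reference length r and exp(1 - r/c) otherwise, where c is the candidate length. A candidate shorter than `max_n` tokens has no n-grams at the highest orders, and this sketch treats that as a zero precision.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_score(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (a sketch)."""
    if not candidate or not references or max_n < 1:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            # Assumption: a candidate too short to contain any n-gram of
            # this order is scored as a zero precision.
            return 0.0

        # For each n-gram, record the highest count seen in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)

        # Clip candidate counts so repeated n-grams cannot inflate precision.
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)

    # Reference length closest to the candidate length; ties go to the shorter.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))

    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the precisions, computed in log space.
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

Summing logarithms and exponentiating once gives the geometric mean without multiplying many small fractions together. Calling `bleu_score(["the"] * 7, [["the", "cat", "is", "on", "the", "mat"]], max_n=1)` shows the clipping at work: only two of the seven occurrences of "the" are credited, so the unigram precision is 2/7.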