Elo Rating System for Model Comparison

Retrieval & Ranking Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Elo Rating System for Model Comparison, Logistic Regression, Maximum Likelihood Estimation, Normalized Discounted Cumulative Gain, Learning Rate Scheduling, Confidence Intervals, Game Theory, Statistical Inference, Information Retrieval, Optimization Theory, Performance Evaluation, Bradley-Terry Models, Bayesian Updating, Ranking Metrics, Stochastic Approximation, A/B Testing Frameworks.

Implement a function to update Elo ratings for machine learning models based on pairwise comparison results. The Elo rating system, originally designed for chess, is now widely used in ML to rank language models based on human preference data (e.g., the Chatbot Arena leaderboard).

Given:

- A dictionary of model names mapped to their current Elo ratings
- A list of match results, where each match is a tuple (model_a, model_b, result); result is 'a' if model_a wins, 'b' if model_b wins, or 'draw' for a tie
- A K-factor that controls rating sensitivity

Write a function elo_rating_update(ratings, matches, k_factor) that:

1. Processes matches sequentially (ratings update after each match)
2. For each match, calculates expected scores based on current ratings
3. Updates both models' ratings based on the actual outcome
4. Returns the final updated ratings dictionary

The expected score represents the probability of winning based on the rating difference. When a lower-rated model beats a higher-rated one, it gains more points than a higher-rated model gains for beating a weaker opponent.
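
One possible solution is sketched below. The problem statement does not give the expected-score formula, so this sketch assumes the standard chess convention: E_a = 1 / (1 + 10^((R_b - R_a) / 400)), with E_b = 1 - E_a, and a draw scored as 0.5 for each side. The helper expected_score, the decision to return a copy rather than mutate the input, and the ValueError on an unknown result are all assumptions made for illustration, not requirements stated in the problem.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B.

    Assumes the standard Elo logistic curve (base 10, scale 400);
    the problem statement does not specify these constants.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_rating_update(
    ratings: Dict[str, float],
    matches: List[Tuple[str, str, str]],
    k_factor: float,
) -> Dict[str, float]:
    """Apply matches in order and return the updated ratings."""
    # Assumption: work on a copy so the caller's dictionary is left untouched.
    updated = dict(ratings)

    # Actual score for model_a; model_b receives the complement.
    score_for_a = {"a": 1.0, "b": 0.0, "draw": 0.5}

    for model_a, model_b, result in matches:
        if result not in score_for_a:
            raise ValueError(f"unknown result: {result!r}")

        # Read both ratings before changing either, so each update
        # uses the pre-match values.
        r_a, r_b = updated[model_a], updated[model_b]
        e_a = expected_score(r_a, r_b)
        e_b = 1.0 - e_a

        s_a = score_for_a[result]
        s_b = 1.0 - s_a

        # Later matches see these new values (sequential processing).
        updated[model_a] = r_a + k_factor * (s_a - e_a)
        updated[model_b] = r_b + k_factor * (s_b - e_b)

    return updated
```

Under these assumptions, two models rated 1500 have an expected score of 0.5 each, so with a K-factor of 32 the winner moves to 1516 and the loser to 1484. The asymmetry described above also shows up: with K = 32, a model rated 1400 that beats one rated 1600 gains about 24.3 points, whereas the 1600-rated model would gain only about 7.7 points for winning the same pairing. Because each update adds and subtracts the same amount, the sum of the two ratings is preserved after every match.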