Quality Filtering with Rejection Sampling
Retrieval & Ranking Systems DS practice problem on Onlearn.
Difficulty: medium.
Topics: Quality Filtering with Rejection Sampling, Multi-Head Self-Attention, Spatial Max Pooling, Proximal Policy Optimization, Gibbs Sampling, Long Short-Term Memory Cells, Natural Language Processing, Computer Vision, Reinforcement Learning, Probabilistic Graphical Models, Time Series Analysis, Transformer Architectures, Convolutional Neural Networks, Policy Gradient Methods, Bayesian Inference, Recurrent Neural Networks.
Implement a function that performs quality-based rejection sampling on a set of generated candidate samples. This technique is widely used in machine learning pipelines where a model generates multiple candidate outputs and only those meeting a minimum quality standard are retained.

Given a list of quality scores for N generated samples, a quality threshold, and an optional maximum number of samples to select (n_select), your function should:

1. Reject all samples whose quality score falls below the threshold (a score equal to the threshold is accepted).
2. Rank the accepted samples by quality score in descending order.
3. If n_select is provided and is less than the number of accepted samples, keep only the top n_select samples.
4. Compute summary statistics about the filtering process.

Your function should return a dictionary with:

'accepted_indices': a list of integer indices of the final selected samples, sorted by score in descending order.
'acceptance_rate': the fraction of the original N samples that passed the threshold, computed before any n_select truncation and rounded to 4 decimal places.
'mean_quality': the mean quality score of the final selected samples, rounded to 4 decimal places, or 0.0 if no samples are accepted.
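The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the function name `rejection_sample`, the parameter names, and the underscore spelling of the dictionary keys are assumptions. The statement also does not say how to handle an empty input list or tied scores, so the sketch assumes an acceptance rate of 0.0 for empty input and breaks ties in favor of the lower index.

```python
from typing import Dict, List, Optional, Union


def rejection_sample(
    scores: List[float],
    threshold: float,
    n_select: Optional[int] = None,
) -> Dict[str, Union[List[int], float]]:
    """Filter samples by a quality threshold and keep the top-ranked ones."""
    # Step 1: reject samples whose score falls below the threshold.
    accepted = [i for i, s in enumerate(scores) if s >= threshold]

    # Step 2: rank accepted indices by score, highest first.
    # Assumption: ties are broken by the lower original index.
    accepted.sort(key=lambda i: (-scores[i], i))

    # The acceptance rate uses the count before any top-n truncation.
    # Assumption: an empty input yields a rate of 0.0.
    rate = len(accepted) / len(scores) if scores else 0.0

    # Step 3: keep only the top n_select samples when a smaller limit is given.
    # Assumption: n_select, when provided, is a non-negative integer.
    if n_select is not None and n_select < len(accepted):
        selected = accepted[:n_select]
    else:
        selected = accepted

    # Step 4: summary statistics over the final selection.
    if selected:
        mean_quality = sum(scores[i] for i in selected) / len(selected)
    else:
        mean_quality = 0.0

    return {
        "accepted_indices": selected,
        "acceptance_rate": round(rate, 4),
        "mean_quality": round(mean_quality, 4),
    }


if __name__ == "__main__":
    print(rejection_sample([0.9, 0.4, 0.75, 0.6, 0.95], threshold=0.6, n_select=2))
```

With the scores in the usage line, four of the five samples pass the threshold, so the acceptance rate is 0.8, while the selection is truncated to indices 4 and 0, the two highest-scoring samples. Computing the rate before truncation keeps it a measure of generator quality rather than of the chosen n_select.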