Knowledge Distillation Loss

Sequence Models & Generative Models DS practice problem on Onlearn.

Difficulty: medium.

Topics: Knowledge Distillation Loss, Temperature Scaling, Logit Matching, Cross-Entropy Loss, Weight Pruning, Feature Map Alignment, Information Theory, Optimization Theory, Probabilistic Graphical Models, Neural Network Architectures, Supervised Learning, Kullback-Leibler Divergence, Stochastic Gradient Descent, Softmax Normalization, Teacher-Student Frameworks, Model Compression.

Implement the knowledge distillation loss used to transfer capabilities from a large teacher model to a smaller student model. Distillation trains the student to match the teacher's soft probability distribution rather than hard labels, which lets smaller models approach the performance of their larger counterparts. Compute the KL divergence between the temperature-softened teacher and student distributions, where each distribution is the softmax of the model's logits divided by a temperature T.
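One way to approach the task is sketched below for a single example, using only the Python standard library. The function names, the default temperature of 2.0, and the `scale_by_t2` flag are illustrative assumptions and not part of the problem statement. Multiplying by T squared is a common convention for keeping gradient magnitudes comparable across temperatures, and the expected answer format on the platform may or may not include it.

```python
import math


def log_softmax(logits, temperature):
    """Log of softmax(logits / temperature), computed with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0, scale_by_t2=True):
    """KL(teacher || student) between temperature-softened distributions.

    Assumption: the teacher is the reference distribution, so the sum is
    weighted by the teacher's probabilities.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must have the same number of classes")

    log_p = log_softmax(teacher_logits, temperature)  # teacher log-probabilities
    log_q = log_softmax(student_logits, temperature)  # student log-probabilities

    # KL(p || q) = sum_i p_i * (log p_i - log q_i)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

    # Optional T^2 factor (assumed convention, see the note above).
    return kl * temperature ** 2 if scale_by_t2 else kl


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.5]
    student = [1.0, 2.0, 3.0]
    print(distillation_loss(student, teacher, temperature=2.0))
```

The loss is zero when the student's logits equal the teacher's (or differ only by a constant shift) and positive otherwise. For a batch, the per-example values would typically be averaged.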
