Reading
books and papers, current and past
currently reading (6)
- On-Policy Distillation of Language Models
  Distilling from a teacher on the student's own samples; the paper the recent self-distillation wave builds on.
- Understanding Self-Distillation and Privileged Information Distillation
  A clear synthesis tying the recent self- and privileged-information distillation papers together.
- Reinforcement Learning via Self-Distillation
  Reframing RL fine-tuning through a distillation lens.
- Self-Distillation Enables Continual Learning
  Self-distillation as a remedy for catastrophic forgetting.
- Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
  On-policy self-distillation aimed squarely at reasoning models.
- On-Policy Distillation
  Thinking Machines' accessible walkthrough of the idea, with the intuition front and centre.
queued (3)
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Where GRPO was introduced; pairs directly with the nano-math-reasoner project.
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Expert specialization in MoE; queued for the systems side.
- Proximal Policy Optimization Algorithms
  The policy-gradient workhorse; back to the source before the GRPO lineage.