Blog

notes, blogs, and lab logs

updated jun 2026

  1. jun 2026 RL for Language Models, From First Principles: from probability basics to REINFORCE, PPO, GRPO, GSPO, and the knobs that make training work