jeetganatra.me
updated jun 2026
jun 2026
RL for Language Models, From First Principles
From probability basics to REINFORCE, PPO, GRPO, GSPO, and the knobs that make training work