fine-tuning nanochat on math
does GRPO actually help on AIME problems?
a public notebook of works-in-progress
These are the experiments and questions I am chewing on this month. Updated when I remember to.
LLM-as-judge, but with rubrics.
how much latency can a draft model save?
still finding new things in the same paper.
what breaks past 100k tokens?
why did training spike at step 11k?