a working catalogue
post-training a 0.6B model to reason about math — GRPO vs. distillation
reproducing the chat model training pipeline, end to end.
124M params, custom training loop