Lab — Jeet Ganatra

fine-tuning nanochat on math

does GRPO actually help on AIME problems?

agentic eval harness

LLM-as-judge, but with rubrics.

speculative decoding from scratch

how much latency can a draft model save?

reading: attention is all you need (5th time)

still finding new things in the same paper.

long-context experiments

what breaks past 100k tokens?

loss curve forensics

why did training spike at step 11k?

Reading

Scaling Laws for Neural Language Models — Kaplan et al.
Software Engineering at Google — Titus Winters et al.