Lab

a public notebook of works-in-progress

These are the experiments and questions I am chewing on this month. Updated when I remember to.

idea: better evals lead to better models.

fine-tuning nanochat on math

does GRPO actually help on AIME problems?
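
before answering that, the piece worth pinning down is the advantage computation. GRPO (group relative policy optimization) drops PPO's learned value network and instead normalizes each sampled answer's reward against its own group of G samples. a minimal sketch of that step, assuming a binary answer-match reward (my stand-in for an AIME grader):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages, no critic needed.

    rewards: (num_prompts, G) scores for G sampled solutions per
    problem, e.g. 1.0 if the boxed answer matches, else 0.0.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    # each answer is scored relative to its siblings on the same problem
    return (rewards - mean) / (std + eps)
```

one consequence to watch on AIME: a group where every sample is wrong (or every sample is right) has zero advantage everywhere, so the hardest problems can contribute no gradient at all.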

agentic eval harness

LLM-as-judge, but with rubrics.
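
the rubric part is the interesting bit: instead of one holistic 1-10 judgment, score each criterion in its own call and aggregate. a sketch under assumptions, where `judge` is any text-in/text-out model client and the rubric entries are placeholders:

```python
import json
from typing import Callable

# illustrative criteria; a real rubric would be task-specific
RUBRIC = {
    "grounding": "Every claim is supported by the provided context.",
    "completeness": "All parts of the task are addressed.",
    "tool_use": "Tool calls are well-formed and actually needed.",
}

def judge_with_rubric(task: str, answer: str,
                      judge: Callable[[str], str]) -> dict[str, int]:
    """One judge call per criterion, each returning a small JSON blob."""
    scores = {}
    for name, desc in RUBRIC.items():
        prompt = (
            f"Task:\n{task}\n\nAnswer:\n{answer}\n\n"
            f"Criterion: {desc}\n"
            'Reply with JSON only: {"score": <0-3>, "reason": "<one line>"}'
        )
        scores[name] = json.loads(judge(prompt))["score"]
    return scores
```

the bet (untested) is that narrow per-criterion questions are less noisy than one holistic score, at the cost of more judge calls.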

speculative decoding from scratch

how much latency can a draft model save?
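
to even measure that, you need the accept/reject loop. here is a minimal single-round sketch in the style of Leviathan et al. (2023): the draft proposes k tokens, the target scores all of them in one forward pass, and each token is accepted with probability min(1, p/q). `target` and `draft` stand in for any causal LMs mapping ids of shape (1, T) to logits of shape (1, T, V); there is no KV caching, so this shows the algorithm rather than the latency win:

```python
import torch

@torch.no_grad()
def speculate_once(target, draft, ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1. draft proposes k tokens autoregressively (the cheap part)
    drafted = ids
    for _ in range(k):
        nxt = torch.multinomial(draft(drafted)[:, -1].softmax(-1), 1)
        drafted = torch.cat([drafted, nxt], dim=-1)

    # 2. target scores all k drafted positions in one forward pass
    p = target(drafted)[:, -k - 1:-1].softmax(-1)  # target probs for drafted slots
    q = draft(drafted)[:, -k - 1:-1].softmax(-1)   # draft probs for the same slots

    # 3. accept token i with prob min(1, p/q); resample at first rejection
    out = ids
    for i in range(k):
        tok = drafted[0, ids.shape[1] + i]
        if torch.rand(()) < (p[0, i, tok] / q[0, i, tok]).clamp(max=1.0):
            out = torch.cat([out, tok.view(1, 1)], dim=-1)
        else:
            resid = (p[0, i] - q[0, i]).clamp(min=0)  # residual distribution
            tok = torch.multinomial(resid / resid.sum(), 1).view(1, 1)
            out = torch.cat([out, tok], dim=-1)
            break
    else:
        # all k accepted: sample one bonus token from the target
        bonus = torch.multinomial(target(out)[:, -1].softmax(-1), 1)
        out = torch.cat([out, bonus], dim=-1)
    return out
```

the accept/resample rule keeps the output distribution identical to sampling from the target alone, so the latency saved comes down to acceptance rate and the draft/target cost ratio.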

reading: Attention Is All You Need (5th time)

still finding new things in the same paper.

long-context experiments

what breaks past 100k tokens?
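
one concrete probe worth running first: needle-in-a-haystack, i.e. bury a single retrievable fact at a controlled depth and sweep depth against context length to see where recall dies (the "lost in the middle" pattern). a self-contained sketch; the repeated filler sentence is a deliberate simplification, and word count is only a rough proxy for tokens:

```python
def build_haystack(needle: str, depth: float, n_words: int = 100_000) -> str:
    """Bury `needle` at relative `depth` (0.0 = start, 1.0 = end)
    inside roughly n_words of filler text."""
    sentence = "The sky was a pale shade of grey that morning."  # 10 words
    filler = [sentence] * (n_words // 10)
    filler.insert(int(depth * len(filler)), needle)
    return " ".join(filler)

context = build_haystack(
    needle="The magic number for the experiment is 7481.",
    depth=0.5,  # the middle, where retrieval tends to dip first
)
prompt = context + "\n\nWhat is the magic number for the experiment?"
```

a real harness would use diverse filler so the needle isn't trivially salient, but even this version separates "can't attend that far" from "can't find it in the middle".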

loss curve forensics

why did training spike at step 11k?
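
step one is finding spikes mechanically rather than by eyeballing the curve. a sketch using a rolling median and MAD, which keeps the baseline robust to the very outliers being hunted; the threshold is a guess to tune:

```python
import numpy as np

def find_spikes(loss: np.ndarray, window: int = 200, thresh: float = 6.0) -> list[int]:
    """Return step indices where loss sits far above a robust
    trailing baseline (median/MAD over the previous `window` steps)."""
    spikes = []
    for t in range(window, len(loss)):
        ref = loss[t - window:t]
        med = np.median(ref)
        mad = np.median(np.abs(ref - med)) + 1e-8
        if (loss[t] - med) / (1.4826 * mad) > thresh:  # robust z-score
            spikes.append(t)
    return spikes
```

once the step is pinned down, the usual suspects are a bad data shard, a learning-rate schedule kink, or optimizer state; replaying the exact offending batch from a checkpoint just before the spike usually settles which.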

Reading

  • Scaling Laws for Neural Language Models — Kaplan et al.
  • Software Engineering at Google — Titus Winters et al.