Python · NCCL · bfloat16 · 2026
GPT-2, from scratch
124M params, custom training loop
A faithful reproduction of GPT-2 (small) — same architecture, same tokenizer, same hyperparameters — trained on a curated subset of OpenWebText. The goal was to inhabit each design decision rather than just consume the result.
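For concreteness, a minimal sketch of the shape being reproduced. The hyperparameters below are the published GPT-2 (small) values; the dataclass and the `param_count` helper are illustrative, not this project's actual code.

```python
# Published GPT-2 (small) hyperparameters; the dataclass itself is
# an illustrative stand-in, not this project's config file.
from dataclasses import dataclass

@dataclass
class GPT2SmallConfig:
    vocab_size: int = 50257   # GPT-2 BPE tokenizer vocabulary
    block_size: int = 1024    # context length
    n_layer: int = 12         # transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / hidden dimension

def param_count(cfg: GPT2SmallConfig) -> int:
    """Parameter count, assuming tied input/output embeddings (as in GPT-2)."""
    wte = cfg.vocab_size * cfg.n_embd      # token embeddings
    wpe = cfg.block_size * cfg.n_embd      # position embeddings
    # per block: attention (qkv + output proj) + MLP (4x expansion) + 2 layernorms
    attn = 4 * cfg.n_embd * cfg.n_embd + 4 * cfg.n_embd
    mlp = 8 * cfg.n_embd * cfg.n_embd + 5 * cfg.n_embd
    ln = 4 * cfg.n_embd
    return wte + wpe + cfg.n_layer * (attn + mlp + ln) + 2 * cfg.n_embd

print(f"{param_count(GPT2SmallConfig()) / 1e6:.1f}M parameters")  # 124.4M
```

The 124M figure in the subtitle falls out of exactly this arithmetic once the output head shares weights with the token embedding.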
Companion essay: On reproducing GPT-2 from scratch.
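And a hedged sketch of what one step of the custom training loop might look like under the bfloat16 tag above. This is a single-GPU sketch under assumed names (`model`, `optimizer`, the batch tensors), not the project's actual loop; in the multi-GPU case the model would additionally be wrapped in `DistributedDataParallel` over the NCCL backend.

```python
# Hypothetical single-step sketch of a bfloat16 training loop.
# All names here are illustrative stand-ins.
import torch
import torch.nn.functional as F

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    # Forward pass under bfloat16 autocast; master weights stay fp32,
    # so no gradient scaler is needed (unlike fp16).
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(x)  # (batch, seq, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    loss.backward()
    # Global-norm gradient clipping at 1.0, a common choice for GPT-2 training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```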