PYTHON · NCCL · BFLOAT16 · 2026

GPT-2, from scratch

124M params, custom training loop

[ shipped ]

A faithful reproduction of GPT-2 (small) — same architecture, same tokenizer, same hyperparameters — trained on a curated subset of OpenWebText. The goal was to inhabit each design decision rather than just consume the result.
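For reference, a minimal sketch of the GPT-2 (small) configuration being reproduced. The dataclass and its field names are illustrative rather than the project's actual code, but the values are the published GPT-2 small hyperparameters, and the rough weight count lands on the ~124M figure above.

    from dataclasses import dataclass

    @dataclass
    class GPT2SmallConfig:
        vocab_size: int = 50257  # GPT-2 BPE tokenizer vocabulary
        block_size: int = 1024   # context length
        n_layer: int = 12        # transformer blocks
        n_head: int = 12         # attention heads per block
        n_embd: int = 768        # model width

    def approx_params(c: GPT2SmallConfig) -> int:
        # Rough count: token + position embeddings, plus 12*d^2 per block
        # (4*d^2 attention, 8*d^2 MLP); biases and LayerNorms omitted,
        # output head tied to the token embedding.
        emb = (c.vocab_size + c.block_size) * c.n_embd
        per_block = 12 * c.n_embd * c.n_embd
        return emb + c.n_layer * per_block

    print(f"{approx_params(GPT2SmallConfig()) / 1e6:.0f}M")  # -> 124M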
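And since the tags above mention bfloat16 and NCCL, a sketch of what one step of the custom loop could look like. Everything here is an assumption, not the project's code: the gradient-clip value of 1.0 is a common GPT-2-era choice, and NCCL would enter as the communication backend once the model is wrapped in DistributedDataParallel.

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, batch, device="cuda"):
        # Hypothetical step; the source doesn't show its actual loop.
        x, y = (t.to(device, non_blocking=True) for t in batch)
        # bfloat16 keeps fp32's exponent range, so no GradScaler is needed.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            logits = model(x)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # assumed clip value
        optimizer.step()
        return loss.item()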

Companion essay: On reproducing GPT-2 from scratch.