GuppyLM: A 9-Million-Parameter LLM Built in 130 Lines of PyTorch That Trains in 5 Minutes on a Free GPU
A developer has built GuppyLM, a tiny but functional language model with 9 million parameters, trained on 60,000 synthetic conversations using a vanilla transformer architecture written in roughly 130 lines of PyTorch. It trains to conversational competence in about 5 minutes on a free Google Colab T4 GPU. The project has drawn 892 points on Hacker News, where developers call it the clearest educational LLM implementation they have seen.

D.O.T.S AI Newsroom
AI News Desk
The best way to understand how language models work is to build one from scratch at a scale that fits inside your head. GuppyLM, a GitHub project by developer Arman Basak that cracked the top of Hacker News this week with 892 points, makes that possible in under an hour. The model has 9 million parameters, orders of magnitude smaller than any state-of-the-art frontier model, and is implemented in approximately 130 lines of PyTorch. Training it to produce coherent, context-aware conversational responses takes about 5 minutes on a free T4 GPU in Google Colab.
What GuppyLM Actually Does
The architecture is a standard transformer decoder — the same fundamental design used by GPT-4, Claude, Llama, and every other major language model. GuppyLM uses multi-head self-attention, positional embeddings, and a standard autoregressive training loop. The training dataset is 60,000 synthetic conversations generated to cover a range of topics. The model learns to predict the next token given the prior context, exactly as large models do — the difference is purely one of scale. At 9M parameters, GuppyLM's outputs are limited: it handles simple conversational patterns but cannot reason, retrieve facts, or generalize to novel domains. But the implementation is clean enough that every component maps clearly to concepts described in the original "Attention Is All You Need" paper.
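The repository has the real thing; as a rough sketch of the pattern described above, not GuppyLM's actual code, the snippet below builds a small decoder-only transformer out of stock PyTorch layers and runs a single next-token training step. The layer sizes, vocabulary size, and random batch are placeholder assumptions.

```python
# A minimal sketch of a decoder-only transformer: learned positional
# embeddings, causally masked multi-head self-attention, and a
# next-token prediction objective. Illustrative only; hyperparameters
# are placeholders, not GuppyLM's actual values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=8000, d_model=128, n_heads=4,
                 n_layers=4, max_len=256):
        super().__init__()
        self.max_len = max_len
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positions
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)         # logits over vocab

    def forward(self, idx):
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may attend only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)                                # (batch, seq, vocab)

model = TinyDecoder()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One autoregressive training step on a stand-in random batch: the
# target at every position is simply the next token in the sequence.
tokens = torch.randint(0, 8000, (8, 65))
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
opt.step()
```

Everything here is standard library plumbing, which is the point the article makes about scale: shrink the embedding width, layer count, and vocabulary, and the same design that powers frontier models fits in a screenful of code.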
Why Educational Implementations Matter
The HN response (134 comments, many from practitioners) suggests GuppyLM fills a gap. Andrej Karpathy's nanoGPT is the canonical educational LLM implementation, but even nanoGPT requires non-trivial setup and several hours of training to produce meaningful outputs. GuppyLM is optimized for immediate comprehension: the 130-line implementation is short enough to read in a sitting, the training loop is fast enough to complete before interest fades, and the conversational output format makes it clear what the model is doing even at small scale. The project is open source under the MIT license.
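For readers who want to see the autoregressive half of the loop, here is an equally small generation sketch that continues the TinyDecoder example above; GuppyLM's own generation code may look different, and the prompt and sampling parameters are illustrative.

```python
# Autoregressive generation for the TinyDecoder sketch above:
# repeatedly predict a distribution over the next token, sample one,
# append it to the context, and continue.
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=50, temperature=1.0):
    model.eval()                                       # disable dropout
    idx = prompt_ids
    for _ in range(max_new_tokens):
        ctx = idx[:, -model.max_len:]                  # trim to context window
        logits = model(ctx)[:, -1, :] / temperature    # next-token logits only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx

# Example: continue a made-up three-token prompt by 20 tokens.
out = generate(model, torch.tensor([[1, 42, 7]]), max_new_tokens=20)
```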