Build a Large Language Model (From Scratch)
This book was useful because it does not treat large language models as magic.
Its value is that it walks through the stack concretely:
- tokenization and embeddings
- attention and transformer blocks
- training mechanics
- implementation details that matter when moving from toy ideas to actual models
It is one of the better resources for turning abstract LLM concepts into code you can reason about.
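To give a sense of what "code you can reason about" can look like, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is not taken from the book; the function names `softmax` and `attention` and the toy inputs are assumptions made for illustration, and a real implementation would use a tensor library, learned projection matrices, and masking.

```python
import math


def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (hypothetical helper).

    Each query is compared with every key, the scores are scaled by
    sqrt(d_k) and normalized with softmax, and the output is the
    weighted sum of the value vectors.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        )
    return outputs


# Toy input: three 2-dimensional "token" vectors attending to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Each row of `mixed` is a blend of the input vectors, weighted by how strongly that token's query matches every key. Writing it with explicit loops is slow, but it keeps every step visible.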