Building My Own LLaMA-2 Model
Implemented modern LLM components from scratch, including RoPE, grouped-query attention, and LoRA fine-tuning.
Built during Carnegie Mellon’s Generative AI course (10-623) in January and February 2025.
What I implemented
I rebuilt key pieces of a modern language model from scratch to understand how design choices inside LLMs affect memory, speed, and adaptation.
Highlights
- Implemented Rotary Position Embeddings (RoPE), which encode position by rotating query and key vectors so that attention scores depend on relative distance between tokens.
- Implemented Grouped-Query Attention (GQA), which shares each key/value head across a group of query heads, shrinking the KV cache and speeding up inference.
- Applied instruction fine-tuning with LoRA on the attention projections, training only small low-rank adapter matrices instead of the full weights.
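To make the first bullet concrete, here is a minimal NumPy sketch of the rotation RoPE applies. This is illustrative only, not the course code: the function name, the `offset` parameter, and the even/odd dimension pairing are my own choices. Each pair of feature dimensions is rotated by an angle proportional to the token's position, so the dot product between a rotated query and key depends only on their relative distance.

```python
import numpy as np

def rope(x, offset=0, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Pair of dims (2i, 2i+1) at position p is rotated by angle
    p * theta_i, where theta_i = base^(-2i/dim). `offset` shifts the
    absolute positions (useful when processing a sequence in chunks).
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)          # (half,)
    positions = np.arange(seq_len) + offset                 # (seq_len,)
    angles = np.outer(positions, freqs)                     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                         # even/odd dims
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                      # 2D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The key property this buys: rotating a query at position m and a key at position n, then taking their dot product, gives the same score as positions m+k and n+k, which is what makes the encoding relative rather than absolute.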
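The memory saving from GQA comes from storing fewer key/value heads than query heads. A sketch of the idea, under my own naming (shapes and the causal mask are illustrative, not the course implementation): each group of `n_heads // n_kv_heads` query heads attends against one shared KV head.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Causal grouped-query attention.

    q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_heads // n_kv_heads query heads shares one KV head,
    so the KV cache is n_kv_heads / n_heads the size of full MHA.
    """
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=0)       # expand shared KV heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores += np.triu(np.full((seq, seq), -np.inf), k=1)   # causal mask
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)              # softmax
    return weights @ v
```

With `n_kv_heads == n_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` it is multi-query attention; GQA sits between the two.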
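The LoRA bullet can be sketched in a few lines as well. This is a schematic of the technique, not the actual fine-tuning code: the class name and hyperparameter defaults are my own. The pretrained weight W stays frozen, and only a low-rank update (alpha/r) * B @ A is trained, with B initialized to zero so the adapter starts as a no-op.

```python
import numpy as np

class LoRALinear:
    """Frozen linear weight W plus a trainable low-rank adapter.

    Effective weight: W + (alpha / r) * B @ A. Only A and B are
    trained: r * (d_in + d_out) parameters instead of d_out * d_in.
    """
    def __init__(self, W, r=4, alpha=8, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weight
        self.A = rng.normal(0, 0.01, size=(r, d_in))    # small random init
        self.B = np.zeros((d_out, r))                   # zero init: no-op at start
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * (self.B @ self.A)).T
```

Applying adapters like this to the attention projections is what makes the fine-tuning cheap: for a 512-by-512 projection with r=4, the adapter trains roughly 4k parameters instead of 262k.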
Why this project mattered
This project forced me to go below the API layer and work at the level of model mechanics. It was a useful bridge between using LLMs and actually understanding how they are built.