Building My Own LLaMA-2 Model
Implemented modern LLM components from scratch, including RoPE, grouped-query attention, and LoRA fine-tuning.
Built during Carnegie Mellon’s Generative AI course (10-623) in January and February 2025.
What I implemented
I rebuilt key pieces of a modern language model from scratch to understand how design choices inside LLMs affect memory, speed, and adaptation.
Highlights
- Implemented Rotary Position Embeddings (RoPE), which encode position by rotating query and key vectors so that attention scores depend on relative distance between tokens.
- Implemented Grouped-Query Attention (GQA), which shares each key/value head across a group of query heads, shrinking the KV cache and speeding up inference.
- Applied instruction fine-tuning with LoRA on the attention projections, training only small low-rank adapter matrices instead of the full weights.
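To make the first bullet concrete, here is a minimal NumPy sketch of the rotation RoPE applies. This is illustrative only, not the course code: the function name, the `offset` parameter, and the even/odd dimension pairing are my own choices. Each pair of feature dimensions is rotated by an angle proportional to the token's position, so the dot product between a rotated query and key depends only on their relative distance.

```python
import numpy as np

def rope(x, offset=0, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Pair of dims (2i, 2i+1) at position p is rotated by angle
    p * theta_i, where theta_i = base^(-2i/dim). `offset` shifts the
    absolute positions (useful when processing a sequence in chunks).
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)          # (half,)
    positions = np.arange(seq_len) + offset                 # (seq_len,)
    angles = np.outer(positions, freqs)                     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                         # even/odd dims
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                      # 2D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The key property this buys: rotating a query at position m and a key at position n, then taking their dot product, gives the same score as positions m+k and n+k, which is what makes the encoding relative rather than absolute.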
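The memory saving from GQA comes from storing fewer key/value heads than query heads. A sketch of the idea, under my own naming (shapes and the causal mask are illustrative, not the course implementation): each group of `n_heads // n_kv_heads` query heads attends against one shared KV head.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Causal grouped-query attention.

    q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_heads // n_kv_heads query heads shares one KV head,
    so the KV cache is n_kv_heads / n_heads the size of full MHA.
    """
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=0)       # expand shared KV heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores += np.triu(np.full((seq, seq), -np.inf), k=1)   # causal mask
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)              # softmax
    return weights @ v
```

With `n_kv_heads == n_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` it is multi-query attention; GQA sits between the two.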
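The LoRA bullet can be sketched in a few lines as well. This is a schematic of the technique, not the actual fine-tuning code: the class name and hyperparameter defaults are my own. The pretrained weight W stays frozen, and only a low-rank update (alpha/r) * B @ A is trained, with B initialized to zero so the adapter starts as a no-op.

```python
import numpy as np

class LoRALinear:
    """Frozen linear weight W plus a trainable low-rank adapter.

    Effective weight: W + (alpha / r) * B @ A. Only A and B are
    trained: r * (d_in + d_out) parameters instead of d_out * d_in.
    """
    def __init__(self, W, r=4, alpha=8, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weight
        self.A = rng.normal(0, 0.01, size=(r, d_in))    # small random init
        self.B = np.zeros((d_out, r))                   # zero init: no-op at start
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * (self.B @ self.A)).T
```

Applying adapters like this to the attention projections is what makes the fine-tuning cheap: for a 512-by-512 projection with r=4, the adapter trains roughly 4k parameters instead of 262k.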
Why this project mattered
This project forced me to go below the API layer and work at the level of model mechanics. It was a useful bridge between using LLMs and actually understanding how they are built.