Building My Own LLaMA-2 Model

Implemented modern LLM components from scratch, including RoPE, grouped-query attention, and LoRA fine-tuning.

Built during Carnegie Mellon’s Generative AI course (10-623) in January and February 2025.

What I implemented

I rebuilt key pieces of a modern language model from scratch to understand how architectural choices affect memory use, inference speed, and the cost of adapting the model to new tasks.

Highlights

  • Implemented Rotary Position Embeddings (RoPE), which encode position by rotating query and key vectors so that attention scores depend on relative token distance rather than learned absolute positions.
  • Implemented Grouped-Query Attention (GQA), which shares each key/value head across several query heads to shrink the KV cache and speed up inference.
  • Applied instruction fine-tuning with LoRA on the attention layers, training only small low-rank update matrices while keeping the pretrained weights frozen.

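To make the RoPE bullet concrete, here is a minimal NumPy sketch of the idea (not the course implementation; the function name and the split-half pairing convention are my own assumptions). Each pair of dimensions is rotated by an angle proportional to the token's position, so the dot product between two rotated vectors depends only on their relative distance.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Position Embeddings to x of shape (seq_len, dim).

    Dimension pairs (i, i + dim//2) are rotated by pos * theta_i,
    where theta_i = base ** (-2i / dim). Illustrative sketch only.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per dimension pair.
    inv_freq = base ** (-np.arange(half) / half)
    # Rotation angle for every (position, pair) combination.
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each pair: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because position 0 gets a zero rotation, the embedding at position 0 is left unchanged, and equal position offsets produce equal attention scores.
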
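The GQA bullet can likewise be sketched in a few lines. This is an illustrative single-sequence version under my own shape conventions, not the course code: the key/value tensors carry fewer heads than the queries, and each KV head is repeated so a whole group of query heads attends against it.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Grouped-Query Attention (sketch, no batch dimension).

    Shapes:
      q:    (n_heads,    seq, head_dim)
      k, v: (n_kv_heads, seq, head_dim)
    Each group of n_heads // n_kv_heads query heads shares one
    key/value head, which shrinks the KV cache by that factor.
    """
    n_heads, seq, head_dim = q.shape
    group = n_heads // n_kv_heads
    # Repeat each KV head so `group` query heads share it.
    k = np.repeat(k, group, axis=0)   # (n_heads, seq, head_dim)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                # (n_heads, seq, head_dim)
```

With n_kv_heads equal to n_heads this reduces to standard multi-head attention; LLaMA-2 70B uses 8 KV heads for 64 query heads.
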
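Finally, a minimal sketch of the LoRA idea behind the fine-tuning bullet (my own toy class, not the course or PEFT implementation): the pretrained weight stays frozen, and only a low-rank update B @ A is trained, with B zero-initialized so the adapter starts as a no-op.

```python
import numpy as np

class LoRALinear:
    """Linear layer with a LoRA adapter: y = x @ (W + B @ A * scale).

    W (d_in x d_out) is the frozen pretrained weight; only the
    low-rank factors B (d_in x r) and A (r x d_out) would be trained.
    """
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = W.shape
        self.W = W                            # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_out))
        self.B = np.zeros((d_in, r))          # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        # Base path plus scaled low-rank update.
        return x @ self.W + (x @ self.B) @ self.A * self.scale
```

Training only A and B means the number of trainable parameters is r * (d_in + d_out) per adapted matrix, a small fraction of d_in * d_out, which is what makes instruction tuning on attention layers cheap.
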
Why this project mattered

This project forced me to go below the API layer and work at the model-mechanics level. It was a useful bridge between using LLMs and actually understanding how they are built.