From papers to practice: building LLMs from the ground up.

LLM FORGE (Building Large Language Models from Scratch)

This project is a collection of my implementations of popular language models and attention mechanisms, including the GPT-2 and LLaMA architectures, built with PyTorch.

Project Structure

├── Attentions/                  # Attention mechanism implementations
│   ├── GroupedQueryAttention.py # Implementation of Grouped Query Attention
│   ├── CausalAttention.py       # Implementation of Causal Attention
│   └── SimpleSelfAttetnion.py   # Implementation of a simple self-attention layer
│
├── Blocks/                      # Core building blocks for the models
│   ├── Activations.py           # Activation functions (GELU, SiLU)
│   ├── Configs.py               # Model configurations
│   ├── FeedForwards.py          # Feed-forward blocks for the GPT and Llama models
│   ├── Normalizations.py        # Layer and RMS Normalization implementations
│   ├── Positionals.py           # Positional encoding and RoPE implementations
│   └── Transformers.py          # Transformer blocks for GPT-2 and Llama
│
├── Models/                      # Main model implementations
│   ├── GPT2.py                  # GPT-2 model implementation
│   └── Llama.py                 # LLaMA model implementation
│
├── tokenizer.py                 # Llama 3 tokenizer implementation
└── trainGPT2.py                 # Training script for the GPT-2 model
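
As a quick orientation, here is a hypothetical usage sketch. It assumes Models/GPT2.py exposes a GPT2 class and Blocks/Configs.py a matching GPT2Config; the actual class and field names in the repository may differ.

    # Hypothetical usage; the names GPT2 and GPT2Config are assumptions about
    # what Models/GPT2.py and Blocks/Configs.py export.
    import torch
    from Blocks.Configs import GPT2Config   # assumed name
    from Models.GPT2 import GPT2            # assumed name

    model = GPT2(GPT2Config())                # assumed default GPT-2 settings
    tokens = torch.randint(0, 50257, (1, 8))  # dummy batch of GPT-2 token ids
    logits = model(tokens)                    # expected shape: (1, 8, vocab_size)
    print(logits.shape)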

Features

  • Implementation of GPT-2 architecture
  • Implementation of LLaMA architecture
  • Custom attention mechanisms, including Grouped Query Attention and Causal Attention (see the sketch after this list)
  • Modular design with separate blocks for transformer layers, normalizations, positional encodings, and configurations
  • Training script for the GPT-2 model
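
Below is a minimal sketch of the grouped-query attention idea, in the spirit of Attentions/GroupedQueryAttention.py: fewer key/value heads than query heads, with each key/value head shared by a group of query heads. The class name, constructor arguments, and defaults are illustrative assumptions, not the repository's actual API.

    # Illustrative Grouped Query Attention sketch; names are assumptions,
    # not the classes defined in Attentions/GroupedQueryAttention.py.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GroupedQueryAttentionSketch(nn.Module):
        def __init__(self, d_model, n_heads, n_kv_heads):
            super().__init__()
            assert d_model % n_heads == 0 and n_heads % n_kv_heads == 0
            self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
            self.head_dim = d_model // n_heads
            self.wq = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
            self.wk = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
            self.wv = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
            self.wo = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

        def forward(self, x):
            b, t, _ = x.shape
            q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
            v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
            # Each key/value head serves a group of (n_heads // n_kv_heads) query heads.
            group = self.n_heads // self.n_kv_heads
            k = k.repeat_interleave(group, dim=1)
            v = v.repeat_interleave(group, dim=1)
            # Causal scaled dot-product attention (PyTorch >= 2.0).
            out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            return self.wo(out.transpose(1, 2).reshape(b, t, -1))

    # Example: 8 query heads sharing 2 key/value heads.
    attn = GroupedQueryAttentionSketch(d_model=512, n_heads=8, n_kv_heads=2)
    print(attn(torch.randn(1, 16, 512)).shape)  # torch.Size([1, 16, 512])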

Papers Implemented

Training GPT-2

  1. Install Dependencies:
    pip install -r requirements.txt
    
  2. Train the Model:
    python -m trainGPT2
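
As a rough sketch of what one training step amounts to (the actual trainGPT2.py may differ in data loading, optimizer settings, checkpointing, and logging):

    # Sketch only: the shape of a single GPT-2 training step, not the contents
    # of trainGPT2.py. Model construction and data loading are assumed elsewhere.
    import torch.nn.functional as F

    def train_step(model, optimizer, inputs, targets):
        # inputs/targets: (batch, seq_len) token ids, with targets shifted by one token.
        logits = model(inputs)                          # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()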
    

Future Work

  1. LLaMA Training Implementation

    • Develop a comprehensive training pipeline for the LLaMA model
    • Add support for distributed training across multiple GPUs
  2. Model Fine-tuning

    • Implement fine-tuning capabilities for specific tasks
    • Support LoRA and other parameter-efficient fine-tuning methods (see the sketch after this list)
  3. LLM Research Implementation

    • Implement latest research papers in the field of LLMs
    • Implement different attention mechanisms
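
Illustrative only, since fine-tuning is still future work: one common way LoRA can be layered onto the existing linear projections is to wrap a frozen nn.Linear with a trainable low-rank update, roughly as follows.

    # Not part of the repository yet; a generic LoRA wrapper around nn.Linear,
    # shown as one possible direction for Future Work item 2.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base Linear plus a trainable low-rank update, scaled by alpha/rank."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                  # freeze pretrained weights
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x):
            # The low-rank path starts at zero (lora_b is zero-initialized), so the
            # wrapped layer initially behaves exactly like the frozen base layer.
            return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)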
