# Linearizing Llama-3.2-1B

Repository for code to linearize Llama-3.2-1B. Currently contains code only for linear attention + sliding window.
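As background, here is a minimal NumPy sketch of the general idea of combining linear (kernelized) attention with a sliding window of exact softmax attention. This is an illustrative sketch only, not this repository's implementation: the `elu(x) + 1` feature map, the window size, and the token-eviction scheme are all assumptions.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a common feature map for linear attention (an assumption here)
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_sliding_attention(q, k, v, window=4):
    """Sketch of hybrid attention: exact softmax attention inside a sliding
    window of recent tokens, linear attention over everything older."""
    T, d = q.shape
    out = np.zeros_like(v)
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = np.zeros((d, v.shape[1]))  # running sum of phi(k)^T v for far past
    z = np.zeros(d)                 # running sum of phi(k) for normalization
    for t in range(T):
        start = max(0, t - window + 1)
        # token leaving the window is folded into the linear-attention state
        if t - window >= 0:
            i = t - window
            kv += np.outer(phi_k[i], v[i])
            z += phi_k[i]
        # exact softmax attention over the current window
        scores = q[t] @ k[start:t + 1].T / np.sqrt(d)
        w = np.exp(scores)
        num = w @ v[start:t + 1]
        den = w.sum()
        # linear attention over tokens before the window (O(1) per step)
        num += phi_q[t] @ kv
        den += phi_q[t] @ z
        out[t] = num / den
    return out
```

When `window >= T` no token is ever evicted, so the sketch reduces to ordinary causal softmax attention; shrinking `window` trades exactness on distant tokens for constant per-step memory.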
- Create and activate a virtual environment:

  ```shell
  python -m venv .venv

  # Activate virtual environment
  # On Windows:
  .venv\Scripts\activate
  # On macOS/Linux:
  source .venv/bin/activate
  ```

- Install requirements:

  ```shell
  pip install -r requirements.txt
  ```
- Run the notebooks in the following order:

  1. Attention transfer: `Llama_attn_transfer.ipynb`
  2. LoRA finetune: `llama_lora_finetune.ipynb`
  3. Evaluation: `Linear_llama_eval_inference_speed.ipynb`, `MMLU_eval-0shot.ipynb`, `MMLU_eval-5shot.ipynb`