Skip to content

ruilimit/llama-cpu

Repository files navigation

Tested on Macbook Pro M1 Max -- pytorch nightly

LLaMA

This repository is intended as a minimal, hackable and readable example to load LLaMA (arXiv) models and run inference. In order to download the checkpoints and tokenizer, fill this google form

Setup

In a conda env with pytorch / cuda available, run

pip install -r requirements.txt

Then in this repository

pip install -e .

If you are using MPS commit, use these to disable mps backend memory limit + fallback

export PYTORCH_ENABLE_MPS_FALLBACK=1
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
torchrun --nproc_per_node 1 example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model

Different models require different MP values:

Model MP
7B 1
13B 2
33B 4
65B 8

Model Card

See MODEL_CARD.md

License

See the LICENSE file.

About

Inference code for LLaMA models (modified for cpu)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 92.9%
  • Shell 7.1%