Update README.md
Adding quick start steps
sekyondaMeta authored Sep 9, 2023
1 parent 2db73a5 commit 6c2f236

## Setup

You can follow the steps below to get up and running with the Llama 2 models quickly. These steps let you run inference locally; a consolidated shell sketch of the setup commands is included after the list. For more examples, see the [Llama 2 recipes repository](https://github.com/facebookresearch/llama-recipes).

1. In a conda env with PyTorch / CUDA available, clone and download this repository.

2. In the top-level directory, run:
```bash
pip install -e .
```
3. Visit the [Meta.AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and register to download the model(s).

4. Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.

5. Navigate to your downloaded llama repository and run the download.sh script.
- Make sure to grant execution permissions to the download.sh script
- During this process, you will be prompted to enter the URL from the email.
    - Do not use the “Copy Link” option; manually copy the link from the email instead.

6. Once the model(s) you want have been downloaded, you can run the model locally using the command below:
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 6
```
**Note**
- Replace `llama-2-7b-chat/` with the path to your checkpoint directory and `tokenizer.model` with the path to your tokenizer model.
- The `--nproc_per_node` flag should be set to the [MP](#inference) value for the model you are using (an example for a larger model is sketched after these notes).
- Adjust the `max_seq_len` and `max_batch_size` parameters as needed.
- This example runs example_chat_completion.py, but you can change it to a different .py file.
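
To illustrate the `--nproc_per_node` note above, here is a hedged sketch of the same command for a larger model. The checkpoint directory name `llama-2-13b-chat/` is only an assumed example of how a downloaded 13B chat checkpoint might be named; the MP value of 2 for the 13B models follows the [Inference](#inference) section below.

```bash
# Sketch only: the checkpoint directory name is an assumption; adjust it to
# match your actual download. 13B models use model parallelism of 2, so
# --nproc_per_node is set to 2.
torchrun --nproc_per_node 2 example_chat_completion.py \
    --ckpt_dir llama-2-13b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```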
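
For convenience, the setup steps above can also be run as one shell session. This is a minimal sketch rather than part of the official instructions: the repository URL and the presence of an active conda environment with PyTorch / CUDA are assumptions, and the presigned download URL still has to be copied manually from the registration email.

```bash
# Sketch only: assumes the repository is github.com/facebookresearch/llama
# and that a conda env with PyTorch / CUDA is already active (steps 1-2).
git clone https://github.com/facebookresearch/llama.git
cd llama

# Step 2: install the package in editable mode.
pip install -e .

# Step 5: make the download script executable and run it.
# It will prompt for the presigned URL from the registration email.
chmod +x download.sh
./download.sh
```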

## Inference
