Update README.md
Adding quick start steps
sekyondaMeta authored Sep 9, 2023
1 parent 2db73a5 commit 6c2f236

## Setup

You can follow the steps below to get up and running with the Llama 2 models quickly. These steps let you run inference locally; a consolidated shell sketch of the setup commands is included after the list. For more examples, see the [Llama 2 recipes repository](https://github.com/facebookresearch/llama-recipes).

1. In a conda env with PyTorch / CUDA available, clone and download this repository.

2. In the top-level directory, run:
```bash
pip install -e .
```
3. Visit the [Meta.AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and register to download the model(s).

4. Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.

5. Navigate to your downloaded llama repository and run the download.sh script.
- Make sure to grant execution permissions to the download.sh script
- During this process, you will be prompted to enter the URL from the email.
    - Do not use the “Copy Link” option; manually copy the link from the email instead.

6. Once the model(s) you want have been downloaded, you can run the model locally using the command below:
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 6
```
**Note**
- Replace `llama-2-7b-chat/` with the path to your checkpoint directory and `tokenizer.model` with the path to your tokenizer model.
- The `--nproc_per_node` flag should be set to the [MP](#inference) value for the model you are using (an example for a larger model is sketched after these notes).
- Adjust the `max_seq_len` and `max_batch_size` parameters as needed.
- This example runs example_chat_completion.py, but you can change it to a different .py file.
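
To illustrate the `--nproc_per_node` note above, here is a hedged sketch of the same command for a larger model. The checkpoint directory name `llama-2-13b-chat/` is only an assumed example of how a downloaded 13B chat checkpoint might be named; the MP value of 2 for the 13B models follows the [Inference](#inference) section below.

```bash
# Sketch only: the checkpoint directory name is an assumption; adjust it to
# match your actual download. 13B models use model parallelism of 2, so
# --nproc_per_node is set to 2.
torchrun --nproc_per_node 2 example_chat_completion.py \
    --ckpt_dir llama-2-13b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```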
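
For convenience, the setup steps above can also be run as one shell session. This is a minimal sketch rather than part of the official instructions: the repository URL and the presence of an active conda environment with PyTorch / CUDA are assumptions, and the presigned download URL still has to be copied manually from the registration email.

```bash
# Sketch only: assumes the repository is github.com/facebookresearch/llama
# and that a conda env with PyTorch / CUDA is already active (steps 1-2).
git clone https://github.com/facebookresearch/llama.git
cd llama

# Step 2: install the package in editable mode.
pip install -e .

# Step 5: make the download script executable and run it.
# It will prompt for the presigned URL from the registration email.
chmod +x download.sh
./download.sh
```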

## Inference
