Merge branch 'main' into xingjia01-download-script
xingjia01 authored Apr 18, 2024
2 parents c9791da + b5c744f commit ba19483
Showing 4 changed files with 65 additions and 5 deletions.
Binary file added Llama3_Repo.jpeg
19 changes: 15 additions & 4 deletions README.md
@@ -1,3 +1,14 @@
<p align="center">
<img src="https://github.com/meta-llama/llama3/blob/main/Llama3_Repo.jpeg" width="400"/>
</p>

<p align="center">
🤗 <a href="https://huggingface.co/meta-Llama"> Models on Hugging Face</a>&nbsp; | <a href="https://ai.meta.com/blog/"> Blog</a>&nbsp; | <a href="https://llama.meta.com/">Website</a>&nbsp; | <a href="https://llama.meta.com/get-started/">Get Started</a>&nbsp;
<br>

---


# Meta Llama 3

We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.
@@ -43,11 +54,11 @@ You can follow the steps below to quickly get up and running with Llama 3 models
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir Meta-Llama-3-8B-Instruct/ \
-    --tokenizer_path tokenizer.model \
+    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 6
```
**Note**
-- Replace `Meta-Llama-3-8B-Instruct/` with the path to your checkpoint directory and `tokenizer.model` with the path to your tokenizer model.
+- Replace `Meta-Llama-3-8B-Instruct/` with the path to your checkpoint directory and `Meta-Llama-3-8B-Instruct/tokenizer.model` with the path to your tokenizer model.
- The `--nproc_per_node` should be set to the [MP](#inference) value for the model you are using.
- Adjust the `max_seq_len` and `max_batch_size` parameters as needed.
- This example runs the [example_chat_completion.py](example_chat_completion.py) found in this repository, but you can change that to a different .py file.
@@ -72,7 +83,7 @@ See `example_text_completion.py` for some examples. To illustrate, see the command
```
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir Meta-Llama-3-8B-Instruct/ \
-    --tokenizer_path tokenizer.model \
+    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 128 --max_batch_size 4
```

@@ -88,7 +99,7 @@ Examples using llama-3-8b-chat:
```
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir Meta-Llama-3-8B-Instruct/ \
-    --tokenizer_path tokenizer.model \
+    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 6
```
50 changes: 50 additions & 0 deletions eval_details.md
@@ -0,0 +1,50 @@
### Llama 3 Evaluation Details
This document contains additional context on the settings and parameters for how we evaluated the Llama 3 pre-trained and instruct-aligned models.
### Auto-eval benchmark notes
#### MMLU
- We report macro averages for the MMLU benchmarks. The micro-average MMLU numbers are 65.4 and 67.4 for the 8B pre-trained and instruct-aligned models, and 78.9 and 82.0 for the 70B pre-trained and instruct-aligned models.
- For the instruct-aligned MMLU evaluation, we ask the model to generate the best choice character.
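
To make the macro/micro distinction above concrete, here is a minimal Python sketch; the per-subject counts are placeholders rather than actual MMLU results, and this is not the evaluation code behind the reported numbers.

```python
# Macro vs. micro averaging over MMLU subjects.
# The per-subject (num_correct, num_questions) counts are placeholders.
subjects = {
    "abstract_algebra": (55, 100),
    "anatomy": (90, 135),
    "astronomy": (120, 152),
}

# Macro average: mean of per-subject accuracies (every subject weighted equally).
macro = sum(c / n for c, n in subjects.values()) / len(subjects)

# Micro average: pooled accuracy over all questions (larger subjects weigh more).
micro = sum(c for c, _ in subjects.values()) / sum(n for _, n in subjects.values())

print(f"macro={macro:.3f}  micro={micro:.3f}")
```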
#### AGI English
- We use the default few-shot and prompt settings as specified here. The score is averaged over the English subtasks.
#### CommonSenseQA
- We use the same 7-shot chain-of-thought prompt as in Wei et al. (2022).
#### Winogrande
- We use a choice-based setup for evaluation, where we fill in the missing blank with each of the two possible choices and then compute the log-likelihood over the suffix. We use 5 shots for evaluation.
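
A rough sketch of this kind of choice-based scoring is given below. The `suffix_logprob(prefix, suffix)` helper is a hypothetical stand-in for a model call returning the log-likelihood of the suffix given the prefix; it is not part of this repository or the actual evaluation harness.

```python
# Choice-based Winogrande scoring: fill the blank with each option, then score
# the log-likelihood of the shared suffix under the model.
# `suffix_logprob` is a hypothetical scoring function, not part of this repo.
def pick_winogrande_option(sentence, option1, option2, suffix_logprob):
    prefix, suffix = sentence.split("_", 1)  # "_" marks the blank in Winogrande
    scores = {
        option1: suffix_logprob(prefix + option1, suffix),
        option2: suffix_logprob(prefix + option2, suffix),
    }
    return max(scores, key=scores.get)
```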
#### BIG-Bench Hard
- We use 3-shot chain-of-thought prompting and compute the average exact match over the subsets in this task.
#### ARC-Challenge
- We use the arc-challenge subset from the ARC benchmark. We use 25 shots and the MMLU setup for evaluation, where we provide all the choices in the prompt and calculate the likelihood over the choice characters.
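
The following sketch illustrates the kind of prompt layout described above (all choices listed, likelihood compared over the answer letters). The question, choices, and helper function are illustrative placeholders, not the actual evaluation setup.

```python
# MMLU-style multiple-choice setup: list all options in the prompt and compare
# the likelihood the model assigns to each answer letter.
# The question and choices below are placeholders.
def format_choice_prompt(question, choices):
    letters = "ABCD"
    lines = [question]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_choice_prompt(
    "Which gas do plants primarily absorb for photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"],
)
# The likelihoods of " A", " B", " C", and " D" after `prompt` would then be
# compared, and the highest-scoring letter taken as the prediction.
```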
#### TriviaQA-WIKI
- We evaluate on the Wiki validation set and use 5 few-shot examples.
#### SQuAD
- We use SQuAD v2 and compute exact match in a 1-shot setting.
#### QuAC
- Same setting as Llama 2 (1-shot, f1).
#### BoolQ
- Same setting as Llama 1 and Llama 2 (0-shot, accuracy).
#### DROP
- For each validation example, we draw 3 random few-shot examples from the train split.
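
As a small illustration of this sampling scheme (the split variables and helper below are hypothetical, not the evaluation code used here):

```python
import random

# For each validation example, draw 3 random few-shot examples from the train
# split and prepend them to the prompt. The splits here are placeholders.
def build_drop_prompt(val_example, train_split, k=3):
    shots = random.sample(train_split, k)
    return "\n\n".join(shots + [val_example])
```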
#### GPQA
- We report 0-shot exact match scores over the possible options using the Main subset for our models and other open-source models (Mistral, Gemma).
#### HumanEval
- Same setting as Llama 1 and Llama 2 (pass@1).
#### GSM8K
- We use the same 8-shot chain-of-thought prompt as in Wei et al. (2022) (maj@1).
#### MATH
- We use the 4-shot prompt available in Lewkowycz et al. (2022) (maj@1).
### Human evaluation notes
This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization.
|Capability|Category|Count|
|----------|--------|-----|
|Coding|Coding|150|
|Reasoning|Mathematical reasoning|150|
|English|Asking for Advice|150|
|English|Brainstorming|150|
|English|Classification|150|
|English|Closed Question Answering|150|
|English|Creative Writing|150|
|English|Extraction|150|
|English|Inhabiting a Character/Persona|150|
|English|Open Question Answering|150|
|English|Rewriting|150|
|English|Summarization|150|
1 change: 0 additions & 1 deletion eval_methodology.md

This file was deleted.
