
Commit

Merge branch 'main' into mistral_docs
martinjaggi authored Nov 29, 2023
2 parents 93d8c25 + 162a0d7 commit 56b8866
Showing 1 changed file with 7 additions and 6 deletions.
README.md: 13 changes (7 additions, 6 deletions)
@@ -9,8 +9,8 @@ This library enables pre-training and fine-tuning of large language models (LLMs
Our repository is a modification of the [original Megatron-LM codebase](https://github.com/NVIDIA/Megatron-LM) by Nvidia.

Added key features include:
-- [Llama](https://arxiv.org/abs/2302.13971), [Llama 2](https://arxiv.org/abs/2307.09288), [Code Llama](https://arxiv.org/abs/2308.12950), [Falcon](https://huggingface.co/tiiuae), and [Mistral](https://arxiv.org/abs/2310.06825) support.
-- support training of large models (70B Llama 2, 65B Llama 1, 34B Code Llama, and 40B Falcon) on commodity hardware on multiple nodes
+- architectures supported: [Llama](https://arxiv.org/abs/2302.13971), [Llama 2](https://arxiv.org/abs/2307.09288), [Code Llama](https://arxiv.org/abs/2308.12950), [Falcon](https://huggingface.co/tiiuae) and [Mistral](https://arxiv.org/abs/2310.06825)
+- support training of large models (70B Llama 2, 65B Llama 1, 34B Code Llama, 40B Falcon and Mistral) on commodity hardware on multiple nodes
- 3-way parallelism: tensor parallel, pipeline parallel and data parallel training (inherited from Megatron)
- full pretraining, finetuning and instruct tuning support
- Support for special tokens & tokenizers
@@ -35,10 +35,10 @@ make html
```

# Example models trained with *Megatron-LLM*
-70B Llama 2 [1](https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10),
-40B Falcon [1](https://huggingface.co/OpenAssistant/falcon-40b-megacode2-oasst),
-13B Code Llama [1](https://huggingface.co/OpenAssistant/codellama-13b-oasst-sft-v10),
-7B and 70B Meditron LLMs [1](https://huggingface.co/epfl-llm/meditron-70b).
+70B Llama2: [meditron 70b](https://huggingface.co/epfl-llm/meditron-70b), [llama2-70b-oasst-sft-v10](https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10),
+40B Falcon: [falcon-40b-megacode2-oasst](https://huggingface.co/OpenAssistant/falcon-40b-megacode2-oasst),
+13B Code Llama: [codellama-13b-oasst-sft-v10](https://huggingface.co/OpenAssistant/codellama-13b-oasst-sft-v10),
+7B Llama2: [meditron 7b](https://huggingface.co/epfl-llm/meditron-7b), ...
(Let us know about yours!)

# Citation
@@ -51,6 +51,7 @@ If you use this software please cite it:
Andreas Köpf and
Kyle Matoba and
Amirkeivan Mohtashami and
+Xingyao Wang and
Olivia Simin Fan and
Axel Marmet and
Deniz Bayazit and
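
The "3-way parallelism" bullet in the first hunk refers to the scheme inherited from Megatron: each model replica is sharded across (tensor-parallel size times pipeline-parallel size) GPUs, and the remaining factor of the total GPU count is the number of data-parallel replicas. A minimal sketch of that arithmetic, assuming no other parallel dimensions; the function name `data_parallel_size` and the 32-GPU example are hypothetical illustrations, not part of the Megatron-LLM code or a recommended configuration:

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the number of data-parallel replicas that fit in world_size GPUs.

    Hypothetical helper (not a Megatron-LLM API). Each model replica is split
    across tensor_parallel * pipeline_parallel GPUs; the remaining factor of
    world_size is the data-parallel degree.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    # The GPU count must split evenly into whole model replicas.
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    return world_size // model_parallel


# Hypothetical example: 4 nodes x 8 GPUs, tensor parallel 8, pipeline parallel 2.
print(data_parallel_size(32, 8, 2))  # prints 2
```

With 32 GPUs, each replica spans 8 x 2 = 16 GPUs, which leaves 2 data-parallel replicas; a GPU count that does not divide evenly is rejected rather than silently leaving GPUs idle.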
