🤗 Models on Hugging Face | Blog post | (TBU) Tech paper
Motif is a high-performance, open-source Korean LLM with 102 billion parameters, achieving top-tier Korean-language performance among open-source LLMs. This repository demonstrates a simple example of how to run and train Motif. You can also chat directly with the instruct version on the Model Hub.
| Provider | Model | kmmlu_direct score | Source |
|---|---|---|---|
| Moreh | Llama-3-Motif-102B (pretrained) | 64.74 | Measured by Moreh |
| Moreh | Llama-3-Motif-102B-Instruct | 64.81 | Measured by Moreh |
| Meta | Llama-3-70B-Instruct | 54.5 | https://arxiv.org/abs/2402.11548 |
| Meta | Llama-3.1-70B-Instruct | 52.1 | https://arxiv.org/abs/2402.11548 |
| Meta | Llama-3.1-405B-Instruct | 65.8 | https://arxiv.org/abs/2402.11548 |
| Alibaba | Qwen2-72B-Instruct | 64.1 | https://arxiv.org/abs/2402.11548 |
| OpenAI | GPT-4-0125-preview | 59.95 | https://arxiv.org/abs/2402.11548 |
| OpenAI | GPT-4o-2024-05-13 | 64.11 | Measured by Moreh |
| Google | Gemini Pro | 50.18 | https://arxiv.org/abs/2402.11548 |
| LG | EXAONE 3.0 | 44.5 | https://arxiv.org/pdf/2408.03541 |
| Naver | HyperCLOVA X | 53.4 | https://arxiv.org/abs/2402.11548 |
| Upstage | SOLAR-10.7B | 41.65 | https://arxiv.org/pdf/2404.01954 |
To run and train Motif successfully, we recommend a system that meets the following requirements:

- Inference
  - GPU: Minimum of 4x NVIDIA A100-80GB (with batch size ≤ 8, sequence length 8192, and bfloat16)
  - Storage: 1.3 TB of disk space per checkpoint
  - RAM: 500 GB for model parameters
- Training
  - GPU: Minimum of 4x NVIDIA A100-80GB (with batch size 2, sequence length 4096, and DeepSpeed ZeRO-3)
  - Storage: 1.3 TB of disk space per checkpoint
  - RAM: 2.5 TB for model parameters and optimizer state
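As a rough sanity check on these figures, the back-of-the-envelope arithmetic below (an illustrative sketch only; real usage also depends on activations, KV cache, and framework overhead) shows where the numbers come from:

```python
# Back-of-the-envelope memory estimate for a 102B-parameter model (illustrative only).
NUM_PARAMS = 102e9

# Inference: bfloat16 weights take 2 bytes per parameter.
bf16_weights_gb = NUM_PARAMS * 2 / 1e9
print(f"bf16 weights: ~{bf16_weights_gb:.0f} GB")  # ~204 GB

# Training with mixed-precision Adam (common rule of thumb):
# bf16 weights (2) + bf16 grads (2) + fp32 master weights (4) + Adam m (4) + Adam v (4)
# = 16 bytes per parameter.
train_state_tb = NUM_PARAMS * 16 / 1e12
print(f"weights + gradients + optimizer state: ~{train_state_tb:.1f} TB")  # ~1.6 TB
```

These raw totals sit below the 500 GB and 2.5 TB recommendations above, which leave headroom for activations, buffers, and checkpointing.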
Refer to the official vLLM installation guide to install vLLM.
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Adjust tensor_parallel_size to the number of GPUs you have available.
model = LLM("moreh/Llama-3-Motif-102B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
# Render the chat template into a plain prompt string for vLLM.
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so sampling parameters must be set explicitly.
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)
print(responses[0].outputs[0].text)
```
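Because vLLM generates for a list of prompts in a single call, several conversations can be batched together. A minimal sketch reusing the `model`, `tokenizer`, and `sampling_params` objects from above (the example questions below are placeholders):

```python
# Hypothetical example conversations; any list of chat-formatted prompts works.
conversations = [
    [{"role": "user", "content": "대한민국의 수도는 어디인가요?"}],  # "What is the capital of South Korea?"
    [{"role": "user", "content": "사계절을 설명해 주세요."}],        # "Please describe the four seasons."
]
# Apply the chat template to each conversation, then generate for the whole batch at once.
prompts = [
    tokenizer.apply_chat_template(conversation=c, add_generation_prompt=True, tokenize=False)
    for c in conversations
]
responses = model.generate(prompts, sampling_params=sampling_params)
for r in responses:
    print(r.outputs[0].text)
```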
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moreh/Llama-3-Motif-102B"
# All generation configs are set in the model's generation_config.json.
# Load in bfloat16 and shard across the available GPUs; a 102B model does not fit on a single GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
messages_batch = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(messages_batch, padding=True, return_tensors='pt')['input_ids'].to(model.device)
outputs = model.generate(input_ids)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```
This section describes installation and execution on CUDA GPUs. For usage on the MoAI Platform, please refer to the "Training on MoAI Platform" section.
- Python 3.8 or higher
- Docker
- Clone this repository:

  ```bash
  git clone https://github.com/moreh-io/motif-llm.git
  cd motif-llm
  ```
- Start a Docker container:

  ```bash
  bash set_docker.sh
  ```
- Install the required dependencies in the container:

  ```bash
  docker exec -it --user root motif_trainer /bin/bash
  cd ~/motif_trainer
  apt-get update
  apt-get install libaio-dev
  pip install -r requirements/cuda_requirements.txt
  ```
- Set up the accelerate config for DeepSpeed (a sketch of the referenced DeepSpeed config file follows after this list):

  ```
  accelerate config
  In which compute environment are you running? This machine
  Which type of machine are you using? multi-GPU
  How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
  Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: NO
  Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
  Do you want to use DeepSpeed? [yes/NO]: yes
  Do you want to specify a json file to a DeepSpeed config? [yes/NO]: yes
  Please enter the path to the json DeepSpeed config file: resources/deepspeed_config.json
  Do you want to enable `deepspeed.zero.Init` when using ZeRO Stage-3 for constructing massive models? [yes/NO]: No
  Do you want to enable Mixture-of-Experts training (MoE)? [yes/NO]: no
  How many GPU(s) should be used for distributed training? [1]:8
  accelerate configuration saved at (user_hf_cache_path)
  ```
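The exact contents of `resources/deepspeed_config.json` are defined in the repository, but for orientation, a typical ZeRO-3 configuration for a model of this size looks roughly like the sketch below (illustrative values only; the settings shown are assumptions mirroring the training requirements above, not the repo's actual file):

```python
import json

# Illustrative ZeRO-3 settings only -- the actual resources/deepspeed_config.json in the
# repository may differ. Batch size and bf16 mirror the training requirements listed above.
deepspeed_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
}

# Print the sketch so it can be compared against the file shipped with the repo.
print(json.dumps(deepspeed_config, indent=2))
```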
Run the sample training script with your desired model save path:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash run.sh /path/to/save/model
```
Run the sample vLLM inference script with the saved model path:

```bash
bash inference.sh /path/to/save/model
```
The MoAI Platform enables training and fine-tuning of models with tens or hundreds of billions of parameters on GPU clusters with minimal effort. To fine-tune the Motif model on a large number of GPUs, follow the steps below:
- Contact Moreh to set up a training environment.
- Install the packages in `requirements/moai_requirements.txt`.
- Run the `example_train_model_moai.py` script to train with tens or hundreds of GPUs.