
M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation

Mingshuang Luo, Ruibing Hou, Zhuo Li, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan

Paper PDF | Project Page


This is the official repository of M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation.


Environment Preparation

1. Conda environment

# clone project   
git clone https://github.com/luomingshuang/M3GPT.git

# create conda environment
cd M3GPT
conda create -n m3gpt python=3.8
conda activate m3gpt

# install dependencies
pip install torch==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
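
After installing, an optional sanity check confirms that the expected PyTorch build and the CUDA runtime are visible from the m3gpt environment:

# optional sanity check: run inside the m3gpt environment
import torch
print(torch.__version__)          # expect a 2.0.0+cu117 build
print(torch.cuda.is_available())  # True if a compatible GPU and driver are present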

Datasets Preparation

1. Download Datasets

This project is implemented with the Motion-X, AIST++, and FineDance datasets.

2. Preprocess Data

In this project, we focus on body-part motion (excluding face and hand motion), so we use only the body features from the three datasets above to model M3GPT. All motion data are unified to a frame rate of 30 FPS.
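
As an illustration of the frame-rate unification, the sketch below resamples a (T, D) motion feature array to 30 FPS by linear interpolation over frame timestamps. The function name, array shapes, and feature dimension are assumptions for illustration, not the repository's actual preprocessing code:

import numpy as np

def resample_motion(motion: np.ndarray, src_fps: float, tgt_fps: float = 30.0) -> np.ndarray:
    """Resample a (T, D) motion feature array from src_fps to tgt_fps
    by linear interpolation along the time axis."""
    T, D = motion.shape
    n_out = int(round(T * tgt_fps / src_fps))     # number of output frames
    src_t = np.arange(T) / src_fps                # source frame timestamps (s)
    tgt_t = np.arange(n_out) / tgt_fps            # target frame timestamps (s)
    out = np.empty((n_out, D), dtype=motion.dtype)
    for d in range(D):
        out[:, d] = np.interp(tgt_t, src_t, motion[:, d])
    return out

# e.g., a clip annotated at 60 FPS, with an illustrative feature dimension
motion_60 = np.random.randn(120, 263).astype(np.float32)
motion_30 = resample_motion(motion_60, src_fps=60.0)   # -> shape (60, 263)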

Train the Model

1. Train text-motion evaluator
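
The text-motion evaluator is the model later used to score how well generated motions match their text descriptions (e.g., for retrieval-style metrics). A common recipe for such evaluators is contrastive text-motion alignment; the sketch below shows a symmetric InfoNCE loss over paired embeddings. This is an illustrative assumption about the objective, not the repository's code:

import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, motion_emb, temperature=0.07):
    # text_emb, motion_emb: (B, dim) paired embeddings from the two encoders
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # matched pairs sit on the diagonal; pull them together, push others apart
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2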

2. Train motion VQ-VAE
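
For intuition, the core of a motion VQ-VAE is the quantization step that maps continuous encoder outputs to the nearest codebook entries, producing the discrete motion tokens the language model consumes later. A minimal sketch follows (codebook size, dimensions, and names are illustrative assumptions, not the repository's implementation); the music VQ-VAE in step 3 follows the same pattern over audio features:

import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup: the quantization step of a VQ-VAE."""
    def __init__(self, num_codes: int = 512, code_dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z: torch.Tensor):
        # z: (B, T, code_dim) latent motion features from the encoder
        dist = torch.cdist(z, self.codebook.weight[None])   # (B, T, num_codes)
        tokens = dist.argmin(dim=-1)                        # discrete motion tokens
        z_q = self.codebook(tokens)                         # quantized latents
        # straight-through estimator so gradients flow back to the encoder
        z_q = z + (z_q - z).detach()
        return z_q, tokens

quant = VectorQuantizer()
z = torch.randn(2, 64, 256)          # a batch of encoded motion clips
z_q, tokens = quant(z)               # tokens: (2, 64) integer code ids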

3. Train music VQ-VAE

4. Multimodal Multitask LLM Pretraining

5. Multimodal Multitask LLM Instruction Tuning
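
Conceptually, both LLM stages treat the discrete motion and music codes as extra vocabulary on top of a text language model, so every task becomes sequence-to-sequence modeling over one shared token space. A minimal sketch of that idea with Hugging Face transformers follows; the base model, token names, and counts are illustrative assumptions, not the repository's configuration:

from transformers import AutoTokenizer, AutoModelForCausalLM

# illustrative base model; the actual backbone may differ
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# register discrete motion/music codes as new tokens in the shared vocabulary
motion_tokens = [f"<motion_{i}>" for i in range(512)]
music_tokens = [f"<music_{i}>" for i in range(512)]
tokenizer.add_tokens(motion_tokens + music_tokens)
model.resize_token_embeddings(len(tokenizer))

# a text-to-motion example then reduces to ordinary language modeling, e.g.:
# "Generate a motion: a person waves the right hand. <motion_17> <motion_3> ..."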

Evaluate the Model

1. Text-to-Motion

2. Motion-to-Text

3. Music-to-Dance

4. Dance-to-Music

5. Motion Prediction / In-betweening

Visualization
