
M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation

Mingshuang Luo, Ruibing Hou, Zhuo Li, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan

Paper PDF | Project Page


This is the official repository of M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation.


Environment Preparation

1. Conda environment

# clone project   
git clone https://github.com/luomingshuang/M3GPT.git

# create conda environment
cd M3GPT
conda create -n m3gpt python=3.8
conda activate m3gpt

# install dependencies
pip install torch==2.0.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
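
After installing, an optional sanity check confirms that the expected PyTorch build and the CUDA runtime are visible from the m3gpt environment:

# optional sanity check: run inside the m3gpt environment
import torch
print(torch.__version__)          # expect a 2.0.0+cu117 build
print(torch.cuda.is_available())  # True if a compatible GPU and driver are present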

Datasets Preparation

1. Download Datasets

This project is implemented with the Motion-X, AIST++, and FineDance datasets.

2. Preprocess Data

In this project, we focus on body-part motion (excluding face and hand motion), so we use only the body features from the three datasets above to model M3GPT. All motion data are unified to a frame rate of 30 FPS.
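
As an illustration of the frame-rate unification, the sketch below resamples a (T, D) motion feature array to 30 FPS by linear interpolation over frame timestamps. The function name, array shapes, and feature dimension are assumptions for illustration, not the repository's actual preprocessing code:

import numpy as np

def resample_motion(motion: np.ndarray, src_fps: float, tgt_fps: float = 30.0) -> np.ndarray:
    """Resample a (T, D) motion feature array from src_fps to tgt_fps
    by linear interpolation along the time axis."""
    T, D = motion.shape
    n_out = int(round(T * tgt_fps / src_fps))     # number of output frames
    src_t = np.arange(T) / src_fps                # source frame timestamps (s)
    tgt_t = np.arange(n_out) / tgt_fps            # target frame timestamps (s)
    out = np.empty((n_out, D), dtype=motion.dtype)
    for d in range(D):
        out[:, d] = np.interp(tgt_t, src_t, motion[:, d])
    return out

# e.g., a clip annotated at 60 FPS, with an illustrative feature dimension
motion_60 = np.random.randn(120, 263).astype(np.float32)
motion_30 = resample_motion(motion_60, src_fps=60.0)   # -> shape (60, 263)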

Train the Model

1. Train text-motion evaluator
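
The text-motion evaluator is the model later used to score how well generated motions match their text descriptions (e.g., for retrieval-style metrics). A common recipe for such evaluators is contrastive text-motion alignment; the sketch below shows a symmetric InfoNCE loss over paired embeddings. This is an illustrative assumption about the objective, not the repository's code:

import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, motion_emb, temperature=0.07):
    # text_emb, motion_emb: (B, dim) paired embeddings from the two encoders
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature          # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # matched pairs sit on the diagonal; pull them together, push others apart
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2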

2. Train motion VQ-VAE
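
For intuition, the core of a motion VQ-VAE is the quantization step that maps continuous encoder outputs to the nearest codebook entries, producing the discrete motion tokens the language model consumes later. A minimal sketch follows (codebook size, dimensions, and names are illustrative assumptions, not the repository's implementation); the music VQ-VAE in step 3 follows the same pattern over audio features:

import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup: the quantization step of a VQ-VAE."""
    def __init__(self, num_codes: int = 512, code_dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z: torch.Tensor):
        # z: (B, T, code_dim) latent motion features from the encoder
        dist = torch.cdist(z, self.codebook.weight[None])   # (B, T, num_codes)
        tokens = dist.argmin(dim=-1)                        # discrete motion tokens
        z_q = self.codebook(tokens)                         # quantized latents
        # straight-through estimator so gradients flow back to the encoder
        z_q = z + (z_q - z).detach()
        return z_q, tokens

quant = VectorQuantizer()
z = torch.randn(2, 64, 256)          # a batch of encoded motion clips
z_q, tokens = quant(z)               # tokens: (2, 64) integer code ids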

3. Train music VQ-VAE

4. Multimodal Multitask LLM Pretraining

5. Multimodal Multitask LLM Instruction Tuning
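
Conceptually, both LLM stages treat the discrete motion and music codes as extra vocabulary on top of a text language model, so every task becomes sequence-to-sequence modeling over one shared token space. A minimal sketch of that idea with Hugging Face transformers follows; the base model, token names, and counts are illustrative assumptions, not the repository's configuration:

from transformers import AutoTokenizer, AutoModelForCausalLM

# illustrative base model; the actual backbone may differ
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# register discrete motion/music codes as new tokens in the shared vocabulary
motion_tokens = [f"<motion_{i}>" for i in range(512)]
music_tokens = [f"<music_{i}>" for i in range(512)]
tokenizer.add_tokens(motion_tokens + music_tokens)
model.resize_token_embeddings(len(tokenizer))

# a text-to-motion example then reduces to ordinary language modeling, e.g.:
# "Generate a motion: a person waves the right hand. <motion_17> <motion_3> ..."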

Evaluate the Model

1. Text-to-Motion

2. Motion-to-Text

3. Music-to-Dance

4. Dance-to-Music

5. Motion Prediction / In-betweening

Visualization
