Open-Sora: Democratizing Efficient Video Production for All

mysqlsc/Open-Sora

Open-Sora: Towards Open Reproduction of Sora

Open-Sora is an open-source initiative dedicated to efficiently reproducing OpenAI's Sora. The project aims to cover the full pipeline, including video data preprocessing, accelerated training, efficient inference, and more. Operating on a limited budget, we prioritize the vibrant open-source community, providing access to text-to-image, image captioning, and language models. We hope to contribute to the community and make the project accessible to everyone.

📰 News

  • [2024.03.18] 🔥 We release Open-Sora 1.0, an open-source project to reproduce OpenAI's Sora. Open-Sora 1.0 supports a full pipeline of video data preprocessing, accelerated training, inference, and more. The provided checkpoint can produce 2s 512x512 videos.

🎥 Latest Demo

[Demo videos: two samples, each 2s at 512x512]

🔆 New Features/Updates

  • 📍 Open-Sora-v1 is trained on xxx. We train the model in three stages. Model weights are available here. Training details can be found here.
  • ✅ Support training acceleration, including flash-attention, accelerated T5, mixed precision, gradient checkpointing, split VAE, and sequence parallelism. Speedup: XXX times. See more discussions here.
  • ✅ We provide video cutting and captioning tools for data preprocessing. Our data collection plan can be found here.
  • ✅ We find that the VQ-VAE from [] has low quality and therefore adopt a better VAE from []. We also find that patching in the time dimension degrades quality. See more discussions here.
  • ✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See more discussions here.
  • ✅ Support CLIP and T5 text conditioning.
  • ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101).
  • ✅ Support inference with official weights from DiT, Latte, and PixArt.
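
The "images as one-frame videos" idea from the list above can be illustrated at the level of tensor shapes. This is a minimal sketch only: the function name `as_video_shape` and the channel-first (C, T, H, W) layout are assumptions made for illustration and are not taken from the Open-Sora code.

```python
from typing import Tuple


def as_video_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return a (C, T, H, W) shape, inserting T=1 for a (C, H, W) image.

    Hypothetical helper: the layout is an assumed convention, not Open-Sora's.
    """
    if len(shape) == 3:
        channels, height, width = shape
        # An image becomes a video with a single frame.
        return (channels, 1, height, width)
    if len(shape) == 4:
        # Already a video: leave it unchanged.
        return tuple(shape)
    raise ValueError(f"expected 3 or 4 dimensions, got {len(shape)}")


if __name__ == "__main__":
    print(as_video_shape((3, 256, 256)))      # ImageNet-style image
    print(as_video_shape((3, 16, 256, 256)))  # 16-frame video clip
```

With a shared layout like this, one data loader and one model input path could in principle serve both image and video datasets.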

TODO list sorted by priority

  • Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, deduplication, etc.). See datasets.md for more information. [WIP]
  • Training Video-VAE. [WIP]
  • Support image and video conditioning.
  • Evaluation pipeline.
  • Incorporate a better scheduler, e.g., rectified flow in SD3.
  • Support variable aspect ratios, resolutions, and durations.
  • Support SD3 when released.

Installation

git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install xxx

After installation, to get familiar with the project, you can check here for the project structure and how to use the config files.

Model Weights

| Model | #Params | URL |
| --- | --- | --- |
| 16x256x256 | | |

Inference

python scripts/inference.py configs/opensora/inference/16x256x256.py

Data Processing

Split video into clips

We provide code to split a long video into separate clips efficiently using multiprocessing. See tools/data/scene_detect.py.
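
As a rough illustration of splitting with multiprocessing, the sketch below turns a list of scene-boundary timestamps into clip jobs and cuts them in parallel with `ffmpeg`. It is not the implementation in tools/data/scene_detect.py: the function names, the output naming scheme, and the assumption that boundaries are already known (in seconds) are all hypothetical, and `ffmpeg` is assumed to be on the PATH.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path
from typing import List, Tuple

# (source video, start seconds, end seconds, output path)
ClipJob = Tuple[str, float, float, str]


def build_clip_jobs(video: str, boundaries: List[float], out_dir: str) -> List[ClipJob]:
    """Pair consecutive boundaries into one job per clip (hypothetical helper)."""
    stem = Path(video).stem
    jobs = []
    for i, (start, end) in enumerate(zip(boundaries, boundaries[1:])):
        out = str(Path(out_dir) / f"{stem}_clip{i:03d}.mp4")
        jobs.append((video, start, end, out))
    return jobs


def cut_clip(job: ClipJob) -> int:
    """Cut one clip with ffmpeg; returns the exit code, or -1 if ffmpeg is missing."""
    video, start, end, out = job
    # Stream copy avoids re-encoding, so cuts snap to nearby keyframes.
    cmd = ["ffmpeg", "-y", "-i", video, "-ss", str(start), "-to", str(end), "-c", "copy", out]
    try:
        return subprocess.run(cmd, capture_output=True).returncode
    except FileNotFoundError:
        return -1


def split_video(video: str, boundaries: List[float], out_dir: str, workers: int = 4) -> List[int]:
    """Cut all clips in parallel, one ffmpeg process per job."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with Pool(processes=workers) as pool:
        return pool.map(cut_clip, build_clip_jobs(video, boundaries, out_dir))


if __name__ == "__main__":
    # Only list the planned jobs here; call split_video(...) to actually cut.
    for job in build_clip_jobs("input.mp4", [0.0, 4.2, 9.7, 15.0], "clips"):
        print(job)
```

Each clip is independent of the others, which is why a simple process pool is enough here. In practice the boundaries would come from a scene detector rather than a hand-written list.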

Generate video caption

Training

Acknowledgement

  • DiT: Scalable Diffusion Models with Transformers.
  • OpenDiT: An acceleration system for DiT training. The OpenDiT team provided valuable suggestions on accelerating our training process.
  • PixArt: An open-source DiT-based text-to-image model.
  • Latte: An attempt to efficiently train DiT for video.
  • StabilityAI VAE: A powerful image VAE model.
  • CLIP: A powerful text-image embedding model.
  • T5: A powerful text encoder.
  • LLaVA: A powerful image captioning model based on LLaMA and Yi-34B.
  • PySceneDetect: A powerful tool to split video into clips.

We are grateful for their exceptional work and generous contribution to open source.

Citation

@software{opensora,
  author = {Zangwei Zheng and Xiangyu Peng and Shenggui Li and Yang You},
  title = {Open-Sora: Towards Open Reproduction of Sora},
  month = {March},
  year = {2024},
  url = {https://github.com/hpcaitech/Open-Sora}
}

Zangwei Zheng and Xiangyu Peng contributed equally to this work.

TODO

Modules for releasing:

  • configs
  • opensora
  • assets
  • scripts
  • tools

Packages for data processing.

Put all outputs under ./checkpoints/, including pretrained_models, checkpoints, and samples.
