
Allegro generates high-quality, 6-second, 720p videos from simple text prompts. The model natively outputs 15 frames per second, and frame interpolation brings the result to 30 frames per second.

Model Info

Model	Allegro
Description	Text-to-video generation model
Download	Hugging Face (rhymes-ai/Allegro)
Parameters	VAE: 175M; DiT: 2.8B
Inference precision	VAE: FP32/TF32/BF16/FP16 (best in FP32/TF32); DiT/T5: BF16/FP32/TF32
Context length	79.2k tokens
Resolution	720 x 1280
Frames	88
Video length	6 seconds @ 15 fps
Single-GPU memory usage	9.3 GB in BF16 (with cpu_offload)
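As a rough check on the table, the sketch below relates frame count, frame rate, and context length. The compression factors are assumptions that this README does not state: a VAE that downsamples 4x in time and 8x in each spatial dimension, followed by 2x2 spatial patches in the DiT. Under those assumptions the token count matches the 79.2k listed above.

```python
# Sanity-check the Model Info numbers.
# ASSUMPTIONS (not stated in this README): the VAE compresses 4x in time and
# 8x in each spatial dimension, and the DiT uses 2x2 spatial patches.
FRAMES = 88
FPS = 15
HEIGHT, WIDTH = 720, 1280

VAE_T, VAE_S = 4, 8   # assumed VAE downsampling factors (time, space)
PATCH = 2             # assumed DiT spatial patch size


def clip_seconds(frames: int, fps: float) -> float:
    """Duration of a clip with `frames` frames played back at `fps`."""
    return frames / fps


def dit_tokens(frames: int, height: int, width: int) -> int:
    """Number of DiT tokens for one clip under the assumed compression."""
    latent_t = frames // VAE_T              # 88 -> 22 latent frames
    latent_h = height // VAE_S // PATCH     # 720 -> 90 -> 45
    latent_w = width // VAE_S // PATCH      # 1280 -> 160 -> 80
    return latent_t * latent_h * latent_w


if __name__ == "__main__":
    print(f"{clip_seconds(FRAMES, FPS):.2f} s at {FPS} fps")  # 5.87 s at 15 fps
    print(f"{dit_tokens(FRAMES, HEIGHT, WIDTH)} tokens")       # 79200 tokens
```

The native clip is about 5.9 seconds, which the table rounds to 6 seconds.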

Requirements

Download the weights from Hugging Face: rhymes-ai/Allegro
Prerequisites: Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4.
Tip: We recommend creating a fresh Anaconda environment (Python >= 3.10) to run the following example.

  git clone https://github.com/rhymes-ai/Allegro
  cd Allegro
  conda create -n allegro python=3.10 -y
  conda activate allegro

  pip install -r requirements.txt
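To confirm the prerequisites inside the activated environment, a minimal sketch like the following reads the Python, PyTorch, and CUDA versions and compares them with the minimums listed above. The helper names are hypothetical; `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
# Check the prerequisites: Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4.
# A minimal sketch; it only reads versions and installs nothing.
import sys


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.4.1+cu124' into (2, 4, 1)."""
    core = text.split("+")[0]  # drop local build tags like '+cu124'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:       # keep leading digits, e.g. '0a0' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(found: str, required: str) -> bool:
    """True if version `found` is >= version `required`."""
    return parse_version(found) >= parse_version(required)


def main() -> None:
    py = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {py}: {'ok' if at_least(py, '3.10') else 'too old'}")
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    ok = at_least(torch.__version__, "2.4")
    print(f"PyTorch {torch.__version__}: {'ok' if ok else 'too old'}")
    cuda = torch.version.cuda  # None on CPU-only builds
    if cuda is None:
        print("This PyTorch build has no CUDA support")
    else:
        print(f"CUDA {cuda}: {'ok' if at_least(cuda, '12.4') else 'too old'}")
    print(f"GPU available: {torch.cuda.is_available()}")


if __name__ == "__main__":
    main()
```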

Inference

Tip: The model generates video at 15 fps. We highly recommend running a video frame interpolation model (such as EMA-VFI) on the output to raise it to 30 fps.
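EMA-VFI is a learned interpolation model and is not reproduced here. To show only what doubling the frame rate means for the frame sequence, the sketch below inserts the average of each pair of neighboring frames. This naive linear blend is a stand-in, not EMA-VFI, and it produces ghosting on fast motion. The function name is hypothetical; frames can be any values that support `+` and `/`, such as NumPy arrays.

```python
def double_frame_rate(frames):
    """Naive 2x interpolation: insert the mean of each adjacent pair.

    This is a linear blend, NOT EMA-VFI; it only illustrates the frame
    bookkeeping. N input frames become 2N - 1 output frames, so a clip
    played at twice the frame rate keeps roughly the same duration.
    """
    if len(frames) < 2:
        return list(frames)
    out = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(prev)
        out.append((prev + nxt) / 2)  # synthesized midpoint frame
    out.append(frames[-1])
    return out


if __name__ == "__main__":
    # Toy "frames": scalar brightness values instead of images.
    print(double_frame_rate([0.0, 1.0, 3.0]))  # [0.0, 0.5, 1.0, 2.0, 3.0]
```

Applied to the 88 generated frames, this yields 175 frames, or about 5.8 seconds at 30 fps.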

  python single_inference.py \
  --user_prompt 'A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.' \
  --vae your/path/to/vae \
  --dit your/path/to/transformer \
  --text_encoder your/path/to/text_encoder \
  --tokenizer your/path/to/tokenizer \
  --guidance_scale 7.5 \
  --num_sampling_steps 100 \
  --seed 42
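If all weights are downloaded into one folder (for example with `huggingface-cli download rhymes-ai/Allegro --local-dir ./allegro_weights`), the four path flags can be filled in from it. The sketch below assumes that folder contains `vae`, `transformer`, `text_encoder`, and `tokenizer` subfolders; that layout, the helper name `build_command`, and the `./allegro_weights` default are assumptions, so check them against your download. It passes only the flags shown in the command above.

```python
# Build and run the single_inference.py command shown above.
# ASSUMPTION: the weights live under one local folder with the subfolders
# vae/, transformer/, text_encoder/ and tokenizer/ (check your download).
import subprocess
import sys
from pathlib import Path
from typing import List

# Shortened from the example prompt above.
PROMPT = ("A seaside harbor with bright sunlight and sparkling seawater, "
          "with many boats in the water.")
SUBFOLDERS = ("vae", "transformer", "text_encoder", "tokenizer")


def build_command(weights: Path, prompt: str, seed: int = 42,
                  guidance_scale: float = 7.5, steps: int = 100) -> List[str]:
    """Return the argv list for single_inference.py (hypothetical helper)."""
    return [
        sys.executable, "single_inference.py",
        "--user_prompt", prompt,
        "--vae", str(weights / "vae"),
        "--dit", str(weights / "transformer"),
        "--text_encoder", str(weights / "text_encoder"),
        "--tokenizer", str(weights / "tokenizer"),
        "--guidance_scale", str(guidance_scale),
        "--num_sampling_steps", str(steps),
        "--seed", str(seed),
    ]


def main() -> None:
    weights = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("./allegro_weights")
    missing = [d for d in SUBFOLDERS if not (weights / d).is_dir()]
    if missing:
        sys.exit(f"missing weight folders under {weights}: {missing}")
    # Passing a list (no shell) avoids quoting problems with long prompts.
    subprocess.run(build_command(weights, PROMPT), check=True)


if __name__ == "__main__":
    main()
```

Run it from the repository root so that `single_inference.py` resolves.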

Limitations

The model cannot render celebrities, legible text, specific locations, streets or buildings.

Future Plan

Multi-GPU inference and further speed-up (PAB)
Text-and-image-to-video (TI2V) generation
Motion-controlled video generation
Visual quality enhancement

Support

If you encounter any problems or have any suggestions, feel free to open an issue or send an email to [email protected].

Citation

License

This repo is released under the Apache 2.0 License.

Disclaimer

The Allegro model is provided on an "AS IS" basis, and we disclaim any liability for consequences or damages arising from your use. Users are kindly advised to ensure compliance with all applicable laws and regulations. This includes, but is not limited to, prohibitions against illegal activities and the generation of content that is violent, pornographic, obscene, or otherwise deemed unsafe, inappropriate, or illegal. By using this model, you agree that we shall not be held accountable for any consequences resulting from your use.

Acknowledgment

We are deeply grateful for the contributions of the open-source community, especially Open-Sora-Plan: our diffusion transformer (DiT) is built on Open-Sora-Plan v1.2.

Open-Sora-Plan: A project that aims to create a simple and scalable repository to reproduce Sora.
Open-Sora: An initiative dedicated to efficiently producing high-quality video.
ColossalAI: A powerful large model parallel acceleration and optimization system.
VideoSys: An open-source project that provides a user-friendly and high-performance infrastructure for video generation.
DiT: Scalable Diffusion Models with Transformers.
PixArt: An open-source DiT-based text-to-image model.
StabilityAI VAE: A powerful image VAE model.
CLIP: A powerful text-image embedding model.
T5: A powerful text encoder.
Playground: A state-of-the-art open-source model in text-to-image generation.
