Stars
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
A family of diffusion models for text-to-audio generation.
A series of large language models developed by Baichuan Intelligent Technology
Meta-Transformer for Unified Multimodal Learning
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
ModelScope: bring the notion of Model-as-a-Service to life.
High-Resolution Image Synthesis with Latent Diffusion Models
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
A Survey on Text-to-Video Generation/Synthesis.
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
Finetune ModelScope's Text To Video model using Diffusers 🧨
Generate image from anything with ImageBind and Stable Diffusion
BindDiffusion: One Diffusion Model to Bind Them All
ImageBind: One Embedding Space to Bind Them All