- University of Wisconsin-Madison
- Madison
- lzhangbj.github.io
Stars
Emu Series: Generative Multimodal Models from BAAI
Official implementation of OneDiffusion paper (CVPR 2025)
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
HunyuanVideo: A Systematic Framework For Large Video Generation Models
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
A curated list of fellowships for graduate students in Computer Science and related fields.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
WavJourney: Compositional Audio Creation with LLMs
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A vector-quantized periodic autoencoder (VQ-PAE) for motion alignment across different morphologies with no supervision [SIGGRAPH 2024]
PyTorch extensions for high-performance and large-scale training.
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
[CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot-Based Video Generation
[CVPR 2025] Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
Large-scale text-video dataset. 10 million captioned short videos.
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
[ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning"
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
Audio Visual Instance Discrimination with Cross-Modal Agreement
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training