Sun Yat-sen University
Guangzhou, China
Stars
Fundamentals of Digital Media Technology (04713901) | Peking University ECE Course Materials
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
YaRN: Efficient Context Window Extension of Large Language Models
Official Code for Stable Cascade
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
Efficient Triton Kernels for LLM Training
Generative Models by Stability AI
Iterable data pipelines for PyTorch training.
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
A very minimal example to get LLM models working with FSDP. The goal is to be minimal, yet scalable to demonstrate the strengths of FSDP.
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Enable macOS HiDPI and have a native setting.
A large-scale text-to-image prompt gallery dataset based on Stable Diffusion
MineRL Competition for Sample Efficient Reinforcement Learning - Python Package
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838