View yzxing87's full-sized avatar

Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas

Python 3,216 406 Updated Jan 18, 2025

VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 268 7 Updated Jan 19, 2025

Next-Token Prediction is All You Need

Python 1,972 79 Updated Oct 24, 2024

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Python 4,033 290 Updated Oct 5, 2024

High-resolution models for human tasks.

Python 4,776 277 Updated Nov 18, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 904 39 Updated Jan 16, 2025

Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"

Jupyter Notebook 354 27 Updated Sep 7, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 1,183 97 Updated Jan 24, 2025

[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Python 29 1 Updated Nov 18, 2024

Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…

Python 5,144 445 Updated Jan 24, 2025

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

877 11 Updated Jun 21, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 34,604 5,290 Updated Jan 24, 2025

Creative Commons Licenses for GitHub

566 304 Updated Dec 10, 2024

Pythonic bindings for FFmpeg's libraries.

Cython 2,621 375 Updated Jan 22, 2025

Text-to-3D Generation within 5 Minutes

Python 686 48 Updated Mar 10, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 2,387 227 Updated Apr 24, 2024

[WIP] Layer Diffusion for WebUI (via Forge)

Python 3,941 338 Updated Aug 30, 2024

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 645 26 Updated Dec 2, 2024

Transparent Image Layer Diffusion using Latent Transparency

2,058 29 Updated Jun 16, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Python 2,941 184 Updated Oct 31, 2024

Python 3,873 253 Updated Mar 15, 2024

Official Code for MotionCtrl [SIGGRAPH 2024]

Python 1,376 75 Updated Sep 20, 2024

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 412 23 Updated Jan 18, 2025

Code for the paper "Pix2Video: Video Editing using Image Diffusion"

Python 68 5 Updated Oct 2, 2023

Focus on prompting and generating

Python 42,755 6,268 Updated Jan 14, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,194 2,327 Updated Aug 12, 2024

A feature-rich command-line audio/video downloader

Python 97,660 7,655 Updated Jan 23, 2025

Official and maintained implementation of the paper "Differentiable JPEG: The Devil is in the Details" [WACV 2024].

Python 91 5 Updated Dec 30, 2023

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,568 4,599 Updated Jan 23, 2025