[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades the conventional diffusion model with an additional semantic prior.

Python 60 5 Updated Jun 11, 2024

kuleshov-group / awesome-discrete-diffusion-models

A curated list for awesome discrete diffusion models resources.

259 9 Updated Feb 5, 2025

HKUNLP / DiffuLLaMA

[ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Python 112 8 Updated Feb 19, 2025

kuleshov-group / mdlm

Simple and Effective Masked Diffusion Language Model

Python 330 38 Updated Mar 3, 2025

SruthiSudhakar / CosHand

https://coshand.cs.columbia.edu/

Python 15 1 Updated Oct 23, 2024

openvla / openvla

Forked from TRI-ML/prismatic-vlms

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 2,187 284 Updated Mar 4, 2025

lupantech / IconQA

Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".

Python 51 15 Updated Jan 28, 2024

Lizw14 / Super-CLEVR

Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"

Python 33 2 Updated Sep 8, 2023

seominjoon / geosolver

Geometry Question Solver (GeoS)

Python 170 49 Updated Oct 17, 2017

willisma / SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 783 48 Updated Mar 12, 2024

chuanyangjin / fast-DiT

Fast Diffusion Models with Transformers

Python 801 107 Updated Oct 25, 2024

fuse-model / FuSe

Python 41 1 Updated Jan 13, 2025

cvlab-columbia / drrobot

Code for "Differentiable Robot Rendering" (CoRL 2024)

Python 128 9 Updated Oct 22, 2024

Tencent / HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 9,213 759 Updated Mar 12, 2025

AnjieCheng / SpatialRGPT

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"

Python 137 11 Updated Dec 14, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 41,326 6,235 Updated Mar 13, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,700 2,194 Updated Feb 1, 2025

cfeng16

Stars

gpu-mode / lectures

Liuziyu77 / Visual-RFT

lumalabs / imm

ryoungj / ObsScaling

ShuangLI59 / unified_video_action

modelscope / DiffSynth-Studio

Wan-Video / Wan2.1

Haochen-Wang409 / ross

MoonshotAI / MoBA

ML-GSAI / LLaDA

uclanlp / visualbert

SalesforceAIResearch / DiffusionDPO

google-deepmind / md4

jianjieluo / SCD-Net