  • Zhejiang University


[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models

Python 34 3 Updated Feb 14, 2025

📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.

146 8 Updated Apr 2, 2025

[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Python 270 8 Updated Jan 8, 2025

🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes.

Python 189 10 Updated Mar 21, 2025

🔥CVPR2025 & ICLR2025 Embodied AI Paper List Resources. Star ⭐ the repo and follow me if you like what you see 🤩.

66 1 Updated Apr 1, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 1,376 103 Updated Mar 13, 2025
Python 6 Updated Mar 17, 2025

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

Python 147 4 Updated Mar 31, 2025

✨First Open-Source R1-like Video-LLM [2025/02/18]

Python 309 11 Updated Feb 23, 2025

Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025)

Jupyter Notebook 24 Updated Mar 21, 2025

[EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens

Python 24 Updated Nov 6, 2023

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 1,054 105 Updated Mar 13, 2025

🔥CVPR 2025 Multimodal Large Language Models Paper List

129 3 Updated Mar 12, 2025

The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

Python 1,865 117 Updated Mar 28, 2025

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

Python 1,364 71 Updated Jan 24, 2025
Python 44 2 Updated Apr 2, 2025

[CVPR 2025] Official implementation of "MangaNinja: Line Art Colorization with Precise Reference Following"

Python 573 42 Updated Mar 2, 2025

Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training

Python 9 Updated Nov 30, 2024

[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Python 142 5 Updated Mar 4, 2025

Improving Video Generation with Human Feedback

Python 148 1 Updated Feb 12, 2025

Integrate the DeepSeek API into popular software

30,880 3,364 Updated Apr 2, 2025

Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"

Python 1,494 67 Updated Mar 19, 2025

[Technical Report 2023] PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

Python 205 9 Updated Sep 4, 2024

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Python 414 33 Updated Feb 3, 2025

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various r…

Python 257 15 Updated Mar 12, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python 547 17 Updated Mar 18, 2025

Official code repository of "CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction"

Python 25 1 Updated Mar 5, 2025

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

Python 262 9 Updated Mar 26, 2025