- Huazhong University of Science and Technology
- Wuhan, Hubei Province, China
Stars
LaTeX code for making neural network diagrams
[3DV 2025] Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Witness the aha moment of VLM with less than $3.
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
Fully open reproduction of DeepSeek-R1
Solve Visual Understanding with Reinforced VLMs
[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Code for paper `Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection`.
PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
A comprehensive survey on Multimodal Models in 3D
Blockchain dark forest selfguard handbook. Master these, master the security of your cryptocurrency.
Janus-Series: Unified Multimodal Understanding and Generation Models
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
A curated list of awesome libraries, packages, strategies, books, blogs, tutorials for systematic trading.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
An open-source library for GPU-accelerated robot learning and sim-to-real transfer.
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".