Stars
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence
No fortress, purely open ground. OpenManus is Coming.
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models
Wan: Open and Advanced Large-Scale Video Generative Models
Solve Visual Understanding with Reinforced VLMs
DeepEP: an efficient expert-parallel communication library
YOLOv12: Attention-Centric Real-Time Object Detectors
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
[IEEE TIP] TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under Complex Motions and Diverse Scenes
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥
[T-PAMI] Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
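A good project for campus recruiting (autumn and spring hiring) and internships: build, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.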
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
[CVPR24] COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Frontier Multimodal Foundation Models for Image and Video Understanding
Awesome Chinese LLM: A curated list of Chinese Large Language Models, collecting datasets and model resources for Chinese LLMs
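A collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials.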