Stars
Demonstration of running a native LLM on an Android device.
Explore the Multimodal “Aha Moment” on a 2B Model
NVIDIA Isaac GR00T N1 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
joncv / OpenHands
Forked from All-Hands-AI/OpenHands. 🙌 OpenHands: Code Less, Make More
Minimal re-implementation of pi0 vision-language-action (VLA) model
Manus AI alternative that runs locally. Powered by Deepseek R1. No APIs, no $456 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity.
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
A high-performance runtime framework for modern robotics.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
R1-onevision, a visual language model capable of deep CoT reasoning.
[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"
(CVPR 2025) Official repository of the paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
[KDD2025] Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
(TPAMI 2025) Invertible Diffusion Models for Compressed Sensing [PyTorch]
A small 0.2B Chinese dialogue model (ChatLM-Chinese-0.2B), with fully open-sourced code for the entire pipeline: dataset sources, data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, RLHF optimization, and more. Supports downstream-task SFT fine-tuning, with a fine-tuning example for triplet information extraction.
[CVPR 2025] OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
[ECCV 2024] The official code of the paper "Open-Vocabulary SAM".
[IJCV 2024] Hard-normal Example-aware Template Mutual Matching for Industrial Anomaly Detection
Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)