National Yang Ming Chiao Tung University, Taiwan
(UTC +08:00)
Stars
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Witness the aha moment of VLM with less than $3.
verl: Volcano Engine Reinforcement Learning for LLMs
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
A high-throughput and memory-efficient inference and serving engine for LLMs
A fork of Distrobox that supports rootless docker
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
YOLOv12: Attention-Centric Real-Time Object Detectors
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
COCO API - Dataset @ http://cocodataset.org/
[CVPR25] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25)
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
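《Hello 算法》 (Hello Algo): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese versions are updated in sync; English version ongoing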
Master programming by recreating your favorite technologies from scratch.
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
Official inference repo for FLUX.1 models
🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports Multi AI Providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), Knowledge Base (file upload / knowledge managemen…