FDInSky

Starred repositories

Framework to reduce autotune overhead to zero for well known deployments.

Python 58 8 Updated Nov 19, 2024

Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*

Python 77 Updated Jan 14, 2025
43 Updated Dec 13, 2024

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

15 Updated Jan 15, 2025

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Python 274 12 Updated Dec 7, 2024

The official implementation of Tensor ProducT ATTenTion Transformer (T6)

Python 200 20 Updated Jan 20, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 17,028 1,217 Updated Jan 20, 2025

GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization

Python 69 3 Updated Dec 15, 2024

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 1,969 140 Updated Jan 17, 2025

A comprehensive video analysis tool that combines computer vision, audio transcription, and natural language processing to generate detailed descriptions of video content. This tool extracts key fr…

Python 441 51 Updated Jan 10, 2025
Python 70 2 Updated Jan 15, 2025

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Jupyter Notebook 2,240 152 Updated Dec 24, 2024

3D Occupancy Prediction Benchmark in Autonomous Driving

Python 326 21 Updated May 27, 2024

[ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Python 395 31 Updated Dec 11, 2024

The public-facing frontend, www.svlsimulator.com

HTML 3 10 Updated Jan 28, 2022

Large Driving Models

153 6 Updated Dec 16, 2024
65 2 Updated Sep 14, 2024

GaussianAD: Gaussian-Centric End-to-End Autonomous Driving

62 2 Updated Dec 16, 2024

The official repo for the paper "In-Context Imitation Learning via Next-Token Prediction"

Jupyter Notebook 59 5 Updated Oct 31, 2024

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python 285 13 Updated Jan 13, 2025

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python 687 48 Updated Jan 20, 2025

A collection of reference AI microservices and workflows for Jetson Platform Services

Python 27 7 Updated Dec 21, 2024

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Python 7,127 431 Updated Jan 9, 2025

A Telegram bot to recommend arXiv papers

Python 223 17 Updated Jan 8, 2025

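FastBee is an open-source IoT platform that is simple and easy to use. It can be used to build IoT platforms, for secondary development, and for learning. It suits smart home, smart office, smart community, agricultural monitoring, water conservancy monitoring, industrial control, and similar applications.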

Java 1,662 489 Updated Dec 17, 2024

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

Python 49 3 Updated Jan 20, 2025

🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 226 1 Updated Dec 28, 2024

Low-level locomotion policy training in Isaac Lab

Python 89 5 Updated Dec 15, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 2,800 224 Updated Jan 11, 2025