Starred repositories
Framework to reduce autotune overhead to zero for well-known deployments.
RUCAIBox / Virgo
Forked from Richar-Du/Virgo. Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
The official implementation of Tensor ProducT ATTenTion Transformer (T6)
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
A comprehensive video analysis tool that combines computer vision, audio transcription, and natural language processing to generate detailed descriptions of video content. This tool extracts key fr…
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
3D Occupancy Prediction Benchmark in Autonomous Driving
[ECCV 2024] Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
The public-facing frontend, www.svlsimulator.com
GaussianAD: Gaussian-Centric End-to-End Autonomous Driving
The official repo for the paper "In-Context Imitation Learning via Next-Token Prediction"
LLaVA-Mini is a unified large multimodal model (LMM) that supports efficient understanding of images, high-resolution images, and videos.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
A collection of reference AI microservices and workflows for Jetson Platform Services
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
A Telegram bot to recommend arXiv papers
FastBee is a simple, easy-to-use open-source IoT platform that can be used to build IoT platforms and for secondary development and learning. Suitable for smart homes, smart offices, smart communities, agricultural monitoring, water monitoring, industrial control, and more.
VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving
🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
Low-level locomotion policy training in Isaac Lab
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.