Stars
🔥LeetCode solutions in any programming language | Solutions to LeetCode, Coding Interviews (剑指 Offer, 2nd edition), and Cracking the Coding Interview (6th edition) in multiple programming languages
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
A dummy's guide to setting up (and using) HPC clusters on Ubuntu 22.04 LTS using Slurm and Munge. Created by the Quant Club @ UIowa.
Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
A robotics hardware platform for integrating sensors and end effectors into a common platform.
Sparsh: Self-supervised touch representations for vision-based tactile sensing
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Code release for "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild"
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
transcengram / cambrian
Forked from cambrian-mllm/cambrian. Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
A repository for retraining a Florence-2 model on your custom dataset
💯 Curated coding interview preparation materials for busy software engineers
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Utilities intended for use with Llama models.
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
An Open-source Toolkit for LLM Development
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Repository for the synthetic RGB to thermal infrared (TIR) translation module from "Edge-guided multidomain RGB to TIR translation", an ICRA 2023 submission
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning, etc. (a minimal usage sketch follows this list)
A Household multimodal environment (HoME) based on the SUNCG indoor scenes dataset
🤩 An AWESOME Curated List of Papers, Workshops, Datasets, and Challenges from CVPR 2024
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
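
As referenced from the Bullet Physics SDK entry above: the SDK ships official pybullet Python bindings, and the sketch below shows the core simulation loop (connect to a physics server, load bodies, step, query state). The headless falling-cube scene is an illustrative assumption, not an example taken from the repository itself; it assumes `pip install pybullet`, which also provides the bundled `pybullet_data` assets.

```python
# Minimal pybullet sketch: drop a small cube onto a ground plane and
# read back its resting pose. Scene choice is illustrative only.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless physics server (use p.GUI for a viewer)
p.setAdditionalSearchPath(pybullet_data.getDataPath())  # find bundled URDFs
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")                          # static ground
cube = p.loadURDF("cube_small.urdf", basePosition=[0, 0, 1])

for _ in range(240):  # one simulated second at the default 240 Hz timestep
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(cube)
print("cube position after 1 s:", pos)
p.disconnect()
```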