Stars
This package contains the original 2012 AlexNet code.
[ICRA 2024]: Train your parkour robot in less than 20 hours.
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
Fast and memory-efficient exact attention
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
[T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)
Refine high-quality datasets and visual AI models
[ICCV2023] GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction
[CoRL 2023 Oral] GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Official Code for DragGAN (SIGGRAPH 2023)
Code for "PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring" ICCV2023
✨✨Latest Advances on Multimodal Large Language Models
This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!
CLIP+MLP Aesthetic Score Predictor
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Code for "SAR-Net: Shape Alignment and Recovery Network for Category-level 6D Object Pose and Size Estimation" CVPR2022
Split screen video comparison tool using FFmpeg and SDL2
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors (TPAMI2023)
An English-language shell for any OS, powered by LLMs
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images. In ECCV2018.