Stars
Official PyTorch implementation of "A Unified Approach for Text- and Image-guided 4D Scene Generation", [CVPR 2024]
3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
This is an open collection of state-of-the-art (SOTA) and novel Text-to-X methods (X can be anything), covering papers, code, and datasets.
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Recent LLM-based CV and related works. Welcome to comment/contribute!
Label Studio is a multi-type data labeling and annotation tool with a standardized output format
Refine high-quality datasets and visual AI models
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…
📚 A collection of papers about Referring Image Segmentation.
[CVPR 2024] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Local text-driven editing of 3D shapes with Cascaded Score Distillation
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Official code for DreamEditor: Text-Driven 3D Scene Editing with Neural Fields (SIGGRAPH Asia 2023)
This repo contains the Python code as well as the webpage HTML files for the Vox-E project from VAILab at TAU.
🧙🏻‍♂️ A list of papers curated for you to dive into Awesome Radiance Field-based 3D Editing.
Collection of diffusion model papers categorized by their subareas
Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering