Starred repositories
The Swin-UNet is a version of the widely used U-Net architecture that combines the windowed attention mechanism of the Swin Transformer with the U-Net framework.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Hybrid-Segmentor: Hybrid Approach for Automated Fine-Grained Crack Segmentation in Civil Infrastructure
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)
Dress Code: High-Resolution Multi-Category Virtual Try-On. ECCV 2022
This repo contains code and a pre-trained model for clothes segmentation.
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
ModelScope: bring the notion of Model-as-a-Service to life.
Official implementation of "FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on"
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Learning Flow Fields in Attention for Controllable Person Image Generation
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Official PyTorch implementation of SegFormer
Official code of ICRA 2024 paper: CrackNex: a Few-shot Low-light Crack Segmentation Model Based on Retinex Theory for UAV Inspections
A simple, easy-to-use API for BBC News
[ECCV 2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation
A high-throughput and memory-efficient inference and serving engine for LLMs
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
[ICML 2024] GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Official implementation of the paper "Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance" (WACV 2025)