Stars
A webapp to visualize relationships among Chinese characters and to see example sentences that illustrate their use. Also available for Japanese learners.
[CVPR 2024] Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Personalized Representation from Personalized Generation
A curated list of Awesome Personalized Large Multimodal Models resources
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Utilities intended for use with Llama models.
PyTorch implementation of "Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model"
Easily compute CLIP embeddings and build a CLIP retrieval system with them
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
🌸 A collection of Vietnamese women who are currently working in the field of Computer Science.
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
A beautiful, simple, clean, and responsive Jekyll theme for academics
A curated list of Awesome Makeup Transfer resources
Open-Sora: Democratizing Efficient Video Production for All
[WACV 2024] An implementation of MEGANet for polyp segmentation with multi-scale edge-guided attention
✨✨Latest Advances on Multimodal Large Language Models
[ICLR'24] GTA: A Geometry-Aware Attention Mechanism for Multi-view Transformers