Stars
[CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
LAVIS - A One-stop Library for Language-Vision Intelligence
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Code accompanying the paper "Massive Activations in Large Language Models"
[COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Official implementation of the Law of Vision Representation in MLLMs
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an expe…
Official implementation of EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion
Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)
[ECCV 2024] FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
Rembg is a tool to remove image backgrounds
🚀 Cross attention map tools for huggingface/diffusers
[CVPR 2024] Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)
A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.
4-bit quantization of LLaMA using GPTQ
Run PyTorch LLMs locally on servers, desktop and mobile
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
llama.cpp tutorial on Android phone
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
TerDiT: Ternary Diffusion Models with Transformers
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.