shubhangb97


[CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Python 130 5 Updated Mar 4, 2025
Python 377 31 Updated Mar 6, 2025

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 10,348 1,007 Updated Nov 18, 2024

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 2,197 219 Updated Mar 13, 2025

Code accompanying the paper "Massive Activations in Large Language Models"

Python 148 9 Updated Mar 4, 2024

[COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

Python 68 4 Updated Jan 22, 2025

Official implementation of the Law of Vision Representation in MLLMs

Python 151 7 Updated Nov 17, 2024

[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

Python 202 7 Updated Jan 13, 2025

DiffSeg is an unsupervised zero-shot segmentation method using attention information from a stable-diffusion model. This repo implements the main DiffSeg algorithm and additionally includes an expe…

Jupyter Notebook 304 24 Updated Jul 9, 2024

A suite of image and video neural tokenizers

Jupyter Notebook 1,575 73 Updated Feb 11, 2025

Official implementation of EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"

Python 19 Updated Oct 25, 2023

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Python 28 1 Updated Aug 9, 2024

Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion

Jupyter Notebook 1,327 88 Updated Oct 18, 2022

Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)

Jupyter Notebook 723 63 Updated Jan 26, 2024

[ECCV 2024] FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

Python 30 4 Updated Nov 19, 2024

Rembg is a tool to remove image backgrounds

Python 18,280 1,964 Updated Mar 10, 2025

🚀 Cross attention map tools for huggingface/diffusers

Python 231 19 Updated Jan 18, 2025

[CVPR 2024] Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

Python 36 5 Updated May 14, 2024

Matryoshka Multimodal Models

Python 97 5 Updated Jan 22, 2025

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)

Python 532 29 Updated Jan 8, 2024

A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.

Python 178 9 Updated Jan 14, 2025

4 bits quantization of LLaMA using GPTQ

Python 3,045 461 Updated Jul 13, 2024

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,522 239 Updated Mar 11, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 14,491 1,509 Updated Dec 25, 2024

llama.cpp tutorial on Android phone

94 8 Updated Jul 28, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 141,155 28,272 Updated Mar 13, 2025

TerDiT: Ternary Diffusion Models with Transformers

Python 68 3 Updated Jun 17, 2024

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

Python 231 8 Updated Jan 17, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,870 128 Updated Oct 30, 2024