Stars
This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A New Federated Learning Framework Against Gradient Inversion Attacks [AAAI 2025].
Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
(NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
This is the official code for the paper "See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning" (EMNLP 2024).
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models
The official code for "AutoRG-Brain: Grounded Report Generation for Brain MRI".
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Utilities intended for use with Llama models.
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
This repo includes curated ChatGPT prompts for using ChatGPT and other LLM tools more effectively.
Selective Aggregation for Low-Rank Adaptation in Federated Learning
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come).
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end to end problems using Llama mode…
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
I decided to sync up this repo and self-critical.pytorch. (The old master is in the old master branch for archive.)
The official code for "SegVol: Universal and Interactive Volumetric Medical Image Segmentation".
The original code for the paper "Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation"