MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization
A collection of papers on autoregressive models in vision.
Code for paper "CREAM: Consistency Regularized Self-Rewarding Language Models".
[arXiv'24 & NeurIPSW'24] MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
[NeurIPS'24 & ICMLW'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
[CVPR 2024] Official implementation of the multimodal learning method MLA
[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image …
Benchmark for Natural Temporal Distribution Shift (NeurIPS 2022)