Stars
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
A course on aligning smol models.
Align Anything: Training All-modality Models with Feedback
⭐️ NLP algorithms built on the transformers library, supporting text classification, text generation, information extraction, text matching, RLHF, SFT, etc.
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
An RLHF Infrastructure for Vision-Language Models
Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Scalable toolkit for efficient model alignment
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
List of papers on hallucination detection in LLMs.
An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Train transformer language models with reinforcement learning.
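As a quick illustration of the TRL entry above, here is a minimal supervised fine-tuning sketch. The exact `SFTTrainer` signature varies across TRL releases, and the model and dataset names are placeholders for illustration, not part of the original list.

```python
# Minimal TRL sketch: supervised fine-tuning with SFTTrainer.
# Assumes a recent TRL release; the model id and dataset are illustrative.
from datasets import load_dataset
from trl import SFTTrainer

# Any text or conversational dataset from the Hub works here.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",  # a model id string or a preloaded model object
    train_dataset=dataset,
)
trainer.train()
```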
[NeurIPS 2024] The official code of the paper "Automated Multi-level Preference for MLLMs"
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
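To show how the 🤗 Evaluate library above is typically used, a minimal sketch: load a metric from the Hugging Face Hub and compute it over predictions and references.

```python
# Minimal 🤗 Evaluate sketch: load a metric and score predictions.
import evaluate

# Fetches the "accuracy" metric implementation from the Hub.
accuracy = evaluate.load("accuracy")
results = accuracy.compute(
    predictions=[0, 1, 1, 0],
    references=[0, 1, 0, 0],
)
print(results)  # e.g. {'accuracy': 0.75}
```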
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone