Stars
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
LMM which strictly superset LLM embedded
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
🚀 Codebase and Fondation Models for Visual Instruction Tuning
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Code Repository for the EACL 2023 paper "Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training"
Multimodal chatbot with computer vision capabilities integrated
基于ChatGLM-6B、ChatGLM2-6B、ChatGLM3-6B模型,进行下游具体任务微调,涉及Freeze、Lora、P-tuning、全参微调等
Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model —— 一个中文低资源的llama+lora方案,结构参考alpaca
ChatGPT爆火,开启了通往AGI的关键一步,本项目旨在汇总那些ChatGPT的开源平替们,包括文本大模型、多模态大模型等,为大家提供一些便利
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
An open-source framework for training large multimodal models.
由基于Stable-diffusion的Chilloutmix模型生成高清真实的人像
Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion
[TPAMI 2023] PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images
pytorch implementation of scene change detection
[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
Official Pytorch implementations for "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)