A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 268 26 Updated Feb 25, 2025

Huanshere / VideoLingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix级字幕切割、翻译、对齐、甚至加上配音，一键全自动视频搬运AI字幕组

Python 11,721 1,135 Updated Feb 16, 2025

showlab / Show-o

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,242 55 Updated Mar 9, 2025

huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 3,831 417 Updated Mar 5, 2025

xiaoachen98 / Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

Python 383 19 Updated Oct 23, 2024

InternLM / MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 6,168 617 Updated Jan 8, 2025

litwellchi / MMTrail

[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Python 31 Updated Feb 6, 2025

liguodongiot / llm-action

本项目旨在分享大模型相关技术原理以及实战经验（大模型工程化、大模型应用落地）

HTML 15,090 1,742 Updated Mar 2, 2025

gpt-open / rag-gpt

RAG-GPT, leveraging LLM and RAG technology, learns from user-customized knowledge bases to provide contextually relevant answers for a wide range of queries, ensuring rapid and accurate information…

Python 412 66 Updated Jul 19, 2024

multimodal-art-projection / MAP-NEO

Python 926 85 Updated Feb 7, 2025

SylphAI-Inc / AdalFlow

AdalFlow: The library to build & auto-optimize LLM applications.

Python 2,864 252 Updated Mar 6, 2025

PeterH0323 / Streamer-Sales

Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁，一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、FastAPI 搭…

Python 3,021 464 Updated Mar 8, 2025

Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,633 217 Updated Aug 1, 2024

CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 237 8 Updated Dec 26, 2024

JimmyMa99 / Roleplay-with-XiYou

基于《西游记》原文、白话文、ChatGPT生成数据制作的，以InternLM2微调的角色扮演多LLM聊天室。本项目将介绍关于角色扮演类 LLM 的一切，从数据获取、数据处理，到使用 XTuner 微调并部署至 OpenXLab，再到使用 LMDeploy 部署，以 openai api 的方式接入简单的聊天室，并可以观看不同角色的 LLM 互相交流、互怼。

Python 96 13 Updated Mar 31, 2024

NovaSearch-Team / RAG-Retrieval

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.

Python 732 59 Updated Mar 7, 2025

dyyoung poisonwine

Starred repositories

Azure

pybullet

Data visualization

MATLAB

Machine learning

robotics-simulation

3D