
XJTUer / SenseTime Researcher
Shanghai
Starred repositories
Ola: Pushing the Frontiers of Omni-Modal Language Model
Dataset pruning for ImageNet and LAION-2B.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
An easy to understand TTS / SVS / SVC framework
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Code for Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information
Codes for "TriSAT: Trimodal Representation Learning for Multimodal Sentiment Analysis".
A unified framework for popular offline reinforcement learning algorithms
Common DRL algorithms (DQN/Dueling-DQN/DDQN/PPO/TRPO/HTRPO/DDPG/TD3/HPG)
An unofficial PyTorch implementation of "StreamVC: Real-Time Low-Latency Voice Conversion".
All-in-One: Text Embedding, Retrieval, Reranking and RAG in Transformers
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Speech To Speech: an effort toward an open-source and modular GPT-4o
An open-source implementation for training LLaVA-NeXT.
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
[arXiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
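This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and deploying LLM applications in production).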
RAG-GPT, leveraging LLM and RAG technology, learns from user-customized knowledge bases to provide contextually relevant answers for a wide range of queries, ensuring rapid and accurate information…
AdalFlow: The library to build & auto-optimize LLM applications.
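Streamer-Sales 销冠 ("Top Seller"): a sales-livestreamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to spark the viewer's desire to buy. 🚀⭐ Includes a detailed data-generation pipeline❗ 📦 It also integrates LMDeploy accelerated inference 🚀, RAG retrieval-augmented generation 📚, TTS text-to-speech 🔊, digital-human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR speech-to-text 🎙️, a frontend built on the Vue ecosystem 🍍, FastAPI …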
MARS5 speech model (TTS) from CAMB.AI
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
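A multi-LLM role-play chatroom built by fine-tuning InternLM2 on data drawn from the original text of Journey to the West, its vernacular translation, and ChatGPT-generated material. The project covers everything about role-play LLMs: data acquisition and processing, fine-tuning with XTuner and deploying to OpenXLab, then serving with LMDeploy and connecting a simple chatroom through an OpenAI-style API, where you can watch LLMs playing different characters talk to and trade jabs with each other.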
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.