poisonwine
  • XJTUer / SenseTime Researcher
  • Shanghai

Starred repositories

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 295 12 Updated Feb 28, 2025

Dataset pruning for ImageNet and LAION-2B.

Python 75 4 Updated Jul 5, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 10,194 1,391 Updated Feb 24, 2025

An easy to understand TTS / SVS / SVC framework

Python 685 90 Updated Mar 3, 2025

An efficient speech separation method

Python 271 33 Updated Apr 11, 2024

Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯

Python 806 32 Updated Mar 7, 2025

Unsupervised Speech Decomposition Via Triple Information Bottleneck

Python 667 93 Updated Oct 23, 2024

Code for Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information

Python 137 16 Updated Nov 27, 2023

Code for "TriSAT: Trimodal Representation Learning for Multimodal Sentiment Analysis".

Python 5 Updated Sep 15, 2024

A unified framework for popular offline reinforcement learning algorithms

Python 1 Updated May 11, 2023

Common DRL algorithms (DQN/Dueling-DQN/DDQN/PPO/TRPO/HTRPO/DDPG/TD3/HPG/)

Python 1 Updated May 31, 2023

Goal-conditioned ISAR algorithm

Python 2 Updated Jun 5, 2023

An unofficial PyTorch implementation of "STREAMVC: REAL-TIME LOW-LATENCY VOICE CONVERSION".

Python 63 7 Updated Jul 24, 2024

All-in-One: Text Embedding, Retrieval, Reranking and RAG in Transformers

Python 50 12 Updated Jan 23, 2025

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 268 26 Updated Feb 25, 2025

Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team

Python 11,721 1,135 Updated Feb 16, 2025

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,242 55 Updated Mar 9, 2025

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 3,831 417 Updated Mar 5, 2025

An open-source implementation for training LLaVA-NeXT.

Python 383 19 Updated Oct 23, 2024

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 6,168 617 Updated Jan 8, 2025

[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Python 31 Updated Feb 6, 2025

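This project shares the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications)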

HTML 15,090 1,742 Updated Mar 2, 2025

RAG-GPT, leveraging LLM and RAG technology, learns from user-customized knowledge bases to provide contextually relevant answers for a wide range of queries, ensuring rapid and accurate information…

Python 412 66 Updated Jul 19, 2024

AdalFlow: The library to build & auto-optimize LLM applications.

Python 2,864 252 Updated Mar 6, 2025

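Streamer-Sales (销冠, "Sales Champion"): a sales-livestreamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to spark the viewer's desire to buy. 🚀⭐ Includes a detailed data-generation pipeline ❗ 📦 It also integrates LMDeploy for accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital-human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR (speech-to-text) 🎙️, a front end built on the Vue ecosystem 🍍, FastAPI …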

Python 3,021 464 Updated Mar 8, 2025

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,633 217 Updated Aug 1, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 237 8 Updated Dec 26, 2024

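A role-play multi-LLM chat room fine-tuned from InternLM2, built on the original text of Journey to the West, its vernacular Chinese rendition, and ChatGPT-generated data. This project covers everything about role-play LLMs: from data acquisition and data processing, to fine-tuning with XTuner and deploying to OpenXLab, to deploying with LMDeploy and connecting to a simple chat room through an OpenAI-style API, where you can watch LLMs playing different characters talk and trade barbs with each other.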

Python 96 13 Updated Mar 31, 2024

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.

Python 732 59 Updated Mar 7, 2025