liusongxiang

🎯

Focusing

Songxiang Liu liusongxiang

Work on spoken language processing: General Audio synthesis, TTS, VC, SVS & SVC etc.

359 followers · 97 following

http://liusongxiang.github.io

Achievements

Highlights

Pro

Lists (1)

🚀 My stack

Starred repositories

Zjh-819 / LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

2,780 178 Updated Nov 28, 2023

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 14,920 1,082 Updated Jan 18, 2025

NovaSky-AI / SkyThought

Sky-T1: Train your own O1 preview model within $450

Python 1,826 193 Updated Jan 17, 2025

zhentingqi / rStar

Python 827 99 Updated Jan 10, 2025

Stability-AI / stable-codec

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 292 17 Updated Jan 14, 2025

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,559 97 Updated Jan 17, 2025

google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 2,521 163 Updated Dec 20, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

13,562 864 Updated Jan 17, 2025

BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

444 18 Updated Dec 14, 2024

Ag2S1 / Sibyl-System

Python 110 9 Updated Aug 13, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,226 258 Updated Jan 11, 2025

KeSpeech / KeSpeech

The repo provides information about KeSpeech dataset.

131 9 Updated Oct 13, 2022

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python 22,908 1,887 Updated Jan 18, 2025

Tencent / HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 7,467 581 Updated Jan 17, 2025

InternLM / InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,714 164 Updated Dec 26, 2024

Evil0ctal / Fast-Powerful-Whisper-AI-Services-API

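⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. There is no need to purchase the Whisper API: inference runs on locally hosted Whisper models, with support for multi-GPU concurrency and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.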

Python 299 32 Updated Jan 8, 2025

huggingface / data-is-better-together

Let's build better datasets, together!

Jupyter Notebook 244 29 Updated Dec 20, 2024

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 802 72 Updated Jan 16, 2025

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 234 30 Updated Jan 15, 2025

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,023 143 Updated Jan 17, 2025

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 11,116 2,485 Updated Jan 17, 2025

gabrielchua / open-notebooklm

Forked from knowsuchagency/pdf-to-podcast

Convert any PDF into a podcast episode!

Python 1,819 200 Updated Dec 7, 2024

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 8,985 1,192 Updated Jan 15, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,921 5,204 Updated Jan 18, 2025

alaskasquirrel / Chinese-Podcasts

Podcasts 🎧 Programming, design, vlogs, music, interviews, blogs...

1,992 109 Updated Oct 6, 2023

Doriandarko / o1-engineer

o1-engineer is a command-line tool designed to assist developers in managing and interacting with their projects efficiently. Leveraging the power of OpenAI's API, this tool provides functionalitie…

Python 2,861 295 Updated Dec 16, 2024

lamm-mit / PDF2Audio

Jupyter Notebook 1,151 145 Updated Sep 24, 2024

knowsuchagency / pdf-to-podcast

Convert any PDF into a podcast episode!

Python 654 281 Updated Nov 15, 2024

wangxuqi / Prompt-Engineering-Guide-Chinese

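Prompt Engineering Guide, based on the English version but with an added section on AIGC prompts; translated and updated to lower the learning barrier for students.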

MDX 1,100 116 Updated Sep 14, 2024

microsoft / autogen

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 37,698 5,482 Updated Jan 18, 2025

Starred topics

singing-voice

text-to-speech