Shengqiang-Li
  • Northwestern Polytechnical University
  • Suzhou
Python 6 Updated Sep 16, 2024

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 82 2 Updated Dec 26, 2024
JavaScript 18 13 Updated Aug 9, 2018

TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 78 14 Updated Dec 20, 2024

Large Language Models (《大语言模型》), by Wayne Xin Zhao, Junyi Li, Kun Zhou, Tianyi Tang, and Ji-Rong Wen

2,417 159 Updated Apr 22, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,676 159 Updated Dec 26, 2024

Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini and open-source models.

Python 9,650 960 Updated Dec 26, 2024

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 10,981 1,093 Updated Dec 26, 2024

Implementation of the CVPR 2019 Paper - Speech2Face: Learning the Face Behind a Voice by MIT CSAIL

Python 173 35 Updated Mar 24, 2023

Audio-Visual Speech Separation with Cross-Modal Consistency

Python 225 36 Updated Jul 25, 2023

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

189 2 Updated Dec 3, 2024

Awesome speech/audio LLMs, representation learning, and codec models

794 48 Updated Dec 21, 2024

A Survey of Spoken Dialogue Models (60 pages)

222 13 Updated Nov 28, 2024

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 1,867 129 Updated Dec 25, 2024

General Speech Restoration

Python 1,062 132 Updated May 31, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 6,568 799 Updated Dec 13, 2024

Manipulate audio with a simple and easy high level interface

Python 9,047 1,056 Updated Jul 25, 2024

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 8,511 1,089 Updated Apr 24, 2024

Noise suppression using deep filtering

Python 2,616 243 Updated Oct 17, 2024

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 18,759 1,395 Updated Dec 9, 2024

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 228 15 Updated Dec 18, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 6,451 641 Updated Dec 23, 2024

Convert Chinese characters to pinyin (pypinyin)

Python 4,941 616 Updated Sep 15, 2024

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

143 12 Updated Nov 10, 2024

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 3,357 314 Updated Dec 26, 2024

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

Python 240 27 Updated Nov 4, 2024

Pinyin data for Chinese characters

Python 1,252 215 Updated Dec 12, 2024

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 12,797 878 Updated Oct 3, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,496 200 Updated Dec 5, 2024

Continuation of Clash Verge - A Clash Meta GUI based on Tauri (Windows, macOS, Linux)

TypeScript 43,138 3,333 Updated Dec 25, 2024