Northwestern Polytechnical University
- Suzhou
Stars
The official gpt4free repository | a collection of various powerful language models
ChatGLM-6B: An Open Bilingual Dialogue Language Model
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A high-throughput and memory-efficient inference and serving engine for LLMs
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Instant voice cloning by MIT and MyShell.
SoftVC VITS Singing Voice Conversion
Easily train a good VC model with voice data <= 10 mins!
Open-Sora: Democratizing Efficient Video Production for All
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
Universal LLM Deployment Engine with ML Compilation
GUI for a Vocal Remover that uses Deep Neural Networks.
Realtime Voice Changer
Faster Whisper transcription with CTranslate2
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Geometric Computer Vision Library for Spatial AI
Collection of awesome LLM apps with RAG using OpenAI, Anthropic, Gemini and opensource models.
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Manipulate audio with a simple and easy high level interface