AudioGPT connects ChatGPT with a series of audio foundation models so that speech, singing, audio, and talking-head video can be sent and received during a chat.
Up-to-date link: https://cdb7b543afd1c8e8.gradio.app
Here we list the current capabilities of AudioGPT. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Speech

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
Style Transfer | GenerSpeech | Yes |
Speech Recognition | Whisper, Conformer | Yes |
Speech Enhancement | ConvTasNet | WIP |
Speech Separation | TF-GridNet | WIP |
Speech Translation | Multi-decoder | WIP |
Mono-to-Binaural | NeuralWarp | Yes |

Singing

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Audio | Make-An-Audio | Yes |
Audio Inpainting | Make-An-Audio | Yes |
Image-to-Audio | Make-An-Audio | Yes |
Sound Detection | Audio-transformer | Yes |
Target Sound Detection | TSDNet | Yes |
Sound Extraction | LASSNet | Yes |

Talking Head

Task | Supported Foundation Models | Status |
---|---|---|
Talking Head Synthesis | GeneFace | Yes (WIP) |
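
As a rough illustration, the capability tables above can be encoded as a small lookup structure, for example to check which tasks are still in progress. This is a minimal sketch, not AudioGPT's actual code: the names `CAPABILITIES`, `models_for`, and `tasks_with_status` are hypothetical, and the entries simply mirror the tables.

```python
# Hypothetical encoding of the capability tables above (not AudioGPT's real code).
# Each task maps to (supported foundation models, status) as listed in the tables.
CAPABILITIES = {
    "Text-to-Speech": (["FastSpeech", "SyntaSpeech", "VITS"], "Yes (WIP)"),
    "Style Transfer": (["GenerSpeech"], "Yes"),
    "Speech Recognition": (["whisper", "Conformer"], "Yes"),
    "Speech Enhancement": (["ConvTasNet"], "WIP"),
    "Speech Separation": (["TF-GridNet"], "WIP"),
    "Speech Translation": (["Multi-decoder"], "WIP"),
    "Mono-to-Binaural": (["NeuralWarp"], "Yes"),
    "Text-to-Sing": (["DiffSinger", "VISinger"], "Yes (WIP)"),
    "Text-to-Audio": (["Make-An-Audio"], "Yes"),
    "Audio Inpainting": (["Make-An-Audio"], "Yes"),
    "Image-to-Audio": (["Make-An-Audio"], "Yes"),
    "Sound Detection": (["Audio-transformer"], "Yes"),
    "Target Sound Detection": (["TSDNet"], "Yes"),
    "Sound Extraction": (["LASSNet"], "Yes"),
    "Talking Head Synthesis": (["GeneFace"], "Yes (WIP)"),
}


def models_for(task):
    """Return the list of models listed for a task (raises KeyError if unknown)."""
    models, _status = CAPABILITIES[task]
    return models


def tasks_with_status(status):
    """Return task names whose status matches exactly, in table order."""
    return [task for task, (_models, s) in CAPABILITIES.items() if s == status]


if __name__ == "__main__":
    print(models_for("Sound Extraction"))
    print(tasks_with_status("WIP"))
```

Keeping the status as the literal string from the table avoids guessing what "Yes (WIP)" means beyond what the tables state.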
- 4.6 Support Sound Extraction/Detection
- 4.3 Support Hugging Face demo space
- 4.1 Support Audio Inpainting and clean up code
- 3.27 Support Style Transfer/Talking Head Synthesis
- 3.23 Support Text-to-Sing
- 3.21 Support Image-to-Audio
- 3.19 Support Speech Recognition
- 3.17 Support Text-to-Audio
- clean up text-to-sing/speech code
- merge talking head synthesis into main
- change audio/video log output
- support Hugging Face Space
We appreciate the following open-source projects: