AI-VITS-Howto

AI VITS Howto documents

Q&A

1. So-VITS-SVC 4.0: common training/inference errors and Q&A (Bilibili)

[https://www.bilibili.com/read/cv22206231/]

Python venv — Creation of virtual environments

[https://docs.python.org/3/library/venv.html]
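The tools listed below each pull in their own dependencies, so an isolated environment per tool is a reasonable starting point. A minimal sketch using the standard-library `venv` module follows; the helper name `create_env` and the directory name are hypothetical, and `with_pip=False` is chosen only to keep the sketch fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return its config file path."""
    # with_pip=False skips bootstrapping pip; pass with_pip=True if packages
    # will be installed into the environment afterwards.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(env_dir)
    # Every environment created by venv contains a pyvenv.cfg file at its root.
    return env_dir / "pyvenv.cfg"
```

The same result is usually obtained from the command line with `python -m venv <env-dir>`.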

I. Toolbox

1. AudioSlicer (audio splitting)

[https://github.com/henrymaas/AudioSlicer.git]

A simple Audio Slicer in Python which can split .wav audio files into multiple .wav samples, based on silence detection.

Batch splitting: Git: https://github.com/maminge/AudioSlicerBat.git
Run:
python AudioSegBat.py <Source-Wave-File-Dir> <Output-Sliced-Wave-File_dir>
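To illustrate what "splitting based on silence detection" means, here is a minimal pure-Python sketch. It is not the code of AudioSlicer or AudioSlicerBat; the function name, the amplitude threshold, and the minimum silence length are all assumptions made for illustration, and the input is a plain list of normalized samples rather than a .wav file.

```python
from typing import List, Optional, Tuple


def find_non_silent_spans(
    samples: List[float],
    threshold: float = 0.02,   # hypothetical amplitude below which a sample is "quiet"
    min_silence_len: int = 3,  # hypothetical number of quiet samples that forms a cut
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of non-silent regions, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start index of the currently open span
    silent_run = 0               # length of the current run of quiet samples

    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            silent_run += 1
            # Close the open span once the quiet run is long enough to be a cut.
            if start is not None and silent_run >= min_silence_len:
                spans.append((start, i - silent_run + 1))
                start = None
        else:
            if start is None:
                start = i
            silent_run = 0

    if start is not None:
        # Drop trailing quiet samples that were too short to trigger a cut.
        spans.append((start, len(samples) - silent_run))
    return spans
```

Each returned span can then be written out as its own sample. Real audio would need a threshold and silence length expressed in terms of the sample rate, for example a few hundred milliseconds of silence.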

2. Vocal/background separation

2-1. demucs: top-tier AI audio source separation

[https://github.com/facebookresearch/demucs]

Tool introduction: https://zhuanlan.zhihu.com/p/510755328

2-2. Ultimate Vocal Remover (UVR)

[https://github.com/Anjok07/ultimatevocalremovergui]

Tool introduction: https://www.bilibili.com/read/cv24883000/

3. torchaudio: the audio library for PyTorch

[https://github.com/pytorch/audio]

Reference: PyTorch audio processing tutorial https://blog.csdn.net/qq_43613400/article/details/115524978
API: http://pytorch.org/audio/
Ref: https://ptorch.com/news/100.html

8. OCR

8-1. Python Tesseract (Google Tesseract)

https://github.com/madmaze/pytesseract
Python-tesseract is a wrapper for Google's Tesseract-OCR Engine.

8-2. Tesseract.js

https://tesseract.projectnaptha.com
https://github.com/naptha/tesseract.js#tesseractjs
Tesseract.js is a JavaScript library that gets words in almost any language out of images.

8-3. PaddleOCR

https://github.com/PaddlePaddle/PaddleOCR

8-4. TrOCR: Transformer-based OCR for handwritten text recognition

https://zhuanlan.zhihu.com/p/656620989

9. Text Segmentation

https://github.com/koomri/text-segmentation

Sentence segmentation for Classical Chinese text
https://github.com/ToolsForAncientChineseText/Text-segmentation

II. Audio To Text

Whisper

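Whisper is a powerful speech recognition model that converts speech to text and supports many languages. We will use Whisper to transcribe the original speech in a video into text, then use a translation service to convert that text into the target language.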
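Once speech has been transcribed, the timed segments are typically turned into subtitles before translation. The sketch below formats segments as SRT text. It assumes each segment is a dict with `start` and `end` (in seconds) and `text`, which is the shape found under the `segments` key of the result returned by the `openai-whisper` package's `model.transcribe(...)`; the helper names here are hypothetical, and no Whisper model is loaded in this example.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render Whisper-style segments as SRT subtitle text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Whisper segment text usually carries a leading space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)
```

The `text` field of each segment is the part that would be sent to a translation service, while the timestamps stay unchanged.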

1. OpenAI Whisper (Mac/Linux)

[https://github.com/openai/whisper/tree/main]

2. Whisper for Windows

2-1. Download: Whisper (CPU) [https://github.com/Const-me/Whisper]

Releases: https://github.com/Const-me/Whisper/releases
Download: Model https://huggingface.co/ggerganov/whisper.cpp
https://github.com/ggerganov/whisper.cpp/tree/master/models

2-2. Download: Whisper (GPU) [https://github.com/guillaumekln/faster-whisper]

2-3. Getting Chinese output from Whisper [https://wulu.zone/posts/whisper-cn]

III. Text To Audio (train)

IV. Text To Audio (run)

V. Text To Audio (Demos OR Project)

1. Genshin Impact voice pack test

Online test: [https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer]

Genshin Impact voice pack (Chinese/Japanese/Korean/English): https://pan.baidu.com/s/1dWtW7qpVacRTXswfMUMqFw?pwd=2qxc extraction code: 2qxc

2. AI voice cloning with VITS: one-click launch package (VITS-fast-fine-tuning)

Download: link: https://pan.baidu.com/s/14z7V6n530MZiAMXr7KW3MA?pwd=6h88 extraction code: 6h88
https://docs.qq.com/doc/DT0x6dHd5WER6VExr
https://www.bilibili.com/read/cv25689788/
https://www.bilibili.com/video/BV1K94y1k7Bw/?share_source=copy_web&vd_source=b55b8a2cb4d0e92a42df58f6ee597187

Original Download:
Bundle of environment + code + weight files (extraction code: o89H):
https://pan.baidu.com/s/11e7Tgm49_5MZt74BmQc4Ww?pwd=o89H
unzip password: cuijiahua.com

VI. Other systems

VITS-fast-fine-tuning (AI voice cloning)

https://github.com/Plachtaa/VITS-fast-fine-tuning/
https://www.bilibili.com/video/BV1Jg4y1E7df/?share_source=copy_web&vd_source=09c09be7267697fb21e6ec8f56b2016f

Others

Splitting sentences with Python (jieba)
https://jingyan.baidu.com/article/4d58d5416a660a9dd4e9c01c.html
https://blog.csdn.net/weixin_61631131/article/details/124274495

Splitting Chinese text into paragraphs and sentences in Python
https://pythonjishu.com/shrhbowvmggprev/
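As a rough illustration of paragraph and sentence splitting for Chinese text, here is a minimal sketch using only the standard-library `re` module. It is not taken from the linked articles and does not use jieba (which performs word segmentation); the function name and the chosen set of sentence-ending punctuation are assumptions.

```python
import re
from typing import List

# Sentence-ending punctuation: Chinese full stop, exclamation and question
# marks, the ellipsis character, plus ASCII "!" and "?".
_SENTENCE_END = re.compile(r"([。！？!?…]+)")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, treating each non-empty line as a paragraph."""
    sentences: List[str] = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # With a capturing group, re.split keeps the delimiters:
        # even indices hold sentence bodies, odd indices hold punctuation.
        parts = _SENTENCE_END.split(paragraph)
        for i in range(0, len(parts), 2):
            body = parts[i].strip()
            ending = parts[i + 1] if i + 1 < len(parts) else ""
            if body:
                sentences.append(body + ending)
    return sentences
```

Text that ends without punctuation is kept as a final sentence, and punctuation with no preceding text is dropped. Quoted speech and decimal numbers are not handled specially in this sketch.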

MockingBird (Real Time Voice Clone) (poor results)
You only need to provide a short recording of a person (3-10 seconds) to imitate that person's voice.
https://github.com/babysor/MockingBird/tree/main

VALL-E X
https://github.com/Plachtaa/VALL-E-X/blob/master/README-ZH.md
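Real-time voice cloning. The model supports multiple languages (English, Chinese, and Japanese) and zero-shot voice cloning: you only need to provide a short recording of a person (3-10 seconds) to imitate that person's voice.
Demo (did not work): https://huggingface.co/spaces/Plachta/VALL-E-X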

A step-by-step beginner's tutorial on AI image generation (video)
https://www.bilibili.com/video/BV1q84y1i78L/?share_source=copy_web&vd_source=b55b8a2cb4d0e92a42df58f6ee597187

Bert-Vits2:
https://github.com/fishaudio/Bert-VITS2
https://www.bilibili.com/video/BV1yz4y1M71e/?buvid=XYFCFA4F89A607273CA316B8F30FBDB21831C&is_story_h5=false&mid=1BdCQC4JGQbdG4lCFG58cw%3D%3D&p=1&plat_id=116&share_from=ugc&share_plat=android&share_session_id=4d499b7c-e423-4164-965a-13a15e1ccfbe&share_tag=s_i&unique_k=Ir2OG5d&up_id=284799364&vd_source=09c09be7267697fb21e6ec8f56b2016f

TTS-Vue (Microsoft speech synthesis) Mac/Win
https://loker-page.lgwawork.com
https://github.com/LokerL/tts-vue/releases/tag/1.9.15

长风 (Changfeng) 4.2:
https://www.bilibili.com/video/BV1bN41117ot/?vd_source=09c09be7267697fb21e6ec8f56b2016f

Coqui TTS: (behaves oddly)
https://github.com/coqui-ai/TTS

OpenAI Whisper + FFmpeg + TTS: dynamic cross-language translation of video audio
https://zhuanlan.zhihu.com/p/631644803
