https://rsxdalv.github.io/bark-speaker-directory/
Sep 5:
- Add voice mixing to Bark
- Add v1 Burn in prompt to Bark (Burn in prompts direct the semantic model without spending time on generating audio. The v1 works by generating semantic tokens and then using them as a prompt for the semantic model.)
- Add generation length limiter to Bark
Aug 27:
- Fix MusicGen ignoring the melody rsxdalv#153
Aug 26:
- Add Send to RVC, Demucs, Vocos buttons to Bark and Vocos
Aug 24:
- Add date to RVC outputs to fix rsxdalv#147
- Fix safetensors missing wheel
- Add Send to demucs button to musicgen
Aug 21:
- Add torchvision install to colab for musicgen issue fix
- Remove rvc_tab file logging
Aug 20:
- Fix MBD by reinstalling hydra-core at the end of an update
Aug 18:
- CI: Add a GitHub Action to automatically publish docker image.
Aug 16:
- Add "name" to tortoise generation parameters
Aug 15:
- Pin torch to 2.0.0 in all requirements.txt files
- Bump audiocraft and bark versions
- Remove Tortoise transformers fix from colab
- Update Tortoise to 2.8.0
Aug 13:
- Potentially big fix for new user installs that had issues with GPU not being supported
Aug 11:
- Tortoise hotfix thanks to manmay-nakhashi
- Add Tortoise option to change tokenizer
Aug 8:
- Update AudioCraft, improving MultiBandDiffusion performance
- Fix Tortoise parameter 'cond_free' mismatch with 'ultra_fast' preset
Aug 7:
- add tortoise deepspeed fix to colab
Aug 6:
- Fix audiogen + mbd error, add tortoise fix for colab
Aug 4:
- Add MultiBandDiffusion option to MusicGen rsxdalv#109
- MusicGen/AudioGen save tokens on generation as .npz files.
Aug 3:
- Add AudioGen rsxdalv#105
Aug 2:
- Fix Model locations not showing after restart
July 26:
- Voice gallery
- Voice cropping
- Fix voice rename bug, rename picture as well, add a hash textbox
- Easier downloading of voices (rsxdalv#98)
July 24:
- Change bark file format to include history hash: ...continued_generation... -> ...from_3ea0d063...
July 23:
- Docker Image thanks to https://github.com/jonfairbanks
- RVC UI naming improvements
July 21:
- Fix hubert not working with CPU only (rsxdalv#87)
- Add Google Colab demo (rsxdalv#88)
- New settings tab and model locations (for advanced users) (rsxdalv#90)
July 19:
- Add Tortoise Optimizations, Thank you https://github.com/manmay-nakhashi rsxdalv#79 (Implements rsxdalv#18)
July 16:
- Voice Photo Demo
- Add a directory to store RVC models/indexes in and a dropdown
- Workaround rvc not respecting is_half for CPU rsxdalv#74
- Tortoise model and voice selection improvements rsxdalv#73
July 10:
- Demucs Demo rsxdalv#67
July 9:
- RVC Demo + Tortoise, v6 installer with update script and automatic attempts to install extra modules rsxdalv#66
July 5:
- Improved v5 installer - faster and more reliable rsxdalv#63
July 2:
- Upgrade bark settings rsxdalv#59
July 1:
- Studio-tab rsxdalv#58
Jun 29:
- Tortoise new params rsxdalv#54
Jun 27:
- Fix eager loading errors, refactor rsxdalv#50
Jun 20:
- Tortoise: proper long form generation files rsxdalv#46
Jun 19:
- Tortoise-upgrade rsxdalv#45
June 18:
- Update to newest audiocraft, add longer generations
Jun 14:
- add vocos wav tab rsxdalv#42
June 5:
- Fix "Save to Favorites" button on bark generation page, clean up console (v4.1.1)
- Add "Collections" tab for managing several different data sets and easier curation.
June 4:
- Update to v4.1 - improved hash function, code improvements
June 3:
- Update to v4 - new output structure, improved history view, codebase reorganization, improved metadata, output extensions support
May 21:
- Update to v3 - voice clone demo
May 17:
- Update to v2 - generate results as they appear, preview long prompt generations piece by piece, enable up to 9 outputs, UI tweaks
May 16:
- Add gradio settings tab, fix gradio errors in console, improve logging.
- Update History and Favorites with "use as voice" and "save voice" buttons
- Add voices tab
- Bark tab: Remove "or Use last generation as history"
- Improve code organization
May 13:
- Enable deterministic generation and enhance generated logs. Credits to suno-ai/bark#175.
May 10:
- Enable the possibility of reusing history prompts from older generations. Save generations as npz files. Add a convenient method of reusing any of the last 3 generations for the next prompts. Add a button for saving and collecting history prompts under /voices. rsxdalv#10
May 4:
- Long form generation (credits to https://github.com/suno-ai/bark/blob/main/notebooks/long_form_generation.ipynb and suno-ai/bark#161)
- Adapt to fixed env var bug
May 3:
- Improved Tortoise UI: Voice, Preset and CVVP settings as well as ability to generate 3 results (rsxdalv#6)
May 2 Update 2:
- Added support for history recycling to continue longer prompts manually
May 2 Update 1:
- Added support for v2 prompts
Before:
- Added support for Tortoise TTS
In case of issues, feel free to contact the developers.
- Download and run the new installer
- Replace the "tts-generation-webui" directory in the newly installed directory
- Run update_platform
Not exactly. The dependencies clash, especially between conda and python (and the dependencies are already in a critical state; moving them to conda is a long way off). Therefore, while it might be possible to replace the old installer with the new one and run the update, the resulting problems are unpredictable and unfixable. Updating the installer requires a lot of testing, so it is not done lightly.
- Install conda or another virtual environment
- Highly recommended to use Python 3.10
- Install git (`conda install git`)
- Install ffmpeg (`conda install -y -c pytorch ffmpeg`)
- Set up pytorch with CUDA or CPU (https://pytorch.org/audio/stable/build.windows.html#install-pytorch)
- Clone the repo: `git clone https://github.com/rsxdalv/tts-generation-webui.git`
- Install the root requirements.txt with `pip install -r requirements.txt`
- Clone the repos in the ./models/ directory and install requirements under them
- Run using `(venv) python server.py`
- Potentially needed to install build tools (without Visual Studio): https://visualstudio.microsoft.com/visual-cpp-build-tools/
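
Before running the steps above, it can help to confirm that the prerequisites are visible from the active environment. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper that only checks the Python version and whether `git` and `ffmpeg` are on PATH.

```python
import shutil
import sys

# Command-line tools named in the installation steps.
REQUIRED_TOOLS = ("git", "ffmpeg")


def check_environment(required_tools=REQUIRED_TOOLS, recommended=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != tuple(recommended):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, recommended))
        problems.append(f"Python {wanted} is recommended, found {found}")
    for tool in required_tools:
        # shutil.which returns None when the executable cannot be found on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

An empty result does not guarantee a working install; it only rules out the most common missing prerequisites.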
tts-generation-webui can also be run inside a Docker container. To get started, first build the Docker image while in the root directory:
docker build -t rsxdalv/tts-generation-webui .
Once the image has built it can be started with Docker Compose:
docker compose up -d
The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:
docker logs tts-generation-webui
This project utilizes the following open source libraries:
- suno-ai/bark - MIT License
  - Description: A transformer-based text-to-audio model.
  - Repository: suno-ai/bark
- tortoise-tts - Apache-2.0 License
  - Description: A flexible text-to-speech synthesis library for various platforms.
  - Repository: neonbjb/tortoise-tts
- ffmpeg - LGPL License
  - Description: A complete and cross-platform solution for video and audio processing.
  - Repository: FFmpeg
  - Use: Encoding Vorbis Ogg files
- ffmpeg-python - Apache 2.0 License
  - Description: Python bindings for FFmpeg library for handling multimedia files.
  - Repository: kkroening/ffmpeg-python
- audiocraft - MIT License
  - Description: A library for audio generation and MusicGen.
  - Repository: facebookresearch/audiocraft
- vocos - MIT License
  - Description: An improved decoder for encodec samples
  - Repository: charactr-platform/vocos
- RVC - MIT License
  - Description: An easy-to-use Voice Conversion framework based on VITS.
  - Repository: RVC-Project/Retrieval-based-Voice-Conversion-WebUI
This technology is intended for enablement and creativity, not for harm.
By engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.
- Non-Malicious Intent: Do not use this AI model for malicious, harmful, or unlawful activities. It should only be used for lawful and ethical purposes that promote positive engagement, knowledge sharing, and constructive conversations.
- No Impersonation: Do not use this AI model to impersonate or misrepresent yourself as someone else, including individuals, organizations, or entities. It should not be used to deceive, defraud, or manipulate others.
- No Fraudulent Activities: This AI model must not be used for fraudulent purposes, such as financial scams, phishing attempts, or any form of deceitful practices aimed at acquiring sensitive information, monetary gain, or unauthorized access to systems.
- Legal Compliance: Ensure that your use of this AI model complies with applicable laws, regulations, and policies regarding AI usage, data protection, privacy, intellectual property, and any other relevant legal obligations in your jurisdiction.
- Acknowledgement: By engaging with this AI model, you acknowledge and agree to abide by these guidelines, using the AI model in a responsible, ethical, and legal manner.
The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.
That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.
Known non-permissive dependencies:
Library | License | Notes |
---|---|---|
encodec | CC BY-NC 4.0 | Newer versions are MIT, but need to be installed manually |
diffq | CC BY-NC 4.0 | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs |
lameenc | GPL License | Future versions will make it LGPL, but need to be installed manually |
unidecode | GPL License | Not mission critical, can be replaced with another library, issue: neonbjb/tortoise-tts#494 |
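
To see which of the packages in the table above are present in a given environment, the license declared in each package's metadata can be read with the standard library. This is a rough sketch under assumptions: `installed_licenses` is a hypothetical helper, the names are assumed to match the PyPI distribution names, and the declared license string may be missing or less precise than the table.

```python
from importlib import metadata

# Packages listed in the table; names assumed to match their PyPI distributions.
NON_PERMISSIVE = ("encodec", "diffq", "lameenc", "unidecode")


def installed_licenses(packages=NON_PERMISSIVE):
    """Map each installed package to the license string declared in its metadata."""
    found = {}
    for name in packages:
        try:
            meta = metadata.metadata(name)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
        found[name] = meta.get("License") or "unknown"
    return found


if __name__ == "__main__":
    for package, declared in installed_licenses().items():
        print(f"{package}: {declared}")
```

Packages that are not installed are simply left out of the result, so an empty dictionary means none of the listed dependencies were found.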
Model weights have different licenses; please pay attention to the license of the model you are using.
Most notably:
- Bark: CC BY-NC 4.0 (MIT but HuggingFace has not been updated yet)
- Tortoise: Unknown (Apache-2.0 according to repo, but no license file in HuggingFace)
- MusicGen: CC BY-NC 4.0
- AudioGen: CC BY-NC 4.0
You can configure the interface through the "Settings" tab or, for advanced users, via the config.json file in the root directory (not recommended). Below is a detailed explanation of each setting:
Argument | Default Value | Description |
---|---|---|
text_use_gpu | true | Determines whether the GPU should be used for text processing. |
text_use_small | true | Determines whether a "small" or reduced version of the text model should be used. |
coarse_use_gpu | true | Determines whether the GPU should be used for "coarse" processing. |
coarse_use_small | true | Determines whether a "small" or reduced version of the "coarse" model should be used. |
fine_use_gpu | true | Determines whether the GPU should be used for "fine" processing. |
fine_use_small | true | Determines whether a "small" or reduced version of the "fine" model should be used. |
codec_use_gpu | true | Determines whether the GPU should be used for codec processing. |
load_models_on_startup | false | Determines whether the models should be loaded during application startup. |
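
As an illustration of how these defaults could combine with values from config.json, here is a minimal sketch. The layout of config.json is not described here, so the `"model"` section name and the `load_bark_settings` helper are assumptions for illustration, not the project's actual loader.

```python
import json
from pathlib import Path

# Defaults copied from the table above.
BARK_DEFAULTS = {
    "text_use_gpu": True,
    "text_use_small": True,
    "coarse_use_gpu": True,
    "coarse_use_small": True,
    "fine_use_gpu": True,
    "fine_use_small": True,
    "codec_use_gpu": True,
    "load_models_on_startup": False,
}


def load_bark_settings(path="config.json", section="model"):
    """Overlay values found in config.json onto the documented defaults.

    `section` is a hypothetical top-level key; adjust it to the real file layout.
    """
    settings = dict(BARK_DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        data = json.loads(config_file.read_text(encoding="utf-8"))
        overrides = data.get(section, {})
        # Ignore keys that the table does not document.
        settings.update({k: v for k, v in overrides.items() if k in BARK_DEFAULTS})
    return settings
```

A missing file yields the defaults unchanged, which mirrors the recommendation to prefer the "Settings" tab over editing the file by hand.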

Argument | Default Value | Description |
---|---|---|
inline | false | Display inline in an iframe. |
inbrowser | true | Automatically launch in a new tab. |
share | false | Create a publicly shareable link. |
debug | false | Block the main thread from running. |
enable_queue | true | Serve inference requests through a queue. |
max_threads | 40 | Maximum number of total threads. |
auth | null | Username and password required to access interface, format: username:password. |
auth_message | null | HTML message provided on login page. |
prevent_thread_lock | false | Block the main thread while the server is running. |
show_error | false | Display errors in an alert modal. |
server_name | 0.0.0.0 | Make app accessible on local network. |
server_port | null | Start Gradio app on this port. |
show_tips | false | Show tips about new Gradio features. |
height | 500 | Height in pixels of the iframe element. |
width | 100% | Width in pixels of the iframe element. |
favicon_path | null | Path to a file (.png, .gif, or .ico) to use as the favicon. |
ssl_keyfile | null | Path to a file to use as the private key file for a local server running on HTTPS. |
ssl_certfile | null | Path to a file to use as the signed certificate for HTTPS. |
ssl_keyfile_password | null | Password to use with the SSL certificate for HTTPS. |
ssl_verify | true | If false, skip certificate validation (allows self-signed certificates). |
quiet | true | Suppress most print statements. |
show_api | true | Show the API docs in the footer of the app. |
file_directories | null | List of directories that Gradio is allowed to serve files from. |
_frontend | true | Frontend. |
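
The `auth` value above is written as a single `username:password` string, while Gradio's `launch()` accepts `auth` as a `(username, password)` tuple. The sketch below shows one way such a string could be split; `parse_auth` is a hypothetical helper, and how the project itself performs this conversion is not stated here.

```python
def parse_auth(value):
    """Split 'username:password' into a (username, password) tuple; None passes through."""
    if value is None:
        return None
    # Split on the first colon only, so the password may itself contain colons.
    username, separator, password = value.partition(":")
    if not separator or not username or not password:
        raise ValueError("auth must look like 'username:password'")
    return username, password


if __name__ == "__main__":
    print(parse_auth("alice:s3cret"))
```

Splitting on the first colon only is a design choice of this sketch: it keeps usernames colon-free while allowing colons in passwords.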