A result video can be found here: https://www.youtube.com/watch?v=-3WLUxz6XKM
This repository contains a Wav2Lip UHQ extension for Automatic1111.
It's an all-in-one solution: just choose a video and a speech file (WAV or MP3), and it will generate a lip-synced video. It improves the quality of lip-sync videos generated by the Wav2Lip tool by applying specific post-processing techniques with Stable Diffusion.
- Latest version of Stable Diffusion WebUI (Automatic1111)
- FFmpeg
- Install Stable Diffusion WebUI by following the instructions in the Stable Diffusion WebUI repository.
- Download FFmpeg from the official FFmpeg site. Follow the instructions appropriate for your operating system. Note that FFmpeg must be accessible from the command line (see the quick check after these steps).
- Launch Automatic1111
- In the extensions tab, enter the following URL in the "Install from URL" field and click "Install":
- Go to the "Installed" tab in the Extensions tab and click "Apply and restart UI".
If you don't see the "Wav2Lip UHQ" tab, restart Automatic1111.
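
As a quick way to confirm the FFmpeg prerequisite, the following small Python snippet (a convenience sketch, not part of the extension) checks that FFmpeg is reachable from the command line and prints its version banner:

```python
# Sanity-check sketch: confirm FFmpeg is on PATH and runnable.
import shutil
import subprocess

ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    raise SystemExit("FFmpeg not found on PATH; install it and/or add it to PATH.")

# Print the first line of the version banner to confirm the binary runs.
result = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
print(result.stdout.splitlines()[0])
```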
🔥 Important: Get the weights. Download the model weights from the following locations and place them in the corresponding directories:
Model | Description | Link to the model | Install folder |
---|---|---|---|
Wav2Lip | Highly accurate lip-sync | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\ |
Wav2Lip + GAN | Slightly inferior lip-sync, but better visual quality | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\ |
s3fd | Face detection pretrained model | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth |
s3fd | Face detection pretrained model (alternate link) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth |
landmark predictor | Dlib 68-point face landmark predictor (click the download icon) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
landmark predictor | Dlib 68-point face landmark predictor (alternate link) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
landmark predictor | Dlib 68-point face landmark predictor (alternate link, click the download icon) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
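
After downloading, you can sanity-check the layout with a short Python sketch run from the root of your Stable Diffusion WebUI installation. The checkpoint file names (wav2lip.pth, wav2lip_gan.pth) are assumptions based on the upstream Wav2Lip release; adjust them to whatever your downloads are actually named:

```python
# Sketch: verify each weight file sits where the table above expects it.
# Run from the WebUI root. Checkpoint file names are assumptions.
from pathlib import Path

BASE = Path("extensions") / "sd-wav2lip-uhq" / "scripts" / "wav2lip"

expected = [
    BASE / "checkpoints" / "wav2lip.pth",       # assumed file name
    BASE / "checkpoints" / "wav2lip_gan.pth",   # assumed file name
    BASE / "face_detection" / "detection" / "sfd" / "s3fd.pth",
    BASE / "predicator" / "shape_predictor_68_face_landmarks.dat",
]

for path in expected:
    status = "OK" if path.is_file() else "MISSING"
    print(f"{status:8} {path}")
```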
- Choose a video or an image.
- Choose an audio file with speech.
- Choose a checkpoint (see the table above).
- Padding: Wav2Lip uses this to add a black border around the mouth, which helps prevent the mouth from being cropped by the face detection. You can change the padding value to suit your needs, but the default gives good results (see the sketch after this list).
- No Smooth: If checked, the mouth will not be smoothed. This can be useful if you want to keep the original mouth shape.
- Resize Factor: Divides the video resolution by this factor. The default value is 1 (original size), but you can increase it if the video resolution is too large.
- Choose a good Stable Diffusion checkpoint, such as deliberate_v2 or revAnimated_v122 (SDXL models don't seem to work, but you can generate an SDXL image and then switch to a non-SDXL model for the Wav2Lip process).
- Click on the "Generate" button.
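
To make the Padding and Resize Factor options above more concrete, here is a hedged Python sketch modeled on Wav2Lip's inference code; the function names are illustrative, not the extension's actual API:

```python
# Illustrative sketch of Wav2Lip-style padding and resize factor.
import cv2  # pip install opencv-python

def apply_resize_factor(frame, resize_factor=1):
    """Divide the frame resolution by resize_factor (1 = original size)."""
    if resize_factor > 1:
        h, w = frame.shape[:2]
        frame = cv2.resize(frame, (w // resize_factor, h // resize_factor))
    return frame

def pad_face_box(box, frame_shape, pads=(0, 10, 0, 0)):
    """Expand a detected face box (x1, y1, x2, y2) by (top, bottom, left,
    right) padding, clamped to the frame, so the mouth/chin is not cropped."""
    top, bottom, left, right = pads
    x1, y1, x2, y2 = box
    h, w = frame_shape[:2]
    return (max(0, x1 - left), max(0, y1 - top),
            min(w, x2 + right), min(h, y2 + bottom))
```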
This extension operates in several stages to improve the quality of Wav2Lip-generated videos:
- Generate a Wav2Lip video: The script first generates a low-quality Wav2Lip video from the input video and audio.
- Mask Creation: The script creates a mask around the mouth and tries to preserve other facial motion, such as the cheeks and chin.
- Video Quality Enhancement: It takes the low-quality Wav2Lip video and overlays the low-quality mouth onto the high-quality original video.
- Img2Img: The script then sends the original image with the low-quality mouth, together with the mouth mask, to img2img (a rough sketch of these stages follows below).
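
As a rough illustration of the compositing and inpainting idea behind stages 2-4: the extension calls img2img internally, but the same operation can be sketched against the WebUI's public API (requires launching the WebUI with --api). File names, the prompt, and parameter values here are assumptions:

```python
# Sketch: composite the low-quality Wav2Lip mouth onto the high-quality
# original frame via a mouth mask, then inpaint the masked region through
# the WebUI API. All file names and values are illustrative.
import base64
import cv2
import numpy as np
import requests

original = cv2.imread("original_frame.png")   # high-quality source frame
wav2lip = cv2.imread("wav2lip_frame.png")     # low-quality lip-synced frame
mask = cv2.imread("mouth_mask.png", cv2.IMREAD_GRAYSCALE)  # white = mouth

# Blend only the masked mouth region onto the high-quality frame.
alpha = (mask.astype(np.float32) / 255.0)[..., None]
composite = (alpha * wav2lip + (1.0 - alpha) * original).astype(np.uint8)

def b64_png(img):
    ok, buf = cv2.imencode(".png", img)
    return base64.b64encode(buf.tobytes()).decode("utf-8")

payload = {
    "init_images": [b64_png(composite)],
    "mask": b64_png(mask),
    "denoising_strength": 0.4,  # illustrative value
    "inpainting_fill": 1,       # keep original content under the mask
    "prompt": "photo of a face, detailed mouth",  # illustrative prompt
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
r.raise_for_status()
with open("enhanced_frame.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```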
- Use a high quality image/video as input
- Try to minimize grain on the face in the input as much as possible; for example, you can use "Restore faces" in img2img before using an image as Wav2Lip input.
- Use a high-quality model in Stable Diffusion WebUI, like deliberate_v2 or revAnimated_v122.
Contributions to this project are welcome. Please ensure any pull requests are accompanied by a detailed description of the changes made.
- The code in this repository is released under the MIT license as found in the LICENSE file.