
Wav2Lip UHQ extension for Stable Diffusion WebUI Automatic1111

Illustration

The result video can be found here: https://www.youtube.com/watch?v=-3WLUxz6XKM

demo_1.mp4

Description

This repository contains a Wav2Lip UHQ extension for Automatic1111.

It's an all-in-one solution: just choose a video and a speech file (WAV or MP3), and it will generate a lip-sync video. It improves the quality of the lip-sync videos generated by the Wav2Lip tool by applying specific post-processing techniques with Stable Diffusion.

Illustration

Requirements

  • Latest version of Stable Diffusion WebUI Automatic1111
  • FFmpeg
  1. Install Stable Diffusion WebUI by following the instructions on the Stable Diffusion WebUI repository.
  2. Download FFmpeg from the official FFmpeg site. Follow the instructions appropriate for your operating system. Note that FFmpeg must be accessible from the command line (a quick check is sketched below).
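
If you are unsure whether FFmpeg is reachable from the command line, a check like the following can confirm it before launching the extension. This is a minimal sketch; it only verifies that an ffmpeg binary resolves on your PATH:

```python
import shutil
import subprocess

# Look for the ffmpeg binary the same way command-line calls would resolve it.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    raise SystemExit("ffmpeg not found on PATH - install it or add it to PATH")

# Print the version banner as a sanity check.
result = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
print(result.stdout.splitlines()[0])
```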

Installation

  1. Launch Automatic1111
  2. In the extensions tab, enter the following URL in the "Install from URL" field and click "Install":

Illustration

  3. Go to the "Installed" tab under Extensions and click "Apply and quit".

Illustration

  4. If you don't see the "Wav2lip Uhq" tab, restart Automatic1111.

  5. 🔥 Important: Get the weights. Download the model weights from the following locations and place them in the corresponding directories:

| Model | Description | Link to the model | Install folder |
|-------|-------------|-------------------|----------------|
| Wav2Lip | Highly accurate lip-sync | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\ |
| Wav2Lip + GAN | Slightly inferior lip-sync, but better visual quality | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\ |
| s3fd | Face detection pre-trained model | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth |
| s3fd | Face detection pre-trained model (alternate link) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth |
| Landmark predictor | Dlib 68-point face landmark prediction (click the download icon) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
| Landmark predictor | Dlib 68-point face landmark prediction (alternate link) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
| Landmark predictor | Dlib 68-point face landmark prediction (alternate link; click the download icon) | Link | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
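
After downloading, a quick sanity check can confirm that every file landed where the extension expects it. A minimal sketch, run from the Automatic1111 root folder; the checkpoint filenames wav2lip.pth and wav2lip_gan.pth are assumptions based on the usual Wav2Lip release names, so adjust them if your downloads differ:

```python
from pathlib import Path

# Paths from the table above, relative to the Automatic1111 root folder.
# The two checkpoint filenames are assumptions based on the usual Wav2Lip
# release names - adjust if yours differ.
expected = [
    Path("extensions/sd-wav2lip-uhq/scripts/wav2lip/checkpoints/wav2lip.pth"),
    Path("extensions/sd-wav2lip-uhq/scripts/wav2lip/checkpoints/wav2lip_gan.pth"),
    Path("extensions/sd-wav2lip-uhq/scripts/wav2lip/face_detection/detection/sfd/s3fd.pth"),
    Path("extensions/sd-wav2lip-uhq/scripts/wav2lip/predicator/shape_predictor_68_face_landmarks.dat"),
]

for path in expected:
    print(("OK     " if path.is_file() else "MISSING"), path)
```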

Usage

  1. Choose a video or an image.
  2. Choose an audio file with speech.
  3. Choose a checkpoint (see the table above).
  4. Padding: Wav2Lip uses this to add a black border around the mouth, which helps prevent the mouth from being cropped by the face detection. You can change the padding values to suit your needs, but the defaults give good results (see the sketch after this list).
  5. No Smooth: If checked, the mouth will not be smoothed. This can be useful if you want to keep the original mouth shape.
  6. Resize Factor: a downscale factor for the video. The default value is 1.0, but you can raise it if the video resolution is too large.
  7. Choose a good Stable Diffusion checkpoint, like deliberate_v2 or revAnimated_v122 (SDXL models don't seem to work, but you can generate an image with SDXL and then switch to a non-SDXL model for the Wav2Lip process).
  8. Click on the "Generate" button.
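
For intuition about the Padding and Resize Factor options, here is roughly how they act on each frame. This is a sketch modeled on upstream Wav2Lip's inference code rather than this extension's exact implementation; face_box stands in for whatever the face detector returns:

```python
import cv2

def downscale(frame, resize_factor=1):
    # The resize factor shrinks the whole frame before face detection and
    # lip-sync inference, which speeds things up on very large inputs.
    if resize_factor > 1:
        frame = cv2.resize(
            frame,
            (frame.shape[1] // resize_factor, frame.shape[0] // resize_factor),
        )
    return frame

def pad_face_box(frame, face_box, pads=(0, 10, 0, 0)):
    """Expand a detected face box by (top, bottom, left, right) pads.

    face_box is (x1, y1, x2, y2) from the face detector, run on the
    already-downscaled frame; the crop is clamped to the frame edges so
    the chin and mouth corners are not cut off by a tight detection box.
    """
    pady1, pady2, padx1, padx2 = pads
    x1, y1, x2, y2 = face_box
    y1 = max(0, y1 - pady1)
    y2 = min(frame.shape[0], y2 + pady2)
    x1 = max(0, x1 - padx1)
    x2 = min(frame.shape[1], x2 + padx2)
    return x1, y1, x2, y2
```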

Behind the scenes

This extension operates in several stages to improve the quality of Wav2Lip-generated videos:

  1. Generate a Wav2Lip video: The script first generates a low-quality Wav2Lip video from the input video and audio.
  2. Mask Creation: The script creates a mask around the mouth while trying to preserve other facial motion, such as the cheeks and chin (see the sketch after this list).
  3. Video Quality Enhancement: It takes the low-quality Wav2Lip video and overlays the low-quality mouth onto the high-quality original video.
  4. Img2Img: The script then sends the original image with the low-quality mouth, together with the mouth mask, to Img2Img.
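
To make steps 2 and 4 concrete, the sketch below builds a mouth mask from the Dlib 68-point landmarks (the shape_predictor_68_face_landmarks.dat file from the table above) and sends a frame plus that mask to Automatic1111's img2img inpainting endpoint. This is an illustrative approximation rather than the extension's exact code; it assumes the WebUI is running with the --api flag on the default port, and the dilation kernel size and denoising strength are arbitrary starting points:

```python
import base64

import cv2
import dlib
import numpy as np
import requests

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_mask(frame_bgr):
    """White-on-black mask over the mouth (Dlib landmarks 48-67)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    face = detector(gray)[0]  # assume a single face in frame
    shape = predictor(gray, face)
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)
    # Dilate so inpainting also blends the area around the lips
    # (kernel size is an arbitrary illustrative choice).
    return cv2.dilate(mask, np.ones((15, 15), np.uint8))

def b64(img):
    return base64.b64encode(cv2.imencode(".png", img)[1]).decode()

frame = cv2.imread("frame_with_lowres_mouth.png")  # hypothetical input
payload = {
    "init_images": [b64(frame)],
    "mask": b64(mouth_mask(frame)),
    "denoising_strength": 0.4,  # low enough to keep the lip-synced shape
    "inpainting_fill": 1,       # "original": start from the existing pixels
    "inpaint_full_res": True,
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
r.raise_for_status()
```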

Quality tips

  • Use a high-quality image/video as input.
  • Minimize grain on the face in the input as much as possible; for example, you can apply "Restore faces" in img2img before using an image as Wav2Lip input (a scripted version is sketched after this list).
  • Use a high-quality model in Stable Diffusion WebUI, like deliberate_v2 or revAnimated_v122.
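
If you prefer to script that "Restore faces" step, Automatic1111 also exposes face restoration through its postprocessing API. A minimal sketch, assuming the WebUI runs with --api on the default port; the CodeFormer weight of 0.5 is an arbitrary starting point:

```python
import base64

import requests

with open("input_face.png", "rb") as f:  # hypothetical input image
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "image": img_b64,
    "codeformer_visibility": 1.0,  # apply CodeFormer at full strength
    "codeformer_weight": 0.5,      # fidelity vs. restoration trade-off
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload)
r.raise_for_status()

with open("restored_face.png", "wb") as f:
    f.write(base64.b64decode(r.json()["image"]))
```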

Contributing

Contributions to this project are welcome. Please ensure any pull requests are accompanied by a detailed description of the changes made.

License

  • The code in this repository is released under the MIT license as found in the LICENSE file.
