Unofficial LatentSync implementation for ComfyUI on Windows.

This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model, synchronizing a video's lip movements with an audio input.
Before installing this node, make sure the following prerequisites are in place:
- ComfyUI installed and working
- Python 3.8-3.11 (mediapipe is not yet compatible with Python 3.12)
- FFmpeg installed on your system:
  - Download an FFmpeg build, extract it to your root `C:\` drive (e.g. `C:\ffmpeg`), and add `C:\ffmpeg\bin` to the system PATH
- If you get PYTHONPATH errors:
  - Make sure Python is in your system PATH
  - Try running ComfyUI as administrator
Only proceed with installation after confirming all prerequisites are installed and working.
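As a quick sanity check before installing, the prerequisites above can be verified with a short script. The `check_prerequisites` helper below is a hypothetical sketch, not part of the node:

```python
import shutil
import sys

def check_prerequisites() -> bool:
    """Hypothetical helper: verify the prerequisites listed above."""
    ok = True
    # Python 3.8-3.11 (mediapipe is not yet compatible with 3.12)
    if not ((3, 8) <= sys.version_info[:2] <= (3, 11)):
        print(f"Unsupported Python version: {sys.version.split()[0]}")
        ok = False
    # FFmpeg must be reachable on PATH (e.g. C:\ffmpeg\bin)
    if shutil.which("ffmpeg") is None:
        print("ffmpeg not found on PATH")
        ok = False
    return ok

if __name__ == "__main__":
    print("All prerequisites OK" if check_prerequisites() else "Fix the issues above first")
```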
- Clone this repository into your ComfyUI `custom_nodes` directory and install the dependencies:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-LatentSyncWrapper.git
cd ComfyUI-LatentSyncWrapper
pip install -r requirements.txt
```
Contents of `requirements.txt`:

```
diffusers
transformers
huggingface-hub
omegaconf
einops
opencv-python
mediapipe>=0.10.8
face-alignment
decord
ffmpeg-python
safetensors
soundfile
```
The models can be obtained in two ways:
The node will attempt to download the required model files from HuggingFace automatically on first use (Option 1). If the automatic download fails, download them manually (Option 2):

- Visit the HuggingFace repo: https://huggingface.co/chunyu-li/LatentSync
- Download these files:
```
latentsync_unet.pt
whisper/tiny.pt
```
- Place them in the following structure:
```
ComfyUI/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints/
├── latentsync_unet.pt
└── whisper/
    └── tiny.pt
```
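A small script can confirm the manual layout is correct before launching ComfyUI. This is a hypothetical sketch; `CHECKPOINT_DIR` assumes you run it from the directory containing `ComfyUI`:

```python
from pathlib import Path

# Assumed location of the node's checkpoints folder (adjust to your install).
CHECKPOINT_DIR = Path("ComfyUI/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints")
REQUIRED_FILES = ["latentsync_unet.pt", "whisper/tiny.pt"]

def missing_checkpoints(base: Path = CHECKPOINT_DIR) -> list[str]:
    """Return the required model files that are not yet in place."""
    return [f for f in REQUIRED_FILES if not (base / f).is_file()]

missing = missing_checkpoints()
if missing:
    print("Download these from https://huggingface.co/chunyu-li/LatentSync :", missing)
else:
    print("All model files present")
```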
- Select an input video file
- Load an audio file using the ComfyUI audio loader
- (Optional) Set a seed value for reproducible results
- Connect to the LatentSync node
- Run the workflow
The processed video will be saved in ComfyUI's output directory.
- `video_path`: Path to the input video file
- `audio`: Audio input from the AceNodes audio loader
- `seed`: Random seed for reproducible results (default: 1247)
- Works best with clear, frontal face videos
- Currently does not support anime/cartoon faces
- Video should be at 25 FPS (will be automatically converted)
- Face should be visible throughout the video
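If you prefer to convert a clip to 25 FPS yourself rather than rely on the node's automatic conversion, the FFmpeg command can be built as below. This is a hedged sketch using the ffmpeg CLI (not the node's internal code); the file paths are placeholders:

```python
import subprocess

def fps_conversion_cmd(src: str, dst: str, fps: int = 25) -> list[str]:
    """Build an ffmpeg command that re-times video to `fps`, copying audio."""
    return ["ffmpeg", "-i", src, "-filter:v", f"fps={fps}", "-c:a", "copy", dst]

cmd = fps_conversion_cmd("input.mp4", "input_25fps.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually convert
print(" ".join(cmd))
```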
Video Length Adjuster: a complementary node that helps manage video length and synchronization with audio.
- Displays video and audio duration information
- Three modes of operation:
  - `normal`: Passes video frames through with added padding to prevent frame loss
  - `pingpong`: Creates a forward-backward loop of the video sequence
  - `loop_to_audio`: Extends the video by repeating frames to match the audio duration
- Place the Video Length Adjuster between your video input and the LatentSync node
- Connect audio to both the Video Length Adjuster and Video Combine nodes
- Select the desired mode based on your needs:
  - Use `normal` for standard lip-sync
  - Use `pingpong` for back-and-forth animation
  - Use `loop_to_audio` to match longer audio durations
- Load Video (Upload) → Video frames output
- Load Audio → Audio output
- Connect both to Video Length Adjuster
- Video Length Adjuster → LatentSync Node
- LatentSync Node + Original Audio → Video Combine
If you encounter mediapipe installation errors:
- Ensure you're using Python 3.8-3.11 (check with `python --version`)
- If you're on Python 3.12, downgrade to a compatible version
- Try installing mediapipe separately first (quote the specifier so the shell doesn't treat `>` as redirection):

```bash
pip install "mediapipe>=0.10.8"
```
This is an unofficial implementation based on:
- LatentSync by ByteDance Research
- ComfyUI
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.