PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis

One-Shot Free-View Neural Talking Head Synthesis

Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".

Python 3.6 and PyTorch 1.7 are used.

Updates:

2021.11.05 :

  • Replaced the Jacobian with the rotation matrix (assuming J = R) to avoid estimating the Jacobian.
  • Corrected the rotation matrix.
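
To make the J = R assumption concrete, here is a minimal sketch of how a head-pose rotation matrix can be built from yaw, pitch and roll, and how it moves a canonical keypoint following the paper's transformation x = R·x_c + t + δ. The function names, the axis assignment (pitch about x, yaw about y, roll about z) and the multiplication order are assumptions made for illustration; they are not taken from this repository's code. Plain Python lists are used instead of tensors so the example runs without dependencies.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Build a 3x3 rotation matrix from head-pose angles in degrees.

    Assumed convention (illustrative): R = R_pitch(x) @ R_yaw(y) @ R_roll(z).
    """
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    pitch_mat = [[1.0, 0.0, 0.0],
                 [0.0, math.cos(p), -math.sin(p)],
                 [0.0, math.sin(p), math.cos(p)]]
    yaw_mat = [[math.cos(y), 0.0, math.sin(y)],
               [0.0, 1.0, 0.0],
               [-math.sin(y), 0.0, math.cos(y)]]
    roll_mat = [[math.cos(r), -math.sin(r), 0.0],
                [math.sin(r), math.cos(r), 0.0],
                [0.0, 0.0, 1.0]]
    return matmul3(matmul3(pitch_mat, yaw_mat), roll_mat)


def transform_keypoint(R, x_c, t, delta):
    """Apply x = R @ x_c + t + delta to one canonical 3D keypoint."""
    return [sum(R[i][k] * x_c[k] for k in range(3)) + t[i] + delta[i]
            for i in range(3)]


# With J = R, the same matrix R also serves as the keypoint's Jacobian,
# so no separate Jacobian has to be predicted.
R = rotation_matrix(yaw_deg=20, pitch_deg=0, roll_deg=0)
keypoint = transform_keypoint(R, x_c=[0.1, 0.2, 0.3],
                              t=[0.0, 0.0, 0.0], delta=[0.0, 0.0, 0.0])
```

Because R is orthonormal, using it in place of a learned Jacobian restricts the local motion around each keypoint to a pure rotation, which is the simplification the update note describes.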

2021.11.17 :

  • Improved generator with better performance (models and checkpoints have been released).

Driving | Beta Version | FOMM | New Version:

driving-beta-fomm-new.mp4

Driving | FOMM | Ours: (comparison figure)

Free-View: (free-view synthesis figure)

Train:

python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7

Demo:

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame

Free-view (e.g. yaw=20, pitch=roll=0):

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0
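
When rendering the same source image at several head poses, it can be convenient to generate the command line programmatically. The sketch below only assembles the argument list from the flags shown above and prints it; it does not run anything. The helper name `demo_command` and its parameters are hypothetical, and the paths are the same placeholders used in the commands above.

```python
import shlex


def demo_command(checkpoint, source_image, driving_video,
                 free_view=None, config="config/vox-256.yaml"):
    """Return the demo.py invocation as a list of arguments.

    free_view: optional (yaw, pitch, roll) tuple in degrees. When given,
    --free_view and the three angle flags are appended.
    """
    cmd = ["python", "demo.py",
           "--config", config,
           "--checkpoint", checkpoint,
           "--source_image", source_image,
           "--driving_video", driving_video,
           "--relative", "--adapt_scale", "--find_best_frame"]
    if free_view is not None:
        yaw, pitch, roll = free_view
        cmd += ["--free_view",
                "--yaw", str(yaw), "--pitch", str(pitch), "--roll", str(roll)]
    return cmd


# Print one command per yaw value; pitch and roll stay at 0.
for yaw in (-20, 0, 20):
    cmd = demo_command("path/to/checkpoint", "path/to/source",
                       "path/to/driving", free_view=(yaw, 0, 0))
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run` directly if the commands should be executed rather than printed.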

Note: run python crop-video.py --inp driving_video.mp4 first to get the cropping suggestion, then crop the raw video accordingly.

Pretrained Model:

| Model | Train Set | Baidu Netdisk | MediaFire |
|---|---|---|---|
| Vox-256-Beta | VoxCeleb-v1 | Baidu (PW: c0tc) | MF |
| Vox-256-New | VoxCeleb-v1 | - | MF |
| Vox-512 | VoxCeleb-v2 | soon | soon |

Note:

  1. For now, the Beta Version is not well tuned.
  2. For free-view synthesis, it is recommended that Yaw, Pitch and Roll are within ±45°, ±20° and ±20° respectively.
  3. Face restoration algorithms such as GPEN can be used for post-processing to significantly improve the resolution.
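
The recommended free-view ranges in note 2 can be checked before launching a render. The following is a small sketch of such a check; the names `RECOMMENDED_LIMITS`, `out_of_range` and `clamp_angles` are hypothetical helpers and are not part of this repository.

```python
# Recommended absolute limits in degrees, taken from note 2 above.
RECOMMENDED_LIMITS = {"yaw": 45, "pitch": 20, "roll": 20}


def out_of_range(yaw, pitch, roll):
    """Return the names of the angles that exceed the recommended range."""
    angles = {"yaw": yaw, "pitch": pitch, "roll": roll}
    return [name for name, value in angles.items()
            if abs(value) > RECOMMENDED_LIMITS[name]]


def clamp_angles(yaw, pitch, roll):
    """Clamp each angle into its recommended range and return a tuple."""
    def clamp(value, limit):
        return max(-limit, min(limit, value))
    return (clamp(yaw, RECOMMENDED_LIMITS["yaw"]),
            clamp(pitch, RECOMMENDED_LIMITS["pitch"]),
            clamp(roll, RECOMMENDED_LIMITS["roll"]))


requested = (60, -25, 10)
problems = out_of_range(*requested)
if problems:
    print("Outside the recommended range:", ", ".join(problems))
    print("Clamped to:", clamp_angles(*requested))
```

Clamping is only one option; warning and keeping the requested values may be preferable when exploring how the model behaves at extreme poses.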

Acknowledgement:

Thanks to NV, AliaksandrSiarohin and DeepHeadPose.
