Baris Gecer 1,2, Stylianos Ploumpis 1,2, Irene Kotsia 3, & Stefanos Zafeiriou 1,2
1 Imperial College London
2 FaceSoft.io
3 University of Middlesex
- 📌 GANFit now has improved versions: a faster one (FastGANFit) and a more stable one (GANFit++). See our TPAMI paper for details.
- 📌 Evaluation code for GANFit and FastGANFit on the MICC Florence dataset is now available (see below for instructions).
- 📌 Unfortunately, the code of these studies has been commercialized, so we cannot share it publicly. However, if you send us some images, we can send back our results.
- 📌 We have released another texture-shape model similar to GANFit; you can apply for it here: TBGAN
In the past few years a lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction of the state-of-the-art methods is still not capable of modelling textures in high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Models (3DMMs) fitting approaches making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.
Detailed overview of the proposed approach. A 3D face reconstruction is rendered by a differentiable renderer (shown in purple). Cost functions are mainly formulated by means of identity features on a pretrained face recognition network (shown in gray), and they are optimized by flowing the error all the way back to the latent parameters (p_s, p_e, p_t, p_c, p_i, shown in green) with gradient descent optimization. The end-to-end differentiable architecture enables us to use computationally cheap and reliable first-order derivatives for optimization, thus making it possible to employ deep networks as a generator (i.e., statistical model) or as a cost function.

Overview of the approach with the regression network. The network is connected end-to-end with the differentiable renderer and the loss functions of GANFit. It benefits from the activations of all layers of a pretrained face recognition network and the detections of an hourglass landmark detector. The network is trained similarly to the GANFit optimization: 1) alignment, 2) full objective. The only difference is that the regression network is now optimized instead of the trainable latent parameters of GANFit.
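The idea of flowing the error back to the latent parameters with first-order gradient descent can be illustrated with a toy example. This is only a minimal sketch and not the GANFit implementation: a small linear model stands in for the GAN texture generator and the renderer, a squared error stands in for the identity-feature cost, and the names `generator`, `loss_and_grad`, and `fit` are hypothetical.

```python
# Toy illustration (assumption: a linear "generator" replaces the GAN and
# renderer, and a squared error replaces the identity-feature cost).

def generator(p, basis, mean):
    """Map latent parameters p to an output vector: mean + sum_k p[k] * basis[k]."""
    return [m + sum(pk * bk[i] for pk, bk in zip(p, basis))
            for i, m in enumerate(mean)]

def loss_and_grad(p, basis, mean, target):
    """Squared error between generator output and target, with its gradient w.r.t. p."""
    output = generator(p, basis, mean)
    residual = [o - t for o, t in zip(output, target)]
    loss = sum(r * r for r in residual)
    # Chain rule: d(loss)/d(p[k]) = 2 * <residual, basis[k]>
    grad = [2.0 * sum(r * b for r, b in zip(residual, bk)) for bk in basis]
    return loss, grad

def fit(basis, mean, target, lr=0.1, steps=200):
    """Plain gradient descent on the latent parameters, starting from zero."""
    p = [0.0] * len(basis)
    for _ in range(steps):
        _, grad = loss_and_grad(p, basis, mean, target)
        p = [pk - lr * gk for pk, gk in zip(p, grad)]
    return p

# Usage: recover the latent parameters that produced a known target.
basis = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
mean = [0.1, 0.2, 0.3]
target = generator([0.5, -1.0], basis, mean)
fitted = fit(basis, mean, target)
print(fitted)
```

In the approach described above, the hand-written gradient is replaced by automatic differentiation through the renderer and the face recognition network, which is what makes deep networks usable as generator or cost function.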
- First, apply for a license and download the MICC Florence dataset here
- Register the ground-truth meshes from the dataset to a common template:
    python micc_registration.py [MICC_path] [Registration_path]
  - The manually annotated landmarks are under 'MICC_landmarks', and this path is used by default
- Estimate 3D reconstructions from the videos in the different settings (GANFit uses 5 random images per video) and save them as .obj files under the same folder structure
- Run the evaluation code:
    python micc_evaluation.py [Registration_path] [Reconstruction_path]
  - The evaluation code first aligns the meshes based on landmarks, so the landmark indices should be given via [--template_lms] (the default template and its corresponding landmarks are provided in this repo)
  - It then runs Rigid-ICP to correct any remaining misalignment without deforming the meshes
  - Finally, errors are calculated as the mean symmetrical point-to-plane distance
- This evaluation scenario is borrowed from [Unsupervised Training for 3D Morphable Model Regression, Genova et al. CVPR 2018] and implemented by us.
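The evaluation steps listed above can be sketched roughly as follows. This is an illustrative NumPy sketch and not the code in micc_evaluation.py: it assumes a Kabsch-style rigid alignment on landmarks, skips the Rigid-ICP refinement, and approximates the point-to-plane distance using the nearest vertex and its normal. The function names are hypothetical, and the repo's exact error definition may differ.

```python
import numpy as np

def rigid_align(source_lms, target_lms):
    """Kabsch alignment: find R, t minimizing ||R @ s_i + t - t_i|| over landmark pairs."""
    src_c = source_lms.mean(axis=0)
    tgt_c = target_lms.mean(axis=0)
    # 3x3 cross-covariance of the centered landmark sets
    H = (source_lms - src_c).T @ (target_lms - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that R is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

def point_to_plane(points, ref_points, ref_normals):
    """Distance from each point to the tangent plane at its nearest reference vertex."""
    diff = points[:, None, :] - ref_points[None, :, :]
    nearest = np.argmin((diff ** 2).sum(axis=-1), axis=1)  # brute-force search
    offsets = points - ref_points[nearest]
    return np.abs(np.einsum("ij,ij->i", offsets, ref_normals[nearest]))

def symmetric_point_to_plane(a, a_normals, b, b_normals):
    """Mean of the two directed point-to-plane errors (a to b and b to a)."""
    d_ab = point_to_plane(a, b, b_normals).mean()
    d_ba = point_to_plane(b, a, a_normals).mean()
    return 0.5 * (d_ab + d_ba)

# Usage: align a reconstruction to the ground truth via landmarks, then score it.
rng = np.random.default_rng(0)
gt_vertices = rng.normal(size=(50, 3))
gt_normals = rng.normal(size=(50, 3))
gt_normals /= np.linalg.norm(gt_normals, axis=1, keepdims=True)
rec_vertices = gt_vertices + np.array([0.3, -0.1, 0.2])  # shifted copy
lm_idx = np.arange(5)  # hypothetical landmark vertex indices
R, t = rigid_align(rec_vertices[lm_idx], gt_vertices[lm_idx])
aligned = rec_vertices @ R.T + t
print(symmetric_point_to_plane(aligned, gt_normals, gt_vertices, gt_normals))
```

The brute-force nearest-neighbor search is quadratic in the number of vertices; for full-resolution meshes a spatial index such as a k-d tree would be the usual replacement.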
If you find this work useful for your research, please cite our papers, GANFit and FastGANFit:
@InProceedings{Gecer_2019_CVPR,
author = {Gecer, Baris and Ploumpis, Stylianos and Kotsia, Irene and Zafeiriou, Stefanos},
title = {GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
@article{gecer2021fast,
title={Fast-GANFIT: Generative Adversarial Network for High Fidelity 3D Face Reconstruction},
author={Gecer, Baris and Ploumpis, Stylianos and Kotsia, Irene and Zafeiriou, Stefanos P},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2021},
publisher={IEEE}
}