Speech2Face

Important note

Note that this repo is preliminary work that preceded our Wav2Pix paper (ICASSP 2019). You probably want to check that other repo instead, as it is more mature and stable than this one.

Introduction

Image synthesis has been a trending task for the AI community in recent years. Many works have shown the potential of Generative Adversarial Networks (GANs) for tasks such as text-to-image or audio-to-image synthesis. In particular, recent advances in deep learning on audio have inspired many works that combine visual and auditory information. In this work we propose a face synthesis method that is trained end-to-end using audio and/or language representations as inputs. We used this project as a baseline.

Requirements

pytorch
h5py
PIL
numpy
matplotlib

This implementation currently only supports running on GPUs.
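
Before training, it can help to confirm that the listed packages are importable. The snippet below is a minimal sketch and not part of this repo: the helper name `missing_requirements` is hypothetical, and the import names (`torch` for pytorch, `PIL` for the imaging library) are the usual ones for these packages. When PyTorch is present, it also reports whether a CUDA device is visible.

```python
import importlib.util

# Requirement name as listed above -> module name used at import time.
# This mapping is an assumption for illustration, not taken from the repo.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "h5py": "h5py",
    "PIL": "PIL",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the requirement names whose module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing requirements:", ", ".join(missing))
    else:
        import torch  # safe to import: find_spec located it above

        print("All requirements found.")
        print("CUDA available:", torch.cuda.is_available())
```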

Usage

Training

`python runtime.py`

Arguments:

type : GAN architecture to use (gan | wgan | vanilla_gan | vanilla_wgan). default = gan. "Vanilla" means not conditional.
dataset : Dataset to use (birds | flowers). default = flowers
split : An integer indicating which split to use (0: train | 1: valid | 2: test). default = 0
lr : The learning rate. default = 0.0002
diter : Only for WGAN. Number of discriminator iterations for each iteration of the generator. default = 5
vis_screen : The visdom env name for visualization. default = gan
save_path : Path for saving the models.
l1_coef : L1 loss coefficient in the generator loss function for gan and vanilla_gan. default = 50
l2_coef : Feature matching coefficient in the generator loss function for gan and vanilla_gan. default = 100
pre_trained_disc : Path to a pre-trained discriminator model used for initializing training.
pre_trained_gen : Path to a pre-trained generator model used for initializing training.
batch_size : Batch size. default = 64
num_workers : Number of dataloader workers used for fetching data. default = 8
epochs : Number of training epochs. default = 200
cls : Boolean flag indicating whether to train with the cls algorithm. default = False
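
The arguments above can be summarized as an `argparse` parser. This is an illustrative sketch only: the `--name` flag spelling, the `choices` restrictions, the numeric types, and the `None` defaults for the path arguments are assumptions, and the actual `runtime.py` may define them differently. The default values are the ones documented above.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the training arguments")
    parser.add_argument("--type", default="gan",
                        choices=["gan", "wgan", "vanilla_gan", "vanilla_wgan"])
    parser.add_argument("--dataset", default="flowers", choices=["birds", "flowers"])
    # 0: train, 1: valid, 2: test
    parser.add_argument("--split", type=int, default=0, choices=[0, 1, 2])
    parser.add_argument("--lr", type=float, default=0.0002)
    # Only used by the WGAN variants.
    parser.add_argument("--diter", type=int, default=5)
    parser.add_argument("--vis_screen", default="gan")
    # No default is documented for the path arguments; None is an assumption.
    parser.add_argument("--save_path", default=None)
    parser.add_argument("--l1_coef", type=float, default=50)
    parser.add_argument("--l2_coef", type=float, default=100)
    parser.add_argument("--pre_trained_disc", default=None)
    parser.add_argument("--pre_trained_gen", default=None)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--num_workers", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=200)
    # Boolean flag: absent means False, present means True.
    parser.add_argument("--cls", action="store_true", default=False)
    return parser


if __name__ == "__main__":
    # Example: parse a WGAN run with the cls flag enabled.
    args = build_parser().parse_args(["--type", "wgan", "--diter", "5", "--cls"])
    print(args.type, args.diter, args.cls, args.lr)
```

Under these assumptions, a WGAN run would be started with something like `python runtime.py --type wgan --diter 5 --cls`; check `runtime.py` for the exact flag names before relying on this.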

References

[1] Generative Adversarial Text-to-Image Synthesis https://arxiv.org/abs/1605.05396

[2] Improved Techniques for Training GANs https://arxiv.org/abs/1606.03498

[3] Wasserstein GAN https://arxiv.org/abs/1701.07875

[4] Improved Training of Wasserstein GANs https://arxiv.org/pdf/1704.00028.pdf

