Note that this repo is preliminary work that preceded our Wav2Pix paper at ICASSP 2019. You probably want to check that other repo instead, as it is more mature and stable than this one.
Image synthesis has been a trending task for the AI community in recent years. Many works have shown the potential of Generative Adversarial Networks (GANs) for tasks such as text-to-image or audio-to-image synthesis. In particular, recent advances in deep learning on audio have inspired many works that combine visual and auditory information. In this work we propose a face synthesis method that is trained end-to-end using audio and/or language representations as inputs. We used this project as a baseline.
- pytorch
- h5py
- PIL
- numpy
- matplotlib
This implementation currently only supports running on GPUs.
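Since a GPU is required, it can help to confirm that one is visible before launching training. The snippet below is only a sketch: the helper name `cuda_available` is hypothetical and not part of this repo, and it relies solely on the standard `torch.cuda.is_available()` call.

```python
def cuda_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    if cuda_available():
        print("CUDA device found.")
    else:
        print("No CUDA device found; training will likely fail.")
```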
`python runtime.py`
Arguments:
- `type`: GAN architecture to use (`gan` | `wgan` | `vanilla_gan` | `vanilla_wgan`). default = `gan`. Vanilla means not conditional.
- `dataset`: Dataset to use (`birds` | `flowers`). default = `flowers`
- `split`: An integer indicating which split to use (`0`: train | `1`: valid | `2`: test). default = `0`
- `lr`: The learning rate. default = `0.0002`
- `diter`: Only for WGAN, the number of discriminator iterations for each generator iteration. default = `5`
- `vis_screen`: The visdom env name for visualization. default = `gan`
- `save_path`: Path for saving the models.
- `l1_coef`: L1 loss coefficient in the generator loss function for `gan` and `vanilla_gan`. default = `50`
- `l2_coef`: Feature matching coefficient in the generator loss function for `gan` and `vanilla_gan`. default = `100`
- `pre_trained_disc`: Path to a pre-trained discriminator model used for initializing training.
- `pre_trained_gen`: Path to a pre-trained generator model used for initializing training.
- `batch_size`: Batch size. default = `64`
- `num_workers`: Number of dataloader workers used for fetching data. default = `8`
- `epochs`: Number of training epochs. default = `200`
- `cls`: Boolean flag indicating whether to train with the cls algorithm or not. default = `False`
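The arguments listed above can be mirrored with a small `argparse` parser. This is a sketch for illustration, not the actual contents of `runtime.py`. It makes several assumptions: the `--name` flag syntax, `cls` as a `store_true` switch, and `None` for the options whose defaults are not documented. The helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    p = argparse.ArgumentParser(description="Sketch of the documented training options")
    p.add_argument("--type", default="gan",
                   choices=["gan", "wgan", "vanilla_gan", "vanilla_wgan"])
    p.add_argument("--dataset", default="flowers", choices=["birds", "flowers"])
    p.add_argument("--split", type=int, default=0, choices=[0, 1, 2])  # 0 train, 1 valid, 2 test
    p.add_argument("--lr", type=float, default=0.0002)
    p.add_argument("--diter", type=int, default=5)  # only relevant for WGAN
    p.add_argument("--vis_screen", default="gan")
    p.add_argument("--save_path", default=None)  # default not documented; None is an assumption
    p.add_argument("--l1_coef", type=float, default=50)
    p.add_argument("--l2_coef", type=float, default=100)
    p.add_argument("--pre_trained_disc", default=None)  # default not documented
    p.add_argument("--pre_trained_gen", default=None)   # default not documented
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--num_workers", type=int, default=8)
    p.add_argument("--epochs", type=int, default=200)
    p.add_argument("--cls", action="store_true", default=False)  # assumed to be a switch
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Under these assumptions, calling `build_parser().parse_args(["--type", "wgan", "--diter", "3"])` would select the conditional WGAN with three discriminator iterations per generator iteration and leave every other option at its documented default.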
[1] Generative Adversarial Text-to-Image Synthesis https://arxiv.org/abs/1605.05396
[2] Improved Techniques for Training GANs https://arxiv.org/abs/1606.03498
[3] Wasserstein GAN https://arxiv.org/abs/1701.07875
[4] Improved Training of Wasserstein GANs https://arxiv.org/pdf/1704.00028.pdf