DCGAN on 256x256 pictures
(*) This repo is a modification of manicman1999/GAN256
(*) Full credit for the basic model structure design goes to manicman1999/GAN256
This GAN is based on GAN256 by Matthew Mann, whose implementation allows input and output of 256x256 pictures. GANs trained at a personal scale normally only work well at much lower resolutions (32x32 or 64x64), and may not transfer to datasets other than MNIST.
A couple of modifications were made to the model architecture for better results. With reference to GAN articles and research (https://medium.com/@utk.is.here/keep-calm-and-train-a-gan-pitfalls-and-tips-on-training-generative-adversarial-networks-edd529764aa9, https://arxiv.org/pdf/1606.03498.pdf), the following changes were made (a sketch of these tweaks follows the list):
- Larger kernel and more filters: the kernel size of layers 2 and 3 in the generator was increased by 1. Larger kernels cover more area, so they should capture more information.
- Flipped labels (Generated=True, Real=False): helps with gradient flow.
- Instance noise added to stabilize training: a Gaussian noise layer was added at the top of the Discriminator to stop it from becoming too good too early, which would prevent the Generator from learning and the model from converging.
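Below is a minimal Keras sketch of how these three tweaks fit together (not the repo's exact code; the layer sizes, filter counts, 0.1 noise standard deviation, and variable names are illustrative assumptions):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Reshape, Conv2D, Conv2DTranspose,
                                     GaussianNoise, LeakyReLU, Flatten)

def build_generator(latent_dim=128):
    g = Sequential(name="generator")
    g.add(Dense(16 * 16 * 128, activation="relu", input_dim=latent_dim))
    g.add(Reshape((16, 16, 128)))
    # Tweak 1: kernel size of layers 2 and 3 bumped up by 1 (4 -> 5) so each
    # filter sees a larger area of the feature map.
    g.add(Conv2DTranspose(128, kernel_size=4, strides=2, padding="same", activation="relu"))  # 32x32
    g.add(Conv2DTranspose(128, kernel_size=5, strides=2, padding="same", activation="relu"))  # 64x64, larger kernel
    g.add(Conv2DTranspose(64,  kernel_size=5, strides=2, padding="same", activation="relu"))  # 128x128, larger kernel
    g.add(Conv2DTranspose(3,   kernel_size=4, strides=2, padding="same", activation="sigmoid"))  # 256x256x3
    return g

def build_discriminator():
    d = Sequential(name="discriminator")
    # Tweak 3: instance noise -- Gaussian noise at the very top of the Discriminator
    # keeps it from separating real and fake too easily early in training.
    d.add(GaussianNoise(0.1, input_shape=(256, 256, 3)))
    d.add(Conv2D(64, kernel_size=4, strides=2, padding="same"))
    d.add(LeakyReLU(0.2))
    d.add(Conv2D(128, kernel_size=4, strides=2, padding="same"))
    d.add(LeakyReLU(0.2))
    d.add(Flatten())
    d.add(Dense(1, activation="sigmoid"))
    return d

generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Tweak 2: flipped labels -- real images are labelled 0 ("False") and generated
# images 1 ("True") when training the Discriminator.
batch_size = 16
noise = np.random.normal(size=(batch_size, 128))
real_batch = np.random.rand(batch_size, 256, 256, 3)                 # stand-in for a batch of real images
fake_batch = generator.predict(noise)
discriminator.train_on_batch(real_batch, np.zeros((batch_size, 1)))  # real -> 0
discriminator.train_on_batch(fake_batch, np.ones((batch_size, 1)))   # generated -> 1
# (In the Generator's own training step the target is then 0, i.e. "real".)
```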
For women's blouses, the images were scraped from the Amazon website using Scrapy, with filters applied to keep only 4- or 5-star products.
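A skeleton of the kind of Scrapy spider used for this (the start URL, CSS selectors, and rating parsing below are placeholders, not Amazon's actual page structure):

```python
import scrapy

class BlouseImageSpider(scrapy.Spider):
    # Skeleton only: the start URL and selectors are placeholders, not Amazon's real markup.
    name = "blouse_images"
    start_urls = ["https://www.amazon.com/s?k=blouse"]      # placeholder search URL

    def parse(self, response):
        for product in response.css("div.product"):          # placeholder selector
            rating_text = product.css("span.rating::text").get(default="0")
            try:
                rating = float(rating_text)
            except ValueError:
                continue
            if rating >= 4.0:                                 # keep only 4-/5-star products
                yield {"image_url": product.css("img::attr(src)").get()}
```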
For heels, the images were taken from the UT Zappos50K dataset. They are all catalog images collected from Zappos.com.
All images were resized and background-filled to 256x256 resolution before being fed into the model for training.
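One way to do this resizing and background-filling with Pillow (a sketch; the white background colour and centred padding are assumptions about the exact preprocessing):

```python
from PIL import Image

def to_square_256(path, fill=(255, 255, 255)):
    """Resize an image to fit inside 256x256 (keeping aspect ratio) and pad
    the remaining area with a background colour."""
    img = Image.open(path).convert("RGB")
    img.thumbnail((256, 256), Image.LANCZOS)        # fits within 256x256, keeps aspect ratio
    canvas = Image.new("RGB", (256, 256), fill)     # background-filled square canvas
    offset = ((256 - img.width) // 2, (256 - img.height) // 2)
    canvas.paste(img, offset)
    return canvas
```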
Read through the code in the Jupyter Notebook first, and create an images folder to store your image dataset (filenames renamed to five-digit numbers starting from 00000, images resized to 256x256). Then just run it and let the community know what you come up with!
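For example, a small script to build such a folder (the folder names and .jpg output format are assumptions; adapt them to how the notebook loads images):

```python
import os
from PIL import Image

SRC, DST = "raw_images", "images"    # assumed folder names
os.makedirs(DST, exist_ok=True)

files = sorted(f for f in os.listdir(SRC) if f.lower().endswith((".jpg", ".jpeg", ".png")))
for i, fname in enumerate(files):
    img = Image.open(os.path.join(SRC, fname)).convert("RGB")
    img = img.resize((256, 256), Image.LANCZOS)     # or use the background-fill helper shown earlier
    img.save(os.path.join(DST, f"{i:05d}.jpg"))     # 00000.jpg, 00001.jpg, ...
```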
Using 10,000 images of women's blouses with 4/5-star ratings scraped from Amazon:
Using 5,700 images of heels from the UT Zappos50K dataset:
Using a DCGAN, we were able to take in 256x256 images and train the Generative Adversarial Network to generate random novel images. Depending on the input, generating such images can be useful for inspiring designers to create new designs. As in the case of the Amazon clothes, we can restrict the input images to popular items so that the generated images mainly incorporate popular colours and designs. Further research and tweaking are needed to tailor different GANs to different purposes and outputs: as the results show, the current DCGAN architecture works better for shoes than for clothes, since it is easier to generate more abstract designs than fabric-like images.
https://github.com/manicman1999/GAN256
Generative Adversarial Nets [arXiv]
Improved Techniques for Training GANs [arXiv]
Improved Training of Wasserstein GANs [arXiv]