PyTorch implementation of Clustering Generative Adversarial Networks for Story Visualization. The goal is to generate a sequence of images, one narrating each sentence of a multi-sentence story, with global consistency across dynamic scenes and characters.
Clustering Generative Adversarial Networks for Story Visualization.
Bowen Li, Philip H. S. Torr, Thomas Lukasiewicz.
University of Oxford, TU Wien
ACM MM 2022
- Download the Pororo dataset and extract the folder to data/pororo.
- Download the Abstract Scenes dataset and extract the folder to data/abstract.
All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.
- Please refer to ControlGAN for more details about pretraining the text encoder. The text encoder pretraining is based on DAMSM, which maximizes the cosine similarity between text and image pairs provided by the corresponding dataset.
- Train the model on the Pororo dataset:
python main_pororo.py --cfg cfg/pororo.yml
- Train the model on the Abstract Scenes dataset:
python main_abstract.py --cfg cfg/abstract.yml
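For the text encoder pretraining mentioned above, the idea of maximizing cosine similarity between matched text and image pairs can be illustrated with a simplified, sentence-level sketch. This is not the code used in this repository or in ControlGAN: the function names, the smoothing factor `gamma`, and the toy embeddings are all assumptions made for illustration, and the word-level attention part of DAMSM is omitted.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def matching_loss(text_embs, image_embs, gamma=10.0):
    """Simplified sentence-level matching loss (hypothetical helper).

    Pair i (text_embs[i], image_embs[i]) is treated as the positive;
    every other pairing in the batch is a negative. gamma is an assumed
    smoothing factor that scales the similarities before the softmax.
    """
    n = len(text_embs)
    sims = [[gamma * cosine_similarity(t, v) for v in image_embs]
            for t in text_embs]
    loss = 0.0
    for i in range(n):
        # Text-to-image direction: softmax over row i.
        loss += log_sum_exp(sims[i]) - sims[i][i]
        # Image-to-text direction: softmax over column i.
        column = [sims[j][i] for j in range(n)]
        loss += log_sum_exp(column) - column[i]
    return loss / n


# Toy embeddings: aligned pairs should give a lower loss than swapped pairs.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
swapped_images = [[0.0, 1.0], [1.0, 0.0]]
print(matching_loss(texts, aligned_images) < matching_loss(texts, swapped_images))
```

Minimizing this loss pushes the similarity of matched pairs above that of mismatched pairs within a batch, which is the behaviour the pretraining description refers to.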
*.yml files include the configuration for training and testing. If you store the datasets somewhere else, please modify DATA_DIR to point to their location.
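As a minimal sketch of changing DATA_DIR without editing the file by hand, the helper below rewrites the value in the text of a config file. It assumes DATA_DIR appears as a top-level `DATA_DIR: ...` line, which this README does not confirm; `set_data_dir`, the other keys in the example text, and the target path are hypothetical.

```python
import re


def set_data_dir(cfg_text, new_dir):
    """Return cfg_text with the value of a top-level DATA_DIR key replaced.

    Assumes the key sits on its own line as `DATA_DIR: <value>`.
    """
    pattern = re.compile(r"^(DATA_DIR[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    if not pattern.search(cfg_text):
        raise KeyError("no top-level DATA_DIR key found")
    # A function replacement avoids backslash escapes in new_dir being
    # interpreted by re.sub.
    return pattern.sub(lambda m: m.group(1) + "'" + new_dir + "'",
                       cfg_text, count=1)


# Hypothetical config text; only DATA_DIR is taken from this README.
example = "CONFIG_NAME: 'demo'\nDATA_DIR: 'data/pororo'\nWORKERS: 4\n"
updated = set_data_dir(example, "/mnt/datasets/pororo")
print(updated)
```

To apply it to a real file, read the file's text, pass it through `set_data_dir`, and write the result back; check the actual key layout in cfg/pororo.yml or cfg/abstract.yml first.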
Note that we evaluate our approach at a resolution of 64 × 64 on Pororo and 256 × 256 on Abstract Scenes, as Abstract Scenes provides larger-scale ground-truth images. To work on images at a resolution of 256 × 256, we repeat the same upsampling blocks in the generator and downsampling blocks in the discriminator.
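The arithmetic behind repeating blocks can be sketched as follows, assuming each upsampling or downsampling block changes the spatial resolution by a factor of 2. That factor is an assumption, not something stated here, and `extra_doubling_blocks` is a hypothetical helper rather than part of model.py.

```python
def extra_doubling_blocks(base_res, target_res):
    """Count the 2x blocks needed to go from base_res to target_res.

    Assumes every block doubles (generator) or halves (discriminator)
    the spatial resolution.
    """
    if base_res <= 0 or target_res < base_res:
        raise ValueError("resolutions must satisfy 0 < base_res <= target_res")
    blocks = 0
    res = base_res
    while res < target_res:
        res *= 2
        blocks += 1
    if res != target_res:
        raise ValueError("target_res is not base_res times a power of two")
    return blocks


# Going from the Pororo setting (64) to the Abstract Scenes setting (256).
print(extra_doubling_blocks(64, 256))
```

Under that assumption, moving from 64 × 64 to 256 × 256 takes two additional blocks on each side.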
- Text Encoder for Pororo. Download and save it to textEncoder/.
- Text Encoder for Abstract Scenes. Download and save it to textEncoder/.
- Pororo. Download and save it to models/.
- Run the following commands to evaluate our approach on the Pororo and Abstract Scenes test datasets, including image generation for all stories in the test dataset and calculation of both FID and FSD scores:
python main_pororo.py --cfg ./cfg/pororo.yml --eval_fid True
python main_abstract.py --cfg ./cfg/abstract.yml --eval_fid True
FID and FSD results will be saved in a .csv file.
- cfg: contains *.yml files.
- datasets: dataloader.
- main_pororo.py: the entry point for training and testing on Pororo.
- main_abstract.py: the entry point for training and testing on Abstract Scenes.
- trainer.py: creates the networks, harnesses and reports the progress of training.
- model.py: defines the architecture.
- inference.py: functions for evaluation.
- miscc/utils.py: loss functions and additional helper functions.
- miscc/config.py: creates the option list.
If you find this useful for your research, please cite the following.
@inproceedings{li2022clustering,
title={Clustering generative adversarial networks for story visualization},
author={Li, Bowen and Torr, Philip HS and Lukasiewicz, Thomas},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={769--778},
year={2022}
}
This code borrows from Word Visualization, StoryGAN, and ControlGAN repositories. Many thanks.