Clustering Generative Adversarial Networks for Story Visualization

PyTorch implementation of Clustering Generative Adversarial Networks for Story Visualization. The goal is to generate a sequence of images that narrates each sentence of a multi-sentence story, with global consistency across dynamic scenes and characters.

Overview

Clustering Generative Adversarial Networks for Story Visualization.
Bowen Li, Philip H. S. Torr, Thomas Lukasiewicz.
University of Oxford, TU Wien
ACM MM 2022

Data

Download the Pororo dataset and extract the folder to data/pororo.
Download the Abstract Scenes dataset and extract the folder to data/abstract.

Training

All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.

Text Encoder Pretraining

Please refer to ControlGAN for more details about pretraining the text encoder. The pretraining is based on DAMSM, which maximizes the cosine similarity between the matched text and image pairs provided by the corresponding dataset.
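The matching objective can be illustrated with a small, dependency-free sketch. This is not the repository's DAMSM code: it is a simplified, one-directional, sentence-level variant, and the function names and the smoothing factor gamma are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def matching_loss(text_embs, image_embs, gamma=10.0):
    """Cross-entropy over scaled cosine similarities (text-to-image only).

    Row i of text_embs is assumed to describe row i of image_embs.
    gamma is a hypothetical smoothing factor, not a value from the paper.
    """
    n = len(text_embs)
    total = 0.0
    for i in range(n):
        # Similarity of sentence i to every image in the batch.
        logits = [gamma * cosine_similarity(text_embs[i], image_embs[j])
                  for j in range(n)]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the matched pair.
        total += log_sum - logits[i]
    return total / n


text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(matching_loss(text, aligned) < matching_loss(text, shuffled))
```

Minimizing this loss pushes the similarity of each matched pair above that of the mismatched pairs in the batch, which is the behaviour the pretraining relies on.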

Our Model

Train the model on the Pororo dataset:

python main_pororo.py --cfg cfg/pororo.yml

Train the model on the Abstract Scenes dataset:

python main_abstract.py --cfg cfg/abstract.yml

The *.yml files contain the configuration for training and testing. If you store the datasets elsewhere, modify DATA_DIR to point to their location.
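If you prefer to script the change rather than edit the file by hand, a minimal sketch follows. It assumes DATA_DIR appears as a plain `DATA_DIR: value` line in the .yml file, which has not been checked against the shipped configs; the helper name set_data_dir and the example paths are hypothetical.

```python
import re


def set_data_dir(cfg_text, new_dir):
    """Return cfg_text with the value of the DATA_DIR key replaced by new_dir.

    Assumes a line of the form "DATA_DIR: <value>"; raises KeyError otherwise.
    """
    pattern = re.compile(r"^([ \t]*DATA_DIR[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash handling in new_dir.
    updated, count = pattern.subn(lambda m: f"{m.group(1)}'{new_dir}'", cfg_text)
    if count == 0:
        raise KeyError("DATA_DIR not found in configuration text")
    return updated


# Hypothetical configuration content, for illustration only.
example_cfg = "CONFIG_NAME: 'example'\nDATA_DIR: 'data/pororo'\n"
print(set_data_dir(example_cfg, "/mnt/datasets/pororo"))
```

To apply it to a real file, read the text of cfg/pororo.yml, pass it through set_data_dir, and write the result back.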

Note that we evaluate our approach at a resolution of 64 × 64 on Pororo and 256 × 256 on Abstract Scenes, as Abstract Scenes provides larger ground-truth images. To work on images at 256 × 256, we repeat the same upsampling blocks in the generator and downsampling blocks in the discriminator.
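As a rough guide to how many blocks need repeating, the sketch below counts the extra stages between two resolutions. It assumes each upsampling or downsampling block changes the spatial resolution by a factor of 2, which is common practice but is not stated here; the function name is hypothetical.

```python
def extra_doubling_blocks(base_res, target_res):
    """Count the 2x blocks needed to go from base_res to target_res.

    Assumes every block doubles (generator) or halves (discriminator)
    the spatial resolution.
    """
    if base_res <= 0 or target_res < base_res:
        raise ValueError("target_res must be >= base_res > 0")
    blocks = 0
    res = base_res
    while res < target_res:
        res *= 2
        blocks += 1
    if res != target_res:
        raise ValueError("target_res must be base_res times a power of two")
    return blocks


print(extra_doubling_blocks(64, 256))  # 2
```

Under that assumption, moving from 64 × 64 to 256 × 256 takes two additional upsampling blocks in the generator and two additional downsampling blocks in the discriminator.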

Pretrained Text Encoder

Text Encoder for Pororo. Download and save it to textEncoder/.
Text Encoder for Abstract Scenes. Download and save it to textEncoder/.

Our Pretrained Model

Pororo. Download and save it to models/.

Evaluation

Run the following commands to evaluate our approach on the Pororo and Abstract Scenes test sets. Each command generates images for all stories in the test set and computes both FID and FSD scores:

python main_pororo.py --cfg ./cfg/pororo.yml --eval_fid True

python main_abstract.py --cfg ./cfg/abstract.yml --eval_fid True

FID and FSD results are saved to a .csv file.

Code Structure

cfg: contains *.yml files.
datasets: dataloader.
main_pororo.py: the entry point for training and testing on Pororo.
main_abstract.py: the entry point for training and testing on Abstract Scenes.
trainer.py: creates the networks, harnesses and reports the progress of training.
model.py: defines the architecture.
inference.py: functions for evaluation.
miscc/utils.py: loss functions and additional helper functions.
miscc/config.py: creates the option list.

Citation

If you find this useful for your research, please cite the following.

@inproceedings{li2022clustering,
  title={Clustering generative adversarial networks for story visualization},
  author={Li, Bowen and Torr, Philip HS and Lukasiewicz, Thomas},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={769--778},
  year={2022}
}

Acknowledgements

This code borrows from the Word Visualization, StoryGAN, and ControlGAN repositories. Many thanks.
