Disclaimer: This is not an official Google product.
This directory contains code for generating the data and model described in "SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders".
For questions or issues, contact [email protected].
This package depends on TensorFlow and google_research/rouge. See the included run.sh for how to install the dependencies and run a unit test inside a virtualenv.
Request (free) and download the raw data for the ROCStories corpora into a directory pointed to by the environment variable ROCSTORIES_RAW.
This directory should contain the following files:
- "ROCStories_winter2017 - ROCStories_winter2017.csv"
- "ROCStories__spring2016 - ROCStories_spring2016.csv"
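Before running the processing step, it can help to confirm that both files listed above are present. The following is a minimal sketch, not part of this package; the function name check_rocstories_raw is hypothetical, and it only tests that the two file names exist in the given directory.

```shell
# Hypothetical helper (not part of this package): report any expected
# ROCStories CSV that is missing from the given directory.
check_rocstories_raw() {
  raw_dir="$1"
  missing=0
  for name in \
    "ROCStories_winter2017 - ROCStories_winter2017.csv" \
    "ROCStories__spring2016 - ROCStories_spring2016.csv"; do
    if [ ! -f "$raw_dir/$name" ]; then
      # Print each missing file to stderr and remember the failure.
      echo "missing: $raw_dir/$name" >&2
      missing=1
    fi
  done
  # Exit status 0 when both files exist, 1 otherwise.
  return "$missing"
}
```

Once ROCSTORIES_RAW is set, calling check_rocstories_raw "$ROCSTORIES_RAW" returns a non-zero status and names the missing file if either CSV is absent.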
Then run the data processing script:
export ROCSTORIES_RAW=absolutepathto/raw_rocstories
export ROCSTORIES_DATA=absolutepathto/processed_rocstoriesdata
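The placeholder paths above suggest that both variables should hold absolute paths. A small check such as the one below can catch a relative path early. This is a sketch under that assumption; is_absolute_path is a hypothetical helper and not part of this package.

```shell
# Hypothetical helper (not part of this package): succeed only if the
# argument is a non-empty path that starts with "/".
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}
```

For example, is_absolute_path "$ROCSTORIES_DATA" || echo "ROCSTORIES_DATA must be an absolute path" >&2 prints a warning when the variable is unset or relative.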
From inside the google_research/google_research directory:
bash summae/generate_data.sh $ROCSTORIES_RAW summae/testdata $ROCSTORIES_DATA
python -m summae.verify_data --data_dir=$ROCSTORIES_DATA
export HYPERS=`pwd`/testdata/hypers.json
bash summae/run_locally.sh train /tmp/testmodel
bash summae/run_locally.sh decode /tmp/testmodel 0
mkdir /tmp/best
cp -r summae/testdata/best /tmp/
bash summae/run_locally.sh decode /tmp/best 358000
Decoded output is saved to /tmp/best/decodes/.
If you use this code in research, please cite:
@article{liu2019summae,
title={SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic
Auto-Encoders},
author={Liu, Peter J. and Chung, Yu-An and Ren, Jie},
journal={arXiv preprint arXiv:1910.00998},
url={http://arxiv.org/abs/1910.00998},
year={2019}
}