Code for the paper Generating Steganographic Text with LSTMs. The LSTM is based on the Word Language Model example from PyTorch (http://pytorch.org/).
- Latest NVIDIA driver
- CUDA 8 Toolkit
- cuDNN
- PyTorch
A small sample of Penn Treebank and Tweets is included. pre-process.py tokenizes punctuation in the raw text.
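A minimal sketch of punctuation tokenization, assuming a simple regex-based approach (the actual pre-process.py may differ; `tokenize_punctuation` is a hypothetical name):

```python
import re

def tokenize_punctuation(line):
    # Separate common punctuation marks from words with spaces,
    # then split on whitespace to get a token list.
    return re.sub(r"([.,!?;:()\"])", r" \1 ", line).split()
```

For example, `tokenize_punctuation("Hello, world!")` yields `["Hello", ",", "world", "!"]`.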
python main.py --cuda --nhid 600 --nlayers 3 --epochs 6 --data './data/tweets' --save './models/twitter-model.pt'
For the full list of arguments, check the PyTorch example README.
This is one of our key original contributions. After training the model, we generate words while restricting the output based on the secret text. generate.py is modified to take the secret text and adjust the output probabilities according to the "bins" described in our paper.
Example generation with 4 bins:
python generate.py --data './data/tweets' --checkpoint './models/twitter-model.pt' --cuda --outf 'outputs/stegotweets.txt' --words 1000 --temperature 0.8 --bins 4 --common_bin_factor 4 --num_tokens 20
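The core idea of the bin restriction can be sketched as follows, assuming the vocabulary is partitioned into bins and the secret bits select which bin the next word must come from (function and variable names here are illustrative, not the actual generate.py code):

```python
import torch

def restrict_to_bin(logits, secret_bits, bin_assignment):
    """Mask the LSTM's output so only words in the selected bin can be sampled.

    logits: 1-D tensor of vocabulary logits from the LSTM.
    secret_bits: bit string for this step, e.g. '01' selects bin 1 of 4.
    bin_assignment: 1-D tensor mapping each vocab index to its bin id.
    """
    target_bin = int(secret_bits, 2)              # bits -> bin index
    allowed = bin_assignment == target_bin        # words in the chosen bin
    # Words outside the bin get probability ~0 after softmax.
    return logits.masked_fill(~allowed, float('-inf'))
```

Sampling from the softmax of the masked logits then always emits a word from the bin encoding the secret bits, which the receiver can decode by checking which bin each generated word belongs to.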
See the arguments in generate.py or refer to the PyTorch example README.
We proposed and implemented an alternate measure of perplexity in Section 3.2 of our paper. The code is in evaluate.py.
Example evaluation: python evaluate.py --cuda --data './data/tweets' --model './models/twitter-model.pt' --bins 4
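For reference, standard perplexity is the exponential of the mean cross-entropy over the sequence; the modified measure in Section 3.2 builds on this baseline. A minimal sketch (illustrative only, not the evaluate.py implementation):

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    """Standard perplexity = exp(mean per-token cross-entropy).

    logits: (seq_len, vocab_size) unnormalized model outputs.
    targets: (seq_len,) true next-word indices.
    """
    nll = F.cross_entropy(logits, targets)  # mean negative log-likelihood
    return math.exp(nll.item())
```

A model that is uniform over a vocabulary of size V scores a perplexity of exactly V, which is a useful sanity check.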
If there are any questions or concerns about the code or paper, please contact Tina Fang at [email protected]. We would love to hear your feedback!