NaNoGenMo

My entry for NaNoGenMo. I'm currently investigating various methods of analysing corpus text in order to come up with some kind of engine for generating a few different kinds of sequences. I'm thinking about this in layers, splitting generation into several phases that range from individual sentences up to high-level plot thematics.

I'm looking at hand-building some generators based on rules from various storytelling and roleplaying games such as FATE, Fiasco and Microscope, then combining those with material derived from the corpus text analysis.

None of this is likely to end well.

Corpus

I'm making use of a few hand-picked novels from Project Gutenberg, namely:

I stripped the non-novel text out of these to make processing easier.
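For illustration, here is a rough sketch of how that stripping could be done. It is not necessarily how these files were actually cleaned. It assumes the text contains the usual Project Gutenberg "*** START OF ..." and "*** END OF ..." marker lines, and the function name `strip_gutenberg_boilerplate` is made up for this example.

```python
import re

# Assumption: the file contains the usual Project Gutenberg marker lines, e.g.
#   *** START OF THIS PROJECT GUTENBERG EBOOK <TITLE> ***
#   *** END OF THIS PROJECT GUTENBERG EBOOK <TITLE> ***
# Wording varies a little between files ("THE" vs "THIS"), so the patterns are loose.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return only the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(text)
    body_start = start.end() if start else 0

    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)

    return text[body_start:body_end].strip()
```

This only removes the header and license wrapper. Anything else that counts as non-novel text (prefaces, transcriber notes, tables of contents) would still need removing by hand.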

I'm also using various corpora from the NLTK project, namely gutenberg, abc, reuters, brown and movie_reviews, as well as a Lovecraft corpus found here: https://raw.github.com/jiko/lovecraft_ebooks/master/corpus.txt
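As a minimal sketch of pulling tokens out of the NLTK corpora listed above (not the project's actual loading code): it assumes `nltk` is installed and each corpus has already been fetched with `nltk.download(name)`. The helpers `nltk_words` and `token_counts` are hypothetical names for this example.

```python
from typing import Callable, Dict, Iterable, List

# Corpus names as listed in the text; each is available under nltk.corpus.
NLTK_CORPORA = ["gutenberg", "abc", "reuters", "brown", "movie_reviews"]


def nltk_words(name: str) -> List[str]:
    """Return every token of one NLTK corpus.

    Assumption: nltk is installed and nltk.download(name) has been run.
    """
    import nltk.corpus  # imported lazily so the rest of the sketch works without NLTK

    return list(getattr(nltk.corpus, name).words())


def token_counts(
    names: Iterable[str],
    loader: Callable[[str], List[str]] = nltk_words,
) -> Dict[str, int]:
    """Map each corpus name to its token count, using `loader` to fetch tokens."""
    return {name: len(loader(name)) for name in names}
```

With the corpora downloaded, `token_counts(NLTK_CORPORA)` gives a quick idea of how much text each one contributes. The `loader` parameter is only there so that other sources, such as the Lovecraft text file, can be plugged in the same way.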