Stars
A Python library and command-line tool that quickly and automatically generates BibTeX data from the PDF file of a scientific publication.
The NORN Poems corpus consists of 3,440 poems published in the 1890s in Norwegian and Danish and encoded in TEI.
This is a selected subset of the Gutenberg corpus.
A repository of datasets paired with rich documentation, data essays, and teaching resources
Tracing variation in poetic metres via local sequence alignment
An incomplete, unofficial documentation for speedrun.com's new API as used by the web interface
idiolect: An R package for forensic authorship analysis
A simple text reuse detection CLI tool.
Corpus of Hungarian poems in TEI XML with machine annotation
Python package russtress marks stress in Russian text
Lexical Simplification with Pretrained Encoders
Simple collocation-driven rhyme recognition. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Spanish poetry
Thesis for the RMA in Dutch Literature and Culture at Utrecht University (July 2019)
Collection of songs from the Dutch Song Database of the Meertens Institute
A great intro dataset for data exploration & visualization (alternative to iris).