This repository contains the DiscoGeM corpus: A Crowdsourced Corpus of Genre-Mixed Inter-Sentential Implicit Discourse Relations, annotated in PDTB3 style.
DiscoGeM is a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech (Europarl), literature, and encyclopedic (Wikipedia) texts. It has been updated with annotations of 100 implicit PDTB items.
Each instance was annotated by 10 crowd workers. We also release a version of the dataset that includes all annotator-level labels and annotator quality scores.
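Because each instance carries several crowd labels, a common first step is to turn them into a label distribution or a single majority label. The following is a minimal sketch of that aggregation, not the procedure used to build the corpus: the function names are hypothetical, the example labels are made up for illustration, and how labels are stored in the released files is not assumed here.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to its share of the annotations for one instance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Made-up annotations from 10 crowd workers for a single instance
# (illustrative values only, not taken from the corpus).
example = ["conjunction"] * 6 + ["result"] * 3 + ["synchronous"]

dist = label_distribution(example)
top = majority_label(example)
print(top, dist)
```

Keeping the full distribution rather than only the majority label preserves the disagreement between annotators, which may itself be informative for implicit relations.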
If you use this resource, please consider citing:
@inproceedings{scholman2022DiscoGeM,
title = "DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations",
author = "Scholman, Merel C. J. and
Dong, Tianai and
Yung, Frances and
Demberg, Vera",
booktitle = "Proceedings of the Thirteenth International Conference on Language Resources and Evaluation ({LREC}'22)",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association (ELRA)"
}