
antonckoenig/DiscoGeM

 
 


DiscoGeM

This repository contains the DiscoGeM corpus: A Crowdsourced Corpus of Genre-Mixed Inter-Sentential Implicit Discourse Relations annotated in PDTB3-style.

DiscoGeM is a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech (Europarl), literature, and encyclopedic (Wikipedia) texts. It has been updated with annotations of 100 implicit PDTB items.

Each instance was annotated by 10 crowd workers. The dataset with all annotator-level labels and annotator quality scores is also available.
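Because every item carries labels from 10 crowd workers, a common first step is to collapse them into a label distribution and a majority label. The snippet below is a minimal sketch of that idea, not the corpus's official tooling: the function name `label_distribution` and the example relation labels are hypothetical and are not taken from the released files.

```python
from collections import Counter


def label_distribution(labels):
    """Return (majority_label, distribution) for one item's crowd labels.

    `labels` is a list of relation-sense strings, one per annotator.
    This helper is illustrative; it is not part of the DiscoGeM release.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = sum(counts.values())
    # Relative frequency of each label among the annotators.
    distribution = {label: n / total for label, n in counts.items()}
    # On a tie, most_common() keeps the label that was seen first.
    majority_label = counts.most_common(1)[0][0]
    return majority_label, distribution


# Hypothetical labels from 10 annotators for a single item.
example = ["conjunction"] * 5 + ["result"] * 3 + ["arg2-as-detail"] * 2
majority, dist = label_distribution(example)
print(majority, dist)
```

Keeping the full distribution rather than only the majority label preserves the disagreement between annotators, which is often informative for implicit relations.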

Reference

If you use this resource, please consider citing:

    @inproceedings{scholman2022DiscoGeM,
       title = "DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations",
       author = "Scholman, Merel C. J.  and
       Dong, Tianai and
       Yung, Frances and
       Demberg, Vera",
       booktitle = "Proceedings of the Thirteenth International Conference on Language Resources and Evaluation ({LREC}'22)",
       month = jun,
       year = "2022",
       address = "Marseille, France",
       publisher = "European Language Resources Association (ELRA)"
    }
