The corpus contains 759 sentences, 2268 Named Entities, 15213 words and 24839 characters.
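The Named Entities are labelled in the BIO format: B marks the first word of an entity, I marks each following word inside the same entity, and O marks words outside any entity.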
All data are in Construction-NER-corpus.csv, with words in the first column and tags in the second.
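The two-column layout and BIO tagging described above can be read and decoded with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes the CSV has no header row, skips blank rows, and treats any tag starting with B or I as beginning or continuing an entity (so both plain `B`/`I` and typed `B-xxx`/`I-xxx` tags work). The function names `read_tagged_tokens` and `bio_spans` and the sample rows are hypothetical, for illustration only.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_tagged_tokens(rows: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (word, tag) pairs: word in column 1, tag in column 2.

    Assumption: no header row; blank or one-column rows are skipped.
    """
    pairs = []
    for row in csv.reader(rows):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def bio_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return [start, end) token spans of entities from a BIO tag sequence.

    B opens a new entity, I continues the open one, anything else closes it.
    An I with no open entity is leniently treated as starting a new one.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        prefix = tag[:1].upper()
        if prefix == "B" or (prefix == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif prefix == "I":
            continue  # still inside the current entity
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # entity running to the end
    return spans


# Hypothetical sample rows, not taken from the corpus.
sample = io.StringIO("tower,B\ncrane,I\nis,O\ninstalled,O\nscaffold,B\n")
tokens = read_tagged_tokens(sample)
words = [w for w, _ in tokens]
spans = bio_spans([t for _, t in tokens])
print(len(words), len(spans))  # 5 2
print([" ".join(words[s:e]) for s, e in spans])  # ['tower crane', 'scaffold']
```

For the real file, something like `open("Construction-NER-corpus.csv", encoding="utf-8", newline="")` can be passed to `read_tagged_tokens`; the encoding is an assumption and may need changing. If the file follows strict BIO, `len(bio_spans(...))` should match the 2268 entities reported above.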
The raw corpus was collected from a series of supervision documents for a construction project.
The annotation guidelines are given in Specification for annotation.docx, with an English version in Specification(in English).
A kappa test was applied to the corpus; the resulting tag agreement matrix is shown below:
|   | B    | I    | O    |
|---|------|------|------|
| B | 2026 | 119  | 123  |
| I | 70   | 2763 | 174  |
| O | 18   | 46   | 9874 |
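The README reports the agreement matrix but not a kappa value. As an illustration, the sketch below computes Cohen's kappa from that matrix, assuming rows and columns are two annotation passes over the same tokens (the README does not say which pass is which, or which kappa variant was used). Kappa is observed agreement corrected for chance: $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the diagonal share and $p_e$ the agreement expected from the row and column totals. The name `cohens_kappa` is hypothetical.

```python
from typing import List

# Agreement matrix copied from the table above; row and column order is B, I, O.
MATRIX = [
    [2026, 119, 123],
    [70, 2763, 174],
    [18, 46, 9874],
]


def cohens_kappa(m: List[List[int]]) -> float:
    """Cohen's kappa for a square agreement (confusion) matrix of counts."""
    k = len(m)
    n = sum(sum(row) for row in m)
    # Observed agreement: share of tokens on the diagonal.
    p_o = sum(m[i][i] for i in range(k)) / n
    # Chance agreement: product of each label's row and column marginals.
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[r][c] for r in range(k)) for c in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


total = sum(map(sum, MATRIX))
print(total)  # 15213
print(round(cohens_kappa(MATRIX), 3))  # 0.928
```

The cell total equals the 15213 words reported above, and the B row sums to 2268, the reported entity count, which is consistent with each token being counted once.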
If you have any questions, please feel free to contact us: [email protected]