GitHub - mattslyons/MeasEval: SemEval-2021 Task 8: MeasEval

Welcome to MeasEval: Counts and Measurements!

Counts and measurements are an important part of scientific discourse. It is relatively easy to find measurements in text, but a bare measurement like "17 mg" is not informative on its own: a reader also needs to know what was measured, which property the value describes, and under what conditions. Relatively little attention has been given to parsing and extracting these semantic relations. The task is challenging because scientific writing can be ambiguous and inconsistent, and the location of this information relative to the measurement can vary greatly.

MeasEval is a new entity and semantic relation extraction task focused on finding counts and measurements, attributes of these quantities, and additional information including measured entities, properties, and measurement contexts.
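
To make the task concrete, here is a minimal sketch. It is not the task's official data format or any participant's system. A naive regular expression finds quantity spans such as "17 mg", and a small dataclass shows the additional slots (measured entity, measured property, qualifier) that a MeasEval system is asked to fill. The unit list, the pattern, the class and field names, and the example sentence are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny unit list; real scientific text needs far more.
UNITS = ("mg", "kg", "g", "mL", "cm", "mm", "m", "K", "%")

# A number (optionally decimal), an optional space, then one of the units above.
# The negative lookahead stops a unit from matching the start of a longer word.
QUANTITY_RE = re.compile(
    r"\d+(?:\.\d+)?\s?(?:" + "|".join(re.escape(u) for u in UNITS) + r")(?![A-Za-z])"
)


@dataclass
class Measurement:
    """One quantity plus the context slots the task asks for (names are assumed)."""
    quantity: str
    span: Tuple[int, int]                      # character offsets in the source text
    measured_entity: Optional[str] = None      # what was measured
    measured_property: Optional[str] = None    # which property of that entity
    qualifier: Optional[str] = None            # conditions that scope the measurement


def find_quantities(text: str) -> List[Measurement]:
    """Return bare quantity spans; every context slot is left empty."""
    return [Measurement(m.group(), m.span()) for m in QUANTITY_RE.finditer(text)]


sentence = "Each sample received 17 mg of catalyst at 298 K."
found = find_quantities(sentence)
for measurement in found:
    print(measurement)
```

The pattern locates the two quantities, but every context slot stays `None`. Filling those slots from the surrounding sentence is the relation-extraction part of the task, and a fixed pattern cannot do it reliably.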

For more details and to participate, head over to the competition's CodaLab pages: https://competitions.codalab.org/competitions/25770

Summary

Abstract: SemEval 2021 Task 8: MeasEval: Counts and Measurements aimed to stimulate research on the contextual extraction of counts and measurements, with the goal of extraction systems that identify entities while preserving the semantic relationships among them. MeasEval included five subtasks ranging from straightforward quantity extraction to more complicated qualifier extraction. Because these tasks primarily concern scientific texts, we completed several MeasEval tasks using base RoBERTa as our baseline and cs_roberta_base and biomed_roberta_base as competitor models. We hypothesized that the domain-adapted pretrained (DAPT) RoBERTa models, especially the model trained on biomedical texts, would yield improved performance. We demonstrate that DAPT models offer improvement over non-DAPT models even when unique labeled training tokens are repeated as few as two or three times within a domain.
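
As a rough illustration of how the three models named in the abstract could be swapped in and out, the sketch below maps each one to a Hugging Face Hub identifier and loads it with a token-classification head. This is an assumption about tooling, not a description of the repository's pipeline: the use of the `transformers` library, the Hub identifiers, the framing as token classification, and the names `MODEL_IDS` and `load_token_classifier` are all hypothetical.

```python
from typing import Dict

# Hub identifiers assumed for the three models named in the abstract.
MODEL_IDS: Dict[str, str] = {
    "baseline": "roberta-base",
    "cs_dapt": "allenai/cs_roberta_base",
    "biomed_dapt": "allenai/biomed_roberta_base",
}


def load_token_classifier(model_key: str, num_labels: int):
    """Load a tokenizer and a token-classification model for one checkpoint."""
    # Imported lazily so the mapping above can be inspected without
    # the transformers package installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    model_id = MODEL_IDS[model_key]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model
```

Calling, for example, `load_token_classifier("biomed_dapt", num_labels=3)` downloads the checkpoint on first use. Because the three checkpoints share the RoBERTa architecture, the same fine-tuning code can be reused and only the key changes, which keeps a baseline-versus-DAPT comparison controlled.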

The primary code, including preprocessing, pipeline, and model building, is contained here.

The final paper (ACL format) is located here.

Folders and files

annotationGuidelines
assets
baselines
data
eval
outputs
sandbox
scripts
.gitignore
README.md
Stephens Shen Lyons Final Paper DataSci 266.pdf
anlaysis.ipynb
fileCategories.txt
final_scores.csv
labs_per_utok.csv
