DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing, it has achieved state-of-the-art performance on all benchmark tasks for Danish. This repository contains the code for reproducing DaCy. To download the models, use the DaNLP package (request pending), SpaCy (request pending), or download the project directly here.
The folder DaCy contains a SpaCy project that allows the results to be reproduced. This folder also includes the evaluation metrics on DaNE.
To load the model from the direct download, simply place the downloaded "packages" folder in your working directory and load the model using SpaCy:
import spacy
# Keep a reference to the loaded pipeline so it can be applied to text
nlp = spacy.load("packages/da_dacy_large_tft-0.0.0")
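If the "packages" folder is not where SpaCy expects it, the load call fails with an error that can be hard to read. The following is a minimal sketch of a wrapper that checks the path first. The helper names `model_path` and `load_dacy` are hypothetical and not part of DaCy or SpaCy; the only SpaCy call used is `spacy.load`, and the default folder and model names are taken from the example above.

```python
from pathlib import Path


def model_path(packages_dir="packages", model_name="da_dacy_large_tft-0.0.0"):
    """Build the path to the downloaded model directory (hypothetical helper)."""
    return Path(packages_dir) / model_name


def load_dacy(packages_dir="packages", model_name="da_dacy_large_tft-0.0.0"):
    """Load the model with spacy.load after checking that the folder exists.

    This is an illustrative wrapper, not an API provided by DaCy.
    """
    path = model_path(packages_dir, model_name)
    if not path.is_dir():
        raise FileNotFoundError(
            f"Expected the model at '{path}'. Place the downloaded "
            f"'packages' folder in your working directory first."
        )
    # Imported here so the path check above runs before SpaCy is needed.
    import spacy

    return spacy.load(path)
```

Assuming the model is in place, `nlp = load_dacy()` returns a regular SpaCy pipeline, so calling `nlp` on a Danish string yields a `Doc` whose annotations depend on the components included in the model.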
The following table shows the performance on DaNE compared to other models. The highest scores are in bold and the second highest are underlined.
To learn more about how the model was trained, check out this blog post.
To ask questions, report issues, or request features 🤔, please use the GitHub Issue Tracker. Questions related to SpaCy should be directed to the SpaCy GitHub or forum.
This is really an acknowledgement of great open-source software and contributors. This wouldn't have been possible without the work of the SpaCy team, which developed and integrated the software; Huggingface, for developing Transformers and making model sharing convenient; BotXO, for training and sharing the Danish BERT model; and Malte Bertelsen, for making it easily available. DaNLP has made it extremely easy to get access to Danish resources to train on, even supplied some of the tagged data themselves, and does a great job of actually developing these datasets.
If you use this library in your research, please cite:
@inproceedings{enevoldsen2020dacy,
title={DaCy: A SpaCy NLP Pipeline for Danish},
author={Enevoldsen, Kenneth},
year={2021}
}
DaCy is released under the Apache License, Version 2.0. See the LICENSE file for more details.