lusterck/preturn_ie: Code for the paper “Clinical Information Extraction for Preterm Birth Risk Prediction” (To be released soon)

This repository contains the information extraction pipeline used to extract features from semi-structured and structured medical notes related to preterm birth. The extracted features were used to support clinical decision making in the context of preterm birth risk prediction. This library was developed for the B-project preturn at Ghent University Hospital.

This pipeline is largely based on the spaCy and scispaCy libraries.

Data

Because of its sensitive nature, no raw text or training data is included for now.

Installation

To use the repo, create a virtual environment and install the dependencies:

python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Generate embeddings and vocabulary

python -m scripts.generate_embeddings --embedding_size 300 --window_size 2
python -m scripts.create_vocab data/input/corpus.csv data/input/counts.freq
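The vocabulary step presumably reduces the corpus to token frequencies. The format of counts.freq is not documented here, so the following is only a minimal sketch of that idea, assuming whitespace tokenization and lowercasing. `count_tokens` and the toy notes are hypothetical and not part of this repository.

```python
from collections import Counter


def count_tokens(texts):
    """Count lowercased, whitespace-separated tokens over an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


# Toy notes standing in for rows of the corpus (made up for illustration).
notes = ["2 maal 1000 mg Dafalgan", "1 maal 500 mg dafalgan"]
counts = count_tokens(notes)

# Print the most frequent tokens with their counts.
for token, freq in counts.most_common(3):
    print(token, freq)
```

A real vocabulary would normally use the same tokenizer as the pipeline itself, so that counts and model tokens line up.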

Convert UMLS database to json

python -m scripts.export_umls_json --meta_path data/umls/umls/ --output_path data/umls/umls_2017_aa_cat0129.json

Generate UMLS inverted index vectors

python -m scripts.train_linker --umls_path data/umls/umls_2017_aa_cat0129 --train
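For intuition about what such index vectors can look like: scispaCy's candidate generator represents concept names as TF-IDF vectors over character 3-grams and retrieves the nearest names for a mention. Whether scripts.train_linker does exactly this is an assumption. The sketch below uses only the standard library, a brute-force search instead of an approximate index, and three made-up concept names.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 norm; empty vectors stay empty."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else vec


def build_index(names, n=3):
    """Return (idf, vectors): IDF weights and one unit TF-IDF vector per name."""
    docs = [Counter(char_ngrams(name, n)) for name in names]
    df = Counter(g for doc in docs for g in doc)
    idf = {g: math.log(len(names) / d) + 1.0 for g, d in df.items()}
    vectors = [normalize({g: tf * idf[g] for g, tf in doc.items()}) for doc in docs]
    return idf, vectors


def best_match(mention, names, idf, vectors, n=3):
    """Return the name with the highest cosine similarity to the mention."""
    tf = Counter(char_ngrams(mention, n))
    # n-grams never seen in the index carry no weight and are dropped.
    q = normalize({g: c * idf[g] for g, c in tf.items() if g in idf})
    scores = [sum(w * vec.get(g, 0.0) for g, w in q.items()) for vec in vectors]
    best = max(range(len(names)), key=scores.__getitem__)
    return names[best], scores[best]


# Made-up concept names; a real index would hold UMLS names and aliases.
names = ["paracetamol", "ibuprofen", "amoxicillin"]
idf, vectors = build_index(names)
name, score = best_match("paracetamol 500", names, idf, vectors)
print(name, round(score, 3))
```

Character n-grams make the lookup tolerant of spelling variants and inflections, which is useful for clinical text; at UMLS scale the brute-force loop would be replaced by an approximate nearest-neighbour index.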

Train weakly supervised NER

Weak training data is created by labeling unlabeled text using patterns stored in data/input/{lang}/entities.json

python -m scripts.extract_features

This stores the weakly supervised training data in the files data/weak_supervision/{training,dev}_data.json. A weakly supervised NER model can then be trained by calling

python -m scripts.train_ner {lang} ./ data/input/training_data.json data/input/training_data.json -b models/preturn
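The pattern-based labeling described above can be illustrated with a small sketch. The schema of entities.json is not shown in this README, so the pattern dictionary below is hypothetical; the labels are borrowed from the unit_name field of the example output, and the `(text, {"entities": [...]})` tuple follows the spaCy v2 training-example convention, which may or may not be what scripts.train_ner expects.

```python
import re

# Hypothetical patterns: the real schema of entities.json is not documented here.
PATTERNS = {
    "MASS_UNIT": r"\b(?:mg|g|kg)\b",
    "TIMES": r"\b(?:maal|keer)\b",
    "MEDICATION": r"\b(?:dafalgan|ibuprofen)\b",
}


def weak_label(text, patterns=PATTERNS):
    """Return sorted (start, end, label) character spans for every pattern match."""
    spans = []
    for label, pattern in patterns.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


text = "2 maal 1000 mg Dafalgan"
# One weakly labeled example: the raw text plus the spans found by the patterns.
example = (text, {"entities": weak_label(text)})
print(example)
```

Labels produced this way are noisy by construction: the patterns miss unseen drug names and can fire on ambiguous tokens, which is why the model trained on them is described as weakly supervised.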

Example

To extract features from the medical notes in the CSV file and create a new CSV file containing the features, call

python -m scripts.extract_features

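from pprint import pprint

# Assumption: the loader is exported as load_preturn_model (the name used in the call below).
from preturn_ie import load_preturn_model

text_doc = "2 maal 1000 mg Dafalgan"

# Both flags are assumed to be booleans; adjust them to your setup.
nlp = load_preturn_model(extract_features=True, enable_linker=True)
doc = nlp(text_doc)

# Each extracted feature is a dict stored on the custom doc._.features attribute.
for feature in doc._.features:
    pprint(feature)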

''' 
    {
        'attribute': '',
        'canonical_name': '',
        'concept_id': ' ',
        'drug_name': 'dafalgan',
        'feature_name': 'DRUG_ADMINISTRATION',
        'feature_string': '_2_keer_1000_mg_dafalgan',
        'feature_type': 'drug',
        'match_id': '',
        'modifier': '',
        'source_text': '2 keer 1000 mg dafalgan',
        'unit_name': ' TIMES  MASS_UNIT MEDICATION',
        'value': 2.0
    }
 '''
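Feature dicts like the one above still have to be turned into model inputs. How the paper's risk models consume them is not described in this README, so the following is only a sketch of one simple option: counting features per note by name and normalized string. `features_to_counts` is a hypothetical helper, and the input reuses a few fields from the example output.

```python
from collections import Counter


def features_to_counts(features):
    """Count occurrences of each feature, keyed by feature_name + feature_string."""
    return Counter(f["feature_name"] + f["feature_string"] for f in features)


# Fields copied from the example output above; the remaining keys are omitted.
features = [
    {
        "feature_name": "DRUG_ADMINISTRATION",
        "feature_string": "_2_keer_1000_mg_dafalgan",
        "feature_type": "drug",
        "value": 2.0,
    },
]

bag = features_to_counts(features)
print(dict(bag))
```

The resulting counts can be stacked across notes into a sparse bag-of-features matrix for a downstream classifier.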

Citations and Acknowledgments

Should you use this code for your own research, please cite:

@article{STERCKX2020103544,
    title = "Clinical information extraction for preterm birth risk prediction",
    journal = "Journal of Biomedical Informatics",
    volume = "110",
    pages = "103544",
    year = "2020",
    issn = "1532-0464",
    doi = "https://doi.org/10.1016/j.jbi.2020.103544",
    url = "http://www.sciencedirect.com/science/article/pii/S1532046420301726",
    author = "Lucas Sterckx and Gilles Vandewiele and Isabelle Dehaene and Olivier Janssens and Femke Ongenae and Femke {De Backere} and Filip {De Turck} and Kristien Roelens and Johan Decruyenaere and Sofie {Van Hoecke} and Thomas Demeester",
    keywords = "Clinical information extraction, Clinical decision support models, Preterm birth, Text mining"
}
