This project investigates how the performance of biomedical language models degrades over time as data shifts between training and deployment. It covers three biomedical tasks, applies several data shift measurements, and uses statistical analyses to quantify temporal effects and to explore the relationship between data shift and variation in model performance.

The study shows that time affects model performance and that data shift could serve as an indicator of when a model needs updating. The presentation slides are available for viewing: Time_Matters_Slides_AMIA24.pdf

Environment Setup

Platform:

Ubuntu 22.04
Anaconda, Python 3.10.13
Linux Kernel: 6.8.0-40-generic

Run the following commands to create the environment:

conda env create -f environment.yml
conda activate tempo0

Time-varying Biomedical Data

Overview of the data

Dataset	Time Intervals	Task	Labels	Data Size
MIMIC	2014 - 2016, 2008 - 2010, 2011 - 2013, 2017 - 2019	Phenotype Inference	Top 50 frequent ICD codes	331,794 notes
BioNER	2009ST, 2011EPI, 2011ID, 2013GE	Information Extraction	IOBES Tags of Protein Entity	49,354 entities
BioASQ	2013-2015, 2016-2018, 2019-2020, 2021-2023	Question Answering	Gold Standard Answer	5,046 QA pairs

Usage

Data preprocessing and split: After downloading the data, go to each data folder and run python build-[data].py and python split-[data].py.
Test temporal effect: Run sh [data].sh to test performance on all time domain pairs. After getting the results, run python record_[data].py to summarize the experiment results.
Data shift measuring:

Run analysis/[encoder]_embedings.py to get the text embedding for semantic shift evaluation. Here, [encoder] can be selected from general domain encoders like SBERT and USE, or biomedical domain encoders like BioLORD and MedCPT.
Run analysis/token_level_metrics.py to measure the token-level data shift, using metrics such as TF-IDF cosine similarity and Jaccard similarity.
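
The two token-level metrics named above can be sketched in plain Python as follows. This is a minimal illustration, not the code in analysis/token_level_metrics.py: the whitespace tokenizer, the smoothed IDF formula, the choice to compare mean TF-IDF vectors, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the repository may tokenize differently.
    return text.lower().split()


def jaccard_similarity(docs_a, docs_b):
    """Jaccard similarity between the vocabularies of two corpora."""
    vocab_a = {tok for doc in docs_a for tok in tokenize(doc)}
    vocab_b = {tok for doc in docs_b for tok in tokenize(doc)}
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


def tfidf_cosine_similarity(docs_a, docs_b):
    """Cosine similarity between the mean TF-IDF vectors of two corpora."""
    all_docs = [tokenize(d) for d in list(docs_a) + list(docs_b)]
    n_docs = len(all_docs)
    # Document frequency is computed over the combined corpus.
    df = Counter(tok for doc in all_docs for tok in set(doc))
    # Assumption: smoothed IDF, so terms found in every document keep a nonzero weight.
    idf = {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}

    def mean_vector(docs):
        total = Counter()
        for doc in docs:
            for tok, count in Counter(doc).items():
                total[tok] += (count / len(doc)) * idf[tok]
        return {tok: weight / len(docs) for tok, weight in total.items()}

    vec_a = mean_vector(all_docs[:len(docs_a)])
    vec_b = mean_vector(all_docs[len(docs_a):])
    dot = sum(weight * vec_b.get(tok, 0.0) for tok, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Both functions take two lists of strings, for example the texts of a training time domain and a test time domain, and return a value between 0 and 1, where lower values suggest a larger token-level shift.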

Statistical analysis of performance and data shift: Run analysis/statistic_analysis.py to explore the relationship between language model performance and data shift.
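
One simple way to relate a shift metric to model performance is a Pearson correlation over the train/test domain pairs. The sketch below is only an illustration of that idea; the source does not state which statistical tests analysis/statistic_analysis.py actually applies, and the function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products; the 1/n factors cancel in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Here xs would hold one shift score per domain pair (for instance a similarity between the training and test domains) and ys the matching performance score. A value near 1 or -1 would suggest that the shift metric tracks performance closely, while a value near 0 would suggest little linear relationship.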

Note: Replace [data] with the actual data path. Because processing differs for each dataset, make sure to specify the correct [data] and file path.
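
To make "all time domain pairs" concrete, the sketch below enumerates every train/test combination from the time intervals listed in the data overview. It is an assumption that in-domain pairs (training and testing on the same interval) are included and that the intervals are used in chronological order; the dictionary keys mirror the record_[data].py script names, and TIME_DOMAINS and domain_pairs are hypothetical names, not part of the repository.

```python
from itertools import product

# Time intervals copied from the data overview table (MIMIC sorted chronologically).
TIME_DOMAINS = {
    "mimic": ["2008-2010", "2011-2013", "2014-2016", "2017-2019"],
    "ner": ["2009ST", "2011EPI", "2011ID", "2013GE"],
    "bioasq": ["2013-2015", "2016-2018", "2019-2020", "2021-2023"],
}


def domain_pairs(dataset):
    """Return every (train_domain, test_domain) combination for a dataset.

    Assumption: in-domain pairs such as ("2008-2010", "2008-2010") are included.
    """
    domains = TIME_DOMAINS[dataset]
    return list(product(domains, repeat=2))


if __name__ == "__main__":
    for train_domain, test_domain in domain_pairs("mimic"):
        print(f"train on {train_domain}, test on {test_domain}")
```

With four intervals per dataset, this yields 16 pairs for each task.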

Contact and Citation

For any inquiries, feel free to contact the author at: [email protected]

To cite this work:

@misc{liu2024timemattersexaminetemporal,
      title={Time Matters: Examine Temporal Effects on Biomedical Language Models}, 
      author={Weisi Liu and Zhe He and Xiaolei Huang},
      year={2024},
      eprint={2407.17638},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.17638}, 
}


trust-nlp/TemporalAssessment

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

Environment Setup

Time-varying Biomedical Data

Overview of the data

Usage

Contact and Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages