We used i2b2 as an example to illustrate how Clinical-Longformer can be applied to NER tasks.
- Download the i2b2 2006, 2010, 2012, and 2014 datasets from https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/. You must first request N2C2 data access.
- Pre-process the i2b2 data and convert each split to .json format. We used this repo for pre-processing; please also cite their NAACL paper if you find it useful. (A sample record is sketched after the directory tree below.)
- After completing the previous two steps, you should have the following directory structure:
```
NER
├── I2B2_2006
│   ├── train.json
│   ├── test.json
│   └── dev.json
├── I2B2_2010
│   ├── train.json
│   ├── test.json
│   └── dev.json
├── I2B2_2012
│   ├── train.json
│   ├── test.json
│   └── dev.json
└── I2B2_2014
    ├── train.json
    ├── test.json
    └── dev.json
```
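Each `.json` file should be in JSON-lines form: one tokenized sentence per line, with parallel token and tag lists. The exact column names depend on your pre-processing, but Hugging Face's `run_ner.py` looks for `tokens` and `ner_tags` by default. A minimal, hypothetical record (the tokens and entity labels shown are purely illustrative):

```json
{"tokens": ["Mr.", "Smith", "was", "admitted", "on", "01/02/2014", "."], "ner_tags": ["O", "B-PATIENT", "O", "O", "O", "B-DATE", "O"]}
```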
- Download `run_ner.py`, the token-classification fine-tuning script from the Hugging Face Transformers examples.
- Run:

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 \
python run_ner.py \
  --model_name_or_path yikuan8/Clinical-Longformer \
  --num_train_epochs 4 \
  --do_train \
  --do_eval \
  --do_predict \
  --evaluation_strategy epoch \
  --train_file NER/I2B2_2014/train.json \
  --validation_file NER/I2B2_2014/dev.json \
  --test_file NER/I2B2_2014/test.json \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size 8 \
  --output_dir /YOUR/PATH/OUTPUTDIR/ \
  --learning_rate 2e-5 \
  --fp16 \
  --fp16_backend amp \
  --overwrite_cache \
  --overwrite_output_dir
```
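After training finishes, the fine-tuned model and tokenizer are saved to `--output_dir`, and predictions on the test split are written alongside them. A minimal inference sketch using the standard Transformers token-classification pipeline (the model path is a placeholder matching `--output_dir` above, and the input sentence is illustrative):

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Placeholder: must match the --output_dir used for fine-tuning.
model_dir = "/YOUR/PATH/OUTPUTDIR/"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)

# aggregation_strategy="simple" groups sub-word pieces into whole-entity spans.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Each result carries the entity label, score, character span, and matched text.
print(ner("Mr. Smith was admitted on 01/02/2014."))
```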