Skip to content

This repository stands for applying and evaluating HUNER pre-trained model ("disease_all") on "BC5CDR-Disease" data set .

Notifications You must be signed in to change notification settings

miladnouriezade/HUNER-evaluation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HUNER

This repository stands for applying and evaluating HUNER pre-trained model ("disease_all") on "BC5CDR-Disease" data set .

Installation

  1. Install docker
  2. Download pretrained model ("disease_all") from here, place it into huner/models directory and untar it using
tar xzf disease_all.tar.gz

Prediction

For applying prediction on BC5CDR-Disease data set we need to remove labeles from .tsv file and convert it to pre-tokenized .txt file that tokens are seprated by whitespace.

  1. Use tokenized_txt.py in helper folder for preprocess your .tsv data and make it ready for using as model input.

    e.g. tokenized_test.txt

    Selegiline - induced postural hypotension in Parkinson ' s disease : a longitudinal study on the effects of drug withdrawal .
    
  2. Start HUNER server using

     ./start_server.sh disease_all

    model must reside in models directory .

  3. While server is running use another terminal tab for tagging input data using

    python client.py --name disease_all --assume_tokenized  /path/to/tokenized_test.txt OUTPUT.CONLL

    The output will then be written to OUTPUT.CONLL .

Result

OUTPUT.CONLL sample result on tokenized_test.txt looks like this

Torsade	POS	B-NP
de	POS	I-NP
pointes	POS	I-NP
ventricular	POS	I-NP
tachycardia	POS	I-NP
during	POS	O
low	POS	O
dose	POS	O
intermittent	POS	O
dobutamine	POS	O
treatment	POS	O
in	POS	O
a	POS	O
patient	POS	O
with	POS	O
dilated	POS	B-NP
cardiomyopathy	POS	I-NP
and	POS	O
congestive	POS	B-NP
heart	POS	I-NP
failure	POS	I-NP
.	POS	O
The	POS	O
authors	POS	O
describe	POS	O
the	POS	O
case	POS	O
of	POS	O
a	POS	O
56	POS	O
-	POS	O
year	POS	O
-	POS	O
old	POS	O
woman	POS	O
with	POS	O
chronic	POS	O
,	POS	O
severe	POS	O
heart	POS	B-NP
failure	POS	I-NP
secondary	POS	O
to	POS	O
dilated	POS	B-NP
cardiomyopathy	POS	I-NP
and	POS	O
absence	POS	O
of	POS	O
significant	POS	O
ventricular	POS	B-NP
arrhythmias	POS	I-NP
who	POS	O
developed	POS	O
QT	POS	B-NP
prolongation	POS	I-NP
and	POS	O
torsade	POS	B-NP
de	POS	I-NP
pointes	POS	I-NP
ventricular	POS	I-NP
tachycardia	POS	I-NP
during	POS	O
one	POS	O
cycle	POS	O
of	POS	O
intermittent	POS	O
low	POS	O
dose	POS	O
(	POS	O
2	POS	O
.	POS	O
5	POS	O
mcg	POS	O
/	POS	O
kg	POS	O
per	POS	O
min	POS	O
)	POS	O
dobutamine	POS	O
.	POS	O

Evaluation

We use seqeval classification_report(y_true, y_pred) metric to evaluate HUNER model .

Setting up an environment

  1. Follow the installation instructions for Conda.

  2. Create a Conda environment called "seqeval" with Python 3.7.6:

    conda create -n seqeval python=3.7.6
  3. Activate the Conda environment:

    conda activate seqeval

Installation

To install seqeval, simply run:

$ pip install seqeval[cpu]

If you want to install seqeval on GPU environment, please run:

$ pip install seqeval[gpu]

Requirement

  • numpy >= 1.14.0

Preprocess and Evaluate

Since OUTPUT.CONLL format is a little bit different from BC5CDR-Disease IOB schemed, we need to modify our BC5CDR-Disease data.

  • BC5CDR-Disease

    Torsade	B
    de	I
    pointes	I
    ventricular	B
    tachycardia	I
    during	O
    low	O
    dose	O
    intermittent	O
    dobutamine	O
    treatment	O
    in	O
    a	O
    patient	O
    with	O
    dilated	B
    cardiomyopathy	I
    and	O
    congestive	B
    heart	I
    failure	I
    .	O
    
    
  • OUTPUT.CONLL

    Torsade	POS	B-NP
    de	POS	I-NP
    pointes	POS	I-NP
    ventricular	POS	I-NP
    tachycardia	POS	I-NP
    during	POS	O
    low	POS	O
    dose	POS	O
    intermittent	POS	O
    dobutamine	POS	O
    treatment	POS	O
    in	POS	O
    a	POS	O
    patient	POS	O
    with	POS	O
    dilated	POS	B-NP
    cardiomyopathy	POS	I-NP
    and	POS	O
    congestive	POS	B-NP
    heart	POS	I-NP
    failure	POS	I-NP
    .	POS	O
    

Use test.tsv or any file that you used it for prediction in BC5CDR-Disease data set and replace all B tags with B-NP and all I tags with I-NP using Exel .

E.g.test.tsv shuold look like this after modification .

Torsade	B-NP
de	I-NP
pointes	I-NP
ventricular	B-NP
tachycardia	I-NP
during	O
low	O
dose	O
intermittent	O
dobutamine	O
treatment	O
in	O
a	O
patient	O
with	O
dilated	B-NP
cardiomyopathy	I-NP
and	O
congestive	B-NP
heart	I-NP
failure	I-NP
.	O

Now use evaluation.py in helper/evaluation folder to evaluate model .

About

This repository stands for applying and evaluating HUNER pre-trained model ("disease_all") on "BC5CDR-Disease" data set .

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published