Chi-SquareX/BERT-Finetuning-for-Medical-Dataset


BERT-Finetuning-for-Medical-Dataset

In this project, we pre-trained and fine-tuned BERT, RoBERTa, and GPT2 models from the HuggingFace library to classify medical datasets. The specifics are listed in the Experiments section. The corresponding Jupyter notebooks can be found in this repo.

Dataset

The datasets used are allpatients, zreddit, smalldataset, Reddit All, and Twitter.

Experiments

Out of Domain:

| Experiments | Model | Continue pretraining | Finetune | Regression (severity) / Classification |
| --- | --- | --- | --- | --- |
| Experiment 1 | Reddit dataset | ---- | - | dataset1/dataset2/Twitter |
| Experiment 2 | Bert-RoBERTa-GPT | zReddit Patients only (without labels) | - | dataset1/dataset2/Twitter |
| Experiment 3 | Bert-RoBERTa-GPT | Reddit All Patients (without labels) | - | dataset1/dataset2/Twitter |
| Experiment 4 | Bert-RoBERTa-GPT | Twitter | - | dataset1/dataset2 |
| Experiment 5 | Bert-RoBERTa-GPT | - | zReddit dataset (sent via email) | dataset1/dataset2/Twitter |
| Experiment 6 | Bert-RoBERTa-GPT | - | AllReddit Patients and control only (I will include control text and labels) | dataset1/dataset2/Twitter |
| Experiment 7 | Bert-RoBERTa-GPT | - | Twitter | dataset1/dataset2 |
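
Each out-of-domain row combines an optional continued-pretraining corpus, an optional fine-tuning corpus, and a set of target datasets. The sketch below is one possible way to write that structure down in plain Python. The class `OutDomainRun`, its field names, and the helper `stages` are illustrative assumptions and do not come from the notebooks; only the dataset names are taken from the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OutDomainRun:
    """One row of the out-of-domain table (hypothetical field names)."""
    name: str
    continue_pretraining_on: Optional[str]  # unlabeled corpus, or None for "-"
    finetune_on: Optional[str]              # labeled corpus, or None for "-"
    target_datasets: Tuple[str, ...]        # last column of the table


def stages(run: OutDomainRun) -> List[str]:
    """List the steps a run goes through, skipping the columns marked '-'."""
    steps = []
    if run.continue_pretraining_on:
        steps.append(f"continue pretraining on {run.continue_pretraining_on}")
    if run.finetune_on:
        steps.append(f"fine-tune on {run.finetune_on}")
    steps.append("regression/classification on " + ", ".join(run.target_datasets))
    return steps


TARGETS = ("dataset1", "dataset2", "Twitter")

# Two rows copied from the table above.
exp2 = OutDomainRun("Experiment 2", "zReddit, patients only (without labels)", None, TARGETS)
exp5 = OutDomainRun("Experiment 5", None, "zReddit dataset", TARGETS)

for run in (exp2, exp5):
    print(run.name, "->", "; ".join(stages(run)))
```

Writing the rows as data like this makes it easy to see that Experiments 2 to 4 vary only the continued-pretraining corpus, while Experiments 5 to 7 vary only the fine-tuning corpus.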

In Domain:

| Experiments | Model | Finetune | Classification |
| --- | --- | --- | --- |
| Experiment 1 | Bert | zReddit (patients and control) | zReddit |
| Experiment 2 | RoBERTa | zReddit (patients and control) | zReddit |
| Experiment 3 | GPT2 | zReddit (patients and control) | zReddit |
| Experiment 4 | Bert | Reddit All (patients and control) | Reddit All |
| Experiment 5 | RoBERTa | Reddit All (patients and control) | Reddit All |
| Experiment 6 | GPT2 | Reddit All (patients and control) | Reddit All |
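
The in-domain experiments form a full grid of three models over two datasets, each fine-tuned and evaluated on the same dataset. As a minimal sketch (the variable names are illustrative and not taken from the notebooks), the grid can be enumerated like this:

```python
from itertools import product

MODELS = ["BERT", "RoBERTa", "GPT2"]
DATASETS = ["zReddit", "Reddit All"]

# Datasets vary in the outer loop and models in the inner loop,
# which reproduces the experiment numbering used in the table above.
grid = [
    (number, model, dataset)
    for number, (dataset, model) in enumerate(product(DATASETS, MODELS), start=1)
]

for number, model, dataset in grid:
    print(f"Experiment {number}: fine-tune {model} on {dataset}, classify on {dataset}")
```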

Model Architectures:

[Figure: BERT architecture]

[Figure: RoBERTa model architecture]

[Figure: decoder-only architecture used by GPT2]

Results

[Figure: results table, screenshot 1 of 2]

[Figure: results table, screenshot 2 of 2]

Due to computational constraints on Google Colab, we could not train some of the models. The results for those models are left blank.
