Skip to content

Commit

Permalink
Merge branch 'Documentation_eeg_chbmit' into 'main'
Browse files Browse the repository at this point in the history
Documentation eeg chbmit

See merge request es/ai/hannah/hannah!362
  • Loading branch information
juliawerner committed Nov 28, 2023
2 parents 9dc985c + 7013e38 commit 2d07031
Show file tree
Hide file tree
Showing 2 changed files with 49 additions and 0 deletions.
44 changes: 44 additions & 0 deletions doc/applications/seizure_detection.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Dataset creation

The current used dataset was generated with the 16 channels with the highest variance within the ictal data from the <cite>**CHB-MIT Scalp EEG Database** [1]</cite>. If a new preprocessed dataset should be created, the ``scripts/eeg/eeg_dataset_creator.py`` file can be used. This takes the edf files from the CHB-MIT dataset as an input and performs a basic preprocessing including a band-pass filtering (0.1 Hz and 50 Hz) to remove DC components as well as the noisy component from the EEG measurement device. It finally creates binary labels for each data fragment.

Parameters that can be specified are amongst others:

class_ratio
: default = 5, Ratio of zeros:ones in the final `dev` and `retrain` datasets to account for the scarcity of the ictal data

data_length
: default = 1, Number of seconds per data point. For a sample rate of 256, the final data will be of the shape (D,C,256xdata_length)

samp_rate
: default = 256, Sampling rate for the dataset. The data is already sampled at 256 Hz, use this argument only when the rate needed differs from this

The dataset creation can be invoked by:

python eeg_dataset_creator.py --output_dir "./data/" --class_ratio 4 --data_length 0.5

Currently, the `16c_retrain_id` preprocessed dataset is configured as the default dataset in HANNAH if someone works with the CHB-MIT dataset. This was generated with a `class_ratio` of 4, a `data_length` of 0.5 and a `sampling_rate` of 256.

[1] Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


# Training a base model with the CHB-MIT dataset

The training with the preprocessed CHB-MIT dataset can be invoked with:

hannah-train dataset=chbmit features=identity ~normalizer

The data is normalized directly when loaded and does not need additional normalization during training. `~normalizer` is used to turn additional normalization off. Adding `+dataset.weighted_loss=True` improves the results notably (same applies for the retraining). The trained model can function as a base model which can be fine-tuned for each patient by subsequent retraining.


# Retraining

The main idea of retraining is to account for individual differences within the data. To perform subsequent retraining, for each patient a new model needs to be trained based on the checkpoint of the best model trained on all patients. The prior trained base model is loaded and retrained on patient-specific data. To invoke this training the dataset `chbmitrt` needs to be used as this specifically loads patient-specific data only. For a single patient, the retraining can be invoked by:

hannah-train dataset=chbmitrt trainer.max_epochs=10 model=tc-res8 module.batch_size=8 input_file=/PATH/TO/BASE_MODEL/best.ckpt ~normalizer

Alternatively, if the retraining should be performed for all patients, ``scripts/eeg/run_patient_retraining_complete.py`` can be used. To execute this script, one can use

python run_patient_retraining_complete.py 'model-name' 'dataset_name'

A model which has been successfully used for this application is for example the TC-ResNet8 `tc-res8` and it is recommended to use `16c_retrain_id` as a dataset name, which names the preprocessd CHB-MIT dataset with balanced class samples. During the retraining, for each patient a results folder is generated.
5 changes: 5 additions & 0 deletions pydoc-markdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,11 @@ renderer:
- title: "Evaluation"
name: eval
source: doc/eval.md
- title: "Specific applications"
children:
- title: Seizure Detection
name: applications/seizure_detection
source: doc/applications/seizure_detection.md
- title: Development
children:
- title: Overview
Expand Down

0 comments on commit 2d07031

Please sign in to comment.