Merge branch 'master' into dev
yzhao062 authored Aug 13, 2020
2 parents 78bf8c4 + 8f4b8cb commit fabef54
Showing 26 changed files with 437 additions and 778 deletions.
4 changes: 4 additions & 0 deletions CHANGES.txt
@@ -3,3 +3,7 @@ v<0.0.1>, <08/03/2020> -- Initial release & minor example fix.
v<0.0.2>, <08/04/2020> -- Set up autotests.
v<0.0.2>, <08/05/2020> -- Enable read the docs.
v<0.0.2>, <08/06/2020> -- Add models and remove task parameter from model invocation.
v<0.0.3>, <08/10/2020> -- Massive code refactoring.
v<0.0.3>, <08/11/2020> -- Add GPU support.
v<0.0.3>, <08/12/2020> -- Add support for image and clinical notes.

80 changes: 45 additions & 35 deletions README.rst
@@ -66,7 +66,7 @@ Python Library for Healthcare AI (PyHealth)

-----

**Development Status**: **As of 08/04/2020, PyHealth is under active development and in its alpha stage. Please follow, star, and fork to get the latest functions**!
**Development Status**: **As of 08/12/2020, PyHealth is under active development and in its alpha stage. Please follow, star, and fork to get the latest functions**!


**PyHealth** is a comprehensive and flexible **Python library** for **healthcare AI**, designed for both **ML researchers** and **medical practitioners**.
@@ -75,7 +75,6 @@ PyHealth makes many important healthcare tasks become accessible, such as **phen
**ICU length stay forecasting**, etc. Running these prediction tasks with deep learning models can be as short as 10 lines of code.



PyHealth comes with three major modules: (i) *data preprocessing module*; (ii) *learning module*
and (iii) *evaluation module*. Typically, one can run the data prep module to prepare the data, then feed to the learning module for prediction, and finally assess
the result with the evaluation module.
@@ -89,6 +88,7 @@ PyHealth is featured for:

* **Unified APIs, detailed documentation, and interactive examples** across various datasets and algorithms.
* **Advanced models**\ , including **latest deep learning models** and **classical machine learning models**.
* **Wide coverage**, supporting **sequence data**, **image data**, and **text data** like clinical notes.
* **Optimized performance with JIT and parallelization** when possible, using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.
* **Customizable modules and flexible design**: each module may be turned on/off or totally replaced by custom functions. The trained models can be easily exported and reloaded for fast execution and deployment.

@@ -99,15 +99,18 @@ PyHealth is featured for:
# load pre-processed CMS dataset
from pyhealth.data.expdata_generator import cms as cms_expdata_generator
from pyhealth.data.expdata_generator import sequencedata as expdata_generator
cur_dataset = cms_expdata_generator(exp_id=exp_id, sel_task='phenotyping')
cur_dataset.get_exp_data()
expdata_id = '2020.0810.data.mortality.mimic'
cur_dataset = expdata_generator(exp_id=exp_id)
cur_dataset.get_exp_data(sel_task='mortality', )
cur_dataset.load_exp_data()
# initialize the model for training
from pyhealth.models.lstm import LSTM
clf = LSTM(exp_id) # LSTM related parameters can be set here
from pyhealth.models.sequence.lstm import LSTM
# enable GPU
clf = LSTM(expmodel_id=expmodel_id, n_batchsize=20, use_gpu=True,
n_epoch=100, gpu_ids='0,1')
clf.fit(cur_dataset.train, cur_dataset.valid)
# load the best model for inference
@@ -116,9 +119,10 @@ PyHealth is featured for:
pred_results = clf.get_results()
# evaluate the model
from pyhealth import evaluation
evaluator = evaluation.__dict__['phenotyping']
r = evaluator(pred_results['hat_y'], pred_results['y'])
from pyhealth.evaluation.evaluator import func
r = func(pred_results['hat_y'], pred_results['y'])
print(r)
**Citing PyHealth**\ :
@@ -240,27 +244,27 @@ EHU-Claim CMS DE-SynPUF: CMS 2008-2010 Data Entrepreneu

You may download the above datasets at the links. The structure of the generated datasets can be found in datasets folder:

* \\datasets\\cms\\x_datat\\...csv
* \\datasets\\cms\\x_data\\...csv
* \\datasets\\cms\\y_data\\phenotyping.csv
* \\datasets\\cms\\y_data\\mortality.csv

The processed datasets (X,y) should be put in x_data, y_data correspondingly, to be appropriately digested by deep learning models.
The processed datasets (X, y) should be put in x_data and y_data, respectively, so that the deep learning models can read them. We include some sample datasets under the \\datasets folder.
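
The folder layout listed above can be checked before training. The following is a minimal sketch, not part of PyHealth: ``check_dataset_layout`` is a hypothetical helper, and it assumes the layout is exactly ``<root>/cms/x_data/*.csv`` plus one label file per task under ``<root>/cms/y_data``.

```python
from pathlib import Path


def check_dataset_layout(root, name="cms", tasks=("phenotyping", "mortality")):
    """Return a list of problems found under <root>/<name>; an empty list means OK.

    Hypothetical helper (not a PyHealth function). It only checks that the
    folders and files described in the README exist, not their contents.
    """
    base = Path(root) / name
    problems = []

    # Feature files are expected as CSVs under x_data.
    x_dir = base / "x_data"
    if not x_dir.is_dir():
        problems.append(f"missing folder: {x_dir}")
    elif not any(x_dir.glob("*.csv")):
        problems.append(f"no CSV files in: {x_dir}")

    # Each task is expected to have one label file under y_data.
    for task in tasks:
        y_file = base / "y_data" / f"{task}.csv"
        if not y_file.is_file():
            problems.append(f"missing label file: {y_file}")

    return problems
```

Calling ``check_dataset_layout("datasets")`` returns one message per missing folder or file, which may help catch a misspelled directory name before the data generator is run.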

**(ii) Machine Learning and Deep Learning Models** :

=================== ================ ====================================================================================================== ===== ========================================
=================== ================ ======================================== ====================================================================================================== ===== ========================================
Type Abbr Algorithm Year Ref
=================== ================ ====================================================================================================== ===== ========================================
Classical Models LogisticReg Logistic Regression N/A
Classical Models XGBoost XGBoost: A scalable tree boosting system 2016 [#Chen2016Xgboost]_
Neural Networks LSTM Long short-term memory 1997 [#Hochreiter1997Long]_
Neural Networks GRU Gated recurrent unit 2014 [#Cho2014Learning]_
Neural Networks RETAIN RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism 2016 [#Choi2016RETAIN]_
Neural Networks Dipole Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks 2017 [#Ma2017Dipole]_
Neural Networks tLSTM Patient Subtyping via Time-Aware LSTM Networks 2017 [#Baytas2017tLSTM]_
Neural Networks RAIM RAIM: Recurrent Attentive and Intensive Model of Multimodal Patient Monitoring Data 2018 [#Xu2018RAIM]_
Neural Networks StageNet StageNet: Stage-Aware Neural Networks for Health Risk Prediction 2020 [#Gao2020StageNet]_
=================== ================ ====================================================================================================== ===== ========================================
=================== ================ ======================================== ====================================================================================================== ===== ========================================
Classical Models LogisticReg pyhealth.models.sequence.lr Logistic Regression N/A
Classical Models XGBoost pyhealth.models.sequence.lr.xgboost XGBoost: A scalable tree boosting system 2016 [#Chen2016Xgboost]_
Neural Networks LSTM pyhealth.models.sequence.lstm Long short-term memory 1997 [#Hochreiter1997Long]_
Neural Networks GRU pyhealth.models.sequence.gru Gated recurrent unit 2014 [#Cho2014Learning]_
Neural Networks RETAIN pyhealth.models.sequence.retain RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism 2016 [#Choi2016RETAIN]_
Neural Networks Dipole pyhealth.models.sequence.dipole Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks 2017 [#Ma2017Dipole]_
Neural Networks tLSTM pyhealth.models.sequence.tlstm Patient Subtyping via Time-Aware LSTM Networks 2017 [#Baytas2017tLSTM]_
Neural Networks RAIM pyhealth.models.sequence.raim RAIM: Recurrent Attentive and Intensive Model of Multimodal Patient Monitoring Data 2018 [#Xu2018RAIM]_
Neural Networks StageNet pyhealth.models.sequence.stagenet StageNet: Stage-Aware Neural Networks for Health Risk Prediction 2020 [#Gao2020StageNet]_
=================== ================ ======================================== ====================================================================================================== ===== ========================================

Examples of running ML and DL models can be found below, or directly at \\examples\\learning_examples\\

@@ -356,8 +360,12 @@ scripts to generate the customized datasets.
Quick Start for Running Predictive Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"examples/learning_models/lstm_cms_example.py" <https://github.com/yzhao062/pyhealth/blob/master/examples/learning_models/lstm_cms_example.py>`_
demonstrates the basic API of using LSTM for phenotyping prediction. **It is noted that the API across all other algorithms are consistent/similar**.

Before running examples, you need the datasets. Please download from the GitHub repository `"datasets" <https://github.com/yzhao062/PyHealth/tree/master/datasets>`_.
You can either unzip them manually or run our script `"00_extract_data_run_before_learning.py" <https://github.com/yzhao062/pyhealth/blob/master/examples/learning_models/00_extract_data_run_before_learning.py>`_.

`"examples/learning_models/example_sequence_gpu_mortality.py" <https://github.com/yzhao062/pyhealth/blob/master/examples/learning_models/example_sequence_gpu_mortality.py>`_
demonstrates the basic API of using GRU for mortality prediction. **Note that the API of all other algorithms is consistent or similar**.

**If you do not have the preprocessed datasets yet, download the \\datasets folder (cms.zip and mimic.zip) from PyHealth repository, and run \\examples\\learning_models\\extract_data_run_before_learning.py to prepare/unzip the datasets.**
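
For readers who prefer to unzip the archives themselves, the manual step can be sketched with the standard library alone. This is not the repository's extraction script: ``extract_archives`` is a hypothetical helper, and it assumes the downloaded ``cms.zip`` and ``mimic.zip`` sit in the datasets folder and should be extracted in place.

```python
import zipfile
from pathlib import Path


def extract_archives(datasets_dir, names=("cms.zip", "mimic.zip")):
    """Extract each named archive into datasets_dir.

    Hypothetical helper (not the PyHealth script). Returns the names of the
    archives that were found and extracted.
    """
    datasets_dir = Path(datasets_dir)
    extracted = []
    for name in names:
        archive = datasets_dir / name
        if not archive.is_file():
            continue  # archive was not downloaded; skip it
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(datasets_dir)  # assumption: extract next to the archive
        extracted.append(name)
    return extracted
```

``extract_archives("datasets")`` skips any archive that is absent, so it can be rerun after downloading the remaining files.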

@@ -367,10 +375,11 @@ demonstrates the basic API of using LSTM for phenotyping prediction. **It is not
.. code-block:: python
# load pre-processed CMS dataset
from pyhealth.data.expdata_generator import cms as cms_expdata_generator
from pyhealth.data.expdata_generator import sequencedata as expdata_generator
cur_dataset = cms_expdata_generator(exp_id=exp_id, sel_task='phenotyping')
cur_dataset.get_exp_data()
expdata_id = '2020.0810.data.mortality.mimic'
cur_dataset = expdata_generator(exp_id=exp_id)
cur_dataset.get_exp_data(sel_task='mortality', )
cur_dataset.load_exp_data()
@@ -379,8 +388,10 @@ demonstrates the basic API of using LSTM for phenotyping prediction. **It is not
.. code-block:: python
# initialize the model for training
from pyhealth.models.lstm import LSTM
clf = LSTM(exp_id)
from pyhealth.models.sequence.lstm import LSTM
# enable GPU
clf = LSTM(expmodel_id=expmodel_id, n_batchsize=20, use_gpu=True,
n_epoch=100, gpu_ids='0,1')
clf.fit(cur_dataset.train, cur_dataset.valid)
#. Load the best shot of the training, predict on the test datasets
@@ -398,9 +409,9 @@ demonstrates the basic API of using LSTM for phenotyping prediction. **It is not
.. code-block:: python
# evaluate the model
from pyhealth import evaluation
evaluator = evaluation.__dict__['phenotyping']
r = evaluator(pred_results['hat_y'], pred_results['y'])
from pyhealth.evaluation.evaluator import func
r = func(pred_results['hat_y'], pred_results['y'])
print(r)
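
The evaluation step above passes predictions (``hat_y``) and ground-truth labels (``y``) to an evaluator. As an illustration of what such a function might compute for a binary task like mortality, here is a minimal sketch. ``binary_metrics`` is hypothetical and is not PyHealth's ``func``; it assumes ``hat_y`` holds probabilities in [0, 1] and ``y`` holds 0/1 labels.

```python
def binary_metrics(hat_y, y, threshold=0.5):
    """Compute accuracy, precision, and recall for binary predictions.

    Hypothetical stand-in for an evaluator (not PyHealth's implementation).
    hat_y: predicted probabilities; y: 0/1 labels of the same length.
    """
    if len(hat_y) != len(y) or len(y) == 0:
        raise ValueError("hat_y and y must be non-empty and of equal length")

    # Turn probabilities into hard 0/1 predictions.
    pred = [1 if p >= threshold else 0 for p in hat_y]

    # Count the four confusion-matrix cells.
    tp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 1)

    return {
        "accuracy": (tp + tn) / len(y),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

The result is returned as a dictionary so it can be printed directly, mirroring the ``print(r)`` call in the README example.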
@@ -417,7 +428,6 @@ Blueprint & Development Plan
The long-term goal of PyHealth is to become a comprehensive healthcare AI toolkit that supports
not only EHR data but also images and clinical notes.

- The support of image datasets and clinical notes
- The compatibility and the support of OMOP format datasets
- Model persistence (save, load, and portability)
- The release of a benchmark paper with PyHealth
Binary file added datasets/image.zip
Binary file not shown.
26 changes: 16 additions & 10 deletions docs/example.rst
@@ -70,8 +70,11 @@ scripts to generate the customized datasets.
Quick Start for Running Predictive Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`"examples/learning_models/lstm_cms_example.py" <https://github.com/yzhao062/pyhealth/blob/master/examples/learning_models/lstm_cms_example.py>`_
demonstrates the basic API of using LSTM for phenotyping prediction. **It is noted that the API across all other algorithms are consistent/similar**.
Before running examples, you need the datasets. Please download from the GitHub repository `"datasets" <https://github.com/yzhao062/PyHealth/tree/master/datasets>`_.
You can either unzip them manually or run our script `"00_extract_data_run_before_learning.py" <https://github.com/yzhao062/pyhealth/blob/master/examples/learning_models/00_extract_data_run_before_learning.py>`_.

`"examples/learning_models/example_sequence_gpu_mortality.py" <https://github.com/yzhao062/pyhealth/blob/master/examples/learning_models/example_sequence_gpu_mortality.py>`_
demonstrates the basic API of using GRU for mortality prediction. **Note that the API of all other algorithms is consistent or similar**.

**If you do not have the preprocessed datasets yet, download the \\datasets folder (cms.zip and mimic.zip) from PyHealth repository, and run \\examples\\learning_models\\extract_data_run_before_learning.py to prepare/unzip the datasets.**

@@ -81,10 +84,11 @@ demonstrates the basic API of using LSTM for phenotyping prediction. **It is not
.. code-block:: python
# load pre-processed CMS dataset
from pyhealth.data.expdata_generator import cms as cms_expdata_generator
from pyhealth.data.expdata_generator import sequencedata as expdata_generator
cur_dataset = cms_expdata_generator(exp_id=exp_id, sel_task='phenotyping')
cur_dataset.get_exp_data()
expdata_id = '2020.0810.data.mortality.mimic'
cur_dataset = expdata_generator(exp_id=exp_id)
cur_dataset.get_exp_data(sel_task='mortality', )
cur_dataset.load_exp_data()
@@ -93,8 +97,10 @@ demonstrates the basic API of using LSTM for phenotyping prediction. **It is not
.. code-block:: python
# initialize the model for training
from pyhealth.models.lstm import LSTM
clf = LSTM(exp_id)
from pyhealth.models.sequence.lstm import LSTM
# enable GPU
clf = LSTM(expmodel_id=expmodel_id, n_batchsize=20, use_gpu=True,
n_epoch=100, gpu_ids='0,1')
clf.fit(cur_dataset.train, cur_dataset.valid)
#. Load the best shot of the training, predict on the test datasets
@@ -112,7 +118,7 @@ demonstrates the basic API of using LSTM for phenotyping prediction. **It is not
.. code-block:: python
# evaluate the model
from pyhealth import evaluation
evaluator = evaluation.__dict__['phenotyping']
r = evaluator(pred_results['hat_y'], pred_results['y'])
from pyhealth.evaluation.evaluator import func
r = func(pred_results['hat_y'], pred_results['y'])
print(r)