Showing 3 changed files with 131 additions and 0 deletions.
@@ -0,0 +1,69 @@
1. Raw preprocess:
- Replace '|' -> '\t'
  raw_preprocess.py
  inputs: data/raw/*.txt
  outputs: data/preprocessed/*.txt

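The separator replacement in step 1 could look roughly like the sketch below. It is not taken from raw_preprocess.py: the function names `pipe_to_tab` and `preprocess_dir` are hypothetical, and UTF-8 text files are assumed.

```python
from pathlib import Path


def pipe_to_tab(line: str) -> str:
    """Replace every '|' field separator in a line with a tab."""
    return line.replace("|", "\t")


def preprocess_dir(raw_dir: str, out_dir: str) -> None:
    """Convert each *.txt file in raw_dir and write it to out_dir under the same name.

    Assumption: files are UTF-8 text; the real script may differ.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(raw_dir).glob("*.txt")):
        dst = out / src.name
        with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
            for line in fin:
                fout.write(pipe_to_tab(line))
```

Under these assumptions, `preprocess_dir("data/raw", "data/preprocessed")` would produce the inputs of step 2.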
2. Mapping:
- Map medication names -> medication codes
  map_medi_code.py
- Map procedure codes -> procedure blocks
  map_proc_code.py

inputs:
  data/preprocessed/medications.txt
  data/preprocessed/diagnosis_procedures.txt
outputs:
  data/preprocessed/medication_mapped.txt
  data/preprocessed/diag_proc_block_mapped.txt

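Both mappings in step 2 amount to a table lookup on one column. The sketch below illustrates that idea only; the function `map_column`, the column index, and the choice to drop rows without a mapping are assumptions, not the behaviour of map_medi_code.py or map_proc_code.py.

```python
from typing import Dict, List


def map_column(
    rows: List[List[str]],
    col: int,
    mapping: Dict[str, str],
    drop_unmapped: bool = True,
) -> List[List[str]]:
    """Return copies of rows with rows[i][col] replaced by its mapped value.

    Rows whose value has no entry in `mapping` are dropped by default
    (an assumption); with drop_unmapped=False they are kept unchanged.
    """
    mapped = []
    for row in rows:
        code = mapping.get(row[col])
        if code is None:
            if drop_unmapped:
                continue
            code = row[col]
        mapped.append(row[:col] + [code] + row[col + 1:])
    return mapped
```

The same helper would serve for medication names -> medication codes and for procedure codes -> procedure blocks, given the corresponding lookup table.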
3. Cut off codes:
- Cut off levels of diagnosis codes: keep the first 2 characters
- Cut off levels of medication codes: keep the first 6 characters
  cut_off_code.py

inputs:
  data/preprocessed/medication_mapped.txt
  data/preprocessed/diag_proc_block_mapped.txt

outputs:
  data/preprocessed/medication_mapped_cutoff.txt
  data/preprocessed/diag_proc_block_mapped_cutoff.txt

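The cut-off itself is a prefix truncation. A minimal sketch, assuming codes are plain strings; the constant and function names are hypothetical and the example codes in the usage note are made up for illustration.

```python
DIAG_PREFIX_LEN = 2  # diagnosis codes: keep the first 2 characters
MEDI_PREFIX_LEN = 6  # medication codes: keep the first 6 characters


def cut_off(code: str, keep: int) -> str:
    """Keep only the first `keep` characters of a code (shorter codes are unchanged)."""
    return code.strip()[:keep]
```

For instance, `cut_off("E1165", DIAG_PREFIX_LEN)` gives `"E1"`, so all codes sharing that prefix collapse into one coarser category.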
4. Filter admission:
- Filter out all unusual admissions (admissions starting with one of the following characters: R, D, L, M, Q, S, Y)
- Filter out all dialysis admissions (not Emergency)
  filter_adm.py

inputs:
  data/preprocessed/admissions.txt
  data/preprocessed/diag_proc_block_mapped_cutoff.txt
outputs:
  data/preprocessed/admissions_filtered.txt
  data/preprocessed/diag_proc_filtered.txt

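The two admission filters can be expressed as one predicate. This is a sketch under assumptions: the notes do not say which field carries the leading character, so `adm_code` is a hypothetical name, as is the `is_dialysis` flag; the dialysis rule is read here as "drop dialysis admissions whose method is not Emergency".

```python
UNUSUAL_PREFIXES = ("R", "D", "L", "M", "Q", "S", "Y")


def keep_admission(adm_code: str, is_dialysis: bool, method: str) -> bool:
    """Return True if the admission survives both filters.

    adm_code and is_dialysis are hypothetical field names; method is the
    admission method, compared against the literal "Emergency".
    """
    if adm_code.startswith(UNUSUAL_PREFIXES):
        return False  # unusual admission
    if is_dialysis and method != "Emergency":
        return False  # non-emergency dialysis admission
    return True
```

Rows of the diag_proc file that belong to a dropped admission would then be removed as well, which is presumably how diag_proc_filtered.txt is obtained.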
5. Filter & cut off attendance:
- Filter out all attendances with missing information
- Cut off levels of diagnosis code: keep the first 2 characters
  filter_cutoff_atd.py

inputs:
  data/preprocessed/attendances.txt
outputs:
  data/preprocessed/atd_filtered.txt

6. Filter patients: remove all duplicated information
  filter_patients.py
  input:
    data/preprocessed/patients.txt
  output:
    data/preprocessed/patnts_filtered.txt

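One possible reading of "remove all duplicated information" is keeping a single row per patient identifier. The sketch below implements that reading only; the helper name, the key column, and the keep-first policy are assumptions rather than the logic of filter_patients.py.

```python
from typing import List


def drop_duplicates(rows: List[List[str]], key_col: int = 0) -> List[List[str]]:
    """Keep the first row seen for each value in column key_col (assumed patient id)."""
    seen = set()
    unique = []
    for row in rows:
        key = row[key_col]
        if key in seen:
            continue  # a row for this patient was already kept
        seen.add(key)
        unique.append(row)
    return unique
```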
---> Files for creating the dataset:
  patnts_filtered.txt
  admissions_filtered.txt
  diag_proc_filtered.txt
  medication_mapped_cutoff.txt
  atd_filtered.txt
@@ -0,0 +1,43 @@
Files for creating datasets:
  patnts_filtered.txt
  admissions_filtered.txt
  diag_proc_filtered.txt
  medication_mapped_cutoff.txt
  atd_filtered.txt

I. CREATE ADM DATASET AND ATD DATASET
Brief description:
- From admissions_filtered.txt & diag_proc_filtered.txt: create adm_dataset
- From atd_filtered.txt: create atd_dataset
- Dump the 2 datasets into 2 separate pkl files: adm.pkl & atd.pkl

Steps:
- Create 3 dictionaries:
  + diag_dict (encodes diagnoses in diag_proc and attendances)
  + proc_dict (encodes procedures in diag_proc)
  + medi_dict (encodes medications in medi)
- Create 2 dictionaries: prvsp_dict & prcae_dict (medi uses prvsp_refno & diag_proc uses prcae_refno)
  These two dicts are used for mapping diag, proc, medi into their admissions
- Map diag, proc, medi into their admissions
  After this step, adm_dataset contains the information of all admissions.
  Each admission has patnt_refno, admit_time, disch_time, method, a list of its diag and a list of its proc & medi
- Create atd_dataset:
  Encode the diagnosis of each attendance, then create atd_dataset with UR, arr_time, dep_time & code (the diagnosis code)

Script: combine_data.py

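The encoding and grouping steps above can be sketched as follows. This is an illustration, not the content of combine_data.py: the helper names are hypothetical, codes are encoded as consecutive integers in order of first appearance (an assumption), and records are assumed to arrive as `(refno, code)` pairs.

```python
import pickle
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def build_code_dict(codes: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct code an integer index, in order of first appearance."""
    code_dict: Dict[str, int] = {}
    for code in codes:
        if code not in code_dict:
            code_dict[code] = len(code_dict)
    return code_dict


def group_by_refno(
    records: Iterable[Tuple[Hashable, str]], code_dict: Dict[str, int]
) -> Dict[Hashable, List[int]]:
    """Collect the encoded codes of each admission, keyed by its reference number
    (prcae_refno for diag/proc, prvsp_refno for medi)."""
    grouped = defaultdict(list)
    for refno, code in records:
        grouped[refno].append(code_dict[code])
    return dict(grouped)


def dump_dataset(dataset: object, path: str) -> None:
    """Serialize a dataset to a pkl file, e.g. adm.pkl or atd.pkl."""
    with open(path, "wb") as f:
        pickle.dump(dataset, f)
```

Each admission record would then combine its own fields (patnt_refno, admit_time, disch_time, method) with the lists looked up in these grouped dictionaries.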
II. CREATE PATIENT DATASET
Steps:
- Create 2 dictionaries:
  + patnt_dict (admissions use patnt_refno to identify the patients)
  + ur_dict (attendances use ur to identify the patients)

- Map admissions into their patients (use patnt_dict):
  This step creates, for each patient, a list of his/her admissions (list_adm)

- Map attendances into their patients (use ur_dict):
  This step creates, for each patient, a list of his/her attendances (list_atd)

- Dump these 2 lists (list_adm & list_atd) to the file patnt.pkl

Script: create_patnt_records.py
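
The patient-level mapping above might be sketched like this. It assumes each patient row provides both a patnt_refno and a UR, and that admissions and attendances are dicts with `patnt_refno` and `ur` keys; these shapes and the function name are hypothetical, not read from create_patnt_records.py.

```python
from typing import Dict, List, Tuple


def build_patient_records(
    patients: List[Tuple[str, str]],
    admissions: List[Dict[str, str]],
    attendances: List[Dict[str, str]],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (list_adm, list_atd): per patient, the indices of their admissions
    and attendances. Records with an unknown patient key are skipped."""
    patnt_dict = {refno: i for i, (refno, _) in enumerate(patients)}
    ur_dict = {ur: i for i, (_, ur) in enumerate(patients)}

    list_adm: List[List[int]] = [[] for _ in patients]
    list_atd: List[List[int]] = [[] for _ in patients]

    for adm_idx, adm in enumerate(admissions):
        p = patnt_dict.get(adm["patnt_refno"])
        if p is not None:
            list_adm[p].append(adm_idx)

    for atd_idx, atd in enumerate(attendances):
        p = ur_dict.get(atd["ur"])
        if p is not None:
            list_atd[p].append(atd_idx)

    return list_adm, list_atd
```

Storing indices into adm_dataset and atd_dataset rather than copies keeps patnt.pkl small; whether the original script stores indices or full records is not stated in the notes.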
@@ -0,0 +1,19 @@
- Input: 'patnt.pkl', 'adm.pkl'
  patnt.pkl: contains all information of all patients' admissions.
  Each row is the information of one patient, with a list of admissions and a list of attendances.

- Randomly create datasets; each dataset includes train, validation, test sets and an adm dataset

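A random patient-level split could be sketched as below. The split percentages, the seed handling, and the function name are assumptions; the notes do not state the proportions or how many datasets are drawn.

```python
import random
from typing import List, Tuple


def split_patients(
    n_patients: int, seed: int, train_pct: int = 70, valid_pct: int = 10
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle patient indices and cut them into train / validation / test.

    Percentages are illustrative defaults, not values from the notes.
    """
    idx = list(range(n_patients))
    random.Random(seed).shuffle(idx)  # seeded, so each dataset is reproducible
    n_train = n_patients * train_pct // 100
    n_valid = n_patients * valid_pct // 100
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return train, valid, test
```

Calling it with different seeds would give several independent random datasets over the same patient records.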
- Readmission prediction:
  Each data point is a sequence of a patient's admissions. Randomly choose an admission A and cut off all later ones.
  Check whether there is any emergency admission within 1 year (for diabetes) or 3 months (for mental health) after admission A. If there is, the label is 1, otherwise 0.

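The labelling rule can be sketched as follows. Several details are assumptions: the window is measured from the admit time of admission A, 1 year is taken as 365 days and 3 months as 90 days, admissions are `(admit_time, method)` tuples sorted chronologically, and the cohort keys are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed window lengths: 1 year ~ 365 days, 3 months ~ 90 days.
HORIZON = {
    "diabetes": timedelta(days=365),
    "mental_health": timedelta(days=90),
}


def readmission_label(
    admissions: List[Tuple[datetime, str]], cut_idx: int, cohort: str
) -> int:
    """Return 1 if any later admission is an Emergency one inside the window
    that starts at admission `cut_idx`, else 0."""
    deadline = admissions[cut_idx][0] + HORIZON[cohort]
    for admit_time, method in admissions[cut_idx + 1:]:
        if admit_time <= deadline and method == "Emergency":
            return 1
    return 0
```

The model input would then be `admissions[:cut_idx + 1]`, with `cut_idx` drawn at random, for example via `random.randrange(len(admissions))`.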
- Next diagnosis prediction:
  Sequence mapping: from a sequence of admissions -> sequence of outputs, each output is a set of next diagnoses.

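One way to read the sequence mapping is that the target at position i is the set of diagnoses of admission i+1. The sketch below follows that reading; the function name and the representation of each admission as a list of encoded diagnosis codes are assumptions.

```python
from typing import List, Set, Tuple


def next_diagnosis_targets(
    diag_sequence: List[List[int]],
) -> Tuple[List[List[int]], List[Set[int]]]:
    """Pair each admission (except the last) with the diagnosis set of the next one.

    diag_sequence[i] holds the encoded diagnosis codes of admission i.
    """
    inputs = diag_sequence[:-1]
    targets = [set(codes) for codes in diag_sequence[1:]]
    return inputs, targets
```

A patient with a single admission yields no training pairs under this reading.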
- High risk prediction:
  Same as readmission prediction. Output is 1 if the patient has at least 3 emergency readmissions within 1 year (for diabetes) or 3 months (for mental health) after discharge.

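The high-risk label differs from the readmission label only in counting emergencies and in measuring the window from discharge. A sketch, assuming admissions are `(admit_time, disch_time, method)` tuples in chronological order and that the caller passes the window as a `timedelta` (for example 365 or 90 days, which are approximations of 1 year and 3 months); the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def high_risk_label(
    admissions: List[Tuple[datetime, datetime, str]],
    cut_idx: int,
    horizon: timedelta,
    min_emergencies: int = 3,
) -> int:
    """Return 1 if at least `min_emergencies` Emergency admissions start within
    `horizon` after the discharge of admission `cut_idx`, else 0."""
    deadline = admissions[cut_idx][1] + horizon
    count = sum(
        1
        for admit_time, _disch_time, method in admissions[cut_idx + 1:]
        if admit_time <= deadline and method == "Emergency"
    )
    return 1 if count >= min_emergencies else 0
```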
- Current interventions:
  Same as next diagnosis prediction