Evaluation API for the MEDS Decentralized Extensible Validation (MEDS-DEV) Benchmark.
Note
This is a work-in-progress package and currently only supports evaluation of binary classification tasks.
The MEDS Evaluation pipeline is intended to be used together with MEDS-DEV, but it can also be adapted for use as a standalone package.
Please refer to the MEDS-DEV tutorial to learn how to extract and prepare the data in the MEDS format and obtain model predictions ready to be evaluated.
Inputs to MEDS Evaluation must follow the prediction schema, which by default has five fields:
subject_id: ID of the subject (patient) associated with the event
prediction_time: time at which the prediction was made
boolean_value: ground truth boolean label for the prediction task
predicted_boolean_value (optional): predicted boolean label generated by the model
predicted_boolean_probability (optional): predicted probability logits generated by the model
This is equivalent to the following polars schema:
Schema(
    [
        ("subject_id", Int64),
        ("prediction_time", Datetime(time_unit="us")),
        ("boolean_value", Boolean),
        ("predicted_boolean_value", Boolean),
        ("predicted_boolean_probability", Float64),
    ]
)
Note that while predicted_boolean_value and predicted_boolean_probability are optional, at least one of them must be present and contain non-null values in order to generate the results. In addition, a schema can contain additional fields, but at the moment these are not used in MEDS Evaluation.