Using xgboost with `crankcompositor` #363

fa1999abdi · 2024-02-06T09:13:57Z

Hello
I'm doing an ML survival study using the {mlr3proba} package with three learners: "surv.rfsrc", "surv.xgboost" and "surv.penalized". I want to predict the survival time for each individual and compare my three learners (using RMSE and C-index as criteria). Could you please explain how I can use {mlr3pipelines} with distrcompositor and crankcompositor to do that?
Here is my code:

library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
library(mlr3extralearners)

# create a task from the data in tb
tsk_s <- as_task_surv(tb, time = "time_to_death", event = "status", type = "right")

# impute missing values
po_impute = po("imputehist")

# new task
new_task = po_impute$train(list(tsk_s))[[1]]

# benchmark
srfs = lrn("surv.rfsrc", predict_type = "crank", importance = "permute")
sbboost = lrn("surv.xgboost", predict_type = "crank")
spe = lrn("surv.penalized", lambda1 = 485.86, predict_type = "crank")
learners = list(srfs, sbboost, spe)
resample = rsmp("cv", folds = 3)
design = benchmark_grid(new_task, learners, resample)
bm = benchmark(design)
msr_txt = c("surv.cindex", "surv.rmse")
measures = msrs(msr_txt)
bm$aggregate(measures)[, c("learner_id", "task_id", ..msr_txt)]

Created on 2024-02-06 with reprex v2.1.0

bblodfon · 2024-02-06T22:05:03Z

Hi, please consult the crankcompose docs. Practically you will do something like:

task = tsk("rats")
pipe = po("imputehist") %>>%
  ppl("crankcompositor",
    learner = lrn("surv.coxph"), # example learner only; substitute your own
    response = TRUE, method = "sum_haz")
pipe$train(task)
p = pipe$predict(task)[[1]] # p will have a response (survival time) now
p$score(msr("surv.rmse")) # or any other measure(s) of your choice

But note that in survival analysis there are known issues with composing a response from a distr prediction via different methods, and surv.rmse is rarely used, if at all. It is more common to evaluate the whole distr with measures such as the Integrated Survival Brier Score, i.e. surv.graf, or surv.rcll, etc.
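As a minimal sketch of scoring the whole distr, the snippet below trains a learner that returns a distr prediction out of the box and scores it with both measures. The choice of "surv.coxph" and of the "rats" task is only an assumption for illustration; any learner with a distr prediction type should work the same way.

```r
library(mlr3)
library(mlr3proba)

task = tsk("rats")

# Cox PH is used here only as a convenient example of a learner
# that predicts a survival distribution (distr) directly
learner = lrn("surv.coxph")
learner$train(task)
p = learner$predict(task)

# both measures evaluate the predicted distr, no response needed
scores = p$score(msrs(c("surv.graf", "surv.rcll")))
print(scores)
```

Lower values are better for both measures.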

bblodfon · 2024-02-08T13:16:32Z

@fa1999abdi question covered?

fa1999abdi · 2024-02-08T15:07:13Z

@bblodfon, thank you so much for your response.
But I should use distrcompositor with lrn("surv.xgboost") to predict the survival time for each individual, is that correct?

bblodfon · 2024-02-08T17:54:17Z

You should use distrcompositor with xgboost and estimator = "breslow" for the Cox objective, see #263. This will give you a distr prediction type. If you really want a response, crankcompositor works, but note that there are issues with improper distributions, and that taking the mean or median will not work as expected (there is good reasoning behind that).
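For reference, this is roughly what a Breslow-type estimator does with the linear predictor of a Cox-type model. It is the standard textbook form, in my own notation rather than anything taken from the mlr3proba documentation, so treat it as a sketch:

```latex
\hat{H}_0(t) = \sum_{i:\, t_i \le t} \frac{d_i}{\sum_{j \in R(t_i)} \exp(\hat{\eta}_j)},
\qquad
\hat{S}(t \mid x) = \exp\!\left\{ -\hat{H}_0(t)\, \exp\big(\hat{\eta}(x)\big) \right\}
```

Here $t_i$ are the distinct event times in the training data, $d_i$ is the number of events at $t_i$, $R(t_i)$ is the set of observations still at risk at $t_i$, and $\hat{\eta}$ is the linear predictor (lp) returned by the learner. The baseline cumulative hazard is estimated once on the training set, and each individual's lp then rescales it into a full survival curve, which is the distr.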

bblodfon · 2024-02-09T07:52:17Z

@fa1999abdi I am soon going to split the xgboost objectives/learners (Cox vs AFT are very different). For Cox, the distr predictions will be generated by default using the Breslow estimator to streamline things (so no distr composition will be required for the XGBoost-Cox learner). Of course, a response prediction will not be included; you will still need to compose that with the crankcompositor.

fa1999abdi · 2024-02-10T08:23:22Z

But it didn't work:

tsk_s <- as_task_surv(tb, time = "time_to_death", event = "status", type = "right")
pipe = po("imputehist") %>>%
  ppl("crankcompositor", learner = lrn("surv.xgboost"), response = TRUE, method = "sum_haz")
pipe$train(tsk_s)
#> $compose_crank.output
#> NULL
p = pipe$predict(tsk_s)[[1]] # p will have a response (survival time) now
#> Error: Assertion on 'distr' failed: FALSE.
#> This happened PipeOp compose_crank's $predict()

bblodfon · 2024-02-10T08:59:51Z

Yes, you need to estimate the distr either way (crankcompositor converts a distr to crank/response). It looks a bit complex now, but the following works:

library(mlr3proba)
#> Loading required package: mlr3
library(mlr3pipelines)
library(mlr3extralearners)

task = tsk("rats")

learner =
  po("encode", method = "treatment") %>>%
  ppl("crankcompositor",
    # crank needs a distr prediction type, xgboost doesn't have one, so we have to estimate it:
    learner = ppl("distrcompositor", learner = lrn("surv.xgboost", nrounds = 10),
                   estimator = "breslow", overwrite = FALSE),
    response = TRUE, method = "sum_haz", overwrite = FALSE) |>
  as_learner()

learner$train(task)
p = learner$predict(task)
p
#> <PredictionSurv> for 300 observations:
#>     row_ids time status      crank         lp response     distr
#>           1  101  FALSE -0.5318943 -0.5318943 3.987942 <list[1]>
#>           2   49   TRUE -0.9984229 -0.9984229 2.501140 <list[1]>
#>           3  104  FALSE -0.9984229 -0.9984229 2.501140 <list[1]>
#> ---                                                             
#>         298   92  FALSE -1.0661759 -1.0661759 2.337293 <list[1]>
#>         299  104  FALSE -0.8688244 -0.8688244 2.847226 <list[1]>
#>         300  102  FALSE -0.8688244 -0.8688244 2.847226 <list[1]>

p$score(msr("surv.cindex")) # uses crank prediction type (identical to lp here)
#> surv.cindex 
#>   0.8984875
p$score(msr("surv.rmse")) # uses response prediction type
#> surv.rmse 
#>  61.24336
p$score(msr("surv.brier")) # uses distr prediction type
#>  surv.graf 
#> 0.03333211

Created on 2024-02-10 with reprex v2.0.2

fa1999abdi · 2024-02-11T07:20:15Z

@bblodfon thanks so much for your help.

bblodfon · 2024-02-11T07:38:37Z

FYI, even though you can do the above and get a response (survival time), Haider's paper (which introduced the D-calibration score) explains why converting a distr to a single-value response is not good practice for survival modeling.

bblodfon added the Type: Question label Feb 9, 2024

bblodfon changed the title ~~Question~~ Using xgboost with crankcompositor/distrcompositor [Question] Feb 9, 2024

bblodfon changed the title ~~Using xgboost with crankcompositor/distrcompositor [Question]~~ Using xgboost with crankcompositor [Question] Feb 9, 2024

bblodfon changed the title ~~Using xgboost with crankcompositor [Question]~~ Using xgboost with crankcompositor Feb 9, 2024

bblodfon closed this as completed Feb 11, 2024

mlr-org locked and limited conversation to collaborators Feb 11, 2024

bblodfon converted this issue into discussion #366 Feb 11, 2024
