
The output score of an individual-level feature #568

Open
JWKKWJ123 opened this issue Aug 19, 2024 · 4 comments

Comments

@JWKKWJ123 commented Aug 19, 2024

Hi all,
I would like to ask: when I use ebm.eval_terms(...) to get local explanations or ebm.term_importances(...) to get global explanations, what exactly is the output score of an individual-level feature k in a classification task? Is it exactly the output of the shape function fk(.), or a normalized number?
If the feature importance score is a normalized number, is it the output of a sigmoid function, like this:
[attached image: a sigmoid formula]

@paulbkoch (Collaborator) commented

Hi @JWKKWJ123 -- eval_terms finds the vertical value on the shape plots for the feature values in X. For EBMs, that value is equal to the local explanations.

term_importances are not normalized, and no sigmoid is applied. term_importances can be calculated with eval_terms. If you take the mean of the absolute values returned from eval_terms, it will be equal to the term importances.
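The relationship Paul describes can be sketched with plain Python. The matrix below stands in for the output of `ebm.eval_terms(X)` (rows are samples, columns are terms, values in logits); the numbers are made up purely for illustration, and the real interpret calls are shown only in comments.

```python
from statistics import mean

# Sketch of the relationship between eval_terms and term_importances,
# using made-up numbers. With interpret installed, the real calls would be:
#   local_scores = ebm.eval_terms(X)        # shape (n_samples, n_terms), in logits
#   importances  = ebm.term_importances()   # one value per term
# Rows are samples, columns are terms (features), as in eval_terms output.
local_scores = [
    [ 0.8, -0.2,  1.1],
    [-0.4,  0.6, -0.9],
    [ 0.1, -0.5,  0.3],
]

# Term importance = mean of the absolute per-sample contributions.
# No sigmoid and no other normalization is applied.
term_importances = [
    mean(abs(row[k]) for row in local_scores)
    for k in range(len(local_scores[0]))
]
print(term_importances)
```

Note that because no sigmoid is applied, the importances stay in logit units, which is what makes them comparable across classification datasets (as discussed below in the thread).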

@JWKKWJ123 commented Aug 20, 2024


Hi Paul, thank you very much for your reply!
Indeed, I took the mean absolute value of the local explanations and it gave the same result as the output of term_importances.
I have another question: if I want to compare feature importances among EBMs trained on the same kinds of features, do I need to normalize? If so, what kind of normalization do you recommend (L1/L2/sigmoid/min-max/...)?

@paulbkoch (Collaborator) commented

For classification, the scores are in logits and should be comparable across datasets. For regression, the scores are in the units being predicted, so you'll probably want to normalize across datasets. There isn't a generally agreed upon normalization for regression. I use the interquartile range in our benchmarks.
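One way to apply the interquartile-range idea is to divide a regression model's term importances by the IQR of its training target, so importances from models predicting differently scaled targets land on a comparable scale. This exact recipe is an assumption for illustration (Paul only states that IQR is used in the interpret benchmarks), and the target values and importance numbers below are made up.

```python
from statistics import quantiles

# Hypothetical sketch: normalize regression term importances by the
# interquartile range (IQR) of the target. The target values and the
# raw importance numbers are invented for illustration.
y = [10.0, 12.5, 11.0, 30.0, 15.5, 14.0, 13.0, 18.0]

q1, _, q3 = quantiles(y, n=4)  # quartiles of the target
iqr = q3 - q1

# Raw importances are in target units, so dividing by the target's IQR
# makes them unitless and comparable across datasets.
raw_importances = {"age": 2.0, "bmi": 1.0}
normalized = {term: imp / iqr for term, imp in raw_importances.items()}
print(normalized)
```

For classification no such step is needed, since logits are already unitless.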

@JWKKWJ123 commented

Hi Paul,
Thank you for your reply!
