
The output score of an individual-level feature #568

Open
JWKKWJ123 opened this issue Aug 19, 2024 · 4 comments

Comments

@JWKKWJ123 commented Aug 19, 2024

Hi all,
I would like to ask: when I use ebm.eval_terms(...) to get local explanations or ebm.term_importances(...) to get global explanations, what exactly is the output score of an individual-level feature k in a classification task? Is it exactly the output of the shape function fk(.), or a normalized number?
If the feature importance score is a normalized number, is it the output of a sigmoid function, like this:
[attached image: a sigmoid formula]

@paulbkoch (Collaborator) commented

Hi @JWKKWJ123 -- eval_terms finds the vertical value on the shape plots for the feature values in X. For EBMs, that value is equal to the local explanations.

term_importances are not normalized, and no sigmoid is applied. term_importances can be calculated with eval_terms. If you take the mean of the absolute values returned from eval_terms, it will be equal to the term importances.
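The relationship Paul describes can be sketched with plain Python. The matrix below stands in for the output of `ebm.eval_terms(X)` (rows are samples, columns are terms, values in logits); the numbers are made up purely for illustration, and the real interpret calls are shown only in comments.

```python
from statistics import mean

# Sketch of the relationship between eval_terms and term_importances,
# using made-up numbers. With interpret installed, the real calls would be:
#   local_scores = ebm.eval_terms(X)        # shape (n_samples, n_terms), in logits
#   importances  = ebm.term_importances()   # one value per term
# Rows are samples, columns are terms (features), as in eval_terms output.
local_scores = [
    [ 0.8, -0.2,  1.1],
    [-0.4,  0.6, -0.9],
    [ 0.1, -0.5,  0.3],
]

# Term importance = mean of the absolute per-sample contributions.
# No sigmoid and no other normalization is applied.
term_importances = [
    mean(abs(row[k]) for row in local_scores)
    for k in range(len(local_scores[0]))
]
print(term_importances)
```

Note that because no sigmoid is applied, the importances stay in logit units, which is what makes them comparable across classification datasets (as discussed below in the thread).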

@JWKKWJ123 commented Aug 20, 2024


Hi Paul, thank you very much for your reply!
Indeed, I took the mean absolute value of the local explanations and it gave the same result as the output of term_importances.
I have another question: if I want to compare feature importances among EBMs trained on the same kinds of features, do I need to normalize? If so, what kind of normalization do you recommend (L1/L2/sigmoid/min-max/...)?

@paulbkoch (Collaborator) commented

For classification, the scores are in logits and should be comparable across datasets. For regression, the scores are in the units being predicted, so you'll probably want to normalize across datasets. There isn't a generally agreed upon normalization for regression. I use the interquartile range in our benchmarks.
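One way to apply the interquartile-range idea is to divide a regression model's term importances by the IQR of its training target, so importances from models predicting differently scaled targets land on a comparable scale. This exact recipe is an assumption for illustration (Paul only states that IQR is used in the interpret benchmarks), and the target values and importance numbers below are made up.

```python
from statistics import quantiles

# Hypothetical sketch: normalize regression term importances by the
# interquartile range (IQR) of the target. The target values and the
# raw importance numbers are invented for illustration.
y = [10.0, 12.5, 11.0, 30.0, 15.5, 14.0, 13.0, 18.0]

q1, _, q3 = quantiles(y, n=4)  # quartiles of the target
iqr = q3 - q1

# Raw importances are in target units, so dividing by the target's IQR
# makes them unitless and comparable across datasets.
raw_importances = {"age": 2.0, "bmi": 1.0}
normalized = {term: imp / iqr for term, imp in raw_importances.items()}
print(normalized)
```

For classification no such step is needed, since logits are already unitless.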

@JWKKWJ123 commented

Hi Paul,
Thank you for your reply!
