Macro+Binary F1/Recall/Precision + improved typing
jacobgil committed Jun 11, 2023
1 parent ed3081a commit 62d080c
Showing 9 changed files with 869 additions and 268 deletions.
85 changes: 60 additions & 25 deletions README.md

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![Build Status](https://github.com/jacobgil/confidenceinterval/workflows/Tests/badge.svg)
[![Downloads](https://static.pepy.tech/personalized-badge/confidenceinterval?period=month&units=international_system&left_color=black&right_color=brightgreen&left_text=Monthly%20Downloads)](https://pepy.tech/project/confidenceinterval)
[![Downloads](https://static.pepy.tech/personalized-badge/confidenceinterval?period=total&units=international_system&left_color=black&right_color=blue&left_text=Total%20Downloads)](https://pepy.tech/project/confidenceinterval)

`pip install confidenceinterval`

This is a package that computes common machine learning metrics like F1, and returns confidence intervals for them.

⭐ Support for many metrics, with modern confidence interval methods.

⭐ The only package with analytical computation of the CI for Macro/Micro/Binary-averaged F1, Precision and Recall.

⭐ Support for both analytical computation of the confidence intervals, and bootstrapping methods.

⭐ Easy-to-use interface to compute confidence intervals on new metrics that don't appear here, with bootstrapping.
Part of this is because there were no simple-to-use Python packages for this.
## Getting started

```python
# All the possible imports:
from confidenceinterval import roc_auc_score
from confidenceinterval import precision_score, recall_score, f1_score
from confidenceinterval import (accuracy_score,
                                ppv_score,
                                npv_score,
                                tpr_score,
                                fpr_score,
                                tnr_score)
from confidenceinterval.bootstrap import bootstrap_ci

# Example usage:
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95)
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca')
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_percentile', n_resamples=5000)
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca', n_resamples=5000)
```

## All methods do an analytical computation by default, but can do bootstrapping instead
```python
random_state = np.random.default_rng()
n_resamples = 9999
```
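For example, the bootstrap parameters can be passed straight to any of the metric functions. This is a sketch, assuming the bootstrap keyword arguments (`n_resamples`, `random_state`) are forwarded to the metric call as in the examples above; the data is made up for illustration:

```python
import numpy as np
from confidenceinterval import f1_score

# Hypothetical labels and predictions, for illustration only.
y_true = [0, 1, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Request a bootstrap CI instead of the analytical one, with an explicit
# resample count and a reproducible random generator.
rng = np.random.default_rng(seed=42)
f1, ci = f1_score(y_true, y_pred,
                  confidence_level=0.95,
                  average='binary',
                  method='bootstrap_bca',
                  n_resamples=9999,
                  random_state=rng)
```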


## Support for binary, macro and micro averaging for F1, Precision and Recall
```python
from confidenceinterval import precision_score, recall_score, f1_score
binary_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='binary')
macro_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='macro')
micro_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='micro')
bootstrap_binary_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='binary', method='bootstrap_bca', n_resamples=5000)

```
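`precision_score` and `recall_score` follow the same pattern. A sketch, assuming they accept the same keyword arguments as `f1_score` above:

```python
from confidenceinterval import precision_score, recall_score

# Macro-averaged precision and recall, each with an analytical CI.
macro_precision, ci = precision_score(y_true, y_pred, confidence_level=0.95, average='macro')
macro_recall, ci = recall_score(y_true, y_pred, confidence_level=0.95, average='macro')
```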

These methods accept average='binary', average='micro' or average='macro'.

The analytical computation here follows the (amazing) 2022 paper of Takahashi et al. (reference below).

The paper derives the recall and precision confidence intervals only for micro averaging. We derive the confidence intervals for macro-averaged recall and precision as well, using the delta method.
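For reference, this is the standard first-order delta method: if $\hat{\theta}$ is an estimator with covariance matrix $\Sigma$, and $g$ is a differentiable function of it (here, a macro-averaged metric viewed as a function of the confusion-matrix cell probabilities), then

$$\operatorname{Var}\big(g(\hat{\theta})\big) \approx \nabla g(\hat{\theta})^{\top}\, \Sigma\, \nabla g(\hat{\theta}),$$

which yields the approximate interval $g(\hat{\theta}) \pm z_{1-\alpha/2}\sqrt{\operatorname{Var}\big(g(\hat{\theta})\big)}$.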


## ROC AUC
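A usage sketch, consistent with the Getting started examples above and with the `'delong'` default visible in `confidenceinterval/auc.py` further down:

```python
from confidenceinterval import roc_auc_score

# DeLong's analytical method is the default; bootstrap methods are also accepted.
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='delong')
```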
## Accuracy, PPV, NPV, TPR, FPR, TNR

```python
from confidenceinterval import (accuracy_score,
                                ppv_score,
                                npv_score,
                                tpr_score,
                                fpr_score,
                                tnr_score)

# Wilson is used by default:
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='wilson')
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='jeffreys')
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='agresti_coull')
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca')
```
By default method='wilson', the Wilson score interval, which behaves better for smaller datasets.

method can be one of ['wilson', 'normal', 'agresti_coull', 'beta', 'jeffreys', 'binom_test'], or one of the bootstrap methods.
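These method names mirror the binomial proportion intervals in `statsmodels.stats.proportion`. A minimal sketch of a Wilson interval computed that way, assuming the metric reduces to a proportion of a numerator count over a denominator count (the counts here are made up):

```python
from statsmodels.stats.proportion import proportion_confint

# E.g. PPV = TP / (TP + FP): a binomial proportion of `count` successes out of `nobs` trials.
count, nobs = 45, 60
lower, upper = proportion_confint(count, nobs, alpha=1 - 0.95, method='wilson')
print(count / nobs, (lower, upper))
```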

## Get a confidence interval for any custom metric with bootstrapping
With the bootstrap_ci method, you can get the CI for any metric function that takes y_true and y_pred as arguments.

As an example, let's get the CI for the balanced accuracy metric from scikit-learn.

```python
import numpy as np
import sklearn.metrics

from confidenceinterval.bootstrap import bootstrap_ci

# You can specify a random generator for reproducibility, or pass None.
random_generator = np.random.default_rng()
bootstrap_ci(y_true=y_true,
             y_pred=y_pred,
             metric=sklearn.metrics.balanced_accuracy_score,
             confidence_level=0.95,
             n_resamples=9999,
             method='bootstrap_bca',
             random_state=random_generator)
```



----------

## Citation

If you use this for research, please cite it. Here is an example BibTeX entry:

```bibtex
@misc{jacobgildenblatconfidenceinterval,
  title={A python library for confidence intervals},
  author={Jacob Gildenblat},
  year={2023},
  publisher={GitHub},
  howpublished={\url{https://github.com/jacobgil/confidenceinterval}},
}
```

----------

## References
4 changes: 2 additions & 2 deletions confidenceinterval/auc.py

def roc_auc_score_bootstrap(y_true: List,
                            y_pred: List,
                            confidence_level: float = 0.95,
                            method: str = 'bootstrap_bca',
                            n_resamples: int = 9999,
                            random_state: Callable = None) -> Tuple[float, float]:

def roc_auc_score(y_true: List,
                  y_pred: List,
                  confidence_level: float = 0.95,
                  method: str = 'delong',
                  *args, **kwargs) -> Tuple[float, float]:
    assert method in [
