Macro+Binary F1/Recall/Precision + improved typing
jacobgil committed Jun 11, 2023
1 parent ed3081a commit 62d080c
Showing 9 changed files with 869 additions and 268 deletions.
85 changes: 60 additions & 25 deletions README.md

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![Build Status](https://github.com/jacobgil/confidenceinterval/workflows/Tests/badge.svg)
[![Downloads](https://static.pepy.tech/personalized-badge/confidenceinterval?period=month&units=international_system&left_color=black&right_color=brightgreen&left_text=Monthly%20Downloads)](https://pepy.tech/project/confidenceinterval)
[![Downloads](https://static.pepy.tech/personalized-badge/confidenceinterval?period=total&units=international_system&left_color=black&right_color=blue&left_text=Total%20Downloads)](https://pepy.tech/project/confidenceinterval)

`pip install confidenceinterval`

This is a package that computes common machine learning metrics like F1, and returns confidence intervals for them.

⭐ Support for many metrics, with modern confidence interval methods.

⭐ The only package with analytical computation of the CI for Macro/Micro/Binary-averaged F1, Precision and Recall.

⭐ Support for both analytical computation of the confidence intervals, and bootstrapping methods.

⭐ Easy-to-use interface to compute confidence intervals on new metrics that don't appear here, with bootstrapping.
Part of this is because there were no simple-to-use Python packages for this.
## Getting started

```python
# All the possible imports:
from confidenceinterval import roc_auc_score
from confidenceinterval import precision_score, recall_score, f1_score
from confidenceinterval import (accuracy_score,
                                ppv_score,
                                npv_score,
                                tpr_score,
                                fpr_score,
                                tnr_score)
from confidenceinterval.bootstrap import bootstrap_ci

# Example usage:
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95)
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca')
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_percentile', n_resamples=5000)
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca', n_resamples=5000)
```

## All methods do an analytical computation by default, but can do bootstrapping instead
```python
random_state = np.random.default_rng()
n_resamples = 9999
```
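For example, the bootstrap parameters can be passed straight to any of the metric functions. This is a sketch, assuming the bootstrap keyword arguments (`n_resamples`, `random_state`) are forwarded to the metric call as in the examples above; the data is made up for illustration:

```python
import numpy as np
from confidenceinterval import f1_score

# Hypothetical labels and predictions, for illustration only.
y_true = [0, 1, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Request a bootstrap CI instead of the analytical one, with an explicit
# resample count and a reproducible random generator.
rng = np.random.default_rng(seed=42)
f1, ci = f1_score(y_true, y_pred,
                  confidence_level=0.95,
                  average='binary',
                  method='bootstrap_bca',
                  n_resamples=9999,
                  random_state=rng)
```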


## Support for binary, macro and micro averaging for F1, Precision and Recall
```python
from confidenceinterval import precision_score, recall_score, f1_score
binary_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='binary')
macro_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='macro')
micro_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='micro')
bootstrap_binary_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='binary', method='bootstrap_bca', n_resamples=5000)

```
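`precision_score` and `recall_score` follow the same pattern. A sketch, assuming they accept the same keyword arguments as `f1_score` above:

```python
from confidenceinterval import precision_score, recall_score

# Macro-averaged precision and recall, each with an analytical CI.
macro_precision, ci = precision_score(y_true, y_pred, confidence_level=0.95, average='macro')
macro_recall, ci = recall_score(y_true, y_pred, confidence_level=0.95, average='macro')
```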

These methods accept average='binary', average='micro' or average='macro'.

The analytical computation here follows the (amazing) 2022 paper of Takahashi et al. (reference below).

The paper derives the recall and precision confidence intervals only for micro averaging. We derive the confidence intervals for macro-averaged recall and precision as well, using the delta method.
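For reference, this is the standard first-order delta method: if $\hat{\theta}$ is an estimator with covariance matrix $\Sigma$, and $g$ is a differentiable function of it (here, a macro-averaged metric viewed as a function of the confusion-matrix cell probabilities), then

$$\operatorname{Var}\big(g(\hat{\theta})\big) \approx \nabla g(\hat{\theta})^{\top}\, \Sigma\, \nabla g(\hat{\theta}),$$

which yields the approximate interval $g(\hat{\theta}) \pm z_{1-\alpha/2}\sqrt{\operatorname{Var}\big(g(\hat{\theta})\big)}$.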


## ROC AUC
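A usage sketch, consistent with the Getting started examples above and with the `'delong'` default visible in `confidenceinterval/auc.py` further down:

```python
from confidenceinterval import roc_auc_score

# DeLong's analytical method is the default; bootstrap methods are also accepted.
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='delong')
```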
## Accuracy, PPV, NPV, TPR, FPR, TNR

```python
from confidenceinterval import (accuracy_score,
                                ppv_score,
                                npv_score,
                                tpr_score,
                                fpr_score,
                                tnr_score)

# Wilson is used by default:
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='wilson')
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='jeffreys')
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='agresti_coull')
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca')
```
By default method='wilson', the Wilson score interval, which behaves better for smaller datasets.

method can be one of ['wilson', 'normal', 'agresti_coull', 'beta', 'jeffreys', 'binom_test'], or one of the bootstrap methods.
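These method names mirror the binomial proportion intervals in `statsmodels.stats.proportion`. A minimal sketch of a Wilson interval computed that way, assuming the metric reduces to a proportion of a numerator count over a denominator count (the counts here are made up):

```python
from statsmodels.stats.proportion import proportion_confint

# E.g. PPV = TP / (TP + FP): a binomial proportion of `count` successes out of `nobs` trials.
count, nobs = 45, 60
lower, upper = proportion_confint(count, nobs, alpha=1 - 0.95, method='wilson')
print(count / nobs, (lower, upper))
```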

## Get a confidence interval for any custom metric with bootstrapping
With the bootstrap_ci method, you can get the CI for any metric function that takes y_true and y_pred as arguments.

As an example, let's get the CI for the balanced accuracy metric from scikit-learn.

```python
import numpy as np
import sklearn.metrics

from confidenceinterval.bootstrap import bootstrap_ci

# You can specify a random generator for reproducibility, or pass None.
random_generator = np.random.default_rng()
bootstrap_ci(y_true=y_true,
             y_pred=y_pred,
             metric=sklearn.metrics.balanced_accuracy_score,
             confidence_level=0.95,
             n_resamples=9999,
             method='bootstrap_bca',
             random_state=random_generator)
```



----------

## Citation

If you use this for research, please cite it. Here is an example BibTeX entry:

```bibtex
@misc{jacobgildenblatconfidenceinterval,
  title={A python library for confidence intervals},
  author={Jacob Gildenblat},
  year={2023},
  publisher={GitHub},
  howpublished={\url{https://github.com/jacobgil/confidenceinterval}},
}
```

----------

## References
4 changes: 2 additions & 2 deletions confidenceinterval/auc.py

def roc_auc_score_bootstrap(y_true: List,
                            y_pred: List,
                            confidence_level: float = 0.95,
                            method: str = 'bootstrap_bca',
                            n_resamples: int = 9999,
                            random_state: Callable = None) -> Tuple[float, float]:

def roc_auc_score(y_true: List,
                  y_pred: List,
                  confidence_level: float = 0.95,
                  method: str = 'delong',
                  *args, **kwargs) -> Tuple[float, float]:
    assert method in [
