This project implements adversarial debiasing to improve model fairness within the DiCE (Diverse Counterfactual Explanations) framework.
Adversarial debiasing trains a predictor to be accurate while an adversary tries to recover protected attributes (e.g., gender, race) from the predictor's outputs; penalizing the adversary's success pushes the predictor toward fairer behavior (a minimal training sketch follows the list below). The implementation includes:
- A standard model (baseline)
- An adversarially debiased model
- Evaluation of fairness metrics
- Analysis of model performance
- Assessment of recourse feasibility
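The following is a minimal sketch of one adversarial debiasing training step in TensorFlow 2.x. The network sizes, the adversary input (the predictor's logit), and the trade-off weight `adv_weight` are illustrative assumptions, not the exact configuration used in `adversarial_debiasing.py`.

```python
# Sketch of adversarial debiasing: a predictor learns the label while an
# adversary tries to recover the protected attribute from the predictor's output.
import tensorflow as tf

predictor = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1)  # income logit
])
adversary = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1)  # protected-attribute logit (e.g., gender)
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
pred_opt = tf.keras.optimizers.Adam(1e-3)
adv_opt = tf.keras.optimizers.Adam(1e-3)
adv_weight = 0.5  # assumed trade-off between accuracy and fairness


@tf.function
def train_step(x, y, z):  # y = label, z = protected attribute (both 0/1)
    # 1) Update the adversary: learn to recover z from the predictor's output.
    with tf.GradientTape() as tape:
        y_logit = predictor(x, training=True)
        z_logit = adversary(y_logit, training=True)
        adv_loss = bce(z, z_logit)
    adv_grads = tape.gradient(adv_loss, adversary.trainable_variables)
    adv_opt.apply_gradients(zip(adv_grads, adversary.trainable_variables))

    # 2) Update the predictor: stay accurate while making the adversary fail.
    with tf.GradientTape() as tape:
        y_logit = predictor(x, training=True)
        z_logit = adversary(y_logit, training=False)
        pred_loss = bce(y, y_logit) - adv_weight * bce(z, z_logit)
    pred_grads = tape.gradient(pred_loss, predictor.trainable_variables)
    pred_opt.apply_gradients(zip(pred_grads, predictor.trainable_variables))
    return pred_loss, adv_loss
```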
- `adversarial_debiasing.py`: Complete implementation of adversarial debiasing with all analyses
- `train_adversarial_debiasing.py`: Script to train a model with adversarial debiasing
- `evaluate_fairness.py`: Script to evaluate fairness metrics
- `compare_performance.py`: Script to compare model performance
- `analyze_recourse.py`: Script to analyze recourse feasibility
- `run_all.py`: Script to run all analyses in sequence
- Python 3.7+
- TensorFlow 2.x
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- DiCE
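The dependencies can typically be installed from PyPI (the DiCE library is distributed as `dice-ml`):

pip install tensorflow numpy pandas matplotlib seaborn scikit-learn dice-ml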
You can run the complete analysis with:
python adversarial_debiasing.py
Or run individual scripts:
python train_adversarial_debiasing.py
python evaluate_fairness.py
python compare_performance.py
python analyze_recourse.py
Or run all scripts in sequence:
python run_all.py
The fairness evaluation reports the following metrics (a sketch of how they can be computed follows this list):
- Demographic Parity Difference
- Equal Opportunity Difference
- Disparate Impact Ratio
- Equalized Odds Difference
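Below is an illustrative way to compute these group fairness metrics from binary predictions and a binary protected attribute; the helper is a sketch, not the exact code in `evaluate_fairness.py`.

```python
# Illustrative computation of the four fairness metrics for a binary classifier
# with a binary protected attribute (0/1 predictions assumed).
import numpy as np

def fairness_metrics(y_true, y_pred, protected):
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    g0, g1 = protected == 0, protected == 1          # e.g., female / male

    # Positive prediction rate per group
    p0, p1 = y_pred[g0].mean(), y_pred[g1].mean()

    # True/false positive rate per group
    tpr = lambda g: y_pred[g & (y_true == 1)].mean()
    fpr = lambda g: y_pred[g & (y_true == 0)].mean()

    return {
        "demographic_parity_difference": abs(p1 - p0),
        # One common convention: ratio of the smaller to the larger group rate
        "disparate_impact_ratio": min(p0, p1) / max(p0, p1),
        "equal_opportunity_difference": abs(tpr(g1) - tpr(g0)),
        "equalized_odds_difference": max(abs(tpr(g1) - tpr(g0)),
                                         abs(fpr(g1) - fpr(g0))),
    }
```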
Model performance is compared using (see the sketch after this list):
- Accuracy
- Precision
- Recall
- F1 Score
- ROC AUC
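These are standard scikit-learn metrics; one way they might be collected for each model (baseline and debiased):

```python
# Standard scikit-learn metrics gathered into one dictionary per model.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def performance_metrics(y_true, y_pred, y_score):
    # y_pred: hard 0/1 predictions; y_score: predicted probability of class 1
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
    }
```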
Recourse feasibility is assessed with (a DiCE-based sketch follows this list):
- Success Rate
- Average Number of Changes
- Average Distance
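The sketch below shows one way these recourse metrics can be derived from DiCE counterfactuals. The feature names, the scikit-learn-style model, and the distance measure (a simple normalized count of changed features) are assumptions for illustration and not necessarily what `analyze_recourse.py` does.

```python
# Illustrative recourse analysis with DiCE (dice_ml); assumes `model` is a
# trained sklearn-style classifier and `df` holds the features plus "income".
import dice_ml
import numpy as np

d = dice_ml.Data(dataframe=df, continuous_features=["age", "hours_per_week"],
                 outcome_name="income")
m = dice_ml.Model(model=model, backend="sklearn")
exp = dice_ml.Dice(d, m, method="random")

queries = df.drop(columns="income").sample(50, random_state=0)
result = exp.generate_counterfactuals(queries, total_CFs=3,
                                      desired_class="opposite")

n_changes, distances, successes = [], [], 0
for query, cf_example in zip(queries.itertuples(index=False),
                             result.cf_examples_list):
    cfs = cf_example.final_cfs_df
    if cfs is None or len(cfs) == 0:
        continue  # no counterfactual found for this instance
    successes += 1
    original = np.array(query, dtype=object)
    cf_features = cfs.drop(columns="income").to_numpy()
    changed = (cf_features != original).sum(axis=1)       # features altered per CF
    n_changes.append(changed.mean())
    distances.append(changed.mean() / original.shape[0])  # normalized change count

print("Success rate:", successes / len(queries))
print("Average number of changes:", np.mean(n_changes))
print("Average distance:", np.mean(distances))
```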
The analysis generates several plots in the `plots` directory:
- `fairness_metrics_comparison.png`: Comparison of fairness metrics
- `performance_metrics_comparison.png`: Comparison of model performance
- `accuracy_by_gender.png`: Analysis of accuracy by gender
- `recourse_metrics_comparison.png`: Comparison of recourse feasibility metrics
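A minimal sketch of how such a comparison plot might be produced with Matplotlib; the helper name and output path are illustrative, not the project's exact plotting code.

```python
# Grouped bar chart comparing the baseline and debiased models on a set of
# metrics, saved under the plots/ directory.
import os
import numpy as np
import matplotlib.pyplot as plt

def plot_metric_comparison(baseline, debiased, filename):
    # baseline / debiased: dicts mapping metric name -> value (same keys)
    labels = list(baseline.keys())
    x = np.arange(len(labels))
    width = 0.35

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(x - width / 2, [baseline[k] for k in labels], width, label="Baseline")
    ax.bar(x + width / 2, [debiased[k] for k in labels], width, label="Debiased")
    ax.set_xticks(x)
    ax.set_xticklabels(labels, rotation=30, ha="right")
    ax.legend()
    fig.tight_layout()

    os.makedirs("plots", exist_ok=True)
    fig.savefig(os.path.join("plots", filename))
    plt.close(fig)
```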
The implementation uses the UCI Adult Income dataset, which predicts whether income exceeds $50K/yr based on census data, with gender as the protected attribute.
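DiCE bundles a helper for loading a preprocessed version of this dataset; a minimal sketch (assuming the standard `dice_ml` helpers module):

```python
# Load the preprocessed UCI Adult Income data that ships with DiCE.
from dice_ml.utils import helpers

dataset = helpers.load_adult_income_dataset()
print(dataset.columns.tolist())   # includes "gender" (protected) and "income" (target)
target = dataset["income"]        # 1 if income > $50K/yr, else 0
```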