
Differential Privacy via Distributionally Robust Optimization

GitHub supplement of the paper Differential Privacy via Distributionally Robust Optimization (DPviaDRO) by Aras Selvi, Huikang Liu, and Wolfram Wiesemann (2023). The paper underwent a major revision, which was completed on 23 May 2024. All links are up-to-date, and we also include the earlier version of the manuscript in this repository.

The preprint is available on Optimization Online and arXiv. The paper was presented at the Robust Optimization Webinars, Rotman Young Scholar Seminar Series, SIAM Conference on Optimization, LBS Transatlantic Doctoral Conference, INFORMS Annual Meeting (2023), PGMO Days, University of Amsterdam, Georgia Tech ISyE Junior Researcher Workshop, and Imperial College Business School.

Update: This paper received a runner-up award for the 2023 INFORMS Optimization Society (IOS) Student Paper Prize, as well as first place in the Imperial College Business School best PhD paper award in Analytics and Operations.

Description

The following is a guide to this repository. The folder "GitHub Appendices" contains the additional experiments and discussions supplied with this GitHub repository. The code folders are described below.

Data Independent Noise

This folder includes the C++ code for the data-independent noise optimization.

Data Dependent Noise

This folder includes the C++ code for the data-dependent noise optimization.

Multi-dimensional Data Independent Noise

This folder includes the C++ code for the 2D query optimization (either directly optimizing the 2D mechanism or computing the 2D guarantees of the product of two 1D distributions) that we document in the folder "GitHub Appendices" and the file "GitHub_supplement.pdf" (pages 11-13). Note that this is a preliminary study and is not included in the main paper.
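As a hedged point of reference (not the optimized mechanisms studied in the appendix), the simplest product construction for a two-dimensional query adds independent one-dimensional noise to each coordinate, and the 2D guarantee is then derived from the per-coordinate guarantees. A minimal Python sketch using standard Laplace noise, where the function name and the scales b1 and b2 are placeholders for illustration:

```python
import numpy as np

def product_mechanism_2d(query_value, b1, b2, seed=None):
    """Privatize a 2D query by adding independent Laplace noise to each
    coordinate. The scales b1, b2 are illustrative placeholders; the paper
    instead optimizes the noise distributions themselves."""
    rng = np.random.default_rng(seed)
    x, y = query_value
    return (x + rng.laplace(scale=b1), y + rng.laplace(scale=b2))

noisy = product_mechanism_2d((3.0, 5.0), b1=1.0, b2=1.0, seed=0)
```

For a query with per-coordinate ℓ1-sensitivity Δ, scale b = Δ/ε gives each coordinate a standard ε-DP Laplace guarantee; the repository's C++ code computes the resulting 2D guarantees for optimized (rather than Laplace) coordinate distributions.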

Dataset Cook

This folder has two Python 3 Jupyter notebooks that show how we cleaned two example datasets: one for naïve Bayes and one for proximal coordinate descent.
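The notebooks contain the dataset-specific steps; purely as a generic illustration (the function name and steps below are hypothetical, not taken from the notebooks), a typical clean-and-split pipeline of this kind looks like:

```python
import pandas as pd

def clean_and_split(df, label_col, test_frac=0.25, seed=0):
    """Hypothetical cleaning sketch: drop rows with missing values,
    encode the label column as integer codes, then shuffle-split
    the rows into a training set and a test set."""
    df = df.dropna().copy()
    df[label_col] = df[label_col].astype("category").cat.codes
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(df) * test_frac)
    return df.iloc[n_test:], df.iloc[:n_test]

toy = pd.DataFrame({"x": [1.0, 2.0, None, 4.0, 5.0],
                    "y": ["a", "b", "a", "b", "a"]})
train, test = clean_and_split(toy, "y")
```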

Visualization

This folder has examples of the Python 3 visualization scripts used to generate Figures 3 and 4.

HPC Codes

This folder includes the C++ compilation commands for the Imperial HPC's Linux terminal, as well as an example .pbs job file.

ML Algorithms Julia

Implementations of the ML algorithms (naïve Bayes and proximal coordinate descent) in Julia:

  • analytic_gaussian.jl: Julia implementation of the Analytic Gaussian mechanism (ported from the original Python implementation).
  • compare.jl: standard logistic regression (LR) implementation, in case one wants to benchmark the non-private optimal LR classifier (uses the MOSEK solver).
  • grid_ell1_prep_main.jl, grid_ell1_prep_ub.jl, grid_ell1_prep_lb.jl: generate the results for Tables 1, 4, and 5.
  • interpret_results.jl: analysis (p-values, in-/out-of-sample comparisons, etc.) of the DP naïve Bayes methods.
  • main.jl: the main DP naïve Bayes function, including the Gaussian, truncated Laplace, and our optimized data-independent mechanisms.
  • main_analytic.jl: the same for the Analytic Gaussian mechanism (added later, hence a separate file).
  • main_data_dependent.jl: the data-dependent noise version, which also prepares the noise-distribution files to be optimized in C++ with the help of the function in read_distributions.jl.
  • NB.jl: helper functions for the naïve Bayes method, including the non-noisy counts/statistics; the noisy versions are computed in sensitivites.jl.
  • PCD_functions.jl: the PCD update functions called in the iterations of the DP PCD algorithm.
  • PCD_histogram.jl: generates the histogram in the GitHub Additional Appendices.
  • PCD_interpret_results ... .jl: the files we used to interpret the PCD results.
  • PCD.jl: the main function that runs the DP PCD algorithm; PCD_multi.jl is the variant that uses the data-dependent noise (prepared in C++).
  • sample.jl: functions to sample noise from various distributions.
  • train_test.jl: splits datasets into training and test sets.
  • visualize_bounds ... .jl: used to generate Figure 2.
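For orientation, the (non-analytic) Gaussian mechanism mentioned above perturbs a query with normal noise whose standard deviation is calibrated to the query's ℓ2-sensitivity and the privacy parameters. The following minimal Python sketch uses the standard textbook calibration (valid for ε < 1); it is an illustration only, not the repository's Julia implementation or the optimized mechanisms from the paper:

```python
import math
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, eps, delta, seed=None):
    """Release `value` with (eps, delta)-DP via the classical Gaussian
    mechanism. Standard calibration: sigma = Delta * sqrt(2 ln(1.25/delta)) / eps,
    which is valid for 0 < eps < 1."""
    assert 0 < eps < 1 and 0 < delta < 1
    sigma = l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    rng = np.random.default_rng(seed)
    return value + rng.normal(scale=sigma)

noisy_count = gaussian_mechanism(42.0, l2_sensitivity=1.0, eps=0.5, delta=1e-5)
```

The Analytic Gaussian mechanism implemented in analytic_gaussian.jl replaces this closed-form bound with a tighter numeric calibration of sigma and removes the ε < 1 restriction.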

Final Notes

The following are also available upon request:

  • The C++ code we supplied is a simple implementation of our cutting-plane algorithms; we also have several modified versions that were used to run parallel jobs on the HPC.
  • Different (including hand-crafted) loss functions, and the corresponding analytical "speed-up" stages for the cutting-plane method.
  • We shared the cleaning code for two specific datasets; similar code for each of the other datasets we used, as well as the final data itself, is available upon request.
  • We provided some examples of HPC commands; for a specific PBS file or a parallelized implementation of a specific algorithm (to run on the cluster computers), please get in touch with us.
  • We also have the code for further visualizations, both reported and unreported.

Thank You

Thank you for your interest in our work. If you found this work useful in your research and/or applications, please star this repository and cite:

@article{DPviaDRO,
  title={Differential Privacy via Distributionally Robust Optimization},
  author={Selvi, Aras and Liu, Huikang and Wiesemann, Wolfram},
  journal={arXiv preprint arXiv:2304.12681},
  year={2023}
}

Please contact Aras ([email protected]) if you encounter any issues using the scripts. For any other comments or questions, please do not hesitate to contact us:

Aras Selvi ([email protected])

Huikang Liu ([email protected])

Wolfram Wiesemann ([email protected])
