RAIFLE: Reconstruction Attack on Interaction-based Federated Learning with Adversarial Data Manipulation
This is the repository for our paper *RAIFLE: Reconstruction Attack on Interaction-based Federated Learning with Adversarial Data Manipulation*. We improve the performance of reconstruction attacks against federated recommendation and learning-to-rank systems by manipulating the training features of the recommendation/ranking items. To reproduce the results in our paper, please follow the steps below:
- Hardware:
- A commodity machine with at least 16GB of RAM and 30GB of storage.
- Software:
- An x86-64 Unix-based OS. (Windows WSL will probably also work, but might need some modifications to the installation.)
- conda (such as Miniconda)
- Highly recommended (for the image-based experiment only):
- A CUDA-capable NVIDIA GPU with at least 8GB of VRAM (preferably 12-16GB) and NVIDIA driver installed.
- Download this repo to your machine (e.g. `git clone https://github.com/dzungvpham/raifle.git`) and navigate to the downloaded folder.
- Install conda if needed (e.g. Miniconda).
- Create a conda environment with `environment.yml`: `conda env create -f environment.yml`. This will create an environment named `raifle` and install all necessary packages.
- Activate the environment: `conda activate raifle`.
Download the following datasets into the `dataset` folder and make sure to put the content in the specified subfolders (case-sensitive); a sketch of the expected layout follows this list:
- Recommendation:
  - MovieLens: the experiments default to MovieLens-100K (see `experiment_rec.ipynb`; cell #2 explains how to change the dataset).
- Learning to Rank:
  - LETOR: Download MQ2007.rar from here. (Microsoft link.) Extract to folder `MQ2007`. (Optional: do the same for MQ2008.rar if you want to test on this dataset.)
  - MSLR: Download the zip file MSLR-WEB10K.zip. (Microsoft link.) Unzip the content into folder `MSLR-WEB10K`.
- ImageNet:
  - Sign up for an account at ImageNet and obtain permission to download data.
  - Once you have permission, go to ILSVRC 2012 and download the "Development kit (Task 1 & 2)" and the "Validation images (all tasks)". You do not need to extract them; just put the compressed files into folder `ImageNet`.
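For reference, the `dataset` folder should end up looking roughly like this (a sketch based on the instructions above, not an exhaustive listing):

```text
dataset/
├── MQ2007/        # extracted from MQ2007.rar (optionally MQ2008/ as well)
├── MSLR-WEB10K/   # unzipped from MSLR-WEB10K.zip
└── ImageNet/      # compressed ILSVRC 2012 files, left unextracted
```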
All of our code is in the `code` folder:

- `attack.py`: Our implementation of the RAIFLE attack.
- `dataset.py`: Code to load and preprocess datasets.
- `ranker.py`: Our implementation of the FPDGD algorithm and the FNCF algorithm.
- `utils.py`: Contains code for metrics, differential privacy, and click models.
- `experiment_rec.ipynb`: Experiments for Federated Recommender Systems.
- `experiment_ltr.ipynb`: Experiments for Federated Online Learning to Rank (FOLTR) with LETOR and MSLR data.
- `experiment_ltr_cv.ipynb`: Experiments for FOLTR with image-based data from ImageNet.
- `raifle_ltr_cv_colab.ipynb`: A Colab notebook modified from `experiment_ltr_cv.ipynb` in case a GPU is not available.
- `plot.ipynb`: Code for generating various plots and tables.
All of our experiment code is in Jupyter notebooks. We recommend using Visual Studio Code so that you can interactively run/modify our code.
- Main results (Section VI) + Local DP results (Section VII.A):
- Federated recommendation (Section VI.A):
  - Run all cells in `experiment_rec.ipynb` in order (the default dataset is MovieLens-100K).
  - Cell #2 contains instructions on how to change the dataset. For artifact evaluation, we scaled down the number of users attacked to 30 (a full run on MovieLens can take more than 1 day).
  - After cell #3 is done, the results are saved to disk and also printed out. The name column describes the configuration in the format FNCF_eps_{epsilon}_IMIA_{reg_factor}, where {epsilon} refers to the local DP epsilon parameter (inf means no privacy). IMIA_0.0 means the IMIA defense is not applied; IMIA_1.0 means the IMIA defense is applied with L1 regularization factor 1.0. (A sketch for decoding such names programmatically follows below.)
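If you want to work with the saved results programmatically, here is a minimal sketch of a parser for these configuration names (a hypothetical helper, not part of the repo; the same idea applies to the name formats in the experiments below):

```python
def parse_rec_name(name: str) -> dict:
    """Decode a result name like 'FNCF_eps_inf_IMIA_0.0'
    (format: FNCF_eps_{epsilon}_IMIA_{reg_factor})."""
    _, _, epsilon, _, reg_factor = name.split("_")
    return {
        "epsilon": float(epsilon),  # float('inf') when no privacy
        "imia_reg_factor": float(reg_factor),  # 0.0 = IMIA defense not applied
    }

# parse_rec_name("FNCF_eps_1.0_IMIA_1.0")
# -> {'epsilon': 1.0, 'imia_reg_factor': 1.0}
```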
- Federated OLTR with MQ2007 and MSLR-WEB10K (Section VI.B):
  - Run cells #1, #2, and #3 in `experiment_ltr.ipynb`.
  - Cell #2 contains instructions on how to change the dataset and other configs. For artifact evaluation, the default configuration is a linear ranker plus a neural net ranker with 16 hidden units, the MQ2007 dataset, and 16 queries per user (a full run on MQ2007 can take more than 1 day; MSLR-WEB10K takes much longer).
  - After cell #3 is done, the results are saved to disk and also printed out. The name column describes the configuration in the format {model_name}_{click_model_name}_{num_query}_query_eps_{epsilon}_{key}, where {model_name} is either 'linear_pdgd' or 'neural_16_pdgd', {click_model_name} is either 'informational' or 'navigational', {num_query} is the number of queries per user (e.g., 16), {epsilon} is the local DP epsilon (inf means no privacy), and {key} is either 0.0 (no manipulation) or 1.0 (full manipulation). For example, linear_pdgd_informational_16_query_eps_inf_1.0 denotes a linear ranker with the informational click model, 16 queries per user, no local DP, and full manipulation.
- FOLTR with ImageNet (Section VI.C):
- If you have a GPU:
  - Run cells #1, #2, #3, and #4 in `experiment_ltr_cv.ipynb`. The default configuration (scaled down for artifact evaluation) is ResNet18 as the feature extractor, 30 rounds of simulation, and 5,000 images.
  - Cell #2 contains instructions on how to change the feature extractor.
  - Cell #3 generates the manipulated images. You may need to adjust the batch size depending on how much GPU memory is available, e.g., 128 for 8GB, 256 for 12GB or more (see the sketch after this list).
  - After cell #4 is done, the results are saved to disk and also printed out. The name column describes the configuration in the format {model_name}_{num_items}_items_eps_{epsilon}_{key}, where {num_items} is 512, 1024, or 2048 (assuming ResNet18), {epsilon} is the local DP epsilon (inf means no privacy), and {key} is 'no_adm' (no manipulation) or 'adm_opt' (RAIFLE).
  - If you want to run FGSM, see the instructions in cell #4; it will take quite a bit longer since it optimizes the images. The {key} will be adm_FGSM_0.1.
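A minimal sketch for picking the cell #3 batch size from the detected VRAM (a hypothetical helper mirroring the guidance above, not part of the repo):

```python
import torch

# Choose a batch size for generating the manipulated images based on VRAM.
# Thresholds follow the guidance above: 128 for ~8GB, 256 for 12GB or more.
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    batch_size = 256 if vram_gb >= 12 else 128
else:
    batch_size = 32  # CPU fallback; expect image generation to be slow
```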
- If you don't have a GPU: a GPU is only necessary for cell #3, which generates the manipulated images. The reconstruction is decently fast without a GPU (unless you also want FGSM).
  - You can use Google Colab with the T4 GPU as a free alternative. You will need to get the ImageNet data onto Colab (e.g., upload it to Google Drive, then download it from Drive to Colab). Use our notebook `raifle_ltr_cv_colab.ipynb`, which has been modified from `experiment_ltr_cv.ipynb` to work with Colab.
- Additional discussion/appendix results:
- FL Utility vs DP (Table X of Section VII.A, Section VIII.A):
  - Run cells #1, #2, and #4 of `experiment_ltr.ipynb`. Make sure to change the dataset to MSLR-WEB10K to see the results in our paper.
- RAIFLE vs Secure aggregation + DP (distributed DP) (Table XI of Section VII.B):
  - Run cells #1, #2, and #5 of `experiment_ltr.ipynb`. Make sure to change the dataset to MQ2007 to see the results in our paper.
- t-SNE visualization:
  - Run cells #1, #2, and #6 of `experiment_ltr.ipynb`. Make sure to change the dataset to MSLR-WEB10K to see the results in our paper.
  - Run cells #1, #2, and #5 of `experiment_ltr_cv.ipynb`.
- Constrained server capability:
  - Run cells #1, #2, and #3 in `experiment_ltr.ipynb`. Make sure to set `alphas = [0.5, 0.75]` in cell #3; this controls the percentage of features that the server can manipulate.
- Manipulated image quality:
  - Run cells #1, #2, and #6 of `experiment_ltr_cv.ipynb`.