---
title: Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection
url: https://arxiv.org/abs/2303.16668
labels: [robustness, model poisoning, anomaly detection, autoregressive model, regression, classification]
dataset: [MNIST, FashionMNIST]
---
FLANDERS: Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

Note: If you use this baseline in your work, please remember to cite the original authors of the paper as well as the Flower paper.

Paper: arxiv.org/abs/2303.16668

Authors: Edoardo Gabrielli, Gabriele Tolomei, Dimitri Belli, Vittorio Miori

Abstract: Current defense mechanisms against model poisoning attacks in federated learning (FL) systems have proven effective up to a certain threshold of malicious clients. In this work, we introduce FLANDERS, a novel pre-aggregation filter for FL resilient to large-scale model poisoning attacks, i.e., when malicious clients far exceed legitimate participants. FLANDERS treats the sequence of local models sent by clients in each FL round as a matrix-valued time series. Then, it identifies malicious client updates as outliers in this time series by comparing actual observations with estimates generated by a matrix autoregressive forecasting model maintained by the server. Experiments conducted in several non-iid FL setups show that FLANDERS significantly improves robustness across a wide spectrum of attacks when paired with standard and robust existing aggregation methods.

About this baseline

What’s implemented: The code in this directory replicates the results of FLANDERS+[baseline] on MNIST and Fashion-MNIST under all attack settings (Gaussian, LIE, OPT, and AGR-MM) with $r \in \{0.2, 0.6, 0.8\}$ (the fraction of malicious clients). Specifically, it reproduces Tables 1, 3, 10, 11, 15, 17, 19, 20 and Figure 3.

Datasets: MNIST, FMNIST

Hardware Setup: AMD Ryzen 9, 64 GB RAM, and an NVIDIA 4090 GPU with 24 GB VRAM.

Estimated time to run: On the setup above, an experiment without attacks takes about 2 minutes on MNIST and 3 minutes on Fashion-MNIST. On an Apple M2 Pro with 16 GB RAM, each MNIST experiment with 10 clients runs in about 24 minutes. Note that experiments with OPT (fang) and AGR-MM (minmax) can be up to 5× slower.

Contributors: Edoardo Gabrielli, Sapienza University of Rome (GitHub, Scholar)

Experimental Setup

Please check out Appendices F and G of the paper for a comprehensive overview of the hyperparameter setup; here is a summary.

Task: Image classification

Models:

MNIST (multiclass classification, fully connected feed-forward NN):

  • Multilayer Perceptron (MLP)
  • minimizing multiclass cross-entropy loss with the Adam optimizer
  • input: 784
  • hidden layer 1: 128
  • hidden layer 2: 256

Fashion-MNIST (multiclass classification, fully connected feed-forward NN; a sketch of both architectures follows this list):

  • Multilayer Perceptron (MLP)
  • minimizing multiclass cross-entropy loss with the Adam optimizer
  • input: 784
  • hidden layer 1: 256
  • hidden layer 2: 128
  • hidden layer 3: 64
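
A minimal PyTorch sketch of the two architectures, built from the layer sizes listed above. The ReLU activations, dropout placement (0.2, from the hyperparameter table below), and the 10-unit output layer are assumptions; check the code in this directory for the exact model definitions.

```python
import torch.nn as nn

def make_mlp(hidden_sizes, num_classes=10, dropout=0.2):
    """Build a fully connected feed-forward classifier for 28x28 grayscale images."""
    layers, in_features = [nn.Flatten()], 784  # flatten 1x28x28 -> 784
    for h in hidden_sizes:
        layers += [nn.Linear(in_features, h), nn.ReLU(), nn.Dropout(dropout)]
        in_features = h
    layers.append(nn.Linear(in_features, num_classes))  # logits for CrossEntropyLoss
    return nn.Sequential(*layers)

mnist_mlp = make_mlp([128, 256])       # MNIST:         784 -> 128 -> 256 -> 10
fmnist_mlp = make_mlp([256, 128, 64])  # Fashion-MNIST: 784 -> 256 -> 128 -> 64 -> 10
```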

Dataset: Each dataset is partitioned into two disjoint sets: 80% for training and 20% for testing. The training set is distributed across all 100 clients using a Dirichlet distribution with $\alpha=0.5$, simulating a highly non-i.i.d. scenario, while the testing set is uniform and held by the server to evaluate the global model.

| Description  | Default Value |
| ------------ | ------------- |
| Partitions   | 100           |
| Evaluation   | centralized   |
| Training set | 80%           |
| Testing set  | 20%           |
| Distribution | Dirichlet     |
| $\alpha$     | 0.5           |
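
For illustration, the snippet below shows one way to obtain a comparable label-skewed partitioning with flwr-datasets. It is a sketch, not the baseline's own partitioning code: it relies on the datasets' default train/test split rather than the 80/20 split above, and method names such as load_split refer to recent flwr-datasets releases.

```python
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import DirichletPartitioner

# Label-skewed split: 100 partitions drawn from a Dirichlet(alpha=0.5) distribution.
partitioner = DirichletPartitioner(
    num_partitions=100,    # one partition per client
    partition_by="label",
    alpha=0.5,             # lower alpha -> stronger non-i.i.d. skew
)
fds = FederatedDataset(dataset="mnist", partitioners={"train": partitioner})

client_train = fds.load_partition(0, "train")  # local training data for client 0
server_test = fds.load_split("test")           # centralized test set held by the server
```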

Training Hyperparameters:

| Dataset | # of clients | Clients per round | # of rounds | Batch size | Learning rate | Optimizer | Dropout | Alpha | Beta | # of clients to keep | Sampling |
| ------- | ------------ | ----------------- | ----------- | ---------- | ------------- | --------- | ------- | ----- | ---- | -------------------- | -------- |
| MNIST   | 100 | 100 | 50 | 32 | $10^{-3}$ | Adam | 0.2 | 0.0 | 0.0 | $m - b$ | 500 |
| FMNIST  | 100 | 100 | 50 | 32 | $10^{-3}$ | Adam | 0.2 | 0.0 | 0.0 | $m - b$ | 500 |

Here $m$ is the number of clients participating in the $n$-th round and $b$ is the number of malicious clients. The Sampling value specifies how many model parameters the MAR (matrix autoregressive) forecasting model analyzes.
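
For intuition only, here is a toy sketch of the pre-aggregation filtering idea: the server ranks clients by the distance between their observed (sampled) parameters and the MAR forecast, and keeps the $m - b$ least anomalous ones before aggregation. The function name, dictionary inputs, and squared-error distance are illustrative assumptions; the MAR forecasting model itself and the actual implementation live in the flanders package.

```python
import numpy as np

def keep_least_anomalous(observed, forecast, num_to_keep):
    """Toy pre-aggregation filter (not the paper's exact implementation).

    observed / forecast: dicts mapping client id -> 1-D array of sampled parameters.
    Returns the ids of the `num_to_keep` (= m - b) clients closest to the forecast.
    """
    scores = {cid: float(np.sum((observed[cid] - forecast[cid]) ** 2)) for cid in observed}
    return sorted(scores, key=scores.get)[:num_to_keep]  # lowest anomaly score first
```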

Environment Setup

# Use a version of Python >=3.9 and <3.12.0.
pyenv local 3.10.12
poetry env use 3.10.12

# Install everything from the toml
poetry install

# Activate the env
poetry shell

Running the Experiments

Ensure that the environment is properly set up, then run:

python -m flanders.main

This executes a single experiment with the default values in conf/base.yaml.

To run custom experiments, you can override the default values like this:

python -m flanders.main dataset=mnist server.attack_fn=lie server.num_malicious=1

To run multiple custom experiments:

python -m flanders.main --multirun dataset=mnist,fmnist server.attack_fn=gaussian,lie,fang,minmax server.num_malicious=0,1,2,3,4,5

Expected Results

To run all the experiments of the paper (for MNIST and Fashion-MNIST), I've set up a script:

sh run.sh

This script writes its output to outputs/all_results.csv. To generate the plots and tables shown below, use the notebook in the plotting/ directory.
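
For a quick summary without the notebook, a few lines of pandas are enough; the column names below are illustrative guesses, so check the CSV header for the actual schema.

```python
import pandas as pd

df = pd.read_csv("outputs/all_results.csv")
print(df.head())  # inspect the actual column names first

# Hypothetical schema: mean accuracy per dataset, attack, and number of attackers.
summary = df.groupby(["dataset", "attack_fn", "num_malicious"])["accuracy"].mean()
print(summary)
```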

Accuracy over multiple rounds

(left) MNIST, FLANDERS+FedAvg with 80% of malicious clients (b = 80); (right) Vanilla FedAvg in the same setting:

[figure: accuracy over rounds]

Precision and Recall of FLANDERS

b = 20: [figure]

b = 60: [figure]

b = 80: [figure]

Accuracy w.r.t. number of attackers:

b = 0: [figure]

b = 20: [figure]

b = 60: [figure]

b = 80: [figure]