---
title: Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection
url: https://arxiv.org/abs/2303.16668
labels: [robustness, model poisoning, anomaly detection, autoregressive model, regression, classification]
dataset: [MNIST, FashionMNIST]
---

# FLANDERS: Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

**Note:** If you use this baseline in your work, please remember to cite the original authors of the paper as well as the Flower paper.

**Paper:** [arxiv.org/abs/2303.16668](https://arxiv.org/abs/2303.16668)

**Authors:** Edoardo Gabrielli, Gabriele Tolomei, Dimitri Belli, Vittorio Miori

**Abstract:** Current defense mechanisms against model poisoning attacks in federated learning (FL) systems have proven effective up to a certain threshold of malicious clients. In this work, we introduce FLANDERS, a novel pre-aggregation filter for FL resilient to large-scale model poisoning attacks, i.e., when malicious clients far exceed legitimate participants. FLANDERS treats the sequence of local models sent by clients in each FL round as a matrix-valued time series. Then, it identifies malicious client updates as outliers in this time series by comparing actual observations with estimates generated by a matrix autoregressive forecasting model maintained by the server. Experiments conducted in several non-iid FL setups show that FLANDERS significantly improves robustness across a wide spectrum of attacks when paired with standard and robust existing aggregation methods.
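
To make the mechanism concrete, here is a toy sketch of the pre-aggregation filter (an illustration only, not this repository's code): the MAR forecaster is replaced by a trivial mean-of-history predictor, but the score-and-keep shape is the same. The function name and shapes are made up for the example.

```python
# Toy sketch of the FLANDERS idea: score each client by how far its current
# update lands from a forecast built on previous rounds, then keep the
# lowest-scoring clients. FLANDERS fits a matrix autoregressive (MAR) model
# server-side; here a mean-of-history predictor stands in for it.
import numpy as np

def filter_clients(history, current, num_to_keep):
    """history: (rounds, clients, params) past updates;
    current: (clients, params) this round's updates."""
    forecast = history.mean(axis=0)                 # stand-in for the MAR forecast
    scores = np.linalg.norm(current - forecast, axis=1)  # anomaly score per client
    return np.argsort(scores)[:num_to_keep]        # keep the m - b closest matches

rng = np.random.default_rng(0)
hist = rng.normal(size=(5, 10, 100))   # 5 rounds, 10 clients, 100 params
cur = rng.normal(size=(10, 100))
cur[7] += 10.0                          # one "poisoned" update
print(filter_clients(hist, cur, num_to_keep=8))  # client 7 should be excluded
```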

## About this baseline

**What's implemented:** The code in this directory replicates the results of FLANDERS+[baseline] on MNIST and Fashion-MNIST under all attack settings: Gaussian, LIE, OPT, and AGR-MM, with $r \in \{0.2, 0.6, 0.8\}$ (the fraction of malicious clients); specifically, Tables 1, 3, 10, 11, 15, 17, 19, and 20, and Figure 3.

**Datasets:** MNIST, Fashion-MNIST (FMNIST)

**Hardware Setup:** AMD Ryzen 9, 64 GB RAM, and an NVIDIA RTX 4090 GPU with 24 GB VRAM.

**Estimated time to run:** On the setup above, a no-attack experiment takes about 2 minutes on MNIST and 3 minutes on Fashion-MNIST. On an Apple M2 Pro with 16 GB RAM, each MNIST experiment with 10 clients takes about 24 minutes. Note that experiments with OPT (fang) and AGR-MM (minmax) can be up to 5x slower.

**Contributors:** Edoardo Gabrielli, Sapienza University of Rome (GitHub, Scholar)

## Experimental Setup

Please check out Appendices F and G of the paper for a comprehensive overview of the hyperparameter setup; here is a summary.

**Task:** Image classification

**Models:**

MNIST (multiclass classification, fully connected feed-forward NN):

- Multilayer Perceptron (MLP)
- minimizes multiclass cross-entropy loss using the Adam optimizer
- input: 784
- hidden layer 1: 128
- hidden layer 2: 256

Fashion-MNIST (multiclass classification, fully connected feed-forward NN; a PyTorch sketch follows this list):

- Multilayer Perceptron (MLP)
- minimizes multiclass cross-entropy loss using the Adam optimizer
- input: 784
- hidden layer 1: 256
- hidden layer 2: 128
- hidden layer 3: 64
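
For concreteness, here is a minimal PyTorch sketch of the Fashion-MNIST model described above. The layer widths and the dropout value (0.2, from the hyperparameter table below) come from this README; the ReLU activations and the class name are assumptions for illustration.

```python
# Minimal sketch of the Fashion-MNIST MLP (784 -> 256 -> 128 -> 64 -> 10).
# ReLU activations and the class name are assumptions, not taken from the paper.
import torch.nn as nn

class FMNISTNet(nn.Module):  # hypothetical name
    def __init__(self, num_classes: int = 10, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                 # 28x28 image -> 784 vector
            nn.Linear(784, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),   # logits for cross-entropy loss
        )

    def forward(self, x):
        return self.net(x)
```

Per the lists above, training pairs a model like this with `nn.CrossEntropyLoss` and `torch.optim.Adam`.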

**Dataset:** Every dataset is partitioned into two disjoint sets: 80% for training and 20% for testing. The training set is distributed across all 100 clients using a Dirichlet distribution with $\alpha = 0.5$, simulating a highly non-i.i.d. scenario, while the testing set is uniform and held by the server to evaluate the global model.

| Description | Default Value |
| --- | --- |
| Partitions | 100 |
| Evaluation | centralized |
| Training set | 80% |
| Testing set | 20% |
| Distribution | Dirichlet |
| $\alpha$ | 0.5 |
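
As an illustration of how such a split behaves, here is a self-contained sketch of label-based Dirichlet partitioning. The baseline handles partitioning internally; this only shows the mechanics, and the function name is made up.

```python
# Sketch of label-based Dirichlet partitioning: for each class, a
# Dirichlet(alpha) draw decides what share of that class each client gets.
# Smaller alpha -> more skewed (more non-i.i.d.) partitions.
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    partitions = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        shares = rng.dirichlet(alpha * np.ones(num_clients))  # per-client share of class c
        cuts = (np.cumsum(shares) * len(idx)).astype(int)[:-1]
        for client, chunk in enumerate(np.split(idx, cuts)):
            partitions[client].extend(chunk.tolist())
    return partitions

labels = np.random.default_rng(1).integers(0, 10, size=48_000)  # stand-in for 80% of MNIST
parts = dirichlet_partition(labels)
print(len(parts), sum(len(p) for p in parts))  # 100 clients, all samples assigned
```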

**Training Hyperparameters:**

| Dataset | # of clients | Clients per round | # of rounds | Batch size | Learning rate | Optimizer | Dropout | Alpha | Beta | # of clients to keep | Sampling |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MNIST | 100 | 100 | 50 | 32 | $10^{-3}$ | Adam | 0.2 | 0.0 | 0.0 | $m - b$ | 500 |
| FMNIST | 100 | 100 | 50 | 32 | $10^{-3}$ | Adam | 0.2 | 0.0 | 0.0 | $m - b$ | 500 |

Here $m$ is the number of clients participating in the $n$-th round and $b$ is the number of malicious clients. The *Sampling* value is the number of model parameters that the MAR forecasting model analyzes (a sketch of this idea follows).
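
As a rough illustration of the Sampling knob: rather than running MAR on every model weight, only a small subset of parameter coordinates is tracked. The fixed-random-subset scheme below is my assumption about the mechanism; see the paper for the exact procedure.

```python
# Sketch of "Sampling = 500": draw a random subset of parameter coordinates
# and track only those across rounds (subset choice is an assumption here).
import numpy as np

rng = np.random.default_rng(42)
num_params = 784 * 128 + 128 * 256 + 256 * 10   # rough MNIST MLP weight count (biases omitted)
tracked = rng.choice(num_params, size=500, replace=False)  # reused every round

def compress(flat_update: np.ndarray) -> np.ndarray:
    return flat_update[tracked]  # 500-dim vector fed to the forecaster
```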

## Environment Setup

```bash
# Use a version of Python >=3.9 and <3.12.0.
pyenv local 3.10.12
poetry env use 3.10.12

# Install everything from the toml
poetry install

# Activate the env
poetry shell
```

## Running the Experiments

Ensure that the environment is properly set up, then run:

```bash
python -m flanders.main
```

This executes a single experiment with the default values in `conf/base.yaml`.

To run custom experiments, you can override the default values like this:

```bash
python -m flanders.main dataset=mnist server.attack_fn=lie server.num_malicious=1
```

To run multiple custom experiments:

```bash
python -m flanders.main --multirun dataset=mnist,fmnist server.attack_fn=gaussian,lie,fang,minmax server.num_malicious=0,1,2,3,4,5
```

## Expected Results

To run all the experiments of the paper (for MNIST and Fashion-MNIST), I've set up a script:

```bash
sh run.sh
```

This script writes its output to `outputs/all_results.csv`. To generate the plots and tables displayed below, use the notebook in the `plotting/` directory.
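
Once the CSV exists, you can inspect it with pandas. Note that the column names in the commented line below are hypothetical, so check the file's header first.

```python
# Quick look at the aggregated results (column names in the groupby example
# are hypothetical -- check the header of outputs/all_results.csv first).
import pandas as pd

df = pd.read_csv("outputs/all_results.csv")
print(df.columns.tolist())  # inspect the actual schema
# e.g., mean accuracy per dataset/attack/number of attackers:
# print(df.groupby(["dataset", "attack_fn", "num_malicious"])["accuracy"].mean())
```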

### Accuracy over multiple rounds

(left) MNIST, FLANDERS+FedAvg with 80% of malicious clients ($b = 80$); (right) vanilla FedAvg in the same setting:

[Figure: acc_over_rounds]

### Precision and Recall of FLANDERS

[Figure: precision and recall, b = 20]

[Figure: precision and recall, b = 60]

[Figure: precision and recall, b = 80]

### Accuracy w.r.t. number of attackers

[Figure: accuracy, b = 0]

[Figure: accuracy, b = 20]

[Figure: accuracy, b = 60]

[Figure: accuracy, b = 80]