Federated Learning Algorithms with Heterogeneous Data Distributions

This repository contains TensorFlow code to simulate different FL algorithms designed to limit the accuracy degradation introduced by non-IID data distributions.

Algorithms

List of implemented algorithms in this repo:

  • FedAvg [0]
  • FedProx [1]
  • FedGKD [2]
  • FedNTD [3]
  • FedMLB [4]
  • FedDyn [5]
  • MOON [6]
  • FedAvgM [7]
  • FedAdam [8]
  • FedSAM [9]
  • FedFA [10]
  • FedMAX [11]
  • FedLC/DMFL [12, 13]

Default hyperparameters.

Similar to [4, 5], in all the experiments SGD with a fixed learning rate of 0.1 is used as the local optimizer, and the global learning rate is set to 1.0, except for FedAdam, which uses 0.01 for both the local and global learning rates. Local epochs are fixed to 5, and a random fraction of 0.05 (5%) of clients is selected per round. Weight decay with a factor of 0.001 is applied to avoid local overfitting. The local batch size is determined so that each client performs 50 local updates. Gradient clipping is performed to stabilize local training. The local learning rate is exponentially decayed with a factor of 0.998, similar to the work in [4, 5]. The model architecture used in our experiments is ResNet-18, with batch normalization layers replaced by group normalization. Random rotation, horizontal flip and random crop are used as preprocessing layers. For a fair comparison, seeds are used to select random clients at each round, to perform local data preprocessing, and to initialize client models.
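As a rough sketch, these defaults could be wired up in TensorFlow along the following lines (all names, the batch-size computation, and the augmentation factors below are illustrative assumptions, not code taken from fed_resnet18.py):

```python
import tensorflow as tf

# Illustrative constants (assumed names, not the repo's actual variables).
NUM_LOCAL_UPDATES = 50      # each client performs ~50 local updates
LOCAL_EPOCHS = 5
LR_DECAY_PER_ROUND = 0.998

def local_batch_size(num_client_examples):
    # One plausible reading of "50 local updates": batch size chosen so that
    # LOCAL_EPOCHS epochs over the client data yield ~NUM_LOCAL_UPDATES steps.
    return max(1, (num_client_examples * LOCAL_EPOCHS) // NUM_LOCAL_UPDATES)

def local_optimizer(round_idx, clipnorm=10.0):
    # SGD with lr 0.1, decayed exponentially across rounds, plus gradient clipping.
    lr = 0.1 * (LR_DECAY_PER_ROUND ** round_idx)
    return tf.keras.optimizers.SGD(learning_rate=lr, clipnorm=clipnorm)

# Data augmentation expressed as Keras preprocessing layers.
preprocessing = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.06),
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.ZeroPadding2D(padding=4),   # pad before the random crop
    tf.keras.layers.RandomCrop(32, 32),
])
```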

Algorithm-specific hyperparameters.

  • For FedProx we tuned $\mu$ in {0.01, 0.001}. $\mu$ controls the weight of the proximal term in the local objective function (see the sketch after this list).
  • For FedGKD we set $\gamma$ to 0.2, as in the original paper. $\gamma$ controls the weight of the KD-based term in the local objective function.
  • For FedNTD we selected $\lambda$ in {0.3, 1.0}.
  • For FedMLB $\lambda_1$ and $\lambda_2$ are both set to 1. $\lambda_1$ and $\lambda_2$ weight the impact of the hybrid cross-entropy loss and the KD-based loss, respectively. 5 blocks are considered, formed as in the original paper from conv1, conv2_x, conv3_x, conv4_x, conv5_x and the fully connected layer.
  • For FedAvgM we selected the momentum parameter among {0.4, 0.6, 0.8, 0.9}.
  • For FedAdam we set $\tau$ (a constant for numerical stability) equal to 0.001.
  • For FedDyn we set $\alpha$ equal to 0.1 as in the original paper.
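To make the role of these coefficients concrete, here is a minimal sketch of how a FedProx-style proximal term weighted by $\mu$ could be added by overriding train_step (class and variable names are illustrative assumptions, not the repo's actual implementation):

```python
import tensorflow as tf

class FedProxModel(tf.keras.Model):
    """Keras model whose train_step adds the FedProx proximal term (sketch).

    Intended to wrap an architecture via the functional API, e.g.
    FedProxModel(inputs=inp, outputs=out, mu=0.001), then compile() as usual.
    """

    def __init__(self, mu=0.001, **kwargs):
        super().__init__(**kwargs)
        self.mu = mu
        self.global_weights = None  # set to the global model's weights at the start of each round

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
            # Proximal term: (mu / 2) * ||w - w_global||^2.
            if self.global_weights is not None:
                prox = tf.add_n([
                    tf.reduce_sum(tf.square(w - wg))
                    for w, wg in zip(self.trainable_variables, self.global_weights)
                ])
                loss += 0.5 * self.mu * prox
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```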

Dataset and Model Architecture

CIFAR100 and ResNet-18.

Data partitioning

The CIFAR100 dataset is partitioned following the paper Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification [7]: a Dirichlet distribution is used to draw the per-client label distribution. A concentration parameter (alpha in the code) controls how similar clients are to each other. Very high values (e.g., > 100.0) yield nearly identical label distributions across clients; lower values (e.g., 1.0) produce very different numbers of examples per label on each client; and for very low values (e.g., 0.1) almost all of a client's examples belong to a single class.
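Below is a minimal sketch of this kind of Dirichlet label partitioning (illustrative only; the actual dirichlet_partition.py may differ in details such as how leftover examples are handled or how disjointness is enforced):

```python
import numpy as np

def dirichlet_label_proportions(num_clients, num_classes, alpha, seed=0):
    """Draw one label distribution per client from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    # Shape (num_clients, num_classes); each row sums to 1.
    return rng.dirichlet(alpha=[alpha] * num_classes, size=num_clients)

def partition_indices(labels, num_clients, alpha, seed=0):
    """Assign example indices to clients according to per-client label proportions."""
    num_classes = int(labels.max()) + 1
    proportions = dirichlet_label_proportions(num_clients, num_classes, alpha, seed)
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Split this class's examples among clients proportionally to column c.
        p = proportions[:, c] / proportions[:, c].sum()
        splits = (np.cumsum(p) * len(idx_c)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx_c, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```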

Instructions

fed_resnet18.py contains the simulation code for all the algorithms. Hyperparameters can be chosen by manually modifying the hp dictionary; a simulation will be run for each combination of hyperparameters.
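For example, such a grid might look like the following (the keys shown here are illustrative assumptions; check fed_resnet18.py for the exact names and values):

```python
# Illustrative hyperparameter grid; each combination triggers one simulation run.
hp = {
    "algorithm": ["fedavg", "fedprox", "fedmlb"],
    "alpha": [0.1, 0.3],          # Dirichlet concentration of the data partition
    "total_rounds": [1000],
    "client_fraction": [0.05],
    "local_epochs": [5],
    "local_learning_rate": [0.1],
}
```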

The client-side algorithms (FedProx, FedGKD, FedNTD, FedMLB, MOON, FedDyn, etc.) are implemented by subclassing the tf.keras.Model class and overriding the train_step and test_step methods. For FedSAM we employ the TensorFlow implementation of SAM. For FedFA we define a custom layer, FFALayer, by subclassing the keras.layers.Layer class.
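As an illustration of the custom-layer pattern, a heavily simplified feature-augmentation layer could look like this (a sketch of the general idea only, not the repository's FFALayer; FedFA's actual statistic perturbation is more involved):

```python
import tensorflow as tf

class FeatureAugmentationLayer(tf.keras.layers.Layer):
    """Perturbs channel-wise feature statistics during training (simplified sketch)."""

    def __init__(self, noise_std=0.1, eps=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.noise_std = noise_std
        self.eps = eps

    def call(self, x, training=False):
        if not training:
            return x
        # Channel-wise mean/std over the spatial dimensions (NHWC tensors assumed).
        mean = tf.reduce_mean(x, axis=[1, 2], keepdims=True)
        std = tf.math.reduce_std(x, axis=[1, 2], keepdims=True) + self.eps
        # Perturb the statistics with Gaussian noise, then re-normalize the features.
        new_mean = mean * (1.0 + tf.random.normal(tf.shape(mean), stddev=self.noise_std))
        new_std = std * (1.0 + tf.random.normal(tf.shape(std), stddev=self.noise_std))
        return (x - mean) / std * new_std + new_mean
```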

Creating a virtual environment with venv

python3 -m venv fd_env

source fd_env/bin/activate

pip install -r requirements.txt

The code has been tested with python==3.8.10.

Note: to run the FedDyn and FedSAM algorithms, tf == 2.11.0 or above is needed, and it has to be installed manually. We did not include that version in the requirements by default because it still has memory leaks, as we pointed out in a reported issue.

Creating partitioned CIFAR100

Before running fed_resnet18.py, the partitioned CIFAR100 dataset must be generated by executing dirichlet_partition.py. The script will create a cifar100_alpha folder inside the current directory. This directory will contain a folder for each client with its examples.

When possible, dirichlet_partition.py will create disjoint datasets for clients.
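Assuming the scripts need no additional command-line arguments (check the files for any configurable paths), the two steps are:

python dirichlet_partition.py

python fed_resnet18.py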

Logging

fed_resnet18.py produces TensorBoard logging files with the global model's test accuracy.
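To inspect them, point TensorBoard at the directory where the logs are written (the exact path is defined inside fed_resnet18.py):

tensorboard --logdir <path-to-log-dir>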

References

[0] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. “Communication-Efficient Learning of Deep Networks from Decentralized Data”. In: International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2017.

[1] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith. “Federated Optimization in Heterogeneous Networks”. arXiv preprint arXiv:1812.06127, 2018.

[2] D. Yao, W. Pan, Y. Dai, Y. Wan, X. Ding, H. Jin, Z. Xu, and L. Sun. “Local-Global Knowledge Distillation in Heterogeneous Federated Learning with Non-IID Data”. arXiv preprint arXiv:2107.00051, 2021.

[3] G. Lee, M. Jeong, Y. Shin, S. Bae, and S.-Y. Yun. “Preservation of the Global Knowledge by Not-True Distillation in Federated Learning”. In: Advances in Neural Information Processing Systems. 2022.

[4] J. Kim, G. Kim, and B. Han. “Multi-Level Branched Regularization for Federated Learning”. In: International Conference on Machine Learning. PMLR, 2022, pp. 11058–11073.

[5] D. A. E. Acar, Y. Zhao, R. M. Navarro, M. Mattina, P. N. Whatmough, and V. Saligrama. “Federated learning based on dynamic regularization”. arXiv preprint arXiv:2111.04263, 2021.

[6] Q. Li, B. He, and D. Song. “Model-Contrastive Federated Learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 10713–10722.

[7] T.-M. H. Hsu, H. Qi, and M. Brown. “Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification”. arXiv preprint arXiv:1909.06335, 2019.

[8] S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan. “Adaptive Federated Optimization”. arXiv preprint arXiv:2003.00295, 2020.

[9] D. Caldarola, B. Caputo, and M. Ciccone. “Improving Generalization in Federated Learning by Seeking Flat Minima”. In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXIII. Springer, 2022, pp. 654–672.

[10] T. Zhou and E. Konukoglu. “FedFA: Federated Feature Augmentation”. arXiv preprint arXiv:2301.12995, 2023.

[11] W. Chen, K. Bhardwaj, and R. Marculescu. “FedMAX: Mitigating Activation Divergence for Accurate and Communication-Efficient Federated Learning”. In: Proc. of Machine Learning and Knowledge Discovery in Databases: European Conference. Springer, 2021, pp. 348–363.

[12] J. Zhang, Z. Li, B. Li, J. Xu, S. Wu, S. Ding, and C. Wu. “Federated Learning with Label Distribution Skew via Logits Calibration”. In: Proc. of International Conference on Machine Learning. PMLR, 2022, pp. 26311–26329.

[13] X. Ran, L. Ge, and L. Zhong. “Dynamic Margin for Federated Learning with Imbalanced Data”. In: 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8.
