This repository contains TensorFlow code to simulate different FL algorithms designed to limit the accuracy degradation introduced by non-IID data distributions.
List of implemented algorithms in this repo:
- FedAvg [0]
- FedProx [1]
- FedGKD [2]
- FedNTD [3]
- FedMLB [4]
- FedDyn [5]
- MOON [6]
- FedAvgM [7]
- FedAdam [8]
- FedSAM [9]
- FedFA [10]
- FedMAX [11]
- FedLC/DMFL [12, 13]
Default hyperparameters.
Similar to [4, 5], the following defaults are used in all the experiments:
- Local optimizer: SGD with the learning rate fixed to 0.1; the global (server) learning rate is set to 1.0, except for FedAdam, which uses 0.01 for both the local and the global learning rate.
- Local epochs are fixed to 5, and a random fraction of 0.05 (5%) of the clients is selected per round.
- Weight decay with a factor of 0.001 is applied to limit local overfitting.
- The local batch size is determined so that each client performs 50 local updates.
- Gradient clipping is performed to stabilize local training.
- The local learning rate is exponentially decayed with a factor of 0.998, similar to the work in [4, 5].
- The model architecture is ResNet-18, with the batch normalization layers replaced by group normalization.
- Random rotation, horizontal flip and random crop are used as preprocessing layers.
- For a fair comparison, seeds are used to select the random clients at each round, to perform local data preprocessing, and to initialize client models.
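As a rough illustration (not the repository's actual code), these defaults translate into a local optimizer configured roughly as follows; the constant names and the clipping threshold are assumptions:

```python
import tensorflow as tf

# Hedged sketch of the default local training setup described above.
# Constant names and the clipping threshold are assumptions, not the repo's code.

CLIENT_FRACTION = 0.05   # 5% of clients sampled per round
LOCAL_EPOCHS = 5
LOCAL_UPDATES = 50       # the local batch size is derived from this
WEIGHT_DECAY = 1e-3      # applied to limit local overfitting
GLOBAL_LR = 1.0          # server learning rate (0.01 for FedAdam)

def local_optimizer(round_idx, base_lr=0.1, decay=0.998, clip_norm=10.0):
    """SGD with per-round exponential learning-rate decay and gradient clipping."""
    lr = base_lr * (decay ** round_idx)
    return tf.keras.optimizers.SGD(learning_rate=lr, clipnorm=clip_norm)
```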
Algorithm-specific hyperparameters.
- For FedProx we tuned $\mu$ in {0.01, 0.001}. $\mu$ controls the weight of the proximal term in the local objective function (a minimal sketch of this term follows the list).
- For FedGKD we set $\gamma$ to 0.2, as in the original paper. $\gamma$ controls the weight of the KD-based term in the local objective function.
- For FedNTD we selected $\lambda$ in {0.3, 1.0}.
- For FedMLB, $\lambda_1$ and $\lambda_2$ are both set to 1; they weight the impact of the hybrid cross-entropy loss and the KD-based loss. Five blocks are considered, formed as in the original paper from conv1, conv2_x, conv3_x, conv4_x, conv5_x and the fully connected layer.
- For FedAvgM we selected the momentum parameter among {0.4, 0.6, 0.8, 0.9}.
- For FedAdam we set $\tau$ (a constant for numerical stability) equal to 0.001.
- For FedDyn we set $\alpha$ equal to 0.1, as in the original paper.
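As a point of reference, the proximal term weighted by $\mu$ in FedProx can be written as in the minimal sketch below; the function and variable names are illustrative assumptions, not the repository's code:

```python
import tensorflow as tf

def proximal_term(local_model, global_weights, mu=0.01):
    """FedProx-style penalty: (mu / 2) * ||w_local - w_global||^2."""
    penalty = 0.0
    for w_local, w_global in zip(local_model.trainable_variables, global_weights):
        penalty += tf.reduce_sum(tf.square(w_local - w_global))
    return 0.5 * mu * penalty

# Inside a client's training step, the total loss would then be roughly:
#   loss = cross_entropy(y, model(x)) + proximal_term(model, global_weights, mu)
```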
CIFAR-100 and ResNet-18.
The CIFAR-100 dataset is partitioned following the paper “Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification” [7]: a Dirichlet distribution is used to decide the per-client label distribution. A concentration parameter (alpha in the code) controls how identical the clients are. Very high values (e.g., > 100.0) imply an essentially identical distribution of labels among clients, lower values (e.g., 1.0) produce a very different number of examples per label across clients, and for very low values (e.g., 0.1) nearly all of a client's examples belong to a single class.
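The effect of the concentration parameter can be previewed with a short NumPy sketch (illustrative only; dirichlet_partition.py may differ in the details):

```python
import numpy as np

def client_label_proportions(num_clients, num_classes, alpha, seed=0):
    """Draw one label distribution per client from a symmetric Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    # Each row sums to 1: the fraction of a client's data in each class.
    return rng.dirichlet(alpha * np.ones(num_classes), size=num_clients)

# alpha > 100  -> rows are nearly uniform (IID-like clients)
# alpha ~ 1.0  -> very different per-class amounts across clients
# alpha ~ 0.1  -> each row is dominated by a single class
print(client_label_proportions(num_clients=3, num_classes=10, alpha=0.1))
```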
fed_resnet18.py contains the simulation code for all the algorithms. Hyperparameters can be chosen by manually modifying the hp dictionary; a simulation is run for each combination of hyperparameters, as sketched below.
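Conceptually, the grid behaves like the following sketch; the keys and values shown are hypothetical, not the actual contents of hp:

```python
import itertools

# Hypothetical hyperparameter grid; the real keys in fed_resnet18.py may differ.
hp = {
    "algorithm": ["fedavg", "fedprox"],
    "alpha": [0.1, 1.0],       # Dirichlet concentration of the data partition
    "mu": [0.01, 0.001],       # only relevant for FedProx
}

# One simulation per combination of values.
for values in itertools.product(*hp.values()):
    config = dict(zip(hp.keys(), values))
    print("running simulation with", config)
```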
The client-side algorithms (FedProx, FedGKD, FedNTD, FedMLB, MOON, FedDyn, etc.) are implemented by subclassing the tf.keras.Model class and overriding the train_step and test_step methods.
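The pattern is roughly the following skeleton (a hedged sketch, not the repository's exact classes; the backbone argument and the algorithm_loss hook are assumptions):

```python
import tensorflow as tf

class ClientModel(tf.keras.Model):
    """Hypothetical skeleton: algorithm-specific losses live in train_step."""

    def __init__(self, backbone, **kwargs):
        super().__init__(**kwargs)
        self.backbone = backbone  # e.g., a ResNet-18 with group normalization

    def call(self, inputs, training=False):
        return self.backbone(inputs, training=training)

    def algorithm_loss(self, x, y, y_pred):
        # Overridden per algorithm: proximal penalty (FedProx), KD terms (FedGKD,
        # FedNTD, FedMLB), contrastive term (MOON), dynamic regularizer (FedDyn), ...
        return 0.0

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred) + self.algorithm_loss(x, y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        x, y = data
        y_pred = self(x, training=False)
        self.compiled_loss(y, y_pred)
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```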
For FedSAM we employ the TensorFlow implementation of SAM.
For FedFA we defined a custom layer, FFALayer, by subclassing the keras.layers.Layer class.
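The custom-layer pattern itself looks like the skeleton below; it only shows the subclassing mechanics and does not reproduce FedFA's actual feature augmentation:

```python
import tensorflow as tf

class FeaturePerturbationLayer(tf.keras.layers.Layer):
    """Generic custom-layer skeleton. The real FFALayer implements FedFA's
    feature statistics augmentation, which is not reproduced here."""

    def __init__(self, stddev=0.1, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, inputs, training=False):
        if training:
            # Perturb intermediate features only during local training.
            return inputs + tf.random.normal(tf.shape(inputs), stddev=self.stddev)
        return inputs
```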
Environment setup.
python3 -m venv fd_env
source fd_env/bin/activate
pip install -r requirements.txt
The code has been tested with python==3.8.10.
Note: to run the FedDyn and FedSAM algorithms, tf==2.11.0 or above is needed, but the upgrade has to be done manually. We did not include that version in the requirements by default because it still has memory leaks, as pointed out here.
Before running fed_resnet18.py, the partitioned CIFAR-100 dataset must be generated by executing dirichlet_partition.py. The script creates a cifar100_alpha folder inside the current directory, containing one folder per client with that client's examples. When possible, dirichlet_partition.py will create disjoint datasets for the clients.
fed_resnet18.py produces TensorBoard logging files with the global model's test accuracy.
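For reference, writing such logs boils down to something like the following sketch (the path and tag names are assumptions):

```python
import tensorflow as tf

# Hypothetical sketch of how the per-round test accuracy ends up in TensorBoard.
writer = tf.summary.create_file_writer("logs/example_run")
for round_idx, accuracy in enumerate([0.10, 0.15, 0.22]):  # dummy values
    with writer.as_default():
        tf.summary.scalar("global_test_accuracy", accuracy, step=round_idx)
```

The accuracy curves can then be inspected with tensorboard --logdir pointed at the log directory.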
[0] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas. “Communication-Efficient Learning of Deep Networks from Decentralized Data”. In: Artificial Intelligence and Statistics (AISTATS). PMLR, 2017, pp. 1273–1282.
[1] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith. “Federated Optimization in Heterogeneous Networks”. arXiv preprint arXiv:1812.06127, 2018.
[2] D. Yao, W. Pan, Y. Dai, Y. Wan, X. Ding, H. Jin, Z. Xu, and L. Sun. “Local-Global Knowledge Distillation in Heterogeneous Federated Learning with Non-IID Data”. arXiv preprint arXiv:2107.00051, 2021.
[3] G. Lee, M. Jeong, Y. Shin, S. Bae, and S.-Y. Yun. “Preservation of the Global Knowledge by Not-True Distillation in Federated Learning”. In: Advances in Neural Information Processing Systems. 2022.
[4] J. Kim, G. Kim, and B. Han. “Multi-Level Branched Regularization for Federated Learning”. In: International Conference on Machine Learning. PMLR, 2022, pp. 11058–11073.
[5] D. A. E. Acar, Y. Zhao, R. M. Navarro, M. Mattina, P. N. Whatmough, and V. Saligrama. “Federated Learning Based on Dynamic Regularization”. arXiv preprint arXiv:2111.04263, 2021.
[6] Q. Li, B. He, and D. Song. “Model-Contrastive Federated Learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 10713–10722.
[7] T.-M. H. Hsu, H. Qi, and M. Brown. “Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification”. arXiv preprint arXiv:1909.06335, 2019.
[8] S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan. “Adaptive Federated Optimization”. arXiv preprint arXiv:2003.00295, 2020.
[9] D. Caldarola, B. Caputo, and M. Ciccone. “Improving Generalization in Federated Learning by Seeking Flat Minima”. In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXIII. Springer, 2022, pp. 654–672.
[10] T. Zhou and E. Konukoglu. “FedFA: Federated Feature Augmentation”. arXiv preprint arXiv:2301.12995, 2023.
[11] W. Chen, K. Bhardwaj, and R. Marculescu. “FedMAX: Mitigating Activation Divergence for Accurate and Communication-Efficient Federated Learning”. In: Machine Learning and Knowledge Discovery in Databases: European Conference. Springer, 2021, pp. 348–363.
[12] J. Zhang, Z. Li, B. Li, J. Xu, S. Wu, S. Ding, and C. Wu. “Federated Learning with Label Distribution Skew via Logits Calibration”. In: International Conference on Machine Learning. PMLR, 2022, pp. 26311–26329.
[13] X. Ran, L. Ge, and L. Zhong. “Dynamic Margin for Federated Learning with Imbalanced Data”. In: 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8.