- A Federated Learning implementation based on the FLASH framework, designed to simulate heterogeneity-aware federated learning.
- An updated version of FLASH can be found HERE.
git clone https://github.com/ahmedcs/mdd.git
pip3 install -r requirements.txt
# download data and modify the code if needed; refer to the Dataset section for more details
git clone https://github.com/ahmedcs/mdd.git
cd ibex
bash create-conda-env.sh
# download data and modify the code if needed; refer to the Dataset section for more details
For details on the experimental results, please refer to our paper.
The experimental configs are stored in a per-dataset folder under models/exp_config.
To run your own experiment, modify models/exp_config/${dataset}/default.cfg and then run python main.py --config ${path_to_config}.
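For example, assuming the femnist default config sits at the location described above (adjust the path to wherever you run main.py from):
python main.py --config models/exp_config/femnist/default.cfg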
In brief, we develop MDD to experiment with Model Discovery and Distillation within the federated learning simulation environment.
We add a deadline setting to simulate failed downloading/uploading and timed-out training. The deadline follows a normal distribution in each round, and every client shares the same deadline within a round. You can set the distribution's parameters (μ and σ) in the config file.
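A minimal sketch of how such a per-round deadline could be drawn, assuming μ and σ come from the round_ddl config entry; the function name and the clipping at zero are illustrative, not the framework's actual API:

```python
import numpy as np

# Illustrative only: draw one deadline per round from N(mu, sigma); all clients
# selected in that round share it. mu and sigma correspond to round_ddl.
def sample_round_deadline(mu=270.0, sigma=0.0, rng=None):
    rng = rng or np.random.default_rng()
    return max(0.0, rng.normal(mu, sigma))  # keep the deadline non-negative
```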
Each client is bundled with a device type. Each device type has different training and network speeds. We also support a self-defined device type (-1), whose parameters you can set manually in the code for more complex simulations. Note that if a client's device type is not specified (i.e. None), the program uses the real training time instead of the simulated time, which is not recommended.
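As a rough, hypothetical illustration of how per-device training and network speeds could combine into a simulated per-round client time (the field names below are made up for the example, not the framework's actual parameters):

```python
# Illustrative only: compose a simulated client time from per-device parameters.
def simulated_client_time(num_samples, model_size_mb, device):
    train_time = num_samples * device["seconds_per_sample"]  # on-device training
    transfer_mb = 2 * model_size_mb                           # download + upload
    comm_time = transfer_mb * 8 / device["network_mbps"]      # MB -> Mbit, then Mbit/s
    return train_time + comm_time
```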
The source code for measuring the on-device training time is available in the android directory.
Each client is bundled with a timer, and each timer is bound to one trace. The timer derives the client's available time according to Google's definition. FLASH will run in ideal mode if the trace file is not found or behav_hete is set to False.
- Data in each client is non-i.i.d.
- You can set max_sample to control the maximum number of samples in each client (see the sketch after this list).
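A minimal sketch of what capping each client's local data at max_sample could look like; this is illustrative, not the framework's actual sampling code:

```python
import random

# Illustrative only: keep at most max_sample examples per client.
def cap_samples(client_data, max_sample=340):
    if len(client_data) <= max_sample:
        return client_data
    return random.sample(client_data, max_sample)
```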
In federated settings, if not enough devices upload their results in a round, the round is regarded as failed and the global model is not updated. To simulate this, we add an update_frac parameter: if the uploaded fraction is smaller than update_frac, the round fails. You can also set it in the config file.
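Conceptually the check is as simple as the following sketch (illustrative code, not the framework's actual implementation):

```python
# Illustrative only: a round succeeds when the fraction of selected clients
# that uploaded successfully is at least update_frac; otherwise the global
# model is left unchanged for that round.
def round_succeeded(num_uploaded, num_selected, update_frac=0.8):
    return num_selected > 0 and num_uploaded / num_selected >= update_frac
```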
To simplify the command line, we move most parameters into a config file and add the extra simulation parameters described above. Here are the details.
# line started with # (commented) will be ignored
behav_hete True
# bool, whether to simulate behavior heterogeneity
hard_hete True
# bool, whether to simulate hardware heterogeneity, which includes different on-device training times and network speeds
no_training False
# bool, whether to run in no_training mode, skip training process if True
real_world False
# bool, whether to run a real-world DL dataset
dataset femnist
# dataset to use
model cnn
# file that defines the DNN model
num_rounds 500
# number of FL rounds to run
learning_rate 0.01
# learning-rate of DNN
eval_every 5
# evaluate every this many rounds; -1 to disable evaluation
clients_per_round 100
# expected number of clients in each round
min_selected 60
# minimum number of selected clients in each round; the round fails if not satisfied
max_sample 340
# maximum number of samples to use in each selected client
batch_size 10
# batch-size for training
num_epochs 5
# number of epochs in each client in each round
seed 0
# basic random seed
round_ddl 270 0
# μ and σ for deadline, which follows a normal distribution
update_frac 0.8
# min update fraction in each round; the round succeeds only when the fraction of clients that succeeded is at least this value
max_client_num -1
#
# NOTE! [aggregate_algorithm, fedprox*, structure_k, qffl*] are mutually exclusive
aggregate_algorithm SucFedAvg
## choose from [SucFedAvg, FedAvg]; please refer to models/server.py for more details
# compress_algo grad_drop
## gradient compression algorithm, choose from [grad_drop, sign_sgd]; not used if commented
fedprox True
fedprox_mu 0.5
fedprox_active_frac 0.8
## whether to apply FedProx and its parameters; please refer to the SysML'20 paper for more details
# structure_k 100
## the k for structured updates; not used if commented, please refer to the arXiv paper for more details
# qffl True
# qffl_q 5
## whether to apply q-FFL and its parameters; please refer to the ICLR'20 paper for more details
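For reference, here is a minimal sketch of parsing this whitespace-separated format, assuming one key followed by one or more values per line and lines starting with # being ignored (illustrative only, not the framework's actual config loader):

```python
# Illustrative only: parse "key value [value ...]" lines, skipping comments.
def parse_config(path):
    cfg = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, *values = line.split()
            cfg[key] = values[0] if len(values) == 1 else values
    return cfg
```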
- Overview: Image Dataset
- Details: 62 different classes (10 digits, 26 lowercase, 26 uppercase), images are 28 by 28 pixels (with an option to make them all 128 by 128 pixels), 3,500 users
- Task: Image Classification
- Overview: Image Dataset based on the Large-scale CelebFaces Attributes Dataset
- Details: 9,343 users (we exclude celebrities with fewer than 5 images)
- Task: Image Classification (Smiling vs. Not smiling)
- Overview: We preprocess the Reddit data released by pushshift.io corresponding to December 2017.
- Details: 1,660,820 users with a total of 56,587,343 comments.
- Task: Next-word Prediction.
- You can download the user behavior trace data here.
- Modify the file path in models/client.py, e.g.:
with open('/path/to/user_behavior_trace.json', 'r', encoding='utf-8') as f:
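    behavior_trace = json.load(f)  # assumed continuation for illustration; needs `import json`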
- The trace tracks the device's meta information and its status changes, including battery charge status, battery level, network environment, screen lock status, and screen on and off. (See more details in our manuscript.)
The code we used to measure the on-device training time is in the OnDeviceTraining directory; please refer to its documentation for more details.
Running the experiment on IBEX
bash ibex/submit_exp.sh 10:59:59 exp_config/uncompleted_runs shakespeare "" 0
Generating the missing runs from WANDB
python regenerate_uncompleted_runs.py 0 490 17750 femnist celeba shakespeare reddit sent140
- Install the libraries listed in requirements.txt, e.g. with pip:
pip3 install -r requirements.txt
- Go to the directory of the respective dataset, data/$DATASET, for instructions on generating the data.
Please consider citing our paper if you use the code or data in your research project.