Jeddak is a platform for privacy computing and federated learning, aimed at both academia and industry.
This is a lite version of Jeddak intended for competitions. Three guides are provided below, covering deployment, development, and usage.

Jeddak provides two deployment modes, standalone and cluster. Standalone mode is for quick experimental verification of new algorithms on a single host, and cluster mode supports production use across multiple hosts. Note that the competition is run in cluster mode.
Refer to doc/guide/quickstart.md for the deployment guide.

Jeddak provides standardized interfaces for developing your own federated learning and privacy-preserving algorithms.
Refer to doc/guide/develop_guide.md for more details.

Jeddak provides a set of ready-made privacy-preserving algorithms, described in the following table. This lite version includes only a limited number of them, mainly for demonstration. Their configurations can be found at example/conf/.

Algorithm Name | Classification | Description |
---|---|---|
data_loader | Preprocessing | Read data from various data sources |
data_saver | Postprocessing | Save data to disk in various data structures |
aligner | Preprocessing | Seek the intersection of the private sets held by multiple parties in a privacy-preserving fashion |
glm | Federated Learning | A set of generalized linear models, including linear regression, logistic regression and Poisson regression |
dpgbdt | Federated Learning | Differentially Private Gradient Boosting Decision Tree |
neural_network | Federated Learning | Deep Neural Network |
evaluate | Postprocessing | Evaluate a federated learning model |
model_loader | Postprocessing | Load model from local file / unload model from memory |
predict_offline | Postprocessing | Offline prediction through specified model |

Parameters for data_loader:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "data_loader" | "data_loader" | task type |
task_role | str | {"guest", "host", "sole", "slack"} | "guest" | task role. "guest/host" means party's role in a task. "sole" means only this party carries out the task. "slack" means the party does nothing in the task |
input_data_source | str | {"csv", "hdfs"} | "csv" | type of input data source. "csv" means local files and "hdfs" means a file path of Hadoop HDFS. |
input_data_path | str | any string | N/A | path of the input data file, which must be valid and readable |
train_data_path | str | any string | N/A | path of the training data file, which must be valid and readable; if not given, the data is read from input_data_path |
validate_data_path | str | any string | N/A | path of the validation data file, which must be valid and readable |
convert_sparse_to_index | bool | {true, false} | true | convert sparse features to natural numbers if true |
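
As a rough illustration of how these parameters fit together, the sketch below builds a data_loader configuration as a Python dict and checks it against the ranges in the table. The helper name `make_data_loader_conf`, the flat dict layout, and the path `data/guest.csv` are assumptions made for illustration; the actual file layout used under example/conf/ may differ. The fallback for `train_data_path` follows one reading of the table (use `input_data_path` when no training path is given).

```python
import json
from typing import Optional

# Allowed values copied from the data_loader parameter table.
TASK_ROLES = {"guest", "host", "sole", "slack"}
DATA_SOURCES = {"csv", "hdfs"}


def make_data_loader_conf(
    input_data_path: str,
    task_role: str = "guest",
    input_data_source: str = "csv",
    train_data_path: Optional[str] = None,
    validate_data_path: Optional[str] = None,
    convert_sparse_to_index: bool = True,
) -> dict:
    """Hypothetical helper: build and validate a data_loader config dict."""
    if task_role not in TASK_ROLES:
        raise ValueError(f"task_role must be one of {sorted(TASK_ROLES)}")
    if input_data_source not in DATA_SOURCES:
        raise ValueError(f"input_data_source must be one of {sorted(DATA_SOURCES)}")
    conf = {
        "task_type": "data_loader",
        "task_role": task_role,
        "input_data_source": input_data_source,
        "input_data_path": input_data_path,
        # Assumed reading of the table: fall back to input_data_path.
        "train_data_path": train_data_path or input_data_path,
        "convert_sparse_to_index": convert_sparse_to_index,
    }
    if validate_data_path is not None:
        conf["validate_data_path"] = validate_data_path
    return conf


# "data/guest.csv" is a placeholder path, not a file shipped with Jeddak.
conf = make_data_loader_conf("data/guest.csv")
print(json.dumps(conf, indent=2))
```

Validating values up front like this catches a misspelled role or data source before a task is submitted.
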

Parameters for data_saver:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "data_saver" | "data_saver" | task type |
task_role | str | {"guest", "host", "sole", "slack"} | "guest" | task role |
output_data_source | str | {"csv"} | "csv" | type of output data source |

Parameters for aligner:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "aligner" | "aligner" | task type |
task_role | str | {"guest", "host"} | "guest" | task role |
align_mode | str | {"diffie_hellman", "cm20", "dh_PSI", "tee"} | "cm20" | PSI (private set intersection) protocol type |
output_id_only | bool | {true, false} | true | output only id of each element in the intersection set |
sync_intersection | bool | {true, false} | true | synchronizing the intersection set among all parties |
key_size | int | {1024, 2048, 3072, 4096} | 1024 | cryptographic key length (in bits) |
batch_num | int or str | {"auto"}, [1, inf) | "auto" | batch number for PSI in "cm20" mode; an integer value is rounded up to a power of 2 |
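
To make two of these rows more concrete, here is a minimal sketch of (a) rounding batch_num up to a power of two and (b) a textbook Diffie-Hellman-style private set intersection. This is not Jeddak's implementation: both parties run in one process, the modulus is a toy value, and all function names are hypothetical. It only shows why double blinding lets the parties compare IDs without exchanging them in the clear.

```python
import hashlib
import secrets

# Toy modulus for illustration only (a Mersenne prime). A real deployment
# would use a standardized group sized according to key_size.
P = 2**127 - 1


def round_up_to_power_of_two(batch_num: int) -> int:
    """Round a positive integer up to the nearest power of two."""
    if batch_num < 1:
        raise ValueError("batch_num must be >= 1")
    return 1 << (batch_num - 1).bit_length()


def hash_to_group(item: str) -> int:
    """Map an ID to a residue in [1, P-1] via SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values, key: int):
    """Raise each group element to a secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def dh_psi(guest_ids, host_ids):
    """Simulate both parties locally and return the guest's view of the intersection."""
    guest_ids = list(guest_ids)
    a = secrets.randbelow(P - 2) + 1  # guest's secret exponent
    b = secrets.randbelow(P - 2) + 1  # host's secret exponent

    # Guest -> host: H(x)^a. Host -> guest: (H(x)^a)^b, in the same order.
    guest_double = blind(blind([hash_to_group(x) for x in guest_ids], a), b)
    # Host -> guest: H(y)^b. The guest then computes (H(y)^b)^a locally.
    host_double = set(blind(blind([hash_to_group(y) for y in host_ids], b), a))

    # Exponentiation commutes, so equal IDs yield equal double-blinded values.
    return {x for x, d in zip(guest_ids, guest_double) if d in host_double}


print(round_up_to_power_of_two(5))
print(sorted(dh_psi(["alice", "bob", "carol"], ["bob", "carol", "dave"])))
```

In terms of the table, output_id_only and sync_intersection would govern what is returned and whether the host also learns the result; this sketch only computes the guest's side.
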

Parameters for glm:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | {"linear_regression", "logistic_regression", "poisson_regression"} | N/A | task type |
task_role | str | {"guest", "host", "server", "client"} | "guest" | task role |
penalty | str | {"l1", "l2", null} | "l2" | penalty term |
tol | float | [0, inf) | 1e-4 | tolerance for stopping criteria |
C | float | (0, inf) | 1.0 | inverse of regularization strength |
fit_intercept | bool | {true, false} | true | whether to fit an intercept (bias) term |
intercept_scaling | float | [0, inf) | 1.0 | x becomes [x, self.intercept_scaling] if fit_intercept is true |
solver | str | {"gradient_descent", "AdaGram", "AdaDelta", "RMSprop"} | "gradient_descent" | optimization method |
max_iter | int | [1, inf) | 100 | maximum iteration rounds |
learning_rate | float | (0, inf) | 0.15 | learning step size |
homomorphism | str | {"cpaillier"} | "cpaillier" | homomorphic encryption method |
key_size | int | [1, inf) | 1024 | homomorphic encryption key size |
gamma | float | (0, inf) | 0.9 | adjust the sum of past squared gradients |
epsilon | float | (0, inf) | 1e-8 | smooth gradient and avoid division by zero |
batch_fraction | float | (0, 1] | 0.1 | the subset fraction of mini-batch training |
batch_type | str | {"batch", "mini-batch"} | "batch" | batch method |
balanced_class_weight | bool | {true, false} | true | automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)), auto-disabled for continuous labels |
train_validate_freq | int | [1, inf) | None | run validation on the validation data every train_validate_freq epochs; no validation during training if None |
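
The balanced_class_weight formula can be checked by hand. The sketch below computes n_samples / (n_classes * count) per class in plain Python, using `collections.Counter` in place of `np.bincount`; the function name is hypothetical and the code is independent of Jeddak's own implementation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Three samples of class 0 and one of class 1.
y = [0, 0, 0, 1]
weights = balanced_class_weights(y)
# The rarer class 1 receives the larger weight: 4 / (2 * 1) = 2.0,
# while class 0 receives 4 / (2 * 3), roughly 0.67.
print(weights)
```

With these weights each class contributes equally to the loss in total, which is why the option is disabled for continuous labels, where class frequencies are not defined.
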

Parameters for dpgbdt:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "dpgbdt" | "dpgbdt" | task type |
task_role | str | {"guest", "host"} | "guest" | task role |
objective | str | {"reg_squarederror", "binary_logistic", "count_poisson"} | "binary_logistic" | learning objective |
num_round | int | [1, inf) | 20 | the number of boosting rounds |
eta | float | (0, inf) | 0.3 | learning rate |
gamma | float | [0, inf) | 0.0 | minimum loss reduction required to make a further partition on a leaf node of the tree |
max_depth | int | [1, inf) | 3 | maximum depth of a tree |
min_child_weight | float | [0, inf) | 1.0 | minimum sum of instance weight (hessian) needed in a child |
max_delta_step | float | [0, inf) | 0.0 | maximum delta step we allow each leaf output to be |
sub_sample | float | (0, 1] | 1.0 | subsample ratio of the training instances at each boosting iteration |
lam | float | [0, inf) | 1.0 | L2 regularization strength |
sketch_eps | float | (0, 1) | 0.03 | convert every column into 1 / sketch_eps number of bins at most |
homomorphism | str | {"cpaillier"} | "cpaillier" | homomorphic encryption method |
key_size | int | [1, inf) | 1024 | homomorphic encryption key size |
importance_type | str | {"weight", "gain", "cover", "total_gain", "total_cover", "all"} | "weight" | feature importance type |
train_validate_freq | int | [1, inf) | None | run validation on the validation data every train_validate_freq trees; no validation during training if None |
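
Several of these parameters (eta, gamma, lam, min_child_weight, sketch_eps) share names with common gradient-boosting conventions. Assuming dpgbdt follows the usual second-order formulas, which the table does not confirm, the sketch below shows how they would enter the bin count, the split gain, and the leaf weight. Differential-privacy noise and homomorphic encryption are deliberately left out, and all function names are hypothetical.

```python
import math


def max_bins(sketch_eps: float) -> int:
    """Upper bound on bins per column: at most 1 / sketch_eps."""
    if not 0.0 < sketch_eps < 1.0:
        raise ValueError("sketch_eps must be in (0, 1)")
    return math.floor(1.0 / sketch_eps)


def split_gain(gl, hl, gr, hr, lam=1.0, gamma=0.0):
    """Loss reduction of a split, given gradient/hessian sums of both children."""

    def score(g, h):
        return g * g / (h + lam)

    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


def split_allowed(hl, hr, gain, min_child_weight=1.0):
    """Keep a split only if both children are heavy enough and the gain is positive."""
    return hl >= min_child_weight and hr >= min_child_weight and gain > 0.0


def leaf_weight(grad_sum, hess_sum, lam=1.0, eta=0.3):
    """Leaf output, shrunk by the learning rate eta."""
    return -eta * grad_sum / (hess_sum + lam)


print(max_bins(0.03))                      # default sketch_eps
gain = split_gain(-2.0, 1.0, 2.0, 1.0)     # children with opposite gradients
print(gain, split_allowed(1.0, 1.0, gain))
print(leaf_weight(-2.0, 1.0))
```

Under this reading, a larger gamma or lam makes splits harder to justify, and a smaller sketch_eps gives finer bins at a higher cost.
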

Parameters for neural_network:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "neural_network" | "neural_network" | task type |
task_role | str | {"guest", "host"} | "guest" | task role |
backend | str | {"keras", "pytorch"} | None | backend framework of deep learning |
format | str | {"file", "conf"} | None | input format of top/mid/bottom model to be loaded |
btm | str | Any | None | keras model config json string or model file path |
mid | str | Any | None | keras model config json string or model file path |
top | str | Any | None | keras model config json string or model file path |
epochs | int | [1, inf) | 1 | epochs of training |
batch_size | int | [1, inf) | 1 | batch size of training |
loss_fn | str | {"CrossEntropyLoss", "MSELoss", ...} | None | loss function of top model |
learning_rate | float | [0, inf) | 0.001 | learning rate of training |
optimizer | str | {"SGD", "Adam", ...} | None | optimizer of top/bottom model |
use_mid | bool | {true, false} | true | whether to use a mid model for vertical-nn; if false, only the top/bottom models are used |
mid_shape_in | int | [1, inf) | 1 | input shape of mid model, the same as output shape of host bottom model |
mid_shape_out | int | [1, inf) | 1 | output shape of mid model, equal to the input shape of the guest top model minus the output shape of the guest btm model |
mid_activation | str | {"linear", "Relu", ...} | "linear" | activation function of mid model |
privacy_mode | str | {"plain"} | "plain" | encryption mechanism of interaction between multiple parties |
metrics | str | {"accuracy", "", ...} | None | output metrics for model evaluation |
predict_model | str | {"categorical", "", ...} | None | set to "categorical" only if the top model is a classification model and the prediction value should be transformed to a categorical vector |
num_classes | int | [1, inf) | None | number of categories; only needed when the top model is a classification model and predict_model is "categorical" |
client_frac | float | (0.0, 1.0] | 1.0 | (cluster-server mode) the fraction of clients selected to update the global model |
model_conf | str | Any | None | keras model config json string or model file path |
train_validate_freq | int | [1, inf) | None | run validation on the validation data every train_validate_freq epochs; no validation during training if None |
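
The relationship between mid_shape_in, mid_shape_out, and the bottom/top model shapes is easy to get wrong, so a small consistency check may help. The sketch below derives both values from the three layer sizes named in the table; the function name and the example sizes are hypothetical.

```python
def mid_shapes(host_btm_out: int, guest_btm_out: int, top_in: int) -> dict:
    """Derive the mid model's shapes from the bottom and top model sizes.

    mid_shape_in  = output size of the host bottom model
    mid_shape_out = input size of the guest top model
                    minus output size of the guest bottom model
    """
    mid_out = top_in - guest_btm_out
    if host_btm_out < 1 or mid_out < 1:
        raise ValueError("shapes must be >= 1; check the top/bottom model sizes")
    return {"mid_shape_in": host_btm_out, "mid_shape_out": mid_out}


# Example sizes (made up): host bottom emits 16 features, guest bottom emits 8,
# and the guest top model expects 12 inputs in total.
shapes = mid_shapes(host_btm_out=16, guest_btm_out=8, top_in=12)
print(shapes)
```

Here the top model receives the guest's 8 features concatenated with 4 features from the mid model, which is why mid_shape_out is 4.
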

Parameters for evaluate:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "evaluate" | "evaluate" | task type |
task_role | str | {"guest", "host"} | "guest" | task role |

Parameters for model_loader:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "model_loader" | "model_loader" | task type |
task_role | str | {"guest", "host"} | "guest" | task role |
model_id | str | {model_id} | None | id of model to be loaded/unloaded |
action | str | {"load", "unload"} | None | load/unload model |

Parameters for predict_offline:

Parameter | Type | Range | Default | Description |
---|---|---|---|---|
task_type | str | "predict_offline" | "predict_offline" | task type |
task_role | str | {"guest", "host"} | "guest" | task role |
model_id | str | {model_id} | None | id of model used for prediction |
input_data_path | str | {file_path} | None | input file's path and filename |
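
As a sketch of how model_loader and predict_offline might be combined, the code below builds three task configurations: load a model, predict with it, then unload it. The ordering, the helper names, the model id `demo_model`, and the path `data/predict.csv` are assumptions for illustration; consult example/conf/ for the actual configuration layout.

```python
import json

ROLES = {"guest", "host"}
ACTIONS = {"load", "unload"}


def model_loader_conf(model_id: str, action: str, task_role: str = "guest") -> dict:
    """Hypothetical helper: config for loading or unloading a model."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    if task_role not in ROLES:
        raise ValueError(f"task_role must be one of {sorted(ROLES)}")
    return {
        "task_type": "model_loader",
        "task_role": task_role,
        "model_id": model_id,
        "action": action,
    }


def predict_offline_conf(model_id: str, input_data_path: str, task_role: str = "guest") -> dict:
    """Hypothetical helper: config for offline prediction with a given model."""
    if task_role not in ROLES:
        raise ValueError(f"task_role must be one of {sorted(ROLES)}")
    return {
        "task_type": "predict_offline",
        "task_role": task_role,
        "model_id": model_id,
        "input_data_path": input_data_path,
    }


MODEL_ID = "demo_model"  # placeholder id, not a real Jeddak model
tasks = [
    model_loader_conf(MODEL_ID, "load"),
    predict_offline_conf(MODEL_ID, "data/predict.csv"),
    model_loader_conf(MODEL_ID, "unload"),
]
for task in tasks:
    print(json.dumps(task))
```

Using the same model_id in all three tasks keeps the load, predict, and unload steps tied to one model.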