Merge pull request FederatedAI#4289 from FederatedAI/feature-1.9.0-mo…

…del-merge Feature 1.9.0 model merge
LinuxSuRen · Aug 29, 2022 · 5295347 · 5295347
2 parents 01950ac + e9bb7e2
commit 5295347
Show file tree

Hide file tree

Showing 5 changed files with 257 additions and 314 deletions.
diff --git a/doc/tutorial/README.md b/doc/tutorial/README.md
@@ -2,7 +2,7 @@
 
 Here we provide tutorials on running FATE jobs:
 
-Submiting jobs with `Pipeline` is recommended, here are some `Jupyter Notebook` with guided instructions:
+Submitting jobs with `Pipeline` is recommended, here are some `Jupyter Notebook` with guided instructions:
 
 - [Upload Data with FATE-Pipeline](pipeline/pipeline_tutorial_upload.ipynb)
 - [Train & Predict Hetero SecureBoost with FATE-Pipeline](pipeline/pipeline_tutorial_hetero_sbt.ipynb)
@@ -24,3 +24,7 @@ Models can be pushlished with FATE Serving to `Serving without FATE`:
 And for those who want to run jobs in batches, ie. run algorithm tests, try using `fate_test`:
 
 - [FATE-Test Tutorial](fate_test_tutorial.md)
+
+To merge models from different roles and export as sklearn/lighGBM or PMML format, please refer to the tutorial below:
+
+- [Guide to Merging FATE Model](./model_merge.md)
diff --git a/doc/tutorial/model_merge.md b/doc/tutorial/model_merge.md
@@ -0,0 +1,252 @@
+# Guide to Merging FATE Model
+
+## 1. Overview
+
+Trained FATE models from federated job of `host` and `guest` may be merged locally and reused in other frameworks. 
+Currently, FATE supports merging and exporting models from `HeteroLR`, `HeteroSSHELR` and `HeteroSecureBoost` into PMML or sklearn Logistic Regression Model/lightGBM objects.
+Below we walk through the general process of merging a Hetero Logistic Regression model using FATE-Flow Client command line tool.
+
+## 2. Install FATE-Flow Client
+
+`flow` command line tool is distributed along with [FATE-Client](https://pypi.org/project/fate-client/).
+Once `FATE-Client` installed, one can find a cmd enterpoint named `flow`:
+
+```commandline
+pip install fate_client
+flow --help
+```
+```
+Usage: flow [OPTIONS] COMMAND [ARGS]...
+
+  Fate Flow Client
+
+Options:
+  -h, --help  Show this message and exit.
+
+Commands:
+  checkpoint  Checkpoint Operations
+  component   Component Operations
+  data        Data Operations
+  init        Flow CLI Init Command
+  job         Job Operations
+  key         Key Operations
+  model       Model Operations
+  privilege   Privilege Operations
+  provider    Component Provider Operations
+  queue       Queue Operations
+  resource    Resource Manager
+  server      FATE Flow Server Operations
+  service     FATE Flow External Service Operations
+  table       Table Operations
+  tag         Tag Operations
+  task        Task Operations
+  template    Template Operations
+  test        FATE Flow Test Operations
+  tracking    Component Operations
+```
+
+To use FATE-Flow Client in terminal, we need to first specify which `FATE Flow Service` to connect to. 
+Assume we have a FATE Flow Service at 172.15.0.1:9380, then execute the following command to initialize FATE-Flow Client:
+
+```bash
+flow init -c /data/projects/fate/conf/service_conf.yaml
+flow init --ip 172.15.0.1 --port 9380
+```
+
+In the following steps, we assume that `host` is the role executes merging action.
+
+## 3. Prepare Model
+
+To obtain `guest's` trained Hetero Logistic Regression, we need to first make `host` load exported model.
+For models trained in standalone mode, you may skip this step.
+
+### 3.1 Export Model(Cluster mode)
+
+Below is an example model exportation configuration, adopted from [here](https://github.com/FederatedAI/FATE-Flow/blob/main/examples/model/export_model.json):
+
+```json
+{
+    "role": "guest",
+    "party_id": 9999,
+    "model_id": "arbiter-10000#guest-9999#host-10000#model",
+    "model_version": "202208291446341887230",
+    "output_path": "/data/projects/fate/examples/export_model"
+}
+```
+
+Run the following command to export specified model:
+
+```commandline
+flow model export -c export_model.json
+``` 
+
+If model exportation succeeds, `flow` will return success message like this:
+
+```
+{
+    "retcode": 0,
+    "file": "/data/projects/fate/examples/model_export/guest#9999#arbiter-10000#guest-9999#host-10000#model_202208291446341887230.zip",
+    "retmsg": "download successfully, please check /data/projects/fate/examples/model_export/guest#9999#arbiter-10000#guest-9999#host-10000#model_202208291446341887230.zip"
+}
+```
+
+### 3.2 Import Model(Cluster mode)
+
+If the FATE model is trained in distributed mode, where `guest` and `host` have FATE each installed in different machines,
+exported model needs to be imported/registered on merging role's system first before merging models.
+
+Here is an example configuration for model importation 
+adopted from [here](https://github.com/FederatedAI/FATE-Flow/blob/main/examples/model/import_model.json), 
+assuming we have put the exported `guest's` model from above step to directory "/data/projects/fate/temp" on `host's` machine:
+
+```json
+{
+  "role": "guest",
+  "party_id": 9999,
+  "model_id": "arbiter-10000#guest-9999#host-10000#model",
+  "model_version": "202208291446341887230",
+  "file": "/data/projects/fate/temp/guest#9999#arbiter-10000#guest-9999#host-10000#model_202208291446341887230.zip"
+}
+```
+
+Run the following command to import specified model:
+
+```commandline
+flow model import -c import_model.json
+``` 
+
+If model importation succeeds, `flow` will return success message like this:
+
+```
+{
+    "data": {
+        "job_id": "202208291521250963160",
+        "model_id": "arbiter-10000#guest-9999#host-10000#model",
+        "model_version": "202208291446341887230",
+        "party_id": "9999",
+        "role": "guest"
+    },
+    "retcode": 0,
+    "retmsg": "success"
+}
+```
+
+## 4. Merge Model
+
+### Hetero Logistic Regression Example
+
+Once model is ready, run the following command to merge models into a sklearn Logistic Regression model:
+
+```commandline
+flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 --guest-party-id 9999 --host-party-ids 10000 --component-name hetero_lr_0 --model-type lr --output-format sklearn --target-name y --include-guest-coef --output-path model_merge_sklearn
+```
+
+We may then load and predict with the obtained sklearn model:
+
+```python
+import base64
+import json
+import pandas
+import pickle
+
+filename = "model_merge_sklearn"
+test_data_h = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_host.csv",
+                                    header=0, index_col="id")
+test_data_g = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_guest.csv",
+                                      header=0, index_col="id")
+test_data_g = test_data_g.loc[:, test_data_g.columns != 'y']
+test_data = pandas.concat((test_data_g, test_data_h), axis=1)
+
+with open(filename, "r") as f:
+    loaded_model = pickle.loads(base64.b64decode(json.load(f)))
+    predict_score = loaded_model.predict_proba(test_data)
+```
+
+### HeteroSecureBoost Example
+
+If the federated model is a HeteroSecureBoost Model, run this command to merge model and export it in LightGBM format.
+
+```commandline
+flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 
+--guest-party-id 9999 --host-party-ids 10000 --component-name hetero_secureboost_0 --model-type secureboost --output-format lgb --target-name y --output-path ./lgb.txt
+```
+
+If you want a model in PMML format, use this command:
+
+```commandline
+flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 
+--guest-party-id 9999 --host-party-ids 10000 --component-name hetero_secureboost_0 --model-type secureboost --output-format pmml --target-name y --output-path ./lgb.txt
+``` 
+
+Please notice that the model merge function offers a parameter 'host_rename'. Host features will be renamed by adding a sitename
+suffix to the original feature name once this option is on:
+
+```commandline
+Host party id is 9998, sitename is host_9998, origin feature is x1
+
+x1 -> x1, default setting
+x1 -> x1_host_9998, when enable host_rename
+```
+
+Use this command to rename host names when output 
+
+```commandline
+flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 --host-rename
+--guest-party-id 9999 --host-party-ids 10000 --component-name hetero_secureboost_0 --model-type secureboost --output-format lgb --target-name y --output-path ./lgb.txt
+```
+
+We may then load and predict with the obtained lightgbm model. Remember that to get the correct predicted result,
+the order of features in predicted data should be in the same order as the merged tree model.
+
+```python
+import json
+import pandas
+import lightgbm as lgb
+
+filename = "./lgb.txt"
+test_data_h = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_host.csv",
+                                    header=0, index_col="id")
+test_data_g = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_guest.csv",
+                                      header=0, index_col="id")
+test_data_g = test_data_g.loc[:, test_data_g.columns != 'y']
+test_data = pandas.concat((test_data_g, test_data_h), axis=1)
+
+bst = lgb.Booster(model_str=open(filename, 'r').read())
+
+pred_rs = bst.predict(test_data)
+```
+
+For more information on command options, please check `help` menu:
+
+```commandline
+flow component hetero-model-merge --help
+```
+```
+Usage: flow component hetero-model-merge [OPTIONS]
+
+  - DESCRIPTION:
+      Merge a hetero model.
+
+  - USAGE:
+      flow component hetero-model-merge --model-id guest-9999#host-9998#model --model-version 202208241838502253290 --guest-party-id 9999 --host-party-ids 9998,9997 --component-name hetero_secure_boost_0 --model-type secureboost --output-format pmml --target-name y --no-host-rename --no-include-guest-coef --output-path model.xml
+
+Options:
+  --model-id TEXT                 Model id.  [required]
+  --model-version TEXT            Model version.  [required]
+  -gid, --guest-party-id TEXT     A valid party id.  [required]
+  -hids, --host-party-ids TEXT    Multiple party ids, use a comma to separate
+                                  each one.  [required]
+
+  -cpn, --component-name TEXT     A valid component name.  [required]
+  --model-type TEXT               [required]
+  --output-format TEXT            [required]
+  --target-name TEXT
+  --host-rename / --no-host-rename
+  --include-guest-coef / --no-include-guest-coef
+  -o, --output-path PATH          User specifies output directory path.
+                                  [required]
+
+  -h, --help                      Show this message and exit.
+```
+
+