[Modelzoo] Add ESMM model. (DeepRec-AI#247)
Co-authored-by: Anna Alberska <[email protected]>
Showing 8 changed files with 1,617 additions and 2 deletions.
**README.md**
# ESMM
The following is a brief directory structure and description for this example:
```
├── data                          # Dataset directory
│   └── README.md                 # Documentation describing how to prepare the dataset
├── distribute_k8s                # Distributed training related files
│   ├── distribute_k8s_BF16.yaml  # k8s yaml to create a training job with the BF16 feature
│   ├── distribute_k8s_FP32.yaml  # k8s yaml to create a training job
│   └── launch.py                 # Script to set the environment for distributed training
├── README.md                     # Documentation
├── result                        # Output directory
│   └── README.md                 # Documentation describing the output directory
└── train.py                      # Training script
```

## Content
- [ESMM](#esmm)
- [Content](#content)
- [Model Structure](#model-structure)
- [Usage](#usage)
  - [Stand-alone Training](#stand-alone-training)
  - [Distributed Training](#distributed-training)
- [Benchmark](#benchmark)
  - [Stand-alone Training](#stand-alone-training-1)
    - [Test Environment](#test-environment)
    - [Performance Result](#performance-result)
  - [Distributed Training](#distributed-training-1)
    - [Test Environment](#test-environment-1)
    - [Performance Result](#performance-result-1)
- [Dataset](#dataset)
  - [Prepare](#prepare)
  - [Fields](#fields)
  - [Processing](#processing)

## Model Structure
[Entire Space Multi-Task Model](https://arxiv.org/abs/1804.07931) (ESMM) is a model proposed by Alibaba in 2018 that estimates the post-click conversion rate (CVR) directly over the entire space of impressions. It also employs a feature-representation transfer learning strategy, sharing the embedding layer between the CTR and CVR tasks.

The model structure in this folder is as follows:
```
output:
                               pCTCVR
            (post-view click-through & conversion rate)
model:
                                  |
                 _____>      elem-wise x      <_____
                /                                   \
               /                                     \
            pCVR                                    pCTR
 (post-view conversion rate)        (post-view click-through rate)
       ______|______                      ______|______
      |             |                    |             |
      |  Main task  |                    |  Aux. task  |
      |     MLP     |                    |     MLP     |
      |             |                    |             |
      |_____________|                    |_____________|
            ^-----------            -----------^
              \                         /
               \                       /
        ________________> Concat <_______________
       /                    |                    \
      |                     |                     |
 ______|______        ______|______        ______|______
|             |      |             |      |             |
|             |      |             |      |             |
|     MLP     |      |     MLP     |      |     MLP     |
|             |      |             |      |             |
|_____________|      |_____________|      |_____________|
       |                    |                    |
 _____________        _____________        ____________
|_Emb_|____|__|      |_Emb_|____|__|      |_Emb_|__|___|
       |                    |                    |
input:
 [User features]      [Item features]      [Combo features]
```
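
To make the diagram concrete, the following is a minimal sketch of the two-head ESMM objective in TF 1.15 style. The function name and layer sizes (`esmm_heads`, two 128-unit hidden layers) are illustrative assumptions, not the exact code in `train.py`.
```python
import tensorflow as tf

def esmm_heads(shared_input, clk_labels, buy_labels):
    """Sketch of the ESMM heads; layer sizes are assumptions, not train.py's."""
    # Main task tower: post-view conversion rate (pCVR).
    pcvr_hidden = tf.layers.dense(shared_input, 128, activation=tf.nn.relu)
    pcvr = tf.sigmoid(tf.layers.dense(pcvr_hidden, 1))
    # Auxiliary task tower: post-view click-through rate (pCTR).
    pctr_hidden = tf.layers.dense(shared_input, 128, activation=tf.nn.relu)
    pctr = tf.sigmoid(tf.layers.dense(pctr_hidden, 1))
    # Entire-space output: pCTCVR = pCTR * pCVR (the elem-wise x above).
    pctcvr = pctr * pcvr
    # CTR is supervised by the click label and CTCVR by click & conversion,
    # both over all impressions; pCVR is never fit on clicked samples only.
    ctr_loss = tf.losses.log_loss(labels=clk_labels, predictions=pctr)
    ctcvr_loss = tf.losses.log_loss(labels=clk_labels * buy_labels,
                                    predictions=pctcvr)
    return pctcvr, ctr_loss + ctcvr_loss
```
Training both losses on the full impression space is what lets ESMM estimate CVR without the sample-selection bias of clicked-only training.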

## Usage

### Stand-alone Training
1. Please prepare the dataset and the DeepRec environment.
   1. Manually
      - Follow [dataset preparation](#prepare) to prepare the dataset.
      - Download the code with `git clone https://github.com/alibaba/DeepRec`.
      - Follow [How to Build](https://github.com/alibaba/DeepRec#how-to-build) to build the DeepRec whl package and install it with `pip install $DEEPREC_WHL`.
   2. *Docker (**Recommended**)*
      ```
      docker pull alideeprec/deeprec-release-modelzoo:latest
      docker run -it alideeprec/deeprec-release-modelzoo:latest /bin/bash
      # In docker container
      cd /root/modelzoo/ESMM
      ```
2. Training.
   ```
   python train.py
   # Memory acceleration with jemalloc.
   # The required ENV `MALLOC_CONF` is already set in the code.
   LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py
   ```
   Use the `--bf16` argument to enable the DeepRec BF16 feature.
   ```
   python train.py --bf16
   # Memory acceleration with jemalloc.
   # The required ENV `MALLOC_CONF` is already set in the code.
   LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py --bf16
   ```
   In a community TensorFlow environment, use the `--tf` argument to disable all of DeepRec's features.
   ```
   python train.py --tf
   ```
Use arguments to set up a custom configuration (see the example command after this list):
- DeepRec Features:
  - `export START_STATISTIC_STEP` and `export STOP_STATISTIC_STEP`: Set these ENV variables to configure CPU memory optimization. They are already set to `100` and `110` in the code by default.
  - `--bf16`: Enable the DeepRec BF16 feature. FP32 is used by default.
  - `--emb_fusion`: Whether to enable embedding fusion. Default is `True`.
  - `--op_fusion`: Whether to enable the auto graph fusion feature. Default is `True`.
  - `--optimizer`: Choose the optimizer for the deep model from ['adam', 'adamasync', 'adagraddecay', 'adagrad', 'gradientdescent']. Use `adamasync` by default.
  - `--smartstaged`: Whether to enable the SmartStaged feature of DeepRec. Default is `True`.
  - `--micro_batch`: Set the number for Auto Micro Batch. Default is `0`. (Not really enabled)
  - `--ev`: Whether to enable DeepRec EmbeddingVariable. Default is `False`.
  - `--adaptive_emb`: Whether to enable Adaptive Embedding. Default is `False`.
  - `--ev_elimination`: Set the feature elimination policy of the EmbeddingVariable feature. Options: [None, 'l2', 'gstep']. Default is `None`.
  - `--ev_filter`: Set the feature filter policy of the EmbeddingVariable feature. Options: [None, 'counter', 'cbf']. Default is `None`.
  - `--dynamic_ev`: Whether to enable Dynamic-dimension Embedding Variable. Default is `False`. (Not really enabled)
  - `--multihash`: Whether to enable Multi-Hash Variable. Default is `False`. (Not really enabled)
  - `--incremental_ckpt`: Set the interval for saving incremental checkpoints. Default is `0`.
  - `--workqueue`: Whether to enable WorkQueue. Default is `False`.
- Basic Settings:
  - `--data_location`: Full path of the train & eval data. Default is `./data`.
  - `--steps`: Set the number of training steps. When the default (`0`) is used, the number of steps is computed from the dataset size with the number of epochs set to 1000.
  - `--no_eval`: Do not evaluate the trained model on the eval dataset.
  - `--batch_size`: Batch size for training. Default is `512`.
  - `--output_dir`: Full path to the output directory for logs and the saved model. Default is `./result`.
  - `--checkpoint`: Full path to the checkpoint output directory. Default is `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMP)`.
  - `--save_steps`: Set the interval (in steps) for saving checkpoints; zero disables it. Default is `None`.
  - `--seed`: Random seed. Default is `2021`.
  - `--timeline`: Interval (in steps) at which profile hooks record a timeline; zero disables it. Default is `None`.
  - `--keep_checkpoint_max`: Maximum number of recent checkpoints to keep. Default is `1`.
  - `--learning_rate`: Learning rate for the network. Default is `0.1`.
  - `--l2_regularization`: L2 regularization for the model. Default is `None`.
  - `--protocol`: Set the protocol (`grpc`, `grpc++`, `star_server`) used when starting servers in distributed training. Default is `grpc`.
  - `--inter`: Set inter-op parallelism threads. Default is `0`.
  - `--intra`: Set intra-op parallelism threads. Default is `0`.
  - `--input_layer_partitioner`: Slice size of the input layer partitioner (unit: MB). Default is `0`.
  - `--dense_layer_partitioner`: Slice size of the dense layer partitioner (unit: kB). Default is `0`.
  - `--tf`: Use the TF 1.15.5 API and disable all DeepRec features.
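
For example, a hypothetical run combining several of the flags above (flag values chosen for illustration only):
```
python train.py --bf16 --ev --optimizer adamasync --batch_size 1024 --steps 5000 --output_dir ./result
```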
### Distributed Training | ||
1. Prepare a K8S cluster. [Alibaba Cloud ACK Service(Alibaba Cloud Container Service for Kubernetes)](https://cn.aliyun.com/product/kubernetes) can quickly create a Kubernetes cluster. | ||
2. Perpare a shared storage volume. For Alibaba Cloud ACK, [OSS(Object Storage Service)](https://cn.aliyun.com/product/oss) can be used as a shared storage volume. | ||
3. Create a PVC(PeritetVolumeClaim) named `deeprec` for storage volumn in cluster. | ||
4. Prepare docker image. `alideeprec/deeprec-release-modelzoo:latest` is recommended. | ||
5. Create a k8s job from `.yaml` to run distributed training. | ||
``` | ||
kubectl create -f $YAML_FILE | ||
``` | ||
6. Show training log by `kubectl logs -f trainer-worker-0` | ||
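
The PS and worker processes discover each other through a `TF_CONFIG`-style cluster spec injected by the k8s job. As a hedged sketch of what the training process consumes (host names, ports, and the exact wiring done by `launch.py` are assumptions):
```python
import json
import os

# Illustrative cluster spec; real pod host names and ports are set by the
# k8s job / launch.py and may differ.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "chief":  ["trainer-chief-0:2222"],
        "ps":     ["trainer-ps-0:2222"],
        "worker": ["trainer-worker-0:2222", "trainer-worker-1:2222"],
    },
    "task": {"type": "worker", "index": 0},  # this process's role
})
```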
## Benchmark
### Stand-alone Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
- Hardware
  - Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
  - CPU(s): 8
  - Socket(s): 1
  - Core(s) per socket: 4
  - Thread(s) per core: 2
  - Memory: 32G
- Software
  - Kernel: 4.18.0-305.12.1.el8_4.x86_64
  - OS: CentOS Linux release 8.4.2105
  - Docker: 20.10.12
  - Python: 3.6.12

#### Performance Result
<table>
    <tr>
        <td colspan="1"></td>
        <td>Framework</td>
        <td>DType</td>
        <td>Accuracy</td>
        <td>AUC</td>
        <td>Globalsteps/sec</td>
    </tr>
    <tr>
        <td rowspan="3">ESMM</td>
        <td>Community TensorFlow</td>
        <td>FP32</td>
        <td></td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td>DeepRec w/ oneDNN</td>
        <td>FP32</td>
        <td></td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td>DeepRec w/ oneDNN</td>
        <td>FP32+BF16</td>
        <td></td>
        <td></td>
        <td></td>
    </tr>
</table>

Community TensorFlow version is v1.15.5.
### Distributed Training
#### Test Environment
The benchmark is performed on the [Alibaba Cloud ACK Service (Alibaba Cloud Container Service for Kubernetes)](https://cn.aliyun.com/product/kubernetes). The K8S cluster is composed of ten machines with the following specification.
- Hardware
  - Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
  - CPU(s): 8
  - Socket(s): 1
  - Core(s) per socket: 4
  - Thread(s) per core: 2
  - Memory: 32G

#### Performance Result
<table>
    <tr>
        <td colspan="1"></td>
        <td>Framework</td>
        <td>Protocol</td>
        <td>DType</td>
        <td>Globalsteps/sec</td>
    </tr>
    <tr>
        <td rowspan="3">ESMM</td>
        <td>Community TensorFlow</td>
        <td>GRPC</td>
        <td>FP32</td>
        <td></td>
    </tr>
    <tr>
        <td>DeepRec w/ oneDNN</td>
        <td>GRPC</td>
        <td>FP32</td>
        <td></td>
    </tr>
    <tr>
        <td>DeepRec w/ oneDNN</td>
        <td>GRPC</td>
        <td>FP32+BF16</td>
        <td></td>
    </tr>
</table>

Community TensorFlow version is v1.15.5.
## Dataset
The Taobao dataset from [EasyRec](https://github.com/AlibabaPAI/EasyRec) is used.
### Prepare
Put the data files **taobao_train_data** and **taobao_test_data** into `./data/`.
For details on downloading the data, see [EasyRec](https://github.com/AlibabaPAI/EasyRec/#GetStarted).
### Fields
The dataset contains 20 columns, detailed as follows:

| Name | clk | buy | pid | adgroup_id | cate_id | campaign_id | customer | brand | user_id | cms_segid | cms_group_id | final_gender_code | age_level | pvalue_level | shopping_level | occupation | new_user_class_level | tag_category_list | tag_brand_list | price |
| ---- | -------- | -------- | --------- | ---------- | --------- | ----------- | --------- | --------- | --------- | --------- | ------------ | ----------------- | --------- | ------------ | -------------- | ---------- | -------------------- | ----------------- | -------------- | -------- |
| Type | tf.int32 | tf.int32 | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.int32 |

Values in the `tag_category_list` and `tag_brand_list` columns are separated by `'|'`.
### Processing
The `clk` and `buy` columns are used as labels.
The user feature columns are as follows:

| Column name | Hash bucket size | Embedding dimension |
| -------------------- | ---------------- | ------------------- |
| user_id | 100000 | 16 |
| cms_segid | 100 | 16 |
| cms_group_id | 100 | 16 |
| age_level | 10 | 16 |
| pvalue_level | 10 | 16 |
| shopping_level | 10 | 16 |
| occupation | 10 | 16 |
| new_user_class_level | 10 | 16 |
| tag_category_list | 100000 | 16 |
| tag_brand_list | 100000 | 16 |

The item feature columns are as follows:

| Column name | Hash bucket size | Number of buckets | Embedding dimension |
| ----------- | ---------------- | ----------------- | ------------------- |
| pid | 10 | N/A | 16 |
| adgroup_id | 100000 | N/A | 16 |
| cate_id | 10000 | N/A | 16 |
| campaign_id | 100000 | N/A | 16 |
| customer | 100000 | N/A | 16 |
| brand | 100000 | N/A | 16 |
| price | N/A | 50 | 16 |
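
As a minimal sketch of how these tables map to TF 1.15 feature columns (only two representative columns are shown, and the `price` bucket boundaries are an assumption, since the table gives only the bucket count):
```python
import tensorflow as tf

# User feature: string hashed into 100000 buckets, embedded in 16 dims.
user_id = tf.feature_column.embedding_column(
    tf.feature_column.categorical_column_with_hash_bucket(
        "user_id", hash_bucket_size=100000, dtype=tf.string),
    dimension=16)

# Item feature `price`: numeric value bucketized into 50 buckets
# (49 placeholder boundaries), then embedded in 16 dims.
price = tf.feature_column.embedding_column(
    tf.feature_column.bucketized_column(
        tf.feature_column.numeric_column("price"),
        boundaries=list(range(1, 50))),  # 49 boundaries -> 50 buckets
    dimension=16)
```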
**data/README.md**
# Dataset
The Taobao dataset from [EasyRec](https://github.com/AlibabaPAI/EasyRec) is used.
## Prepare dataset
Put the data files **taobao_train_data** and **taobao_test_data** into `./data/`.
For details on downloading the data, see [EasyRec](https://github.com/AlibabaPAI/EasyRec/#GetStarted).

### Fields
The dataset contains 20 columns, detailed as follows:

| Name | clk | buy | pid | adgroup_id | cate_id | campaign_id | customer | brand | user_id | cms_segid | cms_group_id | final_gender_code | age_level | pvalue_level | shopping_level | occupation | new_user_class_level | tag_category_list | tag_brand_list | price |
| ---- | -------- | -------- | --------- | ---------- | --------- | ----------- | --------- | --------- | --------- | --------- | ------------ | ----------------- | --------- | ------------ | -------------- | ---------- | -------------------- | ----------------- | -------------- | -------- |
| Type | tf.int32 | tf.int32 | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.int32 |

Values in the `tag_category_list` and `tag_brand_list` columns are separated by `'|'`.

### Processing
The `clk` and `buy` columns are used as labels.
The user feature columns are as follows:

| Column name | Hash bucket size | Embedding dimension |
| -------------------- | ---------------- | ------------------- |
| user_id | 100000 | 16 |
| cms_segid | 100 | 16 |
| cms_group_id | 100 | 16 |
| age_level | 10 | 16 |
| pvalue_level | 10 | 16 |
| shopping_level | 10 | 16 |
| occupation | 10 | 16 |
| new_user_class_level | 10 | 16 |
| tag_category_list | 100000 | 16 |
| tag_brand_list | 100000 | 16 |

The item feature columns are as follows:

| Column name | Hash bucket size | Number of buckets | Embedding dimension |
| ----------- | ---------------- | ----------------- | ------------------- |
| pid | 10 | N/A | 16 |
| adgroup_id | 100000 | N/A | 16 |
| cate_id | 10000 | N/A | 16 |
| campaign_id | 100000 | N/A | 16 |
| customer | 100000 | N/A | 16 |
| brand | 100000 | N/A | 16 |
| price | N/A | 50 | 16 |