Commit a94a8ec (2 parents: 4762e85 + e9863cb)

v0.8.5

- Implement the BST model.
- Update the Transformer layer: support the new attention_type and output_type options.

62 files changed: +315 -101 lines

.travis.yml (+1 -1)

@@ -59,7 +59,7 @@ script:
 
 notifications:
   recipients:
-  - wcshen1994@163.com
+  - weichenswc@163.com
 
   on_success: change
   on_failure: change

README.md (+37 -1)

@@ -12,7 +12,7 @@
 [![Documentation Status](https://readthedocs.org/projects/deepctr-doc/badge/?version=latest)](https://deepctr-doc.readthedocs.io/)
 ![CI status](https://github.com/shenweichen/deepctr/workflows/CI/badge.svg)
 [![Coverage Status](https://coveralls.io/repos/github/shenweichen/DeepCTR/badge.svg?branch=master)](https://coveralls.io/github/shenweichen/DeepCTR?branch=master)
-[![Codacy Badge](https://api.codacy.com/project/badge/Grade/d4099734dc0e4bab91d332ead8c0bdd0)](https://www.codacy.com/app/wcshen1994/DeepCTR?utm_source=github.com&utm_medium=referral&utm_content=shenweichen/DeepCTR&utm_campaign=Badge_Grade)
+[![Codacy Badge](https://api.codacy.com/project/badge/Grade/d4099734dc0e4bab91d332ead8c0bdd0)](https://www.codacy.com/gh/shenweichen/DeepCTR?utm_source=github.com&utm_medium=referral&utm_content=shenweichen/DeepCTR&utm_campaign=Badge_Grade)
 [![Disscussion](https://img.shields.io/badge/chat-wechat-brightgreen?style=flat)](./README.md#DisscussionGroup)
 [![License](https://img.shields.io/github/license/shenweichen/deepctr.svg)](https://github.com/shenweichen/deepctr/blob/master/LICENSE)
 <!-- [![Gitter](https://badges.gitter.im/DeepCTR/community.svg)](https://gitter.im/DeepCTR/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) -->

@@ -54,6 +54,7 @@ Let's [**Get Started!**](https://deepctr-doc.readthedocs.io/en/latest/Quick-Star
 | Deep Session Interest Network | [IJCAI 2019][Deep Session Interest Network for Click-Through Rate Prediction](https://arxiv.org/abs/1905.06482) |
 | FiBiNET | [RecSys 2019][FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.09433.pdf) |
 | FLEN | [arxiv 2019][FLEN: Leveraging Field for Scalable CTR Prediction](https://arxiv.org/pdf/1911.04690.pdf) |
+| BST | [DLP-KDD 2019][Behavior sequence transformer for e-commerce recommendation in Alibaba](https://arxiv.org/pdf/1905.06874.pdf) |
 | DCN V2 | [arxiv 2020][DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/abs/2008.13535) |
 
 ## Citation

@@ -82,3 +83,38 @@ If you find this code useful in your research, please cite it using the followin
 
 ![wechat](./docs/pics/code.png)
 
+
+## Main contributors ([welcome to join us!](./CONTRIBUTING.md))
+
+<table border="0">
+  <tbody>
+    <tr align="center">
+      <td>
+        <a href="https://github.com/shenweichen"><img width="70" height="70" src="https://github.com/shenweichen.png?s=40" alt="pic"></a><br>
+        <a href="https://github.com/shenweichen">Shen Weichen</a>
+        <p>Alibaba Group</p>
+      </td>
+      <td>
+        <a href="https://github.com/zanshuxun"><img width="70" height="70" src="https://github.com/zanshuxun.png?s=40" alt="pic"></a><br>
+        <a href="https://github.com/zanshuxun">Zan Shuxun</a>
+        <p>Beijing University <br> of Posts and <br> Telecommunications</p>
+      </td>
+      <td>
+        <a href="https://github.com/pandeconscious"><img width="70" height="70" src="https://github.com/pandeconscious.png?s=40" alt="pic"></a><br>
+        <a href="https://github.com/pandeconscious">Harshit Pande</a>
+        <p>Amazon</p>
+      </td>
+      <td>
+        <a href="https://github.com/codewithzichao"><img width="70" height="70" src="https://github.com/codewithzichao.png?s=40" alt="pic"></a><br>
+        <a href="https://github.com/codewithzichao">Li Zichao</a>
+        <p>Peking University</p>
+      </td>
+      <td>
+        <a href="https://github.com/TanTingyi"><img width="70" height="70" src="https://github.com/TanTingyi.png?s=40" alt="pic"></a><br>
+        <a href="https://github.com/TanTingyi">LeoCai</a>
+        <p>Chongqing University <br> of Posts and <br> Telecommunications</p>
+      </td>
+    </tr>
+  </tbody>
+</table>

deepctr/__init__.py (+4 -4)

@@ -1,4 +1,4 @@
-from .utils import check_version
-
-__version__ = '0.8.3'
-check_version(__version__)
+from .utils import check_version
+
+__version__ = '0.8.5'
+check_version(__version__)

deepctr/estimator/models/afm.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Xiao J, Ye H, He X, et al. Attentional factorization machines: Learning the weight of feature interactions via attention networks[J]. arXiv preprint arXiv:1708.04617, 2017.

deepctr/estimator/models/autoint.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Song W, Shi C, Xiao Z, et al. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks[J]. arXiv preprint arXiv:1810.11921, 2018.(https://arxiv.org/abs/1810.11921)

deepctr/estimator/models/ccpm.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Liu Q, Yu F, Wu S, et al. A convolutional click prediction model[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015: 1743-1746.

deepctr/estimator/models/dcn.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Wang R, Fu B, Fu G, et al. Deep & cross network for ad click predictions[C]//Proceedings of the ADKDD'17. ACM, 2017: 12. (https://arxiv.org/abs/1708.05123)

deepctr/estimator/models/deepfm.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Guo H, Tang R, Ye Y, et al. Deepfm: a factorization-machine based neural network for ctr prediction[J]. arXiv preprint arXiv:1703.04247, 2017.(https://arxiv.org/abs/1703.04247)

deepctr/estimator/models/fibinet.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Huang T, Zhang Z, Zhang J. FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction[J]. arXiv preprint arXiv:1905.09433, 2019.

deepctr/estimator/models/fnn.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Zhang W, Du T, Wang J. Deep learning over multi-field categorical data[C]//European conference on information retrieval. Springer, Cham, 2016: 45-57.(https://arxiv.org/pdf/1601.02376.pdf)

deepctr/estimator/models/fwfm.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
     Harshit Pande
 
 Reference:

deepctr/estimator/models/nfm.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] He X, Chua T S. Neural factorization machines for sparse predictive analytics[C]//Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 2017: 355-364. (https://arxiv.org/abs/1708.05027)

deepctr/estimator/models/pnn.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Qu Y, Cai H, Ren K, et al. Product-based neural networks for user response prediction[C]//Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 2016: 1149-1154.(https://arxiv.org/pdf/1611.00144.pdf)

deepctr/estimator/models/wdl.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Cheng H T, Koc L, Harmsen J, et al. Wide & deep learning for recommender systems[C]//Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016: 7-10.(https://arxiv.org/pdf/1606.07792.pdf)

deepctr/estimator/models/xdeepfm.py (+1 -1)

@@ -1,7 +1,7 @@
 # -*- coding:utf-8 -*-
 """
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Lian J, Zhou X, Zhang F, et al. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems[J]. arXiv preprint arXiv:1803.05170, 2018.(https://arxiv.org/pdf/1803.05170.pdf)

deepctr/inputs.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen,wcshen1994@163.com
+    Weichen Shen,weichenswc@163.com
 
 """
 

deepctr/layers/activation.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen,wcshen1994@163.com
+    Weichen Shen,weichenswc@163.com
 
 """
 

deepctr/layers/core.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen,wcshen1994@163.com
+    Weichen Shen,weichenswc@163.com
 
 """
 

deepctr/layers/interaction.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Authors:
-    Weichen Shen,wcshen1994@163.com,
+    Weichen Shen,weichenswc@163.com,
     Harshit Pande
 
 """

deepctr/layers/normalization.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen,wcshen1994@163.com
+    Weichen Shen,weichenswc@163.com
 
 """
 

deepctr/layers/sequence.py (+34 -12)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen,wcshen1994@163.com
+    Weichen Shen,weichenswc@163.com
 
 """
 

@@ -79,7 +79,7 @@ def call(self, seq_value_len_list, mask=None, **kwargs):
             mask = tf.tile(mask, [1, 1, embedding_size])
 
         if self.mode == "max":
-            hist = uiseq_embed_list - (1-mask) * 1e9
+            hist = uiseq_embed_list - (1 - mask) * 1e9
             return reduce_max(hist, 1, keep_dims=True)
 
         hist = reduce_sum(uiseq_embed_list * mask, 1, keep_dims=False)
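
Aside: the masked max-pooling touched above relies on a standard trick, subtracting a large constant from padded positions so that padding can never win the max. A minimal standalone sketch of the idea in plain TensorFlow (toy tensors, illustrative only, not DeepCTR's API):

```python
import tensorflow as tf

# One sequence, 3 timesteps, 2-dim embeddings; the last timestep is padding.
seq = tf.constant([[[1.0, 2.0], [3.0, 0.5], [9.0, 9.0]]])
mask = tf.constant([[[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]])  # 0 marks padding

# Push padded positions down to about -1e9 so reduce_max ignores them.
hist = seq - (1 - mask) * 1e9
pooled = tf.reduce_max(hist, axis=1, keepdims=True)
print(pooled.numpy())  # [[[3. 2.]]] -- the padded [9., 9.] step is excluded
```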
@@ -417,12 +417,12 @@ class Transformer(Layer):
     """ Simplified version of Transformer proposed in 《Attention is all you need》
 
       Input shape
-        - a list of two 3D tensor with shape ``(batch_size, timesteps, input_dim)`` if supports_masking=True.
-        - a list of two 4 tensors, first two tensors with shape ``(batch_size, timesteps, input_dim)``,last two tensors with shape ``(batch_size, 1)`` if supports_masking=False.
+        - a list of two 3D tensor with shape ``(batch_size, timesteps, input_dim)`` if ``supports_masking=True`` .
+        - a list of two 4 tensors, first two tensors with shape ``(batch_size, timesteps, input_dim)``,last two tensors with shape ``(batch_size, 1)`` if ``supports_masking=False`` .
 
 
       Output shape
-        - 3D tensor with shape: ``(batch_size, 1, input_dim)``.
+        - 3D tensor with shape: ``(batch_size, 1, input_dim)`` if ``output_type='mean'`` or ``output_type='sum'`` , else ``(batch_size, timesteps, input_dim)`` .
 
 
@@ -436,14 +436,16 @@ class Transformer(Layer):
             - **blinding**: bool. Whether or not use blinding.
             - **seed**: A Python integer to use as random seed.
             - **supports_masking**:bool. Whether or not support masking.
+            - **attention_type**: str, Type of attention, the value must be one of { ``'scaled_dot_product'`` , ``'additive'`` }.
+            - **output_type**: ``'mean'`` , ``'sum'`` or `None`. Whether or not use average/sum pooling for output.
 
       References
             - [Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.](https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf)
     """
 
     def __init__(self, att_embedding_size=1, head_num=8, dropout_rate=0.0, use_positional_encoding=True, use_res=True,
                  use_feed_forward=True, use_layer_norm=False, blinding=True, seed=1024, supports_masking=False,
-                 **kwargs):
+                 attention_type="scaled_dot_product", output_type="mean", **kwargs):
         if head_num <= 0:
             raise ValueError('head_num must be a int > 0')
         self.att_embedding_size = att_embedding_size

@@ -456,6 +458,8 @@ def __init__(self, att_embedding_size=1, head_num=8, dropout_rate=0.0, use_posit
         self.dropout_rate = dropout_rate
         self.use_layer_norm = use_layer_norm
         self.blinding = blinding
+        self.attention_type = attention_type
+        self.output_type = output_type
         super(Transformer, self).__init__(**kwargs)
         self.supports_masking = supports_masking
 

@@ -464,7 +468,7 @@ def build(self, input_shape):
         if self.num_units != embedding_size:
             raise ValueError(
                 "att_embedding_size * head_num must equal the last dimension size of inputs,got %d * %d != %d" % (
-                self.att_embedding_size, self.head_num, embedding_size))
+                    self.att_embedding_size, self.head_num, embedding_size))
         self.seq_len_max = int(input_shape[0][-2])
         self.W_Query = self.add_weight(name='query', shape=[embedding_size, self.att_embedding_size * self.head_num],
                                        dtype=tf.float32,

@@ -475,6 +479,11 @@ def build(self, input_shape):
         self.W_Value = self.add_weight(name='value', shape=[embedding_size, self.att_embedding_size * self.head_num],
                                        dtype=tf.float32,
                                        initializer=tf.keras.initializers.TruncatedNormal(seed=self.seed + 2))
+        if self.attention_type == "additive":
+            self.b = self.add_weight('b', shape=[self.att_embedding_size], dtype=tf.float32,
+                                     initializer=tf.keras.initializers.glorot_uniform(seed=self.seed))
+            self.v = self.add_weight('v', shape=[self.att_embedding_size], dtype=tf.float32,
+                                     initializer=tf.keras.initializers.glorot_uniform(seed=self.seed))
         # if self.use_res:
         #     self.W_Res = self.add_weight(name='res', shape=[embedding_size, self.att_embedding_size * self.head_num], dtype=tf.float32,
         #                                  initializer=tf.keras.initializers.TruncatedNormal(seed=self.seed))
@@ -525,10 +534,18 @@ def call(self, inputs, mask=None, training=None, **kwargs):
         keys = tf.concat(tf.split(keys, self.head_num, axis=2), axis=0)
         values = tf.concat(tf.split(values, self.head_num, axis=2), axis=0)
 
-        # head_num*None T_q T_k
-        outputs = tf.matmul(querys, keys, transpose_b=True)
+        if self.attention_type == "scaled_dot_product":
+            # head_num*None T_q T_k
+            outputs = tf.matmul(querys, keys, transpose_b=True)
 
-        outputs = outputs / (keys.get_shape().as_list()[-1] ** 0.5)
+            outputs = outputs / (keys.get_shape().as_list()[-1] ** 0.5)
+        elif self.attention_type == "additive":
+            querys_reshaped = tf.expand_dims(querys, axis=-2)
+            keys_reshaped = tf.expand_dims(keys, axis=-3)
+            outputs = tf.tanh(tf.nn.bias_add(querys_reshaped + keys_reshaped, self.b))
+            outputs = tf.squeeze(tf.tensordot(outputs, tf.expand_dims(self.v, axis=-1), axes=[-1, 0]), axis=-1)
+        else:
+            raise NotImplementedError
 
         key_masks = tf.tile(key_masks, [self.head_num, 1])
 
@@ -579,7 +596,12 @@ def call(self, inputs, mask=None, training=None, **kwargs):
         if self.use_layer_norm:
             result = self.ln(result)
 
-        return reduce_mean(result, axis=1, keep_dims=True)
+        if self.output_type == "mean":
+            return reduce_mean(result, axis=1, keep_dims=True)
+        elif self.output_type == "sum":
+            return reduce_sum(result, axis=1, keep_dims=True)
+        else:
+            return result
 
     def compute_output_shape(self, input_shape):
 
@@ -593,7 +615,7 @@ def get_config(self, ):
                   'dropout_rate': self.dropout_rate, 'use_res': self.use_res,
                   'use_positional_encoding': self.use_positional_encoding, 'use_feed_forward': self.use_feed_forward,
                   'use_layer_norm': self.use_layer_norm, 'seed': self.seed, 'supports_masking': self.supports_masking,
-                  'blinding': self.blinding}
+                  'blinding': self.blinding, 'attention_type': self.attention_type, 'output_type': self.output_type}
         base_config = super(Transformer, self).get_config()
         return dict(list(base_config.items()) + list(config.items()))
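
Putting the two new arguments together, here is a hedged usage sketch of the updated layer. The constructor arguments follow the signature in this diff; the tensors and sizes are toy values, and `use_positional_encoding=False` is chosen only to keep the example small:

```python
import tensorflow as tf
from deepctr.layers.sequence import Transformer

# Toy behavior sequence: batch 2, 10 timesteps, input_dim = 4 * 2 heads = 8.
seq = tf.random.normal([2, 10, 8])
seq_len = tf.constant([[10], [7]])  # valid lengths, shape (batch_size, 1)

layer = Transformer(att_embedding_size=4, head_num=2,
                    attention_type="additive", output_type=None,
                    use_positional_encoding=False, supports_masking=False)

# Per the docstring, supports_masking=False expects [queries, keys, query_len, key_len].
out = layer([seq, seq, seq_len, seq_len])
print(out.shape)  # (2, 10, 8): timesteps are kept because output_type=None
```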

deepctr/layers/utils.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen,wcshen1994@163.com
+    Weichen Shen,weichenswc@163.com
 
 """
 import tensorflow as tf

deepctr/models/__init__.py (+2 -1)

@@ -18,6 +18,7 @@
 from .fibinet import FiBiNET
 from .flen import FLEN
 from .fwfm import FwFM
+from .bst import BST
 
 __all__ = ["AFM", "CCPM", "DCN", "DCNMix", "MLR", "DeepFM", "MLR", "NFM", "DIN", "DIEN", "FNN", "PNN",
-           "WDL", "xDeepFM", "AutoInt", "ONN", "FGCNN", "DSIN", "FiBiNET", 'FLEN', "FwFM"]
+           "WDL", "xDeepFM", "AutoInt", "ONN", "FGCNN", "DSIN", "FiBiNET", 'FLEN', "FwFM", "BST"]

deepctr/models/afm.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Xiao J, Ye H, He X, et al. Attentional factorization machines: Learning the weight of feature interactions via attention networks[J]. arXiv preprint arXiv:1708.04617, 2017.

deepctr/models/autoint.py (+1 -1)

@@ -2,7 +2,7 @@
 """
 
 Author:
-    Weichen Shen, wcshen1994@163.com
+    Weichen Shen, weichenswc@163.com
 
 Reference:
     [1] Song W, Shi C, Xiao Z, et al. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks[J]. arXiv preprint arXiv:1810.11921, 2018.(https://arxiv.org/abs/1810.11921)
