Releases: Qihoo360/tensornet
v0.2.0
Full Changelog: v0.1.3...v0.2.0
Add Global Normalization
Code: normalization_layer.py
Changes:
- Use Cumulative Sum and Sum of Squares Across Samples:
  - Instead of relying solely on the mean and variance within a single batch, the normalization now uses the cumulative sum and sum of squares across all samples.
- Synchronize Across Nodes After Accumulating Partial Batches:
  - After a portion of the batches has been accumulated, the statistics are synchronized across all nodes to ensure consistency.
- Update BN Table Statistics Per Batch:
  - For each batch, the statistics in the Batch Normalization (BN) table are updated. The overall mean and variance are then computed from these statistics and used to produce the output.
- Store Mean and Variance in Checkpoint:
  - The mean and variance are stored in the checkpoint and serve as parameters for prediction.
  - The statistics are also stored in HDFS for future reference and usage.
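The idea behind the changes above can be illustrated with a minimal pure-Python sketch. It is not the code in normalization_layer.py; the class name RunningStats and the function normalize are hypothetical, and the node synchronization is modeled as a simple merge of per-node accumulators.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Cumulative count, sum, and sum of squares for one feature (hypothetical helper)."""
    count: float = 0.0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, batch):
        # Accumulate statistics over every sample seen so far, not just this batch.
        for x in batch:
            self.count += 1
            self.total += x
            self.total_sq += x * x

    def merge(self, other):
        # Stand-in for the cross-node synchronization: the three sums are additive.
        return RunningStats(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    def mean(self):
        return self.total / self.count

    def variance(self):
        # Var[x] = E[x^2] - E[x]^2, clamped at zero against rounding error.
        m = self.mean()
        return max(self.total_sq / self.count - m * m, 0.0)


def normalize(x, stats, eps=1e-5):
    # Normalize with the global statistics instead of per-batch ones.
    return (x - stats.mean()) / (stats.variance() + eps) ** 0.5


# Two "nodes" accumulate different batches, then synchronize.
node_a = RunningStats()
node_a.update([1.0, 2.0])
node_b = RunningStats()
node_b.update([3.0, 4.0])
merged = node_a.merge(node_b)
```

Because only three additive quantities are tracked, the same merged values could be written to a checkpoint and reused at prediction time without revisiting the training data.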
Ref
Local Testing
0.1.3.post2-tool
Full Changelog: v0.1.3.post1...0.1.3.post2-tool
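Adds the qihoo-tensornet-tool pip package, which mainly provides:
- Merge sparse: sparse files scattered across different directory levels are merged into one or more files under a single HDFS directory.
- Change sparse/dense parallelism: in earlier versions the parallelism could not be changed once it was set, so scaling out or in was not possible.
- Merge external embeddings: new embeddings are added by passing in a mapping between existing signs and new sign values.
All of the new features are submitted manually through scripts; integrating them into routine training jobs is not yet supported.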
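As a rough illustration of what changing the sparse parallelism involves, the sketch below redistributes embedding rows from an old set of shards to a new one. It is not the tool's implementation: the function names are hypothetical, shards are modeled as in-memory dicts rather than HDFS files, and the rule that a sign belongs to shard `sign % shard_num` is an assumption made for the example.

```python
def shard_of(sign, shard_num):
    # Assumption for this sketch: a sign is placed on shard `sign % shard_num`.
    return sign % shard_num


def reshard(old_shards, new_shard_num):
    """Redistribute {sign: vector} entries across `new_shard_num` shards."""
    new_shards = [dict() for _ in range(new_shard_num)]
    for shard in old_shards:
        for sign, vector in shard.items():
            new_shards[shard_of(sign, new_shard_num)][sign] = vector
    return new_shards


# Scale out from 2 shards to 3.
old = [{0: [0.1], 2: [0.2]}, {1: [0.3], 3: [0.4]}]
new = reshard(old, 3)
```

Every row is kept and only its placement changes, which is why the operation can run offline as a script between training runs.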
v0.1.3.post1
Full Changelog: v0.1.3...v0.1.3.post1
v0.1.3
What's Changed
- Format tensornet build env
- Format release pipeline
- Add deleteByShow for longtail embeddings
- Compat input format and file pattern
- Add sequence_embedding_features.py
- Provide tool to merge sparse table
Full Changelog: 0.1.1...v0.1.3
tensornet-0.1.1
enhance:
- optimize parameter push and pull performance
- compatible with tf-2.3, tf-2.4
- support sparse table save with name.
- add feature drop show threshold and update show decay with moving average
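The show-based feature drop can be pictured with the following sketch. The exact formula used by tensornet is not stated in these notes, so the exponential moving average, the decay value, and the names decay_show and drop_low_show are all assumptions chosen for illustration.

```python
def decay_show(show, batch_show, decay=0.98):
    # Assumed form: exponential moving average of how often a feature is shown.
    return decay * show + (1.0 - decay) * batch_show


def drop_low_show(table, threshold):
    # Keep only features whose decayed show count reaches the threshold.
    return {sign: entry for sign, entry in table.items() if entry["show"] >= threshold}


table = {1: {"show": 10.0}, 2: {"show": 0.5}}
batch_shows = {1: 10.0}  # feature 2 did not appear in this batch

for sign, entry in table.items():
    entry["show"] = decay_show(entry["show"], batch_shows.get(sign, 0.0), decay=0.5)

kept = drop_low_show(table, threshold=1.0)
```

Under this assumed scheme, rarely seen (long-tail) features decay toward zero and are eventually removed, which bounds the size of the sparse table.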
add:
- add ftrl optimizer
- add deepfm demo.
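For readers unfamiliar with FTRL, here is a minimal sketch of the standard per-coordinate FTRL-Proximal update. It is the textbook form of the algorithm, not tensornet's implementation; the class name FtrlCoordinate and the default hyperparameters are hypothetical.

```python
import math


class FtrlCoordinate:
    """State and update rule for a single weight under FTRL-Proximal."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = 0.0  # accumulated adjusted gradient
        self.n = 0.0  # accumulated squared gradient
        self.w = 0.0  # current weight

    def update(self, g):
        # sigma encodes the change in the per-coordinate learning rate.
        sigma = (math.sqrt(self.n + g * g) - math.sqrt(self.n)) / self.alpha
        self.z += g - sigma * self.w
        self.n += g * g
        if abs(self.z) <= self.l1:
            # L1 regularization keeps the weight exactly zero.
            self.w = 0.0
        else:
            sign = 1.0 if self.z > 0 else -1.0
            denom = (self.beta + math.sqrt(self.n)) / self.alpha + self.l2
            self.w = -(self.z - sign * self.l1) / denom
        return self.w


dense = FtrlCoordinate(alpha=0.1, beta=1.0)
w_dense = dense.update(1.0)

sparse = FtrlCoordinate(alpha=0.1, beta=1.0, l1=2.0)
w_sparse = sparse.update(1.0)
```

The L1 threshold is what makes FTRL attractive for large sparse models: weights whose accumulated gradient stays small are held at exactly zero.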
delete:
- delete version field in sparse table.
bug fix:
- fix model reload warning bug
tensornet-0.1.0
TensorNet V-0.1.0
First version of TensorNet.
This version publishes TensorNet with support for asynchronous training mode, which we have fully tested.
The main APIs are:
- tn.distribute.PsStrategy: has the same interface as TensorFlow's strategy and is used for cluster management.
- tn.feature_column.category_column: one of the important APIs of tensornet, with which we can define sparse feature columns that support a dimension close to 2**64. It has the same interface as tf.feature_column.
- tn.layers.EmbeddingFeatures: the second important API in tensornet, in which we pull sparse embedding vectors from and push them to the parameter server.
- tn.optimizer.Optimizer: wraps the TensorFlow optimizer. It is mainly used in async train mode, in which we intercept TensorFlow's gradient update logic and update gradients in the parameter server asynchronously.
- tn.model.Model: inherits from the Keras model class and overrides its save method to support saving and loading sparse features in the parameter server.
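The pull/push pattern described for the sparse APIs can be sketched with a toy in-process table. This is not the tensornet API: feature_sign, ToySparseTable, and the plain SGD update are hypothetical stand-ins, and the 64-bit hash only illustrates how a feature space close to 2**64 can be addressed without allocating it up front.

```python
import hashlib


def feature_sign(slot, value):
    # Hash a (slot, value) pair into a 64-bit sign; the hash choice is an assumption.
    digest = hashlib.blake2b(f"{slot}:{value}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToySparseTable:
    """In-memory stand-in for a parameter-server sparse table."""

    def __init__(self, dim, lr=0.1):
        self.dim = dim
        self.lr = lr
        self.rows = {}  # rows are created lazily, only for signs actually seen

    def pull(self, signs):
        # Return copies of the current embedding vectors.
        return [list(self.rows.setdefault(s, [0.0] * self.dim)) for s in signs]

    def push(self, signs, grads):
        # Apply gradients on the "server" side with plain SGD.
        for s, g in zip(signs, grads):
            row = self.rows.setdefault(s, [0.0] * self.dim)
            for i in range(self.dim):
                row[i] -= self.lr * g[i]


table = ToySparseTable(dim=2, lr=0.5)
signs = [feature_sign("user_id", "42")]
before = table.pull(signs)
table.push(signs, [[1.0, -2.0]])
after = table.pull(signs)
```

In an asynchronous setup, each worker would pull, compute gradients, and push without waiting for the other workers; the toy table above shows only the data flow, not the concurrency.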