MLR (Mixed Logistic Regression) is a piece-wise linear model widely used for click-through rate (CTR) estimation in advertising. MLR adopts a divide-and-conquer strategy: it first divides the feature space into multiple local regions, then fits a linear model in each region, and outputs the weighted sum of these linear models. The partition and the per-region models are learned jointly, with minimizing the loss function as the objective. For details, see the Large Scale Piece-wise Linear Model (LS-PLM) [1]. The MLR algorithm has three distinct advantages:
- Nonlinearity: with enough partitions, MLR can fit arbitrarily complex nonlinear functions.
- Scalability: like LR, MLR scales well to massive numbers of samples and ultra-high-dimensional models.
- Sparsity: with $L_1$ and $L_{2,1}$ regularization terms, MLR can obtain good sparsity.
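The general form of the model is

$$p(y=1\mid x) = g\left(\sum_{j=1}^{m}\sigma(u_j^{T}x)\,\eta(w_j^{T}x)\right)$$

Note: $\sigma(\cdot)$ is the partition function with parameters $u_1,\dots,u_m$, and $\eta(\cdot)$ is the fitting function with parameters $w_1,\dots,w_m$. For a given sample x, the prediction model has two parts: the first part divides the feature space into m regions, and the second part gives the predicted value for each region. The function $g(\cdot)$ ensures that the model output satisfies the definition of a probability.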
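MLR uses softmax as the partition function $\sigma$, the sigmoid function as the fitting function $\eta$, and $g(x)=x$. This gives the MLR model:

$$p(y=1\mid x)=\sum_{i=1}^{m}\frac{\exp(u_i^{T}x)}{\sum_{j=1}^{m}\exp(u_j^{T}x)}\cdot\frac{1}{1+\exp(-w_i^{T}x)}$$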
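The MLR model can be sketched in a few lines of plain Python. This is a minimal illustration, not Angel's implementation. The function names and toy parameters are hypothetical, and dense Python lists stand in for feature vectors. The intercepts that Angel stores separately are omitted, so the code matches the formula exactly.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def softmax(z: Vector) -> Vector:
    """Softmax with the maximum subtracted first to avoid overflow in exp."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Logistic function, written so exp never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlr_predict(x: Vector, U: List[Vector], W: List[Vector]) -> float:
    """p(y=1|x) = sum_i softmax(U x)_i * sigmoid(w_i^T x).

    U holds the m partition vectors u_i and W the m fitting vectors w_i,
    each of length N (the number of features).
    """
    gates = softmax([dot(u, x) for u in U])    # soft assignment of x to the m regions
    fits = [sigmoid(dot(w, x)) for w in W]     # per-region logistic regression output
    return sum(g * f for g, f in zip(gates, fits))


if __name__ == "__main__":
    # Hypothetical toy parameters: m = 2 regions, N = 3 features.
    U = [[1.0, 0.0, -1.0], [-1.0, 0.5, 1.0]]
    W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0]]
    x = [1.0, 2.0, 0.5]
    print(f"p(y=1|x) = {mlr_predict(x, U, W):.4f}")
```

With m = 1, the single softmax weight is always 1, so the model reduces to plain logistic regression. Subtracting the maximum inside softmax does not change its value, because softmax is invariant to adding the same constant to all inputs.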
The schematic diagram of the MLR model is as follows:

[Figure: schematic diagram of the MLR model]
This model can be understood from two perspectives:
- MLR can be regarded as a three-layer neural network with gates. The hidden layer has m sigmoid neurons, and the output of each neuron passes through a gate. The corresponding softmax output acts as the switch of that gate.
- MLR can be regarded as an ensemble model composed of m simple sigmoid models, with the softmax outputs as the combination coefficients.
In many applications, separate sub-models are built on different parts of the data, and predictions are made by combining multiple models. MLR instead uses softmax to divide the data (a soft division) and makes predictions with a single unified model. Another advantage of MLR is that it performs feature combination. Some features are active in the sigmoid part and others in the softmax part, and multiplying the two is equivalent to combining features at a low level.
Note: The output of each sigmoid model lies between 0 and 1, and the softmax outputs lie between 0 and 1 and sum to 1. The combined value is therefore a weighted average of the sigmoid outputs and also lies between 0 and 1. It approaches its maximum of 1 only when all sigmoid outputs approach 1, so it can be regarded as a probability.
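For a sample $(x, y)$ with label $y \in \{-1, +1\}$, the cross entropy loss function is:

$$l(x,y) = -\ln P(y\mid x), \qquad P(y\mid x) = \sum_{i=1}^{m}\frac{\exp(u_i^{T}x)}{\sum_{j=1}^{m}\exp(u_j^{T}x)}\cdot\frac{1}{1+\exp(-y\,w_i^{T}x)}$$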
Note: Usually, cross entropy is written as $l(x,y) = -\left[y\ln p + (1-y)\ln(1-p)\right]$ with $y\in\{0,1\}$, where $p$ is the probability that $y=1$ given $x$. If $P(y\mid x)$ instead denotes the probability of the observed label $y$ given $x$ (not only the probability of $y=1$), the cross entropy can be written as $l(x,y) = -\ln P(y\mid x)$.
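In this way, the derivative of the loss with respect to $P(y\mid x)$ for a single sample is

$$\frac{\partial l(x,y)}{\partial P(y\mid x)} = -\frac{1}{P(y\mid x)}$$

Gradient: write $\pi_i = \frac{\exp(u_i^{T}x)}{\sum_{j=1}^{m}\exp(u_j^{T}x)}$ and $s_i = \frac{1}{1+\exp(-y\,w_i^{T}x)}$, so that $P(y\mid x)=\sum_{i=1}^{m}\pi_i s_i$. The chain rule then gives

$$\frac{\partial l(x,y)}{\partial u_i} = -\frac{\pi_i\,\left(s_i - P(y\mid x)\right)}{P(y\mid x)}\,x, \qquad \frac{\partial l(x,y)}{\partial w_i} = -\frac{\pi_i\,s_i\,(1-s_i)\,y}{P(y\mid x)}\,x$$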
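The loss and its gradients can be checked numerically. The sketch below is hypothetical and not Angel code. It assumes labels in {-1, +1}, as in the data formats described later. It implements the cross entropy $l(x,y)=-\ln P(y\mid x)$ and the closed-form gradients with respect to $u_i$ and $w_i$ directly, then compares them with central finite differences.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def softmax(z: Vector) -> Vector:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlr_loss_and_grad(x: Vector, y: int, U: Matrix, W: Matrix) -> Tuple[float, Matrix, Matrix]:
    """Loss l = -ln P(y|x) for y in {-1, +1}, with P(y|x) = sum_i pi_i * s_i.

    dl/du_i = -pi_i * (s_i - P) / P * x
    dl/dw_i = -pi_i * s_i * (1 - s_i) * y / P * x
    """
    pi = softmax([dot(u, x) for u in U])            # pi_i: softmax gate of region i
    s = [sigmoid(y * dot(w, x)) for w in W]          # s_i: sigmoid(y * w_i^T x)
    P = sum(p_i * s_i for p_i, s_i in zip(pi, s))   # P(y|x)
    loss = -math.log(P)
    grad_U = [[-p_i * (s_i - P) / P * xj for xj in x] for p_i, s_i in zip(pi, s)]
    grad_W = [[-p_i * s_i * (1.0 - s_i) * y / P * xj for xj in x] for p_i, s_i in zip(pi, s)]
    return loss, grad_U, grad_W


def numeric_grad(x: Vector, y: int, U: Matrix, W: Matrix, M: Matrix, eps: float = 1e-6) -> Matrix:
    """Central finite differences of the loss w.r.t. every entry of M (which is U or W)."""
    G = []
    for row in M:
        g_row = []
        for j in range(len(row)):
            old = row[j]
            row[j] = old + eps
            plus = mlr_loss_and_grad(x, y, U, W)[0]
            row[j] = old - eps
            minus = mlr_loss_and_grad(x, y, U, W)[0]
            row[j] = old                              # restore the parameter
            g_row.append((plus - minus) / (2 * eps))
        G.append(g_row)
    return G


if __name__ == "__main__":
    U = [[0.2, -0.5, 0.1], [-0.3, 0.4, 0.7]]
    W = [[0.6, 0.1, -0.4], [-0.2, 0.9, 0.3]]
    x, y = [1.0, -0.5, 2.0], -1
    loss, gU, gW = mlr_loss_and_grad(x, y, U, W)
    nU = numeric_grad(x, y, U, W, U)
    nW = numeric_grad(x, y, U, W, W)
    err = max(abs(a - b) for ga, gb in ((gU, nU), (gW, nW))
              for ra, rb in zip(ga, gb) for a, b in zip(ra, rb))
    print(f"loss = {loss:.6f}, max |analytic - numeric| = {err:.2e}")
```

Because sigmoid(-z) = 1 - sigmoid(z) and the softmax weights sum to 1, the probabilities $P(+1\mid x)$ and $P(-1\mid x)$ computed this way always sum to 1.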
Model Storage:
- The model parameters of MLR are the softmax parameters $u_1,\dots,u_m$ and the sigmoid parameters $w_1,\dots,w_m$. Each $u_i$ and $w_i$ is an N-dimensional vector, where N is the dimension of the data, i.e. the number of features. Two m×N matrices represent the softmax parameters and the sigmoid parameters, respectively.
- The intercepts (bias terms) of the softmax and sigmoid functions are represented by two m×1 matrices.
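The storage layout can be pictured with a small in-memory model. This is a hypothetical sketch, not Angel's PSModel classes. It assumes that the two m×1 matrices are intercepts added to $u_i^{T}x$ and $w_i^{T}x$ respectively. Samples are given as sparse maps, so only the features present in a sample are touched.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class MLRParams:
    """Hypothetical layout: U (softmax) and W (sigmoid) are m x N matrices,
    bu and bw are the m x 1 intercepts (assumed to be additive bias terms)."""
    m: int   # number of regions (ml.mlr.rank)
    n: int   # number of features N
    U: List[List[float]] = field(init=False)
    W: List[List[float]] = field(init=False)
    bu: List[float] = field(init=False)
    bw: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.U = [[0.0] * self.n for _ in range(self.m)]
        self.W = [[0.0] * self.n for _ in range(self.m)]
        self.bu = [0.0] * self.m
        self.bw = [0.0] * self.m

    def num_params(self) -> int:
        """Two m x N matrices plus two m x 1 intercept vectors."""
        return 2 * self.m * self.n + 2 * self.m

    def predict(self, x: Dict[int, float]) -> float:
        """p(y=1|x) for a sparse sample given as {feature index: value}."""
        zu = [sum(self.U[i][j] * v for j, v in x.items()) + self.bu[i] for i in range(self.m)]
        zw = [sum(self.W[i][j] * v for j, v in x.items()) + self.bw[i] for i in range(self.m)]
        top = max(zu)                                # stabilise the softmax
        exps = [math.exp(z - top) for z in zu]
        total = sum(exps)
        return sum(e / total * sigmoid(z) for e, z in zip(exps, zw))


if __name__ == "__main__":
    # m = 10 and N = 148 mirror "outputdim" and "indexrange" in the JSON example below.
    params = MLRParams(m=10, n=148)
    print(params.num_params())   # 2 * 10 * 148 + 2 * 10 = 2980
```

The parameter count grows as 2m(N + 1), so the number of regions m multiplies the model size of a plain LR model with intercept.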
Model Calculation:
- The MLR model is trained by gradient descent, and the algorithm runs iteratively. At the beginning of each iteration, each worker pulls the latest model parameters from the PS, computes the gradient on its own training data, and pushes the gradient to the PS.
- The PS receives the gradients pushed by all workers, averages them, and updates the PSModel.
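The iteration described above can be simulated in a single process. The sketch below is hypothetical: `ParameterServer`, `local_gradient` and `train` are illustrative names, not Angel's API. The workers run one after another rather than in parallel. To keep it short, each worker computes the gradient of the m = 1 special case of MLR (plain logistic regression) on its shard. Each worker pushes its gradient, and the PS applies the averaged gradient once all workers have pushed.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]   # (dense features, label in {-1, +1})


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_gradient(w: List[float], data: Sequence[Sample]) -> List[float]:
    """Mean gradient of -ln sigmoid(y * w^T x) over one worker's shard (m = 1 MLR)."""
    g = [0.0] * len(w)
    for x, y in data:
        # d/dw of -ln sigmoid(y w^T x) is -(1 - sigmoid(y w^T x)) * y * x,
        # and 1 - sigmoid(z) == sigmoid(-z).
        coef = -y * sigmoid(-y * sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            g[j] += coef * xj / len(data)
    return g


class ParameterServer:
    """Toy stand-in for the PS: holds the model and averages pushed gradients."""

    def __init__(self, dim: int, learning_rate: float) -> None:
        self.w = [0.0] * dim
        self.learning_rate = learning_rate
        self.pending: List[List[float]] = []

    def pull(self) -> List[float]:
        return list(self.w)                  # workers receive a copy of the latest model

    def push(self, grad: List[float]) -> None:
        self.pending.append(grad)

    def update(self) -> None:
        avg = [sum(col) / len(self.pending) for col in zip(*self.pending)]
        self.w = [wi - self.learning_rate * gi for wi, gi in zip(self.w, avg)]
        self.pending.clear()


def train(shards: Sequence[Sequence[Sample]], dim: int, iterations: int,
          learning_rate: float) -> List[float]:
    ps = ParameterServer(dim, learning_rate)
    for _ in range(iterations):
        for shard in shards:                 # each shard plays the role of one worker
            w = ps.pull()
            ps.push(local_gradient(w, shard))
        ps.update()                          # average all pushed gradients, then update
    return ps.w
```

When every shard has the same size, averaging the workers' mean gradients equals the full-batch mean gradient. With unequal shards, a size-weighted average would be needed for that equality to hold.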
The data format is set by the "ml.data.type" parameter, and the number of features, i.e. the dimension of the feature vectors, is set by the "ml.feature.num" parameter.
MLR on Angel supports "libsvm" and "dummy" data formats as follows:
- dummy format
Each line of text represents a sample in the format "y index1 index2 index3 ...", where each index is the ID of a feature whose value is non-zero. For training data, y is the category of the sample and can take the values 1 and -1; for prediction data, y is the ID of the sample. For example, the positive sample [2.0, 3.1, 0.0, 0.0, -1, 2.2] is expressed as "1 0 1 4 5". Here "1" is the category, and "0 1 4 5" means that dimensions 0, 1, 4 and 5 of the feature vector are non-zero. Similarly, the negative sample [2.0, 0.0, 0.1, 0.0, 0.0, 0.0] is represented as "-1 0 2".
- libsvm format
Each line of text represents a sample in the format "y index1:value1 index2:value2 index3:value3 ...", where index is the feature ID and value is the corresponding feature value. For training data, y is the category of the sample and can take the values 1 and -1; for prediction data, y is the ID of the sample. For example, the positive sample [2.0, 3.1, 0.0, 0.0, -1, 2.2] is expressed as "1 0:2.0 1:3.1 4:-1 5:2.2". Here "1" is the category, and "0:2.0" means that the value of feature 0 is 2.0. Similarly, the negative sample [2.0, 0.0, 0.1, 0.0, 0.0, 0.0] is represented as "-1 0:2.0 2:0.1".
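Both formats can be read with a few lines of Python. This is a hypothetical parser, not Angel's reader. It returns the first token as a string, because that token is a label for training data and a sample ID for prediction data. It also assumes that every feature listed in a dummy line has the value 1.0, since the dummy format keeps only the indices.

```python
from typing import Dict, List, Tuple


def parse_dummy(line: str) -> Tuple[str, Dict[int, float]]:
    """Parse 'y index1 index2 ...'; every listed feature gets the value 1.0 (assumption)."""
    head, *indices = line.split()
    return head, {int(i): 1.0 for i in indices}


def parse_libsvm(line: str) -> Tuple[str, Dict[int, float]]:
    """Parse 'y index1:value1 index2:value2 ...' into a sparse {index: value} map."""
    head, *pairs = line.split()
    feats: Dict[int, float] = {}
    for pair in pairs:
        index, value = pair.split(":", 1)
        feats[int(index)] = float(value)
    return head, feats


def to_dense(feats: Dict[int, float], n: int) -> List[float]:
    """Expand a sparse map into a length-n vector (n = number of features)."""
    dense = [0.0] * n
    for index, value in feats.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    y, feats = parse_libsvm("1 0:2.0 1:3.1 4:-1 5:2.2")
    print(y, to_dense(feats, 6))   # 1 [2.0, 3.1, 0.0, 0.0, -1.0, 2.2]
```

Expanding the libsvm example back to a dense vector recovers the original sample exactly. The dummy example only recovers which positions are non-zero.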
Algorithm Parameters
- ml.epoch.num:number of iterations
- ml.batch.sample.ratio:sampling ratio of samples used in each iteration
- ml.num.update.per.epoch:number of parameter updates per epoch
- ml.data.validate.ratio:ratio of samples used for validation; set to 0 to disable validation
- ml.learn.rate:initial learning rate
- ml.learn.decay:learning rate decay coefficient
- ml.mlr.reg.l2:L2 penalty coefficient
- ml.mlr.rank:the number of regions, corresponding to 'm' in the model formula
- ml.mlr.v.init:model initialization parameter, standard deviation of Gaussian distribution
- ml.inputlayer.optimizer:Optimizer type, such as "adam", "ftrl" and "momentum"
- ml.data.label.trans.class: whether and how to transform labels. The default is "NoTrans"; the options are "ZeroOneTrans" (to 0/1), "PosNegTrans" (to +1/-1), "AddOneTrans" (add 1) and "SubOneTrans" (subtract 1).
- ml.data.label.trans.threshold: the threshold used by "ZeroOneTrans" (to 0/1) and "PosNegTrans" (to +1/-1). Labels greater than the threshold are mapped to 1 (or +1), and the others to 0 (or -1). The default threshold is 0.
- ml.data.posneg.ratio: the resampling ratio of positive to negative samples. It is useful when the numbers of positive and negative samples differ greatly (for example, by more than 5 times).
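The label transformations can be illustrated with a small function. This is a hypothetical reading of `ml.data.label.trans.class` and `ml.data.label.trans.threshold`, not Angel's code. In particular, whether a label exactly equal to the threshold maps to the positive class is an assumption; here it does not.

```python
def transform_label(y: float, trans_class: str = "NoTrans", threshold: float = 0.0) -> float:
    """Hypothetical label transformation; the threshold only affects the two binarizing modes."""
    if trans_class == "NoTrans":
        return y
    if trans_class == "ZeroOneTrans":
        return 1.0 if y > threshold else 0.0     # assumed: strictly greater -> 1
    if trans_class == "PosNegTrans":
        return 1.0 if y > threshold else -1.0    # assumed: strictly greater -> +1
    if trans_class == "AddOneTrans":
        return y + 1.0
    if trans_class == "SubOneTrans":
        return y - 1.0
    raise ValueError(f"unknown label transformation: {trans_class}")


if __name__ == "__main__":
    # Converting {0, 1} labels to {-1, +1} with the default threshold of 0.
    print([transform_label(v, "PosNegTrans") for v in (0.0, 1.0, 1.0, 0.0)])  # [-1.0, 1.0, 1.0, -1.0]
```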
Input and Output Parameters
- angel.train.data.path:input path for training data
- angel.predict.data.path:input path for prediction data
- ml.feature.index.range:number of data features
- ml.data.type:data format, supports "dummy" and "libsvm"
- angel.save.model.path:path to save the model after training
- angel.predict.out.path:path to save prediction results
- angel.log.path:path to save log files
Resource Parameters
- angel.workergroup.number:number of workers
- angel.worker.memory.mb:memory size requested by each worker
- angel.worker.task.number:number of tasks on each worker, 1 by default
- angel.ps.number:number of PSs
- angel.ps.memory.mb:memory size requested by each PS
Submitting Command
You can submit MLR training tasks to the Yarn cluster in two ways: by setting the parameters above one by one in the submit script, or by using the following JSON file to construct the network.
- If you use both script parameters and a JSON file, the parameters in the script have higher priority.
- If you only use parameters in the script, you must set ml.model.class.name as --ml.model.class.name com.tencent.angel.ml.classification.MixedLogisticRegression and must not set the parameter angel.ml.conf, which holds the JSON file path.

Here we provide an example submitted with a JSON file (see data; see Json description for a complete description of the JSON configuration file):
{
"data": {
"format": "dummy",
"indexrange": 148,
"validateratio": 0.1,
"numfield": 13,
"sampleratio": 0.2
},
"train": {
"epoch": 5,
"lr": 0.8,
"decayclass": "WarmRestarts",
"decayalpha": 0.05
},
"model": {
"modeltype": "T_FLOAT_DENSE",
"modelsize": 148
},
"default_optimizer": {
"type": "momentum",
"momentum": 0.9,
"reg2": 0.01
},
"layers": [
{
"name": "sigmoid",
"type": "simpleinputlayer",
"outputdim": 10,
"transfunc": "sigmoid"
},
{
"name": "softmax",
"type": "simpleinputlayer",
"outputdim": 10,
"transfunc": "softmax"
},
{
"name": "dotpooling",
"type": "dotpooling",
"outputdim": 1,
"inputlayers": [
"sigmoid",
"softmax"
]
},
{
"name": "simplelosslayer",
"type": "simplelosslayer",
"lossfunc": "CrossEntropyLoss",
"inputlayer": "dotpooling"
}
]
}
runner="com.tencent.angel.ml.core.graphsubmit.GraphRunner"
modelClass="com.tencent.angel.ml.core.graphsubmit.AngelModel"
$ANGEL_HOME/bin/angel-submit \
--angel.job.name mlr \
--action.type train \
--angel.app.submit.class $runner \
--ml.model.class.name $modelClass \
--angel.train.data.path $input_path \
--angel.save.model.path $model_path \
--angel.log.path $log_path \
--angel.workergroup.number $workerNumber \
--angel.worker.memory.gb $workerMemory \
--angel.worker.task.number $taskNumber \
--angel.ps.number $PSNumber \
--angel.ps.memory.gb $PSMemory \
--angel.output.path.deleteonexist true \
--angel.task.data.storage.level $storageLevel \
--angel.task.memorystorage.max.gb $taskMemory \
--angel.worker.env "LD_PRELOAD=./libopenblas.so" \
--angel.ml.conf $mixedlr_json_path \
--ml.optimizer.json.provider com.tencent.angel.ml.core.PSOptimizerProvider
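Before submitting, the network wiring in a JSON configuration like the one above can be sanity-checked. The helper below is hypothetical and not part of Angel. It assumes three things: the layer names match the example, the `outputdim` of the sigmoid and softmax layers both equal the number of regions m, and `dotpooling` combines those two layers into the weighted sum of the model formula. A real file would be loaded with `json.load` instead of the inline string.

```python
import json


def regions_from_config(conf: dict) -> int:
    """Return m (number of regions) implied by an MLR JSON config, checking its wiring."""
    layers = {layer["name"]: layer for layer in conf["layers"]}
    sig, soft = layers["sigmoid"], layers["softmax"]
    if sig["transfunc"] != "sigmoid" or soft["transfunc"] != "softmax":
        raise ValueError("expected one sigmoid and one softmax input layer")
    if sig["outputdim"] != soft["outputdim"]:
        raise ValueError("sigmoid and softmax layers must both output m values")
    if set(layers["dotpooling"]["inputlayers"]) != {"sigmoid", "softmax"}:
        raise ValueError("dotpooling must combine the sigmoid and softmax layers")
    return sig["outputdim"]


if __name__ == "__main__":
    conf = json.loads("""
    {"layers": [
      {"name": "sigmoid", "type": "simpleinputlayer", "outputdim": 10, "transfunc": "sigmoid"},
      {"name": "softmax", "type": "simpleinputlayer", "outputdim": 10, "transfunc": "softmax"},
      {"name": "dotpooling", "type": "dotpooling", "outputdim": 1,
       "inputlayers": ["sigmoid", "softmax"]}
    ]}
    """)
    print(regions_from_config(conf))  # 10
```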
[1] Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction.