batch norm: hide statistics from solver, simplifying layer definition
Batch norm statistics are not learnable parameters subject to solver
updates, so they must be shielded from the solver. The `BatchNorm` layer now
masks its statistics itself by zeroing the learning rates of its parameters,
instead of relying on the layer definition to do so.

n.b. declaring `param`s for batch norm layers is no longer allowed.
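
For illustration, a minimal prototxt sketch of the simplification (layer and blob names here are hypothetical, not from this commit). Before this commit, a definition had to zero the learning rate of all three statistic blobs by hand; after it, the plain layer suffices:

    # Before: statistics frozen by hand in the layer definition.
    layer {
      name: "bn1"            # hypothetical names
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      param { lr_mult: 0 }   # (0) mean
      param { lr_mult: 0 }   # (1) variance
      param { lr_mult: 0 }   # (2) moving average factor
    }

    # After: the layer zeroes these learning rates itself.
    layer {
      name: "bn1"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
    }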
shelhamer committed Sep 13, 2016
1 parent 3b6fd1d commit c8f446f
Showing 2 changed files with 10 additions and 4 deletions.
6 changes: 2 additions & 4 deletions include/caffe/layers/batch_norm_layer.hpp
@@ -22,10 +22,8 @@ namespace caffe {
  * mean/variance statistics via a running average, which is then used at test
  * time to allow deterministic outputs for each input. You can manually toggle
  * whether the network is accumulating or using the statistics via the
- * use_global_stats option. IMPORTANT: for this feature to work, you MUST set
- * the learning rate to zero for all three blobs, i.e., param {lr_mult: 0} three
- * times in the layer definition. For reference, these three blobs are (0)
- * mean, (1) variance, and (2) the moving average factor.
+ * use_global_stats option. For reference, these statistics are kept in the
+ * layer's three blobs: (0) mean, (1) variance, and (2) moving average factor.
  *
  * Note that the original paper also included a per-channel learned bias and
  * scaling factor. To implement this in Caffe, define a `ScaleLayer` configured
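
As an aside on the header's closing note: pairing the `BatchNorm` layer with a `ScaleLayer` restores the learned per-channel scale and bias from the original paper. A hedged sketch, with hypothetical layer and blob names:

    layer {
      name: "bn1/scale"      # hypothetical names
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      scale_param { bias_term: true }  # adds the per-channel learned bias
    }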
8 changes: 8 additions & 0 deletions src/caffe/layers/batch_norm_layer.cpp
@@ -34,6 +34,14 @@ void BatchNormLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
           this->blobs_[i]->mutable_cpu_data());
     }
   }
+  // Mask statistics from optimization by setting local learning rates
+  // for mean, variance, and the bias correction to zero.
+  CHECK_EQ(this->layer_param_.param_size(), 0)
+      << "Cannot configure batch normalization statistics as layer parameters.";
+  for (int i = 0; i < this->blobs_.size(); ++i) {
+    ParamSpec* fixed_param_spec = this->layer_param_.add_param();
+    fixed_param_spec->set_lr_mult(0.);
+  }
 }
 
 template <typename Dtype>
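
Why zeroing `lr_mult` shields the blobs: the solver scales each parameter's update by its local learning-rate multiplier, so a multiplier of zero makes the statistics invisible to optimization. Conversely, the `CHECK_EQ` above means a definition that still declares `param`s for a `BatchNorm` layer now fails at setup. A hypothetical example of what is no longer allowed:

    layer {
      name: "bn1"            # hypothetical names
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      # Any param entry now fails LayerSetUp with:
      # "Cannot configure batch normalization statistics as layer parameters."
      param { lr_mult: 0 }
    }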
