
Commit 8b1e432

small tweaks, make default WD be 0.1 as is often cited, and remove spurious init of LayerNorm, which is already initialized at 1,0
karpathy committed Feb 6, 2023
1 parent ab21d6c commit 8b1e432
Showing 2 changed files with 1 addition and 5 deletions.
4 changes: 0 additions & 4 deletions model.py
@@ -173,10 +173,6 @@ def _init_weights(self, module):
                 torch.nn.init.zeros_(module.bias)
         elif isinstance(module, nn.Embedding):
             torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
-        elif isinstance(module, (LayerNorm, nn.LayerNorm)):
-            torch.nn.init.ones_(module.weight)
-            if module.bias is not None:
-                torch.nn.init.zeros_(module.bias)
 
     def forward(self, idx, targets=None):
         device = idx.device
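
For context, a minimal check (not part of the commit) of why the removed branch was a no-op: PyTorch's nn.LayerNorm already initializes its affine weight to ones and its bias to zeros, so re-initializing them here changed nothing.

    # Hypothetical sanity check, assuming standard PyTorch defaults.
    import torch
    import torch.nn as nn

    ln = nn.LayerNorm(8)
    assert torch.all(ln.weight == 1.0)  # affine weight defaults to ones
    assert torch.all(ln.bias == 0.0)    # bias defaults to zeros
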
2 changes: 1 addition & 1 deletion train.py
@@ -57,7 +57,7 @@
 # adamw optimizer
 learning_rate = 6e-4 # max learning rate
 max_iters = 600000 # total number of training iterations
-weight_decay = 1e-2
+weight_decay = 1e-1
 beta1 = 0.9
 beta2 = 0.95
 grad_clip = 1.0 # clip gradients at this value, or disable if == 0.0
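
As an illustration (a sketch under assumed usage, not the repo's own optimizer-setup code), these settings typically end up as torch.optim.AdamW arguments, with the new default weight decay of 0.1:

    # Hypothetical wiring of the train.py settings into AdamW; the Linear
    # module is only a stand-in model for the example.
    import torch
    import torch.nn as nn

    model = nn.Linear(16, 16)
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=6e-4,            # learning_rate
        betas=(0.9, 0.95),  # beta1, beta2
        weight_decay=1e-1,  # the new default WD of 0.1
    )
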
