Commit 43b37fd
reverse the order, making sure that the final layer init is preserved and becomes the token embedding, instead of the other way around. Otherwise the loss can be all messed up from a bad init.
karpathy committed Jan 14, 2023
1 parent 7c82885 commit 43b37fd
Showing 1 changed file with 1 addition and 1 deletion.
model.py: 2 changes (1 addition, 1 deletion)
@@ -115,7 +115,7 @@ def __init__(self, config):
ln_f = nn.LayerNorm(config.n_embd),
))
self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
-        self.lm_head.weight = self.transformer.wte.weight # https://paperswithcode.com/method/weight-tying
+        self.transformer.wte.weight = self.lm_head.weight # https://paperswithcode.com/method/weight-tying

# report number of parameters
n_params = sum(p.numel() for p in self.parameters())
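
For context, here is a minimal PyTorch sketch (not taken from model.py; the toy sizes are made up) of why the assignment direction matters: `a.weight = b.weight` makes both modules share b's Parameter object, so b's initialization is the one that survives. Tying in the direction of the new line keeps the lm_head init and reuses it as the token embedding, which is what the commit message describes.

import torch
import torch.nn as nn

n_embd, vocab_size = 8, 16  # toy sizes, for illustration only
wte = nn.Embedding(vocab_size, n_embd)               # token embedding, weight shape (vocab_size, n_embd)
lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # final projection, weight also (vocab_size, n_embd)

# Tie the weights: wte now points at lm_head's Parameter, so lm_head's
# initialization is the one both modules see (the reverse assignment would
# instead keep the embedding's init).
wte.weight = lm_head.weight

assert wte.weight is lm_head.weight  # a single shared Parameter object
print(wte.weight.shape)              # torch.Size([16, 8])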
