Merge pull request karpathy#71 from cchan/patch-1
Zero-grad more aggressively to save memory
karpathy authored Jan 20, 2023
2 parents 1f77d03 + 6716607 commit 3611338
Showing 1 changed file with 1 addition and 1 deletion.
train.py
@@ -259,7 +259,6 @@ def get_lr(iter):
         break
 
     # forward backward update, with optional gradient accumulation to simulate larger batch size
-    optimizer.zero_grad(set_to_none=True)
     for micro_step in range(gradient_accumulation_steps):
         X, Y = get_batch('train')
         if ddp:
@@ -272,6 +271,7 @@ def get_lr(iter):
             logits, loss = model(X, Y)
         loss.backward()
     optimizer.step()
+    optimizer.zero_grad(set_to_none=True)
 
     # timing and logging
     t1 = time.time()
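The change moves optimizer.zero_grad(set_to_none=True) from the top of the iteration to immediately after optimizer.step(). With set_to_none=True, PyTorch sets each param.grad to None rather than filling the existing tensors with zeros, so, per the commit message, gradient memory is released as soon as the update is done instead of staying allocated through the timing/logging code until the next forward/backward pass. Below is a minimal sketch of the resulting loop structure; the toy model, optimizer settings, and random-batch helper are illustrative stand-ins for nanoGPT's actual setup, and only the placement of zero_grad mirrors the diff.

```python
import torch
import torch.nn as nn

# Toy stand-ins for nanoGPT's model and data (illustrative only).
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
gradient_accumulation_steps = 4

def get_batch():
    X = torch.randn(8, 16)
    Y = torch.randint(0, 2, (8,))
    return X, Y

for iter_num in range(10):
    # forward/backward with gradient accumulation: each micro-step's
    # gradients sum into param.grad across the inner loop
    for micro_step in range(gradient_accumulation_steps):
        X, Y = get_batch()
        logits = model(X)
        loss = nn.functional.cross_entropy(logits, Y)
        loss.backward()
    optimizer.step()
    # zero-grad immediately after the step: set_to_none=True sets each
    # param.grad to None, releasing the gradient buffers now rather
    # than keeping them alive until the top of the next iteration
    optimizer.zero_grad(set_to_none=True)
```

One consequence of this ordering is that gradients are never left populated between iterations, so whatever runs between step() and the next backward pass (timing, logging, evaluation) no longer holds on to gradient memory.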
