Commit

Finish training Gemma transcoders
neverix committed Aug 5, 2024
1 parent 2e3dc93 commit f20fc24
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion scripts/train_gemma_sae.py
@@ -18,7 +18,7 @@ def train(
cache_size = 2**16,
cache_batch_size = 256,
cache_ratio=1.0,
- batch_size = 2**16,
+ batch_size = 2048,
max_seq_len = 128,
sparsity_coefficients=[4e-6],
# save_steps=2500,
2 changes: 1 addition & 1 deletion train_gemmas.py
@@ -1,5 +1,5 @@
import os
- layers = [1, 2, 3, 4, 5]
+ layers = [0]
for layer_idx in range(len(layers)):
    layer = layers[layer_idx]
    restore = None # if layer_idx == 0 else f"weights/phi-l{layers[layer_idx-1]}-gated.safetensors"