For pre-training the sentence-level transformer: (Here the data files should be in the format source ||| target, and --model-path provides the path to the folder where the model files will be saved)
./build_gpu/transformer-train --dynet-mem 15500 --minibatch-size 1000 --treport 8000 --dreport 40000 -t $trainfname -d $devfname --model-path $modelfname \
--sgd-trainer 4 --lr-eta 0.0001 -e 35 --patience 10 --use-label-smoothing --encoder-emb-dropout-p 0.1 --encoder-sublayer-dropout-p 0.1 --decoder-emb-dropout-p 0.1 \
--decoder-sublayer-dropout-p 0.1 --attention-dropout-p 0.1 --ff-dropout-p 0.1 --ff-activation-type 1 --nlayers 4 --num-units 512 --num-heads 8
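The command above uses shell variables that are not defined in this file; the following is a minimal setup sketch (all paths and file names are placeholders, and the paste/awk step assumes you start from plain, line-aligned source and target text files):

# Hypothetical paths; adjust to your own data and model locations.
trainfname=data/train.src-tgt      # sentence-level training data, one "source ||| target" per line
devfname=data/dev.src-tgt
modelfname=models/sentence-level/  # folder where the trained model files will be saved
mkdir -p $modelfname

# Build the "source ||| target" files from separate, line-aligned source/target text (assumed inputs).
paste data/train.src data/train.tgt | awk -F'\t' '{ print $1, "|||", $2 }' > $trainfname
paste data/dev.src   data/dev.tgt   | awk -F'\t' '{ print $1, "|||", $2 }' > $devfname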
For computing encoder-based (monolingual) representations: (Here --input_doc is in the format docID ||| source, and --input_type denotes the portion of the data, i.e. 0 for train, 1 for dev)
./build_gpu/transformer-computerep --dynet-mem 15500 --model-path $modelfname --input_doc $fname --input_type 0 --rep_type 1
For computing decoder-based (bilingual) representations: (Here --input_doc is in the format docID ||| source, and --input_type denotes the portion of the data, i.e. 0 for train, 1 for dev)
./build_gpu/transformer-computerep --dynet-mem 15500 --model-path $modelfname --input_doc $fname --input_type 0 --rep_type 2
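The two representation commands differ only in --rep_type, so both representation types can be computed for the train and dev portions with a small loop; this is only a sketch, and the input file paths are placeholders:

# rep_type: 1 = encoder-based (monolingual), 2 = decoder-based (bilingual)
# input_type: 0 = train portion, 1 = dev portion
for rep_type in 1 2; do
  for input_type in 0 1; do
    fname=data/docs.$input_type    # hypothetical files in the format docID ||| source
    ./build_gpu/transformer-computerep --dynet-mem 15500 --model-path $modelfname \
      --input_doc $fname --input_type $input_type --rep_type $rep_type
  done
done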
For training the document-level model with hierarchical attention: (Here the data files should be in the format docID ||| source ||| target, --model-file gives the model a name of your choice,
--context-type is 1 for monolingual and 2 for bilingual context, and --use-sparse-soft 1 uses sparse attention at the sentence level and soft attention at the word level)
./build_gpu/transformer-context --dynet-mem 15500 --dynet-devices GPU:0,CPU --minibatch-size 1000 --dtreport 65 --ddreport 325 --train_doc $trainfname --devel_doc $devfname \
--model-path $modelfname --model-file $modelname --context-type 1 --doc-attention-type 3 --use-sparse-soft 1 --use-new-dropout \
--encoder-emb-dropout-p 0.2 --encoder-sublayer-dropout-p 0.2 --decoder-emb-dropout-p 0.2 --decoder-sublayer-dropout-p 0.2 --attention-dropout-p 0.2 --ff-dropout-p 0.2 \
--sgd-trainer 4 --lr-eta 0.0001 -e 35 --patience 10
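If the document-level corpus is available as separate, line-aligned docID, source, and target files (an assumption; your preprocessing may differ), the docID ||| source ||| target files for this step can be assembled in the same way as before; the paths below are placeholders, and the variables are reused as in the command above:

trainfname=data/train.doc   # document-level training data, one "docID ||| source ||| target" per line
devfname=data/dev.doc
paste data/train.docids data/train.src data/train.tgt | awk -F'\t' '{ print $1, "|||", $2, "|||", $3 }' > $trainfname
paste data/dev.docids   data/dev.src   data/dev.tgt   | awk -F'\t' '{ print $1, "|||", $2, "|||", $3 }' > $devfname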
Decoding the sentence-level transformer: (Here the test file is in the format docID ||| source; only greedy decoding has been implemented)
./build_gpu/transformer-decode --dynet-mem 15500 --model-path $modelfname --beam 1 -T $testfname
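A usage sketch for this step, under the assumption (not stated above) that the translations are written to standard output; the test file path and output file name are placeholders:

testfname=data/test.doc   # test file in the format docID ||| source
./build_gpu/transformer-decode --dynet-mem 15500 --model-path $modelfname --beam 1 -T $testfname > hyp.sentence-level.txt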
Decoding the document-level model with hierarchical attention: (For decoding, the representations are computed on the fly)
./build_gpu/transformer-context-decode --dynet-mem 15500 --dynet-devices GPU:0,CPU --model-path $modelfname --model-file $modelname --beam 1 -T $testfname --context-type 1
(Note: --dynet-devices needs to be set only when using sparsemax)
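For reference, a sketch of the same decoding call when sparsemax is not used, so that --dynet-devices can be dropped per the note above (again assuming translations go to standard output; the output file name is a placeholder):

./build_gpu/transformer-context-decode --dynet-mem 15500 --model-path $modelfname --model-file $modelname --beam 1 -T $testfname --context-type 1 > hyp.document-level.txt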