Hope you can find everything you want and happy parsing 😃.
First, get the dynet and eigen3 libraries:
git submodule init
git submodule update
You will also need Boost.
Then compile:
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=${YOUR_EIGEN3_PATH}
make
If the build succeeds, you should find the executable ./bin/trans_parser
Currently, we support the parsers described in the following papers:
- d15: Transition-Based Dependency Parsing with Stack Long Short-Term Memory
- b15: Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs
- k16: Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
You can select a parser with the --architecture option on the command line.
Special constraints on parsing actions are adopted to make sure the output tree has only one root word with the root dependency relation. The --root option specifies the relation name. The dummy root token is positioned at the right, following Going to the roots of dependency parsing.
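The single-root property described above can be checked on an output tree with a few lines of Python. This is only a sketch for inspecting results, not the parser's own constraint logic: the function name `has_single_root` and the list-based tree representation are assumptions made for illustration, and the check that no other token carries the root relation is one reading of the constraint.

```python
def has_single_root(heads, deprels, root_name="root"):
    """Check the single-root property of a dependency tree.

    heads[i] is the head index of token i + 1, where 0 stands for the
    dummy root. deprels[i] is the relation name on that arc.
    """
    root_tokens = [i for i, head in enumerate(heads) if head == 0]
    if len(root_tokens) != 1:
        return False
    only_root = root_tokens[0]
    # The root relation should label the root arc and no other arc.
    return all(
        (rel == root_name) == (i == only_root)
        for i, rel in enumerate(deprels)
    )


# Tree for "Ms. Haag plays Elianti ."
heads = [2, 3, 0, 3, 3]
deprels = ["nn", "nsubj", "root", "dobj", "punct"]
print(has_single_root(heads, deprels))
```

Pass the same relation name here as the one given to --root when checking parser output.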
Dynamic oracles output the optimal action at non-canonical states (states that are not on the oracle transition sequence).
- ArcStandard: A tabular method for dynamic oracles in transition-based parsing
- {ArcHybrid|ArcEager}: Training deterministic parsers with non-deterministic oracles
Dynamic oracles can be activated by setting the --supervised_oracle option to true.
Noisifying means randomly replacing some words with the unknown word to improve the model's generalization ability. Two random replacement strategies are implemented:
- singleton: randomly replace singletons during training, following Transition-Based Dependency Parsing with Stack Long Short-Term Memory
- word: word dropout, following Deep unordered composition rivals syntactic methods for text classification.
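The two strategies can be sketched in Python as follows. This is an illustration of the general idea, not the parser's implementation: the `<unk>` symbol, the function names, and the `prob` parameter are all assumptions, and the replacement probabilities actually used by the parser are not stated here.

```python
import random
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for the unknown word


def count_words(sentences):
    """Count how often each word form occurs in the training data."""
    return Counter(word for sentence in sentences for word in sentence)


def noisify_singleton(sentence, counts, prob, rng):
    """Replace words seen exactly once in training with UNK, with probability prob."""
    return [
        UNK if counts[word] == 1 and rng.random() < prob else word
        for word in sentence
    ]


def noisify_word(sentence, prob, rng):
    """Word dropout: replace any word with UNK, with probability prob."""
    return [UNK if rng.random() < prob else word for word in sentence]


train = [["Ms.", "Haag", "plays", "Elianti", "."], ["Haag", "plays", "."]]
counts = count_words(train)
rng = random.Random(1234)
print(noisify_singleton(train[0], counts, 0.5, rng))
print(noisify_word(train[0], 0.1, rng))
```

In this sketch the singleton strategy only ever touches words that occur once in the training data, while word dropout can replace any word.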
Training on partially annotated trees generally follows Training Dependency Parsers with Partial Annotation and Constrained arc-eager dependency parsing. The basic idea is to perform constrained decoding on the partial tree to get a pseudo-oracle sequence and use it as training data.
Training on partial trees is enabled by setting the --partial option to true.
WARNING: training with partial trees on the ArcStandard system is currently not possible.
Training with beam-search follows Globally Normalized Transition-Based Neural Networks. Early stopping is used.
Training with beam-search is specified by setting --supervised_objective to structure and --beam_size greater than 1.
Testing with beam-search only needs --beam_size set greater than 1.
An example of the PTB data:
1 Ms. Ms. NNP NNP NNP 2 nn _ _
2 Haag Haag NNP NNP NNP 3 nsubj _ _
3 plays plays VBZ VBZ VBZ 0 root _ _
4 Elianti Elianti NNP NNP NNP 3 dobj _ _
5 . . . . . 3 punct _ _
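For inspecting data in this format outside the parser, a small reader can be sketched in Python. The column positions are read off the example above (head in the seventh column, relation in the eighth); the function name `read_conll`, the dictionary keys, and the assumption that sentences are separated by blank lines are illustrative, not part of the parser.

```python
def read_conll(lines):
    """Group whitespace-separated 10-column lines into sentences.

    Assumes fully annotated trees, so the head column is always an integer.
    """
    sentences, tokens = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        cols = line.split()
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "pos": cols[4],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if tokens:
        sentences.append(tokens)
    return sentences


sample = """1 Ms. Ms. NNP NNP NNP 2 nn _ _
2 Haag Haag NNP NNP NNP 3 nsubj _ _
3 plays plays VBZ VBZ VBZ 0 root _ _
4 Elianti Elianti NNP NNP NNP 3 dobj _ _
5 . . . . . 3 punct _ _
"""
for token in read_conll(sample.splitlines())[0]:
    print(token["form"], token["head"], token["deprel"])
```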
Commands:
./bin/trans_parser --dynet-mem 1024 \
--dynet-seed 1234 \
--train \
--architecture d15 \
-T ./data/PTB_train_auto.conll \
-d ./data/PTB_development_auto.conll \
-w ./data/sskip.100.vectors.ptb_filtered \
--lambda 1e-6 \
--noisify_method singleton \
--optimizer_enable_eta_decay true \
--optimizer_enable_clipping true \
--external_eval ./script/eval_ex_enpunt.py
Commands for training with beam-search:
./bin/trans_parser --dynet-mem 1024 \
--dynet-seed 1234 \
--train \
--architecture d15 \
-T ./data/PTB_train_auto.conll \
-d ./data/PTB_development_auto.conll \
-w ./data/sskip.100.vectors.ptb_filtered \
--lambda 1e-6 \
--noisify_method singleton \
--optimizer_enable_eta_decay true \
--optimizer_enable_clipping true \
--external_eval ./script/eval_ex_enpunt.py \
--beam_size 8 \
--supervised_objective structure
Example of a partially annotated tree:
1 Ms. Ms. NNP NNP NNP 2 nn _ _
2 Haag Haag NNP NNP NNP _ _ _ _
3 plays plays VBZ VBZ VBZ 0 root _ _
4 Elianti Elianti NNP NNP NNP _ _ _ _
5 . . . . . 3 punct _ _
Tokens without annotation (Haag and Elianti in this example) have their head and relation marked as _.
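To check which tokens in a partially annotated file lack a head, something like the following Python sketch may help. It assumes the head sits in the seventh whitespace-separated column, as in the example above; the function name `unannotated_tokens` is hypothetical and this is not how the parser itself reads the file.

```python
def unannotated_tokens(lines):
    """Return the word forms whose head column is the placeholder '_'."""
    missing = []
    for line in lines:
        cols = line.split()
        if len(cols) < 8:
            # Skip blank or malformed lines.
            continue
        if cols[6] == "_":
            missing.append(cols[1])
    return missing


partial = """1 Ms. Ms. NNP NNP NNP 2 nn _ _
2 Haag Haag NNP NNP NNP _ _ _ _
3 plays plays VBZ VBZ VBZ 0 root _ _
4 Elianti Elianti NNP NNP NNP _ _ _ _
5 . . . . . 3 punct _ _
"""
print(unannotated_tokens(partial.splitlines()))
```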
Commands:
./bin/trans_parser --dynet-mem 1024 \
--dynet-seed 1234 \
--train \
--architecture d15 \
-T ./data/PTB_train_auto.drop_arc_0.50.conll \
-d ./data/PTB_development_auto.conll \
-w ./data/sskip.100.vectors.ptb_filtered \
--lambda 1e-6 \
--noisify_method singleton \
--optimizer_enable_eta_decay true \
--optimizer_enable_clipping true \
--external_eval ./script/eval_ex_enpunt.py \
--partial true
UPDATE: 2017/09/02