This repository contains code for training and using the Abstract Meaning Representation (AMR) model described in the paper "AMR Parsing as Graph Prediction with Latent Alignment".
If you use our code, please cite our paper as follows:
@inproceedings{Lyu2018AMRPA,
title={AMR Parsing as Graph Prediction with Latent Alignment},
author={Chunchuan Lyu and Ivan Titov},
booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
year={2018}
}
- Python 3.6
- Stanford CoreNLP 3.9.1 (the Python wrapper is not compatible with newer versions)
- PyTorch 0.20
- GloVe embeddings
- AMR dataset and resources files
- Set up a Stanford CoreNLP server; feature extraction relies on it.
- Change file paths in utility/constants.py accordingly.
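Before running feature extraction, it can help to confirm that a server is actually listening. The following is a minimal sketch, not part of this repository: the function name is hypothetical, and it assumes the server runs on localhost at port 9000 (the CoreNLP server default). It only checks that the TCP port accepts connections, not that the process behind it is CoreNLP.

```python
import socket


def corenlp_server_reachable(host="localhost", port=9000, timeout=2.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumption: default host and port; adjust if your server differs.
    print("CoreNLP server reachable:", corenlp_server_reachable())
```

If this prints False, start the server (or fix the host and port) before running the preprocessing step.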
Either a) combine all *.txt files into a single file and use Stanford CoreNLP to extract NER, POS and lemma features. The processed file is saved in the same folder.
python src/preprocessing.py
or b) process the output of the AMR-to-English aligner using the Java code in AMR_FEATURE (I used Eclipse to run it).
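For option a), the step of combining the *.txt files could be sketched as below. This is an illustration rather than the repository's own code: the function name and the output file name `combined.txt` are hypothetical, and the location the preprocessing script actually reads from is presumably one of the paths set in utility/constants.py, so check there first. The sketch relies on AMR entries being separated by blank lines and makes sure two files are never joined without one.

```python
from pathlib import Path


def combine_txt_files(folder, output_name="combined.txt"):
    """Concatenate every *.txt file in `folder` into one file in that folder."""
    folder = Path(folder)
    output_path = folder / output_name
    # Sort for a reproducible order; skip the output file itself on re-runs.
    parts = sorted(p for p in folder.glob("*.txt") if p != output_path)
    with output_path.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            out.write(text)
            # Pad so that each file ends with a blank line before the next.
            if not text.endswith("\n\n"):
                out.write("\n" if text.endswith("\n") else "\n\n")
    return output_path
```

Calling `combine_txt_files("path/to/split")` with your own folder writes the combined file next to the inputs and returns its path.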
Build the copying dictionary and recategorization system (this step can be skipped, as the results are already in data/).
python src/rule_system_build.py
Convert the data into tensors.
python src/data_build.py
By default, the model is saved to [save_to]/gpus_0valid_best.pt (save_to is defined in constants.py).
python src/train.py
Load a trained model to parse the pre-built data.
python src/generate.py -train_from [gpus_0valid_best.pt]
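The commands above run in a fixed order, so they can be collected in a small driver script. The sketch below is a convenience wrapper and not part of the repository: the function names and the `build_rules` and `dry_run` parameters are hypothetical. It assumes preprocessing option a), that it is launched from the repository root, and that the paths in utility/constants.py are already set. It uses only the scripts and the `-train_from` flag named in this README.

```python
import subprocess
import sys


def pipeline_commands(model_path, build_rules=True):
    """List the README's commands in the order they are meant to run."""
    commands = [[sys.executable, "src/preprocessing.py"]]
    if build_rules:
        # Optional: the README notes the outputs already ship in data/.
        commands.append([sys.executable, "src/rule_system_build.py"])
    commands += [
        [sys.executable, "src/data_build.py"],
        [sys.executable, "src/train.py"],
        [sys.executable, "src/generate.py", "-train_from", model_path],
    ]
    return commands


def run_pipeline(model_path, build_rules=True, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for command in pipeline_commands(model_path, build_rules):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
```

With the default `dry_run=True`, `run_pipeline(...)` only prints the commands, which is a cheap way to review them before committing to a long training run.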
Please use amr-evaluation-tool-enhanced. It is based on Marco Damonte's amr-evaluation-tool, but with a correction to the unlabeled edge score.
Either a) parse a file in which each line is a single sentence; the output is saved to [file]_parsed
python src/parse.py -train_from [gpus_0valid_best.pt] -input [file]
or b) parse a single sentence given directly on the command line
python src/parse.py -train_from [gpus_0valid_best.pt] -text [type sentence here]
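For option a), the input file needs exactly one sentence per line. A minimal sketch for producing such a file from a list of strings follows; the function name is hypothetical and the code is not part of the repository. It collapses internal whitespace (including stray newlines) so that no sentence spans two lines, and it skips empty entries.

```python
def write_sentence_file(sentences, path):
    """Write one sentence per line and return how many lines were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for sentence in sentences:
            # split() + join() collapses tabs, newlines and repeated spaces.
            line = " ".join(sentence.split())
            if line:
                out.write(line + "\n")
                count += 1
    return count
```

The resulting file can then be passed to `python src/parse.py` through the `-input` option shown above.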
Keep the files under the data/ folder unchanged and download the model; this should be enough to run parsing.
Note that "python src/preprocessing.py" starts from the sentences in the original AMR files, whereas the model in the paper was trained on the tokenized version provided by the AMR-to-English aligner, so the results could be slightly different. Also, to build a parser for out-of-domain data, please start preprocessing with "python src/preprocessing.py" to keep everything consistent.
Contact [email protected] if you have any questions!