We build up a RST discourse parser described in [tree Representations in Transition System for RST Parsing] based on the paper of http://aclanthology.coli.uni-saarland.de/pdf/P/P17/P17-2029.pdf.
Due to the licence of RST data corpus, we can't include the data in our project folder. To reproduce the result in the paper, you need to download it from the LDC. Put the TRAINING and TEST folders under data directory
-
Prepare Virtual Environment
conda create -n rst_env python=3.6
conda activate rst_env
pip install requirements.txt
-
Configurations
directory of data: --data_dir [../data/TRAINING | ../data/TEST]
directory of model or any output: --output_dir saved_model
directory of Stanford CoreNLP tool: --corenlp_dir [your stanford nlp path]
parse type, including Treebank and any other type of data: --parse_type [Treebank | Your data with the dir name under folder of data]
Set True to use RN~model and False to use N~model: --isFlat [True|False]
use Stanford CoreNLP tool to parse data first: --preprocess [True|False]
whether to extract feature templates, action maps and relation maps: --prepare [True|False]
whether to train on Treebank: --train [True|False]
whether to evaluate on Treebank: --eval [True|False]
whether to predict any data: --pred [True|False]
All the codes are tested under Python 3.6. And see requirements.txt for python library dependency and you can install them with pip.