Dialogue Graph Modeling for Conversational Machine Reading

This is the code for the paper Dialogue Graph Modeling for Conversational Machine Reading.

Here is a codalab bundle link to reproduce our results.

1. Requirements

(Our experiment environment for reference)

Python 3.6
Python 2.7 (for open discourse tagging tool)
PyTorch (1.6.0)
NLTK (3.4.5)
spaCy (2.0.16)
transformers (2.8.0)
editdistance (0.5.2)
dgl (0.5.3)
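
To compare a local environment against the versions listed above, a small check like the following may help. It is a sketch, not part of the repository: the distribution names in `REQUIRED` (for example `torch` for PyTorch, `dgl` for DGL) are assumptions about how the packages were installed, and CUDA-specific builds may be published under different names.

```python
# Versions copied from the requirements list; the pip distribution names are assumptions.
REQUIRED = {
    "torch": "1.6.0",
    "nltk": "3.4.5",
    "spacy": "2.0.16",
    "transformers": "2.8.0",
    "editdistance": "0.5.2",
    "dgl": "0.5.3",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        # Fallback for older interpreters such as Python 3.6.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare(required, lookup=installed_version):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == wanted:  # ignore local build tags such as +cu101
            report[name] = "ok"
        else:
            report[name] = "mismatch (found {})".format(found)
    return report


if __name__ == "__main__":
    for name, status in compare(REQUIRED).items():
        print("{:<14} {}".format(name, status))
```

A mismatch is not necessarily fatal; the list above is given for reference only.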

2. Datasets

Download the dataset and extract it, or use the copy already provided in the directory data/sharc_raw.

3. Instructions

3.1 Preprocess Data

Fixing errors in raw data

python fix_question.py

EDU segmentation

The environment requirements are listed here

cd segedu
python preprocess_discourse_segment.py
python sharc_discourse_segmentation.py

Discourse relations tagging

We need to train a discourse relation tagging model, following the instructions here. Firstly, download the GloVe pretrained word vectors and put the file at DialogueDiscourseParsing/glove/glove.6B.100d.txt.
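
Before training, it can be worth confirming that the downloaded vector file is intact. The sketch below assumes the standard GloVe text layout (one token per line, followed by space-separated floats) and that tokens contain no spaces; `glove_dimension` is a hypothetical helper, not something the repository provides.

```python
def glove_dimension(path, max_lines=1000):
    """Return the vector dimension shared by the first `max_lines` rows.

    Raises ValueError if the file is empty, a field is not numeric,
    or two rows disagree on the number of values.
    """
    dim = None
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            values = line.rstrip("\n").split(" ")[1:]  # drop the token itself
            for v in values:
                float(v)  # raises ValueError on a non-numeric field
            if dim is None:
                dim = len(values)
            elif len(values) != dim:
                raise ValueError(
                    "line {}: expected {} values, got {}".format(i + 1, dim, len(values))
                )
    if dim is None:
        raise ValueError("file is empty")
    return dim
```

For the file named above, `glove_dimension("DialogueDiscourseParsing/glove/glove.6B.100d.txt")` would be expected to return 100; any other value or an exception suggests a wrong or truncated download.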

Secondly, preprocess data for training.

python data_pre.py <input_dir> <output_file>

Or you can directly use the data in DialogueDiscourseParsing/data/processed_data.

Then train the parser with

python main.py --train

The model should be stored in DialogueDiscourseParsing/dev_model. One can directly use the model trained here.

Finally, run inference on the ShARC dataset to get the discourse relations.

python construct_tree_mapping.py
python convert.py

cd DialogueDiscourseParsing
python main_.py

Preprocessing for Decision Making

python preprocess_decision_base.py

Preprocessing for Question Generation

python preprocess_span.py

All the preprocessed data can be found in the directory ./data. You can also download it here.
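
The preprocessing commands in this section can be chained in a small driver script. The following is only a sketch: the working directories are inferred from the `cd` lines above, `run_pipeline` and `STEPS` are hypothetical names, and it assumes `python` resolves to a suitable interpreter for every step, which may not hold for the discourse tools that need a different environment. The optional parser training steps (`data_pre.py`, `main.py --train`) are left out because a trained model can be used directly.

```python
import subprocess
import sys

# (working directory relative to the repository root, command), in README order.
STEPS = [
    (".", ["python", "fix_question.py"]),
    ("segedu", ["python", "preprocess_discourse_segment.py"]),
    ("segedu", ["python", "sharc_discourse_segmentation.py"]),
    (".", ["python", "construct_tree_mapping.py"]),
    (".", ["python", "convert.py"]),
    ("DialogueDiscourseParsing", ["python", "main_.py"]),
    (".", ["python", "preprocess_decision_base.py"]),
    (".", ["python", "preprocess_span.py"]),
]


def run_pipeline(steps, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    Execution stops at the first failing step because check=True
    makes subprocess.run raise CalledProcessError on a non-zero exit.
    Returns the list of printed command lines.
    """
    handled = []
    for cwd, cmd in steps:
        line = "(cd {} && {})".format(cwd, " ".join(cmd))
        print(line)
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
        handled.append(line)
    return handled


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the steps.
    run_pipeline(STEPS, dry_run="--run" not in sys.argv)
```

Running the script without arguments only prints the planned commands, which makes it easy to check the order before committing to a long preprocessing run.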

3.2 Decision Making and Question Generation

To train the model on the decision-making subtask, run the following:

python -u train_sharc.py \
--train_batch=16 \
--gradient_accumulation_steps=2 \
--epoch=5 \
--seed=323 \
--learning_rate=5e-5 \
--loss_entail_weight=3.0 \
--dsave="out/{}" \
--model=decision_gcn \
--early_stop=dev_0a_combined \
--data=./data/ \
--data_type=decision_electra-large-discriminator \
--prefix=train_decision \
--trans_layer=2 \
--eval_every_steps=300

The trained model and the corresponding results are stored in out/train_decision.
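
Two of the flags above interact in ways that are easy to miss. The sketch below rests on two assumptions that the README does not confirm: that `--train_batch` is the per-forward-pass batch size, so gradient accumulation multiplies it, and that `train_sharc.py` fills the `{}` placeholder in `--dsave` with the value of `--prefix`. The second assumption is at least consistent with the output directory stated above.

```python
def effective_batch_size(train_batch, gradient_accumulation_steps):
    """Examples contributing to one optimizer update.

    Assumes train_batch is the size of each forward/backward pass
    and gradients are accumulated over several passes before stepping.
    """
    return train_batch * gradient_accumulation_steps


def output_dir(dsave, prefix):
    """Resolve the save directory, assuming --dsave is formatted with --prefix."""
    return dsave.format(prefix)


print(effective_batch_size(16, 2))          # 32
print(output_dir("out/{}", "train_decision"))  # out/train_decision
```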

For the question generation subtask, we first extract the under-specified span by running:

python -u train_sharc.py \
--train_batch=16 \
--gradient_accumulation_steps=2 \
--epoch=5 \
--seed=115 \
--learning_rate=5e-5 \
--dsave="out/{}" \
--model=span \
--early_stop=dev_0_combined \
--data=./data/ \
--data_type=span_electra-large-discriminator \
--prefix=train_span \
--eval_every_steps=100

The trained model and the corresponding results are stored in out/train_span.

Then, use the span inference results and the rule document to generate follow-up questions:

# --fpred is the directory of span predictions
python -u qg.py \
--fin=./data/sharc_raw/json/sharc_dev.json \
--fpred=./out/inference_span \
--model_recover_path=/absolute/path/to/pretrained_models/qg.bin \
--cache_path=/absolute/path/to/pretrain_models/unilm/

The final results are stored in final_res.json.

Acknowledgement

Part of the code is modified from the [Discern](https://github.com/Yifan-Gao/Discern) implementation.

