Dynamic Topic Boundary Refinement (DTBR) for Dialogue Topic Segmentation

Installation

Clone the repository and install the required dependencies:

Clone the repository

git clone https://github.com/your-username/DyDTS.git
cd DyDTS

Install dependencies

We recommend using a virtual environment (e.g., venv, conda) to install the dependencies.

pip install -r requirements.txt

Usage

1. Data Description

DialSeg711 is a real-world dataset consisting of 711 English dialogues, sourced from MultiWOZ and KVRET. It exhibits an average of 4.9 topic segments and 5.6 utterances per segment. Doc2Dial is a synthetic dataset comprising over 4,100 English conversations grounded in 450+ documents across four domains. It presents an average of 3.7 topic segments and 3.5 utterances per segment.

Details of Dialogue Datasets

Dataset	DialSeg711	Doc2Dial
#Dialogues	711	>4,100
Avg. topic segments per dialogue	4.9	3.7
Avg. utterances per topic segment	5.6	3.5
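
The two averages in the table can be recomputed from boundary annotations. The sketch below is only an illustration: it assumes each dialogue is given as a list of 0/1 labels, one per utterance, where 1 marks the last utterance of a topic segment. This encoding and the function name `segment_stats` are assumptions, not the repository's format.

```python
from typing import List, Tuple


def segment_stats(dialogues: List[List[int]]) -> Tuple[float, float]:
    """Return (avg. segments per dialogue, avg. utterances per segment).

    Assumed encoding: labels[j] == 1 if utterance j closes a topic segment.
    """
    non_empty = [labels for labels in dialogues if labels]
    if not non_empty:
        raise ValueError("need at least one non-empty dialogue")

    total_segments = 0
    total_utterances = 0
    for labels in non_empty:
        # The final utterance always closes a segment, whatever its label.
        total_segments += sum(labels[:-1]) + 1
        total_utterances += len(labels)

    return total_segments / len(non_empty), total_utterances / total_segments


toy = [[0, 0, 1, 0, 1], [0, 1, 0, 0, 0, 1]]
print(segment_stats(toy))  # (2.0, 2.75)
```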

2. Data Preparation

Prepare your dialogue data in the required format: each dialogue is represented as a sequence of utterances, each given as text. The dataset is available right here.

python data_prepare.py --data_dir data/dialseg711 --file_name 711.pkl --output_dir processed_711_data --model_name sup-simcse-bert-base-uncased
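
As an illustration of the "sequence of utterances" representation, the sketch below flattens a dialogue given as topic segments into a flat utterance list plus one boundary label per utterance. The function name `flatten_segments` and the 0/1 label encoding are hypothetical; the schema that data_prepare.py actually expects is not documented here and should be checked against the script.

```python
from typing import List, Tuple


def flatten_segments(segments: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Turn topic segments into (utterances, labels).

    Assumed encoding: label 1 marks the last utterance of a segment, 0 otherwise.
    """
    utterances: List[str] = []
    labels: List[int] = []
    for segment in segments:
        if not segment:
            raise ValueError("empty topic segment")
        utterances.extend(segment)
        # All utterances but the last stay inside the segment.
        labels.extend([0] * (len(segment) - 1) + [1])
    return utterances, labels


dialogue = [
    ["I need a taxi to the station.", "When would you like to leave?"],
    ["Also, what is the weather tomorrow?"],
]
utterances, labels = flatten_segments(dialogue)
print(labels)  # [0, 1, 1]
```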

3. Training

To train the model on your dataset:

python train.py --data_dir processed_711_data --model_name sup-simcse-bert-base-uncased --output_dir model_711_trained

4. Evaluation

To evaluate the model's performance, we provide evaluation scripts and a model for computing metrics such as Pk and WD (WindowDiff) on the segmented output:

python inference.py --data_dir data/dialseg711 --model_name sup-simcse-bert-base-uncased --output_dir model_711
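
For reference, Pk and WD can be computed from boundary labels as sketched below. This is a minimal standalone version, not the code in inference.py. It assumes the reference and the prediction are equal-length lists of 0/1 labels where 1 marks the last utterance of a segment, and it sets the window size k to half the average reference segment length. Rounding conventions for k differ between implementations, so scores may not match the script exactly.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def _segment_ids(labels: List[int]) -> List[int]:
    # ids[j] = number of segment-closing labels before utterance j
    return [0] + list(accumulate(labels[:-1]))


def _prepare(reference: List[int], hypothesis: List[int],
             k: Optional[int]) -> Tuple[List[int], List[int], int]:
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    ref, hyp = _segment_ids(reference), _segment_ids(hypothesis)
    if k is None:
        # Half the average reference segment length (assumed convention)
        k = max(1, round(len(ref) / (2 * (ref[-1] + 1))))
    if k >= len(ref):
        raise ValueError("dialogue is too short for the window size")
    return ref, hyp, k


def pk(reference: List[int], hypothesis: List[int],
       k: Optional[int] = None) -> float:
    """Share of windows where reference and hypothesis disagree on
    whether the two window ends lie in the same segment."""
    ref, hyp, k = _prepare(reference, hypothesis, k)
    windows = len(ref) - k
    errors = sum((ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
                 for i in range(windows))
    return errors / windows


def window_diff(reference: List[int], hypothesis: List[int],
                k: Optional[int] = None) -> float:
    """Share of windows where the number of boundaries inside the
    window differs between reference and hypothesis."""
    ref, hyp, k = _prepare(reference, hypothesis, k)
    windows = len(ref) - k
    errors = sum((ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i])
                 for i in range(windows))
    return errors / windows


reference = [0, 0, 1, 0, 0, 1]
hypothesis = [0, 1, 0, 0, 0, 1]
print(pk(reference, hypothesis), window_diff(reference, hypothesis))  # 0.5 0.5
```

Both metrics are error rates, so lower is better, and a prediction identical to the reference scores 0.0 on each.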

Contributing

We welcome contributions to improve the DTBR method. Feel free to fork the repository and submit pull requests for:

Bug fixes
Feature enhancements
Improvements to the documentation

Contact

For any questions, feel free to open an issue or contact the project maintainers.

Mark131434/DyDTS

Folders and files

Latest commit

History

Repository files navigation

Dynamic Topic Boundary Refinement (DTBR) for Dialogue Topic Segmentation

Installation

Clone the repository

Install dependencies

Usage

1. Data Description

Details of Dialogue Datasets

2. Data Preparation

3. Training

4. Evaluation

Contributing

Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages