
OxVf

The code base of empathetic dialogue generation

dataloader

Without pretrained embedding

The dataloader can automatically read the Facebook EmpatheticDialogues data.

Assume the EmpatheticDialogues data lives one directory above the code:

../empatheticdialogues_data/train.csv
../empatheticdialogues_data/valid.csv
../empatheticdialogues_data/test.csv

Then, anywhere in your Python code, call:

textdata = Dataloader('fb')

The data is then loaded into the instance textdata, which exposes five data structures.

1. The train, valid, and test splits are stored in textdata.dataset.

textdata.dataset['train'] is a list whose elements are 5-element tuples:

(context_id, sentence_id, emotion_id, raw_context, raw_sentence)

context_id is a list of integers, each of which is a word index; the context is the previous k utterances of the dialogue.
sentence_id is a list of word indexes for the current utterance.
emotion_id is the id of the emotion label.
raw_context and raw_sentence are the original words of the context and the sentence (see the sketch below).
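For instance, you can load the data and unpack a few training examples as follows. This is a minimal sketch; the import path for Dataloader is a hypothetical placeholder based on the heading above, not confirmed by this repo:

# Minimal sketch: iterate over the training split and unpack the 5-element tuples.
from dataloader import Dataloader   # hypothetical import path; adjust to the actual module name

textdata = Dataloader('fb')

for context_id, sentence_id, emotion_id, raw_context, raw_sentence in textdata.dataset['train'][:3]:
    # context_id / sentence_id are lists of word indexes; emotion_id is an integer label
    print(len(context_id), len(sentence_id), emotion_id)
    print(raw_context, '->', raw_sentence)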

2. textdata.word2index maps each vocabulary word to its index.

Example:

index = textdata.word2index['hello']

3. textdata.index2word maps an index back to its word.

Example:

word = textdata.index2word[23]

4. textdata.emotion2index

5. textdata.index2emotion

These two work the same way as the word mappings, but for emotion labels.
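A quick round trip over these four mappings (a sketch; it assumes the textdata instance from above, that 'hello' is in the vocabulary, and that emotion2index is keyed by plain label strings such as 'proud'):

index = textdata.word2index['hello']
assert textdata.index2word[index] == 'hello'

emotion_index = textdata.emotion2index['proud']    # assumes emotion labels are stored as plain strings
assert textdata.index2emotion[emotion_index] == 'proud'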

With pretrained embedding

For example, if the embedding file is in GloVe format:

word d1 d2 d3 ... d100

with all fields separated by a single space ' '.

Then use

textdata = Dataloader('fb', embfile='../glove.6B.50d.txt')

embfile is the path to the embedding file.
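If you also need the raw vectors yourself, for example to build an embedding matrix aligned with textdata.word2index, a GloVe-format file can be parsed as below. This sketch only illustrates the file format; it is not how Dataloader handles embfile internally:

import numpy as np

emb_dim = 50
# Initialise with small random vectors so words missing from the GloVe file still get an embedding.
embedding_matrix = np.random.normal(scale=0.1, size=(len(textdata.word2index), emb_dim))

with open('../glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')                 # word d1 d2 ... d50, space-separated
        word, vector = parts[0], np.asarray(parts[1:], dtype=np.float32)
        if word in textdata.word2index:
            embedding_matrix[textdata.word2index[word]] = vector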
