Update format for partition input in ReadMe #6

tenggaard · 2021-04-22T11:27:09Z

OCTIS version: 1.2.0
Python version: Python 3.8.3
Operating System: Linux

According to the readme, values in the partition column of a custom dataset should be 'training', 'validation', or 'test', but I can't get these to yield a partition:

Make sure that the dataset is in the following format:

corpus file: a .tsv file (tab-separated) that contains up to three columns, i.e. the document, the partition, and the label associated with the document (optional).
vocabulary: a .txt file where each line represents a word of the vocabulary

The partition can be "training", "test" or "validation". An example of dataset can be found here: sample_dataset_.

However, the right format seems to be 'train', 'val', 'test', which does work for me and matches the values the loader checks for. Just passing this on to make the ReadMe clearer.

    def load_custom_dataset_from_folder(self, path):
        """
        Loads all the dataset from a folder
        Parameters
        ----------
        path : path of the folder to read
        """
        self.dataset_path = path
        try:
            if exists(self.dataset_path + "/metadata.json"):
                self._load_metadata(self.dataset_path + "/metadata.json")
            else:
                self.__metadata = dict()
            df = pd.read_csv(self.dataset_path + "/corpus.tsv", sep='\t', header=None)
            if len(df.keys()) > 1:
                df[1] = df[1].replace("train", "a_train")
                df[1] = df[1].replace("val", "b_val")
                df = df.sort_values(1).reset_index(drop=True)

                self.__metadata['last-training-doc'] = len(df[df[1] == 'a_train'])
                self.__metadata['last-validation-doc'] = len(df[df[1] == 'b_val']) + len(df[df[1] == 'a_train'])
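
For anyone who already built a corpus with the labels from the old readme, the relabelling could be sketched roughly as below. This is only an illustration using the standard library, not part of OCTIS: the names `README_TO_LOADER`, `normalize_partitions` and `to_tsv` are made up for this example, and the sample rows are invented. It assumes the partition is in the second column and the file has no header, as in the loader code quoted above.

```python
import csv
import io

# Hypothetical mapping from the labels the old readme listed to the labels
# the quoted loader code matches on ("train", "val", "test").
README_TO_LOADER = {"training": "train", "validation": "val", "test": "test"}


def normalize_partitions(rows):
    """Return copies of rows with the partition column (index 1) relabelled."""
    normalized = []
    for row in rows:
        row = list(row)
        if len(row) > 1:
            label = row[1].strip().lower()
            # Unknown labels are left as they are rather than guessed at.
            row[1] = README_TO_LOADER.get(label, label)
        normalized.append(row)
    return normalized


def to_tsv(rows):
    """Serialize rows as tab-separated text without a header line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample rows: document, partition, optional label.
rows = [
    ["first document text", "training", "sports"],
    ["second document text", "validation", "politics"],
    ["third document text", "test", "sports"],
]
fixed = normalize_partitions(rows)
corpus_tsv = to_tsv(fixed)
```

Writing `corpus_tsv` to `corpus.tsv` in the dataset folder should then give the loader the 'train' / 'val' / 'test' values it looks for.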


silviatti · 2021-04-22T11:35:43Z

Hi! I just fixed this in the readme.
Thanks again for reporting :) If you'd like to contribute, feel free to make a pull request!

Bye!

Silvia

silviatti added a commit that referenced this issue Apr 22, 2021

fixing readme (#6)

7b07f0b

silviatti closed this as completed Apr 22, 2021
