load_custom_dataset_from_folder #69

Closed
srashtchi opened this issue Aug 22, 2022 · 3 comments
Comments

@srashtchi

Hi Silvia

I managed to get my code running fine, thanks for your response.

I have another question. I am trying to make the code smoother: right now, to create a dataset object I first have to save my variable to a .tsv file and then use the load_custom_dataset_from_folder method to load the data from the .tsv into an empty dataset object. Without this object, the get_corpus() method obviously can't do its magic. See the sample code below.

So basically the question is: is there a way to pass my variable directly to a dataset object without saving and loading?

```python
from pathlib import Path

from octis.dataset.dataset import Dataset

# df: existing pandas DataFrame with one document per row in the 'document' column
f = Path('/myFolderPath/corpus.tsv')
df.to_csv(f, sep="\t", index=False, header=False, columns=['document'])

dataset = Dataset()
dataset.load_custom_dataset_from_folder('/myFolderPath/')

texts = dataset.get_corpus()
```

Originally posted by @srashtchi in #68 (comment)
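A minimal stdlib-only sketch of what the save/load step amounts to when only the token lists are needed. It assumes (not verified against the OCTIS source here) that a single-column corpus.tsv is turned into one whitespace-split token list per row on load; `round_trip` is a hypothetical helper that imitates this with `csv` and a throwaway folder rather than calling OCTIS.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def round_trip(documents: List[str]) -> List[List[str]]:
    """Hypothetical helper imitating the save/load step.

    Writes one document per row to a single-column corpus.tsv in a temporary
    folder, reads it back, and splits each row on whitespace (assumed to match
    how a one-column corpus.tsv is tokenized when loaded).
    """
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "corpus.tsv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, delimiter="\t")
            for doc in documents:
                writer.writerow([doc])
        with path.open(newline="", encoding="utf-8") as f:
            return [row[0].split() for row in csv.reader(f, delimiter="\t")]


print(round_trip(["a b c", "a d e"]))  # [['a', 'b', 'c'], ['a', 'd', 'e']]
```

Under that assumption, the file only serves to produce a plain list of token lists, which can just as well be built in memory.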

srashtchi changed the title from "Hi Silvia" to "load_custom_dataset_from_folder" on Aug 22, 2022
@srashtchi
Author

Is there any chance you could respond to this question?

@silviatti
Collaborator

Hello, sorry for the late reply.
If you need the dataset only to compute the coherence, you can define `texts` directly as a list of lists of strings, i.e.:

```python
texts = [['a', 'b', 'c'], ['a', 'd', 'e']]  # ... one list of tokens per document
```

This way you don't need to save and load the dataset.
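A minimal sketch of building `texts` straight from a DataFrame column, assuming the documents are already preprocessed so that whitespace separates the tokens; `to_texts` is a hypothetical helper, not part of OCTIS.

```python
from typing import Iterable, List


def to_texts(documents: Iterable[str]) -> List[List[str]]:
    """Hypothetical helper: one list of tokens per document, built in memory.

    Assumes each document is already preprocessed so that splitting on
    whitespace yields the same tokens the topic model was trained on.
    """
    return [doc.split() for doc in documents]


# With the DataFrame from the question this would be: to_texts(df['document'])
texts = to_texts(["a b c", "a d e"])
print(texts)  # [['a', 'b', 'c'], ['a', 'd', 'e']]
```

The result can then be used wherever `dataset.get_corpus()` was used for coherence, for example as the `texts` argument of OCTIS's `Coherence` metric (`from octis.evaluation_metrics.coherence_metrics import Coherence`); check the OCTIS docs for its other arguments.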
Let me know if this helped :)

Silvia

@srashtchi
Author

Thanks for the quick reply. I will try this.
