OCTIS could not evaluate an external result? #71
Comments
I just saw that the OCTIS introduction says it provides the 20newsgroup dataset, but lists it with "# Docs: 16309". However, 20newsgroup from scikit-learn has about 18000 documents. Is this the reason they are not compatible with each other?
An update: BBCNews works properly. The difference is that BBCNews in sklearn also has 2225 documents, the same size as in the OCTIS description. So I think the reason 20newsgroup does not work is that OCTIS provides the 20newsgroup corpus with the wrong size?
Hello,
For example, if you want to use topic coherence, you can do the following:
from octis.evaluation_metrics.coherence_metrics import Coherence
# the list of topics
topics = {"topics": [['cheek', 'yep', 'huh', 'ken', 'lets', 'ignore', 'forget', 'art', 'dilemma', 'dilemna'], ....]}
# this is the list of documents that you want to use as a reference to compute the topic coherence,
# i.e. in your case, scikit's 20newsgroups
texts = [['cheek', 'yep'], [ 'yep', 'huh', 'lets'], ....]
# define the metric and provide texts as input
npmi = Coherence(texts=texts, topk=10, measure='c_npmi')
# get the score
npmi.score(topics)
Hope it helps!
Silvia
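The `texts` argument above is a list of token lists, while scikit-learn returns raw strings. A minimal sketch of one way to bridge the two is below. The helper `tokenize_corpus` and its regex are hypothetical and not part of OCTIS or scikit-learn, and the sample sentences are made up for illustration.

```python
import re

# Hypothetical tokenizer (an assumption, not an OCTIS API): keep runs of ASCII letters.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize_corpus(raw_docs):
    """Turn raw strings into the list-of-token-lists shape used for `texts`."""
    texts = []
    for doc in raw_docs:
        tokens = TOKEN_RE.findall(doc.lower())
        if tokens:  # skip documents that end up with no tokens
            texts.append(tokens)
    return texts


# Made-up sample documents standing in for the raw scikit-learn strings.
raw_docs = ["Yep, let's forget it.", "Huh? Ken, ignore the art dilemma!", "   "]
texts = tokenize_corpus(raw_docs)
print(texts)
```

The tokenization should match whatever preprocessing produced the topics. Otherwise topic words may not appear in `texts` at all.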
Thanks for your help!
Description
I got an error:
unable to interpret topic as either a list of tokens or a list of ids.
What I Did
I used another method to extract topics from 20newsgroup, and I want to use the metrics provided by OCTIS to evaluate their quality.
So, I have many lists of topic words. For example, one list is: ['cheek', 'yep', 'huh', 'ken', 'lets', 'ignore', 'forget', 'art', 'dilemma', 'dilemna']. I need to calculate the topic coherence of these topics against the documents (the corpus).
Since OCTIS is a topic modeling evaluation framework, I thought it could do this for me. However, it turned out to be hard.
I got this error because some of the words in my topics are not in the 20newsgroup corpus provided by OCTIS. I got my data from scikit-learn's 20newsgroup, so I think the only explanation is that the scikit-learn and OCTIS versions of the 20newsgroup corpus are different.
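One way to confirm this diagnosis is to list the topic words that never occur in the reference corpus. The sketch below uses plain Python only. The function `find_missing_words` is a hypothetical helper, not an OCTIS function, and the tiny `topics` and `texts` values are illustrative.

```python
def find_missing_words(topics, texts):
    """Map topic index -> topic words that never occur in the reference texts."""
    vocabulary = {token for doc in texts for token in doc}
    missing = {}
    for index, topic in enumerate(topics):
        absent = [word for word in topic if word not in vocabulary]
        if absent:  # only report topics that have a problem
            missing[index] = absent
    return missing


# Illustrative data: topic 1 contains a word the corpus does not have.
topics = [["cheek", "yep", "huh"], ["ken", "dilemna"]]
texts = [["cheek", "yep"], ["yep", "huh", "lets"], ["ken"]]

missing = find_missing_words(topics, texts)
print(missing)
```

An empty result means every topic word is present in the reference corpus. A non-empty result points at the words to look at first when the error above appears.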
Therefore, it seems that the only solution is to train on the OCTIS dataset and then use the OCTIS evaluation system to compute the topic coherence. Does this mean that OCTIS does not accept external topics?
I am not sure whether there are other solutions for this case. I believe OCTIS should be able to work with external topic modeling methods; I just did not find the way. So please tell me if there are any suggestions.