
OCTIS could not evaluate an external result? #71

Closed
KesselZ opened this issue Sep 3, 2022 · 4 comments

Comments


KesselZ commented Sep 3, 2022

  • OCTIS version: 1.10.4
  • Python version: 3.9
  • Operating System: Windows 11

Description

I got an error:
unable to interpret topic as either a list of tokens or a list of ids.

What I Did

I used another method to extract topics from 20 Newsgroups, and I want to use the metrics provided by OCTIS to evaluate their quality.

So I have many lists of topic words. For example, one list is: ['cheek', 'yep', 'huh', 'ken', 'lets', 'ignore', 'forget', 'art', 'dilemma', 'dilemna']. I need to calculate the topic coherence between these topics and the documents (the corpus).

As a topic modeling evaluation framework, I thought OCTIS could do this for me. However, it turned out to be hard.

I got this error because some of the words in my result topics are not in the 20 Newsgroups corpus provided by OCTIS. I got my data from scikit-learn's 20 Newsgroups, so the only explanation I can see is that the scikit-learn and OCTIS versions of the 20 Newsgroups corpus are different.

Therefore, it seems that the only solution is to train on OCTIS's dataset and then use OCTIS's evaluation system to compute topic coherence. Does this mean that OCTIS does not accept external topics?

I'm not sure if there are other solutions for this case. I believe OCTIS should be able to work with external topic modeling methods; I just did not find the way, so please let me know if you have any suggestions.
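A quick way to see which words trigger this error is to diff each topic against the reference corpus vocabulary. This is a minimal, hypothetical sketch: the toy `corpus` below stands in for OCTIS's tokenized 20 Newsgroups documents, which would be much larger.

```python
# Hypothetical sketch: find which topic words are missing from the
# reference corpus vocabulary -- these are the words that make the
# "unable to interpret topic" error appear.
topics = [["cheek", "yep", "huh", "ken", "lets", "ignore",
           "forget", "art", "dilemma", "dilemna"]]

# stand-in for the tokenized reference corpus
corpus = [["cheek", "yep", "huh"], ["ken", "lets", "ignore"]]

# build the vocabulary as the set of all tokens in the corpus
vocab = {word for doc in corpus for word in doc}

for i, topic in enumerate(topics):
    missing = [w for w in topic if w not in vocab]
    if missing:
        print(f"topic {i}: not in vocabulary: {missing}")
```

Words reported as missing can then be dropped or replaced before passing the topics to the metric.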

@KesselZ

KesselZ commented Sep 3, 2022

I just saw that the introduction of OCTIS mentions that it provides 20 Newsgroups, but it says "# Docs: 16309".

However, 20 Newsgroups from scikit-learn has about 18,000 documents. Is this the reason they are not compatible with each other?

@KesselZ

KesselZ commented Sep 3, 2022

An update: BBCNews works properly. The difference is that BBCNews in sklearn also has 2225 documents, matching the description in OCTIS. So is the reason 20 Newsgroups does not work that OCTIS provides the 20 Newsgroups corpus with a different size?

@silviatti
Collaborator

Hello,
20 Newsgroups in OCTIS is different from the other version because we preprocessed it. Among other things, preprocessing removes documents with fewer than a certain number of words; that's why the two document counts do not match. However, you can use OCTIS just for evaluation, without training a new topic model.

For example, if you want to use topic coherence, you can do the following:

# the metric class (import path per the OCTIS package layout)
from octis.evaluation_metrics.coherence_metrics import Coherence

# the list of topics
topics = {"topics": [['cheek', 'yep', 'huh', 'ken', 'lets', 'ignore', 'forget', 'art', 'dilemma', 'dilemna'], ....]}

# this is the list of documents that you want to use as a reference to compute the topic coherence, 
# i.e. in your case, scikit's 20newsgroups 
texts = [['cheek', 'yep'], [ 'yep', 'huh', 'lets'], ....] 

# define the metric and provide texts as input 
npmi = Coherence(texts=texts, topk=10, measure='c_npmi')

# get the score
npmi.score(topics)

Hope it helps!

Silvia
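For readers curious what the metric computes: NPMI averages, over all word pairs in a topic, log(P(w1, w2) / (P(w1) P(w2))) normalized by -log P(w1, w2), with probabilities estimated from co-occurrence counts in `texts`. Below is a minimal self-contained sketch, not OCTIS's implementation (which delegates to gensim and uses a boolean sliding window); this simplified version counts whole-document co-occurrence.

```python
import math
from itertools import combinations

def npmi_coherence(topic, texts, eps=1e-12):
    """Average NPMI over all word pairs in a topic.

    `texts` is a list of tokenized reference documents, playing the
    same role as the `texts` argument of OCTIS's Coherence metric.
    """
    n_docs = len(texts)
    doc_sets = [set(doc) for doc in texts]

    def p(*words):
        # fraction of documents containing all the given words
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur -> NPMI of -1
            continue
        pmi = math.log(p12 / (p1 * p2))
        scores.append(pmi / (-math.log(p12) + eps))
    return sum(scores) / len(scores)

texts = [["cheek", "yep"], ["yep", "huh", "lets"], ["cheek", "yep", "huh"]]
print(round(npmi_coherence(["cheek", "yep", "huh"], texts), 3))  # prints -0.087
```

Scores close to 0 mean the word pairs co-occur about as often as chance; this is why topic words absent from the reference corpus are a problem, since their probabilities cannot be estimated at all.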

@KesselZ

KesselZ commented Oct 17, 2022


Thanks for your help!

@KesselZ KesselZ closed this as completed Oct 17, 2022