Evaluate 3 different topic modeling algorithms #55
Comments
Hello,
I am not sure whether BERTopic generates the document-topic and word-topic distributions; if it does not, you will not be able to compute the topic significance metrics. Maybe you'd like to consider Contextualized Topic Models (CTM), a topic model that, like BERTopic, uses pre-trained contextualized representations. CTM is part of OCTIS too. Let me know if you have further questions, Silvia
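For reference, a minimal sketch of training CTM through OCTIS so that the document-topic and word-topic matrices needed by the significance metrics are available (the dataset name and hyperparameters below are illustrative):

```python
# Sketch: train CTM via OCTIS and inspect the matrices the topic
# significance metrics rely on. Hyperparameters are illustrative.
from octis.dataset.dataset import Dataset
from octis.models.CTM import CTM

dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")   # any OCTIS dataset works here

model = CTM(num_topics=10)             # illustrative number of topics
model_output = model.train_model(dataset)

print(model_output["topic-word-matrix"].shape)      # word-topic distributions
print(model_output["topic-document-matrix"].shape)  # document-topic distributions
```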
Hello Silvia,
the `topic_term_dist` matrix contains the normalized word-topic distribution:
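For illustration only, a normalized topic-term matrix might be derived from a fitted BERTopic model along these lines (a sketch; it assumes the `c_tf_idf_` attribute of recent BERTopic releases, and row normalization is just one possible choice, not necessarily what was done here):

```python
# Sketch: derive a row-normalized topic-term matrix from BERTopic's
# c-TF-IDF weights. Assumes the `c_tf_idf_` attribute of recent releases.
import numpy as np
from bertopic import BERTopic

docs = [...]  # placeholder: your corpus as a list of raw document strings

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

# c_tf_idf_ is a sparse (n_topics x vocabulary) matrix; the first row is the
# outlier topic (-1) when outliers are present.
weights = np.asarray(topic_model.c_tf_idf_.todense())
topic_term_dist = weights / weights.sum(axis=1, keepdims=True)
```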
Which diversity metric are you using? Can you also show the snippet of the code in which you call the metric?

You can build the model output as

```python
model_output = {"topic-word-matrix": topic_term_dist}
```

and then use it to compute the score of a metric. For example:

```python
from octis.evaluation_metrics.diversity_metrics import KLDivergence  # import path as in current OCTIS releases

div = KLDivergence()
result = div.score(model_output)
```

Let me know if it works. Silvia
Yes, it works perfectly.
Hello, sorry for the late reply. Mapping each document to a topic is indeed a strategy to label documents. In OCTIS we provide some already-labelled corpora; you may want to have a look at those, for example 20 Newsgroups and BBC News. And yes,
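A short sketch of fetching those labelled corpora in OCTIS (the dataset identifiers follow the OCTIS documentation; adjust them if your installed version names them differently):

```python
# Sketch: load OCTIS's pre-labelled corpora (identifiers per the OCTIS docs).
from octis.dataset.dataset import Dataset

newsgroups = Dataset()
newsgroups.fetch_dataset("20NewsGroup")

bbc = Dataset()
bbc.fetch_dataset("BBC_News")

corpus = newsgroups.get_corpus()  # tokenized documents
labels = newsgroups.get_labels()  # one gold label per document
```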
Description
I am a PhD candidate and I need to evaluate the performance of three different topic modeling algorithms: LDA, LSI, and BERTopic (LDA and LSI were trained using the Gensim package).
What relevance metrics should I use apart from the coherence score? I would like to include in my paper a table or graph that shows an evaluation in terms of accuracy of the model (coherence score) and relevance of the topics (should I use the topic diversity metric?).
Thank you
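As a starting point, here is a hedged sketch of how coherence and topic diversity scores could be collected for all three models with OCTIS; the Gensim helper, the variable names (`texts`, `lda_model`, `lsi_model`, `bertopic_top_words`), and the parameter values are illustrative assumptions, not part of the original question:

```python
# Sketch: score each model's top words with OCTIS coherence and topic
# diversity metrics. `texts`, `lda_model`, `lsi_model` and
# `bertopic_top_words` are placeholders for your own objects.
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.evaluation_metrics.diversity_metrics import TopicDiversity

def gensim_top_words(gensim_model, topk=10):
    """Illustrative helper: top-k words per topic from a Gensim LDA/LSI model."""
    return [[w for w, _ in gensim_model.show_topic(t, topn=topk)]
            for t in range(gensim_model.num_topics)]

coherence = Coherence(texts=texts, topk=10, measure="c_npmi")  # texts = tokenized corpus
diversity = TopicDiversity(topk=10)

candidates = {
    "LDA": gensim_top_words(lda_model),
    "LSI": gensim_top_words(lsi_model),
    "BERTopic": bertopic_top_words,  # top-10 words per topic, e.g. from get_topics()
}

for name, topics in candidates.items():
    model_output = {"topics": topics}
    print(name, coherence.score(model_output), diversity.score(model_output))
```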
What I Did