Latent Dirichlet Allocation (LDA): a type of topic modeling
(Not original work; content collected from the various sources listed at the bottom of this page.)
- LDA models each document in a corpus as a mixture of a fixed number of topics.
- Each topic has a probability of generating each word, where the vocabulary consists of all the words observed in the corpus.
- These ‘hidden’ topics are then inferred from the likelihood of word co-occurrence (see the sketch below).
Image Source: http://chdoig.github.io/pytexas2015-topic-modeling/#/3/4
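To make this concrete, here is a minimal sketch of fitting LDA with gensim on a tiny hand-made corpus. The documents, token lists, and `num_topics=2` are purely illustrative assumptions, not taken from the sources below:

```python
# Minimal LDA sketch with gensim (toy corpus and num_topics are assumptions).
from gensim import corpora
from gensim.models import LdaModel

# Toy corpus: each document is a list of already-preprocessed tokens.
docs = [
    ["cat", "dog", "pet", "animal"],
    ["python", "code", "bug", "software"],
    ["dog", "animal", "vet", "pet"],
    ["software", "python", "release", "code"],
]

# Map each unique token to an integer id, then convert each document
# to a bag-of-words: a list of (token_id, count) pairs.
dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit LDA with a fixed number of topics (2 here, purely illustrative).
lda = LdaModel(
    corpus=bow_corpus,
    id2word=dictionary,
    num_topics=2,
    passes=10,
    random_state=42,
)

# Each topic is a probability distribution over words...
for topic_id, words in lda.print_topics(num_words=4):
    print(f"Topic {topic_id}: {words}")

# ...and each document is a mixture of topics.
print(lda.get_document_topics(bow_corpus[0]))
```

The two printed topics surface the word distributions, and `get_document_topics` shows the per-document topic mixture described above.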
Datasets:
- https://www.kaggle.com/datasets/benhamner/nips-papers?select=papers.csv
- https://www.yelp.com/dataset
- https://github.com/deepmind/rc-data
- http://jmcauley.ucsd.edu/data/amazon/qa/
Sources:
- https://towardsdatascience.com/evaluate-topic-model-in-python-latent-dirichlet-allocation-lda-7d57484bb5d0
- https://towardsdatascience.com/unsupervised-nlp-topic-models-as-a-supervised-learning-input-cf8ee9e5cf28
- https://www.analyticsvidhya.com/blog/2019/08/how-to-remove-stopwords-text-normalization-nltk-spacy-gensim-python/
- https://www.analyticsvidhya.com/blog/2020/02/quick-introduction-bag-of-words-bow-tf-idf/
- https://www.analyticsvidhya.com/blog/2020/04/beginners-guide-exploratory-data-analysis-text-data/
- https://github.com/kapadias/mediumposts/blob/master/natural_language_processing/topic_modeling/notebooks/Evaluate%20Topic%20Models.ipynb
- https://github.com/AnshMittal1811/MachineLearning-AI/blob/master/046_Topic_Modelling/01_Introdcution_to_Topic_Modeling.ipynb
- https://github.com/AnshMittal1811/MachineLearning-AI/blob/master/046_Topic_Modelling/02_Evaluate_Topic_Models.ipynb
- https://github.com/AnshMittal1811/MachineLearning-AI/tree/master/046_Topic_Modelling/05_Topic_Modelling_using_BERTopic/BERTopic-main