Tags: chrisji/BERTopic
v0.15 (MaartenGr#1291) Prepare for v0.15 release by including changelog and many documentation updates.
v0.14.1 - ChatGPT support and improved Prompting (MaartenGr#1057)
v0.14 (MaartenGr#977)
* Add representation models
  * bertopic.representation.KeyBERTInspired
  * bertopic.representation.PartOfSpeech
  * bertopic.representation.MaximalMarginalRelevance
  * bertopic.representation.Cohere
  * bertopic.representation.OpenAI
  * bertopic.representation.TextGeneration
  * bertopic.representation.LangChain
  * bertopic.representation.ZeroShotClassification
* Fix topic selection when extracting repr docs
* Improve documentation, MaartenGr#769, MaartenGr#954, MaartenGr#912
* Add wordcloud example to documentation
* Add title param for each graph, MaartenGr#800
* Improved nr_topics procedure
* Fix MaartenGr#952, MaartenGr#903, MaartenGr#911, MaartenGr#965. Add MaartenGr#976
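
As a rough illustration of how the representation models added in v0.14 are typically plugged in, the sketch below passes `KeyBERTInspired` to `BERTopic` through its `representation_model` argument. The helper name `build_topic_model` is hypothetical and exists only for this example; the imports are deferred into the function so the snippet can be loaded even where BERTopic is not installed.

```python
def build_topic_model():
    """Return an unfitted BERTopic model that refines topic words with KeyBERTInspired.

    `build_topic_model` is a hypothetical helper used only for illustration.
    """
    # Deferred imports: BERTopic is only needed once the function is called.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired

    # The representation model fine-tunes the c-TF-IDF topic words after clustering.
    representation_model = KeyBERTInspired()
    return BERTopic(representation_model=representation_model)
```

Assuming `docs` is a list of strings, the model would then be fitted with `topics, probs = build_topic_model().fit_transform(docs)`.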
v0.13 (MaartenGr#840)
* Calculate topic distributions with .approximate_distribution regardless of the cluster model used
* Fully supervised topic modeling with BERTopic
* Manual topic modeling with BERTopic
* Reduce outliers with 4 different strategies using .reduce_outliers
* Install BERTopic without SentenceTransformers for a lightweight package
* Get metadata of trained documents such as topics and probabilities using .get_document_info(docs)
* Added more support for cuML's HDBSCAN
* More images to the documentation and a lot of changes/updates/clarifications
* Get representative documents for non-HDBSCAN models by comparing document and topic c-TF-IDF representations
* Sklearn Pipeline Embedder
v0.12 (MaartenGr#668)
* Online/incremental topic modeling with .partial_fit
* Expose c-TF-IDF model for customization with bertopic.vectorizers.ClassTfidfTransformer
* Expose attributes for easier access to internal data
* Major changes to the Algorithm page of the documentation, which now contains three overviews of the algorithm
* Added an example of combining BERTopic with KeyBERT
* Added many tests with the intention of making development a bit more stable
* Fix MaartenGr#632, MaartenGr#648, MaartenGr#673, MaartenGr#682, MaartenGr#667, MaartenGr#664
v0.11.0 (MaartenGr#578)
* Perform hierarchical topic modeling with `.hierarchical_topics`
* Visualize hierarchical topic representations with `.visualize_hierarchy`
* Extract a text-based hierarchical topic representation with `.get_topic_tree`
* Visualize 2D documents with `.visualize_documents()`
* Visualize 2D hierarchical documents with `.visualize_hierarchical_documents()`
* Create custom labels for the topics throughout most visualizations with `.generate_topic_labels` and `.set_topic_labels`
* Manually merge topics with `.merge_topics()`
* Added example for finding similar topics between two models in the tips & tricks page
* Add multi-modal example in the tips & tricks page
* Added native Hugging Face transformers support
v0.10.0 (MaartenGr#492)
* Use any dimensionality reduction technique instead of UMAP
* Use any clustering technique instead of HDBSCAN
* Add a CountVectorizer page with tips and tricks on how to create topic representations that fit your use case
* Added pages on how to use other dimensionality reduction and clustering algorithms
* Additional instructions on how to reduce outliers in the FAQ
* Fixed `None` being returned for probabilities when transforming unseen documents
* Replaced all instances of `arg:` with `Arguments:` for consistency
* Before saving a fitted BERTopic instance, remove the stop words stored in the fitted CountVectorizer model, since that list can get quite large when `min_df` is set to a value larger than 1 and many words end up in it
* Set `"hdbscan>=0.8.28"` to prevent numpy issues
* Update gensim dependency to `>=4.0.0` (MaartenGr#371)
* Fix topic 0 not appearing in visualizations (MaartenGr#472)
* Fix MaartenGr#506
* Fix MaartenGr#429
v0.9.4 (MaartenGr#335)
* Expose diversity parameter
* Improve stability of topic reduction
* Added property to c-TF-IDF that all IDF values are positive (MaartenGr#351)
* Improve stability of `.visualize_barchart()` and `.visualize_hierarchy()`
* Major documentation overhaul (including MaartenGr#330)
* Drop Python 3.6 (MaartenGr#333)
* Relax plotly dependency (MaartenGr#88)
* Additional logging for `.transform` (MaartenGr#356)
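
To make the "all IDF values are positive" item concrete, here is a minimal sketch assuming the commonly documented c-TF-IDF weighting, in which the IDF-like term is log(1 + A / f_x), with A the average number of words per class and f_x the frequency of word x across all classes. It is compared with a plain log(A / f_x). The function names and counts are hypothetical, and this is not necessarily the exact change made in MaartenGr#351.

```python
import math


def plain_idf(term_frequency, avg_words_per_class):
    # log(A / f_x): turns negative as soon as the word is more frequent than A.
    return math.log(avg_words_per_class / term_frequency)


def positive_idf(term_frequency, avg_words_per_class):
    # log(1 + A / f_x): the argument is always greater than 1, so the value is > 0.
    return math.log(1 + avg_words_per_class / term_frequency)


# Hypothetical counts: a very common word (400 occurrences overall)
# with an average of 100 words per class.
avg_words_per_class = 100.0
term_frequency = 400.0

plain = plain_idf(term_frequency, avg_words_per_class)        # log(0.25), negative
shifted = positive_idf(term_frequency, avg_words_per_class)   # log(1.25), positive
```

With the shifted form, frequent words are still down-weighted relative to rare ones, but their weights never flip sign.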
v0.9.3 - Quickfix (MaartenGr#284)
* Fix MaartenGr#282, MaartenGr#285, MaartenGr#288
v0.9.2 (MaartenGr#239)
* Update default embedding model from 'paraphrase' to 'all'
* Fix probability mapping
* Optimize c-TF-IDF topic extraction
* Fix algorithm image, update documentation, fix spelling, etc.
* Fix MaartenGr#258
* Update README with visualization example