Tags: chrisji/BERTopic

v0.15.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
v0.15 (MaartenGr#1291)

Prepare for v0.15 release by including changelog and many documentation updates.

v0.14.1

v0.14.1 - ChatGPT support and improved Prompting (MaartenGr#1057)

v0.14.0

v0.14 (MaartenGr#977)

* Add representation models
  * bertopic.representation.KeyBERTInspired
  * bertopic.representation.PartOfSpeech
  * bertopic.representation.MaximalMarginalRelevance
  * bertopic.representation.Cohere
  * bertopic.representation.OpenAI
  * bertopic.representation.TextGeneration
  * bertopic.representation.LangChain
  * bertopic.representation.ZeroShotClassification
* Fix topic selection when extracting representative documents
* Improve documentation, MaartenGr#769, MaartenGr#954, MaartenGr#912
* Add wordcloud example to documentation
* Add title param for each graph, MaartenGr#800
* Improve the `nr_topics` procedure
* Fix MaartenGr#952, MaartenGr#903, MaartenGr#911, MaartenGr#965. Add MaartenGr#976
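
As a rough illustration of how one of the representation models listed above plugs in, the sketch below passes `KeyBERTInspired` to `BERTopic` through its `representation_model` argument. The helper name `build_keybert_topic_model` is hypothetical, and it assumes `bertopic` v0.14 or later is installed; the imports are deferred so that nothing heavy loads until the function is called.

```python
def build_keybert_topic_model():
    """Return an unfitted BERTopic model that refines topic words with KeyBERTInspired."""
    # Imported lazily: bertopic pulls in heavy dependencies.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired

    # Any of the other representation models could be swapped in here.
    representation_model = KeyBERTInspired()
    return BERTopic(representation_model=representation_model)
```

Calling `build_keybert_topic_model().fit_transform(docs)` on a list of strings would then return the topic assignments and their probabilities.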

v0.13.0

v0.13 (MaartenGr#840)

* Calculate topic distributions with .approximate_distribution regardless of the cluster model used
* Fully supervised topic modeling with BERTopic
* Manual topic modeling with BERTopic
* Reduce outliers with 4 different strategies using .reduce_outliers
* Install BERTopic without SentenceTransformers for a lightweight package
* Get metadata of trained documents such as topics and probabilities using .get_document_info(docs)
* Added more support for cuML's HDBSCAN
* Added more images to the documentation, along with many changes, updates, and clarifications
* Get representative documents for non-HDBSCAN models by comparing document and topic c-TF-IDF representations
* Sklearn Pipeline Embedder
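
The outlier reduction, distribution, and document metadata features above could be combined roughly as follows. This is a minimal sketch that assumes `topic_model` is an already fitted BERTopic instance and `topics` is the list returned when it was fitted; the helper name `refine_outliers` is hypothetical.

```python
def refine_outliers(topic_model, docs, topics, strategy="c-tf-idf"):
    """Reassign outlier documents, then collect per-document metadata."""
    # Reassign documents labelled -1 (outliers) using the chosen strategy.
    new_topics = topic_model.reduce_outliers(docs, topics, strategy=strategy)
    # Refresh the topic representations so they reflect the new assignments.
    topic_model.update_topics(docs, topics=new_topics)
    # Approximate a topic distribution for every document.
    topic_distr, _ = topic_model.approximate_distribution(docs)
    # Collect each document together with its topic and related metadata.
    document_info = topic_model.get_document_info(docs)
    return new_topics, topic_distr, document_info
```

The `strategy` argument selects one of the four outlier reduction strategies; `"c-tf-idf"` is used here only as an example.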

v0.12.0

v0.12 (MaartenGr#668)

* Online/incremental topic modeling with .partial_fit
* Expose c-TF-IDF model for customization with bertopic.vectorizers.ClassTfidfTransformer
* Expose attributes for easier access to internal data
* Major changes to the Algorithm page of the documentation, which now contains three overviews of the algorithm
* Added an example of combining BERTopic with KeyBERT
* Added many tests with the intention of making development a bit more stable
* Fix MaartenGr#632, MaartenGr#648, MaartenGr#673, MaartenGr#682, MaartenGr#667, MaartenGr#664
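
Online topic modeling with `.partial_fit` might be set up along these lines. The sketch assumes that every sub-model supports incremental updates, so it swaps in scikit-learn's `IncrementalPCA` and `MiniBatchKMeans` together with `OnlineCountVectorizer` and a customized `ClassTfidfTransformer` from `bertopic.vectorizers`. The helper names, chunk size, and parameter values are illustrative assumptions rather than recommended settings.

```python
def build_online_topic_model(n_components=5, n_clusters=50):
    """Return an unfitted BERTopic model whose sub-models can be updated incrementally."""
    # Imported lazily: these packages are heavy and only needed when building the model.
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.decomposition import IncrementalPCA
    from bertopic import BERTopic
    from bertopic.vectorizers import ClassTfidfTransformer, OnlineCountVectorizer

    return BERTopic(
        umap_model=IncrementalPCA(n_components=n_components),
        hdbscan_model=MiniBatchKMeans(n_clusters=n_clusters, random_state=0),
        vectorizer_model=OnlineCountVectorizer(stop_words="english"),
        ctfidf_model=ClassTfidfTransformer(reduce_frequent_words=True),
    )


def fit_in_chunks(topic_model, docs, chunk_size=1000):
    """Feed the documents to the model one chunk at a time."""
    for start in range(0, len(docs), chunk_size):
        topic_model.partial_fit(docs[start:start + chunk_size])
    return topic_model
```

Each chunk should contain at least as many documents as `n_components`, since `IncrementalPCA` cannot be updated on a smaller batch.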

v0.11.0

v0.11.0 (MaartenGr#578)

* Perform hierarchical topic modeling with `.hierarchical_topics`
* Visualize hierarchical topic representations with `.visualize_hierarchy`
* Extract a text-based hierarchical topic representation with `.get_topic_tree`
* Visualize 2D documents with `.visualize_documents()`
* Visualize 2D hierarchical documents with `.visualize_hierarchical_documents()`
* Create custom labels for topics and use them throughout most visualizations with `.generate_topic_labels` and `.set_topic_labels`
* Manually merge topics with `.merge_topics()`
* Added example for finding similar topics between two models in the tips & tricks page
* Add multi-modal example in the tips & tricks page
* Added native Hugging Face transformers support
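
The hierarchical features above could be chained roughly as shown below. This sketch assumes `topic_model` is an already fitted BERTopic instance from a recent release in which `.hierarchical_topics` needs only the documents; earlier releases may also expect the topic assignments. The helper name `describe_hierarchy` is hypothetical.

```python
def describe_hierarchy(topic_model, docs):
    """Build the topic hierarchy, a text tree of it, and an interactive figure."""
    # Table describing how topics merge into parent topics.
    hierarchical_topics = topic_model.hierarchical_topics(docs)
    # Plain-text rendering of the same hierarchy.
    tree = topic_model.get_topic_tree(hierarchical_topics)
    # Interactive dendrogram-style figure.
    fig = topic_model.visualize_hierarchy(hierarchical_topics=hierarchical_topics)
    return hierarchical_topics, tree, fig
```

Printing `tree` gives a quick overview in the terminal before opening the figure.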

v0.10.0

v0.10.0 (MaartenGr#492)

* Use any dimensionality reduction technique instead of UMAP
* Use any clustering technique instead of HDBSCAN
* Add a CountVectorizer page with tips and tricks on how to create topic representations that fit your use case
* Added pages on how to use other dimensionality reduction and clustering algorithms
* Additional instructions on how to reduce outliers in the FAQ
* Fixed `None` being returned for probabilities when transforming unseen documents
* Replaced all instances of `arg:` with `Arguments:` for consistency
* Before saving a fitted BERTopic instance, the stop words stored in the fitted CountVectorizer model are removed, since that set can grow quite large when `min_df` is set to a value larger than 1 and many words end up treated as stop words
* Set `"hdbscan>=0.8.28"` to prevent numpy issues
* Update gensim dependency to `>=4.0.0` (MaartenGr#371)
* Fix topic 0 not appearing in visualizations (MaartenGr#472)
* Fix MaartenGr#506
* Fix MaartenGr#429

v0.9.4

v0.9.4 (MaartenGr#335)

* Expose diversity parameter
* Improve stability of topic reduction 
* Added property to c-TF-IDF that all IDF values are positive (MaartenGr#351)
* Improve stability of `.visualize_barchart()` and `.visualize_hierarchy()`
* Major documentation overhaul (including MaartenGr#330)
* Drop Python 3.6 support (MaartenGr#333)
* Relax plotly dependency (MaartenGr#88)
* Additional logging for `.transform` (MaartenGr#356)

v0.9.3

v0.9.2

v0.9.2 (MaartenGr#239)

* Update default embedding model from 'paraphrase' to 'all'
* Fix probability mapping
* Optimize c-TF-IDF topic extraction
* Fix algorithm image, update documentation, fix spelling, etc.
* Fix MaartenGr#258
* Update README with visualization example