
BERT #67

Closed
nassera2014 opened this issue Jul 25, 2022 · 1 comment

Comments

@nassera2014
Thank you for your great work. Can I use these models with BERT as a word embedding model?

@silviatti
Collaborator

Thanks :)
The model that supports BERT embeddings is CTM (Contextualized Topic Models). Here is a snippet to run it in OCTIS:

from octis.models.CTM import CTM
from octis.dataset.dataset import Dataset  # dataset wrapper used for training

model = CTM(
    num_topics=10, num_epochs=30, inference_type='combined',
    bert_model="bert-base-nli-mean-tokens",
    bert_path="path/to/store/the/embeddings/")

The bert_model parameter is the name of the contextualized (sentence-transformers) model. The other supported models are listed here: https://www.sbert.net/docs/pretrained_models.html
The remaining parameters are documented here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/CTM.py

Just a note: this integrated model uses the pre-processed text to generate the document embeddings. If you want to use the unpreprocessed documents, as in the original model, then you should refer to the original implementation: https://github.com/MilaNLProc/contextualized-topic-models

ETM also uses embeddings, but static word embeddings (word2vec-style), so it can't easily be adapted to BERT embeddings. See the implementation here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/etm.py
