
BERT #67

Closed
nassera2014 opened this issue Jul 25, 2022 · 1 comment

Comments

@nassera2014
Thank you for your great work. Can I use these models with BERT as a word embedding model?

@silviatti
Collaborator

Thanks :)
The model that supports BERT embeddings is CTM (Contextualized Topic Models). Here is a snippet to run it in OCTIS:

from octis.models.CTM import CTM
from octis.dataset.dataset import Dataset  # dataset wrapper used for training

model = CTM(
    num_topics=10, num_epochs=30, inference_type='combined',
    bert_model="bert-base-nli-mean-tokens",
    bert_path="path/to/store/the/embeddings/")

The bert_model parameter is the name of the contextualized (sentence-transformers) model. The other supported models are listed here: https://www.sbert.net/docs/pretrained_models.html
The remaining parameters are documented here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/CTM.py

Just a note: this integrated model uses the pre-processed text to generate the document embeddings. If you want to use the unpreprocessed documents, as in the original model, then you should refer to the original implementation: https://github.com/MilaNLProc/contextualized-topic-models

ETM also uses embeddings, but static word embeddings (word2vec-style), so it can't easily be adapted to BERT embeddings. See the implementation here: https://github.com/MIND-Lab/OCTIS/blob/master/octis/models/etm.py
