An R package for estimating text embedding regression models as described in Rodriguez, Spirling and Stewart (2021).
Install the package from GitHub using devtools:

devtools::install_github("prodriguezsosa/conText")
To use conText you will need three datasets:
- A corpus with the text and corresponding metadata you want to evaluate.
- A set of pre-trained embeddings (a V by D matrix) used to embed context words.
- A transformation matrix (D by D) specific to the pre-trained embeddings.
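
To make the expected shapes concrete, here is a minimal base R sketch that builds toy stand-ins for the three inputs and embeds one context by averaging its pre-trained word vectors and applying the transformation matrix. Everything in it is an assumption made for illustration: the object names (`toy_corpus`, `pre_trained`, `transform_matrix`), the random values, and the identity matrix used as a placeholder transformation. It does not call any conText function and is not the package's implementation.

```r
# Toy stand-ins for the three inputs; all names and values are hypothetical.
set.seed(42)

# 1. Corpus: text plus the metadata you want to evaluate.
toy_corpus <- data.frame(
  text  = c("the senate passed the bill", "the house rejected the bill"),
  party = c("D", "R"),
  stringsAsFactors = FALSE
)

# 2. Pre-trained embeddings: a V by D matrix, one row per vocabulary word.
vocab <- c("the", "senate", "house", "passed", "rejected", "bill")
D <- 4
pre_trained <- matrix(
  rnorm(length(vocab) * D),
  nrow = length(vocab), ncol = D,
  dimnames = list(vocab, NULL)
)

# 3. Transformation matrix: D by D, specific to the embeddings above.
#    The identity matrix is only a placeholder, not an estimated transform.
transform_matrix <- diag(D)

# The two matrices must agree on D.
stopifnot(
  ncol(pre_trained) == nrow(transform_matrix),
  nrow(transform_matrix) == ncol(transform_matrix)
)

# Embed one context: average the context words' vectors, then transform.
context_words <- c("the", "senate", "passed")
avg_vec <- colMeans(pre_trained[context_words, , drop = FALSE])
context_embedding <- as.vector(transform_matrix %*% avg_vec)

length(context_embedding)  # one value per embedding dimension
```

With real data, the row names of the embeddings matrix would need to cover the words that appear in your corpus, and the transformation matrix would be the one estimated for that specific set of embeddings.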
In this Dropbox folder (see the /data folder) we have included the three datasets we use in the Quick Start Guide along with their documentation. Due to memory constraints we could not include them directly in the package. We'll be adding other useful datasets to this folder in the near future.
Check out this Quick Start Guide to get going with conText. If it makes sense to estimate your own embeddings and transformation matrix, also check out this Quick Start Guide - Local Transform.