This repository contains the code for the following paper, which proposes a novel approach to learning crosslingual word embeddings optimized for document-level aggregation.
Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, and Robert West. "Crosslingual Document Embedding as Reduced-Rank Ridge Regression". Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 2019.
We also publish a dataset of pretrained word embeddings in 28 languages, where words are embedded in a shared latent space. The dataset is available here.
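Because the words of all languages share one latent space, a document in any language can be embedded by aggregating the vectors of its words, and documents can then be compared across languages. The following is a minimal sketch of that idea, not the repository's own API: the `EMBEDDINGS` table holds made-up 3-dimensional toy vectors standing in for the pretrained ones, the names `embed_document` and `cosine` are hypothetical, and the plain count-weighted sum is an assumption about the aggregation step.

```python
import math
from collections import Counter

# Toy stand-ins for pretrained vectors (hypothetical values, 3 dimensions).
# In practice these would be loaded from the published dataset.
EMBEDDINGS = {
    "en": {"dog": [0.90, 0.10, 0.00], "cat": [0.80, 0.20, 0.10], "bank": [0.00, 0.10, 0.90]},
    "de": {"hund": [0.88, 0.12, 0.02], "katze": [0.79, 0.22, 0.08], "bank": [0.05, 0.10, 0.92]},
}


def embed_document(tokens, lang):
    """Sum the vectors of all known tokens, weighted by token count."""
    vectors = EMBEDDINGS[lang]
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token, count in Counter(tokens).items():
        vec = vectors.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i, value in enumerate(vec):
            total[i] += count * value
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_en = embed_document(["dog", "cat", "dog"], "en")
doc_de_animals = embed_document(["hund", "katze"], "de")
doc_de_finance = embed_document(["bank"], "de")

print(cosine(doc_en, doc_de_animals))  # similarity to the German text about animals
print(cosine(doc_en, doc_de_finance))  # similarity to the German text about banking
```

With these toy vectors, the English document about animals scores higher against the German document about animals than against the one about banking. With the real embeddings, the same comparison could serve as a starting point for crosslingual document retrieval.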
If you find the provided resources useful, please cite the paper above. You may use the following BibTeX entry:
@inproceedings{josifoski-wsdm2019-cr5,
title={Crosslingual Document Embedding as Reduced-Rank Ridge Regression},
author={Josifoski, Martin and Paskov, Ivan S. and Paskov, Hristo S. and Jaggi, Martin and West, Robert},
booktitle={Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining},
organization={ACM},
year={2019}
}
Contact [email protected].