This codebase is a fork of Facebook's MUSE, which implements Word Translation without Parallel Data.
The additions to the fork implement Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction.
- Python 2/3 with numpy/scipy
- PyTorch
Download the monolingual and cross-lingual word embedding evaluation datasets by running (from data/): ./get_evaluation.sh
For pre-trained monolingual word embeddings, use the fastText Wikipedia embeddings. The English (en) and Chinese (zh) embeddings can be downloaded as follows:
# English fastText Wikipedia embeddings
curl -Lo data/wiki.en.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.vec
# Chinese fastText Wikipedia embeddings
curl -Lo data/wiki.zh.vec https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.zh.vec
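These files are large, so a quick sanity check after downloading is to read a .vec file's header (vocabulary size and embedding dimension) and a few rows. The snippet below is purely illustrative and not part of the repository; the helper name peek_vec is made up here.

```python
# Illustrative sanity check for a downloaded fastText .vec file (not part of the repo).
import io

def peek_vec(path, n_words=3):
    with io.open(path, "r", encoding="utf-8", errors="ignore") as f:
        n, dim = map(int, f.readline().split())   # header line: vocabulary size, embedding dimension
        print("%s: %d words, %d dimensions" % (path, n, dim))
        for _ in range(n_words):
            parts = f.readline().rstrip().split(" ")
            word, vector = parts[0], parts[1:]
            assert len(vector) == dim              # each row: a word followed by dim floats
            print("  %s -> %d floats" % (word, len(vector)))

peek_vec("data/wiki.en.vec")
peek_vec("data/wiki.zh.vec")
```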
To train the original Conneau et al. (2017) model, run:
python unsupervised.py --src_lang en --tgt_lang zh --src_emb data/wiki.en.vec --tgt_emb data/wiki.zh.vec
To train the Earth Mover's Distance minimization (WGAN) model, run:
python unsupervised_wgan.py --src_lang en --tgt_lang zh --src_emb data/wiki.en.vec --tgt_emb data/wiki.zh.vec
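For orientation, the sketch below shows the general shape of Earth Mover's Distance (Wasserstein) minimization with a WGAN-style critic and a linear source-to-target mapping. It is a schematic example only: the layer sizes, learning rates, clipping value, and function names are assumptions and do not necessarily match what unsupervised_wgan.py actually does.

```python
# Schematic WGAN-style EMD minimization sketch (illustrative; not the repo's actual code).
import torch
import torch.nn as nn

d = 300                                            # embedding dimension
mapping = nn.Linear(d, d, bias=False)              # generator: linear map W from source to target space
critic = nn.Sequential(nn.Linear(d, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))

opt_g = torch.optim.RMSprop(mapping.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(src_batch, tgt_batch, clip=0.01):
    # The critic estimates the Wasserstein-1 (earth mover's) distance by maximizing
    # E[f(tgt)] - E[f(W src)]; we minimize the negative of that quantity.
    loss = -(critic(tgt_batch).mean() - critic(mapping(src_batch).detach()).mean())
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    for p in critic.parameters():                  # weight clipping keeps the critic roughly 1-Lipschitz
        p.data.clamp_(-clip, clip)

def mapping_step(src_batch):
    # The mapping W is trained to minimize the estimated distance, i.e. maximize E[f(W src)].
    loss = -critic(mapping(src_batch)).mean()
    opt_g.zero_grad(); loss.backward(); opt_g.step()
```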
By default, the validation metric is the mean cosine similarity of word pairs from a synthetic dictionary built with CSLS (Cross-domain Similarity Local Scaling). Pay close attention to the default flags before running experiments.
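For reference, here is a minimal numpy sketch of CSLS scoring and of this mean-cosine criterion. It assumes row-normalized embedding matrices and is illustrative only; in MUSE the criterion is typically computed over the most frequent source words with k = 10 neighbors, and the actual implementation lives in the repository's evaluation code.

```python
# Illustrative CSLS / mean-cosine validation sketch (not the repo's actual code).
# mapped_src: mapped source embeddings (n_src x d), tgt: target embeddings (n_tgt x d),
# both assumed L2-normalized row-wise.
import numpy as np

def csls_dictionary(mapped_src, tgt, k=10):
    sims = mapped_src.dot(tgt.T)                         # cosine similarities
    r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)   # mean sim. of each source word to its k nearest targets
    r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)   # mean sim. of each target word to its k nearest sources
    csls = 2 * sims - r_src[:, None] - r_tgt[None, :]    # CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y)
    nn = csls.argmax(axis=1)                             # CSLS nearest neighbor of each source word
    mean_cosine = sims[np.arange(len(nn)), nn].mean()    # validation metric: mean cosine of induced pairs
    return nn, mean_cosine
```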