This codebase contains a set of simple postprocessing transformations that improve the performance of word embeddings. Prior work has shown that mean subtraction and removal of early principal components can enhance performance on lexical similarity tasks. We further demonstrate that, simply by performing these transformations only on a strategic subset of the vocabulary, we can consistently achieve even further gains (up to 20% overall), while using less compute and memory. Not only does this behavior offer insights into the linguistic properties of these word representations, but the gains are considerable and hold on both static word embeddings (word2vec and GloVe) and contextual word embeddings (BERT and GPT-2) across a broad range of lexical similarity tasks.
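The two transformations described above can be sketched in a few lines of NumPy. The snippet below is a minimal illustration under stated assumptions, not the repository's actual implementation: the function name `subset_postprocess`, the `subset_idx` argument, and the choice of removing two principal components are hypothetical, and the repository's code defines the real subset-selection strategy and hyperparameters.

```python
import numpy as np

def subset_postprocess(embeddings, subset_idx, n_components=2):
    """Mean-subtract and remove leading principal components, estimated
    from (and applied to) only a chosen subset of the vocabulary.

    embeddings:   (V, d) array of word vectors.
    subset_idx:   integer indices of the vocabulary subset to transform.
    n_components: number of leading principal components to remove
                  (hypothetical default; not taken from the repository).
    """
    out = embeddings.copy()
    sub = out[subset_idx]

    # Mean subtraction, computed over the subset only.
    sub = sub - sub.mean(axis=0)

    # Estimate the top principal directions of the centered subset
    # via SVD; rows of vt are the principal components.
    _, _, vt = np.linalg.svd(sub, full_matrices=False)
    top = vt[:n_components]                     # (n_components, d)

    # Remove the projection of each subset vector onto those components.
    sub = sub - sub @ top.T @ top

    out[subset_idx] = sub
    return out


# Toy usage with random vectors; a real run would load word2vec/GloVe
# (or BERT/GPT-2 token embeddings) and a principled subset of indices.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 50))
subset = np.arange(0, 1000, 2)                  # hypothetical subset choice
vectors = subset_postprocess(vectors, subset, n_components=2)
```

Restricting both the mean estimate and the component removal to the subset is what keeps the cost below that of transforming the full vocabulary.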