This repository accompanies the paper (arXiv). Accepted to ACL 2020.
tl;dr: read the intro and one of the usage headings.
This repository provides tools for understanding neural representations and applies them to contextualizers. A “contextualizer” is a model that produces context-dependent word embeddings.
Concretely, similarity measures, eg.
applied to SOTA contextualizers, eg.
All similarity measures can be found in corr_methods.py.
We also experimented with novel attention-based similarity measures in attention_corr_methods.py.
This repository should be on your Python path.
export PYTHONPATH="${PWD}:${PYTHONPATH}"
The main script is main.py.
main.py [--methods [METHODS ...]] REPRESENTATION_FILES OUTPUT_FILE
For examples, see slurm (e.g. mk_resultsN.sh and mk_resultsN-helper.sh). To see all options, run python main.py --help. Note that REPRESENTATION_FILES is a file containing an input file on each line. OUTPUT_FILE is a pickle dump.
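The two file arguments can be prepared and inspected from Python. Below is a minimal sketch, assuming only what is stated above: REPRESENTATION_FILES is a text file with one input file per line, and OUTPUT_FILE is a pickle. The helper names (write_file_list, read_file_list, load_results) are hypothetical and not part of this repository, and the layout of the unpickled object is not assumed.

```python
import pickle
from pathlib import Path


def write_file_list(paths, list_path):
    """Write one representation file path per line (the REPRESENTATION_FILES format)."""
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))


def read_file_list(list_path):
    """Return the non-empty lines of a file list, stripped of whitespace."""
    lines = Path(list_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_results(output_path):
    """Unpickle an OUTPUT_FILE; the layout of the returned object is not assumed."""
    with open(output_path, "rb") as f:
        return pickle.load(f)
```

After loading, inspecting type(results) is a reasonable first step. Only unpickle files you trust, since loading a pickle can execute arbitrary code.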
main_attn.py is analogous.
You can also call the correlation methods directly from Python. See ex.ipynb.
- var.py. Settings you might want to change if you use this repository. E.g., it has the function fname2mname (filename to model name), which transforms /data/sls/temp/belinkov/contextual-corr-analysis/contextualizers/bert_large_cased/ptb_pos_dev.hdf5 to bert_large_cased-ptb_pos_dev.hdf5.
- analysis. Data analysis; the results that will be presented. analysis-n analyzes the result of experiment n.
- hnb. “Helper notebook.” Files in this directory are there to help me code and to help the reader understand the resulting .py files. Each file contains a copy of a function with its loops and similar constructs destructured (each run once with an arbitrary value, to help debugging). It may help you understand a function.
- slurm. SLURM scripts. Run directly as SCRIPTNAME.
- other. Everything else. Lots of junk.
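The fname2mname mapping in var.py is the kind of thing you would adapt to your own directory layout. Here is a minimal sketch that reproduces the one example given above, assuming the model name is the parent directory name joined to the file name with a hyphen. The function name fname2mname_sketch is hypothetical, and the actual implementation in var.py may differ.

```python
import os


def fname2mname_sketch(fname):
    """Map '<...>/<model_dir>/<file>' to '<model_dir>-<file>' (assumed rule)."""
    parent = os.path.basename(os.path.dirname(fname))
    return f"{parent}-{os.path.basename(fname)}"


example = (
    "/data/sls/temp/belinkov/contextual-corr-analysis/"
    "contextualizers/bert_large_cased/ptb_pos_dev.hdf5"
)
print(fname2mname_sketch(example))  # bert_large_cased-ptb_pos_dev.hdf5
```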
Our pipeline is:
- Generate representations (hdf5 files)
- Run main.py on them
  - Loads the representations (load_representations in corr_methods.py)
  - Computes the correlations using the given methods
  - Writes them to OUTPUT_FILE.
- Analyze results (the OUTPUT_FILE above) in the analysis directory
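The load, compute, write shape of the pipeline can be illustrated with a toy stand-in. This sketch does not use the repository's code: the representations are passed in memory rather than read from hdf5 files, and toy_similarity (the mean, over neurons of one model, of the maximum absolute Pearson correlation with any neuron of the other) is a hypothetical placeholder, not necessarily one of the methods in corr_methods.py. All names below are assumptions for illustration.

```python
import math
import pickle
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def toy_similarity(rep_a, rep_b):
    """Mean over neurons of A of the max |correlation| with any neuron of B.

    rep_a, rep_b: lists of token rows, each row a list of neuron activations.
    """
    cols_a = list(zip(*rep_a))  # one tuple of activations per neuron
    cols_b = list(zip(*rep_b))
    best = [max(abs(pearson(a, b)) for b in cols_b) for a in cols_a]
    return sum(best) / len(best)


def run_pipeline(representations, output_file):
    """Compare every pair of models and pickle the results.

    representations: dict mapping model name -> token-by-neuron activations.
    """
    results = {
        (m1, m2): toy_similarity(representations[m1], representations[m2])
        for m1, m2 in combinations(sorted(representations), 2)
    }
    with open(output_file, "wb") as f:
        pickle.dump(results, f)
    return results
```

Keying the results by model-name pairs is a design choice of this sketch only; the structure main.py writes to OUTPUT_FILE may be different.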
New correlation methods should extend corr_methods.Method.
Figure: SVCCA similarities (Method CCA in corr_methods.py).
Figure: CKA attention similarity (Method AttnLinCKA in attention_corr_methods.py).