This repository contains the code base for my research conducted at ILLC, UvA, from fall 2021 to spring 2022, in collaboration with Wilker Aziz.
🔗 Link to pre-print paper on arXiv
- `/analysis`
  - `/bda_models`
    - `bda_dp_mixture_surprisal_vals.py`: analyse surprisal values with DP Mixture (NumPyro model + plot functions)
    - `bda_MM_latent_analysis.py`: analyse latent samples with DP Mixture (calls Wilker's Pyro Model)
    - `bda_pixel_model_mnist.py`: conditional Beta-Bernoulli pixel model (NumPyro + plot functions)
    - `bda_sequence_length_model_ptb.py`: Poisson rate model (NumPyro + plot functions)
    - `bda_topic_model_ptb.py`: topic model (uses altered Gensim LDA implementation, optimised with VI)
    - `gensim_LDA.py`: alteration of the class by Gensim
    - `/Pyro_BDA`: holds code from the `probabll/bda` repository
  - `/data_space`
    - `MNIST_pixels.ipynb`: fit BDA & compute surprisals for MNIST
    - `PTB_sequence_length_preprocess.ipynb`: pre-process for length analysis (outputs `ptb_length_analysis_data.pt`)
    - `PTB_sequence_length.ipynb`: fit BDA & compute surprisals for PTB sequence length analysis
    - `PTB_topics.ipynb`: fit BDA & compute surprisals for PTB LDA topic analysis
    - `surprisal_DPs.ipynb`: fit DPs on surprisal values
    - `/output_files`: stores intermediate output files
  - `/latent_space`
    - `latent_space_analysis.ipynb`: perform latent space analysis
    - `latent_analysis.py`: helper functions used in `latent_space_analysis.ipynb`
    - `image_encoding_stats.csv`: stores some basic statistics of the image encodings (KS, MMD, etc.)
    - `language_encoding_stats.csv`: stores some basic statistics of the language encodings (KS, MMD, etc.)
  - `final_selection_runs.csv`: stores the experiments used in the analysis
  - `global_stats.csv`: stores aggregated intrinsic evaluation results used in the analysis
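
As a rough illustration of what "fit BDA & compute surprisals" means for the PTB sequence length analysis, here is a minimal sketch. It is not the NumPyro model in `bda_sequence_length_model_ptb.py`: it assumes a conjugate Gamma prior on the Poisson rate, so the posterior predictive has a closed form (a negative binomial) and no sampling is needed. The function names, the prior parameters `a` and `b`, and the toy lengths are all hypothetical.

```python
import math


def gamma_poisson_posterior(lengths, a=1.0, b=1.0):
    """Posterior Gamma(shape, rate) over the Poisson rate.

    Assumes a Gamma(a, b) prior (shape a, rate b); both are hypothetical defaults.
    """
    return a + sum(lengths), b + len(lengths)


def log_predictive(x, a_post, b_post):
    """Log posterior predictive probability of a length x (negative binomial)."""
    p = b_post / (b_post + 1.0)
    return (
        math.lgamma(x + a_post)
        - math.lgamma(x + 1)
        - math.lgamma(a_post)
        + a_post * math.log(p)
        + x * math.log1p(-p)
    )


def surprisal(x, a_post, b_post):
    """Surprisal in nats: negative log predictive probability."""
    return -log_predictive(x, a_post, b_post)


if __name__ == "__main__":
    # Toy sequence lengths, made up for illustration only.
    observed = [20, 22, 19, 25, 21]
    a_post, b_post = gamma_poisson_posterior(observed)
    for length in (18, 60):
        print(length, surprisal(length, a_post, b_post))
```

Under these assumptions, a length far from the bulk of the observed lengths receives a higher surprisal than a typical one, which is the quantity the notebooks in `/data_space` compute before DP mixtures are fit on the surprisal values.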