Computational Methods for the manuscript "Spatial discordances between mRNAs and proteins in the intestinal epithelium"
This repository contains code associated with the parameter estimation process in the manuscript "Spatial discordances between mRNAs and proteins in the intestinal epithelium" by Yotam Harnik, Lisa Buchauer, Shani Ben-Moshe, Yishai Levin, Alon Savidor, Raya Eilam, Andreas E. Moor, Inna Averbukh and Shalev Itzkovitz.
- constant_rate_model: Protein translation and decay rate estimation based on mRNA and protein profiles, assuming constant translation and decay rates along the villus axis. This part requires Python 3.8 and the third-party packages numpy, matplotlib, scipy, seaborn, pandas, emcee, statsmodels and corner. The parameter estimation process is outlined in five Jupyter notebooks (N1 - N5), which detail data preprocessing, prior construction, MCMC sampling and result validation. The directory also contains three external data sets used for comparison with the results derived here, and some code meant to facilitate large-scale parameter estimation on computational clusters.
- declining_rate_model: Protein translation and decay rate estimation based on mRNA and protein profiles, assuming a global decline in translation rates and constant decay rates along the villus axis. This part requires Python 3.8 and the third-party packages numpy, matplotlib, scipy, seaborn, pandas, emcee, statsmodels, dill and corner. The parameter estimation process is outlined in five Jupyter notebooks (N1 - N5), which detail data preprocessing, prior construction, MCMC sampling and result validation. The directory also contains three external data sets used for comparison with the results derived here, and some code meant to facilitate large-scale parameter estimation on computational clusters.
- statistical_power_analysis: Scripts for analyzing under which circumstances the constant-rate model can be rejected by data of the type used in the manuscript. To this end, such data (mRNA-protein profiles in 6 villus zones) are first simulated (notebook N1) and then submitted to the same parameter estimation procedure that was applied to the real data (using the constant translation-rate model, notebooks N2 and N3). Notebook N4 shows under which circumstances the constant-rate model can be rejected and which parameter estimates are obtained under the assumption of constant rates.
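To make the simulation step more concrete, here is a minimal sketch of how a protein profile across 6 villus zones could be generated from an mRNA profile under constant translation and decay rates. It is not the code from notebook N1. The model form dP/dt = k_trans * M - k_deg * P, the assumption that mRNA is constant within each zone, the residence time per zone, the log-normal noise and all function and parameter names are assumptions made for illustration only.

```python
import math
import random


def simulate_protein_profile(mrna, k_trans, k_deg, zone_time=1.0, p0=0.0):
    """Protein level at the end of each zone (illustrative assumption).

    Solves dP/dt = k_trans * M - k_deg * P exactly within each zone,
    treating the mRNA level M as constant inside a zone.
    """
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    protein = []
    p = p0
    for m in mrna:
        # Level the protein would approach if this mRNA level persisted.
        steady = k_trans * m / k_deg
        # Exponential relaxation towards that level over the time spent in the zone.
        p = steady + (p - steady) * math.exp(-k_deg * zone_time)
        protein.append(p)
    return protein


def add_lognormal_noise(values, sigma, rng):
    """Multiply each value by exp(N(0, sigma)) to mimic measurement noise."""
    return [v * math.exp(rng.gauss(0.0, sigma)) for v in values]


rng = random.Random(0)                      # fixed seed for reproducibility
mrna = [1.0, 0.8, 0.6, 0.4, 0.3, 0.2]       # hypothetical mRNA levels in 6 zones
clean = simulate_protein_profile(mrna, k_trans=2.0, k_deg=0.5)
noisy = add_lognormal_noise(clean, sigma=0.1, rng=rng)

for zone, (m, p) in enumerate(zip(mrna, noisy), start=1):
    print(f"zone {zone}: mRNA={m:.2f}, simulated protein={p:.3f}")
```

Profiles simulated with known rates can then be passed through the estimation procedure to see whether the known values are recovered. The notebooks remain the reference for the model and noise assumptions actually used.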
All data directly required to execute the scripts in this repository are included in the repository. Raw mRNA sequencing data have been deposited in the NCBI Gene Expression Omnibus (GEO) database under accession code GSE164746 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164746). The full collection of model fit results and parameter estimates, which includes MCMC chains approximating the posterior parameter distribution and figures showing the model's fit to the data for each gene, can be accessed in a Zenodo repository (https://zenodo.org/record/5136420).
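Before opening the notebooks, it may help to confirm that the packages listed above are available. The following is a small sketch, not part of the repository: the package names are taken from the lists in this README, while the helper name `missing_packages` and the dictionary `REQUIRED` are hypothetical. It assumes each package is importable under the same name it is listed with.

```python
import importlib.util
import sys

# Package lists as stated in this README; dill is only listed for declining_rate_model.
COMMON = ["numpy", "matplotlib", "scipy", "seaborn", "pandas",
          "emcee", "statsmodels", "corner"]
REQUIRED = {
    "constant_rate_model": COMMON,
    "declining_rate_model": COMMON + ["dill"],
}


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The README states Python 3.8; print the running version for comparison.
print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))

for directory, names in REQUIRED.items():
    missing = missing_packages(names)
    status = "all packages found" if not missing else "missing: " + ", ".join(missing)
    print(f"{directory}: {status}")
```

The check only looks up whether each package can be located; it does not import the packages or verify their versions.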