Do samplers or ADVI assume parameters/data points independence? #1517

Closed
ClaudMor opened this issue Jan 8, 2021 · 7 comments
@ClaudMor

ClaudMor commented Jan 8, 2021

Hello,

This is a partially theoretical question; I hope it's not too wrong to post it as an issue. The resources I have found definitely helped a lot in understanding the inner workings of some of Turing.jl's features, but I'd also like to ask here to be sure.

Suppose one has to calibrate a model (e.g. a DifferentialEquations.jl model) using Turing.

  1. Does using the likelihood data ~ MvNormal(predicted, σ), where σ is a Float64, imply that the points in data are assumed to be independent?
  2. If one knows that the data are somehow correlated and strictly positive, would replacing MvNormal(predicted, σ) with something like a TruncatedMvNormal(predicted, Σ) (where Σ is now a matrix) be the best option? If so, what prior would you set on Σ (or on its entries)?
  3. Do the samplers (NUTS in particular) or ADVI anywhere assume independence of the model's parameters being sampled?
  4. Specifically regarding ADVI, did I correctly understand that this example shows how to relax the supposed independence assumption of question 3?
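For concreteness, the kind of likelihood asked about in question 1 typically appears in a model along these lines (a minimal hedged sketch: the model name, the parameter θ, and the priors are hypothetical, and the DifferentialEquations.jl solve is replaced by a placeholder):

```julia
using Turing

@model function calibrate(data)
    # Hypothetical priors; in practice these would reflect domain knowledge.
    θ ~ Uniform(0.0, 1.0)                  # stand-in model parameter
    σ ~ truncated(Normal(0, 1), 0, Inf)    # observation-noise scale

    # In a real pipeline `predicted` would be the ODE solution evaluated
    # at the observation times; here it is just a placeholder vector.
    predicted = fill(θ, length(data))

    # The likelihood in question: scalar σ with a multivariate normal.
    data ~ MvNormal(predicted, σ)
end
```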

Thanks in advance

@cpfiffer
Member

cpfiffer commented Jan 8, 2021

I can only answer 1 and 2:

  1. Yes; please look at the Distributions.jl documentation for that, since it's not really a Turing issue.
  2. Yep, that'd work. Priors on Σ are often an InverseWishart on the covariance matrix directly, or an LKJ prior on the correlation matrix coupled with InverseGamma/Cauchy/etc. priors on the variances to allow for scale. See here for more info on LKJ priors in PyMC.
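The LKJ-plus-scales construction could be sketched in Turing roughly as follows (a hypothetical, untuned model: `predicted` is assumed given, and the prior hyperparameters are placeholders; `LKJ` comes from Distributions.jl and `filldist` from Turing):

```julia
using Turing, LinearAlgebra

@model function correlated_noise(data, predicted)
    d = length(data)
    Ω ~ LKJ(d, 2.0)                       # prior on the correlation matrix
    s ~ filldist(InverseGamma(2, 3), d)   # priors on the per-dimension scales
    # Covariance = S * Ω * S with S = Diagonal(s); Symmetric guards
    # against tiny floating-point asymmetries.
    Σ = Symmetric(Diagonal(s) * Ω * Diagonal(s))
    data ~ MvNormal(predicted, Σ)
end
```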

@torfjelde
Member

torfjelde commented Jan 8, 2021

> Suppose one has to calibrate a model (e.g. a DifferentialEquations.jl model) using Turing.
>
>   1. Does using the likelihood data ~ MvNormal(predicted, σ), where σ is a Float64, imply that the points in data are assumed to be independent?
>   2. If one knows that the data are somehow correlated and strictly positive, would replacing MvNormal(predicted, σ) with something like a TruncatedMvNormal(predicted, Σ) (where Σ is now a matrix) be the best option? If so, what prior would you set on Σ (or on its entries)?
>   3. Do the samplers (NUTS in particular) or ADVI anywhere assume independence of the model's parameters being sampled?
>   4. Specifically regarding ADVI, did I correctly understand that this example shows how to relax the supposed independence assumption of question 3?
  1. Yes. So σ being a Float64 is equivalent to a covariance matrix Diagonal(σ^2 .* ones(d)), i.e. a d-dimensional diagonal matrix with the diagonal entries all equal to σ^2. EDIT: Changed σ to σ^2 thanks to @devmotion.
  2. If that's the case, then yes, it makes sense to incorporate the correlation between the observations into the model, for example by using a non-diagonal covariance matrix. @cpfiffer's answer mentions different ways to put a prior on the covariance matrix. It's worth pointing out that there is no such thing as a TruncatedMvNormal, simply because its logpdf does not exist in closed form. Alternatively, one could use an exponential transform of the distribution, e.g. transformed(MvNormal(predicted, Σ), Bijectors.Exp{1}()), to get a distribution with a non-diagonal covariance and positively constrained support. But this has implications for how you model noise: positively constrained variables might make sense, or might not, depending on how the observations were actually obtained. Also, if you think your observations are correlated to the point where "it matters", it seems to me that you have some fairly strong intuition about how the observations were generated "in the real world". If that's the case, there may be better ways to incorporate this into your model than just adding a non-diagonal covariance matrix.
  3. NUTS will do its best to sample from essentially any distribution you give it (in reality there are some theoretical assumptions, but none related specifically to independence of the model's parameters). NUTS only really sees a mapping θ ↦ p(θ) for some (potentially unnormalized) density p and tries to sample from it. For ADVI it really depends on which family of approximate distributions you are optimizing over; see the answer below.
  4. Yep!
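The equivalence in point 1 can be checked directly with Distributions.jl: with a scalar σ the joint log-density equals the sum of independent univariate normal log-densities, i.e. the density factorises (a small sketch with made-up numbers):

```julia
using Distributions, LinearAlgebra

μ = [1.0, 2.0, 3.0]   # stand-in for `predicted`
σ = 0.5
x = [0.9, 2.2, 2.7]   # stand-in for `data`

lp_joint = logpdf(MvNormal(μ, σ^2 * I), x)   # scalar σ ⇔ covariance σ²I
lp_indep = sum(logpdf.(Normal.(μ, σ), x))    # product of independent normals
lp_joint ≈ lp_indep                          # the joint density factorises
```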

@devmotion
Member

> Yes. So σ being a Float64 is equivalent to a covariance matrix Diagonal(σ .* ones(d)), i.e. a d-dimensional diagonal matrix with the diagonal entries being equal to σ.

Actually, it corresponds to a diagonal matrix with diagonal entries equal to σ^2.

@ClaudMor
Author

ClaudMor commented Jan 9, 2021

Hello @cpfiffer @torfjelde @devmotion ,

And thanks a lot for your support.

Specifically replying to point 3 of @torfjelde's answer: our model is an epidemiological model. You may find an MWE here, where I posted a simplified version of the calibration pipeline and the Turing model, along with fake (but credible) data.
As you may see, the data are time series of hospitalizations, intensive-care occupancy, and deaths (so they are all positive integers), and of course one may expect them to be "correlated".
We are trying to capture these correlations by fitting an epidemiological model, but is there any information coming from such data that we should give to the turing_model?

@pitmonticone
Contributor

Do you have any update? I would be very interested to know.

@luiarthur
Contributor

For me, the referenced example is quite long. I find it difficult to know exactly what you're after in the context of your question.

But, if you're referring to what prior information you should provide to the model, that would depend on your domain expertise. It seems that your priors are all quite specifically constrained and uniform. If you know there should be some correlation, you would need to reconsider those priors in light of the knowledge you have.

If you're asking how one can allow the ADVI algorithm to model correlation between model parameters, then the answer would be to use the full-rank approximation (instead of the default meanfield), which is here as you mentioned. You would not necessarily have to provide domain expertise here, but you do need to ensure the dimensions of the covariance matrix are correct for your model.
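Concretely, the two variational families differ only in the covariance structure of the Gaussian approximation over the (transformed) parameters. A sketch using plain Distributions.jl objects (the dimension and Cholesky factor are made-up placeholders):

```julia
using Distributions, LinearAlgebra

d = 3  # number of model parameters (must match your model)

# Mean-field (the default): diagonal covariance, so the approximation q
# cannot represent correlations between parameters.
q_meanfield = MvNormal(zeros(d), Diagonal(ones(d)))

# Full-rank: a dense covariance, here parametrised via a lower-triangular
# Cholesky factor L, lets q capture correlations between parameters.
L = LowerTriangular([1.0 0.0 0.0; 0.5 1.0 0.0; 0.2 0.3 1.0])
q_fullrank = MvNormal(zeros(d), L * L')
```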

Thanks for posting this fascinating question here. If you care to continue this discussion, would you mind moving it to the Discussions page? The Discussions page is meant for questions related to statistical theory and applications. Items related to bugs and feature requests would, of course, still be welcome here in Issues.

@luiarthur
Contributor

I'm closing this for now as it does not concern bugs or feature requests. As mentioned, feel free to continue this discussion on the Discussions page.
