Add expectations and variances in gold standard #108
Is the idea here that we can get more accurate expectations by using more samples? So we might have a gold standard with 10 000 draws but expectations computed with 100 000 draws? Or is it more that for some posteriors we might have a way of computing accurate expectations but not accurate draws? I guess simulated posteriors also fall under this case as we can have ground truth expectations while the expectations computed from draws are only estimates. |
Yes. That's the reason. We could have much more accurate estimates of the expectations and the covariance. Good point. We should also add how the expectations were calculated. |
So

```json
{
  "name": "eight_schools-eight_schools_noncentered",
  "keywords": ["stan_benchmark"],
  "model_name": "eight_schools_noncentered",
  "reference_draws_name": "eight_schools-eight_schools_noncentered",
  "reference_expectations_name": "something",
  "data_name": "eight_schools",
  "dimensions": {"theta": 8, "mu": 1, "tau": 1},
  "added_by": "Mans Magnusson",
  "added_date": "2019-08-12"
}
```

(I'm using reference instead of gold standard here) I guess the |
Exactly! |
Here are some things that popped to my mind.

1. Let's say we have a posterior where the expectations computed from 100 000 draws are more accurate than the ones computed from a sample of 10 000 draws. Do we even want to expose the smaller sample? One could argue that if the expectation from the smaller sample is less accurate, then those draws do not represent the posterior well and should never be used.

1.5. Let's continue the previous case. We know that 100 000 draws gave a better result. Should we also try 1 000 000 draws to see if that gives an even better result?

2. Let's say we have a small posterior where storing a large sample (say 100 000 draws) takes less space than storing a small sample (say 10 000 draws) of a "normal-sized" posterior. For the small posterior the large sample gives a more accurate estimate than the small one. Should we just store the large sample (100 000 draws) in this case? In other words, do we want a fixed number of draws in the first place?

3. How can we recognize that one estimate is more accurate than another? It is more likely that a larger sample gives a more accurate estimate, but it is still possible that the smaller sample sometimes gives a better one. Or is the chance of this small enough to be ignored? |
There are different use cases. If you only want to check that you get the expectations right, then the larger sample is better. Storing expectations computed from 100 000 draws is also less costly (the storage cost depends only on the dimension, not the number of draws). Others, though, may want draws from the posterior to, for example, compute log_lik values for a subset of observations. @avehtari is working on writing down use cases now.
1.5) The more draws the better. We need to set the bar for reference draws somewhere since there is a computational cost, especially for larger models.
|
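The accuracy claim above follows from standard Monte Carlo error: the standard error of a posterior mean estimate shrinks like 1/sqrt(n), so 100 000 draws give roughly a 3x more accurate mean than 10 000. A minimal sketch with a toy standard-normal "posterior" (for correlated MCMC draws one would use an effective-sample-size correction instead of the raw draw count):

```python
import numpy as np

def mcse_of_mean(draws):
    # Monte Carlo standard error of the mean estimate: sd / sqrt(n).
    draws = np.asarray(draws)
    return draws.std(ddof=1) / np.sqrt(draws.size)

rng = np.random.default_rng(0)
# Placeholder "posterior": independent standard-normal draws.
small = rng.standard_normal(10_000)
large = rng.standard_normal(100_000)

# The larger sample's error is ~sqrt(10) ≈ 3.16x smaller.
print(mcse_of_mean(small), mcse_of_mean(large))
```

This is why the thread treats expectations from the larger sample as strictly preferable: the extra draws reduce the error of the stored summaries without any downside for that use case.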
I probably explained 1. a bit poorly. What I mean is that
As for 2., perhaps we should have a simple and straightforward guideline (10 000 draws preferred) that we can deviate from if there are good reasons to do so. Maybe this is what you also had in mind? |
|
This is what I'm essentially hearing: 10 000 samples will have only a small error compared to 100 000, and thus it's fine to use the smaller sample to compute log_lik etc. Yet 10 000 samples have too big an error to compute expectations, so we need to use the larger sample for that. This sounds like a contradiction. |
10 000 samples is good enough in most situations; 100 000 is better but would take up 10x the space. There is no on/off threshold here. Computing the expectations and covariance from 100 000 draws gives a slightly better estimate with no additional storage cost. Using 1 000 000 would be even better, but we need to draw the line somewhere.
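The storage argument can be made concrete: for a d-dimensional posterior, the summary (mean plus covariance) takes O(d^2) numbers regardless of how many draws it was computed from, while the draws themselves take O(n*d). A sketch using placeholder standard-normal draws (not a real posterior; the dimension 10 mirrors the eight_schools example above: theta has 8 components, plus mu and tau):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 100_000                  # dimension, number of draws
draws = rng.standard_normal((n, d))  # placeholder draws, shape (n, d)

mean = draws.mean(axis=0)            # d numbers
cov = np.cov(draws, rowvar=False)    # d x d, ~d*(d+1)/2 unique numbers

# Summary size is independent of n; the draws themselves are n*d numbers.
print(mean.shape, cov.shape)         # (10,) (10, 10)
```

So computing the reference expectations from 10x more draws improves their accuracy at zero extra storage cost, which is the point being made in the comment above.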
Expectations and Covariance
|
We may be interested in gold standard expectations without actual gold standard draws. Hence we should add them as a separate slot that can be accessed.
This should include:
Mean, variances, and covariance for all parameters, based on 100 000 draws. Hence a new gold standard slot should be created.
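Tying this back to the JSON earlier in the thread, a separate expectations file referenced by `reference_expectations_name` might look something like the following. All field names and values here are hypothetical placeholders for illustration, not an actual posteriordb format:

```json
{
  "name": "eight_schools-eight_schools_noncentered",
  "based_on_draws": 100000,
  "computation_method": "description of how the expectations were calculated",
  "mean": {"theta": [0.0], "mu": 0.0, "tau": 0.0},
  "variance": {"theta": [0.0], "mu": 0.0, "tau": 0.0},
  "covariance": [[0.0]]
}
```

A `computation_method` field of some kind would address the point raised earlier in the thread that the database should record how the expectations were calculated.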