Merge pull request #182 from complexbrains/master
Unit conversions regarding #179
poldrack authored Feb 29, 2020
2 parents 9174043 + 52a6948 commit 48b2e19
Showing 5 changed files with 22 additions and 12 deletions.
4 changes: 2 additions & 2 deletions 02-Data.Rmd
@@ -52,13 +52,13 @@ In general, most programming languages treat truth values and binary numbers equ

**Integers**. Integers are whole numbers with no fractional or decimal part. We most commonly encounter integers when we count things, but they also often occur in psychological measurement. For example, in my introductory survey I administer a set of questions about attitudes towards statistics (such as "Statistics seems very mysterious to me."), on which the students respond with a number between 1 ("Disagree strongly") and 7 ("Agree strongly").

-**Real numbers**. Most commonly in statistics we work with real numbers, which have a fractional/decimal part. For example, we might measure someone's weight, which can be measured to an arbitrary level of precision, from whole pounds down to micrograms.
+**Real numbers**. Most commonly in statistics we work with real numbers, which have a fractional/decimal part. For example, we might measure someone's weight, which can be measured to an arbitrary level of precision, from whole kg down to micrograms.
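
As an aside not in the original text, these types map directly onto how R (the language used throughout the book's chunks) stores values; a minimal illustrative sketch:

```r
# Illustrative only -- toy values, not data from the book
is_tall <- TRUE       # a truth (Boolean) value
typeof(is_tall)       # "logical"

n_friends <- 33L      # the L suffix marks an integer literal in R
typeof(n_friends)     # "integer"

weight_kg <- 72.35    # a real number with a fractional part
typeof(weight_kg)     # "double" (floating point)

# As noted above, truth values behave like binary numbers in arithmetic
TRUE + TRUE           # 2
```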

## Discrete versus continuous measurements

A *discrete* measurement is one that takes one of a set of particular values. These could be qualitative values (for example, different breeds of dogs) or numerical values (for example, how many friends one has on Facebook). Importantly, there is no middle ground between the measurements; it doesn't make sense to say that one has 33.7 friends.

-A *continuous* measurement is one that is defined in terms of a real number. It could fall anywhere in a particular range of values, though usually our measurement tools will limit the precision with which we can measure; for example, a floor scale might measure weight to the nearest pound, even though weight could in theory be measured with much more precision.
+A *continuous* measurement is one that is defined in terms of a real number. It could fall anywhere in a particular range of values, though usually our measurement tools will limit the precision with which we can measure; for example, a floor scale might measure weight to the nearest kg, even though weight could in theory be measured with much more precision.

It is common in statistics courses to go into more detail about different "scales" of measurement, which are discussed in more detail in the Appendix to this chapter. The most important takeaway from this is that some kinds of statistics don't make sense on some kinds of data. For example, imagine that we were to collect postal Zip Code data from a number of individuals. Those numbers are represented as integers, but they don't actually refer to a numeric scale; each zip code basically serves as a label for a different region. For this reason, it wouldn't make sense to talk about the average zip code, for example.
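
A small hedged illustration of this point (using made-up codes, not data from the book): treating such values as categorical in R makes clear which summaries are meaningful.

```r
# Hypothetical zip codes -- numerals that merely label regions
zip <- c(94305, 10027, 60637, 94305)

mean(zip)            # R will happily compute this, but the result is meaningless

# Treating them as a nominal (categorical) variable is the sensible choice
zip_factor <- factor(zip)
table(zip_factor)    # counting how often each label occurs does make sense
```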

2 changes: 1 addition & 1 deletion 07-Sampling.Rmd
@@ -117,7 +117,7 @@ sampMeans_df %>%
label = "Population mean"
) +
labs(
x = "Height (inches)"
x = "Height (cm)"
)
```

2 changes: 1 addition & 1 deletion 07b-SamplingInR.Rmd
@@ -91,7 +91,7 @@ sampMeans %>%
size=6
) +
# label the x axis
labs(x = "Height (inches)") +
labs(x = "Height (cm)") +
# add normal based on population mean/sd
stat_function(
fun = dnorm, n = sampSize,
24 changes: 17 additions & 7 deletions 11-BayesianStatistics.Rmd
@@ -54,6 +54,11 @@ If we know the value of the latent variable, then it's easy to reconstruct what

The reason that Bayesian statistics has its name is that it takes advantage of Bayes' theorem to make inferences from data about the underlying process that generated the data. Let's say that we want to know whether a coin is fair. To test this, we flip the coin 10 times and come up with 7 heads. Before this test we were pretty sure that $P_{heads}=0.5$, but finding 7 heads out of 10 flips would certainly give us pause if we believed that $P_{heads}=0.5$. We already know how to compute the conditional probability that we would flip 7 or more heads out of 10 if the coin is really fair ($P(n\ge7|p_{heads}=0.5)$), using the binomial distribution.


+```{r echo=FALSE}
+# *TBD: MOTIVATE SWITCH FROM 7 To 7 OR MORE*
+```

The resulting probability is `r I(sprintf("%.3f",pbinom(7, 10, .5, lower.tail = FALSE)))`. That is a fairly small number, but this number doesn't really answer the question that we are asking -- it is telling us about the likelihood of 7 or more heads given some particular probability of heads, whereas what we really want to know is the probability of heads. This should sound familiar, as it's exactly the situation that we were in with null hypothesis testing, which told us about the likelihood of data rather than the likelihood of hypotheses.
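
To make the computation above concrete, here is a small illustrative sketch (not part of the original chapter) of how such binomial tail probabilities are obtained in R; note that `pbinom()` with `lower.tail = FALSE` returns $P(X > k)$, so the cutoff needs care.

```r
# Probability of 7 or more heads in 10 flips of a fair coin
sum(dbinom(7:10, size = 10, prob = 0.5))               # ~0.172
pbinom(6, size = 10, prob = 0.5, lower.tail = FALSE)   # same value: P(X > 6) = P(X >= 7)

# For comparison, using 7 as the cutoff gives P(X >= 8)
pbinom(7, size = 10, prob = 0.5, lower.tail = FALSE)   # ~0.055
```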

Remember that Bayes' theorem provides us with the tool that we need to invert a conditional probability:
@@ -210,10 +215,11 @@ ggplot(likeDf,aes(resp,likelihood5)) +

In addition to the likelihood of the data under different hypotheses, we need to know the overall likelihood of the data, combining across all hypotheses (i.e., the marginal likelihood). This marginal likelihood is primarily important because it helps to ensure that the posterior values are true probabilities. In this case, our use of a set of discrete possible parameter values makes it easy to compute the marginal likelihood, because we can compute the likelihood of the data under each candidate parameter value, weight it by the prior probability of that value, and add these weighted likelihoods up across all of the candidate values.
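
A minimal sketch of that computation, using hypothetical numbers (say, 64 responders out of 100 patients) and a flat prior over a discrete grid of candidate values; the specific values here are assumptions for illustration, not the book's data.

```r
# Hypothetical data: 64 of 100 patients respond to the treatment
n_responders <- 64
n_patients   <- 100

# Discrete grid of candidate values for p_respond
p_grid <- seq(0, 1, by = 0.01)

# Flat (uniform) prior over the grid
prior <- rep(1 / length(p_grid), length(p_grid))

# Likelihood of the observed data under each candidate value
likelihood <- dbinom(n_responders, size = n_patients, prob = p_grid)

# Marginal likelihood: likelihood under each hypothesis, weighted by its
# prior probability, summed across hypotheses
marginal_likelihood <- sum(likelihood * prior)

# Dividing by it ensures that the posterior values are true probabilities
posterior <- likelihood * prior / marginal_likelihood
sum(posterior)   # 1
```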

+```{r ,echo=FALSE}
+# *MH:*not sure there’s a been clear discussion of the marginal likelihood up this point. it’s a confusing and also very deep construct.. the overall likelihood of the data is the likelihood of the data under each hypothesis, averaged together (weighted by) the prior probability of those hypotheses. it is how likely the data is under your prior beliefs about the hypotheses.
-*MH:*not sure there’s a been clear discussion of the marginal likelihood up this point. it’s a confusing and also very deep construct.. the overall likelihood of the data is the likelihood of the data under each hypothesis, averaged together (weighted by) the prior probability of those hypotheses. it is how likely the data is under your prior beliefs about the hypotheses.

-might be worth thinking of two examples, where the likelihood of the data under that hypothesis of interest is the same, but where the marginal likelihood changes i.e., the hypothesis is pretty good at predicting the data, while other hypothese are bad vs. other hypotheses are always good (perhaps better)
+# might be worth thinking of two examples, where the likelihood of the data under that hypothesis of interest is the same, but where the marginal likelihood changes i.e., the hypothesis is pretty good at predicting the data, while other hypothese are bad vs. other hypotheses are always good (perhaps better)
+```

```{r echo=FALSE}
# compute marginal likelihood
@@ -289,10 +295,11 @@ Given our data we would like to obtain an estimate of $p_{respond}$ for our samp

Often we would like to know not just a single estimate for the posterior, but an interval in which we are confident that the posterior falls. We previously discussed the concept of confidence intervals in the context of frequentist inference, and you may remember that the interpretation of confidence intervals was particularly convoluted: It was an interval that, across repeated samples, will contain the true value of the parameter 95% of the time. What we really want is an interval in which we are confident that the true parameter falls, and Bayesian statistics can give us such an interval, which we call a *credible interval*.
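
One simple way to obtain such an interval from a discrete posterior is to accumulate posterior probability until 95% is covered. The sketch below reuses the hypothetical grid-based setup from above (64 of 100 responders, flat prior); the numbers are illustrative assumptions, not the book's.

```r
# Hypothetical grid-based posterior for p_respond
p_grid <- seq(0, 1, by = 0.01)
likelihood <- dbinom(64, size = 100, prob = p_grid)
posterior <- likelihood / sum(likelihood)   # with a flat prior, it cancels out

# 95% credible interval from the posterior's cumulative distribution
cdf <- cumsum(posterior)
lower <- p_grid[min(which(cdf >= 0.025))]
upper <- p_grid[min(which(cdf >= 0.975))]
c(lower, upper)   # roughly 0.54 to 0.73 for these made-up numbers
```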

-```{r echo=FALSE}
-# *TBD: USE POSTERIOR FROM ABOVE*
-```

+```{r ,echo=FALSE}
+# *TBD: USE POSTERIOR FROM ABOVE*
+```

The interpretation of this credible interval is much closer to what we had hoped we could get from a confidence interval (but could not): It tells us that there is a 95% probability that the value of $p_{respond}$ falls between these two values. Importantly, it shows that we have high confidence that $p_{respond} > 0.0$, meaning that the drug seems to have a positive effect.

@@ -302,8 +309,11 @@ In some cases the credible interval can be computed *numerically* based on a kno

In the previous example we used a *flat prior*, meaning that we didn't have any reason to believe that any particular value of $p_{respond}$ was more or less likely. However, let's say that we had instead started with some previous data: In a previous study, researchers had tested 20 people and found that 10 of them had responded positively. This would have led us to start with a prior belief that the treatment has an effect in 50% of people. We can do the same computation as above, but using the information from our previous study to inform our prior (see panel A in Figure \@ref(fig:posteriorDistPrior)).
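
A hedged sketch of how such an informative prior could be constructed on the same discrete grid: the earlier 10-out-of-20 result is turned into a prior (which, given a flat prior for that earlier study, is the same as using its posterior as the prior here). The new data (64 of 100 responders) are hypothetical numbers used only for illustration.

```r
p_grid <- seq(0, 1, by = 0.01)

# Prior based on a previous study in which 10 of 20 people responded
prior <- dbinom(10, size = 20, prob = p_grid)
prior <- prior / sum(prior)

# Likelihood of the new (hypothetical) data: 64 of 100 patients respond
likelihood <- dbinom(64, size = 100, prob = p_grid)

# Posterior combines the informative prior with the new likelihood
posterior <- likelihood * prior / sum(likelihood * prior)

# The most probable value is pulled toward 0.5 relative to the flat-prior analysis
p_grid[which.max(posterior)]
```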

-*MH:* i wonder what you’re doing here: is this the same thing as doing a bayesian inference assuming 10 / 20 data and using the posterior from that as the prior for this analysis? that is what woud normally be the straightfoward thing to do.
+```{r ,echo=FALSE}
+# *MH:* i wonder what you’re doing here: is this the same thing as doing a bayesian inference assuming 10 / 20 data and using the posterior from that as the prior for this analysis? that is what woud normally be the straightfoward thing to do.
+```

```{r echo=FALSE}
# compute likelihoods for data under all values of p(heads)
2 changes: 1 addition & 1 deletion 14-GeneralLinearModel.Rmd
@@ -559,7 +559,7 @@ kable(combined_rsquared, caption='Root mean squared error for model applied to o
```


-Here we see that whereas the model fit on the original data showed a very good fit (only off by a few pounds per individual), the same model does a much worse job of predicting the weight values for new children sampled from the same population (off by more than 25 pounds per individual). This happens because the model that we specified is quite complex, since it includes not just each of the individual variables, but also all possible combinations of them (i.e. their *interactions*), resulting in a model with 32 parameters. Since this is almost as many coefficients as there are data points (i.e., the heights of 48 children), the model *overfits* the data, just like the complex polynomial curve in our initial example of overfitting in Section \@ref(overfitting).
+Here we see that whereas the model fit on the original data showed a very good fit (only off by a few kg per individual), the same model does a much worse job of predicting the weight values for new children sampled from the same population (off by more than 25 kg per individual). This happens because the model that we specified is quite complex, since it includes not just each of the individual variables, but also all possible combinations of them (i.e. their *interactions*), resulting in a model with 32 parameters. Since this is almost as many coefficients as there are data points (i.e., the heights of 48 children), the model *overfits* the data, just like the complex polynomial curve in our initial example of overfitting in Section \@ref(overfitting).

Another way to see the effects of overfitting is to look at what happens if we randomly shuffle the values of the weight variable (shown in the second row of the table). Randomly shuffling the values should make it impossible to predict weight from the other variables, because they should have no systematic relationship. This shows us that even when there is no true relationship to be modeled (because shuffling should have obliterated the relationship), the complex model still shows a very low error in its predictions, because it fits the noise in the specific dataset. However, when that model is applied to a new dataset, we see that the error is much larger, as it should be.
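
A self-contained sketch of both effects using simulated data rather than the book's dataset (all variable names and numbers here are assumptions for illustration): a model with all interactions is fit to a small sample, then evaluated on new data and on a shuffled outcome.

```r
# Simulated illustration of overfitting -- not the book's actual data
set.seed(123)
n <- 48   # roughly as many observations as coefficients, as in the example above

make_data <- function(n) {
  d <- data.frame(
    x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
    x4 = rnorm(n), x5 = rnorm(n)
  )
  d$y <- 2 * d$x1 + rnorm(n)   # only x1 truly predicts the outcome
  d
}

train <- make_data(n)
test  <- make_data(n)

rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

# Complex model: all five variables plus all of their interactions (32 coefficients)
complex_model <- lm(y ~ x1 * x2 * x3 * x4 * x5, data = train)

rmse(complex_model, train)   # small: the model also fits the training noise
rmse(complex_model, test)    # much larger: the fit does not generalize

# Shuffling the outcome removes any true relationship, yet the complex model
# still achieves a deceptively low error on the data it was fit to
train_shuffled <- train
train_shuffled$y <- sample(train_shuffled$y)
shuffled_model <- lm(y ~ x1 * x2 * x3 * x4 * x5, data = train_shuffled)

rmse(shuffled_model, train_shuffled)   # low despite there being nothing to model
rmse(shuffled_model, test)             # large, as it should be
```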

