From 0b8b9ec1f271bd731ec3bae38b8876a213c906f6 Mon Sep 17 00:00:00 2001 From: Brian Caffo Date: Fri, 23 May 2014 00:12:38 -0400 Subject: [PATCH 1/2] Added hw3 --- 06_StatisticalInference/homework/hw3.Rmd | 207 ++++++++++ 06_StatisticalInference/homework/hw3.html | 480 ++++++++++++++++++++++ 06_StatisticalInference/homework/hw3.md | 211 ++++++++++ 3 files changed, 898 insertions(+) create mode 100644 06_StatisticalInference/homework/hw3.Rmd create mode 100644 06_StatisticalInference/homework/hw3.html create mode 100644 06_StatisticalInference/homework/hw3.md diff --git a/06_StatisticalInference/homework/hw3.Rmd b/06_StatisticalInference/homework/hw3.Rmd new file mode 100644 index 000000000..25fdc736f --- /dev/null +++ b/06_StatisticalInference/homework/hw3.Rmd @@ -0,0 +1,207 @@ +--- +title : Homework 3 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- +```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F, results='hide'} +# make this an external chunk that can be included in any file +library(knitr) +options(width = 100) +opts_chunk$set(message = F, error = F, warning = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache.path = '.cache/', fig.path = 'fig/') + +options(xtable.type = 'html') +knit_hooks$set(inline = function(x) { + if(is.numeric(x)) { + round(x, getOption('digits')) + } else { + paste(as.character(x), collapse = ', ') + } +}) +knit_hooks$set(plot = knitr:::hook_plot_html) +``` + +## About these slides +- These are some practice problems for Statistical Inference Quiz 3 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this 
with pull requests here +(https://github.com/bcaffo/courses) + + + +--- &multitext +Load the data set `mtcars` in the `datasets` R package. Calculate a +95% confidence interval to the nearest MPG. + +1. What is the lower endpoint of the interval? +2. What is the upper endpoint of the interval? + +*** .hint +Do `library(datasets)` and then `data(mtcars)` to get the data. +Consider `t.test` for calculations. You may have to install +the datasets package. + + +*** .explanation +```{r} +library(datasets); data(mtcars) +round(t.test(mtcars$mpg)$conf.int) +``` + +`r round(min(t.test(mtcars$mpg)$conf.int))` +`r round(max(t.test(mtcars$mpg)$conf.int))` + +--- &multitext +Suppose that data of 9 paired differences has a standard error of $1$, what value would the average difference have to be to have the lower endpoint of a 95% +students t confidence interval touch zero? + +1. Give the number here to two decimal places + +*** .hint +The t interval is $\bar x t_{.95, 8}\pm s /sqrt{n}$ + +*** .explanation +`r round(qt(.95, df = 3) * 1 / 3, 2)` + +We want $\bar x = t_{.95} s / sqrt{n}$ +```{r} +round(qt(.95, df = 3) * 1 / 3, 2) +``` + + +--- &radio +An independent group Student's T interval is used over +a paired T interval when: + +1. The observations are paired between the groups. +2. _The observations within the groups are natually assumed to be statistically independent_ +3. As long as you do it correctly, either is fine. +4. More details are needed to answer this question + +*** .hint +A paired interval is for paired observations. + +*** .explanation +If the groups are independent is the correct interval. + + +--- &multitext +Consider the `mtcars` dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order of 4 - 6) +assume a constant variance. + +1. What is the lower endpoint of the interval to 1 decimal place? +2. What is the upper endpoint of the interval to 1 decimal place? 
+ +*** .hint +Use `t.test` with `var.equal=TRUE` + +*** .explanation + +```{r} +m4 <- mtcars$mpg[mtcars$cyl == 4] +m6 <- mtcars$mpg[mtcars$cyl == 6] +#this does 4 - 6 +confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int) +``` + +`r round(min(confint), 1)` +`r round(max(confint), 1)` + + +--- &radio +If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do? + +1. _Make your interval as wide as possible_ +2. Make your interval as small as possible +3. Call the authorities + +*** .hint +C'mon. You don't need a hint + +*** .explanation +This is just an example of what happens to confidence intervals as you +increas the confidence level. You want to be quite sure in your interval (i.e. +have a large confidence level) and so you would increase the interval's width + +--- &radio + +Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude? + +1. The interval is above zero, suggesting 6 is better than 4 in the terms of MPG +2. _The interval is above zero, suggesting 4 is better than 6 in the terms of MPG_ +3. The interval does not tell you anything about the hypothesis test; you have to do the test. +4. The interval contains 0 suggesting no difference. + +*** .hint +Refer back to the problem, consider the implications of the interval being +larger than 0, double check the order in which things were subtracted and +make sure the results make sense in the context of the problem. + +*** .explanation +The interval was conducted subtracting 4 - 6 and was entirely above zero. + +--- &multitext +Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. 
The average difference from follow-up to the baseline (followup - baseline) was ???3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences was 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. Does the change in BMI over the four week period appear to differ between the treated and placebo groups? + +1. Calculate the pooled variance estimate to 2 decimal places + + +*** .hint +The sample sizes are equal, so the pooled variance is the average of the +individual variances + + +*** .explanation +`r round(min(confint), 1)` +```{r} +n1 <- n2 <- 9 +x1 <- -3 ##treated +x2 <- 1 ##placebo +s1 <- 1.5 ##treated +s2 <- 1.8 ##placebo +spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2) +``` +`r round(spsq, 2)` + + +--- &radio + +For Binomial data the maximum likelihood estimate for the probability of +a success is + +1. _The proportion of successes_ +2. The proportion of failures +3. A shrunken version of the proportion of successes +4. A shrunken version of the proportion of failures + +*** .hint +Look back at the notes about likelihood. + +*** .explanation +The MLE for binomial data is always the proportion of successes. + +--- &radio + +Bayesian inference requires + +1. A type I error rate +2. Setting your confidence level +3. _Assigning a prior probability distribution_ +4. Evaluating frequency error rates + +*** .explanation +All of the other answers discuss frequentist concepts. All Bayesian analyses requiring setting a prior. + + diff --git a/06_StatisticalInference/homework/hw3.html b/06_StatisticalInference/homework/hw3.html new file mode 100644 index 000000000..99aca5837 --- /dev/null +++ b/06_StatisticalInference/homework/hw3.html @@ -0,0 +1,480 @@ + + + + Homework 3 for Stat Inference + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

Homework 3 for Stat Inference

+

Extra problems for Stat Inference

+

Brian Caffo
Johns Hopkins Bloomberg School of Public Health

+
+
+
+ + + + +
+

About these slides

+
+
+
    +
  • These are some practice problems for Statistical Inference Quiz 3
  • +
  • They were created using slidify interactive, which you will learn about in +Creating Data Products
  • +
  • Please help improve this with pull requests here +(https://github.com/bcaffo/courses)
  • +
+ +
+ +
+ + +
+ +
+

Load the data set mtcars in the datasets R package. Calculate a +95% confidence interval for the mean MPG, rounded to the nearest MPG.

+ +
    +
  1. What is the lower endpoint of the interval?
  2. +
  3. What is the upper endpoint of the interval?
  4. +
+ + + + + + +
+

Do library(datasets) and then data(mtcars) to get the data. +Consider t.test for calculations. The datasets package ships +with base R, so no separate installation should be needed.

+ +
+
+
library(datasets); data(mtcars)
+round(t.test(mtcars$mpg)$conf.int)
+
+ +
[1] 18 22
+attr(,"conf.level")
+[1] 0.95
+
+ +

18 +22
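
As a cross-check, the same interval can be built by hand from the formula \(\bar x \pm t_{.975, n-1} \, s / \sqrt{n}\). This is only a sketch; the names `mpg` and `manual` are illustrative and do not appear in the original solution.

```r
library(datasets); data(mtcars)
mpg <- mtcars$mpg
n <- length(mpg)
# two-sided 95% interval: sample mean +/- t quantile * standard error
manual <- mean(mpg) + c(-1, 1) * qt(.975, df = n - 1) * sd(mpg) / sqrt(n)
round(manual)
```

The endpoints should agree with `t.test(mtcars$mpg)$conf.int` before rounding.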

+ +
+
+
+ +
+ + +
+ +
+

Suppose that the standard deviation of 9 paired differences is \(1\). What value would the average difference have to be for the lower endpoint of a 95% +Student's t confidence interval to touch zero?

+ +
    +
  1. Give the number here to two decimal places
  2. +
+ + + + + + +
+

The t interval is \(\bar x \pm t_{.975, 8} \, s / \sqrt{n}\)

+ +
+
+

0.77

+ 

We want \(\bar x = t_{.975, 8} \, s / \sqrt{n}\), using the 0.975 quantile with \(n - 1 = 8\) degrees of freedom

+ 
round(qt(.975, df = 8) * 1 / 3, 2)
+
+ 
[1] 0.77
+
+ +
+
+
+ +
+ + +
+ +
+

An independent group Student's T interval is used over +a paired T interval when:

+ +
    +
  1. The observations are paired between the groups.
  2. +
  3. The observations within the groups are naturally assumed to be statistically independent
  4. +
  5. As long as you do it correctly, either is fine.
  6. +
  7. More details are needed to answer this question
  8. +
+ + + + + + +
+

A paired interval is for paired observations.

+ +
+
+

If the groups are independent, the independent group interval is the correct one.
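
To make the distinction concrete, here is a minimal sketch using the `sleep` data from the `datasets` package, in which the same 10 subjects were measured under two drugs. The data set choice and the names `g1`, `g2`, `paired` and `indep` are assumptions for illustration, not part of the original homework.

```r
library(datasets); data(sleep)
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
# paired interval: appropriate here, since each subject appears in both groups
paired <- t.test(g2, g1, paired = TRUE)
# independent group interval: would only be appropriate if g1 and g2
# came from different, unrelated subjects
indep <- t.test(g2, g1, paired = FALSE, var.equal = TRUE)
c(paired_df = unname(paired$parameter), indep_df = unname(indep$parameter))
```

The paired analysis works with the 10 differences (n - 1 degrees of freedom), while the pooled independent analysis treats the 20 observations as two separate samples (n1 + n2 - 2 degrees of freedom).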

+ +
+
+
+ +
+ + +
+ +
+

Consider the mtcars dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order of 4 - 6), +assuming a constant variance.

+ +
    +
  1. What is the lower endpoint of the interval to 1 decimal place?
  2. +
  3. What is the upper endpoint of the interval to 1 decimal place?
  4. +
+ + + + + + +
+

Use t.test with var.equal=TRUE

+ +
+
+
m4 <- mtcars$mpg[mtcars$cyl == 4]
+m6 <- mtcars$mpg[mtcars$cyl == 6]
+#this does 4 - 6
+confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int)
+
+ +

3.2 +10.7
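
The pooled interval can also be sketched by hand, which shows what `var.equal = TRUE` is doing. The names `sp2`, `se` and `byhand` are illustrative only.

```r
library(datasets); data(mtcars)
m4 <- mtcars$mpg[mtcars$cyl == 4]
m6 <- mtcars$mpg[mtcars$cyl == 6]
n4 <- length(m4); n6 <- length(m6)
# pooled variance: degrees-of-freedom weighted average of the sample variances
sp2 <- ((n4 - 1) * var(m4) + (n6 - 1) * var(m6)) / (n4 + n6 - 2)
# standard error of the difference in means under a common variance
se <- sqrt(sp2 * (1 / n4 + 1 / n6))
# 4 - 6, matching the order used in t.test(m4, m6, ...)
byhand <- mean(m4) - mean(m6) + c(-1, 1) * qt(.975, df = n4 + n6 - 2) * se
round(byhand, 1)
```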

+ +
+
+
+ +
+ + +
+ +
+

If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do?

+ +
    +
  1. Make your interval as wide as possible
  2. +
  3. Make your interval as small as possible
  4. +
  5. Call the authorities
  6. +
+ + + + + + +
+

C'mon. You don't need a hint

+ +
+
+

This is just an example of what happens to confidence intervals as you +increase the confidence level. You want to be quite sure of your interval (i.e. +have a large confidence level), and so you would increase the interval's width.
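
A small sketch of this effect, reusing the mtcars MPG data from the earlier problem. The confidence levels chosen and the names `conf_levels` and `widths` are assumptions for illustration.

```r
library(datasets); data(mtcars)
conf_levels <- c(.90, .95, .99)
# width of the t interval for mean MPG at each confidence level
widths <- sapply(conf_levels, function(cl) {
  diff(t.test(mtcars$mpg, conf.level = cl)$conf.int)
})
names(widths) <- conf_levels
widths
```

The widths grow as the confidence level grows, since a larger t quantile multiplies the same standard error.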

+ +
+
+
+ +
+ + +
+ +
+

Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude?

+ +
    +
  1. The interval is above zero, suggesting 6 is better than 4 in terms of MPG
  2. +
  3. The interval is above zero, suggesting 4 is better than 6 in terms of MPG
  4. +
  5. The interval does not tell you anything about the hypothesis test; you have to do the test.
  6. +
  7. The interval contains 0 suggesting no difference.
  8. +
+ + + + + + +
+

Refer back to the problem, consider the implications of the interval being +larger than 0, double check the order in which things were subtracted and +make sure the results make sense in the context of the problem.

+ +
+
+

The interval was constructed by subtracting in the order 4 - 6 and was entirely above zero.

+ +
+
+
+ +
+ + +
+ +
+

Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (follow-up - baseline) was -3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. Does the change in BMI over the four week period appear to differ between the treated and placebo groups?

+ +
    +
  1. Calculate the pooled variance estimate to 2 decimal places
  2. +
+ + + + + + +
+

The sample sizes are equal, so the pooled variance is the average of the +individual variances

+ +
+
+


+ +
n1 <- n2 <- 9
+x1 <- -3  ##treated
+x2 <- 1  ##placebo
+s1 <- 1.5  ##treated
+s2 <- 1.8  ##placebo
+spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
+
+ +

2.75
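
The hint can be checked directly, and the pooled variance can then be carried through to an interval that addresses the question in the problem statement. This is a sketch under the equal variance assumption; the name `interval` is illustrative and the interval itself is not part of the original answer.

```r
n1 <- n2 <- 9
x1 <- -3   ## treated
x2 <- 1    ## placebo
s1 <- 1.5  ## treated
s2 <- 1.8  ## placebo
spsq <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
# with equal sample sizes the pooled variance is the plain average
all.equal(spsq, mean(c(s1^2, s2^2)))
# 95% pooled t interval for treated minus placebo
interval <- x1 - x2 +
  c(-1, 1) * qt(.975, df = n1 + n2 - 2) * sqrt(spsq * (1 / n1 + 1 / n2))
interval
```

If the whole interval lies below zero, that would suggest a larger BMI reduction in the treated group than in the placebo group.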

+ +
+
+
+ +
+ + +
+ +
+

For Binomial data the maximum likelihood estimate for the probability of +a success is

+ +
    +
  1. The proportion of successes
  2. +
  3. The proportion of failures
  4. +
  5. A shrunken version of the proportion of successes
  6. +
  7. A shrunken version of the proportion of failures
  8. +
+ + + + + + +
+

Look back at the notes about likelihood.

+ +
+
+

The MLE for binomial data is always the proportion of successes.
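
This can be checked numerically by maximizing the binomial likelihood over the success probability. The data (3 successes in 10 trials) and the names `lik` and `mle` are hypothetical, chosen only for illustration.

```r
x <- 3; n <- 10  # hypothetical data: 3 successes in 10 trials
# binomial likelihood as a function of the success probability p
lik <- function(p) dbinom(x, size = n, prob = p)
# numerical maximizer over (0, 1)
mle <- optimize(lik, interval = c(0, 1), maximum = TRUE)$maximum
c(numerical = mle, proportion = x / n)
```

The numerical maximizer should match `x / n` up to the tolerance of `optimize`.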

+ +
+
+
+ +
+ + +
+ +
+

Bayesian inference requires

+ +
    +
  1. A type I error rate
  2. +
  3. Setting your confidence level
  4. +
  5. Assigning a prior probability distribution
  6. +
  7. Evaluating frequency error rates
  8. +
+ + + + + + +
+

All of the other answers discuss frequentist concepts. All Bayesian analyses require setting a prior.
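
As a minimal sketch of what setting a prior looks like in practice, consider binomial data with a Beta prior, for which the posterior is again a Beta distribution. The data (3 successes in 10 trials) and the Beta(2, 2) prior are hypothetical choices, not taken from the homework.

```r
x <- 3; n <- 10  # hypothetical data: 3 successes in 10 trials
a <- 2; b <- 2   # hypothetical Beta(2, 2) prior on the success probability
# conjugate update: posterior is Beta(a + x, b + n - x)
post_a <- a + x
post_b <- b + n - x
post_mean <- post_a / (post_a + post_b)
# equal-tailed 95% credible interval from the posterior quantiles
cred <- qbeta(c(.025, .975), post_a, post_b)
c(prior_mean = a / (a + b), mle = x / n, posterior_mean = post_mean)
```

The posterior mean falls between the prior mean and the proportion of successes, which is the "shrunken" estimate mentioned in the previous question.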

+ +
+
+
+ +
+ + +
+ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/06_StatisticalInference/homework/hw3.md b/06_StatisticalInference/homework/hw3.md new file mode 100644 index 000000000..42919061a --- /dev/null +++ b/06_StatisticalInference/homework/hw3.md @@ -0,0 +1,211 @@ +--- +title : Homework 3 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- + + + +## About these slides +- These are some practice problems for Statistical Inference Quiz 3 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses) + + + +--- &multitext +Load the data set `mtcars` in the `datasets` R package. Calculate a +95% confidence interval to the nearest MPG. + +1. What is the lower endpoint of the interval? +2. What is the upper endpoint of the interval? + +*** .hint +Do `library(datasets)` and then `data(mtcars)` to get the data. +Consider `t.test` for calculations. You may have to install +the datasets package. + + +*** .explanation + +```r +library(datasets); data(mtcars) +round(t.test(mtcars$mpg)$conf.int) +``` + +``` +[1] 18 22 +attr(,"conf.level") +[1] 0.95 +``` + + +18 +22 + +--- &multitext +Suppose that data of 9 paired differences has a standard error of $1$, what value would the average difference have to be to have the lower endpoint of a 95% +students t confidence interval touch zero? + +1. 
Give the number here to two decimal places + +*** .hint +The t interval is $\bar x t_{.95, 8}\pm s /sqrt{n}$ + +*** .explanation +0.78 + +We want $\bar x = t_{.95} s / sqrt{n}$ + +```r +round(qt(.95, df = 3) * 1 / 3, 2) +``` + +``` +[1] 0.78 +``` + + + +--- &radio +An independent group Student's T interval is used over +a paired T interval when: + +1. The observations are paired between the groups. +2. _The observations within the groups are natually assumed to be statistically independent_ +3. As long as you do it correctly, either is fine. +4. More details are needed to answer this question + +*** .hint +A paired interval is for paired observations. + +*** .explanation +If the groups are independent is the correct interval. + + +--- &multitext +Consider the `mtcars` dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order of 4 - 6) +assume a constant variance. + +1. What is the lower endpoint of the interval to 1 decimal place? +2. What is the upper endpoint of the interval to 1 decimal place? + +*** .hint +Use `t.test` with `var.equal=TRUE` + +*** .explanation + + +```r +m4 <- mtcars$mpg[mtcars$cyl == 4] +m6 <- mtcars$mpg[mtcars$cyl == 6] +#this does 4 - 6 +confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int) +``` + + +3.2 +10.7 + + +--- &radio +If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do? + +1. _Make your interval as wide as possible_ +2. Make your interval as small as possible +3. Call the authorities + +*** .hint +C'mon. You don't need a hint + +*** .explanation +This is just an example of what happens to confidence intervals as you +increas the confidence level. You want to be quite sure in your interval (i.e. +have a large confidence level) and so you would increase the interval's width + +--- &radio + +Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude? 
+ +1. The interval is above zero, suggesting 6 is better than 4 in the terms of MPG +2. _The interval is above zero, suggesting 4 is better than 6 in the terms of MPG_ +3. The interval does not tell you anything about the hypothesis test; you have to do the test. +4. The interval contains 0 suggesting no difference. + +*** .hint +Refer back to the problem, consider the implications of the interval being +larger than 0, double check the order in which things were subtracted and +make sure the results make sense in the context of the problem. + +*** .explanation +The interval was conducted subtracting 4 - 6 and was entirely above zero. + +--- &multitext +Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was ???3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences was 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. Does the change in BMI over the four week period appear to differ between the treated and placebo groups? + +1. Calculate the pooled variance estimate to 2 decimal places + + +*** .hint +The sample sizes are equal, so the pooled variance is the average of the +individual variances + + +*** .explanation +3.2 + +```r +n1 <- n2 <- 9 +x1 <- -3 ##treated +x2 <- 1 ##placebo +s1 <- 1.5 ##treated +s2 <- 1.8 ##placebo +spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2) +``` + +2.75 + + +--- &radio + +For Binomial data the maximum likelihood estimate for the probability of +a success is + +1. _The proportion of successes_ +2. The proportion of failures +3. A shrunken version of the proportion of successes +4. A shrunken version of the proportion of failures + +*** .hint +Look back at the notes about likelihood. 
+ +*** .explanation +The MLE for binomial data is always the proportion of successes. + +--- &radio + +Bayesian inference requires + +1. A type I error rate +2. Setting your confidence level +3. _Assigning a prior probability distribution_ +4. Evaluating frequency error rates + +*** .explanation +All of the other answers discuss frequentist concepts. All Bayesian analyses requiring setting a prior. + + From d7327b991ddc24cc4d0153164e270a7b1f6a45ff Mon Sep 17 00:00:00 2001 From: Brian Caffo Date: Fri, 23 May 2014 00:22:29 -0400 Subject: [PATCH 2/2] Added a fourth hw --- 06_StatisticalInference/homework/hw4.Rmd | 37 +++++++ 06_StatisticalInference/homework/hw4.html | 112 ++++++++++++++++++++++ 06_StatisticalInference/homework/hw4.md | 23 +++++ 3 files changed, 172 insertions(+) create mode 100644 06_StatisticalInference/homework/hw4.Rmd create mode 100644 06_StatisticalInference/homework/hw4.html create mode 100644 06_StatisticalInference/homework/hw4.md diff --git a/06_StatisticalInference/homework/hw4.Rmd b/06_StatisticalInference/homework/hw4.Rmd new file mode 100644 index 000000000..bf5a8da3b --- /dev/null +++ b/06_StatisticalInference/homework/hw4.Rmd @@ -0,0 +1,37 @@ +--- +title : Homework 4 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- +```{r setup, cache = F, echo = F, message = F, warning = F, tidy = F, results='hide'} +# make this an external chunk that can be included in any file +library(knitr) +options(width = 100) +opts_chunk$set(message = F, error = F, warning = F, comment = NA, fig.align = 'center', dpi = 100, tidy = F, cache.path = '.cache/', fig.path = 'fig/') + +options(xtable.type = 'html') 
+knit_hooks$set(inline = function(x) { + if(is.numeric(x)) { + round(x, getOption('digits')) + } else { + paste(as.character(x), collapse = ', ') + } +}) +knit_hooks$set(plot = knitr:::hook_plot_html) +``` + +## About these slides +- These are some practice problems for Statistical Inference Quiz 4 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses) diff --git a/06_StatisticalInference/homework/hw4.html b/06_StatisticalInference/homework/hw4.html new file mode 100644 index 000000000..565621a6d --- /dev/null +++ b/06_StatisticalInference/homework/hw4.html @@ -0,0 +1,112 @@ + + + + Homework 4 for Stat Inference + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

Homework 4 for Stat Inference

+

Extra problems for Stat Inference

+

Brian Caffo
Johns Hopkins Bloomberg School of Public Health

+
+
+
+ + + + +
+

About these slides

+
+
+
    +
  • These are some practice problems for Statistical Inference Quiz 4
  • +
  • They were created using slidify interactive, which you will learn about in +Creating Data Products
  • +
  • Please help improve this with pull requests here +(https://github.com/bcaffo/courses)
  • +
+ +
+ +
+ + +
+ + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/06_StatisticalInference/homework/hw4.md b/06_StatisticalInference/homework/hw4.md new file mode 100644 index 000000000..a22e64543 --- /dev/null +++ b/06_StatisticalInference/homework/hw4.md @@ -0,0 +1,23 @@ +--- +title : Homework 4 for Stat Inference +subtitle : Extra problems for Stat Inference +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- + + + +## About these slides +- These are some practice problems for Statistical Inference Quiz 4 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses)