--- title: "Computing Confidence Intervals in R" author: "Tom Fletcher" date: "November 16, 2017" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Gaussian $\mu$ parameter, assuming known variance ### Snowfall example The first example in the notes gives information about an experiment where you measure the year's snowfall at 40 locations along the Wasatch front. We model each measurement as a normal (a.k.a., Gaussian) random variable, $X_1, X_2, \ldots, X_{40} \sim N(\mu, \sigma^2)$. You are told that the variance of snowfall is assumed to be *known in advance* as $\sigma^2 = 36$ inches$^2$. You are also told that the sample mean of your measurements was $\bar{x} = 620$ inches. Here is a 95\% confidence interval in R. Notice we don't need the original data, just the sample mean and known variance. ```{r} ## Information given in problem xBar = 620 sigma = 6 ## standard deviation = sqrt(variance) n = 40 lower = xBar - qnorm(0.975) * sigma / sqrt(n) upper = xBar + qnorm(0.975) * sigma / sqrt(n) ``` The 95\% confidence interval is (`r lower`, `r upper`). ## Gaussian $\mu$ parameter, using estimated variance ### Snow fall example As discussed in the lecture, knowing the true variance $\sigma^2$ is not really realistic. Let's take the snowfall example again, but this time use the sample variance (or really, sample standard deviation) computed from the data. This means we need to use the Student $t$ distribution. ```{r} ## Information given in problem xBar = 620 s = sqrt(34) ## sample standard deviation = sqrt(sample variance) n = 40 lower = xBar - qt(0.975, df = n - 1) * s / sqrt(n) upper = xBar + qt(0.975, df = n - 1) * s / sqrt(n) ``` The 95\% confidence interval for the mean is now (`r lower`, `r upper`). ### Tree ring data example Finally, let's do one more example from raw data. Recall the tree ring data. Here is how we'd get a 95\% confidence interval of the mean, using the sample standard deviation and $t$ distribution. ```{r} xBar = mean(treering) s = sd(treering) n = length(treering) lower = xBar - qt(0.975, df = n - 1) * s / sqrt(n) upper = xBar + qt(0.975, df = n - 1) * s / sqrt(n) ``` The 95\% confidence interval for the mean is (`r lower`, `r upper`). Let's see the effect if we want to be "more confident", say a 99\% confidence interval. The only thing we need to change is the computation of the critical value (the call to `qt`). ```{r} lower = xBar - qt(0.995, df = n - 1) * s / sqrt(n) upper = xBar + qt(0.995, df = n - 1) * s / sqrt(n) ``` The 99\% confidence interval for the mean is (`r lower`, `r upper`). Notice that this is *wider* than the 95\% interval.