--- title: "The Gaussian Distribution" author: "Tom Fletcher" date: "September 28, 2017" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, comments = "") ``` This is an example of how to plot the Gaussian distribution in R. Recall the notation for a Gaussian (a.k.a. Normal) random variable is $X \sim N(\mu, \sigma^2)$, which has parameters $\mu \in \mathbb{R}$, the _mean_, and $\sigma^2 > 0$, the _variance_. The probability density function (pdf) for a normal random variable is given by $$ f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right). $$ First, for plotting functions of $x$, create a sequence of $x$ values in the range -4 to 4 in increments of 0.01. ```{r} x = seq(-4, 4, 0.01) ``` Next, the $y$ values correspond to the Gaussian pdf, with $\mu = 0$ and $\sigma = 1$. This uses the built-in `dnorm` command in R. ```{r} y = dnorm(x, mean = 0, sd = 1) ``` An equivalent line of code would be to type in the formula yourself: ```{r} y = (1 / sqrt(2 * pi)) * exp(-0.5 * x^2) ``` Now plot the Gaussian pdf as a continuous curve: ```{r} plot(x, y, type = 'l', lwd = 2, col = 'red', main = "Gaussian pdf") ``` Note the effect of changing the mean ($\mu$). We can plot these on top of the previous plot by using the `lines` command. ```{r} y2 = dnorm(x, mean = -1, sd = 1) y3 = dnorm(x, mean = 2, sd = 1) plot(x, y, type = 'l', lwd = 2, col = 'red', main = "Gaussian pdf") lines(x, y2, col = 'black', lwd = 2) lines(x, y3, col = 'blue', lwd = 2) legend("topright", lwd = 2, c("mu = 0", "mu = -1", "mu = 2"), col = c("red", "black", "blue")) ``` Now let's check the effect of changing the standard deviation (\sigma). ```{r} y2 = dnorm(x, mean = 0, sd = 2) y3 = dnorm(x, mean = 0, sd = 3) plot(x, y, type = 'l', lwd = 2, col = 'red', main = "Gaussian pdf") lines(x, y2, col = 'black', lwd = 2) lines(x, y3, col = 'blue', lwd = 2) legend("topright", lwd = 2, c("sigma = 1", "sigma = 2", "sigma = 3"), col = c("red", "black", "blue")) ``` The `pnorm` function will give you the cdf. So, `pnorm(a)` returns $P(X \leq a)$. You can also specify the mean and standard deviation, just like `dnorm`. Examples: ```{r} pnorm(0, mean = 0, sd = 1) pnorm(1, mean = 0, sd = 1) ``` Here is a plot of the cdf: ```{r} plot(x, pnorm(x, mean = 0, sd = 1), type = 'l', lwd = 2, main = "Gaussian cdf") ``` To get the probability in a range $[a, b]$, that is, $P(a \leq X \leq b)$, you would use the `pnorm(b) - pnorm(a)` Example: probability (area) between $[-1, 1]$: ```{r} pnorm(1, mean = 0, sd = 1) - pnorm(-1, mean = 0, sd = 1) ``` Example: area between $[0, 2]$: ```{r} pnorm(2, mean = 0, sd = 1) - pnorm(-2, mean = 0, sd = 1) ``` Here are some fancy plots for showing the shaded-in area under the pdf curve. ```{r} # First the area between (-infinity, 1] x = seq(-4, 4, 0.01) y = dnorm(x) plot(x, y, type = 'l', lwd = 2, col = 'red', main = paste("Area under a Gaussian pdf (", expression(infinity), ", 1)")) x = seq(-4, 1, 0.01) y = dnorm(x) polygon(c(-4, x, 1), c(0, y, 0), col = "grey") text(-0.1, 0.15, format(pnorm(1), digits = 3)) # These next plots are the 1, 2, and 3 standard deviation rules # Sigma = 1 x = seq(-4, 4, 0.01) y = dnorm(x) plot(x, y, type = 'l', lwd = 2, col = 'red', main = "Area under a Gaussian pdf [-1, 1]") x = seq(-1, 1, 0.01) y = dnorm(x) polygon(c(-1, x, 1), c(0, y, 0), col = "grey") text(0, 0.15, format(pnorm(1) - pnorm(-1), digits = 3)) # Sigma = 2 x = seq(-4, 4, 0.01) y = dnorm(x) plot(x, y, type = 'l', lwd = 2, col = 'red', main = "Area under a Gaussian pdf [-2, 2]") x = seq(-2, 2, 0.01) y = dnorm(x) polygon(c(-2, x, 2), c(0, y, 0), col = "grey") text(0, 0.15, format(pnorm(2) - pnorm(-2), digits = 3)) # Sigma = 3 x = seq(-4, 4, 0.01) y = dnorm(x) plot(x, y, type = 'l', lwd = 2, col = 'red', main = "Area under a Gaussian pdf [-3, 3]") x = seq(-3, 3, 0.01) y = dnorm(x) polygon(c(-3, x, 3), c(0, y, 0), col = "grey") text(0, 0.15, format(pnorm(3) - pnorm(-3), digits = 3)) ```