---
title: "Statistics Plots"
author: "Tom Fletcher"
date: "November 2, 2017"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = "")
```

We'll use the Old Faithful data that comes with R. Type just "faithful" and hit enter to see the raw data:
```{r}
faithful
```

To get information on this data set, see the help file by typing this at the R prompt:
```{r, eval=FALSE}
?faithful
```

Notice this is a "data frame" in R, which just means a table of data with named columns. To retrieve a single column, use the "$" operator:
```{r}
faithful$eruptions
```

Here's the other column:
```{r}
faithful$waiting
```
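
As a small sketch of other ways to inspect a data frame (these are base R functions, not something the steps above depend on), `str` summarizes the structure, `head` shows the first rows, and `[[` is an alternative to `$` for pulling out a column by name:
```{r}
str(faithful)                # column names, types, and the first few values
head(faithful, 5)            # first five rows of the table
nrow(faithful)               # number of observations
faithful[["waiting"]][1:5]   # same column as faithful$waiting, first five values
```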

The "summary" command in R will give you some basic statistics:
```{r}
summary(faithful)
```

The usual sample statistics are also available as commands:
```{r}
mean(faithful$eruptions)   # mean
median(faithful$eruptions) # median
var(faithful$eruptions)    # variance
sd(faithful$eruptions)     # standard deviation
```
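
To see what `var` and `sd` compute, here is a minimal sketch that reproduces them by hand. Note that R uses the $n - 1$ denominator for the sample variance. The variable names (`erupt`, `manual_var`, `manual_sd`) are just for illustration:
```{r}
erupt <- faithful$eruptions
n <- length(erupt)
manual_var <- sum((erupt - mean(erupt))^2) / (n - 1)  # sample variance, n - 1 denominator
manual_sd  <- sqrt(manual_var)                        # standard deviation is its square root
all.equal(manual_var, var(erupt))
all.equal(manual_sd, sd(erupt))
```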

As a first visualization, we could look at the histogram.
```{r}
hist(faithful$eruptions, main="Histogram of Old Faithful Eruption Time")
```

You can change the number of bins in the histogram with the `breaks` option.
```{r}
hist(faithful$eruptions, main="Histogram of Old Faithful Eruption Time", breaks=20)
```

The vertical axis on the histogram plots denotes the counts of how many data points land in a bin. We can convert this to an estimate of the probability density function by dividing each count by the total number of data points times the bin width, so that the areas of the bars sum to 1. This is done by using the `freq = FALSE` option.
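
As a quick check of that normalization, the sketch below asks `hist` for its bin counts without drawing anything (`plot = FALSE`) and rebuilds the density values by hand. The names `h`, `bin_widths`, and `manual_density` are only illustrative:
```{r}
h <- hist(faithful$eruptions, breaks = 20, plot = FALSE)
n_points <- sum(h$counts)              # total number of data points
bin_widths <- diff(h$breaks)           # width of each bin
manual_density <- h$counts / (n_points * bin_widths)
all.equal(manual_density, h$density)   # matches what freq = FALSE would plot
sum(h$density * bin_widths)            # total area under the histogram
```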

The `density()` command in R is another way to visualize the estimated pdf from a data set. Here we will plot it on top of our histogram to compare:
```{r}
hist(faithful$eruptions, main="Histogram of Old Faithful Eruption Time", breaks=20, freq=FALSE)
lines(density(faithful$eruptions, bw = 0.1), col = 'red', lwd = 2)
```
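
The `bw` argument sets the bandwidth of the smoothing kernel. As a rough illustration, the sketch below overlays a smaller and a larger bandwidth (the values 0.05 and 0.5 are arbitrary choices for comparison): a small bandwidth follows the data closely and looks wiggly, while a large one smooths over detail.
```{r}
hist(faithful$eruptions, main="Effect of Kernel Bandwidth", breaks=20, freq=FALSE)
lines(density(faithful$eruptions, bw = 0.05), col = 'blue', lwd = 2)      # under-smoothed
lines(density(faithful$eruptions, bw = 0.5), col = 'darkgreen', lwd = 2)  # over-smoothed
legend("top", legend = c("bw = 0.05", "bw = 0.5"),
       col = c("blue", "darkgreen"), lwd = 2)
```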

The empirical cdf is plotted using the `ecdf` command.
```{r}
plot(ecdf(faithful$eruptions), cex.points=0.5, main="Empirical CDF for Old Faithful Eruption Time")
```
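
The value returned by `ecdf` is itself a function, so it can be evaluated at any point. A small sketch (the name `Fn` and the cutoff of 3 minutes are just for illustration):
```{r}
Fn <- ecdf(faithful$eruptions)
Fn(3)                           # fraction of eruptions lasting at most 3 minutes
mean(faithful$eruptions <= 3)   # the same fraction computed directly
```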

Here's how to do box plots:
```{r}
boxplot(faithful$eruptions, main="Box Plot of Old Faithful Eruption Time")
```
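
To see the numbers behind the picture, `boxplot` can return its statistics without drawing (`plot = FALSE`). In this sketch, `b` is just an illustrative name. The middle three values are the lower hinge, median, and upper hinge, which agree with `fivenum`:
```{r}
fivenum(faithful$eruptions)   # min, lower hinge, median, upper hinge, max
b <- boxplot(faithful$eruptions, plot = FALSE)
b$stats                       # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out                         # any points drawn individually beyond the whiskers
```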

We can also look at joint statistics, such as the sample covariance and correlation.
```{r}
cov(faithful$waiting, faithful$eruptions)
cor(faithful$waiting, faithful$eruptions)
```
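
The two are related: the correlation is the covariance divided by the product of the two standard deviations. A minimal sketch checking this (the names `wait_t`, `erupt_t`, and `manual_cor` are illustrative):
```{r}
wait_t  <- faithful$waiting
erupt_t <- faithful$eruptions
manual_cor <- cov(wait_t, erupt_t) / (sd(wait_t) * sd(erupt_t))
all.equal(manual_cor, cor(wait_t, erupt_t))
```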

Scatter plots are useful for looking at two random variables, $X, Y$, and their relationships.
```{r}
plot(faithful$eruptions, faithful$waiting, main="Scatter Plot of Old Faithful Waiting vs Eruption Times")
```

This example is taken from Wikipedia. It is a box plot of Michelson's speed of light measurements, which come with R as the `morley` data set.
```{r}
morley$Expt <- factor(morley$Expt)
par(las=1, mar=c(5.1, 5.1, 2.1, 2.1))
boxplot(Speed ~ Expt, morley, xlab = "Experiment No.",
        ylab="Speed of light (km/s minus 299,000)")
abline(h=792.458, col="red")
text(3,792.458,"true\nspeed")
```

Another nice example of a box plot is at the bottom of the help page for boxplot (type `?boxplot` at the R prompt).