---
title: "HW 3 Help: Simulating Random Experiments in R"
author: "Tom Fletcher"
date: "September 14, 2017"
output: html_document
---

```{r,echo=FALSE}
# This code chunk sets the options for the document
# Here I'm just getting rid of the comment '##' that prints with the R output
knitr::opts_chunk$set(comment="")
```

# Simulating a Six-sided Die Roll
This document will show you how to simulate a random experiment in R. In the first example, we will simulate rolling a six-sided die. First, we need to set up our sample space $\Omega$, which is the set of all possible outcomes:
```{r}
(Omega = 1:6)
```
The R code `1:6` generates a sequence of integers from 1 to 6. Also, notice that by placing the above assignment in parentheses, it causes the result to be printed out (just to check that `Omega` is what we want it to be!)

Now let's simulate rolling a die $N = 1000$ times using the R function `sample`. Remember you can read the help for this R function by typing `?sample` at the R command prompt.
```{r}
N = 1000  ## number of die rolls
dieRolls = sample(Omega, N, replace = TRUE)
```

We need to specify `replace = TRUE` to indicate that we are sampling with replacement. This means when we roll a particular value, it is not removed from the sample space for the next roll (in other words, the value we roll is "replaced").

We can print all of the rolls that we simulated:
```{r}
dieRolls
```

Next, this code counts the number of sixes that we rolled:
```{r}
sum(dieRolls == 6)
```
Let's break this down. The code `dieRolls == 6` converts every number in the list into a binary value, `TRUE` if it is a 6, or `FALSE` if it is not a 6. Then the `sum` function adds up all of the binary values, with `TRUE` being a 1 and `FALSE` being a 0. So, in the end, each 6 in the original list will count +1 to the final sum, and we get the total count of how many sixes are in the list.

Finally, we can estimate the probability of rolling a six by dividing by the total number of rolls:
```{r}
sum(dieRolls == 6) / N
```

Of course, we know that the true probability of getting a 6 is 1 out of 6, so the number above should be close to
```{r}
1 / 6
```

# Simulating Rolling Two Dice
If we want to simulate rolling more than one die, we can use the function `replicate`. This will replicate the experiment a set number of times (just like the name of the function implies!) Again, call `?replicate` from the R command line to learn more about it.

Let's roll two dice 10 times. The inner `sample` function rolls a single die 10 times (similar to the above code), the outer `replicate` function repeats this experiment twice:
```{r}
(diceRolls = replicate(2, sample(Omega, 10, replace = TRUE)))
```

The result of this command is an array, the first column are the results of the first die being rolled (10 times), and the second column are the results of the second die being rolled (10 times). So, each row of the array represents a roll of both dice. Indexing into the array let's us extract one of the die rolls, for example, the 5th roll of the first die:
```{r}
diceRolls[5, 1]
```

We can also access an entire column of this array in R, for example, all 10 rolls of the first die:
```{r}
diceRolls[ , 1]
```

Finally, we can access a row of the array, for example, the 4th roll of both dice:
```{r}
diceRolls[4, ]
```

Now that we've seen a small example, let's roll $N = 1000$ times (without printing it out this time):
```{r}
diceRolls = replicate(2, sample(Omega, N, replace = TRUE))
```

Let's count the number of times we rolled doubles (that is, first die equal to second die).
```{r}
(numberOfDoubles = sum(diceRolls[,1] == diceRolls[,2]))
```

The proportion of times this occurred is
```{r}
numberOfDoubles / N
```

There are 6 ways to get doubles and 36 possible rolls of two dice. So, this should be close to 6/36 = 1/6 = `r 1/6`.

Next, let's estimate the probability of getting exactly the pair $(3, 4)$, in that order:
```{r}
sum(diceRolls[,1] == 3 & diceRolls[,2] == 4) / N
```
The `&` is a binary AND operation. Type `?"&"` at the R command prompt to see the help for other logic operators (for example, NOT is `!` and OR is `|`). This should be close to the true probability of 1 / 36 = `r 1 / 36`.

How about the probability that the dice we rolled sum to five?
```{r}
sum(diceRolls[,1] + diceRolls[,2] == 5) / N
```

There are four ways to roll a sum of 5 (verify this!), so our probability should be close to 4 / 36 = 1 / 9 = `r 1 / 9`.

Finally, let's try an example that is a bit more tricky. What is the probability that the first die value is larger than the second one?
```{r}
sum(diceRolls[,1] > diceRolls[,2]) / N
```
If you count the number of possible outcomes where the first die is greater than the second, you should get 15. So, the true probability is 15 / 36 = 5 / 12 = `r 5 / 12`.

# Problem from Homework 3

In the second problem of the homework, you are asked to simulate flipping 3 coins. There are at least two ways you can set this up (pick your favorite).

Option 1: use the `replicate` function as in the above "two dice" example to replicate a single coin flip three times. Here your sample space for a single coin would be:
```{r}
(Omega = c("H", "T"))
```

Option 2: since there are only 8 total possible outcomes of flipping three coins, and each of the 8 are equally likely, you can enumerate all of them as your sample space. Then there is no need to use the `replicate` function, you can proceed similar to the first example (single die). Here your sample space for all three coins would be:
```{r}
(Omega = c("HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"))
```

# Complicated Problem (optional)
Consider the following problem: Say you flip a coin 100 times. What is the probability that the number of heads you flipped is a prime number?

This is a very difficult problem to solve with pencil and paper! However, we can very easy simulate this experiment and estimate what the probabilty is:
```{r}
# This function tests if a number x between 1 and 100 is prime
isPrime = function(x)
{
  if(x == 1)
    return(FALSE)
  else if(x == 2 | x == 3 | x == 5 | x == 7)
    return(TRUE)
  else if(x %% 2 == 0 | x %% 3 == 0 | x %% 5 == 0 | x %% 7 == 0)
    return(FALSE)
  else
    return(TRUE)
}

N = 10000
primeNumberOfHeads = replicate(N, isPrime(sum(sample(c(0,1), 100, replace=TRUE))))
sum(primeNumberOfHeads) / N
```
The true probability is 0.2051 (this can be painstakingly calculated by brute force!)