Random Variables
You may have noticed that the discussion about probability spaces was a little too verbose for comfort. Having to say stuff like "the probability of the event that the die is either odd or below three" is quite tiring, and will only become more cumbersome as we deal with more complex probabilities.
To streamline the discussion, we introduce the notation of random variables. A random variable is like a number, except we don't know exactly what it is, only how it distributes.
So if I start my discussion with "let $X$ be a fair die", anyone who has studied a bit of probability will interpret this as "$X$ is a random variable that can take an integer value between $1$ and $6$, all of which are equally probable". Then, instead of the cumbersome "the probability that the result of the die toss is less than three", we can just write $\Pr[X < 3]$.
We can also use random variables to describe events. For example, the event "the result of throwing the die was even" can be written as $X \in \{2, 4, 6\}$, or more conventionally as $X \equiv 0 \pmod 2$, or even simply as "$X$ is even". Whatever is most comfortable for you.
This might seem like a cosmetic convenience, but defining random variables will actually go a long way in helping us understand complex situations. For all their simplicity, they turn out to be a very powerful abstraction that allows us to focus on how things distribute while disregarding what they are.
Discrete Random Variables
We will introduce the definition of a random variable specifically for discrete probability spaces.
Recall that such a space is defined over a sample space $\Omega$ of atomic events. A random variable is simply a function $X : \Omega \to \mathbb{R}$ that assigns to each atomic event a real number. What does that number represent? Whatever we want!
In the die toss case (where $\Omega = \{1, 2, 3, 4, 5, 6\}$), if we care about the result we can simply set $X(\omega) = \omega$. But we can do a whole lot of other stuff. For example, we can consider $\Omega$ to be the set of all possible results of flipping a coin $n$ times. Each atomic event corresponds to one of the $2^n$ possible lists of length $n$ of "heads" and "tails". If we only care about the number of heads, we can define $X(\omega)$ to be the number of heads in $\omega$ (regardless of where they appear in the list). Then a complex event such as "between 27 and 34 flips landed on heads" admits the succinct description $27 \le X \le 34$.
We can define variables as complex as the imagination lets us. For example, $X(\omega)$ could be the length of the longest succession of identical results in $\omega$, if that's what we're interested in.
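To make this concrete, here is a minimal Python sketch (not part of the original text; the names `num_heads` and `longest_run` and the choice $n = 10$ are ours) that treats random variables as plain functions on atomic events and computes probabilities by counting:

```python
from itertools import product

n = 10  # number of coin flips; small enough to enumerate all 2**n outcomes

# The sample space: every list of length n over {"H", "T"}.
omega = list(product("HT", repeat=n))

# Random variables are just functions from atomic events to numbers.
def num_heads(event):
    return event.count("H")

def longest_run(event):
    best = run = 1
    for prev, cur in zip(event, event[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# Each atomic event is equally likely, so Pr[A] = |A| / |Omega|.
pr_few_heads = sum(1 for e in omega if num_heads(e) <= 3) / len(omega)
pr_long_run = sum(1 for e in omega if longest_run(e) >= 5) / len(omega)

print(pr_few_heads)  # Pr[X <= 3] = 0.171875
print(pr_long_run)   # Pr[longest succession >= 5]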
Expectation
Given a random variable $X$, we define its expectation as the average of the values it can take, weighted by their probabilities.
For example, say that we flip a coin. The sample space is $\Omega = \{H, T\}$ (which stand for heads and tails respectively). Say that we consider heads the desired result. We can model this by defining a random variable $X$ satisfying $X(H) = 1$ and $X(T) = 0$.
If our coin is fair, the expectation of $X$, denoted $\mathbb{E}[X]$, is intuitively one half, because there is a one half chance that it is zero and a one half chance that it is one. In math:

$$\mathbb{E}[X] = \frac{1}{2} \cdot 0 + \frac{1}{2} \cdot 1 = \frac{1}{2}$$
More generally speaking, if $X$ is a random variable over some discrete probability space $\Omega$, we have the general formula

$$\mathbb{E}[X] = \sum_{\omega \in \Omega} \Pr[\omega] \cdot X(\omega)$$
So for example, if $X$ is the result of tossing a fair die, we have that

$$\mathbb{E}[X] = \sum_{k=1}^{6} \frac{1}{6} \cdot k = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$$
Let us work out a more complicated example. Say that we have an unfair die with the following properties: all odd results are equally likely, all even results are equally likely, but an even result is three times as likely as an odd result. What is the expected value of tossing the die?
Let $X$ be the random variable describing the die. If we denote $q = \Pr[X \text{ is odd}]$, then since each even result is three times as likely as each odd result (and there are three of each), we get that $\Pr[X \text{ is even}] = 3q$. However, since a die is always either odd or even but never both, we get that

$$\Pr[X \text{ is odd}] + \Pr[X \text{ is even}] = q + 3q = 1$$

From which we get that $q = \frac{1}{4}$.

However, since we assumed that all odd outcomes are equally likely, we get that $\Pr[X = 1] = \Pr[X = 3] = \Pr[X = 5]$ and it follows that

$$\Pr[X = 1] + \Pr[X = 3] + \Pr[X = 5] = \frac{1}{4}$$

from which it follows that $\Pr[X = 1] = \Pr[X = 3] = \Pr[X = 5] = \frac{1}{12}$, and we can similarly prove that $\Pr[X = 2] = \Pr[X = 4] = \Pr[X = 6] = \frac{1}{4}$. We then get that

$$\mathbb{E}[X] = \frac{1}{12}(1 + 3 + 5) + \frac{1}{4}(2 + 4 + 6) = \frac{3}{4} + 3 = 3.75$$
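As a quick sanity check, a few lines of Python (our own sketch, not part of the original text) encode this unfair die and apply the expectation formula directly:

```python
# Unfair die: each even result is three times as likely as each odd one.
# Solving q + 3q = 1 for the odd/even halves gives 1/12 per odd face.
pr = {1: 1/12, 2: 3/12, 3: 1/12, 4: 3/12, 5: 1/12, 6: 3/12}

assert abs(sum(pr.values()) - 1) < 1e-12  # sanity: probabilities sum to 1

# E[X] = sum over atomic events of Pr[w] * X(w); here X(w) = w.
expectation = sum(p * w for w, p in pr.items())
print(expectation)  # 3.75
```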
For a spicier example, we can ask what is the expected number of times we need to flip a fair coin before we get heads. We already modeled this with the sample space $\Omega = \{1, 2, 3, \dots\}$ and the probability function $\Pr[n] = 2^{-n}$. The random variable $X$ maps each atomic event to the number of coin flips it implies, so $X(n) = n$, and its expectation is

$$\mathbb{E}[X] = \sum_{n=1}^{\infty} \frac{n}{2^n} = 2$$
where the last equality is not hard to show, but requires a bit of calculus.
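For the curious, here is how that bit of calculus goes (a standard derivation, sketched in our own notation): start from the geometric series and differentiate term by term, which is valid for $|x| < 1$:

$$\sum_{n=0}^{\infty} x^n = \frac{1}{1-x} \;\Longrightarrow\; \sum_{n=1}^{\infty} n x^{n-1} = \frac{1}{(1-x)^2} \;\Longrightarrow\; \sum_{n=1}^{\infty} n x^n = \frac{x}{(1-x)^2}$$

Substituting $x = \frac{1}{2}$ gives $\frac{1/2}{(1/2)^2} = 2$, as claimed.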
We can generalize this to unfair coins, to show that if a coin lands on heads with probability $p$, the expected number of times we'd have to flip it before seeing heads is $\frac{1}{p}$.
This is a very important result! It tells us that the number of attempts before we succeed at something is inversely proportional to how likely we are to succeed each time (at least when talking about stuff like participating in a raffle, where we can't improve and increase our chances over time).
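To see the $\frac{1}{p}$ rule in action, here is a small simulation sketch (ours, with arbitrarily chosen parameter values): flip a $p$-coin until it lands on heads, many times over, and average the number of flips.

```python
import random

def flips_until_heads(p, rng):
    """Flip a coin that lands heads with probability p; count the flips."""
    flips = 1
    while rng.random() >= p:  # tails: try again
        flips += 1
    return flips

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for p in (0.5, 0.1, 0.01):
    trials = 100_000
    average = sum(flips_until_heads(p, rng) for _ in range(trials)) / trials
    print(f"p = {p}: average ~ {average:.2f}, predicted 1/p = {1/p:.2f}")
```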
The Distribution of a Random Variable
What's nice about a random variable is that we often don't really care about the sample space it's defined over. We really only care how it distributes. Oftentimes, we can tell very different origin stories for random variables, but find out that they are exactly the same.
For example, say we flip a fair coin until we get tails, and let $X$ be zero if the number of flips was even, or one if it was odd. Alternatively, say that we toss a fair die once, and let $Y$ be zero for the results $\{1, 2\}$ or one for the results $\{3, 4, 5, 6\}$. Summing a geometric series shows that $\Pr[X = 1] = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots = \frac{2}{3}$, and you should be able to see that $\Pr[Y = 1] = \frac{2}{3}$ as well. Since both can only be $0$ or $1$, we also get that $\Pr[X = 0] = \Pr[Y = 0] = \frac{1}{3}$.
In other words, $X$ and $Y$ distribute equally. They might have very different origin stories, but they are identical as random variables. We denote this as $X \sim Y$.
This notation is also very powerful, as it allows us to single out a few ubiquitous random variables; exploring them in the abstract gives useful results for any random variable we run into that distributes the same.
The simplest random variable is a Bernoulli experiment. The sample space is $\Omega = \{0, 1\}$ and we set the random variable to $X(\omega) = \omega$. We typically call the event $0$ failure and the event $1$ success. Any probability function on $\Omega$ is defined by the success probability: a single number $p = \Pr[1]$ satisfying $0 \le p \le 1$. It immediately follows that $\mathbb{E}[X] = p$.
We call the resulting $X$ a Bernoulli variable with parameter $p$ and denote it $X \sim \mathrm{Ber}(p)$.
With this language in hand, if $X$ is the random variable defined on a fair die as $X(\omega) = 0$ for odd results and $X(\omega) = 1$ for even results, then we have that $X \sim \mathrm{Ber}(\frac{1}{2})$.
Another important variable is a Binomial variable. Say we repeat a Bernoulli experiment with parameter $p$ for $n$ times, and let $X$ be the random variable that counts how many of the repetitions were successful. What is the probability that $X = k$? Clearly, for $k > n$ it is zero, because we can't have flipped more heads than we flipped coins. How many sequences are there in which exactly $k$ flips are heads? Well, there were a total of $n$ flips, and we have to choose $k$ of them. The number of ways to make this choice is the binomial coefficient $\binom{n}{k}$. And what is the probability of each such sequence? Well, for each coin that landed on heads, it did so with probability $p$. Since there are $k$ of those, the probability they all landed on heads is $p^k$. Similarly, all the remaining $n - k$ coins landed on tails, which happens with probability $(1-p)^{n-k}$. Multiplying all of this together we get that

$$\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$$
This leads us to the following definition: if $X$ is a random variable, we say that $X$ distributes binomially with parameters $n$ and $p$, and denote $X \sim \mathrm{Bin}(n, p)$, if for any $0 \le k \le n$ we have that

$$\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$$
Exercise: Verify that $\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$.
Exercise: Show that $\mathbb{E}[X] = pn$ when $X \sim \mathrm{Bin}(n, p)$, namely that if we flip a $p$-coin $n$ times, we expect $pn$ coins to land on heads.
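If you want to check your answers numerically before proving them, here is a small sketch (ours, not the text's; the parameters $n = 10$, $p = 0.3$ are arbitrary) that computes the binomial probabilities and confirms both exercises:

```python
from math import comb

def binomial_pmf(n, p, k):
    """Pr[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary parameters for the check
probs = [binomial_pmf(n, p, k) for k in range(n + 1)]

print(sum(probs))                               # ~ 1.0 (first exercise)
print(sum(k * q for k, q in enumerate(probs)))  # ~ 3.0 = p * n (second)
```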
Laws of Large Numbers
What is the meaning of expectation? What motivates this definition?
We claimed that expectation is what happens in the "average case", but why is this proclamation justified? It certainly isn't something that's obvious from the definition.
Consider for example a fair coin toss, and say that if it lands on heads I win one dollar, but if it lands on tails I get nothing. That is, my profit is described by the random variable $X$ satisfying $X(H) = 1$ and $X(T) = 0$. We know how to compute that $\mathbb{E}[X] = \frac{1}{2}$, but why is that the "average"?
Well, say I repeat the game ten times. What is the probability that I got, say, less than three heads? The number of heads distributes like $\mathrm{Bin}(10, \frac{1}{2})$, and we know that

$$\Pr[X < 3] = \sum_{k=0}^{2} \binom{10}{k} \left(\frac{1}{2}\right)^{10} = \frac{1 + 10 + 45}{1024} = \frac{56}{1024} \approx 5.5\%$$
Using the symmetry of the problem (the probability of exactly 3 heads is exactly like the probability of exactly 3 tails, which is the same as the probability of exactly 7 heads), we can conclude that we will flip between four and six heads (that is, $X$ will be at distance at most one from the average) with a probability of about $66\%$, since $\Pr[X \le 3] = \Pr[X \ge 7] \approx 17.2\%$. We already see the concentration around the half, with less than a third of the options getting almost two thirds of the probability.
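These numbers are easy to reproduce with the pmf sketch from before (again our own code, not the text's):

```python
from math import comb

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
print(sum(binomial_pmf(n, p, k) for k in range(3)))   # Pr[X < 3]     ~ 0.055
print(sum(binomial_pmf(n, p, k) for k in (4, 5, 6)))  # Pr[4<=X<=6]   ~ 0.656
```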
Using the binomial distribution to understand how this progresses as $n$ becomes larger is possible, but a bit tiresome. Instead, we appeal to one of computer science's most beloved secret weapons: the Chernoff bound.
There is no reason to describe the bound in full here; I will just tell you that it is an extremely powerful weapon against the "scattering" of many repeated independent (though not necessarily identical!) experiments. Applying Chernoff to coin tossing, one can prove that if we toss a fair coin $n$ times, the probability of seeing more than $\frac{n}{2} + t$ flips of the same result is less than $2e^{-2t^2/n}$. By plugging in numbers we can see, for example, that if we flip a thousand independent fair coins, the probability of getting the same result more than $610$ times is less than one in a billion, and if we flip a million such coins, the number of heads will be within $2500$ flips of half a million with probability above $99.999\%$.
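To get a feel for how tight this is, here is a sketch (ours, assuming the two-sided form of the bound quoted above) comparing the bound against the exact binomial tail for a thousand coins:

```python
from math import comb, exp

n = 1000

def exact_tail(n, t):
    """Exact Pr[some result appears more than n/2 + t times] for fair coins."""
    return sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) > t) / 2**n

def chernoff_bound(n, t):
    return 2 * exp(-2 * t**2 / n)

for t in (50, 100, 110):
    print(f"t = {t}: exact ~ {exact_tail(n, t):.3g}, "
          f"bound ~ {chernoff_bound(n, t):.3g}")
```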
This is a special case of a law of large numbers. Such laws give us conditions under which the average of repeated experiments converges to the expectation as we perform more and more of them. To demonstrate it, I had my computer flip ten thousand random coins, and averaged the difference between heads and tails. I repeated the experiment five times, and plotted each. This is the result:
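For readers who want to rerun the experiment themselves, here is a minimal sketch of it (our own code, assuming matplotlib for the plotting; the original plot's exact details are not specified):

```python
import random
import matplotlib.pyplot as plt

flips = 10_000
for trial in range(5):
    rng = random.Random(trial)  # a different seed per run
    diff = 0                    # running difference between heads and tails
    averages = []
    for i in range(1, flips + 1):
        diff += 1 if rng.random() < 0.5 else -1
        averages.append(diff / i)  # average difference after i flips
    plt.plot(averages, label=f"run {trial + 1}")

plt.xlabel("number of flips")
plt.ylabel("average difference between heads and tails")
plt.legend()
plt.show()  # all five runs hug 0 as the number of flips grows
```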