Random Variables


You may have noticed that the discussion about probability spaces was a little too verbose for comfort. Having to say things like "the probability of the event that the die is either odd or below three" is quite tiring, and it will only become more cumbersome as we deal with more complex probabilities.

To streamline the discussion, we introduce the notion of random variables. A random variable is like a number, except we don't know exactly what it is, only how it distributes.

So if I start my discussion with "let $X$ be a fair die", anyone who has studied a bit of probability will interpret this as "$X$ is a random variable that can take an integer value between $1$ and $6$, all of which are equally probable". Then, instead of the cumbersome "the probability that the result of the die toss is at most three", we can just write $\mathbb{P}[X\le 3]$.

We can also use random variables to describe events. For example, for the event "the result of throwing the die was even", we can write $A=\left\{ X=2\text{ or }X=4\text{ or }X=6\right\}$, or more conventionally $A=\left\{ X\in\left\{ 2,4,6\right\} \right\}$, or even simply $A=\left\{ X\text{ is even}\right\}$. Whatever is most comfortable for you.

This might seem like a cosmetic convenience, but defining random variables will actually go a long way in helping us understand complex situations. For all their simplicity, they turn out to be a very powerful abstraction that allows us to focus on how things distribute while disregarding what they are.

Discrete Random Variables

We will introduce the definition of a random variable specifically for discrete probability spaces.

Recall that such a space is defined over a sample space $\Omega$ of atomic events. A random variable $X$ is simply a function that assigns to each atomic event a real number. What does that number represent? Whatever we want!

In the die toss case (where $\Omega = \{\omega_1,\ldots,\omega_6\}$), if we care about the result we can set $X(\omega_n)=n$. But we can do a whole lot of other stuff. For example, we can consider $\Omega$ to be the set of all possible results of flipping a coin $100$ times. Each atomic event $\omega \in \Omega$ corresponds to one of the $2^{100}$ possible lists of length $100$ of "heads" and "tails". If we only care about the number of heads, we can define $X(\omega)$ to be the number of heads (regardless of where they appear in the list). Then a complex event such as "between 27 and 34 flips landed on heads" admits the succinct description $\{27\le X\le 34\}$.

We can define variables $X$ as complex as our imagination allows. For example, $X(\omega)$ could be the length of the longest run of identical results in $\omega$, if that's what we're interested in.
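To make the "a random variable is just a function on the sample space" picture concrete, here is a minimal Python sketch (mine, not the book's; the names `omega`, `P`, and `prob` are made up for illustration) that encodes the fair die and evaluates event probabilities through a random variable:

```python
from fractions import Fraction

# Sample space of a fair die: six atomic events, all equally probable.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

# A random variable assigns a number to each atomic event.
# Here we simply care about the face value.
def X(w):
    return w

# Probability of an event described through X, e.g. {X <= 3} or {X is even}.
def prob(event):
    return sum(P[w] for w in omega if event(X(w)))

print(prob(lambda x: x <= 3))      # 1/2
print(prob(lambda x: x % 2 == 0))  # 1/2
```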

Expectation

Given a random variable $X$, we define its expectation as the average of the values it can take, weighted by their probabilities.

For example, say that we flip a coin. The probability space is $\Omega=\{H,T\}$ (which stand for heads and tails, respectively). Say that we consider $H$ the desired result. We can model this by defining a random variable $X$ satisfying $X(H) = 1$ and $X(T) = 0$.

If our coin is fair, the expectation of $X$, denoted $\mathbb{E}[X]$, is intuitively one half, because there is a one half chance that it is zero and a one half chance that it is one. In math:

$$\mathbb{E}\left[X\right]=\mathbb{P}\left[H\right]\cdot X\left(H\right)+\mathbb{P}\left[T\right]\cdot X\left(T\right)=\frac{1}{2}\cdot1+\frac{1}{2}\cdot0=\frac{1}{2}$$

More generally speaking, if $X$ is a random variable over some discrete probability space $(\Omega,\mathbb{P})$ we have the general formula

$$\mathbb{E}\left[X\right]=\sum_{\omega\in\Omega}\mathbb{P}\left[\omega\right]\cdot X\left(\omega\right)$$

So for example, if $X$ is the result of tossing a fair die, we have that

$$\mathbb{E}\left[X\right]=\sum_{n=1}^{6}\mathbb{P}\left[\omega_{n}\right]\cdot X\left(\omega_{n}\right)=\sum_{n=1}^{6}\frac{1}{6}\cdot n=3\frac{1}{2}$$

Let us work out a more complicated example. Say that we have an unfair die with the following properties: all odd results are equally likely, all even results are equally likely, but an even result is three times as likely as an odd result. What is the expected value of tossing the die?

Let $X$ be the random variable describing the die. If we denote $\mathbb{P}\left[X\text{ is odd}\right]=x$, we get that $\mathbb{P}\left[X\text{ is even}\right]=3x$. However, since a die is always either odd or even but never both, we get that

$$1=\mathbb{P}\left[X\text{ is odd or even}\right]=\mathbb{P}\left[X\text{ is odd}\right]+\mathbb{P}\left[X\text{ is even}\right]=x+3x=4x$$

From which we get that $\mathbb{P}\left[X\text{ is odd}\right]=\frac{1}{4}$.

Moreover, since we assumed that all odd outcomes are equally likely, we get that $\mathbb{P}[\omega_1] = \mathbb{P}[\omega_3] = \mathbb{P}[\omega_5]$, and it follows that

$$\frac{1}{4}=\mathbb{P}\left[X\text{ is odd}\right]=\mathbb{P}\left[\omega_{1}\right]+\mathbb{P}\left[\omega_{3}\right]+\mathbb{P}\left[\omega_{5}\right]=3\cdot\mathbb{P}\left[\omega_{1}\right]$$

from which it follows that $\mathbb{P}[\omega_1] = \mathbb{P}[\omega_3] = \mathbb{P}[\omega_5]= \frac{1}{12}$, and we can similarly prove that $\mathbb{P}[\omega_2] = \mathbb{P}[\omega_4] = \mathbb{P}[\omega_6]= \frac{1}{4}$. We then get that

$$\begin{aligned}\mathbb{E}\left[X\right] & =\sum_{n=1}^{6}\mathbb{P}\left[\omega_{n}\right]\cdot X\left(\omega_{n}\right)\\ & =\frac{1}{12}\cdot1+\frac{1}{4}\cdot2+\frac{1}{12}\cdot3+\frac{1}{4}\cdot4+\frac{1}{12}\cdot5+\frac{1}{4}\cdot6\\ & =\frac{1+3+5}{12}+\frac{2+4+6}{4}\\ & =3\frac{3}{4} \end{aligned}$$
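The general formula $\mathbb{E}[X]=\sum_{\omega\in\Omega}\mathbb{P}[\omega]\cdot X(\omega)$ translates directly into code. The following sketch (an illustration of mine, with a hypothetical `expectation` helper) recomputes both the fair-die and the unfair-die expectations:

```python
from fractions import Fraction

def expectation(P, X):
    """E[X] = sum over atomic events w of P[w] * X(w)."""
    return sum(p * X(w) for w, p in P.items())

# Fair die: every face has probability 1/6.
fair = {w: Fraction(1, 6) for w in range(1, 7)}

# Unfair die: odd faces have probability 1/12, even faces 1/4.
unfair = {w: Fraction(1, 12) if w % 2 else Fraction(1, 4) for w in range(1, 7)}

X = lambda w: w  # the random variable is just the face value
print(expectation(fair, X))    # 7/2  (= 3 1/2)
print(expectation(unfair, X))  # 15/4 (= 3 3/4)
```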
For a spicier example, we can ask what is the expected number of times we need to flip a fair coin before we get heads. We already modeled this with the sample space $\Omega = \{\omega_1,\omega_2,\ldots\}$ and the probability function $\mathbb{P}[\omega_n] = 2^{-n}$. The random variable $X(\omega_n) = n$ maps each atomic event to the number of coin flips it implies, and its expectation is

$$\mathbb{E}\left[X\right]=\sum_{n=1}^{\infty}\mathbb{P}\left[\omega_{n}\right]\cdot X\left(\omega_{n}\right)=\sum_{n=1}^{\infty}\frac{n}{2^{n}}=2$$

where the last equality is not hard to show, but requires a bit of calculus.

Proving the equality*

We use the fact that if $|x|<1$ then the power series $\sum_{n=0}^{\infty}x^{n}$ converges absolutely, and within its radius of convergence a power series can be differentiated term by term. This means we are allowed to change the order of summation and differentiation, getting

$$\begin{aligned}\sum_{n=1}^{\infty}nx^{n} & =x\sum_{n=1}^{\infty}nx^{n-1}=x\sum_{n=0}^{\infty}\left(n+1\right)x^{n}\\ & =x\sum_{n=0}^{\infty}\frac{d}{dx}x^{n+1}=x\frac{d}{dx}\sum_{n=0}^{\infty}x^{n+1}\\ & =x\frac{d}{dx}\left(x\sum_{n=0}^{\infty}x^{n}\right)=x\frac{d}{dx}\frac{x}{1-x}\\ & =\frac{x}{\left(1-x\right)^{2}} \end{aligned}$$

and setting $x=1/2$ we get the desired result.
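If you prefer a numerical sanity check over calculus, the following tiny sketch (mine, purely illustrative) compares partial sums of $\sum_{n\ge 1} nx^{n}$ against the closed form $x/(1-x)^{2}$:

```python
# Partial sums of sum(n * x^n) should approach x / (1 - x)^2 for |x| < 1.
def partial_sum(x, terms):
    return sum(n * x**n for n in range(1, terms + 1))

for x in (0.5, 0.9):
    closed_form = x / (1 - x) ** 2
    print(x, partial_sum(x, 1000), closed_form)  # the two values agree closely
```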

We can generalize this to unfair coins, to show that if a coin lands on heads with probability $p$, the expected number of times we'd have to flip it before seeing heads is $1/p$.

This is a very important result! It tells us that the expected number of attempts before we succeed at something is inversely proportional to how likely we are to succeed each time (at least for things like participating in a raffle, where we can't improve and increase our chances over time).
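Here is a quick Monte Carlo sanity check of the $1/p$ claim, a sketch that assumes Python's `random` module is a good enough source of randomness (the helper `flips_until_heads` is hypothetical, written just for this illustration):

```python
import random

def flips_until_heads(p):
    """Flip a p-coin until it lands on heads; return the number of flips."""
    flips = 1
    while random.random() >= p:  # heads with probability p on each flip
        flips += 1
    return flips

for p in (0.5, 0.2, 0.05):
    trials = 100_000
    average = sum(flips_until_heads(p) for _ in range(trials)) / trials
    print(f"p = {p}: empirical average {average:.2f}, expected {1/p:.2f}")
```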

The Distribution of a Random Variable

What's nice about a random variable is that we often don't really care about the sample space it's defined over. We really only care how it distributes. Oftentimes, we can tell very different origin stories for random variables, but find out that they are exactly the same.

For example, suppose we flip a fair coin until we get tails, and let $X(\omega_n)$ be zero if $n$ is even or one if $n$ is odd. Now suppose we toss a fair die once, and let $Y(\omega_n)$ be zero for $n=1,2$ or one for $n=3,\ldots,6$. We've already seen that $\mathbb{P}[X = 1]$ is $2/3$, and you should be able to see that $\mathbb{P}[Y = 1]$ is $2/3$ as well. Since both can only be $0$ or $1$, we get that $\mathbb{P}[X = 0]=\mathbb{P}[Y=0]$.

In other words, $X$ and $Y$ distribute equally. They might have very different origin stories, but they are identical as random variables. We denote this as $X\sim Y$.

This notation is also very powerful, as it allows us to consider a few ubiquitous random variables; exploring them in the abstract gives useful results for any random variable we run into that distributes the same.
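The claim that two random variables with very different origin stories can distribute identically is easy to probe empirically. The sketch below (mine, not from the text) samples both constructions from the example above and compares the observed frequencies of the value $1$, which should both hover around $2/3$:

```python
import random

def X():
    """Flip a fair coin until tails; return 1 if the number of flips is odd, else 0."""
    n = 1
    while random.random() < 0.5:  # heads with probability 1/2, so keep flipping
        n += 1
    return n % 2

def Y():
    """Toss a fair die; return 0 for 1 or 2, and 1 for 3, 4, 5, 6."""
    return 0 if random.randint(1, 6) <= 2 else 1

trials = 200_000
print(sum(X() for _ in range(trials)) / trials)  # ~0.666
print(sum(Y() for _ in range(trials)) / trials)  # ~0.666
```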

The simplest random variable is a Bernoulli experiment. The sample space is $\Omega=\{\omega_0,\omega_1\}$ and we set the random variable $X(\omega_b)=b$. We typically call the event $\omega_0$ failure and the event $\omega_1$ success. Any probability function on $\Omega$ is defined by the success probability: a single number $0<p<1$ satisfying $\mathbb{P}[\omega_1] = p$. It immediately follows that $\mathbb{P}[\omega_0] = 1-p$.

We call the resulting $X$ a Poisson variable with parameter $p$ and denote it $Poi(p)$.

With this language in hand, if $X$ is the random variable defined on a fair die as $X(\omega_1)=X(\omega_2) = 0$ and $X(\omega_3)=X(\omega_4)=X(\omega_5)=X(\omega_6)=1$, then we have that $X\sim Poi(2/3)$.

Another important variable is a binomial variable. Say we repeat a Poisson experiment with parameter $p$ a total of $n$ times, and let $X$ be the random variable that counts how many of the repetitions were successful. What is the probability that $X=k$? Clearly, for $k>n$ it is zero, because we can't have flipped more heads than we flipped coins. How many sequences are there in which exactly $k$ flips are heads? Well, there were a total of $n$ flips, and we have to choose $k$ of them; the number of ways to make this choice is the binomial coefficient ${n \choose k}$. And what is the probability of each such sequence? Well, each coin that landed on heads did so with probability $p$, and since there are $k$ of those, the probability that they all landed on heads is $p^k$. Similarly, each of the remaining $n-k$ coins landed on tails with probability $1-p$. Multiplying all of this together we get that

$$\mathbb{P}\left[X=k\right]={n \choose k}p^{k}\left(1-p\right)^{n-k}$$

This leads us to the following definition: if $X$ is a random variable, we say that $X$ distributes binomially with parameters $n,p$, and denote $X\sim \mathcal{B}(n,p)$, if for any $k=0,\ldots,n$ we have that

$$\mathbb{P}\left[X=k\right]={n \choose k}p^{k}\left(1-p\right)^{n-k}$$
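As a rough illustration (not the author's code), the probability mass function above can be written out directly and compared against a simulation of $n$ repeated Poisson trials; the helper `binom_pmf` below is a name I made up:

```python
import random
from math import comb

def binom_pmf(n, p, k):
    """P[B(n,p) = k] = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

# The probabilities of all possible outcomes sum to 1 (see the exercise below).
print(sum(binom_pmf(n, p, k) for k in range(n + 1)))  # 1.0 (up to rounding)

# Compare against simulation: count successes in n independent p-trials.
trials = 100_000
counts = [0] * (n + 1)
for _ in range(trials):
    successes = sum(random.random() < p for _ in range(n))
    counts[successes] += 1

for k in range(n + 1):
    print(k, binom_pmf(n, p, k), counts[k] / trials)
```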

Exercise: Verify that $\sum_{k=0}^n \mathbb{P}[\mathcal{B}(n,p)=k] = 1$ (hint: use the binomial formula).

Exercise: Show that $\mathbb{E}\left[\mathcal{B}\left(n,p\right)\right]=np$, namely that if we flip a $p$-coin $n$ times, we expect $pn$ of them to land on heads.

Solution

$$\begin{aligned}\mathbb{E}\left[\mathcal{B}\left(n,p\right)\right] & =\sum_{k=0}^{n}{n \choose k}p^{k}\left(1-p\right)^{n-k}k\\ & =n\sum_{k=1}^{n}{n-1 \choose k-1}p^{k}\left(1-p\right)^{n-k}\\ & =n\sum_{k=0}^{n-1}{n-1 \choose k}p^{k+1}\left(1-p\right)^{n-\left(k+1\right)}\\ & =np\sum_{k=0}^{n-1}{n-1 \choose k}p^{k}\left(1-p\right)^{\left(n-1\right)-k}\\ & =np \end{aligned}$$

where in the second equality we used the fact that

$${n \choose k}k=\frac{n!}{k!\left(n-k\right)!}\,k=\frac{\left(n-1\right)!\,n}{\left(k-1\right)!\left(n-k\right)!}={n-1 \choose k-1}n$$
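For a quick numerical counterpart to this computation (again, just a sketch of mine), one can evaluate the mean of $\mathcal{B}(n,p)$ straight from the definition of expectation and compare it with $np$:

```python
from math import comb

def binomial_mean(n, p):
    """Compute E[B(n,p)] directly from the definition of expectation."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(binomial_mean(10, 0.3), 10 * 0.3)  # both 3.0 (up to rounding)
print(binomial_mean(50, 0.9), 50 * 0.9)  # both 45.0 (up to rounding)
```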

Laws of Large Numbers

What is the meaning of expectation? What motivates this definition?

We claimed that expectation is what happens in the "average case", but why is this proclamation justified? It certainly isn't something that's obvious from the definition.

Consider for example a fair coin toss, and say that if it lands on heads I win one dollar, but if it lands on tails I get nothing. That is, my profit is described by the random variable $X$ satisfying $X(H)=1$ and $X(T)=0$. We know how to compute that $\mathbb{E}[X]=1/2$, but why is that the "average"?

Well, say I repeat the game $10$ times. What is the probability that I got, say, at most three heads? The number of heads distributes like $\mathcal{B}(10,1/2)$, and we know that

$$\begin{aligned}\mathbb{P}\left[\mathcal{B}\left(10,1/2\right)\le3\right] & =\mathbb{P}\left[\mathcal{B}\left(10,1/2\right)=0\right]+\mathbb{P}\left[\mathcal{B}\left(10,1/2\right)=1\right]\\ & \quad+\mathbb{P}\left[\mathcal{B}\left(10,1/2\right)=2\right]+\mathbb{P}\left[\mathcal{B}\left(10,1/2\right)=3\right]\\ & =\frac{1}{2^{10}}\left({10 \choose 0}+{10 \choose 1}+{10 \choose 2}+{10 \choose 3}\right)\\ & =\frac{176}{1024}\approx17.2\% \end{aligned}$$

Using the symmetry of the problem (the probability of exactly $3$ heads is exactly the probability of exactly $3$ tails, which is the same as the probability of exactly $7$ heads), we can conclude that we will flip between four and six heads (that is, be at distance at most one from the average) with a probability of about $65.6\%$. We already see the concentration around the half, with less than a third of the possible outcomes getting almost two thirds of the probability.
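These numbers are easy to verify with exact arithmetic; the following sketch (illustrative, using a made-up helper `binom_prob`) recomputes both the $\approx 17.2\%$ tail and the $\approx 65.6\%$ middle:

```python
from fractions import Fraction
from math import comb

def binom_prob(n, k):
    """P[B(n, 1/2) = k], computed exactly."""
    return Fraction(comb(n, k), 2**n)

p_at_most_3 = sum(binom_prob(10, k) for k in range(0, 4))
p_middle = sum(binom_prob(10, k) for k in range(4, 7))

print(p_at_most_3, float(p_at_most_3))  # 11/64 ≈ 0.172
print(p_middle, float(p_middle))        # 21/32 ≈ 0.656
```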

Using the binomial distribution to understand how this progresses as $n$ becomes larger is possible, but a bit tiresome. Instead, we appeal to one of computer science's most beloved secret weapons: the Chernoff bound.

There is no reason to describe the bound here; I will just tell you that it is an extremely powerful weapon against the "scattering" of repeating many independent (though not necessarily identical!) experiments. Applying Chernoff to coin tossing, one can prove that if we toss a fair coin $n$ times, the probability of seeing more than $n/2+\sqrt{6n\cdot\ln n}$ flips of the same result is less than $2/n^4$. By plugging in numbers we can see, for example, that if we flip a thousand independent fair coins, the probability of getting the same result more than $700$ times is less than one in $500$ billion, and if we flip a million such coins, the number of heads will be within $10{,}000$ flips of half a million with probability above $99.99999999999999999998\%$.

This is a special case of a law of large numbers. Such laws give us conditions under which the average of repeated experiments converges to the expectation as we perform more and more of them. To demonstrate this, I had my computer flip ten thousand random coins and track the difference between heads and tails, divided by the number of flips so far. I repeated the experiment five times, and plotted each run. This is the result:


Figure: Flipping ten thousand coins and tracking the result. The $x$ axis is the number of repetitions, the $y$ axis is the difference between heads and tails divided by the number of repetitions. We see that while the paths start very differently, they converge very quickly to $0$.
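If you want to reproduce a plot along these lines yourself, here is a short sketch of mine; it assumes `matplotlib` is available and makes no attempt to match the original figure exactly:

```python
import random
import matplotlib.pyplot as plt

FLIPS = 10_000
for _ in range(5):  # five independent runs
    diff = 0        # running (heads - tails)
    path = []
    for n in range(1, FLIPS + 1):
        diff += 1 if random.random() < 0.5 else -1
        path.append(diff / n)  # normalized by the number of flips so far
    plt.plot(path)

plt.xlabel("number of flips")
plt.ylabel("(heads - tails) / flips")
plt.show()
```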