Understanding BlockDAGs and GHOSTDAG

The Math of Block Creation

The purpose of this section is two-fold:

  • Use block creation to understand the Poisson and exponential distributions

  • Use the Poisson and exponential distributions to understand block creation

Poisson Processes

A Poisson process describes any process in which something happens at a known rate, but we can't know exactly when it happens, only how often. A common example is radioactive decay. Emission of radioactive particles is a memoryless process: knowing when particles were emitted before tells us nothing about when the next particle will be emitted. Other common examples include people arriving at a store, or typos in a book. We assume the time/place of each occurrence is random and independent, but that there is some fixed rate.

Appropriately, the Poisson distribution has a rate parameter $k$ that tells us how often we expect the event to happen within an agreed-upon unit of time (say, an hour). We want the random variable $Poi(k)$ to distribute like the number of events we see in an hour, given that we have $k$ events an hour on average.

It turns out that the correct distribution is given by

$$\mathbb{P}[Poi(k)=m] = \frac{k^m\cdot e^{-k}}{m!}$$

We can arrive at this formula by dividing our "hour" into small segments, and treating each like a Bernoulli trial.

Say we have an event that happens on average $k$ times an hour. We split the hour into $n$ parts, making $n$ large enough that we feel comfortable ignoring the possibility that the event happens more than once during an $n$th of an hour. We note that each $n$th of an hour distributes the same, and for each one there is some probability $p$ that the event will happen therein. Hence, the number of times we see the event is approximated by $\mathcal{Bin}(n,p)$, so we have that $\mathbb{E}[\mathcal{Bin}(n,p)] = k$. But we have computed that $\mathbb{E}[\mathcal{Bin}(n,p)] = np$, so we get that $p=k/n$.

The upshot is that $Poi(k)$ is the limit (in an appropriate sense of the word) of $\mathcal{Bin}(n,k/n)$ as $n$ approaches infinity. One can compute explicitly that this leads to the formula above, or you can just take my word for it.
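This limiting claim is easy to check numerically. A minimal sketch (the function names are mine, not from the text), comparing the binomial probability for growing $n$ against the Poisson formula:

```python
from math import comb, exp, factorial

def binom_pmf(n: int, p: float, m: int) -> float:
    """P[Bin(n, p) = m]"""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

def poisson_pmf(k: float, m: int) -> float:
    """P[Poi(k) = m] = k^m * e^{-k} / m!"""
    return k**m * exp(-k) / factorial(m)

# As n grows, Bin(n, k/n) approaches Poi(k):
k, m = 6, 4
for n in (10, 100, 10_000):
    print(n, binom_pmf(n, k / n, m))
print("Poisson:", poisson_pmf(k, m))
```

Already at $n=10{,}000$ the two values agree to several decimal places.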

Exercise*: Prove that for any $m$ we have that

$$\lim_{n\to\infty}\mathbb{P}\left[\mathcal{Bin}(n,k/n)=m\right]=\frac{k^{m}}{m!}e^{-k}$$
Solution

$$\begin{aligned}\lim_{n\to\infty}\mathbb{P}\left[\mathcal{Bin}(n,k/n)=m\right] & =\lim_{n\to\infty}{n \choose m}\left(k/n\right)^{m}\left(1-k/n\right)^{n-m}\\ & =\lim_{n\to\infty}\frac{n!}{m!\left(n-m\right)!}k^{m}\frac{1}{n^{m}}\left(1-k/n\right)^{n-m}\\ & =\frac{k^{m}}{m!}\lim_{n\to\infty}\frac{n!}{\left(n-m\right)!}\frac{1}{n^{m}}\left(1-k/n\right)^{n-m}\\ & =\frac{k^{m}}{m!}\lim_{n\to\infty}\frac{n-m+1}{n}\cdot\ldots\cdot\frac{n}{n}\cdot\lim_{n\to\infty}\left(1-k/n\right)^{n-m}\\ & =\frac{k^{m}}{m!}\lim_{n\to\infty}\left(1-\frac{k}{n}\right)^{n}\\ & =\frac{k^{m}}{m!}e^{-k}\end{aligned}$$

Block Creation is a Poisson Process

Disregarding difficulty adjustment and such, we assume that the network is parametrized to create, on average, one block per block delay, where the block delay $\lambda$ is determined by the system designer. For example, in Bitcoin we have that $\lambda = 10\text{ minutes}$.

The random variable $Poi(k)$ describes how block creation distributes over $k$ block delays. For example, the probability that we see nine Bitcoin blocks within 50 minutes (which is five block delays) is

$$\mathbb{P}\left[Poi\left(5\right)=9\right]=\frac{5^{9}}{9!}e^{-5}\approx3.6\%$$

What is the probability that the network produces exactly six blocks in an hour? We might expect it to be substantial, but if we compute it we find that

$$\mathbb{P}\left[Poi\left(6\right)=6\right]=\frac{6^{6}}{6!}e^{-6}\approx16\%$$

which is actually quite low!

What if we allow some margin? Say we ask ourselves: what is the probability that between 5 and 7 blocks are created in an hour?

Using the formula we get:

$$\mathbb{P}\left[Poi\left(6\right)=5\right]=\frac{6^{5}}{5!}e^{-6}\approx16\%$$

Wait, why did we get the same result? Of course! By replacing $6^6$ with $6^5$ we made the numerator six times smaller, but by replacing $6!$ with $5!$ we also made the denominator six times smaller, so the two effects cancel out. Don't be bothered by this too much, it's just a coincidence.

We can also compute that

$$\mathbb{P}\left[Poi\left(6\right)=7\right]=\frac{6^{7}}{7!}e^{-6}\approx14\%$$

so we see that even with an entire block delay as a margin of error, the probability that we fall within the margin remains below half! That's a testament to how noisy block production is.
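The three computations above take one line each to verify; a quick sketch reusing the Poisson formula:

```python
from math import exp, factorial

def poisson_pmf(k: float, m: int) -> float:
    """P[Poi(k) = m] = k^m * e^{-k} / m!"""
    return k**m * exp(-k) / factorial(m)

# Exactly 5, 6, or 7 blocks in an hour, when 6 are expected:
probs = {m: poisson_pmf(6, m) for m in (5, 6, 7)}
print(probs)                # roughly 0.161, 0.161, 0.138
print(sum(probs.values()))  # roughly 0.46 -- indeed below one half
```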

Block Delays and Exponential Distribution

Instead of asking ourselves how many blocks we expect to see in a given time period, we can ask how long we will wait for the next block.

To discuss this, it is convenient to shift our perspective a bit, and replace rates with delays. Instead of the statement "we expect $k$ blocks per hour" we can say "we expect a block once every $1/k$ hours". The quantity $1/k$ is just the block delay $\lambda$ we described earlier.

Up to now, when we described a distribution we simply stated, for each value, the probability of seeing that value: for example, the probability that we see five blocks within one block delay.

If we try that approach here we run into an annoying obstruction: for any $\alpha\ge 0$, the probability that we waited exactly $\alpha$ seconds is zero. Why? Because time is continuous. The probability that a block was created at some specific moment is as likely as hitting an infinitesimally small point with an infinitely thin arrow.

Without diving into the formalities, the correct way to reason about continuous quantities is to consider ranges. Instead of asking futile questions about the exact value of a block delay (which will inevitably lead to a heated philosophical argument about whether an exact instant in time is even well defined in our physical reality), we can more reasonably ask something like "what is the probability we wait between 599 and 601 seconds?"

It turns out that if the block rate is $1/\lambda$ blocks per minute (so the number of blocks we see in a minute distributes like $Poi(1/\lambda)$), then the amount of time we wait distributes according to what we call an exponential distribution. We denote by $Exp(\lambda)$ the continuous random variable with the property that for every $a,b$ satisfying $0\le a < b \le \infty$ we have that

$$\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\le b\lambda\right]=e^{-a}-e^{-b}$$

Note that we are allowing $b=\infty$ with the understanding that $e^{-\infty}=0$.

It turns out that this random variable perfectly represents the waiting time for a block, given that the average block delay is $\lambda$. For example, if we want to know the probability that we wait between $599$ and $601$ seconds for the next Bitcoin block, we choose $a = 599/600$ and $b=601/600$ (because we assume $\lambda = 600\text{ sec}$) and find that the answer is

$$e^{-599/600}-e^{-601/600}\approx0.1\%$$

which is a meaningful number we can work with.
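The interval formula is trivial to turn into code; a small helper (the name `exp_interval` is mine) reproducing the computation above:

```python
from math import exp

def exp_interval(a: float, b: float) -> float:
    """P[a*lambda <= Exp(lambda) <= b*lambda] = e^{-a} - e^{-b}."""
    return exp(-a) - exp(-b)

LAMBDA = 600  # Bitcoin's block delay, in seconds

# Probability of waiting between 599 and 601 seconds:
print(exp_interval(599 / LAMBDA, 601 / LAMBDA))  # about 0.0012, i.e. ~0.1%
```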

Exercise*: Assuming that the block rate of mining distributes like $Poi(1/\lambda)$, prove that the block delay distributes like $Exp(\lambda)$.

Solution

We first note that

$$\begin{aligned}\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\le b\lambda\right] & =1-\mathbb{P}\left[Exp\left(\lambda\right)\le a\lambda\text{ or }b\lambda\le Exp\left(\lambda\right)\right]\\ & =1-\left(\mathbb{P}\left[Exp\left(\lambda\right)\le a\lambda\right]+\mathbb{P}\left[b\lambda\le Exp\left(\lambda\right)\right]\right)\\ & =1-\left(1-\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\right]+\mathbb{P}\left[b\lambda\le Exp\left(\lambda\right)\right]\right)\\ & =\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\right]-\mathbb{P}\left[b\lambda\le Exp\left(\lambda\right)\right]\end{aligned}$$

Now, to figure out $\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\right]$, we note that it is the same as asking for the probability of seeing zero blocks during a period of length $a\lambda$, that is, $a$ block delays, a period through which we expect to see $a$ blocks. In other words, we get that

$$\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\right]=\mathbb{P}\left[Poi\left(a\right)=0\right]=\frac{a^{0}e^{-a}}{0!}=e^{-a}$$

and we can repeat the computation for $b$, and get the desired expression.

We derived the Poisson formula from first principles by assuming that the process is memoryless. The assumption that each interval is independent of the others is what allows us to reduce the Poisson process to a limit of binomial distributions. Can we similarly derive the exponential distribution formula from first principles? Yes, we can. The idea is that the amount of time we still have to wait does not depend on how long we've been waiting already. Expressing this requires conditional probabilities, a topic we haven't touched. But the idea is that the probability that we wait $a$ block delays is exactly the probability that we wait a total of $a+b$ block delays given that we already waited $b$ block delays. In conditional probability notation, the exponential distribution $Exp(\lambda)$ must satisfy the equation

$$\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\right]=\mathbb{P}\left[\left(a+b\right)\lambda\le Exp\left(\lambda\right)\mid b\lambda\le Exp\left(\lambda\right)\right]$$

With a bit of work, one can show that the exponential distribution we described above is the only one that has this property.
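The "bit of work" can be sketched as follows (this is my sketch, not the book's full argument). Writing $f(a)=\mathbb{P}\left[a\lambda\le Exp\left(\lambda\right)\right]$, and using the fact that the event $(a+b)\lambda\le Exp(\lambda)$ is contained in the event $b\lambda\le Exp(\lambda)$, the memorylessness equation becomes

$$f(a)=\mathbb{P}\left[\left(a+b\right)\lambda\le Exp\left(\lambda\right)\mid b\lambda\le Exp\left(\lambda\right)\right]=\frac{\mathbb{P}\left[\left(a+b\right)\lambda\le Exp\left(\lambda\right)\right]}{\mathbb{P}\left[b\lambda\le Exp\left(\lambda\right)\right]}=\frac{f(a+b)}{f(b)}$$

so $f(a+b)=f(a)f(b)$. The only decreasing functions with this property and $f(0)=1$ are of the form $f(a)=e^{-ca}$ for some $c>0$, and matching the average block delay $\lambda$ fixes $c=1$, recovering $f(a)=e^{-a}$ as above.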

So what can we compute with this formula?

Let's start with an easy one: what is the probability we wait more than an hour for the next Bitcoin block?

To compute this, we plug in $a=6$ (as there are six Bitcoin block delays in one hour) and $b=\infty$ (as we want the probability that we wait any amount of time longer than an hour) to get the result $e^{-6}\approx 0.25\%$. This means we should see such delays about once per 400 blocks, or roughly twice a week.
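The arithmetic here is a one-liner worth double-checking (the blocks-per-week count assumes Bitcoin's one block per 10 minutes):

```python
from math import exp

# Probability of waiting more than an hour (six block delays) for a block:
p_long = exp(-6)                  # a = 6, b = infinity
print(p_long)                     # about 0.0025, i.e. ~0.25%
print(1 / p_long)                 # one such gap per ~400 blocks on average
blocks_per_week = 7 * 24 * 6      # one block per 10 minutes
print(blocks_per_week * p_long)   # about 2.5 hour-long gaps per week
```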

A good next question is: what is the probability that the next block is created within one block delay? Let us compute:

$$\mathbb{P}\left[0\le Exp\left(\lambda\right)\le\lambda\right]=e^{-0}-e^{-1}=1-\frac{1}{e}\approx63\%$$

We see that almost two-thirds of the time, the delay between blocks will be shorter than expected.

A good next question is how fast the majority of blocks arrive. In other words, how long do we have to wait so that the probability we see a block is exactly half? We already know this should be less than a block delay. To get an exact answer we need to solve the equation $\frac{1}{2}=\mathbb{P}\left[0\le Exp\left(\lambda\right)\le a\lambda\right]$ for $a$, and I'll leave it to you to verify that the answer is $a=\log\left(2\right)\approx0.69$. Nice! We get that half of the blocks arrive within 6.9 minutes. That is, a majority of blocks arrive significantly earlier than expected. This sheds a bit of light on why Bitcoin's block delay has to be so long!

So how bad is the problem? How short is short enough that, say, $99\%$ of the time the block delay will be longer? To work this out we solve the equation $\mathbb{P}\left[Exp\left(\lambda\right)\le b\lambda\right]=\frac{1}{100}$ for $b$ to find that the answer is just above $0.01$ block delays, or six seconds! That's right, one in 100 Bitcoin blocks is discovered within six seconds, despite the block delay being 10 minutes.
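These quantile claims can also be checked by simulation rather than algebra; a sketch using Python's standard-library exponential sampler (the sample size and seed are my choices):

```python
import random
from math import log

random.seed(42)
LAMBDA = 600      # block delay in seconds
N = 200_000       # number of simulated block delays

# random.expovariate takes the *rate* 1/lambda and returns waiting times.
waits = sorted(random.expovariate(1 / LAMBDA) for _ in range(N))

median = waits[N // 2]       # half the blocks arrive faster than this
one_percent = waits[N // 100]  # 1% of blocks arrive faster than this
print(median / LAMBDA)  # should hover around log(2) ~ 0.69
print(one_percent)      # should hover around 6 seconds
```

The simulated median and 1% quantile land right on the analytic values $\lambda\log(2)$ and $-\lambda\log(0.99)$.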

We can work out many things with these simple yet powerful formulae, which are always good to have in your toolbox. I hope I have at least convinced you that there is a concrete reason why so many people are concerned with block times in Bitcoin.


The way proof-of-work works dictates that block creation is actually a Poisson process. Say that the global hashrate is $n$ hashes per second; that is, during an average $n$th of a second, someone makes a single attempt to guess the block. And if we divide the second into many more segments, say $2^n$, then it becomes extremely unlikely that two hashes were computed in the same segment, so each segment behaves like an independent Bernoulli trial.

In this lovely post by Jameson Lopp, you can see how well this theory holds in practice.