The Math of Block Creation
The purpose of this section is two-fold:
Use block creation to understand the Poisson and exponential distributions
Use the Poisson and exponential distributions to understand block creation
Poisson Processes
A Poisson process describes any process in which something happens at a known rate, but we can't know exactly when it will happen, just how often. A common example is radioactive decay. Emission of radioactive particles is a memoryless process: knowing when particles were emitted before tells us nothing about when the next particle will be emitted. Other common examples include people arriving at a store, or typos in a book. We assume the time (or place) of each occurrence is random and independent, but that there is some fixed rate.
Appropriately, the Poisson distribution has a rate parameter $\lambda$ that tells us how often we expect the event to happen within an agreed-upon unit of time (say, an hour). We want the random variable $\mathrm{Pois}(\lambda)$ to distribute like the number of events we see in an hour, given that we have $\lambda$ events an hour on average.
It turns out that the correct distribution is given by

$$P(\mathrm{Pois}(\lambda) = k) = e^{-\lambda}\frac{\lambda^k}{k!}.$$
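To make this concrete, here is a minimal Python sketch of the formula (the helper name `poisson_pmf` is mine, not taken from any particular library):

```python
import math

def poisson_pmf(lam: float, k: int) -> float:
    """P(Pois(lam) = k) = e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# With a rate of 3 events per hour, the probability of seeing exactly
# 3 events in a given hour is about 22%:
print(poisson_pmf(3, 3))  # ~0.224
```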
We can arrive at this formula by dividing our "hour" into small segments and treating each like a Bernoulli trial.
Say we have an event that happens on average $\lambda$ times an hour. We split the hour into $n$ parts, making $n$ large enough that we feel comfortable ignoring the possibility that the event happens more than once during an $n$-th of an hour. We note that each $n$-th of an hour distributes the same, and for each one there is some probability $p$ that the event will happen therein. Hence, the number of times we see the event is approximated by $\mathrm{Bin}(n, p)$, so we have that $E[\mathrm{Bin}(n, p)] = \lambda$. But we have computed that $E[\mathrm{Bin}(n, p)] = np$, so we get that $p = \lambda/n$.
The upshot is that $\mathrm{Pois}(\lambda)$ is the limit (for an appropriate sense of the word) of $\mathrm{Bin}(n, \lambda/n)$ as $n$ approaches infinity. One can compute explicitly that this leads to the formula above, or you can just take my word for it.
Exercise*: Prove that for any $k \ge 0$ we have that

$$\lim_{n\to\infty} P\left(\mathrm{Bin}\left(n, \frac{\lambda}{n}\right) = k\right) = e^{-\lambda}\frac{\lambda^k}{k!}.$$
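If you want numerical evidence before (or instead of) a proof, the following sketch (helper names are mine) compares the binomial probabilities to the Poisson one for growing $n$, using the setup described above:

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

lam, k = 6, 6
for n in (10, 100, 1000, 10000):
    # Each n-th of the hour is a Bernoulli trial with success probability lam / n.
    print(n, binomial_pmf(n, lam / n, k))
print("Poisson limit:", poisson_pmf(lam, k))  # ~0.1606
```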
Block Creation is a Poisson Process
Disregarding difficulty adjustment and such, we assume that the network is parametrized to create, on average, one block per block delay, where the block delay is determined by the system designer. For example, in Bitcoin the block delay is 10 minutes.
The random variable $\mathrm{Pois}(t)$ describes how block creation distributes over $t$ block delays. For example, the probability that we see nine Bitcoin blocks within 50 minutes (which are five block delays) is

$$P(\mathrm{Pois}(5) = 9) = e^{-5}\frac{5^9}{9!} \approx 0.036.$$
What is the probability that the network produces exactly six blocks in an hour? We might expect it to be substantial, but if we compute it we find that

$$P(\mathrm{Pois}(6) = 6) = e^{-6}\frac{6^6}{6!} \approx 0.16,$$
which is actually quite low!
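As a sanity check, a short sketch (reusing the hypothetical `poisson_pmf` helper from above) reproduces both numbers:

```python
import math

def poisson_pmf(lam, k):
    # P(Pois(lam) = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Nine blocks within five block delays (50 minutes for Bitcoin):
print(poisson_pmf(5, 9))  # ~0.036
# Exactly six blocks within six block delays (one hour for Bitcoin):
print(poisson_pmf(6, 6))  # ~0.161
```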
What if we allow a margin of error? Say, we ask ourselves: what is the probability that between 5 and 7 blocks are created in an hour?
Using the formula, we first get

$$P(\mathrm{Pois}(6) = 5) = e^{-6}\frac{6^5}{5!} \approx 0.16.$$
Wait, why did we get the same result as for exactly six blocks? Of course! By replacing $6^6$ with $6^5$ we made the numerator six times smaller, but by replacing $6!$ with $5!$ in the denominator we made the fraction six times bigger. Don't be bothered by this too much, it's just a coincidence.
We can also compute that

$$P(5 \le \mathrm{Pois}(6) \le 7) = e^{-6}\left(\frac{6^5}{5!} + \frac{6^6}{6!} + \frac{6^7}{7!}\right) \approx 0.46,$$
so we see that even with an entire block delay as a margin of error, the probability we fall within the margin remains below half! That's a testimony to how noisy block production is.
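The margin computation can be verified the same way; this sketch simply sums the three Poisson probabilities:

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability that an hour (six block delays) contains between 5 and 7 blocks:
print(sum(poisson_pmf(6, k) for k in range(5, 8)))  # ~0.459
```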
Block Delays and Exponential Distribution
Instead of asking ourselves how many blocks we expect to see in a given time period, we can ask ourselves how long we will wait for the next block.
To discuss this, it is convenient to shift our perspective a bit and replace rates with delays. Instead of saying "we expect $\lambda$ blocks per hour," we can say "we expect a block once every $1/\lambda$ hours." The quantity $1/\lambda$ is just the block delay we described earlier.
Up to now, when we described a distribution, we just stated, for each value, the probability of seeing that value: e.g., the probability that we see five blocks within one block delay.
If we try that approach here, we run into an annoying obstruction: for any $t$, the probability that we waited exactly $t$ seconds is zero. Why? Because time is continuous. The probability that a block was created at some specific moment is as likely as hitting an infinitesimally small point with an infinitely thin arrow.
Without diving into the formalities, the correct way to reason about continuous quantities is to consider ranges. Instead of asking futile questions about the exact value of a block delay (which will inevitably lead to a heated philosophical argument about whether an exact instant in time is even well defined in our physical reality), we can more reasonably ask something like: "what is the probability we wait between 599 and 601 seconds?"
It turns out that if the block rate is $\lambda$ blocks per minute (so the number of blocks we see in a minute distributes like $\mathrm{Pois}(\lambda)$), then the amount of time we wait distributes according to what we call an exponential distribution. We denote by $\mathrm{Exp}(\lambda)$ the continuous random variable with the property that for every $a, b$ satisfying $0 \le a \le b$ we have that

$$P(a \le \mathrm{Exp}(\lambda) \le b) = e^{-\lambda a} - e^{-\lambda b}.$$
Note that we are allowing $b = \infty$, with the understanding that $e^{-\lambda\cdot\infty} = 0$.
It turns out that this random variable perfectly represents the waiting time for a block, given that the average block delay is $1/\lambda$. For example, if we want to know the probability that we wait between 599 and 601 seconds for the next Bitcoin block, we choose $a = 599/600$ and $b = 601/600$ (because we measure time in Bitcoin block delays of 600 seconds, so that $\lambda = 1$) and find that the answer is

$$P\left(\tfrac{599}{600} \le \mathrm{Exp}(1) \le \tfrac{601}{600}\right) = e^{-599/600} - e^{-601/600} \approx 0.0012,$$
which is a meaningful number we can work with.
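Here is a minimal sketch of this interval formula (the helper name `exp_interval_prob` is mine), applied to the example above:

```python
import math

def exp_interval_prob(lam: float, a: float, b: float) -> float:
    """P(a <= Exp(lam) <= b) = e^(-lam*a) - e^(-lam*b).
    Passing b = math.inf is fine: math.exp(-inf) evaluates to 0."""
    return math.exp(-lam * a) - math.exp(-lam * b)

# Waiting between 599 and 601 seconds for a Bitcoin block, measuring time
# in block delays of 600 seconds so that lam = 1:
print(exp_interval_prob(1.0, 599 / 600, 601 / 600))  # ~0.0012
```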
Exercise*: Assuming that the block rate of mining distributes like $\mathrm{Pois}(\lambda)$, prove that the block delay distributes like $\mathrm{Exp}(\lambda)$.
So what can we compute with this formula?
Let's start with an easy one: what is the probability we wait more than an hour for the next Bitcoin block?
To compute this, we plug in $a = 6$ (as there are six Bitcoin block delays in one hour) and $b = \infty$ (as we want to know the probability we waited any amount of time longer than an hour) to get the result $e^{-6} \approx 0.0025$. This means we should see such delays about once per 400 blocks, or two to three times a week.
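A quick numerical check of this estimate (the per-week figure assumes roughly 1008 Bitcoin blocks per week):

```python
import math

# Probability of waiting more than an hour (six block delays) for the next
# Bitcoin block: a = 6, b = infinity, lam = 1 (time measured in block delays).
p = math.exp(-6)       # ~0.0025
print(p)
print(1 / p)           # roughly one such wait per ~400 blocks
print(p * 6 * 24 * 7)  # ~2.5 such waits per week at ~1008 blocks/week
```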
A good next question is: what is the probability that the next block is created within a block delay? Let us compute:

$$P(\mathrm{Exp}(1) \le 1) = 1 - e^{-1} \approx 0.63.$$
We see that almost two-thirds of the time, the delay between blocks will be shorter than expected.
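Numerically, under the same convention of measuring time in block delays:

```python
import math

# P(Exp(1) <= 1) = 1 - e^(-1): the probability that the next block arrives
# within one block delay.
print(1 - math.exp(-1))  # ~0.632
```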
Another good question is how fast the majority of blocks arrive. In other words, how long do we have to wait so that the probability we have seen a block is exactly one half? We already know this should be less than a block delay. To get an exact answer we need to solve for $t$ the equation $1 - e^{-t} = 1/2$, and I'll leave it to you to verify that the answer is $t = \ln 2 \approx 0.69$. Nice! We get that half of the blocks arrive within about $0.69$ block delays, which for Bitcoin is roughly 6.9 minutes. That is, a majority of blocks arrive significantly earlier than expected. This sheds a bit of light on why Bitcoin's block delay has to be so long!
So how bad is the problem? How short is short enough that, say, 99% of the time the block delay will be longer? To work this out we solve for $t$ the equation $1 - e^{-t} = 0.01$ to find that the answer is just above $0.01$ block delays, or about six seconds! That's right: one in 100 Bitcoin blocks will be discovered within six seconds, despite the block delay being 10 minutes.
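Both of the last two thresholds come from inverting $1 - e^{-t} = q$, i.e. $t = -\ln(1 - q)$; here is a sketch (helper names are mine) that also converts the answers into minutes and seconds for Bitcoin:

```python
import math

BLOCK_DELAY_SECONDS = 600  # Bitcoin's 10-minute block delay

def waiting_time_quantile(q: float) -> float:
    """Solve 1 - e^(-t) = q for t, with t measured in block delays."""
    return -math.log(1 - q)

median = waiting_time_quantile(0.5)   # ln 2 ~ 0.693 block delays
fast = waiting_time_quantile(0.01)    # ~0.01005 block delays
print(median * BLOCK_DELAY_SECONDS / 60)  # ~6.9 minutes
print(fast * BLOCK_DELAY_SECONDS)         # ~6.0 seconds
```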
We can work out many things with these simple yet powerful formulae, which are always good to have in your toolbox. I hope that I have at least convinced you that there is a concrete reason why many people are concerned with reducing block times in Bitcoin.