The Math of Block Creation
The purpose of this section is two-fold:
Use block creation to understand the Poisson and exponential distributions
Use the Poisson and exponential distributions to understand block creation
Poisson Processes
A Poisson process describes any process during which something happens at a known rate, but we can't know exactly when it happens, just how often. A common example is radioactive decay. Emission of radioactive particles is a memoryless process: knowing when particles were emitted before tells us nothing about when the next particle will be emitted. Other common examples include people arriving at a store, or typos in a book. We assume the time/place each happens is random and independent, but that there is some fixed rate.
Appropriately, the Poisson distribution has a rate parameter $\lambda$ that tells us how often we expect the event to happen within an agreed upon unit of time (say, an hour). We want the random variable $X \sim \mathrm{Pois}(\lambda)$ to distribute like the number of events we see in an hour, given that we have $\lambda$ events an hour on average.
It turns out that the correct distribution is given by

$$\Pr[X = k] = e^{-\lambda}\frac{\lambda^k}{k!}.$$
We can arrive at this formula by dividing our "hour" into small segments, and treating each like a Bernoulli trial.
Say we have an event that happens on average $\lambda$ times an hour. We split the hour into $n$ parts, making $n$ large enough that we feel comfortable ignoring the possibility that the event happens more than once during an $n$th of an hour. We note that each $n$th of an hour distributes the same, and for each one there is some probability $p$ that the event will happen therein. Hence, the number of times we see the event is approximated by $X_n \sim B(n, p)$, so we have that $\mathbb{E}[X_n] = np$. But we have computed that $\mathbb{E}[X] = \lambda$, so we get that $p = \lambda/n$.
The upshot is that $\mathrm{Pois}(\lambda)$ is the limit (for an appropriate sense of the word) of $B(n, \lambda/n)$ as $n$ approaches infinity. One can compute explicitly that this leads to the formula above, or you can just take my word for it.
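To make the limit concrete, here is a small Python sketch (the helper names are mine, not from the text) comparing the binomial probability $B(n, \lambda/n)$ to the Poisson probability as $n$ grows:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    """Probability of exactly k successes among n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """Poisson probability of exactly k events when lam are expected."""
    return exp(-lam) * lam**k / factorial(k)

lam, k = 6, 6
for n in (10, 100, 10_000):
    print(n, round(binom_pmf(n, lam / n, k), 5))
print("limit:", round(poisson_pmf(lam, k), 5))  # the binomial values approach this
```

Already at $n = 10{,}000$ the two probabilities agree to several decimal places.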
Exercise*: Prove that for any $k \ge 0$ we have that

$$\lim_{n\to\infty} \binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{n-k} = e^{-\lambda}\frac{\lambda^k}{k!}.$$
Block Creation is a Poisson Process
The way proof-of-work works dictates that block creation is actually a Poisson process. Say that the global hashrate is $h$ hashes per second. That is, in an average $h$th of a second, someone makes a single attempt to guess the block. And if we divide the second into many more segments, say $n$ for $n$ much larger than $h$, then it becomes extremely unlikely that two hashes were computed in the same segment.
Disregarding difficulty adjustment and such, we assume that the network is parametrized to create, on average, one block per block delay, where the block delay $\delta$ is determined by the system designer. For example, in Bitcoin we have that $\delta = 10$ minutes.
The random variable $X_t \sim \mathrm{Pois}(t)$ describes how block creation distributes over $t$ block delays. For example, the probability that we see nine Bitcoin blocks within 50 minutes (which are five block delays) is

$$\Pr[X_5 = 9] = e^{-5}\frac{5^9}{9!} \approx 0.036.$$
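As a quick sanity check, we can evaluate this with a few lines of Python (the helper name is mine):

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """Probability of exactly k events when lam are expected on average."""
    return exp(-lam) * lam**k / factorial(k)

# Nine blocks over five block delays (50 Bitcoin minutes):
print(round(poisson_pmf(5, 9), 3))  # → 0.036
```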
What is the probability that the network produces exactly six blocks in an hour? We expect it to be formidable, but if we compute it we find that

$$\Pr[X_6 = 6] = e^{-6}\frac{6^6}{6!} \approx 0.161,$$

which is actually quite low!
What if we allow some margin? Say we ask ourselves: what is the probability that between 5 and 7 blocks are created in an hour?
Well, the probability that exactly five blocks are created is

$$\Pr[X_6 = 5] = e^{-6}\frac{6^5}{5!} \approx 0.161.$$
Wait, why did we get the same result? Of course! By replacing $6^6$ with $6^5$ we made the answer smaller by a factor of 6, but by replacing $6!$ with $5!$ we made it larger by a factor of 6 (because we divide by a number six times smaller). It's just a curiosity of the Poisson distribution that should not bother us too much.
We can also compute that

$$\Pr[X_6 = 7] = e^{-6}\frac{6^7}{7!} \approx 0.138,$$

which gives us

$$\Pr[5 \le X_6 \le 7] \approx 0.161 + 0.161 + 0.138 = 0.459,$$

so we see that even with an entire block delay as a margin of error, the probability we fall within the margin remains below half! That's a testimony to how noisy block production is.
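The computation above can be reproduced in a few lines of Python (helper name mine):

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """Probability of exactly k events when lam are expected on average."""
    return exp(-lam) * lam**k / factorial(k)

# Probability of exactly 5, 6, and 7 blocks in an hour (six block delays):
probs = {k: poisson_pmf(6, k) for k in (5, 6, 7)}
for k, p in probs.items():
    print(k, round(p, 3))
print("total:", round(sum(probs.values()), 3))  # → 0.459
```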
Block Delays and Exponential Distribution
Instead of saying "we expect $\lambda$ blocks per hour" we can say "we expect a block once every $1/\lambda$ hours". The quantity $1/\lambda$ is just the block delay we described earlier. Instead of asking ourselves how many blocks we expect to see in a given time, we can ask ourselves how long we will wait for the next block.
This is described by what we call an exponential distribution $Y \sim \mathrm{Exp}(1)$, where $Y$ is measured in block delays. We would like to say something like "$\Pr[Y = \alpha]$ is the probability that we have waited exactly $\alpha$ block delays", which is technically correct but also extremely useless. You see, it turns out that for any $\alpha$ we have that $\Pr[Y = \alpha] = 0$. Why is this expected? Because the number $\alpha$ eventually expresses time, and time is continuous, so the probability of waiting any exact amount of time is zero. Without getting into the theory of continuous variables, I will just say that the useful representation of such variables is to ask what is the probability that they fall within some range. The question "what is the probability that we wait exactly ten minutes for a block?" is not interesting, but the question "what is the probability that we wait between nine and a half and ten and a half minutes?" is.
It turns out that if blocks are created according to the distribution $\mathrm{Pois}(\lambda)$, then the probability that we have to wait between $a$ and $b$ block delays is given by

$$\Pr[a \le Y \le b] = e^{-a} - e^{-b}.$$
Note that we are allowed to put $b = \infty$, with the understanding that $e^{-\infty} = 0$.
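A sketch of the interval formula, cross-checked against simulated waiting times (Python's standard `random.expovariate` samples exponentially distributed waits; the function name is mine):

```python
import random
from math import exp

def wait_prob(a, b):
    """P(a <= Y <= b) for a waiting time Y measured in block delays."""
    return exp(-a) - (0.0 if b == float("inf") else exp(-b))

# Cross-check the formula against 100,000 simulated waiting times:
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
a, b = 0.5, 1.5
empirical = sum(a <= y <= b for y in samples) / len(samples)
print(round(wait_prob(a, b), 3), round(empirical, 3))  # the two should agree closely
```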
Exercise*: Prove this formula
We derived the Poisson formula from first principles by assuming that the process is memoryless. The assumption that each interval is independent of the others is what allows us to reduce the Poisson process to a limit of binomial distributions. Can we similarly derive the exponential distribution formula from first principles? Yes, we can. The idea is that the amount of time we have to wait does not depend on how long we've been waiting already. Expressing this requires conditional probabilities, a topic we haven't touched. But the idea is that the probability we wait $\alpha$ block delays is exactly the probability that we wait a total of $\alpha + \beta$ block delays given that we already waited $\beta$ block delays. In conditional probability notation, the exponential distribution must satisfy the equation

$$\Pr[Y \ge \alpha + \beta \mid Y \ge \beta] = \Pr[Y \ge \alpha].$$
With a bit of work, one can show that the exponential distribution we described above is the only one that has this property.
So what does this tell us about block distribution? Let's start with an easy one: what is the probability we wait more than an hour for the next Bitcoin block?
To compute this, we plug in $a = 6$ (as there are six Bitcoin block delays in one hour) and $b = \infty$ (as we want to know the probability that we waited any time longer than an hour) to get the result $e^{-6} \approx 0.25\%$. This means that we should see such delays about once per 400 blocks, or roughly twice a week.
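In Python this is a one-liner:

```python
from math import exp

p = exp(-6)  # probability the wait exceeds six block delays (one Bitcoin hour)
print(round(p, 5), "-> once per", round(1 / p), "blocks")  # → 0.00248 -> once per 403 blocks
```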
A good next question is: what is the probability that the next block is created within a block delay? Let us compute:

$$\Pr[0 \le Y \le 1] = e^{0} - e^{-1} = 1 - \frac{1}{e} \approx 0.632.$$

We see that almost two thirds of the time, the delay between blocks will be shorter than expected.
A good next question is how fast the majority of blocks arrive. In other words, how long do we have to wait so that the probability we see a block is exactly half? We already know this should be less than a block delay. To get an exact answer we need to solve the equation $1 - e^{-\alpha} = 1/2$, and I'll leave it to you to verify that the answer is $\alpha = \ln 2 \approx 0.69$. Nice! We get that half of the blocks arrive within 6.9 minutes. That is, a majority of blocks are significantly earlier than expected. This sheds a bit of light on why Bitcoin's block delay has to be so long!
So how bad is the problem? How short is short enough that, say, $99\%$ of the time, the block delay will be longer? To work this out we solve the equation $1 - e^{-\alpha} = 1/100$ to find that the answer is just above $0.01$ block delays, or six seconds! That's right, one in 100 Bitcoin blocks will be discovered within six seconds, despite the block delay being 10 minutes.
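Both of the last two answers come from inverting $1 - e^{-\alpha} = p$, which we can sketch in Python (the helper name is mine):

```python
from math import log

def delay_quantile(p):
    """The wait alpha (in block delays) such that P(block within alpha) = p."""
    return -log(1 - p)

print(round(delay_quantile(0.5), 3))         # → 0.693 (the median wait, ~6.9 Bitcoin minutes)
print(round(delay_quantile(0.01) * 600, 1))  # seconds, for Bitcoin's 600-second block delay
```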
There are many more things that can be worked out using these simple yet powerful formulas, and it is always good to have them in your toolbox. I hope that I have at least convinced you that there is a concrete reason why many people are concerned with reducing block times in Bitcoin.