# A Graph Theory Primer

You have probably noticed that most of the mathematical exposition has been deferred to an [appendix](https://shai-deshe.gitbook.io/understanding-blockdags-and-ghostdag/supplementary-material/math). However, basic graph theory shall be the exception to the rule.

Graphs are important to understand even for people who only seek cursory, informal understanding. They aren't just a theoretical instrument, they underwrite all of the *visual language* used to illustrate and demonstrate pretty much every idea throughout this book. Understanding what they represent could be extremely valuable to the layperson. Besides, they are easy and fun, so why not?

## Graphs

A (directed) *graph* is nothing but a bunch of circles called *nodes* or *vertices,* and a bunch of arrows between them called *edges*. For example:

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FVlzbPetxDuvBEwSsziJj%2Fimage.png?alt=media&#x26;token=fb8c9d4c-caad-4264-a979-fdd6c9cb14f8" alt=""><figcaption><p>A whacky graph</p></figcaption></figure>

We are usually interested in less whacky graphs, so we assume there are no *multiple arrows* between *the two same vertices*, and no *loopy arrows* going from a vertex to itself, for example:

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FMwLRrgCpf4UjAqa3862X%2Fimage.png?alt=media&#x26;token=566f56c4-b4bf-446d-9320-d79d6cbc7489" alt=""><figcaption><p>a simple graph</p></figcaption></figure>

This is called a *simple* (directed) graph.

In most textbooks, directed graphs are considered an intermediate topic, and the discussions start from *un*directed graphs. These are graphs where for any two vertices $$A$$, $$B$$ there is an arrow from $$A$$ to $$B$$ if and only if there is an arrow from $$B$$ to $$A$$. In this case, the two arrows are compressed into a single, arrowless line. For example:<br>

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FtXRbCMyAElmK3FdH7Imn%2Fimage.png?alt=media&#x26;token=6ec5d0ce-f77e-46f6-993e-085e2685ae7d" alt=""><figcaption><p>an undirected graph</p></figcaption></figure>

Unlike most textbooks, we will be concerned almost exclusively with directed graphs, so we will employ a backward convention: a *graph* is assumed to be *directed* unless explicitly stated otherwise.

Usually undirected graphs are used to describe *reciprocal* affinities: for example, the graph whose nodes are Facebook users, and there is an edge between any two people if and only if they are friends on Facebook, is undirected. That's because being friends on Facebook is subject to *mutual agreement* and is thus reciprocal. In contrast, Twitter allows users to follow other users without being followed back, and this lack of reciprocity is manifest in the directedness of the graph.

## DAGs

We want to use graphs to describe *causality* or *chronology*. It makes sense that such a graph would be directed, because it does not make sense that an event $$A$$ caused or preceded event $$B$$ *while* event $$B$$ caused or preceded event $$A$$.

However, how will you understand the chronology described by this graph:

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FHCrRvrpu3q6NR4ghIZlP%2Fimage.png?alt=media&#x26;token=dd3af835-d2e4-46a8-b3e9-30f3dd7ac8a7" alt=""><figcaption><p>A closed timelike curve</p></figcaption></figure>

Does it say that event $$A$$ happened before event $$B$$ that happened before event $$C$$, that happened *before* event $$A$$?&#x20;

If we are being *completely* pedantic, the existence of [closed timelike curves](https://en.wikipedia.org/wiki/Closed_timelike_curve) is still very much an open question. Nevertheless, we very reasonably assume assumption that nodes running our blockchain will never traverse such a loop. Let us carefully define what this means, the terms we introduce for that purpose will accompany us for the rest of the book.

Given a simple directed graph, we say that the vertex $$B$$ is *reachable* from the vertex $$A$$ if there is some way to get from $$A$$ to $$B$$ by only following arrows. For example:

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2F1rqhEfATZOenIN4kS7m0%2Fimage.png?alt=media&#x26;token=992fe44e-41f9-4481-840b-b3a7c6cc0421" alt=""><figcaption><p><span class="math">A</span> and <span class="math">A'</span> are both reachable from <span class="math">B</span>. However, <span class="math">B</span> is reachable from <span class="math">A</span> but not from <span class="math">A'</span>.</p></figcaption></figure>

Each way to reach $$B$$ from $$A$$ is called a *path*. A path is just a list of arrows such that the first arrow leaves $$A$$, the last arrow enters $$B$$, and any arrow in between leaves the vertex entered by the previous arrow.

For example, in the graph above:

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FEjN99MuBCD9pt7n7qvvq%2Fimage.png?alt=media&#x26;token=51d31ce9-2f40-42d1-933e-411b2f9ec1f5" alt=""><figcaption><p>There is a single path from <span class="math">A</span> to <span class="math">B</span> (red), but three different paths from <span class="math">B</span> to <span class="math">A'</span> (orange, blue, and green)</p></figcaption></figure>

Note that in our convention, a path must have *at least one arrow*. Staying in place *doesn't count*. Under this convention, a *cycle* is a path from any vertex $$A$$ to itself. Given the name, the following definition should not come as a surprise.

**Definition**: A *directed acyclic graph (DAG)* is a simple directed graph with no cycles.

The DAGs that will interest us have one more useful property, they are *rooted*. That is, there is a vertex that we call $$G$$ (alluding to the genesis block) with the property that it is reachable from *all* other vertices. We call this vertex the *root* of the DAG.

Note that I assumed above that if the DAG is rooted then it only has one root. It is easy to prove that this is indeed the case, and there could not be a rooted DAG with two roots.

## Trees

In the context of block *chains*, we are interested in a particular type of rooted DAG called a *tree*:

**Definition**: A rooted DAG is called a *tree* if all vertices have at most one arrow leaving them

One can see that the definition is less flexible than seems. The properties of a rooted DAG actually imply that there are zero arrows leaving the root vertex, and one arrow leaving any other vertex. A node that has no arrow going *into* it is called a *tip* or a *leaf*.

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FwI0GX9LKB2grdyIK7vfl%2Fimage.png?alt=media&#x26;token=041634a9-de91-4b94-9cb8-c2201989d70d" alt=""><figcaption></figcaption></figure>

Given any node beside the root, we call the nodes it is pointing at its *parent.* Given any node that is not a tip, we call the nodes pointing at it its children. Note that a vertex can have many children but only one parent. Hence, for any non-root vertex $$B$$ we can define without ambiguity that $$B.Parent$$ is the *unique* vertex that $$B$$ points to:

<figure><img src="https://1222478707-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8yQ9stRzMBbRsCxJr0eJ%2Fuploads%2FDrcqU8hCXzlgKHoJzTfd%2Fimage.png?alt=media&#x26;token=b05ef17b-e2a6-476f-b155-b0c461401cf7" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
In pretty much any graph theory book you open, you will find an *opposite* convention. Trees are either undirected, or directed with the arrows pointing *away from the root*, making the parents further from the root than the children. This is not because computer scientists think that trees grow from foliage to root, but because trees are often used to describe many possible routes that all converge to the same conclusion, and from this vantage the root is the "end of time" and not the "beginning of time". However, in this book we assume that the root is the oldest block, encouraging an inverse convention.
{% endhint %}

Note that in a rooted tree there is only *one* path from any (non-root) vertex to the root (in the exercises you will show the converse, that a rooted DAG where there's only one path from any non-root vertex to the root is a tree). We call this path the *chain* of $$B$$, and denote it $$B.Chain$$.
