Worked example of automata runs as sheaves

This post is about "sheaf semantics," a way to model, understand, and talk about dynamic phenomena. The situation is that you have some representation of such phenomena - possibly partial - and you would like to assign meaning to it, and want a mathematical/logical model for that purpose. Here, we present one such possible mathematical model. This is a semantic technique - we do not consider the representation you may have on your hands, the data, evidence, or proof, only the model.

As we will see, sheaves can help describe observations of dynamic phenomena in a precise way — even if these observations are partial, ambiguous, or admit multiple interpretations. The definition of a sheaf is technical and presupposes familiarity with a bunch of abstract math. Many people who need to describe "observations of dynamic phenomena" as part of their day job will not find it very accessible.

I am going to use automata theory as an example since I assume it is understood by a lot more people (informaticians, programmers) than sheaves are. There are textbooks on regular expressions and automata, but I am quite sure there are none that connect them to sheaves. I am not advocating that automata theory be taught this way, but I believe this might be a good way to introduce a method and vocabulary that one may find helpful to describe (fragments of) reality. For anyone who shares the motivation but finds this example simplistic, Joseph A. Goguen's paper "Sheaf semantics for concurrent interacting objects" may be a good thing to read next. $\newcommand{\redq}{\color{red}{q}}$ $\newcommand{\redQ}{\color{red}{Q}}$

Warm-up: Deterministic Automata

This is (roughly) how I learnt about deterministic automata (state machines):

  • we have a set $\Sigma$ of input symbols $a, b, \ldots$ called the alphabet. $\Sigma^*$ is the set of sequences $w$ of such letters (or, if you prefer, the free monoid on generators $\Sigma$), with $\epsilon$ the empty word,
  • a set $\redQ$ of states (assumed finite),
  • a transition function $\delta: \redQ \times \Sigma \rightarrow \redQ$.

We extend the transition function $\delta: \redQ \times \Sigma \rightarrow \redQ$ to a transition function $\delta^*: \redQ \times \Sigma^* \rightarrow \redQ$ as usual, meaning: $$\begin{array}{ll} \delta^*(q, \epsilon) &= q \\ \delta^*(q, aw) &= \delta^*(\delta(q, a), w) \\ \end{array} $$

This extended transition function tells us where we end up if we start in a state $\redq$ and feed it a word.
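As a sketch, $\delta^*$ is just a fold of $\delta$ over the word. Below, a hypothetical two-state "door" automaton (state and symbol names are my own illustration) in Python:

```python
from functools import reduce

# delta: (state, symbol) -> state, given as a dictionary.
delta = {
    ("closed", "open_it"):  "open",
    ("closed", "close_it"): "closed",
    ("open",   "open_it"):  "open",
    ("open",   "close_it"): "closed",
}

def delta_star(q, word):
    """delta*(q, epsilon) = q;  delta*(q, a w) = delta*(delta(q, a), w)."""
    return reduce(lambda state, a: delta[(state, a)], word, q)
```

For example, `delta_star("closed", ["open_it", "open_it", "close_it"])` ends in the state `"closed"`.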

[Figure: a finite state machine with two states, door open and door closed]

So far so good. Now, let's imagine that we do not only want the destination state, but also the sequence of states we passed through.

We can write a sequence of alternating state and input symbols $$[ \redq_{\color{red}{(0)}} ] \ \ a_{(0)} \redq_{\color{red}{(1)}} ~ a_{(1)} \redq_{\color{red}{(2)}} ~ \ldots ~ a_{(n-1)} \redq_{(n)}$$ such that $\redq_{(0)} = \redq_0$ and $\delta(\redq_{\color{red}{(i)}}, a_{(i)}) = \redq_{\color{red}{(i+1)}}$. Since $\redq_{(0)}$ is always $\redq_0$, it does not add much value; let's define a run as being this sequence, without $\redq_{(0)}$: $$(a_{(0)}, \redq_{\color{red}{(1)}}) ~ (a_{(1)}, \redq_{\color{red}{(2)}}) ~ \ldots ~ (a_{(n-1)}, \redq_{(n)})$$

The set of all possible runs counts as a "behavior" of the automaton described by the transition function $\delta$ and starting state $\redq_0$. Since $\delta$ is a function, a given word $w$ will correspond to exactly one run, not more, not less.

Let's write down the set of all runs, for a non-empty word $w$: $$ \mathrm{runs}(w) = \{ v\ \in (\Sigma \times \redQ)^*\ \ |\ \mathit{IsRun}(v)\ \}$$ where $\mathit{IsRun}$ is the constraint $v_{(i)} = (a_{(i)}, \delta(\redq_{(i)}, a_{(i)}))$ for $0 \leq i \lt n$, with $\redq_{(i)}$ being either $\redq_0$ (for $i = 0$) or the state component of $v_{(i-1)}$. So far, this is not new or different from the $\delta^*$ definition above.
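Computationally, the unique run on a word can be sketched as follows (the door automaton used as $\delta$ is my own illustration):

```python
def run(q0, delta, word):
    """Return the run (a_(0), q_(1)) ... (a_(n-1), q_(n)),
    dropping the redundant starting state q_(0) = q0."""
    out, q = [], q0
    for a in word:
        q = delta[(q, a)]          # q_(i+1) = delta(q_(i), a_(i))
        out.append((a, q))
    return out

delta = {("closed", "open_it"): "open",   ("open", "close_it"): "closed",
         ("closed", "close_it"): "closed", ("open", "open_it"): "open"}
```

`run("closed", delta, ["open_it", "close_it"])` yields `[("open_it", "open"), ("close_it", "closed")]`.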


Let us take a different path to this definition of "set of runs". We call prefix domain any set $U$ of natural numbers that is downward-closed with respect to the normal ordering of natural numbers. Think of such sets as being assembled in a space $X$ like $\{\{\}, \{0\}, \{0, 1\}, \ldots, \{0, ..., n - 1\}, \ldots\} \cup \{\omega\}$ that is closed under pair-wise intersection and arbitrary union. The letter $\omega$ here stands for the set of all natural numbers, and by "arbitrary" we mean that even an infinite union of prefix domains would be a prefix domain. The set $\{0, ..., n - 1\}$ is abbreviated as $[n]$, with $[0]$ being the empty set.

The conditions do not require that "all" possible prefixes are present in $X$. A space of prefix domains can be finite or have gaps, like $\{[0], [1], [2], [5]\}$.

A word of length $n$ can be described as a function $w: [n] \rightarrow \Sigma$. Since we want to talk about the run on a word, let's consider an "observation" over $U$ to be a function that maps each point of the prefix domain to the pair $(a, \redq)$ at that position. These functions are the reason why we called those sets "domains". In symbols: $$\mathcal{O}(U) = \{~ f: U \rightarrow \Sigma \times \redQ ~|~ \mathit{IsRun}(f) \ \} $$ where $\mathit{IsRun}$ are the same constraints as before:

  • if $0 \in U$ and $f(0) = (a, \redq)$, then $\redq = \delta(\redq_0, a)$
  • if $u \in U$ with $u > 0$, $f(u-1) = (\_, \redq)$ and $f(u) = (a, \redq')$, then $\redq' = \delta(\redq, a)$

This really is saying the same thing as before, except that for any position $u > 0$, the constraint on $f(u)$ refers directly to $f(u-1)$, which we can do because $U$ is downward-closed.

The functions $f$ are called sections of $\mathcal{O}$ over $U$.
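As a brute-force sketch (not an efficient algorithm), $\mathcal{O}([n])$ can be enumerated by constructing, for each word, the function forced by the constraints; since $\delta$ is a total function here, there is exactly one section per word of length $n$. The door automaton is again my own example:

```python
import itertools

delta = {("closed", "open_it"): "open",   ("open", "close_it"): "closed",
         ("closed", "close_it"): "closed", ("open", "open_it"): "open"}

def sections(n, q0, delta, alphabet):
    """All f: [n] -> Sigma x Q satisfying IsRun, for a deterministic delta."""
    result = []
    for word in itertools.product(alphabet, repeat=n):
        f, q = [], q0
        for a in word:
            q = delta[(q, a)]      # the state component is forced by delta
            f.append((a, q))
        result.append(tuple(f))
    return result
```

With a total $\delta$ there are $|\Sigma|^n$ sections over $[n]$, and exactly one over $[0]$ (the empty function).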

Restriction and Contravariance

For prefix domains $U, V$, if we have $U \subseteq V$, then $U$ is — you guessed it — a prefix domain for a prefix of $V$. Write $i: U \rightarrow V$ for the inclusion of $U$ into $V$.

To this inclusion corresponds an operation we can do on any section $f \in \mathcal{O}(V)$, namely the restriction $f\big|_U: U \rightarrow \Sigma \times \redQ$ which is the same as $f$ but only defined on domain $U$.

This correspondence can be written as $\mathcal{O}(i): \mathcal{O}(V) \rightarrow \mathcal{O}(U)$. Note the direction of the arrow.

With a few mild conditions regarding the "identity inclusion" and composition, we can consider $\mathcal{O}$ a contravariant functor. And any contravariant functor $\mathcal{O}: X^{op} \rightarrow \mathsf{Sets}$ for a topological space $X$ is called a presheaf.
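For prefix domains, restriction is simply truncation of the section, and the identity and composition laws can be checked directly. A minimal sketch, with an arbitrary example section:

```python
def restrict(f, m):
    """O(i): O([n]) -> O([m]) for the inclusion [m] -> [n], with m <= n.
    Sections over [n] are tuples of length n, so restriction truncates."""
    assert m <= len(f)
    return f[:m]

# Functor laws, checked on an arbitrary example section:
f = (("a", 1), ("b", 2), ("a", 3))
assert restrict(f, len(f)) == f                       # identity inclusion
assert restrict(restrict(f, 2), 1) == restrict(f, 1)  # composition (contravariantly)
```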

Non-determinism and the Sheaf Condition

For a presheaf to be considered a sheaf, it needs to satisfy the sheaf condition: (1) locality and (2) gluing. Before we get to these, we want to consider non-determinism.

A nondeterministic automaton has a transition relation $\delta \subseteq \redQ \times \Sigma \times \redQ$ instead of a function.

For figuring out the set of runs, we could now keep track of the sets of states the automaton can be in. Let's not do that and use our presheaves instead: $$\mathcal{O}(U) = \{~ f: U \rightarrow \Sigma \times \redQ ~|~ \mathit{IsRun}(f) \ \} $$ where $\mathit{IsRun}$ is very similar to the constraint we had before, just adapted to the non-determinism that comes from $\delta$ being a relation:

  • if $0 \in U$, then $f(0) = (a, \redq)$ for some $a, \redq$ with $(\redq_0, a, \redq) \in \delta$
  • for all $u \in U$, if $u > 0$ and $f(u-1) = (\_, \redq)$ and $f(u) = (a, \redq')$ then $(\redq, a, \redq') \in \delta$

The determinism is gone, as intended: we describe the set of runs, but if someone showed us a suffix of a particular run, we would not (in general) be able to tell what happened before. Depending on $\delta$, we may be able to make some inference.
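A sketch of the nondeterministic $\mathcal{O}([n])$, enumerating all sections over a transition relation (the tiny relation below is my own example):

```python
def nd_sections(n, q0, delta_rel):
    """All f: [n] -> Sigma x Q satisfying the relational IsRun constraints.
    delta_rel is a set of triples (q, a, q')."""
    secs = [()]
    for _ in range(n):
        secs = [f + ((a, r),)
                for f in secs
                for (p, a, r) in delta_rel
                # the source state must match the previous state (or q0 at position 0)
                if p == (f[-1][1] if f else q0)]
    return secs

# A nondeterministic example: from s, the symbol a may lead to s or to t;
# t has no outgoing transitions.
delta_rel = {("s", "a", "s"), ("s", "a", "t")}
```

`nd_sections(1, "s", delta_rel)` has two sections, reflecting the two possible successor states.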

Sheaf Condition

We almost have enough material to understand the sheaf condition. A family $(U_i)_{i \in I}$ of elements from our space covers $U$ if $U \subseteq \bigcup_{i \in I} U_i$.

  • locality: if $(U_i)_{i \in I}$ cover $U$ and $s, t \in \mathcal{O}(U)$ with $s\big|_{U_i} = t\big|_{U_i}$ for all $i$, then $s = t$.
  • gluing: if $(U_i)_{i \in I}$ cover $U$ and we have $s_i \in \mathcal{O}(U_i)$ such that they agree on the overlaps $U_i \cap U_j$, then there is a section $s \in \mathcal{O}(U)$ with $s\big|_{U_i} = s_i$ for all $i$.

For our space of prefix domains and automata runs, we shall find that it satisfies the sheaf condition (what can you say about covers of a prefix domain?).
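Since prefix domains are totally ordered by inclusion, compatible sections on a cover are nested, and gluing amounts to taking the longest piece. A minimal sketch, with sections represented as tuples:

```python
def glue(pieces):
    """Glue compatible sections on prefix domains [n_i] (sections are tuples).
    Agreement on overlaps means each shorter piece is a prefix of each longer one."""
    pieces = sorted(pieces, key=len)
    for s, t in zip(pieces, pieces[1:]):
        assert t[:len(s)] == s, "pieces do not agree on overlaps"
    return pieces[-1]
```

`glue([(("a", 1),), (("a", 1), ("b", 2))])` returns the longer section `(("a", 1), ("b", 2))`.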

Snippet Domains and More States

The prefix domains were chosen as a simple example, and the condition gets more interesting if we consider other spaces. Consider "snippet domains", which are contiguous intervals of natural numbers. Now the sheaf condition is more useful, as you can glue the snippets of a run together to obtain an actual run.
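A minimal sketch of gluing for snippet domains, representing sections as dictionaries from positions to pairs (the two snippets are illustrative):

```python
def glue_snippets(pieces):
    """Glue sections on snippet (interval) domains, given as dicts
    {position: (symbol, state)}; they must agree on overlaps."""
    glued = {}
    for piece in pieces:
        for u, val in piece.items():
            if u in glued and glued[u] != val:
                raise ValueError("pieces disagree on an overlap")
            glued[u] = val
    return glued

# Two overlapping snippets of one run, glued into a section on {0, 1, 2}:
left  = {0: ("a", "s"), 1: ("b", "t")}
right = {1: ("b", "t"), 2: ("a", "s")}
```

`glue_snippets([left, right])` succeeds because the snippets agree at position 1; had they disagreed there, no glued section would exist.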

From automata (finite state machines), one can start generalizing to other structures. What if the state space is very large (parameterized in some way)? Or we leave certain things unspecified or asynchronous like in I/O automata.

Modeling observations as sheaves means defining a suitable domain for fragments of observation, along with mappings that provide the observation data, in a way that permits the combination of the fragments into a "complete observation". These domains do not have to involve time, but tracking behavior over time is an area where formal approaches quickly get unwieldy and where the intuitive sense we can make of the sheaf condition can prove helpful. There is a whole, and the sections describe its "parts". When we solve a jigsaw puzzle, we are taking advantage of a gluing condition of sorts.

Many of these advantages remain in effect when working only with presheaves. Consider transitions between states that are labeled not with symbols, but with expressions. This opens up the direction of modeling concurrent processes, as Joyal, Nielsen, and Winskel describe in "Bisimulation from Open Maps".


In math, it seems to be well known and somewhat accepted that knowledge acquired by some same-generation group of researchers may get "lost" if it does not make it into textbooks: it appears in some monographs and is available to specialists, but from undergrad to grad school to full professor, people need to traverse a rocky path to reach an understanding that was a mere baseline in that same-generation group of researchers.

In informatics, we have means of codifying knowledge in the form of programs and type systems. It is then ironic that formalization itself leads to "loss", since the various syntactic and otherwise insignificant choices lead to fragmentation among researchers. This can be observed in the modeling of concurrent systems, where the scientific process has led to whole subfields where specialists in one area could not be bothered to learn the language of the others.

If we want to preserve and codify knowledge in ways that do not get fragmented in space or time, it may be good to make use of more abstract modeling languages, as provided by category theory, and to leverage the succinctness that comes from aspects that are left informal (semantic techniques). Presheaves seem like a technique where the formal and the informal can be mixed in ways that are useful in this sense.


What are Commuting Conversions?

The term "commuting conversions" in proof theory refers to a proof transformation of natural deduction proofs that involve sums (disjunction and existential formulae). The motivation for these comes from a desirable equivalence relation among natural deduction proofs that is lost when $\vee$ and $\exists$ are added. \(\newcommand{\orelim}[0]{\color{blue}{(\vee{\mathrm{\small ELIM}})}}\) Consider the following natural deduction rules $$ \cfrac{\begin{array}[cc]{} \\ {} \\ A \vee B \end{array} \quad \begin{array}[cc]{} [A] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array}\quad \begin{array}[cc]{} [B] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array} } {C}~\color{blue}{(\vee{\mathrm{\small ELIM}})} $$ $$ \cfrac{\begin{array}[cc]{} \\ {} \\ \exists x . (x)\end{array} \quad \begin{array}[c]{} [P(t)] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array}} {C}~\color{blue}{(\exists{\mathrm{\small ELIM}})} $$ When the formula $C$ is not the end of the proof, but there is another formula $W$ that is derived, then we get multiple proofs to derive the same thing. These multiple proofs of the same thing are all normal in the sense that there are no "detours" in the form of implication introduction and elimination. If we want a normal form to be unique, then we must necessarily pick one among these possible proof and designate it as "normal." This happens by adding additional conversions. 
$$ \cfrac{\begin{array}[cc]{} \\ {} \\ A \vee B \end{array} \quad \begin{array}[cc]{} [A] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array}\quad \begin{array}[cc]{} [B] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \end{array} } {\cfrac{~C~}{~W~}} ~ \rightsquigarrow ~ % \cfrac{\begin{array}[cc]{} \\ {} \\ A \vee B \end{array} \quad \begin{array}[cc]{} [A] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \\ \hline \phantom{[}W\phantom{]} \end{array}\quad \begin{array}[cc]{} [B] \\ ~~\vdots \\ \phantom{[}C\phantom{]} \\ \hline \phantom{[}W\phantom{]} \end{array} } {W}$$ $$ \cfrac{\begin{array}[cc] {} & [P(t)] \\ {} & \vdots \\ \exists x . P(x) & C \\ \end{array}}{\cfrac{~C~}{~W~}} ~ \rightsquigarrow ~ \cfrac{\begin{array}[cc] {} & [P(t)] \\ {} & \vdots \\ {} & C \\ \exists x . P(x) & \overline{~W~} \\ \end{array}}{W} $$

Pattern matching example

Programmers may appreciate the following presentation of the same thing in Scala-like syntax. This:
$\color{blue}{W}$($r$ match { case Left(a) => $\color{blue}{s}$; case Right(b) => $\color{blue}{t}$ })
is the same as this:
$r$ match { case Left(a) => $\color{blue}{W(s)}$; case Right(b) => $\color{blue}{W(t)}$ }