Reader’s Guide

This book assumes comfort with a few standard mathematical tools, but it does not assume prior study of state space models. The first chapter introduces the state, the continuous-time equation, the discrete recurrence, and the convolution kernel. Later chapters reuse those objects rather than reintroducing them from scratch.

Mathematical background

You should be able to multiply matrices and vectors, keep track of their shapes, and read a matrix as a linear map. Transposes are used freely, and inverses are used where they exist. Eigenvalues appear because repeated updates are controlled by the modes of the update matrix: applying the same matrix many times can make different components of a state grow, decay, or rotate.
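The role of eigenvalues in repeated updates can be checked numerically. The following is a minimal sketch, not one of the book's reference implementations: the matrices `A` and `R`, the starting vector `x0`, and the step count `n` are arbitrary values chosen for illustration. It compares applying `A` twenty times with scaling each eigen-direction by its eigenvalue raised to the twentieth power, and then shows that a scaled rotation has complex eigenvalues whose modulus is the scale factor.

```python
import numpy as np

# Hypothetical update matrix with one growing mode (1.1) and one decaying mode (0.5).
A = np.array([[1.1, 0.2],
              [0.0, 0.5]])
x0 = np.array([1.0, 1.0])

eigvals, eigvecs = np.linalg.eig(A)
coeffs = np.linalg.solve(eigvecs, x0)   # coordinates of x0 in the eigenvector basis

n = 20
x_direct = np.linalg.matrix_power(A, n) @ x0    # apply A to x0 n times
x_modes = eigvecs @ (eigvals ** n * coeffs)     # scale each mode by eigenvalue**n

# A rotation by theta, shrunk by 0.9 per step: the state rotates while it decays.
theta = np.pi / 6
R = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
rot_eigvals = np.linalg.eigvals(R)      # a complex-conjugate pair of modulus 0.9

print(x_direct, x_modes)
print(np.abs(rot_eigvals))
```

The two vectors agree up to rounding, which is the sense in which the eigenvalues control the long-run behaviour of the update.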

Single-variable calculus is used throughout. Derivatives, integrals, exponentials, and first-order ordinary differential equations appear early. The starting example is the scalar equation

\[ x'(t)=a x(t)+b u(t). \]

Here \(x(t)\) is the quantity being updated, \(u(t)\) is the input, \(a\) sets how \(x(t)\) changes when no input is present, and \(b\) sets how strongly the input changes it. In the vector equations later in the book, \(x(t)\) has several coordinates instead of one. Calling it a state just means that it is the variable carried forward by the model.
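To make the roles of \(a\) and \(b\) concrete, here is a small sketch under stated assumptions: the input is held constant at a value `u0`, \(a\) is nonzero, and the numbers are arbitrary. With a constant input the equation has the closed form \(x(t)=e^{at}x(0)+\tfrac{b u_0}{a}\,(e^{at}-1)\). The code compares that formula with a forward-Euler approximation, which is used here only as a simple numerical check and is not meant as the discretisation developed later in the book.

```python
import math

def exact_solution(a, b, x0, u0, t):
    """Closed form for x' = a x + b u with constant input u(t) = u0 and a != 0."""
    return math.exp(a * t) * x0 + (b * u0 / a) * (math.exp(a * t) - 1.0)

def euler_solution(a, b, x0, u0, t, steps):
    """Forward-Euler approximation of the same equation with a fixed step size."""
    dt = t / steps
    x = x0
    for _ in range(steps):
        x = x + dt * (a * x + b * u0)
    return x

# Hypothetical values: a < 0 gives decay, so x(t) settles towards -b * u0 / a.
a, b, x0, u0, t = -1.0, 2.0, 0.0, 1.0, 5.0
x_exact = exact_solution(a, b, x0, u0, t)
x_euler = euler_solution(a, b, x0, u0, t, steps=10_000)

print(x_exact, x_euler)
```

With these values \(x(t)\) rises from zero towards \(-b u_0 / a = 2\): the sign of \(a\) decides whether the state settles, and \(b\) scales how far the input pushes it.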

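Complex numbers appear when matrices have complex eigenvalues and when Fourier or transfer-function calculations are used. They are a calculation tool, not a separate modelling assumption. In this setting, complex eigenvalues mainly represent decay together with oscillation: for a continuous-time system, the real part of an eigenvalue sets the rate of decay or growth, and the imaginary part sets the frequency of oscillation.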
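A short numerical check may help here. The sketch below assumes a single continuous-time mode \(e^{\lambda t}\) with a hypothetical eigenvalue \(\lambda=-0.5+2i\); the names `lam`, `mode`, `envelope`, and `phase` are introduced only for this illustration. It confirms that the modulus of the mode is the real exponential \(e^{\operatorname{Re}(\lambda)t}\) and that the imaginary part of \(\lambda\) contributes only a rotation.

```python
import cmath
import math

# Hypothetical eigenvalue: real part -0.5 (decay), imaginary part 2.0 (oscillation).
lam = complex(-0.5, 2.0)

def mode(t):
    """The complex mode exp(lam * t)."""
    return cmath.exp(lam * t)

t = 1.5
value = mode(t)
envelope = math.exp(lam.real * t)   # decaying amplitude
phase = lam.imag * t                # accumulated rotation angle in radians

# Euler's formula: exp(lam * t) = envelope * (cos(phase) + i sin(phase)).
reconstructed = envelope * complex(math.cos(phase), math.sin(phase))

print(abs(value), envelope)
print(value, reconstructed)
```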

Generating functions appear as a compact way to represent a whole sequence of kernel coefficients. The relevant idea is that coefficients \(K_0,K_1,K_2,\dots\) can be packaged into a power series, so that algebra on the series can replace algebra on each coefficient separately.
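As an illustration of this packaging, consider a sketch under two assumptions that are not fixed by the text above: the coefficients are taken to be geometric, \(K_k = c\,a^k b\) for scalars \(a\), \(b\), \(c\), and the series is written in powers of \(z\) (some treatments use \(z^{-1}\) instead). In that case the whole sequence collapses to the single expression \(\sum_k K_k z^k = \tfrac{cb}{1-az}\) whenever \(|az|<1\), which the code checks against a truncated sum.

```python
def kernel_coefficients(a, b, c, length):
    """Hypothetical geometric kernel K_k = c * a**k * b for k = 0, ..., length - 1."""
    return [c * a**k * b for k in range(length)]

def truncated_series(coeffs, z):
    """Evaluate sum_k K_k * z**k over the coefficients that were supplied."""
    return sum(K_k * z**k for k, K_k in enumerate(coeffs))

def closed_form(a, b, c, z):
    """Sum of the full geometric series, valid when |a * z| < 1."""
    return c * b / (1.0 - a * z)

# Arbitrary illustrative values with |a * z| = 0.4 < 1.
a, b, c, z = 0.8, 1.0, 2.0, 0.5
coeffs = kernel_coefficients(a, b, c, length=200)
series_value = truncated_series(coeffs, z)
closed_value = closed_form(a, b, c, z)

print(series_value, closed_value)
```

One division replaces two hundred separate coefficients, which is the kind of saving the power-series view is meant to provide.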

Machine-learning background

The required neural-network background is limited. The book assumes that you know what a layer is, that parameters are learned by gradient-based training, and that a sequence model processes representations indexed by position. Attention, convolution, recurrence, S4, HiPPO, Mamba, and control-theory terminology are introduced when they are needed.

Notational conventions

Continuous time is written with parentheses, as in \(x(t)\). Discrete sequence positions are written with subscripts, as in \(x_k\). Matrices with bars, such as \(\bar A\) and \(\bar B\), are discretised versions of continuous-time matrices. The state dimension is \(N\), the sequence length is \(L\), and the neural representation width is \(\dmodel\).

Most calculations begin with one scalar input and one scalar output. This keeps the formulas short. Multi-input and multi-output versions use the same equations with wider input and output matrices. When the distinction matters, the shapes are stated explicitly.
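The shape bookkeeping can be sketched in a few lines. This example makes several assumptions for illustration: the recurrence is written as \(x_k=\bar A x_{k-1}+\bar B u_k\) with output \(y_k=C x_k\), the state starts at zero, and the matrices are random placeholders rather than a discretisation of any particular system. The helper `run_recurrence` is a hypothetical name, not part of the book's code.

```python
import numpy as np

N = 4   # state dimension
L = 6   # sequence length

rng = np.random.default_rng(0)

def run_recurrence(A_bar, B_bar, C, u):
    """Run x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k from a zero state."""
    x = np.zeros(A_bar.shape[0])
    outputs = []
    for u_k in u:                      # u has shape (L, number of inputs)
        x = A_bar @ x + B_bar @ u_k
        outputs.append(C @ x)
    return np.stack(outputs)           # shape (L, number of outputs)

A_bar = 0.5 * np.eye(N)                # placeholder (N, N) update matrix

# Single input, single output: B_bar is (N, 1) and C is (1, N).
y_siso = run_recurrence(A_bar, rng.normal(size=(N, 1)), rng.normal(size=(1, N)),
                        rng.normal(size=(L, 1)))

# Three inputs, two outputs: only the widths of B_bar and C change.
y_mimo = run_recurrence(A_bar, rng.normal(size=(N, 3)), rng.normal(size=(2, N)),
                        rng.normal(size=(L, 3)))

print(y_siso.shape, y_mimo.shape)
```

The same function handles both cases, since moving from one input and output to several only widens \(\bar B\) and \(C\).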

The notation page collects the recurring symbols. It is meant as a reference while reading, not as a chapter to memorise before starting.

Reading the chapters

The chapters are ordered so that each representation of a state space model is derived from the previous one: continuous-time equation, sampled recurrence, convolution kernel, transfer function, memory construction, and structured algorithms for S4 and diagonal models. A symbol should either be introduced in the local discussion or be listed on the notation page.

The code examples are reference implementations. They show the same calculation in NumPy, PyTorch, and JAX after the mathematics has been stated.