State Space Models
  • Preface
  • Acknowledgements
  • Reader’s Guide
  • Notation & Preliminaries
  • Roadmap & Contributing
  • 1. Introduction
    • 1.1 Sequences and memory
    • 1.2 Recurrence, convolution, and attention
    • 1.3 The state space route
    • 1.4 Why structure is needed
    • 1.5 State space layers and attention
    • Footnotes and references
    • Exercises
  • 2. State as Memory
    • 2.1 What problem is the model trying to solve?
    • 2.2 The state equation
    • 2.3 What does it mean for the state to be memory?
    • 2.4 Notation
    • Exercises
    • Footnotes and references
  • 3. Continuous-Time Linear State Spaces
    • 3.1 Solving the differential equation
    • 3.2 Lag and the impulse response
    • 3.3 How the eigenvalues of A shape memory
    • 3.4 Fading memory and oscillatory memory
    • 3.5 Linearity and time-invariance
    • 3.6 From a scalar input to model dimension d_model
    • 3.7 Notation
    • Exercises
    • Footnotes and references
  • 4. Discretisation
    • 4.1 Why discretisation is needed
    • 4.2 Sampling and the exact one-step formula
    • 4.3 Holding the input constant between samples
    • 4.4 A second route: averaging the state dynamics
    • 4.5 Eigenvalues and the step size as a timescale
    • 4.6 From recurrence to kernel
    • 4.7 Notation
    • Exercises
    • Footnotes and references
  • 5. Recurrence, Convolution, and Toeplitz Maps
    • 5.1 The question after discretisation
    • 5.2 From the recurrence to a convolution
    • 5.3 What the discrete kernel means
    • 5.4 The convolution matrix
    • 5.5 Why a fixed kernel requires time-invariance
    • 5.6 Cost of recurrence and convolution
    • 5.7 The kernel-generation problem
    • 5.8 Same model, two algorithms
    • 5.9 Notation
    • Exercises
    • Footnotes and references
  • 6. Transfer Functions and Kernel Generation
    • 6.1 Why another view is useful
    • 6.2 The transfer function in diagonal coordinates
    • 6.3 The discrete generating function
    • 6.4 The discrete generating function in diagonal coordinates
    • 6.5 Finite kernels as polynomials
    • 6.6 What the resolvent changes
    • 6.7 Notation
    • Exercises
    • Footnotes and references
  • 7. The Memory Problem
    • 7.1 A fixed state is not a stored history
    • 7.2 Slow decay is not enough
    • 7.3 Approximating the history by projection
    • 7.4 Moving the history to a fixed interval
    • 7.5 A polynomial basis and its online dynamics
    • 7.6 Notation
    • Exercises
    • Footnotes and references
  • 8. HiPPO
    • 8.1 From projection to a state equation
    • 8.2 The shifted Legendre basis
    • 8.3 Differentiating the coefficients
    • 8.4 The HiPPO-LegS matrix
    • 8.5 What the state contains
    • 8.6 The time factor
    • 8.7 Notation
    • Exercises
    • Footnotes and references
  • 9. Structured State Matrices
    • 9.1 The cost of density
    • 9.2 Why diagonalisation would help
    • 9.3 Stable diagonalisation and cheap corrections
    • 9.4 The HiPPO matrix is normal plus rank one
    • 9.5 Why the DPLR form is useful
    • 9.6 Complex coordinates
    • 9.7 Notation
    • Exercises
    • Footnotes and references
  • 10. The S4 Kernel Algorithm
    • 10.1 The computational target
    • 10.2 From kernel coefficients to Fourier samples
    • 10.3 The bilinear discretisation
    • 10.4 The diagonal resolvent as a Cauchy product
    • 10.5 The low-rank correction
    • 10.6 Assembling the kernel and its cost
    • 10.7 Notation
    • Exercises
    • Footnotes and references
  • 11. Diagonal State Spaces: DSS, S4D, and S5
    • 11.1 Removing the correction
    • 11.2 The continuous-time diagonal kernel
    • 11.3 The discrete-time kernel and its Vandermonde form
    • 11.4 Complex modes and real kernels
    • 11.5 Why diagonalisation alone is not enough
    • 11.6 Structured diagonal initialisations
    • 11.7 Stable parameterisation
    • 11.8 Shared state and the parallel scan
    • 11.9 What the diagonal model preserves
    • 11.10 Notation
    • Exercises
    • Footnotes and references
  • References
Front cover of the book State Space Models: the title in orange above an abstract spiral motif.
State Space Models

A guide to structured memory and complex dynamics in deep learning

Open Edition · 2026


DOI 10.5281/zenodo.20736327

Changelog

  • [Jun 2026] First public preview. Foundations and Structured State Spaces & S4 chapters available.

Authors

Cosmo Santoni, Imperial College London

Cite this book

@book{santoni2026ssm,
  author = {Santoni, Cosmo},
  title  = {State Space Models: A guide to structured memory and complex dynamics in deep learning},
  year   = {2026},
  note   = {Open Edition},
  doi    = {10.5281/zenodo.20736327},
  url    = {https://ssm.guide}
}


© 2026 Cosmo Santoni. Text under CC BY 4.0; code under Apache 2.0.