References
Blelloch, Guy E. 1990. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190. School of Computer Science, Carnegie Mellon University.
Cooley, James W., and John W. Tukey. 1965. “An Algorithm for the Machine Calculation of Complex Fourier Series.” Mathematics of Computation 19 (90): 297–301. https://doi.org/10.1090/S0025-5718-1965-0178586-1.
Gu, Albert, and Tri Dao. 2024. “Mamba: Linear-Time Sequence Modeling with Selective State Spaces.” Conference on Language Modeling (COLM). https://arxiv.org/abs/2312.00752.
Gu, Albert, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. 2020. “HiPPO: Recurrent Memory with Optimal Polynomial Projections.” Advances in Neural Information Processing Systems (NeurIPS) 33. https://arxiv.org/abs/2008.07669.
Gu, Albert, Karan Goel, and Christopher Ré. 2022. “Efficiently Modeling Long Sequences with Structured State Spaces.” International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2111.00396.
Gu, Albert, Ankit Gupta, Karan Goel, and Christopher Ré. 2022. “On the Parameterization and Initialization of Diagonal State Space Models.” Advances in Neural Information Processing Systems (NeurIPS) 35. https://arxiv.org/abs/2206.11893.
Gu, Albert, Isys Johnson, Karan Goel, et al. 2021. “Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State-Space Layers.” Advances in Neural Information Processing Systems (NeurIPS) 34: 572–85. https://arxiv.org/abs/2110.13985.
Gupta, Ankit, Albert Gu, and Jonathan Berant. 2022. “Diagonal State Spaces Are as Effective as Structured State Spaces.” Advances in Neural Information Processing Systems (NeurIPS) 35. https://arxiv.org/abs/2203.14343.
Hager, William W. 1989. “Updating the Inverse of a Matrix.” SIAM Review 31 (2): 221–39. https://doi.org/10.1137/1031049.
Hwang, Sukjun, Aakash Lahoti, Ratish Puduppully, Tri Dao, and Albert Gu. 2024. “Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers.” Advances in Neural Information Processing Systems (NeurIPS) 37. https://arxiv.org/abs/2407.09941.
Kailath, Thomas. 1980. Linear Systems. Prentice-Hall.
Lieber, Opher, Barak Lenz, et al. 2024. Jamba: A Hybrid Transformer-Mamba Language Model. https://arxiv.org/abs/2403.19887.
NVIDIA. 2026. Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning. https://arxiv.org/abs/2604.12374.
Oppenheim, Alan V., Ronald W. Schafer, and John R. Buck. 1999. Discrete-Time Signal Processing. 2nd ed. Prentice Hall.
Pan, Victor Y. 2001. Structured Matrices and Polynomials: Unified Superfast Algorithms. Birkhäuser. https://doi.org/10.1007/978-1-4612-0129-8.
Smith, Jimmy T. H., Andrew Warrington, and Scott W. Linderman. 2023. “Simplified State Space Layers for Sequence Modeling.” International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2208.04933.
Szegő, Gábor. 1939. Orthogonal Polynomials. Vol. 23. American Mathematical Society Colloquium Publications. American Mathematical Society. https://doi.org/10.1090/coll/023.
Tustin, A. 1947. “A Method of Analysing the Behaviour of Linear Systems in Terms of Time Series.” Journal of the Institution of Electrical Engineers 94 (1): 130–42. https://doi.org/10.1049/ji-2a.1947.0020.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, et al. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems (NeurIPS) 30: 5998–6008. https://arxiv.org/abs/1706.03762.