References

[1]
S. Lipschutz. General Topology (McGraw-Hill Book Company, New York City, New York, 1965).
[2]
S. Lang. Fundamentals of differential geometry. Vol. 191 (Springer Science & Business Media, 2012).
[3]
R. L. Bishop and S. I. Goldberg. Tensor Analysis on Manifolds (Dover Publications, Mineola, New York, 1980).
[4]
S. Lang. Real and functional analysis. Vol. 142 (Springer Science & Business Media, 2012).
[5]
M. P. do Carmo. Riemannian geometry. Translated by F. Flaherty. Vol. 2 (Springer, 1992).
[6]
P.-A. Absil, R. Mahony and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80, 199–220 (2004).
[7]
E. Hairer, C. Lubich and G. Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations (Springer, 2006).
[8]
F. Mezzadri. How to generate random matrices from the classical compact groups, arXiv preprint math-ph/0609050 (2006).
[9]
D. D. Holm, T. Schmah and C. Stoica. Geometric mechanics and symmetry: from finite to infinite dimensions. Vol. 12 (Oxford University Press, Oxford, UK, 2009).
[10]
P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
[11]
T. Bendokat, R. Zimmermann and P.-A. Absil. A Grassmann manifold handbook: Basic geometry and computational aspects, arXiv preprint arXiv:2011.13699 (2020).
[12]
W. S. Moses, V. Churavy, L. Paehler, J. Hückelheim, S. H. Narayanan, M. Schanen and J. Doerfert. Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '21 (Association for Computing Machinery, New York, NY, USA, 2021).
[13]
M. Betancourt. A geometric theory of higher-order automatic differentiation, arXiv preprint arXiv:1812.11592 (2018).
[14]
J. Bolte and E. Pauwels. A mathematical model for automatic differentiation in machine learning. Advances in Neural Information Processing Systems 33, 10809–10819 (2020).
[15]
T. Bendokat and R. Zimmermann. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications, arXiv preprint arXiv:2108.12447 (2021).
[16]
B. O'Neill. Semi-Riemannian geometry with applications to relativity (Academic Press, New York City, New York, 1983).
[17]
E. Celledoni and A. Iserles. Approximating the exponential from a Lie algebra to a Lie group. Mathematics of Computation 69, 1457–1480 (2000).
[18]
C. Fraikin, K. Hüper and P. Van Dooren. Optimization over the Stiefel manifold. In: PAMM: Proceedings in Applied Mathematics and Mechanics, Vol. 7 no. 1 (Wiley Online Library, 2007); pp. 1062205–1062206.
[19]
M. Schlarb. Covariant Derivatives on Homogeneous Spaces: Horizontal Lifts and Parallel Transport. The Journal of Geometric Analysis 34, 1–43 (2024).
[20]
I. Goodfellow, Y. Bengio and A. Courville. Deep learning (MIT press, Cambridge, MA, 2016).
[21]
L. Kong, Y. Wang and M. Tao. Momentum Stiefel optimizer, with applications to suitably-orthogonal attention, and optimal transport, arXiv preprint arXiv:2205.14173v3 (2023).
[22]
J. Nocedal and S. J. Wright. Numerical optimization (Springer Science+Business Media, New York, NY, 2006).
[23]
A. Γ (https://math.stackexchange.com/users/253273/a-%ce%93). Quasi-Newton methods: Understanding DFP updating formula. Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/2279304 (version: 2017-05-13).
[24]
P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).
[25]
D. Bahdanau, K. Cho and Y. Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[26]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[27]
K. Jacobs. Discrete Stochastics (Birkhäuser Verlag, Basel, Switzerland, 1992).
[28]
K. Feng. The step-transition operators for multi-step methods of ODE's. Journal of Computational Mathematics, 193–202 (1998).
[29]
M.-T. Luong, H. Pham and C. D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
[30]
K. Feng and M.-z. Qin. The symplectic methods for the computation of Hamiltonian equations. In: Numerical Methods for Partial Differential Equations: Proceedings of a Conference held in Shanghai, PR China, March 25–29, 1987 (Springer, 1987); pp. 1–37.
[31]
Z. Ge and K. Feng. On the approximation of linear Hamiltonian systems. Journal of Computational Mathematics, 88–97 (1988).
[32]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Volume-Preserving Transformers for Learning Time Series Data with Structure, arXiv preprint arXiv:2312.11166v2 (2024).
[33]
T. Blickhan. A registration method for reduced basis problems using linear optimal transport, arXiv preprint arXiv:2304.14884 (2023).
[34]
S. Fresca, L. Dedè and A. Manzoni. A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. Journal of Scientific Computing 87, 1–36 (2021).
[35]
M. Raissi, P. Perdikaris and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707 (2019).
[36]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[37]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
[38]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
[39]
K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404, 108973 (2020).
[40]
B. Brantner and M. Kraus. Symplectic autoencoders for Model Reduction of Hamiltonian Systems, arXiv preprint arXiv:2312.10004 (2023).
[41]
B. Leimkuhler and S. Reich. Simulating Hamiltonian dynamics. No. 14 (Cambridge University Press, 2004).
[42]
[43]
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation 9, 1735–1780 (1997).
[44]
A. Hemmasian and A. Barati Farimani. Reduced-order modeling of fluid flows with transformers. Physics of Fluids 35 (2023).
[45]
A. Solera-Rico, C. S. Vila, M. Gómez, Y. Wang, A. Almashjary, S. Dawson and R. Vinuesa. $\beta$-Variational autoencoders and transformers for reduced-order modelling of fluid flows, arXiv preprint arXiv:2304.03571 (2023).
[46]
P. Jin, Z. Lin and B. Xiao. Optimal unit triangular factorization of symplectic matrices. Linear Algebra and its Applications (2022).
[47]
N. Patwardhan, S. Marrone and C. Sansone. Transformers in the real world: A survey on NLP applications. Information 14, 242 (2023).
[48]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
[49]
T. Lin and H. Zha. Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 796–809 (2008).
[50]
T. Blickhan. BrenierTwoFluids.jl, https://github.com/ToBlick/BrenierTwoFluids (2023).
[51]
T. Frankel. The geometry of physics: an introduction (Cambridge University Press, Cambridge, UK, 2011).
[52]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Structure-Preserving Transformers for Learning Parametrized Hamiltonian Systems, arXiv preprint arXiv:2312.11166 (2023).
[53]
W. Huang, P.-A. Absil and K. A. Gallivan. A Riemannian BFGS method for nonconvex optimization problems. In: Numerical Mathematics and Advanced Applications ENUMATH 2015 (Springer, 2016); pp. 627–634.