References

[1]
S. Lipschutz. General Topology (McGraw-Hill Book Company, New York City, New York, 1965).
[2]
S. Lang. Fundamentals of differential geometry. Vol. 191 (Springer Science & Business Media, 2012).
[3]
R. L. Bishop and S. I. Goldberg. Tensor Analysis on Manifolds (Dover Publications, Mineola, New York, 1980).
[4]
S. Lang. Real and functional analysis. Vol. 142 (Springer Science & Business Media, 2012).
[5]
M. P. do Carmo. Riemannian geometry. Translated by F. Flaherty. Vol. 2 (Springer, 1992).
[6]
P.-A. Absil, R. Mahony and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80, 199–220 (2004).
[7]
E. Hairer, C. Lubich and G. Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations (Springer, 2006).
[8]
F. Mezzadri. How to generate random matrices from the classical compact groups, arXiv preprint math-ph/0609050 (2006).
[9]
D. D. Holm, T. Schmah and C. Stoica. Geometric mechanics and symmetry: from finite to infinite dimensions. Vol. 12 (Oxford University Press, Oxford, UK, 2009).
[10]
P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
[11]
T. Bendokat, R. Zimmermann and P.-A. Absil. A Grassmann manifold handbook: Basic geometry and computational aspects, arXiv preprint arXiv:2011.13699 (2020).
[12]
W. S. Moses, V. Churavy, L. Paehler, J. Hückelheim, S. H. Narayanan, M. Schanen and J. Doerfert. Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '21 (Association for Computing Machinery, New York, NY, USA, 2021).
[13]
M. Betancourt. A geometric theory of higher-order automatic differentiation, arXiv preprint arXiv:1812.11592 (2018).
[14]
J. Bolte and E. Pauwels. A mathematical model for automatic differentiation in machine learning. Advances in Neural Information Processing Systems 33, 10809–10819 (2020).
[15]
T. Bendokat and R. Zimmermann. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications, arXiv preprint arXiv:2108.12447 (2021).
[16]
B. O'Neill. Semi-Riemannian geometry with applications to relativity (Academic Press, New York City, New York, 1983).
[17]
E. Celledoni and A. Iserles. Approximating the exponential from a Lie algebra to a Lie group. Mathematics of Computation 69, 1457–1480 (2000).
[18]
C. Fraikin, K. Hüper and P. Van Dooren. Optimization over the Stiefel manifold. In: PAMM: Proceedings in Applied Mathematics and Mechanics, Vol. 7 no. 1 (Wiley Online Library, 2007); pp. 1062205–1062206.
[19]
M. Schlarb. Covariant Derivatives on Homogeneous Spaces: Horizontal Lifts and Parallel Transport. The Journal of Geometric Analysis 34, 1–43 (2024).
[20]
I. Goodfellow, Y. Bengio and A. Courville. Deep learning (MIT press, Cambridge, MA, 2016).
[21]
L. Kong, Y. Wang and M. Tao. Momentum Stiefel optimizer, with applications to suitably-orthogonal attention, and optimal transport, arXiv preprint arXiv:2205.14173v3 (2023).
[22]
J. Nocedal and S. J. Wright. Numerical optimization (Springer Science+Business Media, New York, NY, 2006).
[23]
A. Γ (https://math.stackexchange.com/users/253273/a-%ce%93). Quasi-Newton methods: Understanding DFP updating formula. Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/2279304 (version: 2017-05-13).
[24]
P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).
[25]
D. Bahdanau, K. Cho and Y. Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[26]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[27]
K. Jacobs. Discrete Stochastics (Birkhäuser Verlag, Basel, Switzerland, 1992).
[28]
K. Feng. The step-transition operators for multi-step methods of ODE's. Journal of Computational Mathematics, 193–202 (1998).
[29]
M.-T. Luong, H. Pham and C. D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
[30]
K. Feng and M.-z. Qin. The symplectic methods for the computation of Hamiltonian equations. In: Numerical Methods for Partial Differential Equations: Proceedings of a Conference held in Shanghai, PR China, March 25–29, 1987 (Springer, 1987); pp. 1–37.
[31]
Z. Ge and K. Feng. On the approximation of linear Hamiltonian systems. Journal of Computational Mathematics, 88–97 (1988).
[32]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Volume-Preserving Transformers for Learning Time Series Data with Structure, arXiv preprint arXiv:2312.11166v2 (2024).
[33]
T. Blickhan. A registration method for reduced basis problems using linear optimal transport, arXiv preprint arXiv:2304.14884 (2023).
[34]
S. Fresca, L. Dedè and A. Manzoni. A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. Journal of Scientific Computing 87, 1–36 (2021).
[35]
M. Raissi, P. Perdikaris and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707 (2019).
[36]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[37]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
[38]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
[39]
K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404, 108973 (2020).
[40]
B. Brantner and M. Kraus. Symplectic autoencoders for Model Reduction of Hamiltonian Systems, arXiv preprint arXiv:2312.10004 (2023).
[41]
B. Leimkuhler and S. Reich. Simulating Hamiltonian dynamics. No. 14 (Cambridge University Press, 2004).
[42]
[43]
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation 9, 1735–1780 (1997).
[44]
A. Hemmasian and A. Barati Farimani. Reduced-order modeling of fluid flows with transformers. Physics of Fluids 35 (2023).
[45]
A. Solera-Rico, C. S. Vila, M. Gómez, Y. Wang, A. Almashjary, S. Dawson and R. Vinuesa. $\beta$-Variational autoencoders and transformers for reduced-order modelling of fluid flows, arXiv preprint arXiv:2304.03571 (2023).
[46]
P. Jin, Z. Lin and B. Xiao. Optimal unit triangular factorization of symplectic matrices. Linear Algebra and its Applications (2022).
[47]
N. Patwardhan, S. Marrone and C. Sansone. Transformers in the real world: A survey on NLP applications. Information 14, 242 (2023).
[48]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
[49]
T. Lin and H. Zha. Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 796–809 (2008).
[50]
T. Blickhan. BrenierTwoFluids.jl, https://github.com/ToBlick/BrenierTwoFluids (2023).
[51]
T. Frankel. The geometry of physics: an introduction (Cambridge University Press, Cambridge, UK, 2011).
[52]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Structure-Preserving Transformers for Learning Parametrized Hamiltonian Systems, arXiv preprint arXiv:2312.11166 (2023).
[53]
W. Huang, P.-A. Absil and K. A. Gallivan. A Riemannian BFGS method for nonconvex optimization problems. In: Numerical Mathematics and Advanced Applications ENUMATH 2015 (Springer, 2016); pp. 627–634.