TL;DR: FlashTP accelerates equivariant MLIPs by optimizing Tensor-Product operations, achieving kernel speedups of up to 41.6$\times$ and significantly reducing memory footprint.
Abstract: Machine Learning Interatomic Potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with high accuracy. While equivariant MLIPs achieve state-of-the-art accuracy, they face significant computational bottlenecks centered around their Tensor-Product layer, which accounts for up to 75\% of training time and causes substantial memory overhead. We present FlashTP, a highly optimized tensor-product library that addresses these inefficiencies through kernel fusion, sparse computation, and path-aggregated execution. FlashTP achieves up to 41.6$\times$ and 60.8$\times$ kernel speedups over _e3nn_ and NVIDIA cuEquivariance, respectively. For SevenNet-l3i5, it delivers 4.2$\times$ and 3.5$\times$ speedups while reducing peak memory usage by 6.3$\times$ and 6.2$\times$ for inference and training, respectively. The code is available at https://github.com/SNU-ARC/flashTP.
Lay Summary: Imagine watching a slow-motion movie of atoms as they jiggle, bump into each other, and form new structures. That’s what molecular dynamics (MD) simulations do on a computer—letting scientists see how materials behave or how proteins fold, without costly lab experiments.
Recently, researchers have started using machine-learning interatomic potentials (MLIPs)—deep neural networks trained on high-precision quantum data—to make these simulations both faster and more accurate. However, MLIP-driven simulations are bottlenecked by a mathematical operation called the tensor product, which consumes approximately 75–90% of both computation time and memory.
We built FlashTP, an optimized GPU library that fuses those slow steps into a single kernel, removing redundant data movement and skipping work that isn’t needed. On modern hardware, FlashTP lets scientists train their models more than 3.5× faster, run simulations 4.2× faster, and use over 6× less memory compared to the popular equivariant-network library _e3nn_. Best of all, it plugs right into the _e3nn_ framework, so you can switch on FlashTP with almost zero code changes and start seeing the speed boost immediately.
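To make the "almost zero code changes" claim concrete, here is a minimal sketch of the tensor-product operation that FlashTP targets, written with e3nn's public API (the `o3.Irreps` and `o3.FullyConnectedTensorProduct` calls are real e3nn interfaces). The commented-out `flashtp` import and `FusedTensorProduct` drop-in are assumptions about FlashTP's interface for illustration only; see the repository linked below for the actual API.

```python
# Sketch of the e3nn tensor product that dominates equivariant-MLIP run time,
# plus a hypothetical FlashTP drop-in replacement (names assumed).
import torch
from e3nn import o3

irreps_in1 = o3.Irreps("16x0e + 16x1o")  # per-edge node features
irreps_in2 = o3.Irreps("1x0e + 1x1o")    # spherical harmonics of edge vectors
irreps_out = o3.Irreps("16x0e + 16x1o")  # message features

# Baseline: the e3nn tensor-product layer that FlashTP accelerates.
tp = o3.FullyConnectedTensorProduct(irreps_in1, irreps_in2, irreps_out)

x = irreps_in1.randn(1024, -1)  # 1024 edges worth of features
y = irreps_in2.randn(1024, -1)
out = tp(x, y)

# Hypothetical drop-in (assumed names, not the documented FlashTP API):
# from flashtp import FusedTensorProduct
# tp = FusedTensorProduct(irreps_in1, irreps_in2, irreps_out)
# out = tp(x, y)  # same call signature, fused kernel underneath
```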
Link To Code: https://github.com/SNU-ARC/flashTP
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Equivariant neural networks, Tensor Product, Software libraries, Efficiency, Machine-learned interatomic potential (MLIP), Machine Learning Force Fields (MLFF)
Submission Number: 450