Accelerating Molecular Simulations with OpenAI Triton: Fused GPU Kernels for TensorNet Neural Potentials
Keywords: molecular dynamics, GPU optimization, kernel fusion, Triton, TensorNet, neural network potentials, machine learning force fields, equivariant neural networks, computational chemistry, drug discovery
TL;DR: We achieve a 2.82× end-to-end speedup on TensorNet molecular simulations through profiling-driven GPU kernel fusion with Triton, enabling faster drug discovery and protein folding studies without compromising physical accuracy.
Abstract: Molecular dynamics (MD) simulations are essential for understanding molecular behavior in biology and chemistry, but they remain computationally expensive at the scales required for drug discovery and materials design. Machine learning force fields (MLFFs), particularly TensorNet-based architectures, have shown promise in accelerating simulations while maintaining physical accuracy, yet these models still face significant performance bottlenecks in key operations such as message passing and tensor decomposition. We present a systematic approach to accelerating TensorNet using Triton, a GPU programming framework that enables kernel fusion and optimized memory access patterns. Through profiling-driven optimization of bottleneck operations, we achieve a 3.14× average speedup on micro-benchmarks and a 2.82× speedup on end-to-end inference, reducing the computation time of a 1M-step MD simulation from 13 hours to 4.6 hours. Kernel fusion cuts the number of kernel launches by 67-88%, directly addressing the memory bandwidth limitations that dominate MD simulation performance. By relieving these GPU bottlenecks without compromising physical accuracy, our approach enables longer molecular simulations and supports scalable studies in drug discovery, protein folding, and materials design. Code and benchmarks are available at https://github.com/anonymous1234556-peer/TorchMD-triton
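The kernel-fusion idea the abstract describes can be illustrated with a minimal sketch. This is not the paper's Triton code (which runs on a GPU); it is a pure-Python analogy in which each list comprehension stands in for one kernel launch, and the function names are illustrative only. The unfused version materializes an intermediate buffer and traverses memory twice; the fused version does one pass with no intermediate, which is what a fused Triton kernel achieves for operations like scale-and-accumulate in message passing.

```python
def unfused_scale_add(x, y, alpha):
    # "Kernel" 1: scale, materializing an intermediate buffer
    # (on a GPU this means an extra launch plus a full write/read of memory).
    tmp = [alpha * xi for xi in x]
    # "Kernel" 2: add, re-reading the intermediate from memory.
    return [ti + yi for ti, yi in zip(tmp, y)]

def fused_scale_add(x, y, alpha):
    # Single fused "kernel": one launch, one pass over the inputs,
    # no intermediate buffer, so memory traffic drops by roughly a third here.
    return [alpha * xi + yi for xi, yi in zip(x, y)]

# Both produce identical results; only launch count and memory traffic differ.
x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
assert unfused_scale_add(x, y, 2.0) == fused_scale_add(x, y, 2.0)
```

In bandwidth-bound workloads like MD inference, eliminating intermediate reads and writes in this way is what turns a 67-88% reduction in kernel launches into wall-clock speedup.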
Release To Public: Yes, please release this paper to the public
Submission Number: 32