Benchmarking Hamiltonian Simulation Using Graphical Processing Units

Published: 29 Jul 2025, Last Modified: 29 Jul 2025, PQAI 2025 Oral, CC BY 4.0
Keywords: Hamiltonian simulation, CUDA-Q, GPU acceleration, Trotterization, multi-GPU scaling
TL;DR: Performance analysis of quantum Hamiltonian simulation accelerated by graphics processing units (GPUs)
Abstract: Simulating quantum systems is a foundational application of quantum computing, particularly in fields such as computational chemistry. We present our use of a scalable framework, the Quantum Economic Development Consortium (QED-C) Application-Oriented Benchmark Suite, to evaluate the performance of quantum algorithms across various hardware platforms. A key focus is leveraging NVIDIA CUDA-Q, a GPU-accelerated platform for hybrid quantum-classical programming, to benchmark Hamiltonian simulation, the Quantum Fourier Transform (QFT), and Phase Estimation (PE). We simulate a range of physical systems from HamLib [12], including the transverse-field Ising model (TFIM), the Heisenberg and Fermi-Hubbard models, and molecules such as H2, using Suzuki-Trotter evolution. Simulations were executed on NVIDIA GPUs, including the A100, H100, GH200, and GB200 systems, at Purdue University [7] and Lawrence Berkeley National Laboratory (LBNL) [8], as well as in collaboration with NVIDIA. CUDA-Q’s SpinOperator formalism enabled emulation of circuits of up to 38 qubits on the LBNL cluster, with performance up to 3× faster than real quantum hardware. Strong scaling is observed up to 32 GPUs, with execution times for some simulations reduced by more than 90%. For example, the execution time for simulating a 33-qubit TFIM dropped from 19 s (1 GPU) to 2 s (32 GPUs). Despite these gains, we observe diminishing returns beyond 8 GPUs, as is typical of classical HPC workloads, due to inter-GPU communication bottlenecks. This effect is mitigated in the latest GB200 clusters, which extend the high-bandwidth NVLink GPU interconnect across multiple nodes. CUDA-Q proves especially effective for sampling-heavy workloads, offering near-linear scaling and improved parallel efficiency for PE and QFT.
Our findings demonstrate that GPU-accelerated quantum Hamiltonian simulation with CUDA-Q provides a robust and high-throughput alternative to noisy intermediate-scale quantum (NISQ) devices, paving the way for future kernel-level optimizations and distributed quantum computing strategies. We would like to have the paper published as an extended abstract!
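As an illustration of the Suzuki-Trotter evolution mentioned in the abstract, the following is a minimal sketch of first-order Trotterization for a small TFIM chain, written as a plain-Python state-vector simulation. It is not the CUDA-Q implementation used in the benchmarks. The Hamiltonian convention H = -J Σ Z_q Z_{q+1} - h Σ X_q, the open boundary conditions, the |00...0> initial state, and all function names (`apply_zz_layer`, `apply_x_layer`, `trotter_evolve`, `z0_expectation`, `distance`) are assumptions made for this example.

```python
import cmath
import math


def apply_zz_layer(state, n, J, dt):
    """Apply exp(+i*J*dt * sum_q Z_q Z_{q+1}); diagonal in the computational basis."""
    out = []
    for idx, amp in enumerate(state):
        zz = 0
        for q in range(n - 1):
            # Z_q Z_{q+1} has eigenvalue +1 when neighbouring bits agree, else -1.
            same = ((idx >> q) & 1) == ((idx >> (q + 1)) & 1)
            zz += 1 if same else -1
        out.append(amp * cmath.exp(1j * J * dt * zz))
    return out


def apply_x_layer(state, n, h, dt):
    """Apply exp(+i*h*dt*X_q) = cos(h*dt) I + i sin(h*dt) X_q on every qubit q."""
    c, s = math.cos(h * dt), 1j * math.sin(h * dt)
    state = list(state)
    for q in range(n):
        mask = 1 << q
        for idx in range(len(state)):
            if (idx & mask) == 0:
                a0, a1 = state[idx], state[idx | mask]
                state[idx] = c * a0 + s * a1
                state[idx | mask] = s * a0 + c * a1
    return state


def trotter_evolve(n, J, h, t, steps):
    """First-order Trotter approximation of exp(-i*H*t)|00...0> (assumed H above)."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    dt = t / steps
    for _ in range(steps):
        state = apply_zz_layer(state, n, J, dt)
        state = apply_x_layer(state, n, h, dt)
    return state


def z0_expectation(state):
    """Expectation value of Z on qubit 0 (least significant bit of the index)."""
    return sum(abs(a) ** 2 * (1 - 2 * (i & 1)) for i, a in enumerate(state))


def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # A fine-grained Trotter run stands in for the exact evolution here.
    reference = trotter_evolve(4, 1.0, 1.0, 1.0, 2048)
    for steps in (2, 8, 32):
        approx = trotter_evolve(4, 1.0, 1.0, 1.0, steps)
        print(steps, distance(approx, reference), z0_expectation(approx))
```

The state vector holds 2^n amplitudes, so memory and time grow exponentially with qubit count; this is why circuits in the 30+ qubit range call for GPU-accelerated and multi-GPU simulators rather than a loop like the one above. Increasing `steps` trades more gate layers for a smaller Trotter error.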
Submission Number: 23