Keywords: Sparse Attention, Point Cloud Processing, Large-Scale Simulations
TL;DR: Ball Sparse Attention (BSA) extends Native Sparse Attention with Ball-Tree neighborhoods and grouped selection, giving transformers a global receptive field on irregular geometries at sub-quadratic cost.
Abstract: Self-attention scales quadratically with input size, limiting its use for large-scale physical systems. Although sparse attention mechanisms provide a viable alternative, they are primarily designed for regular structures such as text or images, making them inapplicable to irregular geometries. In this work, we present Ball Sparse Attention (BSA), which adapts Native Sparse Attention (NSA) (Yuan et al., 2025) to unordered point sets by imposing regularity using the ball-tree structure from the Erwin Transformer (Zhdanov et al., 2025). We modify each of NSA's components to work with ball-based neighborhoods, yielding a global receptive field at sub-quadratic cost. On an airflow pressure prediction task, we achieve accuracy comparable to full attention while substantially reducing the computational cost of attention.
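To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of the core pattern the abstract describes: points are grouped into fixed-size balls by recursively splitting along the widest axis, full attention is applied only within each ball, and a coarse branch attends to compressed per-ball summaries to recover a global receptive field at sub-quadratic cost. The function names, the mean-pooled summaries, and the assumption that N is a power-of-two multiple of the ball size are illustrative simplifications; the actual BSA components (e.g., NSA's selection branch and learned compression) are omitted.

```python
# Illustrative sketch only: ball-based local attention + attention over
# compressed ball summaries. Not the authors' code; shapes and pooling
# choices are assumptions for clarity.
import torch
import torch.nn.functional as F


def ball_partition(points: torch.Tensor, ball_size: int) -> torch.Tensor:
    """Recursively split points along the axis of greatest spread until
    each ball holds `ball_size` points; returns a permutation that groups
    points ball by ball. Assumes N = ball_size * 2^k."""
    n = points.shape[0]
    if n <= ball_size:
        return torch.arange(n)
    dim = (points.max(0).values - points.min(0).values).argmax()
    order = points[:, dim].argsort()
    half = n // 2
    left = ball_partition(points[order[:half]], ball_size)
    right = ball_partition(points[order[half:]], ball_size)
    return torch.cat([order[:half][left], order[half:][right]])


def ball_sparse_attention_sketch(x, points, ball_size=64):
    """x: (N, D) token features, points: (N, 3) coordinates."""
    n, d = x.shape
    perm = ball_partition(points, ball_size)
    xb = x[perm].view(n // ball_size, ball_size, d)        # (B, S, D)

    # Local branch: full attention restricted to each ball, O(N * S).
    local = F.scaled_dot_product_attention(xb, xb, xb)

    # Global branch: every token attends to compressed ball summaries,
    # giving a coarse global receptive field at O(N * B) cost.
    summaries = xb.mean(dim=1)                             # (B, D)
    q = x[perm].unsqueeze(0)                               # (1, N, D)
    kv = summaries.unsqueeze(0)                            # (1, B, D)
    global_ = F.scaled_dot_product_attention(q, kv, kv).squeeze(0)

    out = local.reshape(n, d) + global_
    inv = torch.empty_like(perm)
    inv[perm] = torch.arange(n)
    return out[inv]                                        # undo permutation


# Example usage on random points (N = 256 = 64 * 2^2):
x = torch.randn(256, 32)
pts = torch.randn(256, 3)
out = ball_sparse_attention_sketch(x, pts, ball_size=64)   # (256, 32)
```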
Submission Number: 40