Automated Data Selection for Efficient Cost Model Training to Optimize Sparse Matrix Kernels on Emerging Hardware Accelerators
Track: AI for Science
Keywords: deep learning for systems, sparse matrix computations, learned cost models, hardware accelerators, active learning, high-dimensional search spaces
TL;DR: We propose exploration-aware data selection strategies for training neural cost models to optimize sparse matrix kernels, reducing reliance on large datasets and expert heuristics by leveraging structural matrix representations and active learning.
Abstract: Sparse matrix computations are critical for many applications in machine learning, computer vision, and scientific computing. However, optimizing sparse kernels, such as Sparse Matrix-Matrix Multiplication (SpMM) and Sampled Dense-Dense Matrix Multiplication (SDDMM), remains challenging because their performance is sensitive to input characteristics and to the high dimensionality of the scheduling search space. Specifically, this complexity arises from the interplay of factors such as matrix dimensions, sparsity patterns, sparse storage formats, hardware targets, and compiler-specific scheduling primitives, which together create a highly irregular and non-intuitive performance landscape. While prior work has introduced learned cost models to guide the selection of scheduling primitives, these cost models are typically kernel- and hardware-specific, and they either require millions of training samples or depend heavily on expert-designed heuristics. In this work, we frame optimizing sparse matrix kernels as a structured exploration problem and identify key limitations of prior work, including its inability to generalize across kernels and hardware targets and its inability to train cost models from limited data without relying on expert heuristics. We then propose a solution that automates data collection for cost model training on emerging hardware accelerators. Our method augments a state-of-the-art (SOTA) framework with exploration-aware data sampling and multi-armed bandit-based active learning, enabling data-efficient fine-tuning with minimal manual intervention. Our experimental results demonstrate that these strategies substantially reduce reliance on large training datasets and expert heuristics, while achieving performance comparable to the SOTA.
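To make the abstract's notion of multi-armed bandit-based active data selection concrete, the following is a minimal, hedged sketch in Python. It is not the authors' implementation: the grouping of unlabeled candidates into "arms" by sparsity-pattern bucket, the UCB1 selection rule, and the reward definition (e.g., reduction in the cost model's validation error after fine-tuning on the newly benchmarked samples) are all illustrative assumptions.

```python
# Illustrative sketch: UCB1-style bandit for choosing which bucket of candidate
# (matrix, schedule) pairs to benchmark next for cost-model fine-tuning.
# Arm names, reward signal, and loop structure are hypothetical.
import math
import random


class UCB1DataSelector:
    """Select which bucket of unlabeled schedule candidates to benchmark next."""

    def __init__(self, arms):
        self.arms = list(arms)                      # e.g., sparsity-pattern buckets
        self.counts = {a: 0 for a in self.arms}     # times each arm was sampled
        self.values = {a: 0.0 for a in self.arms}   # running mean reward per arm
        self.t = 0                                  # total selections so far

    def select_arm(self):
        self.t += 1
        # Play every arm once before applying the UCB1 rule.
        for a in self.arms:
            if self.counts[a] == 0:
                return a

        def ucb(a):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[a])
            return self.values[a] + bonus

        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        # Incrementally update the running mean reward for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Hypothetical sparsity-pattern buckets of candidate samples.
    selector = UCB1DataSelector(["blocked", "banded", "power_law", "uniform"])
    for _ in range(20):
        arm = selector.select_arm()
        # Placeholder reward: in practice, benchmark the sampled candidates,
        # fine-tune the cost model, and reward the drop in validation error.
        reward = random.random()
        selector.update(arm, reward)
    print(sorted(selector.counts.items()))
```

The intended usage pattern is that arms that keep yielding informative training samples (high reward) are benchmarked more often, while the exploration bonus prevents the selector from ignoring rarely sampled buckets, which is one plausible reading of "exploration-aware data sampling" in the abstract.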
Serve As Reviewer: ~Chamika_Sudusinghe1, ~Damitha_Lenadora1
Submission Number: 100