PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training

TMLR Paper5406 Authors

16 Jul 2025 (modified: 18 Nov 2025), Decision pending for TMLR, CC BY 4.0

Abstract: Neural operators learn mappings between function spaces, offering a powerful paradigm for solving partial differential equations (PDEs) that cannot be solved analytically. However, training neural operators faces two main bottlenecks: it requires a large amount of training data to learn these mappings, and this data must be labeled, with labels obtainable only through expensive simulations with numerical solvers. To alleviate both issues simultaneously, we propose PICore, an unsupervised coreset selection framework that identifies the most informative training samples without requiring access to ground-truth PDE solutions. PICore leverages a physics-informed loss to select unlabeled inputs by their potential contribution to operator learning. After a compact subset of inputs is selected, only those samples are simulated using numerical solvers to generate labels, reducing annotation costs. We then train the neural operator on the reduced labeled dataset, which also significantly decreases training time. Across four diverse PDE benchmarks and multiple coreset selection strategies, PICore achieves up to a 78% average increase in training efficiency relative to supervised coreset selection methods, with minimal changes in accuracy.
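The select-then-label pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss or the selection rule, so the example assumes a 1D Poisson problem, scores each unlabeled input by the norm of its PDE residual, and uses a plain top-k rule as a placeholder for a coreset selection strategy. The names `surrogate`, `physics_residual_scores`, `select_coreset`, and `label_with_solver` are hypothetical.

```python
import numpy as np

def laplacian_1d(n, h):
    """Finite-difference matrix for -u'' on n interior points (zero Dirichlet BCs)."""
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def physics_residual_scores(predict, forcings, A):
    """Score each unlabeled input f by ||A u_pred - f||_2; no solver labels are used."""
    preds = np.stack([predict(f) for f in forcings])
    residuals = preds @ A.T - forcings
    return np.linalg.norm(residuals, axis=1)

def select_coreset(scores, k):
    """Assumed placeholder rule: keep the k inputs with the largest residual score."""
    return np.argsort(scores)[::-1][:k]

def label_with_solver(forcings, A):
    """Run the (expensive) numerical solver only on the selected inputs."""
    return np.linalg.solve(A, forcings.T).T

rng = np.random.default_rng(0)
n = 32
h = 1.0 / (n + 1)
A = laplacian_1d(n, h)

# Unlabeled pool of forcing functions sampled on the grid.
pool = rng.normal(size=(100, n))

def surrogate(f):
    """Hypothetical stand-in for a neural operator; deliberately crude."""
    return 0.05 * f

scores = physics_residual_scores(surrogate, pool, A)
idx = select_coreset(scores, k=10)

# Only the 10 selected inputs are passed to the solver.
labels = label_with_solver(pool[idx], A)
```

In this sketch the solver is called on 10 of the 100 pooled inputs, which is where the labeling savings would come from; a reduced training set `(pool[idx], labels)` would then be used to train the operator.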

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: We summarize the changes below:

> In (4) $\mathcal{A}$ is used as a function space, which is in conflict with its usage as coreset selection algorithm in Fig. 1.
1) We have updated the figure to use $\mathcal{C}$ as the coreset selection algorithm instead of $\mathcal{A}$.

> There is an error in "which mitachieving both higher"
2) We have fixed the typo.

> Provide some insights into why the chosen active learning approach performs worse than random sampling (as in the rebuttal). Also Figs. 4-7 show that the difference between PICore and unsupervised selection methods is small, at least if it is not clear which coreset selection approach one should take.
3) We have added this information in Sections 6.1 and 6.2.1.

> Tone down claims on speed-ups, especially with the random baseline in mind.
4) We have changed the wording to reduce the speed-up claim, largely emphasizing PICore's speed-up relative to supervised coreset selection while acknowledging the accuracy and efficiency of random sampling.

> Acknowledge prior work on active learning for PDEs in a section devoted to related work. Correct the claim that Musekamp et al. only considers PINNs (it does not consider PINNs and only considers Neural Operators such as the models used here).
5) We have added this as part of the related work.

Assigned Action Editor: ~Bernhard_C_Geiger1

Submission Number: 5406
