Keywords: Circuit Analysis, Attribution Graphs, Methods (probing, steering, causal interventions), Interpretability for Knowledge Discovery
TL;DR: LAWFUL tests whether neural networks predicting continuous physical systems internally use law-consistent circuits, illustrated by Doppler consistency in MoCap-to-radar synthesis.
Abstract: When a neural network predicts a physical system accurately, has it learned the governing law as formal, structured knowledge, and if so, does the network's internal computation actually use that representation throughout the law's domain of validity? We identify four interpretability gaps that prevent answering these questions for physics laws over continuous variables: the absence of (i) a coverage-aware causal-consistency measure over continuous counterfactuals; (ii) a domain-of-validity test for the identified circuit; (iii) a verification of the law's invariants and forbidden behaviors; and (iv) a quantification of how a derived physical quantity flows through the circuit. We develop a foundational framework, LAWFUL, that closes the first two gaps and lays the groundwork for the remaining two. We illustrate it on the Mocap2Radar transformer, testing whether the model learns and internally uses the Doppler frequency law $f(t) = \frac{2 v(t)}{\lambda}$ from motion-capture and radar data in which neither $f(t)$ nor $v(t)$ appears explicitly.
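As a rough illustration of the Doppler law named in the abstract, the following sketch derives a radial velocity from marker positions by finite differences and maps it to a Doppler frequency via $f = 2v/\lambda$. It is not taken from the submission: the function names `closing_speed` and `doppler_frequency`, the monostatic radar placed at the origin, the sign convention (positive when the marker approaches the radar), and the 4 mm wavelength are all assumptions made for the example.

```python
import math

def closing_speed(positions, dt, radar=(0.0, 0.0, 0.0)):
    """Finite-difference closing speed (m/s) of one marker toward the radar.

    Assumption: positive values mean the range to the radar is shrinking.
    `positions` is a sequence of (x, y, z) samples spaced `dt` seconds apart.
    """
    ranges = [math.dist(p, radar) for p in positions]
    return [(r0 - r1) / dt for r0, r1 in zip(ranges, ranges[1:])]

def doppler_frequency(v, wavelength):
    """Doppler shift in Hz for radial speed v (m/s): f = 2 v / lambda."""
    return 2.0 * v / wavelength

# Hypothetical marker moving 0.1 m toward the radar in 0.1 s.
track = [(10.0, 0.0, 0.0), (9.9, 0.0, 0.0)]
speeds = closing_speed(track, dt=0.1)

# Hypothetical 4 mm wavelength (an assumed value, not from the paper).
shifts = [doppler_frequency(v, wavelength=0.004) for v in speeds]
print(shifts)
```

Neither the velocity nor the frequency appears in the raw positions, which is what makes the question of whether a network recovers this intermediate quantity internally a non-trivial one.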
Submission Number: 648