Physical reinforcement learning

Sam Dillavou; Shruti Mishra

Published: 10 Jun 2026, Last Modified: 10 Jun 2026 | RL in Big Worlds Poster | CC BY 4.0

Keywords: physical learning, big world, contrastive network

TL;DR: We demonstrate tabular Q-learning on a simulated physical substrate and present properties that make it suitable for learning in the big world framework.

Abstract: Digital computers are power-hungry and largely intolerant of damaged components, making them potentially ill-suited tools for energy-limited autonomous agents in uncertain environments. Recently developed Contrastive Local Learning Networks (CLLNs), analog networks of self-adjusting nonlinear resistors, are inherently low-power and robust to physical damage, but have so far been used only for supervised learning tasks (Dillavou et al., 2024). In this work, we demonstrate success on two simple Markov decision processes using tabular Q-learning adapted for simulated CLLNs. Doing so makes explicit the components, beyond the network being trained, that are required to implement various tools in the RL toolbox. Some of these (policy function and value function) are more natural in this system than others (replay buffer). Limits on external memory make the big world hypothesis, and learning via tracking and streaming, an attractive framework for reinforcement learning on CLLN-based agents (Javed & Sutton, 2024; Elsayed et al., 2024). We discuss assumptions such as physical safety, which digital hardware requires, CLLNs can forgo, and biological systems cannot rely on, and we highlight secondary goals that are important in biology and trainable in CLLNs but make little sense in digital computers. We share our code for Q-learning adapted for simulated CLLNs to support further consideration of CLLNs as a physical framework for learning in the big world.
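For readers unfamiliar with the baseline algorithm, the following is a minimal sketch of conventional (digital) tabular Q-learning. It is not the CLLN adaptation described in the abstract. The chain environment, the function name `q_learning_chain`, and all hyperparameters are hypothetical choices made for illustration only. Each transition is used once and then discarded, so no replay buffer is involved.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Standard tabular Q-learning on a hypothetical chain MDP.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the rightmost state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # the Q-table: q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                # Greedy action, breaking ties at random.
                best = max(q[s])
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0
            # One-step Q-learning target; no bootstrapping from terminal states.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next  # the transition is discarded after a single update
            if done:
                break
    return q


q = q_learning_chain()
# Greedy action in each non-terminal state (1 means "move right").
greedy_policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(len(q) - 1)]
```

In this sketch the only persistent state is the Q-table itself, which mirrors the abstract's point that value and policy functions are more natural to hold in the trained system than an external replay buffer.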

Submission Number: 10
