Keywords: programmable subjects, LLM agents, scientific discovery, AI alignment, machine learning, agent-based modeling, emergent behavior, experimental framework, safety, interpretability
TL;DR: LLM agents should be developed and studied as programmable subjects to advance scientific discovery and AI alignment.
Abstract: This position paper argues that the next leap in machine learning science will come from treating LLM agents as programmable subjects—digital analogues of laboratory animals—enabling controlled, systematic discovery of emergent traits and alignment failures. Just as laboratory rats revolutionized biology by enabling precise experimentation, LLM agents, when configured as programmable subjects, can serve as digital instruments for probing the generative mechanisms and risks of complex AI systems. Current evaluation methods focus on capabilities but miss the deeper understanding of emergent behaviors needed for safety and alignment. By building computational laboratories around programmable subjects, researchers can identify inherent traits, rigorously test alignment strategies, and reveal potential failure modes before deployment. This position is timely and important as LLMs are increasingly deployed in high-stakes domains, and it aims to stimulate discussion on the scientific foundations of AI safety and alignment. We call on the community to prioritize the development and adoption of programmable subject frameworks as a standard tool for alignment and safety research.
Submission Number: 658