Behavioral Continuity in Agentic LLMs: An Engineering Mental Structure Approach
Keywords: agentic LLM, behavioral continuity, identity drift, template regression, state machine, runtime architecture, drift benchmark, multi-model evaluation
TL;DR: EMSA is a runtime architecture that prevents identity drift in agentic LLMs through explicit state control; across five models, all baseline agents abandon their identity under attack while all EMSA agents resist.
Abstract: As agentic LLM systems are deployed in multi-turn, multi-domain settings, a critical failure mode emerges: behavioral discontinuity, in which the agent's identity, operating mode, and interaction contract drift unpredictably across turns, collapsing into generic templates or unintentionally shifting persona. We argue that this problem cannot be solved by longer context or better memory alone; continuity requires explicit state control, not merely information availability. We introduce EMSA (Engineering Mental Structure API), a runtime architecture that externalizes behavior-critical variables (identity anchor, constraints, abstraction level, goals) into auditable state governed by a deterministic state machine. To measure continuity, we propose DTS (Drift Test Suite), a reproducible benchmark with five stress-test cases and six metrics covering identity stability, template regression, abstraction control, and goal recovery. Experiments across five instruction-tuned LLMs from four providers show that EMSA achieves the highest identity stability on every model, with the strongest effect under direct identity attack, where all five baseline agents abandon their operating contract while all five EMSA agents resist (+0.096 ISS). Extended context, by contrast, consistently degrades continuity across all models tested, supporting the claim that behavioral stability is a state control problem, not a memory problem. All specifications, test cases, and experimental code are released to support reproducibility.
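To make "externalized, auditable state governed by a deterministic state machine" concrete, here is a minimal sketch of what such a runtime layer could look like. It is an illustration under stated assumptions, not EMSA's actual specification: the class names (`AgentState`, `StateMachine`), the `Mode` values, the event strings, and the transition table are all hypothetical, and classifying each user turn into an event is assumed to happen elsewhere. The point it shows is that the state fields listed above (identity anchor, constraints, abstraction level, goals) live outside the prompt history, each transition is a pure table lookup, and every step is logged for audit.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class Mode(Enum):
    # Hypothetical operating modes, chosen for illustration only.
    NORMAL = "normal"
    GUARD = "guard"      # entered when an input challenges the identity anchor
    RECOVER = "recover"  # entered to steer back to goals after a digression


# Hypothetical transition table: (current mode, event) -> next mode.
# Pairs not listed leave the mode unchanged, so each step is deterministic.
TRANSITIONS = {
    (Mode.NORMAL, "identity_attack"): Mode.GUARD,
    (Mode.NORMAL, "off_goal"): Mode.RECOVER,
    (Mode.GUARD, "benign"): Mode.NORMAL,
    (Mode.RECOVER, "on_goal"): Mode.NORMAL,
}


@dataclass(frozen=True)
class AgentState:
    # Behavior-critical variables held as explicit, immutable state.
    identity_anchor: str
    constraints: tuple[str, ...]
    abstraction_level: int
    goals: tuple[str, ...]
    mode: Mode = Mode.NORMAL


@dataclass
class StateMachine:
    state: AgentState
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def step(self, event: str) -> AgentState:
        """Apply one pre-classified event. Only `mode` changes; identity,
        constraints, abstraction level, and goals are never rewritten here."""
        prev = self.state.mode
        nxt = TRANSITIONS.get((prev, event), prev)
        self.state = replace(self.state, mode=nxt)
        self.audit_log.append((prev.value, event, nxt.value))
        return self.state

    def render_prompt(self) -> str:
        """Re-inject the current state into each turn instead of relying on history."""
        s = self.state
        return (
            f"[identity] {s.identity_anchor}\n"
            f"[mode] {s.mode.value}\n"
            f"[abstraction] {s.abstraction_level}\n"
            f"[constraints] {'; '.join(s.constraints)}\n"
            f"[goals] {'; '.join(s.goals)}"
        )
```

For example, `StateMachine(AgentState("code-review assistant", ("stay in role",), 2, ("review the diff",)))` moves to `Mode.GUARD` on `step("identity_attack")` while its identity anchor stays fixed, and returns to `Mode.NORMAL` on `step("benign")`. Keeping the state frozen and the transitions in a lookup table is one way to make the behavior reproducible and inspectable from the audit log alone.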
Submission Number: 21