MERA: Model Evolution and Routing with Skill Adaptation for Agentic Systems at Scale

Published: 23 May 2026, Last Modified: 23 May 2026
Venue: ACM CAIS 2026: RLEval Workshop Poster
License: CC BY 4.0
Keywords: RL, evolution, agentic system, routing
Abstract: Language-model agents increasingly mix strong but expensive frontier models with cheaper models that are useful only on safe subsets of a workflow. The challenge is not only to choose a model once per user request, but to adapt many individual invocations inside a multi-step trace without silently degrading quality. We present MERA, a trace-driven framework that jointly evolves three tracks: SkillBook statistics for recurring prompt signatures, a learned invocation-level router, and a small-model adapter. The main empirical finding is that the order of the joint schedule matters: in a code-generation setting with 590 executable code tasks and 3,328 weakly labelled router examples, the best four-cycle order is Skill $\rightarrow$ LLM $\rightarrow$ Router. This schedule reaches 87.3\% router accuracy with 4.4\% fallback and reduces estimated serving cost to 51.8\% of the cost of always using the large model. An eight-cycle run peaks at 87.8\% router accuracy and then stabilizes in the 82--87\% band. Component behavior is uneven but informative: SkillBook and router updates provide the largest cost-quality gains, while the 1.5B GRPO adapter gives only a modest pass-rate improvement (47.5\% on MBPP eval200 versus roughly 47.0\% for the base setting) and does not yet show cumulative gains across cycles. MERA therefore frames agent self-evolution as a conservative systems loop over shared traces, where skill, model, and routing updates are admitted together through replay.
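To make the idea of invocation-level routing and relative serving cost concrete, the following is a minimal sketch and not the MERA implementation. Everything in it is an assumption for illustration: the `Invocation` fields, the `route` and `relative_cost` helpers, the thresholds, and the unit costs are hypothetical and are not taken from the paper. It shows a conservative rule that sends a call to the small model only when both the per-signature skill statistics and the router score support it, and then expresses total cost as a fraction of always using the large model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Invocation:
    signature: str     # hypothetical prompt-signature key into the skill statistics
    p_small_ok: float  # hypothetical router score that the small model suffices


def route(
    inv: Invocation,
    skillbook: Dict[str, Tuple[int, int]],  # signature -> (successes, trials); assumed layout
    threshold: float = 0.7,                 # assumed router confidence cutoff
    min_success: float = 0.8,               # assumed minimum replayed success rate
) -> str:
    """Return 'small' or 'large' for a single invocation inside a trace."""
    stats = skillbook.get(inv.signature)
    if stats is None:
        return "large"  # unseen signature: stay on the large model
    successes, trials = stats
    if trials == 0 or successes / trials < min_success:
        return "large"  # weak track record for this signature
    return "small" if inv.p_small_ok >= threshold else "large"


def relative_cost(
    decisions: List[str],
    small_cost: float = 0.1,  # illustrative unit cost, not a figure from the paper
    large_cost: float = 1.0,
) -> float:
    """Total cost as a fraction of sending every invocation to the large model."""
    if not decisions:
        raise ValueError("decisions must be non-empty")
    total = sum(small_cost if d == "small" else large_cost for d in decisions)
    return total / (large_cost * len(decisions))
```

A caller would apply `route` to each invocation in a trace and pass the resulting list of decisions to `relative_cost`. The design choice sketched here is that every uncertain case defaults to the large model, which mirrors the conservative framing of the abstract but not necessarily the paper's exact fallback rule.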
Submission Number: 13