They Are Not Static: A Survey of Dynamic Agent Skills

03 May 2026 (modified: 04 May 2026). Under review for TMLR. CC BY 4.0

Abstract: Large language model agents increasingly externalize procedural knowledge into reusable skills: invocable code, natural-language procedures, SKILL.md packages, graphs, or parametric adapters. This externalization turns adaptation into a new learning problem. The agent does not only update its prompt or weights; it updates a library of artifacts that changes what future policies can retrieve, compose, execute, and trust. This survey studies the rapidly growing 2023–2026 literature on dynamic or self-evolving skill systems and argues that such systems are best understood as lifecycle-managed, verified, evolving artifact stores for LLM agents. We extend the options-based skill formalism to a seven-tuple—applicability, policy, termination, interface, edit, verification, and lineage—that makes edits, admission verification, and provenance explicit. We further lift this view to library-level dynamics, in which a library at time t is transformed into a new library at time t+1 by a ten-operator algebra: ADD, REFINE, MERGE, SPLIT, PRUNE, DISTILL, ABSTRACT, COMPOSE, REWRITE, and RERANK. Using this formalism, we organize a 94-paper modern audit set of dynamic-skill and boundary/context papers around a skill lifecycle: evidence acquisition, proposal, verification and admission, organization, retrieval and composition, maintenance and repair, distillation and portability, and governance. The resulting taxonomy separates artifact families, update loci, assurance models, storage topologies, maintenance regimes, and governance maturity without reducing the field to a list of systems. We then synthesize the mechanisms that make lifecycle-managed stores improve: edit repertoires, admission gates, storage and retrieval structure, and fast-slow update clocks. 
The most consistent evidence is that admission and repair matter more than raw skill count, verifier quality is often load-bearing in skill-aware reinforcement learning, flat retrieval degrades in the moderate-library-size regime, and benchmarks still under-report library trajectories. We close with a research agenda for compositional verifiers, maintenance schedules, registry-scale retrieval, cross-library portability, provenance, and lifecycle-aware evaluation.
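
As a rough illustration of the formalism named in the abstract, the seven-tuple and two of the ten library operators might be sketched as below. This is a minimal sketch under stated assumptions, not the paper's definitions: the abstract gives only the component and operator names, so the field types, the `State` alias, the `Library` mapping, and the `add` and `prune` functions are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for an agent/environment state.
State = dict


@dataclass(frozen=True)
class Skill:
    """One skill as a seven-tuple. All field types are illustrative assumptions."""
    applicability: Callable[[State], bool]   # states in which the skill may start
    policy: Callable[[State], str]           # behaviour, here returning an action label
    termination: Callable[[State], bool]     # when the skill stops
    interface: str                           # how a caller invokes the skill
    edit: Tuple[str, ...]                    # names of edits allowed on this artifact
    verification: Callable[["Skill"], bool]  # admission check for the artifact
    lineage: Tuple[str, ...] = ()            # provenance: names of parent skills


# The ten operator names listed in the abstract.
Op = Enum("Op", "ADD REFINE MERGE SPLIT PRUNE DISTILL ABSTRACT COMPOSE REWRITE RERANK")

# Assumed representation: a library maps skill names to skills.
Library = Dict[str, Skill]


def add(library: Library, name: str, skill: Skill) -> Library:
    """ADD: map the library at time t to the library at t+1.

    The skill is admitted only if its own verification check passes;
    otherwise an unchanged copy of the library is returned.
    """
    if not skill.verification(skill):
        return dict(library)
    return {**library, name: skill}


def prune(library: Library, name: str) -> Library:
    """PRUNE: return a new library without the named skill."""
    return {k: v for k, v in library.items() if k != name}
```

Returning a new mapping from each operator, rather than mutating in place, keeps every library state at time t available for inspection, which is one simple way to make a library trajectory observable.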

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Tommaso_R._Cesari1

Submission Number: 8734
