Keywords: Code, agents, refactoring, compression, library learning
TL;DR: We introduce a benchmark and a method for refactoring programs into libraries. Through a human study and asymptotic analysis, we find that Minimum Description Length (MDL) captures what makes a good refactoring. Our method outperforms prior work and refactors real-world repos.
Abstract: Maintainable and general software allows developers to build robust applications efficiently, yet achieving these qualities often requires refactoring specialized solutions into reusable components. This challenge becomes particularly relevant as code agents are increasingly used to solve isolated, one-off programming problems. We investigate code agents' capacity to refactor code in ways that support growth and reusability. We first ask what makes a good refactoring, finding via asymptotic analysis and a human study that Minimum Description Length best aligns with developer preferences for code refactoring quality. We then present both a benchmark and a method for refactoring: MiniCode, a benchmark where multiple files must be refactored into a shared library, and Librarian, a sample-and-rerank method for generating reusable libraries. We compare Librarian to state-of-the-art library generation methods and study it on real-world codebases.
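As a rough illustration of the sample-and-rerank idea with an MDL-style score, the following minimal sketch ranks candidate refactorings by total description length. It is not the Librarian implementation: the `Candidate` type, the whitespace-token length proxy, and the `is_correct` filter are all assumptions made for this example, and the candidates are hard-coded rather than sampled from a model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container: a shared library plus the rewritten programs."""
    library: str
    programs: tuple[str, ...]


def description_length(source: str) -> int:
    """Crude MDL proxy (an assumption): number of whitespace-separated tokens."""
    return len(source.split())


def total_length(candidate: Candidate) -> int:
    """Length of the library plus the length of every program that uses it."""
    return description_length(candidate.library) + sum(
        description_length(program) for program in candidate.programs
    )


def rerank(
    candidates: Sequence[Candidate],
    is_correct: Callable[[Candidate], bool],
) -> list[Candidate]:
    """Drop candidates that fail the correctness check, then sort shortest first."""
    valid = [c for c in candidates if is_correct(c)]
    return sorted(valid, key=total_length)


# Two hand-written candidates standing in for sampled refactorings.
original = Candidate(
    library="",
    programs=("def a(x): return x * x + 1", "def b(y): return y * y + 1"),
)
refactored = Candidate(
    library="def sq1(v): return v * v + 1",
    programs=("a = sq1", "b = sq1"),
)

# The correctness check is stubbed out here; in practice it might run unit tests.
best = rerank([original, refactored], is_correct=lambda c: True)[0]
```

In this toy case the refactored candidate ranks first because the shared helper is paid for once and each program shrinks to a short reference to it. A real score would likely use a better length estimate than token counts.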
Supplementary Material: pdf
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 21836