Refactoring Codebases Through Library Design

ICLR 2026 Conference Submission 21836 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Code, agents, refactoring, compression, library learning

TL;DR: We introduce a benchmark and method for refactoring programs into libraries. Using a user study and analysis, we find Minimum Description Length (MDL) captures good refactorings. Our method outperforms prior work and refactors real-world repos.

Abstract: Maintainable and general software allows developers to build robust applications efficiently, yet achieving these qualities often requires refactoring specialized solutions into reusable components. This challenge becomes particularly relevant as code agents are increasingly used to solve isolated, one-off programming problems. We investigate code agents' capacity to refactor code in ways that support growth and reusability. We first investigate what makes a good refactoring, finding via asymptotic analysis and a human study that Minimum Description Length (MDL) best aligns with developer preferences for code refactoring quality. We then present both a benchmark and a method for refactoring: MiniCode, a benchmark where multiple files must be refactored into a shared library, and Librarian, a sample-and-rerank method for generating reusable libraries. We compare Librarian to state-of-the-art library generation methods and study it on real-world codebases.
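
To make the idea of reranking by description length concrete, here is a minimal sketch. It is not the Librarian implementation. It assumes a crude MDL proxy (a count of lexical tokens across the shared library and the refactored programs) and assumes candidate refactorings have already been sampled. The names `description_length`, `rerank`, and the candidate dictionaries are hypothetical and chosen only for illustration.

```python
import re

def description_length(library: str, programs: list[str]) -> int:
    """Crude MDL proxy (an assumption, not the paper's metric):
    number of lexical tokens in the library plus all programs that use it."""
    text = "\n".join([library, *programs])
    # Identifiers and numbers count as one token each; so does every other symbol.
    return len(re.findall(r"\w+|\S", text))

def rerank(candidates: list[dict]) -> dict:
    """Return the sampled candidate with the smallest total description length."""
    return min(
        candidates,
        key=lambda c: description_length(c["library"], c["programs"]),
    )

# Two hypothetical candidates for the same pair of programs.
no_library = {
    "name": "no_library",
    "library": "",
    "programs": [
        "def area_a(w, h):\n    return w * h\nprint(area_a(2, 3))",
        "def area_b(w, h):\n    return w * h\nprint(area_b(4, 5))",
    ],
}
with_library = {
    "name": "with_library",
    "library": "def area(w, h):\n    return w * h",
    "programs": ["print(area(2, 3))", "print(area(4, 5))"],
}

best = rerank([no_library, with_library])
print(best["name"])  # with_library
```

Under this proxy, the candidate that factors the duplicated function into a shared library costs fewer tokens overall, so it is the one selected. A realistic pipeline would presumably also check that each candidate preserves program behavior before ranking. That step is omitted here.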

Supplementary Material: pdf

Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)

Submission Number: 21836