Keywords: competitive programming, large language models, multi-agent systems, algorithmic complexity, profiling, efficiency, C++17, resource constraints
TL;DR: SwiftSolve is a multi-agent system that goes beyond correctness by profiling runtime and memory, using complexity-guided repair to solve competitive programming tasks more efficiently than single-agent LLMs.
Abstract: Correctness alone is insufficient: LLM-generated programs frequently satisfy unit tests while violating contest time or memory budgets. We present SwiftSolve, a complexity-aware multi-agent system for competitive programming that couples algorithmic planning with empirical profiling and complexity-guided repair. We frame competitive programming as a software environment in which specialised agents act as programmers, each assuming a role such as planning, coding, profiling, or complexity analysis. A Planner proposes an algorithmic sketch; a deterministic Static Pruner filters high-risk plans; a Coder emits ISO C++17; a Profiler compiles and executes candidates on a fixed input-size schedule to record wall time and peak memory; and a Complexity Analyst fits log–log growth (slope s, R²), with an LLM fallback, to assign a complexity class and dispatch targeted patches to either the Planner or the Coder. Agents communicate via typed, versioned JSON; a controller enforces iteration caps and diminishing-returns stopping. Evaluated on 26 problems (16 BigO(Bench), 10 Codeforces Div. 2) with three seeds each (N = 78 runs) in a POSIX sandbox (2 s / 256–512 MB), SwiftSolve attains PASS@1 = 61.54% (16/26) on the first attempt and SOLVED@≤3 = 80.77% with marginal latency change (mean 11.96 s → 12.66 s per attempt). Aggregate run-level success is 73.08% at a mean of 12.40 s. Failures are predominantly resource-bound, indicating that inefficiency rather than logic errors is the principal barrier. Against a Claude Opus 4 single-agent baseline, SwiftSolve improves run-level success (73.1% vs. 52.6%) at roughly 2× runtime overhead (12.4 s vs. 6.8 s). Beyond correctness (PASS@K), we report efficiency metrics (EFF@K for runtime/memory, incidence of TLE/MLE, and complexity-fit accuracy on BigO(Bench)), demonstrating that profiling and complexity-guided replanning reduce inefficiency while preserving accuracy.
Future work will add comparisons against other multi-agent frameworks and ablation studies. Our implementation is available at https://github.com/jonasrohw/swiftsolve.
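To make the Complexity Analyst's log–log fit concrete, here is a minimal sketch of how a slope s and R² could be estimated from profiled (input size, wall time) pairs and mapped to a coarse complexity class. It is an illustration under stated assumptions, not the SwiftSolve implementation: the function names `fit_loglog` and `classify`, the slope thresholds, and the `min_r2` cutoff are all hypothetical, and the LLM fallback mentioned in the abstract is represented only by the `"unclear"` label.

```python
import math

def fit_loglog(sizes, times):
    """Ordinary least-squares fit of log(time) = s * log(n) + b.

    Returns (s, r2). Assumes at least two distinct positive sizes
    and strictly positive times.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # If the timings are perfectly flat, the constant fit is exact.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, r2

def classify(slope, r2, min_r2=0.9):
    """Map a fitted slope to a coarse class (thresholds are assumptions)."""
    if r2 < min_r2:
        return "unclear"  # a poor fit is where a fallback would be needed
    if slope < 0.25:
        return "O(1) or O(log n)"
    if slope < 1.5:
        return "O(n) or O(n log n)"
    if slope < 2.5:
        return "O(n^2)"
    return "O(n^3) or worse"

# Synthetic timings that grow quadratically with input size.
sizes = [1000, 2000, 4000, 8000]
times = [1e-9 * n ** 2 for n in sizes]
slope, r2 = fit_loglog(sizes, times)
label = classify(slope, r2)
```

On a log–log scale a polynomial running time c·n^k appears as a straight line with slope k, which is why the slope can serve as a proxy for the complexity class. Logarithmic factors only nudge the slope slightly, so this sketch groups O(n) with O(n log n) rather than trying to separate them.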
Submission Number: 213