CursorCore: Assist Programming through Aligning Anything

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Large language models have been successfully applied to programming assistance tasks such as code completion, code insertion, and instructional code editing. However, these applications remain insufficiently automated and struggle to effectively integrate the various types of information produced during programming, including coding history, code context, and user instructions. In this work, we propose a new framework that comprehensively integrates these information sources, and we collect data to train our models and evaluate their performance. First, to thoroughly evaluate how well models align with different types of information and the quality of their outputs, we introduce a new benchmark, APEval (Assist Programming Eval), which comprehensively assesses model performance on programming assistance tasks. Second, for data collection, we develop a data generation pipeline, Programming-Instruct, which synthesizes training data from diverse sources such as GitHub and online judge platforms; this pipeline can automatically generate the various types of messages that arise throughout the programming process. Finally, using this pipeline, we generate 219K samples, fine-tune multiple models, and develop the CursorCore series. We show that CursorCore outperforms other models of comparable size. This framework unifies applications such as inline chat and automated editing, contributing to the advancement of coding assistants.
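The abstract describes aligning three kinds of information: coding history, the current code context, and user instructions. As a purely illustrative sketch of what combining these inputs might look like, the snippet below builds a single prompt from all three. The function name, section headers, and format are hypothetical, not the paper's actual chat template; see the linked repository for the real implementation.

```python
# Hypothetical sketch: merging the three information sources the paper's
# framework aligns (coding history, current code, user instruction) into
# one prompt. The section headers and layout are illustrative only.

def build_assistant_prompt(history, current_code, instruction):
    """Combine past edits, the current file, and a user note into one prompt."""
    parts = []
    if history:
        parts.append("## Coding history\n" + "\n".join(history))
    parts.append("## Current code\n" + current_code)
    if instruction:
        parts.append("## User instruction\n" + instruction)
    return "\n\n".join(parts)

prompt = build_assistant_prompt(
    history=["- renamed `total` to `subtotal`"],
    current_code="def checkout(subtotal):\n    return subtotal",
    instruction="Add a 10% tax to the returned amount.",
)
print(prompt)
```

In a real assistant, each section would be serialized with the model's own chat template rather than plain markdown headers; the point here is only that all three sources reach the model at once.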
Lay Summary: Modern code assistants act as if they have only short-term memory: they see just the line you are typing and miss everything that led up to it. We give them a longer memory, letting an AI read three things at once: the edits you have already made, the code you are working on now, and any notes you leave for it. To teach this skill, we build a new test, APEval, that checks whether an AI has really learned it, and we collect examples from diverse sources to train the AI. The result is the CursorCore series, which solves programming tasks more accurately than other AIs of the same size. Our system is open source, so anyone can build on it, paving the way for coding assistants that fix bugs, add features, and answer questions with far less manual work from developers. It turns the AI from a one-off code suggester into a collaborative partner that remembers what you have already done.
Link To Code: https://github.com/TechxGenus/CursorCore
Primary Area: Applications
Keywords: Large Language Models for Code, AI-Assisted Programming
Submission Number: 10068