CodeV: Issue Resolving with Visual Data

CodeV: Issue Resolving with Visual Data

ACL ARR 2024 December Submission580 Authors

14 Dec 2024 (modified: 26 Feb 2025)ACL ARR 2024 December SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Large Language Models (LLMs) have advanced rapidly in recent years, with their applications in software engineering expanding to more complex repository-level tasks. GitHub issue resolving is a key challenge among these tasks. While recent approaches have made progress on this task, they focus on textual data within issues, neglecting visual data. However, this visual data is crucial for resolving issues as it conveys additional knowledge that text alone cannot. We propose CodeV, the first approach to leveraging visual data to enhance the issue-resolving capabilities of LLMs. CodeV resolves each issue by following a two-phase process: data processing and patch generation. To evaluate CodeV, we construct a benchmark for visual issue resolving, namely Visual SWE-bench. Through extensive experiments, we demonstrate the effectiveness of CodeV, as well as provide valuable insights into leveraging visual data to resolve GitHub issues.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: code generation and understanding

Contribution Types: NLP engineering experiment

Languages Studied: English; Python

Submission Number: 580

Loading