Keywords: Large Language Models (LLMs), Code Generation, Attention, Logits, Anchoring, Prompt
TL;DR: A novel approach that adjusts the influence of selected input tokens to mitigate attention dilution and improve performance on code generation tasks.
Abstract: Recent advances in large language models (LLMs) have transformed software development by automatically generating code from users' requests in natural language. Despite these advances, challenges remain: LLMs still produce buggy code and fail to fully align with user intent. Our empirical study reveals that LLMs tend to dilute their self-attention on the initial prompt as more code tokens are generated. We hypothesize this self-attention dilution is one of the root causes of inaccuracies in LLM-generated code. To mitigate this issue, we propose **S**elective **P**rompt **A**nchoring (SPA) to amplify the influence of selected parts of the initial prompt, which we refer to as "anchored text", during code generation. Specifically, SPA calculates the difference between the logit distributions produced with and without the anchored text. We prove this logit difference approximates the anchored text's contextual contribution to the output logits. SPA then creates an augmented logit distribution by linearly combining the original logit distribution and the logit difference. We evaluate SPA with five LLMs on four benchmarks. Our results show that, after tuning on a few dozen instances, SPA consistently improves Pass@1 on new tasks by up to 7.6% across all settings. Notably, with selective text anchoring, a small version of DeepSeek-Coder (6.7B) can outperform the original, much larger version (33B). Our code is available at https://anonymous.4open.science/r/Selective-Prompt-Anchoring-74E7.
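To illustrate the logit-combination step described in the abstract, here is a minimal sketch, assuming a HuggingFace-style causal LM; the function name `spa_augmented_logits`, the mask construction, and the `anchoring_weight` parameter are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def spa_augmented_logits(model, input_ids, masked_input_ids, anchoring_weight=1.0):
    """Sketch of Selective Prompt Anchoring (SPA) for one decoding step.

    `input_ids` is the full prompt (anchored text present); `masked_input_ids`
    is the same prompt with the anchored text masked out. `anchoring_weight`
    is a hypothetical scaling factor tuned on a small set of instances.
    """
    with torch.no_grad():
        # Next-token logits with the anchored text in the prompt.
        logits_with = model(input_ids).logits[:, -1, :]
        # Next-token logits with the anchored text masked out.
        logits_without = model(masked_input_ids).logits[:, -1, :]
    # The difference approximates the anchored text's contextual contribution.
    logit_diff = logits_with - logits_without
    # Linearly combine the original logits with the scaled difference.
    return logits_with + anchoring_weight * logit_diff
```

The augmented logits would then replace the model's original logits before sampling or greedy decoding at each step.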
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8882