How Can LLMs Serve as Experts in Malicious Code Detection? A Graph Representation Learning Based Approach
Keywords: Large language model, Code, Graph Representation Learning
Abstract: Large Language Models (LLMs) excel at code processing yet struggle with malicious code detection, primarily because of their limited ability to capture long-range dependencies within large and complex codebases. To address this limitation, we propose a graph representation learning-based attention acquisition framework that enhances the malicious code detection capabilities of LLMs. Specifically, our method constructs a graph representation of the code, extracts semantic and structural features using an LLM, and trains a Graph Neural Network (GNN) with minimally labeled data. The GNN performs an initial detection and, by backtracking its predictions, identifies the code sections most likely to contain malicious behavior. These sections then guide the LLM's attention for in-depth analysis. By concentrating the LLM's processing on these critical regions, our approach reduces interference from redundant or irrelevant data, improving detection accuracy and efficiency while keeping annotation costs low. Extensive evaluation on both custom-built and public datasets demonstrates that our approach consistently outperforms existing detection methods, highlighting its potential for practical deployment in software security scenarios.
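The abstract describes a pipeline of graph construction, GNN classification, and prediction backtracking. The following is a minimal sketch of one plausible reading of that pipeline, assuming PyTorch Geometric; the class and function names (CodeGraphGNN, top_k_salient_nodes) and the use of gradient saliency as the "backtracking" step are illustrative assumptions, not the paper's confirmed method.

```python
# Sketch: GNN over LLM-embedded code-graph nodes, plus a gradient-saliency
# "backtracking" pass that ranks nodes to guide LLM attention.
# Assumes node features x are LLM embeddings of code sections.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class CodeGraphGNN(torch.nn.Module):
    """Two-layer graph classifier: benign vs. malicious."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, 2)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.head(global_mean_pool(h, batch))  # graph-level logits


def top_k_salient_nodes(model, x, edge_index, batch, k: int = 5):
    """Backtrack the 'malicious' logit to input nodes via gradient
    saliency (an assumed stand-in for the paper's backtracking step).
    Returns indices of the top-k nodes, i.e., the code sections that
    would be handed to the LLM for focused analysis. Assumes a
    single-graph batch."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x, edge_index, batch)
    logits[0, 1].backward()          # gradient of the malicious score
    saliency = x.grad.norm(dim=1)    # per-node importance
    return saliency.topk(min(k, x.size(0))).indices.tolist()
```

In this reading, the GNN acts as a cheap pre-filter trained on a small labeled set, and the saliency ranking narrows the LLM's context to a handful of suspect regions instead of the full codebase.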
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 5505