Abstract: Large language models (LLMs) show significant potential to transform software engineering (SE). However, the high reliability and risk-control requirements of SE make the interpretability of LLMs a pressing concern. To address this concern, we conducted a study evaluating the capabilities and limitations of LLMs for code analysis in SE. Code analysis is essential in software development: it identifies bugs, security vulnerabilities, and compliance issues, and evaluates code quality and performance. We break down the abilities LLMs need to address code-analysis tasks in SE into three categories: 1) syntax understanding, 2) static behaviour understanding, and 3) dynamic behaviour understanding. We assessed four foundation models on tasks spanning multiple programming languages. We found that, while LLMs are good at understanding code syntax, they struggle to comprehend code semantics, particularly dynamic semantics. Furthermore, our study highlights that LLMs are susceptible to hallucinations when interpreting code semantic structures. Methods for verifying the correctness of LLM outputs are therefore needed to ensure their dependability in SE. More importantly, our study provides an initial answer to why code generated by LLMs is usually syntactically correct yet potentially vulnerable.
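To make the three-category breakdown concrete, the sketch below is a minimal, hypothetical Python example (the function name `fetch_user` and the inputs are illustrative assumptions, not taken from the paper's benchmark). The snippet is syntactically valid, its static structure (user input concatenated into a SQL string) is visible without running it, and its dynamic behaviour, whether an injection actually occurs, depends on the runtime value of `user_id`, mirroring the abstract's observation that generated code can be syntax-correct yet vulnerable.

```python
import sqlite3

def fetch_user(conn: sqlite3.Connection, user_id: str):
    # Syntax level: this parses into a valid AST.
    # Static level: data flows from `user_id` straight into the query string.
    query = "SELECT name FROM users WHERE id = '" + user_id + "'"
    # Dynamic level: with user_id = "1' OR '1'='1" the query matches every row.
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("1", "alice"), ("2", "bob")])
    print(fetch_user(conn, "1"))             # benign input: one row
    print(fetch_user(conn, "1' OR '1'='1"))  # injected input: all rows
```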
Paper Type: long
Research Area: NLP Applications
Contribution Types: Model analysis & interpretability
Languages Studied: Python, C, Java, and Solidity