Abstract: Recent advances in pre-trained code models, such as CodeBERT and Codex, have demonstrated remarkable performance across diverse tasks. However, accurate and unambiguous API usage is vital for correct program functionality, and it requires a deep understanding of API fully qualified names, both structurally and semantically. Despite their prowess, current models often falter when suggesting appropriate APIs during code generation, and the underlying reasons remain largely unexplored. To bridge this gap, we leverage the knowledge probing technique and employ cloze-style tests to gauge the knowledge embedded within these models. Our in-depth analysis assesses a model's grasp of API fully qualified names from two angles: API call and API import. The results shed light on the strengths and weaknesses of existing pre-trained code models. We posit that integrating API structure during pre-training can enhance API usage and code representation. This research aims to steer the evolution of code intelligence and set the course for subsequent investigations.
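The sketch below illustrates, in spirit, what a cloze-style probe over API fully qualified names can look like for the two angles named in the abstract (API import and API call). It is a minimal illustration only: the checkpoint (microsoft/codebert-base-mlm, a masked-language-modelling variant of CodeBERT), the prompts, and the expected completions are assumptions for demonstration, not the authors' released probing suite.

```python
# Minimal sketch of cloze-style probing of a pre-trained code model for
# API fully qualified name knowledge. Prompts and checkpoint are illustrative.
from transformers import pipeline

# CodeBERT trained with the masked-language-modelling objective, so it can
# fill in a masked token directly (RoBERTa-style "<mask>" token).
fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Two probing angles:
# 1) API import: mask a segment of the fully qualified name in an import.
# 2) API call:   mask the member accessed on an imported module.
probes = [
    "from sklearn.<mask> import train_test_split",    # expected: model_selection
    "import numpy as np\nx = np.<mask>([1, 2, 3])",    # expected: array
]

for prompt in probes:
    print(prompt)
    for candidate in fill_mask(prompt, top_k=3):
        # token_str is the predicted fill; score is its probability.
        print(f"  {candidate['token_str']!r:20s} score={candidate['score']:.3f}")
```

Whether the gold API segment appears among the top-k predictions (and with what probability) then serves as a simple measure of how much API knowledge the model has internalised.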
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: code models, knowledge probing
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Python
Submission Number: 2003