Abstract: Recent advances in pre-trained code models, such as CodeBERT and Codex, have demonstrated remarkable performance across diverse tasks. However, accurate and unambiguous API usage is vital for correct program functionality, and it requires a deep understanding of API fully qualified names, both structurally and semantically. Despite their prowess, current models often falter when suggesting appropriate APIs during code generation, and the underlying reasons remain largely unexplored. To bridge this gap, we leverage the knowledge probing technique and employ cloze-style tests to gauge the knowledge embedded within these models. Our in-depth analysis assesses a model's grasp of API fully qualified names from two angles: API call and API import. The results shed light on the strengths and weaknesses of existing pre-trained code models. We posit that integrating API structure during pre-training can enhance API usage and code representation. This research aims to steer the evolution of code intelligence and set the course for subsequent investigations.
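The sketch below illustrates, in spirit, what a cloze-style probe over API fully qualified names can look like for the two angles named in the abstract (API import and API call). It is a minimal illustration only: the checkpoint (microsoft/codebert-base-mlm, a masked-language-modelling variant of CodeBERT), the prompts, and the expected completions are assumptions for demonstration, not the authors' released probing suite.

```python
# Minimal sketch of cloze-style probing of a pre-trained code model for
# API fully qualified name knowledge. Prompts and checkpoint are illustrative.
from transformers import pipeline

# CodeBERT trained with the masked-language-modelling objective, so it can
# fill in a masked token directly (RoBERTa-style "<mask>" token).
fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Two probing angles:
# 1) API import: mask a segment of the fully qualified name in an import.
# 2) API call:   mask the member accessed on an imported module.
probes = [
    "from sklearn.<mask> import train_test_split",    # expected: model_selection
    "import numpy as np\nx = np.<mask>([1, 2, 3])",    # expected: array
]

for prompt in probes:
    print(prompt)
    for candidate in fill_mask(prompt, top_k=3):
        # token_str is the predicted fill; score is its probability.
        print(f"  {candidate['token_str']!r:20s} score={candidate['score']:.3f}")
```

Whether the gold API segment appears among the top-k predictions (and with what probability) then serves as a simple measure of how much API knowledge the model has internalised.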
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: code models, knowledge probing
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Python
Submission Number: 2003