API Misuse Detection Based on Stacked LSTM

Shuyin OuYang; Fan Ge; Li Kuang; Yuyu Yin

Published: 01 Jan 2020, Last Modified: 13 Nov 2024. Venue: CollaborateCom (1) 2020. License: CC BY-SA 4.0

Abstract: In modern software engineering, APIs (Application Programming Interfaces) are widely used to develop applications rapidly by reusing data structures, frameworks, class libraries, and so on. However, because of the considerable number of interfaces, the lack of documentation, and the lack of timely maintenance and updates, APIs are often used incorrectly. Detecting API misuse automatically has therefore become an important problem. Many existing automatic API misuse detection methods do not make full use of the APIs’ latent semantic information or of the independent integrity of each API. In this paper, we employ a stacked LSTM to learn API usage specifications and detect API misuse defects. Specifically, first, we obtain an ACSG (API Call Syntax Graph) through static analysis of the source code. Second, based on the ACSG, we generate API sequences and transform them into <previous API sequence, next API> pairs for training. Third, to represent the APIs semantically, we apply word2vec as a pre-training model to embed the features of each API. Through the stacked LSTM model, we take the embedded previous API sequence as input to model the API usage specifications, and we discover potential API misuse defects by judging whether the next API appears in the output (an API probability list). We design experiments to evaluate the effectiveness of our method on Java Cryptography APIs and the code that uses them, and the results show the advantages of our proposed method.
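
The pair construction and the detection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stacked LSTM and the word2vec embedding are replaced by a simple frequency counter (`FrequencyModel`, a hypothetical stand-in that only looks at the last call), and the names `make_pairs`, `find_misuses`, and the cutoff `k` are assumptions introduced for illustration. Only the shape of the training pairs and the "is the actual next API in the predicted list" check follow the text.

```python
from collections import Counter, defaultdict


def make_pairs(api_sequence):
    """Split one API call sequence into (previous API sequence, next API) pairs."""
    return [(tuple(api_sequence[:i]), api_sequence[i])
            for i in range(1, len(api_sequence))]


class FrequencyModel:
    """Hypothetical stand-in for the stacked LSTM.

    It ranks candidate next APIs by how often they followed the last call
    of the prefix in the training pairs. A real model would consume the
    whole embedded prefix.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, pairs):
        for prefix, next_api in pairs:
            self.counts[prefix[-1]][next_api] += 1

    def top_k(self, prefix, k):
        # Ranked list of the k most frequent successors of the last call.
        ranked = self.counts.get(prefix[-1], Counter()).most_common(k)
        return [api for api, _ in ranked]


def find_misuses(model, api_sequence, k=3):
    """Report (position, API) for every call missing from the top-k prediction."""
    return [(position, next_api)
            for position, (prefix, next_api)
            in enumerate(make_pairs(api_sequence), start=1)
            if next_api not in model.top_k(prefix, k)]


# Illustrative usage with Java Cryptography call names (assumed example data).
correct_usage = ["KeyGenerator.getInstance", "KeyGenerator.generateKey",
                 "Cipher.getInstance", "Cipher.init", "Cipher.doFinal"]
model = FrequencyModel()
model.fit(make_pairs(correct_usage))

# Cipher.doFinal is called without a preceding Cipher.init, so it is flagged.
suspicious = ["Cipher.getInstance", "Cipher.doFinal"]
print(find_misuses(model, suspicious))
```

In this sketch the cutoff `k` controls how permissive the check is: a larger `k` accepts more candidate next calls and reports fewer potential misuses.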
