MultiCode: A Unified Code Analysis Framework based on Multi-type and Multi-granularity Semantic Learning

Xu Duan, Jingzheng Wu, Mengnan Du, Tianyue Luo, Mutian Yang, Yanjun Wu

Published: 2021, Last Modified: 13 May 2025, ISSRE Workshops 2021, CC BY-SA 4.0

Abstract: Code analysis is one of the common ways to ensure software reliability. With the development of machine learning, more and more learning-based code analysis methods have been proposed. However, most existing methods target a single, specific code analysis task, so industrial applications must spend extra effort implementing a different model for each task. In this paper, we propose MultiCode, a novel unified code analysis framework that learns code semantic information of different types and granularities to cover the semantic information required by different tasks. This allows it to be adapted effectively to multiple tasks with higher accuracy. To demonstrate the effectiveness of MultiCode, we evaluate it on two common tasks: vulnerability detection and code clone detection. Experimental results show that MultiCode achieves F1-scores of 94.6%, 92.5% and 97.1% on the SARD-BE, SARD-RME and OJClone datasets, respectively, which is significantly higher than existing advanced methods.
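The toy sketch below shows the general "one representation, many tasks" idea described in the abstract. It is not MultiCode's architecture. A single shared encoder builds features at two granularities, and two lightweight task heads reuse that encoding. Several simplifications are assumptions made for illustration. Hand-written token-level and line-level counts stand in for learned semantics. Clone detection uses cosine similarity with a hypothetical threshold. Vulnerability detection uses a hypothetical hand-picked weight table. Every function name is hypothetical. The block also computes the F1-score, the metric the abstract reports, using F1 = 2TP / (2TP + FP + FN).

```python
import math
import re
from collections import Counter


def token_features(code: str) -> Counter:
    """Token granularity: counts of words/identifiers and single punctuation chars."""
    return Counter(f"tok:{t}" for t in re.findall(r"\w+|\S", code))


def line_features(code: str) -> Counter:
    """Statement granularity: counts of whitespace-normalized, non-empty lines."""
    lines = (" ".join(line.split()) for line in code.splitlines())
    return Counter(f"line:{line}" for line in lines if line)


def encode(code: str) -> Counter:
    """Shared (hypothetical) encoder: merge both granularities into one sparse vector."""
    return token_features(code) + line_features(code)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_clone(code_a: str, code_b: str, threshold: float = 0.8) -> bool:
    """Task head 1 (clone detection): compare the shared encodings of two snippets."""
    return cosine(encode(code_a), encode(code_b)) >= threshold


# Hypothetical hand-picked weights; a learned system would train these instead.
RISKY_WEIGHTS = {"tok:strcpy": 2.0, "tok:gets": 2.0, "tok:sprintf": 1.0}


def is_vulnerable(code: str, weights=RISKY_WEIGHTS, threshold: float = 1.0) -> bool:
    """Task head 2 (vulnerability detection): weighted score over one shared encoding."""
    feats = encode(code)
    score = sum(weights.get(name, 0.0) * count for name, count in feats.items())
    return score >= threshold


def f1_score(y_true, y_pred) -> float:
    """F1 = 2TP / (2TP + FP + FN), i.e. the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    print(is_clone("int a = 1;\nreturn a;", "int a = 1;\nreturn a;"))
    print(is_vulnerable("char buf[8];\nstrcpy(buf, src);"))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

The point of the design is that `encode` is written once and both heads consume its output. Supporting a new task means adding a head rather than a new end-to-end model, which is the kind of reuse the abstract motivates.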