A Stacked Bidirectional LSTM Model for Classifying Source Codes Built in MPLs

Published: 01 Jan 2021 · Last Modified: 19 Feb 2025 · PKDD/ECML Workshops (2) 2021 · License: CC BY-SA 4.0
Abstract: Over the years, programmers have improved their skills and can now solve problems in many different languages, and a large volume of new code is produced worldwide every day. Because a single programming problem can be solved in many languages, it is difficult to identify the problem from the written source code alone. A classification model is therefore needed to help programmers identify problems written in Multi-Programming Languages (MPLs), which can in turn support programming education. However, deep-learning-based source code classification models are still scarce in programming education and software engineering. To address this gap, we propose a stacked Bidirectional Long Short-Term Memory (Bi-LSTM) neural network model for classifying source codes developed in MPLs. For this research, we collect a large number of real-world source codes from the Aizu Online Judge (AOJ) system; the proposed model is trained, validated, and tested on the AOJ dataset, and various hyperparameters are fine-tuned to improve its performance. In the experiments, the proposed model achieves an accuracy of about 93% and an F1-score of 89.24%, and it outperforms state-of-the-art models on other evaluation metrics such as precision (90.12%) and recall (89.48%).
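
As a rough illustration of the architecture the abstract describes, below is a minimal sketch of a stacked Bi-LSTM classifier in Keras. All hyperparameter values (vocabulary size, sequence length, layer widths, number of classes) are assumptions chosen for illustration; the paper's actual configuration is not given in this abstract.

```python
# A minimal sketch of a stacked Bi-LSTM source code classifier, assuming a
# token-ID input representation. Vocabulary size, sequence length, class
# count, and all layer sizes are illustrative assumptions, not values
# reported in the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed size of the source-code token vocabulary
MAX_LEN = 500        # assumed maximum token sequence length
NUM_CLASSES = 50     # assumed number of problem classes in the AOJ dataset

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    # First Bi-LSTM layer emits the full sequence so the next layer can stack on it.
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    # Second (stacked) Bi-LSTM layer condenses the sequence to a single vector.
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Stacking the second Bi-LSTM on top of the first (via `return_sequences=True`) lets the model build higher-level representations of token sequences before classification, which is the core idea behind a stacked Bi-LSTM.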