Abstract: We present an algorithm to find nonidiomatic snippets in Python source code and replace them with cleaner and more performant alternatives. The algorithm is divided into three subtasks: (i) the snippets are localized, then for each snippet (ii) the type of the nonidiomatic pattern, and (iii) the key variables are determined. The subtasks of localizing patterns and extracting variables are solved as Sequence Tagging tasks with LSTM networks. Determining the nonidiomatic pattern type is a classification problem, which we solve by training a feedforward neural network. We evaluate the process on a dataset containing more than 13 000 programs coded by students.
0 Replies
Loading