Untargeted Code Authorship Evasion with Seq2Seq Transformation

Soohyeon Choi, Rhongho Jang, DaeHun Nyang, David Mohaisen

Published: 2023, Last Modified: 25 Jan 2026. Venue: CSoNet 2023. License: CC BY-SA 4.0

Abstract: Code authorship attribution is the problem of identifying the authors of source code through the stylistic features of their code, a topic that has recently attracted significant interest and seen outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system originally designed for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improves efficiency at the cost of a slight degradation in accuracy compared to existing work. We also reduce the processing time by approximately 68% while maintaining an 85% transformation success rate and an evasion success rate of up to 95.77% in the untargeted setting.
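
The two rates reported in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so the definitions here are assumptions: a transformation counts as successful when the transformed code is still valid, and an untargeted evasion counts as successful when the attribution classifier assigns a successfully transformed sample to any author other than the true one. The names `Sample`, `transformation_success_rate`, and `untargeted_evasion_rate` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One transformed code sample (hypothetical record layout)."""
    true_author: str
    transformed_ok: bool              # assumption: transformed code is still valid
    predicted_author: Optional[str]   # classifier output; None if the transformation failed


def transformation_success_rate(samples: Sequence[Sample]) -> float:
    """Fraction of samples whose transformation produced valid code."""
    if not samples:
        return 0.0
    return sum(s.transformed_ok for s in samples) / len(samples)


def untargeted_evasion_rate(samples: Sequence[Sample]) -> float:
    """Fraction of successfully transformed samples attributed to the wrong author.

    Assumption: only successfully transformed samples are counted in the
    denominator; the paper may define the denominator differently.
    """
    transformed = [s for s in samples if s.transformed_ok]
    if not transformed:
        return 0.0
    evaded = sum(s.predicted_author != s.true_author for s in transformed)
    return evaded / len(transformed)


# Toy data, invented purely for illustration (not results from the paper).
samples = [
    Sample("alice", True, "bob"),     # transformed, misattributed: evasion
    Sample("alice", True, "alice"),   # transformed, still attributed correctly
    Sample("carol", True, "dave"),    # transformed, misattributed: evasion
    Sample("carol", False, None),     # transformation failed
]

tsr = transformation_success_rate(samples)
esr = untargeted_evasion_rate(samples)
print(f"transformation success rate: {tsr:.2f}")
print(f"untargeted evasion rate: {esr:.2f}")
```

In the untargeted setting, any misattribution counts as an evasion. A targeted variant would instead require the prediction to match one specific chosen author.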

External IDs: dblp:conf/csonet/ChoiJNM23