Abstract: With the development of communication technologies, creating new texts by altering the sentence structure of an original through multi-turn machine translation has become widespread across various domains. Existing plagiarism detection models often treat different features uniformly and overlook the significance of disparities within high-dimensional features. This paper therefore proposes a plagiarism detection model for multi-turn text back-translation (PDMTT), built on a grouping enhancement fusion (GEF) mechanism that combines and enhances local and global features. The GEF mechanism assigns importance coefficients to sub-features, reinforcing critical ones while diminishing less relevant ones. The enhanced features are used to extract high-quality text representations, improving the precision with which the model distinguishes original content from back-translated text. We further improve the model's back-translation plagiarism detection capability by optimizing the contrastive loss function and using the fused translated representations as targets. To validate the model, we also constructed a multi-tuple back-translation plagiarism dataset for training and validation. Experimental results show that PDMTT outperforms previous methods in back-translation plagiarism detection and yields superior text representations. An ablation study further confirms that incorporating the GEF mechanism enhances the discrimination capability of the model.
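As a rough illustration of what assigning importance coefficients to sub-features can look like, the following is a minimal sketch and not the GEF mechanism itself, whose exact formulation the abstract does not give. The function name `group_enhance`, the use of a scalar mean as the global descriptor, and the sigmoid gating are all assumptions made for illustration.

```python
import math


def group_enhance(features, num_groups):
    """Scale each group of sub-features by an importance coefficient in (0, 1).

    Hypothetical sketch: the scoring rule below is an assumption, not the
    formulation used by PDMTT.
    """
    if len(features) % num_groups != 0:
        raise ValueError("feature length must be divisible by num_groups")
    size = len(features) // num_groups
    groups = [features[i * size:(i + 1) * size] for i in range(num_groups)]

    # Global descriptor: mean over the whole vector (assumed stand-in
    # for a "global" feature).
    global_mean = sum(features) / len(features)

    # Score each group by how strongly its local mean agrees with the
    # global descriptor.
    scores = [(sum(g) / size) * global_mean for g in groups]

    # Normalise the scores across groups so the coefficients are comparable.
    mean_s = sum(scores) / num_groups
    var_s = sum((s - mean_s) ** 2 for s in scores) / num_groups
    std_s = math.sqrt(var_s) + 1e-5

    # Sigmoid gate: groups scoring above average get coefficients above 0.5,
    # groups scoring below average get coefficients below 0.5.
    coeffs = [1.0 / (1.0 + math.exp(-(s - mean_s) / std_s)) for s in scores]

    enhanced = [[c * x for x in g] for c, g in zip(coeffs, groups)]
    return enhanced, coeffs


# Usage: the first group carries larger activations than the second.
enhanced, coeffs = group_enhance([1.0, 1.0, 0.1, 0.1], num_groups=2)
```

In this toy setting the group that agrees more with the global descriptor is reinforced and the other is attenuated, which mirrors the reinforcing and diminishing behaviour described above only at a conceptual level.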