COCO: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning
Abstract: Machine-Generated Text (MGT) detection, a task that discriminates MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which have recently become adept at mimicking human writing style. Recently proposed detectors usually take coarse text sequences as input and fine-tune pretrained models with a standard cross-entropy loss. However, these methods fail to consider the linguistic structure of texts. Moreover, they cannot handle the low-resource problem, which often arises in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT in low-resource scenarios. To exploit linguistic features, we encode coherence information in the form of a graph into the text representation. To tackle the challenge of limited data, we employ a contrastive learning framework and propose an improved contrastive loss that prevents the performance degradation caused by simple samples. Experimental results on two public datasets and two self-constructed datasets show that our approach significantly outperforms state-of-the-art methods.
Paper Type: long
Research Area: NLP Applications