Abstract: Large Language Models (LLMs) are said to be gearing up to surpass human creativity, but the veracity of this statement needs careful consideration. Numerous LLMs have entered the market in quick succession, each better than the last. Recent developments in the LLM market, such as reasoning models and agent-based architectures, have significantly raised public opinion of LLMs. These developments raise critical questions about the authenticity of human work and the preservation of human creativity and innovative ability. This paper investigates these issues by addressing machine-generated content across several scenarios: document-level binary and multiclass classification, sentence-level segmentation to differentiate between human and machine-generated text, and a survey of adversarial attacks aimed at reducing the detectability of machine-generated content. We introduce BMAS English, an English-language dataset covering four tasks: Binary classification of human and machine text; Multiclass classification, which not only identifies machine-generated text but also attempts to determine its generator; Adversarial attacks, which are commonly used to evade detection; and Sentence-level segmentation, which predicts the boundaries between human and machine-generated text. We believe that this paper addresses previous work in machine-generated text detection (MGTD) in a more meaningful way. All source code and datasets are available in our GitHub repository: https://github.com/saitejalekkala33/E-BMAS-A-mixture-of-AI-Detectors.git.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Natural Language Processing, AI-Generated Text Detection, Sentence-Level Segmentation, Adversarial Attacks, Mixture-of-Experts, Text Classification
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 3656