Abstract: Large Language Models (LLMs) have advanced natural language generation but pose ethical and practical challenges, making it crucial to detect machine-generated texts. Traditional detection methods rely on complex, hard-to-interpret neural encodings and model-specific features such as perplexity. This study explores whether grammatical patterns, specifically sequences of parts of speech (POS) including punctuation and symbols, can distinguish machine-written texts from human-written ones. Using a CNN classifier on POS sequences, the approach achieves nearly 90% accuracy on a benchmark dataset. Combining POS-based features with neural embeddings improves performance, and the model shows robustness against adversarial attacks, though it is less effective on short texts.
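The pipeline described in the abstract (POS tagging, then a convolutional classifier over the tag sequence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NLTK's universal tagset for POS tagging (where "." covers punctuation and "X" covers symbols/other) and PyTorch for a small 1D CNN; the sequence length, embedding size, and filter counts are illustrative choices.

```python
import nltk
import torch
import torch.nn as nn

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("universal_tagset", quiet=True)

# The 12 universal POS tags; "." covers punctuation, "X" covers symbols/other.
TAGS = ["ADJ", "ADP", "ADV", "CONJ", "DET", "NOUN", "NUM",
        "PRON", "PRT", "VERB", ".", "X"]
TAG2ID = {t: i + 1 for i, t in enumerate(TAGS)}  # index 0 reserved for padding
MAX_LEN = 256  # illustrative fixed sequence length

def pos_sequence(text: str) -> torch.Tensor:
    """Convert a text into a fixed-length tensor of POS-tag indices."""
    tokens = nltk.word_tokenize(text)
    tags = [tag for _, tag in nltk.pos_tag(tokens, tagset="universal")]
    ids = [TAG2ID.get(t, TAG2ID["X"]) for t in tags][:MAX_LEN]
    ids += [0] * (MAX_LEN - len(ids))  # pad to MAX_LEN
    return torch.tensor(ids, dtype=torch.long)

class POSCNN(nn.Module):
    """Small 1D CNN over embedded POS-tag sequences, binary output."""
    def __init__(self, n_tags=len(TAGS) + 1, emb_dim=16, n_filters=64):
        super().__init__()
        self.emb = nn.Embedding(n_tags, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=5, padding=2)
        self.fc = nn.Linear(n_filters, 2)  # human vs. machine logits

    def forward(self, x):                        # x: (batch, MAX_LEN)
        e = self.emb(x).transpose(1, 2)          # (batch, emb_dim, MAX_LEN)
        h = torch.relu(self.conv(e)).max(dim=2).values  # global max pooling
        return self.fc(h)                        # (batch, 2)

if __name__ == "__main__":
    model = POSCNN()
    batch = pos_sequence("This sentence was written by a person, not a model.").unsqueeze(0)
    print(model(batch).shape)  # torch.Size([1, 2])
```

Because the classifier only sees tag indices rather than word identities, any improvement from combining these features with neural embeddings, as the abstract reports, would come from concatenating the pooled POS representation with a separate text-embedding vector before the final layer.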
External IDs: dblp:conf/ictai/NgouanfouoD25