Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion

Dongjun Wei; Minjia Mao; Xiao Fang; Michael Chau

Short-PHD: Detecting Short LLM-generated Text with Topological Data Analysis After Off-topic Content Insertion

Dongjun Wei, Minjia Mao, Xiao Fang, Michael Chau

Published: 08 Jul 2025, Last Modified: 26 Aug 2025COLM 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: large language model, zero-shot detection, short text, topological data analysis

TL;DR: We present Short-PHD, a zero-shot LLM-generated text detection method tailored for short texts.

Abstract: The malicious usage of large language models (LLMs) has motivated the detection of LLM-generated texts. Previous work in topological data analysis shows that the persistent homology dimension (PHD) of text embeddings can serve as a more robust and promising score than other zero-shot methods. However, effectively detecting short LLM-generated texts remains a challenge. This paper presents Short-PHD, a zero-shot LLM-generated text detection method tailored for short texts. Short-PHD stabilizes the estimation of the previous PHD method for short texts by inserting off-topic content before the given input text and identifies LLM-generated text based on an established detection threshold. Experimental results on both public and generated datasets demonstrate that Short-PHD outperforms existing zero-shot methods in short LLM-generated text detection. The implementation codes of this study are available online.

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 273

Loading