Human or Machine? Contrastive Learning for Detecting AI-Generated Chinese E-Commerce Reviews with a Custom Dataset
Abstract: The proliferation of AI-generated content on Chinese e-commerce platforms poses challenges to platform integrity and consumer trust. While existing detection methods show promising performance within specific domains, their cross-domain robustness remains largely unexplored for Chinese e-commerce reviews. We present the first systematic cross-domain robustness evaluation for Chinese AI-generated text detection, constructing a high-fidelity benchmark dataset using a controlled synthesis methodology and developing a progressive out-of-distribution evaluation framework. Through extensive experiments across multiple detection approaches, we provide a systematic analysis of cross-dataset generalization patterns. Our evaluation reveals that fine-tuned large language models, particularly Qwen-2.5-7B, achieve superior performance across all scenarios (94.8\% F1-score in-domain, 63.9\% under extreme cross-domain conditions), while contrastive learning approaches suffer significant performance degradation under distribution shifts. These findings provide crucial insights into detection paradigm trade-offs and cross-domain robustness challenges for practical deployment.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: AI-Generated Text Detection, Turing Test, Contrastive Learning, Cross-Domain Robustness, Large Language Models
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: Chinese
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3.1
B2 Discuss The License For Artifacts: N/A
B2 Elaboration: The license or terms of use for the dataset and code were not discussed in the current version of the paper. This is because the artifact release process is still under internal review for compliance with institutional and ethical guidelines. We intend to finalize and include appropriate licensing details (e.g., Creative Commons or open-source licenses) at the time of public release.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: The paper discusses the use of the ASAP corpus (Bu et al., 2021) for generating prompts and constructing AI-generated reviews. This dataset was originally created for research purposes (aspect-based sentiment analysis), and our usage aligns with its intended academic use. The new dataset derived from ASAP was used exclusively for research on AI-generated content detection. Details of dataset construction and intended research use are provided in Section 3.1 (Dataset Construction and Evaluation Framework).
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The paper does not explicitly discuss checks for personally identifying information (PII) or offensive content because the source dataset used (ASAP corpus) consists of anonymized Chinese e-commerce reviews that do not include user metadata or identifiable personal information.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: The paper provides documentation of the dataset and artifacts in Section 3.1 (Dataset Construction and Evaluation Framework).
B6 Statistics For Data: Yes
B6 Elaboration: Relevant dataset statistics are reported in Section 3.1 (Dataset Composition and Statistics) and summarized in Table 4 and Table 5.
C Computational Experiments: Yes
C1 Model Size And Budget: No
C1 Elaboration: The paper does not explicitly report the number of parameters for each model or the total computational budget (e.g., GPU hours).
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 3.2
C3 Descriptive Statistics: No
C3 Elaboration: Section 4.2
C4 Parameters For Packages: No
C4 Elaboration: The paper does not specify the exact software packages or libraries (e.g., NLTK, Hugging Face, BLEU/BERTScore implementations) used for preprocessing, evaluation, or normalization due to the page limit. This information will be included in the supplementary material upon release to support reproducibility.
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: No
D3 Elaboration: The paper does not explicitly discuss consent procedures because the primary dataset used (the ASAP corpus) consists of publicly available and anonymized Chinese e-commerce reviews originally collected for research purposes. No personally identifiable information (PII) is included, and the data was used strictly within a research context. Since the reviews were previously curated and anonymized by the original dataset creators, additional consent was not obtained. Future dataset releases will consider including explicit statements on data provenance and ethical usage.
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: AI assistants were used as a coding assistant and for grammar checking.
Author Submission Checklist: yes
Submission Number: 579