Counter-GEO-Bench: Evaluating Defenses Against Information-Distorting Generative Engine Optimization

ACL ARR 2026 May Submission16717 Authors

26 May 2026 (modified: 02 Jun 2026) | ACL ARR 2026 May Submission | Readers: Everyone | CC BY 4.0
Keywords: Generative Engine Optimization, GEO-optimized Misinformation, Defense Benchmarking, LLM Guardrails
Abstract: Generative engine optimization (GEO) enables content producers to increase the visibility of their web pages in generative search engines, but the same techniques can deliver targeted misinformation when adversaries publish ordinary-looking GEO-optimized documents that victim large language models (LLMs) retrieve and synthesize into distorted answers. No existing benchmark evaluates defenses against this threat under controlled conditions. We therefore present Counter-GEO-Bench, a defense benchmark that pairs 247 human-verified, quality-gated queries with information-preserving and information-distorting GEO rewrites, and evaluates defenses on attack success rate (ASR), false positive rate, and answer quality across three victim LLMs. Under Counter-GEO-Bench, three off-the-shelf defenses (Granite Guardian, Llama Guard 3, and NeMo SelfCheck) reduce ASR by at most 5.7% relative, and Granite Guardian's reduction is not statistically significant. Safety-taxonomy guardrails target policy violations, whereas GEO misinformation passes through them as fluent informational content. We also propose C-GEO Guard, a lightweight benchmark baseline that reduces ASR by 48% relative with near-zero utility loss, indicating that the threat is tractable. We publicly release the code, benchmark harness, and benchmark data for research use.
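The abstract reports defenses in terms of ASR, false positive rate, and relative ASR reduction. The following is a minimal sketch of how such quantities could be computed, not the benchmark's actual harness: the `Outcome` record, its fields, and the function names are hypothetical, the exact metric definitions used by the paper may differ, and the numbers in the usage lines are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Outcome:
    """One evaluated query (hypothetical record, not the paper's schema)."""
    is_attack: bool         # retrieved set contains an information-distorting rewrite
    answer_distorted: bool  # victim LLM's answer repeats the planted claim
    flagged: bool           # the defense flagged or blocked a document


def attack_success_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of attacked queries whose answer ends up distorted (assumed definition)."""
    attacks = [o for o in outcomes if o.is_attack]
    if not attacks:
        raise ValueError("no attacked queries to evaluate")
    return sum(o.answer_distorted for o in attacks) / len(attacks)


def false_positive_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of benign queries on which the defense fires (assumed definition)."""
    benign = [o for o in outcomes if not o.is_attack]
    if not benign:
        raise ValueError("no benign queries to evaluate")
    return sum(o.flagged for o in benign) / len(benign)


def relative_reduction(asr_no_defense: float, asr_with_defense: float) -> float:
    """Relative ASR reduction: (baseline - defended) / baseline."""
    if asr_no_defense <= 0:
        raise ValueError("baseline ASR must be positive")
    return (asr_no_defense - asr_with_defense) / asr_no_defense


# Illustrative values only, not results from the paper.
example = [
    Outcome(is_attack=True, answer_distorted=True, flagged=False),
    Outcome(is_attack=True, answer_distorted=True, flagged=False),
    Outcome(is_attack=True, answer_distorted=False, flagged=True),
    Outcome(is_attack=True, answer_distorted=False, flagged=True),
    Outcome(is_attack=False, answer_distorted=False, flagged=True),
    Outcome(is_attack=False, answer_distorted=False, flagged=False),
    Outcome(is_attack=False, answer_distorted=False, flagged=False),
    Outcome(is_attack=False, answer_distorted=False, flagged=False),
]
asr = attack_success_rate(example)
fpr = false_positive_rate(example)
```

Reporting the reduction relative to the undefended ASR, rather than as an absolute difference, makes defenses comparable across victim LLMs whose baseline ASR differs.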
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Language Modeling, Information Retrieval and Text Mining
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: no
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Software: zip
Data: zip
Visa Needs: yes
Country Of Origin: CN
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Ethics Statement
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3.2, References
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Ethics Statement
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 3.2, 4.2, Ethics Statement
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: During human verification, we checked benchmark instances for private personal identifiers and gratuitously offensive content. We retain named individuals only when they appear in public web content inherited from GEO-Bench or are necessary for factual evaluation, and all generated misinformation claims are labeled for research use.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 3.2, Appendix C, Limitations
B6 Statistics For Data: Yes
B6 Elaboration: 3.2, 4.2, Appendix C, Appendix Table 11
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix Table 11
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 3.3, 4.1, 4.2, Appendix E, Appendix Table 11
C3 Descriptive Statistics: Yes
C3 Elaboration: 5.2, 5.3, Appendix H, I, J
C4 Parameters For Packages: Yes
C4 Elaboration: 3.3, 4.1, 4.2, and Appendix Table 11
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: No
D1 Elaboration: We release our annotation package, which contains the annotation instructions and HTML interface, alongside the project code in the software submission.
D2 Recruitment And Payment: N/A
D2 Elaboration: No external participants or paid annotators were recruited. The full dataset was annotated by the authors.
D3 Data Consent: N/A
D3 Elaboration: N/A. We did not collect new data from human participants. The source query-document instances were obtained from the publicly released GEO-Bench dataset on Hugging Face under its stated access conditions, and our use is limited to research benchmark construction.
D4 Ethics Review Board Approval: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethics Statement
Author Submission Checklist: yes
Submission Number: 16717