Counter-GEO-Bench: Evaluating Defenses Against Information-Distorting Generative Engine Optimization

ACL ARR 2026 May Submission 16717 Authors

26 May 2026 (modified: 02 Jun 2026) | ACL ARR 2026 May Submission | Readers: Everyone | CC BY 4.0

Keywords: Generative Engine Optimization, GEO-optimized Misinformation, Defense Benchmarking, LLM Guardrails

Abstract: Generative engine optimization (GEO) enables content producers to increase the visibility of their web pages in generative search engines, but the same techniques can deliver targeted misinformation when adversaries publish ordinary-looking GEO-optimized documents that victim large language models (LLMs) retrieve and synthesize into distorted answers. No existing benchmark evaluates defenses against this threat under controlled conditions. We therefore present Counter-GEO-Bench, a defense benchmark that pairs 247 human-verified, quality-gated queries with information-preserving and information-distorting GEO rewrites, and evaluates defenses on attack success rate (ASR), false positive rate, and answer quality across three victim LLMs. Under Counter-GEO-Bench, three off-the-shelf defenses (Granite Guardian, Llama Guard 3, and NeMo SelfCheck) reduce ASR by at most 5.7% relative, and Granite Guardian's reduction is not statistically significant. Safety-taxonomy guardrails target policy violations, whereas GEO misinformation passes through them as fluent informational content. We therefore propose C-GEO Guard, a lightweight benchmark baseline that reduces ASR by 48% relative with near-zero utility loss, showing that the threat is tractable. We publicly release the code, benchmark harness, and benchmark data for research use.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: Resources and Evaluation, Language Modeling, Information Retrieval and Text Mining

Contribution Types: Publicly available software and/or pre-trained models, Data resources

Languages Studied: English

EMNLP 2026 AI Reviewing Experiment: no

Reassignment Request Area Chair: This is not a resubmission

Reassignment Request Reviewers: This is not a resubmission

Software: zip

Data: zip

Visa Needs: yes

Country Of Origin: CN

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: Yes

A2 Elaboration: Ethics Statement

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: 3.2, References

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: Ethics Statement

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: 3.2, 4.2, Ethics Statement

B4 Data Contains Personally Identifying Info Or Offensive Content: No

B4 Elaboration: During human verification, we checked benchmark instances for private personal identifiers and gratuitously offensive content. We retain named individuals only when they appear in public web content inherited from GEO-Bench or are necessary for factual evaluation, and all generated misinformation claims are labeled for research use.

B5 Documentation Of Artifacts: Yes

B5 Elaboration: 3.2, Appendix C, Limitations

B6 Statistics For Data: Yes

B6 Elaboration: 3.2, 4.2, Appendix C, Appendix Table 11

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Appendix Table 11

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: 3.3, 4.1, 4.2, Appendix E, Appendix Table 11

C3 Descriptive Statistics: Yes

C3 Elaboration: 5.2, 5.3, Appendix H, I, J

C4 Parameters For Packages: Yes

C4 Elaboration: 3.3, 4.1, 4.2, and Appendix Table 11

D Human Subjects Including Annotators: Yes

D1 Instructions Given To Participants: No

D1 Elaboration: We release our annotation package, containing the annotation instructions and HTML interface, alongside the project code in the software submission.

D2 Recruitment And Payment: N/A

D2 Elaboration: No external participants or paid annotators were recruited. The full dataset was annotated by the authors.

D3 Data Consent: N/A

D3 Elaboration: N/A. We did not collect new data from human participants. The source query-document instances were obtained from the publicly released GEO-Bench dataset on Hugging Face under its stated access conditions, and our use is limited to research benchmark construction.

D4 Ethics Review Board Approval: N/A

E Ai Assistants In Research Or Writing: Yes

E1 Information About Use Of Ai Assistants: Yes

E1 Elaboration: Ethics Statement

Author Submission Checklist: yes

Submission Number: 16717
