Can Large Language Models Find Connections between Social Beliefs?

ACL ARR 2025 July Submission 1119 Authors

29 Jul 2025 (modified: 20 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Understanding how people’s beliefs about different issues shift in tandem with one another is essential for modeling collective reasoning and social dynamics. However, this problem remains underexplored, largely due to the absence of standardized benchmarks and evaluation protocols. In this work, we introduce \textsc{BeliefBench}, a new benchmark for evaluating whether large language models (LLMs) can detect when shifts in beliefs about one real-world event are accompanied by corresponding shifts in beliefs about another. The benchmark is constructed from Polymarket, a prediction-market platform whose daily updated event probabilities reflect crowd beliefs over time. We formulate a classification task in which event pairs are labeled using a combination of time-series co-movement, semantic similarity, and other metadata, with label quality validated by human annotators. Our evaluation reveals two key findings: (1) LLMs consistently outperform heuristic and neural baselines at identifying meaningful belief correlations across diverse domains; and (2) Chain-of-Thought prompting improves performance in settings that require multi-step reasoning, such as politics and elections, but can hurt performance in domains where surface-level signals are more predictive. \textsc{BeliefBench} thus provides a challenging testbed for evaluating how well LLMs capture the co-evolution of perspectives and the underlying temporal and causal reasoning.
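To make the abstract's labeling recipe concrete, the sketch below illustrates the two main signals it names: Pearson co-movement of two daily probability series and cosine similarity of event-description embeddings. This is a minimal, hypothetical illustration, not the paper's actual pipeline; the function names and thresholds are invented, and the paper additionally incorporates metadata and human validation.

```python
import numpy as np

def pearson_corr(a, b) -> float:
    """Pearson correlation between two aligned daily probability series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

def cosine_sim(u, v) -> float:
    """Cosine similarity between two event-description embeddings."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def label_pair(prob_a, prob_b, emb_a, emb_b,
               corr_thresh=0.6, sim_thresh=0.5) -> int:
    """Toy labeling rule: mark a pair as 'correlated' (1) when its belief
    series co-move strongly AND the events are semantically related.
    Thresholds here are illustrative assumptions, not the paper's values."""
    co_movement = pearson_corr(prob_a, prob_b)
    similarity = cosine_sim(emb_a, emb_b)
    return int(co_movement >= corr_thresh and similarity >= sim_thresh)

if __name__ == "__main__":
    # Two hypothetical 5-day probability series that move together.
    p_a = [0.40, 0.45, 0.55, 0.60, 0.70]
    p_b = [0.30, 0.36, 0.48, 0.55, 0.62]
    # Hypothetical 4-d embeddings of the two event descriptions.
    e_a = [0.1, 0.7, 0.2, 0.0]
    e_b = [0.2, 0.6, 0.3, 0.1]
    print(label_pair(p_a, p_b, e_a, e_b))  # -> 1 (co-moving and similar)
```

In this toy version, a pair receives a positive label only when both signals clear their thresholds, which mirrors the abstract's statement that labels combine co-movement with semantic similarity before human validation.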
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Social Belief Modeling, Large Language Models (LLMs), Multi-hop Reasoning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 10 and Section 11
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Appendix A
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Appendix A
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Appendix A
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: See Appendix A. The dataset contains no PII or offensive content. All data is event-level and anonymized.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix A
B6 Statistics For Data: Yes
B6 Elaboration: Appendix A
C Computational Experiments: Yes
C1 Model Size And Budget: No
C1 Elaboration: We used black-box APIs (e.g., OpenAI GPT-4, Claude 3, Gemini 1.5) without direct access to model parameters or FLOPs. This is explained in Appendix A.
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 5, Section 6, Appendix B
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 6
C4 Parameters For Packages: No
C4 Elaboration: We did not use traditional NLP libraries (e.g., NLTK, ROUGE); our evaluations used custom scripts and API outputs.
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Appendix G
D2 Recruitment And Payment: Yes
D2 Elaboration: Appendix G
D3 Data Consent: Yes
D3 Elaboration: Appendix G
D4 Ethics Review Board Approval: No
D4 Elaboration: The study used publicly available event-level data containing no PII. It was deemed exempt under our institution’s guidelines, so no IRB approval was required or sought.
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Appendix G
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Appendix E
Author Submission Checklist: Yes
Submission Number: 1119