What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Question Generation

ACL ARR 2025 July Submission809 Authors

28 Jul 2025 (modified: 05 Sept 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Large language models (LLMs) are increasingly used as critical components of knowledge retrieval and agentic systems. These systems can benefit from the knowledge-seeking capability of LLMs, in other words, their curiosity. However, this capability has not been evaluated quantitatively. To bridge this gap, we propose an evaluation framework, CDQG (Curiosity-Driven Question Generation). The CDQG task prompts an LLM to generate questions about a statement introducing scientific knowledge, simulating a curious person encountering the statement for the first time. The CDQG dataset contains 1,988 statements spanning physics, chemistry, and mathematics at distinct difficulty levels, along with general knowledge statements and intentionally erroneous statements. We score the quality of the questions generated by LLMs along multiple dimensions, and validate these scores with rigorous controlled ablation studies and human evaluations. While large models like GPT-4 and Mixtral 8x7B generate highly coherent and relevant questions, the smaller Phi-2 model is equally or more effective, indicating that size alone does not determine a model's knowledge-acquisition potential. CDQG quantifies a critical model capability and opens up research opportunities for developing future knowledge retrieval systems driven by LLMs.
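The abstract describes the CDQG setup only at a high level. The following is a minimal sketch of what such a pipeline might look like, assuming an OpenAI-compatible chat API; the prompt wording, the "relevance" scoring dimension, and the use of GPT-4 as both generator and judge are illustrative assumptions, not the authors' exact protocol.

# Sketch of a CDQG-style pipeline: prompt an LLM to ask curiosity-driven
# questions about a statement, then score each question on one dimension.
# Assumes an OpenAI-compatible chat API (pip install openai); all prompts
# and dimension names are hypothetical stand-ins for the paper's protocol.
from openai import OpenAI

client = OpenAI()

def generate_questions(statement: str, n: int = 5, model: str = "gpt-4") -> list[str]:
    """Ask the model to react to `statement` as a curious first-time reader."""
    prompt = (
        "You are seeing the following statement for the first time:\n\n"
        f"{statement}\n\n"
        f"As a curious person, ask {n} questions you would want answered, one per line."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    # Keep non-empty lines, stripping list markers the model may prepend.
    return [line.lstrip("-*0123456789. ").strip() for line in text.splitlines() if line.strip()]

def score_question(question: str, statement: str, dimension: str,
                   judge: str = "gpt-4") -> int:
    """Score one question on one dimension (e.g. relevance, coherence), 1-5."""
    prompt = (
        f"Statement: {statement}\nQuestion: {question}\n"
        f"Rate the question's {dimension} on a 1-5 scale. Reply with the number only."
    )
    resp = client.chat.completions.create(
        model=judge,
        messages=[{"role": "user", "content": prompt}],
    )
    return int(resp.choices[0].message.content.strip())

if __name__ == "__main__":
    stmt = "In a right triangle, a^2 + b^2 = c^2."
    for q in generate_questions(stmt):
        print(score_question(q, stmt, "relevance"), q)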
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Questioning, Curiosity, Evaluation, Science
Contribution Types: Model analysis & interpretability
Languages Studied: English
Previous URL: https://openreview.net/forum?id=i5BN85npOy
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability)
Justification For Not Keeping Action Editor Or Reviewers: The meta-reviewer did not consider the rebuttal discussion, and the reviewers did not reply to our rebuttals. We addressed the points they raised, but neither the reviewers nor the meta-reviewer took any of this into account. We also reported the meta-reviewer, but no action was taken.
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: We created the CDQG dataset and the CDQG evaluation framework; the methodology section of the paper details them.
B2 Discuss The License For Artifacts: No
B2 Elaboration: We created the dataset ourselves rather than reusing one from another source.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 3
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The data consists of research papers.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4
B6 Statistics For Data: Yes
B6 Elaboration: Section 4
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 5 and Appendix
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 3
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 6
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: We gave the instructions to participants in person, as our annotator pool was small.
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Section 3
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: AI assistants were used for coding and for grammar corrections in writing.
Author Submission Checklist: yes
Submission Number: 809