Keywords: critical question generation, benchmark, reward modeling, dataset, human preference alignment
Abstract: Peer review relies on substantive, evidence-based questions, but existing LLM-based approaches often generate surface-level queries. We find that LLM-generated questions take over 50\% of their tokens from a paper’s first page, while human reviewers draw on the full text. Human questions are also more insightful, showing effort and grounding, whereas LLM questions mostly reflect surface style. To address this, we extract 151k candidate questions from ICLR 2024 reviews and distill them through a multi-stage filtering pipeline into Probe-15K, a set of 15.5k high-quality questions. From this, we create ProbeVote-500, where human annotators score questions along effort, evidence, and grounding. Using these labels, we train IntelliReward, a reward model built from a frozen autoregressive LLM with trainable multi-head transformer layers over the final 50 token states. This architecture outperforms API-based SFT fine-tuned baselines (Gemini 2.5 Flash, GPT-4.1) for reward modeling. Applying DAPO with IntelliReward, we train IntelliAsk, a question-generation model aligned with human preferences and substantially stronger than existing fine-tuned review models. Finally, by releasing Probe-15K, ProbeVote-500, and IntelliReward, we provide an automatic evaluation benchmark for reviewer questions that measures effort, evidence, and grounding.
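For concreteness, the reward-model design described above (a frozen autoregressive backbone with a small trainable transformer head over the final 50 token states, scoring effort, evidence, and grounding) could look roughly like the following minimal sketch. This is an illustrative assumption, not the paper's exact implementation: the `gpt2` backbone, head depth, and mean pooling are stand-ins chosen for a runnable example.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class RewardHead(nn.Module):
    """Trainable transformer head over the last K hidden states of a frozen LLM.

    Hypothetical sketch: K=50 and the three scoring heads (effort, evidence,
    grounding) follow the abstract; layer sizes and pooling are guesses.
    """

    def __init__(self, hidden_size: int, k_last: int = 50, n_layers: int = 2, n_heads: int = 8):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.k_last = k_last
        # One scalar score per preference dimension.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_size, 1) for name in ("effort", "evidence", "grounding")}
        )

    def forward(self, hidden_states: torch.Tensor) -> dict:
        # hidden_states: (batch, seq_len, hidden_size) from the frozen backbone.
        tail = hidden_states[:, -self.k_last:, :]      # keep only the final K token states
        encoded = self.encoder(tail)                   # trainable transformer layers
        pooled = encoded.mean(dim=1)                   # mean-pool the encoded tail
        return {name: head(pooled).squeeze(-1) for name, head in self.heads.items()}


# Usage: the backbone stays frozen; only the head is optimized during reward training.
backbone = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
backbone.requires_grad_(False)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
head = RewardHead(hidden_size=backbone.config.hidden_size)

inputs = tokenizer("Does Table 3 control for dataset size?", return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**inputs).hidden_states[-1]      # last-layer hidden states
scores = head(hidden)  # {"effort": ..., "evidence": ..., "grounding": ...}
```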
Primary Area: datasets and benchmarks
Submission Number: 24668