OpenReview.net
Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
Yik Siu Chan, Zheng Xin Yong, Stephen Bach
Published: 22 Sept 2025, Last Modified: 03 Jan 2026
WiML @ NeurIPS 2025
CC BY 4.0
Keywords: reasoning language models, safety alignment, chain of thought
Submission Number: 322