Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
Keywords: reasoning language models, safety alignment, chain of thought
Submission Number: 322