Keywords: Self-Awareness, Large Language Model
Abstract: Overconfidence in large language model responses has emerged as a critical barrier for deploying these systems in high-stakes tasks such as cyber threat intelligence, financial analysis, and clinical decision support.
This issue stems from reward-optimal behavior: LLMs are trained to produce an answer even when they are uncertain.
Nevertheless, most approaches in high-stakes domains continue to treat these tasks as primarily knowledge-intensive, focusing on scaling, retrieval, or fine-tuning, while leaving the problem of overconfidence unresolved.
Recent studies have begun to highlight this gap, calling for solutions that move beyond superficial calibration or knowledge expansion.
Building on these challenges, we identify self-awareness as a missing capability for LLMs in high-stakes deployment: the ability to recognize the limits of what they know and to assess how certain they are about what they do know.
To this end, we propose a framework that cultivates self-awareness in LLMs via reinforcement learning, decoupling awareness learning from task performance, and pairs the trained model with adaptive inference-time strategies such as retrieval-augmented generation and low-confidence regeneration.
We evaluate our framework in the cybersecurity domain.
Results demonstrate that our method substantially reduces confidently wrong outputs, surpassing the strongest baseline by up to 95.6\%, while achieving competitive performance.
Our implementations are available at https://anonymous.4open.science/r/SelfAwareLLM.
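To make the adaptive inference-time strategies concrete, below is a minimal Python sketch of confidence-gated routing: if the model's self-reported confidence is low, the query is retried with retrieved context and regenerated, and an abstention is returned if confidence never clears the threshold. The helpers generate_with_confidence and retrieve, the threshold, and the retry count are all hypothetical illustrations, not the authors' implementation (see the repository above for that).

from typing import Callable, Tuple

CONF_THRESHOLD = 0.7  # hypothetical cutoff separating "confident" from "low confidence"
MAX_RETRIES = 2       # hypothetical budget for low-confidence regeneration

def answer(question: str,
           generate_with_confidence: Callable[[str], Tuple[str, float]],
           retrieve: Callable[[str], str]) -> str:
    """Route a query through RAG and regeneration when confidence is low."""
    response, conf = generate_with_confidence(question)
    if conf >= CONF_THRESHOLD:
        return response

    # Strategy 1: retrieval-augmented generation on low confidence.
    context = retrieve(question)
    augmented = f"Context:\n{context}\n\nQuestion: {question}"
    best_response, best_conf = generate_with_confidence(augmented)

    # Strategy 2: low-confidence regeneration, keeping the best attempt.
    for _ in range(MAX_RETRIES):
        if best_conf >= CONF_THRESHOLD:
            break
        response, conf = generate_with_confidence(augmented)
        if conf > best_conf:
            best_response, best_conf = response, conf

    # Abstain rather than emit a confidently wrong answer.
    if best_conf < CONF_THRESHOLD:
        return "I am not confident enough to answer this reliably."
    return best_response

The design choice illustrated here is that abstention is preferred over a low-confidence answer, which is the behavior the abstract targets when it reports reduced confidently wrong outputs.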
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16164