\vspace{-0.5em}
\section{Related Work}
\vspace{-0.5em}
\label{sec:related-work}
The work of \citet{frohlich2024scoring} is most closely related to ours. They also study the generalization of proper scoring rules to imprecise forecasts, with a specific emphasis on calibration~\citep{dawid1982well}. Whereas they focus on imprecision arising from the data model, we address the more general problem of eliciting imprecise forecasts. They show that, unlike in the precise setting where proper scoring and calibration objectives align, these goals can diverge for imprecise forecasts, a result that parallels our own. However, their reliance on min-max aggregation within the scoring framework restricts their analysis to pessimistic decision-making and yields a scoring rule that satisfies only properness.


% \citet{frohlich2024scoring} have recently generalised proper scoring rules to imprecise probabilities and applied them to calibration \citep{dawid1982well}\textcolor{red}{[REFS: more recent papers]}. Their work develops a framework in which the imprecision is embedded within the data model and decision problem, unlike ours, where we employ a more subjective discussion of imprecise forecast elicitation. Their results reveal that, unlike in the precise case where proper scoring and calibration align, these two goals can diverge for imprecise forecasts—offering new insights into the design of loss functions for distributionally robust optimisation (DRO). However, the exclusive use of the min-max rule within their scoring rule framework confines their analysis to the pessimistic aggregation of epistemic uncertainty.

Impossibility results show that no continuous scoring rule over credal sets can simultaneously satisfy strict incentive compatibility, calibration, and non-domination~\citep{seidenfeld2012forecasting, mayo2015accuracy, schoenfield2017accuracy}. \citet{seidenfeld2012forecasting} proved that such rules must either weaken incentive compatibility or permit domination by precise forecasts. \citet{mayo2015accuracy} highlighted that these trade-offs can inadvertently reward false precision, while \citet{schoenfield2017accuracy} showed that any continuous rule is either constant or fails to calibrate in natural decision contexts. Although our approach partly mitigates these issues, the impossibility results still constrain deterministic methods. Some view the absence of imprecise scoring rules analogous to precise ones as a fundamental feature of imprecision~\citep{konek2015}. Building on this view, \citet{konek2019} proposes a family of imprecise probability (IP) scoring rules based on the Hurwicz criterion, which \citet{konek2023} extends to formalize precision--robustness trade-offs axiomatically. Since the Hurwicz criterion yields a Pareto-efficient aggregation, our results in Section~\ref{sec:imprecisescoringrules} apply directly to their framework, offering a social choice lens on these trade-offs.

% It has been shown through several impossibility results that no continuous scoring rule defined over credal sets can simultaneously guarantee strict incentive compatibility, calibration, and non–domination \citep{seidenfeld2012forecasting, mayo2015accuracy, schoenfield2017accuracy}. \citet{seidenfeld2012} demonstrated that any such scoring rule must either relax incentive compatibility or permit some imprecise forecasts to be dominated by more precise ones. \citet{mayo2015accuracy} showed that the classical trade-offs force these rules to inadvertently reward false precision, while \citet{schoenfield2017accuracy} proved that any continuous rule either degenerates or fails to calibrate properly in natural decision scenarios. Notably, while randomised tailored scoring rules can sidestep some of these issues by operating over aggregations, the impossibility results remain a fundamental barrier for deterministic formulations.


%Alternatively, some argue that the impossibility of imprecise scoring rules analogous to precise ones reflects an inherent feature of imprecision \citep{konek2015}.
%Building on this, \citet{konek2019} introduces a family of imprecise probability (IP) scoring rules parameterized by the Hurwicz criterion, and \citet{konek2023} extends these ideas to formalize the trade-offs between precision and robustness through axiomatic foundations. Since the Hurwicz criterion represents a Pareto-efficient aggregation, our results in Section~\ref{sec:imprecisescoringrules} apply directly to their framework, offering insights into precision-robustness trade-offs from a social choice perspective.



Finally, our work sits at the intersection of proper scoring rules, forecast elicitation, and machine learning, and offers a new perspective on decision-making under uncertainty. Credal sets have become a mainstream approach for representing modelers' imprecision, with applications in prediction~\citep{singh2024domain, caprio_credal_2024}, uncertainty quantification~\citep{sale_is_2023,wang2024credal}, optimal transport~\citep{caprio2024optimal}, statistical hypothesis testing~\citep{chau2025credal}, and statistical distances~\citep{chau2025integral}, among others. Our results on strictly proper scoring rules for credal sets are therefore directly relevant to learning and decision-making with credal sets, and they point to fundamental problems and directions for future research.

% Another line of work has argued that the impossibility of imprecise scoring rules that are directly analogous to those for precise distributions is not a shortcoming but rather an inherent feature of imprecision. \citet{konek2015} argues from an epistemological perspective that imprecise credences should embody a form of conservativeness, reflecting a cautious commitment to evidence by maintaining a range of probabilities. \citet{konek2019} introduces a family of IP scoring rules with a choice of hyper-parameter $\alpha$ (Hurwicz criterion),  More recently, \citet{konek2023} extends these ideas to capture the trade-offs between precision and robustness by trying to build axiomatic foundations of imprecise scoring rules. Since Hurwicz criterion is a Pareto efficient aggregation, our results in section~\ref{sec:imprecisescoringrules} are directly applicable to their setting, allowing for the interpretation of trade-offs in precision and robustness from social choice and collective decision-making perspective. 