The problem we study is related to OOD detection and Open-World learning \citep{ruff2021unifying}. We provide a short review of these problems below, and after defining our specific setting formally in \cref{sec:problem_setting}, we will review assumptions made in more closely related work in \cref{sec:dist_assum}. Our own assumptions are described in \cref{sec:guarantees} along with their theoretical guarantees. Then in \cref{sec:algorithm} and \cref{sec:experiments} we present our method \ours{} and its experimental evaluation.
\section{Related Work}
Our results apply to a generalized form of Novel Category Detection. Let us discuss this setting and other problems related to ours. 

\textbf{Novel Category Discovery, Open World Learning and PU Learning.} In Open World Learning \citep{parmar2023open} (and specifically Open Set Domain Adaptation \citep{Busto_2017_ICCV}) and Novel Category Detection \citep{liu2018open}, the learner is given labelled data from known categories and unlabelled data from both known and novel categories. These problems are similar to learning from Positive and Unlabelled data (PU-learning) \citep{bekker2020learning}, where each example is either labelled as positive, which in our terminology means ``from a previously observed category'', or unlabelled (i.e., drawn from both observed and novel categories). The task is then to learn a model that classifies categories, both known and novel. The main difference between these settings and our work is that prior work provides guarantees only for cases where the base distribution does not shift.\footnote{We elaborate on existing results in \cref{sec:dist_assum}.} We thus refer to our generalized setting as \emph{OOD} Novel Category Detection.
% In practice, our analysis motivates an algorithm that to the best of our knowledge, have not been widely explored in the literature.

\textbf{OOD Detection.} Identifying anomalous instances that are not members of previously seen classes, or that are out-of-support for the distribution of observed data, is a well-studied problem in ML. The main difference from our setting is that in OOD detection the learner does not observe any examples from the target distribution (e.g., COVID-19 patients and a shifted baseline population in our healthcare example) at training time, and the baseline distribution is assumed to be fixed. Classic approaches to this problem include One-Class Support Vector Machines \citep{scholkopf2001estimating} and Kernel Density Estimation \citep{parzen1962estimation}; both have modern counterparts suited for flexible models such as neural networks \citep{chalapathy2018anomaly, nachman2020anomaly}. We refer the reader to \citet{ruff2021unifying} for a comprehensive survey. Since no data from the target distribution is observed in OOD detection, theoretical guarantees such as PAC-learning generalization bounds are limited. Recent work shows that it is often impossible to provide such guarantees for OOD detection unless we make restrictive assumptions relating our hypothesis class (i.e., architecture) to the target distribution \citep{fang2022is}. In our setting, we instead allow the learner to access target data. This is a reasonable assumption in any setting where the novelty detector can adapt to newly observed data, and it lets us relax assumptions on the hypothesis class and provide guarantees under shifts in the baseline distribution.

\textbf{Constrained Learning and Fairness.} Our method draws on developments in learning with data-dependent constraints, and specifically rate constraints (e.g., constraining the number of examples that are labelled positive) \citep[e.g.,][]{eban2017scalable, donini2018empirical}. Many of these methods were motivated by applications in fairness \citep{pmlr-v80-agarwal18a,pmlr-v65-woodworth17a,donini2018empirical}, yet general frameworks for learning with data-dependent constraints are useful for many other tasks, and there is growing interest in them \citep{donti2021dc3, chamon2022constrained}.
% To summarize, the setting we study which we call OOD novel category detection, incorporates a baseline distribution shift into the novel category detection problem. Therefore we provide guarantees in a setting that is somewhat simpler than the difficult OOD detection problem, but significantly extends upon the well-studied PU-learning problem.

We now turn to a formal definition of the OOD novel category detection problem and the intuition behind our proposed solution.