\section{Introduction}\label{sec:intro}

Predicting the risk of an adverse event is one of the most common applications of machine learning in medicine. The ability to identify patients at risk of developing a disease, suffering a side effect from a treatment, or seeing a drastic change in their health allows clinicians to take appropriate steps to prevent or mitigate the negative outcome. However, many clinically important events are highly impactful but extremely rare, resulting in imbalanced training datasets and making it difficult to train an effective model \citep{megahed2021class, he2009learning}. At the same time, we often have access to surrogate or related outcomes that are more common but share etiology or risk factors with the rare event(s) of interest. In such scenarios, we would like to leverage information from the more common event(s) to help us predict the rare event(s).

In this paper, we consider two separate medical contexts in which rare event prediction is challenging, but related outcomes are available to enhance learning. Our first task aims to predict stroke in patients with hypertensive disorders of pregnancy (HDP). HDP put pregnant people at high risk for stroke as well as for several other severe complications that are more common, such as hypertensive crisis \citep{meng2023maternal}. Our second task aims to predict autism at an early age based on electronic health record (EHR) data collected from birth through age 18 months. While an autism diagnosis is uncommon, the condition shares clinical features and risk factors with several more common neurodevelopmental conditions, including ADHD and developmental delays. In both of these tasks, the events of interest are rare and we would like to share information across related outcomes to help improve model performance.

In such settings, it is natural to consider multi-label learning (MLL) as a way to share information between outcomes. Indeed, the concept of MLL has been studied extensively and successfully applied to many prediction tasks in medicine \citep{zhang2015predicting, li2015patient, zufferey2015performance, ge2020prediction}. However, typical MLL methods aim to improve performance across all event types and were not designed specifically to improve rare event prediction by leveraging related but more common events. Moreover, we know very little, both empirically and theoretically, about the conditions under which MLL confers benefit for rare event prediction tasks. For the same reasons, we do not know which MLL methods are well suited to the clinical scenarios previously described.

In this work, we expand upon the existing MLL literature to focus on rare event prediction. Specifically, we provide insight into how event rarity and underlying similarity affect MLL performance. Motivated by this insight, we propose a variant of \textit{regularized} MLL designed for rare events. Our approach bridges early MLL literature, which largely focused on task-sharing shrinkage/priors, with more recent work, which has focused on representation learning-based methods \citep{evgeniou2004regularized, zhou2012modeling, huang2019supervised, zhu2021representation}. We show that a combination of these two approaches may be suitable for cases where clinical events share latent risk factors but the events we are trying to model are extremely rare.

\textbf{Contributions.} Our work contributes to the fields of MLL and rare event prediction, with a focus on applications to medical settings. Our main contributions are as follows:

\begin{itemize}
    \item Propose a variant of regularized MLL, which we call common event tethering (\textit{CET}), that is specifically suited for rare events.
    \item Provide theoretical analysis identifying conditions on event similarity under which CET is superior to standard shrinkage estimators.
    \item Analyze the effect of event similarity and event rate on the effectiveness of CET for rare events both theoretically (Section~\ref{sec: theory}) and via simulation (Section~\ref{sec:simulations}).
    % \item Discuss implications of empirical results relevant to our understanding of feature learning in the multi-label setting.
    \item Demonstrate the benefits of our approach when predicting rare cardiovascular morbidities in pregnant people with HDP and predicting autism likelihood in early childhood.
\end{itemize}

This paper proceeds as follows. In Section~\ref{sec:setup}, we outline the setup of our problem statement and discuss related work. We then introduce our MLL methods, which we call \textit{\sinname (CET-LR)} and \textit{\mulname (CET-NN)}, in Section~\ref{sec:method}, and provide theoretical results on the asymptotic properties of our estimators in Section~\ref{sec: theory}. Section~\ref{sec:simulations} uses simulation to show how the underlying similarity between events and the event rates affect the performance of our CET methods. Finally, we demonstrate the benefits of these methods in our two real-world applications in Section~\ref{sec:experiments}.
