\section{Introduction}\label{sec:intro}
In the fall of $1973$, the Graduate Division of the University of California, Berkeley, made admission decisions for $12763$ applicants to its $101$ departments. The admission rate was approximately $44.2\%$ for the $8442$ male applicants and approximately $34.6\%$ for the $4321$ female applicants. This disparity prompted \citet{BickelHO75} to investigate whether the Graduate Admissions Office discriminated on the basis of sex. The authors found that, although the disparity in the aggregate data was statistically significant, the per-department admission rates did not differ significantly between the sexes, which makes this case an instance of Simpson's paradox. The resolution they offered was that the ``proportion of women applicants tends to be high in departments that are hard to get into and low in those that are easy to get into''. The authors therefore attributed the disparity to societal biases and concluded that there was ``no pattern of discrimination on the part of the admissions committee.''

In the fairness literature, the Berkeley graduate admissions case is a canonical example of Simpson's paradox: it illustrates the limitations of correlation-based fairness notions such as demographic parity and therefore motivates causal reasoning about fairness. \citet[Section 4.5.4]{Pearl09} analyzes the Berkeley example and frames the conclusion of \citet{BickelHO75} as isolating the direct effect of sex on the admissions outcome by conditioning on the mediator, namely department choice. Most works in the fairness literature that mention the Berkeley example follow this analysis, which assumes that the causal model has no latent confounders and that the causal graph is a simple mediation graph, with sex as the treatment, department choice as the mediator, and the admissions decision as the outcome. However, in both \citet{Pearl09} and \citet{PearlMackenzie18}, Pearl notes that merely conditioning on department choice might not always be appropriate. In particular, he cites a fascinating exchange between William Kruskal and Peter Bickel in \citet{FairleyMosteller77}, where Kruskal objects to the analysis in \citet{BickelHO75} by pointing out that controlling for department leads to erroneous conclusions if a confounder affects both department choice and the admissions outcome. To the best of our knowledge, subsequent works that mention the Berkeley example do not address the latent confounder issue; this includes \citet{Pearl09}, whose analysis assumes that the common causes are observed. Further, while multiple causal fairness notions have been proposed in the literature,\footnote{Some directly inspired by the Berkeley admissions case, for example the path-dependent counterfactual fairness notion in \citet[Appendix S4]{KusnerLRS17}.} the statistical testing of these notions has received little attention.
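To make Kruskal's objection concrete, the following is a minimal sketch of a toy structural causal model in which sex has \emph{no} direct effect on admission, yet the per-department admission rates still differ by sex. All variables and numbers are hypothetical assumptions chosen for illustration (they are not the Berkeley data): $A$ is sex, $U$ is a latent applicant trait, $D$ is department choice, and $Y$ is admission, with edges $A \to D$, $U \to D$, $U \to Y$, and $D \to Y$. Because $D$ is a collider on the path $A \to D \leftarrow U$, conditioning on department opens a spurious path from $A$ to $Y$ through $U$. The helper \texttt{admit\_rate\_given} is a hypothetical name introduced only for this sketch.

```python
from itertools import product

# Hypothetical toy SCM (all numbers are illustrative assumptions, not Berkeley data):
#   A: sex (0/1), U: latent applicant trait (0/1), D: department (1 = "hard"), Y: admitted.
#   Edges: A -> D, U -> D, U -> Y, D -> Y.  There is NO edge A -> Y.
P_U1 = 0.5                                   # P(U = 1); U is independent of A
P_D1 = {(0, 0): 0.2, (0, 1): 0.6,            # P(D = 1 | A = a, U = u), keyed by (a, u)
        (1, 0): 0.6, (1, 1): 0.9}
P_Y1 = {(0, 0): 0.4, (0, 1): 0.8,            # P(Y = 1 | D = d, U = u), keyed by (d, u); A absent
        (1, 0): 0.1, (1, 1): 0.5}


def bernoulli(p_one, value):
    """Probability that a Bernoulli(p_one) variable takes the given value."""
    return p_one if value == 1 else 1.0 - p_one


def admit_rate_given(a, d):
    """Exact P(Y = 1 | A = a, D = d), obtained by summing out the latent U."""
    num = den = 0.0
    for u in (0, 1):
        w = bernoulli(P_U1, u) * bernoulli(P_D1[(a, u)], d)  # proportional to P(U = u, D = d | A = a)
        den += w
        num += w * P_Y1[(d, u)]
    return num / den


for a, d in product((0, 1), repeat=2):
    print(f"A={a}, D={d}: P(Y=1 | A, D) = {admit_rate_given(a, d):.4f}")
```

With these hypothetical numbers, the within-department admission rates for $A=0$ versus $A=1$ are about $0.533$ versus $0.48$ in department $0$ and $0.40$ versus $0.34$ in department $1$, so conditioning on department would suggest a direct effect of sex even though none exists in the model.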

In this work, we undertake a causal reasoning exercise centered on the Berkeley admissions case. We take the view that a causal analysis is predicated on causal modeling assumptions that define a family of causal models. A \textit{fairness notion} is an observational, interventional, counterfactual, or graphical query on a causal model; as a result, it defines a subset of this family of causal models, i.e., a fairness notion defines a \textit{causal hypothesis}. Since we usually have only observational data at hand, the question of fairness boils down to the statistical testing of a causal hypothesis. Indeed, the Berkeley admissions data can be thought of as sampled from the joint distribution of sex, department choice, and admissions outcome.\footnote{Albeit possibly post-selection, which we do not address in this work.}

For the Berkeley admissions case, we consider multiple fairness notions based on graphical, counterfactual, and interventional queries to the family of causal models defined by our causal modeling assumptions, which allow latent confounding between department choice and admissions outcome. For these notions, we develop new statistical tests. One of our key insights is that the graphical notion of fairness can be tested using the instrumental-variable (IV) inequalities \citep{Pearl95}, which makes our proposed statistical test a new test for the IV inequalities. Conversely, \emph{any} statistical test for the IV inequalities can be used to test for fairness in settings analogous to the Berkeley case. In the process, we also prove a result of independent interest, namely the sharpness of the IV inequalities when the instrument and the effect are binary and the treatment takes any finite number of values. For the Berkeley example, while our proposed fairness notions correspond to different rungs of the causal hierarchy and are in general not equivalent, we show, rather surprisingly, that the corresponding tests are equivalent within the IV setting. Although our results are inspired by the Berkeley case, they also apply to other analogous settings, e.g.\ to investigate sex discrimination in awarding distinctions to PhD students \citep{Bol23}.
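To illustrate the connection to the IV inequalities, recall Pearl's instrumental inequality \citep{Pearl95}: if $Z$ is a valid instrument for the effect of a discrete $X$ on a discrete $Y$, then $\max_{x}\sum_{y}\max_{z} P(X=x, Y=y \mid Z=z) \le 1$. The sketch below evaluates the plug-in version of the left-hand side from a contingency table, under the assumed mapping $Z = $ sex (instrument), $X = $ department (treatment), and $Y = $ admission (effect). It is only a point estimate that ignores sampling variability, so it is not a statistical test and is not the test proposed in this work; the function names \texttt{iv\_statistic} and \texttt{iv\_inequality\_holds} and all counts are hypothetical.

```python
# Plug-in evaluation of Pearl's (1995) instrumental inequality for binary Z and Y
# and finitely many values of X. Assumed (hypothetical) mapping:
#   Z = sex (instrument), X = department (treatment), Y = admitted (effect).
# counts[z][x] = [number rejected, number admitted] for group z in department x.


def iv_statistic(counts):
    """Return max_x sum_y max_z P_hat(X = x, Y = y | Z = z)."""
    n_z = [sum(sum(cell) for cell in counts[z]) for z in (0, 1)]  # applicants per group
    num_departments = len(counts[0])
    stat = 0.0
    for x in range(num_departments):
        # For each outcome y, keep the larger group-conditional joint frequency.
        s = sum(max(counts[z][x][y] / n_z[z] for z in (0, 1)) for y in (0, 1))
        stat = max(stat, s)
    return stat


def iv_inequality_holds(counts):
    """True if the plug-in statistic is at most 1 (sampling error is ignored)."""
    return iv_statistic(counts) <= 1.0


# Hypothetical counts (illustrative only, not the Berkeley data).
consistent = [
    [[20, 80], [60, 40], [50, 50]],   # z = 0
    [[5, 15], [120, 80], [40, 40]],   # z = 1
]
violating = [
    [[10, 70], [10, 10]],             # z = 0: mostly admitted in department 0
    [[70, 10], [10, 10]],             # z = 1: mostly rejected in department 0
]

print(round(iv_statistic(consistent), 4), iv_inequality_holds(consistent))
print(round(iv_statistic(violating), 4), iv_inequality_holds(violating))
```

In the second table, both groups apply overwhelmingly to department $0$ but receive opposite decisions there; the statistic is then $0.7 + 0.7 = 1.4 > 1$, which no model in which sex acts on admission only through department can produce. The first table gives $2/3$ and is compatible with the inequality.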

\subsection{Related Work}
The question of fairness in decision-making and predictive systems has received increasing attention over the past few decades. See \citet{HutchinsonMitchell19,BarocasHN23} for excellent historical and technical overviews, respectively. Attempts at formalizing fairness led to correlation-based notions such as fairness through unawareness \citep{DworkHPRZ12}, demographic parity, and equalized odds \citep{HardtPS16}. However, purely observational notions of fairness can be mutually incompatible \citep{Chouldechova17, KleinbergMR17} and are prone to erroneous conclusions. On the other hand, observational notions of fairness readily translate into statistical tests.
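As an example of how directly an observational notion becomes a test, aggregate demographic parity for a binary decision amounts to equal admission rates across the two groups, which can be checked with the standard pooled two-proportion $z$-test (equivalent to Pearson's $\chi^2$ test on the $2\times 2$ table without continuity correction). The sketch below applies it to the aggregate Berkeley figures quoted in Section~\ref{sec:intro}; the admitted counts are an assumption, reconstructed approximately from the rounded admission rates, and the function name \texttt{two\_proportion\_z\_test} is hypothetical. Rejecting the null says nothing about the causal source of the disparity, which is exactly the limitation discussed above.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test of H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))           # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Admitted counts reconstructed (approximately) from the rounded rates 44.2% and 34.6%.
n_men, n_women = 8442, 4321
admitted_men = round(0.442 * n_men)        # about 3731
admitted_women = round(0.346 * n_women)    # about 1495

z, p_value = two_proportion_z_test(admitted_men, n_men, admitted_women, n_women)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.1e}")
```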

Causal analysis tools such as counterfactuals and interventions provide a framework well suited to reasoning about fairness. As a result, multiple general fairness notions based on counterfactuals have been proposed. \citet{KusnerLRS17} define a counterfactual fairness notion that requires the distribution of the decision in a given context to be invariant under hypothetical changes to the protected attribute. \citet{NabiShpitser18} and \citet{ZhangWW17} consider path-specific effects. \citet{Chiappa19} proposes a path-specific counterfactual fairness notion, and a related notion appears in the appendix of \citet{KusnerLRS17}. A separate line of work seeks to explain observed disparities through causal discrimination mechanisms \citep{ZhangBareinboim18, plecko2022causal}.

The Berkeley graduate admissions case appears in multiple papers as motivation for causal fairness notions; examples include \citet{KilbertusRPHJS17, plecko2022causal,KusnerLRS17,Chiappa19,BerkKT23}, among many others. The Berkeley example also motivates path-specific notions under the assumption that the direct effect of sex on the admissions outcome is the only `unfair' path; see \citet{BarocasHN23} for a critique of this common assumption. \citet{Pearl09} considers the Berkeley example at length and illustrates the objection to controlling for the mediator by positing an observed confounder.

Although most causal fairness works mention the Berkeley example, to the best of our knowledge no previous work gives a definitive answer to the question of fairness for the Berkeley dataset under unobserved confounding. \citet{KilbertusBKWS20} discuss the impact of unmeasured confounding under restrictive parametric assumptions. \citet{ZhangBareinboim18, plecko2022causal} consider fairness models that allow for specific forms of unobserved confounding, and \citet{SchroderFF24} build on this by providing a sensitivity analysis for the fairness of prediction models. However, the unobserved confounding allowed in these works affects the sensitive attribute, which differs from the confounding between department choice and admissions outcome that we allow for in the Berkeley dataset.

% Other related areas under fairness that consider unobserved confounding are risk assessments \citep{RambachanCK22} and auditing \citep{ByunSOLW24}. 


% in particular, he frames the question of discrimination in causal terms--Given the presence of an indirect effect of sex on admissions outcome (through the choice of department), is there a direct effect of sex on admissions outcome? 

