\section{Discussion}\label{sec:discussion}
The Berkeley admissions case is a canonical example in the causal fairness literature. \citet{BickelHO75} concluded that there was no evidence to reject fairness, albeit under the unrealistic assumption of no unobserved confounding. Allowing for unobserved confounding, we arrive at a different conclusion: since there is very strong evidence that the data satisfies the IV inequalities, it is undecidable from the available data whether the admission procedure was fair or discriminated on the basis of sex.

While our analysis centers on the Berkeley case, several aspects generalize: a) \ifdefined \SINGLE the family of causal models we consider \else $\modelsedgerelax$ \fi can be thought of as a mediation model with a confounder between mediator and outcome, which is common in mediation analysis; b) the approach of treating fairness notions as causal hypotheses with respect to the class of models defined by the modeling assumptions, which need to be translated into statistical tests to be useful in practice; and c) the observation that, for inequality constraints on observational data, a straightforward Bayesian testing procedure is available.

\textbf{Generalization: Non-binary variables}: While the Berkeley dataset has a binary protected attribute and a binary decision outcome, our approach generalizes to non-binary variables. The response-function parametrization of $\modeliv$ can be used to characterize the set of induced observed distributions of $X$, $Y$ and $Z$ as a convex polyhedral set. Using computer algebra, this polyhedral set can be described by linear inequality constraints for any given alphabet sizes. However, the number of linear inequality constraints grows rapidly for non-binary $Z$ (e.g., for binary $X$ and $Y$ and $|\cZ|=2,3,4$ and $5$, we get $12,48,160$ and $420$ inequalities, respectively). Nevertheless, these sets of inequality constraints can be tested using a Bayesian testing procedure akin to the one we outline in Section~\ref{sec:bayesiantest}, thus giving a statistical test for all our fairness notions, since the statement of Theorem~\ref{thm:equivalence} holds for any alphabet size.

%This suggests that the approach of obtaining a statistical test as in Section~\ref{subsec:bounds} by a) bounding the interventional query in the interventional notion of fairness using the response-function parameterization and b) testing if $0$ belongs to the interval obtained by the bounds,  generalizes to non-binary alphabets as well. Note that this approach merely provides a test for a superset of the set of observational distributions induced by the fair causal models since $0$ belonging to the interval obtained by the bounds is only a necessary condition. Further, the set of constraints 


\textbf{Generalization: Relaxing unobserved confounding assumptions}: In Section~\ref{sec:confounder} we assumed that there is no unobserved confounding between $S$ and any other variable. In Section~\ref{app:SDconf} we show that our main theorems, Theorem~\ref{thm:iv_tight} and Theorem~\ref{thm:equivalence}, continue to hold when confounding between $\sex$ and $\dept$ is allowed. If unobserved confounding is allowed between $\sex$ and $\outcome$, however, the set of observational distributions induced by causal models without a direct effect of sex on admissions outcome is the entire simplex, and thus coincides with the set induced by causal models with such a direct effect. Therefore, it is not possible to distinguish from observational data between the presence and absence of a direct effect of sex on admissions outcome if we allow for confounding between sex and admissions outcome.


\textbf{Unmeasured Mediators}: Our conclusions and fairness notions are with respect to the measured variables. The presence of unmeasured mediators could change the interpretation of the results. For example, an unmeasured mediator that is not a `protected' variable, such as choice of undergraduate department, would result in a direct effect of sex on admissions outcome in the marginalized causal model, but might still be considered `fair'. Therefore, even if the data does not satisfy the IV inequalities, we can only claim the existence of a direct effect of sex on admissions outcome, not that the admissions process is unfair. However, if the existence of latent unprotected mediators is ruled out, then we can consider the existence of a direct effect as `unfair'.

\textbf{Undecidability}:
Our analysis can be viewed as a Bayesian model comparison between causal models (i) without a direct effect of $S$ on $A$, vs.\ (ii) with a direct effect of $S$ on $A$. If we only have access to observational data, then the causal models in (ii) result in a saturated model in the observed distribution space, i.e., the set of induced observational distributions is the entire simplex, whereas for the causal models in (i), the IV inequalities hold. Therefore, satisfying the IV inequalities implies that fairness is `undecidable', since causal models in both (i) and (ii) can induce observational distributions that satisfy the IV inequalities. Violating the IV inequalities, however, would imply that there is a direct effect of sex on admissions outcome. Only under the additional modeling assumption that there are no latent unprotected mediators between $S$ and $A$ could it then be concluded that the admissions process was `unfair'.
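
For concreteness, and assuming that the IV inequalities referred to here take the standard discrete form of Pearl's instrumental inequality with $S$ as the instrument, $D$ as the treatment and $A$ as the outcome (the precise statement used in this paper is the one accompanying Theorem~\ref{thm:iv_tight}; the notation $P(\cdot \mid \cdot)$ for the observational conditional distribution is likewise an assumption made for this illustration), the constraint that the causal models in (i) impose on the observational distribution can be sketched as
\[
  \max_{d} \sum_{a} \max_{s} P(A = a, D = d \mid S = s) \le 1 .
\]
An observational distribution that violates this bound cannot be induced by a model in (i), whereas satisfying it does not exclude the models in (ii), whose induced distributions fill the entire simplex.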

\textbf{Selection Bias}: The \texttt{UCBAdmissions} dataset contains data from only the $6$ largest departments, as opposed to the $85$ in \cite{BickelHO75}. Also, the fraction of female students is significantly smaller than the fraction of male students. Hence, it is plausible that (latent) selection mechanisms alter the causal model and violate assumptions such as the absence of a bidirected edge in the causal graph between $\sex$ and $\outcome$ \citep{ChenZM24}. Since allowing for selection bias enlarges the model class $\modelsedge$, and given that the data satisfies the IV inequalities, we conclude that allowing for selection bias will not change our conclusion.
%We leave a deeper analysis that takes selection bias into account as future work.
%The reasons that our statistical tests for every fairness notion either conclude  or `undecidable' are a) the set of observational distributions induced by the `fair' causal models and those induced by the `unfair' causal models intersect, and b) we use binary-outcome statistical tests based on observational data whose null hypothesis is the set of observational distributions that fair causal models induce. For example, in the case without confounding, for the interventional, counterfactual, and graphical notions of fairness, the set of observational distributions induced by the unfair causal models is the entire simplex and therefore has a nonempty intersection with the set of observational distributions induced by the fair causal models, i.e., the set of observational distributions that satisfy the conditional independence $A \indep S \mid D$. In the case where the no confounding assumption is relaxed, from Theorem~\ref{thm:equivalence}, we know that the set of observational distributions that the fair causal models induce is the set of observational distributions that satisfy the IV inequalities. Since there exist unfair causal models whose observational distributions also satisfy the IV inequalities, the aforementioned intersection is non-empty. Finally, our conclusion of undecidable/unfair is a result of having a statistical test whose null hypothesis is the set of distributions that satisfy the IV inequality.  

% \paragraph{Faithfulness Violations and Statistical Testing: }
% We proved equivalence of tests by equating the sets of observational distributions corresponding to the null hypotheses of the fairness notions at different rungs of the causal hierarchy. Given that the set difference of the fairness notions are models that violate faithfulness, the equivalence in the distribution space   
