%\documentclass{uai2023} % for initial submission
\documentclass[accepted]{uai2023} % after acceptance, for a revised
% version; also before submission to
% see how the non-anonymous paper
% would look like

%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
% Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
 % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{bm}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{float}
\newcommand{\ind}{\perp\!\!\!\perp}
\newcommand{\nind}{\not\!\perp\!\!\!\perp}
\newtheorem{proposition}{Proposition}
\newtheorem{assumption}{Assumption} 
\newtheorem{remark}{Remark}
\newtheorem{lemma}{Lemma}
\newcommand\numberthis{\addtocounter{equation}{1}\tag{\theequation}}
% for cross referencing the main text
% PLEASE ONLY USE xr IN THE SUPPLEMENTARY MATERIAL. 
% In the main paper, hard code any cross-reference to the supplementary material. 
%\usepackage{xr} 
%\externaldocument{hochsprung_392}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Increasing Effect Sizes of Pairwise Conditional Independence Tests between Random Vectors\\(Supplementary Material)}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<tom.hochsprung@dlr.de>?Subject=Your UAI 2023 paper}{Tom~Hochsprung}{}}
\author[2,1]{Jonas~Wahl$^*$}
\author[1]{Andreas~Gerhardus$^*$}
\author[2,1]{Urmi~Ninad$^*$}
\author[1,2]{Jakob~Runge}
% Add affiliations after the authors
\affil[1]{%
    Institute of Data Science\\
    German Aerospace Center\\
   Jena, Germany
}
\affil[2]{%
    Technische Universität Berlin\\
    Berlin, Germany
}
  
  \begin{document}
  
\onecolumn %% Turn this off if single column is desired for the supplement
\maketitle
\appendix
\def\thefootnote{$*$}\footnotetext{Equal contribution, order chosen uniformly at random.}
\def\thefootnote{\arabic{footnote}}
\section{Background material}
\subsection{Multivariate normal distribution}
\label{preMND}
 In the following, we recall some basic results about the multivariate normal distribution that are relevant in the context of this work \citep{Anderson2003}.\par
Let the joint distribution of $(\bm{X},\bm{Y},\bm{Z})$ be multivariate normal, i.e., $P_{\bm{X},\bm{Y},\bm{Z}} = \mathcal{N}(\bm{\mu},\bm{\Sigma})$ for some mean vector $\bm{\mu}$ and some positive definite covariance matrix $\bm{\Sigma}.$
It is well known that the multivariate normal distribution is closed under conditioning and marginalization; that is, if $P_{\bm{X},\bm{Y},\bm{Z}}$ is multivariate normal, then $P_{\bm{X},\bm{Y}|\bm{Z}=\bm{z}},$ $P_{\bm{X}|\bm{Z}=\bm{z}},$ $P_{\bm{Y}|\bm{Z}=\bm{z}},$
$P_{\bm{X}},P_{\bm{Y}}$ and $P_{\bm{Z}}$ are also (multivariate) normal. Furthermore, it is known that the partial correlation coefficient encapsulates the entire dependence structure between components of a multivariate normal random vector; in particular, $X_i\ind Y_j\mid\bm{Z}$ if and only if $\rho_{X_iY_j|\bm{Z}}=0.$\par 
Conditional independence is also encoded in the covariance matrices $\Sigma^{\bm{z}}$ corresponding to the distributions $P_{\bm{X},\bm{Y}|\bm{Z}=\bm{z}}.$ The statement $X_i\ind Y_j\mid\bm{Z}$ is equivalent to $\Sigma^{\bm{z}}_{ij}=0$ for all $\bm{z},$ where $\Sigma^{\bm{z}}_{ij}$ denotes the entry of $\Sigma^{\bm{z}}$ belonging to the pair $(X_i,Y_j).$ If $X_i\ind Y_j\mid\bm{Z}$ for all $i\in\{1,\ldots,d_X\}$ and $j\in\{1,\ldots,d_Y\},$ then each $\Sigma^{\bm{z}}$ is a block diagonal matrix. Consequently, $p_{\bm{X},\bm{Y}|\bm{Z}=\bm{z}}$ factorizes for each $\bm{z}$ according to equation (2), which in turn implies $\bm{X}\ind \bm{Y}\mid \bm{Z}.$\par
Thus, if the distribution of $(\bm{X},\bm{Y},\bm{Z})$ is multivariate normal, then statement (1) is indeed equivalent to statement (3). As the multivariate normal distribution is closed under conditioning and marginalization, the above reasoning also applies to arbitrary subvectors $\bm{X}_A$ and $\bm{Y}_B.$ Thus, if $P_{\bm{X},\bm{Y},\bm{Z}}$ is multivariate normal, then Assumption 1 is satisfied.\par
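As a brief illustration of the argument above, the conditional covariance matrix can be written down explicitly. The following sketch uses the standard formula for conditioning in the multivariate normal distribution; the block notation $\bm{\Sigma}_{\bm{W}\bm{W}},$ $\bm{\Sigma}_{\bm{W}\bm{Z}},$ $\bm{\Sigma}_{\bm{Z}\bm{W}}$ and $\bm{\Sigma}_{\bm{Z}\bm{Z}}$ for the blocks of $\bm{\Sigma}$ with respect to $\bm{W}:=(\bm{X},\bm{Y})$ and $\bm{Z},$ as well as the entry notation $\Sigma^{\bm{z}}_{X_iY_j},$ are introduced here only for this purpose:
\begin{align*}
\Sigma^{\bm{z}} &= \bm{\Sigma}_{\bm{W}\bm{W}}-\bm{\Sigma}_{\bm{W}\bm{Z}}\bm{\Sigma}_{\bm{Z}\bm{Z}}^{-1}\bm{\Sigma}_{\bm{Z}\bm{W}},\\
\rho_{X_iY_j|\bm{Z}} &= \frac{\Sigma^{\bm{z}}_{X_iY_j}}{\sqrt{\Sigma^{\bm{z}}_{X_iX_i}\,\Sigma^{\bm{z}}_{Y_jY_j}}}.
\end{align*}
In particular, $\Sigma^{\bm{z}}$ does not depend on the value $\bm{z},$ and $\rho_{X_iY_j|\bm{Z}}=0$ holds exactly when the entry of $\Sigma^{\bm{z}}$ belonging to the pair $(X_i,Y_j)$ vanishes.\par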
\subsection{Other sufficient conditions for Assumption 1}
\label{suffConds}
We now discuss another sufficient condition for Assumption 1 to hold. This sufficient condition arises in the area of graphical modelling.\par 
Following \citet{Pearl2009} and \citet{spirtes2000causation}, let $G=(V,E)$ be a directed acyclic graph, where $V$ is the set of vertices, and $E$ is the set of directed edges. In slight abuse of notation, we equate the vertex names with the variable names. Now, assume that the joint distribution $P_{\bm{X},\bm{Y},\bm{Z}}$ is faithful and globally Markov with respect to $G.$
Then, Assumption 1 is satisfied. \par To see this, note that $X_i\ind Y_j\mid \bm{Z}$ for all $i\in A$ and all $j\in B,$ in conjunction with faithfulness, implies that $X_i$ and $Y_j$ are d-separated by $\bm{Z}$ for all $i\in A$ and all $j\in B.$ Since two sets of vertices are d-separated by $\bm{Z}$ precisely when $\bm{Z}$ blocks every path between a vertex of the first set and a vertex of the second set, this pairwise d-separation implies that $\bm{X}_A$ and $\bm{Y}_B$ are d-separated by $\bm{Z},$ which, in conjunction with the global Markov property, implies that $\bm{X}_A\ind\bm{Y}_B\mid \bm{Z}.$
\section{More numerical experiments}
\subsection{Different significance levels in first step of sample splitting algorithm}
\label{differentAlphaPre}
We repeat the simulations from Section 5, this time with $\alpha_{pre}=0.8$ in the first step of the algorithms that use sample splitting. Figure \ref{figextra} displays the results. \par 

A larger $\alpha_{pre}$ leads to smaller sets $Q_i$ and $Q_j'$ being learned in the first step. In the case of $\bm{\Sigma}^{(1)},$ where the dependence between $\bm{X}$ and $\bm{Y}$ is only between $X_1$ and $Y_1,$ we would expect a slightly worse performance than for $\alpha_{pre}=0.5$ because, theoretically, the $Q_i$'s and $Q_j'$'s can be very large, and by learning smaller subsets, we omit some useful components that could additionally be conditioned out. The simulations are consistent with this reasoning: the results for $\alpha_{pre}=0.8$ (Figure \ref{figextra} in the SM) are slightly worse than the results for $\alpha_{pre}=0.5$ (Figure 2 in the main paper).\par

If more components of $\bm{X}$ and $\bm{Y}$ are conditionally dependent (which, for example, is the case for $\bm{\Sigma}^{(2)}$), then the effect of increasing $\alpha_{pre}$ is harder to foresee. On the one hand, if $\alpha_{pre}$ is large, some conditional independencies might additionally be missed; on the other hand, if $\alpha_{pre}$ is too small, components of $\bm{X}$ and $\bm{Y}$ might wrongly be deemed conditionally independent (a Type II error). We again see that the performance for $\bm{\Sigma}^{(2)}$ is slightly worse for $\alpha_{pre}=0.8$ (Figure \ref{figextra} in the SM) than for $\alpha_{pre}=0.5$ (Figure 2 in the main paper).

\addtocounter{figure}{+2}
\begin{figure}
	\centering
	\includegraphics[scale = 0.32]{latex/hochsprung_392-img5.png}
	\caption{Simulation results for the setting explained in Section \ref{differentAlphaPre}. The left $3$ and the right $3$ columns display the results for $\bm{\Sigma}^{(1)}$ and $\bm{\Sigma}^{(2)},$ respectively. 
 The first two rows are for $\tau = 0,$ the middle two rows for $\tau = 0.5,$ and the last two rows for $\tau = 0.9$. The abbreviation \textit{simple} stands for the approach from Section 3.1, \textit{oracle} for the approach from Section 3.2, \textit{no\_oracle\_0.2} and  \textit{no\_oracle\_0.5} for the sample split approaches from Section 3.3 with $20\%$ and $50\%$ of the sample, respectively, used for the first part of the algorithm, and \textit{pdcor} for the partial distance correlation.}
	\label{figextra}
\end{figure}

\subsection{Simulation results for the generalized covariance measure}
\label{simuGCM}
In this section, we emphasize that the results of this paper hold not only for partial correlations and the multivariate normal distribution, but also more generally for other dependence measures and distributions. In particular, we employ the \textit{generalized covariance measure} from \citet{Shah2020TheHO}, which is implemented in the R package \textit{GeneralisedCovarianceMeasure} \citep{gcmPackage}, and show that we obtain empirical results similar to those in Section 5. We again restrict ourselves to the case $d_Z = 1.$ For the case $d_X = d_Y = 2,$ we use a model similar to model (d) in Section $5.2$ of \citet{Shah2020TheHO}, i.e.,
\begin{align*}
X_1 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + 0.3 \cdot \eta_{1},\\
X_2 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + \tau \cdot X_1 + 0.3 \cdot \eta_{2},\\
Y_1 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + 0.3 \cdot \eta_{3},\\
Y_2 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + \tau \cdot Y_1 + \rho\cdot X_2 + 0.3 \cdot \eta_{4},
\end{align*}
where $Z, \eta_{1},\ldots, \eta_{4}$ are independent standard normal random variables and $\rho$ and $\tau$ are real-valued parameters.\par 
For the case $d_X = d_Y = 3,$ we look at the model
\begin{align*}
X_1 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + 0.3 \cdot \eta_{1},\\
X_2 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + \tau \cdot X_1 + 0.3 \cdot \eta_{2},\\
X_3 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + \tau \cdot X_2 + 0.3 \cdot \eta_{3},\\
Y_1 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + 0.3 \cdot \eta_{4},\\
Y_2 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + \tau \cdot Y_1 + 0.3 \cdot \eta_{5},\\
Y_3 &:= \exp(-Z^2 / 2) \cdot \sin(Z) + \tau \cdot Y_2 + \rho\cdot X_3 + 0.3 \cdot \eta_{6},
\end{align*}
where $Z, \eta_{1},\ldots, \eta_{6}$ are independent standard normal random variables  and $\rho$ and $\tau$ are again real-valued parameters.\par 
The parameters $\tau$ and $\rho$ have a similar meaning as in Section 5. Roughly speaking, $\tau$ characterizes the within-group dependence, and $\rho$ characterizes the between-group dependence. We look at the cases $\tau \in \{0, 1, 2\}$ and $\rho \in\{0,0.005, \ldots, 0.15\}.$ Moreover, we consider the sample sizes $n\in\{216, 432, 864\}.$ \par 
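For the model with $d_X=d_Y=2,$ the roles of $\tau$ and $\rho$ can be made explicit by a short calculation. The following is only a sketch derived from the structural equations above; the shorthand $\varepsilon_{V}:=V-\mathbb{E}[V\mid Z]$ for the part of a variable $V$ that is not explained by $Z$ is introduced here for illustration. Since the noise terms are independent of $Z,$
\begin{align*}
\varepsilon_{X_1} &= 0.3\,\eta_1, & \varepsilon_{X_2} &= 0.3\,(\tau\eta_1+\eta_2),\\
\varepsilon_{Y_1} &= 0.3\,\eta_3, & \varepsilon_{Y_2} &= 0.3\,(\tau\eta_3+\rho\tau\eta_1+\rho\eta_2+\eta_4),
\end{align*}
and therefore
\begin{align*}
\operatorname{Cov}(X_1,Y_1\mid Z)&=0, & \operatorname{Cov}(X_2,Y_1\mid Z)&=0,\\
\operatorname{Cov}(X_1,Y_2\mid Z)&=0.09\,\rho\tau, & \operatorname{Cov}(X_2,Y_2\mid Z)&=0.09\,\rho\,(1+\tau^2).
\end{align*}
Hence, for $\rho=0$ all four conditional covariances vanish, whereas for $\rho\neq 0$ the pair $(X_2,Y_2)$ is always conditionally dependent, and the pair $(X_1,Y_2)$ is conditionally dependent as soon as $\tau\neq 0.$\par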
We compare the same algorithms with the same settings as in Section 5. Again, we employ the Bonferroni method to aggregate univariate test statistics. We again do $100$ replications for each of the above-mentioned cases and plot the mean rejection rate with one standard error (see Figure \ref{figgcm} in the SM).
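For concreteness, the aggregation and the error bars can be sketched as follows. This is only an illustration under assumptions that are not spelled out above: we write $p_{ij}$ for the p-value of the univariate test for the pair $(X_i,Y_j),$ $m$ for the number of univariate tests that are aggregated, $R=100$ for the number of replications, and $\hat{r}$ for the empirical rejection rate, and we assume that the standard error is the usual binomial one. The Bonferroni method then rejects the joint null hypothesis at level $\alpha$ if and only if
\begin{align*}
\min_{i,j}\, p_{ij}\leq \frac{\alpha}{m},
\end{align*}
and the standard error of the rejection rate is
\begin{align*}
\sqrt{\frac{\hat{r}(1-\hat{r})}{R}}\leq \frac{1}{2\sqrt{R}}=0.05.
\end{align*}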
\begin{figure}
    \centering
	\includegraphics[scale = 0.32]{latex/hochsprung_392-img4.png}
	\caption{Simulation results for the setting explained in Section \ref{simuGCM}. The first two rows are for $\tau = 0,$ the middle two rows for $\tau = 1,$ and the last two rows for $\tau = 2$. The abbreviation \textit{simple} stands for the approach from Section 3.1, \textit{oracle} for the approach from Section 3.2, \textit{no\_oracle\_0.2} and  \textit{no\_oracle\_0.5} for the sample split approaches from Section 3.3 with $20\%$ and $50\%$ of the sample, respectively, used for the first part of the algorithm, and \textit{pdcor} for the partial distance correlation.}
 \label{figgcm}
\end{figure}
We observe results similar to those in Section 5. If the within-group dependence is relatively high, i.e., $\tau = 2,$ the algorithm that assumes a conditional independence oracle (Section 3.2) and the sample splitting algorithms (Section 3.3) perform better; if the within-group dependence is relatively low, i.e., $\tau = 0,$ there is no improvement by conditioning out already known independencies. For the case $\tau = 0,$ the sample-splitting algorithm (Section 3.3) again performs slightly worse. The partial distance correlation test does not seem to properly control the false positive rate, so we cannot properly compare its power to the other approaches.
\section{Proofs}
\subsection{Proof of Lemma 1}
\label{proofLemma1}
Let $\mathcal{D}\in\mathbb{R}^{n\times(d_X+d_Y+d_Z)}$ be a matrix that has the $n$ rows $(\bm{X}^{(1)},\bm{Y}^{(1)},\bm{Z}^{(1)}),\ldots,(\bm{X}^{(n)},\bm{Y}^{(n)},\bm{Z}^{(n)}).$ Let $\mathcal{X}^n$ be the sample space corresponding to the observations $(\bm{X}^{(1)},\bm{Y}^{(1)},\bm{Z}^{(1)}),\ldots,(\bm{X}^{(n)},\bm{Y}^{(n)},\bm{Z}^{(n)})$ (where we assume for mathematical rigor that each element of $\mathcal{X}^n$ is a real-valued matrix with $n$ rows and $d_X+d_Y+d_Z$ columns).
Let $\psi':\mathcal{X}^n\rightarrow [0,1]$ be a (possibly randomized) test for $\mathcal{H}'_0$ at fixed sample size $n$ and let $\psi:\mathcal{X}^n\rightarrow [0,1]$ be the induced test for $\mathcal{H}_0.$ Here, $1$ corresponds to certain rejection and $0$ to certain non-rejection.
Denote the set of all possible distributions $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}$ for $(\bm{X},\bm{Y},\bm{Z})$ such that $\mathcal{H}_0$ is true by $\mathcal{P}_0.$ Similarly, we write $\mathcal{P}'_0$ for the set of all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}$ such that $\mathcal{H}'_0$ is true. Moreover, we write $\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}$ to denote the product measure induced by $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}$ for the sample $\mathcal{D},$ and we write  $\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}$ to denote the expectation which is determined by $\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}.$
The size of the test $\psi$ for $\mathcal{H}_0$ for fixed sample size $n$ is $\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})].$ \par 
Now, assume that $\psi'$ has \textbf{valid level} for $\mathcal{H}_0'$ at sample size $n$, i.e.,
\begin{align*}
\sup_{\mathcal{P}'_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \alpha.
\end{align*}
By definition, $\mathcal{H}_0$ is rejected if and only if $\mathcal{H}'_0$ is rejected; therefore,
\begin{align*}
\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]= \sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})].
\end{align*}
Moreover, we note that $\mathcal{P}_0\subseteq\mathcal{P}'_0$ (by the discussion in Section 2.2) and thus, 
\begin{align*}
\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \sup_{\mathcal{P}'_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \alpha,
\end{align*}
where the last inequality follows from the assumption on the size of $\psi'.$ Hence, $\psi$ has valid level for $\mathcal{H}_0$ at sample size $n$.\par
Now assume that $\psi'$ has \textbf{pointwise asymptotic level} for $\mathcal{H}_0'$, i.e., 
\begin{align*}
\sup_{\mathcal{P}'_0}\limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \alpha.
\end{align*}
Now, by arguing similarly as before, for all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}\in \mathcal{P}_0$ we have
\begin{align*}
\limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]= \limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]
\end{align*}
and hence
\begin{align*}
\sup_{\mathcal{P}_0}\limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]= \sup_{\mathcal{P}_0}\limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})].
\end{align*}
Again, $\mathcal{P}_0\subseteq\mathcal{P}'_0$ and thus, 
\begin{align*}
\sup_{\mathcal{P}_0}\limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \sup_{\mathcal{P}'_0}\limsup_{n\rightarrow \infty}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \alpha.
\end{align*}
Hence, $\psi$ has pointwise asymptotic level for $\mathcal{H}_0.$



\par Lastly, assume that $\psi'$ has \textbf{uniform asymptotic level} for $\mathcal{H}_0'$, i.e., 
\begin{align*}
\limsup_{n\rightarrow \infty}\sup_{\mathcal{P}'_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \alpha.
\end{align*}
By arguing as before, we have for all $n\in \mathbb{N}$  that
\begin{align*}
\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]=\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})].
\end{align*}
Again, $\mathcal{P}_0\subseteq\mathcal{P}'_0$ and thus, 
\begin{align*}
\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \sup_{\mathcal{P}'_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]
\end{align*}
for all $n\in \mathbb{N}.$
As the previous equality and inequality hold for all $n\in \mathbb{N},$ taking the limit superior and using the assumption on $\psi'$ yields
\begin{align*}
\limsup_{n\rightarrow \infty}\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]&=\limsup_{n\rightarrow \infty}\sup_{\mathcal{P}_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\\
&\leq \limsup_{n\rightarrow \infty} \sup_{\mathcal{P}'_0}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi'(\mathcal{D})]\leq \alpha.
\end{align*}

Hence, $\psi$ has uniform asymptotic level for $\mathcal{H}_0.$
\subsection{Proof of Lemma 2}
The proof is very similar to the proof of Lemma 1, and we reuse most of its notation. Let $\mathcal{P}_0''$ denote the set of all distributions $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}$ such that $X_i\ind Y_j\mid (\bm{Z}, \bm{S_{ij}})$ for all $i\in \{1,\ldots,d_X\}$ and $j\in\{1,\ldots,d_Y\}.$ As shown in the proof sketch in the main paper, $\bm{X}\ind \bm{Y}\mid \bm{Z}$ implies $X_i\ind Y_j\mid (\bm{Z}, \bm{S_{ij}})$ for all $i\in \{1,\ldots,d_X\}$ and $j\in\{1,\ldots,d_Y\},$ and hence,  $\mathcal{P}_0\subseteq\mathcal{P}_0''.$ Because of this fact, the proof of Lemma 2 is the same as the proof of Lemma 1 with $\mathcal{P}_0'$ replaced by $\mathcal{P}_0''.$
\subsection{Proof of Lemma 3}
We reuse some notation from the proof of Lemma 1. To emphasize that the test for step $2$ depends on the chosen $Q_i$'s and $Q_j'$'s, we write
$\psi_{\{Q_i\},\{Q_j'\}}''.$ Here, $\{Q_i\}$ and $\{Q_j'\}$ are shorthand notation for the sets of all $Q_i$'s and $Q_j'$'s, respectively. As before, we let $\psi$ denote the induced test procedure for $\mathcal{H}_0.$ Let $ \mathcal{P}_{0,\{Q_i\},\{Q_j'\}}''$ denote the set of all distributions $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}$ that satisfy $X_i\ind Y_j\mid (\bm{Z}, \bm{S_{ij}})$ for all $i\in\{1,\ldots,d_X\}$ and $j\in\{1,\ldots, d_Y\},$ where the $\bm{S_{ij}}$'s are constructed using the respective $Q_i$'s and $Q_j'$'s.\footnote{Actually, $\psi_{\{Q_i\},\{Q_j'\}}''$ and $ \mathcal{P}_{0,\{Q_i\},\{Q_j'\}}''$ only depend on the $Q_i$'s and $Q_j'$'s that are chosen for some $\bm{S_{ij}}.$ For the sake of readability, we do not make this distinction in our notation.} To underline that the estimates in step $1$ depend on the particular sample as well, we write $\hat{S}(X_1,\mathcal{D}),\ldots,\hat{S}(X_{d_X},\mathcal{D}),\hat{S}(Y_1,\mathcal{D}),\ldots,\hat{S}(Y_{d_Y},\mathcal{D}).$ We write $A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}$ to denote the event $\{(Q_1,\ldots,Q_{d_X},Q_1',\ldots,Q_{d_Y}') = (\hat{S}(X_1, \mathcal{D}),\ldots,\hat{S}(X_{d_X}, \mathcal{D}),\hat{S}(Y_1, \mathcal{D}),\ldots,\hat{S}(Y_{d_Y}, \mathcal{D}))\},$ i.e., the event that a particular set of $Q_i$'s and $Q_j'$'s
 has been chosen in step $1.$ The ${\mathcal{D}}$ in the superscript of the event indicates that the event is with respect to the random matrix ${\mathcal{D}}.$ We assume that all these events are measurable, which, in practice, depends on the underlying estimation procedure, and we assume without loss of generality that all events have positive measure. 
\par
Now, assume that for all possible fixed $Q_1,\ldots,Q_{d_X}\subseteq \{1,\ldots,d_Y\}$ and $Q_1',\ldots,Q_{d_Y}'\subseteq \{1,\ldots,d_X\}$ we have a test in step $2$ that has \textbf{valid level} $\alpha\in(0,1)$ conditioned on the fact that $Q_1,\ldots,Q_{d_X},Q_1',\ldots,Q_{d_Y}'$ have been selected in step 1. That is, for all fixed $Q_1,\ldots,Q_{d_X}\subseteq \{1,\ldots,d_Y\}$ and $Q_1',\ldots,Q_{d_Y}'\subseteq \{1,\ldots,d_X\}$ and for all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}\in\mathcal{P}_{0,\{Q_i\},\{Q_j'\}}'',$ we assume that
\begin{align}
\label{condLevel}
\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})\mid A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}]\leq \alpha.
\end{align} 
(This notion of conditioning is similar to the one from \citet{Fithian2014}.)
 By the proof sketch in the main paper, $\mathcal{P}_0\subseteq\mathcal{P}_{0,\{Q_i\},\{Q_j'\}}''$ for all $Q_1,\ldots,Q_{d_X}\subseteq \{1,\ldots,d_Y\}$ and $Q_1',\ldots,Q_{d_Y}'\subseteq \{1,\ldots,d_X\}$ (and not just for all $Q_i\subseteq S(X_i)$ and $Q_j'\subseteq S(Y_j)$) and hence, inequality \eqref{condLevel} holds for all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}\in \mathcal{P}_{0}$ as well. Thus, by applying the law of total expectation, we have for all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}\in \mathcal{P}_{0}$ that
\begin{align*}
&\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]\\
&=\sum_{\substack{Q_1,\ldots,Q_{d_X}\subseteq\{1,\ldots,d_Y\}\\Q_1',\ldots,Q_{d_Y}'\subseteq\{1,\ldots,d_X\}}}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})\mid A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}]\cdot \Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}(A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}})\\
&=\sum_{\substack{Q_1,\ldots,Q_{d_X}\subseteq\{1,\ldots,d_Y\}\\Q_1',\ldots,Q_{d_Y}'\subseteq\{1,\ldots,d_X\}}}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})\mid A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}]\cdot \Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}(A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}})\\
&\leq \sum_{\substack{Q_1,\ldots,Q_{d_X}\subseteq\{1,\ldots,d_Y\}\\Q_1',\ldots,Q_{d_Y}'\subseteq\{1,\ldots,d_X\}}}\alpha\cdot \Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}(A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}})\\
&=\alpha\sum_{\substack{Q_1,\ldots,Q_{d_X}\subseteq\{1,\ldots,d_Y\}\\Q_1',\ldots,Q_{d_Y}'\subseteq\{1,\ldots,d_X\}}} \Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}(A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}})\\
&=\alpha,
\end{align*}
where we used twice that the events $A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}$ partition the underlying sample space $\mathcal{X}^n$.
Hence,
\begin{align*}
\sup_{\mathcal{P}_{0}}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]\leq \alpha.
\end{align*}
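As a side remark on the sums above, they range over finitely many index tuples. A simple count (added here only as an illustration) gives
\begin{align*}
\underbrace{(2^{d_Y})^{d_X}}_{\text{choices of }Q_1,\ldots,Q_{d_X}}\cdot\underbrace{(2^{d_X})^{d_Y}}_{\text{choices of }Q_1',\ldots,Q_{d_Y}'}=2^{2d_Xd_Y}
\end{align*}
summands, for example $2^{8}=256$ for $d_X=d_Y=2.$ The law of total expectation is therefore applied to a finite partition only.\par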
For the \textbf{sample splitting part} of the Lemma, we write $\mathcal{D}_1$ to denote a matrix that is constructed by an arbitrary but fixed subset of rows of the matrix $\mathcal{D},$ and we define $\mathcal{D}_2$ to contain exactly all the other rows that do not make up $\mathcal{D}_1.$ Without loss of generality, assume that $\mathcal{D}_1$ is made up of the first $n_1$ rows of $\mathcal{D}$ and $\mathcal{D}_2$ of the remaining $n_2:=n-n_1$ rows. Now, suppose that the estimates $\hat{S}(X_1,\mathcal{D}),\ldots,\hat{S}(X_{d_X},\mathcal{D}),\hat{S}(Y_1,\mathcal{D}),\ldots,\hat{S}(Y_{d_Y},\mathcal{D})$ are calculated using only the first part of the sample $\mathcal{D}_1$ and suppose that the test $\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})$ is calculated based on $\mathcal{D}_2$ only.\par
 Under the assumption that all observations are mutually independent (i.e., under the classical iid assumption), we get that $\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})$ is independent of $(\hat{S}(X_1,\mathcal{D}),\ldots,\hat{S}(X_{d_X},\mathcal{D}),\hat{S}(Y_1,\mathcal{D}),\ldots,\hat{S}(Y_{d_Y},\mathcal{D})).$ Because  $\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})$ only depends on $\mathcal{D}_2,$ we can define a test  $\phi_{\{Q_i\},\{Q_j'\}}'':\mathcal{X}^{|\mathcal{D}_2|}\rightarrow[0,1]$ that is equal to $\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})$ in the sense that $\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})=\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_1,\mathcal{D}_2)=\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)$ (this definition is just a technical detail; loosely speaking, we could directly write $\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)$ to indicate that the test depends on the dataset $\mathcal{D}_2$ only, but we need to make sure that the underlying domain is correct). We now obtain that
\begin{align*}
&\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})\mid A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}]\\
&=\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})]\\
&=\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)]\\
&=\mathbb{E}_{\Tilde{P}^{\mathcal{D}_2}_{\bm{X},\bm{Y},\bm{Z}}}[\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)].\numberthis\label{sampleSplitLevel}
\end{align*} 
The term $\mathbb{E}_{\Tilde{P}^{\mathcal{D}_2}_{\bm{X},\bm{Y},\bm{Z}}}[\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)]$ in \eqref{sampleSplitLevel} is the usual unconditional size of the test based on the second part of the dataset. 
Thus, if for all fixed $Q_1,\ldots,Q_{d_X}\subseteq \{1,\ldots,d_Y\}$ and $Q_1',\ldots,Q_{d_Y}'\subseteq \{1,\ldots,d_X\}$ it holds that
\begin{align*}
 \sup_{\mathcal{P}_{0,\{Q_i\},\{Q_j'\}}''}\mathbb{E}_{\Tilde{P}^{\mathcal{D}_2}_{\bm{X},\bm{Y},\bm{Z}}}[\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)]\leq \alpha,
\end{align*}
 we obtain that 
\begin{align*}
	\sup_{\mathcal{P}_{0}}\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi(\mathcal{D})]\leq \alpha.
\end{align*}
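The last implication can be spelled out as follows; this is merely a sketch that combines the displays above. For all fixed $Q_1,\ldots,Q_{d_X}\subseteq \{1,\ldots,d_Y\}$ and $Q_1',\ldots,Q_{d_Y}'\subseteq \{1,\ldots,d_X\}$ and for all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}\in\mathcal{P}_0\subseteq\mathcal{P}_{0,\{Q_i\},\{Q_j'\}}'',$
\begin{align*}
\mathbb{E}_{\Tilde{P}^{\mathcal{D}}_{\bm{X},\bm{Y},\bm{Z}}}[\psi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D})\mid A_{\{Q_i\},\{Q_j'\}}^{\mathcal{D}}]
&=\mathbb{E}_{\Tilde{P}^{\mathcal{D}_2}_{\bm{X},\bm{Y},\bm{Z}}}[\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)]\\
&\leq \sup_{\mathcal{P}_{0,\{Q_i\},\{Q_j'\}}''}\mathbb{E}_{\Tilde{P}^{\mathcal{D}_2}_{\bm{X},\bm{Y},\bm{Z}}}[\phi_{\{Q_i\},\{Q_j'\}}''(\mathcal{D}_2)]\leq \alpha.
\end{align*}
Thus, inequality \eqref{condLevel} holds for all $\Tilde{P}_{\bm{X},\bm{Y},\bm{Z}}\in\mathcal{P}_0,$ and the argument based on the law of total expectation from the first part of the proof applies without changes.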
\subsection{Result analogous to Proposition 1 for partial correlations}
\label{SecParCorr}
In this section, we prove a result analogous to Proposition 1 for partial correlations.
\addtocounter{proposition}{+3}
\begin{proposition}
\label{PropoParCorr}
    For any set of indices $Q_i\subseteq S(X_i),$
    \begin{align*}
|\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}}|\geq |\rho_{X_iY_j|\bm{Z}}|.
    \end{align*}
    Similarly, for any set of indices $Q_j'\subseteq S(Y_j),$
    \begin{align*}
|\rho_{X_iY_j|\bm{Z},\bm{X}_{Q_j'\setminus \{i\}}}|\geq |\rho_{X_iY_j|\bm{Z}}|.
    \end{align*}
\end{proposition}
\begin{proof}
	We only prove the statement for an arbitrary but fixed $Q_i\subseteq S(X_i)$; the proof for an arbitrary but fixed $Q_j'\subseteq S(Y_j)$ is analogous.\newline
	Write $S(X_i)\setminus \{j\}= \{j_1,\ldots,j_m\},$ where $m$ is a natural number such that $1\leq m\leq d_Y - 1.$ Without loss of generality (as we can relabel the elements $j_1,\ldots,j_m$ arbitrarily), we prove the statement (and a slightly stronger statement for later use) for all sets $ \{j_1,\ldots,j_k\}\subseteq S(X_i)$ by induction over $k\in\{1,\ldots, m\}.$\footnote{Usually, a proof by induction ranges over all natural numbers. To be formally correct, we can also say that we inductively prove a statement that equals the original statement for $k\in\{1,\ldots,m\}$ and is always correct for $k>m.$} \newline
	\textbf{Induction start} ($k=1$)\textbf{:}
	First of all, note that $X_i\ind Y_{j_l}\mid \bm{Z}$ for all $j_l\in S(X_i)\setminus\{j\};$ hence, $\rho_{X_iY_{j_l}| \bm{Z}} = 0$ for all $j_l\in S(X_i)\setminus\{j\}.$
	Therefore, 
	\begin{align*}
	\rho_{X_iY_j|\bm{Z}, Y_{j_1}} &= \frac{\rho_{X_iY_j|\bm{Z}}-\overbrace{\rho_{X_iY_{j_1}|\bm{Z}}}^{=0}\rho_{Y_jY_{j_1}|\bm{Z}}}{\underbrace{\sqrt{1-\rho_{X_iY_{j_1}|\bm{Z}}^2}}_{=1}\sqrt{1-\rho_{Y_jY_{j_1}|\bm{Z}}^2}}\\&=\frac{\rho_{X_iY_j|\bm{Z}}}{\sqrt{1-\rho_{Y_jY_{j_1}|\bm{Z}}^2}}
	\end{align*} 
 and hence,
 \begin{align*}
|\rho_{X_iY_j|\bm{Z}, Y_{j_1}}|\geq |\rho_{X_iY_j|\bm{Z}}|.
 \end{align*}
	\textbf{Induction hypothesis:} Let $k\leq m$ be an arbitrary but fixed natural number. Assume that 
	\begin{align}
	\label{IndHyp1}
	\rho_{X_iY_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}} = 0
	\end{align}
	for all $l\in \{k,\ldots,m\}.$ Here, the notation $Y_{j_1},\ldots, Y_{j_{k-1}}$ means the empty set if $k=1.$ \par
	Furthermore, assume that
	\begin{align}
	\label{IndHyp2}
	\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}=\frac{\rho_{X_iY_j|\bm{Z}}}{\sqrt{1-\rho_{Y_jY_{j_1}|\bm{Z}}^2}\sqrt{1-\rho_{Y_jY_{j_2}|\bm{Z},Y_{j_1}}^2}\cdots \sqrt{1-\rho_{Y_jY_{j_k}|\bm{Z},Y_{j_1},\ldots,Y_{j_{k-1}}}^2}}
	\end{align}
 and hence that
 \begin{align*}
|\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}|\geq |\rho_{X_iY_j|\bm{Z}}|.
 \end{align*}
	\newline
	\textbf{Induction step $(k\rightarrow k + 1)$:} 
	If $k=m,$ then we are done, so assume that $k<m.$
	Let $l \in \{k+1,\ldots,m\}$ be arbitrary but fixed. Then,
	\begin{align*}
	\rho_{X_iY_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}&=\frac{\overbrace{\rho_{X_iY_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}}^{=0\text{, equation \eqref{IndHyp1}}}-\overbrace{\rho_{X_iY_{j_k}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}}^{=0\text{, equation \eqref{IndHyp1}}}\rho_{Y_{j_k}Y_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}}{\underbrace{\sqrt{1-\rho_{X_iY_{j_k}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}^2}}_{=1\text{, equation \eqref{IndHyp1}}}\sqrt{1-\rho_{Y_{j_k}Y_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}^2}}\\
	&=\frac{0-0\cdot\rho_{Y_{j_k}Y_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}}{1\cdot\sqrt{1-\rho_{Y_{j_k}Y_{j_l}|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k-1}}}^2}}\\
	&=0.\numberthis\label{IndStep1}
	\end{align*}
	With this, we obtain that
	\begin{align*}
	\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k+1}}}&=\frac{\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}-\overbrace{\rho_{X_iY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}}^{=0\text{, equation \eqref{IndStep1}}}\rho_{Y_jY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}}{\underbrace{\sqrt{1-\rho_{X_iY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}^2}}_{=1\text{, equation \eqref{IndStep1}}}\sqrt{1-\rho_{Y_jY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}^2}}\\
	&=\frac{\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}-0\cdot\rho_{Y_jY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}}{1\cdot\sqrt{1-\rho_{Y_jY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}^2}}\\
	&=\frac{\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}}{\sqrt{1-\rho_{Y_jY_{j_{k+1}}|\bm{Z}, Y_{j_1},\ldots, Y_{j_k}}^2}}\\
 &=\frac{\rho_{X_iY_j|\bm{Z}}}{\sqrt{1-\rho_{Y_jY_{j_1}|\bm{Z}}^2}\sqrt{1-\rho_{Y_jY_{j_2}|\bm{Z},Y_{j_1}}^2}\cdots \sqrt{1-\rho_{Y_jY_{j_{k+1}}|\bm{Z},Y_{j_1},\ldots,Y_{j_k}}^2}}
	\end{align*}
	where the last equality follows from the induction hypothesis, see equation \eqref{IndHyp2}.
 Therefore, 
 \begin{align*}
|\rho_{X_iY_j|\bm{Z}, Y_{j_1},\ldots, Y_{j_{k+1}}}|\geq |\rho_{X_iY_j|\bm{Z}}|.
 \end{align*}
\end{proof}
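As a small numerical illustration of the induction start (the numbers are hypothetical and chosen only for this example), suppose that $Q_i\setminus\{j\}=\{j_1\}$ and that $\rho_{X_iY_j|\bm{Z}}=0.3,$ $\rho_{X_iY_{j_1}|\bm{Z}}=0$ and $\rho_{Y_jY_{j_1}|\bm{Z}}=0.8.$ These values are admissible because the corresponding $3\times 3$ partial correlation matrix is positive definite (its determinant is $1-0.3^2-0.8^2=0.27>0$). Then
\begin{align*}
\rho_{X_iY_j|\bm{Z},Y_{j_1}}=\frac{0.3-0\cdot 0.8}{\sqrt{1-0^2}\sqrt{1-0.8^2}}=\frac{0.3}{0.6}=0.5\geq 0.3=\rho_{X_iY_j|\bm{Z}},
\end{align*}
so conditioning additionally on $Y_{j_1}$ increases the absolute value of the partial correlation from $0.3$ to $0.5$ in this example.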
\subsection{Proof of Proposition 2}
We start the proof by calculating the power corresponding to the null hypothesis $\rho_{X_iY_j| \bm{Z}}=0$. One rejects that null hypothesis if $\sqrt{n-3-|\bm{Z}|}|z(\hat{\rho}_{X_iY_j| \bm{Z}})|>q,$ where $q:=\Phi^{-1}(1-\alpha/2)$ is the $(1-\alpha/2)$-quantile of a standard normal distribution. Assume that the true $\rho_{X_iY_j|\bm{Z} }$ is fixed and not equal to zero. Note that this is equivalent to $z(\rho_{X_iY_j| \bm{Z}})\neq 0$. The statistical power is then
\begin{align*}
&P_{\bm{X},\bm{Y},\bm{Z}}^{\mathcal{D}}(\sqrt{n-3-|\bm{Z}|}|z(\hat{\rho}_{X_iY_j| \bm{Z}})|>q)\\
&=P_{\bm{X},\bm{Y},\bm{Z}}^{\mathcal{D}}(\sqrt{n-3-|\bm{Z}|}z(\hat{\rho}_{X_iY_j| \bm{Z}})>q \text{ or } \sqrt{n-3-|\bm{Z}|}z(\hat{\rho}_{X_iY_j| \bm{Z}})<-q)\\
&=P_{\bm{X},\bm{Y},\bm{Z}}^{\mathcal{D}}(\sqrt{n-3-|\bm{Z}|}z(\hat{\rho}_{X_iY_j| \bm{Z}})>q) + P_{\bm{X},\bm{Y},\bm{Z}}^{\mathcal{D}}(\sqrt{n-3-|\bm{Z}|}z(\hat{\rho}_{X_iY_j| \bm{Z}})<-q)\\
&=P_{\bm{X},\bm{Y},\bm{Z}}^{\mathcal{D}}(\sqrt{n-3-|\bm{Z}|}(z(\hat{\rho}_{X_iY_j| \bm{Z}})-z(\rho_{X_iY_j| \bm{Z}}))>q-\sqrt{n-3-|\bm{Z}|}z(\rho_{X_iY_j| \bm{Z}}))\\
&\hspace{2cm}+ P_{\bm{X},\bm{Y},\bm{Z}}^{\mathcal{D}}(\sqrt{n-3-|\bm{Z}|}(z(\hat{\rho}_{X_iY_j| \bm{Z}})-z(\rho_{X_iY_j| \bm{Z}}))<-q-\sqrt{n-3-|\bm{Z}|}z(\rho_{X_iY_j| \bm{Z}}))\\
&=:(1).
\end{align*}
Since the left-hand sides of the inequalities inside the two probabilities are (approximately) standard normally distributed, we obtain that
\begin{align*}
(1)=P(W>q-\gamma_1)+P(W<-q-\gamma_1),
\end{align*}
where $W$ is a standard normal random variable, $P$ is the underlying probability measure, and $\gamma_1=\sqrt{n-3-|\bm{Z}|}z(\rho_{X_iY_j| \bm{Z}})$.
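
To illustrate the size of these quantities, the following sketch uses hypothetical values that do not appear elsewhere in the paper, and assumes that $z$ denotes the Fisher $z$-transformation $z(\rho)=\tfrac{1}{2}\log\frac{1+\rho}{1-\rho}$. Let $\alpha=0.05$, so that $q\approx 1.96$, and take $n=100$, $|\bm{Z}|=1$ and $\rho_{X_iY_j|\bm{Z}}=0.3$. Then
\begin{align*}
\gamma_1&=\sqrt{96}\,z(0.3)\approx 9.80\cdot 0.3095\approx 3.03,\\
(1)&\approx P(W>-1.07)+P(W<-4.99)\approx 0.86,
\end{align*}
where the second probability is negligible. In this example, the power is therefore driven almost entirely by the first term.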

By the same argument, we can calculate the power of the test of the null hypothesis $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus\{j\}}}=0$ (note that this null hypothesis is true if and only if $\rho_{X_iY_j| \bm{Z}}=0$; see the proof of Proposition \ref{PropoParCorr}). This calculation yields that the power (approximately) equals
\begin{align*}
(2):=P(W>q-\gamma_2)+P(W<-q-\gamma_2),
\end{align*}
where $\gamma_2 = \sqrt{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}z(\rho_{X_iY_j| \bm{Z},\bm{Y}_{Q_i\setminus\{j\}}}).$
The power difference $\Delta \beta$ between $(2)$ and $(1)$ is then
\begin{align*}
\Delta \beta = P(W>q-\gamma_2)+P(W<-q-\gamma_2)-P(W>q-\gamma_1)
-P(W<-q-\gamma_1).
\end{align*}
We now use our assumption that
\begin{align}
\label{assMutInfo}
I(Y_j;\bm{Y}_{Q_i\setminus\{j\}}| \bm{Z})\geq \log\biggl(\frac{z^{-1}\bigl(\sqrt{\frac{n-3-|\bm{Z}|}{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}}z(\rho_{X_iY_j|\bm{Z}})\bigr)}{\rho_{X_iY_j|\bm{Z}}}\biggr).
\end{align}
We start by assuming that $\rho_{X_iY_j| \bm{Z}}>0$ (which, see the proof of Proposition \ref{PropoParCorr}, is equivalent to $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus\{j\}}}>0$). With that assumption, we can rearrange \eqref{assMutInfo} and obtain that
\begin{align*}
e^{I(Y_j;\bm{Y}_{Q_i\setminus\{j\}}| \bm{Z})}\rho_{X_iY_j|\bm{Z}}\geq z^{-1}\biggl(\sqrt{\frac{n-3-|\bm{Z}|}{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}}z(\rho_{X_iY_j|\bm{Z}})\biggr).
\end{align*}
Writing $Q_i\setminus\{j\}=\{j_1,\ldots,j_m\}$ and using the chain rule for conditional mutual information, we obtain that
\begin{align*}
e^{I(Y_j;Y_{j_1}| \bm{Z})+I(Y_j;Y_{j_2}| \bm{Z},Y_{j_1})+\ldots+I(Y_j;Y_{j_m}| \bm{Z},Y_{j_1},\ldots, Y_{j_{m-1}})}\rho_{X_iY_j|\bm{Z}}\geq z^{-1}\biggl(\sqrt{\frac{n-3-|\bm{Z}|}{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}}z(\rho_{X_iY_j|\bm{Z}})\biggr).
\end{align*}
Since the exponential of a sum is the product of the exponentials, this is equivalent to
\begin{align*}
e^{I(Y_j;Y_{j_1}| \bm{Z})}\cdot e^{I(Y_j;Y_{j_2}| \bm{Z},Y_{j_1})}\cdots e^{I(Y_j;Y_{j_m}| \bm{Z},Y_{j_1},\ldots, Y_{j_{m-1}})}\rho_{X_iY_j|\bm{Z}}\geq z^{-1}\biggl(\sqrt{\frac{n-3-|\bm{Z}|}{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}}z(\rho_{X_iY_j|\bm{Z}})\biggr).
\end{align*}
Recall that for multivariate normal distributions, $e^{I(Y_j;Y_{j_1}| \bm{Z})}=1/\sqrt{1-\rho_{Y_jY_{j_1}| \bm{Z}}^2}$ (analogously for the other terms), which then yields that
\begin{align*}
\frac{\rho_{X_iY_j|\bm{Z}}}{\sqrt{1-\rho_{Y_jY_{j_1}| \bm{Z}}^2}\cdot \sqrt{1-\rho_{Y_jY_{j_2}| \bm{Z},Y_{j_1}}^2}\cdots \sqrt{1-\rho_{Y_jY_{j_m}| \bm{Z},Y_{j_1},\ldots, Y_{j_{m-1}}}^2}}\geq z^{-1}\biggl(\sqrt{\frac{n-3-|\bm{Z}|}{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}}z(\rho_{X_iY_j|\bm{Z}})\biggr).
\end{align*}
Because $X_i\ind Y_{j_l}\mid \bm{Z}$ for all $l\in\{1,\ldots,m\},$ we can rewrite the left-hand side of the previous inequality (see the proof of Proposition \ref{PropoParCorr}) as
\begin{align*}
\frac{\rho_{X_iY_j|\bm{Z}}}{\sqrt{1-\rho_{Y_jY_{j_1}| \bm{Z}}^2}\cdot \sqrt{1-\rho_{Y_jY_{j_2}| \bm{Z},Y_{j_1}}^2}\cdots \sqrt{1-\rho_{Y_jY_{j_m}| \bm{Z},Y_{j_1},\ldots, Y_{j_{m-1}}}^2}}=\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus\{j\}}}.
\end{align*}
Thus,
\begin{align*}
\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus\{j\}}}\geq z^{-1}\biggl(\sqrt{\frac{n-3-|\bm{Z}|}{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}}z(\rho_{X_iY_j|\bm{Z}})\biggr).
\end{align*}
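Applying the strictly increasing function $z$ to both sides and multiplying by $\sqrt{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}$ then yields that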
\begin{align*}
\sqrt{n_2-3-|\bm{Z}|-|Q_i\setminus\{j\}|}z(\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus\{j\}}})\geq \sqrt{n-3-|\bm{Z}|}z(\rho_{X_iY_j|\bm{Z}})
\end{align*}
which implies that
\begin{align*}
0<\gamma_1\leq \gamma_2.
\end{align*}
Thus,
\begin{align*}
\Delta\beta &= P(q-\gamma_2 < W < q-\gamma_1)-P(-q-\gamma_2 < W < -q-\gamma_1)\\
&=P(\gamma_1-q < W < \gamma_2-q)-P(\gamma_1+q < W < \gamma_2+q).
\end{align*}
To see that $\Delta \beta$ is nonnegative, we make a case distinction.
First, note that $\gamma_1+q$ and $\gamma_2+q$ are always positive because we assumed that $\rho_{X_iY_j|\bm{Z}}$ is positive, and hence (see the proof of Proposition \ref{PropoParCorr}), $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus\{j\}}}$ is positive.

If $\gamma_1-q\geq 0,$ both probabilities are integrals of the standard normal density over intervals contained in the nonnegative part of the real line. As the standard normal density is strictly decreasing on the nonnegative real line and the intervals $(\gamma_1-q,\gamma_2-q)$ and $(\gamma_1+q,\gamma_2+q)$ have the same length, the first probability is at least as large as the second probability.

If $\gamma_1-q< 0$ and $\gamma_2-q\geq 0,$ the interval $(\gamma_1+q,\gamma_2+q)$ is further away from zero than the interval $(\gamma_1-q,\gamma_2-q);$ hence, the standard normal density takes smaller values on $(\gamma_1+q,\gamma_2+q)$ than on $(\gamma_1-q,\gamma_2-q).$ As both intervals again have the same length, the first probability is at least as large as the second probability.

The last case is $\gamma_2-q<0$ (and hence, $\gamma_1-q<0$). Note that $\gamma_2-q$ is closer to zero than $\gamma_1+q,$ because
\begin{align*}
&|\gamma_2-q|< |\gamma_1+q|\\
&\Longleftrightarrow q-\gamma_2< \gamma_1+q\\
&\Longleftrightarrow \gamma_1+\gamma_2>0,
\end{align*}
where the last line always holds because $\gamma_1$ and $\gamma_2$ are positive. Hence, the interval $(\gamma_1-q,\gamma_2-q)$ is again closer to zero than the interval $(\gamma_1+q,\gamma_2+q),$ so the first probability is again at least as large as the second probability.

Thus, in all cases, we have $\Delta \beta \geq 0.$
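
For a concrete sense of the magnitudes involved, consider the following sketch with hypothetical values (not taken from the paper's experiments), assuming that $z$ denotes the Fisher $z$-transformation $z(\rho)=\tfrac{1}{2}\log\frac{1+\rho}{1-\rho}$. Let $\alpha=0.05$, $n=n_2=100$, $|\bm{Z}|=1$, $Q_i\setminus\{j\}=\{j_1\}$, $\rho_{X_iY_j|\bm{Z}}=0.3$ and $\rho_{Y_jY_{j_1}|\bm{Z}}=0.8$, so that $\rho_{X_iY_j|\bm{Z},Y_{j_1}}=0.3/\sqrt{1-0.8^2}=0.5$. Assumption \eqref{assMutInfo} holds in this example, since
\begin{align*}
I(Y_j;Y_{j_1}|\bm{Z})=-\tfrac{1}{2}\log(1-0.8^2)\approx 0.51\geq \log\biggl(\frac{z^{-1}\bigl(\sqrt{96/95}\,z(0.3)\bigr)}{0.3}\biggr)\approx 0.005.
\end{align*}
With $q\approx 1.96$, $\gamma_1=\sqrt{96}\,z(0.3)\approx 3.03$ and $\gamma_2=\sqrt{95}\,z(0.5)\approx 5.35$, we are in the case $\gamma_1-q\geq 0$ and obtain
\begin{align*}
\Delta\beta\approx P(1.07<W<3.39)-P(4.99<W<7.31)\approx 0.14.
\end{align*}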

If $\rho_{X_iY_j|\bm{Z}}$ is negative, then one can proceed analogously and 
obtain that $\gamma_2\leq \gamma_1<0.$ It then follows that
\begin{align*}
\Delta \beta = P(-q-\gamma_1<W<-q-\gamma_2)-P(q-\gamma_1<W<q-\gamma_2).
\end{align*}
By a similar case distinction, one then obtains that $\Delta \beta \geq 0.$ Thus, we have proved the proposition.

\subsection{Proof of Proposition 3}
Let both tests corresponding to the respective null hypotheses $\rho_{X_iY_j|\bm{Z}}=0$ and $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}}=0$ achieve a power of exactly $\beta.$ We use the power formulas derived in the proof of Proposition 2. We start by assuming that $\rho_{X_iY_j|\bm{Z}}>0$ (equivalent to $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}}>0$, see the proof of Proposition \ref{PropoParCorr}).
For the test corresponding to the null hypothesis $\rho_{X_iY_j|\bm{Z}}=0$, we obtain that
\begin{align*}
\beta &= P(W>q-\gamma_1)+P(W < -q-\gamma_1)\\
&\leq P(W>q-\gamma_1)+P(W<-q)\\
&=P(W>q-\gamma_1)+\frac{\alpha}{2}.
\end{align*}
Rearranging terms yields
\begin{align*}
\gamma_1\geq q-\Phi^{-1}(1-\beta+\alpha/2).
\end{align*}
Note that the term on the right-hand side is nonnegative because we assumed that $\beta\geq \alpha.$
Plugging in the definition of $\gamma_1$ and further rearrangement now yields that
\begin{align}
\label{ineq:1}
n\geq \biggl(\frac{q-\Phi^{-1}(1-\beta + \frac{\alpha}{2})}{z(\rho_{X_iY_j|\bm{Z}})}\biggr)^2 +3 + |\bm{Z}|.
\end{align}

Similarly, for the test corresponding to the null hypothesis $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}}=0$,
\begin{align*}
\beta = P(W>q-\gamma_2)+P(W<-q-\gamma_2)\geq P(W>q-\gamma_2).
\end{align*}
Rearranging these terms yields 
\begin{align*}
q-\Phi^{-1}(1-\beta)\geq \gamma_2\geq 0.
\end{align*}
Plugging in the definition of $\gamma_2$ and rearranging yields
\begin{align}
\label{ineq:2}
n_2\leq \biggl(\frac{q-\Phi^{-1}(1-\beta)}{z(\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}})}\biggr)^2 +3 + |\bm{Z}|+|Q_i\setminus \{j\}|.
\end{align}

One can argue analogously for the case $\rho_{X_iY_j|\bm{Z}}<0$ (equivalent to $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}}<0$, see the proof of Proposition \ref{PropoParCorr}) and obtain the same lower bound on $n$ and upper bound on $n_2.$

Putting together \eqref{ineq:1} and \eqref{ineq:2} then yields the lower bound on $n-n_2$ stated in the proposition.
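
As a rough numerical sketch of the two bounds (hypothetical values, assuming that $z$ denotes the Fisher $z$-transformation $z(\rho)=\tfrac{1}{2}\log\frac{1+\rho}{1-\rho}$), let $\alpha=0.05$, $\beta=0.8$, $|\bm{Z}|=1$, $|Q_i\setminus \{j\}|=1$, $\rho_{X_iY_j|\bm{Z}}=0.3$ and $\rho_{X_iY_j|\bm{Z},\bm{Y}_{Q_i\setminus \{j\}}}=0.5$. With $q\approx 1.96$, $\Phi^{-1}(0.225)\approx -0.755$ and $\Phi^{-1}(0.2)\approx -0.842$, inequalities \eqref{ineq:1} and \eqref{ineq:2} give
\begin{align*}
n&\geq \biggl(\frac{2.715}{0.3095}\biggr)^2+3+1\approx 81.0,\\
n_2&\leq \biggl(\frac{2.802}{0.5493}\biggr)^2+3+1+1\approx 31.0,
\end{align*}
so that $n-n_2$ is at least roughly $50$ in this example.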
 

\appendix

\bibliography{hochsprung_392}

\end{document}
