\clearpage
% \newpage
\appendix
\onecolumn
\apptitle{\papertitle}


\section{Detailed proofs}
\label{app:proofs}

\subsection{Proof of Proposition \ref{prop:delta_omega_ii}}
\begin{proof}
Consider a terminal vertex $i$ in $\Delta$. By definition, a terminal vertex in $\Delta$ implies that there are no outgoing edges from vertex $i$ to any other vertex in the difference graph. This means that for any $l \in \Chi{1}(i) \cup \Chi{2}(i)$, the difference in the connection strengths, $B_{l,i}^{(1)} - B_{l,i}^{(2)}$, must be zero, as there is no influence from vertex $i$ to vertex $l$ in the difference graph. 

Therefore, for each $l \in [p]$, either $B_{l,i}^{(1)} = B_{l,i}^{(2)}$ or both are zero. Consequently, the product $(B_{l,i}^{(1)} + B_{l,i}^{(2)})(B_{l,i}^{(1)} - B_{l,i}^{(2)})$ becomes zero for all $l \in [p]$. As a result, every term in the sum of Equation \ref{eq:diagonal_diff_precision} is zero, leading to $\Delta_{\Omega_{i,i}} = 0$.
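
As a sanity check on the last step, the diagonal difference can be expanded term by term. The sketch below assumes that both SEMs share the noise variances $\sigma_l^2$ and that each diagonal precision entry has the form $\Omega_{i,i} = \frac{1}{\sigma_i^2} + \sum_{l \in [p]} \frac{B_{l,i}^2}{\sigma_l^2}$, which is what the product form above suggests; it is meant as an illustration, not as a restatement of Equation \ref{eq:diagonal_diff_precision}.
\begin{equation*}
\Delta_{\Omega_{i,i}} = \sum_{l \in [p]} \frac{\bigl(B_{l,i}^{(1)}\bigr)^2 - \bigl(B_{l,i}^{(2)}\bigr)^2}{\sigma_l^2} = \sum_{l \in [p]} \frac{\bigl(B_{l,i}^{(1)} + B_{l,i}^{(2)}\bigr)\bigl(B_{l,i}^{(1)} - B_{l,i}^{(2)}\bigr)}{\sigma_l^2}.
\end{equation*}
For a terminal vertex $i$ of $\Delta$, the second factor vanishes for every $l$, so each summand is zero.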
\end{proof}

\subsection{Proof of Proposition \ref{prop:terminal_delta}}
\begin{proof}
Assume $i$ is a terminal vertex in $G^{\cup}$. Since $G^{\cup} = G^{(1)} \cup G^{(2)}$, vertex $i$ has no outgoing edges in either $G^{(1)}$ or $G^{(2)}$. Now consider $\Delta$, which represents the differences in edges between $G^{(1)}$ and $G^{(2)}$; in particular, $\Delta$ is a subgraph of $G^{\cup}$. Since $i$ is a terminal vertex in both $G^{(1)}$ and $G^{(2)}$, no edge originating from $i$ can be present in one graph and absent in the other. Therefore, vertex $i$ has no outgoing edges in $\Delta$, making it a terminal vertex in $\Delta$ as well.
\end{proof}

\subsection{Proof of Lemma \ref{lem:edges_to_terminals}}
\begin{proof}
Let $\Delta_{\Omega_{i,i}} = 0$ for some $i$. By Assumption \ref{ass:converse_propositions}, $i$ is a terminal of $G^{\cup}$, i.e., $i$ is a terminal in both SEMs. By Proposition 2 of Ghoshal et al. \cite{ghoshal2018learning}, the off-diagonal entries of the precision matrix $\Omega$ of an SEM $(B, D)$ are given by:
\begin{equation*}
\Omega_{i,j} = -\frac{B_{i,j}}{\sigma_i^2} -\frac{B_{j,i}}{\sigma_j^2} + \sum_{l \in [p]} \frac{B_{l,i}B_{l,j}}{\sigma_l^2}.
\end{equation*}
So, if $i$ is a terminal in the SEM $(B, D)$, i.e., $B_{l,i}=0$ for all $l$, then $\Omega_{i,j} = -\frac{B_{i,j}}{\sigma_i^2}$.

In our case $i$ is a terminal in both SEMs; therefore $\Delta_{\Omega_{i,j}} = \Omega_{i,j}^{(1)} - \Omega_{i,j}^{(2)} = \frac{-B_{i,j}^{(1)} + B_{i,j}^{(2)}}{\sigma_i^2} = -\frac{\Delta_{B_{i,j}}}{\sigma_{i}^{2}}$.
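
The off-diagonal formula used above can be recovered by a direct expansion. As a sketch, assume the SEM is written as $X = BX + N$ with $D = \mathrm{diag}(\sigma_1^2, \dots, \sigma_p^2)$ the noise covariance, so that $\Omega = (I - B)^{\top} D^{-1} (I - B)$; this matrix form is our assumption about the convention, chosen because it reproduces the stated formula. Writing $\delta_{l,i}$ for the Kronecker delta (notation introduced only for this sketch) and using $B_{i,i} = 0$, for $i \neq j$ we get
\begin{equation*}
\Omega_{i,j} = \sum_{l \in [p]} \frac{(\delta_{l,i} - B_{l,i})(\delta_{l,j} - B_{l,j})}{\sigma_l^2} = -\frac{B_{i,j}}{\sigma_i^2} - \frac{B_{j,i}}{\sigma_j^2} + \sum_{l \in [p]} \frac{B_{l,i} B_{l,j}}{\sigma_l^2}.
\end{equation*}
Setting $B_{l,i} = 0$ for all $l$ removes the last two terms, which leaves $\Omega_{i,j} = -\frac{B_{i,j}}{\sigma_i^2}$.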
\end{proof}

\subsection{Proof of Proposition \ref{prop:removal_1b1}}
\begin{proof}
\textbf{Part 1: Iterative Removal and Topological Ordering.} Let us begin by establishing that the process of iteratively removing terminal vertices one-by-one from \( \Delta \) is equivalent to removing them in reverse order of any topological ordering of \( \Delta \). 

Let $\theta$ represent an order of removing terminals of $\Delta$, i.e., for all $i \in [p]$, $\theta_{i}$ is the terminal removed from the remaining subgraph of $\Delta$ at the $i$th step. This means that $\theta_{i}$ has no successor in the remaining subgraph of $\Delta$ at the $i$th step, i.e., for all $j > i$, $\Delta$ has no edge from $\theta_{i}$ to $\theta_{j}$. Hence the reverse of $\theta$ is a topological order of $\Delta$.

Conversely, consider any topological ordering \( \tau \) of \( \Delta \). If we remove vertices in the reverse order of \( \tau \), we always remove a terminal vertex of the remaining subgraph of \( \Delta \) at each step, i.e., $\forall m \in [p]$, \( \tau_{m} \) is a terminal in \( \Delta_{[m, \tau]} \). This is because in a topological ordering, all the successors of a vertex come after the vertex itself.

\textbf{Part 2: Converse of Proposition \ref{prop:terminal_delta} and Topological Orderings.} Assume that the converse of Proposition \ref{prop:terminal_delta} holds after every iterative removal of a terminal vertex from \( \Delta \). The converse of Proposition \ref{prop:terminal_delta} states that if a vertex is terminal in \( \Delta \), then it is also terminal in \( G^{\cup} \). This implies that the removal sequence prescribed by some topological ordering of \( \Delta \) also represents a valid topological ordering for \( G^{\cup} \). Since the choice of the topological ordering of \( \Delta \) was arbitrary, every topological ordering of \( \Delta \) must be a valid topological ordering of \( G^{\cup} \).

Conversely, suppose that every topological ordering of \( \Delta \) is a topological ordering of \( G^{\cup} \). Since $\Delta$ is a subgraph of \( G^{\cup} \), every topological ordering of \( G^{\cup} \) is also a topological ordering of $\Delta$; therefore \( G^{\cup} \) and $\Delta$ have the same set of topological orderings. Then the iterative removal of terminal vertices from \( \Delta \) according to any of its topological orderings never removes a vertex that is terminal in the remaining subgraph of \( \Delta \) but not in the remaining subgraph of \( G^{\cup} \). This ensures the validity of the converse of Proposition \ref{prop:terminal_delta} throughout the iterative removal process.

\textbf{Part 3: Transitive Edges and Topological Orderings.} Finally, we prove the claim regarding transitive edges. Let \( G \) be a DAG and \( H \) be a subgraph of \( G \). The set of topological orderings of \( H \) is the same as that of \( G \) if and only if all edges of \( G \) missing in \( H \) are transitive edges of \( G \). 

Suppose an edge from $v$ to $u$ in \( G \) is missing in \( H \). If it is not a transitive edge of $G$, then its absence in $H$ allows a new topological ordering of \( H \) that is not valid for \( G \). One such ordering can be formed by first placing all the non-successors of $v$ in $G$, excluding $v$, in their topological order, followed by $u$, then $v$, and finally the remaining successors of $v$ in their topological order. This is a valid topological ordering of $H$, but not of $G$, because $u$ comes before $v$.

Conversely, if all missing edges in \( H \) are transitive in \( G \), their removal does not create new topological orderings, as there are alternative paths preserving the precedence relations. Thus, every topological ordering of \( G \) remains valid for \( H \).
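
A three-vertex example may make Part 3 concrete; the graph below is purely illustrative and is not taken from the main text. Let \( G \) have the edges $a \to b$, $b \to c$, and $a \to c$, so that $a \to c$ is a transitive edge because of the path $a \to b \to c$.
\begin{itemize}
    \item If \( H \) omits the transitive edge $a \to c$, then \( H \) has the edges $a \to b$ and $b \to c$, and $(a, b, c)$ is the unique topological ordering of both \( G \) and \( H \).
    \item If \( H \) instead omits the non-transitive edge $a \to b$, then \( H \) has the edges $b \to c$ and $a \to c$ and admits the orderings $(a, b, c)$ and $(b, a, c)$. The ordering $(b, a, c)$ is not valid for \( G \), since $b$ precedes $a$. It coincides with the construction above for $v = a$ and $u = b$: there are no non-successors of $a$ other than $a$ itself, so the ordering is $u$, then $v$, then the remaining successor $c$.
\end{itemize}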
\end{proof}

\subsection{Proof of Lemma \ref{lem:removal_all}}
\begin{proof}
The converse of Proposition \ref{prop:terminal_delta} implies that the set of terminals of $\Delta$ is the same as the set of terminals of $G^{\cup}$. The iterative process of removing terminals all-at-once can be described as:
\begin{itemize}
    \item Initially, set of terminals of $\Delta$ $=$ set of terminals of $G^{\cup}$.
    Let $L_{0}$ be the set of terminals of $\Delta$. Let $\Delta_{-L_{0}}$ be the DAG obtained after removing $L_{0}$ from $\Delta$. Similarly we define $G^{\cup}_{-L_{0}}$.
    \item Set of terminals of $\Delta_{-L_{0}}$ $=$ set of terminals of $G^{\cup}_{-L_{0}}$. 
    Let $L_{1}$ be the set of terminals of $\Delta_{-L_{0}}$. Let $\Delta_{-(L_{0}\cup L_{1})}$ be the DAG obtained after removing $L_{1}$ from $\Delta_{-L_{0}}$. Similarly we define $G^{\cup}_{-(L_{0}\cup L_{1})}$.\\
    $\vdots$
    \item Set of terminals of $\Delta_{-(\cup_{i=0}^{k-1}L_{i})}$ $=$ set of terminals of $G^{\cup}_{-(\cup_{i=0}^{k-1}L_{i})}$. 
    Let $L_{k}$ be the set of terminals of $\Delta_{-(\cup_{i=0}^{k-1}L_{i})}$ and also equal to the set of all the vertices in $\Delta_{-(\cup_{i=0}^{k-1}L_{i})}$. (Process stops!)
\end{itemize}
This iterative requirement of the converse of Proposition \ref{prop:terminal_delta} is equivalent to the level-wise sets of terminals of $\Delta$ and $G^{\cup}$ being the same. Here the level of a vertex is $r$ if it was removed as part of the set $L_{r}$ in the process above. The level of a vertex in a DAG can be defined either by this recursive process or as the maximum length of a path starting from the vertex. Hence the iterative assumption of the converse of Proposition \ref{prop:terminal_delta} for simultaneous removal of terminal vertices can be stated as: the levels of all vertices in $\Delta$ and $G^{\cup}$ are the same, which is equivalent to the minimal topological layering of $\Delta$ being a valid topological layering of $G^{\cup}$.
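
To illustrate the level-wise condition, consider a hypothetical three-vertex instance (chosen only for illustration) in which $G^{\cup}$ has the edges $v_1 \to v_2$, $v_2 \to v_3$, and $v_1 \to v_3$, while $\Delta$ has only the edge $v_1 \to v_3$. Taking the level of a vertex to be the maximum length of a path starting from it, we get
\[
\begin{array}{l|ccc}
 & v_1 & v_2 & v_3 \\ \hline
\text{level in } \Delta & 1 & 0 & 0 \\
\text{level in } G^{\cup} & 2 & 1 & 0
\end{array}
\]
Here $v_2$ is a terminal of $\Delta$ but not of $G^{\cup}$, so the levels disagree and the condition fails. If $\Delta$ instead contained the edges $v_1 \to v_2$ and $v_2 \to v_3$, the levels in $\Delta$ would be $(2, 1, 0)$ and would coincide with those in $G^{\cup}$.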
\end{proof}

\subsection{Proof of Theorem \ref{thm:necessity_of_ass4}}
\begin{proof}
Consider the following two pairs of SEMs over three nodes:
    \begin{figure}[ht]
    \centering
    
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (-1, 0) node[circle, thick, draw](b){$v_{2}$}
        (1, 0) node[circle, thick, draw](c){$v_{3}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$\alpha$} (b);
        \draw[line width=0.5mm,->, gray] (b) -> node[below] {$-1$} (c);
        \draw[line width=0.5mm,->, gray] (a) -> node[right] {$\beta$} (c);
        
        \end{tikzpicture}
        \caption{$B^{(1)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (-1, 0) node[circle, thick, draw](b){$v_{2}$}
        (1, 0) node[circle, thick, draw](c){$v_{3}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$\alpha$} (b);
        \draw[line width=0.5mm,->, gray] (b) -> node[below] {$-1$} (c);
        % \draw[line width=0.5mm,-, gray] (a) -> node[right] {$\beta$} (c);
        
        \end{tikzpicture}
        \caption{$B^{(2)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (-1, 0) node[circle, thick, draw](b){$v_{2}$}
        (1, 0) node[circle, thick, draw](c){$v_{3}$};
        
        % \draw[line width=0.5mm,-, gray] (a) -> node[left] {$\alpha$} (b);
        % \draw[line width=0.5mm,-, gray] (b) -> node[below] {-1} (c);
        \draw[line width=0.5mm,->, gray] (a) -> node[right] {$\beta$} (c);
        
        \end{tikzpicture}
        \caption{$\Delta$}
    \end{subfigure}

    \caption{First pair of SEMs.}
    \label{fig:fst_pair_SEM}
\end{figure}
\begin{figure}[ht]
    \centering
    
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (-1, 0) node[circle, thick, draw](b){$v_{2}$}
        (1, 0) node[circle, thick, draw](c){$v_{3}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$\beta$} (b);
        \draw[line width=0.5mm,<-, gray] (b) -> node[below] {$-1$} (c);
        \draw[line width=0.5mm,->, gray] (a) -> node[right] {$\alpha$} (c);
        
        \end{tikzpicture}
        \caption{$B^{(1)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (-1, 0) node[circle, thick, draw](b){$v_{2}$}
        (1, 0) node[circle, thick, draw](c){$v_{3}$};
        
        % \draw[line width=0.5mm,-, gray] (a) -> node[left] {$\beta$} (b);
        \draw[line width=0.5mm,<-, gray] (b) -> node[below] {$-1$} (c);
        \draw[line width=0.5mm,->, gray] (a) -> node[right] {$\alpha$} (c);
        
        \end{tikzpicture}
        \caption{$B^{(2)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (-1, 0) node[circle, thick, draw](b){$v_{2}$}
        (1, 0) node[circle, thick, draw](c){$v_{3}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$\beta$} (b);
        % \draw[line width=0.5mm,-, gray] (b) -> node[below] {-1} (c);
        % \draw[line width=0.5mm,-, gray] (a) -> node[right] {$\alpha$} (c);
        
        \end{tikzpicture}
        \caption{$\Delta$}
    \end{subfigure}

    \caption{Second pair of SEMs.}
    \label{fig:snd_pair_SEM}
\end{figure}
\\
Here $\alpha, \beta$ determine the edge weights of the SEMs and $\sigma^{2}$ is the variance of the exogenous noise variables. Then the difference precision matrix for both pairs of SEMs is \[
\frac{1}{\sigma^{2}}
\begin{bmatrix}
\beta^{2} & -\beta & -\beta \\
-\beta & 0 & 0 \\
-\beta & 0 & 0 \\
\end{bmatrix}
\]
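As a check, the individual precision matrices can be written out. The sketch below assumes the convention $\Omega = \frac{1}{\sigma^{2}}(I - B)^{\top}(I - B)$, with $B_{j,i}$ the weight of the edge $v_i \to v_j$ and a common noise variance $\sigma^{2}$ for all nodes in both SEMs; this is our reading of the setup. For the first pair,
\[
\Omega^{(1)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 + \alpha^{2} + \beta^{2} & -\alpha - \beta & -\beta \\
-\alpha - \beta & 2 & 1 \\
-\beta & 1 & 1 \\
\end{bmatrix},
\qquad
\Omega^{(2)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 + \alpha^{2} & -\alpha & 0 \\
-\alpha & 2 & 1 \\
0 & 1 & 1 \\
\end{bmatrix},
\]
and for the second pair,
\[
\Omega^{(1)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 + \alpha^{2} + \beta^{2} & -\beta & -\alpha - \beta \\
-\beta & 1 & 1 \\
-\alpha - \beta & 1 & 2 \\
\end{bmatrix},
\qquad
\Omega^{(2)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 + \alpha^{2} & 0 & -\alpha \\
0 & 1 & 1 \\
-\alpha & 1 & 2 \\
\end{bmatrix}.
\]
In both cases $\Omega^{(1)} - \Omega^{(2)}$ equals the difference precision matrix stated above.
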
Neither pair of SEMs satisfies Assumption \ref{ass:level_sets}, while both satisfy Assumption \ref{ass:dn_levels}. This extends directly to $p$ vertices: taking $p$ to be a multiple of 3, every 3 consecutive nodes can correspond to either of the two choices of pairs of SEMs. This gives an exponentially large set of $2^{\frac{p}{3}}$ pairs of SEMs, each having the same difference precision matrix, shown below.
\[
\frac{1}{\sigma^{2}}
\begin{bmatrix}
\beta^{2} & -\beta & -\beta & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
-\beta & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
-\beta & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
0 & 0 & 0 & \beta^{2} & -\beta & -\beta & \cdots & 0 & 0 & 0\\
0 & 0 & 0 & -\beta & 0 & 0 & \cdots & 0 & 0 & 0\\
0 & 0 & 0 & -\beta & 0 & 0 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & \beta^{2} & -\beta & -\beta \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & -\beta & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & -\beta & 0 & 0 \\
\end{bmatrix}
\]
We can also make these SEMs connected by introducing an auxiliary vertex 0 that is connected to the topmost vertex of each of the $\frac{p}{3}$ components. The difference precision matrix is then the same as before, except for one extra row and column of all zeros.
\end{proof} 
\subsection{Proof of Theorem \ref{thm:necessity_of_ass5}}
\begin{proof}
Consider the following two pairs of SEMs over two nodes:
    \begin{figure}[ht]
    \centering
    
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (0, 0) node[circle, thick, draw](b){$v_{2}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$\alpha$} (b);
        
        \end{tikzpicture}
        \caption{$B^{(1)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (0, 0) node[circle, thick, draw](b){$v_{2}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$-\alpha$} (b);
        
        \end{tikzpicture}
        \caption{$B^{(2)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (0, 0) node[circle, thick, draw](b){$v_{2}$};
        
        \draw[line width=0.5mm,->, gray] (a) -> node[left] {$2\alpha$} (b);
        
        \end{tikzpicture}
        \caption{$\Delta$}
    \end{subfigure}

    \caption{First pair of SEMs.}
    \label{fig:fst_pair_SEM_2}
\end{figure}
\begin{figure}[ht]
    \centering
    
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (0, 0) node[circle, thick, draw](b){$v_{2}$};
        
        \draw[line width=0.5mm,->, gray] (b) -> node[left] {$\alpha$} (a);
        
        \end{tikzpicture}
        \caption{$B^{(1)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (0, 0) node[circle, thick, draw](b){$v_{2}$};
        
        \draw[line width=0.5mm,->, gray] (b) -> node[left] {$-\alpha$} (a);
        
        \end{tikzpicture}
        \caption{$B^{(2)}$}
    \end{subfigure}
    \hfill
    \begin{subfigure}[b]{0.3\textwidth}
        \centering
        % \includegraphics[width=\textwidth]{}
        \begin{tikzpicture}
        % \selectcolormodel{gray}
        
        \draw
        (0, 1.732) node[circle, thick, draw](a){$v_{1}$}
        (0, 0) node[circle, thick, draw](b){$v_{2}$};
        
        \draw[line width=0.5mm,->, gray] (b) -> node[left] {$2\alpha$} (a);
        
        \end{tikzpicture}
        \caption{$\Delta$}
    \end{subfigure}

    \caption{Second pair of SEMs.}
    \label{fig:snd_pair_SEM_2}
\end{figure}
\\
Here $\alpha$ determines the edge weights of the SEMs and $\sigma^{2}$ is the variance of the exogenous noise variables. Then the difference precision matrix for both pairs of SEMs is \[
\frac{1}{\sigma^{2}}
\begin{bmatrix}
0 & -2\alpha \\
-2\alpha & 0 \\
\end{bmatrix}
\]
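As a check, the individual precision matrices can be written out. The sketch below assumes the convention $\Omega = \frac{1}{\sigma^{2}}(I - B)^{\top}(I - B)$, with $B_{j,i}$ the weight of the edge $v_i \to v_j$ and a common noise variance $\sigma^{2}$ for both nodes in both SEMs; this is our reading of the setup. For the first pair,
\[
\Omega^{(1)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 + \alpha^{2} & -\alpha \\
-\alpha & 1 \\
\end{bmatrix},
\qquad
\Omega^{(2)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 + \alpha^{2} & \alpha \\
\alpha & 1 \\
\end{bmatrix},
\]
and for the second pair,
\[
\Omega^{(1)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 & -\alpha \\
-\alpha & 1 + \alpha^{2} \\
\end{bmatrix},
\qquad
\Omega^{(2)} = \frac{1}{\sigma^{2}}
\begin{bmatrix}
1 & \alpha \\
\alpha & 1 + \alpha^{2} \\
\end{bmatrix}.
\]
In both cases the diagonal entries cancel and the off-diagonal entries differ by $-2\alpha$, which gives the difference precision matrix stated above.
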
Neither pair of SEMs satisfies Assumption \ref{ass:dn_levels}, while both satisfy Assumption \ref{ass:level_sets}. This extends directly to $p$ vertices: taking $p$ to be a multiple of 2, every 2 consecutive nodes can correspond to either of the two choices of pairs of SEMs. This gives an exponentially large set of $2^{\frac{p}{2}}$ pairs of SEMs, each having the same difference precision matrix, shown below.
\[
\frac{1}{\sigma^{2}}
\begin{bmatrix}
0 & -2\alpha & 0 & 0 & \cdots & 0 & 0\\
-2\alpha & 0 & 0 & 0 & \cdots & 0 & 0\\
0 & 0 & 0 & -2\alpha & \cdots & 0 & 0\\
0 & 0 & -2\alpha & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & -2\alpha \\
0 & 0 & 0 & 0 & \cdots & -2\alpha & 0 \\
\end{bmatrix}
\]
We can also make these SEMs connected by introducing an auxiliary vertex 0 that is connected to the topmost vertex of each of the $\frac{p}{2}$ components. The difference precision matrix is then the same as before, except for one extra row and column of all zeros.
\end{proof}

\subsection{Proof of Theorem \ref{thm:population}}
\begin{proof}
We prove Theorem \ref{thm:population} by induction on the number of variables in the system.

\textbf{Inductive Hypothesis:} Assume that Theorem \ref{thm:population} is true for all systems with $k$ or fewer variables, for some $k\geq 0$.

\textbf{Base Case:} For $k=0$, the system has no variables, and the graphs are empty. Theorem \ref{thm:population} trivially holds in this case.

\textbf{Inductive Step:} Now, consider a system with $k+1$ variables. In the first iteration of Algorithm \ref{alg:exact_alg}, the set \( S \) corresponds to the DN level-0 of \( \Delta \). Note that $S$ is non-empty: the two SEMs share a topological ordering and therefore have at least one common terminal, which will be in $S$. According to Assumption \ref{ass:dn_levels}, the DN level of any vertex is greater than or equal to its topological level. Hence, \( S \) is the set of terminal vertices of \( \Delta \), as the topological level of non-terminals is at least 1. By Assumption \ref{ass:level_sets} and Lemma \ref{lem:removal_all}, these terminals are also terminals of \( G^{\cup} \). Therefore, by Lemma \ref{lem:edges_to_terminals}, Algorithm \ref{alg:exact_alg} correctly identifies the incoming edges on this layer 0 as the corresponding non-zero entries in the rows/columns of \( \Delta_\Omega \). Since the variables in $S$ are terminals in both SEMs, removing them does not introduce any hidden confounders into the system. Thus, both SEMs remain causally sufficient linear SEMs. Because we remove the variables in $S$ from the system all-at-once, Assumption \ref{ass:level_sets} and Assumption \ref{ass:dn_levels} still hold in the new system. Therefore, we now have a smaller system with $k$ or fewer variables, under the same conditions. Hence, by the induction hypothesis, Theorem \ref{thm:population} holds for this new system, i.e., Algorithm \ref{alg:exact_alg} correctly identifies the $\Delta$ of the remaining system, and we have already identified the edges to $S$. Therefore, Algorithm \ref{alg:exact_alg} correctly learns the $\Delta$ for the system on $k+1$ variables.

Therefore, by induction, Theorem \ref{thm:population} holds for any number of variables.
\end{proof}