% !TEX root =  ../main.tex
\section{The network sheaf and cosheaf of causal knowledge}\label{sec:CS_CSprob}
The last step toward our proposed \emph{relativity of causal knowledge} is representing causal knowledge over the endogenous variables within a network.
To this aim, each node within the network is attached to the causal knowledge entailed by a certain SCM via \Cref{th:encoding_functor}.
Looking at a single node $\rho$, we interpret it as the perspective of the causal knowledge on itself.
Then, when two nodes $\rho$ and $\sigma$ are connected by an edge, say $\tau: \rho \sim \sigma$, they are related by a shared IC \alphaabs living on the edge $\tau$.
The causal knowledge at $\rho$ can be transported via $\tau$ to $\sigma$, giving rise to the relative causal knowledge (RCK): \emph{the causal knowledge of $\rho$ from the perspective of $\sigma$.}\\
To make this point clearer, consider two AI agents with their subjective causal models \scm{\rho} and \scm{\sigma} about a certain phenomenon.
By \Cref{th:encoding_functor}, we can map \scm{\rho} and \scm{\sigma} to the corresponding causal knowledge, viz. $\mathsf{CK}^\rho\equiv\CK{\scm{\rho}}$ and $\mathsf{CK}^\sigma\equiv\CK{\scm{\sigma}}$.
The causal model \scm{\rho} admits an IC causal abstraction, viz. \scm{\tau}.
The same abstraction holds for \scm{\sigma}.
Acting on \scm{\tau} as well, the encoding functor provides us with a convex space $\mathsf{CK}^\tau$, which can be interpreted as a \emph{backbone space} on which $\mathsf{CK}^\rho$ and $\mathsf{CK}^\sigma$ can be related to each other.
In particular, the AI agents use the backbone space to \emph{embed} the causal knowledge of the other into their own.
Intuitively, consider that the AI agent $\rho$ \emph{projects} via an abstraction morphism an observational/interventional probability measure onto the backbone space $\mathsf{CK}^\tau$, matching an abstracted measure $\chi^{\rho,\tau}$.
The latter can be observed by the AI agent $\sigma$, which embeds $\chi^{\rho,\tau}$ in its own causal knowledge, obtaining the relative $\chi^{\rho,\sigma}$, namely the causal probability measure entailed by $\scm{\rho}$ from the perspective of $\scm{\sigma}$.\\
Using the categorical foundations laid in \Cref{sec:cat_scm_ck,sec:encoding_ck}, we formalize RCK through the \emph{network sheaf and cosheaf of causal knowledge}, the latter being the dual construction of the former.
Here, the \enquote{dual} nomenclature is used with a slight abuse of terminology to mimic the jargon used in Abelian categories \cite{curry2014sheaves,hansen2019toward}.
These mathematical objects consist of \emph{(i)} a network, \emph{(ii)} convex spaces encoding causal knowledge on nodes and edges, and \emph{(iii)} mappings to move from nodes to edges in the case of the sheaf and vice-versa for the cosheaf. \\
First, we introduce the network as a topological object. 
Recall that the network is shaped by the existence of an IC CA between SCMs.
Such a network is shared by the network sheaf and cosheaf.
\begin{definition}[Network]\label{def:net}
    A (finite) network $G\coloneqq(\mathcal{N}, \mathcal{E})$ consists of: 
    (i) nodes $\rho \in \mathcal{N}$ homeomorphic to a point (0-dimensional open ball), 
    and (ii) edges $\tau \in \mathcal{E}$ homeomorphic to an open interval (1-dimensional open ball).
    The closure of each edge is the edge itself plus the two nodes at its boundary, whereas each node is its own closure, as there are no lower-dimensional constituents of the network.
    The face incidence relation induces a partial order \faceincidenceposet on the set of nodes and edges, that is, $\rho \trianglelefteq \tau$ if and only if $\rho$ (node) belongs to the closure of $\tau$ (edge).
\end{definition}
Second, the convex spaces of probability measures are objects in \CSprob.
For the network sheaf, they are dubbed \emph{stalks} and are given by the causal knowledge on the network constituents encoded by the functor $E$.
Consider $\tau: \rho \sim \sigma$.
The stalk at $\rho$ is the image $E(\scm{\rho})$ in \CSprob of the SCM $\scm{\rho}$; similarly for $\sigma$.
The stalk at $\tau$ instead is the image $E(\scm{\tau})$ in \CSprob of the causal abstraction shared by $\rho$ and $\sigma$, viz. $\scm{\tau}$.
The role of the CA is key to relating the causal knowledge at $\rho$ and $\sigma$ as it provides a \emph{backbone space} where the causal knowledge can be compared.
For the network cosheaf instead, the convex spaces are dubbed \emph{costalks}.
Consider again $\tau: \rho \sim \sigma$.
The costalk at $\tau$, viz. $\widehat{E}(\scm{\tau})$, coincides with $E(\scm{\tau})$.
The costalk at $\rho$ instead is a convex space of probability measures $\widehat{E}(\scm{\rho})$, embedded in $E(\scm{\rho})$ as specified in the sequel.\\
Third, the mappings manifest as morphisms in the target category, viz. \CSprob.
Specifically, in the case of the network sheaf, we \emph{project} $E(\scm{\rho})$ onto $E(\scm{\tau})$ via the IC CA map, hereinafter \emph{restriction map} and denoted by $\alphamap{\myendogenous}^{\rho\trianglelefteq\tau}$.
Conversely, in the case of the network cosheaf, we \emph{embed} $E(\scm{\tau})$ into $E(\scm{\rho})$ obtaining $\widehat{E}(\scm{\rho})$ via an affine measurable map $\betamap{\myendogenous}^{\rho\trianglelefteq\tau}$, hereinafter \emph{extension map}.
At this point, we can formally define the \emph{embedded costalk of causal knowledge} as the convex space $\widehat{E}(\scm{\rho})=\langle \widehat{\Delta}_{(\myendogenousvals,\, \Omega)_\rho}, cc_{\lambda} \rangle$, where the set of probability measures $\widehat{\Delta}_{(\myendogenousvals,\, \Omega)_\rho} \subseteq \Delta_{(\myendogenousvals,\, \Omega)_\rho}$ is
\begin{equation}\label{eq:embedded_costalk_set}
    \widehat{\Delta}_{(\myendogenousvals,\, \Omega)_\rho} \coloneqq \{ \widehat{\chi}^\rho \text{ on } (\myendogenousvals,\, \Omega)_\rho \,:\, \alphamap{\myendogenous}^{\rho\trianglelefteq\tau}(\widehat{\chi}^\rho)=\chi^\tau \}\,.
\end{equation}
Note that \Cref{eq:embedded_costalk_set} and \Cref{th:ca_affine_functions} guarantee that $\widehat{E}(\scm{\rho})$ admits $E(\scm{\tau})$ as an IC CA.
We deliberately use the verbs \enquote{to project} and \enquote{to embed} to stress that, in the node-edge transition formalized by the network sheaf, some of the causal knowledge idiosyncratic to the node is not transported into the backbone space. Therefore, in the reverse edge-node transition formalized by the network cosheaf, it is only possible to integrate the transported causal knowledge into the causal knowledge of the node, with no guarantee of perfect reconstruction.
This concept is essential for us to properly define the transfer of probability measures on the network.
At this point, we are ready to define the network sheaf and cosheaf of causal knowledge.
\begin{definition}[Network sheaf of causal knowledge]\label{def:net_sheaf_csprob}
    Given a network $G\coloneqq(\mathcal{N}, \mathcal{E})$ with face incidence poset $\faceincidenceposet$, a network sheaf valued in \CSprob is a functor $F: \faceincidenceposet \rightarrow \CSprob$ assigning (i) to each node $\rho$ and edge $\tau$ in \faceincidenceposet stalks consisting of convex spaces of probability measures, $E(\scm{\rho})$ and $E(\scm{\tau})$, respectively; and (ii) to each node-edge incidence relation a restriction map $\alphamap{\myendogenous}^{\rho\trianglelefteq\tau}: E(\scm{\rho}) \rightarrow E(\scm{\tau})$ being the affine endogenous component within an IC \alphaabs between \scm{\rho} and \scm{\tau}.
\end{definition}
\begin{definition}[Network cosheaf of causal knowledge]\label{def:net_cosheaf_csprob}
    Given a network $G\coloneqq(\mathcal{N}, \mathcal{E})$ with face incidence poset $\faceincidenceposet$, a network cosheaf valued in \CSprob is a functor $\widehat{F}: \faceincidenceposet^{\mathrm{op}} \rightarrow \CSprob$ assigning (i) to each node $\rho$ and edge $\tau$ in $\faceincidenceposet^{\mathrm{op}}$ costalks consisting of convex spaces of probability measures, $\widehat{E}(\scm{\rho})$ and $\widehat{E}(\scm{\tau})\equiv E(\scm{\tau})$, respectively; and (ii) to each node-edge incidence relation an extension map $\betamap{\myendogenous}^{\rho\trianglelefteq\tau}: \widehat{E}(\scm{\tau}) \rightarrow \widehat{E}(\scm{\rho})$.
\end{definition}
The network sheaf and cosheaf fulfill two complementary functions: the former transports causal knowledge from the nodes to a backbone space where such knowledge can be compared; the latter distributes causal knowledge from the backbone space to the network nodes.
\Cref{def:net_sheaf_csprob,def:net_cosheaf_csprob} are particular cases of more general network sheaves and cosheaves valued in \CSprob, in which all causality-related requirements on the (co)stalks and on the restriction (extension) maps are dropped. For the network sheaf, we refer to a $0$-cochain as a collection of (interventional or observational) probability measures, one for each node.
The value of the $0$-cochain at the node $\rho$ is $\chi^\rho$.
Additionally, the $1$-cochain is a collection of probability measures on the edges, representing the IC CA of those on the nodes.
Hence, an induced value of the $1$-cochain at the  edge $\tau$ incident to a node $\rho$ is the pushforward probability measure $\chi^\tau=\alphamap{\myendogenous}^{\rho\trianglelefteq\tau}\left(\chi^\rho\right)$.
We denote the space of $0$- and $1$-cochains by $\mathcal{C}^0(G; F)$ and $\mathcal{C}^1(G; F)$, respectively.
An object of interest in sheaf theory \cite{curry2014sheaves} is the \emph{global section} of the network sheaf.
Loosely speaking, it is a consistent assignment of data to each node of the network that does not break local rules.
\begin{definition}[Global section]\label{def:global_section}
    Consider $F$ as in \Cref{def:net_sheaf_csprob}. A global section $\chi$ of $F$ is a choice $\chi^\rho$ in $E(\scm{\rho})$ for each node $\rho$ of $G$ such that $\chi^\tau=\alphamap{\myendogenous}^{\rho\trianglelefteq\tau}\left(\chi^\rho\right)$, for all $\rho\trianglelefteq\tau$.
    The space of global sections of $F$ is denoted by $\Gamma(G; F)$.
\end{definition}
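Unfolding \Cref{def:global_section} on a single edge $\tau: \rho \sim \sigma$ makes the agreement requirement explicit. Since $\chi^\tau$ must coincide with the projection coming from each of the two incident nodes, the condition reads
\begin{equation*}
    \alphamap{\myendogenous}^{\rho\trianglelefteq\tau}\left(\chi^\rho\right) \;=\; \chi^\tau \;=\; \alphamap{\myendogenous}^{\sigma\trianglelefteq\tau}\left(\chi^\sigma\right)\,,
\end{equation*}
that is, the measures chosen at $\rho$ and $\sigma$ become indistinguishable once projected onto the backbone space $E(\scm{\tau})$, although they may still differ on the node stalks.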
Intuitively, in our RCK, a global section manifests itself as a collection of as many consistent (non-)interventions as the number of connected components of the network $G$. 
Specifically, consider a single connected component: according to our modeling, each pair of nodes connected by an edge admits a shared IC CA. 
Hence, choosing a consistent (non-)intervention for each node results in a global section for which any suitable notion of distance (e.g., an information-theoretic metric or a $\phi$-divergence) between the probability measures projected onto each edge vanishes.
This is a direct consequence of the interventional consistency inherently ensured by the restriction maps.\\
Regarding the network cosheaf $\widehat{F}$, the $1$-chain agrees with the $1$-cochain of $F$, that is, a collection of probability measures representing the IC CA of those on the nodes.
The $0$-chain is a collection of probability measures, one for each node, satisfying the projection requirement in \Cref{eq:embedded_costalk_set}.
We denote by $\mathcal{C}^0(G; \widehat{F})$ and $\mathcal{C}^1(G; \widehat{F})$ the spaces of $0$- and $1$-chains of $\widehat{F}$.
Ultimately, we can pose the formal definition of \emph{relative causal knowledge}:
\begin{tcolorbox}[
    colback=white,
    colframe=black,
    boxrule=0.5pt,
    before=\par\smallskip\centering,
    after=\par\smallskip,
    boxsep=0pt
]
\begin{definition}[Relative Causal Knowledge]\label{def:rel_caus_know}
Given a network $G\coloneqq(\mathcal{N},\mathcal{E})$, the network sheaf and cosheaf of causal knowledge on $G$, $F$ and $\widehat{F}$, respectively, and two nodes $\rho_1$ and $\sigma_k$ connected by a path of $k$ edges $\tau_i:\rho_i \sim \sigma_i$, with $i \in [k]$, the relative causal knowledge is the causal knowledge at $\rho_1$ from the perspective of $\sigma_k$, filtered by the shared, abstract causal knowledge on the edges $\tau_i$, $i \in [k]$ (i.e., it is path-dependent).
Specifically,
\begin{align}\label{eq:rel_caus_know}
        &\mathsf{CK}^{\rho_1,\sigma_k}\coloneqq\{\nonumber \\
        &\chi^{\rho_1,\sigma_k} \!=\! \betamap{\myendogenous}^{\sigma_k \trianglelefteq \tau_k} \!\circ\! \alphamap{\myendogenous}^{\rho_k \trianglelefteq \tau_k}\! \circ \! \ldots \! \circ \! \betamap{\myendogenous}^{\sigma_1 \trianglelefteq \tau_1} \!\circ \!\alphamap{\myendogenous}^{\rho_1 \trianglelefteq \tau_1} (\chi^{\rho_1})\,,\nonumber\\
        & \,\chi^{\rho_1} \in E(\scm{\rho_1})\}\,.
\end{align}
\end{definition}
\end{tcolorbox}
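For intuition, consider the shortest possible path, $k=1$, with a single edge $\tau: \rho \sim \sigma$. In this case \Cref{eq:rel_caus_know} reduces to the two-step transfer sketched at the beginning of the section, namely a projection followed by an embedding:
\begin{equation*}
    \chi^{\rho,\tau} = \alphamap{\myendogenous}^{\rho\trianglelefteq\tau}\left(\chi^{\rho}\right)\,, \qquad
    \chi^{\rho,\sigma} = \betamap{\myendogenous}^{\sigma\trianglelefteq\tau}\left(\chi^{\rho,\tau}\right)\,.
\end{equation*}
Assuming, as \Cref{eq:embedded_costalk_set} suggests, that the extension map takes values in the embedded costalk at $\sigma$, the relative measure projects back onto the same abstracted measure, i.e., $\alphamap{\myendogenous}^{\sigma\trianglelefteq\tau}(\chi^{\rho,\sigma}) = \chi^{\rho,\tau}$. In words, $\sigma$ sees of $\rho$ only what survives the projection onto the backbone space.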
We end the section with a working example on the transfer of causal knowledge.
Further discussion and examples of RCK are provided in \Cref{app:disc&ex}.

\spara{Setup.}
Consider a network with three nodes and two edges, forming a chain.
We assign to the leftmost node the causal knowledge of subject A, viz. $\mathsf{CK}^A$. Similarly, we attach to the central and rightmost nodes the causal knowledge of subjects B and C, viz. $\mathsf{CK}^B$ and $\mathsf{CK}^C$.
The endogenous variables for the subjects are $\mathcal{A}=\{A_1,A_2,A_3\}$, $\mathcal{B}=\{B_1,B_2,B_3, B_4, B_5\}$, $\mathcal{C}=\{C_1,C_2,C_3\}$, composing non-intervened SCMs $\mathsf{M}^A$, $\mathsf{M}^B$, and $\mathsf{M}^C$.
No subject has any information about the SCMs of the others.
$\mathsf{M}^A$ and $\mathsf{M}^B$ admit a shared CA $\mathsf{M}^{X}$ living on the shared edge, consisting of a single causal variable $X$ for simplicity. Specifically, for A the CA structure is $\{A_1, A_2\}\rightarrow X$; for B it is $\{B_1,B_2,B_3\}\rightarrow X$.
$\mathsf{M}^B$ and $\mathsf{M}^C$ admit a shared CA $\mathsf{M}^{Y}$, given by a single causal variable $Y$. Here, for B the structure is $\{B_3,B_4,B_5\}\rightarrow Y$; for C it is $\{C_1,C_2\}\rightarrow Y$.
For convenience, consider the restriction and extension maps, viz. $\alphamap{\myendogenous}^{\rho \trianglelefteq \tau}$ and $\betamap{\myendogenous}^{\rho \trianglelefteq \tau}$ to be linear (i.e., matrices). 
For A, we denote the restriction by $\mathbf{F}^{A\trianglelefteq X} \in \mathbb{R}^{1 \times 3}$ and the extension by $\widehat{\mathbf{F}}^{A\trianglelefteq X} \in \mathbb{R}^{3 \times 1}$. 
Similarly, for B we have $\mathbf{F}^{B\trianglelefteq X}$, $\mathbf{F}^{B\trianglelefteq Y}$ in $\mathbb{R}^{1 \times 5}$ as restrictions and $\widehat{\mathbf{F}}^{B\trianglelefteq X}$, $\widehat{\mathbf{F}}^{B\trianglelefteq Y}$ in $\mathbb{R}^{5 \times 1}$ as extensions.
Finally, for C we have $\mathbf{F}^{C\trianglelefteq Y} \in \mathbb{R}^{1 \times 3}$ and $\widehat{\mathbf{F}}^{C\trianglelefteq Y} \in \mathbb{R}^{3 \times 1}$.
Zero entries in the restriction and extension matrices account for the fact that some variables are not relevant to the CA, a.k.a. non-constructive abstraction.
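To visualize these sparsity patterns, the CA structures stated above translate into restriction matrices of the following form, where the nonzero entries $a_i$, $b_i$, $b'_i$, and $c_i$ are hypothetical placeholders introduced only for illustration:
\begin{align*}
    \mathbf{F}^{A\trianglelefteq X} &= \begin{bmatrix} a_1 & a_2 & 0 \end{bmatrix}, &
    \mathbf{F}^{B\trianglelefteq X} &= \begin{bmatrix} b_1 & b_2 & b_3 & 0 & 0 \end{bmatrix}, \\
    \mathbf{F}^{C\trianglelefteq Y} &= \begin{bmatrix} c_1 & c_2 & 0 \end{bmatrix}, &
    \mathbf{F}^{B\trianglelefteq Y} &= \begin{bmatrix} 0 & 0 & b'_3 & b'_4 & b'_5 \end{bmatrix}.
\end{align*}
In this sketch, $A_3$ and $C_3$ are not abstracted at all, whereas $B_3$ is the only variable of B contributing to both $X$ and $Y$. The extension matrices would plausibly share the transposed zero patterns, although this is an assumption rather than a requirement stated in the setup.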

\spara{Relative Causal Knowledge.}
The subjects perform certain tasks through individual soft-intervened SCMs,
entailing the soft-intervened probability measures $\chi_\mathcal{S}^A=N(0, \boldsymbol{\Sigma}_\mathcal{S}^A)$, $\chi_\mathcal{S}^B=N(0, \boldsymbol{\Sigma}_\mathcal{S}^B)$, and $\chi_\mathcal{S}^C=N(0, \boldsymbol{\Sigma}_\mathcal{S}^C)$. 
The collection of these measures constitutes a valuation of the node stalks of the network sheaf, i.e., a $0$-cochain.
When projected onto the edges via the above CAs, the $0$-cochain entails the $1$-cochain, that is, a collection of probability measures for the edges $X$ and $Y$.
Then, \Cref{def:rel_caus_know} specifies for instance how B and C see A (i.e., $\chi_\mathcal{S}^A$) from their perspectives.
Specifically, the relative soft-intervened measures are:\\
\emph{(i)} for B, $\chi_\mathcal{S}^{A,B}=N(0,\widehat{\mathbf{F}}^{B\trianglelefteq X} \mathbf{F}^{A\trianglelefteq X}\boldsymbol{\Sigma}_\mathcal{S}^A \mathbf{F}^{{A\trianglelefteq X}^\top}\widehat{\mathbf{F}}^{{B\trianglelefteq X}^\top})$ $=N(0, \boldsymbol{\Sigma}_\mathcal{S}^{A,B})$;\\
\emph{(ii)} for C, $\chi_\mathcal{S}^{A,C}=N(0,\widehat{\mathbf{F}}^{C\trianglelefteq Y} \mathbf{F}^{B\trianglelefteq Y}\boldsymbol{\Sigma}_\mathcal{S}^{A,B}\mathbf{F}^{{B\trianglelefteq Y}^\top}\widehat{\mathbf{F}}^{{C\trianglelefteq Y}^\top})$ $=N(0, \boldsymbol{\Sigma}_\mathcal{S}^{A,C})$.
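
As a purely illustrative instance, with all numerical values hypothetical, take $\boldsymbol{\Sigma}_\mathcal{S}^A=\mathbf{I}_3$, restrictions $\mathbf{F}^{A\trianglelefteq X}=[1\;1\;0]$, $\mathbf{F}^{B\trianglelefteq X}=[1\;1\;1\;0\;0]$, $\mathbf{F}^{B\trianglelefteq Y}=[0\;0\;1\;1\;1]$, $\mathbf{F}^{C\trianglelefteq Y}=[1\;1\;0]$, and extensions $\widehat{\mathbf{F}}^{B\trianglelefteq X}=\tfrac{1}{3}[1\;1\;1\;0\;0]^\top$, $\widehat{\mathbf{F}}^{C\trianglelefteq Y}=\tfrac{1}{2}[1\;1\;0]^\top$. The extensions are chosen, as an assumption, so that $\mathbf{F}^{B\trianglelefteq X}\widehat{\mathbf{F}}^{B\trianglelefteq X}=\mathbf{F}^{C\trianglelefteq Y}\widehat{\mathbf{F}}^{C\trianglelefteq Y}=1$, in the spirit of \Cref{eq:embedded_costalk_set}. Then
\begin{align*}
    \mathbf{F}^{A\trianglelefteq X}\boldsymbol{\Sigma}_\mathcal{S}^A \mathbf{F}^{{A\trianglelefteq X}^\top} &= 2\,, \\
    \boldsymbol{\Sigma}_\mathcal{S}^{A,B} &= 2\,\widehat{\mathbf{F}}^{B\trianglelefteq X}\widehat{\mathbf{F}}^{{B\trianglelefteq X}^\top} = \tfrac{2}{9}\,[1\;1\;1\;0\;0]^\top[1\;1\;1\;0\;0]\,, \\
    \mathbf{F}^{B\trianglelefteq Y}\boldsymbol{\Sigma}_\mathcal{S}^{A,B}\mathbf{F}^{{B\trianglelefteq Y}^\top} &= \tfrac{2}{9}\,, \\
    \boldsymbol{\Sigma}_\mathcal{S}^{A,C} &= \tfrac{2}{9}\,\widehat{\mathbf{F}}^{C\trianglelefteq Y}\widehat{\mathbf{F}}^{{C\trianglelefteq Y}^\top} = \tfrac{1}{18}\,[1\;1\;0]^\top[1\;1\;0]\,.
\end{align*}
Under these choices, the variance that A induces on $X$ equals $2$, whereas only $2/9$ reaches $Y$: in the embedded measure, $B_3$ is the sole variable of B shared by the two abstractions, so most of the transported knowledge is filtered out along the path.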

\spara{Global section.}
A key point of our framework is the possibility of investigating the global agreement among subjects in terms of causal knowledge. 
Such a global agreement gives rise to the global section in \Cref{def:global_section}.
In the working example, we have a global section when the soft-interventions performed by the subjects are such that \\
$\mathbf{F}^{A\trianglelefteq X}\boldsymbol{\Sigma}_\mathcal{S}^A \mathbf{F}^{{A\trianglelefteq X}^\top}=\mathbf{F}^{B\trianglelefteq X}\boldsymbol{\Sigma}_\mathcal{S}^B \mathbf{F}^{{B\trianglelefteq X}^\top}$ (section on $X$)\\
and\\
$\mathbf{F}^{B\trianglelefteq Y}\boldsymbol{\Sigma}_\mathcal{S}^{B}\mathbf{F}^{{B\trianglelefteq Y}^\top}=\mathbf{F}^{C\trianglelefteq Y}\boldsymbol{\Sigma}_\mathcal{S}^{C}\mathbf{F}^{{C\trianglelefteq Y}^\top}$ (section on $Y$).
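
Since all measures are zero-mean Gaussians and both edge variables are scalar, each of the two conditions above is an equality between two variances. As a minimal check under hypothetical values, take $\mathbf{F}^{A\trianglelefteq X}=[1\;1\;0]$, $\mathbf{F}^{B\trianglelefteq X}=[1\;1\;1\;0\;0]$, $\mathbf{F}^{B\trianglelefteq Y}=[0\;0\;1\;1\;1]$, $\mathbf{F}^{C\trianglelefteq Y}=[1\;1\;0]$, together with $\boldsymbol{\Sigma}_\mathcal{S}^A=\mathbf{I}_3$, $\boldsymbol{\Sigma}_\mathcal{S}^B=\tfrac{2}{3}\mathbf{I}_5$, and $\boldsymbol{\Sigma}_\mathcal{S}^C=\mathbf{I}_3$. Then
\begin{align*}
    \mathbf{F}^{A\trianglelefteq X}\boldsymbol{\Sigma}_\mathcal{S}^A \mathbf{F}^{{A\trianglelefteq X}^\top} &= 2 = 3\cdot\tfrac{2}{3} = \mathbf{F}^{B\trianglelefteq X}\boldsymbol{\Sigma}_\mathcal{S}^B \mathbf{F}^{{B\trianglelefteq X}^\top}\,, \\
    \mathbf{F}^{B\trianglelefteq Y}\boldsymbol{\Sigma}_\mathcal{S}^{B}\mathbf{F}^{{B\trianglelefteq Y}^\top} &= 3\cdot\tfrac{2}{3} = 2 = \mathbf{F}^{C\trianglelefteq Y}\boldsymbol{\Sigma}_\mathcal{S}^{C}\mathbf{F}^{{C\trianglelefteq Y}^\top}\,,
\end{align*}
so this triple of soft-interventions would form a global section. Replacing $\boldsymbol{\Sigma}_\mathcal{S}^B$ with $\mathbf{I}_5$ instead gives a variance of $3$ on both edges from the side of B, against $2$ from A and C, and the agreement fails.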
