
\subsection{Prerequisites}
Throughout, we use the term \emph{group} to refer to a set of scalar random variables that together form one random vector. For a positive integer \(p\) we define \([p] = \{1, \ldots, p\}\). Further, let \(\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_p)\) be a tuple of \(p\) random vectors, where each \(\mathbf{X}_g = (X_1^{(g)}, \ldots, X_{d_g}^{(g)})\) takes values in \(\mathbb{R}^{d_g}\), with \(d_g \in \mathbb{N}\), for \(g \in [p]\). We refer to \(\mathbf{X}_g\) as the \(g\)-th group. Denote by \(P_{\mathbf{X}}\) the joint distribution of \(\mathbf{X}\) on \(\mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_p}\). If it exists, we denote the (joint) density of \(\mathbf{X}\) by \(p_{\mathbf{X}}\).
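
To make the indexing concrete, the following Python sketch stores one realization of \(\mathbf{X}\) as a flat vector of length \(d_1 + \cdots + d_p\) and recovers the groups \(\mathbf{X}_1, \ldots, \mathbf{X}_p\) as contiguous blocks. The helper names \texttt{group\_slices} and \texttt{split\_groups} are hypothetical and chosen only for illustration; Python indices are 0-based, so group \(g \in [p]\) sits at position \(g-1\).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_slices(dims: Sequence[int]) -> List[slice]:
    """Hypothetical helper: one slice per group into a flat vector of length sum(dims).

    Group g (1-based in the text, index g-1 here) occupies the d_g coordinates
    that follow the blocks of groups 1, ..., g-1.
    """
    if any(d < 1 for d in dims):
        raise ValueError("every group dimension d_g must be a positive integer")
    slices, start = [], 0
    for d in dims:
        slices.append(slice(start, start + d))
        start += d
    return slices


def split_groups(flat: Sequence[T], dims: Sequence[int]) -> List[List[T]]:
    """Hypothetical helper: split a flat realization of X into (X_1, ..., X_p)."""
    if len(flat) != sum(dims):
        raise ValueError("length of the flat vector must equal d_1 + ... + d_p")
    return [list(flat[s]) for s in group_slices(dims)]


# Example: p = 3 groups with (d_1, d_2, d_3) = (2, 1, 3).
print(split_groups([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [2, 1, 3]))
# [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]]
```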
Let \(\mathcal{G} = (V,E)\) be a directed acyclic graph (DAG) with vertex set \(V = [p]\) and edge set \(E \subseteq V \times V\).
Each vertex \(g \in V\) indexes the random vector \(\mathbf{X}_g\) in \(\mathbf{X}\), and, adopting the framework of graphical models, we assume that \(P_\mathbf{X}\) factorizes according to \(\mathcal{G}\) as follows:
\begin{equation*}
  P_{\mathbf{X}}(\mathbf{X}) = \prod_{g=1}^p P_{\mathbf{X}_g}(\mathbf{X}_g \mid \mathbf{X}_{pa_{\mathcal{G}}(g)}),
\end{equation*}
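where \(pa_{\mathcal{G}}(g)\) denotes the set of parents of node \(g\) in \(\mathcal{G}\) and \(\mathbf{X}_{pa_{\mathcal{G}}(g)} = (\mathbf{X}_j)_{j \in pa_{\mathcal{G}}(g)}\). We refer to \citet{Drton2017} for more details on graphical models.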
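
As a small illustration (the graph is chosen here only for exposition), let \(p = 3\) and consider the chain \(1 \to 2 \to 3\), i.e., \(E = \{(1,2),(2,3)\}\), so that \(pa_{\mathcal{G}}(1) = \emptyset\), \(pa_{\mathcal{G}}(2) = \{1\}\) and \(pa_{\mathcal{G}}(3) = \{2\}\). The factorization then reads
\begin{equation*}
  P_{\mathbf{X}}(\mathbf{X}) = P_{\mathbf{X}_1}(\mathbf{X}_1)\, P_{\mathbf{X}_2}(\mathbf{X}_2 \mid \mathbf{X}_1)\, P_{\mathbf{X}_3}(\mathbf{X}_3 \mid \mathbf{X}_2),
\end{equation*}
which implies \(\mathbf{X}_3 \indep \mathbf{X}_1 \mid \mathbf{X}_2\), whatever the group dimensions \(d_1, d_2, d_3\) are.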

We call \(j\) a parent of \(g\) if \((j,g) \in E\). The \emph{descendants} of \(g\) are \(\mathrm{desc}_{\mathcal{G}}(g)=\{w\in V : \text{there exists a directed path from } g \text{ to } w\}\), and its \emph{non-descendants} are \(nd_{\mathcal{G}}(g) = V\setminus\bigl(\{g\}\cup\mathrm{desc}_{\mathcal{G}}(g)\bigr)\).
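
These graph notions can be computed directly from the edge set. The Python sketch below assumes that \(E\) is given as a list of pairs \((j,g)\) over \(V=[p]\); the function names are hypothetical. Descendants are found by a depth-first search along directed edges, so \(g\) itself is never included when \(\mathcal{G}\) is acyclic.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (j, g) means j -> g


def parents(g: int, edges: Iterable[Edge]) -> Set[int]:
    """Hypothetical helper: pa(g) = {j : (j, g) in E}."""
    return {j for (j, h) in edges if h == g}


def descendants(g: int, edges: Iterable[Edge]) -> Set[int]:
    """Hypothetical helper: all w reachable from g by a directed path of length >= 1."""
    children: Dict[int, List[int]] = {}
    for (j, h) in edges:
        children.setdefault(j, []).append(h)
    seen: Set[int] = set()
    stack = list(children.get(g, []))
    while stack:  # iterative depth-first search
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(children.get(w, []))
    return seen


def non_descendants(g: int, p: int, edges: Iterable[Edge]) -> Set[int]:
    """Hypothetical helper: nd(g) = V minus ({g} union desc(g)), with V = {1, ..., p}."""
    return set(range(1, p + 1)) - ({g} | descendants(g, list(edges)))


# Example DAG on V = {1, 2, 3, 4} with E = {(1,2), (2,3), (1,4)}.
E = [(1, 2), (2, 3), (1, 4)]
print(parents(3, E), descendants(2, E), non_descendants(2, 4, E))
```

For this example one obtains \(pa(3)=\{2\}\), \(\mathrm{desc}(2)=\{3\}\) and \(nd(2)=\{1,4\}\).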
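Whenever the graph is clear from the context, we omit the subscript. Note that, as long as \(P_\mathbf{X}\) has a density with respect to some product measure, the factorization agrees fully with the scalar case: it is the usual DAG factorization, with each node carrying a random vector instead of a scalar random variable.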
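
Written out under this density assumption, suppose \(P_{\mathbf{X}}\) has a density \(p_{\mathbf{X}}\) with respect to a product measure \(\mu_1 \otimes \cdots \otimes \mu_p\), where each \(\mu_g\) is a measure on \(\mathbb{R}^{d_g}\) (e.g., Lebesgue measure). The factorization can then be expressed through conditional densities as
\begin{equation*}
  p_{\mathbf{X}}(\mathbf{x}_1, \ldots, \mathbf{x}_p) = \prod_{g=1}^p p_{\mathbf{X}_g \mid \mathbf{X}_{pa(g)}}\bigl(\mathbf{x}_g \mid \mathbf{x}_{pa(g)}\bigr), \qquad \mathbf{x}_g \in \mathbb{R}^{d_g},
\end{equation*}
up to a \((\mu_1 \otimes \cdots \otimes \mu_p)\)-null set, where the factor for \(g\) is the marginal density of \(\mathbf{X}_g\) whenever \(pa(g) = \emptyset\). This is the familiar scalar DAG factorization, except that each conditional density is now defined on \(\mathbb{R}^{d_g}\).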
The above model can also be expressed as a structural equation model (SEM) adapted to the grouped setting:

\begin{definition}\label{def:SEM}
  Let \(\mathbf{X}=(\mathbf{X}_g)_{g\in[p]}\) and
  \(\mathbf{N}=(\mathbf{N}_g)_{g\in[p]}\) be
  jointly distributed random vectors, where \(\mathbf{X}_g\) is \(d_g\)-dimensional and where
  the noise groups \(\mathbf{N}_1, \ldots, \mathbf{N}_p\) are jointly independent.
  If there exists a
  directed acyclic graph \(\mathcal{G}_0\)
  on \([p]\) and a sequence of functions \(\mathcal{F}=(f_1,\ldots,f_p)\), where each \(f_g\) takes values in \(\mathbb{R}^{d_g}\),
  such that
  \begin{equation}\label{eq:sem}
    \mathbf{X}_g = f_g(\mathbf{X}_{pa_{\mathcal{G}_0}(g)}, \mathbf{N}_g)
  \end{equation}
  for all \(g\in[p]\), then $(\mathbf{X},\mathbf{N}, \mathcal{F},\mathcal{G}_0)$ is a \emph{grouped
  structural equation model} (GSEM).
\end{definition}
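
Definition~\ref{def:SEM} also suggests how to simulate from a GSEM: visit the groups in a topological order of \(\mathcal{G}_0\), draw each noise vector \(\mathbf{N}_g\) independently of the others, and evaluate (\ref{eq:sem}). The Python sketch below implements this ancestral sampling scheme. All names, the convention that \(f_g\) receives its parent groups in increasing index order, and the particular noise law are assumptions made for illustration only. The hypothetical shared-shock sampler gives each \(\mathbf{N}_g\) dependent components, while separate draws for different \(g\) keep the noise groups independent.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Edge = Tuple[int, int]
# Assumed convention: f_g gets its parent groups in increasing index order, then N_g.
StructFn = Callable[[List[Vector], Vector], Vector]
NoiseSampler = Callable[[random.Random], Vector]


def topological_order(p: int, edges: Sequence[Edge]) -> List[int]:
    """Kahn's algorithm on V = [p]; raises ValueError if the graph has a cycle."""
    indeg = {g: 0 for g in range(1, p + 1)}
    children: Dict[int, List[int]] = {g: [] for g in range(1, p + 1)}
    for j, h in edges:
        children[j].append(h)
        indeg[h] += 1
    ready = [g for g in range(1, p + 1) if indeg[g] == 0]
    order: List[int] = []
    while ready:
        g = ready.pop()
        order.append(g)
        for h in children[g]:
            indeg[h] -= 1
            if indeg[h] == 0:
                ready.append(h)
    if len(order) != p:
        raise ValueError("G_0 is not acyclic")
    return order


def sample_gsem(
    p: int,
    edges: Sequence[Edge],
    dims: Dict[int, int],
    funcs: Dict[int, StructFn],
    noise: Dict[int, NoiseSampler],
    rng: random.Random,
) -> Dict[int, Vector]:
    """Hypothetical sampler: one draw of X via X_g = f_g(X_pa(g), N_g) in topological order."""
    pa = {g: sorted(j for j, h in edges if h == g) for g in range(1, p + 1)}
    x: Dict[int, Vector] = {}
    for g in topological_order(p, edges):
        n_g = noise[g](rng)  # fresh draw per group: noise groups are independent
        x_g = funcs[g]([x[j] for j in pa[g]], n_g)  # all parents are already sampled
        if len(x_g) != dims[g]:
            raise ValueError(f"f_{g} must return a vector of length d_{g} = {dims[g]}")
        x[g] = x_g
    return x


def shared_shock_noise(d: int) -> NoiseSampler:
    """Hypothetical noise law: N_g = (Z_0 + Z_1, ..., Z_0 + Z_d) with Z_i iid N(0, 1).

    The common shock Z_0 makes the components of N_g dependent (correlation 1/2).
    """
    def draw(rng: random.Random) -> Vector:
        z0 = rng.gauss(0.0, 1.0)
        return [z0 + rng.gauss(0.0, 1.0) for _ in range(d)]
    return draw


# Example: p = 3, E = {(1,2), (1,3), (2,3)}, group sizes (d_1, d_2, d_3) = (2, 1, 2).
edges = [(1, 2), (1, 3), (2, 3)]
dims = {1: 2, 2: 1, 3: 2}
funcs: Dict[int, StructFn] = {
    1: lambda pa, n: n,                                # root: X_1 = N_1
    2: lambda pa, n: [math.tanh(sum(pa[0])) + n[0]],   # nonlinear in X_1
    3: lambda pa, n: [pa[0][0] * pa[1][0] + n[0], pa[1][0] + n[1]],
}
noise = {g: shared_shock_noise(d) for g, d in dims.items()}
x = sample_gsem(3, edges, dims, funcs, noise, random.Random(0))
```

Here \texttt{x} maps each group index \(g\) to a list of length \(d_g\). Replacing \texttt{funcs} or \texttt{noise} changes the GSEM without touching the sampling loop, which only relies on \(\mathcal{G}_0\) being acyclic.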
If all groups are one-dimensional, i.e., \(d_g = 1\) for all \(g \in [p]\), a GSEM reduces to a standard SEM~\citep{Bollen1989}.
Note that while Definition~\ref{def:SEM}
requires the noise groups to be jointly independent, it allows arbitrary dependence
among the components within each noise group \(\mathbf{N}_g\).
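
For instance, assume additionally, for this illustration only, that each \(\mathbf{N}_g\) takes values in \(\mathbb{R}^{d_g}\) and has finite second moments. Independence across groups then forces the covariance matrix of \(\mathbf{N}\) to be block diagonal, while each diagonal block may be an arbitrary covariance matrix:
\begin{equation*}
  \operatorname{Cov}(\mathbf{N}) =
  \begin{pmatrix}
    \Sigma_1 & 0 & \cdots & 0 \\
    0 & \Sigma_2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & \Sigma_p
  \end{pmatrix},
  \qquad \Sigma_g = \operatorname{Cov}(\mathbf{N}_g) \in \mathbb{R}^{d_g \times d_g}.
\end{equation*}
A concrete choice is \(\mathbf{N}_g = (Z_0 + Z_1, \ldots, Z_0 + Z_{d_g})\) with \(Z_0, \ldots, Z_{d_g}\) i.i.d.\ standard normal and drawn independently for each \(g\). Then \(\Sigma_g\) has diagonal entries \(2\) and off-diagonal entries \(1\), so the components of \(\mathbf{N}_g\) are dependent. In the scalar case \(d_g = 1\), every block is a scalar and \(\operatorname{Cov}(\mathbf{N})\) is diagonal, as for a standard SEM with independent errors.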
