\documentclass[twocolumn]{aastex631}

\usepackage{amsmath}
\usepackage{multirow}
\usepackage{natbib}
\usepackage{graphicx} 
\usepackage{aas_macros}

\begin{document}

The objective of this study is to characterize the geometric structure of the 10-dimensional latent space generated by a Physics-Informed Neural Network (PINN) trained to solve the 2D Burgers' equation. We investigate how the latent representations corresponding to different initial conditions are organized within this space and how their structure relates across an ensemble of 25 distinct initial conditions. Our methodology consists of data preparation, Principal Component Analysis (PCA) to characterize the dimensionality and variance distribution of the latent vector sets, and subspace similarity measures to compare the orientations of principal subspaces across initial conditions.

\subsection{Latent Space Data Preparation}

The data used in this analysis originate from a pre-trained PINN solving the 2D Burgers' equation over a specified spatiotemporal domain. The data were provided as a NumPy array \texttt{data\_bundle} with dimensions $(101, 103, 25, 13)$. These dimensions correspond to spatial grid points ($x$-coordinate), time steps ($t$), initial condition index, and features, respectively. The spatial grid consists of 101 points along the $x$-axis, and the temporal domain is discretized into 103 time steps. The dataset includes solutions and latent space representations for 25 different initial conditions. The features dimension (size 13) contains the predicted solution components (e.g., velocity fields $u$ and $v$) and the 10-dimensional latent vector output by an intermediate layer of the PINN for each spatial point $x$ and time step $t$ under a specific initial condition.

The 10-dimensional latent space data were extracted from the last 10 components of the features dimension. This resulted in a tensor \texttt{latent\_space\_data} with dimensions $(101, 103, 25, 10)$. Each element \texttt{latent\_space\_data[i, j, k, :]} represents the 10-dimensional latent vector $L(x_i, t_j, \text{IC}_k)$ corresponding to the spatial point $x_i$, time $t_j$, and the $k$-th initial condition $\text{IC}_k$. For each initial condition $k$, the set of latent vectors $\{L(x_i, t_j, \text{IC}_k)\}$ over all $i = 0, \dots, 100$ and $j = 0, \dots, 102$ forms a collection of $101 \times 103 = 10403$ points in the 10-dimensional latent space $\mathbb{R}^{10}$. This collection is treated as a point cloud representing the PINN's latent encoding of the physical solution for initial condition $\text{IC}_k$.

\subsection{Geometric Analysis Techniques}

To analyze the structure of these point clouds and their relationships, we employed Principal Component Analysis (PCA) and subspace similarity measures.

\subsubsection{Principal Component Analysis (PCA)}

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The principal components are the eigenvectors of the data's covariance matrix, and their corresponding eigenvalues represent the variance along those directions.
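
In symbols, a minimal statement of this construction for $N$ vectors $z_1, \dots, z_N \in \mathbb{R}^{10}$ reads as follows. The notation ($z_n$, $\bar{z}$, $\Sigma$, $\lambda_m$, $\rho_m$) and the $1/(N-1)$ normalization are introduced here for illustration only; the choice of normalization affects neither the eigenvectors nor the explained-variance ratios.
\begin{equation}
\bar{z} = \frac{1}{N}\sum_{n=1}^{N} z_n, \qquad
\Sigma = \frac{1}{N-1}\sum_{n=1}^{N} (z_n - \bar{z})(z_n - \bar{z})^{\top}
\end{equation}
\begin{equation}
\Sigma\, v_m = \lambda_m v_m, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{10} \ge 0, \qquad
\rho_m = \frac{\lambda_m}{\sum_{m'=1}^{10} \lambda_{m'}}
\end{equation}
Here $v_m$ is the $m$-th principal component and $\rho_m$ is the fraction of the total variance it explains, so the cumulative explained variance of the first $d$ components is $\sum_{m=1}^{d} \rho_m$.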

In this study, PCA was applied in several contexts:
\begin{itemize}
    \item \textbf{Global PCA:} PCA was applied to the entire collection of latent vectors across all spatial points, time steps, and initial conditions. The \texttt{latent\_space\_data} tensor was reshaped into a 2D matrix of size $(101 \times 103 \times 25, 10)$, effectively treating all $10403 \times 25 = 260075$ latent vectors as a single dataset in $\mathbb{R}^{10}$. This global PCA reveals the overall dimensionality and dominant directions of variation within the latent space spanned by all observed states. The eigenvalues were used to calculate the percentage of total variance explained by each principal component and the cumulative variance, providing an estimate of the effective global dimensionality.
    \item \textbf{Per-Initial Condition PCA:} For each of the 25 initial conditions, PCA was applied independently to the set of $10403$ latent vectors $\{L(x_i, t_j, \text{IC}_k)\}$ corresponding to that specific initial condition $k$. For each IC $k$, the data \texttt{latent\_space\_data[:, :, k, :]} was reshaped into a 2D matrix of size $(10403, 10)$. This per-IC PCA characterizes the intrinsic dimensionality and shape of the point cloud associated with a single physical solution. The centroid (mean vector) $C_k$ of the point cloud for IC $k$ was calculated, and the eigenvalues and eigenvectors (principal components) of its covariance matrix were obtained. The eigenvalues indicate the variance along the principal directions, and the eigenvectors form an orthonormal basis for the principal subspace capturing the data's variation. The cumulative variance explained by the principal components for each IC was analyzed to determine the effective intrinsic dimensionality of the manifold for that specific initial condition.
    \item \textbf{PCA on Centroids:} The centroids $C_k$ for each of the 25 initial conditions are 10-dimensional vectors. These 25 centroid vectors were collected into a 2D matrix of size $(25, 10)$. PCA was applied to this matrix to analyze the geometric arrangement of the manifold centroids in the latent space. This reveals whether the variation in initial conditions primarily translates the latent manifold along a low-dimensional path or occupies a more complex structure in the latent space.
\end{itemize}
For all PCA applications, the data was centered by subtracting the mean before computing the covariance matrix and performing the eigenvalue decomposition.
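
The three analyses above are linked by a standard variance decomposition, sketched here under stated assumptions. Write $N = 10403$ for the number of points per initial condition and $K = 25$ for the number of initial conditions, and use population ($1/N$) normalization; the symbols $\Sigma_k$, $\bar{C}$, and $\Sigma_{glob}$ are notation introduced for this illustration.
\begin{equation}
C_k = \frac{1}{N}\sum_{i,j} L(x_i, t_j, \text{IC}_k), \qquad
\bar{C} = \frac{1}{K}\sum_{k} C_k
\end{equation}
\begin{equation}
\Sigma_k = \frac{1}{N}\sum_{i,j} \bigl(L(x_i, t_j, \text{IC}_k) - C_k\bigr)\bigl(L(x_i, t_j, \text{IC}_k) - C_k\bigr)^{\top}
\end{equation}
\begin{equation}
\Sigma_{glob} = \frac{1}{K}\sum_{k} \Sigma_k \;+\; \frac{1}{K}\sum_{k} (C_k - \bar{C})(C_k - \bar{C})^{\top}
\end{equation}
The first term is the average within-IC covariance (the object of the per-IC PCA) and the second is the covariance of the centroids (the object of the PCA on centroids). The identity is exact in this setting because every initial condition contributes the same number of points, so $\bar{C}$ coincides with the global mean.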

\subsubsection{Subspace Similarity Measures}

To compare the orientations of the principal subspaces identified by the per-IC PCA, we employed subspace similarity measures. For each initial condition $k$, the per-IC PCA yields a set of principal components $\{v_{k,1}, v_{k,2}, \dots, v_{k,10}\}$ ordered by their corresponding eigenvalues. Based on the cumulative variance explained, we determined an effective intrinsic dimensionality $d_{ic}$ for the individual manifolds (e.g., the number of components capturing 95\% of variance). The principal subspace for IC $k$ is then approximated by the span of its first $d_{ic}$ principal components, $\text{span}\{v_{k,1}, \dots, v_{k,d_{ic}}\}$.
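
The threshold rule can be made explicit as follows. This is a sketch: the threshold $\tau$ (for example $\tau = 0.95$, as in the example above), the per-IC dimension $d_k(\tau)$, and the eigenvalue notation $\lambda_{k,m}$ for the $m$-th largest eigenvalue of IC $k$ are assumptions introduced here.
\begin{equation}
d_k(\tau) = \min\left\{ d \in \{1, \dots, 10\} \;:\; \frac{\sum_{m=1}^{d} \lambda_{k,m}}{\sum_{m=1}^{10} \lambda_{k,m}} \ge \tau \right\}
\end{equation}
Comparing subspaces across initial conditions requires a single common dimension. One possible choice, though not the only one, is $d_{ic} = \max_k d_k(\tau)$, which guarantees that every per-IC subspace captures at least a fraction $\tau$ of its own variance.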

To quantify the similarity between the principal subspaces of two initial conditions $k$ and $j$ (here $j$ indexes an initial condition, not a time step), we compared their sets of principal vectors $\{v_{k,1}, \dots, v_{k,d_{ic}}\}$ and $\{v_{j,1}, \dots, v_{j,d_{ic}}\}$. A basis-independent measure of subspace similarity is given by the principal angles between the two subspaces. Alternatively, for small $d_{ic}$, the similarity can be approximated by comparing corresponding principal vectors. For instance, the alignment of the primary direction of variation is measured by the absolute dot product $|v_{k,1} \cdot v_{j,1}|$, where the absolute value removes the arbitrary sign of each eigenvector. A value close to 1 indicates strong alignment, while a value close to 0 indicates orthogonality. We computed these measures for pairs of corresponding principal vectors (e.g., $v_{k,1}$ vs.\ $v_{j,1}$, $v_{k,2}$ vs.\ $v_{j,2}$) across all pairs of initial conditions to assess the consistency in manifold orientation. A high average subspace similarity across all pairs of ICs indicates that the principal directions of variation for the latent manifolds are largely parallel, which is consistent with the manifolds being approximately translated versions of each other.
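
For reference, the principal angles can be obtained from a singular value decomposition. Let $V_k = [v_{k,1}, \dots, v_{k,d_{ic}}]$ denote the $10 \times d_{ic}$ matrix whose orthonormal columns are the retained principal vectors of IC $k$. The scalar summary $s_{kj}$ below is one common choice and is an assumption made for illustration, not necessarily the aggregate used in this analysis; $Y$ and $Z$ are the orthogonal factors of the decomposition.
\begin{equation}
V_k^{\top} V_j = Y \,\operatorname{diag}(\cos\theta_1, \dots, \cos\theta_{d_{ic}})\, Z^{\top}, \qquad
0 \le \theta_1 \le \dots \le \theta_{d_{ic}} \le \frac{\pi}{2}
\end{equation}
\begin{equation}
s_{kj} = \frac{1}{d_{ic}} \sum_{m=1}^{d_{ic}} \cos^2\theta_m = \frac{1}{d_{ic}} \bigl\lVert V_k^{\top} V_j \bigr\rVert_F^2 \in [0, 1]
\end{equation}
With this definition, $s_{kj} = 1$ if and only if the two subspaces coincide and $s_{kj} = 0$ if and only if they are mutually orthogonal. Unlike the vector-by-vector comparison, $s_{kj}$ is unaffected by the sign or ordering of the principal vectors, or by any rotation of the basis within either subspace, which matters when neighboring eigenvalues are nearly equal.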

\subsection{Analysis Workflow}

The analysis was structured in a sequence of steps to progressively reveal the geometric structure of the latent space and the encoding of initial conditions:

\subsubsection{Initial Exploratory Data Analysis}
We began by performing global PCA on the entire collection of latent vectors to understand the overall distribution and effective dimensionality of the combined dataset. Concurrently, we performed per-IC PCA for each of the 25 initial conditions to obtain individual centroids and principal components, characterizing the typical intrinsic dimensionality and variance structure of a single manifold. Finally, PCA was applied to the set of 25 centroids to understand how the mean positions of the manifolds are organized.

\subsubsection{Characterization of Individual Manifolds}
Based on the per-IC PCA results, we determined the effective intrinsic dimensionality $d_{ic}$ for the latent point cloud of each initial condition. We approximated each point cloud as an affine subspace defined by its centroid $C_k$ and the span of its first $d_{ic}$ principal component vectors $V_k = [v_{k,1}, \dots, v_{k,d_{ic}}]$. The eigenvalues associated with these vectors provided insight into the extent of the manifold along each principal direction.
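
The affine approximation and its accuracy can be written explicitly. The symbols $\hat{L}$ and $\varepsilon_k$ are introduced here for illustration, and $\lambda_{k,m}$ denotes the $m$-th largest eigenvalue of the covariance matrix of IC $k$.
\begin{equation}
\hat{L}(x_i, t_j, \text{IC}_k) = C_k + V_k V_k^{\top} \bigl( L(x_i, t_j, \text{IC}_k) - C_k \bigr)
\end{equation}
\begin{equation}
\varepsilon_k = \frac{\sum_{i,j} \bigl\lVert L(x_i, t_j, \text{IC}_k) - \hat{L}(x_i, t_j, \text{IC}_k) \bigr\rVert^2}{\sum_{i,j} \bigl\lVert L(x_i, t_j, \text{IC}_k) - C_k \bigr\rVert^2}
= 1 - \frac{\sum_{m=1}^{d_{ic}} \lambda_{k,m}}{\sum_{m=1}^{10} \lambda_{k,m}}
\end{equation}
The relative residual $\varepsilon_k$ is therefore the fraction of variance left outside the affine subspace. If $d_{ic}$ is chosen so that the first $d_{ic}$ components capture at least 95\% of the variance of IC $k$, then $\varepsilon_k \le 0.05$.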

\subsubsection{Comparative Analysis Across Initial Conditions}
We systematically compared the characterized manifolds across the 25 initial conditions. The analysis of centroids (PCA on $\{C_k\}$) revealed the structure of the path traced by the manifold centers as the initial condition changes. Subspace similarity measures were computed for pairs of principal subspaces $\text{span}(V_k)$ to quantify how similarly oriented the manifolds are. By combining the information from centroid locations and manifold orientations, we assessed whether the primary effect of changing the initial condition is a simple translation, a rotation, or a more complex transformation of a fundamental latent structure. We also specifically analyzed the set of first principal vectors $\{v_{k,1}\}$ across all ICs using PCA to see if the dominant direction of variation for individual manifolds exhibits a structured, possibly low-dimensional, variation across ICs.
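
One caveat applies to the last step: an eigenvector is defined only up to sign, so $v_{k,1}$ and $-v_{k,1}$ describe the same direction, and PCA on the raw set $\{v_{k,1}\}$ can be distorted by arbitrary sign flips. Two ways of handling this are sketched below; neither is confirmed as the procedure used here, and the reference vector $r$ (for example, the first principal vector of one fixed IC) and the matrix $M$ are hypothetical notation. The first option fixes the signs relative to $r$,
\begin{equation}
\tilde{v}_{k,1} = \operatorname{sign}(v_{k,1} \cdot r)\, v_{k,1},
\end{equation}
and the second works with a sign-invariant matrix built from all $K = 25$ initial conditions,
\begin{equation}
M = \frac{1}{K} \sum_{k} v_{k,1} v_{k,1}^{\top}, \qquad \operatorname{tr}(M) = 1 .
\end{equation}
Because each $v_{k,1}$ has unit norm, the eigenvalues of $M$ are non-negative and sum to 1, and the leading eigenvalue equals 1 if and only if all first principal vectors are parallel up to sign.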

\subsubsection{Relation to Global Latent Space Structure}
Finally, we related the local structures (individual manifolds) to the global structure identified by the global PCA. We projected the centered latent vectors $L(x_i, t_j, \text{IC}_k) - C_k$ for each IC $k$ onto the dominant subspace identified by the global PCA to see how much of the per-IC variance is aligned with the global principal directions. We also examined the alignment between the per-IC principal subspaces $\text{span}(V_k)$ and the global principal subspace $\text{span}(U_{glob})$, where the columns of $U_{glob}$ are the leading global principal components.
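
The projection step can be quantified as follows, as a sketch under stated assumptions. Let $U_{glob}$ be the $10 \times d_{glob}$ matrix whose orthonormal columns are the first $d_{glob}$ global principal components, where $d_{glob}$ is a hypothetical name for the number of retained global components, and let $\Sigma_k$ denote the covariance matrix of the latent vectors of IC $k$. The captured-variance fraction $f_k$ is notation introduced here.
\begin{equation}
f_k = \frac{\sum_{i,j} \bigl\lVert U_{glob}^{\top} \bigl( L(x_i, t_j, \text{IC}_k) - C_k \bigr) \bigr\rVert^2}{\sum_{i,j} \bigl\lVert L(x_i, t_j, \text{IC}_k) - C_k \bigr\rVert^2}
= \frac{\operatorname{tr}\bigl( U_{glob}^{\top} \Sigma_k U_{glob} \bigr)}{\operatorname{tr}(\Sigma_k)} \in [0, 1]
\end{equation}
A value of $f_k$ near 1 indicates that the variance of IC $k$ lies almost entirely within the global principal subspace, whereas a small $f_k$ indicates that the dominant global directions mostly reflect differences between initial conditions rather than variation within IC $k$.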

\subsubsection{Synthesis}
The findings from these analyses were synthesized to provide a comprehensive geometric description of the PINN's latent space. We described the typical intrinsic dimensionality of the latent representation for a single solution, the extent to which these representations form affine manifolds, how these manifolds are related across different initial conditions (e.g., by translation along a low-dimensional path, by consistent orientation), and how these local structures relate to the overall structure of the latent space. This synthesis allowed us to draw conclusions about how the PINN efficiently encodes the initial condition within its internal representation.

\end{document}
                