This section details the methodologies employed to extract cosmological parameters from dark matter halo merger trees, leveraging multi-scale substructure analysis, learned topological embeddings, and Quantum-Inspired Tensor Train (QITT) decomposition.
\subsection{Dataset and Data Preprocessing}
The dataset comprises 1000 dark matter halo merger trees \citep{jiang2013generatingmergertreesdark,nguyen2025emulatingdarkmatterhalo}, each provided as a PyTorch Geometric \texttt{Data} object \citep{nguyen2025emulatingdarkmatterhalo}. These trees originate from 40 distinct cosmological simulations \citep{yung2024characterisingultrahighredshiftdarkmatter,nguyen2025emulatingdarkmatterhalo}, with 25 trees generated per simulation. Each simulation corresponds to a unique set of cosmological parameters, specifically $\Omega_m$ (the matter density parameter) and $\sigma_8$ (the amplitude of linear matter fluctuations on $8\,h^{-1}$\,Mpc scales).

Each node within a merger tree represents a dark matter halo at a specific cosmic time and is characterized by a 4-dimensional feature vector: $\log_{10}(\text{mass})$, $\log_{10}(\text{concentration})$, $\log_{10}(V_{\text{max}})$, and \texttt{scale\_factor} \citep{parkinson2007generatingdarkmatterhalo,nguyen2025emulatingdarkmatterhalo}. The \texttt{edge\_index} attribute defines the progenitor-descendant relationships within each tree \citep{nguyen2025emulatingdarkmatterhalo}. The target variables for prediction are $\Omega_m$ and $\sigma_8$, which are associated with each entire merger tree.
\subsubsection{Data Preprocessing Steps}
Prior to any analysis, the node features were normalized to ensure consistent scaling across the dataset \citep{yang2021featurenorml2featurenormalization,skryagin2024graphneuralnetworksneed}. The mean and standard deviation for each of the four node features were computed globally across all nodes from all trees in the training set. Subsequently, each node feature $x$ was normalized using the formula: $x_{\text{normalized}} = (x - \mu) / \sigma$, where $\mu$ is the global mean and $\sigma$ is the global standard deviation for that feature \citep{skryagin2024graphneuralnetworksneed}. The target variables, $\Omega_m$ and $\sigma_8$, were used directly for regression without further transformation.
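As an illustrative sketch (not the authors' code), the global z-score normalization described above can be written as follows; the variable names and the random stand-in data are assumptions for demonstration only:

```python
import numpy as np

# Hypothetical stand-in for the per-node feature matrices of each training tree:
# a list of (num_nodes_i, 4) arrays holding log10(mass), log10(concentration),
# log10(Vmax), and scale_factor.
rng = np.random.default_rng(0)
train_trees = [rng.normal(size=(n, 4)) for n in (12, 30, 7)]

# Global statistics over ALL nodes of ALL training trees, per feature column.
all_nodes = np.concatenate(train_trees, axis=0)   # shape (sum of num_nodes, 4)
mu = all_nodes.mean(axis=0)                       # global mean per feature
sigma = all_nodes.std(axis=0)                     # global std per feature

def normalize(x, mu=mu, sigma=sigma):
    """Apply the z-score normalization x -> (x - mu) / sigma."""
    return (x - mu) / sigma

normalized_trees = [normalize(t) for t in train_trees]
```

Computing $\mu$ and $\sigma$ once over the pooled training nodes (rather than per tree) keeps the feature scales comparable across trees of very different sizes.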
\subsubsection{Data Splitting}
The dataset of 1000 merger trees \citep{davies2025efficientsimulationdiscretegalaxy} was partitioned into training, validation, and testing sets following a 70-15-15 split. To prevent data leakage due to potential correlations between trees originating from the same cosmological simulation, the splitting was performed at the simulation level \citep{bernardini2025ember2emulatingbaryonsdark}. Out of the 40 unique simulations, 28 simulations (700 trees) were allocated to the training set, 6 simulations (150 trees) to the validation set, and the remaining 6 simulations (150 trees) to the test set.
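The simulation-level split can be sketched as below; the array names and the use of a fixed random seed are illustrative assumptions, but the 28/6/6 allocation matches the text:

```python
import numpy as np

# 40 simulations, 25 trees each; tree i belongs to simulation sim_of_tree[i].
n_sims, trees_per_sim = 40, 25
sim_of_tree = np.repeat(np.arange(n_sims), trees_per_sim)   # 1000 entries

rng = np.random.default_rng(42)
sim_order = rng.permutation(n_sims)

# Split at the SIMULATION level (28 / 6 / 6) so trees that share the same
# cosmological parameters never appear in more than one split.
train_sims, val_sims, test_sims = sim_order[:28], sim_order[28:34], sim_order[34:]

train_idx = np.flatnonzero(np.isin(sim_of_tree, train_sims))
val_idx = np.flatnonzero(np.isin(sim_of_tree, val_sims))
test_idx = np.flatnonzero(np.isin(sim_of_tree, test_sims))
```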
\subsection{Multi-Scale Substructure Identification}
To move beyond global tree properties and capture fine-grained cosmological imprints, we systematically identified significant substructures within each dark matter merger tree \citep{jiang2025selfsimilardecompositionhierarchicalmerger}. A substructure is defined as a significant progenitor branch that either merges into a more massive main branch or exhibits substantial changes in its intrinsic halo properties \citep{chandrogómez2025accuracydarkmatterhalo}.
\subsubsection{Substructure Definition and Extraction}
The process of substructure identification involved traversing each merger tree from its main root halo (typically the halo with the largest mass at the latest \texttt{scale\_factor}) \citep{rangel2020buildinghalomergertrees,jung2024mergertreebasedgalaxymatching}. Merger events, defined as instances where a halo has multiple direct progenitors, served as primary indicators for substructure origins \citep{rangel2020buildinghalomergertrees,arendt2024identifyinggalaxyclustermergers}. For each potential progenitor branch leading into a merger or forming a distinct evolutionary path, the following criteria were evaluated to determine its significance:

\begin{enumerate}
    \item \textbf{Merger Mass Ratio:} The relative mass of the merging branch, quantified as $\log_{10}(M_{\text{progenitor}} / M_{\text{descendant}})$, where $M_{\text{progenitor}}$ is the mass of the substructure's root halo and $M_{\text{descendant}}$ is the mass of the main branch halo it merges into. Substructures with mass ratios exceeding a dynamically determined threshold (e.g., the top 10\% of mass ratios within each tree) were considered significant.
    \item \textbf{Significant Property Changes:} Changes in the normalized $\log_{10}(\text{concentration})$ and $\log_{10}(V_{\text{max}})$ along a branch were monitored. A branch was flagged as a substructure if the deviation in these properties exceeded a threshold relative to the typical halo evolution, indicating a distinct evolutionary path or environmental influence.
\end{enumerate}
Each identified significant substructure was then represented as a separate graph, inheriting its constituent halos (nodes) and their progenitor-descendant relationships (edges) from the original merger tree  \citep{robles2019halomergertreegeneration,chandrogómez2025accuracydarkmatterhalo}. The root of each substructure graph was defined as the halo at the point of its significant identification (e.g., just before a major merger or at the onset of a property deviation).
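The merger-event detection that seeds this procedure can be sketched directly from the \texttt{edge\_index} attribute; the toy tree below is an illustrative assumption:

```python
import numpy as np

# Toy edge_index in PyTorch Geometric convention, shape (2, E): row 0 holds
# progenitor (source) halos, row 1 their descendants. In this illustrative
# tree, halos 1 and 2 both merge into halo 0.
edge_index = np.array([[1, 2, 3, 4],
                       [0, 0, 1, 2]])

# A merger event is a halo with more than one direct progenitor, i.e. a
# descendant index appearing at least twice in row 1.
descendants, counts = np.unique(edge_index[1], return_counts=True)
merger_nodes = descendants[counts > 1]   # candidate anchor points for substructures
```

Each branch flowing into a node of `merger_nodes` would then be tested against the mass-ratio and property-change criteria above before being promoted to a substructure graph.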
\subsection{Feature Extraction for Substructures}
For each identified substructure \citep{tola2024topertopologicalembeddingsgraph}, a comprehensive feature vector was constructed by combining physical properties with learned topological embeddings \citep{zhou2025preservingtopologicalgeometricembeddings,wei2025integratingphysicstopologyneural}.
\subsubsection{Physical Features}
A 10-dimensional physical feature vector was engineered for each substructure \citep{owens2024describetransformmachinelearning,sen2025featureengineeringdeadreviving}. These features quantify the intrinsic properties and interaction history of the substructure \citep{owens2024describetransformmachinelearning,bhardwaj2024foundationsautomaticfeatureextraction}:
\begin{enumerate}
    \item \textbf{Mass Ratio:} $\log_{10}(M_{\text{substructure root}} / M_{\text{main branch at merger}})$.
    \item \textbf{Merger Scale Factor:} The \texttt{scale\_factor} at which the substructure's root halo merges into a larger branch.
    \item \textbf{Property Differences at Merger:} Difference in normalized $\log_{10}(\text{concentration})$ and $\log_{10}(V_{\text{max}})$ between the substructure's root halo and its parent in the main branch at the time of merging.
    \item \textbf{Substructure Intrinsic Properties:} The mean and standard deviation of the normalized $\log_{10}(\text{mass})$, $\log_{10}(\text{concentration})$, and $\log_{10}(V_{\text{max}})$ across all halos within the substructure graph. This accounts for 6 features (mean and std for 3 properties), which together with the 4 features from items 1--3 yields the 10-dimensional vector.
\end{enumerate}
These 10 features provide a quantitative description of the substructure's physical characteristics and its interaction with the larger cosmic web \citep{hunde2025caughtcosmicwebenvironmental,bahe2025galaxiessimulatedcosmicweb}.
\subsubsection{Learned Topological Embeddings}
To capture the intricate connectivity patterns and relational information within each substructure, a Graph Neural Network (GNN) was employed to learn low-dimensional topological embeddings \citep{song2021topologicalregularizationgraphneural,tola2024topertopologicalembeddingsgraph,li2025functionalconnectivitygraphneural}.
\begin{enumerate}
    \item \textbf{GNN Architecture:} A GraphSAGE autoencoder was utilized for this purpose. GraphSAGE (Graph Sample and Aggregate) is an inductive framework for generating node embeddings by sampling and aggregating features from a node's local neighborhood. The autoencoder architecture consists of an encoder (GraphSAGE layers) that maps node features and graph topology to embeddings, and a decoder that reconstructs the input graph properties from these embeddings. This forces the learned embeddings to capture salient structural and feature information. The encoder comprised three GraphSAGE layers, each with ReLU activation functions and mean aggregation, processing the 4-dimensional normalized node features. The output dimension of the GNN for each node embedding was 64.
    \item \textbf{GNN Pre-training and Application:} The GraphSAGE autoencoder was pre-trained separately on a large corpus of generated graphs, including a subset of the merger trees, to learn robust, generalizable topological representations. Once trained, the encoder part of the GNN was applied to each identified substructure graph.
    \item \textbf{Graph-Level Embedding:} After generating 64-dimensional node embeddings for all halos within a substructure, a global mean pooling operation was applied. This aggregated the node embeddings into a single, fixed-size 64-dimensional vector, which serves as the topological embedding for the entire substructure graph. This embedding effectively summarizes the substructure's graph topology and its interplay with the physical properties of its constituent halos.
\end{enumerate}
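A minimal numpy sketch of the mean-aggregation GraphSAGE layer and the global mean pooling described above is given below. This is a simplified stand-in for the torch\_geometric implementation: the random weights, the toy chain graph, and the symmetrized adjacency are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE layer with mean aggregation: each node combines its own
    features with the mean of its neighbours' features, then applies ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    agg = adj @ h
    neigh_mean = np.divide(agg, deg, out=np.zeros_like(agg), where=deg > 0)
    return np.maximum(h @ w_self + neigh_mean @ w_neigh, 0.0)

# Toy substructure: 5 halos with 4 normalized features, chain topology.
h = rng.normal(size=(5, 4))
adj = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[u, v] = adj[v, u] = 1.0   # symmetrized so information flows both ways

# Three layers with random weights, final width 64 as in the encoder above.
dims = [4, 32, 64, 64]
for d_in, d_out in zip(dims[:-1], dims[1:]):
    w_self = rng.normal(scale=0.1, size=(d_in, d_out))
    w_neigh = rng.normal(scale=0.1, size=(d_in, d_out))
    h = sage_layer(h, adj, w_self, w_neigh)

# Global mean pooling -> one 64-dimensional embedding per substructure graph.
graph_embedding = h.mean(axis=0)
```

In the actual pipeline the weights come from the pre-trained autoencoder's encoder rather than from a random initialization.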
\subsection{Tensor Construction}
The combined physical and topological features from all substructures within a merger tree were organized into a fixed-shape tensor, enabling unified processing and subsequent Quantum-Inspired Tensor Train (QITT) decomposition.
\subsubsection{Feature Concatenation and Tensor Dimensions}
For each substructure, its 10-dimensional physical feature vector was concatenated with its 64-dimensional learned topological embedding  \citep{su2024bghgnnscalableefficientheterogeneous,e2024tangnnconcisescalableeffective}. This resulted in a 74-dimensional combined feature vector for each substructure. For a given merger tree, if $N_{\text{sub}}$ substructures were identified, a tensor of shape $(N_{\text{sub}}, 74)$ was initially formed.
\subsubsection{Padding Strategy for Fixed Shape}
Since the number of identified substructures ($N_{\text{sub}}$) varied across trees, a fixed tensor shape was required for batch processing and QITT input. Based on a preliminary analysis of the distribution of substructure counts, the maximum number of substructures, $N_{\text{sub}}^{\max}$, was set to 60, consistent with the total feature count of $60 \times 74 = 4440$ reported in the abstract.

For trees with fewer than $N_{\text{sub}}^{\max}$ substructures, padding was applied. A ``null'' substructure embedding was generated: its physical features were set to zero vectors, and its 64-dimensional topological embedding was obtained by applying the pre-trained GraphSAGE GNN \citep{fan2024generalizinggraphneuralnetworks,petkar2025graphtalkswhoslistening} to a canonical single-node graph with average feature values. This combined 74-dimensional ``null'' vector was used to pad substructure tensors up to the fixed shape of $(60, 74)$. Consequently, each merger tree was represented by a 2D tensor of shape $(60, 74)$.
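The padding step can be sketched as follows; the constant placeholder standing in for the pre-trained encoder's null embedding is an assumption for illustration:

```python
import numpy as np

MAX_SUB, FEAT = 60, 74

# Hypothetical "null" substructure vector: zero physical features plus a
# placeholder embedding (in the real pipeline this comes from applying the
# pre-trained GraphSAGE encoder to a canonical single-node graph).
null_vector = np.concatenate([np.zeros(10), np.full(64, 0.01)])

def pad_tree(sub_feats):
    """Pad an (n_sub, 74) feature array with null vectors up to (60, 74)."""
    n_sub = sub_feats.shape[0]
    pad = np.tile(null_vector, (MAX_SUB - n_sub, 1))
    return np.vstack([sub_feats, pad])

# A tree with 23 identified substructures becomes a fixed (60, 74) tensor.
tree_tensor = pad_tree(np.random.default_rng(0).normal(size=(23, FEAT)))
```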
\subsection{Quantum-Inspired Tensor Train (QITT) Decomposition}
The core of our feature engineering pipeline involves applying Quantum-Inspired Tensor Train (QITT) decomposition to the constructed tensors \citep{matsuura2025tensorcrossinterpolationapproach}. QITT efficiently compresses high-dimensional data, extracting a compact and informative lower-dimensional representation \citep{sander2025equivalencecheckingquantumcircuits,matsuura2025tensorcrossinterpolationapproach}.
\subsubsection{Tensor Reshaping and Decomposition}
For each tree, the $(60, 74)$ tensor, representing the collection of all substructures and their combined features, was first flattened into a 1D vector of length $60 \times 74 = 4440$. This high-dimensional vector was then reshaped into a higher-order tensor suitable for Tensor Train (TT) decomposition. Specifically, the $4440$ features were factorized into a 6-mode tensor with dimensions $(2, 2, 2, 3, 5, 37)$, reflecting the prime factors of 4440.

The Tensor Train decomposition \citep{diniz2021tensordecompositionsalgorithmsapplications,chen2023lowranktensortraindecomposition,wang2025effectivealgorithmstensortrain}, implemented using the TensorLy library, factorizes this high-order tensor into a sequence of interconnected smaller tensors, known as TT-cores \citep{chen2023lowranktensortraindecomposition}. The decomposition is defined by its ranks, which control the complexity and compression level \citep{chen2023lowranktensortraindecomposition,wang2025effectivealgorithmstensortrain}. The internal TT-ranks were treated as hyperparameters and tuned to achieve optimal performance \citep{chen2023lowranktensortraindecomposition}. The decomposition was performed as follows:
$$ \mathcal{T}(i_1, i_2, \dots, i_6) \approx \mathcal{G}_1[i_1]\, \mathcal{G}_2[i_2] \cdots \mathcal{G}_6[i_6], $$
where $\mathcal{T}$ is the reshaped 6-mode tensor for a given tree and $\mathcal{G}_d[i_d] \in \mathbb{R}^{r_{d-1} \times r_d}$ is the $i_d$-th matrix slice of the $d$-th TT-core, with boundary ranks $r_0 = r_6 = 1$.
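A plain TT-SVD (sequential truncated SVDs of the successive unfoldings) conveys the idea; this numpy sketch is a simplified stand-in for the TensorLy implementation used in the paper, with a random input vector and an arbitrary uniform rank cap as illustrative assumptions:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a tensor into TT-cores by sequential truncated SVDs."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k, n_k in enumerate(dims[:-1]):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, s.size)
        cores.append(u[:, :r].reshape(r_prev, n_k, r))      # core G_{k+1}
        mat = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))          # final core
    return cores

# Reshape one tree's flattened (60 * 74 = 4440,) feature vector into the
# 6-mode factorization (2, 2, 2, 3, 5, 37) used above, then decompose.
x = np.random.default_rng(0).normal(size=4440)
cores = tt_svd(x.reshape(2, 2, 2, 3, 5, 37), max_rank=4)

# The QITT feature vector is the concatenation of the flattened TT-cores.
qitt_features = np.concatenate([g.ravel() for g in cores])
```

With full (untruncated) ranks the decomposition reconstructs the input exactly; truncating the ranks trades reconstruction fidelity for a much shorter feature vector, which is the compression the pipeline exploits.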
\subsubsection{QITT-Derived Feature Vector}
The resulting TT-cores from the decomposition  \citep{phan2016tensornetworkslatentvariable,wang2025effectivealgorithmstensortrain} were then flattened and concatenated into a single, compact feature vector for each merger tree. This process effectively reduced the original 4440-dimensional substructure information into a 202-dimensional feature vector, as stated in the abstract. The specific ranks for the decomposition were tuned on the validation set to achieve this compact and highly informative representation, balancing compression with predictive power  \citep{phan2016tensornetworkslatentvariable}.
\subsection{Regression Models}
The 202-dimensional QITT-derived feature vectors served as input to various regression models to predict the cosmological parameters $\Omega_m$ and $\sigma_8$.
\subsubsection{Model Selection}
The following regression models were employed:
\begin{enumerate}
    \item \textbf{Linear Regression:} A simple linear model, serving as a baseline to assess the linearity of the relationship between QITT features and cosmological parameters.
    \item \textbf{Random Forest Regressor:} An ensemble learning method based on decision trees, capable of capturing non-linear relationships and providing insights into feature importance.
    \item \textbf{XGBoost (Extreme Gradient Boosting):} A highly efficient and robust gradient boosting framework, known for its strong performance in various machine learning tasks and its ability to handle complex interactions.
\end{enumerate}
\subsubsection{Training and Hyperparameter Tuning}
Each regression model was trained on the QITT-derived features from the training set. Hyperparameter tuning for all models \citep{franceschi2025hyperparameteroptimizationmachinelearning}, including the optimal QITT ranks, was performed using 5-fold cross-validation on the training set, with the primary objective of minimizing the Mean Squared Error (MSE) and maximizing the R-squared ($R^2$) metric \citep{mantovani2023bettertreesempiricalstudy}. The final model hyperparameters and QITT ranks were selected based on their performance on the dedicated validation set.
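The tuning loop can be illustrated with a minimal hand-rolled 5-fold cross-validation for a ridge-regularized linear model; the toy data, the ridge penalty, and the closed-form solver are stand-in assumptions, not the authors' sklearn/XGBoost configuration:

```python
import numpy as np

def kfold_mse(X, y, k=5, ridge=1e-3, seed=0):
    """Mean cross-validated MSE of a ridge-regularized linear model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mses = []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        # Closed-form ridge solution fitted on the k-1 training folds.
        A = X[tr].T @ X[tr] + ridge * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[tr].T @ y[tr])
        mses.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(mses))

# Toy data standing in for QITT features and one target parameter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X @ rng.normal(size=20) + 0.05 * rng.normal(size=200)
cv_mse = kfold_mse(X, y)
```

In the actual pipeline each candidate hyperparameter setting (including the QITT ranks) would be scored by such a cross-validated error, and the best setting confirmed on the held-out validation simulations.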
\subsection{Comparison with Baselines}
To rigorously evaluate the efficacy of our QITT-enhanced framework, its performance was compared against several baseline approaches.
\subsubsection{Baseline Models}
\begin{enumerate}
    \item \textbf{Aggregate Graph-Level Features:} This baseline employed global statistical features extracted from each entire merger tree. Features included total tree mass, average concentration, average $V_{\text{max}}$, average \texttt{scale\_factor} of all halos, total number of nodes, tree depth, and tree width. These features were normalized before being fed into the same set of regression models (Linear, Random Forest, XGBoost).
    \item \textbf{Raw Physical Substructure Features (No QITT, No Topology Embedding):} For this baseline, only the 10-dimensional physical features for each substructure were used. These were concatenated for all $N_{\text{sub}}^{\max}$ substructures (with zero-padding for missing substructures), resulting in a $60 \times 10 = 600$-dimensional feature vector per tree. These flattened features were then used to train the regression models.
    \item \textbf{Graphlet Counts:} This baseline utilized graphlet counts as a basic topological signature. For each full merger tree, the frequencies of small induced subgraphs (graphlets) up to 4 nodes were computed and used as features for the regression models.
    \item \textbf{Topology Embedding but No QITT:} This baseline used the full combined feature vector for each substructure (10 physical + 64 topological = 74 dimensions). These were concatenated for all $N_{\text{sub}}^{\max}$ substructures (with padding), resulting in a $60 \times 74 = 4440$-dimensional feature vector per tree. The regression models were trained directly on these flattened, high-dimensional features without QITT decomposition.
\end{enumerate}
\subsection{Evaluation Metrics and Statistical Significance}
The performance of all models was evaluated on the held-out test set. The primary evaluation metrics were the Root Mean Squared Error (RMSE) and the coefficient of determination ($R^2$) for both $\Omega_m$ and $\sigma_8$. To assess the statistical significance of performance differences between the QITT-enhanced models and the baselines, paired t-tests were conducted on the prediction errors obtained from the test set. A p-value threshold of 0.05 was used to determine statistical significance.
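The paired-test statistic underlying this comparison can be computed as below; the error values are illustrative, and in practice the p-value would come from a Student-$t$ distribution with $n-1$ degrees of freedom (e.g. via `scipy.stats.ttest_rel`):

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """t statistic of a paired t-test on two matched samples of per-tree
    prediction errors (baseline vs. QITT-enhanced model)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # unbiased variance
    return mean / math.sqrt(var / n)

# Toy matched errors: baseline errors (first list) vs. model errors (second).
t = paired_t_statistic([0.9, 1.1, 1.2, 0.8], [0.4, 0.5, 0.6, 0.3])
```

Pairing the errors tree-by-tree removes the between-tree variance from the comparison, which is why the paired test is appropriate when both models are evaluated on the identical held-out set.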