\documentclass[twocolumn]{aastex631}

\usepackage{amsmath}
\usepackage{multirow}
\usepackage{natbib}
\usepackage{graphicx} 
\usepackage{aas_macros}

\begin{document}

\subsection{Data Acquisition and Pre-processing}
Our analysis begins with the acquisition and pre-processing of posterior samples for the gravitational-wave event GW231123. These samples, which represent the posterior probability distributions of the source parameters, were generated using five distinct waveform models: NRSur7dq4, IMRPhenomXO4a, SEOBNRv5PHM, IMRPhenomXPHM, and IMRPhenomTPHM. Each model's posterior samples were provided as an individual CSV file in the directory \texttt{/mnt/ceph/users/fanonymous/AstroPilot/GW/Iteration1/data/}, namely \texttt{GW231123\_NRSur7dq4.csv}, \texttt{GW231123\_IMRPhenomXO4a.csv}, \texttt{GW231123\_SEOBNRv5PHM.csv}, \texttt{GW231123\_IMRPhenomXPHM.csv}, and \texttt{GW231123\_IMRPhenomTPHM.csv}.

Upon loading, each CSV file was parsed into a separate pandas DataFrame \citep{shanbhag2022energyconsumptiondifferentdataframe}. To facilitate unified analysis while preserving model attribution, a \texttt{model} column was appended to each DataFrame, explicitly identifying the waveform model from which the samples originated. These individual DataFrames were then collected into a single Python dictionary keyed by model name, providing a structured and accessible representation of the entire dataset.

A critical step in pre-processing involved thorough data cleaning and verification \citep{khan2019sparkmldrivenpreprocessing,lima2024impactsdatapreprocessinghyperparameter}. This included confirming that column names were consistent across all files, checking for \texttt{NaN} or missing values (none were found, as expected for posterior samples of this nature) \citep{khan2019sparkmldrivenpreprocessing}, and verifying that the \texttt{log\_likelihood} values fell within a sensible range.

\subsection{Exploratory Data Analysis and Baseline Comparison}
Prior to undertaking advanced discrepancy decomposition, we performed an extensive exploratory data analysis to establish a baseline understanding of the agreements and disagreements among the five waveform models. This phase provided initial quantitative insights into the parameter inferences for GW231123 \citep{cuceu2025gw231123binaryblackhole,theligoscientificcollaboration2025gw231123binaryblackhole}.

\subsubsection{Summary Statistics}
For each of the five waveform models, we computed key summary statistics for the following astrophysical parameters: \texttt{mass\_1\_source} (primary component mass), \texttt{mass\_2\_source} (secondary component mass), \texttt{chi\_eff} (effective inspiral spin parameter), \texttt{chi\_p} (precessing spin parameter), \texttt{redshift}, \texttt{final\_mass\_source} (remnant black hole mass), and \texttt{final\_spin} (remnant black hole spin). Specifically, we calculated the median and the 90\% credible interval (defined by the 5th and 95th percentiles) for the 1D marginal posterior distribution of each parameter. These statistics were compiled into a table, offering an immediate, quantitative overview of the central tendencies and uncertainties predicted by each model.
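
As an illustrative formalization of these summary statistics (the notation below is introduced here for clarity and is an assumption, not taken from the analysis code), let $\{\theta^{(m)}_i\}_{i=1}^{N_m}$ denote the $N_m$ posterior samples of a parameter $\theta$ from waveform model $m$, and let $Q^{(m)}_\theta(q)$ denote the empirical quantile function of those samples. The reported median and 90\% credible interval then correspond to
\begin{equation}
\tilde{\theta}^{(m)} = Q^{(m)}_\theta(0.5), \qquad
\mathrm{CI}^{(m)}_{90\%} = \left[\, Q^{(m)}_\theta(0.05),\; Q^{(m)}_\theta(0.95) \,\right].
\end{equation}
This is an equal-tailed interval, which for a skewed posterior need not coincide with the highest-posterior-density interval of the same probability content.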

\subsubsection{Pairwise Statistical Divergence}
To quantify the disagreement between the 1D marginal posterior distributions of each parameter across all model pairs, we employed two statistical divergence metrics: the Jensen-Shannon Divergence (JSD) \citep{nielsen2022generalizationjensenshannondivergencejssymmetrization,hoyososorio2024representationjensenshannondivergence} and the 1-Wasserstein distance. For each parameter, and for every unique pair of waveform models, the following procedure was applied:
\begin{enumerate}
    \item The 1D marginal posterior samples for the given parameter from each model were extracted.
    \item A Kernel Density Estimator (KDE) was used to estimate the Probability Density Function (PDF) for each set of samples. A common, optimized bandwidth (e.g., determined by Scott's rule or Silverman's rule) was applied across all PDFs for a given parameter to ensure consistent smoothing.
    \item The JSD was calculated between the estimated PDFs of the two models. The JSD is a symmetric and finite measure of the similarity between two probability distributions, based on the Kullback-Leibler divergence. When computed with base-2 logarithms it ranges from 0 (identical distributions) to 1 (maximally divergent distributions).
    \item The 1-Wasserstein distance (also known as the Earth Mover's Distance) was computed between the empirical distributions of the two models. This metric quantifies the minimum cost of transforming one distribution into the other and, unlike the JSD, carries the units of the parameter being compared.
\end{enumerate}
This process yielded two $5\times5$ symmetric matrices for each key astrophysical parameter, one for JSD values and one for 1-Wasserstein distances. These matrices served as a quantitative baseline for understanding the degree of agreement or disagreement between models on a parameter-by-parameter basis, highlighting where significant univariate discrepancies first emerge.
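
For reference, the two metrics can be written explicitly. The expressions below are the standard definitions; the choice of base-2 logarithms is an assumption made here so that the JSD is bounded by 1, consistent with the range quoted above. For two 1D densities $p_A(\theta)$ and $p_B(\theta)$ with mixture $m(\theta) = \tfrac{1}{2}\left[p_A(\theta) + p_B(\theta)\right]$,
\begin{align}
D_{\mathrm{KL}}(p \,\|\, q) &= \int p(\theta)\, \log_2 \frac{p(\theta)}{q(\theta)}\, d\theta, \\
\mathrm{JSD}(p_A, p_B) &= \tfrac{1}{2} D_{\mathrm{KL}}(p_A \,\|\, m) + \tfrac{1}{2} D_{\mathrm{KL}}(p_B \,\|\, m).
\end{align}
For 1D distributions with cumulative distribution functions $F_A$ and $F_B$, the 1-Wasserstein distance reduces to
\begin{equation}
W_1(p_A, p_B) = \int_{-\infty}^{\infty} \left| F_A(\theta) - F_B(\theta) \right| d\theta
= \int_0^1 \left| F_A^{-1}(u) - F_B^{-1}(u) \right| du .
\end{equation}
Some library routines return the Jensen-Shannon \emph{distance}, $\sqrt{\mathrm{JSD}}$, rather than the divergence itself, so the convention should be checked before comparing values against a threshold.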

\subsection{High-Dimensional Degeneracy and Discrepancy Analysis}
To gain a holistic understanding of how the different waveform models populate the high-dimensional parameter space and to visualize complex degeneracies, we employed Uniform Manifold Approximation and Projection (UMAP) \citep{ghojogh2021uniformmanifoldapproximationprojection,vela2024visualizinghighentropyalloy}.

\subsubsection{Data Preparation for UMAP}
All posterior samples from the five waveform models were combined into a single, large DataFrame. This consolidated dataset included all 13 physical parameters typically inferred for binary black hole mergers. To ensure that parameters with differing scales did not disproportionately influence the dimensionality reduction, all parameter columns were standardized using z-scoring (subtracting the mean and dividing by the standard deviation across the combined dataset). The \texttt{model} column was retained to allow for post-projection attribution and analysis.
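
Written out, the standardization takes the following form, where the indices and symbols are introduced here purely for illustration. With $x_{ij}$ the value of parameter $j$ for sample $i$ in the combined dataset of $N = \sum_m N_m$ samples,
\begin{equation}
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{N} \sum_{i=1}^{N} x_{ij}, \qquad
\sigma_j^2 = \frac{1}{N} \sum_{i=1}^{N} \left( x_{ij} - \mu_j \right)^2 .
\end{equation}
Because $\mu_j$ and $\sigma_j$ are computed from the pooled samples rather than per model, systematic offsets between models are preserved in the standardized coordinates instead of being removed.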

\subsubsection{Uniform Manifold Approximation and Projection (UMAP)}
The UMAP algorithm was applied to the standardized, high-dimensional parameter space. UMAP is a non-linear dimensionality reduction technique that constructs a high-dimensional graph representing the data's topological structure and then optimizes a low-dimensional graph to be as structurally similar as possible \citep{wang2024acceleratingumaplargescaledatasets}. Our primary goal was to project the high-dimensional parameter space (encompassing all 13 physical parameters) down to a 2D space, thereby enabling intuitive visualization of the complex, non-linear relationships and degeneracies inherent in the posterior distributions \citep{wang2021understandingdimensionreductiontools,wang2024acceleratingumaplargescaledatasets}.

We used the \texttt{umap-learn} library for this implementation. Two key hyperparameters, \texttt{n\_neighbors} (controlling the balance between local and global structure preservation) and \texttt{min\_dist} (controlling how tightly points are packed together), were tuned to optimize the embedding quality \citep{liao2023efficientrobustbayesianselection}. Values of \texttt{n\_neighbors}~$=50$ and \texttt{min\_dist}~$=0.1$ were used as a starting point \citep{yang2024finetuninglargelanguagemodels,jung2025ghostumap2measuringanalyzingrdstability}, with iterative adjustments made to achieve a robust representation that captures both the local clustering and global separation of the data \citep{lin2024calibratingdimensionreductionhyperparameters}. The output of the UMAP transformation was a set of 2D coordinates (\texttt{UMAP\_1}, \texttt{UMAP\_2}) for each posterior sample, representing its position in the learned low-dimensional manifold.

\subsubsection{Analysis of UMAP Embedding}
The generated 2D UMAP embedding provided a visual and analytical tool to assess the high-dimensional discrepancies \citep{negro2025promptgrbrecognitionwaterfalls}. By filtering the UMAP coordinates by their associated \texttt{model} label, we could qualitatively and quantitatively examine how the posterior samples for each waveform model occupy and cluster within this reduced space \citep{dimple2023evidencedistinctpopulationskilonovaassociated,bufano2024siftingdebrispatternssnr}. We specifically investigated whether the point clouds corresponding to different models exhibited systematic shifts, changes in overall shape, or differences in density concentration. For instance, we analyzed whether models with fundamentally different physical approximations, such as IMRPhenomXPHM (which uses a ``twisting-up'' precession formalism) and NRSur7dq4 (a numerical relativity surrogate), showed distinct, non-overlapping regions in the UMAP space \citep{houba2025deepsourceseparationoverlapping}, which would indicate significant high-dimensional disagreements.

\subsection{Physics-Informed Discrepancy Decomposition}
The core of our methodology lies in the Physics-Informed Discrepancy Decomposition \citep{dick2025lockingfreetrainingphysicsinformedneural}, which systematically dissects the overall model disagreements and attributes them to specific physical effects \citep{dick2025lockingfreetrainingphysicsinformedneural} and the corresponding approximations within the waveform models \citep{mu2025separationpinnphysicsinformedneuralnetworks}. This approach goes beyond global comparisons by focusing on physically motivated parameter subspaces \citep{dick2025lockingfreetrainingphysicsinformedneural}.

\subsubsection{Definition of Physical Parameter Subspaces}
Based on our understanding of binary black hole physics and the known characteristics and approximation schemes of the waveform models \citep{liu2023upgradedwaveformmodeleccentric,mukherjee2024phenomenologicalgravitationalwaveformmodel,kapil2024systematicbiaswaveformmodeling}, we meticulously defined four distinct parameter subspaces. These subspaces were designed to isolate specific physical aspects of the binary merger that are known to be treated differently across waveform models \citep{kapil2024systematicbiaswaveformmodeling}. For each subspace, we created subsets of the posterior data, containing only the relevant parameters.
\begin{enumerate}
    \item \textbf{Mass \& Distance Subspace:} This subspace includes (\texttt{mass\_1\_source}, \texttt{mass\_2\_source}, \texttt{redshift}). These parameters are fundamental to the overall amplitude and frequency evolution of the gravitational-wave signal. Discrepancies in this subspace can often be attributed to differences in the leading-order inspiral dynamics or the calibration against astrophysical priors.
    \item \textbf{Effective Spin Subspace:} Comprising (\texttt{chi\_eff}, \texttt{chi\_p}), this subspace captures the dominant, orbit-averaged effects of spin. \texttt{chi\_eff} primarily influences the inspiral rate, while \texttt{chi\_p} quantifies the strength of orbital plane precession. Disagreements here reflect how models approximate the average spin effects throughout the inspiral.
    \item \textbf{Individual Spin \& Orientation Subspace:} This six-dimensional subspace consists of (\texttt{a\_1}, \texttt{a\_2}, \texttt{cos\_tilt\_1}, \texttt{cos\_tilt\_2}, \texttt{cos\_theta\_jn}, \texttt{phi\_jl}). These parameters describe the magnitudes and tilt angles of the individual black hole spins, the inclination of the binary's total angular momentum relative to the line of sight (\texttt{cos\_theta\_jn}), and the azimuthal angle between the total and orbital angular momenta (\texttt{phi\_jl}). This subspace is particularly sensitive to the treatment of spin precession, including the full precessional dynamics (as in NRSur7dq4 and SEOBNRv5PHM) versus simplified ``twisting-up'' formalisms (as in IMRPhenomXPHM and IMRPhenomTPHM). Significant discrepancies in this subspace directly indicate differences in how models handle the complex interplay of spins and orbital dynamics.
    \item \textbf{Remnant Properties Subspace:} This subspace includes (\texttt{final\_mass\_source}, \texttt{final\_spin}). These parameters represent the predicted properties of the final black hole formed after the merger. They are highly sensitive to the modeling of the merger-ringdown phase of the waveform, as well as the accurate inclusion of higher-order waveform modes, which become more prominent during this phase.
\end{enumerate}

\subsubsection{Quantifying Subspace-Specific Discrepancies}
For each of the four defined physical subspaces, and for every pairwise combination of the five waveform models, we quantified the disagreement using the multi-dimensional Jensen-Shannon Divergence (JSD) \citep{nielsen2022generalizationjensenshannondivergencejssymmetrization,hoyososorio2024representationjensenshannondivergence}. The procedure for computing the multi-dimensional JSD for a given subspace between two models (e.g., Model A and Model B) was as follows:
\begin{enumerate}
    \item The posterior samples for the parameters within the specific subspace were extracted for both Model A and Model B.
    \item A multi-dimensional Kernel Density Estimator (KDE) was employed to estimate the joint PDF for each model's samples within that subspace. This involves estimating the probability density across the entire multi-dimensional space spanned by the subspace parameters.
    \item The multi-dimensional JSD was then computed between the two estimated joint PDFs. This metric provides a single scalar value quantifying the overall divergence of the two models' posterior distributions within that specific physical subspace.
\end{enumerate}
This process resulted in four separate $5\times5$ discrepancy matrices \citep{tsagris2024constrainedsquaressimplicialsimplicialregression}, one for each physical subspace. Each matrix element represented the multi-dimensional JSD \citep{alagoz2024exploringhierarchicalclassificationperformance,chanda2025learningmattersprobabilistictask} between a pair of models within that particular subspace, thereby providing a targeted measure of disagreement.
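
The steps above can be sketched mathematically as follows. The Gaussian kernel and the sample-based (Monte Carlo) evaluation are assumptions made here for illustration, since the text does not specify the kernel or the integration scheme. For a $d$-dimensional subspace with samples $\{\mathbf{x}^A_i\}_{i=1}^{N_A}$ from Model A and a bandwidth matrix $\mathbf{H}$, the KDE is
\begin{equation}
\hat{p}_A(\mathbf{x}) = \frac{1}{N_A} \sum_{i=1}^{N_A} K_{\mathbf{H}}\!\left(\mathbf{x} - \mathbf{x}^A_i\right), \qquad
K_{\mathbf{H}}(\mathbf{u}) = \frac{\exp\!\left(-\tfrac{1}{2}\, \mathbf{u}^{\mathsf{T}} \mathbf{H}^{-1} \mathbf{u}\right)}{(2\pi)^{d/2}\, |\mathbf{H}|^{1/2}},
\end{equation}
and analogously for $\hat{p}_B$. With $\hat{m} = \tfrac{1}{2}(\hat{p}_A + \hat{p}_B)$, the JSD can be estimated without a grid by averaging over the samples themselves,
\begin{equation}
\widehat{\mathrm{JSD}} \approx \frac{1}{2 N_A} \sum_{i=1}^{N_A} \log_2 \frac{\hat{p}_A(\mathbf{x}^A_i)}{\hat{m}(\mathbf{x}^A_i)}
+ \frac{1}{2 N_B} \sum_{j=1}^{N_B} \log_2 \frac{\hat{p}_B(\mathbf{x}^B_j)}{\hat{m}(\mathbf{x}^B_j)} .
\end{equation}
A sample-based estimate of this kind is one way to keep the computation tractable in the six-dimensional spin and orientation subspace, where direct integration on a regular grid becomes expensive.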

\subsubsection{Correlation of Discrepancies with Model Physics}
The resulting discrepancy matrices from the physics-informed decomposition were critically analyzed to establish direct links between the magnitude of the observed discrepancies and the known physical differences in the underlying waveform models.

For instance, we specifically compared the JSD values in the `Individual Spin \& Orientation' matrix with those in the `Mass \& Distance' matrix. We hypothesized that models with fundamentally different treatments of spin precession (e.g., IMRPhenomXPHM versus NRSur7dq4) would exhibit significantly larger JSD values in the highly sensitive spin and orientation subspace compared to the more universally agreed-upon mass and distance subspace. Similarly, we examined the `Remnant Properties' discrepancy matrix, anticipating that models incorporating a more complete treatment of higher-order modes (such as SEOBNRv5PHM and IMRPhenomXPHM) would show greater consistency among themselves, while displaying larger divergences with models that have a less comprehensive representation of the merger-ringdown phase, like IMRPhenomXO4a. This systematic correlation allowed us to attribute discrepancies to specific physical approximations within the models, moving beyond mere observation of disagreement to understanding its underlying causes.

\subsection{Robust Astrophysical Inference}
The final stage of our analysis involved synthesizing the findings from the exploratory data analysis, high-dimensional embedding, and physics-informed decomposition to derive robust astrophysical constraints for GW231123 \citep{yuan2025gw231123massgapevent,caputo2025superradianceconstraintsgw231123,tanikawa2025gw231123formationpopulationiii}.

\subsubsection{Identification of Robustly Constrained Parameters}
A key objective was to identify which astrophysical parameters for GW231123 are robustly constrained across all five waveform models, meaning their inferred posterior distributions show high consistency regardless of the model choice \citep{aswathi2025ultralightbosonconstraintsgravitational,bartos2025accretionneedblackhole}. A parameter was deemed ``robust'' if the maximum pairwise Jensen-Shannon Divergence (JSD) and 1-Wasserstein distance values among all model pairs (as calculated in the pairwise statistical divergence analysis) fell below a pre-defined threshold (e.g., $\mathrm{JSD} < 0.01$). Furthermore, strong overlap in the medians and 90\% credible intervals across all models, as observed in the summary statistics, served as an additional indicator of robustness \citep{aswathi2025ultralightbosonconstraintsgravitational}.
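
Stated compactly, and with symbols that are introduced here only for illustration, the criterion for a parameter $\theta$ over the set $\mathcal{M}$ of five waveform models reads
\begin{equation}
\max_{\substack{A, B \in \mathcal{M} \\ A \neq B}} \mathrm{JSD}\!\left(p_A, p_B\right) < \epsilon_{\mathrm{JSD}}
\quad \text{and} \quad
\max_{\substack{A, B \in \mathcal{M} \\ A \neq B}} W_1\!\left(p_A, p_B\right) < \epsilon_{W}(\theta),
\end{equation}
where $p_A$ and $p_B$ are the 1D marginal posteriors of $\theta$ under models $A$ and $B$, and $\epsilon_{\mathrm{JSD}} = 0.01$ is the example value quoted above. The Wasserstein threshold $\epsilon_{W}(\theta)$ is a hypothetical placeholder: because $W_1$ carries the units of $\theta$, any such threshold would have to be chosen per parameter or applied to a normalized distance (for instance, $W_1$ divided by the pooled standard deviation).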

\subsubsection{Identification of Model-Dependent Parameters}
Conversely, parameters that failed to meet the robustness criteria were classified as ``model-dependent.'' For these parameters, the systematic uncertainties introduced by waveform model choice were found to be significant. Crucially, our Physics-Informed Discrepancy Decomposition allowed us to pinpoint the primary physical origin of these discrepancies. For example, if \texttt{chi\_p} was identified as model-dependent, the analysis would then attribute this discrepancy to differing treatments of spin precession between phenomenological and NR-calibrated models, based on the high JSD values observed in the `Individual Spin \& Orientation' subspace.

\subsubsection{Derivation of Consensus Astrophysical Constraints}
For those parameters identified as robustly constrained, we derived a final consensus measurement for GW231123 \citep{theligoscientificcollaboration2025gw231123binaryblackhole}. This was achieved by combining the posterior samples for that specific parameter from all five waveform models into a single, aggregated dataset. From this combined distribution, the final consensus median and 90\% credible interval were computed, representing our most reliable, model-agnostic measurement for that property of the binary black hole system \citep{theligoscientificcollaboration2025gw231123binaryblackhole}.
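
Pooling samples in this way is equivalent to drawing from a mixture of the per-model posteriors. As a sketch, with $N_m$ the number of samples from model $m$ (notation introduced here for illustration),
\begin{equation}
p_{\mathrm{cons}}(\theta) = \sum_{m \in \mathcal{M}} w_m\, p_m(\theta), \qquad
w_m = \frac{N_m}{\sum_{m' \in \mathcal{M}} N_{m'}} .
\end{equation}
The implicit weights $w_m$ are therefore set by the sample counts. If the five files contain different numbers of samples and equal model weights ($w_m = 1/5$) are intended, each model would need to be subsampled to a common $N_m$ before the consensus median and credible interval are computed.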

\subsubsection{Final Results Compilation}
The comprehensive findings were compiled into a final summary table. This table explicitly listed all key astrophysical parameters of GW231123. For each parameter, it provided the derived consensus median and 90\% credible interval if the parameter was deemed robust.

If a parameter was classified as model-dependent, the table reported the range of medians observed across the different models instead of a single consensus value, clearly marking it as such. An additional column stated whether the parameter constraint was `Robust' or `Model-Dependent', along with a brief, physics-informed note explaining the origin of any significant model dependency, directly linking back to the insights gained from the discrepancy decomposition. This structured presentation allowed for a clear and interpretable assessment of the astrophysical inferences for GW231123.

\end{document}
                