\documentclass[twocolumn]{aastex631}

\usepackage{amsmath}
\usepackage{multirow}
\usepackage{natbib}
\usepackage{graphicx} 
\usepackage{aas_macros}

\begin{document}

\subsection{Baseline Comparison: Significant Divergence in Key Physical Parameters}
Our initial exploratory analysis, using the summary statistics and pairwise divergence metrics outlined in Section 2.2, reveals substantial disagreement among the five waveform models in the inferred astrophysical parameters of GW231123. Table \ref{tab:summary_stats} lists, for each model, the median and the 5th and 95th percentiles (the bounds of the central $90\%$ credible interval) of key source parameters.

\begin{table*}[ht]
\centering
\caption{Summary of Inferred Parameters for GW231123. Masses are in $M_\odot$; the 5th and 95th percentiles bound the central $90\%$ credible interval.}
\label{tab:summary_stats}
\begin{tabular}{lcccc}
\hline
\textbf{Parameter} & \textbf{Model} & \textbf{Median} & \textbf{5th Percentile} & \textbf{95th Percentile} \\
\hline
mass\ensuremath{\_}1\ensuremath{\_}source & NRSur7dq4 & 129.14 & 115.15 & 143.86 \\
 & IMRPhenomXO4a & 143.18 & 128.70 & 167.47 \\
 & SEOBNRv5PHM & 133.69 & 119.69 & 152.28 \\
 & IMRPhenomXPHM & 149.87 & 138.24 & 162.34 \\
 & IMRPhenomTPHM & 133.37 & 121.44 & 150.75 \\
\hline
mass\ensuremath{\_}2\ensuremath{\_}source & NRSur7dq4 & 110.62 & 93.47 & 124.36 \\
 & IMRPhenomXO4a & 55.08 & 37.48 & 65.93 \\
 & SEOBNRv5PHM & 111.10 & 91.61 & 127.56 \\
 & IMRPhenomXPHM & 93.33 & 73.44 & 111.44 \\
 & IMRPhenomTPHM & 110.04 & 95.16 & 125.21 \\
\hline
chi\ensuremath{\_}eff & NRSur7dq4 & 0.23 & -0.12 & 0.48 \\
 & IMRPhenomXO4a & 0.30 & 0.15 & 0.50 \\
 & SEOBNRv5PHM & 0.44 & 0.21 & 0.63 \\
 & IMRPhenomXPHM & 0.04 & -0.17 & 0.19 \\
 & IMRPhenomTPHM & 0.44 & 0.27 & 0.58 \\
\hline
chi\ensuremath{\_}p & NRSur7dq4 & 0.78 & 0.59 & 0.95 \\
 & IMRPhenomXO4a & 0.82 & 0.71 & 0.92 \\
 & SEOBNRv5PHM & 0.73 & 0.52 & 0.91 \\
 & IMRPhenomXPHM & 0.75 & 0.51 & 0.94 \\
 & IMRPhenomTPHM & 0.77 & 0.58 & 0.91 \\
\hline
redshift & NRSur7dq4 & 0.29 & 0.15 & 0.52 \\
 & IMRPhenomXO4a & 0.58 & 0.38 & 0.74 \\
 & SEOBNRv5PHM & 0.39 & 0.23 & 0.57 \\
 & IMRPhenomXPHM & 0.17 & 0.12 & 0.23 \\
 & IMRPhenomTPHM & 0.47 & 0.31 & 0.62 \\
\hline
final\ensuremath{\_}spin & NRSur7dq4 & 0.81 & 0.67 & 0.87 \\
 & IMRPhenomXO4a & 0.85 & 0.78 & 0.90 \\
 & SEOBNRv5PHM & 0.87 & 0.81 & 0.92 \\
 & IMRPhenomXPHM & 0.71 & 0.61 & 0.77 \\
 & IMRPhenomTPHM & 0.89 & 0.84 & 0.92 \\
\hline
\end{tabular}
\end{table*}

The most pronounced discrepancy is observed in the component masses, particularly for mass\ensuremath{\_}2\ensuremath{\_}source. While NRSur7dq4, SEOBNRv5PHM, and IMRPhenomTPHM infer a relatively symmetric binary system with mass\ensuremath{\_}2\ensuremath{\_}source medians ranging from $110.04\,M_\odot$ to $111.10\,M_\odot$, IMRPhenomXO4a predicts a significantly more asymmetric configuration, with a median mass\ensuremath{\_}2\ensuremath{\_}source of only $55.08\,M_\odot$. IMRPhenomXPHM also infers a lower secondary mass ($93.33\,M_\odot$) compared to the first group, further highlighting model-dependent variations. This fundamental disagreement in the mass ratio propagates to other inferred parameters, such as the effective inspiral spin parameter (chi\ensuremath{\_}eff) and redshift.
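
As a rough illustration of this asymmetry, the ratio of the median component masses in Table \ref{tab:summary_stats} can serve as a proxy for the mass ratio. This is only a sketch: the symbol $\tilde{q}$ is introduced here for illustration, and a ratio of marginal medians is in general not equal to the median of the mass-ratio posterior, which would have to be computed from the joint samples.
\begin{align*}
\tilde{q}_{\mathrm{NRSur7dq4}} &= 110.62/129.14 \approx 0.86, \\
\tilde{q}_{\mathrm{SEOBNRv5PHM}} &= 111.10/133.69 \approx 0.83, \\
\tilde{q}_{\mathrm{IMRPhenomTPHM}} &= 110.04/133.37 \approx 0.83, \\
\tilde{q}_{\mathrm{IMRPhenomXPHM}} &= 93.33/149.87 \approx 0.62, \\
\tilde{q}_{\mathrm{IMRPhenomXO4a}} &= 55.08/143.18 \approx 0.38.
\end{align*}
Under this proxy the three models of the first group sit at $\tilde{q} \gtrsim 0.8$, whereas IMRPhenomXO4a falls below $0.4$.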

For chi\ensuremath{\_}eff, the inferred median values span a considerable range, from a near-zero value of $0.04$ for IMRPhenomXPHM to a significantly positive $0.44$ for SEOBNRv5PHM and IMRPhenomTPHM. Such a wide range has profound implications for understanding the astrophysical formation channels of GW231123, as chi\ensuremath{\_}eff is a key indicator of the binary's spin alignment with the orbital angular momentum. In contrast, the precessing spin parameter (chi\ensuremath{\_}p) shows a comparatively smaller spread in median values (from $0.73$ to $0.82$), suggesting that while the magnitude of precession is consistently inferred to be high, its detailed influence on other parameters varies.

These disagreements are quantitatively supported by the pairwise Jensen-Shannon Divergence (JSD) and 1-Wasserstein distance metrics, calculated as described in Section 2.2. For instance, JSD values between certain model pairs for mass\ensuremath{\_}2\ensuremath{\_}source and redshift frequently exceed $0.6$, indicating near-complete non-overlap of the 1D marginal posterior distributions. For redshift, IMRPhenomXPHM consistently places the source at a much closer distance (median $0.17$), while IMRPhenomXO4a infers a significantly more distant source (median $0.58$), with other models falling in between. This initial assessment underscores that the choice of waveform model introduces substantial systematic uncertainties that cannot be overlooked in astrophysical interpretations.
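
For reference, the standard forms of the two metrics are sketched below, assuming natural logarithms (so that divergences are in nats). Here $p$ and $q$ denote the 1D marginal posterior densities of two models, with cumulative distributions $F_p$ and $F_q$; this notation is introduced for illustration, and the estimators actually applied to the posterior samples are those of Section 2.2.
\begin{align*}
m(x) &= \tfrac{1}{2}\left[p(x) + q(x)\right], \\
\mathrm{JSD}(p\,\|\,q) &= \tfrac{1}{2}\int p(x)\,\ln\frac{p(x)}{m(x)}\,dx \\
 &\quad + \tfrac{1}{2}\int q(x)\,\ln\frac{q(x)}{m(x)}\,dx, \\
W_1(p, q) &= \int \left|F_p(x) - F_q(x)\right|\,dx.
\end{align*}
With this convention $0 \le \mathrm{JSD} \le \ln 2 \approx 0.693$, so a value of $0.6$ is about $87\%$ of the maximum, while $W_1$ carries the units of the parameter itself.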

\subsection{High-Dimensional Degeneracy and Model Clustering}
To understand how the waveform models populate the full, high-dimensional parameter space, we employed Uniform Manifold Approximation and Projection (UMAP), as detailed in Section 2.3. The two-dimensional UMAP embedding, generated from the 13-dimensional parameter space, visualizes the degeneracies and discrepancies among the models.

The UMAP projection clearly reveals a structured separation of the models into distinct clusters, indicating that the discrepancies are not merely isolated to individual parameters but are inherent to the correlated, high-dimensional posterior distributions. The models coalesce into three primary groups:
\begin{enumerate}
    \item \textbf{A Core Cluster:} Comprising NRSur7dq4, SEOBNRv5PHM, and IMRPhenomTPHM. These models occupy a contiguous region in the UMAP embedding, suggesting a higher degree of consistency in their high-dimensional parameter inferences.
    \item \textbf{An Isolated Cluster (IMRPhenomXO4a):} This model forms a distinct, separate cluster, indicating significant divergence from all other models in the overall parameter space.
    \item \textbf{A Second Isolated Cluster (IMRPhenomXPHM):} This model also forms a unique cluster, located in a region of the UMAP space far from the other models.
\end{enumerate}

Table \ref{tab:umap_centroids} provides the UMAP centroid coordinates for each model, quantifying their separation in the learned low-dimensional manifold. IMRPhenomXPHM is positioned at UMAP\ensuremath{\_}1 $\approx -3.86$ and IMRPhenomXO4a at UMAP\ensuremath{\_}1 $\approx 11.42$, placing them on opposite sides of the core cluster, whose centroids lie at UMAP\ensuremath{\_}1 values between $-0.33$ and $3.46$.

\begin{table}[ht]
\centering
\caption{UMAP Cluster Centroids for Each Model}
\label{tab:umap_centroids}
\begin{tabular}{lcc}
\hline
\textbf{Model} & \textbf{UMAP\ensuremath{\_}1} & \textbf{UMAP\ensuremath{\_}2} \\
\hline
IMRPhenomTPHM & 3.46 & 5.69 \\
IMRPhenomXO4a & 11.42 & 6.74 \\
IMRPhenomXPHM & -3.86 & -2.20 \\
NRSur7dq4 & -0.33 & 3.18 \\
SEOBNRv5PHM & 2.90 & 3.08 \\
\hline
\end{tabular}
\end{table}
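
The separation can be made concrete with Euclidean distances between the centroids in Table \ref{tab:umap_centroids}. This is an illustrative calculation only: the distance $d$ and the core-cluster mean $\bar{c}_{\mathrm{core}}$ are introduced here rather than taken from the analysis, and distances in a UMAP embedding are not guaranteed to preserve distances in the original parameter space, so the numbers are a qualitative guide.
\begin{align*}
\bar{c}_{\mathrm{core}} &= \tfrac{1}{3}\left[(3.46, 5.69) + (-0.33, 3.18) + (2.90, 3.08)\right] \\
 &\approx (2.01, 3.98), \\
d_{\mathrm{XO4a}} &= \sqrt{(11.42 - 2.01)^2 + (6.74 - 3.98)^2} \approx 9.8, \\
d_{\mathrm{XPHM}} &= \sqrt{(-3.86 - 2.01)^2 + (-2.20 - 3.98)^2} \approx 8.5.
\end{align*}
By comparison, the largest centroid separation within the core cluster is between NRSur7dq4 and IMRPhenomTPHM, at $\sqrt{3.79^2 + 2.51^2} \approx 4.5$.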

This clustering is physically meaningful. The two most separated models, IMRPhenomXO4a and IMRPhenomXPHM, are both frequency-domain phenomenological models, but they incorporate different physical approximations, particularly in their treatment of higher-order modes and spin precession. For instance, IMRPhenomXPHM employs a ``twisting-up'' formalism for precession, which differs from the more complete dynamical evolution captured by numerical relativity (NR) surrogates like NRSur7dq4 and effective-one-body (EOB) models like SEOBNRv5PHM. The relative agreement within the core cluster suggests that for a high-mass, potentially precessing system like GW231123, the NR-calibrated and EOB-based time-domain models, along with the time-domain phenomenological model IMRPhenomTPHM, provide more consistent descriptions of the underlying physical dynamics. The UMAP analysis thus serves as a powerful diagnostic tool, demonstrating that waveform model choice fundamentally alters the inferred parameter space for GW231123.

\subsection{Physics-Informed Discrepancy Decomposition}
To systematically attribute the observed high-dimensional disagreements to specific physical effects and the corresponding approximations within the waveform models, we performed a physics-informed discrepancy decomposition. As described in Section 2.4, this involved quantifying the multi-dimensional Jensen-Shannon Divergence (JSD) between model pairs within four predefined physical parameter subspaces: Mass \& Distance, Effective Spin, Individual Spin \& Orientation, and Remnant Properties.

\subsubsection{Mass \& Distance Subspace}
This subspace, comprising mass\ensuremath{\_}1\ensuremath{\_}source, mass\ensuremath{\_}2\ensuremath{\_}source, and redshift, exhibits extremely high JSD values (many exceeding $0.6$) across various model pairs. This confirms that the models fundamentally disagree on the intrinsic masses and the distance to the source. The systematic nature of this disagreement suggests that the way spin and orientation are modeled is strongly degenerate with the inferred masses and redshift. This degeneracy leads to large systematic shifts in these fundamental parameters, highlighting that even basic source properties are not robustly constrained without accounting for waveform model systematics.

\subsubsection{Effective Spin Subspace}
Discrepancies in the Effective Spin subspace (chi\ensuremath{\_}eff, chi\ensuremath{\_}p) are also substantial. Notably, the JSD between IMRPhenomXPHM and IMRPhenomTPHM for this subspace is $0.636$, reflecting their starkly opposing conclusions on the effective spin parameter. This divergence directly points to differences in how models treat spin-orbit coupling and its influence on the inspiral rate. Conversely, SEOBNRv5PHM and IMRPhenomTPHM show remarkable agreement in this subspace (JSD = $0.043$), indicating that their modeling of orbit-averaged spin effects is highly consistent, despite representing different modeling paradigms (EOB vs. phenomenological).

\subsubsection{Individual Spin \& Orientation Subspace}
The six-dimensional Individual Spin \& Orientation subspace (a\ensuremath{\_}1, a\ensuremath{\_}2, cos\ensuremath{\_}tilt\ensuremath{\_}1, cos\ensuremath{\_}tilt\ensuremath{\_}2, cos\ensuremath{\_}theta\ensuremath{\_}jn, phi\ensuremath{\_}jl) reveals the most severe and widespread disagreement among all subspaces. JSD values for many model pairs in this subspace approach the theoretical maximum of $\ln 2 \approx 0.693$. This is a critical finding: the detailed, multi-dimensional configuration of the individual black hole spins and the binary's orientation relative to the observer is the most model-dependent aspect of the inference for GW231123. This profound divergence is the expected signature of differing treatments of spin precession. Models that employ simplified ``twisting-up'' formalisms (e.g., IMRPhenomXPHM, IMRPhenomXO4a) inherently produce different posterior distributions for these parameters compared to models that capture the full dynamical evolution of precessing spins (e.g., NRSur7dq4, SEOBNRv5PHM). This directly impacts the ability to infer the true spin configuration of the binary.
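
The quoted maximum follows from the standard definition of the divergence with natural logarithms, as the short derivation below sketches. The densities $p$ and $q$ over the subspace coordinates $\theta$, and the mixture $m = (p + q)/2$, are notation introduced here for illustration; Section 2.4 remains the reference for the multi-dimensional estimator. If $p$ and $q$ have disjoint support, then $m = p/2$ wherever $p > 0$ and $m = q/2$ wherever $q > 0$, so
\begin{align*}
\mathrm{JSD}(p\,\|\,q) &= \tfrac{1}{2}\int p\,\ln\frac{p}{m}\,d\theta + \tfrac{1}{2}\int q\,\ln\frac{q}{m}\,d\theta \\
 &= \tfrac{1}{2}\ln 2 \int p\,d\theta + \tfrac{1}{2}\ln 2 \int q\,d\theta \\
 &= \ln 2 \approx 0.693.
\end{align*}
Values near $0.693$ therefore correspond to posteriors that share almost no probability mass in this subspace.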

\subsubsection{Remnant Properties Subspace}
The inferred properties of the final remnant black hole (final\ensuremath{\_}mass\ensuremath{\_}source, final\ensuremath{\_}spin) are also highly model-dependent. The JSD values in this subspace are large, particularly for pairs involving IMRPhenomXPHM, which consistently predicts a much lower final spin compared to the other models (median $0.71$ vs. $0.81$--$0.89$). This suggests significant differences in the modeling of the merger-ringdown phase of the gravitational-wave signal and the calibration against numerical relativity simulations. The accurate inclusion of higher-order waveform modes, which become more prominent during the merger and ringdown, is crucial for precisely predicting remnant properties. The close agreement between SEOBNRv5PHM and IMRPhenomTPHM (JSD = $0.051$) in this subspace is again notable, as both models incorporate a more comprehensive treatment of higher-order modes and appear to have a more consistent description of the final state of the binary.

\subsection{Robust Astrophysical Inference for GW231123}
The culmination of our analysis was to synthesize the findings from the baseline comparisons, high-dimensional embedding, and physics-informed discrepancy decomposition to determine the robustness of astrophysical constraints for GW231123. As defined in Section 2.5, a parameter was considered ``robust'' if the maximum pairwise JSD across all models was below $0.05$ and the relative range of median values was less than $10\%$.
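
Written out, the two conditions take the form sketched below. The symbols $P_i$ (the posterior of model $i$ for a given parameter) and $\tilde{\theta}_i$ (its median) are introduced here for illustration, and normalizing the range by the mean of the five medians is an assumption, since the normalization is not fixed in the text above; Section 2.5 is the reference for the definition actually used.
\begin{align*}
\max_{i<j}\,\mathrm{JSD}(P_i\,\|\,P_j) &< 0.05, \\
\frac{\max_i \tilde{\theta}_i - \min_i \tilde{\theta}_i}{\tfrac{1}{5}\sum_{i=1}^{5} \tilde{\theta}_i} &< 0.10.
\end{align*}
As a worked case, chi\ensuremath{\_}p has the smallest relative spread of medians among the parameters in Table \ref{tab:summary_stats}: $(0.82 - 0.73)/0.77 \approx 0.12$, which already exceeds the $10\%$ threshold under this normalization.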

Our primary conclusion is that \textit{no key astrophysical parameter for GW231123 meets these criteria for robustness}. The systematic differences between the waveform models are significant enough to preclude a single, consensus measurement for any of the analyzed properties. Table \ref{tab:final_inference} summarizes the final astrophysical inference.

\begin{table*}[ht]
\centering
\caption{Final Astrophysical Inference Summary for GW231123}
\label{tab:final_inference}
\begin{tabular}{lccp{0.45\textwidth}}
\hline
\textbf{Parameter} & \textbf{Status} & \textbf{Consensus Value / Range} & \textbf{Physical Discrepancy Source} \\
\hline
mass\ensuremath{\_}1\ensuremath{\_}source & Model-Dependent & $129.1$--$149.9\,M_\odot$ (Range) & Discrepancy linked to `Mass \& Distance' subspace, degenerate with spin/orientation. \\
mass\ensuremath{\_}2\ensuremath{\_}source & Model-Dependent & $55.1$--$111.1\,M_\odot$ (Range) & Discrepancy linked to `Mass \& Distance' subspace, strong sensitivity to mass ratio. \\
chi\ensuremath{\_}eff & Model-Dependent & $0.04$--$0.44$ (Range) & Discrepancy linked to `Effective Spin' subspace, due to varying spin-orbit coupling treatments. \\
chi\ensuremath{\_}p & Model-Dependent & $0.73$--$0.82$ (Range) & Discrepancy linked to `Effective Spin' subspace, though less spread than chi\ensuremath{\_}eff. \\
redshift & Model-Dependent & $0.17$--$0.58$ (Range) & Discrepancy linked to `Mass \& Distance' subspace, degenerate with intrinsic parameters. \\
final\ensuremath{\_}mass\ensuremath{\_}source & Model-Dependent & $189.7$--$232.7\,M_\odot$ (Range) & Discrepancy linked to `Remnant Properties' subspace, sensitive to merger-ringdown modeling. \\
final\ensuremath{\_}spin & Model-Dependent & $0.71$--$0.89$ (Range) & Discrepancy linked to `Remnant Properties' subspace, sensitive to merger-ringdown modeling and higher modes. \\
\hline
\end{tabular}
\end{table*}

This finding carries an important astrophysical implication. For high-mass, potentially precessing binary black hole mergers like GW231123, the signal is often short in the detector's band and dominated by the highly non-linear merger and ringdown phases. In such cases, the systematic errors arising from the choice of waveform model can be comparable to, or even exceed, the statistical uncertainties inherent in the observational data. The wide range of inferred medians, particularly for the mass ratio (e.g., mass\ensuremath{\_}2\ensuremath{\_}source varying from $55.1\,M_\odot$ to $111.1\,M_\odot$) and the effective spin (chi\ensuremath{\_}eff from $0.04$ to $0.44$), severely hampers firm conclusions about the source's formation history (e.g., distinguishing between isolated binary evolution and dynamical capture in a dense stellar environment) unless these waveform systematics are accounted for. For GW231123, the choice of waveform model is therefore not merely a technical detail but a dominant factor in the scientific interpretation of the event.

\end{document}
                