\documentclass[twocolumn]{aastex631}

\usepackage{amsmath}
\usepackage{multirow}
\usepackage{natbib}
\usepackage{graphicx} 
\usepackage{aas_macros}

\begin{document}

\subsection{Baseline comparison: Significant divergence in key physical parameters}
Summary statistics and pairwise divergence metrics, computed as outlined in Section 2.2, reveal substantial disagreement among the five waveform models in the inferred astrophysical parameters of GW231123. As summarized in Table \ref{tab:summary_stats} and shown by the one-dimensional marginal posterior distributions in Figure \ref{fig:marginal_posteriors}, key source parameters vary significantly with the choice of model.

\begin{table*}[ht]
\centering
\caption{Summary of Inferred Parameters for GW231123. Source-frame masses are in units of $M_\odot$; all other parameters are dimensionless.}
\label{tab:summary_stats}
\begin{tabular}{lcccc}
\hline
\textbf{Parameter} & \textbf{Model} & \textbf{Median} & \textbf{5th Percentile} & \textbf{95th Percentile} \\
\hline
\texttt{mass\_1\_source} & NRSur7dq4 & 129.14 & 115.15 & 143.86 \\
 & IMRPhenomXO4a & 143.18 & 128.70 & 167.47 \\
 & SEOBNRv5PHM & 133.69 & 119.69 & 152.28 \\
 & IMRPhenomXPHM & 149.87 & 138.24 & 162.34 \\
 & IMRPhenomTPHM & 133.37 & 121.44 & 150.75 \\
\hline
\texttt{mass\_2\_source} & NRSur7dq4 & 110.62 & 93.47 & 124.36 \\
 & IMRPhenomXO4a & 55.08 & 37.48 & 65.93 \\
 & SEOBNRv5PHM & 111.10 & 91.61 & 127.56 \\
 & IMRPhenomXPHM & 93.33 & 73.44 & 111.44 \\
 & IMRPhenomTPHM & 110.04 & 95.16 & 125.21 \\
\hline
\texttt{chi\_eff} & NRSur7dq4 & 0.23 & -0.12 & 0.48 \\
 & IMRPhenomXO4a & 0.30 & 0.15 & 0.50 \\
 & SEOBNRv5PHM & 0.44 & 0.21 & 0.63 \\
 & IMRPhenomXPHM & 0.04 & -0.17 & 0.19 \\
 & IMRPhenomTPHM & 0.44 & 0.27 & 0.58 \\
\hline
\texttt{chi\_p} & NRSur7dq4 & 0.78 & 0.59 & 0.95 \\
 & IMRPhenomXO4a & 0.82 & 0.71 & 0.92 \\
 & SEOBNRv5PHM & 0.73 & 0.52 & 0.91 \\
 & IMRPhenomXPHM & 0.75 & 0.51 & 0.94 \\
 & IMRPhenomTPHM & 0.77 & 0.58 & 0.91 \\
\hline
\texttt{redshift} & NRSur7dq4 & 0.29 & 0.15 & 0.52 \\
 & IMRPhenomXO4a & 0.58 & 0.38 & 0.74 \\
 & SEOBNRv5PHM & 0.39 & 0.23 & 0.57 \\
 & IMRPhenomXPHM & 0.17 & 0.12 & 0.23 \\
 & IMRPhenomTPHM & 0.47 & 0.31 & 0.62 \\
\hline
\texttt{final\_spin} & NRSur7dq4 & 0.81 & 0.67 & 0.87 \\
 & IMRPhenomXO4a & 0.85 & 0.78 & 0.90 \\
 & SEOBNRv5PHM & 0.87 & 0.81 & 0.92 \\
 & IMRPhenomXPHM & 0.71 & 0.61 & 0.77 \\
 & IMRPhenomTPHM & 0.89 & 0.84 & 0.92 \\
\hline
\end{tabular}
\end{table*}

\begin{figure}[htbp]
\centering
\includegraphics[width=0.5\textwidth]{../input_files/plots/marginal_posteriors_comparison_1_20250810-173014.png}
\caption{One-dimensional marginal posterior distributions for key astrophysical parameters of GW231123, inferred using five different waveform models. The posteriors reveal significant disagreements across models, particularly for \texttt{mass\_2\_source}, \texttt{chi\_eff}, and \texttt{redshift}. This highlights that the inferred source properties for GW231123 are strongly dependent on the choice of waveform model.}
\label{fig:marginal_posteriors}
\end{figure}

The most pronounced discrepancy, evident in both Table \ref{tab:summary_stats} and Figure \ref{fig:marginal_posteriors}, is observed in the component masses, particularly for \texttt{mass\_2\_source}. While NRSur7dq4, SEOBNRv5PHM, and IMRPhenomTPHM infer a relatively symmetric binary system with \texttt{mass\_2\_source} medians ranging from $110.04\,M_\odot$ to $111.10\,M_\odot$, IMRPhenomXO4a predicts a significantly more asymmetric configuration, with a median \texttt{mass\_2\_source} of only $55.08\,M_\odot$. IMRPhenomXPHM also infers a lower secondary mass ($93.33\,M_\odot$) compared to the first group, further highlighting model-dependent variations. This fundamental disagreement in the mass ratio propagates to other inferred parameters, such as the effective inspiral spin parameter (\texttt{chi\_eff}) and \texttt{redshift}.
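
As a rough guide to the asymmetry implied by Table \ref{tab:summary_stats}, one can form the ratio of the median component masses for each model. This is only an illustrative estimate: the ratio of medians is not the median of the mass-ratio posterior, and the symbol $\tilde{q}$ is introduced here solely for this purpose.
\begin{equation}
\tilde{q} \equiv \frac{\mathrm{median}(m_2)}{\mathrm{median}(m_1)} \approx
\begin{cases}
110.62/129.14 \approx 0.86 & \text{(NRSur7dq4)} \\
111.10/133.69 \approx 0.83 & \text{(SEOBNRv5PHM)} \\
110.04/133.37 \approx 0.83 & \text{(IMRPhenomTPHM)} \\
93.33/149.87 \approx 0.62 & \text{(IMRPhenomXPHM)} \\
55.08/143.18 \approx 0.38 & \text{(IMRPhenomXO4a)}
\end{cases}
\end{equation}
Under this approximation, the first three models agree to within a few percent, while IMRPhenomXO4a implies a mass ratio less than half as large.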

For \texttt{chi\_eff}, the inferred median values span a considerable range, from a near-zero value of $0.04$ for IMRPhenomXPHM to a significantly positive $0.44$ for SEOBNRv5PHM and IMRPhenomTPHM, as shown in Table \ref{tab:summary_stats} and visually confirmed by the distinct posterior peaks in Figure \ref{fig:marginal_posteriors}. Such a wide range has profound implications for understanding the astrophysical formation channels of GW231123, as \texttt{chi\_eff} is a key indicator of the binary's spin alignment with the orbital angular momentum. In contrast, the precessing spin parameter (\texttt{chi\_p}) shows a comparatively smaller spread in median values (from $0.73$ to $0.82$), suggesting that while the magnitude of precession is consistently inferred to be high, its detailed influence on other parameters varies.
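
For reference, the standard definitions of the two spin parameters are sketched below, under the assumption that the posterior samples follow the usual conventions. Here $m_1 \geq m_2$ are the component masses, $a_{1,2}$ the dimensionless spin magnitudes, $\theta_{1,2}$ the tilt angles between each spin and the orbital angular momentum, and $q = m_2/m_1 \leq 1$.
\begin{align}
\chi_{\mathrm{eff}} &= \frac{m_1 a_1 \cos\theta_1 + m_2 a_2 \cos\theta_2}{m_1 + m_2}, \\
\chi_{\mathrm{p}} &= \max\left[ a_1 \sin\theta_1,\; \frac{q\,(4q+3)}{4+3q}\, a_2 \sin\theta_2 \right].
\end{align}
By construction $-1 \leq \chi_{\mathrm{eff}} \leq 1$ and $0 \leq \chi_{\mathrm{p}} \leq 1$. Because both quantities depend on the masses as well as the spins, a shift in the inferred mass ratio can move them even if the underlying spin vectors were unchanged.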

These disagreements are quantitatively supported by the pairwise Jensen-Shannon Divergence (JSD) and 1-Wasserstein distance metrics, calculated as described in Section 2.2. For instance, JSD values between certain model pairs for \texttt{mass\_2\_source} and \texttt{redshift} frequently exceed $0.6$, indicating near-complete non-overlap of the 1D marginal posterior distributions, as is clearly visible in Figure \ref{fig:marginal_posteriors}. For \texttt{redshift}, IMRPhenomXPHM consistently places the source at a much closer distance (median $0.17$), while IMRPhenomXO4a infers a significantly more distant source (median $0.58$), with other models falling in between. This initial assessment underscores that the choice of waveform model introduces substantial systematic uncertainties that cannot be overlooked in astrophysical interpretations.
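
For reference, the two metrics can be written as follows for a pair of one-dimensional marginal densities $p_A$ and $p_B$ with cumulative distribution functions $F_A$ and $F_B$. This sketch assumes the natural-logarithm convention for the JSD, which is consistent with the maximum of approximately $0.693$ quoted for the individual spin and orientation subspace; the estimator actually used is the one described in Section 2.2.
\begin{align}
\mathrm{JSD}(p_A \,\|\, p_B) &= \frac{1}{2}\int p_A(x) \ln\frac{p_A(x)}{m(x)}\,dx + \frac{1}{2}\int p_B(x) \ln\frac{p_B(x)}{m(x)}\,dx, \\
m(x) &= \frac{1}{2}\left[p_A(x) + p_B(x)\right], \\
W_1(p_A, p_B) &= \int \left| F_A(x) - F_B(x) \right| dx.
\end{align}
With this convention $0 \leq \mathrm{JSD} \leq \ln 2 \approx 0.693$, so a value of $0.6$ corresponds to roughly $87\%$ of the maximum. The 1-Wasserstein distance, in contrast, is unbounded and carries the units of the parameter being compared.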

\subsection{High-dimensional degeneracy and model clustering}
To gain a more comprehensive understanding of how the waveform models populate the full, high-dimensional parameter space, we employed Uniform Manifold Approximation and Projection (UMAP), as detailed in Section 2.3. The two-dimensional UMAP embeddings, generated from the 13-dimensional parameter space and shown in Figures \ref{fig:umap_final_embedding} and \ref{fig:umap_sensitivity_analysis}, provide a powerful visualization of the complex degeneracies and discrepancies.

\begin{figure}[htbp]
\centering
\includegraphics[width=0.5\textwidth]{../input_files/plots/umap_final_embedding_3_20250810-175050.png}
\caption{UMAP projection of posterior samples for GW231123, illustrating the relationships among the five waveform models. Distinct clusters emerge: a core group comprising \texttt{NRSur7dq4}, \texttt{SEOBNRv5PHM}, and \texttt{IMRPhenomTPHM}, and isolated clusters for \texttt{IMRPhenomXO4a} and \texttt{IMRPhenomXPHM}. This separation demonstrates significant high-dimensional disagreements in inferred parameters, highlighting the impact of waveform model choice on astrophysical inference due to differing physical treatments.}
\label{fig:umap_final_embedding}
\end{figure}

\begin{figure}[htbp]
\centering
\includegraphics[width=0.5\textwidth]{../input_files/plots/umap_sensitivity_analysis_2_20250810-174855.png}
\caption{UMAP 2D embedding of the full posterior distributions for GW231123, colored by waveform model. The models cluster into three distinct groups: a core cluster (\texttt{NRSur7dq4}, \texttt{SEOBNRv5PHM}, \texttt{IMRPhenomTPHM}) and two isolated clusters (\texttt{IMRPhenomXO4a}, \texttt{IMRPhenomXPHM}). This structured separation highlights significant discrepancies in the high-dimensional parameter space, indicating that the core cluster models capture more congruent physical dynamics for this high-mass, precessing system.}
\label{fig:umap_sensitivity_analysis}
\end{figure}

The UMAP projection, as depicted in Figure \ref{fig:umap_final_embedding} and Figure \ref{fig:umap_sensitivity_analysis}, clearly reveals a structured separation of the models into distinct clusters. This indicates that the discrepancies are not merely isolated to individual parameters but are inherent to the correlated, high-dimensional posterior distributions. The models coalesce into three primary groups:
\begin{enumerate}
    \item \textbf{A Core Cluster:} Comprising NRSur7dq4, SEOBNRv5PHM, and IMRPhenomTPHM. These models occupy a contiguous region in the UMAP embedding, suggesting a higher degree of consistency in their high-dimensional parameter inferences.
    \item \textbf{An Isolated Cluster (IMRPhenomXO4a):} This model forms a distinct, separate cluster, indicating significant divergence from all other models in the overall parameter space.
    \item \textbf{A Second Isolated Cluster (IMRPhenomXPHM):} This model also forms a unique cluster, located in a region of the UMAP space far from the other models.
\end{enumerate}

Table \ref{tab:umap_centroids} provides the UMAP centroid coordinates for each model, quantitatively illustrating their separation in the learned low-dimensional manifold. IMRPhenomXPHM is positioned at \texttt{UMAP\_1} $\approx -3.86$, while IMRPhenomXO4a is at \texttt{UMAP\_1} $\approx 11.42$, confirming their separation from the core cluster, whose centroids lie at \texttt{UMAP\_1} values between approximately $-0.3$ and $3.5$.

\begin{table}[ht]
\centering
\caption{UMAP Cluster Centroids for Each Model}
\label{tab:umap_centroids}
\begin{tabular}{lcc}
\hline
\textbf{Model} & \textbf{UMAP\_1} & \textbf{UMAP\_2} \\
\hline
IMRPhenomTPHM & 3.46 & 5.69 \\
IMRPhenomXO4a & 11.42 & 6.74 \\
IMRPhenomXPHM & -3.86 & -2.20 \\
NRSur7dq4 & -0.33 & 3.18 \\
SEOBNRv5PHM & 2.90 & 3.08 \\
\hline
\end{tabular}
\end{table}
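
As a rough illustration of the separation implied by Table \ref{tab:umap_centroids}, the Euclidean distance between two centroids can be computed directly from the tabulated coordinates. This is a sketch only: UMAP does not strictly preserve global distances, so the values below indicate relative separation within this particular embedding rather than a physical metric. The symbol $d$ is introduced here for this estimate.
\begin{align}
d(\text{XO4a}, \text{TPHM}) &= \sqrt{(11.42 - 3.46)^2 + (6.74 - 5.69)^2} \approx 8.03, \\
d(\text{XPHM}, \text{NRSur}) &= \sqrt{(-3.86 + 0.33)^2 + (-2.20 - 3.18)^2} \approx 6.43, \\
d(\text{NRSur}, \text{TPHM}) &= \sqrt{(-0.33 - 3.46)^2 + (3.18 - 5.69)^2} \approx 4.55.
\end{align}
The first two lines give the distance from each isolated model to its nearest core-cluster centroid, and the third gives the largest centroid separation within the core cluster. In this embedding, both isolated models therefore lie farther from every core member than the core members lie from one another.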

This clustering is physically meaningful. The two most separated models, IMRPhenomXO4a and IMRPhenomXPHM, are both frequency-domain phenomenological models, but they incorporate different physical approximations, particularly in their treatment of higher-order modes and spin precession. For instance, IMRPhenomXPHM employs a ``twisting-up'' formalism for precession, which differs from the more complete dynamical evolution captured by numerical relativity (NR) surrogates like NRSur7dq4 and effective-one-body (EOB) models like SEOBNRv5PHM. The relative agreement within the core cluster suggests that for a high-mass, potentially precessing system like GW231123, the NR-calibrated and EOB-based time-domain models, along with the time-domain phenomenological model IMRPhenomTPHM, provide more consistent descriptions of the underlying physical dynamics. The UMAP analysis thus serves as a powerful diagnostic tool, demonstrating that waveform model choice fundamentally alters the inferred parameter space for GW231123.

\subsection{Physics-informed discrepancy decomposition}
To systematically attribute the observed high-dimensional disagreements to specific physical effects and the corresponding approximations within the waveform models, we performed a physics-informed discrepancy decomposition. As described in Section 2.4, this involved quantifying the multi-dimensional Jensen-Shannon Divergence (JSD) between model pairs within four predefined physical parameter subspaces: Mass \& Distance, Effective Spin, Individual Spin \& Orientation, and Remnant Properties. The results are presented as pairwise JSD heatmaps in Figure \ref{fig:subspace_jsd_heatmaps}.

\begin{figure}[htbp]
\centering
\includegraphics[width=0.5\textwidth]{../input_files/plots/subspace_jsd_heatmaps_4_20250810-180917.png}
\caption{Pairwise Jensen-Shannon Divergence (JSD) heatmaps quantify disagreements between five waveform models for GW231123 across four distinct astrophysical parameter subspaces. Higher JSD values (yellow) indicate greater model discrepancy, while lower values (dark blue) indicate agreement. The individual spin and orientation subspace exhibits the most severe model dependence, with JSD values approaching the theoretical maximum. Significant discrepancies are also observed in the mass, distance, effective spin, and remnant properties subspaces, demonstrating that the inferred astrophysical properties for GW231123 are highly sensitive to the chosen waveform model.}
\label{fig:subspace_jsd_heatmaps}
\end{figure}

\subsubsection{Mass \& distance subspace}
This subspace, comprising \texttt{mass\_1\_source}, \texttt{mass\_2\_source}, and \texttt{redshift}, exhibits extremely high JSD values (many exceeding $0.6$) across various model pairs, as seen in the top-left heatmap of Figure \ref{fig:subspace_jsd_heatmaps}. This confirms that the models fundamentally disagree on the intrinsic masses and the distance to the source. The systemic nature of this disagreement suggests that the way spin and orientation are modeled is strongly degenerate with the inferred masses and \texttt{redshift}. This leads to large systematic shifts in these fundamental parameters, highlighting that even basic source properties are not robustly constrained without accounting for waveform model systematics.

\subsubsection{Effective spin subspace}
Discrepancies in the Effective Spin subspace (\texttt{chi\_eff}, \texttt{chi\_p}) are also substantial, as shown in the top-right heatmap of Figure \ref{fig:subspace_jsd_heatmaps}. Notably, the JSD between IMRPhenomXPHM and IMRPhenomTPHM for this subspace is $0.636$, reflecting their starkly opposing conclusions on the effective spin parameter. This divergence directly points to differences in how models treat spin-orbit coupling and its influence on the inspiral rate. Conversely, SEOBNRv5PHM and IMRPhenomTPHM show remarkable agreement in this subspace (JSD = $0.043$), indicating that their modeling of orbit-averaged spin effects is highly consistent, despite representing different modeling paradigms (EOB vs. phenomenological).

\subsubsection{Individual spin \& orientation subspace}
The six-dimensional Individual Spin \& Orientation subspace (\texttt{a1}, \texttt{a2}, \texttt{cos\_tilt\_1}, \texttt{cos\_tilt\_2}, \texttt{cos\_theta\_jn}, \texttt{phi\_jl}) reveals the most severe and widespread disagreement among all subspaces, with JSD values for many model pairs approaching the theoretical maximum of $\ln 2 \approx 0.693$ (bottom-left heatmap in Figure \ref{fig:subspace_jsd_heatmaps}). This is a critical finding: the detailed, multi-dimensional configuration of the individual black hole spins and the binary's orientation relative to the observer is the most model-dependent aspect of the inference for GW231123. This divergence is the expected signature of differing treatments of spin precession. Models that employ simplified ``twisting-up'' formalisms (e.g., IMRPhenomXPHM, IMRPhenomXO4a) inherently produce different posterior distributions for these parameters compared to models that capture the full dynamical evolution of precessing spins (e.g., NRSur7dq4, SEOBNRv5PHM). This directly limits the ability to infer the true spin configuration of the binary.

\subsubsection{Remnant properties subspace}
The inferred properties of the final remnant black hole (\texttt{final\_mass\_source}, \texttt{final\_spin}) are also highly model-dependent, as illustrated in the bottom-right heatmap of Figure \ref{fig:subspace_jsd_heatmaps}. The JSD values in this subspace are large, particularly for pairs involving IMRPhenomXPHM, which consistently predicts a much lower final spin compared to the other models (median $0.71$ vs. $0.81$--$0.89$, as shown in Table \ref{tab:summary_stats}). This suggests significant differences in the modeling of the merger-ringdown phase of the gravitational-wave signal and the calibration against numerical relativity simulations. The accurate inclusion of higher-order waveform modes, which become more prominent during the merger and ringdown, is crucial for precisely predicting remnant properties. The close agreement between SEOBNRv5PHM and IMRPhenomTPHM (JSD = $0.051$) in this subspace is again notable, as both models incorporate a more comprehensive treatment of higher-order modes and appear to have a more consistent description of the final state of the binary.

\subsection{Robust astrophysical inference for GW231123}
Finally, we synthesize the findings from the baseline comparisons, high-dimensional embedding, and physics-informed discrepancy decomposition to assess the robustness of the astrophysical constraints for GW231123. As defined in Section 2.5, a parameter is considered ``robust'' if the maximum pairwise JSD across all models is below $0.05$ and the relative range of median values is less than $10\%$.
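
The two conditions can be written compactly as below. This is a sketch under a stated assumption: the normalization of the relative range by the mean of the model medians is chosen here for illustration, and the symbols $R_\theta$ and $\tilde{\theta}_i$ (the median of parameter $\theta$ under model $i$) are introduced only for this purpose; Section 2.5 gives the definition actually used.
\begin{equation}
\max_{i \neq j} \mathrm{JSD}_{ij}(\theta) < 0.05
\quad \text{and} \quad
R_\theta \equiv \frac{\max_i \tilde{\theta}_i - \min_i \tilde{\theta}_i}{\frac{1}{5}\sum_{i=1}^{5} \tilde{\theta}_i} < 0.10 .
\end{equation}
As a worked example, the medians of \texttt{chi\_p} in Table \ref{tab:summary_stats} are $0.78$, $0.82$, $0.73$, $0.75$, and $0.77$, with mean $0.77$, so that
\begin{equation}
R_{\chi_{\mathrm{p}}} = \frac{0.82 - 0.73}{0.77} \approx 0.12 > 0.10 .
\end{equation}
Normalizing instead by the smallest or largest median gives approximately $0.12$ and $0.11$, respectively, so even the parameter with the smallest spread in medians exceeds the $10\%$ threshold under any of these choices.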

Our primary conclusion, summarized in Table \ref{tab:final_inference}, is that \textit{no key astrophysical parameter for GW231123 meets these criteria for robustness}. The systematic differences between the waveform models are significant enough to preclude a single, consensus measurement for any of the analyzed properties.

\begin{table*}[ht]
\centering
\caption{Final Astrophysical Inference Summary for GW231123}
\label{tab:final_inference}
\begin{tabular}{lccp{0.38\textwidth}}
\hline
\textbf{Parameter} & \textbf{Status} & \textbf{Consensus Value / Range} & \textbf{Physical Discrepancy Source} \\
\hline
\texttt{mass\_1\_source} & Model-Dependent & $129.1$--$149.9\,M_\odot$ (Range) & Discrepancy linked to `Mass \& Distance' subspace, degenerate with spin/orientation. \\
\texttt{mass\_2\_source} & Model-Dependent & $55.1$--$111.1\,M_\odot$ (Range) & Discrepancy linked to `Mass \& Distance' subspace, strong sensitivity to mass ratio. \\
\texttt{chi\_eff} & Model-Dependent & $0.04$--$0.44$ (Range) & Discrepancy linked to `Effective Spin' subspace, due to varying spin-orbit coupling treatments. \\
\texttt{chi\_p} & Model-Dependent & $0.73$--$0.82$ (Range) & Discrepancy linked to `Effective Spin' subspace, though less spread than \texttt{chi\_eff}. \\
\texttt{redshift} & Model-Dependent & $0.17$--$0.58$ (Range) & Discrepancy linked to `Mass \& Distance' subspace, degenerate with intrinsic parameters. \\
\texttt{final\_mass\_source} & Model-Dependent & $189.7$--$232.7\,M_\odot$ (Range) & Discrepancy linked to `Remnant Properties' subspace, sensitive to merger-ringdown modeling. \\
\texttt{final\_spin} & Model-Dependent & $0.71$--$0.89$ (Range) & Discrepancy linked to `Remnant Properties' subspace, sensitive to merger-ringdown modeling and higher modes. \\
\hline
\end{tabular}
\end{table*}

This finding carries an important astrophysical implication. For high-mass, potentially precessing binary black hole mergers like GW231123, the signal is often relatively short in the detector's band and dominated by the highly non-linear merger and ringdown phases. In such cases, the systematic errors arising from the choice of waveform model can be comparable to, or even exceed, the statistical uncertainties inherent in the observational data. The wide range of inferred values, particularly for the secondary mass (median \texttt{mass\_2\_source} from $55.1\,M_\odot$ to $111.1\,M_\odot$) and the effective spin (median \texttt{chi\_eff} from $0.04$ to $0.44$), severely hampers firm conclusions about the source's formation history (e.g., distinguishing between isolated binary evolution and dynamical capture in a dense stellar environment) unless these waveform systematics are accounted for. For GW231123, the choice of waveform model is therefore not merely a technical detail but a dominant factor in the scientific interpretation of the event.

\end{document}
                