Keywords: exploratory causal modeling, cumulative advantage
Abstract: This paper develops an AI-assisted computational framework for exploratory causal modeling of cumulative advantage in small-$N$, spatially heterogeneous domains, demonstrated through a case study of junior golf. The analysis is methodologically challenging because state-level data are sparse, predictors are highly collinear, and unobservable factors must be approximated by proxies. We address these challenges with a \textbf{dual-method framework} that pairs two components: forward-selection regression with leave-one-out cross-validation (LOOCV) for predictive modeling, and directed acyclic graph (DAG)-guided structural modeling for exploring assumption-dependent associations and simulating counterfactual scenarios.
Using data on 16,000+ junior golfers across all U.S. states, we find that population and participation serve as strong baseline predictors of elite performance; PGA Tour event presence, a proxy for elite training access, shows an independent and sizable association; financial strength is predictive only for Top 50 girls; and climate shows little direct association once other factors are accounted for. Exploratory simulations---such as adding a PGA Tour event or increasing participation---suggest potential gains in elite-player production. Our framework demonstrates how AI-assisted exploratory causal modeling can generate transparent, assumption-guided insights that generalize beyond sport to other small-$N$ scientific domains.
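The predictive half of the framework, forward selection scored by LOOCV, can be illustrated with a minimal sketch. This is not the paper's implementation: the data are synthetic, the predictor names (`population`, `tour_event`, `climate`) are hypothetical stand-ins for the state-level variables, and the stopping rule (stop when no candidate lowers the LOOCV error) is an assumption.

```python
import numpy as np


def loocv_mse(X, y):
    """Leave-one-out mean squared error of OLS with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # X may have zero columns
    errors = []
    for i in range(n):
        keep = np.arange(n) != i  # hold out observation i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors.append((y[i] - A[i] @ beta) ** 2)
    return float(np.mean(errors))


def forward_select(X, y, names):
    """Greedily add the predictor that most reduces LOOCV error."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = loocv_mse(np.empty((len(y), 0)), y)  # intercept-only baseline
    while remaining:
        scores = {j: loocv_mse(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:  # assumed stopping rule: no improvement
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return [names[j] for j in selected], best


# Synthetic small-N data (50 "states"); NOT the paper's data.
rng = np.random.default_rng(0)
n = 50
population = rng.normal(size=n)
tour_event = rng.integers(0, 2, size=n).astype(float)
climate = rng.normal(size=n)  # irrelevant by construction
y = 3.0 * population + 2.0 * tour_event + rng.normal(scale=0.3, size=n)

names = ["population", "tour_event", "climate"]
X = np.column_stack([population, tour_event, climate])
baseline_error = loocv_mse(np.empty((n, 0)), y)
chosen, cv_error = forward_select(X, y, names)
print(chosen, round(cv_error, 3))
```

With only about 50 observations, LOOCV uses every state as a held-out test case exactly once, which is why it suits the small-$N$ setting better than a single train/test split.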
Supplementary Material: zip
Submission Number: 152