Step 1: Conduct a power analysis for the multiple linear regression models testing the relationship between educational attainment and cognitive profile heterogeneity. The analysis must be run programmatically in R using the 'pwr' package. For the primary hypothesis, which tests the effect of education level (an 8-level categorical predictor, dummy coded into 7 terms) on the heterogeneity metrics, use the following parameters: Statistical Power ≥ 0.80, Significance Level α = 0.025 (Bonferroni-corrected for two heterogeneity metrics), Effect Size Cohen's d = 0.145 (converted to f² = 0.0215). For the secondary hypothesis, which tests the education×age group interaction (14 interaction terms), run the power analysis with Effect Size Cohen's d = 0.077 (converted to f² = 0.006). The analysis must determine the minimum sample size required in each combination of age group stratum (18-39, 40-49, 50+) and education level to detect effects of these sizes. Use the `pwr.f2.test` function to calculate the required sample sizes (the function solves for the denominator degrees of freedom v, which must then be converted to a sample size), then apply a 25% attrition buffer to account for participants lost to the multiple exclusion criteria.
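The calculation performed by `pwr.f2.test` can be cross-checked outside R. The sketch below is a Python approximation, not the planned R analysis: it uses Cohen's noncentrality parameter λ = f²(u + v + 1) and the noncentral F distribution from SciPy. It assumes N = u + v + 1 (covariates are ignored) and treats the 25% buffer as "recruit enough that N remain after losing 25%"; both choices, and the function names, are assumptions made for illustration.

```python
import math
from scipy import stats


def f2_power(u, v, f2, alpha):
    """Power of the regression F test for u numerator and v denominator df."""
    lam = f2 * (u + v + 1)                  # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, u, v)   # critical value under H0
    return stats.ncf.sf(f_crit, u, v, lam)  # P(reject H0 | effect size f2)


def required_n(u, f2, alpha=0.025, power=0.80):
    """Smallest N = u + v + 1 whose power reaches the target (no covariates)."""
    lo, hi = 0, 1                           # lo is only a lower bound on v
    while f2_power(u, hi, f2, alpha) < power:
        lo, hi = hi, hi * 2                 # double until the target is met
    while hi - lo > 1:                      # bisect on integer v
        mid = (lo + hi) // 2
        if f2_power(u, mid, f2, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi + u + 1


def with_attrition(n, rate=0.25):
    """Inflate n so that n remain after losing `rate` of the sample (assumed rule)."""
    return math.ceil(n / (1 - rate))


n_primary = required_n(u=7, f2=0.0215)      # 7 education dummies
n_secondary = required_n(u=14, f2=0.006)    # 14 interaction terms
print("primary:", n_primary, "->", with_attrition(n_primary))
print("secondary:", n_secondary, "->", with_attrition(n_secondary))
```

If the buffer is instead meant as a simple 25% increase, `with_attrition` would become `math.ceil(n * 1.25)`.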

Step 2: Load and preprocess raw data from 'battery26.csv'. First, apply the documented exclusion criteria: remove test runs with a pause of more than 24 hours between subtests, runs taking more than 15 minutes on the Trail Making subtests (IDs 39 and 40), runs with a missing 'grand_index' (incomplete battery) or missing essential demographic data ('age', 'gender', 'education_level'), and runs with education_level = 99 ('Other' category). Next, partition the data into six age bins ([18-29], [30-39], [40-49], [50-59], [60-69], [70-99]). Reverse-score the time-based subtests (Go/no-go ID 32; Trail Making A/B IDs 39, 40) by subtracting each score from the maximum score plus one within its age bin, so that higher scores uniformly indicate better performance. For statistical outlier removal, flag every score that lies more than 3 standard deviations from the age-bin mean on each of the 11 subtests, and exclude participants with any flagged score to ensure profile validity. The final output is a single cleaned data file containing participant demographics and processed subtest scores for subsequent analysis.
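The binning, reverse-scoring, and outlier steps can be sketched with pandas as follows. The layout of 'battery26.csv' is not described here, so the sketch assumes a wide table with one row per participant and one column per subtest; the column name `score_39` and all function names are hypothetical. The sample standard deviation (pandas default) is assumed for the 3 SD rule.

```python
import numpy as np
import pandas as pd

AGE_EDGES = [18, 30, 40, 50, 60, 70, 100]
AGE_LABELS = ["18-29", "30-39", "40-49", "50-59", "60-69", "70-99"]


def add_age_bin(df):
    """Assign each participant to one of the six age bins (ages 18-99 assumed)."""
    out = df.copy()
    bins = pd.cut(out["age"], bins=AGE_EDGES, right=False, labels=AGE_LABELS)
    out["age_bin"] = bins.astype(str)
    return out


def reverse_score(df, cols):
    """Flip time-based scores within each age bin: (bin max + 1) - score."""
    out = df.copy()
    for col in cols:
        bin_max = out.groupby("age_bin")[col].transform("max")
        out[col] = bin_max + 1 - out[col]
    return out


def drop_outliers(df, cols, z_cut=3.0):
    """Drop participants with any score beyond z_cut SDs of the age-bin mean."""
    grouped = df.groupby("age_bin")[cols]
    z = (df[cols] - grouped.transform("mean")) / grouped.transform("std")
    flagged = (z.abs() > z_cut).any(axis=1)
    return df.loc[~flagged].reset_index(drop=True)


# Toy data: 20 typical completion times and one extreme value in a single bin.
demo = pd.DataFrame({"age": [25] * 21, "score_39": [50.0] * 20 + [1000.0]})
demo = add_age_bin(demo)
demo = reverse_score(demo, ["score_39"])     # slow times now get low scores
cleaned = drop_outliers(demo, ["score_39"])  # the extreme row is removed
print(len(cleaned), cleaned["score_39"].unique())
```

Because reverse-scoring is a linear flip within each bin, the order of reverse-scoring and outlier flagging does not change which participants are flagged.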

Step 3: Calculate age-stratified percentile rankings for each of the 11 subtest scores in the cleaned dataset. First, partition the sample into the same six discrete age bins used in Step 2: [18-29], [30-39], [40-49], [50-59], [60-69], [70-99]. Within each age bin, convert each participant's score on each of the 11 subtests into a percentile rank (scaled 0-100) using the `scipy.stats.percentileofscore` function with `kind='rank'`, which averages the percentile ranks of tied scores. The resulting ranks index performance relative to same-age peers, controlling for age-related changes in absolute performance. As a quality check, use Kolmogorov-Smirnov tests to verify that the percentile distribution of each subtest within each age bin is approximately uniform.
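A minimal sketch of the within-bin ranking and the uniformity check is shown below. It assumes an `age_bin` column already exists and that scores contain no missing values; the column name `score` and the helper names are illustrative only. Calling `percentileofscore` once per score is quadratic in the bin size, which may be slow for large bins.

```python
import numpy as np
import pandas as pd
from scipy import stats


def percentile_ranks(values):
    """Percentile rank (0-100) of each value within `values`, ties averaged."""
    values = np.asarray(values, dtype=float)
    return np.array([stats.percentileofscore(values, v, kind="rank")
                     for v in values])


def add_age_stratified_percentiles(df, score_cols, bin_col="age_bin"):
    """Add a `<col>_pct` column per subtest, ranked within each age bin."""
    out = df.copy()
    for col in score_cols:
        out[col + "_pct"] = out.groupby(bin_col)[col].transform(percentile_ranks)
    return out


def uniformity_check(pct):
    """KS test of percentiles (rescaled to 0-1) against Uniform(0, 1)."""
    return stats.kstest(np.asarray(pct, dtype=float) / 100.0, "uniform")


demo = pd.DataFrame({
    "age_bin": ["18-29"] * 4 + ["30-39"] * 2,
    "score": [10, 20, 20, 40, 5, 7],
})
ranked = add_age_stratified_percentiles(demo, ["score"])
print(ranked)
print(uniformity_check(ranked.loc[ranked["age_bin"] == "18-29", "score_pct"]))
```

With `kind='rank'` the two tied scores of 20 share the same percentile, so ties never split participants arbitrarily.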

Step 4: Compute and validate two distinct cognitive profile heterogeneity metrics for each participant using their 11 age-stratified percentile ranks. The primary metric is the Percentile Range, defined as the difference between the maximum and minimum percentile scores across the 11 subtests. The secondary metric is the Percentile Interquartile Range (IQR), defined as the difference between the 75th and 25th percentiles of the participant's 11 percentile scores, computed with the `numpy.percentile` function using linear interpolation. To validate that these metrics capture profile shape rather than general ability, compute the Pearson correlation between each heterogeneity metric and the 'grand_index' score. Weak correlations (|r| < 0.20) will be taken as evidence of adequate discriminant validity, that is, independence of the metrics from overall performance.
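The two metrics and the validity check can be sketched as below, assuming the percentile ranks are held in an array with one row per participant and 11 columns. The function names and the output column names `pct_range` and `pct_iqr` are illustrative; the toy data are random and carry no empirical meaning.

```python
import numpy as np
import pandas as pd
from scipy import stats


def heterogeneity_metrics(pct):
    """pct: (n_participants, 11) array of age-stratified percentile ranks."""
    pct = np.asarray(pct, dtype=float)
    pct_range = pct.max(axis=1) - pct.min(axis=1)
    # np.percentile interpolates linearly between order statistics by default.
    q75, q25 = np.percentile(pct, [75, 25], axis=1)
    return pd.DataFrame({"pct_range": pct_range, "pct_iqr": q75 - q25})


def discriminant_validity(metrics, grand_index, threshold=0.20):
    """Pearson r between each metric and grand_index, flagged if |r| < threshold."""
    rows = {}
    for name in metrics.columns:
        r, p = stats.pearsonr(metrics[name], grand_index)
        rows[name] = {"r": r, "p": p, "weak": abs(r) < threshold}
    return pd.DataFrame(rows).T


# Two hand-checkable profiles: an evenly spread one and a completely flat one.
profiles = [list(range(0, 101, 10)), [50] * 11]
print(heterogeneity_metrics(profiles))

# Random toy data to exercise the validity check.
rng = np.random.default_rng(0)
toy_metrics = heterogeneity_metrics(rng.uniform(0, 100, size=(200, 11)))
validity = discriminant_validity(toy_metrics, rng.normal(size=200))
print(validity)
```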

Step 5: To test the primary hypothesis, fit two separate multiple linear regression models using `statsmodels.formula.api.ols`. In each model, the dependent variable is one of the continuous heterogeneity metrics (Percentile Range, Percentile IQR). The primary independent variable is 'education_level', dummy coded with level 1 (Some high school) as the reference category and levels 2-8 as dummy variables. Each model includes 'age' (continuous), 'gender', 'country', and 'time_of_day' as covariates. Model specification: `heterogeneity_metric ~ C(education_level, Treatment(1)) + age + C(gender) + C(country) + C(time_of_day)`. Perform comprehensive assumption testing: Shapiro-Wilk tests for normality of residuals, Breusch-Pagan tests for homoscedasticity, variance inflation factors for multicollinearity, and Durbin-Watson tests for independence of residuals. Apply a Bonferroni correction to account for testing two metrics (alpha = 0.025). Report standardized regression coefficients (β) with 95% Confidence Intervals to quantify the association between each level of education and profile heterogeneity.
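The primary model and its diagnostics can be sketched as follows, on synthetic data with hypothetical category labels. The sketch assumes 'education_level' is stored as integers 1-8, so that `Treatment(reference=1)` selects level 1 by value. It reports unstandardized coefficients; standardizing the outcome and continuous predictors before fitting is left out, and the helper names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

FORMULA = ("{dv} ~ C(education_level, Treatment(reference=1)) + age"
           " + C(gender) + C(country) + C(time_of_day)")


def fit_primary(df, dv):
    """Fit the primary OLS model for one heterogeneity metric."""
    return smf.ols(FORMULA.format(dv=dv), data=df).fit()


def diagnostics(model):
    """Collect the four assumption checks named in the plan."""
    exog = model.model.exog
    names = model.model.exog_names
    _, bp_p, _, _ = het_breuschpagan(model.resid, exog)
    vif = {n: variance_inflation_factor(exog, i)
           for i, n in enumerate(names) if n != "Intercept"}
    return {
        "shapiro_p": stats.shapiro(model.resid)[1],
        "breusch_pagan_p": bp_p,
        "max_vif": max(vif.values()),
        "durbin_watson": durbin_watson(model.resid),
    }


# Synthetic stand-in data; the outcome is pure noise.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "education_level": np.arange(n) % 8 + 1,   # guarantees all 8 levels
    "age": rng.integers(18, 90, n),
    "gender": rng.choice(["F", "M"], n),
    "country": rng.choice(["A", "B", "C"], n),
    "time_of_day": rng.choice(["morning", "afternoon", "evening"], n),
    "pct_range": rng.uniform(40, 100, n),
})
model = fit_primary(demo, "pct_range")
edu = model.params.filter(like="education_level")   # 7 dummy coefficients
ci = model.conf_int(alpha=0.05).loc[edu.index]      # 95% confidence intervals
checks = diagnostics(model)
print(edu, ci, checks, sep="\n")
```

Each education p-value in `model.pvalues` would then be compared against the Bonferroni-corrected alpha of 0.025.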

Step 6: Test the secondary hypothesis regarding age-related differences by examining the interaction between education and age. Create a three-level categorical 'age_group' variable: Younger (18-39), Middle (40-49), and Older (50+), dummy coded with 'Younger' as the reference category. Fit two multiple linear regression models (one for each heterogeneity metric) that include statistical interaction terms: `heterogeneity_metric ~ C(education_level, Treatment(1)) * C(age_group, Treatment('Younger')) + C(gender) + C(country) + C(time_of_day)`. The significance of the education_level:age_group interaction terms, assessed at a Bonferroni-corrected alpha of 0.025, provides a direct statistical test of whether the relationship between education and heterogeneity differs across age groups. Perform the same assumption testing as in Step 5 for both interaction models.
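One way to assess the 14 interaction terms together is a nested-model F test, comparing the interaction model with the same model without the interaction. The plan does not say whether the terms are tested jointly or one by one, so the joint test below is an assumption, as are the helper names and the synthetic data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

EDU = "C(education_level, Treatment(reference=1))"
AGE = "C(age_group, Treatment(reference='Younger'))"
COVARS = " + C(gender) + C(country) + C(time_of_day)"


def add_age_group(df):
    """Younger (18-39), Middle (40-49), Older (50+)."""
    out = df.copy()
    groups = pd.cut(out["age"], bins=[18, 40, 50, np.inf], right=False,
                    labels=["Younger", "Middle", "Older"])
    out["age_group"] = groups.astype(str)
    return out


def interaction_test(df, dv, alpha=0.025):
    """Joint F test of all education x age_group terms via nested models."""
    full = smf.ols(f"{dv} ~ {EDU} * {AGE}{COVARS}", data=df).fit()
    reduced = smf.ols(f"{dv} ~ {EDU} + {AGE}{COVARS}", data=df).fit()
    f_value, p_value, df_diff = full.compare_f_test(reduced)
    return {"F": f_value, "p": p_value, "df": df_diff,
            "significant": p_value < alpha}


# Synthetic data built so that every education x age_group cell is populated.
rng = np.random.default_rng(1)
n = 480
idx = np.arange(n)
demo = pd.DataFrame({
    "education_level": idx % 8 + 1,
    "age": np.array([25, 45, 60])[(idx // 8) % 3] + idx % 5,
    "gender": rng.choice(["F", "M"], n),
    "country": rng.choice(["A", "B", "C"], n),
    "time_of_day": rng.choice(["morning", "afternoon", "evening"], n),
    "pct_range": rng.uniform(40, 100, n),
})
result = interaction_test(add_age_group(demo), "pct_range")
print(result)
```

The `df` entry should equal the number of interaction terms (7 education dummies × 2 age-group dummies) whenever every cell contains data.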

Step 7: Perform a set of sensitivity analyses to assess the robustness of the primary findings. A) Rerun the primary models from Step 5 using an alternative heterogeneity metric: the coefficient of variation of each participant's 11 percentile ranks (standard deviation/mean). B) Rerun the primary models using a collapsed 3-level education variable (Low: levels 1-2; Medium: levels 3-4 and 8; High: levels 5-7) to check for consistency with a less granular predictor. C) Rerun the primary models on a dataset prepared using a 3-standard-deviation outlier cutoff instead of the planned exclusion criteria. D) Conduct a split-half reliability analysis of the heterogeneity metrics using odd-even subtest groupings to assess measurement stability. Compare the resulting standardized regression coefficients, confidence intervals, and p-values from these analyses with the main results to confirm the stability of the conclusions.
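Analyses A, B, and D each rest on a small transformation that can be sketched directly. The sketch assumes the sample standard deviation (ddof=1) for the coefficient of variation, assumes that "odd" and "even" refer to the column order of the 11 subtests, and applies the split-half check to the Percentile Range only; the function names are illustrative.

```python
import numpy as np
import pandas as pd

# Collapsed education coding from analysis B.
EDU_3LEVEL = {1: "Low", 2: "Low", 3: "Medium", 4: "Medium", 8: "Medium",
              5: "High", 6: "High", 7: "High"}


def coefficient_of_variation(pct):
    """Analysis A: SD / mean of each participant's percentile ranks."""
    pct = np.asarray(pct, dtype=float)
    return pct.std(axis=1, ddof=1) / pct.mean(axis=1)


def collapse_education(levels):
    """Analysis B: map the 8 education levels onto Low / Medium / High."""
    return pd.Series(levels).map(EDU_3LEVEL)


def split_half_range_r(pct):
    """Analysis D: correlate the range over odd vs even subtest positions."""
    pct = np.asarray(pct, dtype=float)
    odd, even = pct[:, 0::2], pct[:, 1::2]   # positions 1,3,...,11 and 2,...,10
    r_odd = odd.max(axis=1) - odd.min(axis=1)
    r_even = even.max(axis=1) - even.min(axis=1)
    return np.corrcoef(r_odd, r_even)[0, 1]


cv = coefficient_of_variation([[50] * 11, list(range(0, 101, 10))])
collapsed = collapse_education([1, 4, 8, 6])
rng = np.random.default_rng(2)
r_half = split_half_range_r(rng.uniform(0, 100, size=(300, 11)))
print(cv, collapsed.tolist(), r_half)
```

With 11 subtests the two halves are unequal (6 and 5 subtests), which is worth keeping in mind when interpreting the split-half correlation.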