<pre-registration>
    <metadata>
        <title>Educational Attainment and Cognitive Profile Heterogeneity: Evidence for Domain-Specific Performance Patterns in Large-Scale Cognitive Assessment</title>
        <description>This study examines whether individuals with higher educational attainment exhibit greater cognitive profile heterogeneity (larger discrepancies between their highest and lowest domain-specific percentile rankings) compared to those with lower educational attainment, using percentile-based measures across 11 cognitive subtests from Battery 26 of the NeuroCognitive Performance Test dataset.</description>
        <contributors>Research team to be specified</contributors>

        <license type="select_one">
            <options>
                - CC-By Attribution 4.0 International
            </options>
        </license>

        <subject>Psychology, Cognitive Psychology, Educational Psychology, Individual Differences, Psychometrics</subject>
        <tags>cognitive profile heterogeneity, educational attainment, individual differences, cognitive assessment, percentile rankings, domain-specific performance</tags>
    </metadata>

    <study_information>
        <hypotheses>Primary Hypothesis: Individuals with higher educational attainment will demonstrate significantly greater cognitive profile heterogeneity, defined as larger discrepancies between their highest and lowest domain-specific percentile rankings within age-matched normative groups, compared to individuals with lower educational attainment. (Directional hypothesis)

This hypothesis is justified by Ackerman's PPIK theory, which proposes that extended educational experiences lead to domain-specific knowledge accumulation and cognitive specialization. Educational environments encourage development of particular cognitive strengths through specialized coursework and academic focus areas, potentially creating more differentiated cognitive profiles. Research on expertise demonstrates that extended practice in specific domains leads to domain-specific advantages, suggesting that educational specialization may produce similar patterns of cognitive differentiation in the general population.

Secondary Hypothesis: The association between educational attainment and cognitive profile heterogeneity will be significantly stronger among older adults (ages 50+) compared to younger adults (ages 18-39), reflecting potential cumulative differentiation effects over extended periods of specialized engagement. (Directional hypothesis)

This hypothesis is justified by lifespan developmental theories suggesting that cognitive differentiation may continue beyond the periods traditionally treated as developmental. Older adults have had longer exposure to specialized educational and occupational environments, potentially allowing greater accumulation of domain-specific advantages. Because educational and occupational specialization is cumulative, the education-heterogeneity association is expected to be stronger in older cohorts, who have had more time for cognitive differentiation to develop.</hypotheses>
    </study_information>

    <design_plan>
        <study_type type="select_one">
            <options>
                - Observational Study
            </options>
        </study_type>

        <blinding type="select_multiple">
            <options>
                - No blinding is involved in this study.
            </options>
        </blinding>
        
        <additional_blinding>No additional blinding procedures are applicable to this observational study using existing cognitive assessment data.</additional_blinding>
        <study_design>This is a cross-sectional observational study using existing cognitive assessment data from Battery 26. The design examines the relationship between educational attainment (8-level ordinal independent variable) and cognitive profile heterogeneity (continuous dependent variable) across six age strata ([18-29], [30-39], [40-49], [50-59], [60-69], [70-99]). The study includes age-stratified analyses to test for differential effects across the lifespan, with participants stratified within age bins to control for age-education confounding. The design uses within-age-bin percentile rankings to create relative performance measures that control for age-related performance differences while preserving individual differences in cognitive profile patterns.</study_design>
        <randomization>No randomization is involved as this is an observational study using existing cognitive assessment data.</randomization>
    </design_plan>

    <sampling_plan>
        <existing_data type="select_one">
            <options>
                - Registration prior to accessing the data
            </options>
        </existing_data>
        
        <existing_data_explanation>The data from Battery 26 exist but have not been accessed by the research team. No analysis has been conducted related to the specific research questions about educational attainment and cognitive profile heterogeneity. The dataset will be accessed only after pre-registration to ensure confirmatory analysis.</existing_data_explanation>
        <data_collection_procedures>Data will be obtained from Battery 26 (battery26.csv), which contains cognitive assessment data from approximately 318,300 participants. Inclusion criteria: participants with complete data on all 11 subtests, valid demographic information (age, gender, education_level), and grand_index scores. Exclusion criteria: test runs with pauses >24 hours between subtests, participants taking >15 minutes on Trail Making subtests (IDs 39 & 40), missing grand_index indicating incomplete battery, and missing essential demographic data (age, gender, education_level). Statistical outlier exclusion will be applied using robust criteria: participants with subtest scores exceeding 3 standard deviations from the mean within their age bin will be excluded to preserve legitimate cognitive profiles while removing extreme measurement errors.</data_collection_procedures>
        <sample_size>The target sample size is 4,786 participants before exclusions, which leaves the required minimum of 3,589 after an anticipated 25% loss to the exclusion criteria. Sampling will be stratified across the six age bins, with approximately 798 participants per bin (4,786 / 6), to ensure adequate representation across the lifespan.</sample_size>

        <sample_size_rationale>A power analysis using the pwr package in R determined the minimum sample size. Both calculations assumed statistical power ≥ 0.80 and significance level α = 0.025 (Bonferroni-corrected for two heterogeneity metrics). For the primary hypothesis (8-level ordinal educational attainment predictor), the assumed effect size was Cohen's d = 0.145 (converted to f² = 0.0215). For the secondary hypothesis (education × age interaction with 14 interaction terms, i.e., 7 education contrasts × 2 age-group contrasts), the assumed effect size was Cohen's d = 0.077 (converted to f² = 0.006). The minimum sample size was 3,589, driven by the secondary hypothesis. It was increased to 4,786 (3,589 / 0.75, rounded up) so that 3,589 participants remain if 25% are lost to the exclusion criteria (pauses >24 hours, >15 minutes on Trail Making, missing data, and statistical outliers).</sample_size_rationale>
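
The sample-size reasoning above can be cross-checked in Python. The following is a minimal sketch, not the registered pwr code: it assumes the noncentrality convention λ = f²(u + v + 1) used by pwr.f2.test, and the helper names (f2_power, required_n, inflate_for_attrition) are hypothetical.

```python
import math
from scipy import stats

def f2_power(f2, u, v, alpha):
    """Power of the F test with u numerator and v denominator degrees of freedom."""
    lam = f2 * (u + v + 1)                   # noncentrality (assumed convention)
    f_crit = stats.f.ppf(1 - alpha, u, v)    # critical value under the null
    return stats.ncf.sf(f_crit, u, v, lam)   # P(F exceeds f_crit) under the alternative

def required_n(f2, u, alpha=0.025, power=0.80):
    """Smallest N = u + v + 1 that reaches the target power (linear search on v)."""
    v = 2
    while power > f2_power(f2, u, v, alpha):
        v += 1
    return u + v + 1

def inflate_for_attrition(n, attrition=0.25):
    """Starting sample needed so that n participants remain after attrition."""
    return math.ceil(n / (1 - attrition))
```

With these definitions, inflate_for_attrition(3589) returns 4786, the buffered target stated above. For the secondary hypothesis the call would be required_n(0.006, 14); whether it reproduces 3,589 exactly depends on the settings used in the R calculation.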

        <stopping_rule>Not applicable - using existing dataset with predetermined sample based on power analysis requirements.</stopping_rule>
    </sampling_plan>

    <variables>
        <manipulated_variables>Not applicable - this is an observational study with no experimental manipulations.</manipulated_variables>
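        <measured_variables>Independent Variable: Educational attainment (education_level) - 8-level ordinal variable (levels 1-8: 1=Some high school, 2=High school diploma/GED, 3=Some college, 4=College degree, 5=Professional degree, 6=Master's degree, 7=Ph.D., 8=Associate's degree; level 99 'Other' excluded from analysis). The numeric codes are not in rank order: code 8 (Associate's degree) belongs in the middle of the attainment ordering (it is grouped with levels 3-4 in the collapsed coding used for sensitivity analysis), so the raw codes will not be treated as a numeric scale.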

Dependent Variables: Two cognitive profile heterogeneity metrics calculated from 11 age-stratified percentile ranks: (1) Percentile Range - difference between maximum and minimum percentile scores across subtests, (2) Percentile Interquartile Range - difference between 75th and 25th percentile of the 11 percentile scores.

Covariates: Age (continuous), gender (categorical: male/female), country (categorical: US/Canada/Australia/New Zealand), time_of_day (categorical: hour of test initiation). In addition, grand_index (continuous measure of overall cognitive performance) will be used in the validation analysis; it is not entered in the regression models.

Age Groups: Six discrete age bins ([18-29], [30-39], [40-49], [50-59], [60-69], [70-99]) for stratified analyses and three-level categorical variable for interaction analysis (Younger 18-39, Middle 40-49, Older 50+).
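
The two age codings described above could be derived as in the following sketch. It assumes integer ages in a column named age; the output column names age_bin and age_group and the helper add_age_bins are hypothetical.

```python
import pandas as pd

BIN_EDGES = [18, 30, 40, 50, 60, 70, 100]   # left-closed bins: [18, 30), [30, 40), ...
BIN_LABELS = ["18-29", "30-39", "40-49", "50-59", "60-69", "70-99"]
GROUP_OF_BIN = {
    "18-29": "Younger", "30-39": "Younger",
    "40-49": "Middle",
    "50-59": "Older", "60-69": "Older", "70-99": "Older",
}

def add_age_bins(df, age_col="age"):
    """Add the six-level age_bin and the three-level age_group columns."""
    out = df.copy()
    out["age_bin"] = pd.cut(out[age_col], bins=BIN_EDGES, right=False, labels=BIN_LABELS)
    # Ages outside 18-99 receive a missing bin and therefore a missing group.
    out["age_group"] = out["age_bin"].astype(object).map(GROUP_OF_BIN)
    return out
```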

Raw Cognitive Measures: Eleven subtest scores from Battery 26 including verbal list learning (ID 36), trail making A (ID 39), trail making B (ID 40), arithmetic reasoning (ID 29), forward memory span (ID 28), reverse memory span (ID 33), grammatical reasoning (ID 30), divided visual attention (ID 27), go/no-go (ID 32), digit symbol coding (ID 38), and delayed verbal list learning (ID 37).</measured_variables>
        <indices>Cognitive Profile Heterogeneity Metrics: (1) Percentile Range = max(percentile scores) - min(percentile scores) across 11 subtests, (2) Percentile IQR = 75th percentile - 25th percentile of the 11 percentile scores for each participant, where percentile values are calculated using numpy.percentile function with 'linear' interpolation.
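
As an illustration of the two metrics, the sketch below computes them for an array holding one row of 11 percentile ranks per participant. The function name heterogeneity_metrics is hypothetical, and the code relies on linear interpolation being the default of numpy.percentile.

```python
import numpy as np

def heterogeneity_metrics(pct):
    """pct: array of shape (n_participants, 11) holding percentile ranks."""
    pct = np.asarray(pct, dtype=float)
    pct_range = pct.max(axis=1) - pct.min(axis=1)
    # numpy.percentile interpolates linearly by default
    q75, q25 = np.percentile(pct, [75, 25], axis=1)
    return pct_range, q75 - q25
```

For a profile of 0, 10, ..., 100 the function returns a range of 100 and an IQR of 50.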

Age-Stratified Percentile Rankings: Each participant's raw score on each of the 11 subtests will be converted to a percentile rank (0-100) within their age bin using scipy.stats.percentileofscore with kind='rank'. Age bins are defined as [18-29], [30-39], [40-49], [50-59], [60-69], [70-99], with bin membership determined by age at time of testing.
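
Calling scipy.stats.percentileofscore once per participant and subtest is slow at this sample size, so a vectorized equivalent may be preferable in practice. The sketch below is a swapped-in technique, not the registered call: it uses average ranks from pandas, which give the same values as kind='rank'. The column name age_bin and both helper names are hypothetical.

```python
import pandas as pd
from scipy import stats

def percentile_ranks(df, score_cols, bin_col="age_bin"):
    """Percentile rank of each score within the participant's age bin."""
    grouped = df.groupby(bin_col, observed=True)[score_cols]
    ranks = grouped.rank(method="average")   # tied scores share the mean rank
    counts = grouped.transform("count")      # non-missing scores per bin
    return 100.0 * ranks / counts

def percentile_rank_single(bin_scores, score):
    """Reference value from the SciPy function named in the plan."""
    return stats.percentileofscore(bin_scores, score, kind="rank")
```

Under this definition the lowest score in a bin of n participants receives 100/n rather than 0, and the highest receives 100.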

Heterogeneity Validation Index: Pearson correlation coefficient between each heterogeneity metric and grand_index score to confirm measurement independence from general cognitive ability.</indices>
    </variables>

    <analysis_plan>
        <statistical_models>Primary Analysis: Two separate multiple linear regression models using statsmodels.formula.api.ols. Each model tests one heterogeneity metric (Percentile Range, Percentile IQR) as the continuous dependent variable with education_level as the ordered categorical predictor (dummy coded with level 1 as reference category) and age, gender, country, and time_of_day as covariates. Model specification: heterogeneity_metric ~ C(education_level, Treatment(1)) + age + C(gender) + C(country) + C(time_of_day).
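
A minimal sketch of how the primary model could be fitted follows. The data are simulated purely to make the example runnable and carry no empirical meaning; the dependent-variable column name percentile_range and the helpers fit_primary and simulate_participants are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

PRIMARY_FORMULA = (
    "{dv} ~ C(education_level, Treatment(1)) + age "
    "+ C(gender) + C(country) + C(time_of_day)"
)

def fit_primary(df, dv):
    """Fit the primary OLS model for one heterogeneity metric."""
    return smf.ols(PRIMARY_FORMULA.format(dv=dv), data=df).fit()

def simulate_participants(n=600, seed=0):
    """Synthetic stand-in data, used only to exercise the formula."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "education_level": rng.integers(1, 9, n),   # codes 1-8
        "age": rng.integers(18, 90, n),
        "gender": rng.choice(["male", "female"], n),
        "country": rng.choice(["US", "Canada", "Australia", "New Zealand"], n),
        "time_of_day": rng.integers(8, 12, n),
        "percentile_range": rng.uniform(20, 100, n),
    })

model = fit_primary(simulate_participants(), "percentile_range")
# Seven dummy terms, one per education level other than the reference level 1
education_terms = [name for name in model.params.index if "education_level" in name]
```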

Secondary Analysis: Two multiple linear regression models including education_level × age_group interaction terms to test whether the education-heterogeneity relationship differs across age groups. Model specification: heterogeneity_metric ~ C(education_level, Treatment(1)) * C(age_group, Treatment('Younger')) + C(gender) + C(country) + C(time_of_day).
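
One way to test the 14 interaction terms jointly is a nested-model F test, sketched below. This is an assumption for illustration: the plan above does not state whether the interaction will be assessed jointly or term by term. The data are simulated, and the names interaction_f_test, simulate_participants, and percentile_iqr are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

EDU = "C(education_level, Treatment(1))"
GROUP = "C(age_group, Treatment('Younger'))"
COVARIATES = "C(gender) + C(country) + C(time_of_day)"

def interaction_f_test(df, dv):
    """Compare the model with education x age_group terms against main effects only."""
    full = smf.ols(f"{dv} ~ {EDU} * {GROUP} + {COVARIATES}", data=df).fit()
    reduced = smf.ols(f"{dv} ~ {EDU} + {GROUP} + {COVARIATES}", data=df).fit()
    f_value, p_value, df_diff = full.compare_f_test(reduced)
    return float(f_value), float(p_value), int(df_diff)

def simulate_participants(n=1200, seed=1):
    """Synthetic stand-in data, used only to exercise the formulas."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "education_level": rng.integers(1, 9, n),
        "age_group": rng.choice(["Younger", "Middle", "Older"], n),
        "gender": rng.choice(["male", "female"], n),
        "country": rng.choice(["US", "Canada", "Australia", "New Zealand"], n),
        "time_of_day": rng.integers(8, 12, n),
        "percentile_iqr": rng.uniform(5, 60, n),
    })

f_value, p_value, df_diff = interaction_f_test(simulate_participants(), "percentile_iqr")
```

When every education by age-group cell is populated, df_diff equals 14 (7 education contrasts × 2 age-group contrasts).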

Validation Analysis: Pearson correlations between heterogeneity metrics and grand_index to confirm independence from overall performance (expected |r| < 0.20 based on psychometric standards for discriminant validity).
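
The validation check reduces to a single correlation and a threshold comparison, as in this sketch. The function name discriminant_check and its return format are hypothetical.

```python
from scipy import stats

def discriminant_check(metric, grand_index, threshold=0.20):
    """Pearson r between a heterogeneity metric and grand_index, with the 0.20 rule."""
    r, p = stats.pearsonr(metric, grand_index)
    # passes is True only when |r| is strictly below the threshold
    return {"r": float(r), "p": float(p), "passes": bool(threshold > abs(r))}
```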

Assumption Testing: Shapiro-Wilk tests for normality of residuals, Breusch-Pagan tests for homoscedasticity, and variance inflation factor calculations for multicollinearity assessment. Durbin-Watson tests for independence of residuals.</statistical_models>
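
The four assumption checks could be collected from a fitted statsmodels result as sketched below. The helper assumption_report is hypothetical, and the toy regression exists only to make the example runnable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

def assumption_report(model):
    """Collect residual diagnostics for a fitted OLS model."""
    resid = np.asarray(model.resid)
    exog = model.model.exog              # design matrix, including the intercept
    names = model.model.exog_names
    _, bp_pvalue, _, _ = het_breuschpagan(resid, exog)
    vif = {name: float(variance_inflation_factor(exog, i))
           for i, name in enumerate(names) if name != "Intercept"}
    return {
        "shapiro_p": float(stats.shapiro(resid)[1]),
        "breusch_pagan_p": float(bp_pvalue),
        "durbin_watson": float(durbin_watson(resid)),
        "vif": vif,
    }

rng = np.random.default_rng(0)
toy = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
toy["y"] = 1.0 + 0.5 * toy["x1"] + rng.normal(size=300)
report = assumption_report(smf.ols("y ~ x1 + x2", data=toy).fit())
```

The SciPy documentation notes that the Shapiro-Wilk p-value may be inaccurate for samples above 5,000, which is worth keeping in mind if the analyzed sample exceeds the planned target.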
        <transformations>Time-based subtests (Go/no-go ID 32; Trail Making A/B IDs 39, 40) will be reverse-scored by subtracting each score from the maximum score plus one within each age bin, so higher scores uniformly indicate better performance. Raw subtest scores will be converted to age-stratified percentile rankings (0-100) within discrete age bins using scipy.stats.percentileofscore. Education level will be dummy coded with level 1 (Some high school) as the reference category. Age group will be dummy coded with 'Younger' (18-39) as the reference category for interaction analyses.</transformations>
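
The reverse-scoring rule (maximum score within the age bin, plus one, minus the raw score) can be sketched as follows. The column names subtest_32, subtest_39, subtest_40, and age_bin are hypothetical placeholders for whatever the dataset uses.

```python
import pandas as pd

TIMED_SUBTESTS = ["subtest_32", "subtest_39", "subtest_40"]   # hypothetical column names

def reverse_score(df, cols=TIMED_SUBTESTS, bin_col="age_bin"):
    """Replace each timed score with (bin maximum + 1 - score)."""
    out = df.copy()
    bin_max = out.groupby(bin_col, observed=True)[cols].transform("max")
    out[cols] = bin_max + 1 - out[cols]
    return out
```

After reversal the slowest participant in each bin scores 1. Because percentile ranks depend only on the ordering of scores, the added constant does not affect the resulting ranks.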
        <inference_criteria>Two-tailed tests with Bonferroni correction: α = 0.025 (0.05 / 2) for primary analyses (correcting for two heterogeneity metrics) and α = 0.025 (0.05 / 2) for secondary analyses (correcting for two interaction tests). Effect sizes will be reported as standardized regression coefficients (β) with 95% confidence intervals. Statistical significance will be determined by p-values relative to the corrected alpha levels. For validation analyses, correlations with |r| ≥ 0.20 will be considered evidence against discriminant validity.</inference_criteria>
        <data_exclusion>Participants excluded if: (1) test runs with pause >24 hours between subtests, (2) >15 minutes on Trail Making subtests (IDs 39, 40), (3) missing grand_index or essential demographic data (age, gender, education_level), (4) any subtest score exceeding 3 standard deviations from the age-bin mean (indicating extreme measurement error rather than legitimate cognitive variation), (5) education_level = 99 ('Other' category). No awareness checks applicable to this secondary data analysis.</data_exclusion>
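
Exclusion rule (4) could be implemented as in the sketch below. It assumes the sample standard deviation (ddof = 1) within each age bin, which the criteria above do not specify; the function name flag_outliers and the column name age_bin are hypothetical.

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame, score_cols, bin_col="age_bin", cutoff=3.0) -> pd.Series:
    """True for participants with any subtest score beyond the cutoff, in SD units."""
    grouped = df.groupby(bin_col, observed=True)[score_cols]
    # z-scores relative to the age-bin mean and sample SD (assumed ddof = 1)
    z = (df[score_cols] - grouped.transform("mean")) / grouped.transform("std")
    return (z.abs() > cutoff).any(axis=1)
```

Rows flagged True would be dropped before percentile ranks are computed, for example with df[~flag_outliers(df, score_cols)].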
        <missing_data>Participants with missing data on any of the 11 subtests, grand_index, age, gender, or education_level will be excluded from analysis. Complete case analysis will be used because missing data are expected to be minimal after the exclusion criteria are applied. No imputation procedures will be applied.</missing_data>
        <exploratory_analysis>Sensitivity analyses will examine: (1) an alternative heterogeneity metric using the coefficient of variation of percentile ranks (standard deviation/mean), (2) a collapsed 3-level education variable (Low: levels 1-2; Medium: levels 3-4, 8; High: levels 5-7), (3) analysis using an alternative outlier cutoff in place of the planned 3-standard-deviation exclusion criterion, (4) domain-specific heterogeneity patterns, examining which cognitive domains show the strongest education-related differentiation. In addition, a split-half reliability analysis of the heterogeneity metrics, using odd-even subtest groupings, will assess measurement stability.</exploratory_analysis>
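
Sensitivity analyses (1) and (2) can be sketched as follows. The sample standard deviation (ddof = 1) is an assumption, since the plan does not specify it, and the names EDUCATION_3LEVEL and percentile_cv are hypothetical.

```python
import numpy as np

# Collapsed education coding; code 8 (Associate's degree) is grouped with levels 3-4
EDUCATION_3LEVEL = {
    1: "Low", 2: "Low",
    3: "Medium", 4: "Medium", 8: "Medium",
    5: "High", 6: "High", 7: "High",
}

def percentile_cv(pct):
    """Coefficient of variation (SD / mean) of each participant's percentile ranks."""
    pct = np.asarray(pct, dtype=float)
    return pct.std(axis=1, ddof=1) / pct.mean(axis=1)   # assumed ddof = 1
```

The collapsed variable can then be created with df["education_level"].map(EDUCATION_3LEVEL).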
    </analysis_plan>

    <other>This research extends Carroll's three-stratum theory of cognitive abilities by examining cognitive profile differentiation in relation to educational experiences. The percentile-based approach provides an alternative to variance-based specialization metrics that have been criticized in recent psychometric literature. The study addresses fundamental questions about the plasticity of cognitive architecture and the role of educational experiences in shaping patterns of individual differences. Findings will contribute to theoretical understanding of cognitive development and have practical implications for educational assessment and personalized learning approaches. The large sample size and comprehensive cognitive battery provide substantial statistical power for detecting education-related cognitive differentiation effects in the general population.</other>
</pre-registration>