Looking for input files matching pattern: outputs/step1_cleaned_battery26_data_*.csv
Found input file: outputs/step1_cleaned_battery26_data_20250707_113751.csv
Loading data...
Data shape: (1083, 21)
All columns:
['user_id', 'test_run_id', 'age', 'gender', 'education_level', 'country', 'battery_id', 'time_of_day', 'grand_index', 'subtest_36_score', 'subtest_39_score', 'subtest_40_score', 'subtest_29_score', 'subtest_28_score', 'subtest_33_score', 'subtest_30_score', 'subtest_27_score', 'subtest_32_score', 'subtest_38_score', 'subtest_37_score', 'age_bin']

First 3 rows:
   user_id  test_run_id  age  ... subtest_38_score  subtest_37_score age_bin
0    68983       251259   50  ...             40.0              10.0   50-59
1   106315       614129   23  ...             48.0              10.0   18-29
2   334338       167761   60  ...             40.0               7.0   60-69

[3 rows x 21 columns]
All required columns present.

Unique values for key experimental design parameters:
age_bin unique values: ['18-29', '30-39', '40-49', '50-59', '60-69', '70-99']
battery_id unique values: [26]
gender unique values: ['f', 'm']
education_level unique values: [1, 2, 3, 4, 5, 6, 7, 8]
time_of_day unique values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]

Subtest score columns data types:
subtest_36_score: float64
subtest_39_score: float64
subtest_40_score: float64
subtest_29_score: float64
subtest_28_score: float64
subtest_33_score: float64
subtest_30_score: float64
subtest_27_score: float64
subtest_32_score: float64
subtest_38_score: float64
subtest_37_score: float64
grand_index data type: float64

Filtering out rows with null values in critical columns...
Initial rows: 1083
Final rows: 1083
Excluded rows: 0

Expected age bins: ['18-29', '30-39', '40-49', '50-59', '60-69', '70-99']
Actual age bins: ['18-29', '30-39', '40-49', '50-59', '60-69', '70-99']
Calculating percentile ranks within age bins...
Processing subtest_36_score -> percentile_36
Processing subtest_39_score -> percentile_39
Processing subtest_40_score -> percentile_40
Processing subtest_29_score -> percentile_29
Processing subtest_28_score -> percentile_28
Processing subtest_33_score -> percentile_33
Processing subtest_30_score -> percentile_30
Processing subtest_27_score -> percentile_27
Processing subtest_32_score -> percentile_32
Processing subtest_38_score -> percentile_38
Processing subtest_37_score -> percentile_37
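
Note on the step above: the script's source is not part of this log, so the following is only a minimal sketch of how a within-age-bin percentile rank can be computed. It assumes ties receive their average rank and that the percentile is 100 * rank / group size, which is consistent with the logged values (for example, minimum percentiles that look like 100 divided by the group size). The function name and the toy data are hypothetical. In pandas the same idea is usually written as df.groupby("age_bin")[col].rank(pct=True) * 100.

```python
from collections import defaultdict


def percentile_ranks_within_groups(groups, scores):
    """Return 100 * average_rank / group_size for each score, ranked inside its own group.

    Assumption: tied scores share the average of the ranks they occupy.
    """
    members_by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        members_by_group[group].append(idx)

    result = [0.0] * len(scores)
    for members in members_by_group.values():
        ordered = sorted(members, key=lambda i: scores[i])
        n = len(ordered)
        pos = 0
        while pos < n:
            # Extend `end` over the run of scores tied with the one at `pos`.
            end = pos
            while end + 1 < n and scores[ordered[end + 1]] == scores[ordered[pos]]:
                end += 1
            # Ranks are 1-based, so the tied run occupies ranks pos+1 .. end+1.
            avg_rank = (pos + 1 + end + 1) / 2
            for k in range(pos, end + 1):
                result[ordered[k]] = 100.0 * avg_rank / n
            pos = end + 1
    return result


# Hypothetical toy data: two age bins, one tie in the first bin.
groups = ["18-29", "18-29", "18-29", "18-29", "50-59", "50-59"]
scores = [10.0, 20.0, 20.0, 30.0, 7.0, 9.0]
pcts = percentile_ranks_within_groups(groups, scores)
print(pcts)
```

With average-rank ties, a group whose top score is shared by several participants never reaches 100, which would explain maxima such as 92.72 in the quality control output.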

Dataframe with percentiles - First 2 rows:
   user_id  test_run_id  age  ... percentile_32  percentile_38 percentile_37
0    68983       251259   50  ...     84.976526      40.610329     67.840376
1   106315       614129   23  ...     52.839117      33.911672     60.410095

[2 rows x 32 columns]
Performing quality control checks...
Quality control for subtest_36_score (percentile column: percentile_36)
  50-59: Range 0.47 - 92.72
  50-59: KS statistic = 0.1056, p-value = 0.0160, uniform = False
  18-29: Range 0.32 - 90.06
  18-29: KS statistic = 0.1325, p-value = 0.0000, uniform = False
  60-69: Range 1.09 - 92.03
  60-69: KS statistic = 0.1123, p-value = 0.0568, uniform = True
  40-49: Range 0.59 - 91.18
  40-49: KS statistic = 0.1206, p-value = 0.0130, uniform = False
  30-39: Range 0.70 - 91.31
  30-39: KS statistic = 0.1291, p-value = 0.0015, uniform = False
  70-99: Range 3.12 - 96.88
  70-99: KS statistic = 0.1406, p-value = 0.5069, uniform = True
Quality control for subtest_39_score (percentile column: percentile_39)
  50-59: Range 0.47 - 99.53
  50-59: KS statistic = 0.0634, p-value = 0.3446, uniform = True
  18-29: Range 0.32 - 100.00
  18-29: KS statistic = 0.0568, p-value = 0.2489, uniform = True
  60-69: Range 0.72 - 99.64
  60-69: KS statistic = 0.0471, p-value = 0.9051, uniform = True
  40-49: Range 0.59 - 100.00
  40-49: KS statistic = 0.0706, p-value = 0.3489, uniform = True
  30-39: Range 0.47 - 100.00
  30-39: KS statistic = 0.0587, p-value = 0.4389, uniform = True
  70-99: Range 3.12 - 100.00
  70-99: KS statistic = 0.0625, p-value = 0.9988, uniform = True
Quality control for subtest_40_score (percentile column: percentile_40)
  50-59: Range 0.47 - 99.77
  50-59: KS statistic = 0.0305, p-value = 0.9854, uniform = True
  18-29: Range 0.32 - 100.00
  18-29: KS statistic = 0.0347, p-value = 0.8268, uniform = True
  60-69: Range 0.72 - 99.64
  60-69: KS statistic = 0.0399, p-value = 0.9746, uniform = True
  40-49: Range 0.59 - 100.00
  40-49: KS statistic = 0.0500, p-value = 0.7697, uniform = True
  30-39: Range 0.47 - 100.00
  30-39: KS statistic = 0.0376, p-value = 0.9135, uniform = True
  70-99: Range 3.12 - 100.00
  70-99: KS statistic = 0.0625, p-value = 0.9988, uniform = True
Quality control for subtest_29_score (percentile column: percentile_29)
  50-59: Range 0.47 - 100.00
  50-59: KS statistic = 0.0610, p-value = 0.3900, uniform = True
  18-29: Range 0.32 - 100.00
  18-29: KS statistic = 0.0568, p-value = 0.2489, uniform = True
  60-69: Range 0.72 - 99.64
  60-69: KS statistic = 0.0616, p-value = 0.6487, uniform = True
  40-49: Range 0.59 - 99.71
  40-49: KS statistic = 0.0735, p-value = 0.3020, uniform = True
  30-39: Range 1.17 - 100.00
  30-39: KS statistic = 0.0493, p-value = 0.6600, uniform = True
  70-99: Range 3.12 - 100.00
  70-99: KS statistic = 0.0938, p-value = 0.9164, uniform = True
Quality control for subtest_28_score (percentile column: percentile_28)
  50-59: Range 2.58 - 99.06
  50-59: KS statistic = 0.1901, p-value = 0.0000, uniform = False
  18-29: Range 1.10 - 100.00
  18-29: KS statistic = 0.1924, p-value = 0.0000, uniform = False
  60-69: Range 4.35 - 97.46
  60-69: KS statistic = 0.2246, p-value = 0.0000, uniform = False
  40-49: Range 3.53 - 99.71
  40-49: KS statistic = 0.1824, p-value = 0.0000, uniform = False
  30-39: Range 0.47 - 99.77
  30-39: KS statistic = 0.1808, p-value = 0.0000, uniform = False
  70-99: Range 4.69 - 96.88
  70-99: KS statistic = 0.2188, p-value = 0.0798, uniform = True
Quality control for subtest_33_score (percentile column: percentile_33)
  50-59: Range 2.35 - 97.89
  50-59: KS statistic = 0.2230, p-value = 0.0000, uniform = False
  18-29: Range 0.63 - 100.00
  18-29: KS statistic = 0.1845, p-value = 0.0000, uniform = False
  60-69: Range 5.07 - 98.91
  60-69: KS statistic = 0.1957, p-value = 0.0000, uniform = False
  40-49: Range 0.88 - 94.41
  40-49: KS statistic = 0.1882, p-value = 0.0000, uniform = False
  30-39: Range 1.17 - 99.06
  30-39: KS statistic = 0.2042, p-value = 0.0000, uniform = False
  70-99: Range 7.81 - 100.00
  70-99: KS statistic = 0.2344, p-value = 0.0498, uniform = False
Quality control for subtest_30_score (percentile column: percentile_30)
  50-59: Range 1.88 - 100.00
  50-59: KS statistic = 0.0657, p-value = 0.3029, uniform = True
  18-29: Range 0.47 - 100.00
  18-29: KS statistic = 0.0552, p-value = 0.2785, uniform = True
  60-69: Range 1.09 - 99.64
  60-69: KS statistic = 0.0761, p-value = 0.3822, uniform = True
  40-49: Range 0.59 - 99.12
  40-49: KS statistic = 0.0794, p-value = 0.2219, uniform = True
  30-39: Range 0.94 - 100.00
  30-39: KS statistic = 0.0563, p-value = 0.4909, uniform = True
  70-99: Range 7.81 - 100.00
  70-99: KS statistic = 0.0781, p-value = 0.9812, uniform = True
Quality control for subtest_27_score (percentile column: percentile_27)
  50-59: Range 1.64 - 99.77
  50-59: KS statistic = 0.2089, p-value = 0.0000, uniform = False
  18-29: Range 0.32 - 98.11
  18-29: KS statistic = 0.1893, p-value = 0.0000, uniform = False
  60-69: Range 2.90 - 98.55
  60-69: KS statistic = 0.1920, p-value = 0.0001, uniform = False
  40-49: Range 1.76 - 99.71
  40-49: KS statistic = 0.2059, p-value = 0.0000, uniform = False
  30-39: Range 0.70 - 98.12
  30-39: KS statistic = 0.2042, p-value = 0.0000, uniform = False
  70-99: Range 7.81 - 100.00
  70-99: KS statistic = 0.2031, p-value = 0.1236, uniform = True
Quality control for subtest_32_score (percentile column: percentile_32)
  50-59: Range 0.47 - 100.00
  50-59: KS statistic = 0.0141, p-value = 1.0000, uniform = True
  18-29: Range 0.32 - 100.00
  18-29: KS statistic = 0.0110, p-value = 1.0000, uniform = True
  60-69: Range 0.72 - 100.00
  60-69: KS statistic = 0.0217, p-value = 1.0000, uniform = True
  40-49: Range 0.59 - 100.00
  40-49: KS statistic = 0.0147, p-value = 1.0000, uniform = True
  30-39: Range 0.47 - 100.00
  30-39: KS statistic = 0.0164, p-value = 1.0000, uniform = True
  70-99: Range 3.12 - 100.00
  70-99: KS statistic = 0.0469, p-value = 1.0000, uniform = True
Quality control for subtest_38_score (percentile column: percentile_38)
  50-59: Range 0.47 - 100.00
  50-59: KS statistic = 0.0446, p-value = 0.7733, uniform = True
  18-29: Range 0.63 - 99.05
  18-29: KS statistic = 0.0315, p-value = 0.9006, uniform = True
  60-69: Range 0.72 - 100.00
  60-69: KS statistic = 0.0435, p-value = 0.9464, uniform = True
  40-49: Range 1.18 - 100.00
  40-49: KS statistic = 0.0353, p-value = 0.9790, uniform = True
  30-39: Range 0.47 - 99.06
  30-39: KS statistic = 0.0376, p-value = 0.9135, uniform = True
  70-99: Range 4.69 - 100.00
  70-99: KS statistic = 0.0781, p-value = 0.9812, uniform = True
Quality control for subtest_37_score (percentile column: percentile_37)
  50-59: Range 0.47 - 95.54
  50-59: KS statistic = 0.0869, p-value = 0.0757, uniform = True
  18-29: Range 0.32 - 93.69
  18-29: KS statistic = 0.1057, p-value = 0.0015, uniform = False
  60-69: Range 0.72 - 96.38
  60-69: KS statistic = 0.0906, p-value = 0.1953, uniform = True
  40-49: Range 0.59 - 92.06
  40-49: KS statistic = 0.1029, p-value = 0.0507, uniform = True
  30-39: Range 0.47 - 92.96
  30-39: KS statistic = 0.1221, p-value = 0.0032, uniform = False
  70-99: Range 3.12 - 100.00
  70-99: KS statistic = 0.1406, p-value = 0.5069, uniform = True
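
Note on the checks above: a minimal sketch of a one-sample Kolmogorov-Smirnov check of percentiles against a Uniform(0, 100) reference, assuming the "uniform" flag means p-value > 0.05 (the logged flags are consistent with that threshold). The function names are hypothetical, and the p-value here uses the asymptotic Kolmogorov series, so it will not exactly reproduce the logged p-values, which presumably come from a library routine.

```python
import math


def ks_statistic_uniform(percentiles, low=0.0, high=100.0):
    """One-sample KS statistic of the sample against Uniform(low, high)."""
    xs = sorted(percentiles)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = min(1.0, max(0.0, (x - low) / (high - low)))
        # Compare the reference CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d
    if lam == 0.0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def looks_uniform(percentiles, alpha=0.05):
    """Assumed decision rule: flag as uniform when the p-value exceeds alpha."""
    d = ks_statistic_uniform(percentiles)
    return ks_pvalue_asymptotic(d, len(percentiles)) > alpha


# Hypothetical toy samples: evenly spread percentiles vs. heavily tied ones.
d_spread = ks_statistic_uniform([25.0, 50.0, 75.0, 100.0])
d_tied = ks_statistic_uniform([50.0, 50.0, 50.0, 50.0])
p_tied = ks_pvalue_asymptotic(d_tied, 4)
print(d_spread, d_tied, p_tied)
```

Heavily tied raw scores produce tied percentiles and therefore a larger KS statistic, so a "uniform = False" result on a subtest with few distinct score values may reflect discreteness of the scores and not a problem in the ranking step.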

Quality control results:
   age_bin subtest_id  ks_statistic  ks_p_value  distribution_uniform
0    50-59         36      0.105634    0.015957                 False
1    18-29         36      0.132492    0.000026                 False
2    60-69         36      0.112319    0.056809                  True
3    40-49         36      0.120588    0.013020                 False
4    30-39         36      0.129108    0.001484                 False
..     ...        ...           ...         ...                   ...
61   18-29         37      0.105678    0.001548                 False
62   60-69         37      0.090580    0.195301                  True
63   40-49         37      0.102941    0.050671                  True
64   30-39         37      0.122066    0.003181                 False
65   70-99         37      0.140625    0.506941                  True

[66 rows x 5 columns]
Saved main output to: outputs/step2_percentile_rankings_20250707_114353.csv
Saved quality control results to: outputs/step2_quality_control_20250707_114353.csv

Final summary:
Total participants processed: 1083
Age bins represented: 6
Subtests processed: 11
Percentile columns created: 11

Percentile column ranges:
percentile_36: 0.32 - 96.88
percentile_39: 0.32 - 100.00
percentile_40: 0.32 - 100.00
percentile_29: 0.32 - 100.00
percentile_28: 0.47 - 100.00
percentile_33: 0.63 - 100.00
percentile_30: 0.47 - 100.00
percentile_27: 0.32 - 100.00
percentile_32: 0.32 - 100.00
percentile_38: 0.47 - 100.00
percentile_37: 0.32 - 100.00
Finished execution

