Starting Step 5: Interaction Regression Analysis
==================================================
Timestamp: 20250707_123144
Found latest file: outputs/step3_heterogeneity_metrics_20250707_115126.csv
Loading data from: outputs/step3_heterogeneity_metrics_20250707_115126.csv
Successfully loaded data with shape: (1083, 34)
All columns in dataset:
['user_id', 'age', 'gender', 'education_level', 'country', 'test_run_id', 'battery_id', 'time_of_day', 'grand_index', 'subtest_36_score', 'subtest_39_score', 'subtest_40_score', 'subtest_29_score', 'subtest_28_score', 'subtest_33_score', 'subtest_30_score', 'subtest_27_score', 'subtest_32_score', 'subtest_38_score', 'subtest_37_score', 'age_bin', 'percentile_36', 'percentile_39', 'percentile_40', 'percentile_29', 'percentile_28', 'percentile_33', 'percentile_30', 'percentile_27', 'percentile_32', 'percentile_38', 'percentile_37', 'percentile_range', 'percentile_iqr']

First 3 rows of data:
   user_id  age gender  ...  percentile_37 percentile_range  percentile_iqr
0    68983   50      m  ...      67.840376        44.366197       18.779343
1   106315   23      m  ...      60.410095        76.025237       32.255521
2   334338   60      m  ...      27.898551        65.217391       42.572464

[3 rows x 34 columns]

Unique values for key categorical variables:
Gender: ['m' 'f']
Education level: [6 8 4 1 2 3 7 5]
Country: ['US' 'NZ' 'AU' 'CA']
Time of day: [13 19  5 20  6 22  7  8 10  9 11 15 12 23 16 21 18 17  4  0 14  2  1  3]
Age bin: ['50-59' '18-29' '60-69' '40-49' '30-39' '70-99']

Data types for key variables:
Age: int64
Percentile range: float64
Percentile IQR: float64
Creating age groups...
Age group distribution:
age_group
Younger    530
Older      383
Middle     170
Name: count, dtype: int64
Creating education groups...
Education group distribution:
education_group
Medium    704
High      275
Low       104
Name: count, dtype: int64
Cleaning data for regression analysis...
Initial dataset size: 1083
Checking for missing values in key variables:
age: 0 missing values
gender: 0 missing values
education_level: 0 missing values
country: 0 missing values
time_of_day: 0 missing values
percentile_range: 0 missing values
percentile_iqr: 0 missing values
age_group: 0 missing values
education_group: 0 missing values
Checking for missing values in key variables (including binned time_of_day):
age: 0 missing values
gender: 0 missing values
education_level: 0 missing values
country: 0 missing values
time_of_day_binned: 0 missing values
percentile_range: 0 missing values
percentile_iqr: 0 missing values
age_group: 0 missing values
education_group: 0 missing values
Excluded 0 rows due to missing values
Final dataset size: 1083
Data cleaning completed successfully
Proceeding with 1083 observations

==================================================
MODEL 1: PERCENTILE RANGE
==================================================
Running Model_1_Range regression model...
Model formula: percentile_range ~ C(education_group, Treatment('Low')) * C(age_group, Treatment('Younger')) + C(gender) + C(country) + C(time_of_day_binned)
Model fitted successfully
R-squared: 0.0129
Adjusted R-squared: -0.0010
Extracted 16 coefficients
Interaction terms found: 4
Testing assumptions for Model_1_Range...
Testing normality of residuals...
Shapiro-Wilk test: statistic=0.9564, p=0.0000
Testing homoscedasticity...
Breusch-Pagan test: statistic=12.6149, p=0.6320
Testing independence of residuals...
Durbin-Watson test: statistic=1.9343
Testing multicollinearity (two-step VIF)...
Maximum VIF (Main Effects): 6.0768
Maximum VIF (Full Model): 12.4701
Assumption testing completed for Model_1_Range
Extracting model fit statistics for Model_1_Range...
Model fit statistics extracted successfully
Model 1 completed successfully

==================================================
MODEL 2: PERCENTILE IQR
==================================================
Running Model_2_IQR regression model...
Model formula: percentile_iqr ~ C(education_group, Treatment('Low')) * C(age_group, Treatment('Younger')) + C(gender) + C(country) + C(time_of_day_binned)
Model fitted successfully
R-squared: 0.0191
Adjusted R-squared: 0.0053
Extracted 16 coefficients
Interaction terms found: 4
Testing assumptions for Model_2_IQR...
Testing normality of residuals...
Shapiro-Wilk test: statistic=0.9956, p=0.0033
Testing homoscedasticity...
Breusch-Pagan test: statistic=14.2570, p=0.5061
Testing independence of residuals...
Durbin-Watson test: statistic=2.0182
Testing multicollinearity (two-step VIF)...
Maximum VIF (Main Effects): 6.0768
Maximum VIF (Full Model): 12.4701
Assumption testing completed for Model_2_IQR
Extracting model fit statistics for Model_2_IQR...
Model fit statistics extracted successfully
Model 2 completed successfully
Combined regression results: 32 rows
Interaction terms found: 8

Interaction term results:
       model_name  ... significant_bonferroni
12  Model_1_Range  ...                  False
13  Model_1_Range  ...                  False
14  Model_1_Range  ...                  False
15  Model_1_Range  ...                  False
28    Model_2_IQR  ...                  False
29    Model_2_IQR  ...                  False
30    Model_2_IQR  ...                  False
31    Model_2_IQR  ...                  False

[8 rows x 5 columns]
Combined assumption test results: 10 rows
Combined model fit results: 2 rows
Saving results to CSV files...
Saved regression results to: outputs/step5_interaction_regression_results_20250707_123144.csv
Saved assumption test results to: outputs/step5_assumption_tests_20250707_123144.csv
Saved model fit results to: outputs/step5_model_fit_20250707_123144.csv
Saved age group data to: outputs/step5_age_group_data_20250707_123144.csv
All results saved successfully

==================================================
ANALYSIS SUMMARY
==================================================
No significant interaction terms found after Bonferroni correction

Assumption test summary:
- Model_1_Range Breusch-Pagan: MET
- Model_1_Range Durbin-Watson: MET
- Model_1_Range Shapiro-Wilk: VIOLATED
- Model_1_Range VIF (Full Model): VIOLATED
- Model_1_Range VIF (Main Effects): MET
- Model_2_IQR Breusch-Pagan: MET
- Model_2_IQR Durbin-Watson: MET
- Model_2_IQR Shapiro-Wilk: VIOLATED
- Model_2_IQR VIF (Full Model): VIOLATED
- Model_2_IQR VIF (Main Effects): MET
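
Note on the MET / VIOLATED flags above: the log does not state the decision rules behind them. The sketch below is one way to reproduce the flags from the logged statistics. It assumes an alpha of 0.05 for Shapiro-Wilk and Breusch-Pagan, a VIF limit of 10, and a Durbin-Watson band of 1.5 to 2.5. These thresholds are common conventions and are assumptions here, not values confirmed by the script; the names `ALPHA`, `VIF_LIMIT`, `DW_BOUNDS`, and `classify` are hypothetical.

```python
# Assumed thresholds (conventional choices, NOT confirmed by the log).
ALPHA = 0.05            # significance level for Shapiro-Wilk and Breusch-Pagan
VIF_LIMIT = 10.0        # maximum VIF above which multicollinearity is flagged
DW_BOUNDS = (1.5, 2.5)  # Durbin-Watson band treated as "no autocorrelation"

# Statistics copied from the log output for each model.
LOGGED = {
    "Model_1_Range": {
        "shapiro_p": 0.0000, "breusch_pagan_p": 0.6320, "durbin_watson": 1.9343,
        "vif_main": 6.0768, "vif_full": 12.4701,
    },
    "Model_2_IQR": {
        "shapiro_p": 0.0033, "breusch_pagan_p": 0.5061, "durbin_watson": 2.0182,
        "vif_main": 6.0768, "vif_full": 12.4701,
    },
}


def flag(condition_met: bool) -> str:
    """Map a boolean check to the labels used in the log summary."""
    return "MET" if condition_met else "VIOLATED"


def classify(stats: dict) -> dict:
    """Apply the assumed decision rules to one model's logged statistics."""
    low, high = DW_BOUNDS
    return {
        # A p-value at or above ALPHA means the null (assumption holds) is not rejected.
        "Shapiro-Wilk": flag(stats["shapiro_p"] >= ALPHA),
        "Breusch-Pagan": flag(stats["breusch_pagan_p"] >= ALPHA),
        "Durbin-Watson": flag(low <= stats["durbin_watson"] <= high),
        "VIF (Main Effects)": flag(stats["vif_main"] < VIF_LIMIT),
        "VIF (Full Model)": flag(stats["vif_full"] < VIF_LIMIT),
    }


if __name__ == "__main__":
    for model_name, stats in LOGGED.items():
        for test_name, result in sorted(classify(stats).items()):
            print(f"- {model_name} {test_name}: {result}")
```

Under these assumed thresholds the sketch yields the same flags as the summary. The full-model VIF includes the interaction terms, which are built from the main-effect dummies, so a higher value there than in the main-effects-only check is expected.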

Interaction regression analysis completed successfully
Finished execution
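
Note on the fit statistics: the adjusted R-squared values reported for both models can be checked by hand from the logged R-squared values. The sketch below assumes 1083 observations (the final dataset size) and 15 predictors, taking the 16 extracted coefficients to be 15 slopes plus an intercept; that split is an assumption, as the log does not list the coefficients. The function name `adjusted_r2` is hypothetical. Inputs are the rounded values from the log, so the results agree only to the printed precision.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 1083        # "Final dataset size" from the log
N_PREDICTORS = 15   # assumed: 16 extracted coefficients minus the intercept

# R-squared values as printed in the log (rounded to four decimals).
print(f"{adjusted_r2(0.0129, N_OBS, N_PREDICTORS):.4f}")  # -0.0010 (Model_1_Range)
print(f"{adjusted_r2(0.0191, N_OBS, N_PREDICTORS):.4f}")  # 0.0053 (Model_2_IQR)
```

A negative adjusted R-squared, as for Model_1_Range, means the predictors explain less variance than would be expected by chance given how many of them there are.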

