This folder contains all code necessary to reproduce the numerical results presented in the paper "A Computationally Efficient Case-Control Sampling Framework for G-Formula with Longitudinal Data" submitted to ICLR 2026.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% File Descriptions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
*updatedMatchedGformula.R*
Contains all functions used to implement the NICE/ICE estimators based on logistic regression (LR), for either complete data or case-control matched data.

*jobLR.sh*
The bash script used to submit simulation jobs for LR to the high-performance computing (HPC) environment.

*0_ParameterSetup.R*
Specifies the simulation parameter settings. The paper considers three levels of outcome rarity, corresponding to setup = 5, 6, and 7.

*1_TrueACE.R*
Estimates the true risks under the defined treatment strategies (always treated vs. never treated) using Monte Carlo simulation.

*2_DataGenerating.R*
Generates synthetic datasets based on the parameter configurations defined in *0_ParameterSetup.R*. All generated datasets are stored in the folder "Data_Bootstrap" so that they do not need to be regenerated.

*5_dataFittingBootsrapSingleRun_varyingJ.R*
Performs a single bootstrap resample and estimates risks using the NICE/ICE estimators on both complete and case-control datasets. All three simulation setups can be run by modifying the parameter setup within the script.

*8_Aggregate_LR.R*
Aggregates results from all bootstrap samples.

*9_Summary_LR.R*
Generates tables and figures reported in the paper.
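
As a rough illustration of the Monte Carlo idea behind *1_TrueACE.R*, the sketch below simulates a large population under each treatment strategy and averages the outcome. The data-generating model, the coefficients, and the function name simulate_risk are hypothetical placeholders chosen for this example; they are not the settings defined in *0_ParameterSetup.R*.

```r
# Hypothetical two-period model (NOT the one used in the paper):
# a binary covariate at each time point and a binary end-of-follow-up outcome.
# The treatment value `a` is fixed at both time points by the strategy.
simulate_risk <- function(a, n = 1e5, seed = 1) {
  set.seed(seed)
  L1 <- rbinom(n, 1, 0.5)                                  # baseline covariate
  L2 <- rbinom(n, 1, plogis(-0.5 + 0.8 * L1 - 0.6 * a))    # affected by earlier treatment
  Y  <- rbinom(n, 1, plogis(-3 + 0.5 * L1 + 0.5 * L2 - 1.4 * a))
  mean(Y)                                                  # Monte Carlo risk estimate
}

risk_always <- simulate_risk(a = 1)   # always treated
risk_never  <- simulate_risk(a = 0)   # never treated
risk_diff   <- risk_always - risk_never
```

Increasing n reduces the Monte Carlo error of the two risk estimates, at the cost of longer run time.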


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% Remark for Reproducing the Results
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

To improve computational efficiency, we utilized a high-performance computing (HPC) environment to carry out the simulation studies. For each setup, we generated 100 independent datasets, which are stored in the local folder "Data_Bootstrap".

For each setup and each dataset, we ran 100 bootstrap iterations using different random seeds. Each job loaded one dataset, performed the bootstrap resampling with one seed, and saved the resulting estimates.
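
The job layout described above (3 setups, 100 datasets per setup, 100 seeds per dataset) can be sketched as a lookup from a task index to one combination. This is only an assumed layout: the function name job_for_task, the seed values 1 to 100, the ordering of the grid, and the placeholder sample size are hypothetical and may differ from what *jobLR.sh* and *5_dataFittingBootsrapSingleRun_varyingJ.R* actually do.

```r
# Hypothetical job grid: one row per (setup, dataset, seed) combination.
setups <- c(5, 6, 7)   # the three outcome-rarity levels
n_data <- 100          # datasets per setup
n_boot <- 100          # bootstrap seeds per dataset (assumed to be 1..100)

grid <- expand.grid(seed    = seq_len(n_boot),
                    dataset = seq_len(n_data),
                    setup   = setups)

# Map a 1-based task index (e.g. an HPC array index) to one combination.
job_for_task <- function(task_id) {
  stopifnot(task_id >= 1, task_id <= nrow(grid))
  grid[task_id, ]
}

job <- job_for_task(101)

# One bootstrap resample: draw subject indices with replacement.
set.seed(job$seed)
n_subjects <- 1000     # placeholder sample size
boot_ids   <- sample.int(n_subjects, n_subjects, replace = TRUE)
```

Keeping the seed as an explicit column of the grid makes each job reproducible on its own, independent of the order in which jobs are scheduled.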

The results were aggregated by loading the .txt files corresponding to each setup. For ease of reproduction, the aggregated results have been saved directly in "Results_LR_Summary/LR_summary.txt". The script *9_Summary_LR.R* was then used to generate the figures and tables presented in the paper.
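
The aggregation step can be illustrated with a small self-contained sketch. The file naming pattern, the column names, and the numbers written below are made up for the demonstration; the actual per-job output format is defined by the scripts in this folder and may differ.

```r
# Write a few hypothetical per-job result files to a temporary folder.
out_dir <- file.path(tempdir(), "results_demo")
unlink(out_dir, recursive = TRUE)          # start from an empty folder
dir.create(out_dir, showWarnings = FALSE)
for (seed in 1:3) {
  res <- data.frame(setup = 5, dataset = 1, seed = seed,
                    risk_treated = 0.02 + seed / 1000)   # placeholder values
  write.table(res,
              file.path(out_dir, sprintf("setup5_data1_seed%d.txt", seed)),
              row.names = FALSE, quote = FALSE)
}

# Aggregate: read every .txt file for one setup and stack the rows.
files <- list.files(out_dir, pattern = "^setup5_.*\\.txt$", full.names = TRUE)
agg   <- do.call(rbind, lapply(files, read.table, header = TRUE))

# Example summary across bootstrap seeds.
summary_row <- c(mean = mean(agg$risk_treated), sd = sd(agg$risk_treated))
```

Filtering by file name pattern keeps the setups separate, so each setup can be aggregated and summarized independently.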
