Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing
TL;DR: This paper proposes a randomization inference framework using conformal inference to selectively borrow external controls, ensuring exact type I error control and valid post-selection inference, with application to a lung cancer trial.
Abstract: External controls from historical trials or observational data can augment randomized controlled trials when large-scale randomization is impractical or unethical, such as in drug evaluation for rare diseases. However, non-randomized external controls can introduce biases, and existing Bayesian and frequentist methods may inflate the type I error rate, particularly in small-sample trials where external data borrowing is most critical. To address these challenges, we propose a randomization inference framework that ensures finite-sample exact and model-free type I error rate control, adhering to the “analyze as you randomize” principle to safeguard against hidden biases. Recognizing that biased external controls reduce the power of randomization tests, we leverage conformal inference to develop an individualized test-then-pool procedure that selectively borrows comparable external controls to improve power. Our approach incorporates selection uncertainty into randomization tests, providing valid post-selection inference. Additionally, we propose an adaptive procedure to optimize the selection threshold by minimizing the mean squared error across a class of estimators encompassing both no-borrowing and full-borrowing approaches. The proposed methods are supported by non-asymptotic theoretical analysis, validated through simulations, and applied to a randomized lung cancer trial that integrates external controls from the National Cancer Database.
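The core selection step described above can be illustrated with a minimal sketch: compute a conformal p-value for each external control against the concurrent (RCT) control arm, and borrow only those that look exchangeable. This is an assumption-laden toy version — the nonconformity score (absolute deviation from the RCT control mean), the function name, and the threshold are all illustrative, not the paper's exact covariate-adjusted procedure.

```python
import numpy as np

def conformal_selective_borrow(y_rct_control, y_external, threshold=0.1):
    """Toy selective borrowing via split-conformal p-values.

    Hypothetical score: absolute deviation from the RCT control mean.
    The actual method would use covariate-adjusted scores and embed the
    selection step inside a randomization test.
    """
    y_rct_control = np.asarray(y_rct_control, dtype=float)
    mu = y_rct_control.mean()
    cal_scores = np.abs(y_rct_control - mu)   # calibration nonconformity scores
    n = len(cal_scores)
    selected = []
    for j, y in enumerate(y_external):
        score = abs(y - mu)
        # conformal p-value: rank of the new score among calibration scores
        p = (1 + np.sum(cal_scores >= score)) / (n + 1)
        if p > threshold:                     # comparable enough to borrow
            selected.append(j)
    return selected

# Example: one comparable external control, one clearly shifted one
rct = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, -0.05, 0.15, -0.15, 0.0]
print(conformal_selective_borrow(rct, [0.05, 5.0]))
```

A shifted external outcome yields a small conformal p-value and is excluded, while a comparable one is retained; the paper's adaptive procedure would additionally tune `threshold` by minimizing an estimated mean squared error.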
Lay Summary: Clinical trials are the gold standard for testing new treatments, but they can be difficult to conduct when the disease is rare or patient enrollment is limited. In such cases, researchers often look to external data, such as historical trials or medical records, to strengthen the analysis. However, because this external data is not randomized, using it without caution can introduce bias and lead to misleading conclusions.
We developed a new statistical method that carefully evaluates whether and how external data should be incorporated into a trial. It combines two ideas: randomization inference, which provides valid conclusions even in small trials, and conformal prediction, which identifies external data points that are comparable to the enrolled participants. By selectively borrowing comparable data, our method improves accuracy while protecting against hidden biases. It also includes an automatic tuning step that determines how much external information to borrow in each case.
This method strengthens underpowered trials by making effective use of existing data while preserving validity. In a real-world lung cancer study, it improved the robustness and precision of treatment evaluations, demonstrating its potential to advance clinical research where conventional trials face practical limitations.
Primary Area: General Machine Learning->Causality
Keywords: causal inference, data fusion, randomization test, real-world data and evidence, small sample size
Flagged For Ethics Review: true
Submission Number: 907