Generalizability of Experimental Studies

07 Feb 2026 (modified: 03 Apr 2026) · Under review for TMLR · CC BY 4.0

Abstract: Experimental studies are a cornerstone of Machine Learning (ML) research. A common, often implicit, assumption is that a study's results generalize beyond the study itself, e.g., to new data; that is, repeating the same study under different conditions will likely yield similar results. Existing frameworks for measuring generalizability, borrowed from the causal inference literature, cannot capture the complexity of the results and research questions of an ML study. Measuring generalizability in the more general ML setting therefore remains an open problem, partly because experimental studies lack a mathematical formalization. In this paper, we propose such a formalization, use it to develop a framework for quantifying generalizability, and propose an instantiation based on rankings and the Maximum Mean Discrepancy (MMD). The definition we propose is suited to computing the generalizability of experimental results within a bounded range of levels. We show how our framework offers insights into the number of experiments necessary for a generalizable study, and how experimenters can benefit from it. Finally, we release the genexpy Python package, which can be used to evaluate the generalizability of other experimental studies.
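The instantiation named in the abstract, the Maximum Mean Discrepancy between samples of rankings, can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the genexpy API or the authors' exact definition. The Mallows-type kernel exp(-ν·d/C(m,2)) on discordant pairs, the biased (V-statistic) MMD estimator, and the helper names `discordant_pairs`, `mallows_kernel`, `mmd`, and `estimate_generalizability` are all hypothetical choices made for illustration. Generalizability is read here as the probability that two disjoint size-n subsamples drawn from a pool of observed rankings lie within MMD ε of each other, and that probability is estimated by resampling.

```python
import itertools
import math
import random
from typing import Callable, Sequence

# A ranking of m alternatives: ranking[i] is the rank of alternative i (0 = best).
Ranking = tuple[int, ...]


def discordant_pairs(r1: Ranking, r2: Ranking) -> int:
    """Count pairs of alternatives that the two rankings order in opposite ways."""
    count = 0
    for i, j in itertools.combinations(range(len(r1)), 2):
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0:  # ties (product 0) are not counted
            count += 1
    return count


def mallows_kernel(r1: Ranking, r2: Ranking, nu: float = 1.0) -> float:
    """Assumed Mallows-type kernel: exp(-nu * normalized number of discordant pairs)."""
    n_pairs = max(1, math.comb(len(r1), 2))  # guard against a single alternative
    return math.exp(-nu * discordant_pairs(r1, r2) / n_pairs)


def mmd(
    xs: Sequence[Ranking],
    ys: Sequence[Ranking],
    kernel: Callable[[Ranking, Ranking], float] = mallows_kernel,
) -> float:
    """Biased (V-statistic) MMD estimate between two samples of rankings."""

    def mean_k(a: Sequence[Ranking], b: Sequence[Ranking]) -> float:
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    squared = mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)
    return math.sqrt(max(squared, 0.0))  # clip tiny negative rounding errors


def estimate_generalizability(
    results: Sequence[Ranking], n: int, eps: float, reps: int = 1000, seed: int = 0
) -> float:
    """Monte Carlo estimate of P(MMD(X, Y) <= eps) for disjoint size-n subsamples X, Y."""
    if 2 * n > len(results):
        raise ValueError("need at least 2n results to draw two disjoint samples")
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(list(results), 2 * n)
        if mmd(sample[:n], sample[n:]) <= eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Toy pool of rankings of 3 alternatives (illustrative data, not from the paper).
    pool = [(0, 1, 2)] * 12 + [(1, 0, 2)] * 6 + [(2, 1, 0)] * 2
    for n in (2, 5, 10):
        print(n, estimate_generalizability(pool, n=n, eps=0.2, reps=200))
```

Increasing n and checking when the estimate crosses a target probability is one way to reason about how many experiments a study needs; the procedure in the paper and in genexpy may differ from this sketch.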

Submission Type: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=j1ZtWdWn7u

Changes Since Last Submission: We made changes to address the clarity concerns raised about our last submission and added a new contribution. Specifically:
1. We shortened the title.
2. We reworked Figures 1 and 2 as well as Algorithm 1.
3. We simplified and streamlined Section 3, removing unnecessary details in the formalization that obscured the core contribution.
4. We expanded the case studies to include kernels for numerical results.
5. We added a discussion of how generalizability and significance are orthogonal concepts.
6. We added a new contribution: a closed-form approximation of the CDF of the MMD for the specific case of discrete probability distributions and equal sample sizes, as used in Definition 4.4.
7. We removed Appendix B.

Assigned Action Editor: ~Tom_Rainforth1

Submission Number: 7398

Loading