Unlocking Unlabeled Data: Ensemble Learning with the Hui-Walter Paradigm for Performance Estimation in Online and Static Settings

TMLR Paper 1325 Authors

24 Jun 2023 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: In the realm of machine learning and statistical modeling, practitioners often work under the assumption of accessible, static, labeled data for evaluation and training. However, this assumption often deviates from reality, where data may be private, encrypted, difficult to measure, or unlabeled. In this paper, we bridge this gap by adapting the Hui-Walter paradigm, a method traditionally applied in epidemiology and medicine, to the field of machine learning. This approach enables us to estimate key performance metrics, such as the false positive rate, false negative rate, and class prior, in scenarios where no ground truth is available. We further extend this paradigm to handle online data, opening up new possibilities for dynamic data environments. Our methodology, applied to two diverse datasets, the Wisconsin Breast Cancer dataset and the Adult dataset, involves partitioning each into latent classes to simulate multiple data populations and independently training models to replicate multiple tests. By cross-tabulating binary outcomes across ensemble categorizers and multiple populations, we estimate the unknown parameters through Gibbs sampling, eliminating the need for ground-truth or labeled data. This paper showcases the potential of our methodology to transform machine learning practices by allowing for accurate model assessment under dynamic and uncertain data conditions.
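As a concrete illustration of the cross-tabulation described in the abstract, the sketch below (a hypothetical example, not the authors' code) treats two independently trained binary classifiers as the two "tests" of the Hui-Walter design and tallies their paired outcomes within each of two populations. All names and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of the Hui-Walter cross-tabulation setup: two conditionally
# independent binary "tests" applied to two populations with different
# prevalences, with paired outcomes counted per population.
import numpy as np

rng = np.random.default_rng(0)

def simulate_population(n, prevalence, se=(0.9, 0.8), sp=(0.95, 0.85)):
    """Simulate paired outcomes of two conditionally independent binary tests."""
    truth = rng.random(n) < prevalence
    t1 = np.where(truth, rng.random(n) < se[0], rng.random(n) > sp[0])
    t2 = np.where(truth, rng.random(n) < se[1], rng.random(n) > sp[1])
    return t1.astype(int), t2.astype(int)

def cross_tabulate(t1, t2):
    """Counts [n++, n+-, n-+, n--] of joint test outcomes."""
    return np.array([
        ((t1 == 1) & (t2 == 1)).sum(),
        ((t1 == 1) & (t2 == 0)).sum(),
        ((t1 == 0) & (t2 == 1)).sum(),
        ((t1 == 0) & (t2 == 0)).sum(),
    ])

# Two populations with different prevalences; in the paper's setting these
# would come from partitioning an unlabeled dataset into latent classes.
counts = np.vstack([
    cross_tabulate(*simulate_population(5000, prevalence=0.6)),
    cross_tabulate(*simulate_population(5000, prevalence=0.2)),
])
print(counts)  # one row [n++, n+-, n-+, n--] per population
```

With two tests and two populations, the two 2×2 tables contribute 2 × (4 − 1) = 6 degrees of freedom, exactly matching the six unknowns (two prevalences, two sensitivities, two specificities); this is what makes the parameters estimable without ground-truth labels, provided the prevalences differ across populations.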
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:

Dear Esteemed Reviewers,

We are deeply grateful for your comprehensive feedback on our manuscript. Your insights have not only illuminated areas for enhancement but have undeniably elevated the quality of our research. It is an honor to be in dialogue with scholars of your caliber, and we have endeavored to reflect upon and address each of your pertinent remarks. Kindly find below a summary of our revisions:

**Scope beyond binary classifiers:** We believe binary classification problems are ubiquitous enough in the applied ML space (and multi-class problems can often be reduced to multiple binary classification problems) that we chose to focus the experimental results on this problem type alone. Because we also cover multiple ways to apply the Hui-Walter paradigm (the original method as introduced in 1980 as well as extensions to online learning), we felt that including a multi-class dataset experiment would distract from our overall thesis.

**Readability:** We improved the readability of our paper by
- **Renaming the "Our Contributions" section to "Proposed Method".** The "Proposed Method" introduction outlines the logic and flow of our approach, while the subsequent "Review" and "Proposed" subsections contain background context and approach details, respectively. This also sharpens the distinction between our contribution and existing work.
- **Removing redundant explanations.**
- **Introducing relevant equations sooner and labeling them accordingly.**
- **Increasing font sizes in our figures for legibility.**
- **Correcting erroneous references to tables and figures.**
- **Removing the link to our GitHub repository to preserve anonymity.**

**Clarification of Hui-Walter requirements:** We note in Section 2.2.1 that the Hui-Walter framework can scale to more than 2 populations, more than 2 classifiers, and more than 2 classes. We also note that it works best when classifiers are conditionally independent.

**Interpretation of experimental results:** We have added more interpretation in the corresponding Experimental Results subsections to facilitate comprehension of our results.

**Clarification of Gibbs sampling's role in Hui-Walter:** We have included an explanation in the "Review" subsection on "Gibbs Sampling" describing how it can be used as an alternative to maximum likelihood estimation (MLE) for estimating the parameters of interest (i.e., prevalences, false positive rates, and false negative rates) by assuming each follows a specific distribution (see the sketch after this letter).

**Comparisons to existing methods:** We believe the included comparisons of our proposed method with relevant baselines such as the Adjusted Rand Index and Balanced Accuracy, along with quantitative results such as performance metrics, were sufficient. We have also included a discussion of the advantages and disadvantages of our method compared to these baselines; this comparison appears in the "Experimental Results" section, "Hui-Walter Online" subsection. We understand this deserves more emphasis and have added a brief discussion in the Introduction.

Given your invaluable critique, our manuscript is a more compelling proposition for TMLR. To the reviewer who suggested additional experiments on diverse datasets: could you specify the number and type of datasets you envision? While we earnestly wish to adhere to all recommendations, the experimental facet of our work is intricate and time-intensive. We seek your understanding and guidance on this front.
We extend our warm gratitude for your continued patience and hope our enhancements resonate with your expectations.

With profound respect and anticipation,

Sincerely,
Anonymous Authors
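The Gibbs-sampling point in the letter above can be made concrete with a minimal sketch. The sampler below is an illustrative assumption, not the paper's implementation: it places conjugate Beta(1, 1) priors on the prevalences, sensitivities (1 − false negative rate), and specificities (1 − false positive rate), and alternates between sampling latent true-positive counts per cross-tabulation cell and conjugate Beta parameter updates. It consumes the per-population `counts` array produced by the cross-tabulation sketch after the abstract.

```python
# Minimal Gibbs sampler sketch for the Hui-Walter parameters with two tests,
# using Beta(1, 1) priors; illustrative, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(1)

# Joint outcome patterns (test1, test2) for the four cells [n++, n+-, n-+, n--].
PATTERNS = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])

def gibbs_hui_walter(counts, n_iter=5000, burn_in=1000):
    """counts: (n_populations, 4) array of joint-outcome counts per population."""
    n_pop = counts.shape[0]
    pi = np.full(n_pop, 0.5)      # prevalence per population
    se = np.array([0.9, 0.9])     # sensitivity per test (1 - false negative rate)
    sp = np.array([0.9, 0.9])     # specificity per test (1 - false positive rate)
    draws = []
    for it in range(n_iter):
        # Step 1: sample latent true-positive counts y[p, cell] given parameters.
        like_pos = np.prod(np.where(PATTERNS, se, 1 - se), axis=1)  # P(pattern | positive)
        like_neg = np.prod(np.where(PATTERNS, 1 - sp, sp), axis=1)  # P(pattern | negative)
        prob_pos = pi[:, None] * like_pos / (
            pi[:, None] * like_pos + (1 - pi[:, None]) * like_neg)
        y = rng.binomial(counts, prob_pos)
        # Step 2: conjugate Beta(1, 1) updates given the latent counts.
        pi = rng.beta(1 + y.sum(axis=1), 1 + (counts - y).sum(axis=1))
        for j in range(2):
            pos = PATTERNS[:, j] == 1  # cells where test j returned positive
            se[j] = rng.beta(1 + y[:, pos].sum(), 1 + y[:, ~pos].sum())
            sp[j] = rng.beta(1 + (counts - y)[:, ~pos].sum(),
                             1 + (counts - y)[:, pos].sum())
        if it >= burn_in:
            draws.append(np.concatenate([pi, se, sp]))
    # Posterior means: [pi_1..pi_P, se_1, se_2, sp_1, sp_2].
    return np.asarray(draws).mean(axis=0)

# Usage with the `counts` array from the earlier sketch:
# print(gibbs_hui_walter(counts))
```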
Assigned Action Editor: ~Simon_Lacoste-Julien1
Submission Number: 1325