The Theory of Probabilistic Hierarchical Supervised Ensemble Learning

Ziauddin Ursani; Dmytro Antypov; Katie Atkinson; Judith Clymo; Matthew S. Dyer; Matthew J. Rosseinsky; Sven Schewe; Andrij Vasylenko

Published: 01 Jan 2024, Last Modified: 12 May 2025. Venue: ICMLA 2024. License: CC BY-SA 4.0.

Abstract: This paper presents the theory of probabilistic hierarchical supervised ensemble learning (TPHSEL), a classification approach we developed to classify candidates for material selection while keeping the results interpretable to some degree. We found that TPHSEL is a competitive classifier, both for our target application and for a broader range of standard datasets, where it outperformed support vector machines, random forests, and optimal classification trees. The materials-science dataset for which we developed the method is small (405 entries), which leads to relatively low accuracy (81 % to 82 %) for both our method and a previously applied deep learning approach. Within this setting, we found that selecting only candidates with a large vote share retained close to 20 % of the candidate materials, and within this bracket, accuracy and the other model performance metrics exceed 0.95. This is excellent news for prioritising experimental targets (and related tasks), as it indicates that promising candidates can be identified even from data that does not yet support accurate classification across the board.
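The vote-share selection mentioned in the abstract can be illustrated with a minimal sketch, assuming a plain ensemble in which each member casts one class vote per candidate. This is not the TPHSEL algorithm itself (its hierarchical, probabilistic structure is not described here); it only shows how filtering by vote share trades coverage for accuracy. The function name `vote_share_selection`, the `threshold` parameter, and the toy votes and labels are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter
from typing import Sequence, Tuple


def vote_share_selection(
    votes: Sequence[Sequence[int]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Hypothetical helper: keep candidates whose majority vote share >= threshold.

    votes[i] lists the class predicted by each ensemble member for candidate i.
    Returns (coverage, accuracy): the fraction of candidates kept, and the
    accuracy of the majority-vote prediction on the kept candidates only.
    """
    kept = 0
    correct = 0
    for member_votes, true_label in zip(votes, labels):
        # Tally the votes; on ties, most_common keeps the class seen first.
        predicted, top_count = Counter(member_votes).most_common(1)[0]
        share = top_count / len(member_votes)
        if share >= threshold:
            kept += 1
            correct += int(predicted == true_label)
    coverage = kept / len(labels)
    accuracy = correct / kept if kept else float("nan")
    return coverage, accuracy


# Toy, illustrative data only: 4 candidates, 5 ensemble members each.
votes = [
    [1, 1, 1, 1, 1],  # share 1.0, predicts 1
    [0, 0, 0, 0, 1],  # share 0.8, predicts 0
    [1, 0, 1, 0, 1],  # share 0.6, predicts 1
    [0, 1, 0, 1, 1],  # share 0.6, predicts 1
]
labels = [1, 0, 0, 1]

print(vote_share_selection(votes, labels, 0.5))   # (1.0, 0.75): all kept
print(vote_share_selection(votes, labels, 0.75))  # (0.5, 1.0): only confident ones
```

Raising the threshold shrinks the selected bracket while raising accuracy within it, which is the kind of trade-off the abstract reports (close to 20 % of candidates retained, with metrics above 0.95).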
