Hierarchical Supervised Monte Carlo Ensemble Learning

Ziauddin Ursani, Dmytro Antypov, Katie Atkinson, Judith Clymo, Matthew S. Dyer, Matthew J. Rosseinsky, Sven Schewe, Andrij Vasylenko

Published: 01 Jan 2024, Last Modified: 12 May 2025 · ICMLA 2024 · CC BY-SA 4.0

Abstract: This paper presents hierarchical supervised Monte Carlo ensemble learning (HSMEL), an extension of the theory of probabilistic hierarchical supervised ensemble learning (TPHSEL), which itself evolved from the theory of probabilistic hierarchical supervised learning (TPHSL). The basic idea captured in TPHSL is that a complex model can be replaced with a hierarchy of simple and mathematically understandable models. Such models are amenable to interpretation and are therefore more likely to contribute to explainable AI than black-box models. TPHSL was subsequently advanced to TPHSEL, in which several hierarchical models make a classification decision by majority vote. In this paper, TPHSEL is further advanced to include the notion of a Monte Carlo ensemble. We show that this ensemble is computationally faster and has broader reach over the training examples. The method has been deployed in use cases from materials science, specifically to study the impact of various features on the conductivity of materials. Based on the performance of individual features, a method has been devised that applies set theory to ensemble outcomes to predict the average accuracy that could be achieved if those features were grouped in some way. We argue that this method has the potential to accelerate materials design by providing predictions about machine learning performance parameters without extensive computational effort, and consequently to reduce chemistry lab experimentation. In addition, to show the resilience of HSMEL, we have also applied it to 28 general machine learning datasets, where its performance is compared with classical methods from the literature.
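The majority-vote idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the HSMEL algorithm itself, whose details are not given here. It assumes, purely for illustration, that each "simple model" is a one-feature threshold classifier and that the Monte Carlo step means fitting each model on a random subset of the training examples. All function names, the toy data, and the parameters `n_models` and `subset_size` are hypothetical.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def fit_stump(samples):
    """Fit a one-feature threshold classifier (a stand-in 'simple model').

    samples is a list of (x, label) pairs with label in {0, 1}.
    The stump predicts `polarity` when x >= threshold, else 1 - polarity.
    """
    best = None
    for threshold, _ in samples:
        for polarity in (0, 1):
            correct = sum(
                1
                for x, y in samples
                if (polarity if x >= threshold else 1 - polarity) == y
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity


def monte_carlo_ensemble(samples, n_models=25, subset_size=6, seed=0):
    """Assumed Monte Carlo step: fit each model on a random subset."""
    rng = random.Random(seed)
    return [fit_stump(rng.sample(samples, subset_size)) for _ in range(n_models)]


def predict(ensemble, x):
    """Classify x by majority vote over all models in the ensemble."""
    return majority_vote([model(x) for model in ensemble])


# Toy, linearly separable data (hypothetical): label 1 when x >= 0.6.
data = [(i / 10, int(i >= 6)) for i in range(1, 11)]
ensemble = monte_carlo_ensemble(data)
print(predict(ensemble, 0.05), predict(ensemble, 1.0))
```

Because each model sees only a random subset of the data, individual models are cheap to fit and differ from one another, and the vote aggregates their decisions. How HSMEL builds its hierarchy and samples its ensemble may differ substantially from this sketch.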