Abstract: We present FOLD-SE, an efficient, explainable machine learning algorithm for classification tasks on tabular data containing numerical and categorical values. The explainable model that FOLD-SE generates is represented as a set of default rules. For literal selection, FOLD-SE uses a novel heuristic that we have devised, called Magic Gini Impurity. It also uses a refined data comparison operator and eliminates the long-tail effect. Thanks to these innovations, the explainability provided by FOLD-SE is scalable: regardless of the size of the dataset, the number of learned rules and learned literals stays quite small while good classification accuracy is maintained. Additionally, the rule set constituting the model that FOLD-SE generates does not change significantly if the training data is varied slightly. FOLD-SE is competitive with state-of-the-art traditional machine learning algorithms such as XGBoost and Multi-Layer Perceptrons (MLPs) with respect to prediction accuracy, while being an order of magnitude faster. Unlike XGBoost and MLPs, however, FOLD-SE generates explainable models. FOLD-SE outperforms prior rule-learning algorithms such as RIPPER in efficiency, performance, and scalability, especially on large datasets, and it generates far fewer rules than earlier algorithms that learn default rules.
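To make the idea of impurity-driven literal selection concrete, the following is a minimal sketch in Python. The abstract does not define Magic Gini Impurity, so the sketch uses the standard weighted Gini impurity as a stand-in; it should not be read as the FOLD-SE heuristic itself. The function names (`gini`, `weighted_gini`, `best_literal`) and the restriction to numerical literals of the form `feature <= threshold` are assumptions made purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, labels, feature, threshold):
    """Size-weighted impurity of the split induced by `feature <= threshold`.

    Hypothetical helper: this is the textbook formulation, not the
    Magic Gini Impurity variant named in the abstract.
    """
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_literal(rows, labels, feature):
    """Return the threshold whose literal `feature <= t` has lowest impurity."""
    candidates = sorted({x[feature] for x in rows})
    return min(candidates, key=lambda t: weighted_gini(rows, labels, feature, t))


# Toy data, invented for this illustration only.
rows = [{"age": 20}, {"age": 25}, {"age": 40}, {"age": 50}]
labels = [0, 0, 1, 1]
print(best_literal(rows, labels, "age"))
```

On this toy data the literal `age <= 25` separates the two classes perfectly, so it is the one selected. A rule learner would repeat this kind of selection over all features and value types, which the sketch leaves out.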