Neurosymbolic Association Rule Mining from Tabular Data

Published: 29 Aug 2025, Last Modified: 29 Aug 2025 · NeSy 2025 - Phase 2 Poster · CC BY 4.0
Keywords: association rule mining, neurosymbolic artificial intelligence, tabular data, interpretable machine learning
TL;DR: A neurosymbolic association rule mining method for tabular data that learns a concise set of high-quality rules with full data coverage faster than the state of the art and improves downstream task performance.
Abstract: Association Rule Mining (ARM) is the task of mining patterns among data features in the form of logical rules, with applications across a myriad of domains. However, high-dimensional datasets often result in an excessive number of rules, increasing execution time and negatively impacting downstream task performance. Managing this rule explosion remains a central challenge in ARM research. To address this, we introduce Aerial+, a novel neurosymbolic ARM method. Aerial+ leverages an under-complete autoencoder to create a neural representation of the data, capturing associations between features. It extracts rules from this neural representation by exploiting the model’s reconstruction mechanism. Extensive evaluations on five datasets against seven baselines demonstrate that Aerial+ achieves state-of-the-art results by learning more concise, high-quality rule sets with full data coverage. When integrated into rule-based interpretable machine learning models, Aerial+ significantly reduces execution time while maintaining or improving accuracy.
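The reconstruction-based extraction idea from the abstract can be sketched on a toy example. This is an illustrative sketch, not the paper's implementation: a single co-occurrence "reconstruction" step stands in for the trained under-complete denoising autoencoder, unmarked features are zeroed rather than given equal per-category probabilities, and the 0.8 consequent threshold follows the value quoted in the reviews below.

```python
import numpy as np

# Toy one-hot table with two categorical features A (a0/a1) and B (b0/b1);
# rows with a0 almost always carry b1, so the rule a0 -> b1 should surface.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
b = np.where(rng.random(1000) < 0.95, 1 - a, a)          # strong a0 <-> b1 link
X = np.stack([1 - a, a, 1 - b, b], axis=1).astype(float)

# Stand-in for the trained autoencoder: one co-occurrence "reconstruction"
# step (the paper trains a denoising autoencoder instead).
C = X.T @ X / len(X)

def reconstruct(test_vector):
    out = test_vector @ C
    probs = out.copy()
    for start, end in [(0, 2), (2, 4)]:                  # normalise per feature
        probs[start:end] /= probs[start:end].sum()
    return probs

# Test vector marking the candidate antecedent a0; in Aerial+ the remaining
# features would be marked with equal probabilities instead of zeros.
test = np.array([1.0, 0.0, 0.0, 0.0])
probs = reconstruct(test)

# If the reconstructed probability of b1 clears the consequent threshold,
# the rule a0 -> b1 is emitted.
if probs[3] >= 0.8:
    print("rule: a0 -> b1, reconstruction prob %.2f" % probs[3])
```

Verifying one candidate association this way is a single forward pass, independent of the number of rows, which is the O(1) verification advantage described in the changes list below.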
Track: Neurosymbolic Methods for Trustworthy and Interpretable AI
Paper Type: Long Paper
Resubmission: Yes
Software: https://github.com/DiTEC-project/aerial-rule-mining
Changes List: We would like to thank the reviewers for their time and constructive feedback! This document describes the changes made to our paper after the initial submission, as well as our responses to the reviewers' comments. Text inside [[double square brackets]] is the complete feedback given by the reviewers.

In addition to the changes made in response to the reviewers (detailed below), we made the following improvements to our paper:

1. Added the qualitative advantages of our Aerial+ approach over the state of the art at the end of the 'Evaluation' section:
a. After training an autoencoder in linear time over the number of data instances (e.g., rows), Aerial+ can 'verify' whether a certain association exists in the data by creating the corresponding 'test vector' in O(1) time. This is not possible with state-of-the-art ARM approaches, which must scan the entire dataset each time to count co-occurrences of items (algorithmic/exhaustive ARM) or run several optimization iterations over the entire dataset (optimization-based ARM).
b. Aerial+ can be integrated into larger neural networks for interpretability, which is not possible with state-of-the-art algorithmic ARM approaches.
2. Created a Python package for Aerial+ with different ARM variations and support for downstream tasks, and added an anonymous reference to it at the beginning of the 'Evaluation' section (Open-Source paragraph). This is in addition to Aerial+'s experimental source code with all the baseline implementations and datasets.
3. Added a formal justification for Aerial+'s rule extraction in Appendix F.

[[Reviewer 1: This paper presents a method called Aerial+ for using denoising auto-encoders to conduct efficient and effective rule mining. The general idea is to replace counting of instances in the dataset with an auto-encoder trained on the dataset. The general idea, applicable to discrete data, is plausible and explained clearly. The method has been implemented and published on GitHub.
It has been tested on 5 commonly used datasets of variable sizes, showing smaller rule sets with higher confidence compared to exhaustive and optimising ARM methods. It is further tested on downstream classifiers using the output of Aerial+ compared to that of exhaustive methods in classifiers. The results show reduced execution time throughout and improved accuracy in many cases. Strengths: the presented method is useful, as rule mining is still popular and more effective methods are welcome. The general idea is interesting, and seems to show positive results. The results are very positive in terms of execution time and mostly positive in terms of rule quality and downstream classifiers. The method is also parallelizable and suitable for GPUs, and variants for itemset mining and item constraints are given. Weaknesses: there are some elements that are not quite clear: It's not stated what the training times are for the auto-encoders (only 'one or two epochs') or what hardware was used.]]

1. The execution times in "Experimental Setting 1" include both autoencoder training and rule extraction times. This is explicitly stated in the last sentence of the "Execution time and number of rules" subsection.
2. The execution times in "Experimental Setting 2" include all the steps: autoencoder training, rule extraction, and building the classifiers. For the baseline methods, they likewise include the entire rule learning/mining time plus the time it took to build the classifiers. This is explicitly stated in Footnote 3 of the same section.
3. The hardware used is explicitly stated at the beginning of Section 4.

[[The evaluation metrics coverage, support and confidence are differently aggregated, which is normal but should be explicit.]]

This is now explicitly clarified in Section 4.1.

[[All evaluations are presumably done on the training set, it would be good to also run evaluations on test sets.]]

To clarify how our evaluation methods were run:

1. Experimental Setting 1 uses the standard evaluation method in the ARM literature, which is learning rules over the entire dataset and then calculating rule quality metrics, again over the full dataset.
2. Experimental Setting 2 performs a train-test split with 10-fold cross-validation when performing downstream classification tasks with the rules.

Note that the ARM literature relies solely on rule quality evaluation, while we additionally validate our approach on downstream tasks with 10-fold cross-validation.

[[The datasets are fairly small by today's standards, and the scalability of the method has not been tested.]]

Please see our response to the first comment of the second reviewer. We have clarified the scalability of the algorithm further at the end of Section 3.

[[Reviewer 2: The paper introduces a method to extract association rules from AutoEncoders (AE) trained on tabular data. The extraction algorithm is based on the reconstruction capability of the AEs, where a test vector is employed as the initial (soft) association of variables and the reconstruction values (after being thresholded) are considered as the rules. Strengths: The method is simple and it works well on small datasets. The paper is easy to read, although the presentation should be improved. Weaknesses: The method is brute-force; it has to consider many combinations of variables, which might work for small datasets but would be challenging for larger ones. Therefore, there will be a big issue with scalability. In the complexity analysis, the extraction time is exponentially expensive.]]

We respectfully disagree with the reviewer with regard to the complexity analysis statement. We believe the reviewer refers to the feature combination vector C (line 3 of Algorithm 1), which is created from combinations of features up to a maximum number of antecedents "a".
This maximum number of antecedents is very small in practice (2 to 4), and it grows with neither the number of features (columns in a table) nor the number of transactions (rows in a table). Therefore, we consider "a" to be a constant. The complexity analysis in Appendix A shows an extraction complexity of O(|F|^(a+1)), which is polynomial in the number of features given that "a" is a constant. This is explicitly stated in Section 3.3, "Algorithm", and in Appendix A. Note that since each feature combination S ∈ C is processed independently, Algorithm 1 supports parallel execution. All operations use vector representations, enabling efficient GPU execution. We would also like to point out that the rule extraction time does not depend at all on the number of transactions (rows in a table) and only depends on the number of features (columns in a table), while the autoencoder training time scales linearly with the number of transactions, as it only involves a forward pass over the table.

[[The paper lacks formal justification for why the implication rules are extracted within a generative structure like AE.]]

Our approach does not use a generative structure. We believe the reviewer was referring to a "variational" autoencoder (VAE). However, we only use a standard under-complete denoising AE to learn a neural representation of the tabular data, which is not a VAE. A formal justification has been added in Appendix F. Furthermore, at the beginning of Section 3.3, we provide a reasoned intuition on why our autoencoder architecture works for extracting rules and refer to Appendix F.

[[It is not clear how antecedents and consequents are defined.
According to the method, they are determined by thresholds (0.5 for antecedents, 0.8 for consequents), but how are these thresholds chosen, and why is the threshold for consequents higher than that for antecedents?]]

The process of hyperparameter tuning for choosing the thresholds is described in Appendix B, as explained at the end of the Evaluation Section 4. We have now added one more reference to Appendix B in the "Algorithm" part of Section 3.3.

[[The extraction algorithm should be explained better.]]

A line-by-line explanation of the algorithm is provided in Section 3.3. We would like to point out that while we are constrained by the available space in the paper, the GitHub page of the PyAerial Python package, together with our experimental code base (both anonymized links are given in Section 4), provides further detailed explanations of all aspects of the algorithm for those who plan to use the package.

[[The literature review did not cover a large number of methods that learn rules from datasets.]]

We have now added a reference to a survey of exhaustive ARM methods to the Related Work (Luna et al. 2019), and clarified that the reference to Kaushik et al. (2023) provides a comprehensive survey of optimization-based numerical ARM methods. We have also provided links to the three existing deep-learning methods. We would like to point out that the term "rule learning" is often used for different tasks, such as learning knowledge graph representations, which are out of scope for this paper. This is clearly mentioned in the Related Work section, and further literature analysis of these other approaches is not relevant to the scope of our work.

[[Reviewer 3: The paper proposes a Deep Learning (DL) approach for the problem of Automatic Rule Mining (ARM). The proposal combines an autoencoder with a procedure for extracting rules from the output of the autoencoder.
The main claim of the paper is that this architecture is able to find fewer and more concise rules in less time. This is shown by the good results of the evaluation with respect to exhaustive search algorithms, such as FP-Growth, and optimization-based techniques. The paper presents a Deep Learning solution to a combinatorics problem. While this is a hot and interesting topic for the AI community, the proposal does not use any symbolic procedure/knowledge (Algorithm 1 does not leverage any symbolic knowledge). Therefore, the NeSy conference is not the right venue for this paper.]]

The NeSy organisers have confirmed to us that they believe our paper is topical for the conference. The symbolic knowledge comes in the form of interpretable logical rules extracted from raw data, and in the fact that Aerial+ combines an algorithmic (symbolic) approach with neural networks for rule mining.

[[Other concerns: The novelty should be better explained as it is not clear the difference with other DL methods for ARM.]]

As explained in the Related Work section, to the best of our knowledge, only a few deep learning (DL) approaches exist that address an association rule mining (ARM) problem similar to the one in our paper. Patel and Yadav (2022) do not provide any algorithm description or source code and could not be reproduced. Berteloot et al. (2024) proposed ARM-AE, which is compared with our approach in the evaluation section and shows notably worse results. Karabulut et al. (2024) presented a DL-based ARM pipeline; however, it is specifically designed for sensor data using semantics in the Internet of Things domain, while Aerial+ provides a more generalised pipeline, applicable to all kinds of tabular data and tested on a downstream classification task. Note that the term "rule learning" is often used for other tasks, such as learning rules in knowledge graphs, which also have DL approaches. But since they solve a different problem, they are out of scope for this paper.
[[DL methods, when used for solving combinatorics problems, do not guarantee the correctness of the output. Some discussion about this must be included. How can I be sure that the proposed method always returns rules with the right support?]]

A formal justification for why Aerial+ works for association rule mining has been added in Appendix F. Furthermore, as mentioned in our paper, the standard approach to evaluating ARM methods is based on statistical rule quality metrics such as support, confidence, and lift, which show the quality of the found rules according to these pre-defined criteria. In addition to providing these metrics for the rules that Aerial+ finds, we provide a full downstream evaluation on a classification task, using the found rules in the interpretable ML models CBA, BRL, and CORELS. The evaluation empirically shows that the found rules can be successfully employed for downstream tasks beyond knowledge discovery.

[[The formalism in Section 3.2 is hard to follow as it does not reflect the standard ARM setup defined in Section 2. How are features and items related? Are they the same thing? Please clarify.]]

We agree with the reviewer that the formalism in the Related Work Section 2 did not provide a clear connection to the formalism in Section 3.2. We have now reworked both sections to explicitly connect the formalisms with each other.

[[The evaluation part just shows and lists the results and the better performance of the proposal without a deep explanation of the reasons why the proposal has better results.]]

At the beginning of Section 3.3, we provide a reasoned intuition on why the autoencoder architecture works for extracting rules. This is further supported in the "Formal justification for Aerial+'s rule extraction" section in Appendix F, where we formally justify how Aerial+ yields (fewer) prominent association rules.
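The complexity argument above (constant maximum antecedent size "a", extraction time independent of row count) can be illustrated with a back-of-the-envelope candidate count. The function below is our illustration, not code from the paper:

```python
from math import comb

def num_candidate_vectors(num_features, max_antecedents):
    """Number of antecedent combinations tested during extraction:
    sum over k = 1..a of C(|F|, k), i.e. O(|F|^a) test vectors, each of
    which can imply up to |F| consequents -> O(|F|^(a+1)) rule checks.
    The count depends only on the number of features, never on rows."""
    return sum(comb(num_features, k) for k in range(1, max_antecedents + 1))

# Example: 50 features with at most 2 antecedents gives 50 + 1225 = 1275
# test vectors, whether the table has a thousand rows or a billion.
print(num_candidate_vectors(50, 2))  # -> 1275
```

Only the autoencoder training touches the data itself, and that scales linearly with the number of rows.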
Publication Agreement: pdf
Submission Number: 14