Characterizing Exceptional Distributions with Neural Rule Extraction

21 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Rule Learning, Normalizing Flows, Subgroup Discovery
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Given a quantity of interest, e.g. Covid-19 mortality, we find subsets of people who stand out with an unusual distribution and characterize them by neurally optimized rules.
Abstract: Explaining the characteristics of patients with unusual disease mortality can be an important tool for clinicians to understand and treat diseases. More generally, our goal is to find subsets of the data where the distribution of the target property, e.g. patient survivability, differs. The discovered subset must also be defined by a human-interpretable rule over some descriptive features. However, previous methods typically constrain the property of interest to be a scalar, which must also follow some standard distribution. Additionally, their computational complexity becomes prohibitive for larger numbers of features, and applying them to numerical features invariably requires a-priori discretisation. To this end, we propose SYFLOW, a method which leverages the flexibility of normalising flows to learn any distribution that the property of interest may follow. With this, we then quantify the KL-divergence of this distribution in the discovered subset, thus yielding an objective that can be directly optimised all the way back to learnable feature weights. These, in turn, result in interpretable descriptions like ``*Patients with heart disease and blood cholesterol above 243mg/dL*''. When applied to established real-world datasets, SYFLOW provides easily interpretable descriptions in a fraction of the time of state-of-the-art methods, and seamlessly extends to multi-variate targets, such as images. On synthetic datasets, SYFLOW also outperforms the competition in terms of precision/recall when the target property does not follow a simple distribution. In general, SYFLOW enables practitioners across a wide range of applications to find notable trends in their data.
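
The idea of scoring a rule by the divergence of the target distribution inside the subset it selects can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single numerical feature, a soft interval rule with hypothetical thresholds `lower` and `upper`, a comparison against the full dataset, and simple Gaussian density fits standing in for normalising flows. All function names are hypothetical, and the gradient-based optimisation of the thresholds is omitted.

```python
import numpy as np

def soft_interval(x, lower, upper, temperature=0.01):
    """Soft membership in [lower, upper]; sharper as temperature shrinks."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sigmoid((x - lower) / temperature) * sigmoid((upper - x) / temperature)

def gaussian_logpdf(y, mean, std):
    """Log-density of a univariate Gaussian (stand-in for a learned flow)."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((y - mean) / std) ** 2

def weighted_kl_estimate(y, membership):
    """Estimate KL(P_subset || P_all) using membership-weighted samples."""
    w = membership / membership.sum()
    mu_s = np.sum(w * y)
    std_s = np.sqrt(np.sum(w * (y - mu_s) ** 2)) + 1e-8
    mu_a, std_a = y.mean(), y.std() + 1e-8
    log_ratio = gaussian_logpdf(y, mu_s, std_s) - gaussian_logpdf(y, mu_a, std_a)
    return np.sum(w * log_ratio)

# Synthetic data (assumption): the target is shifted upward whenever x > 0.7.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=5000)
y = rng.normal(0.0, 1.0, size=5000) + 3.0 * (x > 0.7)

# A rule matching the planted subgroup versus a rule covering everything.
kl_rule = weighted_kl_estimate(y, soft_interval(x, 0.7, 1.0))
kl_all = weighted_kl_estimate(y, soft_interval(x, -10.0, 10.0))
print(f"KL for rule 0.7 <= x <= 1.0: {kl_rule:.3f}")
print(f"KL for rule covering all data: {kl_all:.3f}")
```

Because the membership is a smooth function of the thresholds, an objective of this kind could in principle be maximised with an automatic-differentiation framework, and the learned interval then reads off directly as a rule on the feature.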
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3608