Abstract: Many machine learning classification tasks involve imbalanced datasets, which are often subject to over-sampling techniques aimed at improving model performance. However, these over-sampling methods are prone to generating unrealistic or infeasible samples. Furthermore, they often function as black boxes, lacking interpretability in their procedures. This opacity makes it difficult to track their effectiveness and make necessary adjustments, and they may ultimately fail to yield significant performance improvements. To bridge this gap, we introduce Decision Predicate Graphs for Data Augmentation (DPG-da), a framework that extracts interpretable decision predicates from trained models to capture domain rules and enforce them during sample generation. This design ensures that over-sampled data remain diverse, constraint-satisfying, and interpretable. In experiments on synthetic and real-world benchmark datasets, DPG-da consistently improves classification performance over traditional over-sampling methods while guaranteeing logical validity and offering clear, interpretable explanations of the over-sampled data.
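The abstract describes enforcing extracted decision predicates during sample generation. The paper's actual DPG-da procedure is not detailed here, so the following is only a minimal sketch of the general idea: interpolation-based over-sampling with rejection of candidates that violate a set of assumed feature-threshold predicates. The sample data, predicate triples, and function names are all hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical minority-class feature vectors (illustration only).
minority = [[0.8, 1.2], [0.9, 1.0], [0.7, 1.5], [1.0, 1.1]]

# Assumed domain predicates, e.g. as extracted from decision paths of a
# trained model: each triple is (feature index, operator, threshold).
predicates = [(0, ">", 0.5), (1, "<=", 2.0)]

def satisfies(x, preds):
    """Check whether sample x meets every predicate constraint."""
    for i, op, t in preds:
        if op == ">" and not x[i] > t:
            return False
        if op == "<=" and not x[i] <= t:
            return False
    return True

def oversample(samples, preds, n_new, seed=0):
    """Generate n_new synthetic samples by interpolating random pairs,
    keeping only candidates that satisfy all predicates."""
    rng = random.Random(seed)
    out = []
    while len(out) < n_new:
        a, b = rng.sample(samples, 2)
        lam = rng.random()
        cand = [ai + lam * (bi - ai) for ai, bi in zip(a, b)]
        if satisfies(cand, preds):  # reject infeasible candidates
            out.append(cand)
    return out

new = oversample(minority, predicates, 5)
```

In this toy setup every interpolated candidate already lies inside the feasible region, but with predicates that carve a non-convex region the rejection step is what keeps generated samples logically valid.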
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Rahaf_Aljundi1
Submission Number: 5826