Keywords: model explanations, trustworthy machine learning, explainable ai, interpretable machine learning
Abstract: Machine learning models need large amounts of (labeled) data; otherwise, they may fail to learn the right model for different sub-populations, or worse, they may pick up spurious correlations in the training data, leading to brittle prediction mechanisms. Moreover, with small training datasets, the learned models vary widely across randomly sampled training sets, which makes the whole process less reliable. However, collecting large amounts of useful, representative data and training on large datasets are both very costly. In this paper, we present a technique for training reliable classification models on small datasets, assuming we have access to simple explanations (e.g., a subset of influential input features) for the labeled data. We also propose a novel two-stage training pipeline that optimizes the model's output and fine-tunes its attention in an interleaved manner, helping the model agree with the provided explanations while learning from the data. We show that our training pipeline enables faster convergence to better models, especially when there is severe class imbalance in the population or spurious features in the training data.
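The interleaved two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration only, assuming a PyTorch classifier that exposes soft attention over input features and a data loader that pairs each labeled example with a 0/1 explanation mask marking influential features; all names here (ExplainableNet, explanation_loss, train_interleaved) are hypothetical and not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExplainableNet(nn.Module):
    """Toy classifier that exposes a soft attention over input features."""
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.attn = nn.Linear(n_features, n_features)  # produces feature attention
        self.clf = nn.Linear(n_features, n_classes)

    def forward(self, x):
        a = torch.softmax(self.attn(x), dim=-1)  # attention over input features
        return self.clf(a * x), a                # prediction and attention weights

def explanation_loss(attn, expl_mask):
    # Encourage attention mass to fall on the features the explanation
    # marks as influential (expl_mask is a 0/1 vector per example).
    return -(expl_mask * torch.log(attn + 1e-8)).sum(dim=-1).mean()

def train_interleaved(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y, expl_mask in loader:
            # Stage 1: optimize the model's output (standard classification loss).
            logits, _ = model(x)
            loss = F.cross_entropy(logits, y)
            opt.zero_grad(); loss.backward(); opt.step()

            # Stage 2: fine-tune attention to agree with the provided explanation.
            _, attn = model(x)
            loss = explanation_loss(attn, expl_mask)
            opt.zero_grad(); loss.backward(); opt.step()
```

The interleaving alternates a prediction-accuracy step with an explanation-agreement step per batch, rather than summing both losses into one objective; whether the paper uses a joint or alternating update is an assumption of this sketch.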
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
Supplementary Material: zip