Post-prediction confidence training complements supervised learning

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: supervised learning, prediction uncertainty, maxout, feature representation
Abstract: Wrong predictions are bad; for users, high confidence in a wrong prediction is even worse. Since even the best-trained class-label predictor will make mistakes, users, especially in AI application areas such as personalized medicine, may want to tell the high-quality predictions from the low-quality ones. In convolutional neural networks (CNNs), confidence in a prediction is associated with the softmax output layer, which gives a probability distribution over the class labels. But even a prediction with 95\% of the probability concentrated on one class may turn out wrong many times more often than the anticipated rate of 5\%. There are at least three main sources of uncertainty that cause such a large anticipation gap. The first is that some test samples may not come from the same distribution as the training samples. The second is severe population heterogeneity within each class, which makes prediction quality vary across hidden subpopulations. The third is the imperfection of the prediction model itself. While most research focuses on the first source of prediction uncertainty, the other two receive much less attention. Here we take a different approach, termed post-prediction confidence training (PPCT), to guide users in discerning high-quality predictions from low-quality ones. Distinctively different from other methods, including conformal prediction, PPCT entertains all three sources of uncertainty by searching for features that anchor the criticism of prediction quality. An enhancement to the CNN configuration is required during network training. We propose a blueprint that couples each logit node (T channel) in the layer feeding the softmax with an additional node (C channel) and uses maxout to link the pair to the softmax layer. The C channel is introduced to counter the T channel as a contrastive feature against the feature of the target class. A high-quality prediction must follow a logically lucid pattern between T and C for every class. Successful implementations of our method on popular image datasets are reported.
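To make the proposed head concrete, the following is a minimal sketch, not the authors' code, of the architectural change described in the abstract: each class is given a target logit (T channel) and a contrastive logit (C channel), and maxout over each pair feeds the softmax. The module name `MaxoutTCHead`, the layer sizes, and the single-linear-layer implementation are illustrative assumptions.

```python
# Hypothetical sketch of the T/C-channel maxout head described in the abstract.
# Assumes a backbone that produces a feature vector per sample; names and sizes
# are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

class MaxoutTCHead(nn.Module):
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        # One linear map produces both channels: two nodes per class (T and C).
        self.fc = nn.Linear(in_features, 2 * num_classes)
        self.num_classes = num_classes

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Reshape to (batch, num_classes, 2): index 0 = T channel, index 1 = C channel.
        tc = self.fc(feats).view(-1, self.num_classes, 2)
        # Maxout links each (T, C) pair to the softmax: the larger of the two
        # becomes the per-class input to the softmax layer.
        logits, _ = tc.max(dim=-1)
        return logits  # feed to torch.softmax or nn.CrossEntropyLoss

# Usage sketch:
# head = MaxoutTCHead(in_features=512, num_classes=10)
# probs = torch.softmax(head(torch.randn(4, 512)), dim=-1)
```

The two-nodes-per-class layout keeps the T/C pair explicit, so the pattern between the two channels remains available after training for the post-prediction quality check the abstract describes.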
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7103