Abstract: We provide a survey and careful empirical comparison of the state-of-the-art in neural selective classification for NLP tasks. Across multiple trials on multiple datasets, only one of the surveyed techniques -- Monte Carlo Dropout -- significantly outperforms the simple baseline of using the maximum softmax probability as an indicator of prediction confidence. Our results provide a counterpoint to recent claims made on the basis of single-trial experiments on a small number of datasets. We also provide a blueprint and open-source code to support the future evaluation of selective prediction techniques.
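The two techniques the abstract contrasts can be sketched briefly. Below is a minimal, illustrative NumPy sketch (not the paper's code): the maximum-softmax-probability (MSP) baseline scores confidence as the top softmax value of a single forward pass, while Monte Carlo Dropout averages softmax outputs over several stochastic passes with dropout left on. The `noisy_model` function and the dropout rate are hypothetical stand-ins for a real network.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msp_confidence(logits):
    """Baseline: maximum softmax probability of one deterministic pass."""
    return softmax(logits).max(axis=-1)

def mc_dropout_confidence(stochastic_logits_fn, T=100):
    """MC Dropout: average softmax over T stochastic forward passes
    (dropout kept active at inference), then take the max probability."""
    probs = np.stack([softmax(stochastic_logits_fn()) for _ in range(T)])
    return probs.mean(axis=0).max(axis=-1)

def selective_predict(logits, confidence, tau):
    """Selective classification: predict only when confidence >= tau,
    otherwise abstain (return None)."""
    return int(np.argmax(logits)) if confidence >= tau else None

# Toy stand-in for a network with dropout: logits perturbed by a
# hypothetical inverted-dropout mask with keep probability 0.9.
base_logits = np.array([2.0, 0.5, -1.0])
def noisy_model():
    mask = rng.random(base_logits.shape) > 0.1
    return base_logits * mask / 0.9

conf_msp = msp_confidence(base_logits)
conf_mc = mc_dropout_confidence(noisy_model, T=100)
pred = selective_predict(base_logits, conf_msp, tau=0.5)
```

In a selective-prediction evaluation of the kind the abstract describes, sweeping the threshold `tau` traces out a risk-coverage curve, and the competing confidence scores are compared by the risk they incur at each coverage level.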