Abstract: In recent years, data-driven, machine learning-based natural language processing (NLP) technologies have effectively addressed a wide range of challenges. To further improve the performance of NLP models, it is crucial to understand which types of data a model handles well and which it struggles with. This study introduces a method for identifying which types of data a given neural network-based model can process effectively and which pose difficulties. We define criteria for hard-to-solve data, construct a pairwise easy-hard dataset, and propose a neural scoring model that assigns a difficulty level to each data instance. As an application of the proposed difficulty level, we use it to drive curriculum learning. The experimental results show that our method effectively distinguishes between easy and hard data and that performance improves when the curriculum learning approach is applied.