Abstract: Text2text question classification (TQC) is a foundational task in question classification (QC), with wide application in both industry and academia, for example in intelligent customer service systems. Conventional QC tasks typically rely on one or more user-provided keywords to classify questions. In contrast, TQC involves categorizing semantically similar standard questions, which are represented as short text. However, TQC datasets are scarce, and manual labeling often produces noisy labels that do not reflect the true class of a question, introducing bias into the training data. Noisy labels yield unreliable and uncertain supervision signals, which significantly degrade model performance. To tackle these challenges, we propose the Evidential Robust Deep Learning (ERDL) framework, which integrates TQC Contrastive Loss (TCL) and TQC Evidential Learning Loss (TEL) to learn accurate semantic similarity and to handle noisy data in TQC datasets. Notably, TEL is a novel loss function based on evidential learning that models the output as a Dirichlet distribution to capture the uncertainty caused by noisy data. In experiments on four noisy TQC datasets, our framework outperformed relevant baselines.