Deep learning models for multiple answers extraction and classification of scientific publications

Youngsun Jang, Kwanghee Won, Hyung-Do Choi, Sung Shin

Published: 01 Jan 2022, Last Modified: 04 Mar 2025, RACS 2022, CC BY-SA 4.0

Abstract: This paper presents an overview of a data-augmented classification and multi-span (multiple) answer system for extracting key information from academic publications. The study has two parts: (a) implementing a new fine-tuned model to solve the multiple-answer extraction problem, and (b) reporting the results of sub-classification on various RF-EMF (radio-frequency electromagnetic field) topics. Our previous study found that the main cause of the low performance of the extractive question answering (EQA) system on certain types of questions was the multiple-answer issue. To solve this problem, this study applies the TASE (TAg-based Span Extraction) technique and reports the results. Using the pre-trained TASE model, our approach can retrieve multiple answers spread across a given text with good accuracy. In addition, this work adopts 'PEO (Population, Exposure, Outcome)', derived from the 'PECO' framework of the WHO-funded study on RF-EMF safety, as its overall research framework. Based on the PEO perspective, the results of three sub-topic classifications (RF, SAR, Causal Relationship) are presented. Data augmentation plays an important role in both the multi-span answer model and the classification model. In particular, our proposed system outperforms the pre-trained BERT model on multi-span answer tasks with our RF-EMF dataset.
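To illustrate the tag-based idea behind multi-span extraction, the following is a minimal sketch of how per-token tags can be decoded into several answer spans. It is not the authors' implementation: the function name `decode_spans`, the B/I/O tag scheme, and the example sentence are assumptions made for illustration, and the tags are written by hand where a fine-tuned tagging model would normally predict them.

```python
from typing import List


def decode_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect every answer span marked by B/I tags; O tokens are skipped.

    Hypothetical helper: "B" opens a span, "I" continues the open span,
    and anything else (including a stray "I" with no open span) closes it.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; flush the previous one if it is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Invented example sentence with hand-written tags (not taken from the dataset).
tokens = "Rats were exposed at 900 MHz and 1800 MHz".split()
tags = ["O", "O", "O", "O", "B", "I", "O", "B", "I"]
print(decode_spans(tokens, tags))
```

A single-span EQA model would return only one start and end position for a question such as "Which frequencies were used?", whereas tagging each token lets both "900 MHz" and "1800 MHz" be recovered from the same passage.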