Abstract: The semantic matching problem consists of recognizing whether a candidate text is relevant to a given input text. Semantic similarities can be determined from human-curated knowledge, but such knowledge may not be available in every language. Statistical learning techniques have been applied instead; these circumvent the need for manual feature engineering by using large datasets to train models that score the semantic similarity between words or portions of text. Pre-trained transformers provide a further mechanism for consolidating the information throughout a sentence into a single sentence-level representation, but such representations may not be optimal for the matching task. As an alternative, we propose an interactive semantic transformer based on a greedy layer-wise framework that learns a distributed similarity representation for sentence pairs. The novelty of the architecture lies in an abstract representation of semantic similarities created by a three-stage learning strategy. Model training follows a greedy layer-wise scheme that incorporates both supervised and unsupervised learning. The proposed model is experimentally compared with state-of-the-art approaches on three community question datasets: the TREC library, Yahoo!, and Stack Exchange collections, and the results show that the proposed model outperforms the other approaches.
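To make the greedy layer-wise training idea concrete, the following is a minimal sketch of the generic technique (stacked-autoencoder pretraining in the style of classic greedy layer-wise learning), not the paper's actual architecture: each layer is trained unsupervised on the previous layer's output and then frozen, after which a supervised stage can be attached on top. All function names, dimensions, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder_layer(X, hidden_dim, lr=0.1, epochs=50):
    """Train one layer as a tanh autoencoder by gradient descent;
    return only the encoder weights (the decoder is discarded)."""
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden_dim))   # encoder
    V = rng.normal(scale=0.1, size=(hidden_dim, d))   # decoder
    for _ in range(epochs):
        H = np.tanh(X @ W)            # encode
        X_hat = H @ V                 # decode (reconstruction)
        err = X_hat - X               # reconstruction error
        dV = H.T @ err / n
        dH = (err @ V.T) * (1 - H**2)  # backprop through tanh
        dW = X.T @ dH / n
        W -= lr * dW
        V -= lr * dV
    return W

# Greedy layer-wise pretraining: each new layer is trained on the
# frozen output of the stack built so far.
X = rng.normal(size=(200, 16))        # toy input features
weights = []
inp = X
for h in (12, 8):                      # illustrative layer sizes
    W = train_autoencoder_layer(inp, h)
    weights.append(W)
    inp = np.tanh(inp @ W)             # freeze layer, feed forward

print([w.shape for w in weights])      # encoder shape per layer
```

A supervised head (e.g. a similarity classifier over sentence-pair representations) would then be fitted on the final `inp` features, mirroring the mixed unsupervised/supervised scheme the abstract describes.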