Graph-based Approach for Semantic Text Clustering: Topic Detection in the Context of a Large Multilingual Public ConsultationDownload PDF

Anonymous

16 Dec 2022 (modified: 05 May 2023)ACL ARR 2022 December Blind SubmissionReaders: Everyone
Abstract: We present a novel algorithm for multilingual text clustering built upon two well studied techniques: multilingual aligned embedding and community detection in graphs. The aim of our algorithm is to discover underlying topics in a multilingual dataset using clustering. We present both a numerical evaluation using silhouette and V-measure metrics, and a qualitative evaluation for which we propose a new systematic approach. Our algorithm presents robust overall performance and its results were empirically evaluated by an analyst. The work we present was done in the context of a large multilingual public consultation, for which our new algorithm was deployed and used on a daily basis.
Paper Type: long
Research Area: Efficient Methods for NLP
0 Replies

Loading