Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering

2021 (modified: 26 Oct 2021), DASFAA (3) 2021
Abstract: In this paper, we propose a configurable topic modeling framework named Familia. Familia supports an important line of topic models that are widely applicable in text engineering scenarios. To relieve the burden on software engineers who lack knowledge of Bayesian networks, Familia conducts automatic parameter inference for a variety of topic models. Simply by changing the data organization of Familia, software engineers can easily explore a broad spectrum of existing topic models, or even design their own, and find the one that best suits the problem at hand. In addition to its superior extensibility, Familia has a novel sampling mechanism that strikes a balance between the effectiveness and efficiency of parameter inference. Furthermore, Familia is essentially a big topic modeling framework that supports parallel parameter inference and distributed parameter storage. The utility and necessity of Familia are demonstrated in real-life industrial applications. Familia would significantly enlarge software engineers’ arsenal of topic models and pave the way for utilizing highly customized topic models in real-life problems. The source code of Familia has been released on GitHub at https://github.com/baidu/Familia/ .