Text Data Mining: Discovery of Important Keywords in the Cyberspace

Hiroki Arimura, Jun-ichiro Abe, Hiroshi Sakamoto, Setsuo Arikawa, Ryoichi Fujino, Shinichi Shimozono

Published: 2000, Last Modified: 26 Jul 2025. Kyoto International Conference on Digital Libraries 2000. License: CC BY-SA 4.0

Abstract: This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding, within this whole class, the patterns that optimize a given statistical measure over a large collection of unstructured texts. For this class of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry and string matching. Finally, we apply these algorithms in experiments on interactive document browsing in a large text database and on keyword discovery from Web bases.
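The abstract does not define the pattern class or the statistical measure, so the following is only an illustrative sketch of the optimization problem, not the paper's algorithm. It assumes that a proximity phrase association pattern is an ordered tuple of phrases in which each phrase must start within `k` characters after the end of the previous one, and that the measure to optimize is the classification error on labeled positive and negative documents. The names `matches`, `classification_error`, and `best_pattern` are hypothetical, and the search is a plain brute-force enumeration rather than the computational geometry and string matching techniques the abstract mentions.

```python
from itertools import product


def matches(pattern, k, text):
    """Assumed semantics: the phrases occur in order, and each phrase starts
    at most k characters after the end of the previous phrase."""
    def search(i, start_min, start_max):
        if i == len(pattern):
            return True
        phrase = pattern[i]
        pos = text.find(phrase, start_min)
        # Try every occurrence of the phrase inside the allowed window.
        while pos != -1 and (start_max is None or pos <= start_max):
            end = pos + len(phrase)
            if search(i + 1, end, end + k):
                return True
            pos = text.find(phrase, pos + 1)
        return False

    # The first phrase may start anywhere in the text.
    return search(0, 0, None)


def classification_error(pattern, k, positives, negatives):
    """Fraction of documents misclassified when 'pattern matches' is used
    as the prediction for 'document is positive' (assumed measure)."""
    missed = sum(not matches(pattern, k, doc) for doc in positives)
    false_hits = sum(matches(pattern, k, doc) for doc in negatives)
    return (missed + false_hits) / (len(positives) + len(negatives))


def best_pattern(candidates, d, k, positives, negatives):
    """Brute force: enumerate every ordered d-tuple of candidate phrases
    and keep the one with the smallest classification error."""
    return min(
        product(candidates, repeat=d),
        key=lambda p: classification_error(p, k, positives, negatives),
    )


# Toy data, invented purely for illustration.
positives = ["mining of text data", "mining large data sets"]
negatives = ["data about coal mining", "text and image files"]
candidates = ["mining", "data", "text"]

best = best_pattern(candidates, d=2, k=10, positives=positives, negatives=negatives)
print(best, classification_error(best, 10, positives, negatives))
```

Enumerating all tuples costs time exponential in the number of phrases per pattern, which is the kind of blow-up that motivates the faster algorithms the abstract refers to.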