XML Document Mining Using Contextual Self-organizing Maps for Structures

Milly Kc, Markus Hagenbuchner, Ah Chung Tsoi, Franco Scarselli, Alessandro Sperduti, Marco Gori

2006 (modified: 27 Aug 2024), INEX 2006

Abstract: XML is becoming increasingly popular as a language for representing many types of electronic documents. Because XML describes document structure strictly, a relatively new task has emerged: mining documents based on structural and/or content information. In this paper we investigate (1) the suitability of new unsupervised machine learning methods for clustering XML documents, and (2) the importance of contextual information for the same task. These tasks are part of an international competition on XML clustering and categorization (INEX 2006). We show that the proposed approaches are a suitable tool for clustering structured data, as they yield the best results in the INEX 2006 competition on clustering of XML data.
