Distance-function design and fusion for sequence data

Yi Wu, Edward Y. Chang

2004 (modified: 05 Nov 2022), CIKM 2004

Abstract: Sequence-data mining plays a key role in many scientific studies and real-world applications, such as bioinformatics, data streams, and sensor networks, where sequence data are processed and their semantics interpreted. In this paper we address two relevant issues: sequence-data representation and representation-to-semantics mapping. For representation, since the best choice depends on the application and even on the type of query, we propose representing sequence data in multiple views. For each representation, we propose methods to construct a valid kernel as the distance function for measuring similarity between sequences. For mapping, we then find the best combination of the individual distance functions, each of which measures similarity in a different view, to depict the target semantics. We propose a super-kernel function-fusion scheme to achieve the optimal mapping. Through theoretical analysis and empirical studies on UCI and real-world datasets, we show that our approach of multi-view representation and fusion is mathematically valid and very effective for practical purposes.
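
The multi-view idea can be illustrated with a small sketch. This is not the paper's super-kernel fusion scheme, whose details the abstract does not give; it only shows the general principle that each view gets its own valid kernel and that a non-negative weighted sum of valid kernels is again a valid kernel. The choice of views (1-mer and 2-mer spectrum kernels over strings), the fixed weights, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def spectrum_kernel(s, t, k):
    """Inner product of k-mer count vectors; an inner product is a valid kernel."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[g] * ct[g] for g in cs))


def normalized(kernel):
    """Cosine-normalize a kernel so that k(x, x) = 1 whenever k(x, x) > 0."""
    def k_norm(s, t):
        denom = sqrt(kernel(s, s) * kernel(t, t))
        return kernel(s, t) / denom if denom > 0 else 0.0
    return k_norm


def fuse(kernels, weights):
    """Non-negative weighted sum of kernels (illustrative fusion, weights assumed)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep the kernel valid")

    def k_fused(s, t):
        return sum(w * k(s, t) for w, k in zip(weights, kernels))
    return k_fused


def kernel_distance(kernel, s, t):
    """Distance induced by a kernel: sqrt(k(s,s) + k(t,t) - 2 k(s,t))."""
    sq = kernel(s, s) + kernel(t, t) - 2.0 * kernel(s, t)
    return sqrt(max(sq, 0.0))  # clamp tiny negative values from rounding


# Two hypothetical views of the same sequence: symbol composition and adjacent pairs.
views = [
    normalized(lambda s, t: spectrum_kernel(s, t, 1)),
    normalized(lambda s, t: spectrum_kernel(s, t, 2)),
]
fused = fuse(views, [0.5, 0.5])

print(kernel_distance(fused, "ACGT", "ACGA"))
print(kernel_distance(fused, "ACGT", "TTTT"))
```

In this sketch the weights are fixed by hand; the abstract describes finding the best combination for the target semantics, so a learned fusion would replace the constant `[0.5, 0.5]`.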
