Abstract: In this paper, we propose a new method for determining the shared features of, and measuring the distance between, data sets or point clouds. Our approach uses the joint factorization of two data matrices <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$X_{1}, X_{2}$</tex> into non-negative matrices <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$X_{1}=AS_{1}, X_{2}=AS_{2}$</tex> to derive a similarity measure that quantifies how well the shared basis <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">${A}$</tex> approximates <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$X_{1}, X_{2}$</tex>. We also propose a point cloud distance measure built upon this method and the learned factorization. Our method reveals structural differences in both image and text data. Potential applications include classification, detecting plagiarism or other manipulation, data denoising, and transfer learning.
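To make the joint factorization concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the two matrices share the same row (feature) dimension, stacks them column-wise so that a single non-negative factorization yields one shared basis A together with the coefficient blocks S1 and S2, and fits the factors with the standard Lee-Seung multiplicative updates for the Frobenius norm. The function names `joint_nmf` and `relative_error`, the parameters `rank`, `n_iter`, and `eps`, and the use of relative reconstruction error as a stand-in for the similarity measure are all illustrative assumptions; the abstract does not specify the actual measure.

```python
import numpy as np


def joint_nmf(X1, X2, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate X1 ~ A @ S1 and X2 ~ A @ S2 with a shared non-negative A.

    Assumption: X1 and X2 are non-negative and have the same number of rows.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([X1, X2])          # [X1 X2] ~ A @ [S1 S2]
    n1 = X1.shape[1]
    A = rng.random((X.shape[0], rank)) + eps
    S = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep A and S non-negative.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ (S @ S.T) + eps)
    return A, S[:, :n1], S[:, n1:]


def relative_error(X, A, S):
    """Relative Frobenius error of the approximation X ~ A @ S."""
    return np.linalg.norm(X - A @ S) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A_true = rng.random((20, 3))     # synthetic basis shared by both data sets
    X1 = A_true @ rng.random((3, 30))
    X2 = A_true @ rng.random((3, 40))
    A, S1, S2 = joint_nmf(X1, X2, rank=3)
    print("error on X1:", relative_error(X1, A, S1))
    print("error on X2:", relative_error(X2, A, S2))
```

In this sketch, a low error on both matrices suggests that the shared basis describes both data sets well, while a noticeably higher error on one of them hints at structure that the other does not share.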