A Multilinear Least-Squares Formulation for Sparse Tensor Canonical Correlation Analysis
Abstract: Tensor data have recently become important in various applications, e.g., image and video recognition, and pose new challenges for data modeling and analysis, such as high-order relations of large complexity, varying data scales, and gross noise. In this paper, we consider the problem of sparse canonical correlation analysis for arbitrary tensor data. Although several methods have been proposed for this task, limitations remain that hinder their practical application. To this end, we present a general Sparse Tensor Canonical Correlation Analysis (gSTCCA) method from a multilinear least-squares perspective. Specifically, we formulate the problem as a constrained multilinear least-squares problem with tensor-structured sparsity regularization based on the CANDECOMP/PARAFAC (CP) decomposition. We then present a divide-and-conquer deflation approach that tackles the problem by successive rank-one tensor estimation of the residual tensors, so that the overall model decomposes into a set of unconstrained linear least-squares problems that can be solved efficiently. Through extensive experiments on five datasets for recognition tasks, we demonstrate that the proposed method achieves promising performance compared to state-of-the-art vector- and tensor-based canonical correlation analysis methods in terms of classification accuracy, model sparsity, and robustness to missing and noisy data. The code is publicly available at https://github.com/junfish/gSTCCA.
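To make the deflation idea in the abstract concrete, below is a minimal numpy sketch (not the authors' gSTCCA implementation, and without the sparsity regularization or CCA coupling) of successive rank-one CP estimation on residual tensors: each rank-one term is fit by alternating least squares, where fixing all but one factor yields a closed-form linear least-squares update, and the fitted term is then subtracted before the next one is estimated. Function names and the 3-way shape are illustrative assumptions.

```python
import numpy as np

def rank_one_als(T, n_iter=50, seed=0):
    """Fit a single rank-one CP term a (outer) b (outer) c to a 3-way
    tensor T by alternating least squares: with two factors fixed, the
    update for the third is an unconstrained linear least-squares problem
    with a closed-form solution."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(I)
    b = rng.standard_normal(J)
    c = rng.standard_normal(K)
    for _ in range(n_iter):
        a = np.einsum('ijk,j,k->i', T, b, c) / ((b @ b) * (c @ c))
        b = np.einsum('ijk,i,k->j', T, a, c) / ((a @ a) * (c @ c))
        c = np.einsum('ijk,i,j->k', T, a, b) / ((a @ a) * (b @ b))
    return a, b, c

def deflation_cp(T, rank):
    """Divide-and-conquer deflation: successively fit a rank-one term to
    the current residual, subtract it, and repeat."""
    R = T.copy()
    factors = []
    for _ in range(rank):
        a, b, c = rank_one_als(R)
        R = R - np.einsum('i,j,k->ijk', a, b, c)  # deflate the residual
        factors.append((a, b, c))
    return factors, R
```

For an exactly rank-one tensor, a single deflation step drives the residual to (numerical) zero; for higher ranks this greedy scheme gives a sequence of rank-one approximations of the successive residuals.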
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=zc0Y0cAuTV&nesting=2&sort=date-desc
Changes Since Last Submission: In response to Reviewer dFfz's concerns about the experimental evaluations and the importance of incorporating tensor-structured sparse regularization within the CCA framework, we now emphasize the significance of sparse models for model interpretability. Please see **lines 7-9 on p2**. In addition, as the AE suggested, we have added an analysis of identifying informative features, focusing on the ADNI dataset to provide interpretability for Alzheimer's Disease. Please see the newly added **section 6.3.5 on p12**. Regarding Reviewer Gq2a's points on the lack of novelty and the limitations of the experimental evaluation, we appreciate the discussion of TMLR's evaluation criteria and have addressed the concerns raised: we have clarified the novelty of our work and further explained the experimental design, including the use of classification experiments as downstream tasks to demonstrate the effectiveness of the learned representations. Please see the modifications and clarifications in **section 6**. Furthermore, we have incorporated the feedback and requested changes from the rebuttals of reviewers dFfz, Gq2a, and yh64. Please see **section 1 on p2, equation 2 on p4, section 4.1 on p5, section 6.1 on p8-9, and other word/sentence adjustments** throughout the paper. Finally, it is worth noting that we have released the **source code for both our proposed methods and the baseline models, along with preprocessed data**, to the research community. This ensures the reproducibility of our work.
Supplementary Material: pdf
Assigned Action Editor: ~Pablo_Sprechmann1
Submission Number: 1199