Abstract: We study a broad class of methods for the joint estimation of multiple sparse
symmetric matrices that incorporates group and fusion penalties for borrowing
strength across related matrices. This class includes extensions of popular
methods for precision and covariance matrix estimation as well as PCA. We show
that these methods can be unified through the lens of computational
sufficiency, a recently proposed theory that can reveal hidden commonalities
between seemingly disparate methods, yielding both theoretical insights into
the underlying optimization problems and practical advantages in terms of
computational efficiency. We derive a universal screening rule that applies
simultaneously to all methods in this class, allowing us to reduce the search
space to block-diagonal matrices. This enables streamlined algorithms that
drastically reduce the runtime, making the methods far more scalable and
practical for high-dimensional data analysis.
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their careful and constructive feedback.
The revised manuscript addresses the substantive comments, fixes the
issues with notation, Algorithm 1, and the Appendix Sec. 2.3.1 proof
of the Group Lasso construction of $V$, and adds the requested
experimental material.
The revised manuscript is 14.5 pages (vs. 12). The 2.5-page expansion
is entirely accounted for by reviewer-requested additions: ~1.5 pages
of new figures (toy schematic, proximal-gradient comparison,
time-vs-sparsity), ~0.5 page for the time-vs-sparsity discussion
(Reviewer mE5f), and ~0.5 page for the other reviewer-flagged
theoretical additions (Corollary 3.4, the "Scope of the invariance
condition" paragraph, novelty attribution).
**Major substantive changes**
- **Algorithm-agnostic guarantee made explicit (new Corollary 3.4,
Sec. 3.2).** The gradient, subgradient, and proximal operator of the
reduced objective all preserve the block-diagonal subspace, so any
first-order method initialized in that subspace remains in it
(a footnote notes the same applies to Newton, quasi-Newton, and
interior-point methods). The proof is in the appendix. A new
paragraph in Sec. 5 explains why ADMM is the natural choice for
Joint Sparse PCA without being essential to the reduction.
- **Definition 3.2 restructured.** The joint single-linkage
thresholding operator is now defined for a generic graph
$\mathcal{G}$, with the penalty-specific graph
$\mathcal{G}_{P}(X, \lambda_1, \lambda_2)$ introduced separately in
Sec. 3.3 and Theorem 3.3 referencing it explicitly. The forward
reference is removed.
- **Novelty attribution.** New paragraphs in Sec. 1 (after the
computational-sufficiency framework discussion) and Sec. 6 (Discussion)
credit the projection-based framework to Vu (2018) and identify the
multiple-matrix extension and the penalty-specific screening graphs
as the contributions of this paper.
- **Scope and limitations of Condition 1** are now discussed in Sec. 6
("Scope of the invariance condition" paragraph): the class of
generators satisfying diagonal sign-conjugation invariance, its
strict containment of the orthogonally invariant class, its
maximality among $O(d)$ subgroups preserving entrywise support, and
what falls outside (asymmetric losses, signed-difference fusion
penalties).
- **Regime in which the reduction helps.** The "Dependence on
$\lambda_1$ and $\lambda_2$" paragraph in Sec. 3.2 now ties speedup to
the component sizes $|C_b|$ via Corollary 3.4, identifies
moderate-to-strong regularization as the target regime, and notes
that the worst case is a no-op rather than a slowdown.
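The invariance described for Corollary 3.4 can be illustrated with a minimal sketch. It is not the manuscript's code. It assumes a single matrix ($K=1$), a graphical-lasso-type objective $-\log\det\theta + \mathrm{tr}(X\theta) + \lambda \sum_{i \neq j} |\theta_{ij}|$, and a hand-picked edge list standing in for the screening graph. All function names (`components`, `block_mask`, `inverse`, `soft`, `prox_grad_step`) are hypothetical.

```python
def components(d, edges):
    """Label the connected components of an undirected graph on nodes 0..d-1."""
    parent = list(range(d))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return [find(i) for i in range(d)]


def block_mask(labels):
    """mask[i][j] is True iff i and j lie in the same component."""
    d = len(labels)
    return [[labels[i] == labels[j] for j in range(d)] for i in range(d)]


def inverse(a):
    """Gauss-Jordan inverse without pivoting (small positive definite input only)."""
    d = len(a)
    aug = [list(map(float, a[i])) + [float(i == j) for j in range(d)] for i in range(d)]
    for i in range(d):
        p = aug[i][i]
        aug[i] = [v / p for v in aug[i]]
        for r in range(d):
            if r != i and aug[r][i] != 0.0:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    return [row[d:] for row in aug]


def soft(v, t):
    """Scalar soft-thresholding, the proximal operator of t * |v|."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def prox_grad_step(theta, x, step, lam):
    """One proximal-gradient step; the gradient of the smooth part is x - theta^{-1}."""
    d = len(theta)
    inv = inverse(theta)
    out = [[theta[i][j] - step * (x[i][j] - inv[i][j]) for j in range(d)] for i in range(d)]
    return [[out[i][j] if i == j else soft(out[i][j], step * lam) for j in range(d)]
            for i in range(d)]


# Illustrative data: the edges are assumed, not derived from any penalty-specific rule.
d = 4
labels = components(d, [(0, 2), (1, 3)])
mask = block_mask(labels)
raw = [[1.0, 0.3, 0.5, 0.2],
       [0.3, 1.0, 0.1, 0.4],
       [0.5, 0.1, 1.0, 0.3],
       [0.2, 0.4, 0.3, 1.0]]
x = [[raw[i][j] if mask[i][j] else 0.0 for j in range(d)] for i in range(d)]  # reduced data

theta = [[float(i == j) for j in range(d)] for i in range(d)]  # start inside the subspace
for _ in range(5):
    theta = prox_grad_step(theta, x, step=0.1, lam=0.1)
```

After the five steps, every entry of `theta` outside `mask` is still zero, while entries inside the components are free to move. This is the behavior the corollary attributes to any first-order method initialized in the block-diagonal subspace.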
**Corrections**
- Eq. (2.7) (sparse-covariance generator): sign of the log-determinant
barrier corrected to $-\tau \log\det(\theta)$.
- Sec. 1, first mention of $\lambda_2$: now glossed inline with a forward
reference to its formal definition in Eq. (2.2).
- Definition 3.1: input space $\mathcal{X}$ and parameter space
$\mathcal{T}$ defined explicitly.
- Algorithm 1: missing variables added to the Require list, and
subscript/dual-update typos corrected.
- Appendix Sec. 2.3.1 (Group Lasso): the dual-feasibility construction of
$V^{(k)}_{ij}$ now divides by $\lambda_2$ in the two non-zero
branches, restoring the identity $\lambda_1 U + \lambda_2 V = X$.
- **Additional proofreading correction** (not reviewer-flagged). One
substantive math error in the appendix was fixed:
  - **Graphical Lasso prox formula (Appendix Sec. 3.1):** the sign
    error on $\gamma_i$ and the incorrect $\rho$ factors are now corrected.
Assorted minor fixes (incomplete sentences, stale equation
references, ADMM display consistency, soft-threshold scaling, and
notational typos) were also made.
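As a sanity check on the identity $\lambda_1 U + \lambda_2 V = X$ from the Group Lasso correction above, the sketch below implements one standard three-branch split for a single off-diagonal position $(i,j)$ across the $K$ matrices. It is an assumption that this matches the appendix construction; the function name `dual_split` and the numeric values are hypothetical.

```python
import math


def dual_split(x, lam1, lam2):
    """Split x (the K values at one position) so that lam1 * u + lam2 * v == x.

    u is clipped to [-1, 1]; v absorbs what is left and is divided by lam2
    in the two branches where it is non-zero.
    """
    u, v = [], []
    for xk in x:
        if xk > lam1:
            u.append(1.0)
            v.append((xk - lam1) / lam2)
        elif xk < -lam1:
            u.append(-1.0)
            v.append((xk + lam1) / lam2)
        else:
            u.append(xk / lam1)
            v.append(0.0)
    return u, v


# Hypothetical values for K = 3 matrices at one position (i, j).
x = [0.9, -0.2, -0.7]
lam1, lam2 = 0.5, 0.5
u, v = dual_split(x, lam1, lam2)

recon = [lam1 * uk + lam2 * vk for uk, vk in zip(u, v)]
v_norm = math.sqrt(sum(vk * vk for vk in v))  # feasible for the group penalty when <= 1
```

Without the division by $\lambda_2$, `recon` would equal `x` only when $\lambda_2 = 1$. In this split the entries of `u` always lie in $[-1, 1]$, whereas `v_norm` is at most 1 only when the regularization is strong enough relative to `x`.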
**Experimental additions**
- **Worked toy example ($d=5$, $K=3$).** New Figure 1 in Sec. 3.2
illustrates the data tuple, the screening graph, the binary mask,
and the reduced data. A second row depicts the guarantee from
Theorem 3.3.
- **Variability bars on the full-path runtime figure (now Figure 2).** Each
curve shows the median runtime over 20 independent runs; the shaded
band is the 25th--75th percentile range. The caption was also
rewritten to make explicit that each point is the total wall-clock
time over a $5 \times 20$ regularization path of 100 penalty
configurations.
- **Proximal-gradient solver experiment (new Figure 3 in Sec. 5.1).**
Confirms the solver-agnostic claim of Corollary 3.4: applying the
CS reduction to a proximal-gradient solver for Joint Graphical Lasso
reproduces the qualitative speedup pattern of the ADMM comparison.
Full setup is in the appendix.
- **Execution-time-vs-sparsity plot (new Figure 4 in the new Sec. 5.2,
"Time to Compute a Single Solution").** Details in the reply to
Reviewer mE5f #3.
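For readers who want to reproduce the summary statistics described for Figure 2, here is a minimal sketch of the bookkeeping only: the total wall-clock time over a $5 \times 20$ path, repeated 20 times, then summarized by the median and the 25th to 75th percentile range. The grid values, the assignment of 5 values to $\lambda_1$ and 20 to $\lambda_2$, and the `solve` stand-in are assumptions, not the manuscript's experimental setup.

```python
import statistics
import time

# Hypothetical grids; the source only states that the path is 5 x 20.
lam1_grid = [0.1 * (a + 1) for a in range(5)]
lam2_grid = [0.05 * (b + 1) for b in range(20)]
path = [(l1, l2) for l1 in lam1_grid for l2 in lam2_grid]  # 100 configurations


def solve(l1, l2):
    """Stand-in for a real solver call; does a small amount of arithmetic."""
    return sum(i * l1 * l2 for i in range(200))


def time_full_path():
    """Total wall-clock time to solve every configuration on the path once."""
    start = time.perf_counter()
    for l1, l2 in path:
        solve(l1, l2)
    return time.perf_counter() - start


runtimes = [time_full_path() for _ in range(20)]  # 20 independent runs
median = statistics.median(runtimes)
q25, _, q75 = statistics.quantiles(runtimes, n=4, method="inclusive")
```

Percentile conventions differ between libraries, so the band edges `q25` and `q75` may shift slightly under a different interpolation method.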
We hope these revisions address the reviewers' concerns and welcome
any further feedback.
Assigned Action Editor: ~Dan_Garber1
Submission Number: 6935