GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need

Eduarda de Souza Marques; Arthur Sobrinho Ferreira da Rocha; Joao Paixao; Heudson Mirandola; Daniel Sadoc Menasche

Published: 02 Mar 2026, Last Modified: 18 Mar 2026. Venue: ICLR 2026 Workshop GRaM (Poster). License: CC BY 4.0

Track: long paper (up to 8 pages)

Keywords: GSVD, angle, geometry, alignment

TL;DR: Use GSVD to compare two datasets and assign each sample a single angle saying "more like A", "more like B", or "shared". The angle supports simple classification, visual diagnostics, and a global distance between datasets via their angle histograms.

Abstract: Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: *linear relations* between two data matrices, expressed via the co-span constraint $Ax=By=z$ in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form $A = H C U$, $B = H S V$ with $C^\top C + S^\top S = I$, which separates shared versus dataset-specific directions through the diagonal structure of $(C,S)$. From these factors we derive an interpretable *angle score* $\theta(z)\in[0,\pi/2]$ for a sample $z$, quantifying whether $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The primary role of $\theta(z)$ is a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.
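
As a rough illustration of the factorization named in the abstract, the following NumPy sketch builds factors satisfying $A = H C U$, $B = H S V$ and $C^\top C + S^\top S = I$ via a QR step followed by an SVD. Several things here are assumptions, not taken from the paper: that samples are stored as columns, that each dataset has at least as many samples as ambient dimensions, and that the stacked matrix $[A\ B]$ has full row rank. The function names `gsvd_shared_left` and `angle_score` are hypothetical, and the per-sample score shown is only one plausible way to turn $(C,S)$ into an angle in $[0,\pi/2]$; the paper's actual definition of $\theta(z)$ may differ.

```python
import numpy as np


def gsvd_shared_left(A, B, eps=1e-12):
    """Factor A = H @ diag(c) @ U and B = H @ diag(s) @ V with c**2 + s**2 = 1.

    Assumptions (not from the paper): columns are samples, A is d x n_a,
    B is d x n_b, n_a >= d, n_b >= d, and [A B] has full row rank d.
    """
    n_a = A.shape[1]
    # Thin QR of the stacked transposes: [A^T; B^T] = Q R with Q^T Q = I.
    Q, R = np.linalg.qr(np.vstack([A.T, B.T]))
    Q1, Q2 = Q[:n_a], Q[n_a:]
    # Cosines come from the SVD of the top block: Q1 = U1 diag(c) Wt.
    U1, c, Wt = np.linalg.svd(Q1, full_matrices=False)
    # Q2 @ W has mutually orthogonal columns whose norms are the sines.
    Q2W = Q2 @ Wt.T
    s = np.linalg.norm(Q2W, axis=0)
    V1 = Q2W / np.where(s > eps, s, 1.0)  # guard against division by zero
    H = R.T @ Wt.T  # shared left factor, d x d
    return H, c, s, U1.T, V1.T


def angle_score(z, H, c, s):
    """Hypothetical per-sample angle in [0, pi/2]; not the paper's definition."""
    w = np.linalg.solve(H, z)  # coordinates of z in the columns of H
    return np.arctan2(np.linalg.norm(s * w), np.linalg.norm(c * w))


# Synthetic data purely for illustration (5 ambient dimensions).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 40))
B = rng.normal(size=(5, 30))
H, c, s, U, V = gsvd_shared_left(A, B)
direction_angles = np.arctan2(s, c)  # one angle per GSVD direction
theta = angle_score(A[:, 0], H, c, s)
```

Under this assumed convention, an angle near $0$ means the sample's coordinates concentrate on directions where $C$ dominates (more like $A$), an angle near $\pi/2$ means $S$ dominates (more like $B$), and values near $\pi/4$ indicate directions weighted comparably by both.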

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Submission Number: 17
