Oh That Looks Familiar: A Novel Similarity Measure for Spreadsheet Template Discovery

Vengadesh Ravikumaran; Anand Krishnakumar

Published: 18 Nov 2025, Last Modified: 18 Nov 2025

Venue: AITD@EurIPS 2025 Poster

Readers: Everyone

License: CC BY 4.0

Submission Type: Short paper (4 pages)

Keywords: spreadsheet, vision, clustering, chamfer, hausdorff, tabular, embeddings, RAG

TL;DR: Spreadsheets are made for humans, often leading to machine-unfriendly formatting. Discovering recurring layout templates within a library of spreadsheets speeds up the organisation process, unlocking spreadsheets for scaled AI.

Abstract: Traditional methods for identifying structurally similar spreadsheets fail to capture the spatial layouts and type patterns defining templates. We present a hybrid distance metric combining spatial positioning, data type information, and semantic embeddings to measure similarity between spreadsheets. Our approach transforms spreadsheets into cell-level embeddings, then applies aggregation strategies including Chamfer and Hausdorff distances to compute spreadsheet similarity. Experiments across template families demonstrate superior unsupervised clustering performance compared to the graph-based Mondrian baseline, achieving perfect template reconstruction (Adjusted Rand Index of $1.00$ versus $0.90$) on the FUSTE dataset. Our method enables automated template discovery at scale, facilitating downstream applications including bulk data cleaning, model training, and retrieval-augmented generation over tabular collections.
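To make the aggregation step concrete, the sketch below computes Chamfer and Hausdorff distances between two sets of cell-level vectors. It is an illustration under stated assumptions, not the authors' implementation: the three-component cell vector (row, column, numeric type code), the Euclidean ground distance, the summed-means form of Chamfer, and the function names `pairwise_distances`, `chamfer_distance`, and `hausdorff_distance` are all hypothetical choices. The abstract does not specify how the spatial, type, and semantic components are weighted or encoded.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance,
    summed over both directions (one common convention)."""
    d = pairwise_distances(a, b)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour
    distance over both directions."""
    d = pairwise_distances(a, b)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Hypothetical cell vectors [row, column, type_code]; a real pipeline
# would presumably append semantic embedding dimensions as well.
sheet_a = np.array(
    [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]], dtype=float
)
sheet_b = sheet_a.copy()                        # same layout
sheet_c = sheet_a + np.array([3.0, 0.0, 0.0])   # layout shifted down 3 rows

print("chamfer(a, b)   =", chamfer_distance(sheet_a, sheet_b))
print("chamfer(a, c)   =", chamfer_distance(sheet_a, sheet_c))
print("hausdorff(a, c) =", hausdorff_distance(sheet_a, sheet_c))
```

Chamfer averages the nearest-neighbour mismatches, so a few stray cells shift it only slightly, whereas Hausdorff reports the single worst mismatch and is therefore more sensitive to outlier cells. A matrix of such pairwise spreadsheet distances could then be passed to any clustering routine that accepts precomputed distances.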

Relevance Comments: This work directly addresses a core challenge in AI for tabular data: organizing and retrieving spreadsheets at scale. Our hybrid distance metric enables template discovery—a critical primitive for table-based RAG systems, foundation model pretraining, and automated data wrangling pipelines highlighted in the workshop's scope.

Submission Number: 48
