Learning Embeddings for Discrete Tree-Structured Data via Structural Prediction

TMLR Paper 6830 Authors

06 Jan 2026 (modified: 24 Jan 2026) · Under review for TMLR · CC BY 4.0
Abstract: Tree-structured data in natural language syntax, program analysis, and other symbolic domains are typically discrete, rooted, and ordered combinatorial objects. Despite their ubiquity, scalable and learnable representations for comparing such discrete structural trees remain limited. Classical methods such as tree edit distance (TED) and tree kernels provide principled structural measures but are computationally prohibitive, while prior neural encoders often produce latent representations without defining a consistent or interpretable space. We introduce a framework for learning embeddings for discrete tree-structured data in which a Transformer encoder is trained through structural prediction tasks: predicting parent indices, node positions, and optionally tree-level categories. Rather than supervising distances directly, these structural objectives induce a coherent Euclidean embedding space for rooted, ordered trees. A key property of the resulting embedding space is its stability under local structural perturbations: a bounded number of edits, such as inserting or deleting a leaf node, produces a proportionally bounded change in the embedding. Empirically, real datasets exhibit a global envelope in which the ratio between embedding distance and edit count remains uniformly bounded. This yields a smoother and more robust structure than TED and other discrete comparison methods, which often exhibit abrupt jumps under minor structural variations. We demonstrate the effectiveness of our approach across Universal Dependencies treebanks, synthetic random trees, and abstract syntax trees. The learned embeddings correlate strongly with TED, reveal cross-linguistic and cross-parser structural patterns, separate natural from random syntax, and support structure-only code clone retrieval. Together, these results show that structural prediction alone can induce a stable, scalable, and domain-general embedding space that captures fine-grained properties of discrete tree structure.
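The structural-prediction targets named in the abstract (parent indices and node positions for a rooted, ordered tree) can be sketched concretely. The snippet below is an illustrative reconstruction, not the authors' implementation: the `(label, children)` tuple encoding, the `structural_targets` function name, and the choice of depth and sibling position as "node positions" are all assumptions.

```python
# Hypothetical sketch of per-node supervision targets for structural
# prediction over a rooted, ordered tree, as described in the abstract.
# Tree format (an assumption): a node is a (label, children) tuple,
# where children is an ordered list of nodes.

def structural_targets(tree):
    """Preorder-traverse the tree and emit, for each node in visit order:
    its label, its parent's preorder index (-1 for the root), its depth,
    and its position among its siblings."""
    labels, parents, depths, sib_pos = [], [], [], []

    def visit(node, parent_idx, depth, pos):
        label, children = node
        idx = len(labels)          # preorder index of this node
        labels.append(label)
        parents.append(parent_idx)
        depths.append(depth)
        sib_pos.append(pos)
        for i, child in enumerate(children):
            visit(child, idx, depth + 1, i)

    visit(tree, -1, 0, 0)
    return labels, parents, depths, sib_pos

# A toy syntax tree: S -> NP VP, VP -> V NP
tree = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
labels, parents, depths, sib_pos = structural_targets(tree)
# A Transformer encoder over the node sequence would then be trained
# (e.g. with per-node cross-entropy) to predict parents, depths, and
# sibling positions, and a pooled encoder state serves as the embedding.
```

Under this encoding, the preorder sequence for the toy tree is `[S, NP, VP, V, NP]` with parent indices `[-1, 0, 0, 2, 2]`; these discrete targets, rather than any distance supervision, are what shape the embedding space.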
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We added a one-sentence clarification immediately after the loss definition in Section 3.2. This change does not affect the method or results.
Assigned Action Editor: ~Yu_Cheng1
Submission Number: 6830