Representing Sentence Structure in a Tree Metric Space

13 Sept 2025 (modified: 13 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: tree metric space, tree representation learning, Transformer, tree analyses, tree metric, random trees
TL;DR: We propose a fast tree representation learning method that constructs a sentence metric space. The approach jointly predicts tree structure and class, offering benefits for evaluating LLMs and parsers, and analyzing sentences w.r.t. random trees.
Abstract: This paper proposes building a sentence tree metric space through representation learning of sentence structure. Our method represents every sentence tree structure as a vector and applies the Euclidean distance between vectors to construct the sentence tree metric. Compared with previous representative tree-metric methods, namely tree edit distance (TED), tree kernels, and PQ-grams, our method has the lowest computational complexity and scales to a million trees, yet it performs well in predicting tree structure and in learning TED-like distances, even without TED for supervision. Our large-scale analyses of the sentence metric space provide novel ways to study sentence structures produced by recent language technology: they evaluate parsers and tree-annotated corpora, and examine tree structures acquired by recent large language models (LLMs). These analyses also address the nature of natural language trees, not only within languages but also in comparison with random trees.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4636
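
The abstract's core construction, one vector per sentence tree with the Euclidean distance between vectors as the tree metric, can be illustrated with a minimal sketch. The learned encoder is not shown, and the vectors below are hypothetical placeholders chosen for easy arithmetic, not outputs of the paper's model. The names `tree_vectors`, `tree_distance`, `pairwise_distances`, and `satisfies_triangle_inequality` are likewise illustrative assumptions.

```python
import math
from itertools import permutations

# Hypothetical placeholder embeddings. In the paper these would be produced
# by the learned tree encoder, which is not reproduced here.
tree_vectors = {
    "tree_a": [0.0, 0.0, 0.0],
    "tree_b": [3.0, 4.0, 0.0],
    "tree_c": [3.0, 4.0, 12.0],
}


def tree_distance(u, v):
    """Euclidean distance between two tree embedding vectors."""
    return math.dist(u, v)


def pairwise_distances(vectors):
    """Return a dict mapping every ordered pair of names to its distance."""
    names = sorted(vectors)
    return {
        (a, b): tree_distance(vectors[a], vectors[b])
        for a in names
        for b in names
    }


def satisfies_triangle_inequality(dist, names, tol=1e-9):
    """Check d(a, c) <= d(a, b) + d(b, c) for all distinct triples."""
    return all(
        dist[(a, c)] <= dist[(a, b)] + dist[(b, c)] + tol
        for a, b, c in permutations(names, 3)
    )


distances = pairwise_distances(tree_vectors)
triangle_ok = satisfies_triangle_inequality(distances, sorted(tree_vectors))
```

Once the vectors exist, each comparison costs time linear in the embedding dimension and does not depend on tree size. Note that two distinct trees mapped to the same vector would have distance zero, so whether distinct trees stay apart depends on the encoder rather than on the distance function.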