Training-Free Spatially Grounded Geometric Shape Encoding

01 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | Readers: Everyone | CC BY 4.0
Keywords: shape encoding, training-free, invertibility, neural network friendly, frequency richness
TL;DR: training-free, task-agnostic spatially grounded geometric shape encoding
Abstract: Positional encoding has become the de facto standard for grounding deep neural networks to discrete positions, and has achieved remarkable success in tasks where the input can be represented as a one-dimensional sequence. However, extending this concept to spatial geometric shapes demands carefully designed encoding strategies that account not only for the shape's geometry and spatial position, but also for compatibility with neural network learning. In this work, we address these challenges by introducing a training-free, general-purpose encoding strategy, dubbed XShapeEnc, that encodes an arbitrary spatially grounded geometric shape into a compact representation that is fully invertible and frequency-rich. Specifically, XShapeEnc decomposes a shape into its geometry and its pose, and encodes each independently using the Zernike moment transform. Because the Zernike basis is orthogonal across both radial and angular frequencies, the shape geometry and the shape pose are each projected into a compact representation that completely describes the shape from coarse to fine granularity. By further incorporating a carefully designed shape-intrinsic position encoding for the shape geometry and an orthonormal radial basis for the shape pose, the resulting encoded representation naturally contains high-frequency components while remaining invertible.
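As a rough illustration of the invertibility claim, the sketch below computes textbook Zernike moments of a shape image sampled on the unit disk and reconstructs the image from them. It is not the XShapeEnc implementation: the shape-intrinsic position encoding, the pose encoding, and the orthonormal radial basis mentioned in the abstract are not modeled, and all function names (`radial_poly`, `zernike_basis`, `unit_disk_grid`, `zernike_moments`, `reconstruct`) are hypothetical. It assumes a square image whose samples are mapped onto [-1, 1] x [-1, 1], and it approximates the continuous integrals with a simple Riemann sum, so reconstruction is only approximate near the disk boundary.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^|m|(rho); assumes n - |m| is even and >= 0."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        out += coeff * rho ** (n - 2 * k)
    return out


def zernike_basis(n, m, rho, theta):
    """Complex Zernike basis function V_nm = R_n^|m|(rho) * exp(i m theta)."""
    return radial_poly(n, m, rho) * np.exp(1j * m * theta)


def unit_disk_grid(size):
    """Polar coordinates of a size x size grid over [-1, 1]^2 and the disk mask."""
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return rho, theta, rho <= 1.0


def zernike_moments(image, max_order):
    """Moments A_nm = (n + 1) / pi * sum(f * conj(V_nm)) * pixel_area (Riemann sum)."""
    size = image.shape[0]  # assumption: square image
    rho, theta, inside = unit_disk_grid(size)
    pixel_area = (2.0 / (size - 1)) ** 2
    moments = {}
    for n in range(max_order + 1):
        for m in range(-n, n + 1, 2):  # keeps n - |m| even
            basis = zernike_basis(n, m, rho, theta)
            total = np.sum(image[inside] * np.conj(basis[inside]))
            moments[(n, m)] = (n + 1) / np.pi * total * pixel_area
    return moments


def reconstruct(moments, size):
    """Approximate inverse transform: f ~ sum over (n, m) of A_nm * V_nm inside the disk."""
    rho, theta, inside = unit_disk_grid(size)
    recon = np.zeros((size, size), dtype=complex)
    for (n, m), a_nm in moments.items():
        recon += a_nm * zernike_basis(n, m, rho, theta)
    return np.where(inside, recon.real, 0.0)


if __name__ == "__main__":
    # Toy shape: an off-center square, encoded and then decoded.
    shape = np.zeros((128, 128))
    shape[40:90, 50:100] = 1.0
    coeffs = zernike_moments(shape, max_order=20)
    approx = reconstruct(coeffs, size=128)
    print("number of moments:", len(coeffs))
    print("mean abs reconstruction error:", np.mean(np.abs(approx - shape)))
```

Low orders capture the coarse outline of the shape and higher orders add finer detail, which corresponds to the coarse-to-fine description in the abstract. Raising `max_order` generally tightens the reconstruction, at the cost of more coefficients and greater sensitivity to discretization error.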
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 604