POINTESS: EFFICIENT 3D POINT CLOUD MODELING THROUGH ENHANCED STATE SPACE NETWORKS AND FINE-TUNING STRATEGIES

11 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: Point Cloud, Mamba2, KH-Norm, Gate Adapter, 3D Object Classification
TL;DR: We propose PointESS, the first point cloud representation learning framework based on Mamba2, achieving strong performance with fewer parameters via a novel Gate Adapter and KH-Norm for better transferability and spatial modeling.
Abstract: Recent progress in point cloud modeling highlights the need for efficient pretraining strategies and task adaptability. We introduce PointESS, the first framework to leverage the Mamba2 architecture for point cloud representation learning. PointESS balances structural modeling with downstream flexibility and incorporates two key components to enhance local structural awareness and fine-tuning performance: Gate Adapter, which dynamically integrates pretrained and task-specific features for improved transferability, and KH-Norm (KNN-Hybrid Normalization), which embeds geometric priors into normalization and pooling to capture fine-grained spatial details. Extensive experiments on ShapeNet55, ModelNet40, and ScanObjectNN demonstrate that PointESS achieves competitive or state-of-the-art results with substantially fewer parameters, highlighting its effectiveness in both generalization and efficiency.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4123