Talk Until You Burn Out: Escalating 3D-LLM Overgeneration via Semantic Manipulation

03 Sept 2025 (modified: 01 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: 3D-LLMs, Overgeneration, Semantic Manipulation, Decoding Delay
Abstract: The rise of 3D large language models (3D-LLMs) has unlocked new potential in multimodal reasoning over unstructured 3D data, powering applications such as robotics and autonomous driving. However, these models also introduce new security risks, particularly during inference-time computation. In this work, we present \textbf{Exhaust3D}, the first targeted energy-oriented adversarial framework against 3D-LLMs. Exhaust3D performs a \textbf{resource exhaustion attack} by injecting imperceptible yet strategically structured semantic perturbations into 3D point clouds, causing the model to overgenerate outputs and inflate inference latency. Specifically, we design two key components: (1) a \textit{semantic-aware adversarial manipulation strategy} that leverages internal model representations to selectively perturb semantically critical point regions while preserving geometric structure, and (2) a \textit{trajectory disruption mechanism} that maintains high-entropy token predictions to prolong auto-regressive decoding and induce verbose outputs. Experiments on widely used 3D-LLM benchmarks show that Exhaust3D increases decoding steps and energy consumption by up to \textbf{4$\times$} with negligible degradation in functional performance. These results expose a previously underestimated vulnerability of 3D-LLMs to resource exhaustion attacks, highlighting the urgent need for energy-aware robustness in future multimodal foundation models.
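To make the trajectory disruption idea concrete, the sketch below shows one way such an objective could be implemented in PyTorch. This is not the paper's code (which is not public): the function names `overgeneration_loss` and `perturb_point_cloud`, the `model` interface, the `eos_id` parameter, and the PGD-style bounded perturbation loop are all illustrative assumptions. It only mirrors what the abstract describes: keep next-token distributions high-entropy and suppress the end-of-sequence token so auto-regressive decoding runs longer, while clamping the perturbation to stay imperceptible.

```python
# Hypothetical sketch of an energy-oriented overgeneration objective;
# all names and interfaces here are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def overgeneration_loss(logits: torch.Tensor, eos_id: int) -> torch.Tensor:
    """logits: (seq_len, vocab) next-token logits from one decoding pass."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Encourage high-entropy (uncertain) next-token distributions.
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    # Discourage the model from emitting the end-of-sequence token.
    eos_prob = probs[:, eos_id].mean()
    # Minimizing this loss raises entropy and lowers EOS probability.
    return -entropy + eos_prob

def perturb_point_cloud(points, model, eos_id, steps=100, lr=1e-3, eps=0.01):
    """PGD-style loop: optimize a bounded perturbation delta on the points.

    Assumes `model(points)` returns per-step next-token logits of shape
    (seq_len, vocab); `eps` bounds the per-coordinate perturbation.
    """
    delta = torch.zeros_like(points, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(points + delta)
        loss = overgeneration_loss(logits, eos_id)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            # Project back into the epsilon-ball to keep the
            # perturbation geometrically imperceptible.
            delta.clamp_(-eps, eps)
    return (points + delta).detach()
```

Under these assumptions, the epsilon clamp plays the role of the abstract's geometry-preserving constraint, while the loss stands in for the trajectory disruption objective; the paper's semantic-aware region selection is not modeled here.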
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 1359