SPOT-JS: Spectral Chebyshev Filter and Optimal Transport Fusion with Jensen-Shannon Alignment for Cross-Domain Multimodal Deception Detection

14 Sept 2025 (modified: 26 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: multimodal deception detection, frequency spectrum, chebyshev polynomials, optimal transport, jensen-shannon, cross-domain
Abstract: Multimodal deception detection is increasingly important for security, justice, and human-AI interaction. However, prevailing systems still depend on contact-based sensing or elaborate handcrafted feature pipelines and exhibit limited generalization beyond their training domains. Typical approaches learn shallow unimodal cues (e.g., surface spatio-temporal patterns) and fuse modalities by simple concatenation or attention; these choices induce sensitivity to positional dependencies and to distribution shift. This work presents SPOT-JS, a frequency-domain framework aimed at cross-domain transfer. It standardizes inputs, improves unimodal representations, and performs fusion with distribution-aware alignment grounded in established theory. Concretely, a Temporal Deception Alignment Module (TDAM) first provides unified preprocessing and audio-visual synchronization to eliminate reliance on specialized facial/vocal features or invasive signals. We then propose a Learnable Chebyshev Spectrum Filter (LCSF) that operates on power spectra to emphasize task-relevant bands and suppress noise by embedding a learnable Chebyshev basis into the spectral transformation. Next, an Optimal Transport-based Cross-Modal Fusion (OTCF) module computes an entropic-regularized transport plan between spectral components of audio and video, enabling fine-grained, bidirectional correspondence and residual fusion in a shared latent space. Fourth, a Jensen-Shannon Guided Alignment (JS-Align) module measures cross-modal posterior similarity via JS divergence and adaptively reweights the fused representation, mitigating sensitivity to positional mismatches and improving stability under shift. Finally, we introduce the Chebyshev Spectrum-guided Knowledge Transfer (CSKT) Module, which leverages spectral filtering to enhance cross-domain facial knowledge transfer. 
On standard benchmarks (Real Life Trial, DOLOS, and Box of Lies), SPOT-JS surpasses strong unimodal, fusion, and transfer baselines in both intra- and cross-domain settings, with higher F1/ACC/AUC and especially large gains when training on one dataset and testing on another.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5153