Constrained Maximum Likelihood Gaussian Score Fusion for Multimodal Deepfake Detection

Published: 01 Jan 2025, Last Modified: 29 Aug 2025, Venue: IWBF 2025, License: CC BY-SA 4.0
Abstract: The technologies powering synthetic multimedia generation are becoming more sophisticated with each passing year. This has prompted multiple efforts to design accurate multimodal Deepfake detection algorithms. While these detection algorithms appear to perform well on existing benchmarks, their predictions are not necessarily calibrated. Calibrated multimodal Deepfake detection systems are crucial for ensuring that model decisions can be trusted. In this paper, we propose a method for transforming the scores generated by Deepfake detection systems such that the resulting fused output is well-calibrated. More specifically, we extend a generative approach for score calibration called the Constrained Maximum-likelihood Gaussian (CMLG) technique, in which the target (Fake) and non-target (Real) score distributions are modeled as univariate Gaussian distributions, by modifying it to perform score fusion and calibration simultaneously. We further extend the technique to handle more complex score distributions by modeling them as mixtures with multiple components. Our experiments show that our proposed fusion method not only generates a calibrated output for multimodal Deepfake detection but also improves discrimination performance.
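To make the generative calibration idea concrete, the following is a minimal single-system sketch, not the paper's fusion method. It assumes that the constraint takes the form of a variance shared by the Fake and Real score Gaussians, so that the log-likelihood ratio (LLR) becomes an affine function of the raw score. The function names `fit_shared_variance_gaussians` and `scores_to_llr`, the pooled maximum-likelihood variance estimate, and the toy scores are all illustrative assumptions.

```python
import numpy as np


def fit_shared_variance_gaussians(fake_scores, real_scores):
    """Fit one Gaussian per class with a single shared variance (assumed constraint)."""
    fake = np.asarray(fake_scores, dtype=float)
    real = np.asarray(real_scores, dtype=float)
    mu_fake = fake.mean()
    mu_real = real.mean()
    # Pooled maximum-likelihood variance: squared deviations from each
    # class mean, divided by the total number of scores.
    sum_sq = ((fake - mu_fake) ** 2).sum() + ((real - mu_real) ** 2).sum()
    var = sum_sq / (fake.size + real.size)
    return mu_fake, mu_real, var


def scores_to_llr(scores, mu_fake, mu_real, var):
    """Map raw scores to log N(s; mu_fake, var) - log N(s; mu_real, var)."""
    s = np.asarray(scores, dtype=float)
    # With a shared variance the quadratic terms cancel, leaving a * s + b.
    a = (mu_fake - mu_real) / var
    b = -(mu_fake ** 2 - mu_real ** 2) / (2.0 * var)
    return a * s + b


# Toy scores (hypothetical values, for illustration only).
mu_fake, mu_real, var = fit_shared_variance_gaussians([2.0, 4.0], [-1.0, 1.0])
llr = scores_to_llr([1.5, 3.0], mu_fake, mu_real, var)
```

Under this assumption, a score halfway between the two class means maps to an LLR of zero, and scores closer to the Fake mean map to positive LLRs. Fusing several detectors and using mixture components, as the abstract describes, would require a multivariate or mixture extension that this sketch does not attempt.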