Efficient Deepfake Detection via Layer-Frozen Assisted Dual Attention Network for Consumer Imaging Devices
Abstract: The advancement of open-source frameworks and user-friendly manipulation applications has accelerated the spread of deepfakes. In this study, we propose a dual attention (DA) network that refines features from a layer-frozen backbone to combat this proliferation in consumer imaging devices. We employ EfficientNetV2 (ENV2) as the primary feature extractor, initially using its ImageNet pre-trained weights and keeping its layers frozen to leverage their rich feature extraction capabilities. We enhance this base model with a DA module that integrates the Convolutional Block Attention Module (CBAM), which combines channel attention (CA) and spatial attention (SA) mechanisms to improve feature representation. CA dynamically reweights channel-wise feature responses to capture interdependencies between channels, thereby improving feature discrimination. SA lets the network focus on important regions within feature maps, enhancing localization and reducing noise. The attention-refined features are combined with the backbone features at multiple stages of the network through residual connections, so that the model focuses on discriminative visual information. During fine-tuning, we unfreeze the deeper layers of ENV2 to further refine the learned features for the deepfake datasets. This targeted fine-tuning approach unfreezes specific layers and applies iterative adjustments to optimize performance, providing insights into countering the growing use of synthetic media in consumer imaging. To validate our network, we conducted comprehensive experiments on four benchmark datasets: FaceForensics++, World Leaders, Celeb-DF, and DFDC. Our network achieved superior performance compared to existing benchmarks and state-of-the-art approaches, offering a promising solution for robust deepfake detection (DD) in consumer imaging technologies.
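The dual attention step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the random weights, the reduction ratio `r`, and the 3x3 kernel (CBAM commonly uses 7x7) are all assumptions chosen to keep the example small, and a real model would learn these weights on top of ENV2 feature maps.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """Return one weight in (0, 1) per channel (CBAM-style CA)."""
    # Global average- and max-pooled descriptor for every channel.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]

    def mlp(v):
        # Shared two-layer MLP: C -> C/r (ReLU) -> C.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(fmap, kernel):
    """Return an H x W map of weights in (0, 1) (CBAM-style SA)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Pool across channels to obtain two H x W maps.
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    k = len(kernel[0])  # kernel has shape 2 x k x k
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for p, pooled in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding
                            s += kernel[p][di][dj] * pooled[y][x]
            out[i][j] = sigmoid(s)
    return out


def dual_attention(fmap, w1, w2, kernel):
    """Apply CA, then SA, and add the input back (residual connection)."""
    ca = channel_attention(fmap, w1, w2)
    scaled = [[[ca[c] * v for v in row] for row in ch]
              for c, ch in enumerate(fmap)]
    sa = spatial_attention(scaled, kernel)
    return [[[fmap[c][i][j] + scaled[c][i][j] * sa[i][j]
              for j in range(len(sa[0]))]
             for i in range(len(sa))]
            for c in range(len(fmap))]


# Toy input standing in for a backbone feature map (hypothetical sizes).
random.seed(0)
C, H, W, r, k = 4, 5, 5, 2, 3


def rand():
    return random.uniform(-1.0, 1.0)


fmap = [[[rand() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rand() for _ in range(C)] for _ in range(C // r)]
w2 = [[rand() for _ in range(C // r)] for _ in range(C)]
kernel = [[[rand() for _ in range(k)] for _ in range(k)] for _ in range(2)]

out = dual_attention(fmap, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

Because both attention maps lie in (0, 1) and the input is added back, each output value keeps the sign of its input and is scaled by a factor between 1 and 2. The residual path therefore preserves the frozen backbone's features while emphasizing the channels and regions the attention maps select.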