Abstract: Government verification systems increasingly rely on internet-based platforms, where users authenticate their identities by uploading images captured with ordinary mobile devices. However, rapid advances in generative algorithms have enabled the creation of highly realistic forged ID cards that can easily bypass such verification pipelines. These forgeries are not restricted to a single modality; they may target facial imagery, textual content, or both, posing significant challenges to existing detection approaches. We present an ID forgery detection framework that combines feature fusion with attention mechanisms over visual features, leveraging both convolutional neural network (CNN) architectures, such as ResNet-50 and EfficientNet, and transformer-based models, including ViT-16 and Swin Transformer. This study emphasizes the significance of feature fusion and attention-driven representation learning in developing robust and trustworthy ID forgery detection systems for real-world deployment.
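To make the fusion idea concrete, the following is a minimal sketch of how CNN and transformer features might be combined through an attention layer in PyTorch. It assumes torchvision backbones (ResNet-50 and ViT-B/16), a hypothetical `AttentionFusionClassifier` module, and illustrative dimensions; it is not the authors' exact architecture.

```python
# Sketch: attention-based fusion of CNN and transformer features for
# genuine-vs-forged ID classification. Backbone choices, projection sizes,
# and the fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision import models


class AttentionFusionClassifier(nn.Module):
    def __init__(self, num_classes: int = 2, embed_dim: int = 512):
        super().__init__()
        # CNN branch: ResNet-50 with its classification head removed (2048-d features).
        resnet = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])
        self.cnn_proj = nn.Linear(2048, embed_dim)

        # Transformer branch: ViT-B/16 with its head replaced by identity (768-d features).
        vit = models.vit_b_16(weights=None)
        vit.heads = nn.Identity()
        self.vit = vit
        self.vit_proj = nn.Linear(768, embed_dim)

        # Attention over the two projected branch embeddings, then a linear classifier.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        cnn_feat = self.cnn_proj(self.cnn(x).flatten(1))   # (B, embed_dim)
        vit_feat = self.vit_proj(self.vit(x))               # (B, embed_dim)
        tokens = torch.stack([cnn_feat, vit_feat], dim=1)   # (B, 2, embed_dim)
        fused, _ = self.attn(tokens, tokens, tokens)         # self-attention across branches
        pooled = fused.mean(dim=1)                            # (B, embed_dim)
        return self.classifier(pooled)                        # genuine-vs-forged logits


if __name__ == "__main__":
    model = AttentionFusionClassifier()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 2])
```

In this sketch, the attention layer lets each branch's embedding reweight itself against the other before classification; the paper's actual fusion and attention design may differ.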