ViXNet: Vision Transformer with Xception Network for deepfakes based video and image forgery detection
Abstract: Highlights
• Proposed a deep learning based model for deepfake image/video detection.
• It has a patch-wise self-attention module which learns local image artifacts.
• It consists of a vision transformer which learns correlation among masked patches.
• Xception based global image features are stacked with patch based local features.
• The model achieves good results on some standard video forgery detection datasets.
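
The two-branch idea in the highlights (patch-based local features stacked with global image features) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes a tiny grayscale image stored as nested lists, uses unlearned scaled dot-product self-attention over flattened patches in place of the patch-wise attention module and vision transformer, omits patch masking, and substitutes simple whole-image statistics for the Xception branch. All function names here are hypothetical.

```python
import math

def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened non-overlapping patches."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: no learned projections, so this only shows the mechanics.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def global_features(image):
    """Stand-in for the Xception branch: mean, max, and min of the whole image."""
    flat = [v for row in image for v in row]
    return [sum(flat) / len(flat), max(flat), min(flat)]

def fused_features(image, patch=2):
    """Concatenate ('stack') pooled local patch features with global features."""
    local = mean_pool(self_attention(split_into_patches(image, patch)))
    return local + global_features(image)

image = [[0.0, 0.1, 0.2, 0.3],
         [0.1, 0.2, 0.3, 0.4],
         [0.2, 0.3, 0.4, 0.5],
         [0.3, 0.4, 0.5, 0.6]]
features = fused_features(image, patch=2)
```

In a real model the fused vector would feed a classifier that labels the input as real or forged. Here `features` simply holds four local values followed by three global ones.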