\section{Related Work}
\label{sec:related}
Deepfake detection has evolved into a multifaceted area of research. This section reviews recent work on deepfake forensics in the spatial and frequency domains. Spatial-domain methods analyze pixel-level inconsistencies, whereas frequency-domain methods analyze artifacts in spectral representations (e.g., DCT or DFT). Although spatial-domain methods, sometimes fused with other modalities such as audio \cite{astrid2025audio, usmani2025spatio, yang2023avoid}, remain dominant, frequency-based approaches are gaining prominence owing to their lightweight feature extraction and their reported robustness to adversarial shifts.

\subsection{Spatial Domain Approaches}
Spatial methods primarily rely on Convolutional Neural Networks (CNNs) and Transformer-based architectures to capture deepfake-specific features, which are then used to classify videos as real or synthetic (typically with fully connected layers). These features include pixel-level manipulations, facial landmark distortions, and texture inconsistencies introduced in synthetic videos generated using Generative Adversarial Networks (GANs). Naskar \emph{et al.} \cite{naskar2024deepfake} proposed a spatial-domain detection approach based on \textit{deep feature stacking} and \textit{meta-learning}, which integrates features extracted by \textsc{Xception} and \textsc{EfficientNet-B7} in a stacking-based ensemble framework. A multi-layer perceptron meta-learner then selects among the extracted features and performs the classification. Agarwal \emph{et al.} \cite{agarwal2021md} proposed \textsc{MD-CSDNetwork}, a multi-domain \textit{cross-stitched network} that combines spatial and frequency-domain features to improve generalization. The model has two parallel branches, one processing spatial information and the other processing the frequency-domain artifacts present in fake videos. Das \emph{et al.} \cite{das2023unmasking} proposed a masked-autoencoding spatiotemporal transformer for deepfake detection based on \textit{self-supervised learning}. The model combines two \textsc{Vision Transformers}: a \textit{Spatial Transformer} that learns frame-level visual features from individual RGB frames, and a \textit{Temporal Transformer} that learns motion inconsistencies by analyzing optical flow fields. He \emph{et al.} \cite{he2024gazeforensics} proposed \textsc{GazeForensics}, which uses gaze-guided spatial inconsistency learning (e.g., detecting unnatural eye movements) to improve detection accuracy.
They use a \textit{3D gaze estimation network} to extract gaze representations, which are then integrated into the classifier to exploit differences in consistency between real and fake gaze patterns.



\subsection{Frequency Domain Approaches}
Frequency-domain methods analyze the spectral properties of videos, capturing frequency artifacts that generative models unintentionally introduce (for example, due to inconsistencies in texture synthesis). The features are generally derived from transforms such as the Discrete Fourier Transform (DFT), usually computed with the Fast Fourier Transform (FFT) algorithm, and the Discrete Cosine Transform (DCT).
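As a brief reminder of the quantities involved (the notation $f$, $F$, $A$, $\phi$, $H$, and $W$ is ours, introduced only for illustration, and is not taken from the cited works), the 2D DFT of a single-channel $H \times W$ image $f(x,y)$ is
\[
F(u,v) = \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} f(x,y)\, e^{-j 2\pi \left( \frac{ux}{H} + \frac{vy}{W} \right)},
\]
which can be written in polar form as $F(u,v) = A(u,v)\, e^{j\phi(u,v)}$, with amplitude spectrum $A(u,v) = |F(u,v)|$ and phase spectrum $\phi(u,v) = \arg F(u,v)$. Individual methods may use different normalization conventions or operate on other transforms.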
Tan \emph{et al.} \cite{tan2024frequency} proposed \textsc{FreqNet}, a frequency-aware deepfake detection framework designed to generalize better across deepfake generation models. Whereas traditional methods detect artifacts introduced during the up-sampling process in GAN pipelines, \textsc{FreqNet} performs learning in the frequency domain by applying convolutional layers to the phase and amplitude spectra between the Fast Fourier Transform (FFT) and the inverse FFT (iFFT). Kohli and Gupta \cite{kohli2021detecting} proposed a frequency-based convolutional neural network (fCNN) for detecting \textsc{DeepFake}, \textsc{FaceSwap}, and \textsc{Face2Face} facial forgeries, as found in the \texttt{FaceForensics++} dataset. They convert the facial images of each class to the frequency domain using a two-dimensional \textit{Global Discrete Cosine Transform} (2D-GDCT); the result is processed by a three-layer fCNN that learns to classify faces as real or fake. Hasanaath \emph{et al.} \cite{hasanaath2025fsbi} introduced Frequency-Enhanced Self-Blended Images (FSBI), which integrates \textit{self-blended images} with \textit{frequency-domain analysis}. The self-blended images are converted to the frequency domain with the Discrete Wavelet Transform (DWT), and a convolutional neural network then extracts features from the transformed images, as is standard for frequency-based feature extraction. Jeong \emph{et al.} \cite{jeong2022frepgan} proposed \textsc{FrepGAN}, which uses \textit{frequency-level perturbation maps}. Training in \textsc{FrepGAN} is divided into two phases: \textit{Early Training}, which identifies frequency-level artifacts, and \textit{Later Training}, which identifies higher-level inconsistencies.



% \subsection{Auditory Approaches}
% Beyond visualization based detection, deepfake forensics research extends to synthetic voice detection (auditory approaches).  Hamza \textit{et al.} \cite{hamza2022deepfake} introduced Mel-Frequency Cepstral Coefficients (MFCC)-based classifier, for detecting synthetic speech anomalies through \textit{statistical acoustic modeling}. Wani \textit{et al.} \cite{wani2025audio} proposed a \textit{feature distillation framework}, which uses temporal and spectral domain representations for identification of voice clone. This is rather a hybrid approach that uses both visual and auditory domain for deepfake identification.

% \textcolor{red}{\textbf{Take i/p from Arnab da on the comment "Instead of auditory approaches, you can keep a generic section on other approaches; there we can say about auditory as well as multimodal approaches." Also since we have removed the audio part, remember to remove the citations from .bib file}}



