% \documentclass{uai2023} % for initial submission
\documentclass[accepted]{uai2023} % after acceptance, for a revised
% version; also before submission to
% see how the non-anonymous paper
% would look like

%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
% Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
 % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

% for cross referencing the main text
% PLEASE ONLY USE xr IN THE SUPPLEMENTARY MATERIAL. 
% In the main paper, hard code any cross-reference to the supplementary material. 
% \usepackage{xr} 
% \externaldocument{uai2023-template}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Noisy Adversarial Representation Learning for Effective and Efficient Image Obfuscation}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author{Jonghu Jeong$^*$}
\author{Minyong Cho$^*$}
\author{Philipp Benz}
\author{Tae-hoon Kim}

\affil{%
    Deeping Source Inc.\\
    Seoul\\
    Republic of Korea
}
  \begin{document}
  
\onecolumn %% Turn this off if single column is desired for the supplement
\maketitle
\def\thefootnote{*}\footnotetext{Equal Contribution.}
\appendix

\begin{table*}[t]
\caption{Results with multi-class classification. The practical upper bounds (RN18) for the privacy and utility tasks are obtained by training ResNet18~\citep{he2016deep} on the original images.}
\centering
\begin{tabular}{l | ccc | ccc }
\toprule
       & \multicolumn{3}{c}{FairFace (Race / Age)} & \multicolumn{3}{c}{FairFace (Age / Race)}  \\
Method & Privacy $\downarrow$ & Utility $\uparrow$ & $\Delta \uparrow$ & Privacy $\downarrow$ & Utility $\uparrow$ & $\Delta \uparrow$  \\
\midrule
RN18 & 63.57 & 55.49 & -- & 55.49 & 63.57 & --    \\
\midrule
MaxEnt  & 23.40 & 53.82 & 30.42 & 30.82 & 63.27 & 32.45  \\
DeepObfs. & 53.12 & 50.40 & -2.72 & 63.54 & 62.08 & -1.46\\
DISCO   & 53.37 & 46.63 & -6.74 & 62.54 & 57.10 & -5.44 \\
\midrule
Ours (RN18$_3$) & 20.68 & 52.64 & 31.96 & 31.89 & 62.45 & 30.56   \\
Ours (RN18$_4$) & 20.70 & 52.77 & \textbf{32.07} & 29.75 & 62.98 & \textbf{33.23}\\
\bottomrule
\end{tabular}
\label{tab:complex-task}
\end{table*}

\section{Complex Tasks Other Than Binary Classification}
\subsection{Multi-class Classification}
\begin{figure*}[t]
\centering
\includegraphics[width=1.0\linewidth,bb=0 0 1962 1140]{figures/complex_task_recon.png}
\caption{Reconstruction attack on the complex task obfuscators. The reconstructor architecture is from DeepObfs.~\citep{li2021deepobfuscator} and is trained with an MSE loss between the original and reconstructed images. Our method is the only one that successfully defends against the reconstruction attack by concealing identity, race, age, and face shape.}
\label{fig:complex-recon}
\end{figure*}

While the previous experiments have at least one of the two tasks (privacy and utility) as binary classification, we show that our method also applies to more complex settings in which both the privacy and the utility task are multi-class classification. The FairFace~\citep{karkkainen2021fairface} dataset provides ``Age'' and ``Race'' classification tasks that consist of 9 classes (``0 to 2'', ``3 to 9'', ``10 to 19'', \ldots, ``more than 70'') and 7 classes (``Black'', ``East Asian'', ``Indian'', ``Latino Hispanic'', ``Middle Eastern'', ``Southeast Asian'', and ``White''), respectively.
Table~\ref{tab:complex-task} shows the results using ``Age'' and ``Race'' as the privacy and the utility task, respectively.
It also shows the results of a controlled study in which the privacy and the utility tasks are switched.
We conducted this experiment to investigate how different combinations of tasks affect the privacy-utility trade-off.



MaxEnt~\citep{roy2019mitigating} has the highest utility accuracy in both experiments among the compared methods, excluding the upper bound. This is in line with the results on the other datasets in the main paper.
DeepObfs.~\citep{li2021deepobfuscator} suffers from a poor privacy-utility trade-off, which is consistent with its previous results. DISCO~\citep{singh2021disco} also struggles to defend against the privacy leakage attack, in contrast to its performance on the binary classification tasks.
We report results for our method using $\sigma=960$ for RN18$_3$ and $\sigma=15$ for RN18$_4$.
Ours with RN18$_4$ has the highest $\Delta$ among all methods, reaffirming that our proposed noise module effectively helps the obfuscator learn privacy-preserving representations.
RN18$_3$ shows a comparable $\Delta$ at a lower computational cost and memory usage.
These results confirm that our method also applies to tasks more complex than binary classification while reducing the resource burden on the client side.
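
To make the comparison explicit, the privacy-utility gap reported in Table~\ref{tab:complex-task} is the utility accuracy minus the privacy accuracy, both in percent. The symbols $A_{\mathrm{util}}$ and $A_{\mathrm{priv}}$ are shorthand introduced here and do not appear in the tables:
\begin{equation*}
\Delta = A_{\mathrm{util}} - A_{\mathrm{priv}}.
\end{equation*}
For example, for RN18$_4$ on FairFace (Race / Age), $\Delta = 52.77 - 20.70 = 32.07$. A negative $\Delta$, as for DeepObfs.\ and DISCO, means that the adversary's accuracy exceeds the utility accuracy.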

Figure~\ref{fig:complex-recon} shows reconstruction attack results on the representations generated by obfuscators trained with the privacy task ``Race'' and the utility task ``Age''. DeepObfs.\ and DISCO fail to defend against the reconstruction attack: their reconstructions reveal identity and race. MaxEnt successfully removes identity and race while retaining age.
Our methods, both RN18$_3$ and RN18$_4$, successfully defend against the reconstruction attack, making it difficult for the adversary to identify both age and race in the reconstructed images.
This result is consistent with the reconstruction attack on CelebA~\citep{liu2015faceattributes} in the main paper.

\subsection{Facial Landmark Detection}
In addition to classification, we apply our method to a regression task, facial landmark detection (FLD). The CelebA~\citep{liu2015faceattributes} dataset provides image coordinates of 5 facial landmarks (left eye, right eye, nose, left mouth corner, and right mouth corner). We set facial landmark detection and gender classification as the utility and privacy tasks, respectively.
We compare our approach, RN18$_3$ using $\sigma=1920$, with the practical upper bound model (RN18) and with MaxEnt, which shows the best classification task performance among the compared methods.
Mean squared error (MSE) and accuracy are used as metrics for FLD and classification, respectively.
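
As a sketch of the utility metric, assume that the MSE is averaged over the $2 \times 5 = 10$ landmark coordinates of each of the $N$ test images. The coordinate normalization is not specified here, and the symbols $y_{i,k}$ and $\hat{y}_{i,k}$ are notation introduced for this illustration. Under that assumption, the metric can be written as
\begin{equation*}
\mathrm{MSE} = \frac{1}{10N} \sum_{i=1}^{N} \sum_{k=1}^{10} \left( \hat{y}_{i,k} - y_{i,k} \right)^2,
\end{equation*}
where $y_{i,k}$ is the $k$-th ground-truth coordinate of image $i$ and $\hat{y}_{i,k}$ is its prediction from the obfuscated representation.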

In Table~\ref{tab:fld-task}, our method shows an MSE of 0.1766, roughly 7 times lower than that of MaxEnt (1.2156). Regarding privacy accuracy, ours is only $1.1\%$p higher than MaxEnt.
The privacy performance of our approach is comparable to that of MaxEnt since the accuracy is close to $50\%$, the lower bound for binary classification.
RN18$_3$ is also more efficient than MaxEnt in terms of computational cost (about 23\% fewer GFLOPs), reducing the client-side resource burden.


\begin{table}[t]
\caption{Facial landmark detection with gender classification as a privacy task. The metrics for the privacy and utility task are accuracy (\%) and MSE, respectively.}
\centering
\begin{tabular}{l | cc}
\toprule
Method & Privacy (Gender) $\downarrow$ & Utility (FLD) $\downarrow$  \\
\midrule
RN18 & 98.14 & 0.0368 \\
\midrule
MaxEnt  & \textbf{57.43} & 1.2156 \\
Ours (RN18$_3$) & 58.53 & \textbf{0.1766} \\
\bottomrule
\end{tabular}
\label{tab:fld-task}
\end{table}


\begin{table}[ht]
\caption{Results on highly correlated privacy and utility tasks. Our method shows the largest privacy-utility gap ($\Delta$) among all methods while having a lower computational cost.}
\centering
\begin{tabular}{l | ccc }
\toprule
       & \multicolumn{3}{c}{CelebA (Mouth open / Smiling)}\\
Method & Privacy $\downarrow$ & Utility $\uparrow$ & $\Delta \uparrow$ \\
\midrule
RN18 & 94.20 & 93.48 & -- \\
\midrule
MaxEnt  & 80.82 & 93.29 & 12.47 \\
DISCO   & 78.30 & 90.70 & 12.40 \\
DeepObfs. & 94.06 & 91.99 & -2.07 \\
\midrule
Ours (RN18$_3$) & 76.99 & 91.56 & 14.57 \\
Ours (RN18$_4$) & 57.44 & 91.61 & \textbf{34.17} \\
\bottomrule
\end{tabular}
\label{tab:correlation}
\end{table}



\section{Correlation Between Privacy and Utility Tasks}
We present an experiment with privacy and utility tasks that are highly correlated on the CelebA~\citep{liu2015faceattributes} dataset.
The experiment in the main paper is conducted with ``Gender'' as the privacy task and ``Smiling'' as the utility task; these two attributes have a Cram{\'e}r's V~\citep{cramer2016mathematical} correlation coefficient of $0.1367$. Here, we use ``Mouth open'' as the privacy task and ``Smiling'' as the utility task, which are highly correlated, as evidenced by a correlation coefficient of $0.5316$.
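
For reference, the standard (uncorrected) form of Cram{\'e}r's V for an $r \times c$ contingency table over $n$ samples is
\begin{equation*}
V = \sqrt{\frac{\chi^2}{n \, \min(r-1,\, c-1)}},
\end{equation*}
where $\chi^2$ is Pearson's chi-squared statistic. For two binary attributes, $\min(r-1,\, c-1) = 1$, so $V = \sqrt{\chi^2 / n}$, which ranges from $0$ (no association) to $1$ (perfect association). This sketch assumes the uncorrected form; a bias-corrected variant would give slightly different values.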

Table~\ref{tab:correlation} compares the results. Although DeepObfs.\ shows relatively high accuracy on the utility task, it fails to defend against the privacy leakage attack. MaxEnt has the best utility accuracy but defends poorly compared with the other approaches.
DISCO has the lowest privacy accuracy but also the lowest utility accuracy among all baselines, which leads to a small privacy-utility gap ($\Delta$).
Our methods, RN18$_3$ and RN18$_4$ with $\sigma=15360$, outperform the other methods in terms of the privacy-utility gap ($\Delta$). The $\Delta$ of RN18$_4$ is roughly $20\%$p higher than that of the other methods, a considerable gain. Even RN18$_3$, which uses a more computationally efficient model, achieves a $\Delta$ roughly $2\%$p higher than the other approaches. The results confirm that our method preserves both privacy and utility performance while keeping the resource burden low, even in the case of highly correlated tasks.


\section{Detailed Settings for User Study}
We report the detailed settings of the user study to show our best efforts to provide impartial results.
We first randomly sampled 30 images from CelebA~\citep{liu2015faceattributes} that ResNet18 classifies correctly. This reduces the possibility that the original images are ambiguous to begin with, which could affect our results.
Then, we randomly selected a test subject group of 30 people in their 20s and 30s who live in Seoul, South Korea.
Finally, the images were ordered randomly regardless of the obfuscation method to eliminate any bias that may arise from people noticing a pattern.

For the results in the main paper,
note that a 50\% ratio of ``Correct'' and ``Wrong'' answers can be considered a random guess since the tasks are binary classification and the data we presented are balanced.
Additionally, ``Cannot judge'' can also be considered a random guess since the users would have guessed randomly if there were no ``Cannot judge'' option.
Thus, an equal ratio of ``Correct'' and ``Wrong'' answers (50\% each), or all answers being ``Cannot judge'', is the best we can achieve regarding privacy protection for binary classification.
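
One way to make this concrete, under the assumption that every ``Cannot judge'' answer would otherwise have been a fair coin flip, is to define an effective correct rate. The symbols $p_{\mathrm{c}}$, $p_{\mathrm{w}}$, and $p_{\mathrm{j}}$ are notation introduced here for the fractions of ``Correct'', ``Wrong'', and ``Cannot judge'' answers, with $p_{\mathrm{c}} + p_{\mathrm{w}} + p_{\mathrm{j}} = 1$:
\begin{equation*}
p_{\mathrm{eff}} = p_{\mathrm{c}} + \frac{p_{\mathrm{j}}}{2} = \frac{1 + p_{\mathrm{c}} - p_{\mathrm{w}}}{2}.
\end{equation*}
Then $p_{\mathrm{eff}} = 0.5$ exactly when $p_{\mathrm{c}} = p_{\mathrm{w}}$, which covers both ideal cases: an equal split of ``Correct'' and ``Wrong'' answers, and all answers being ``Cannot judge''.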


\section{Privacy-Utility Trade-off Under Different Standard Deviations}
\label{section:priv-util-trade-off-std}
% The noise adding module has one hyper-parameter, the standard deviation ($\sigma$) of the Gaussian distribution, which controls the intensity of the noise.
Intuitively, the noise intensity determines how much information is removed from the obfuscated feature.
Increasing the intensity yields more privacy, since the noise confuses the adversary model and hinders its training.
However, once the loss of information becomes severe, utility accuracy also degrades, eventually leading to a worse privacy-utility trade-off.
Thus, we must carefully choose the noise intensity to obtain the best trade-off.
We report the privacy-utility trade-off for RN18$_3$ with various $\sigma$ in Figure~\ref{fig:delta_each_std}.
There is an appropriate $\sigma$ that achieves the best privacy-utility trade-off. Note that the best $\sigma$ differs for different datasets and models.
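
As a minimal sketch of this mechanism, assume that the noise module adds zero-mean isotropic Gaussian noise to the obfuscated feature. The symbols $z$, $\tilde{z}$, $\epsilon$, and $d$ are notation introduced here for illustration:
\begin{equation*}
\tilde{z} = z + \epsilon, \qquad \epsilon \sim \mathcal{N}\!\left(0, \sigma^2 I\right).
\end{equation*}
Under this assumption, the signal-to-noise ratio of a $d$-dimensional feature is $\lVert z \rVert^2 / (d \sigma^2)$, so a larger $\sigma$ hides more information from both the adversary and the utility model. The suitable range of $\sigma$ then depends on the scale of $z$, which is consistent with the best $\sigma$ differing across datasets and models.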

\begin{figure}[t]
\centering
\includegraphics[width=1.0\linewidth,bb=0 0 855 299]{figures/delta_each_std.png}
\caption{Privacy-utility trade-off under each standard deviation of noise.
Delta represents the performance gap between utility and privacy.
We report the privacy and utility accuracy in the supplementary materials.}
\label{fig:delta_each_std}
\end{figure}

\bibliography{jeong_184}

\end{document}
