\clearpage

\title{Appendix}
\onecolumn



\section{Implementation Details}
\label{app:details}

In this section, we provide further technical details on our implementation of the watermark generators. 
Hyperparameters and computing infrastructure are also described. 


\subsection{Watermark Generator}
\label{app:wm-details}


In this work, we use MNIST digits~\citep{lecun1998gradient} and EMNIST letters~\citep{cohen2017emnist} to create watermarks. 
Generated watermarks are grayscale images of size $64 \times 64$.
Watermark generators are three-hidden-layer MLP conditional VAEs (CVAEs), following \citet{sohn2015learning}. 

\textbf{Controllable Location.}
To make the watermark location controllable, when training the watermark generator 
we place digits and letters at various locations in the image by assigning different padding sizes to the four sides. 
From these paddings, we compute the left and bottom padding ratios, which lie in $[0, 1]$. 
Specifically, the watermark is placed in the \textit{leftmost} region when the left padding ratio is 0, and in the \textit{rightmost} region when it is 1. 
The bottom padding ratio works analogously. 
During training, the two padding ratios, together with the digit index (0--9), are used as conditions. 
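
As a concrete illustration of the padding ratios, one definition consistent with this description is given below. It is an illustrative assumption only, and the symbols $p_{\text{left}}$, $p_{\text{right}}$, $p_{\text{top}}$, $p_{\text{bottom}}$ (padding sizes in pixels) are hypothetical:
\begin{align*}
    r_{\text{left}} = \frac{p_{\text{left}}}{p_{\text{left}} + p_{\text{right}}}, 
    \qquad 
    r_{\text{bottom}} = \frac{p_{\text{bottom}}}{p_{\text{bottom}} + p_{\text{top}}}.
\end{align*}
For example, assuming a $28$-pixel-wide glyph on the $64$-pixel-wide canvas, the total horizontal padding is $64 - 28 = 36$ pixels, so $p_{\text{left}} = 9$ gives $r_{\text{left}} = 9/36 = 0.25$, while $p_{\text{left}} = 0$ and $p_{\text{left}} = 36$ recover the leftmost ($r_{\text{left}} = 0$) and rightmost ($r_{\text{left}} = 1$) placements.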


\textbf{Watermark Optimization.}
In Algorithm~\ref{alg:main}, {\name} keeps the pre-trained CVAE fixed and optimizes its latent code and the two padding ratios to learn watermarks. 
This parameterization allows us to maintain good \textbf{watermark readability}. 
To further preserve image readability, we define $\mathcal R (\mv) = \| \mv \|_1$ to avoid learning an excessively large watermark.


\textbf{Differentiable Approximation for Mask Matrix.}
To obtain a differentiable approximation of the masking matrix in inpainting, we define 
\begin{align*}
    \Amat_m = \text{diag}\left(\text{sig}\left(\frac{m_1 - \alpha}{\beta}\right), \dots, \text{sig}\left(\frac{m_n - \alpha}{\beta}\right)  \right),
\end{align*}
where $\text{sig}(\cdot)$ denotes the sigmoid function, and use $\alpha=0.15, \beta=0.01$. 
See the inpainting observations in Figure~\ref{fig:main}, which were generated with this differentiable approximation.
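
As a quick numerical check of this approximation, the following worked values use only $\alpha = 0.15$, $\beta = 0.01$ as stated above and assume the standard sigmoid $\text{sig}(z) = 1/(1+e^{-z})$. The diagonal entries at a few inputs are
\begin{align*}
    m_i = 0:    &\quad \text{sig}(-15) \approx 3.1 \times 10^{-7}, \\
    m_i = 0.13: &\quad \text{sig}(-2) \approx 0.12, \\
    m_i = 0.15: &\quad \text{sig}(0) = 0.5, \\
    m_i = 0.17: &\quad \text{sig}(2) \approx 0.88, \\
    m_i = 1:    &\quad \text{sig}(85) \approx 1.
\end{align*}
Each entry therefore behaves like a smooth step that switches from nearly 0 to nearly 1 in a narrow band around $m_i = \alpha$, while remaining differentiable in $m_i$.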

\textbf{Initialization.}
Due to the highly non-smooth dependence on the location parameters, 
{\name} conducts a 3-by-3 grid search over watermark locations before running the complete algorithm. This search is based on the reconstruction quality of a 50-step MLE solution, as detailed in \citet{liu2023aipo}.  



\subsection{Hyperparameters}

Table~\ref{tab:hparams} summarizes all hyperparameters used in this paper, which we found to work well. 
In practice, we rescale the $\mathcal{R}(\mv)$ term before tuning its weight, so that the weight does not have to compensate for differences in magnitude. 
We also clarify a few of our choices below.

First, the
``coefficient of watermark regularizer'' in Eq.~\eqref{eq:inpaint-mask} (i.e., the weight on the
norm of the watermark) was tuned by monitoring the watermark size. 
A regularization that is too strong makes the watermark disappear, while one that is too weak lets the watermark cover the whole image. We chose 0.001 because it kept the watermark size stable, i.e., close to its initial size, during the optimization process. Note that it was not tuned based on watermark removal performance.

Second, the two ``smoothing factors'' $\alpha, \beta$ discussed above were not tuned to improve {\name} performance. 
Instead, we chose them solely so that the sigmoid output covers nearly the full range $[0, 1]$ when its input (a pixel value) lies in $[0, 1]$. 

\begin{table}[htbp!]
\centering
% \small
\caption{
Hyperparameters of different methods.
}
\label{tab:hparams}
\renewcommand{\tabcolsep}{4pt}
\resizebox{0.7\linewidth}{!}{%{}
\begin{tabular}{r r cc}
\toprule[0.3ex]
& & Digit Logo Watermarks & Initial Watermarks \\

\cmidrule{3-4}
% \midrule
                                    & Hyperparameter  & Value   &         Value \\
\cmidrule[0.3ex]{2-4} %\cmidrule{5-6} \cmidrule{8-10}
\multirow{7}{*}{{\name}}            & Learning Rate     & 0.05    & 0.05          \\
                                    & Optimizer & Default AdamW \citep{ilya2018decoupled} & Default AdamW \citep{ilya2018decoupled} \\
                                    & Coeff. of $\mathcal{R}(\mv)$ (Eq.~\eqref{eq:inpaint-mask}) & 0.001 & 0.001 \\
                                    & $\alpha, \beta$ (Eq.~\eqref{eq:inpaint-mask}) & $\alpha=0.15, \beta=0.01$ & $\alpha=0.15, \beta=0.01$ \\
                                    & Meta Step $K$       & 1   & 1 \\
                                    & Targeted $\lambda$  & 1   & 1 \\
                                    & Step Size for $\lambda$  & \multicolumn{2}{c}{Using dynamic strategies from \citet{liu2023aipo}.}\\
                                    % & Learning Rate     & \\

\cmidrule{2-4} %\cmidrule{5-6} \cmidrule{8-10}
\multirow{2}{*}{{Flow-R}}        & $\lambda$    & 1   & 1 \\
                                    & Others  & \multicolumn{2}{c}{Identical to \citet{liu2023aipo}} \\
\cmidrule{2-4} %\cmidrule{5-6} \cmidrule{8-10}
\multirow{2}{*}{{Repaint}}          & Batch Size & 10 & 10 \\ 
                                    & Others  & \multicolumn{2}{c}{Identical to \citet{lugmayr2022repaint}} \\
\cmidrule{2-4} %\cmidrule{5-6} \cmidrule{8-10}
\multirow{1}{*}{{SLBR}}             & All  & \multicolumn{2}{c}{Identical to \citet{liang2021visible}} \\
\multirow{1}{*}{{DeNet}}             & All  & \multicolumn{2}{c}{Identical to \citet{sun2023denet}} \\


\bottomrule[0.3ex]
\end{tabular}
}
\end{table}






\subsection{Computing Infrastructure}
\label{app:infra}

Our experiments were conducted on an NVIDIA A6000 48GB GPU.
Our watermark generators were trained on a CPU-only machine.



\subsection{Testing Datasets}
\label{app:data}
We tested {\name} on 100 validation samples from each of the CelebA and ImageNet datasets. 
Specifically, to ensure reproducibility, we used the first 100 CelebA validation samples from \citet{whang21solve} and the first 100 ImageNet samples from the public subset at \url{https://github.com/EliSchwartz/imagenet-sample-images}.






\section{Additional Experimental Results}
\label{app:results}

In this section, we provide additional experimental results. 
In particular, Table~\ref{tab:quant-full} reports the complete version of Table~\ref{tab:quant}. 
SLBR failed to recognize the two types of watermarks, as indicated by its low ``Before'' and ``After'' values. As detailed in Section~\ref{sec:experiment}, these values measure how much the reconstruction $\Tilde{\xv}$ improves over the observation $\yv$, i.e., $\text{PSNR}(\Tilde{\xv}, \xv_T) - \text{PSNR}(\yv, \xv_T)$. 
See Figure~\ref{fig:main} for concrete examples of SLBR failures.
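
To help interpret this PSNR difference, it can be rewritten under the standard definition $\text{PSNR}(\mathbf{a}, \mathbf{b}) = 10 \log_{10}\left(\text{MAX}^2 / \text{MSE}(\mathbf{a}, \mathbf{b})\right)$, where $\text{MAX}$ is the peak pixel value and $\text{MSE}$ is the mean squared error. We assume here that this standard definition is the one in use:
\begin{align*}
    \text{PSNR}(\Tilde{\xv}, \xv_T) - \text{PSNR}(\yv, \xv_T) 
    = 10 \log_{10} \frac{\text{MSE}(\yv, \xv_T)}{\text{MSE}(\Tilde{\xv}, \xv_T)}.
\end{align*}
The peak value cancels, so a value of $0$ means the reconstruction is exactly as close to $\xv_T$ as the observation, and every $+10$ dB corresponds to a tenfold reduction in mean squared error.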



\input{subfiles/6_main_table}
\input{subfiles/6_raw_table}