% \documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{caption}
\usepackage{graphicx}
%\usepackage{amsmath}
\usepackage{paralist}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{nicefrac}
\usepackage{dsfont}
\usepackage{subcaption}
\usepackage{xcolor}
\usepackage{algorithm}
\usepackage{algorithmic}
\urlstyle{same}
\usepackage{float}
%\usepackage{ulem}



%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\twodots}{\mathinner {\ldotp \ldotp}}

\newtheorem{theorem}{Theorem}[]
\newtheorem{example}{Example}[]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}{Definition}[]
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem*{remark}{Remark}

\def\bt{\color{red}}
\def\et{\color{black}}

\newcommand\Voters{\mathcal{N}}

\title{Multi-winner Approval Voting Goes Epistemic (Supplementary Material)}

% The standard author block has changed for UAI 2022 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<tahar.allouche@dauphine.eu>?Subject=Your UAI 2022 paper}{Tahar Allouche}{}}
\author[1]{\href{mailto:<lang@lamsade.dauphine.fr>?Subject=Your UAI 2022 paper}{Jérôme Lang}{}}
\author[1]{\href{mailto:<florian.yger@lamsade.dauphine.fr>?Subject=Your UAI 2022 paper}{Florian Yger}{}}

% Add affiliations after the authors
\affil[1]{%
    LAMSADE, CNRS, PSL, Université Paris-Dauphine\\
}

  
  \begin{document}
\maketitle


\appendix
% NOTE: necessary when ptmx or no mathfont class option is given
\providecommand{\upGamma}{\Gamma}
\providecommand{\uppi}{\pi}
\section{Data collection and incentives}
To see how participants behave under the ranking incentives defined in the football quiz, we plotted the histograms of the answer sizes (see Figure~\ref{hist}). Although the platform allows participants to select every alternative, only two voters did so for all the questions. Moreover, Figures~\ref{sw_hist} and~\ref{two_hist} show that the majority of voters tend to select exactly as many teams as appear in the image.

\begin{figure}
     \centering
     \begin{subfigure}[b]{0.47\textwidth}
          \centering
         \includegraphics[width=0.85\textwidth]{figures-supp/Hist_sizes_2w.pdf}
        \subcaption{Two-winner instances}
        \label{two_hist}
     \end{subfigure}
     \begin{subfigure}[b]{0.47\textwidth}
          \centering
         \includegraphics[width=0.85\textwidth]{figures-supp/Hist_sizes_sw.pdf}
        \subcaption{Single-winner instances}
        \label{sw_hist}
     \end{subfigure}
     %\begin{subfigure}[b]{0.49\textwidth}
     %    \centering
     %    \includegraphics[width=0.85\textwidth]{Hist_sizes.pdf}
     %        \subcaption{All answers}
      %  \label{all_hist}
     %\end{subfigure}
        \caption{Histograms of ballot sizes}
        \label{hist}
\end{figure}

\section{Initializing Voters' Reliabilities}
Inspired by the \emph{Anna Karenina Principle} in~\cite{truth2019}, we devised an initialization strategy for the voters' reliabilities. In \emph{Anna Karenina}, Leo Tolstoy wrote that ``Happy families are all alike; every unhappy family is unhappy in its own way''. In the same spirit, it seems reasonable to hypothesize that accurate users tend to give similar answers, whereas inaccurate users each have their own way of being inaccurate.
\begin{comment}
 We use the following heuristic (see Algorithm \ref{algo_init}) for the initialization. We used the Jaccard distance given by:
 $$d_{Jacc}(A,B)= \frac{|\overline{A}\cap B|+|A \cap \overline{B}|}{|A \cup B|} $$
%\cj{$d_{Jacc}$ défini?}\cj{Il faut un exemple}
\begin{algorithm}[H]
\caption{ Initializing $(p_i,q_i)_i$ }
\label{algo_init}
$\begin{array}{ll}
\textbf{Input:} & \mbox{Approval ballots $(A_i^z)_{z,i}$}\\
\textbf{Output:} &\mbox{Initialization $(\hat{p}^{(0)}_i,\hat{q}^{(0)}_i)$} 
\end{array}$
\begin{algorithmic} 
\STATE -Compute $w_{max}=\frac{n}{1+n}, w_{min}=\frac{1}{1+n}$
\STATE -Compute $d_i = \sum_{j\neq i} d_{Jacc}(A_i,A_j)$
\STATE -Compute $d_{max} = \max d_i, d_{min} = \min d_i $
\STATE -Compute $w_i = (w_{max}-w_{min})\left(\frac{\frac{1}{d_i}-\frac{1}{d_{max}}}{\frac{1}{d_{min}}-\frac{1}{d_{max}}}\right)+w_{min}$
\STATE -Fix $\hat{p}^{(0)}_i= \frac{1}{2}$ and $\hat{q}^{(0)}_i=\frac{1-\frac{e^{w_i}-1}{e^{w_i}+1}}{2}$
\end{algorithmic}
\end{algorithm}

\begin{remark}
The formulas in Algorithm \ref{algo_init} guarantee that a voter's parameters $(\hat{p}^{(0)}_i,\hat{q}^{(0)}_i)$ are such that her initial weight is equal to $w_i$, and that $\frac{w_{max}}{w_{min}}=n$, which means that initially the voter closest on average to the other voters counts $n$ times as much as the voter with the largest average distance.
\end{remark}
\end{comment}

The following example illustrates the Anna Karenina initialization scheme.
\begin{example}
Consider the following approval profile (Table~\ref{app profile}) with $3$ voters, $5$ alternatives and $4$ instances.
\begin{table}[h]
    \centering
    \begin{tabular}{|l|c|c|c|c|}
  \hline
 & $A^1$  & $A^2$ & $A^3$ & $A^4$ \\
  \hline
  Voter $1$ & $\{a_1,a_4\}$ & $\{a_1\}$ & $\{a_3\}$ & $\{a_1\}$ \\
  \hline
  Voter $2$ & $\{a_2\}$ & $\{a_5\}$ & $\{a_4\}$ & $\{a_1\}$\\
  \hline
  Voter $3$ & $\{a_2,a_3,a_4\}$ & $\{a_2,a_3,a_5\}$ & $\{a_2,a_3\}$ & $\{a_3\}$ \\
  \hline 
\end{tabular}
    \caption{Approval Ballots of 3 Voters on 4 Instances}
    \label{app profile}
\end{table}
Here we have:
$$w_{max}=\frac{n}{n+1}=0.75, \quad w_{min}=\frac{1}{n+1}=0.25 $$
First, compute the mean Jaccard distance of each voter to the others:
$d_1=1.71$, $d_2=1.69$, $d_3=1.65$.
So $d_{max}=d_1=1.71$ and $d_{min}=d_3=1.65$, which means that voter $3$ (the closest on average to the other voters) gets the largest weight $w_3=w_{max}=0.75$ and voter $1$ gets the smallest weight $w_1=w_{min}=0.25$.
Next, compute the weight that will be assigned to each voter, for instance:
$$w_2=(w_{max}-w_{min})\frac{\frac{1}{d_2}-\frac{1}{d_{max}}}{\frac{1}{d_{min}}-\frac{1}{d_{max}}}+w_{min}=0.38 $$
Now we can set the initial values for the reliability parameters accordingly:
$$\hat{p}^{(0)}_2= \frac{1}{2} ,\hat{q}^{(0)}_2=\frac{1-\frac{e^{w_2}-1}{e^{w_2}+1}}{2} $$
We can check that these parameters satisfy:
$$\ln\left[ \frac{\hat{p}^{(0)}_2\left(1-\hat{q}^{(0)}_2\right)}{\hat{q}^{(0)}_2\left(1-\hat{p}^{(0)}_2\right)}\right]=w_2 $$
After proceeding in the same fashion with all the voters, we get the initial parameters:
$$\left\{
    \begin{array}{lll}
        \hat{p}_1^{(0)} = 0.5 & \hat{p}_2^{(0)} = 0.5 & \hat{p}_3^{(0)} = 0.5  \\
        \hat{q}_1^{(0)} = 0.44 & \hat{q}_2^{(0)}  =0.41 & \hat{q}_3^{(0)}  =0.32 \\
    \end{array}
\right.$$
\end{example}
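The identity checked in the example holds for any weight. As a short verification (a sketch that only uses the formulas stated in the example, with $w$ standing for any of the weights $w_i$), the expression for $\hat{q}^{(0)}_i$ simplifies to
$$\hat{q}^{(0)}_i=\frac{1}{2}\left(1-\frac{e^{w}-1}{e^{w}+1}\right)=\frac{1}{e^{w}+1},$$
so that $1-\hat{q}^{(0)}_i=\frac{e^{w}}{e^{w}+1}$. Since $\hat{p}^{(0)}_i=\frac{1}{2}$ gives $\hat{p}^{(0)}_i/(1-\hat{p}^{(0)}_i)=1$, we obtain
$$\ln\left[\frac{\hat{p}^{(0)}_i\left(1-\hat{q}^{(0)}_i\right)}{\hat{q}^{(0)}_i\left(1-\hat{p}^{(0)}_i\right)}\right]=\ln\left[\frac{1-\hat{q}^{(0)}_i}{\hat{q}^{(0)}_i}\right]=\ln e^{w}=w.$$
Plugging in the weights of the example gives $1/(e^{0.25}+1)\approx 0.44$, $1/(e^{0.38}+1)\approx 0.41$ and $1/(e^{0.75}+1)\approx 0.32$, in agreement with the values of $\hat{q}^{(0)}_1$, $\hat{q}^{(0)}_2$ and $\hat{q}^{(0)}_3$ reported there.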

Since the AMLE is only guaranteed to converge to a local maximum, its result depends on the initial point. To motivate the choice of the Anna Karenina initialization, we compared its results to those of two other procedures (see Figure~\ref{init}):
\begin{compactitem}
    \item Uniform weights: Initially all the voters in the batch are given the same weight.
    \item Random weights: Initially, for each voter in the batch, $p_i$ is randomly picked from $(0.5,1)$ and $q_i$ is randomly picked from $(0,0.5)$.
\end{compactitem}
These two baseline procedures perform very similarly, and both are outperformed by the Anna Karenina initialization.
\begin{figure}
     \centering
     \begin{subfigure}[b]{0.5\textwidth}
         \centering
         \includegraphics[width=0.95\textwidth]{figures-supp/01_init.pdf}
             \subcaption{0-1 accuracy}
        \label{init_01}
     \end{subfigure}
     \begin{subfigure}[b]{0.5\textwidth}
          \centering
         \includegraphics[width=0.95\textwidth]{figures-supp/hamming_init.pdf}
        \subcaption{Hamming accuracy}
        \label{init_ham}
     \end{subfigure}
        \caption{Accuracies of different initializations}
        \label{init}
\end{figure}

\begin{comment}
\section{Time Complexity of AMLE}
We assessed the execution time of the AMLE algorithm with and without constraints (referred to as AMLE and AMLE$_f$), run on an Intel Core i7-10610U CPU @ 1.80\,GHz with 4 cores, 8 threads and 32\,GB of RAM. Results are shown in Figure \ref{complexity}. We can see that whereas the number of iterations does not seem to grow as the number of voters increases, the execution time of AMLE does, especially around $40$ voters.
\begin{figure}[H]
     \centering
     \begin{subfigure}[b]{0.45\textwidth}
         \centering
         \includegraphics[width=0.85\textwidth]{figures-supp/Time.pdf}
             \subcaption{Execution time of AMLE}
        \label{time}
     \end{subfigure}
     \begin{subfigure}[b]{0.45\textwidth}
          \centering
         \includegraphics[width=0.85\textwidth]{figures-supp/Iterations.pdf}
        \subcaption{Number of iterations of AMLE}
        \label{iterations}
     \end{subfigure}
        \caption{Time complexity of AMLE}
        \label{complexity}
\end{figure}
\end{comment}

\section{Losses}
\subsection{Hamming, Harmonic and 0-1 Subset Metrics}
In addition to the Hamming and 0-1 subset accuracies, we introduced a new metric that can be seen as intermediate between the two. The Hamming metric considers each label independently and the 0-1 subset loss considers them jointly in a strict fashion, whereas the harmonic accuracy that we introduced considers all of the instance's labels jointly, but with convex weights that depend on the number of correctly predicted ones:
$$T(S,S^*) = \sum_{k=1}^{|S\cap S^*|} \frac{1}{6-k} $$
So, out of the $5$ labels:
\begin{compactitem}
    \item if 0 labels are correct then $T = 0$.
    \item if 1 label is correct then $T = \frac{1}{5}$.
    \item if 2 labels are correct then $T = \frac{1}{5}+\frac{1}{4}$.
    \item if 3 labels are correct then $T = \frac{1}{5}+\frac{1}{4}+\frac{1}{3}$.
    \item if 4 labels are correct then $T = \frac{1}{5}+\frac{1}{4}+\frac{1}{3}+\frac{1}{2}$.
    \item if 5 labels are correct then $T = \frac{1}{5}+\frac{1}{4}+\frac{1}{3}+\frac{1}{2}+1$.
\end{compactitem}
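As a worked illustration (a sketch: the shorthand $m=|S\cap S^*|$ and the harmonic numbers $H_j=\sum_{i=1}^{j}\frac{1}{i}$, with $H_0=0$, are notation introduced here for convenience), the values above can be written in closed form and evaluated numerically:
$$T(S,S^*)=\sum_{k=1}^{m}\frac{1}{6-k}=H_5-H_{5-m},$$
$$\begin{array}{c|cccccc}
m & 0 & 1 & 2 & 3 & 4 & 5\\
\hline
T & 0 & \frac{1}{5} & \frac{9}{20} & \frac{47}{60} & \frac{77}{60} & \frac{137}{60}\\[2pt]
\approx & 0 & 0.20 & 0.45 & 0.78 & 1.28 & 2.28
\end{array}$$
The increments $\frac{1}{5}<\frac{1}{4}<\frac{1}{3}<\frac{1}{2}<1$ grow with $m$, so each additional correct label is rewarded more than the previous one. If the metric is rescaled to $[0,1]$ by dividing by its maximum $H_5=\frac{137}{60}$ (an assumption about the normalization, which is not spelled out here), four correct labels out of five yield $\frac{77}{137}\approx 0.56$.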

Defined as such, this accuracy favors estimators that correctly estimate most of the instance's labels, without being as rigid as the 0-1 subset accuracy.
 
This metric is reminiscent of the Proportional Approval Voting rule for multiwinner elections, which defines the score of a subset of candidates $W$ for a voter as $1 + \frac12 + \ldots + \frac1j$, where $j$ is the number of candidates in $W$ approved by the voter. More generally, we could consider a class of metrics defined by a vector $\vec{w}$, such that $T(S,S^*) = w_{|S \cap S^*|}$. This class generalizes the Hamming, 0-1 and Harmonic metrics and is reminiscent of the class of \emph{Thiele} rules (see for instance \cite{LacknerS20} for an extended presentation of multiwinner approval-based committee rules).
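
To make this class concrete, here is a sketch of the weight vectors $\vec{w}=(w_0,\ldots,w_5)$ that would recover the three metrics for $5$ labels, under the assumption (in line with the list above) that $m=|S\cap S^*|$ counts the correctly predicted labels and that the Hamming and 0-1 accuracies are taken per instance:
$$\begin{array}{ll}
\mbox{Hamming:} & w_m=\frac{m}{5}\\[2pt]
\mbox{Harmonic:} & w_m=\sum_{k=1}^{m}\frac{1}{6-k}\\[2pt]
\mbox{0-1 subset:} & w_m=1 \mbox{ if } m=5, \mbox{ and } w_m=0 \mbox{ otherwise}
\end{array}$$
The marginal gain $w_m-w_{m-1}$ is constant for Hamming, increasing for Harmonic, and concentrated on the last label for 0-1. This is the mirror image of Proportional Approval Voting, whose marginal gains $\frac{1}{j}$ decrease with $j$.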

\subsection{Results}
Table~\ref{entire_dataset} shows the accuracies of the considered methods when applied to the entire annotation dataset. Figure~\ref{harmonic} shows the evolution of the Harmonic accuracy as the number of randomly picked voters in each batch increases.
\begin{table}
    \centering
    \begin{tabular}{|l|c|c|c|c|}
  \hline
   &$\mbox{AMLE}_c$  & $\mbox{AMLE}_f$ & Modal & Majority \\
  \hline
  Hamming & \textbf{0.88} & 0.86 & 0.84 & 0.80 \\
  \hline
  Harmonic & \textbf{0.78} & 0.74  & 0.69 & 0.61\\
  \hline
  0/1 & \textbf{0.60} & 0.53  & 0.46 & 0.26\\
  \hline
\end{tabular}
    \caption{Hamming, Harmonic and 0/1 accuracies on the entire dataset}
    \label{entire_dataset}
\end{table}

\begin{figure}
         \centering
         \includegraphics[width=0.5\textwidth]{figures-supp/Harmonic.png}
         \caption{Normalized Harmonic accuracy}
        \label{harmonic}
\end{figure}
\bibliography{allouche_258-supp}


\end{document}
