% \documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{caption}
\usepackage{graphicx}
%\usepackage{amsmath}
\usepackage{paralist}
\usepackage{amsfonts}
\usepackage{comment}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{nicefrac}
\usepackage{dsfont}
\usepackage{subcaption}
\usepackage{xcolor}
\usepackage{algorithm}
\usepackage{algorithmic}
\urlstyle{same}
\usepackage{float}
%\usepackage{ulem}



%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\twodots}{\mathinner {\ldotp \ldotp}}

\newtheorem{theorem}{Theorem}[]
\newtheorem{example}{Example}[]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}{Definition}[]
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem*{remark}{Remark}

\def\bt{\color{red}}
\def\et{\color{black}}

\newcommand\Voters{\mathcal{N}}

\title{Multi-winner Approval Voting Goes Epistemic}

% The standard author block has changed for UAI 2022 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<tahar.allouche@dauphine.eu>?Subject=Your UAI 2022 paper}{Tahar Allouche}{}}
\author[1]{\href{mailto:<lang@lamsade.dauphine.fr>?Subject=Your UAI 2022 paper}{Jérôme Lang}{}}
\author[1]{\href{mailto:<florian.yger@lamsade.dauphine.fr>?Subject=Your UAI 2022 paper}{Florian Yger}{}}

% Add affiliations after the authors
\affil[1]{%
    LAMSADE, CNRS, PSL, Université Paris-Dauphine\\
}

  
  \begin{document}
\maketitle

\begin{abstract}
Epistemic voting interprets votes as noisy signals about a ground truth. We consider contexts where the truth consists of a set of objective winners, whose cardinality is known to lie between given lower and upper bounds. A prototypical problem for this setting is the aggregation of multi-label annotations with prior knowledge on the size of the ground truth. We posit noise models, for which we define rules that output a set of winners corresponding to local maxima of the data likelihood function. We report on experiments on multi-label annotation data that we collected.
\end{abstract}

\section{Introduction}

The epistemic view of voting assumes the existence of a ground truth which, usually,
%. This ground truth 
is either an alternative or a ranking over alternatives.  Votes reflect opinions or beliefs about this ground truth; the goal is to aggregate these votes so as to identify it. Usual methods define a noise model specifying the probability of each voting profile given the ground truth, and output the alternative that is the most likely state of the world, or the ranking that is most likely the true ranking. 
%This approach dates back from Condorcet \cite{condorcet1785}
%Epistemic voting is a branch of social choice where votes do not express preferences about which alternative should be chosen, but  the goal being to identify the alternative that is most likely 

Now, there are contexts where the ground truth consists of neither a single alternative nor a ranking, but of a {\em set of alternatives}. Typical examples are multi-label crowdsourcing (find the items in a set that satisfy some property, {\em e.g.}, the sports teams appearing in a picture)
%\cj{Peut-être pas le meilleur exemple parce qu'on ne voit pas ce que viennent y faire les contraintes de cardinalité}; 
or finding the $k$ objectively best candidates 
%(best students for a prize, 
(best papers at a conference, 
best performances in artistic sports, 
%best applicants to be hired,  
 the $k$ patients with the highest probabilities of survival if assigned a scarce medical resource).
%cj{What are the examples where it makes sense to use our iterated algorithm and those where it makes sense to use the simple, degenerated algorithm with known parameters? First of all it seems that in some of these examples well always have a single instance, such as best students, papers or patients. If we see the patient or papers problems as a collection of instances then we run into a problem: the sets of alternatives should be different from an instance to another, which we exclude. Comments?T: I agree. The AMLE works on crowdsourcing tasks where there are usually many instances (Images ,texts, sounds ...) with the same alternatives. It does not work when it is not the case, namely, when the students/patients/candidates change from one instance to the other. J: even if we consider that the priors are the same?}
%\footnote{Note that there is a second interpretation of of the ground truth in epistemic multiwinner approval voting, that would come with different solutions: on that interpretation, the cardinality of the set of alternatives to be identified bear {\em on the output} and not on the ground truth itself. This second interpretation is left for further study.}  

The alternatives that truly belong to the ground truth are called `winning' alternatives. Depending on the context, the number of winning alternatives can be fixed, unconstrained, or, more generally, constrained to lie in a given interval.
% In single-winner epistemic voting, there is one correct alternative --- the only one that is true in the real world --- and the aim is to identify it. In multiwinner voting, things become more complicated. To start with, there are two possible interpretations of the ground truth, which call for different solutions. The main difference between both interpretations is whether the constraints on the cardinality of the set of alternatives to be identified bear {\em on the ground truth itself}, or {\em on the output}. 
%Under the first interpretation, 
This constraint 
%on the number of winning alternatives 
expresses some {\em prior knowledge on the cardinality of the ground truth}. This prior knowledge is held by the central authority that aggregates the votes, and not necessarily by the voters themselves.  
Here are some examples:
\begin{compactitem}
    %\item {\em Guitar chord transcription task}: participants hear a chord and are asked to select the set of notes that constitute it.
    %The true set of notes is known to contain at least three and at most six alternatives.
    \item {\em Picture annotation via crowdsourcing}: participants are shown a picture taken from a soccer match and have to identify the team(s) appearing in it. The ground truth is known to contain one or two teams. 
    \item {\em Guitar chord transcription}: voters are base classifier algorithms \cite{aggregation2020} which, for a given chord, select the set of notes that constitute it. The true set of notes can contain %at least 
    three to 
    %at most 
    six alternatives. 
    \item {\em Jury}: participants are members of a jury which has to give an award to three papers presented at a conference: the number of objective winners is fixed to three.
 (In a variant, the number of awards would be {\em at most} three.)
 %, so that to avoid giving an award to papers that do not deserve it.)
 %in which case the number of winner is constrained to be in [0,3].
\end{compactitem}

% Under the second interpretation, this constraint bears on the {\em number of winners in the output}. That is, whatever the ground truth is, we have to output a number of alternatives in a given interval, {\em even though the number of alternative in the ground truth may lie outside this interval}; the aim is to identify an {\em admissible} set of alternatives {\em closest to the ground truth}, in some sense to be defined later. Here are three examples:
%\begin{itemize}
%    \item alternatives are students who apply to a master program. The ground truth consists of those students who have objectively a good enough level to graduate. The number of students to be accepted in the program is however constrained to be in an interval $[l, u]$: we need at least $l$ for the program to open, and  at most $u$ because of the size of classrooms.
    %The output is therefore not always the ground truth (but is strongly linked to it, and is equal to is whenever this is possible). 
%    \item alternatives are papers submitted to a conference. Again we have a minimal and maximal number of papers to be accepted. 
%    \item alternatives are papers to be given an award. The conference chair give exactly three awards. In that case the ground truth consists of the papers that truly deserve the award, and the output consists of the best three papers. A variant of the problem would allow the conference chairs to give {\em at most} three awards, so that they avoid giving an award to papers that do not deserve it.
%     \item Alternatives are Covid-19 patients in urgent need of intensive care; there is a limited number of intensive care units.
%\end{itemize}

%Under this interpretation,  committee size plays the role of an external size constraint that specifies the minimum and maximum number of allowed winning alternatives. It remains to define precisely what we mean by being closest to the ground truth; we will propose two different solution concepts.

%In this case, we suppose that alternatives are either eligible or not, and that a priori there can be any number of eligible candidates. Given the utlity/cost of selecting an eligible/uneligible candidate, our goal would be to maximize the overall utility given the approval ballots subject to the size constraints. We propose and compare two different solution concepts to solve this problem.  %MAYBE GIVE EXAMPLE OF ADMISSION TO MASTER OR AGGRéGATION ENS.

% Although this distinction between two interpretations would already make sense in single-winner epistemic voting, we will show that in this special case they lead to the same solution: they are technically identical. This will however not be the case in the general case; therefore we shall develop solutions for each of these two interpretations in two separate sections of the paper.

We assume that voters provide a simple form of information: {\em approval ballots}, indicating which alternatives they consider plausible winners. These approval ballots are not subject to any cardinality constraint: {\em a voter may approve any number of alternatives, even if this number lies outside the interval known to contain the size of the ground truth}. This is typically the case for totally uninformed voters, who may plausibly approve all alternatives. 
%\cj{JL: Since we say ``{\em approval ballots}, indicating which alternatives they consider plausible winners'' it could make sense to require that the voters approve at least $l$ alternatives. But we may say that the framework works even if we allow voters to break this rule.}

%In addition to the aforementioned distinction between two interpretations, which as we see lead to different rules for computing the winning committee, there are two sources of possible complications. 

%There are two sources of possible complications. The first one is whether the size of the ground truth is fixed, or is known to lie in an interval; at the other extremity of the spectrum, the constraints can be maximally loose: the lower and upper bounds are simply 0 and the total number of alternatives (such an instance is said to be {\em unconstrained}).

%{\em and} the output has to lie in a (possibly different) interval. 

%An important special case where the two interpretations [may?] meet is when the size of the committee is {\em fixed} to some integer $k$. The example where conference organizers have to give exactly three awards can be interpreted both ways; under the first interpretation, each paper is objectively excellent or not, but the number of objectively excellent papers won't have any influence on the outcome, since we must output three papers; under the second interpretation, the ground truth consists of the set of the objectively best three papers. 

% An the other extremity of the spectrum, the constraints can be maximally loose: the lower and upper bounds are simply 0 and the total number of alternatives. Such an instance is said to be {\em unconstrained}. \cj{[Are there differences between the two interpretations in that case?]}

%A source of complication is related to whether 
Sometimes, the aggregating mechanism
%(a.k.a. the {\em central authority}) 
has some prior information about the likelihood of alternatives and the reliability of voters. We first study a simple case where this information is specified in the input: in the noise model, each voter has a probability $p_i$ (resp.\ $q_i$) of 
%casting a true positive, that is, to 
approving a winning (resp. non-winning) alternative,
%and a probability $q_i$ of approving a non-winning positive. 
and each alternative 
%$a_j$ 
has a prior probability 
%$t_j$ 
of being a winner.
%These probabilities do not necessarily sum up to 1, which allows to accommodate voters who have a bias toward approving or disapproving. Reliable voters have a high $p_i$ and a low $q_i$, but some voters may be more reliable when they approve, and others when they disapprove. Once a pair $(p_i, q_i)$ associated with a voter is fixed, the voters' choices about two alternatives are independent. Voters' ballots are independent given the ground truth. The expression of the prior knowledge of each alternative $a$ will depend on the application. 
This departs from classical voting, where voters are usually treated equally ({\em anonymity}), and similarly for alternatives ({\em neutrality}). 
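As an illustration of how these parameters enter the decision, consider a single instance in the unconstrained case ($l=0$, $u=m$, in the notation of Section~\ref{sec: Prior}). Write $A_i$ for the ballot of voter $i$ and $t_a$ for the prior probability that alternative $a$ is winning (notation of ours), and assume that priors are independent across alternatives and that each voter approves each alternative independently given the ground truth. Under these assumptions the posterior factorizes over alternatives, so the most likely ground truth contains $a$ when its posterior log-odds are positive (ties can be broken either way):
\[
\log\frac{t_a}{1-t_a} \;+ \sum_{i \in \Voters :\, a \in A_i} \log\frac{p_i}{q_i} \;+ \sum_{i \in \Voters :\, a \notin A_i} \log\frac{1-p_i}{1-q_i} \;>\; 0 .
\]
An approval by voter $i$ thus counts for $\log(p_i/q_i)$ and a non-approval for $\log\bigl((1-p_i)/(1-q_i)\bigr)$; with made-up values $p_i = 0.8$ and $q_i = 0.2$, these weights are $+\log 4$ and $-\log 4$. This threshold rule is specific to the unconstrained case: once $[l,u]$ restricts the admissible sets, alternatives can no longer be decided one at a time.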

%In the first setting, we assume they are part of the problem specification. This is sometimes justified because the voters are known to have a given type, associated with default values of the parameters $p_i$ and $q_i$; or sometimes voters come with some past record. Also, we sometimes have access to part of the ground truth (for example, if the voters are predicting outcomes of sport events through a season, at a given time some of these outcomes are known and the others are not), which allows us to estimate or revise the parameters.
%: there are some alternatives for which we can observe whether they are winners or not
%Having access to part of the ground truth allows us to estimate the parameters.

This simple case serves as a building block for the more complex case where these parameters are not known beforehand but {\em estimated from the votes}: votes allow us to infer information about plausibly winning alternatives, from which we infer information about voter reliabilities, which in turn leads us to revise information about winning alternatives, 
%which in turn leads to revise information about voters, 
and so on until the process converges to a local optimum of the likelihood function. Here we move back to an anonymous and neutral setting, since all alternatives (resp.\ voters) are treated equally before the votes are known.
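The alternating process just described can be sketched as a generic coordinate ascent, using the notation of Section~\ref{sec: Prior}; this sketch is only indicative and the symbols $\theta$ and $S_z$ are ours, the actual procedure being given in Section~\ref{sec:estimating}. Let $\theta$ collect all parameters (the pairs $(p_i,q_i)$ and the prior probabilities of alternatives), let $S_z$ be the current estimate of the ground truth of instance $z$, and start from some $\theta^{(0)}$. For $k = 0, 1, \dots$, update
\begin{align*}
S_z^{(k+1)} &\in \argmax_{S \in \mathcal{S}_{l,u}} P(A^z \mid S, \theta^{(k)})\, P(S \mid \theta^{(k)}) \quad \text{for every instance } z,\\
\theta^{(k+1)} &\in \argmax_{\theta} \prod_{z=1}^{L} P(A^z \mid S_z^{(k+1)}, \theta)\, P(S_z^{(k+1)} \mid \theta).
\end{align*}
Neither step can decrease the joint objective $\prod_{z} P(A^z \mid S_z, \theta)\, P(S_z \mid \theta)$, so it is non-decreasing along the iterations; as with any coordinate ascent, the process may stop at a local rather than a global optimum. If each voter approves each alternative independently given the ground truth, and the prior does not involve the pairs $(p_i,q_i)$, the update of voter $i$'s parameters reduces to empirical approval rates:
\[
p_i^{(k+1)} = \frac{\sum_{z} |A_i^z \cap S_z^{(k+1)}|}{\sum_{z} |S_z^{(k+1)}|},
\qquad
q_i^{(k+1)} = \frac{\sum_{z} |A_i^z \setminus S_z^{(k+1)}|}{\sum_{z} \bigl(m - |S_z^{(k+1)}|\bigr)},
\]
provided both denominators are nonzero.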
%technically, we have to solve a fixed point equati

%The specificity of our work is threefold: (a) the ground truth consists of a set of alternatives; (b) the input consists of approval votes; (c) the competence of the various voters is not known {\em a priori} but learnt from the input. 

%The outline of the paper is as follows.
After discussing related work (Section \ref{sec:related}), we introduce the model (Section \ref{sec: Prior}) 
%bears on the first interpretation (constraint as prior belief). 
%define a way to model the size constraints as prior knowledge for 
and give an estimation algorithm (Section \ref{sec:estimating}), first in the case where the parameters are known, and then in the case where they are estimated from the votes.  
In Section \ref{sec: experiments}, we present a data-gathering task and analyze the results of the experiments.
%Then in Section \ref{sec: Constraint}  we move to the second interpretation, for which we give {\em two} solution concepts, along with experiments on synthetic data. 
Section \ref{conclusion} concludes.

%A first idea would consists in simply decomposing the problem into a series of single-winner epistemic voting problems on binary domains: each problem focuses on a single alternative and the question is to know whether it is among the true winners; this would be easy, since the epistemic view of single-winner social choice for binary domains is well understood. However, such a decomposition would come with an important loss of information (and hence to taking suboptimal decisions) because multi-winner profiles allow us to derive useful information about the reliability of voters.
%, which should be exploited for determining the outcome.


%6. What are the limitations of our approach?
%We do not model dependence between voters, nor dependencies between alternatives in the ground truth beyond cardinality constraints. As a consequence, a few problems that we cannot handle (yet) are
%- finding suitable clusterings of voters and alternatives such that a given cluster of voters tends to have a similar behaviour about a given cluster of alternatives, which we can then  use to help predicting the truth.
%- handling alternatives that are explicitly dependent: for instance, if the ground truth to be predicted or discovered is about finding a date where an event took place, it may be reasonable to assume the prior probability over of alternatives to be single-peaked.

%7. Is it the first work on multiwinner epistemic voting? If not, what is the novelty in our work?
%This is not exactly the first work: there are a few recent papers, especially Caragiannis et al., 2020, Caragiannis and Micha 2017, Procaccia et al. 2012. They are different from our work because [TO BE COMPLETED].

%8. How is work on collective annotation or multi-label learning related to our work?
%[TO BE COMPLETED] 

\section{Related Work}\label{sec:related}

\paragraph{Epistemic social choice}
Epistemic social choice aims at recovering an objective {\em ground truth} from votes seen as noisy reports about it, typically by maximum likelihood estimation.
It dates back to 
%and has lead to a lot of developments in the last 30 years. 
Condorcet's {\em jury theorem} \citep{condorcet1785}:  $n$ independent, equally reliable voters vote on two alternatives that are {\em a priori} equally likely; if every vote is correct with probability $p>\frac{1}{2}$, then majority outputs the correct alternative with a probability increasing with $n$ and 
tending to 1 when $n$ grows to infinity. 
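Concretely, for an odd number $n$ of voters (which avoids ties), the probability that majority outputs the correct alternative is the probability that at least $(n+1)/2$ of the $n$ independent votes are correct:
\[
P_n(p) = \sum_{k=(n+1)/2}^{n} \binom{n}{k}\, p^{k} (1-p)^{n-k}.
\]
For example, with $n=3$ and $p=0.6$, $P_3(0.6) = 3 \cdot 0.6^2 \cdot 0.4 + 0.6^3 = 0.432 + 0.216 = 0.648$, which already exceeds the individual reliability $p$.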

There are several extensions of Condorcet's jury theorem: \cite{condorcet1988} for an arbitrary number of alternatives; \cite{ShapleyGrofman84} and \cite{maximum2004} for voters with various competence degrees; \cite{Ben-YasharN97} and \cite{optimal2001} for nonuniform priors over alternatives;
\cite{voting2011} and \cite{epistemic2017} for dependent voters. \cite{common2005} and \cite{ConitzerRX09} characterize various voting rules as maximum likelihood estimators, each associated with a particular noise model. 
See \cite{collective2017} and \cite{ElkindSlinko16}
%and  \cite{Premises2008}.
for surveys on recent developments.
%proofs and discussion.

\paragraph{Multi-winner voting}

Multi-winner voting rules map voting profiles into sets of alternatives. A voting profile can be either a collection of subsets of alternatives (approval ballots) or a collection of rankings over alternatives (ordinal ballots). The output is often constrained to have a fixed cardinality, but not always: see \cite{approval2016,FaliszewskiST20}. There have been many recent developments in the field: see the surveys \cite{FaliszewskiSST17} and \cite{LacknerS20}. They, however, deal only with the classical (non-epistemic) view of social choice, where votes express preferences. %Approval-based multi-winner voting with an unconstrained committee-size has also been studied and suitable rules have been proposed \cite{approval2016}.
%\cj{JL: I removed ``Approval-based multi-winner voting with an unconstrained committee-size has also been studied and suitable rules have been proposed \cite{approval2016}.''}



\paragraph{Multi-winner epistemic voting}

Multi-winner epistemic voting has received little attention so far. \cite{maximum2012} assume a ground truth ranking over alternatives, and identify rules that output the $k$ alternatives maximizing the likelihood of containing the best alternative, or of coinciding with the top-$k$ alternatives. 
%defines 3 objectives for the selection of a subset of alternatives: the most likely $k$ alternatives containing the single winner, the most likely $k$ best alternatives, or the most likely order over the top $k$ alternatives. The authors prove that the three problems associated to each objective is NP-hard and that the solution can be approximated by majority-graph based rules.
The last section of \citep{maximum2011} defines a noise model where the ground truth is a set of $k$ alternatives (and the reported votes are partial orders).
%contains a complexity result for the problem of recovering the k-best alternatives given noisy pairwise comparisons.  
The only work we know where the noise models produce random {\em approval votes} from a ground truth consisting of {\em a set of alternatives} is \citep{Evaluating2020}.
%\cj{JL: doesn't this statement clash with the previous sentence? T: Pas vraiment: C'est le seul papier où à la fois le ground truth, les votes et ce qu'on cherche sont des subsets. Pour procaccia, le ground truth est un ordre. Pour Xia les votes sont des ordres. Ok. Peut-petre faudrait-il être plus explicite, je vais essayer.} 
They define a family of distance-based noise models, whose prototypical instance generates approval votes selecting an alternative in the ground truth (resp.\ not in the ground truth) with probability $p$ (resp.\ $1-p$); as we shall see, this is a special case of our noise model.
%, and an alternative not in the ground truth with probability $1-p$;
%Recently, \cite{Evaluating2020} compare multi-winner approval based rules w.r.t the probability of recovering the true $k$-sized winning committee under distance-based noise models. 
%\cite{maximum2012} defines 3 objectives for the selection of a subset of alternatives: the most likely $k$ alternatives containing the single winner, the most likely $k$ best alternatives, or the most likely order over the top $k$ alternatives. The authors prove that the three problems associated to each objective is NP-hard and that the solution can be approximated by majority-graph based rules.
%Many of these theoretical models and results have been challenged in \cite{better2013} when the authors conducted experiments on both synthetic and real data and noticed a significant gap between the performances of common aggregation rules in the two cases for the recovery of a true ranking or a true single winner. 
Generalizing multiwinner voting, \cite{XiaCL10} study epistemic voting on {\em combinatorial} (or {\em multi-attribute}) domains.

\paragraph{Epistemic approval voting}

Epistemic voting with approval ballots has scarcely been considered. \cite{isapproval2015}  assume that the ground truth is a {\em ranking} over alternatives, and identify noise models for which approval voting is optimal given $k$-approval votes, in the sense that the objectively best alternative gets elected. \cite{AlloucheLY22} continue this line of research but assume instead that the ground truth consists of a single alternative. They define various noise models and show that those that work best on real datasets are those that give a higher confidence to voters who approve few alternatives.
\cite{learning2017} study the number of samples needed to recover the ground truth ranking over alternatives with high enough probability from approval ballots; they show that it is exponential if ballots are required to approve $k$ candidates, but polynomial if the size of the ballots is randomized. 


\paragraph{Crowdsourcing and social choice}
%An axiomatic study of the task of collective annotation has been presented in \cite{Axiomatic2014}, along with an empirical analysis of an aggregation method \cite{Empirical2014}. 

A social choice-theoretic study of collective annotation tasks was done by \cite{Axiomatic2014} and \cite{Empirical2014}. Mechanisms for incentive-compatible elicitation with approval ballots in crowdsourcing applications have been designed by \cite{ShahZ20}. 
\cite{truth2019} define a method that aggregates votes weighted by their average proximity to the other votes, this proximity serving as an estimate of the voters' reliability. 

 \cite{solution2017} introduce the {\em Bayesian truth serum} approach: eliciting, in addition to the voters' answers, their prediction of the distribution of answers, gives much better results. This approach was generalized by \cite{HosseiniM0S21} to contexts where  the ground truth is a ranking.

% approfondir les explications des papiers de l'equipe de Hulle 
\begin{comment}
Beyond social choice, collective annotation has also been studied in the machine learning community.
%This dates back to 
\cite{maximum1979} used an expectation-maximization (EM) approach 
%[EXPLAIN] 
for retrieving true binary labels. This approach has been improved along with other methods namely in \cite{learning2010,multidimensional2010,Minimax2017,domain2019}. 
\end{comment}

 Beyond social choice, collective multi-label annotation was first addressed by \cite{reliable2010}, who study the agreement between experts and non-experts in some multi-labeling tasks, and by \cite{scalable2014}, who solve the multi-label estimation problem with a scalable aggregation method.
 
%\smallskip
%Now, the specificity of our work is threefold: (a) the ground truth consists of a set of alternatives; (b) the input consists of approval votes; (c) the competence of the various voters is not known a priori but learnt from the input. 
%\bj While we are not aware of any work that has these three properties, we review below those that go more or less in this direction.\ej}
%Few works deal with (a), (b) or (c); we review them now in more detail.}

%+\paragraph{Adaptive voter confidence}
%\cite{truth2019} define a method to aggregate votes weighted according to their average proximity to the other votes as an estimation of their reliability. 
%\cite{AlloucheLY22} (already cited above) also estimate voters' competences from the votes.
%In a more sophisticated context, 
%the survey chapter 


%Moreover, the optimality of k-approval voting has been studied from a likelihood and sample complexity view points. \cite{isapproval2015} prove that for some noise models and real world cases approval voting can be suboptimal, in addition \cite{learning2017} prove that the sample complexity needed to guarantee the recovery of the ground truth by k-approval votes grows exponentially with the number of alternatives and that it can be significantly reduced by randomizing the size of the ballots.



%independently make the right comparison (right w.r.t to a supposedly existent ground truth ranking) with probability $p>\frac{1}{2}$, then the majority decision will more likely coincide with the true ranking as the number of voters grows to infinity (see \cite{majority2020} and \cite{Premises2008} for proofs and discussion).
%Since the foundation of epistemic social choice dating back to the 18th century by Condorcet and its later generalization by Young \cite{condorcet1988,optimal1986,optimal1995}, the problem of truth recovery from noisy judgments has known great developments. 
%The works of Young have generalized this aggregation framework to more than $2$ alternatives and have established a connection between the Maximum Likelihood Estimator and both Kemeny \cite{kemeny1959} and Borda rule \cite{borda1781}.

%Afterwards, many contributions studied several aspects of the problem. For instance, \cite{optimal1997} introduced the notion of payoff depending on the optimality of the aggregation of judgments on some binary issue. 

%Besides, \cite{maximum2004} relaxes the competence assumption in the Condorcet noise model, and supposes that the probability of making a true pairwise comparison depends on the distance between the alternatives in the true ranking. 

%The converse problem has been tackled in \cite{common2005}, where the authors studied the possibility of expressing common single winner voting rules as Maximum Likelihood Estimator given some correspondent noise model. Note that some of their results can be easily generalized to multi-winner elections. 

%Other lines of research have also been explored. In \cite{voting2011,epistemic2017} Pivato defines the concept of "abstract voting rules" and study their rationalizability along with the study of conditions for the Condorcet Jury Theorem to hold for correlated votes. 



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%% NEW PLAN %%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{The Model}\label{sec: Prior}
%\cj{The title of this section should be changed}
%\subsection{Framework, Noise Model and Prior Knowledge}

%As in a typical annotation crowdsourcing dataset,  consider a set of 

%We present here our framework (noise model and prior knowledge). 

Let $\Voters=\{1,\dots,n\}$ be a set of voters, and $\mathcal{A}=\{a_1,\dots,a_m\}$ a set of alternatives ({\em e.g.}, objects in images, notes in chords, papers, or patients).
%sentiments in texts..). 
Consider a set of $L$ {\em instances}: an instance $z \in \{1,\dots,L\}$ consists of an approval profile $A^z=(A_1^z,\dots,A_n^z)$, where $A_i^z \subseteq \mathcal{A}$ is the approval ballot of voter $i \in \Voters$. For example, in a crowdsourcing context, a task usually contains multiple questions, and an instance comprises the voters' answers to one of these questions.

For each instance $z\in\{1,\dots,L\}$, there exists an \emph{unknown} ground truth $S^*_z \in \mathcal{S}=2^{\mathcal{A}}$, which is the set of objectively correct alternatives in instance $z$. %Although these ground truth sets are a priori unknown, we will suppose in this section that 
The central authority (but not necessarily the voters) knows {\em a priori} that the number of alternatives in each ground truth lies in the interval $[l,u]$:
%\cj{What is this notation $[l,u]$?} 
$S^*_z \in \mathcal{S}_{l,u}= \{S \in \mathcal{S}, l\leq |S| \leq u\}$, for given bounds $0\leq l \leq u\leq m $.
%\cj{Why should $[l,u]$ be the same for all instances? T: For the guitar chords and image annotations it is the same. We can omit this hypothesis, but I don't think that it would be interesting, since l and u are inputs. If you have 15 images to annotate you usually do not have specific prior on each image.}

Our goal is to recover the ground truth of each of these instances using the votes and the prior knowledge on the number of winning alternatives.
%(constrained to be between $l$ and $u$). For this purpose, 
We 
%propose a specific 
define a noise model consisting of two parametric distributions, namely, a conditional distribution of the approval ballots given the ground truth, and a prior distribution on the ground truth. Here we depart from classical noise models in epistemic social choice, as we suppose that the parameters of these distributions may be unknown and thus need to be estimated.

%Formally, we suppose that 
For each voter $i\in \Voters$, we suppose that there exist two unknown parameters $(p_i,q_i)\in(0,1)^2$ such that the approval ballot $A_i^z$ on an instance $z \in \{1,\dots,L\}$ is drawn according to the following distribution: for each $a \in {\cal A}$,
$$
P(a \in A_i^z|S^*_z=S) = \left\{
    \begin{array}{ll}
        p_i & \mbox{if } a \in S \\
        q_i & \mbox{if } a \notin S
    \end{array}
\right. 
$$
 where 
 %\cj{Problem with notation: here the ground truth is denoted by $S^*$ while it used to be $S^*_z$. More generally we should check that all notation is consistent throughout the paper and check that everything (notation and terminology) has been well defined. T: Here we omit the z because there is only one instance, but it needs to be $S^*$ instead of $S^*$ anyways. J: I would prefer $S^*$.}
$p_i$ (resp. $q_i$) is the (unknown) probability that voter $i$ approves a correct (resp. incorrect) alternative.
%\cj{Should we say that a correct alternative is one that belongs to the ground truth? Maybe we said it already, and maybe it's obvious enough.T: I think it is obvious}  
Then we make the following assumptions:%\cj{We have to make clear that for pedagogical reasons we start with the simple model where the parameters $p_i$, $q_i$ and $t_j$ are given, and that the more interesting case where they are estimated from the data will be discussed further. Also, say how the parameters come from when they are given, using examples.}
\begin{compactitem}
    \item[(1)]   A voter's  approvals of alternatives  are  mutually  independent given 
    the ground truth and parameters $(p_i,q_i)_{i \in \Voters}$.%\cj{Should be $p_i, q_i$?}%\cj{I changed the order of (1) and (2)}
    %\cj{1 and 2 could be put together into: all events ``a given voter  approves a given alternative'' are independent}
   \item[(2)]   Voters' ballots are  mutually  independent given the ground truth.
   %$ the ground truth. %\cj{Why don't we say 'given parameters' while we say it in 2 and 3?}
    \item[(3)] Instances are independent given the parameters $(p_i,q_i)_{i \in \Voters}$ and the ground truths.
    %\cj{Do we mean $(p_i,q_i)_{i \in N}$?} and $S^*$.
\end{compactitem}
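To make the generative side of the noise model concrete, here is a minimal Python sketch that samples approval ballots under assumptions (1), (2) and (3), taking the ground truths as given. The function names \texttt{sample\_instance} and \texttt{sample\_dataset} are hypothetical, and alternatives can be any hashable labels.

```python
import random


def sample_instance(ground_truth, alternatives, p, q, rng):
    """Draw one approval profile (A_1, ..., A_n) given the ground truth S*.

    Voter i approves each alternative independently (assumption (1)), with
    probability p[i] if it belongs to S* and q[i] otherwise; ballots of
    different voters are drawn independently (assumption (2)).
    """
    profile = []
    for p_i, q_i in zip(p, q):
        ballot = {a for a in alternatives
                  if rng.random() < (p_i if a in ground_truth else q_i)}
        profile.append(ballot)
    return profile


def sample_dataset(ground_truths, alternatives, p, q, seed=0):
    """Sample one profile per instance, independently given S*_z (assumption (3))."""
    rng = random.Random(seed)
    return [sample_instance(S, alternatives, p, q, rng) for S in ground_truths]


# Example: 5 voters with (p_i, q_i) = (0.7, 0.4) and L = 2 instances.
data = sample_dataset([{"a", "b"}, {"c"}], ["a", "b", "c", "d"],
                      p=[0.7] * 5, q=[0.4] * 5, seed=42)
```

As a sanity check, setting $p_i=1$ and $q_i=0$ makes every ballot equal to the ground truth. These values lie outside the model's open interval $(0,1)$ and serve only for testing.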

To model the prior probability that a set $S$ is the ground truth $S^*$, we define
parameters $t_j=P(a_j \in S^*)$: $t_j$ can be understood as the prior probability that $a_j$ belongs to the ground truth $S^*$ before the cardinality constraints are taken into account. Together with an independence assumption on the events $\{a_j \in S^*\}$, these parameters give 
$P(S=S^*)=\prod\limits_{a_j \in S} t_j \prod\limits_{a_j \notin S}(1- t_j)$. Note that the choice of the parameters $t_j$ is not crucial when running the algorithm for estimating the ground truth: we will see in Subsection \ref{subsec:amle} that it converges whatever their initial values.  
The distribution $\tilde{P}(S)$ of the ground truth conditional on the prior knowledge about its size can 
be seen as a projection onto the constraints followed by a normalization: 
%\ct{La projection se fait avant et non pas après la normalisation} normalization: 
\begin{small}
$$\tilde{P}(S)=P\big(S^*=S\,|\,l\leq|S^*|\leq u\big)=\frac{P\big(S^*=S,\ l\leq |S^*|\leq u\big)}{P\big(l\leq |S^*|\leq u\big)} $$
\end{small}
%and then, 
It follows that:
$$\tilde{P}(S) = \left\{
    \begin{array}{ll}
        \frac{1}{\beta(l,u,t)}\prod\limits_{a_j\in S}t_j \prod\limits_{a_j \notin S} (1-t_j) & \mbox{if } S \in \mathcal{S}_{l,u} \\
        0 & \mbox{if } S \notin \mathcal{S}_{l,u}  
    \end{array}
\right. $$
where
$\beta(l,u,t)=\sum\limits_{S\in \mathcal{S}_{l,u}} \prod\limits_{a_j\in S}t_j \prod\limits_{a_j \notin S} (1-t_j)$.
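Computing $\beta(l,u,t)$ by enumerating $\mathcal{S}_{l,u}$ is exponential in $m$. Since $\beta(l,u,t)=P(l\leq|S^*|\leq u)$ and $|S^*|$ is a sum of independent Bernoulli$(t_j)$ variables, a standard dynamic program over the size distribution computes it in $O(m^2)$ time. The Python sketch below is our own illustration, not necessarily how the authors compute $\beta$. Function names are hypothetical and alternatives are indexed $0,\dots,m-1$.

```python
from itertools import combinations
from math import prod


def size_distribution(t):
    """dist[k] = P(|S| = k) when each a_j is in S independently w.p. t[j]."""
    dist = [1.0]
    for t_j in t:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - t_j)   # a_j not in S
            new[k + 1] += mass * t_j     # a_j in S
        dist = new
    return dist


def beta(l, u, t):
    """beta(l, u, t) = P(l <= |S| <= u), computed in O(m^2) time."""
    lo, hi = max(l, 0), min(u, len(t))
    if lo > hi:
        return 0.0
    return sum(size_distribution(t)[lo:hi + 1])


def unconstrained_prior(S, t):
    """P(S = S*) = prod_{j in S} t_j * prod_{j not in S} (1 - t_j)."""
    return prod(t[j] if j in S else 1 - t[j] for j in range(len(t)))


def constrained_prior(S, t, l, u):
    """tilde P(S): zero outside S_{l,u}, renormalized by beta inside."""
    if not l <= len(S) <= u:
        return 0.0
    return unconstrained_prior(S, t) / beta(l, u, t)


def beta_bruteforce(l, u, t):
    """Direct sum over S_{l,u} from the definition (exponential; for checks)."""
    m = len(t)
    return sum(unconstrained_prior(set(S), t)
               for k in range(max(l, 0), min(u, m) + 1)
               for S in combinations(range(m), k))
```

With $(l,u)=(0,m)$ the sketch returns $\beta=1$, which matches the unconstrained case discussed below.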

%To model the prior probability of any set $S$ being the ground truth $S^*$, we suppose that, were it not for the size constraints, the events $\{a_j \in S^*\}$ would have been independent and we define $t_j=P(a_j \in S^*)$, which gives us the prior distribution before the constraints are taken into account:
%$P(S=S^*)=\prod\limits_{a_j \in S} t_j \prod\limits_{a_j \notin S}1- t_j$. In order to incorporate the prior knowledge on the size into our model we will rather consider the conditional distribution:
%\begin{small}
%$$\tilde{P}(S)=P(S^*=S||S^*|\in [l,u])=\frac{P(S^*=S \cap |S^*|\in [l,u])}{P(|S^*|\in [l,u])} $$
%\end{small}
%So all in all:
%$$\tilde{P}(S) = \left\{
%    \begin{array}{ll}
%        \frac{1}{\beta(l,u,t)}\prod\limits_{a_j\in S}t_j \prod\limits_{a_j \notin S} (1-t_j) & \mbox{if } S \in \mathcal{S}_{l,u} \\
%        0 & \mbox{if } S \notin \mathcal{S}_{l,u}  
%    \end{array}
%\right. $$
%where
%$\beta(l,u,t)=\sum\limits_{S\in \mathcal{S}_{l,u}} \prod\limits_{a_j\in S}t_j \prod\limits_{a_j \notin S} (1-t_j)$.
The ground truths associated with different instances are assumed to be mutually independent given the parameters.
%\cj{Ce n'était pas déjà l'hypothèse (3)? T: Ce n'est pas la même chose. Toutes les hypothèse précédentes portaient sur les votes. Cette hypothèse porte sur les ground truths.}

Two particular cases are worth discussing. First, when $(l,u)=(0,m)$, the problem is {\em unconstrained} and we have $\beta(0,m,t)=P(|S^*| \in [0,m])=1$, so $\tilde{P}(S)=P(S=S^*)$.
In this case the problem degenerates into a series of independent binary label-wise estimations (see Subsection \ref{subsec:gt-vp}). 
  
Second, in the single-winner case $(l,u)=(1,1)$, we have $\tilde{P}(\{a_j\})
%=P(\{a_j\} = S^*| |S^*|=1)
=\frac{t_j \prod_{h\neq j}(1-t_h)}{\beta(1,1,t)}$; therefore, for any approval profile $A$:
\begin{small}
\begin{align*}
P(S^* = \{a_j\}|A) & \propto P(A|S^* = \{a_j\})\, \tilde{P}(\{a_j\}) \\
             &= P(A|S^* = \{a_j\}) \times \frac{t_j \prod\limits_{h\neq j}(1-t_h)}{\beta}\\
             &= P(A|S^* = \{a_j\}) \times \frac{t_j}{1-t_j}\,\frac{\prod\limits_{1\leq h \leq m}(1-t_h)}{\beta}\\
             &\propto P(A|S^* = \{a_j\}) \times \frac{t_j}{1-t_j} 
\end{align*}
\end{small}
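We recover the same estimation problem as in \cite{optimal2001}, where one directly introduces $\alpha_j=P(S^* = \{a_j\})$ with $\sum_j \alpha_j =1$ and obtains $P(S^* = \{a_j\}|A)\propto \alpha_j P(A|S^* = \{a_j\})$. It suffices to take $\alpha_j=\frac{t_j/(1-t_j)}{\sum_{h=1}^m t_h/(1-t_h)}$. Conversely, any such $(\alpha_j)_j$ with positive entries is obtained with $t_j=\frac{\alpha_j}{1+\alpha_j}$.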
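As a quick numerical check of the derivation above, the following Python sketch computes $\tilde{P}(\{a_j\})$ for $(l,u)=(1,1)$ directly from the definition. It then compares the result with the normalized odds $\frac{t_j/(1-t_j)}{\sum_h t_h/(1-t_h)}$. Helper names are hypothetical and alternatives are indexed from $0$.

```python
from math import prod


def single_winner_prior(t):
    """tilde P({a_j}) for (l, u) = (1, 1), straight from the definition."""
    m = len(t)
    weights = [t[j] * prod(1 - t[h] for h in range(m) if h != j)
               for j in range(m)]
    beta_11 = sum(weights)  # beta(1, 1, t): exactly one alternative in S*
    return [w / beta_11 for w in weights]


def normalized_odds(t):
    """Weights proportional to t_j / (1 - t_j), rescaled to sum to one."""
    odds = [t_j / (1 - t_j) for t_j in t]
    total = sum(odds)
    return [o / total for o in odds]


t = [0.2, 0.5, 0.7, 0.9]
prior = single_winner_prior(t)
odds = normalized_odds(t)  # coincides with prior up to rounding
```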


\section{Estimating the Ground Truth}
\label{sec:estimating}

Our aim is to jointly estimate the ground truths and the parameters by maximizing the total likelihood of the instances:
\begin{small}
\begin{align*}
 \mathcal{L}(A,S,p,q,t) &=\prod_{z=1}^L \tilde{P}(S_z) \prod_{i=1}^n P(A_i^z|S_z)
 \end{align*}
\end{small} 
where:
\begin{small}
$$ P(A_i^z|S_z)=p_i^{|A_i^z\cap S_z|} q_i^{|A_i^z\cap \overline{S_z}|}(1-p_i)^{|\overline{A_i^z}\cap S_z|} (1-q_i)^{|\overline{A_i^z}\cap \overline{S_z}|} $$
\end{small}
To this end, we introduce an iterative algorithm. Its two main steps are presented in the next two subsections, before the full algorithm is formally defined and its convergence is shown. These two steps are:
\begin{compactitem}
    \item Estimating the ground truths given the parameters.
    \item Estimating the parameters given the ground truths.
\end{compactitem}
Simply put, the algorithm consists of iterating these two steps until it converges to a fixed point. 
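For reference, here is a minimal Python sketch of the log of the total likelihood $\mathcal{L}(A,S,p,q,t)$. It evaluates $\log P(A_i^z|S_z)$ alternative by alternative, which is equivalent to the four exponents in the formula above. The constrained prior $\log\tilde{P}(S_z)$ is passed in as a function, through a hypothetical argument \texttt{log\_prior}, so that the sketch does not commit to a particular way of computing $\beta$.

```python
from math import log


def log_p_ballot(A_i, S, alternatives, p_i, q_i):
    """log P(A_i | S): one Bernoulli term per alternative."""
    total = 0.0
    for a in alternatives:
        rate = p_i if a in S else q_i   # probability that voter i approves a
        total += log(rate) if a in A_i else log(1 - rate)
    return total


def log_likelihood(profiles, ground_truths, alternatives, p, q, log_prior):
    """log L = sum_z [log tilde P(S_z) + sum_i log P(A_i^z | S_z)].

    profiles[z][i] is voter i's ballot in instance z.
    """
    total = 0.0
    for profile, S in zip(profiles, ground_truths):
        total += log_prior(S)
        total += sum(log_p_ballot(A_i, S, alternatives, p_i, q_i)
                     for A_i, p_i, q_i in zip(profile, p, q))
    return total
```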

\subsection{Estimating the Ground Truth Given the Votes and the Parameters}\label{subsec:gt-vp}

Since instances are independent given the parameters, we focus here on one instance with ground truth $S^*$ and 
%approval 
profile $A=(A_1,\dots,A_n)$. Before diving into 
%this 
maximum likelihood estimation (MLE), 
%problem, we need 
we introduce some notation. In this subsection, we assume that the parameters $(p_i,q_i)_{i \in \Voters}$ and $(t_j)_{1\leq j\leq m}$ are known. In the algorithm, they will be replaced by their current estimates at each iteration.
The input and output are thus as follows:
%we have the following 
%the results of this section 
\begin{compactitem}
    \item Input: approval profile $A$;
    %$(A_1,\dots,A_n)$ and 
    parameters $(p_i,q_i)_{i \in \Voters}$ and $(t_j)_{1\leq j\leq m}$.
    \item Output: MLE of the ground truth $S^*$.
\end{compactitem}

\begin{definition}[weighted approval score]\label{weighted approval score}
Given an approval profile $(A_1,\dots,A_n)$, noise parameters $(p_i,q_i)_{1\leq i \leq n}$ 
%on $m$ alternatives $\mathcal{A}=\{a_1,\dots,a_m\}$ with 
and prior parameters $(t_j)_{1\leq j \leq m}$, define:
$$app_w(a_j)=\ln\left(\frac{t_j}{1-t_j}\right) + \sum_{i: a_j\in A_i} \ln\left(\frac{p_i(1-q_i)}{q_i(1-p_i)}\right)$$
\end{definition}

%\begin{remark}
The scores $app_w(a_j)$ can be interpreted as weighted approval scores for an $(n+m)$-voter profile where:
\begin{compactitem}
   \item for each voter $1\leq i \leq n$: $i$ has a weight $w_i=\ln\left(\frac{p_i(1-q_i)}{q_i(1-p_i)}\right)$ and casts approval ballot $A_i$;
    \item for each $1\leq j \leq m$: there is a virtual voter with weight $\ln\left(\frac{t_j}{1-t_j}\right)$ who casts approval ballot $\{a_j\}$.
\end{compactitem}
The weight of each voter $i \in \Voters$ depends on her reliability. The prior information on each alternative $a_j$ plays the role of a virtual voter who approves only $a_j$. This virtual voter's weight increases with the prior parameter $t_j$ and is negative when $t_j<1/2$.
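The weighted approval score can be computed directly from the definition, treating the prior term as the vote of a virtual voter. Here is a minimal Python sketch, assuming alternatives are indexed $0,\dots,m-1$ and each ballot is a set of indices. The function name is hypothetical.

```python
from math import log


def weighted_approval_scores(profile, p, q, t):
    """Return [app_w(a_0), ..., app_w(a_{m-1})] for one instance."""
    # Voter weights w_i = ln(p_i (1 - q_i) / (q_i (1 - p_i))).
    weights = [log(p_i * (1 - q_i) / (q_i * (1 - p_i))) for p_i, q_i in zip(p, q)]
    scores = []
    for j, t_j in enumerate(t):
        score = log(t_j / (1 - t_j))  # virtual voter approving only a_j
        score += sum(w_i for A_i, w_i in zip(profile, weights) if j in A_i)
        scores.append(score)
    return scores


# 10 identical voters (p, q) = (0.7, 0.4); approval counts 9, 8, 7, 5, 5.
counts = [9, 8, 7, 5, 5]
profile = [{j for j, c in enumerate(counts) if i < c} for i in range(10)]
scores = weighted_approval_scores(profile, [0.7] * 10, [0.4] * 10,
                                  [0.5, 0.5, 0.5, 0.6, 0.5])
```

These are the unrounded scores for the setting of Example~\ref{example constraints}. The table there uses the rounded weight $w\approx 1.25$.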


From now on, we suppose without loss of generality that the alternatives are ranked according to their score: 
$$app_w(a_1) \geq app_w(a_2) \geq \dots \geq app_w(a_m) $$

\begin{definition}[threshold and partition]\label{threshold_partition}
Define the threshold:
$$\tau_n=\sum_{i=1}^n \ln\left( \frac{1-q_i}{1-p_i}\right)$$
and the partition of the set of alternatives into three sets:
$$
\left\{
    \begin{array}{ll}
        S_{max}^{\tau_n} & =\left\{a\in \mathcal{A}: app_w(a)>\tau_n\right\} \\
        S_{tie}^{\tau_n} & =\left\{a\in \mathcal{A}: app_w(a)=\tau_n \right\}\\
        S_{min}^{\tau_n} & =\mathcal{A}\backslash (S_{max}^{\tau_n}\cup S_{tie}^{\tau_n})
        \end{array}
\right.
$$
and let $k_{max}^{\tau_n}=|S_{max}^{\tau_n}|, k_{tie}^{\tau_n}=|S_{tie}^{\tau_n}|, k_{min}^{\tau_n}=|S_{min}^{\tau_n}|$.
\end{definition}
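The threshold and the three-way partition translate directly into code. In this hypothetical Python sketch, the equality $app_w(a)=\tau_n$ is tested up to a small tolerance \texttt{tol}. We add this assumption because scores are floating-point numbers.

```python
from math import log


def threshold(p, q):
    """tau_n = sum_i ln((1 - q_i) / (1 - p_i))."""
    return sum(log((1 - q_i) / (1 - p_i)) for p_i, q_i in zip(p, q))


def partition(scores, tau, tol=1e-9):
    """Split alternative indices into (S_max, S_tie, S_min) relative to tau."""
    s_max = [j for j, s in enumerate(scores) if s > tau + tol]
    s_tie = [j for j, s in enumerate(scores) if abs(s - tau) <= tol]
    s_min = [j for j, s in enumerate(scores) if s < tau - tol]
    return s_max, s_tie, s_min


tau = threshold([0.7] * 10, [0.4] * 10)          # 10 ln 2
sets = partition([11.25, 10.0, 8.75, 6.65, 6.25], tau)
```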

The next result characterizes the sets in $\mathcal{S}$ that are MLEs of the ground truth given the parameters. 
\begin{theorem}\label{constrained}
$\Tilde{S} \in \argmax_{S\in \mathcal{S}} \mathcal{L}(A,S,p,q,t)$ if and only if there exists $k\in [l,u]$ such that 
%$\Tilde{S}=\{a_1,\dots,a_k\}$ 
%$\Tilde{S}$ is a top-$k$ set and:
$\Tilde{S}$ consists of $k$ alternatives with the $k$ highest values of $app_w$ (a \emph{top-$k$ set}) and:
\begin{equation}\label{constraints}
\left\{
    \begin{array}{cl}
        |\Tilde{S}\cap S_{max}^{\tau_n}| & =\min(u,k_{max}^{\tau_n})\\
        |\Tilde{S}\cap S_{min}^{\tau_n}| & =\max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})
        \end{array}
\right.
\end{equation}
\end{theorem}


In other words, the estimator $\tilde{S}$ is a top-$k$ set, where the admissible values of $k$ are determined by Eq.~(\ref{constraints}). The first equation imposes that $\Tilde{S}$ include as many elements of $S_{max}^{\tau_n}$ as possible, without exceeding the upper bound $u$. The second one imposes that $\Tilde{S}$ include as few elements of $S_{min}^{\tau_n}$ as possible, without going below the lower bound $l$. An example is included in the appendix. 
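Theorem~\ref{constrained} suggests a simple procedure: rank the alternatives by $app_w$ and keep the top $k=\max(l,\min(u,k_{max}^{\tau_n}))$ of them. This $k$ satisfies both equations in (\ref{constraints}). When $k_{max}^{\tau_n}<l$, the padding takes alternatives of $S_{tie}^{\tau_n}$ before those of $S_{min}^{\tau_n}$. The Python sketch below returns one maximizer; other maximizers may add or swap tied alternatives. Function names are hypothetical. The brute-force check enumerates $\mathcal{S}_{l,u}$ and uses the fact, established in the proof, that $\ell(S)=\sum_{a_j\in S}(app_w(a_j)-\tau_n)$ up to a constant.

```python
from itertools import combinations


def mle_ground_truth(scores, tau, l, u, tol=1e-9):
    """One maximizer of the likelihood over S_{l,u}, following the theorem.

    Assumes 0 <= l <= u <= len(scores); alternatives are indices 0..m-1.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    k_max = sum(1 for s in scores if s > tau + tol)   # |S_max|
    k = max(l, min(u, k_max))                         # admissible top-k size
    return set(order[:k])


def value(S, scores, tau):
    """Log-likelihood of S up to an additive constant."""
    return sum(scores[j] - tau for j in S)


def best_value_bruteforce(scores, tau, l, u):
    """Maximum of value(S) over all S with l <= |S| <= u, by enumeration."""
    m = len(scores)
    return max(value(S, scores, tau)
               for k in range(l, u + 1)
               for S in combinations(range(m), k))
```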


\begin{proof}
%For each admissible set
%\cj{Did we define states?T: You're right. } 
%$S \in \mathcal{S}$, the likelihood reads:
%    $L(S)=\frac{P(A|S)\tilde{P}(S)}{P(A)}$.
Since $\tilde{P}(S)>0 \iff S \in \mathcal{S}_{l,u}$, we have that $\argmax_{S\in \mathcal{S}} \mathcal{L}(S)= \argmax_{S\in \mathcal{S}_{l,u}} \mathcal{L}(S)$.
%Using the assumption on the ballots' independence, we can write:
%$L(S)\propto \tilde{P}(S)\prod_{i=1}^n P(A_i|S) $,
%which, by using the independence assumption on alternatives, becomes, for any $S \in \mathcal{S}_{l,u}$:
Moreover, we have that for any $S \in \mathcal{S}_{l,u}$:
\begin{small}{\allowdisplaybreaks
\begin{align*}
\mathcal{L}(S)& = \tilde{P}(S)\prod_{i=1}^n p_i^{|A_i\cap S|}q_i^{|A_i\cap \overline{S}|}(1-p_i)^{|\overline{A_i}\cap S|}(1-q_i)^{|\overline{A_i}\cap \overline{S}|}\\
 & = \tilde{P}(S)\prod_{i=1}^n p_i^{|A_i\cap S|}q_i^{|A_i|-|A_i\cap S|}(1-p_i)^{|S|-|A_i\cap S|}\\
 & \mbox{~~~~~~~~~~~~~~~~~~~~~}(1-q_i)^{|\overline{A_i}|-|S|+|A_i\cap S|}\\
 & \propto \tilde{P}(S) \prod_{i=1}^n \left[\frac{1-p_i}{1-q_i}\right]^{|S|}\left[\frac{p_i(1-q_i)}{q_i(1-p_i)}\right]^{|A_i \cap S|}\\
 & \propto \frac{1}{\beta}\prod_{a_j \in S} t_j \prod_{a_j \notin S} (1-t_j) \prod_{i=1}^n \left[\frac{1-p_i}{1-q_i}\right]^{|S|}\left[\frac{p_i(1-q_i)}{q_i(1-p_i)}\right]^{|A_i \cap S|}\\
  %& \propto \prod_{a_j \in \mathcal{A}} (1-t_j) \prod_{a_j \in S} \frac{t_j}{1-t_j}\prod_{i=1}^n %\left[\frac{1-p_i}{1-q_i}\right]^{|S|} \left[\frac{p_i(1-q_i)}{q_i(1-p_i)}\right]^{|A_i \cap S|} \\
  & \propto \prod_{a_j \in S} \frac{t_j}{1-t_j}\prod_{i=1}^n \left[\frac{1-p_i}{1-q_i}\right]^{|S|}\left[\frac{p_i(1-q_i)}{q_i(1-p_i)}\right]^{|A_i \cap S|} 
  \end{align*}}
%\end{align*}}
%\end{equation*}
\end{small}

%%reformulation => faire apparaitre les notations dans la derniere solution (pour clarifier le raisonnement sans que le lecteur n'aie à se réferer aux definitions)
Thus, up to an additive constant, the log-likelihood $\ell(S)$ reads:

\allowdisplaybreaks
\begin{small}
\begin{align*}
    &  \sum_{a_j\in S} \ln \frac{t_j}{1-t_j}+\sum_{i=1}^n \left(|S| \ln \frac{1-p_i}{1-q_i}+ |A_i \cap S|\ln \frac{p_i(1-q_i)}{q_i(1-p_i)}\right)\\
     & = \sum_{a_j \in S} \left[ \overbrace{\underbrace{\ln \frac{t_j}{1-t_j}+\sum_{i: a_j \in A_i} \ln \frac{p_i(1-q_i)}{q_i(1-p_i)}}_{app_w(a_j)}-\underbrace{\sum_{i=1}^n\ln \frac{1-q_i}{1-p_i}}_{\tau_n} }^{\ell(a_j)}\right]% \\
    % & = \sum_{a_j \in S} \left[app_w(a)-\tau_n\right]= \sum_{a_j \in S} l(a)
\end{align*}

\end{small}
Hence $\ell(S)=\sum_{a_j\in S}\ell(a_j)$. Moreover, $a\in S_{max}^{\tau_n}$ if and only if $\ell(a)>0$, $a\in S_{min}^{\tau_n}$ if and only if $\ell(a)<0$, and $a\in S_{tie}^{\tau_n}$ if and only if $\ell(a)=0$.
%\cj{Je préfère ``if and only if'', sauf si on a des problèmes de place.}
Now, let $S_M$ be a maximizer of the likelihood.
%We %can\cj{will? No I discussed with Florian and decided to omit it since it is simple. In fact if it were not a  top-k set, just replace the last alternative in it with one with higher score (it exists since the set is not top-k) and you're set is no longer a maximizer of the likelihood.}  
%The fact that $S_M$ is made of top-$k$ alternatives for some $k \in [l \twodots u]$ is proven by contradiction but this proof is omitted due to the space constraint. 
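Since $\ell(a_j)\geq \ell(a_h) \iff app_w(a_j)\geq app_w(a_h)$, the set $S_M$, which maximizes $\sum_{a_j \in S} \ell(a_j)$ over $\mathcal{S}_{l,u}$, is a top-$k$ set for $k=|S_M|\in [l,u]$. Otherwise, there would exist $a_h\in S_M$ and $a_j\notin S_M$ with $app_w(a_j)>app_w(a_h)$. Replacing $a_h$ by $a_j$ would then keep the size of $S_M$ unchanged while strictly increasing $\ell$.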


We now show that $|S_M\cap S_{min}^{\tau_n}| =\max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})$. First, $|S_M\cap S_{min}^{\tau_n}| \geq \max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})$, since
$|S_M\cap S_{min}^{\tau_n}|= |S_M|-|S_M\cap S_{max}^{\tau_n}|-|S_M\cap S_{tie}^{\tau_n}|\geq l-k_{max}^{\tau_n}-k_{tie}^{\tau_n} $.
Suppose now that $|S_M\cap S_{min}^{\tau_n}| > \max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})$. Then $|S_M|>l$. Otherwise $|S_M|=l$ and $|S_M \cap S_{max}^{\tau_n}|+|S_M \cap S_{tie}^{\tau_n}|=l-|S_M \cap S_{min}^{\tau_n}|<k_{max}^{\tau_n}+k_{tie}^{\tau_n}$. Some element of $S_{max}^{\tau_n}\cup S_{tie}^{\tau_n}$ would then be missing from $S_M$, while $S_M$ contains an element of $S_{min}^{\tau_n}$ (since $|S_M \cap S_{min}^{\tau_n}|>0$). This contradicts the fact that $S_M$ is a top-$k$ set. Now take $a\in S_M \cap S_{min}^{\tau_n}$. The set $S_M\backslash\{a\}$ is admissible since $l\leq|S_M\backslash\{a\}|\leq u$, and $\ell(S_M)=\ell(S_M\backslash\{a\})+\ell(a)<\ell(S_M\backslash\{a\})$, which contradicts the maximality of $S_M$.
%We prove in the same way that $|S_M\cap S_{max}^{\tau_n}| =\min(u,k_{max}^{\tau_n})$. 

The same idea shows that $|S_M\cap S_{max}^{\tau_n}| =\min(u,k_{max}^{\tau_n})$. Clearly $|S_M\cap S_{max}^{\tau_n}|\leq\min(u,k_{max}^{\tau_n})$. Suppose the inequality were strict. Then $|S_M|<u$: otherwise $S_M$, being a top-$k$ set of size $u>|S_M\cap S_{max}^{\tau_n}|$, would contain an alternative outside $S_{max}^{\tau_n}$ while missing one in $S_{max}^{\tau_n}$. So for any $a\in S_{max}^{\tau_n}\backslash S_M$, the set $S_M\cup\{a\}$ would be admissible with $\ell(S_M\cup\{a\})=\ell(S_M)+\ell(a)>\ell(S_M)$, a contradiction.
%First, notice that $|S_M|<u$ since if $|S_M|=u$ then  $|S_M\cap S_{max}^{\tau_n}|<|S_M|$ and $|S_M\cap S_{max}^{\tau_n}|<k_{max}^{\tau_n}$ which would mean that $S_M$ cannot be of the form $S_M=\{a_1,\dots,a_k\}$ since it would have had to contain alternatives from $S_{tie}^{\tau_n}$ and $S_{min}^{\tau_n}$ while omitting some alternatives from $S_{max}^{\tau_n}$. Now consider $a\in S_{max}^{\tau_n}\backslash S_M$, we have that $|S_M\cup \{a\}|\leq u$ and $l(S_m\cup \{a\})=l(S_M)+l(a)>l(S_M)$ which is a contradiction.

Conversely, consider an admissible top-$k$ set $S$ that satisfies the constraints (\ref{constraints}). Let $S_M$ be an MLE; by the first part of the proof, it is a top-$k'$ set satisfying the same constraints. Then $|S_M\cap S_{max}^{\tau_n}| =|S\cap S_{max}^{\tau_n}| =\min(u,k_{max}^{\tau_n})$. Since $S$ and $S_M$ are top-$k$ and top-$k'$ sets, $S\cap S_{max}^{\tau_n}$ and $S_M\cap S_{max}^{\tau_n}$ coincide up to ties in $app_w$, which do not change $\ell$. The same holds for their intersections with $S_{min}^{\tau_n}$. Since $\ell(a)=0$ for every $a\in S_{tie}^{\tau_n}$, it follows that $\ell(S)=\ell(S_M)$, so $S$ is also an MLE.
%Conversely, consider a set $S_M=\{a_1,\dots,a_k\}$ such that:
%$$
%\left\{
%    \begin{array}{cl}
%        k & \in [l,u] \\
%        |S_M\cap S_{max}^{\tau_n}| & =\min(u,k_{max}^{\tau_n})\\
%        |S_M\cap S_{min}^{\tau_n}| & =\max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})
%        \end{array}
%\right.
%$$
%and let us prove that it maximizes the likelihood. To do so, consider $S' \in \argmax_{S\in \mathcal{S}_{l,u}} P(S^*=S|A)$, so by the first part of the proof there exists some $k'$ such that $S'=\{a_1,\dots,a_{k'}\}$ such that:
%$$
%\left\{
%    \begin{array}{cl}
%        k' & \in [l,u] \\
%        |S'\cap S_{max}^{\tau_n}| & =\min(u,k_{max}^{\tau_n})\\
%        |S'\cap S_{min}^{\tau_n}| & =\max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})
%        \end{array}
%\right.
%$$
%Since $|S'\cap S_{max}^{\tau_n}|=|S_M\cap S_{max}^{\tau_n}|$ and $|S'\cap S_{min}^{\tau_n}|=|S_M\cap S_{min}^{\tau_n}|$ and given the structure of $S'=\{a_1,\dots,a_{k'}\}$ and $S_M=\{a_1,\dots,a_{k}\}$, we have that $S'\cap S_{max}^{\tau_n}=S_M\cap S_{max}^{\tau_n}$ and $S'\cap S_{min}^{\tau_n}=S_M\cap S_{min}^{\tau_n}$. Given the expression of the likelihood of a set of alternatives, and given that $l(a)=0$ for any $a\in S_{tie}^{\tau_n}$ we have that $l(S_M)=l(S')$, which implies that $S_M \in \argmax_{S\in \mathcal{S}_{l,u}} P(S^*=S|A)$.
\end{proof}


Notice that when $(l,u)=(0,m)$, the problem degenerates into a collection of label-wise problems, one for each alternative. Alternative $a_j$ is selected if $a_j \in S_{max}^{\tau_n}$ and rejected if $a_j \in S_{min}^{\tau_n}$; alternatives in $S_{tie}^{\tau_n}$ can be either selected or not.

\begin{example}\label{example constraints}
Consider $5$ alternatives $\mathcal{A}=\{a,b,c,d,e\}$ and $10$ voters $\Voters$ all sharing the same parameters $(p,q) = (0.7,0.4)$. All voters thus share the same weight $w=\ln\left(\frac{p(1-q)}{q(1-p)}\right)=\ln 3.5\approx 1.25$, and $\tau_n=\sum_{i=1}^n \ln\left(\frac{1-q}{1-p}\right) = 10\ln 2\approx 6.93$. We consider the constraints $(l,u)=(1,4)$.

Suppose that $t_d = 0.6$ and that $t_j =0.5$ for all the remaining alternatives. Consider also the approval counts and the weighted approval scores (computed with $w\approx 1.25$) in the table below.\smallskip
%\ref{app counts}:
%\cj{Perhaps give also the virtual votes?}
%\begin{table}[H]
%    \centering

    \begin{tabular}{|c|c|c|c|c|c|}
  \hline
  Candidate & a & b & c & d & e  \\
  \hline
  Approval count &9 & 8 & 7 & 5 & 5  \\
  \hline
  $app_w$ & 11.25 & 10 & 8.75 & 6.65 & 6.25\\
  \hline
\end{tabular}\smallskip
%\caption{Approval counts}
%\label{app counts}
%\end{table}

By Theorem \ref{constrained}, we can check that $\Tilde{S}=\argmax_{S \in \mathcal{S}} P(S=S^*|A)=\{a,b,c\}$. Since $app_w(c)=8.75>\tau_n>6.65=app_w(d)$, we have $S_{max}^{\tau_n}=\{a,b,c\}$, $S_{tie}^{\tau_n}=\emptyset $ and $S_{min}^{\tau_n}=\{d,e\}$. By the theorem, $\tilde{S}$ consists of the top-$k$ alternatives for some $k \in [1,4]$.
%\cj{What do you mean? }
We also have that:
\begin{small}
$$
\left\{
    \begin{array}{cl}
        %k & \in [1,4]  \\
        |\tilde{S}\cap S_{max}^{\tau_n}| & =\min(u,k_{max}^{\tau_n})=3  \implies \{a,b,c\} \subseteq \tilde{S} \\
        |\tilde{S}\cap S_{min}^{\tau_n}| & =\max(0,l-k_{tie}^{\tau_n}-k_{max}^{\tau_n})=0  \implies d,e\notin\tilde{S}
        \end{array}
\right. 
$$
\end{small}
So the only possibility is $\tilde{S}=\{a,b,c\}$.
\end{example}
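The numbers in Example~\ref{example constraints} can be reproduced with a few lines of Python. This sketch uses the unrounded weight $w=\ln 3.5\approx 1.2528$, so its scores differ slightly from the rounded values in the table (e.g., $app_w(d)\approx 6.67$). The partition and the estimate $\tilde{S}=\{a,b,c\}$ are unchanged. The choice $k=\max(l,\min(u,k_{max}^{\tau_n}))$ is one top-$k$ size satisfying the constraints of Theorem~\ref{constrained}.

```python
from math import log

p, q, n = 0.7, 0.4, 10
counts = {"a": 9, "b": 8, "c": 7, "d": 5, "e": 5}
prior = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.6, "e": 0.5}
l, u = 1, 4

w = log(p * (1 - q) / (q * (1 - p)))      # common voter weight, ln 3.5
tau = n * log((1 - q) / (1 - p))          # threshold tau_n, 10 ln 2
app_w = {a: log(prior[a] / (1 - prior[a])) + counts[a] * w for a in counts}

s_max = {a for a, s in app_w.items() if s > tau}
s_tie = {a for a, s in app_w.items() if s == tau}
s_min = set(counts) - s_max - s_tie

# Keep the top-k alternatives with k = max(l, min(u, |S_max|)).
ranked = sorted(app_w, key=app_w.get, reverse=True)
k = max(l, min(u, len(s_max)))
estimate = set(ranked[:k])
```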



\subsection{Estimating the Parameters Given the Ground Truth}\label{subsec:pa-gt}
\subsubsection{Estimating the prior parameters over alternatives}
%given the ground truth}

Once the ground truths are estimated at one iteration of the algorithm, the next step consists of estimating the prior parameters $(t_j)_{1\leq j\leq m}$, with the ground truths being given. In Subsection \ref{subsec:amle}, the ground truths will be replaced by their estimates at each iteration. The next proposition gives the closed-form expression of the MLE of the prior parameter of an alternative. This expression holds given the ground truth $S^*_z$ of each instance and the prior parameters of all other alternatives.
%Thus, all in all, the following result have the following inputs and outputs:
\begin{compactitem}
    \item Input: Approval profile $(A_1,\dots,A_n)$, ground truths $S^*_z$, and all but one prior parameters  $(t_h)_{h \neq j}$.
    \item Output: MLE of $t_j$.
\end{compactitem}
\begin{proposition}\label{prior}
For every $a_j \in \mathcal{A}$: 
%\cj{We should add comments to explain what these equations mean}
$$\argmax_{t \in (0,1)} \mathcal{L}(A,S,p,q,t,t_{-j}) = \frac{occ(j)\underline{\alpha}_j}{(L-occ(j))\overline{\alpha}_j + occ(j)\underline{\alpha}_j}$$
$$
\mbox{where: }\left\{
    \begin{array}{lll}
        \overline{\alpha}_j &=\sum\limits_{\substack{S\in \mathcal{S}_{l,u} \\ a_j\in S}} \prod\limits_{\substack{a_h \in S \\ h\neq j}} t_h \prod\limits_{a_h \notin S} (1-t_h)\\
        \underline{\alpha}_j & =\sum\limits_{\substack{S\in \mathcal{S}_{l,u} \\ a_j\notin S}} \prod\limits_{a_h \in S} t_h \prod\limits_{\substack{a_h \notin S \\ h\neq j}} (1-t_h) \\
        %occ(j) &= \sum\limits_{z=1}^L \mathds{1}\{a_j \in S_z \} 
        occ(j) & = \left|\{z \in \{1,\dots,L\}: a_j \in S_z\} \right|
        \end{array}
\right.$$
\end{proposition}
%\begin{remark}
%We have that:
%$$\left\{
%    \begin{array}{ll}
%        \overline{\alpha}_j & = \beta((l-1)^+,u-1,t_{-j})\\
%        \underline{\alpha}_j & = \beta(l,u,t_{-j})
%    \end{array}
%\right.$$
%\end{remark}
Notice that $\overline{\alpha}_j=P(l \leq |S^*| \leq u|a_j \in S^*)$ and $\underline{\alpha}_j=P(l \leq |S^*|\leq u|a_j \notin S^*)$, so $\beta=\overline{\alpha}_jt_j+\underline{\alpha}_j(1-t_j)$. The quantity $occ(j)$ is the number of instances whose ground truth contains $a_j$.
\begin{proof}
Fix all sets $S_z \in \mathcal{S}_{l,u}$, all the noise parameters $(p_i,q_i)_i$, and all the prior parameters $t_h$ except $t_j$, and denote by $t \in (0,1)$ the value of $t_j$. Keeping only the factors that depend on $t$:
\begin{small}
\begin{equation*}
    \begin{split}
      \mathcal{L}(S,t,t_{-j})  & \propto \prod_{z=1}^L \frac{1}{\beta(l,u,t,t_{-j})}\prod_{a_h \in S_z} t_h \prod_{a_h \notin S_z} (1-t_h)\\
       & \propto \left(\frac{1}{\beta(l,u,t,t_{-j})}\right)^L \underbrace{\prod_{z: a_j \in S_z}t}_{t^{occ(j)}} \underbrace{\prod_{z: a_j \notin S_z} (1-t)}_{(1-t)^{L-occ(j)}}
    \end{split}
\end{equation*}
\end{small}
Taking the log, and using $\beta=\overline{\alpha}_jt+\underline{\alpha}_j(1-t)$, we can write the function as:
$$\ell(t)=-L \log \beta + occ(j) \log t + (L-occ(j)) \log (1-t) $$
Its derivative reads:
$$\frac{\partial \ell}{\partial t} = -L \frac{\overline{\alpha}_j-\underline{\alpha}_j}{\overline{\alpha}_jt+\underline{\alpha}_j (1-t)}+\frac{occ(j)}{t}-\frac{L-occ(j)}{1-t} $$
Multiplying by $t(1-t)\beta>0$, the condition $\frac{\partial \ell}{\partial t}=0$ becomes linear in $t$: $occ(j)\,\underline{\alpha}_j - t\left[(L-occ(j))\overline{\alpha}_j+occ(j)\,\underline{\alpha}_j\right]=0$. Its unique solution is:
$$ {t=\frac{occ(j)\underline{\alpha}_j}{(L-occ(j))\overline{\alpha}_j + occ(j)\underline{\alpha}_j}} $$
When $0<occ(j)<L$, this point lies in $(0,1)$ and $\lim_{t\to0} \ell(t)= \lim_{t\to 1}\ell(t)=-\infty$, so $\ell$ reaches its unique maximum there. When $occ(j)\in\{0,L\}$, $\ell$ is monotone on $(0,1)$ and its supremum is approached at the formula's value, $0$ or $1$. Equivalently, the solution is the value of $t_j$ for which the constrained marginal $\tilde{P}(a_j\in S^*)=\frac{\overline{\alpha}_jt_j}{\beta}$ equals the empirical frequency $\frac{occ(j)}{L}$.
\end{proof}
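Here is a minimal Python sketch of this update, assuming alternatives indexed from $0$; function names are hypothetical. It computes $\overline{\alpha}_j=\beta((l-1)^+,u-1,t_{-j})$ and $\underline{\alpha}_j=\beta(l,u,t_{-j})$ with a dynamic program over the size of $S^*\setminus\{a_j\}$. It then applies $t=\frac{occ(j)\underline{\alpha}_j}{(L-occ(j))\overline{\alpha}_j+occ(j)\underline{\alpha}_j}$, the root of $\frac{\partial\ell}{\partial t}$ when $\beta=\overline{\alpha}_jt+\underline{\alpha}_j(1-t)$. Finally, it compares this value with a grid search on $\ell(t)$.

```python
from math import log


def beta(l, u, t):
    """P(l <= |S| <= u) when a_j is in S independently with probability t[j]."""
    dist = [1.0]
    for t_j in t:
        dist = [(dist[k] * (1 - t_j) if k < len(dist) else 0.0)
                + (dist[k - 1] * t_j if k > 0 else 0.0)
                for k in range(len(dist) + 1)]
    lo, hi = max(l, 0), min(u, len(t))
    return sum(dist[lo:hi + 1]) if lo <= hi else 0.0


def update_prior(j, t, occ_j, L, l, u):
    """Maximizer of ell(t_j) with all other priors t_{-j} fixed."""
    t_rest = t[:j] + t[j + 1:]
    alpha_bar = beta(max(l - 1, 0), u - 1, t_rest)   # P(size ok | a_j in S*)
    alpha_und = beta(l, u, t_rest)                   # P(size ok | a_j not in S*)
    return occ_j * alpha_und / ((L - occ_j) * alpha_bar + occ_j * alpha_und)


def ell(t_j, j, t, occ_j, L, l, u):
    """-L log beta + occ(j) log t_j + (L - occ(j)) log(1 - t_j)."""
    t_full = t[:j] + [t_j] + t[j + 1:]
    return (-L * log(beta(l, u, t_full)) + occ_j * log(t_j)
            + (L - occ_j) * log(1 - t_j))


# m = 5 alternatives, L = 4 instances, (l, u) = (1, 2); a_1 is in 1 ground truth.
t0 = [0.5] * 5
t_hat = update_prior(0, t0, occ_j=1, L=4, l=1, u=2)
grid = [i / 1000 for i in range(1, 1000)]
t_grid = max(grid, key=lambda x: ell(x, 0, t0, 1, 4, 1, 2))
# Constrained marginal tilde P(a_1 in S*) at the estimate (occ/L = 0.25).
marginal = t_hat * beta(0, 1, t0[1:]) / beta(1, 2, [t_hat] + t0[1:])
```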




We will see later that the algorithm applies Proposition \ref{prior} sequentially to estimate the alternatives' parameters one by one (see Example \ref{example amle}).
\begin{comment}
\begin{example}\label{Example prior}
Suppose that we have $5$ alternatives $\mathcal{A}=\{a_1,\dots,a_5\}$ and that we fix $t_2=\dots=t_5=0.5$. Suppose also that we are given the ground truth of $L=4$ instances with $l=1,u=2$:
$$S_1^*=\{a_2,a_4\}, S_2^*=\{a_2,a_5\},S_3^*=\{a_2,a_3\},S_4^*=\{a_1,a_3\}  $$
We want to estimate $t_1$ by MLE. We begin by computing $\overline{\alpha}_1$ and $\underline{\alpha}_1$ and $occ(a_1)$:
$$\left\{
    \begin{array}{ll}
        \overline{\alpha}_1 & = \beta(0,1,t_2,\dots,t_5)=0.3125\\
        \underline{\alpha}_1 & = \beta(1,2,t_2,\dots,t_5)=1\\
        occ(a_1) & = 1
    \end{array}
\right.$$
Then we have that the MLE of $t_1$ is:
$$\hat{t}_1 = \frac{occ(a_1)\overline{\alpha}_1}{(L-occ(a_1))\underline{\alpha}_1 + occ(a_1)\overline{\alpha}_1}= 0.09  $$
\end{example}
\end{comment}

\subsubsection{Estimating the voter parameters}
%given the ground truth}\label{subsec:vp-gt}
%Now we will briefly show how, 
Once the ground truths are known (or estimated), we can estimate the voters' parameters $(p,q)$. 
%Hence, the input and output here are: 
\begin{compactitem}
    \item Input: Instances $(A^1,\dots,A^L)$, ground truths $(S^*_1,\dots,S^*_L)$.
    \item Output: MLE of voter reliabilities $(p,q)$.
\end{compactitem}
The next result states that the MLE $\hat{p}_i$ is the fraction of correct alternatives, pooled over all instances, that voter $i$ approves. Similarly, $\hat{q}_i$ is the fraction of incorrect alternatives that she approves. See Example \ref{example amle}. 

\begin{proposition}\label{pq}
Fix sets $S_z \in \mathcal{S}_{l,u}$ and prior parameters $t_j$. Then:
$$\argmax_{(p,q) \in (0,1)^{2 n}} \mathcal{L}(A,S,p,q,t) = (\hat{p},\hat{q})$$
where:
$ \hat{p}_i  = \frac{\sum_{z=1}^L |A_i^z\cap {S}_z |}{\sum_{z=1}^L |{S}_z |}  
        ,\quad\hat{q}_i  = \frac{\sum_{z=1}^L |A_i^z\cap \overline{{S}_z} |}{\sum_{z=1}^L |\overline{{S}_z }|}$
%$$\left\{
%    \begin{array}{ll}
%        \hat{p}_i & = \frac{\sum_{z \in L} |A_i^z\cap {S}_z |}{\sum_{z \in L} |{S}_z |}  \\
%        \hat{q}_i & = \frac{\sum_{z \in L} |A_i^z\cap \overline{{S}_z} |}{\sum_{z \in L} |\overline{{S}_z }|}
%    \end{array}
%\right.$$
\end{proposition}
The (simple) proof is omitted.
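Although the proof is omitted, the estimator is easy to implement. Here is a minimal Python sketch under the following assumptions: alternatives are indexed $0,\dots,m-1$, ballots and ground truths are Python sets, and there is at least one correct and one incorrect label overall (otherwise a denominator vanishes). The function name is hypothetical.

```python
def estimate_reliabilities(profiles, ground_truths, m):
    """MLE (p_hat_i, q_hat_i) of every voter given the ground truths.

    profiles[z][i] is the set of alternatives approved by voter i in instance z;
    alternatives are indexed 0..m-1.
    """
    n = len(profiles[0])
    n_correct = sum(len(S) for S in ground_truths)         # sum_z |S_z|
    n_incorrect = sum(m - len(S) for S in ground_truths)   # sum_z |complement of S_z|
    p_hat, q_hat = [], []
    for i in range(n):
        hits = sum(len(A[i] & S) for A, S in zip(profiles, ground_truths))
        false_alarms = sum(len(A[i] - S) for A, S in zip(profiles, ground_truths))
        p_hat.append(hits / n_correct)
        q_hat.append(false_alarms / n_incorrect)
    return p_hat, q_hat


# m = 3 alternatives, L = 2 instances, 2 voters.
truths = [{0}, {1, 2}]
profiles = [[{0}, {1}], [{1}, {0, 1, 2}]]
p_hat, q_hat = estimate_reliabilities(profiles, truths, 3)
```

Here voter $1$ never approves an incorrect alternative, so $\hat{q}_1=0$ lies on the boundary of $(0,1)$. An implementation may need to guard against this case before taking logarithms of the estimates.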
\begin{comment}
\begin{proof}
The independence assumptions in the noise model made the likelihood expression separable voter-wise such that for any voter $i\in \Voters$:
$$\argmax_{p_i} \mathcal{L}(A,S,p,q,t) = \argmax_{p_i} \prod_{z\in L} p_i^{|A_i^z\cap {S}_z|}(1-p_i)^{|\overline{A_i^z}\cap {S}_z|} $$
so, applying the $\log$ to the expression, we see that it suffices to maximize:
$$h_i(p_i)=\sum_{z=1}^L |A_i^z \cap S_z| \log p_i + |\overline{A_i^z} \cap S_z| \log 1-p_i  $$
We can easily prove that the derivative of this expression only vanishes in $\hat{p}_i$. We proceed exactly in the same way for the estimation of $q_i$.

\end{proof}

\end{comment}

\begin{comment}
\begin{example}
 Consider $3$ voters $\Voters=\{1,2,3\}$ and $5$ alternatives $\mathcal{A}=\{a_1,\dots,a_5\}$. Suppose we have $L=4$ instances whose
approval profiles are given in Table \ref{approval profile}:
\begin{table}[h]
    \centering
    \begin{tabular}{|l|c|c|c|c|}
  \hline
 & $A^1$  & $A^2$ & $A^3$ & $A^4$ \\
  \hline
  Voter $1$ & $\{a_1,a_4\}$ & $\{a_1\}$ & $\{a_3\}$ & $\{a_1\}$ \\
  \hline
  Voter $2$ & $\{a_2\}$ & $\{a_5\}$ & $\{a_4\}$ & $\{a_1\}$\\
  \hline
  Voter $3$ & $\{a_2,a_3,a_4\}$ & $\{a_2,a_3,a_5\}$ & $\{a_2,a_3\}$ & $\{a_3\}$ \\
  \hline 
\end{tabular}
    \caption{Approval ballots of 3 voters on 3 instances}
    \label{approval profile}
\end{table}


Suppose also that the ground truths are given by:
$$S^*^1 = \{a_2,a_4\},  S^*^2 = \{a_2,a_5\}, S^*^3 = \{a_2,a_3\},S^*^4 = \{a_1,a_3\} $$
We want to compute the MLEs of the voter reliabilities. For instance, voter $1$ has only $2$ false positive labels from a total of $12$ negative labels so $\hat{q}_1^{(1)}=\frac{2}{12}=0.17$. And she has $3$ out of $8$ true positives so $\hat{p}_1^{(1)}=\frac{3}{8}=0.38$. In final, we get:

$$\left\{
    \begin{array}{lll}
        \hat{p}_1 = 0.38 & \hat{p}_2 = 0.38 & \hat{p}_3 = 0.88   \\
        \hat{q}_1 = 0.17 & \hat{q}_2 = 0.08 & \hat{q}_3 = 0.17
    \end{array}
\right.$$

\end{example}
\end{comment}
%$$\left\{
%    \begin{array}{ll}
%        (\hat{p}_1,\hat{q}_1) & =(0.38,0.17)  \\
%        (\hat{p}_2,\hat{q}_2) & =(0.38,0.08)\\
%        (\hat{p}_3,\hat{q}_3) & = (0.88,0.17)\\
%    \end{array}
%\right.$$


\subsection{Alternating Maximum Likelihood Estimation}\label{subsec:amle}
%\cj{Je ne comprends pas ``by maximizing the overall likelihood $\mathcal{L}(A,S,p,q,t)$'' T: C'est to maximize the overall likelihood et non pas by maximizing ....}

%\begin{equation*}
%    \begin{split}
%      \mathcal{L}(A,S,p,q,t)  & = \prod_{z=1}^L\frac{1}{\beta(l,u,t)} \prod_{a_j \in S_z} t_j \prod_{a_j \notin S_z} (1-t_j)\\
%      & \prod_{i \in N} p_i^{|A_i^z\cap S_z|} q_i^{|A_i^z\cap \overline{S_z}|}(1-p_i)^{|\overline{A_i^z}\cap S_z|} \\ 
%      & (1-q_i)^{|\overline{A_i^z}\cap \overline{S_z}|} \mathds{1}\{S_z \in \mathcal{S}_{l,u}\} 
%    \end{split}
%\end{equation*}
\begin{algorithm}[t]
\caption{\textit{AMLE} procedure %for the estimation of $(S^*_z)_z$ and parameters $(p_i,q_i)_i, (t_j)_j$ 
}
\label{algo}
$\begin{array}{ll}
\textbf{Input:} & \mbox{Approval ballots $(A_i^z)_{1\leq z \leq L , i \in \Voters}$}\\
& \mbox{Initial parameters $\hat{\theta}^{(0)}$}, \mbox{Bounds $(l,u)$}
, \mbox{error $\varepsilon$}\\
\textbf{Output:} &\mbox{Estimations $(\hat{S}_z), (\hat{p}_i,\hat{q}_i), (\hat{t}_j)$} 
\end{array}$

\begin{algorithmic} 
%\STATE -Initialize $(\hat{p}_i^{(0)},\hat{q}_i^{(0)})$ and $(\hat{t}_j^{(0)})$
\REPEAT
\FOR{ $z = 1 \dots L$}
    \STATE Compute $\hat{S}_z^{(v+1)} = \{a_1, \dots, a_k\}$ with $k\in [l,u]$ and:%:\footnotemark 
    $$
    \begin{array}{cl}
        %|\hat{S}_z^{(v+1)}\cap S_{max,z}^{\hat{\tau}_n^{(v)}}| & =\min(u,k_{max,z}^{\hat{\tau}_n^{(v)}}) \\
        |\hat{S}_z^{(v+1)}\cap S_{max,z}^{(v)}| & =\min(u,k_{max,z}^{(v)}) \\
        %|\hat{S}_z^{(v+1)}\cap S_{min,z}^{\hat{\tau}_n^{(v)}}| & =\max(0,l-k_{tie,z}^{\hat{\tau}_n^{(v)}}-k_{max,z}^{\hat{\tau}_n^{(v)}})
        |\hat{S}_z^{(v+1)}\cap S_{min,z}^{(v)}| & =\max(0,l-k_{tie,z}^{(v)}-k_{max,z}^{(v)})
        \end{array}
$$
%with $(S_{max,z}^{(v)}, S_{tie,z}^{(v)},S_{min,z}^{(v)})$ defined for $\hat{\theta}^{(v)}$ and $(A^z_i)_{i \in N}$.
\ENDFOR
 
\FOR{$i = 1 \dots n$}
    \STATE Update the parameters $(p_i,q_i)$ given $\hat{S}^{(v+1)}$:
    \begin{small}
        $$
        \hat{p}_i^{(v+1)}  = \frac{\sum\limits_{z=1}^{L} |A_i^z\cap \hat{S}_z^{(v+1)} |}{\sum\limits_{z=1}^{L} |\hat{S}_z^{(v+1)} |}  ,
        \hat{q}_i^{(v+1)}  = \frac{\sum\limits_{z=1}^{L} |A_i^z\cap \overline{\hat{S}_z^{(v+1)}} |}{\sum\limits_{z=1}^{L} |\overline{\hat{S}_z^{(v+1)} }|}
    $$
    \end{small}
    %$$\left\{
    %\begin{array}{ll}
    %    \hat{p}_i^{(v+1)} & = \frac{\sum_{z \in L} |A_i^z\cap \hat{S}_z^{(v+1)} |}{\sum_{z \in L} |\hat{S}_z^{(v+1)} |}  \\
    %    \hat{q}_i^{(v+1)} & = \frac{\sum_{z \in L} |A_i^z\cap \overline{\hat{S}_z^{(v+1)}} |}{\sum_{z \in L} |\overline{\hat{S}_z^{(v+1)} }|}
    %\end{array}
%\right.$$
\ENDFOR
\FOR{$j = 1 \dots m$}
\STATE Update $\hat{t}_j^{(v+1)}$ by:
\begin{small}
$$ \hat{t}_j^{(v+1)} = \frac{occ^{(v+1)}(j) \overline{\alpha}_j^{(v+1)}}{occ^{(v+1)}(j) \overline{\alpha}_j^{(v+1)} + (L - occ^{(v+1)}(j)) \underline{\alpha}_j^{(v+1)}}$$
\end{small}
where:
$$\left\{
    \begin{array}{ll}
        occ^{(v+1)}(j) & =\sum_{z=1}^L \mathds{1}\{a_j \in \hat{S}_z^{(v+1)}\}  \\
        \overline{\alpha}_j^{(v+1)} & = \beta((l-1)^+,u-1,\hat{t}^{(v+1)}_{<j},\hat{t}^{(v)}_{>j})\\
        \underline{\alpha}_j^{(v+1)} & = \beta(l,u,\hat{t}^{(v+1)}_{<j},\hat{t}^{(v)}_{>j})
    \end{array}
\right.$$
\ENDFOR
\UNTIL{
$|| \hat{\theta}^{(v+1)}-\hat{\theta}^{(v)} || \leq \varepsilon$}
\end{algorithmic}
\end{algorithm}
The estimation of the ground truths and the estimation of the parameters are now intertwined to maximize the overall likelihood $\mathcal{L}(A,S,p,q,t)$. We call the resulting procedure the \emph{Alternating Maximum Likelihood Estimation} (AMLE) algorithm. AMLE is an iterative procedure similar to the \textit{Expectation-Maximization} (EM) procedure used in \cite{Distill2011}, but with coordinate-ascent-like iterations. It jointly estimates three quantities:
\begin{itemize}
    \item the voter reliabilities,
    \item the prior parameters of the alternatives,
    \item the ground truths of the instances.
\end{itemize}
AMLE alternates between two steps:
\begin{itemize}
    \item an MLE of the ground truths, given the current estimate of the parameters;
    \item an update of the parameters via their MLE, given the current estimate of the ground truths.\footnote{In case of ties between subsets when estimating the ground truth, a fixed tie-breaking priority over subsets is used. No ties occurred in our experiments.}
\end{itemize}
Each of these steps has been discussed in the previous subsections, and they are combined in Algorithm~\ref{algo}.
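The ground-truth step of Algorithm~\ref{algo} can be sketched in Python as follows. This is a minimal sketch that rests on an assumption of ours. We take the partition $(S_{max,z},S_{tie,z},S_{min,z})$ to be given by the sign of the log-likelihood gain of including $a_j$ in $S_z$. This gain is
$\ln\frac{t_j}{1-t_j}+\sum_{i:\,a_j\in A_i^z}\ln\frac{p_i}{q_i}+\sum_{i:\,a_j\notin A_i^z}\ln\frac{1-p_i}{1-q_i}$.
The form follows from the likelihood factorizing over alternatives once the size constraint holds. Ties are broken by the order of the alternatives, and the function name \texttt{estimate\_ground\_truth} is hypothetical. On the profile and initial parameters of Example~\ref{example amle}, the sketch returns the first-iteration estimates $\hat{S}^{(1)}$ reported there.

\begin{verbatim}
```python
import math


def estimate_ground_truth(ballot, alternatives, p, q, t, l, u):
    """Sketch of the ground-truth step for one instance.

    ballot[i]: approval set of voter i on this instance.
    p[i], q[i]: reliabilities of voter i, assumed in (0, 1).
    t[a]: prior inclusion probability of a, assumed in (0, 1).
    l, u: bounds on the size of the ground truth.
    """
    score = {}
    for a in alternatives:
        # Log-likelihood gain of including a rather than excluding it.
        s = math.log(t[a] / (1 - t[a]))
        for i, approved in enumerate(ballot):
            if a in approved:
                s += math.log(p[i] / q[i])
            else:
                s += math.log((1 - p[i]) / (1 - q[i]))
        score[a] = s
    # Decreasing score; sorted() is stable, so ties keep the
    # order of `alternatives` (a fixed tie-breaking priority).
    ranked = sorted(alternatives, key=lambda a: -score[a])
    k_pos = sum(1 for a in alternatives if score[a] > 0)
    # Keep positive-gain alternatives (at most u), and pad with
    # the next best ones if fewer than l are positive.
    k = min(u, max(l, k_pos))
    return set(ranked[:k])


# Profile and initial parameters of the worked example.
alts = ["a1", "a2", "a3", "a4", "a5"]
profile = [
    [{"a1", "a4"}, {"a2"}, {"a2", "a3", "a4"}],
    [{"a1"}, {"a5"}, {"a2", "a3", "a5"}],
    [{"a3"}, {"a4"}, {"a2", "a3"}],
    [{"a1"}, {"a1"}, {"a3"}],
]
p0, q0 = [0.5, 0.5, 0.5], [0.44, 0.41, 0.32]
t0 = {a: 0.5 for a in alts}
S1 = [estimate_ground_truth(b, alts, p0, q0, t0, l=1, u=2)
      for b in profile]
```
\end{verbatim}

Keeping at most $u$ positive-gain alternatives and padding up to $l$ with the best remaining ones realizes the two cardinality conditions of Algorithm~\ref{algo}. Zero-gain alternatives are added only when needed, which leaves the likelihood unchanged.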

The algorithm runs until a convergence criterion is met, namely until the norm of the change in the parameter estimates falls below $\varepsilon$.
In practice we chose the $\ell_\infty$ norm, but any other norm could be used in Algorithm~\ref{algo}. In finite dimensions all norms are equivalent: if a sequence converges according to one norm, then it does so for any norm.
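For instance, write $d=2n+m$ for the dimension of $\hat{\theta}^{(v)}$. The $\ell_\infty$ and Euclidean norms then satisfy
\begin{equation*}
    \|x\|_\infty \;\leq\; \|x\|_2 \;\leq\; \sqrt{d}\,\|x\|_\infty \qquad \text{for all } x \in \mathbb{R}^d .
\end{equation*}
Hence the stopping test $\|\hat{\theta}^{(v+1)}-\hat{\theta}^{(v)}\|_\infty \leq \varepsilon$ implies $\|\hat{\theta}^{(v+1)}-\hat{\theta}^{(v)}\|_2 \leq \sqrt{d}\,\varepsilon$. Conversely, a bound $\varepsilon$ in $\ell_2$ implies the same bound in $\ell_\infty$.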

%, the convergence is proven for any norm in Theorem \ref{amle conv}. We do not specify the norm in the Algo \ref{algo} but in practice $\ell_\infty$ is used. 
%We do not specify the norm, since in finite dimensions all the norms are equivalent (if a sequence converges according to one norm then it does so for any norm).
%\cj{Same here: this is very hard to understand without an introduction explaining what we are going to do and why.}

We denote by $\hat{\theta}^{(v)} = (\hat{p}^{(v)},\hat{q}^{(v)},\hat{t}^{(v)})$ the vector of parameter estimates at iteration $v$.
%\cj{Ambiguous, since we said that the parameters are not known.}
It contains the voters' estimated noise parameters and the estimated prior parameters.
%\cj{The notation $k$ is already taken: I don't think so.}
In particular, $\hat{\theta}^{(0)}$ is the vector of initial values given as input. Their exact choice depends on the application at hand.

Note that at convergence, only local optimality is guaranteed.

%\footnotetext{$(S_{max,z}^{(v)}, S_{tie,z}^{(v)},S_{min,z}^{(v)}) $ is a partition of $\mathcal{A}$ as defined in Def~\ref{threshold_partition} for parameters $(p,q,t)= \hat{\theta}^{(v)}$ and approval profile $(A^z_i)_{i \in N}$.}



\begin{theorem}\label{amle conv}
%Given the set of approval ballots $(A_i^l)$ and 
For any initial values $\hat{\theta}^{(0)}$,
%we have that: 
%   \begin{small}
%\begin{equation*}\label{monotonicity}
%    \begin{split}
%     \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)}) & \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})
%     \geq \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})    
%    \end{split}
%\end{equation*}
%\end{small}
%In addition, 
\textit{AMLE} converges to a fixed point after a finite number of iterations.
\end{theorem}

We only provide a proof sketch here and defer the full proof to the Appendix.
\begin{proof}
First, by Theorem \ref{constrained}, $\hat{S}^{(v+1)}$ maximizes $\mathcal{L}(A,S,\hat{\theta}^{(v)})$ over all admissible ground truths $S$. Since $\hat{S}^{(v)}$ is one of them,
%\cj{To me it is not clear how this follows from Theorem \ref{constrained}. Also, I would prefer numbering the results as follows: Proposition 1, Theorem 2, Proposition 3, Lemma 4, Theorem 5, etc.}
we have:
$$\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})  \geq \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)}) $$

%To prove that $\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})  \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})$ we use the fact that we update $(p,q,t)$ by their MLE. By Proposition \ref{pq} we have that
%$(\hat{p}^{(v+1)},\hat{q}^{(v+1)}) = \argmax_{(p,q)} \mathcal{L}(A,\hat{S}^{(v+1)},p,q,\hat{t}^{(v)})$.
%Also by Proposition \ref{prior}, and since we apply it sequentially to update $t_j$ we have:

The parameters $(p,q)$, and then each $t_j$, are updated by their (conditional) MLEs. Propositions \ref{pq} and \ref{prior} therefore yield:
$$ \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})  \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})$$
Hence, the likelihood never decreases from one iteration to the next. The ground truth can take only finitely many values (at most $2^{m L}$). Together, these two facts guarantee that the algorithm reaches a fixed point after finitely many iterations.
%To prove convergence, it suffices to show that $\hat{S}^{(v)}=\hat{S}^{(v+1)}$ for some $v$ (which guarantees the estimators staying unchanged hereafter). 
%To prove the convergence, notice that the ground truth has a finite number of possible values (exactly $2^{m\times L}$) and that if $\hat{S}^{k}=\hat{S}^{k+1}$ for some $k$ then the algorithm converges (the estimators will remain the same after any number of iterations).
%Notice that the ground truth has a finite number of possible values (exactly $2^{m L}$), leading
%Since the number of possible values is finite, 
%the algorithm to cycle at some iteration. For the sake of simplicity, suppose that this cycle is of length $2$, in other words, suppose that $\hat{S}^{(v+2)}=\hat{S}^{(v)}$ for some $v$; this also implies that $\hat{\theta}^{(v+2)}=\hat{\theta}^{(v)}$. So:
%$$    \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})  = \mathcal{L}(A,\hat{S}^{(v+2)},\hat{\theta}^{(v+2)}) \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})
%$$
%By optimality of $\hat{S}^{(v+1)}$, we have also that: 
%$$ \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) \geq \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})$$
%\begin{align*}
%    \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)}) & = \mathcal{L}(A,\hat{S}^{(v+2)},\hat{\theta}^{(v+2)})\\
%    & \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})\\
%    & \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})
%\end{align*}
%So: $$ \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)}) \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) ~~ (2)$$
%By $(1)$ and $(2)$,  
%Hence, we get that: $$\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) = \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})$$
%and thus,
%in the absence of ties 
%$\hat{S}^{(v+1)}=\hat{S}^{(v)}=\argmax_{S \in \mathcal{S}_{l,u}} \mathcal{L}(A,S,\hat{\theta}^{(v)})$
%and the estimators will remain the same after any number of iterations following $v$.
\end{proof}


\begin{comment}
\begin{proof}
First we have by Theorem \ref{constrained}
%\cj{To me it is not clear how this follows from Theorem \ref{constrained}. Also, I would prefer numbering the results as follows: Proposition 1, Theorem 2, Proposition 3, Lemma 4, Theorem 5, etc.}
that $ \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) = \max_{S \in \mathcal{S}} \mathcal{L}(A,S,\hat{\theta}^{(v)})$,
and we have in particular that: 
$$\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})  \geq \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)}) $$

To prove that $\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})  \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})$ we use the fact that we update $(p,q,t)$ by their MLE. By Proposition \ref{pq} we have that
$(\hat{p}^{(v+1)},\hat{q}^{(v+1)}) = \argmax_{(p,q)} \mathcal{L}(A,\hat{S}^{(v+1)},p,q,\hat{t}^{(v)})$.
Also by Proposition \ref{prior}, and since we apply it sequentially to update $t_j$ we have:
$$ \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})  \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})$$

To prove convergence, it suffices to show that $\hat{S}^{(v)}=\hat{S}^{(v+1)}$ for some $v$ (which guarantees the estimators staying unchanged hereafter). 
%To prove the convergence, notice that the ground truth has a finite number of possible values (exactly $2^{m\times L}$) and that if $\hat{S}^{k}=\hat{S}^{k+1}$ for some $k$ then the algorithm converges (the estimators will remain the same after any number of iterations).
Notice that the ground truth has a finite number of possible values (exactly $2^{m L}$), leading
%Since the number of possible values is finite, 
the algorithm to cycle at some iteration. For the sake of simplicity, suppose that this cycle is of length $2$, in other words, suppose that $\hat{S}^{(v+2)}=\hat{S}^{(v)}$ for some $v$; this also implies that $\hat{\theta}^{(v+2)}=\hat{\theta}^{(v)}$. So:
$$    \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})  = \mathcal{L}(A,\hat{S}^{(v+2)},\hat{\theta}^{(v+2)}) \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})
$$
By optimality of $\hat{S}^{(v+1)}$, we have also that: 
$$ \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) \geq \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})$$
%\begin{align*}
%    L(A,\hat{S}^{(v)},\hat{\theta}^{(v)}) & = L(A,\hat{S}^{(v+2)},\hat{\theta}^{(v+2)})\\
%    & \geq L(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})\\
%    & \geq L(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)})
%\end{align*}
%So: $$ L(A,\hat{S}^{(v)},\hat{\theta}^{(v)}) \geq L(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) ~~ (2)$$
%By $(1)$ and $(2)$,  
Hence, we get that: $$\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) = \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})$$
and thus,
%in the absence of ties 
$\hat{S}^{(v+1)}=\hat{S}^{(v)}=\argmax_{S \in \mathcal{S}_{l,u}} \mathcal{L}(A,S,\hat{\theta}^{(v)})$
and the estimators will remain the same after any number of iterations following $v$.
\end{proof}
\end{comment}
%As an important remark, we proved that
Because $\mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v+1)})  \geq \mathcal{L}(A,\hat{S}^{(v+1)},\hat{\theta}^{(v)}) \geq \mathcal{L}(A,\hat{S}^{(v)},\hat{\theta}^{(v)})$, the likelihood never decreases during the execution of the algorithm. Hence, whenever the execution is stopped, the current estimates are at least as likely as the initial ones. The algorithm can therefore be run not only until convergence but also as an anytime algorithm.
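The anytime behavior can be illustrated with the following Python sketch. To keep it short, it makes a simplifying assumption that is ours and not part of Algorithm~\ref{algo}. The prior parameters $t$ are held fixed, so the normalization $\beta(l,u,t)$ is constant and is dropped from the log-likelihood. Probabilities are clipped to $[10^{-6},1-10^{-6}]$ to keep logarithms finite, and all function names are hypothetical. The list \texttt{history} records the log-likelihood after each half-step. As argued above, it never decreases, so the loop can be interrupted at any iteration (here through \texttt{max\_iter}).

\begin{verbatim}
```python
import math

EPS = 1e-6  # clip probabilities so every logarithm stays finite


def clip(x):
    return min(max(x, EPS), 1 - EPS)


def ground_truth(ballot, alts, p, q, t, l, u):
    # Constrained MLE of one ground truth (log-likelihood-gain scores).
    score = {}
    for a in alts:
        s = math.log(t[a] / (1 - t[a]))
        for i, A in enumerate(ballot):
            if a in A:
                s += math.log(p[i] / q[i])
            else:
                s += math.log((1 - p[i]) / (1 - q[i]))
        score[a] = s
    ranked = sorted(alts, key=lambda a: -score[a])
    k = min(u, max(l, sum(score[a] > 0 for a in alts)))
    return set(ranked[:k])


def update_pq(profile, S, n_alts, n_voters):
    # Clipped MLE of (p_i, q_i) given the current ground truths S.
    pos = sum(len(s) for s in S)
    neg = n_alts * len(S) - pos
    p = [clip(sum(len(b[i] & s) for b, s in zip(profile, S)) / max(pos, 1))
         for i in range(n_voters)]
    q = [clip(sum(len(b[i] - s) for b, s in zip(profile, S)) / max(neg, 1))
         for i in range(n_voters)]
    return p, q


def log_lik(profile, S, alts, p, q, t):
    # Log-likelihood without beta(l, u, t), constant since t is fixed.
    ll = 0.0
    for ballot, s in zip(profile, S):
        for a in alts:
            ll += math.log(t[a] if a in s else 1 - t[a])
            for i, A in enumerate(ballot):
                r = p[i] if a in s else q[i]
                ll += math.log(r if a in A else 1 - r)
    return ll


def amle_fixed_prior(profile, alts, p, q, t, l, u, eps=1e-5, max_iter=100):
    # Alternate the two MLE steps (assumes max_iter >= 1); stopping
    # early still returns the latest, at-least-as-likely estimate.
    history = []
    for _ in range(max_iter):
        S = [ground_truth(b, alts, p, q, t, l, u) for b in profile]
        history.append(log_lik(profile, S, alts, p, q, t))
        p_new, q_new = update_pq(profile, S, len(alts), len(p))
        history.append(log_lik(profile, S, alts, p_new, q_new, t))
        # l_inf norm of the parameter change (only p and q move here)
        delta = max(abs(x - y) for x, y in zip(p_new + q_new, p + q))
        p, q = p_new, q_new
        if delta <= eps:
            break
    return S, p, q, history


alts = ["a1", "a2", "a3", "a4", "a5"]
profile = [
    [{"a1", "a4"}, {"a2"}, {"a2", "a3", "a4"}],
    [{"a1"}, {"a5"}, {"a2", "a3", "a5"}],
    [{"a3"}, {"a4"}, {"a2", "a3"}],
    [{"a1"}, {"a1"}, {"a3"}],
]
t0 = {a: 0.5 for a in alts}
S, p, q, history = amle_fixed_prior(
    profile, alts, [0.5] * 3, [0.44, 0.41, 0.32], t0, l=1, u=2)
```
\end{verbatim}

Clipping does not break monotonicity. Each one-dimensional log-likelihood in $p_i$ or $q_i$ is concave, so the clipped MLE is still the maximizer over the clipped interval.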
% Moreover, we do not specify the norm, since in finite dimensions all the norms are equivalent (if a sequence converges according to one norm then it does so for any norm).

\begin{example}\label{example amle}
Take $n = 3$, $m = 5$, $l=1$, $u=2$, 
%Let $\Voters=\{1,2,3\}$ and $\mathcal{A}=\{a_1,\dots,a_5\}$. The sizes of the ground truths of
$L=4$, and the following profile and initial parameters:
%are between $l=1$ and $u=2$.
%The approval profiles are given in Table \ref{approval profile}: 
\begin{table}[h]
    \centering
    \begin{tabular}{|l|c|c|c|c|}
  \hline
 & $A^1$  & $A^2$ & $A^3$ & $A^4$ \\
  \hline
  Voter $1$ & $\{a_1,a_4\}$ & $\{a_1\}$ & $\{a_3\}$ & $\{a_1\}$ \\
  \hline
  Voter $2$ & $\{a_2\}$ & $\{a_5\}$ & $\{a_4\}$ & $\{a_1\}$\\
  \hline
  Voter $3$ & $\{a_2,a_3,a_4\}$ & $\{a_2,a_3,a_5\}$ & $\{a_2,a_3\}$ & $\{a_3\}$ \\
  \hline 
\end{tabular}
    %\caption{Approval ballots of 3 voters on 4 instances}
    \label{approval profile}
\end{table}
$$\left\{
    \begin{array}{lll}
        \hat{p}_1^{(0)} = 0.5 & \hat{p}_2^{(0)} = 0.5 & \hat{p}_3^{(0)} = 0.5  \\
        \hat{q}_1^{(0)} = 0.44 & \hat{q}_2^{(0)}  =0.41 & \hat{q}_3^{(0)}  =0.32 \\
        \multicolumn{3}{l}{\hat{t}_1^{(0)}=\dots=\hat{t}_5^{(0)}=0.5}
    \end{array}
\right.$$
%$$\left\{
%    \begin{array}{ll}
%        (\hat{p}_1^{(0)},\hat{q}_1^{(0)}) & =(0.5,0.44)  \\
%        (\hat{p}_2^{(0)},\hat{q}_2^{(0)}) & =(0.5,0.41)\\
%        (\hat{p}_3^{(0)},\hat{q}_3^{(0)}) & = (0.5,0.32)\\
%        \hat{t}_1^{(0)}=\dots=\hat{t}_5^{(0)} & =0.5
%    \end{array}
%\left.$$

\paragraph{Estimating the ground truth:} The first step applies Theorem \ref{constrained} to estimate the ground truth of each instance given the initial parameters. This yields
$\hat{S}_1^{(1)}= \{a_2,a_4\}, \hat{S}_2^{(1)} = \{a_2,a_5\}, \hat{S}_3^{(1)} = \{a_2,a_3\}, \hat{S}_4^{(1)} = \{a_1,a_3\}$.

\paragraph{Estimating the voter reliabilities:} In the next step we use these estimates of the ground truths to compute the MLEs of the voter reliabilities.
%For instance, voter $1$ has only $2$ false positive labels from a total of $12$ negative labels so $\hat{q}_1^{(1)}=\frac{2}{12}=0.17$. And she has $3$ out of $8$ true positives so $\hat{p}_1^{(1)}=\frac{3}{8}=0.38$. In final, we get:
Take voter $1$ as an instance. Of the $12$ (instance, alternative) pairs estimated as negative, she approves $2$ (false positives), so $\hat{q}_1^{(1)}=\frac{2}{12}=0.17$. Of the $8$ pairs estimated as positive, she approves $3$ (true positives), so $\hat{p}_1^{(1)}=\frac{3}{8}=0.38$. Overall, we get:
$$\left\{
    \begin{array}{lll}
        \hat{p}_1^{(1)}=0.38  & \hat{p}_2^{(1)}=0.38 & \hat{p}_3^{(1)}=0.88\\
        \hat{q}_1^{(1)}=0.17  & \hat{q}_2^{(1)}=0.08 & \hat{q}_3^{(1)}=0.17
    \end{array}
\right.$$
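The counting above can be checked mechanically. The Python sketch below implements the $(p_i,q_i)$ update of Algorithm~\ref{algo} with exact fractions; the function name \texttt{update\_reliabilities} is hypothetical. Here $\hat{p}_i$ is the share of estimated positive labels that voter $i$ approves, and $\hat{q}_i$ is the share of estimated negative labels she approves. Applied to the profile above and to $\hat{S}^{(1)}$, it returns $\hat{p}^{(1)}=(3/8,3/8,7/8)$ and $\hat{q}^{(1)}=(1/6,1/12,1/6)$, which round to the values displayed.

\begin{verbatim}
```python
from fractions import Fraction


def update_reliabilities(profile, S, m):
    """MLE of (p_i, q_i) given estimated ground truths (sketch).

    profile[z][i]: voter i's approval set on instance z;
    S[z]: estimated ground truth of instance z;
    m: number of alternatives.
    """
    n = len(profile[0])
    n_pos = sum(len(s) for s in S)  # pairs estimated as positive
    n_neg = m * len(S) - n_pos      # pairs estimated as negative
    p, q = [], []
    for i in range(n):
        # True positives: approved alternatives inside S[z].
        tp = sum(len(b[i] & s) for b, s in zip(profile, S))
        # False positives: approved alternatives outside S[z].
        fp = sum(len(b[i] - s) for b, s in zip(profile, S))
        p.append(Fraction(tp, n_pos))
        q.append(Fraction(fp, n_neg))
    return p, q


profile = [
    [{"a1", "a4"}, {"a2"}, {"a2", "a3", "a4"}],
    [{"a1"}, {"a5"}, {"a2", "a3", "a5"}],
    [{"a3"}, {"a4"}, {"a2", "a3"}],
    [{"a1"}, {"a1"}, {"a3"}],
]
S1 = [{"a2", "a4"}, {"a2", "a5"}, {"a2", "a3"}, {"a1", "a3"}]
p1, q1 = update_reliabilities(profile, S1, m=5)
# p1 holds 3/8, 3/8, 7/8 and q1 holds 1/6, 1/12, 1/6
```
\end{verbatim}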

\paragraph{Estimating the prior parameters:} The final step of this iteration updates the estimates of the prior parameters by applying Proposition \ref{prior} sequentially.
First, we estimate $\hat{t}_1^{(1)}$ by maximum likelihood, given $\hat{S}^{(1)}$ and $\hat{t}_2^{(0)},\dots,\hat{t}_5^{(0)}$. To this end, we compute $\overline{\alpha}_1=\beta(0,1,t_2,\dots,t_5)=0.3125$, $\underline{\alpha}_1=\beta(1,2,t_2,\dots,t_5)=1$ and $occ(a_1)=1$.
%$$\left\{
%    \begin{array}{ll}
%        \overline{\alpha}_1 & = \beta(0,1,t_2,\dots,t_5)=0.3125\\
%        \underline{\alpha}_1 & = \beta(1,2,t_2,\dots,t_5)=1\\
%        occ(a_1) & = 1
%    \end{array}
%\right.$$
Then the MLE of $t_1$ is:
$$\hat{t}_1^{(1)} = \frac{occ(a_1)\overline{\alpha}_1}{(L-occ(a_1))\underline{\alpha}_1 + occ(a_1)\overline{\alpha}_1} = \frac{1 \times 0.3125}{3 \times 1 + 1 \times 0.3125} \approx 0.09$$
The next steps are to estimate $\hat{t}_2^{(1)}$ given $\hat{t}_1^{(1)},\hat{t}_3^{(0)},\hat{t}_4^{(0)},\hat{t}_5^{(0)}$ and so on. Finally, we get:
\begin{small}
$$\hat{t}_1^{(1)} = 0.09,\hat{t}_2^{(1)} = 0.56 , \hat{t}_3^{(1)} = 0.28, \hat{t}_4^{(1)} = 0.14, \hat{t}_5^{(1)} = 0.20 $$
\end{small}


We fix $\varepsilon=10^{-5}$ and repeat all steps until convergence (according to $\ell_{\infty}$),
%\cj{Have we already discussed the choice of $\ell_{\infty}$?}
which takes $5$ full iterations. At the fixed point, the estimated ground truths are:
$$\hat{S}_1 = \{a_2,a_3\},  \hat{S}_2 = \{a_2,a_3\}, \hat{S}_3 = \{a_2,a_3\}, \hat{S}_4 = \{a_3\}$$
\end{example}

\section{Experiments}\label{sec: experiments}
\subsection{Experiment Design and Data Collection}
We designed an image annotation task as a football quiz.\footnote{The dataset and code are accessible at \url{https://github.com/taharallouche/Football-Quiz-Crowdsourcing}} We selected $15$ pictures taken during different matches between two of the following teams: Real Madrid, Inter Milan, Bayern Munich, Barcelona, Paris Saint-Germain.
%(PSG). 
Each picture shows players from either one or both of the two teams, hence $l=1$ and $u=2$, with $m=5$ alternatives. Each participant is shown the instances one by one and, for each of them, is asked to select all the teams she can spot (see Figure \ref{Quiz}). We designed a simple incentive scheme that ranks participants according to the following principle:
%\cj{Should we cite \cite{approval2015}? T: No, it is not the same mechanism, and in any case they focus on the single-winner case.}
\begin{compactitem}
    %\item The participants whose answers contain all the right alternatives to an image she get $1$ point. They participants are then ranked according to the sum of cumulated points.
    \item Participants get one point for each picture on which their answer contains all the correct alternatives. They are then ranked according to their cumulative points.
    \item Ties are broken in favor of the participant who selected fewer alternatives overall.
\end{compactitem}
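The ranking rule can be sketched in Python as follows. The data layout (a dictionary from participant to per-picture approval sets), the participant names and the function name \texttt{rank\_participants} are our own illustrative choices. The toy example shows why the tie-breaking criterion matters. A participant who selects every team on every picture collects all the points, but is ranked below an equally successful participant with smaller answers.

\begin{verbatim}
```python
def rank_participants(answers, truths):
    """Rank quiz participants by the incentive rule (sketch).

    answers: dict mapping a participant to her list of approval
    sets, one per picture; truths: list of correct team sets.
    """
    def sort_key(name):
        ballots = answers[name]
        # One point per picture whose answer contains all correct teams.
        points = sum(1 for A, S in zip(ballots, truths) if S <= A)
        # Tie-break: fewer selected teams overall ranks first.
        n_selected = sum(len(A) for A in ballots)
        return (-points, n_selected)

    return sorted(answers, key=sort_key)


teams = {"Real Madrid", "Inter Milan", "Bayern Munich",
         "Barcelona", "Paris Saint-Germain"}
truths = [{"Real Madrid", "Barcelona"}, {"Paris Saint-Germain"}]
answers = {
    "alice": [{"Real Madrid", "Barcelona"}, {"Paris Saint-Germain"}],
    "bob": [set(teams), set(teams)],  # spammer: selects every team
    "carol": [{"Real Madrid"}, {"Paris Saint-Germain"}],
}
ranking = rank_participants(answers, truths)
# ranking == ["alice", "bob", "carol"]
```
\end{verbatim}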
\begin{figure}[h]
     \centering
         \includegraphics[width=0.40\textwidth]{figures/Example_QuizH.png}
    \caption{Example of Annotation Task}
    \label{Quiz}
\end{figure}

 We collected the answers of $76$ participants, only two of whom spammed by selecting all the alternatives. Figure \ref{hist_all} shows that voters responded well to the incentive scheme, mostly selecting one or two alternatives.
\begin{figure}[h]
     \centering
         \includegraphics[width=0.40\textwidth]{figures/hist_all.png}
    \caption{Histogram of answer sizes}
    \label{hist_all}
\end{figure}
\subsection{Anna Karenina's Initialization}
%In light of the results 
%\cj{What do you mean? T : Inspired by} the results 
%and observations 

Inspired by the \emph{Anna Karenina Principle} in~\cite{truth2019}, we assign more weight to voters who are \emph{closer} to the others on average, initializing the precision parameters $(p_i,q_i)$ accordingly. %who express the expected average distance of a voter to the others as a linear transformation of her error-rate,
%we initialize the precision parameters $(p,q)$ in a way that assigns more weight to the voters who are \emph{closer} to each other in average 
%(we used the Jaccard dissimilarity measure $d_{Jacc}(A,B)=\frac{|\overline{A}\cap B|+|\overline{B}\cap A|}{|\overline{A}\cap B|+|\overline{B}\cap A|+|A\cap B|}$). 
This suits our context, where voter competence is highly polarized. Some voters are experts and cast similar answers close to the ground truth. The others are less reliable, and their answers are spread over all possible combinations.

We use the following heuristic for the initialization (see Algorithm \ref{init_karenina}):
\begin{algorithm}
\caption{ Initializing $(p_i,q_i)_i$ }
\label{init_karenina}
$\begin{array}{ll}
\textbf{Input:} & \mbox{Approval ballots $(A_i^z)_{z,i}$}\\
\textbf{Output:} &\mbox{Initialization $(\hat{p}^{(0)}_i,\hat{q}^{(0)}_i)$} 
\end{array}$
\begin{algorithmic} 
\STATE -Compute $w_{max}=\frac{n}{1+n}, w_{min}=\frac{1}{1+n}$
\STATE -Compute $d_i = \sum\limits_{j\neq i} d_{Jacc}(A_i,A_j)$ (Jaccard distance)
\STATE -Compute $d_{max} = \max d_i, d_{min} = \min d_i $
\STATE -Compute $w_i = (w_{max}-w_{min})\left(\frac{\nicefrac{1}{d_i}-\nicefrac{1}{d_{max}}}{\nicefrac{1}{d_{min}}-\nicefrac{1}{d_{max}}}\right)+w_{min}$
\STATE -Fix $\hat{p}^{(0)}_i= \frac{1}{2}$ and $\hat{q}^{(0)}_i=\frac{1-\frac{e^{w_i}-1}{e^{w_i}+1}}{2}$
\end{algorithmic}
\end{algorithm}

%\begin{remark}
%The formulas in \bj 
Algorithm \ref{init_karenina} guarantees two properties. First, the initial parameters $(\hat{p}^{(0)}_i,\hat{q}^{(0)}_i)$ of each voter give her an initial weight equal to $w_i$. Second, $\frac{w_{max}}{w_{min}}=n$. Therefore, initially, the voter who is closest on average to the other voters counts $n$ times more than the voter with the largest average distance.
%\end{remark}
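To see why, assume, as we do here, that the weight of voter $i$ is the log-odds $\ln\frac{p_i(1-q_i)}{q_i(1-p_i)}$ associated with the noise model. Since $\frac{e^{w}-1}{e^{w}+1}=\tanh\frac{w}{2}$, Algorithm~\ref{init_karenina} sets $\hat{p}^{(0)}_i=\frac{1}{2}$ and $\hat{q}^{(0)}_i=\frac{1}{2}\left(1-\tanh\frac{w_i}{2}\right)$. Hence
\begin{equation*}
    \ln\frac{\hat{p}^{(0)}_i(1-\hat{q}^{(0)}_i)}{\hat{q}^{(0)}_i(1-\hat{p}^{(0)}_i)}
    = \ln\frac{1-\hat{q}^{(0)}_i}{\hat{q}^{(0)}_i}
    = \ln\frac{1+\tanh(w_i/2)}{1-\tanh(w_i/2)}
    = w_i .
\end{equation*}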

In the Appendix we give an example illustrating this initialization, and an empirical comparison with other classical initializations.
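Algorithm~\ref{init_karenina} translates almost directly into Python; the function names are hypothetical. In this sketch, $d_{Jacc}(A_i,A_j)$ is computed after pooling each voter's ballots over all instances into a single set of (instance, alternative) pairs. This reading is our assumption. With it, the sketch recovers the initial values $\hat{q}^{(0)}=(0.44,0.41,0.32)$ of Example~\ref{example amle} up to rounding. The algorithm does not cover the case where all $d_i$ are equal; the sketch then arbitrarily gives every voter $w_{max}$.

\begin{verbatim}
```python
import math


def jaccard(A, B):
    """Jaccard distance 1 - |A & B| / |A | B| (0.0 if both empty)."""
    union = A | B
    return 1 - len(A & B) / len(union) if union else 0.0


def karenina_init(profile):
    """Sketch of the Anna Karenina initialization of (p_i, q_i).

    profile[z][i] is voter i's approval set on instance z. Each
    voter's ballots are pooled into one set of (instance,
    alternative) pairs; all d_i are assumed to be positive.
    """
    n = len(profile[0])
    pooled = []
    for i in range(n):
        pooled.append({(z, a) for z, ballot in enumerate(profile)
                       for a in ballot[i]})
    d = [sum(jaccard(pooled[i], pooled[j]) for j in range(n) if j != i)
         for i in range(n)]
    w_max, w_min = n / (1 + n), 1 / (1 + n)
    d_max, d_min = max(d), min(d)
    p0, q0 = [], []
    for d_i in d:
        if d_max == d_min:
            w = w_max  # degenerate case: our arbitrary choice
        else:
            w = ((w_max - w_min) * (1 / d_i - 1 / d_max)
                 / (1 / d_min - 1 / d_max) + w_min)
        p0.append(0.5)
        q0.append((1 - (math.exp(w) - 1) / (math.exp(w) + 1)) / 2)
    return p0, q0


profile = [
    [{"a1", "a4"}, {"a2"}, {"a2", "a3", "a4"}],
    [{"a1"}, {"a5"}, {"a2", "a3", "a5"}],
    [{"a3"}, {"a4"}, {"a2", "a3"}],
    [{"a1"}, {"a1"}, {"a3"}],
]
p0, q0 = karenina_init(profile)
```
\end{verbatim}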

\begin{comment}
\begin{example}
Recall the approval profile in Table \ref{approval profile}. Here we have $w_{max}=0.75, w_{min}=0.25 $.
First, compute the Jaccard distance of all voters:
$d_1=1.71, d_2=1.69,d_3=1.65 $.
So $d_{max}=d_1=1.71$ and $d_{min}=d_3=1.65$, which means that voter $3$ will get the biggest weight $w_3=w_{max}=0.75$ and voter $1$ gets the smallest weight $w_1=w_{min}$.
Next, compute the weight that will be assigned to each voter, for instance:
$$w_2=(w_{max}-w_{min})\frac{\frac{1}{d_2}-\frac{1}{d_{max}}}{\frac{1}{d_{min}}-\frac{1}{d_{max}}}+w_{min}=0.38 $$
Now we can set the initial values for the reliability parameters accordingly:
$$\left\{
    \begin{array}{lll}
        \hat{p}_1^{(0)} = 0.5 & \hat{p}_2^{(0)} = 0.5 & \hat{p}_3^{(0)} = 0.5  \\
        \hat{q}_1^{(0)} = 0.44 & \hat{q}_2^{(0)}  =0.41 & \hat{q}_3^{(0)}  =0.32 \\
    \end{array}
\right.$$


%$$\left\{
%    \begin{array}{ll}
%        (\hat{p}^{(0)}_1,\hat{q}^{(0)}_1) & =(0.5,0.44) \\
%        (\hat{p}^{(0)}_2,\hat{q}^{(0)}_2) & =(0.5,0.41)\\
%        (\hat{p}^{(0)}_3,\hat{q}^{(0)}_3) & =(0.5,0.32)\\
%    \end{array}
%\right.$$
\end{example}
\end{comment}



\subsection{Results}

To assess the importance of prior information on the size of the ground truth, we ran AMLE with two choices of bounds. With free bounds $(l,u)=(0,m)$, it is referred to as $\mbox{AMLE}_f$. With the bounds $(l,u)=(1,2)$, which match our prior knowledge, it is referred to as $\mbox{AMLE}_c$.
%which correspond with the prior knowledge. 
We also applied two baselines. The modal rule \cite{Evaluating2020} outputs the subset of alternatives that appears most frequently as an approval ballot, $\argmax_{S \in \mathcal{S}} \left|\{i \in \Voters : A_i = S\}\right|$. A variant of the label-wise majority rule outputs the subset of alternatives $S$ such that $a \in S \iff \left|\{i \in \Voters : a \in A_i\}\right| > \frac{n}{2}$. If this subset is empty, it is replaced by the alternative with the highest approval count. If it contains more than two alternatives, only the top-2 alternatives are kept.
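The two baselines can be sketched in Python as follows. The text does not specify tie-breaking, so the choices made here are ours. The modal rule returns the first ballot reaching the maximal count. The majority variant ranks alternatives with equal approval counts in their given order. The function names are hypothetical, and the example profile is that of Example~\ref{example amle}.

\begin{verbatim}
```python
from collections import Counter


def modal_rule(ballots):
    """Most frequent approval ballot; ties go to the one seen first."""
    counts = Counter(frozenset(A) for A in ballots)
    # max() returns the first maximal key (order of first appearance).
    return set(max(counts, key=counts.get))


def majority_variant(ballots, alternatives, size_cap=2):
    """Label-wise majority, repaired to return 1..size_cap items."""
    n = len(ballots)
    count = {a: sum(a in A for A in ballots) for a in alternatives}
    # Decreasing approval count; equal counts keep the given order.
    ranked = sorted(alternatives, key=lambda a: -count[a])
    S = [a for a in ranked if count[a] > n / 2]
    if not S:
        S = ranked[:1]  # empty: take the most approved alternative
    return set(S[:size_cap])  # too many: keep the top ones


alts = ["a1", "a2", "a3", "a4", "a5"]
profile = [
    [{"a1", "a4"}, {"a2"}, {"a2", "a3", "a4"}],
    [{"a1"}, {"a5"}, {"a2", "a3", "a5"}],
    [{"a3"}, {"a4"}, {"a2", "a3"}],
    [{"a1"}, {"a1"}, {"a3"}],
]
modal = [modal_rule(b) for b in profile]
majority = [majority_variant(b, alts) for b in profile]
```
\end{verbatim}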

%\begin{table}[h]
%    \centering
%    \begin{tabular}{|l|c|c|c|c|}
%  \hline
%   &$\mbox{AMLE}_c$  & $\mbox{AMLE}_f$ & Modal & Majority \\
%  \hline
%  Hamming & \textbf{0.88} & 0.86 & 0.84 & 0.80 \\
%  \hline
%  0/1 & \textbf{0.60} & 0.53  & 0.46 & 0.26\\
%  \hline
%\end{tabular}
%    \caption{Hamming and 0/1 accuracy for entire dataset}
%    \label{entire_dataset}
%    \vspace{-2mm}
%\end{table}


For $n$ ranging from $10$ to $74$, we drew $20$ batches of $n$ voters at random and applied the four methods to each of them (see Figures \ref{foot_voters_ham} and \ref{foot_voters_01}). As is common in the literature~\cite{aggregation2020}, we use two metrics and report their $95\%$ confidence intervals. The Hamming accuracy is
$\frac{1}{m L} \sum_{z=1}^L \left(|S^*_z \cap \hat{S}_z|+|\overline{S^*_z} \cap \overline{\hat{S}_z}|\right)$,
and the 0/1 accuracy is
$\frac{1}{L} \sum_{z=1}^L \mathds{1}\{S^*_z = \hat{S}_z\}$.
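Both metrics are straightforward to compute. In the following Python sketch (hypothetical function names), each ground truth and each estimate is a set of alternatives per instance. As an illustration only, and not as an experimental result, we compare the first-iteration estimates $\hat{S}^{(1)}$ of Example~\ref{example amle} with the fixed point reached there, treated as if it were the true ground truth. The Hamming accuracy is then $15/20=0.75$ and the 0/1 accuracy is $1/4$.

\begin{verbatim}
```python
def hamming_accuracy(truth, estimate, alternatives):
    """Share of (instance, alternative) labels where both agree."""
    agree = sum((a in S) == (a in T)
                for S, T in zip(estimate, truth) for a in alternatives)
    return agree / (len(alternatives) * len(truth))


def zero_one_accuracy(truth, estimate):
    """Share of instances whose ground truth is recovered exactly."""
    return sum(S == T for S, T in zip(estimate, truth)) / len(truth)


alts = ["a1", "a2", "a3", "a4", "a5"]
truth = [{"a2", "a3"}, {"a2", "a3"}, {"a2", "a3"}, {"a3"}]
estimate = [{"a2", "a4"}, {"a2", "a5"}, {"a2", "a3"}, {"a1", "a3"}]
ham = hamming_accuracy(truth, estimate, alts)  # 15 / 20
zo = zero_one_accuracy(truth, estimate)        # 1 / 4
```
\end{verbatim}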

\begin{figure}
\centering
\begin{minipage}[b]{0.92\linewidth}
\centering
         \includegraphics[width=\textwidth]{figures/Ham_proj.pdf}
             \subcaption{Hamming accuracy}
        \label{foot_voters_ham}
\end{minipage}
%\quad
\begin{minipage}[b]{0.92\linewidth}
\centering
         \includegraphics[width=\textwidth]{figures/01_proj.pdf}
        \subcaption{0/1 accuracy}
        \label{foot_voters_01}
\end{minipage}
\caption{Accuracies of different aggregation methods}
        \label{foot_voters}
\end{figure}


\begin{comment}
%\vspace{-1mm}
\begin{figure}[ht]
\centering
\begin{minipage}[b]{0.67\linewidth}
\centering
         \includegraphics[width=\textwidth]{Ham_proj.pdf}
             \subcaption{Hamming accuracy}
        \label{foot_voters_ham}
\end{minipage}
\quad
\begin{minipage}[b]{0.67\linewidth}
\centering
         \includegraphics[width=\textwidth]{01_proj.pdf}
        \subcaption{0/1 accuracy}
        \label{foot_voters_01}
\end{minipage}
\caption{Accuracies of different aggregation methods}
        \label{foot_voters}
        \vspace{-1mm}
\end{figure}
\end{comment}


%\paragraph{Observations:}
%First, 
We observe that AMLE outperforms both the majority and the modal rules. This can be explained by the fact that the latter do not take the voters' reliabilities into account. Comparing the performances of $\mbox{AMLE}_c$ and $\mbox{AMLE}_f$ highlights the importance of prior knowledge on the committee size for the quality of the estimation.


We also compared the execution times of $\mbox{AMLE}_c$ and $\mbox{AMLE}_f$ (see Figure \ref{time_comp}) on an Intel Core i7-10610U CPU @ 1.80\,GHz (4 cores, 8 threads) with 32\,GB of RAM. Unsurprisingly, $\mbox{AMLE}_c$ requires more running time, especially beyond $40$ voters.
\begin{figure}[h]
    \centering
    \includegraphics[width=0.45\textwidth]{figures/time.png}
    \caption{Execution time}
    \label{time_comp}
\end{figure}



\section{Conclusion}\label{conclusion}
We studied multi-winner approval voting from an epistemic point of view. The specificity of our work is threefold: (a) the ground truth consists of a set of alternatives; (b) the input consists of approval votes; (c) the competence of the various voters is not known {\em a priori} but learned from the input. We proposed a noise model that incorporates the prior belief about the size of the ground truth. We then derived an iterative algorithm that jointly estimates the ground truth labels, the voter noise parameters and the prior belief parameters, and we proved its convergence. Our algorithm is based on a simplification of Expectation-Maximization (EM). Its simple steps are easier to explain to voters than EM and other similar statistical learning approaches.

We mainly considered a general multi-instance task that fits the collective annotation framework, where each voter answers several questions on the same set of alternatives. The same algorithm nonetheless applies to single-instance problems where only one question is answered, such as the allocation of scarce medical resources. In this case, the prior parameters cannot be updated. It suffices to fix them once and for all and to alternate between the estimation of the ground truth and that of the voter parameters.

In some contexts ({\em e.g.}, patients in a hospital), alternatives and votes are not observed all at once but streamed. To cope with this online setting, we plan to extend AMLE in the spirit of \cite{online2009}.

\begin{acknowledgements} % will be removed in pdf for initial submission,
                         % so you can already fill it to test with the
                         % ‘accepted’ class option
    This work was funded in part by the French government under management of Agence Nationale de la Recherche as
part of the ``Investissements d’avenir'' program, reference
ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).
\end{acknowledgements}

\bibliography{allouche_258}


\end{document}
