
\section{Introduction}
ACM's consolidated article template, introduced in 2017, provides a
consistent \LaTeX\ style for use across ACM publications, and
incorporates accessibility and metadata-extraction functionality
necessary for future Digital Library endeavors. Numerous ACM and
SIG-specific \LaTeX\ templates have been examined, and their unique
features incorporated into this single new template.

If you are new to publishing with ACM, this document is a valuable
guide to the process of preparing your work for publication. If you
have published with ACM before, this document provides insight and
instruction into more recent changes to the article template.

The ``\verb|acmart|'' document class can be used to prepare articles
for any ACM publication --- conference or journal, and for any stage
of publication, from review to final ``camera-ready'' copy, to the
author's own version, with {\itshape very} few changes to the source.

\section{Template Overview}
As noted in the introduction, the ``\verb|acmart|'' document class can
be used to prepare many different kinds of documentation --- a
double-blind initial submission of a full-length technical paper, a
two-page SIGGRAPH Emerging Technologies abstract, a ``camera-ready''
journal article, a SIGCHI Extended Abstract, and more --- all by
selecting the appropriate {\itshape template style} and {\itshape
  template parameters}.

This document will explain the major features of the document
class. For further information, the {\itshape \LaTeX\ User's Guide} is
available from
\url{https://www.acm.org/publications/proceedings-template}.

\subsection{Template Styles}

The primary parameter given to the ``\verb|acmart|'' document class is
the {\itshape template style}, which corresponds to the kind of publication
or SIG publishing the work. This parameter is enclosed in square
brackets and is a part of the {\verb|documentclass|} command:
\begin{verbatim}
  \documentclass[STYLE]{acmart}
\end{verbatim}

Journals use one of three template styles. All but three ACM journals
use the {\verb|acmsmall|} template style:
\begin{itemize}
\item {\texttt{acmsmall}}: The default journal template style.
\item {\texttt{acmlarge}}: Used by JOCCH and TAP.
\item {\texttt{acmtog}}: Used by TOG.
\end{itemize}

The majority of conference proceedings documentation will use the {\verb|sigconf|} template style.
\begin{itemize}
\item {\texttt{sigconf}}: The default proceedings template style.
\item {\texttt{sigchi}}: Used for SIGCHI conference articles.
\item {\texttt{sigchi-a}}: Used for SIGCHI ``Extended Abstract'' articles.
\item {\texttt{sigplan}}: Used for SIGPLAN conference articles.
\end{itemize}

\subsection{Template Parameters}

In addition to specifying the {\itshape template style} to be used in
formatting your work, there are a number of {\itshape template parameters}
which modify some part of the applied template style. A complete list
of these parameters can be found in the {\itshape \LaTeX\ User's Guide.}

Frequently-used parameters, or combinations of parameters, include:
\begin{itemize}
\item {\texttt{anonymous,review}}: Suitable for a ``double-blind''
  conference submission. Anonymizes the work and includes line
  numbers. Use with the \verb|\acmSubmissionID| command to print the
  submission's unique ID on each page of the work.
\item{\texttt{authorversion}}: Produces a version of the work suitable
  for posting by the author.
\item{\texttt{screen}}: Produces colored hyperlinks.
\end{itemize}
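
As a minimal sketch of the first combination, a double-blind
submission might begin as follows. The \verb|sigconf| style and the
submission ID \verb|123| are placeholders chosen for illustration, not
values prescribed by this document:
\begin{verbatim}
  \documentclass[sigconf,anonymous,review]{acmart}
  % Placeholder ID; use the one assigned by the submission system.
  \acmSubmissionID{123}
\end{verbatim}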

This document uses the following string as the first command in the
source file:
\begin{verbatim}
  \documentclass[sigconf]{acmart}
\end{verbatim}

\section{Modifications}

Modifying the template --- including but not limited to: adjusting
margins, typeface sizes, line spacing, paragraph and list definitions,
and the use of the \verb|\vspace| command to manually adjust the
vertical spacing between elements of your work --- is not allowed.

{\bfseries Your document will be returned to you for revision if
  modifications are discovered.}

\section{Typefaces}

The ``\verb|acmart|'' document class requires the use of the
``Libertine'' typeface family. Your \TeX\ installation should include
this set of packages. Please do not substitute other typefaces. The
``\verb|lmodern|'' and ``\verb|ltimes|'' packages should not be used,
as they will override the built-in typeface families.

\section{Title Information}

The title of your work should use capital letters appropriately;
\url{https://capitalizemytitle.com/} has useful rules for
capitalization. Use the {\verb|title|} command to define the title of
your work. If your work has a subtitle, define it with the
{\verb|subtitle|} command.  Do not insert line breaks in your title.

If your title is lengthy, you must define a short version to be used
in the page headers, to prevent overlapping text. The \verb|title|
command has a ``short title'' parameter:
\begin{verbatim}
  \title[short title]{full title}
\end{verbatim}
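
For illustration, a lengthy title with a subtitle could be declared as
in the sketch below. The title, short title, and subtitle are invented
for this example:
\begin{verbatim}
  % Short form in brackets is used in the page headers.
  \title[Sparse Solvers]{Fast Sparse Solvers for Large-Scale Simulation}
  \subtitle{An Experience Report}
\end{verbatim}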

\section{Authors and Affiliations}

Each author must be defined separately for accurate metadata
identification.  As an exception, multiple authors may share one
affiliation. Authors' names should not be abbreviated; use full first
names wherever possible. Include authors' e-mail addresses whenever
possible.

Grouping authors' names or e-mail addresses, or providing an ``e-mail
alias,'' as shown below, is not acceptable:
\begin{verbatim}
  \author{Brooke Aster, David Mehldau}
  \email{dave,judy,steve@university.edu}
  \email{firstname.lastname@phillips.org}
\end{verbatim}

The \verb|authornote| and \verb|authornotemark| commands allow a note
to apply to multiple authors --- for example, if the first two authors
of an article contributed equally to the work.
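
One possible layout, in which two authors are defined separately and
share an equal-contribution note, is sketched below. All names,
addresses, and affiliations are invented placeholders:
\begin{verbatim}
  \author{Brooke Aster}
  % The note is defined once, on the first author it applies to.
  \authornote{Both authors contributed equally to this work.}
  \email{brooke.aster@example.edu}
  \affiliation{%
    \institution{Example University}
    \city{Springfield}
    \country{USA}}

  \author{David Mehldau}
  % Reuses note number 1 instead of defining it again.
  \authornotemark[1]
  \email{david.mehldau@example.edu}
  \affiliation{%
    \institution{Example University}
    \city{Springfield}
    \country{USA}}
\end{verbatim}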

If your author list is lengthy, you must define a shortened version of
the list of authors to be used in the page headers, to prevent
overlapping text. The following command should be placed just after
the last \verb|\author{}| definition:
\begin{verbatim}
  \renewcommand{\shortauthors}{McCartney, et al.}
\end{verbatim}
Omitting this command will force the use of a concatenated list of all
of the authors' names, which may result in overlapping text in the
page headers.

The article template's documentation, available at
\url{https://www.acm.org/publications/proceedings-template}, has a
complete explanation of these commands and tips for their effective
use.

Note that authors' addresses are mandatory for journal articles.

\section{Rights Information}

Authors of any work published by ACM will need to complete a rights
form. Depending on the kind of work, and the rights management choice
made by the author, this may be copyright transfer, permission,
license, or an OA (open access) agreement.

Regardless of the rights management choice, the author will receive a
copy of the completed rights form once it has been submitted. This
form contains \LaTeX\ commands that must be copied into the source
document. When the document source is compiled, these commands and
their parameters add formatted text to several areas of the final
document:
\begin{itemize}
\item the ``ACM Reference Format'' text on the first page.
\item the ``rights management'' text on the first page.
\item the conference information in the page header(s).
\end{itemize}

Rights information is unique to the work; if you are preparing several
works for an event, make sure to use the correct set of commands with
each of the works.

The ACM Reference Format text is required for all articles over one
page in length, and is optional for one-page articles (abstracts).
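
The commands supplied with the rights form typically resemble the
sketch below. Every value shown is a placeholder; copy the actual
commands from your own completed form rather than from this example:
\begin{verbatim}
  \setcopyright{acmlicensed}
  \copyrightyear{2018}
  \acmYear{2018}
  \acmDOI{XXXXXXX.XXXXXXX}
  % Short name in brackets appears in the page headers.
  \acmConference[Conference acronym 'XX]{Full conference
    title}{Month DD--DD, YYYY}{City, Country}
  \acmISBN{978-1-4503-XXXX-X/18/06}
\end{verbatim}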

\section{CCS Concepts and User-Defined Keywords}

Two elements of the ``acmart'' document class provide powerful
taxonomic tools for you to help readers find your work in an online
search.

The ACM Computing Classification System ---
\url{https://www.acm.org/publications/class-2012} --- is a set of
classifiers and concepts that describe the computing
discipline. Authors can select entries from this classification
system, via \url{https://dl.acm.org/ccs/ccs.cfm}, and generate the
commands to be included in the \LaTeX\ source.

User-defined keywords are a comma-separated list of words and phrases
of the authors' choosing, providing a more flexible way of describing
the research being presented.

CCS concepts and user-defined keywords are required for all
articles over two pages in length, and are optional for one- and
two-page articles (or abstracts).
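
A sketch of how these two elements might appear in the source follows.
The concept and the keywords are illustrative placeholders; the
\verb|\ccsdesc| lines (and the accompanying \verb|CCSXML| block,
omitted here) should be generated by the CCS tool rather than written
by hand:
\begin{verbatim}
  % The optional argument is the relevance weight from the CCS tool.
  \ccsdesc[500]{Computer systems organization~Embedded systems}
  \keywords{datasets, neural networks, gaze detection}
\end{verbatim}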

\section{Sectioning Commands}

Your work should use standard \LaTeX\ sectioning commands:
\verb|section|, \verb|subsection|, \verb|subsubsection|, and
\verb|paragraph|. They should be numbered; do not remove the numbering
from the commands.

Simulating a sectioning command by setting the first word or words of
a paragraph in boldface or italicized text is {\bfseries not allowed.}
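
For instance, a numbered hierarchy could be written as in this sketch,
where the heading texts are placeholders:
\begin{verbatim}
  \section{Evaluation}
  \subsection{Experimental Setup}
  \subsubsection{Hardware}
  \paragraph{Network configuration}
\end{verbatim}
The starred forms, such as \verb|\section*|, remove the numbering and
should therefore be avoided.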

\section{Tables}

The ``\verb|acmart|'' document class includes the ``\verb|booktabs|''
package --- \url{https://ctan.org/pkg/booktabs} --- for preparing
high-quality tables.

Table captions are placed {\itshape above} the table.

Because tables cannot be split across pages, the best placement for
them is typically the top of the page nearest their initial cite.  To
ensure this proper ``floating'' placement of tables, use the
environment \textbf{table} to enclose the table's contents and the
table caption.  The contents of the table itself must go in the
\textbf{tabular} environment, to be aligned properly in rows and
columns, with the desired horizontal and vertical rules.  Again,
detailed instructions on \textbf{tabular} material are found in the
\textit{\LaTeX\ User's Guide}.

Immediately following this sentence is the point at which
Table~\ref{tab:freq} is included in the input file; compare the
placement of the table here with the table in the printed output of
this document.

\begin{table}
  \caption{Frequency of Special Characters}
  \label{tab:freq}
  \begin{tabular}{ccl}
    \toprule
    Non-English or Math&Frequency&Comments\\
    \midrule
    \O & 1 in 1,000& For Swedish names\\
    $\pi$ & 1 in 5& Common in math\\
    \$ & 4 in 5 & Used in business\\
    $\Psi^2_1$ & 1 in 40,000& Unexplained usage\\
    \bottomrule
  \end{tabular}
\end{table}

To set a wider table, which takes up the whole width of the page's
live area, use the environment \textbf{table*} to enclose the table's
contents and the table caption.  As with a single-column table, this
wide table will ``float'' to a location deemed more
desirable. Immediately following this sentence is the point at which
Table~\ref{tab:commands} is included in the input file; again, it is
instructive to compare the placement of the table here with the table
in the printed output of this document.

\begin{table*}
  \caption{Some Typical Commands}
  \label{tab:commands}
  \begin{tabular}{ccl}
    \toprule
    Command &A Number & Comments\\
    \midrule
    \texttt{{\char'134}author} & 100& Author \\
    \texttt{{\char'134}table}& 300 & For tables\\
    \texttt{{\char'134}table*}& 400& For wider tables\\
    \bottomrule
  \end{tabular}
\end{table*}

Always use \verb|\midrule| to separate table header rows from data rows, and
use it only for this purpose. This enables assistive technologies to
recognise table headers and support their users in navigating tables
more easily.

\section{Math Equations}
You may want to display math equations in three distinct styles:
inline, numbered display, or non-numbered display.  Each of the three
is discussed in the next sections.

\subsection{Inline (In-text) Equations}
A formula that appears in the running text is called an inline or
in-text formula.  It is produced by the \textbf{math} environment,
which can be invoked with the usual
\texttt{{\char'134}begin\,\ldots{\char'134}end} construction or with
the short form \texttt{\$\,\ldots\$}. You can use any of the symbols
and structures, from $\alpha$ to $\omega$, available in
\LaTeX~\cite{Lamport:LaTeX}; this section will simply show a few
examples of in-text equations in context. Notice how this equation:
\begin{math}
  \lim_{n\rightarrow \infty}x=0
\end{math},
set here in inline math style, looks slightly different when
set in display style.  (See the next section.)

\subsection{Display Equations}
A numbered display equation---one set off by vertical space from the
text and centered horizontally---is produced by the \textbf{equation}
environment. An unnumbered display equation is produced by the
\textbf{displaymath} environment.

Again, in either environment, you can use any of the symbols and
structures available in \LaTeX\@; this section will just give a couple
of examples of display equations in context.  First, consider the
equation, shown as an inline equation above:
\begin{equation}
  \lim_{n\rightarrow \infty}x=0
\end{equation}
Notice how it is formatted somewhat differently in
the \textbf{equation}
environment.  Now, we'll enter an unnumbered equation:
\begin{displaymath}
  \sum_{i=0}^{\infty} x + 1
\end{displaymath}
and follow it with another numbered equation:
\begin{equation}
  \sum_{i=0}^{\infty}x_i=\int_{0}^{\pi+2} f
\end{equation}
just to demonstrate \LaTeX's able handling of numbering.

\section{Figures}

The ``\verb|figure|'' environment should be used for figures. One or
more images can be placed within a figure. If your figure contains
third-party material, you must clearly identify it as such, as shown
in the example below.
\begin{figure}[h]
  \centering
  \includegraphics[width=\linewidth]{sample-franklin}
  \caption{1907 Franklin Model D roadster. Photograph by Harris \&
    Ewing, Inc. [Public domain], via Wikimedia
    Commons. (\url{https://goo.gl/VLCRBB}).}
  \Description{A woman and a girl in white dresses sit in an open car.}
\end{figure}

Your figures should contain a caption which describes the figure to
the reader.

Figure captions are placed {\itshape below} the figure.

Every figure should also have a figure description unless it is purely
decorative. These descriptions convey what’s in the image to someone
who cannot see it. They are also used by search engine crawlers for
indexing images, and when images cannot be loaded.

A figure description must be unformatted plain text less than 2000
characters long (including spaces).  {\bfseries Figure descriptions
  should not repeat the figure caption – their purpose is to capture
  important information that is not already provided in the caption or
  the main text of the paper.} For figures that convey important and
complex new information, a short text description may not be
adequate. More complex alternative descriptions can be placed in an
appendix and referenced in a short figure description. For example,
provide a data table capturing the information in a bar chart, or a
structured list representing a graph.  For additional information
regarding how best to write figure descriptions and why doing this is
so important, please see
\url{https://www.acm.org/publications/taps/describing-figures/}.
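
As a sketch of the appendix approach, a chart whose data are too
detailed for a short description might be marked up as follows. The
file name, caption, and appendix reference are hypothetical:
\begin{verbatim}
  \begin{figure}
    \centering
    \includegraphics[width=\linewidth]{hypothetical-barchart}
    \caption{Median response time per condition.}
    % Plain text only; points to the fuller alternative description.
    \Description{Bar chart with one bar for each of four conditions.
      The underlying values are tabulated in Appendix A.}
  \end{figure}
\end{verbatim}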

\subsection{The ``Teaser Figure''}

A ``teaser figure'' is an image, or set of images in one figure, that
are placed after all author and affiliation information, and before
the body of the article, spanning the page. If you wish to have such a
figure in your article, place the command immediately before the
\verb|\maketitle| command:
\begin{verbatim}
  \begin{teaserfigure}
    \includegraphics[width=\textwidth]{sampleteaser}
    \caption{figure caption}
    \Description{figure description}
  \end{teaserfigure}
\end{verbatim}

\section{Citations and Bibliographies}

The use of \BibTeX\ for the preparation and formatting of one's
references is strongly recommended. Authors' names should be complete
--- use full first names (``Donald E. Knuth'') not initials
(``D. E. Knuth'') --- and the salient identifying features of a
reference should be included: title, year, volume, number, pages,
article DOI, etc.


Using the BibLaTeX system, the bibliography is included in your source
document with the following command, placed just before the \verb|\end{document}| command:
\begin{verbatim}
  \printbibliography
\end{verbatim}

The command \verb|\addbibresource{bibfile}| declares the \BibTeX\ file to use.
Place it in the {\bfseries preamble} (before the command
``\verb|\begin{document}|'') of your \LaTeX\ source,
where ``\verb|bibfile|'' is the file name, \emph{with} the ``\verb|.bib|'' suffix.
Note that \verb|\addbibresource| accepts only one file per call: to declare multiple files,
use multiple instances of the command.
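
Putting these pieces together, a source file might be organized as in
the following sketch. The file names \verb|main-refs.bib| and
\verb|extra-refs.bib| are hypothetical, and the \verb|natbib=false|
class option and the \verb|biblatex| package options shown are
assumptions about how the package is loaded alongside
``\verb|acmart|''; verify them against the {\itshape \LaTeX\ User's
  Guide} before use:
\begin{verbatim}
  \documentclass[sigconf,natbib=false]{acmart}
  % Assumed package options; verify against the User's Guide.
  \usepackage[datamodel=acmdatamodel,style=acmnumeric]{biblatex}
  \addbibresource{main-refs.bib}   % hypothetical file
  \addbibresource{extra-refs.bib}  % hypothetical file

  \begin{document}
  % body of the article
  \printbibliography
  \end{document}
\end{verbatim}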

Citations and references are numbered by default. A small number of
ACM publications have citations and references formatted in the
``author year'' style; for these exceptions, please pass the option \verb|style=acmauthoryear|
to the \verb|biblatex| package loaded in the {\bfseries preamble} (before the command
``\verb|\begin{document}|'') of your \LaTeX\ source.


  Some examples.  A paginated journal article \cite{Abril07}, an
  enumerated journal article \cite{Cohen07}, a reference to an entire
  issue \cite{JCohen96}, a monograph (whole book) \cite{Kosiur01}, a
  monograph/whole book in a series (see 2a in spec. document)
  \cite{Harel79}, a divisible-book such as an anthology or compilation
  \cite{Editor00} followed by the same example, however we only output
  the series if the volume number is given \cite{Editor00a} (so
  Editor00a's series should NOT be present since it has no vol. no.),
  a chapter in a divisible book \cite{Spector90}, a chapter in a
  divisible book in a series \cite{Douglass98}, a multi-volume work as
  book \cite{Knuth97}, a couple of articles in a proceedings (of a
  conference, symposium, workshop for example) (paginated proceedings
  article) \cite{Andler79, Hagerup1993}, a proceedings article with
  all possible elements \cite{Smith10}, an example of an enumerated
  proceedings article \cite{VanGundy07}, an informally published work
  \cite{Harel78}, a couple of preprints \cite{Bornmann2019,
    AnzarootPBM14}, a doctoral dissertation \cite{Clarkson85}, a
  master's thesis: \cite{anisi03}, an online document / world wide web
  resource \cite{Thornburg01, Ablamowicz07, Poker06}, a video game
  (Case 1) \cite{Obama08} and (Case 2) \cite{Novak03} and \cite{Lee05}
  and (Case 3) a patent \cite{JoeScientist001}, work accepted for
  publication \cite{rous08}, 'YYYYb'-test for prolific author
  \cite{SaeediMEJ10} and \cite{SaeediJETC10}. Other cites might
  contain 'duplicate' DOI and URLs (some SIAM articles)
  \cite{Kirschmer:2010:AEI:1958016.1958018}. Boris / Barbara Beeton:
  multi-volume works as books \cite{MR781536} and \cite{MR781537}. A
  couple of citations with DOIs:
  \cite{2004:ITE:1009386.1010128,Kirschmer:2010:AEI:1958016.1958018}. Online
  citations: \cite{TUGInstmem, Thornburg01, CTANacmart}.
  Data Artifacts: \cite{UMassCitations}.
  Software project:~\cite{cgal,delebecque:hal-02090402}. Software Version:~\cite{gf-tag-sound-repo}. Software Module:~\cite{cgal:lp-gi-20a}. Code fragment:~\cite{simplemapper}.

\section{Acknowledgments}

Identification of funding sources and other support, and thanks to
individuals and groups that assisted in the research and the
preparation of the work should be included in an acknowledgment
section, which is placed just before the reference section in your
document.

This section has a special environment:
\begin{verbatim}
  \begin{acks}
  ...
  \end{acks}
\end{verbatim}
so that the information contained therein can be more easily collected
during the article metadata extraction phase, and to ensure
consistency in the spelling of the section heading.

Authors should not prepare this section as a numbered or unnumbered {\verb|\section|}; please use the ``{\verb|acks|}'' environment.

\section{Appendices}

If your work needs an appendix, add it before the
``\verb|\end{document}|'' command at the conclusion of your source
document.

Start the appendix with the ``\verb|appendix|'' command:
\begin{verbatim}
  
\section{Introduction}
ACM's consolidated article template, introduced in 2017, provides a
consistent \LaTeX\ style for use across ACM publications, and
incorporates accessibility and metadata-extraction functionality
necessary for future Digital Library endeavors. Numerous ACM and
SIG-specific \LaTeX\ templates have been examined, and their unique
features incorporated into this single new template.

If you are new to publishing with ACM, this document is a valuable
guide to the process of preparing your work for publication. If you
have published with ACM before, this document provides insight and
instruction into more recent changes to the article template.

The ``\verb|acmart|'' document class can be used to prepare articles
for any ACM publication --- conference or journal, and for any stage
of publication, from review to final ``camera-ready'' copy, to the
author's own version, with {\itshape very} few changes to the source.

\section{Template Overview}
As noted in the introduction, the ``\verb|acmart|'' document class can
be used to prepare many different kinds of documentation --- a
double-blind initial submission of a full-length technical paper, a
two-page SIGGRAPH Emerging Technologies abstract, a ``camera-ready''
journal article, a SIGCHI Extended Abstract, and more --- all by
selecting the appropriate {\itshape template style} and {\itshape
  template parameters}.

This document will explain the major features of the document
class. For further information, the {\itshape \LaTeX\ User's Guide} is
available from
\url{https://www.acm.org/publications/proceedings-template}.

\subsection{Template Styles}

The primary parameter given to the ``\verb|acmart|'' document class is
the {\itshape template style} which corresponds to the kind of publication
or SIG publishing the work. This parameter is enclosed in square
brackets and is a part of the {\verb|documentclass|} command:
\begin{verbatim}
  \documentclass[STYLE]{acmart}
\end{verbatim}

Journals use one of three template styles. All but three ACM journals
use the {\verb|acmsmall|} template style:
\begin{itemize}
\item {\verb|acmsmall|}: The default journal template style.
\item {\verb|acmlarge|}: Used by JOCCH and TAP.
\item {\verb|acmtog|}: Used by TOG.
\end{itemize}

The majority of conference proceedings documentation will use the {\verb|acmconf|} template style.
\begin{itemize}
\item {\verb|acmconf|}: The default proceedings template style.
\item{\verb|sigchi|}: Used for SIGCHI conference articles.
\item{\verb|sigchi-a|}: Used for SIGCHI ``Extended Abstract'' articles.
\item{\verb|sigplan|}: Used for SIGPLAN conference articles.
\end{itemize}

\subsection{Template Parameters}

In addition to specifying the {\itshape template style} to be used in
formatting your work, there are a number of {\itshape template parameters}
which modify some part of the applied template style. A complete list
of these parameters can be found in the {\itshape \LaTeX\ User's Guide.}

Frequently-used parameters, or combinations of parameters, include:
\begin{itemize}
\item {\verb|anonymous,review|}: Suitable for a ``double-blind''
  conference submission. Anonymizes the work and includes line
  numbers. Use with the \verb|\acmSubmissionID| command to print the
  submission's unique ID on each page of the work.
\item{\verb|authorversion|}: Produces a version of the work suitable
  for posting by the author.
\item{\verb|screen|}: Produces colored hyperlinks.
\end{itemize}

This document uses the following string as the first command in the
source file:
\begin{verbatim}
\documentclass[sigconf]{acmart}
\end{verbatim}

\section{Modifications}

Modifying the template --- including but not limited to: adjusting
margins, typeface sizes, line spacing, paragraph and list definitions,
and the use of the \verb|\vspace| command to manually adjust the
vertical spacing between elements of your work --- is not allowed.

{\bfseries Your document will be returned to you for revision if
  modifications are discovered.}

\section{Typefaces}

The ``\verb|acmart|'' document class requires the use of the
``Libertine'' typeface family. Your \TeX\ installation should include
this set of packages. Please do not substitute other typefaces. The
``\verb|lmodern|'' and ``\verb|ltimes|'' packages should not be used,
as they will override the built-in typeface families.

\section{Title Information}

The title of your work should use capital letters appropriately -
\url{https://capitalizemytitle.com/} has useful rules for
capitalization. Use the {\verb|title|} command to define the title of
your work. If your work has a subtitle, define it with the
{\verb|subtitle|} command.  Do not insert line breaks in your title.

If your title is lengthy, you must define a short version to be used
in the page headers, to prevent overlapping text. The \verb|title|
command has a ``short title'' parameter:
\begin{verbatim}
  \title[short title]{full title}
\end{verbatim}

\section{Authors and Affiliations}

Each author must be defined separately for accurate metadata
identification. Multiple authors may share one affiliation. Authors'
names should not be abbreviated; use full first names wherever
possible. Include authors' e-mail addresses whenever possible.

Grouping authors' names or e-mail addresses, or providing an ``e-mail
alias,'' as shown below, is not acceptable:
\begin{verbatim}
  \author{Brooke Aster, David Mehldau}
  \email{dave,judy,steve@university.edu}
  \email{firstname.lastname@phillips.org}
\end{verbatim}

The \verb|authornote| and \verb|authornotemark| commands allow a note
to apply to multiple authors --- for example, if the first two authors
of an article contributed equally to the work.

If your author list is lengthy, you must define a shortened version of
the list of authors to be used in the page headers, to prevent
overlapping text. The following command should be placed just after
the last \verb|\author{}| definition:
\begin{verbatim}
  \renewcommand{\shortauthors}{McCartney, et al.}
\end{verbatim}
Omitting this command will force the use of a concatenated list of all
of the authors' names, which may result in overlapping text in the
page headers.

The article template's documentation, available at
\url{https://www.acm.org/publications/proceedings-template}, has a
complete explanation of these commands and tips for their effective
use.

Note that authors' addresses are mandatory for journal articles.

\section{Rights Information}

Authors of any work published by ACM will need to complete a rights
form. Depending on the kind of work, and the rights management choice
made by the author, this may be copyright transfer, permission,
license, or an OA (open access) agreement.

Regardless of the rights management choice, the author will receive a
copy of the completed rights form once it has been submitted. This
form contains \LaTeX\ commands that must be copied into the source
document. When the document source is compiled, these commands and
their parameters add formatted text to several areas of the final
document:
\begin{itemize}
\item the ``ACM Reference Format'' text on the first page.
\item the ``rights management'' text on the first page.
\item the conference information in the page header(s).
\end{itemize}

Rights information is unique to the work; if you are preparing several
works for an event, make sure to use the correct set of commands with
each of the works.

The ACM Reference Format text is required for all articles over one
page in length, and is optional for one-page articles (abstracts).

\section{CCS Concepts and User-Defined Keywords}

Two elements of the ``acmart'' document class provide powerful
taxonomic tools for you to help readers find your work in an online
search.

The ACM Computing Classification System ---
\url{https://www.acm.org/publications/class-2012} --- is a set of
classifiers and concepts that describe the computing
discipline. Authors can select entries from this classification
system, via \url{https://dl.acm.org/ccs/ccs.cfm}, and generate the
commands to be included in the \LaTeX\ source.

User-defined keywords are a comma-separated list of words and phrases
of the authors' choosing, providing a more flexible way of describing
the research being presented.

CCS concepts and user-defined keywords are required for for all
articles over two pages in length, and are optional for one- and
two-page articles (or abstracts).

\section{Sectioning Commands}

Your work should use standard \LaTeX\ sectioning commands:
\verb|section|, \verb|subsection|, \verb|subsubsection|, and
\verb|paragraph|. They should be numbered; do not remove the numbering
from the commands.

Simulating a sectioning command by setting the first word or words of
a paragraph in boldface or italicized text is {\bfseries not allowed.}

\section{Tables}

The ``\verb|acmart|'' document class includes the ``\verb|booktabs|''
package --- \url{https://ctan.org/pkg/booktabs} --- for preparing
high-quality tables.

Table captions are placed {\itshape above} the table.

Because tables cannot be split across pages, the best placement for
them is typically the top of the page nearest their initial cite.  To
ensure this proper ``floating'' placement of tables, use the
environment \textbf{table} to enclose the table's contents and the
table caption.  The contents of the table itself must go in the
\textbf{tabular} environment, to be aligned properly in rows and
columns, with the desired horizontal and vertical rules.  Again,
detailed instructions on \textbf{tabular} material are found in the
\textit{\LaTeX\ User's Guide}.

Immediately following this sentence is the point at which
Table~\ref{tab:freq} is included in the input file; compare the
placement of the table here with the table in the printed output of
this document.

\begin{table}
  \caption{Frequency of Special Characters}
  \label{tab:freq}
  \begin{tabular}{ccl}
    \toprule
    Non-English or Math&Frequency&Comments\\
    \midrule
    \O & 1 in 1,000& For Swedish names\\
    $\pi$ & 1 in 5& Common in math\\
    \$ & 4 in 5 & Used in business\\
    $\Psi^2_1$ & 1 in 40,000& Unexplained usage\\
  \bottomrule
\end{tabular}
\end{table}

To set a wider table, which takes up the whole width of the page's
live area, use the environment \textbf{table*} to enclose the table's
contents and the table caption.  As with a single-column table, this
wide table will ``float'' to a location deemed more
desirable. Immediately following this sentence is the point at which
Table~\ref{tab:commands} is included in the input file; again, it is
instructive to compare the placement of the table here with the table
in the printed output of this document.

\begin{table*}
  \caption{Some Typical Commands}
  \label{tab:commands}
  \begin{tabular}{ccl}
    \toprule
    Command &A Number & Comments\\
    \midrule
    \texttt{{\char'134}author} & 100& Author \\
    \texttt{{\char'134}table}& 300 & For tables\\
    \texttt{{\char'134}table*}& 400& For wider tables\\
    \bottomrule
  \end{tabular}
\end{table*}

Always use midrule to separate table header rows from data rows, and
use it only for this purpose. This enables assistive technologies to
recognise table headers and support their users in navigating tables
more easily.

\section{Math Equations}
You may want to display math equations in three distinct styles:
inline, numbered or non-numbered display.  Each of the three are
discussed in the next sections.

\subsection{Inline (In-text) Equations}
A formula that appears in the running text is called an inline or
in-text formula.  It is produced by the \textbf{math} environment,
which can be invoked with the usual
\texttt{{\char'134}begin\,\ldots{\char'134}end} construction or with
the short form \texttt{\$\,\ldots\$}. You can use any of the symbols
and structures, from $\alpha$ to $\omega$, available in
\LaTeX~\cite{Lamport:LaTeX}; this section will simply show a few
examples of in-text equations in context. Notice how this equation:
\begin{math}
  \lim_{n\rightarrow \infty}x=0
\end{math},
set here in inline math style, looks slightly different when
set in display style.  (See the next section.)

\subsection{Display Equations}
A numbered display equation---one set off by vertical space from the
text and centered horizontally---is produced by the \textbf{equation}
environment. An unnumbered display equation is produced by the
\textbf{displaymath} environment.

Again, in either environment, you can use any of the symbols and
structures available in \LaTeX\@; this section will just give a couple
of examples of display equations in context.  First, consider the
equation, shown as an inline equation above:
\begin{equation}
  \lim_{n\rightarrow \infty}x=0
\end{equation}
Notice how it is formatted somewhat differently in
the \textbf{equation}
environment.  Now, we'll enter an unnumbered equation:
\begin{displaymath}
  \sum_{i=0}^{\infty} x + 1
\end{displaymath}
and follow it with another numbered equation:
\begin{equation}
  \sum_{i=0}^{\infty}x_i=\int_{0}^{\pi+2} f
\end{equation}
just to demonstrate \LaTeX's able handling of numbering.
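
As a minimal sketch of how that numbering can be used, a numbered
equation may be given a label and then cited from the text with the
standard \verb|\label| and \verb|\ref| commands. The label name
\verb|eq:series| is a hypothetical choice made only for this
illustration:
\begin{verbatim}
  \begin{equation}
    \label{eq:series}
    \sum_{i=0}^{\infty}x_i=\int_{0}^{\pi+2} f
  \end{equation}
  Equation~\ref{eq:series} is numbered automatically.
\end{verbatim}
Because the number is resolved from the label, adding or removing
earlier equations does not require editing the reference.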

\section{Figures}

The ``\verb|figure|'' environment should be used for figures. One or
more images can be placed within a figure. If your figure contains
third-party material, you must clearly identify it as such, as shown
in the example below.
\begin{figure}[h]
  \centering
  \includegraphics[width=\linewidth]{sample-franklin}
  \caption{1907 Franklin Model D roadster. Photograph by Harris \&
    Ewing, Inc. [Public domain], via Wikimedia
    Commons. (\url{https://goo.gl/VLCRBB}).}
  \Description{A woman and a girl in white dresses sit in an open car.}
\end{figure}

Your figures should contain a caption which describes the figure to
the reader.

Figure captions are placed {\itshape below} the figure.

Every figure should also have a figure description unless it is purely
decorative. These descriptions convey what’s in the image to someone
who cannot see it. They are also used by search engine crawlers for
indexing images, and when images cannot be loaded.

A figure description must be unformatted plain text less than 2000
characters long (including spaces).  {\bfseries Figure descriptions
  should not repeat the figure caption – their purpose is to capture
  important information that is not already provided in the caption or
  the main text of the paper.} For figures that convey important and
complex new information, a short text description may not be
adequate. More complex alternative descriptions can be placed in an
appendix and referenced in a short figure description. For example,
provide a data table capturing the information in a bar chart, or a
structured list representing a graph.  For additional information
regarding how best to write figure descriptions and why doing this is
so important, please see
\url{https://www.acm.org/publications/taps/describing-figures/}.
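
The following is a minimal sketch of the appendix approach, assuming a
bar chart whose underlying values are tabulated in an appendix. The
image file name \verb|sample-barchart|, the label, the caption text,
and the appendix letter are hypothetical placeholders, not part of the
template:
\begin{verbatim}
  \begin{figure}
    \centering
    \includegraphics[width=\linewidth]{sample-barchart}
    \caption{Placeholder caption for a bar chart.}
    \Description{Bar chart with one bar per category. The
      plotted values are listed in a table in Appendix A.}
    \label{fig:barchart}
  \end{figure}
\end{verbatim}
The appendix is named in plain words inside the description, since the
description must remain unformatted plain text.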

\subsection{The ``Teaser Figure''}

A ``teaser figure'' is an image, or set of images in one figure, that
is placed after all author and affiliation information, and before
the body of the article, spanning the page. If you wish to have such a
figure in your article, place the \verb|teaserfigure| environment
immediately before the \verb|\maketitle| command:
\begin{verbatim}
  \begin{teaserfigure}
    \includegraphics[width=\textwidth]{sampleteaser}
    \caption{figure caption}
    \Description{figure description}
  \end{teaserfigure}
\end{verbatim}

\section{Citations and Bibliographies}

The use of \BibTeX\ for the preparation and formatting of one's
references is strongly recommended. Authors' names should be complete
--- use full first names (``Donald E. Knuth'') not initials
(``D. E. Knuth'') --- and the salient identifying features of a
reference should be included: title, year, volume, number, pages,
article DOI, etc.
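
As an illustration, a \BibTeX\ entry that follows this advice might
look like the sketch below. It describes the \LaTeX\ book cited
earlier in this document, with the author's first name written out in
full. The field layout is one common \BibTeX\ convention, not a
requirement stated here. For a journal or proceedings article, fields
such as \verb|volume|, \verb|number|, \verb|pages|, and \verb|doi|
would be added in the same way.
\begin{verbatim}
  @Book{Lamport:LaTeX,
    author    = "Leslie Lamport",
    title     = "\LaTeX: A Document Preparation System",
    publisher = "Addison-Wesley",
    address   = "Reading, MA",
    year      = 1986
  }
\end{verbatim}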

The bibliography is included in your source document with these two
commands, placed just before the \verb|\end{document}| command:
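\begin{verbatim}
  \bibliographystyle{ACM-Reference-Format}
  \bibliography{bibfile}
\end{verbatim}
where ``\verb|bibfile|'' is the name, without the ``\verb|.bib|''
suffix, of the \BibTeX\ file.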
\section{Introduction}
ACM's consolidated article template, introduced in 2017, provides a
consistent \LaTeX\ style for use across ACM publications, and
incorporates accessibility and metadata-extraction functionality
necessary for future Digital Library endeavors. Numerous ACM and
SIG-specific \LaTeX\ templates have been examined, and their unique
features incorporated into this single new template.

If you are new to publishing with ACM, this document is a valuable
guide to the process of preparing your work for publication. If you
have published with ACM before, this document provides insight and
instruction into more recent changes to the article template.

The ``\verb|acmart|'' document class can be used to prepare articles
for any ACM publication --- conference or journal, and for any stage
of publication, from review to final ``camera-ready'' copy, to the
author's own version, with {\itshape very} few changes to the source.

\section{Template Overview}
As noted in the introduction, the ``\verb|acmart|'' document class can
be used to prepare many different kinds of documentation --- a
double-blind initial submission of a full-length technical paper, a
two-page SIGGRAPH Emerging Technologies abstract, a ``camera-ready''
journal article, a SIGCHI Extended Abstract, and more --- all by
selecting the appropriate {\itshape template style} and {\itshape
  template parameters}.

This document will explain the major features of the document
class. For further information, the {\itshape \LaTeX\ User's Guide} is
available from
\url{https://www.acm.org/publications/proceedings-template}.

\subsection{Template Styles}

The primary parameter given to the ``\verb|acmart|'' document class is
the {\itshape template style} which corresponds to the kind of publication
or SIG publishing the work. This parameter is enclosed in square
brackets and is a part of the {\verb|documentclass|} command:
\begin{verbatim}
  \documentclass[STYLE]{acmart}
\end{verbatim}

Journals use one of three template styles. All but three ACM journals
use the {\verb|acmsmall|} template style:
\begin{itemize}
\item {\verb|acmsmall|}: The default journal template style.
\item {\verb|acmlarge|}: Used by JOCCH and TAP.
\item {\verb|acmtog|}: Used by TOG.
\end{itemize}

The majority of conference proceedings documentation will use the {\verb|acmconf|} template style.
\begin{itemize}
\item {\verb|acmconf|}: The default proceedings template style.
\item{\verb|sigchi|}: Used for SIGCHI conference articles.
\item{\verb|sigchi-a|}: Used for SIGCHI ``Extended Abstract'' articles.
\item{\verb|sigplan|}: Used for SIGPLAN conference articles.
\end{itemize}

\subsection{Template Parameters}

In addition to specifying the {\itshape template style} to be used in
formatting your work, there are a number of {\itshape template parameters}
which modify some part of the applied template style. A complete list
of these parameters can be found in the {\itshape \LaTeX\ User's Guide.}

Frequently-used parameters, or combinations of parameters, include:
\begin{itemize}
\item {\verb|anonymous,review|}: Suitable for a ``double-blind''
  conference submission. Anonymizes the work and includes line
  numbers. Use with the \verb|\acmSubmissionID| command to print the
  submission's unique ID on each page of the work.
\item{\verb|authorversion|}: Produces a version of the work suitable
  for posting by the author.
\item{\verb|screen|}: Produces colored hyperlinks.
\end{itemize}

This document uses the following string as the first command in the
source file:
\begin{verbatim}
\documentclass[sigconf]{acmart}
\end{verbatim}

\section{Modifications}

Modifying the template --- including but not limited to: adjusting
margins, typeface sizes, line spacing, paragraph and list definitions,
and the use of the \verb|\vspace| command to manually adjust the
vertical spacing between elements of your work --- is not allowed.

{\bfseries Your document will be returned to you for revision if
  modifications are discovered.}

\section{Typefaces}

The ``\verb|acmart|'' document class requires the use of the
``Libertine'' typeface family. Your \TeX\ installation should include
this set of packages. Please do not substitute other typefaces. The
``\verb|lmodern|'' and ``\verb|ltimes|'' packages should not be used,
as they will override the built-in typeface families.

\section{Title Information}

The title of your work should use capital letters appropriately -
\url{https://capitalizemytitle.com/} has useful rules for
capitalization. Use the {\verb|title|} command to define the title of
your work. If your work has a subtitle, define it with the
{\verb|subtitle|} command.  Do not insert line breaks in your title.

If your title is lengthy, you must define a short version to be used
in the page headers, to prevent overlapping text. The \verb|title|
command has a ``short title'' parameter:
\begin{verbatim}
  \title[short title]{full title}
\end{verbatim}

\section{Authors and Affiliations}

Each author must be defined separately for accurate metadata
identification. Multiple authors may share one affiliation. Authors'
names should not be abbreviated; use full first names wherever
possible. Include authors' e-mail addresses whenever possible.

Grouping authors' names or e-mail addresses, or providing an ``e-mail
alias,'' as shown below, is not acceptable:
\begin{verbatim}
  \author{Brooke Aster, David Mehldau}
  \email{dave,judy,steve@university.edu}
  \email{firstname.lastname@phillips.org}
\end{verbatim}

The \verb|authornote| and \verb|authornotemark| commands allow a note
to apply to multiple authors --- for example, if the first two authors
of an article contributed equally to the work.

If your author list is lengthy, you must define a shortened version of
the list of authors to be used in the page headers, to prevent
overlapping text. The following command should be placed just after
the last \verb|\author{}| definition:
\begin{verbatim}
  \renewcommand{\shortauthors}{McCartney, et al.}
\end{verbatim}
Omitting this command will force the use of a concatenated list of all
of the authors' names, which may result in overlapping text in the
page headers.

The article template's documentation, available at
\url{https://www.acm.org/publications/proceedings-template}, has a
complete explanation of these commands and tips for their effective
use.

Note that authors' addresses are mandatory for journal articles.

\section{Rights Information}

Authors of any work published by ACM will need to complete a rights
form. Depending on the kind of work, and the rights management choice
made by the author, this may be copyright transfer, permission,
license, or an OA (open access) agreement.

Regardless of the rights management choice, the author will receive a
copy of the completed rights form once it has been submitted. This
form contains \LaTeX\ commands that must be copied into the source
document. When the document source is compiled, these commands and
their parameters add formatted text to several areas of the final
document:
\begin{itemize}
\item the ``ACM Reference Format'' text on the first page.
\item the ``rights management'' text on the first page.
\item the conference information in the page header(s).
\end{itemize}

Rights information is unique to the work; if you are preparing several
works for an event, make sure to use the correct set of commands with
each of the works.

The ACM Reference Format text is required for all articles over one
page in length, and is optional for one-page articles (abstracts).

\section{CCS Concepts and User-Defined Keywords}

Two elements of the ``acmart'' document class provide powerful
taxonomic tools for you to help readers find your work in an online
search.

The ACM Computing Classification System ---
\url{https://www.acm.org/publications/class-2012} --- is a set of
classifiers and concepts that describe the computing
discipline. Authors can select entries from this classification
system, via \url{https://dl.acm.org/ccs/ccs.cfm}, and generate the
commands to be included in the \LaTeX\ source.

User-defined keywords are a comma-separated list of words and phrases
of the authors' choosing, providing a more flexible way of describing
the research being presented.

CCS concepts and user-defined keywords are required for for all
articles over two pages in length, and are optional for one- and
two-page articles (or abstracts).

\section{Sectioning Commands}

Your work should use standard \LaTeX\ sectioning commands:
\verb|section|, \verb|subsection|, \verb|subsubsection|, and
\verb|paragraph|. They should be numbered; do not remove the numbering
from the commands.

Simulating a sectioning command by setting the first word or words of
a paragraph in boldface or italicized text is {\bfseries not allowed.}

\section{Tables}

The ``\verb|acmart|'' document class includes the ``\verb|booktabs|''
package --- \url{https://ctan.org/pkg/booktabs} --- for preparing
high-quality tables.

Table captions are placed {\itshape above} the table.

Because tables cannot be split across pages, the best placement for
them is typically the top of the page nearest their initial cite.  To
ensure this proper ``floating'' placement of tables, use the
environment \textbf{table} to enclose the table's contents and the
table caption.  The contents of the table itself must go in the
\textbf{tabular} environment, to be aligned properly in rows and
columns, with the desired horizontal and vertical rules.  Again,
detailed instructions on \textbf{tabular} material are found in the
\textit{\LaTeX\ User's Guide}.

Immediately following this sentence is the point at which
Table~\ref{tab:freq} is included in the input file; compare the
placement of the table here with the table in the printed output of
this document.

\begin{table}
  \caption{Frequency of Special Characters}
  \label{tab:freq}
  \begin{tabular}{ccl}
    \toprule
    Non-English or Math&Frequency&Comments\\
    \midrule
    \O & 1 in 1,000& For Swedish names\\
    $\pi$ & 1 in 5& Common in math\\
    \$ & 4 in 5 & Used in business\\
    $\Psi^2_1$ & 1 in 40,000& Unexplained usage\\
  \bottomrule
\end{tabular}
\end{table}

To set a wider table, which takes up the whole width of the page's
live area, use the environment \textbf{table*} to enclose the table's
contents and the table caption.  As with a single-column table, this
wide table will ``float'' to a location deemed more
desirable. Immediately following this sentence is the point at which
Table~\ref{tab:commands} is included in the input file; again, it is
instructive to compare the placement of the table here with the table
in the printed output of this document.

\begin{table*}
  \caption{Some Typical Commands}
  \label{tab:commands}
  \begin{tabular}{ccl}
    \toprule
    Command &A Number & Comments\\
    \midrule
    \texttt{{\char'134}author} & 100& Author \\
    \texttt{{\char'134}table}& 300 & For tables\\
    \texttt{{\char'134}table*}& 400& For wider tables\\
    \bottomrule
  \end{tabular}
\end{table*}

Always use midrule to separate table header rows from data rows, and
use it only for this purpose. This enables assistive technologies to
recognise table headers and support their users in navigating tables
more easily.

\section{Math Equations}
You may want to display math equations in three distinct styles:
inline, numbered or non-numbered display.  Each of the three are
discussed in the next sections.

\subsection{Inline (In-text) Equations}
A formula that appears in the running text is called an inline or
in-text formula.  It is produced by the \textbf{math} environment,
which can be invoked with the usual
\texttt{{\char'134}begin\,\ldots{\char'134}end} construction or with
the short form \texttt{\$\,\ldots\$}. You can use any of the symbols
and structures, from $\alpha$ to $\omega$, available in
\LaTeX~\cite{Lamport:LaTeX}; this section will simply show a few
examples of in-text equations in context. Notice how this equation:
\begin{math}
  \lim_{n\rightarrow \infty}x=0
\end{math},
set here in inline math style, looks slightly different when
set in display style (see the next section).

\subsection{Display Equations}
A numbered display equation---one set off by vertical space from the
text and centered horizontally---is produced by the \textbf{equation}
environment. An unnumbered display equation is produced by the
\textbf{displaymath} environment.

Again, in either environment, you can use any of the symbols and
structures available in \LaTeX\@; this section will just give a couple
of examples of display equations in context.  First, consider the
equation, shown as an inline equation above:
\begin{equation}
  \lim_{n\rightarrow \infty}x=0
\end{equation}
Notice how it is formatted somewhat differently in
the \textbf{equation}
environment.  Now, we'll enter an unnumbered equation:
\begin{displaymath}
  \sum_{i=0}^{\infty} x + 1
\end{displaymath}
and follow it with another numbered equation:
\begin{equation}
  \sum_{i=0}^{\infty}x_i=\int_{0}^{\pi+2} f
\end{equation}
just to demonstrate \LaTeX's able handling of numbering.

\section{Figures}

The ``\verb|figure|'' environment should be used for figures. One or
more images can be placed within a figure. If your figure contains
third-party material, you must clearly identify it as such, as shown
in the example below.
\begin{figure}[h]
  \centering
  \includegraphics[width=\linewidth]{sample-franklin}
  \caption{1907 Franklin Model D roadster. Photograph by Harris \&
    Ewing, Inc. [Public domain], via Wikimedia
    Commons. (\url{https://goo.gl/VLCRBB}).}
  \Description{A woman and a girl in white dresses sit in an open car.}
\end{figure}

Your figures should contain a caption which describes the figure to
the reader.

Figure captions are placed {\itshape below} the figure.

Every figure should also have a figure description unless it is purely
decorative. These descriptions convey what’s in the image to someone
who cannot see it. They are also used by search engine crawlers for
indexing images, and when images cannot be loaded.

A figure description must be unformatted plain text less than 2000
characters long (including spaces).  {\bfseries Figure descriptions
  should not repeat the figure caption – their purpose is to capture
  important information that is not already provided in the caption or
  the main text of the paper.} For figures that convey important and
complex new information, a short text description may not be
adequate. More complex alternative descriptions can be placed in an
appendix and referenced in a short figure description. For example,
provide a data table capturing the information in a bar chart, or a
structured list representing a graph.  For additional information
regarding how best to write figure descriptions and why doing this is
so important, please see
\url{https://www.acm.org/publications/taps/describing-figures/}.

\subsection{The ``Teaser Figure''}

A ``teaser figure'' is an image, or set of images in one figure, that
are placed after all author and affiliation information, and before
the body of the article, spanning the page. If you wish to have such a
figure in your article, place the command immediately before the
\verb|\maketitle| command:
\begin{verbatim}
  \begin{teaserfigure}
    \includegraphics[width=\textwidth]{sampleteaser}
    \caption{figure caption}
    \Description{figure description}
  \end{teaserfigure}
\end{verbatim}

\section{Citations and Bibliographies}

The use of \BibTeX\ for the preparation and formatting of one's
references is strongly recommended. Authors' names should be complete
--- use full first names (``Donald E. Knuth'') not initials
(``D. E. Knuth'') --- and the salient identifying features of a
reference should be included: title, year, volume, number, pages,
article DOI, etc.

The bibliography is included in your source document with these two
commands, placed just before the \verb|\end{document}| command:
\begin{verbatim}
  \bibliographystyle{ACM-Reference-Format}
  \bibliography{bibfile}
\end{verbatim}

\subsection{Library Terminology}

This section briefly explains the main terminology used in our library. 

\begin{itemize}
    \item A \textbf{sensitive attribute} is an attribute that partitions the population into groups that receive unequal benefits.

    \item A \textbf{protected group} (or simply group) is created by partitioning the population by one or more sensitive attributes.

    \item A \textbf{privileged value} of a sensitive attribute is a value that gives more benefit to a protected group that includes it than to protected groups that do not include it.

    \item A \textbf{subgroup} is created by splitting a protected group by privileged and disprivileged values.

    \item A \textbf{group metric} is a metric that shows the relation between privileged and disprivileged subgroups created based on one or more sensitive attributes.
\end{itemize}
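
As a minimal sketch of how a group metric can be formalized (the notation below is an assumption introduced for illustration and is not taken from the library's API), let $m(G)$ denote a base metric, such as accuracy or standard deviation, computed on the test samples that belong to subgroup $G$. A group metric then compares the disprivileged subgroup (\textsf{dis}\xspace) with the privileged subgroup (\textsf{priv}\xspace) either as a difference or as a ratio:
\begin{displaymath}
  \Delta_m = m(\mathrm{dis}) - m(\mathrm{priv}),
  \qquad
  \rho_m = \frac{m(\mathrm{dis})}{m(\mathrm{priv})}.
\end{displaymath}
Under this convention, parity corresponds to $\Delta_m = 0$ for difference measures and $\rho_m = 1$ for ratio measures. The orientation (\textsf{dis}\xspace relative to \textsf{priv}\xspace) is inferred from the sign conventions summarized in Table~\ref{tab:color-table} and should be read as an assumption rather than a definition.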


\subsection{Code Quality Management}

Establishing high code quality practices is essential for an open-source library to be adopted by the community and extended by contributors. We do not claim any novelty in code quality management, but a solid foundation of these practices is another feature that distinguishes our library from other fairness projects.

Automation of development workflows in the \textsf{Virny}\xspace GitHub repository is provided by \textbf{GitHub Actions}. Each pull request to the \texttt{main} branch triggers a CI pipeline that automatically runs unit tests with pytest and adds a description of new features and modifications to the documentation. The status of the test execution is displayed to the reviewer of the pull request, who can then check the impact of the modifications on existing library functionality.

Creating test cases that cover the main library functionality is a crucial component of reliable development of new features. Our library has two types of tests: \textbf{unit tests} and \textbf{integration tests}. Unit tests are built with the pytest Python library and verify the correct behavior of the main library functions. Integration tests are developed in Jupyter notebooks to check component interaction for all library use cases.

% TODO: [Table of test coverage]

We also recognize that user adoption requires comprehensive \textbf{library documentation} and detailed \textbf{use case examples}. Therefore, we created a website with all API descriptions and examples, hosted on GitHub Pages and built with the \href{https://squidfunk.github.io/mkdocs-material/}{mkdocs-material} Python library. Additionally, we adapted an open-source mkdocs parser (\href{https://github.com/MaxHalford/yamp}{YAMP}) for our library to automatically generate function descriptions from code documentation.



\section{Conclusions and Future Work}
\label{sec:discussion} 

In this work, we attempted to clarify the desiderata of fairness and stability by asking the question: ``Is estimator variance a friend or a foe?'' In answering this question, we uncovered the fairness-variance-accuracy trade-off, an enrichment of the classically understood fairness-accuracy and accuracy-robustness trade-offs. We empirically demonstrated contexts in which large estimator variance, as well as large disparity in estimator variance, can have a corrective effect on both model accuracy and fairness, but we also identified scenarios in which variance fails to help. We hope that our work will usher in a new paradigm of fairness-enhancing interventions that go beyond the classic fairness-accuracy dichotomy~\cite{chouldechova_frontiers}. For instance, there is interesting future work to be done on exploiting large noise variance on protected groups to improve fairness and accuracy through this fairness-variance-accuracy trichotomy. Furthermore, our insights on the effect of estimator variance could help guide model selection in cases when several models are equally ``fair'' or equally accurate.

Our work also comes with important limitations. \citet{debiasing_bias} highlight statistical errors in the measurement of different performance metrics, and the statistical procedures used to compute estimator variance in this study suffer from the same shortcomings. There are also interesting statistical questions around the variance of these variance estimates, especially in social contexts where it is widely believed that noise variance tracks protected attributes~\cite{kappelhof2017,schelter2019fairprep}, which we leave for future work.

Fairness is not a purely technical or statistical concept, but rather a normative and philosophical one. The major contributions of this work (methods, results and analysis) are purely technical, and are based on a popular technical definition of ``fairness'' as the parity in statistical bias. This is of course a limiting view, and one that should be regarded within a broader socio-legal-political view of fairness.  


One of the contributions of our work is the \textsf{Virny}\xspace software library. We envision several enhancements to it. First, we would like to support other sampling-during-inference techniques for variance estimation beyond the simple Bootstrap, such as the Jackknife~\cite{Jackknife_review}, as well as combinations of Bootstrap and Jackknife~\cite{barber2021jackknife+,kim2020predictive_jackknife_bootstrap,Efron1992JackknifeAfterBootstrapSE}. We would also like to evaluate Conformal Prediction methods~\cite{vovk2017conformal,shafer2008conformal,angelopoulos2021conformal} for quantifying and correcting model instability. Specifically, it would be interesting to compare the insights from the variance metrics analyzed in this study with insights from the coverage and interval widths that conformal methods produce on different protected groups.

\section{Experiments}
\label{sec:experiments}

We used the \textsf{Virny}\xspace library, presented in Section~\ref{sec:library}, to conduct an extensive empirical comparison of the behavior of the metrics described in Section~\ref{sec:fairness}, and evaluate the trade-offs between fairness, variance and accuracy.

\subsection{Benchmarks} We used two fair-ML benchmarks for our evaluation, namely \href{https://github.com/zykls/folktables#1}{folktables} and \href{https://github.com/propublica/compas-analysis}{COMPAS}.

Folktables~\citep{DBLP:conf/nips/DingHMS21} is constructed from census data from 50 US states for the years 2014--2018. We report results on the ACSEmployment task, a binary classification task of predicting whether an individual is employed, using data from Georgia from 2018. The dataset has 16 covariates, including age, schooling, and disability status, and contains about 200k samples, which we sub-sample down to 20k samples for computational feasibility.

COMPAS~\citep{compas_propublica} is perhaps the most influential dataset in fair-ML, released for public use by ProPublica as part of their seminal report titled ``Machine Bias.'' We use the binary classification task to predict violent recidivism. Covariates include sex, age, and information on prior criminal justice involvement. We use the version of COMPAS supported by \href{https://fairlearn.org/v0.4.6/auto_examples/plot_binary_classification_COMPAS.html}{FairLearn}. \textsf{FairLearn} loads the dataset pre-split into training and test. We merge these into a single dataset and then perform different random splits. We use the full dataset with 5,278 samples. 

\begin{table*}[t!]
    \caption{Demographic composition of folktables and COMPAS, reported as the proportion of samples in each group.}
    \centering
    \small 
    \begin{tabular}{|c|c|c|c|c|c|c|}
        \hline
          Dataset
          & sex\_race\_priv
          & sex\_race\_dis
          & sex\_priv
          & sex\_dis
          & race\_priv
          & race\_dis   \\
         \hline
         folktables & 0.322 & 0.177 & 0.484 & 0.516 &  0.661 & 0.339 \\
         \hline
         COMPAS & 0.083 & 0.491 & 0.188 & 0.812 & 0.404 &  0.596 \\
         \hline
    \end{tabular}
    \label{tab:protected-info}
\end{table*}

We define binary groups with respect to two features, sex and race. Males are the privileged group in folktables, while females are the privileged group in COMPAS. Whites are the privileged group in both folktables and COMPAS. We also look at intersectional groups constructed from sex and race: (male, white) and (female, black) are the intersectionally privileged and disadvantaged groups in folktables, respectively, while (female, white) and (male, black) are the intersectionally privileged and disadvantaged groups in COMPAS, respectively. The proportion of demographic groups in folktables and COMPAS is reported in Table~\ref{tab:protected-info}.
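
As a quick consistency check on Table~\ref{tab:protected-info}, the binary group proportions sum to one on each dataset, whereas the intersectional proportions do not, because samples that are privileged on one attribute and disadvantaged on the other belong to neither intersectional group. For folktables,
\begin{displaymath}
  0.484 + 0.516 = 1, \qquad 0.661 + 0.339 = 1, \qquad 1 - (0.322 + 0.177) = 0.501,
\end{displaymath}
and for COMPAS,
\begin{displaymath}
  0.188 + 0.812 = 1, \qquad 0.404 + 0.596 = 1, \qquad 1 - (0.083 + 0.491) = 0.426.
\end{displaymath}
If these proportions refer to the full 5,278-sample COMPAS dataset (an assumption made here for illustration), the intersectionally privileged group contains roughly $0.083 \times 5278 \approx 438$ samples, so a 10\% test split would hold only about 44 of them. Under the same assumption for the 20k-sample folktables subset, the intersectionally disadvantaged group contains roughly $0.177 \times 20000 = 3540$ samples.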

\subsection{Model Training} In our experiments, we evaluate the performance of six different models: Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), XG-Boosted Trees (XGB), K-Neighbors classifier (kNN), and a Neural Network (historically called the Multi-layer Perceptron, or MLP). In each run, we randomly split the dataset into train, test, and validation sets (80:10:10). We use the validation set to tune hyper-parameters once for each model type, for each dataset. We fit a single model on the complete train set and compute standard performance metrics (such as accuracy, TPR, FPR, TNR, and FNR) both on the overall test set and broken down by the demographic groups listed in Table~\ref{tab:protected-info}. Next, we use the bootstrap to construct 200 different versions of the training set (each with a size of $80\%$ of the full training set) and use these to train an ensemble of 200 predictors. We compute the variance metrics described in Section~\ref{sec:variance-metrics} on the outputs of this ensemble. We repeat this procedure for 10 different seeds on COMPAS and for 6 different seeds on folktables.
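
To make the scale of this procedure concrete, the following rough worked example may help. Sizes are approximate because the exact rounding depends on the implementation, and we assume here that ``the full training set'' refers to the 80\% train split. For COMPAS, with $N = 5278$ samples,
\begin{displaymath}
  N_{\mathrm{train}} \approx 0.8 \times 5278 \approx 4222, \qquad
  N_{\mathrm{boot}} \approx 0.8 \times 4222 \approx 3378.
\end{displaymath}
With $B = 200$ bootstrap predictors per model type, six model types, and the stated number of seeds, this amounts to $200 \times 6 \times 10 = 12{,}000$ ensemble members trained on COMPAS and $200 \times 6 \times 6 = 7{,}200$ on folktables.

As an illustrative sketch of how the ensemble outputs yield a variance estimate (the symbols $\hat{p}_b$, $\bar{p}$, and $\hat{\sigma}$ are notation introduced here; the definitions actually used are those of Section~\ref{sec:variance-metrics}), the standard deviation of the predictions for a test point $x$ can be written as
\begin{displaymath}
  \bar{p}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{p}_b(x), \qquad
  \hat{\sigma}(x) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat{p}_b(x) - \bar{p}(x)\bigr)^2},
\end{displaymath}
where $\hat{p}_b(x)$ is the output of the $b$-th bootstrap predictor. Whether the normalization uses $B$ or $B-1$ is an implementation detail that this sketch does not settle.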

\subsection{Experimental Results}
Our analysis focuses on four dimensions of model performance:
\begin{enumerate}
    \item Overall statistical bias: an \emph{accurate} model has low statistical bias on the full test set.
    \item Overall variance: a \emph{stable} model has low variance on the full test set.
    \item Disparity in statistical bias: a \emph{fair} model shows parity in statistical bias on \textsf{dis}\xspace and \textsf{priv}\xspace groups.
    \item Disparity in variance: a \emph{uniformly stable} model shows parity in variance on \textsf{dis}\xspace and \textsf{priv}\xspace groups.
\end{enumerate}

The overall statistical bias and variance of different models are presented in Figure~\ref{fig:folk_metrics} for folktables and Figure~\ref{fig:compas_metrics} for COMPAS. Standard deviation (Std), inter-quantile range (IQR), jitter, and label stability are measures of estimator variance, while accuracy, TPR, FPR, TNR, and FNR are measures of statistical bias.  We report all parity-based measures in Table~\ref{tab:folk-metrics} for folktables and Table~\ref{tab:COMPAS-metrics} for COMPAS.


Cells are colored according to the following scheme: cells with values close to parity (0 for difference measures, 1 for ratio measures) are in green. Cells that report discrimination (i.e.,\xspace disparity in favor of the \textsf{priv}\xspace group) are in pink, while those that report reverse discrimination (i.e.,\xspace disparity in favor of the \textsf{dis}\xspace group) are in yellow. The positive class in folktables (positive employment status) is desirable, whereas the positive class in COMPAS (positive risk of recidivism) is undesirable, and so we flip the coloring scheme across datasets. For variance metrics, cells that show larger instability on the \textsf{priv}\xspace group than on the \textsf{dis}\xspace group are in yellow, and those with larger instability on the \textsf{dis}\xspace group than on the \textsf{priv}\xspace group are in pink. A summary of desirable behavior on our metrics, and the corresponding color scheme, is presented in Table~\ref{tab:color-table}.

\begin{table}
    \centering
    \caption{Summary of desirable behavior and coloring scheme on different metrics. Pink represents discrimination, yellow represents reverse-discrimination.}
    \begin{tabular}{|c|c|c|c|}
    \hline
         Metric name & Value & folktables & COMPAS  \\
         \hline
         Accuracy Parity & $>0$ & \colorbox{yellow}{       } & \colorbox{yellow}{        } \\
         Equalized Odds FPR & $>0$ & \colorbox{pink}{       } & \colorbox{pink}{        } \\
         Statistical Parity Difference & $>0$ & \colorbox{yellow}{       } & \colorbox{pink}{        } \\
         Disparate Impact & $>1$ & \colorbox{yellow}{       } & \colorbox{pink}{        } \\
         IQR Parity & $>0$ & \colorbox{pink}{       } & \colorbox{pink}{        } \\
         Jitter Parity & $>0$ & \colorbox{pink}{       } & \colorbox{pink}{        } \\
         Std Parity & $>0$ & \colorbox{pink}{       } & \colorbox{pink}{        } \\
         Label Stability Ratio & $>1$ & \colorbox{yellow}{       } & \colorbox{yellow}{        } \\
         \hline
    \end{tabular}
    \label{tab:color-table}
\end{table}


\begin{figure*}[h!]
    \centering
    \includegraphics[width=\linewidth]{folk-all-metrics.png}
    \caption{Statistical bias and variance metrics on folktables.}
    \label{fig:folk_metrics}
\end{figure*}

\input{folk-table.tex}

\subsection{The Fairness-Variance-Accuracy Trade-off}

Overall, as expected, ensemble models (Random Forest and XGBoost) are the most stable on all metrics and all datasets. Generally, the kNN and Decision Tree classifiers score highly on variance metrics (i.e.,\xspace are the least stable). The neural network (MLP) is stable on COMPAS, but is the least stable model on folktables! This is counter-intuitive given the general understanding that estimator variance decreases as dataset size grows: folktables has 20k samples, while COMPAS has only 5k samples. From a statistical bias perspective, all models perform poorly on COMPAS (no model has accuracy higher than $68\%$).

\subsubsection{Folktables} 
The MLP classifier and Random Forest are the best performing models on folktables, with accuracies of $82.16\%$ and $82.3\%$, respectively. Random Forest is also one of the most stable models (low Std, low IQR, low Jitter, and high Label Stability), while MLP is one of the least stable models (both MLP and kNN have high Std, IQR and Jitter, and low Label Stability).

From a fairness perspective, the MLP classifier and Logistic Regression perform the best. Logistic Regression is not the best model on overall metrics, but has good parity on both statistical bias-based and variance-based metrics on folktables, as reported in Table~\ref{tab:folk-metrics}. This is the first indication of a fairness-variance-accuracy trade-off: parity in variance and parity in statistical bias (``fairness'') come at the cost of overall model accuracy.

Strikingly, the MLP is also a reasonably fair model: its Statistical Parity Difference and Disparate Impact are close to parity (0 and 1, respectively), despite its low overall stability and large disparity in variance-based metrics across groups. We argue that this is a feature and not a bug, and is, once again, the fairness-variance-accuracy trade-off at play: the classifier shows a larger variation in outputs on \textsf{dis}\xspace than on \textsf{priv}\xspace, and this has a corrective effect on both the overall fairness and accuracy. Here, we are trading off stability/variance to gain fairness and accuracy.

The behavior of the Random Forest classifier also illustrates this trade-off: as mentioned previously, the Random Forest has the highest accuracy of all the models. From Table~\ref{tab:folk-metrics} we see that this classifier also shows good parity on almost all variance metrics. This, however, comes with model unfairness (large disparity in statistical bias)! On metrics that relate to model error (such as accuracy parity and equalized odds) the model is ``unfair'', in the sense that it discriminates against the \textsf{dis}\xspace group. However, on metrics that track selection rates (such as statistical parity and disparate impact) the Random Forest classifier shows reverse discrimination, in the sense that it over-selects the \textsf{dis}\xspace group. Here, the model trades off fairness on the one hand for high accuracy and parity in variance on the other hand.

There is no observable consistent trend in terms of fairness or stability for the kNN and XGBoost classifiers, and perhaps their lower overall accuracy compared to other models can also be explained by a sub-optimal trade-off on the fairness-variance-accuracy spectrum. 

\begin{figure*}[h!]
    \centering
    \includegraphics[width=\linewidth]{compas-all-metrics.png}
    \caption{Statistical bias and variance metrics on COMPAS}
    \label{fig:compas_metrics}
\end{figure*}

\input{compas-table.tex}

\subsubsection{COMPAS}
As described previously, none of the models in our experiments are particularly accurate on COMPAS. As expected, we also do not find these models to be particularly fair along any of the sensitive attributes, or for any of the fairness metrics. XGBoost is the most accurate model, and it does show parity for a handful of the bias-based metrics (Statistical Parity Difference and Disparate Impact, both along the lines of sex) and variance-based metrics (IQR Parity, Jitter Parity, and Label Stability Ratio). For a classifier that has low overall accuracy, stability (low variance) and uniform stability (parity in variance) negate any potentially corrective effect estimator variance could have had, and result in model unfairness.

Interestingly, the Decision Tree is the most ``fair'' model on COMPAS: it is close to having parity in accuracy across all groups and has the best parity in bias-based metrics for intersectional groups. Further, unlike the XGBoost classifier on COMPAS, the Decision Tree is far from having parity in variance and, in fact, has higher variance on the \textsf{priv}\xspace group. Here, we see the corrective effect of estimator variance on the ``fairness'' of an inaccurate model: the Decision Tree has low accuracy, even compared to the other poorly performing models, but its disparity in variance seems to improve the parity in statistical bias-based measures.

\subsection{Comparing variance metrics}
\label{sec:metrics-comparison}

\begin{figure*}[h!]
    \centering
    \includegraphics[width=\linewidth]{folk-scatter-models-20k.png}
    \caption{folktables (20k samples): Relationship between different variance metrics. Y=X line is plotted in blue}
    \label{fig:metrics-scatter-folk-20k}
\end{figure*}

\begin{figure*}[h!]
    \centering
    \includegraphics[width=\linewidth]{folk-scatter-models-5k.png}
     \vspace{-0.75cm}
    \caption{folktables (5k samples): Relationship between different variance metrics. Y=X line is plotted in blue}
    \label{fig:metrics-scatter-folk-5k}
\end{figure*}

\begin{figure*}[h!]
    \centering
    \includegraphics[width=\linewidth]{compas-scatter-models.png}
     \vspace{-0.75cm}
    \caption{COMPAS (5k samples): Relationship between different variance metrics. Y=X line is plotted in blue}
    \label{fig:metric-scatter-compas}
\end{figure*}

In our last set of experiments, we examined two families of complementary variance metrics: (1) Standard deviation (Std) and Inter-Quantile Range (IQR) track the spread of the predicted probabilities, while (2) Jitter and Label Stability track how often the predicted label flips. We compare how ``good'' (i.e.,\xspace informative) these different measures of estimator variance are in Figures~\ref{fig:metrics-scatter-folk-20k} and~\ref{fig:metrics-scatter-folk-5k} for folktables, and in Figure~\ref{fig:metric-scatter-compas} for COMPAS.

The behavior of variance metrics seems to be both model-specific and dataset-specific. The MLP classifier (shown in red dots in Figure~\ref{fig:metrics-scatter-folk-20k} and in green dots in Figure~\ref{fig:metrics-scatter-folk-5k}) falls approximately on the $Y=X$ line (plotted in blue in both figures). This means that IQR, Jitter and Std are highly correlated, and so can be used interchangeably for this model. Extending our earlier discussion of the instability of the MLP classifier on folktables: we see high instability in both the model that was trained on 5k samples (Figure~\ref{fig:metrics-scatter-folk-5k}) and the model that was trained on 20k samples (Figure~\ref{fig:metrics-scatter-folk-20k}). As expected, estimator variance decreases as the sample size increases: the MLP classifiers trained with 20k and 5k samples have a maximum (worst-case) IQR of approximately 0.15 and 0.20, respectively.



We see interesting trends in estimator variance on the COMPAS dataset: in Figure~\ref{fig:metric-scatter-compas}, the variance metrics reported for different model types form clusters that are almost constant along one dimension. For the same value of IQR, models have a range of values of Jitter (left-most subplot), and for the same value of Std, models have a range of values of Jitter (second plot from the left). We do not observe this behavior when considering Label Stability: the metrics of different models do form clusters, but they do not stay constant along one dimension. This suggests that, while we may be tempted to treat the variance metrics interchangeably, we do need to look at them as a set, since reporting them on different benchmark datasets (such as COMPAS here) could lead to some metrics appearing to be redundant, despite being informative in a different context (such as on folktables in Figures~\ref{fig:metrics-scatter-folk-20k} and~\ref{fig:metrics-scatter-folk-5k}).



\section{Metrics for Model Performance}
\label{sec:fairness}  

The literature on fair-ML abounds with measures of model ``fairness''. Here, we first summarize some influential fairness measures that are stated as the ratio or the difference of measures of statistical bias computed on different demographic groups in Section~\ref{sec:bias-metrics}, and go on to define a new family of variance-based measures in Section~\ref{sec:variance-metrics}.

\paragraph{Notation.} Let $Y$ be the target or true outcome, $X$ be the covariates or features, and $A$ be the set of protected/sensitive attributes. To start, we limit our treatment to binary group membership, letting $A=1$ denote the privileged group and $A=0$ denote the disadvantaged group. We are interested in constructing an estimator $\hat{Y} = f(X,A)$ that predicts $Y$, with the help of a suitable loss function. In fair-ML, we apply additional constraints on the interaction between $\hat{Y}$, $Y$ and $A$ in order to ensure that the estimator $\hat{Y}$ does not discriminate on the basis of sensitive attributes $A$. Different notions of fairness are formalized as different constraints, and a violation of the fairness constraint is usually defined as the corresponding measure of model unfairness, as we will discuss next.

\subsection{Measures of Model (Un)Fairness}
\label{sec:bias-metrics}

\subsubsection{Equalized Odds}
The fairness criterion of Equalized Odds from \citet{hardt_EOP2016} is defined as:
$$ P(\hat{Y}=1|A=0,Y=y)=P(\hat{Y}=1|A=1,Y=y), y \in \{0,1\} $$

For $Y = 1$ (the positive outcome), this fairness constraint requires parity in true positive rates (TPR) across the groups $A = 0$ and $A = 1$, and for $Y = 0$ (the negative outcome), the constraint requires parity in false positive rates (FPR). A violation of this constraint (i.e.,\xspace the disparity in TPR and FPR across groups) is reported as a measure of model unfairness. In our paper, we refer to TPR and FPR as the \emph{base measures}, and we say that the fairness criterion of Equalized Odds is \emph{composed} as the difference between these base measures computed for the disadvantaged group ($A=0$, which we call \textsf{dis}\xspace) and for the privileged group ($A=1$, which we call \textsf{priv}\xspace), respectively.
$$\text{Equalized Odds Violation (True Positive)} = \Delta\text{TPR} = P(\hat{Y}=1|A=0,Y=1) -P(\hat{Y}=1|A=1,Y=1) $$
$$\text{Equalized Odds Violation (False Positive)} = \Delta\text{FPR} = P(\hat{Y}=1|A=0,Y=0)- P(\hat{Y}=1|A=1,Y=0) $$

We will now rewrite other influential fairness measures~\cite{chouldechova_impossibility, Kleinberg_impossibility} as the difference or ratio between the base measures on the \textsf{dis}\xspace and \textsf{priv}\xspace groups.\footnote{$\Delta f = f_{dis} - f_{priv}$, $\mathcal{Q} f = f_{dis} / f_{priv}$}

\subsubsection{Disparate Impact}

Inspired by the four-fifths rule in legal doctrine, Disparate Impact has been formulated as a fairness measure:
$$ \text{Disparate Impact} = \mathcal{Q}(\text{Positive Rate}) =  \frac{P(\hat{Y}=1|A=0)}{P(\hat{Y}=1|A=1)} $$

$P(\hat{Y}=1)$ is simply the Positive Rate of the estimator, and so the measure of Disparate Impact is composed as the ratio of the Positive Rate on the \textsf{dis}\xspace and \textsf{priv}\xspace groups, respectively. 

\subsubsection{Statistical Parity Difference}
Similarly, Statistical Parity is the fairness criterion that asks that comparable proportions of samples from each protected group receive the positive outcome:
$$ P(\hat{Y}=1|A=0) = P(\hat{Y}=1|A=1) $$

Statistical parity difference (SPD) is a popular fairness metric composed simply as the difference between the estimator's Positive Rate on \textsf{dis}\xspace and \textsf{priv}\xspace groups, respectively.
$$ \text{Statistical Parity Difference} = \Delta(\text{Positive Rate}) = P(\hat{Y}=1|A=0) - P(\hat{Y}=1|A=1) $$

\subsubsection{Accuracy Parity}
Accuracy parity is also commonly reported, and is computed as the difference in accuracy on \textsf{dis}\xspace and \textsf{priv}\xspace samples.
$$ \text{Accuracy Parity} = \Delta(\text{Accuracy}) = P(\hat{Y}=Y|A=0) - P(\hat{Y}=Y|A=1) $$
where $P(\hat{Y}=Y|A=a) = P(\hat{Y}=1,Y=1|A=a) + P(\hat{Y}=0,Y=0|A=a)$ is the accuracy of the estimator on group $A=a$, that is, the fraction of samples from that group that are true positives or true negatives.
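
To make the composition of parity measures from base measures concrete, the following is a minimal Python sketch that computes the measures of this section from lists of true labels, predicted labels and binary group membership ($0$ for \textsf{dis}\xspace, $1$ for \textsf{priv}\xspace). The function names (\verb|base_measures|, \verb|parity_metrics|) are ours and purely illustrative; they are not the interface of any library. The sketch assumes that both groups are non-empty.

\begin{verbatim}
def rate(flags):
    """Fraction of True values; 0.0 for an empty collection."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0

def base_measures(y_true, y_pred):
    """Base measures for one group (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TPR": rate(p == 1 for t, p in pairs if t == 1),
        "FPR": rate(p == 1 for t, p in pairs if t == 0),
        "PositiveRate": rate(p == 1 for _, p in pairs),
        "Accuracy": rate(p == t for t, p in pairs),
    }

def split(y_true, y_pred, group, value):
    """Keep only the samples whose group membership equals value."""
    t = [yt for yt, a in zip(y_true, group) if a == value]
    p = [yp for yp, a in zip(y_pred, group) if a == value]
    return t, p

def parity_metrics(y_true, y_pred, group):
    """Compose parity measures as dis minus priv, or dis over priv."""
    dis = base_measures(*split(y_true, y_pred, group, 0))
    priv = base_measures(*split(y_true, y_pred, group, 1))
    return {
        "Equalized Odds TPR": dis["TPR"] - priv["TPR"],
        "Equalized Odds FPR": dis["FPR"] - priv["FPR"],
        "Statistical Parity Difference":
            dis["PositiveRate"] - priv["PositiveRate"],
        # Undefined when the priv Positive Rate is zero (not handled here).
        "Disparate Impact": dis["PositiveRate"] / priv["PositiveRate"],
        "Accuracy Parity": dis["Accuracy"] - priv["Accuracy"],
    }
\end{verbatim}

Difference measures are at parity when they equal $0$, and the ratio measure is at parity when it equals $1$.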

\subsection{Measures of Model (In)Stability}
\label{sec:variance-metrics}

We now introduce several variance-based metrics. We first present the base measures of model variance, as this requires some reconciliation with the uncertainty quantification and robustness literature, and then define the corresponding parity measures of model instability across groups.

To measure the variation in model output, we will use the popular bootstrap technique~\cite{efron1994bootstrap}. This involves constructing multiple training sets by sampling with replacement from the given training set, and then
fitting estimators (with the same architecture and hyper-parameters) on these bootstrapped training sets. This allows us to construct a predictive distribution from the outputs of each trained model, instead of a single point estimate of the predicted probability. We can thereby compute different measures of variation between the predictions of the ensemble of estimators for the same data point, and use it to approximate the variance of a single model trained on the full dataset. We provide more details of our implementation of these techniques in Sections~\ref{sec:library} and~\ref{sec:experiments}. 
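
The bootstrap procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in our experiments: the helper names are hypothetical, and \verb|fit_threshold_model| is a toy one-dimensional classifier that stands in for any estimator with fixed architecture and hyper-parameters.

\begin{verbatim}
import math
import random

def bootstrap_sample(X, y, rng, fraction=1.0):
    """Sample fraction * len(X) training points with replacement."""
    n = max(1, int(fraction * len(X)))
    idx = [rng.randrange(len(X)) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_threshold_model(X, y):
    """Toy stand-in estimator: logistic score around the midpoint
    of the two class means of a single numeric feature."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        # Degenerate resample: always predict the only class seen.
        const = 1.0 if pos else 0.0
        return lambda x: const
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x - mid)))

def bootstrap_predictions(fit, X_train, y_train, X_test,
                          n_estimators=50, seed=0):
    """Return one row of predicted probabilities per estimator."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        Xb, yb = bootstrap_sample(X_train, y_train, rng)
        model = fit(Xb, yb)
        preds.append([model(x) for x in X_test])
    return preds
\end{verbatim}

Each column of the returned matrix is the predictive distribution for one test sample, from which the variance measures below can be computed.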

\subsubsection{Label Stability}

Label Stability \cite{Darling2018TowardUQ} is defined as the normalized absolute difference between the number of times a sample is classified as positive or negative:

$$ \text{Label Stability} = \frac{\left|\sum_{i=1}^{b} \mathbbm{1}[p_{\theta_{i}}(x)=1] - \sum_{i=1}^{b} \mathbbm{1}[p_{\theta_{i}}(x)=0]\right|}{b} $$

where $x$ is an unseen test sample, and $p_{\theta_{i}}(x)$ is the prediction of the $i^{\text{th}}$ model in the ensemble that has $b$ estimators. 

Recall that we are using the bootstrap to construct an ensemble of predictors to approximate the variance of a single estimator fit on the entire dataset. Label Stability captures the level of agreement between the estimators in the ensemble: if the absolute difference is large, most estimators agree and the label is more stable. If the difference is exactly zero, then the estimator is said to be ``highly unstable'', because a test sample is equally likely to be classified as positive or negative by the ensemble.


We define the \textbf{Label Stability Ratio} as a new parity measure. It is computed as the ratio of the average Label Stability on samples from the disadvantaged (\textsf{dis}\xspace) group and the privileged (\textsf{priv}\xspace) group, respectively.
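
As a minimal sketch of these two definitions, assume that the labels predicted by the ensemble are stored as a matrix with one row per estimator and one column per test sample, and that group membership is encoded as $0$ for \textsf{dis}\xspace and $1$ for \textsf{priv}\xspace. The function names are illustrative only.

\begin{verbatim}
def label_stability(labels):
    """labels: the 0/1 labels that the b estimators assign
    to a single test sample."""
    b = len(labels)
    positives = sum(1 for label in labels if label == 1)
    negatives = b - positives
    return abs(positives - negatives) / b

def label_stability_ratio(label_matrix, group):
    """Average Label Stability on dis divided by that on priv."""
    per_sample = [label_stability(col) for col in zip(*label_matrix)]
    dis = [s for s, a in zip(per_sample, group) if a == 0]
    priv = [s for s, a in zip(per_sample, group) if a == 1]
    return (sum(dis) / len(dis)) / (sum(priv) / len(priv))
\end{verbatim}

A ratio below $1$ indicates that labels are, on average, less stable on the \textsf{dis}\xspace group than on the \textsf{priv}\xspace group.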

\subsubsection{Jitter}
Jitter \cite{liu2022model} is a measure of the disparities of the model's predictions for each individual test example. It reuses a notion of \emph{Churn} \cite{milani2016launch} to define a ``pairwise jitter'':

$$
J_{i, j}\left(p_\theta\right)=\operatorname{Churn}_{i, j}\left(p_\theta\right)=\frac{\left|\left\{x \in X : p_{\theta_{i}}(x) \neq p_{\theta_{j}}(x)\right\}\right|}{|X|}
$$

where $X$ here denotes the set of unseen test samples, and $p_{\theta_{i}}(x)$, $p_{\theta_{j}}(x)$ are the predictions of the $i^{\text{th}}$ and $j^{\text{th}}$ estimators in the ensemble for a test sample $x \in X$, respectively.

To compute the variability over all models in the ensemble, we need to average \textit{pairwise jitters} over all pairs of models. This more general definition is called \emph{Jitter}:

$$J\left(p_\theta\right)=\frac{\sum_{1 \leq i<j \leq N} J_{i, j}\left(p_\theta\right)}{N \cdot(N-1) \cdot \frac{1}{2}} \text{, where $N$ is the number of estimators in the ensemble}$$

We define \textbf{Jitter Parity} as the difference of the average Jitter on samples from the \textsf{dis}\xspace and \textsf{priv}\xspace groups, respectively.
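
The definitions of pairwise jitter, Jitter and Jitter Parity can be sketched as follows, again assuming a label matrix with one row per estimator and one column per test sample. The function names are hypothetical, and computing group-wise Jitter by restricting the test set to the samples of each group is our reading of the definition above rather than a prescribed procedure.

\begin{verbatim}
from itertools import combinations

def pairwise_jitter(labels_i, labels_j):
    """Churn: fraction of test samples on which two estimators
    predict different labels."""
    differing = sum(a != b for a, b in zip(labels_i, labels_j))
    return differing / len(labels_i)

def jitter(label_matrix):
    """Average pairwise jitter over all N(N-1)/2 unordered pairs."""
    pairs = list(combinations(label_matrix, 2))
    return sum(pairwise_jitter(a, b) for a, b in pairs) / len(pairs)

def jitter_parity(label_matrix, group):
    """Jitter on dis samples minus Jitter on priv samples."""
    def restrict(value):
        keep = [k for k, a in enumerate(group) if a == value]
        return [[row[k] for k in keep] for row in label_matrix]
    return jitter(restrict(0)) - jitter(restrict(1))
\end{verbatim}

A positive Jitter Parity means that predicted labels flip more often on the \textsf{dis}\xspace group than on the \textsf{priv}\xspace group.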

\subsubsection{Standard Deviation and Inter-Quantile Range}

The bootstrap can also be used to compute the standard deviation (Std) and the inter-quantile range (IQR) of the predicted probabilities of the ensemble, as an approximation of the spread in predictions of a single model trained on the full dataset.

We compute the standard deviation (Std) and IQR on different groups (\textsf{dis}\xspace and \textsf{priv}\xspace), and compose the group-wise difference as \textbf{Std Parity} and \textbf{IQR Parity}, respectively.
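
A minimal sketch of Std Parity and IQR Parity is given below. It assumes a matrix of predicted probabilities with one row per estimator and one column per test sample, computes the spread per test sample across the ensemble, and then averages it within each group. The choice of the population standard deviation and of the inclusive quantile method are assumptions made for illustration, as are the function names.

\begin{verbatim}
import statistics

def sample_std(probs):
    """Population standard deviation of the ensemble's predictions."""
    return statistics.pstdev(probs)

def sample_iqr(probs):
    """Difference between the third and first quartiles."""
    q1, _, q3 = statistics.quantiles(probs, n=4, method="inclusive")
    return q3 - q1

def group_mean(values, group, value):
    """Average of the per-sample values within one group."""
    selected = [v for v, a in zip(values, group) if a == value]
    return sum(selected) / len(selected)

def spread_parity(prob_matrix, group, spread):
    """Average spread on dis minus average spread on priv."""
    # zip(*prob_matrix) yields the b predictions for each test sample.
    values = [spread(col) for col in zip(*prob_matrix)]
    return group_mean(values, group, 0) - group_mean(values, group, 1)
\end{verbatim}

Std Parity corresponds to calling \verb|spread_parity| with \verb|sample_std|, and IQR Parity to calling it with \verb|sample_iqr|.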

We will empirically demonstrate the usefulness of these variance-based metrics, especially where statistical bias-based measures fail to provide a complete picture of model performance, in Section~\ref{sec:experiments}. We will use a software library we developed to support this empirical analysis, and describe it next.


\section{Introduction}
\label{sec:intro}

The error of an estimator can be decomposed into a (statistical) bias term, a variance term, and an irreducible noise term. When we do bias analysis, formally we are asking the question: ``how \emph{good} are the predictions?'' The role of bias in the error decomposition is clear: if we trust the labels/targets, then we would want the estimator to have as low bias as possible, in order to minimize error. Fair machine learning (fair-ML) is concerned with the question: ``Are the predictions \emph{equally good} for different demographic or socioeconomic groups?'' One way to define \emph{equally good} is to require that the statistical bias of the estimator on samples from different demographic groups should be comparable.  In other words, unbiasedness is a fairness desideratum if we trust the data labels. This has naturally led to a variety of proposed fairness metrics, usually defined as the difference or the ratio of a measure of statistical bias (such as the True Positive Rate or the True Negative Rate) computed on different test subsets --- corresponding to socially privileged and socially disadvantaged  groups, respectively.

A complementary statistical question concerns the variance of the estimator. When we do variance analysis, formally we are asking the question: ``How \emph{stable} are the predictions?'' The role of variance in the error decomposition is subtle: it is unclear whether low variance is always a desirable property. For example, in a biased estimator --- whose predictions deviate from the true value --- high variance can have a corrective effect on some samples.
From a philosophical perspective, randomness is morally neutral, and so the effects of large variance can be morally more acceptable (and fairness-enhancing) than the effects of a systematic skew. Randomization is, of course, already used in algorithmic fairness research~\cite{dwork_awareness}, and it is an essential building block of the differential privacy framework~\cite{DBLP:journals/cacm/Dwork11}.

In this paper, our goal is to understand the role of estimator variance from a fairness
perspective --- what behavior of estimator variance is morally desirable? The dominant belief in fair-ML is that both stability and fairness are simultaneously desirable, that is, that we want to construct estimators that have parity in statistical bias across groups (are ``fair'') and have low variance (are ``stable'')~\cite{huang2019stable, friedler2019comparative}. Our insights provide a more nuanced picture of the stability desideratum in fair-ML and uncover a novel fairness-variance-accuracy trade-off.



\paragraph{\textbf{Contributions:}}


\begin{enumerate}
\item
We propose a new family of performance measures based on group-wise parity in variance in Section~\ref{sec:fairness}, and demonstrate their usefulness on folktables~\cite{DBLP:conf/nips/DingHMS21} and COMPAS~\cite{compas_propublica} benchmarks in Section \ref{sec:experiments}.


\item We clarify the relationship between fairness and stability: If a model is fair (in the sense of exhibiting low disparity in statistical bias), then we also desire it to be stable (in the sense of exhibiting overall low variance). However, instability (high variance) does not imply unfairness (high disparity in statistical bias)! Indeed, as we show empirically in Section~\ref{sec:experiments}, there is a fairness-variance-accuracy trade-off, where:

\begin{itemize}
\item[(i)] Parity in variance and parity in statistical bias (``fairness'') can come at the cost of overall model accuracy, as we demonstrate empirically using the logistic regression model for the ACSEmployment task on the folktables benchmark~\cite{DBLP:conf/nips/DingHMS21}.  

\item[(ii)] Variance can have a corrective effect on both fairness and the overall accuracy for models that have reasonably high overall accuracy. For example, we observe the MLP classifier on the ACSEmployment task on folktables and Decision Tree on COMPAS~\cite{compas_propublica} trading off model stability to gain parity in statistical bias.
 
\item[(iii)] Conversely, for a classifier that has low overall accuracy, attempting to improve overall stability (low variance) and parity in stability across groups (parity in variance) negates any potentially corrective effect of estimator variance, and thereby leads to model unfairness.  We observe this empirically for the XGBoost classifier on COMPAS.
 \end{itemize}


\item We develop and publicly release a software library called
\textsf{Virny}\xspace\footnote{\url{https://github.com/DataResponsibly/Virny}}
that reconciles uncertainty quantification techniques with fairness analysis/auditing frameworks. Using this library, it is easy to measure stability (estimator variance) and fairness for several protected groups, and their intersections. We use this library in our own empirical analysis.

\end{enumerate}

\paragraph{\textbf{Roadmap.}} In Section~\ref{sec:fairness}, we present metrics for model performance. We first review influential fairness metrics, expressing them as ratios or differences of measures of statistical bias (Section~\ref{sec:bias-metrics}). Next, we introduce several measures of estimator variance from the robustness and uncertainty quantification literature, and propose a new family of performance metrics, expressed as the difference or ratio of these variance metrics (Section~\ref{sec:variance-metrics}). This reconciles statistical bias-based and variance-based analysis of parity in estimator performance on different subgroups, and provides a richer picture of algorithmic discrimination, as we empirically demonstrate in Section~\ref{sec:experiments}. In Section~\ref{sec:library}, we introduce a new software library, \textsf{Virny}\xspace, to compute statistical bias and variance metrics on subgroups of interest, and to compose parity-based performance metrics from them. In Section~\ref{sec:experiments}, we report our empirical findings on folktables and COMPAS benchmarks, introduce the fairness-variance-accuracy trade-off, and give a critical comparison of our proposed variance metrics with existing metrics. We conclude and discuss avenues for exciting future work in Section~\ref{sec:discussion}.
\section{Motivation}
\label{sec:motivation}

Heteroskedastic noise --- noise variance tracks protected group membership. Bias variance tradeoffs in groups. \todo{TODO: Forward reference C-IID work.} 

Fairness and the need to measure fairness: a model does not perform equally well on all parts of the input space, so we define fairness metrics as ratios or differences between performance metrics computed on different test subsets, corresponding to demographic groups. Intuitively, it makes sense to define unfairness as disparity. This has led to some good insights \todo{(cite important work here)}; it turns out that different performance metrics trade off against each other.

But, reader, perhaps your statistical senses are tingling\ldots{} Plotting how well the estimator performs for different groups is only one part of the story: the bias part. In this paper we tell the variance story: we introduce a new family of ``fairness'' metrics, computed as ratios and differences of several measures of variation in model outputs.

Unfortunately, there are several sources of uncertainty, and it is not immediately obvious how to quantify their individual effects. So, we instead identify stages in the model life-cycle that can introduce uncertainty, intervene on them one stage at a time while keeping all others constant, and measure the uncertainty propagated to the model outputs.

We also compute standard fairness metrics under each intervention. This reconciles bias-based and variance-based measures of fairness, and provides a richer picture of algorithmic discrimination.

\textbf{Contributions:}
\begin{enumerate}
    \item We introduce a new family of variance-based fairness measures, and demonstrate how it complements bias-based fairness measures in the literature through experiments on benchmark datasets.
    \item We also make a significant contribution to the literature that aims to study the fairness of ML models through a lifecycle view. To the best of our knowledge, ours is the first systematic empirical study to quantify uncertainty at different stages of the model lifecycle.
\end{enumerate}
\section{The \textsf{Virny}\xspace software library}
\label{sec:library}


In order to reconcile the reporting of statistical bias-based and variance-based performance measures discussed in Section~\ref{sec:fairness}, we developed \textsf{Virny}\xspace\footnote{\emph{\textsf{Virny}\xspace} is a Ukrainian word meaning faithful, true or reliable. $\#$ScienceForUkraine}, a Python library for auditing model stability and fairness. The \textsf{Virny}\xspace library was developed based on three fundamental principles: 1) easy extensibility of model analysis capabilities; 2) compatibility with user-defined/custom datasets and model types; 3) simple composition of parity metrics based on the context of use.

\textsf{Virny}\xspace decouples model auditing into several stages, including: subgroup metrics computation, group metrics composition, and metrics visualization and reporting. This gives data scientists and practitioners more control and flexibility to use the library for both  model development and monitoring post-deployment.

\subsection{Comparison with existing fairness libraries}
Many toolkits dedicated to measuring bias and fairness have been released in the past couple of years. The majority of these toolkits are easily extensible, can measure a list of fairness metrics, and create detailed reports and visualizations. For example, \textsf{AI Fairness 360}~\cite{bellamy2018ai} is an extensible Python toolkit for fairness researchers and industry practitioners that can detect, explain, and mitigate unwanted algorithmic bias. \textsf{Aequitas}~\cite{saleiro2018aequitas} is another fairness auditing toolbox for both data scientists and policymakers. It concentrates on detailed explanations of how it should be used in a public policy context, including a ``Fairness Tree'' that guides the user to select suitable fairness metrics for their decision-making context.
\textsf{FairLearn}~\cite{bird2020fairlearn} provides an interactive visualization dashboard and implements unfairness mitigation algorithms. These components help with navigating trade-offs between fairness and model performance.

Similarly, \textsf{LiFT}~\cite{vasudevan2020lift} can measure a set of fairness metrics, but additionally, it focuses on scalable metric computation for large ML systems. Its authors show how bias measurement and mitigation tools can be integrated with production ML systems and, at the same time, how to enable monitoring and mitigation at each stage of the ML lifecycle. Finally, \textsf{fairlib}~\cite{han2022fairlib} implements a broad range of bias mitigation approaches and supports the analysis of neural networks for complex computer vision and natural language processing tasks. In addition, the analysis module of \textsf{fairlib} provides an interactive model comparison to explore the effects of different mitigation approaches.

\textsf{Virny}\xspace distinguishes itself from the existing libraries in three key aspects. First, our software library instantiates our conceptual contribution, allowing the data scientist to understand the role of estimator variance in assessing model fairness.
\textsf{Virny}\xspace supports the measurement of both statistical bias and variance metrics for a set of initialized models, both overall on the full test set and broken down by user-defined subgroups of interest.

Second, \textsf{Virny}\xspace provides several APIs for metrics computation, including an interface for the analysis of a set of initialized models based on multiple executions and random seeds. This interface enables a  detailed model audit, and supports reliable and reproducible analysis of model performance. 

Third, our library allows data scientists to specify multiple sensitive attributes, as well as their intersections, for analysis.  For example, \textsf{Virny}\xspace can audit statistical bias and variance with respect to all of the following simultaneously: \textsf{sex}, \textsf{race}, \textsf{age}, \textsf{sex\&race}, and \textsf{race\&age}.  
We also support the definition of non-binary sensitive attributes (although we limit our experimental evaluation in Section~\ref{sec:experiments} to binary groups). We hope that this flexibility in selecting datasets, models, metrics and subgroups of interest will help usher in an era of research where measuring and reporting a variety of metrics of model performance on different subgroups is the norm, and not a specialized research interest of the few.

\subsection{Architecture}

\begin{figure*}[h!]
    \centering
    \includegraphics[width=\linewidth]{library-architecture.png}
     \vspace{-0.75cm}
    \caption{\textsf{Virny}\xspace Architecture}
    \label{fig:library_diagram}
\end{figure*}

Figure \ref{fig:library_diagram} shows how \textsf{Virny}\xspace constructs a pipeline for model analysis. Pipeline stages are shown in blue, and the output of each stage is shown in purple.
Each analysis pipeline has three processing stages: subgroup metrics computation, group metrics composition, and metrics visualization and reporting. We will now describe each of them. 

\subsubsection{Inputs}
To use \textsf{Virny}\xspace, the user needs to provide three inputs, namely:
\begin{itemize}
    \item A \textsf{dataset class} is a wrapper for the user's dataset that includes its descriptive attributes, such as a target column, numerical columns, categorical columns, etc\xspace. This class must inherit from the \textsf{BaseDataset} class, which was created for user convenience. The idea behind having a common base class is to standardize raw dataset pre-processing and feature creation and to simplify the logic for downstream metric computation.
    
    \item A \textsf{config yaml} is a file that specifies the configuration parameters for \textsf{Virny}\xspace's user interfaces for metrics computation. We adopt this user-specified configuration approach to give users more flexibility. For instance, users can easily shift from one experiment to another, having just one config yaml per experiment, without having to make any further modifications before using \textsf{Virny}\xspace's user interfaces.

    The config file contains information such as the number of bootstrap samples to create (this is the number of estimators in our ensemble for variance analysis), the fraction of samples in each bootstrap sample, a list of random seeds, etc\xspace. Importantly, we ask the user to specify subgroups of interest in the dataset by simply passing a dictionary where key-value pairs specify the relevant column names and the values of the sensitive attribute of the groups of interest. Users can also specify intersectional groups here. 

   
    
    \item Finally, a \textsf{models config} is a Python dictionary, where keys are model names and values are initialized models for analysis. This dictionary helps conduct audits of multiple models for one or multiple runs and analyze different types of models.
\end{itemize}


\subsubsection{Subgroup metric computation} After the variables are input to a user interface, \textsf{Virny}\xspace creates a \textsf{generic pipeline} based on the input dataset class to hide pre-processing complexity (such as one-hot encoding categorical columns, scaling numerical columns, etc\xspace) and provide methods for subsequent model analysis. Later, this generic pipeline is used in subgroup analyzers to compute different sets of metrics. Our library implements a \textsf{Subgroup Variance Analyzer} and a \textsf{Subgroup Statistical Bias Analyzer}, and it is easily extensible to include other analyzers. We provide abstract analyzer classes for users to inherit from and to create custom analyzers. Once these analyzers finish computing metrics, their outputs are combined and returned as a \textsf{pandas} dataframe.


The \textsf{Subgroup Variance Analyzer} is responsible for computing our variance metrics (from Section~\ref{sec:variance-metrics}) on the overall test set, as well as on subgroups of interest specified by the user. We use a simple bootstrapping approach~\cite{efron1994bootstrap} to quantify estimator variance, as is common in uncertainty quantification literature~\cite{Darling2018TowardUQ,liu2022jitter,debiasing_bias}. However, rather than computing only the standard deviation of the predictive distribution, we also compute additional metrics such as label stability, jitter, and IQR (defined in Section~\ref{sec:variance-metrics}).
Similarly, the \textsf{Subgroup Statistical Bias Analyzer} computes statistical bias metrics (such as accuracy, TPR, FPR, TNR, and FNR) on the overall test set as well as for subgroups of interest.
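
For concreteness, the error rates listed above follow the standard confusion-matrix definitions. As an illustrative sketch (the symbols $TP$, $FP$, $TN$, and $FN$ are our shorthand here for the counts of true positives, false positives, true negatives, and false negatives within the overall test set or within a given subgroup, and are not notation fixed elsewhere in this paper):
\begin{align*}
\mathrm{Accuracy} &= \frac{TP+TN}{TP+FP+TN+FN}, \\
\mathrm{TPR} &= \frac{TP}{TP+FN}, & \mathrm{FNR} &= \frac{FN}{TP+FN} = 1-\mathrm{TPR},\\
\mathrm{FPR} &= \frac{FP}{FP+TN}, & \mathrm{TNR} &= \frac{TN}{FP+TN} = 1-\mathrm{FPR}.
\end{align*}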

\subsubsection{Group metric composition} The \textsf{Metrics Composer} is responsible for the second stage of the model audit. Currently, it computes the statistical bias-based and variance-based parity metrics described in Section \ref{sec:fairness}, but a user can compose additional metrics if desired. For example, the fairness measure of Disparate Impact is composed as the ratio of the Positive Rate computed on the \textsf{priv}\xspace and \textsf{dis}\xspace subgroups. 
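
As an illustration of this composition, and assuming the common convention of placing the \textsf{dis}\xspace subgroup in the numerator (the text above does not fix the direction of the ratio, so this ordering is an assumption), Disparate Impact can be written as
\[
\mathrm{DI} = \frac{P(\hat{Y}=1 \mid \textsf{dis})}{P(\hat{Y}=1 \mid \textsf{priv})},
\]
where $\hat{Y}$ denotes the predicted label. Under this convention a value of $1$ indicates parity in positive rates; for example, positive rates of $0.30$ and $0.50$ on the \textsf{dis}\xspace and \textsf{priv}\xspace subgroups, respectively, would give $\mathrm{DI}=0.6$.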



\subsubsection{Metric visualization and reporting} The \textsf{Metrics Visualizer}
unifies different processing steps on the composed metrics and creates various data formats to ease visualization. Users can use methods of this class to create custom plots for analysis. Additionally, these plots can be collected in an HTML report with comments for reporting.


\subsection{User Interfaces}

For the first library release, we have developed the following three user interfaces:

\subsubsection{Single run, single model} This interface gives the ability to audit one model for one execution. Users can set a model seed or generate and record a random seed, and control the number of estimators for bootstrap, the fraction of samples used in each bootstrap sample, and the test set fraction. This interface returns a \textsf{pandas} dataframe of statistical bias and variance metrics for an input base model and stores results separately in a file.
    
\subsubsection{Single run, multiple models} This interface extends the functionality of the previous interface to audit multiple models. It can be more convenient and speed up the computation of multiple metrics for all models.
    
\subsubsection{Multiple runs, multiple models} This interface can be used for a more extensive model audit. Users specify a set of models to use and the seeds for each run.  This interface then computes metrics for all specified models and seeds, and saves the results after each run. In addition to metrics, this interface stores the seeds used for each run, which can help maintain consistent and reproducible results, such as those reported in Section~\ref{sec:experiments}.



\section{Quantifying Uncertainty in the Model Lifecycle}
\label{sec:lifecycle}

\todo{New experiments: model selection and data generating process, on one folktables task on one state}

We quantify the uncertainty/variance in model outputs with respect to the following dimensions. Our list of interventions is mapped onto different stages of the data science life-cycle, shown in Figure~\ref{fig:lifecycle}.

\subsection{Hypothesis Space} 
\label{sec:hypothesis}
Different model architectures encode different inductive biases, and so one source of uncertainty, which could result in instability in model outputs downstream, is the hypothesis class over which we are running our optimization.
      
    
\subsection{Parameter Space}
\label{sec:parameter}

The settings of several hyperparameters in a predictive model determine how stable the model outputs will be. Depending on the choices we make in the parameter space, we might end up at a minimum in a flat valley of the loss surface and parameterize a stable model whose performance generalizes reasonably well to unseen samples, or we might pick an extremely sharp local minimum, in which case the corresponding model could produce large deviations in outputs for small perturbations in the input space.
       
    
\subsection{Data Processing}
\label{sec:preprocessing}

Raw data is necessarily pre-processed and cleaned before being fed into data-driven models. The choice of data engineering technique is another source of uncertainty, because it involves a strategic manipulation of the dataset, which modifies its statistical properties. For example, standardization or min-max scaling is a common technique to reduce the variance in a dataset.
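
As a reminder of what these two transformations do (using $x$ for a raw feature value, $\mu$ and $\sigma$ for the feature's empirical mean and standard deviation, and $x_{\min}$, $x_{\max}$ for its observed extremes, all of which are illustrative notation), standardization and min-max scaling map each value to
\[
z = \frac{x-\mu}{\sigma}
\qquad\text{and}\qquad
x' = \frac{x-x_{\min}}{x_{\max}-x_{\min}},
\]
respectively. The former yields a feature with zero mean and unit variance, while the latter maps the feature onto $[0,1]$.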
    
    
\subsection{Data Collection}
\label{sec:collection}

The quality of an estimator depends acutely on the quality and size of the dataset from which it was estimated. We supplement our analysis of the impact of data errors (see Section~\ref{sec:preprocessing}) with an analysis of dataset size: a model may simply not have seen enough data to make stable predictions, i.e., to find good, stable minima, and so dataset size is an important facet along which to study model instability.
    
    
\subsection{Data Generating Process}
\label{sec:generation}

In practice, we will not be able to confidently assert that the data we built our model from and the data to which we will apply our model come from the same distribution. If any part of the data generating process changes (more weakly, if we suddenly start sampling from a different, previously unseen part of the data space), then the guarantees that we have for model performance no longer hold. We will study bias-variance trade-offs in the simple setting where the test data comes from a distribution different from the one from which we sampled our training set.
    
    
\subsection{Learning Paradigm}
\label{sec:paradigm}

We take a second, more detailed look at the effect of data uncertainty on model stability by intervening on the learning paradigm: whether the model sees all the data at once (batch) or is trained iteratively (incremental/online).
    

\begin{figure}
  \includegraphics[width=\textwidth]{stability_lifecycle.png}
  \caption{Stages in the model lifecycle at which we study stability. The numbers in circles map to different interventions, described in Section \ref{sec:lifecycle}.}
  \centering
  \Description{Stages in the model lifecycle at which we will measure stability}
  \label{fig:lifecycle}
\end{figure}

\section{Uncertainty Quantification Techniques}
\label{sec:methods}

\subsection{Sampling During Inference \todo{TODO: Diagrams? + How to construct confidence intervals}}
\label{sec:sampling-during-inference}
The basic idea of these UQ techniques is to generate samples at inference time (hence the name), and use the sample variance of simulated values to approximate the true variance of the estimator (by the law of large numbers) \cite{wasserman2004all}. 
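
A minimal sketch of this estimate, under notation we assume for illustration only: let $\hat{f}_1,\dots,\hat{f}_B$ be the $B$ models obtained from the resampled training sets and let $x$ be a test point. The sample mean and sample variance of the simulated predictions are
\[
\bar{f}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x),
\qquad
\widehat{\mathrm{Var}}(x) = \frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat{f}_b(x)-\bar{f}(x)\bigr)^{2},
\]
and the approximation improves as $B$ grows.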

We will now discuss dominant approaches to constructing these samples during inference, contrasting their statistical properties/guarantees and computational complexities.

 
\subsubsection{\textbf{The Bootstrap}}

\label{sec:bootstrap}
The bootstrap~\citep{efron1994bootstrap} is a widely used and well-understood statistical procedure, so we will not spend time explaining it here. Instead, we describe how the Bootstrap can be used at inference time to obtain uncertainty estimates alongside the standard predicted probabilities. By construction, any ensemble model implicitly fits several copies of the same model (with the same base architecture). Ensembles constructed from bootstraps of the same training set come with implicit uncertainty estimates: we make a prediction based on the ensemble mean, but we could also look at the ensemble variance. In this way, the predictive variance of the bootstrap outputs approximates the variance in predictions as if a single model was being used to make a prediction. Indeed, the reduction in variance that we gain from ensembling can be interpreted as an estimate of the worst-case variance of a single model.

\subsubsection{\textbf{The Jackknife}}
\label{sec:jackknife}
The jackknife~\citep{Jackknife_review} is the intellectual predecessor to the sampling technique of leave-one-out cross-validation. Just as we used the Bootstrap to construct estimates of predictive variance (and related measures), we can construct samples during inference from the base training set by leaving one sample ($k$ in general) out each time. That is, the Jackknife constructs subsampled datasets by iteratively leaving out one sample of the training set (hence the name leave-one-out). If there are $n$ samples, the Jackknife constructs $n$ subsampled training sets, each of which differs from the others in exactly one sample (included in one, omitted in the other). Hence, a common realistic scenario under which the Jackknife is preferable to the Bootstrap is when the number of samples is critically low: recall that the Bootstrap constructs variations of the base train set by sampling with replacement. If we do not have enough samples to construct unique subsets through the Bootstrap, we might instead prefer to construct samples using the Jackknife, which is computationally cheap when the train set size is sufficiently small. Conversely, the Bootstrap is the more computationally feasible alternative when the sample size is prohibitively high.
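
For reference, the classical jackknife variance estimate can be sketched as follows (the notation is illustrative). Let $\hat{\theta}_{(i)}$ denote the estimate computed with the $i$-th training sample removed, and let $\hat{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$ be the average of the $n$ leave-one-out estimates. Then
\[
\widehat{\mathrm{Var}}_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat{\theta}_{(i)}-\hat{\theta}_{(\cdot)}\bigr)^{2}.
\]
The factor $(n-1)/n$, rather than the $1/(n-1)$ of the usual sample variance, compensates for the fact that the leave-one-out training sets overlap heavily, so the $\hat{\theta}_{(i)}$ vary far less than independent replicates would.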

\subsubsection{\textbf{Jackknife+}}
\label{sec:jaccknife-plus}

Barber et al.~\cite{barber2021jackknife+} showed that the Jackknife fails to provide an assumption-free guarantee. The key observation they make is that the test residual computed on a held-out test set is not comparable with the leave-one-out residuals computed on the Jackknife. This is because the former sees one more observation in training than the latter sees (the single observation that is left out while constructing the subsets using the Jackknife). Instead, the Jackknife+ makes a slight modification: it uses the ensemble constructed from models trained on the leave-one-out training sets for inference directly. The residuals on the held-out test set can therefore be used to construct confidence intervals for the ensemble with a guarantee of distribution-free predictive coverage at level $1-2\alpha$. We point the reader to the original Jackknife+ paper~\cite{barber2021jackknife+} for the technical proofs.
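
In the regression setting studied in the original paper, the construction can be sketched as follows (we write $\hat{f}_{-i}$ for the model fit with the $i$-th training point removed; this notation is ours). With leave-one-out residuals $R_i = |y_i - \hat{f}_{-i}(x_i)|$ for $i=1,\dots,n$, the Jackknife+ interval for a new point $x$ is
\[
\hat{C}_{\alpha}(x) = \Bigl[\,\hat{q}^{-}_{\alpha}\bigl\{\hat{f}_{-i}(x)-R_i\bigr\},\;\hat{q}^{+}_{\alpha}\bigl\{\hat{f}_{-i}(x)+R_i\bigr\}\Bigr],
\]
where $\hat{q}^{+}_{\alpha}\{v_i\}$ is the $\lceil(1-\alpha)(n+1)\rceil$-th smallest of the values $v_1,\dots,v_n$ and $\hat{q}^{-}_{\alpha}\{v_i\}$ is the $\lfloor\alpha(n+1)\rfloor$-th smallest. Each leave-one-out model contributes both its own prediction at $x$ and its own residual, which is what distinguishes this interval from the plain Jackknife interval centered on the full-data prediction.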

\subsubsection{\textbf{Jackknife+-after-Bootstrap}}
\label{sec:jackknife-bootstrap}

There are several variants that combine ideas from the Jackknife and the Bootstrap. Here we highlight perhaps the most influential variant: the Jackknife+-after-Bootstrap~\citep{kim2020predictive_jackknife_bootstrap}. The authors make a neat observation that allows them to use ``out-of-bag'' estimates to improve on the computational complexity of sampling-during-inference techniques: it is possible to obtain the $i$-th leave-one-out residual without refitting any model, by reusing the models already fit on each bootstrapped training set. We simply aggregate the predictions of the models that did not see the $i$-th data point and compute the residual directly.
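
A sketch of the out-of-bag construction, in notation we assume for illustration: let $S_1,\dots,S_B$ be the bootstrapped training sets, let $\hat{f}_b$ be the model fit on $S_b$, and let $\varphi$ be an aggregation function such as the mean or median. In the original paper's formulation, the aggregated leave-one-out predictor and residual for the $i$-th training point are
\[
\hat{f}_{\varphi\setminus i}(x) = \varphi\bigl(\{\hat{f}_b(x) : i \notin S_b\}\bigr),
\qquad
R_i = \bigl|y_i - \hat{f}_{\varphi\setminus i}(x_i)\bigr|,
\]
so that all $n$ residuals are obtained from the same $B$ fitted models. When each $S_b$ consists of $n$ draws with replacement, a given point is absent from $S_b$ with probability $(1-1/n)^{n}\approx e^{-1}\approx 0.37$ for large $n$, so roughly a third of the ensemble is available for each $\hat{f}_{\varphi\setminus i}$.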

\subsection{Conformal Prediction}
\label{sec: conformal}
To motivate conformal prediction, consider a scenario in which we are working with a large-scale dataset, or one where model evaluation is expensive. As the sample size increases, both the bootstrap and the jackknife become increasingly, and eventually prohibitively, computationally expensive. In such a situation it would be more feasible to instead create a second hold-out set, called the ``calibration set'', on which to construct our confidence intervals. This is a very different procedure from sampling-during-inference techniques. Here, we train a single model: we do not create multiple copies of the base model, train them on slightly different training sets, and then construct predictive distributions. Instead, we train one model, apply it to the calibration set, compute residuals (or any suitable non-conformity score), and use them to construct confidence intervals for unseen samples. Alternatively, we could construct prediction sets for each unseen sample (with uncertainty built into the selection), instead of constructing confidence intervals around point estimates. In this manner, conformal prediction can be thought of as a transformation from an uncertainty heuristic (computed on the labelled samples) into an uncertainty guarantee (for held-out samples).

The basic algorithm for conformal prediction (with coverage $1-\alpha$) is as follows:
\begin{enumerate}
    \item  Fit your estimator. This is the model whose uncertainty you want to quantify and calibrate to satisfy the desired coverage.
    \item Define a score function. Intuitively this is a measure of non-conformity, and larger scores indicate larger disagreement between an input and a candidate label.
    \item Compute the non-conformity score between the true label and the predicted value from the estimator for all the samples in the calibration set. From these, compute the $(1-\alpha)$ quantile of the calibration scores (let us call it $\hat{q}$).
    \item Use this quantile to form the prediction sets for new examples as:
    $$ C(X_{test}) = \{y : s(X_{test},y) \leq \hat{q}\}. $$
\end{enumerate}
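
As a worked illustration of the steps above, consider a classifier that outputs class probabilities $\hat{f}(x)_y$ and the score $s(x,y) = 1-\hat{f}(x)_y$ (one common choice, assumed here for illustration). A commonly used finite-sample correction replaces the $(1-\alpha)$ quantile by the empirical quantile at level
\[
\frac{\lceil (n+1)(1-\alpha)\rceil}{n},
\]
where $n$ is the size of the calibration set. With $n=1000$ and $\alpha=0.1$ we have $\lceil 1001\times 0.9\rceil = \lceil 900.9\rceil = 901$, so $\hat{q}$ is the $901$st smallest calibration score, and the prediction set for a new input contains every label $y$ whose predicted probability is at least $1-\hat{q}$.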

Conformal prediction is an increasingly popular area of research in machine learning. For a more complete treatment of this technique, we point the reader to the excellent primer on conformal prediction from~\cite{angelopoulos2021conformal} and to seminal works from~\cite{shafer2008conformal, vovk2017conformal, vovk_cross-conformal}.

\todo{TODO: Comparison of methods table}


\section{Model Stability/Robustness Metrics}
\label{sec:metrics}

The Jackknife and Bootstrap (and their many variants) are all ways to construct a predictive distribution on a held-out set, instead of single point estimates. There are several things we can do with this distribution; we group them broadly into supervised and unsupervised methods. Supervised methods use the true labels to compute residuals and construct confidence intervals for unseen test samples, as discussed in Section~\ref{sec:methods}.

Now, we discuss unsupervised approaches, which instead focus on the statistical properties of the predictive distribution. In this way, sampling-during-inference UQ techniques have a natural connection to metrics of model stability from the stability/adversarial robustness literature. Popular statistical measures of variation include, but are not limited to: \todo{TODO: set up single notation for all equations + diagram showing connection between methods?}

\begin{enumerate}
    \item \textbf{Label Stability:} Normalized absolute difference between the number of times a sample is classified as positive or negative. Value of 1 means perfect stability. Value of 0 means extremely bad model stability, because it indicates that ensemble members disagree greatly on the predicted labels.
    \item \textbf{Jitter:} TODO
    \item \textbf{Sample Variance:} The spread of the predictive distribution is a natural measure of model (in)stability.
    \item \textbf{Sample Interquartile Range (IQR):} This is another natural statistical measure of the spread of predictions by the ensemble.
    \item \textbf{Predictive Entropy:} (?) 
\end{enumerate}
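
To make the first and fourth measures concrete, consider an ensemble of $B$ models and a test sample $x$ (the symbols below are illustrative, pending a common notation). If $n_1(x)$ members predict the positive class and $n_0(x) = B - n_1(x)$ predict the negative class, the verbal definition of label stability above corresponds to
\[
\mathrm{LS}(x) = \frac{\lvert n_1(x)-n_0(x)\rvert}{B}.
\]
For example, with $B=100$, a sample labelled positive by $90$ members has $\mathrm{LS}=0.8$, while a $50$/$50$ split gives $\mathrm{LS}=0$. Similarly, writing $Q_p(x)$ for the empirical $p$-quantile of the $B$ predicted probabilities, the interquartile range is $\mathrm{IQR}(x) = Q_{0.75}(x)-Q_{0.25}(x)$.
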
\section{Notation}
\label{sec:notation}

Before proceeding let us fix some notation. \todo{TODO}
\section{Quantifying Uncertainty in the Model Lifecycle}
\label{sec:setup}




\subsection{Interventions}
We quantify the uncertainty/variance in model outputs with respect to the following dimensions. Our list of interventions is mapped onto different stages of the data science life-cycle, shown in Figure~\ref{fig:lifecycle}.
\begin{enumerate}
    \item \textbf{Hypothesis Space:} Different model architectures encode different inductive biases, and so one source of uncertainty, which could result in instability in model outputs downstream, is the hypothesis class over which we are running our optimization.
    
    
    \todo{Bias-variance plots (broken down by groups) for different models: linear models, tree-based models, ensembles, knn, linear NNs, SVM(?). Also plot training dynamics for each model: train loss, test loss, label stability on the test set, as a function of train epochs}
    
    \item \textbf{Parameter Space:} The settings of several hyperparameters in a predictive model determine how stable the model outputs will be. Depending on the choices we make in the parameter space, we might end up at a minimum in a flat valley of the loss surface and parameterize a stable model whose performance generalizes reasonably well to unseen samples, or we might pick an extremely sharp local minimum, in which case the corresponding model could produce large deviations in outputs for small perturbations in the input space.
    
    
    \todo{Pick one (best) model from each hypothesis class, train it in three regimes: underfitting, overfitting and tuned. Plot bias vs fairness for different subgroups.}
    
    \item \textbf{Data Processing:} Raw data is necessarily pre-processed and cleaned before being fed into data-driven models. The choice of data engineering technique is another source of uncertainty, because it involves a strategic manipulation of the dataset, which modifies its statistical properties. For example, standardization or min-max scaling is a common technique to reduce the variance in a dataset.
    
    \todo{We will limit our analysis to the impact of synthetically generated data errors, so that we have access to the ground truth. For this we will: simulate nulls (percentage), add noise to produce outliers (frac of noise/weight of noise) and randomly flip labels in the train set (frac of labels flipped -- can be strategic wrt groups). Create bias-variance plots comparing different procedures for each error type. Also plot bias and variance metrics for different values of the noise hyperparameters (written in brackets above) } 
    
    \item \textbf{Data Collection:} The quality of an estimator depends acutely on the quality and size of the dataset from which it was estimated. We supplement our analysis of the impact of data errors (see the previous item) with an analysis of dataset size: a model may simply not have seen enough data to make stable predictions, i.e., to find good, stable minima, and so dataset size is an important facet along which to study model instability.
    
    \todo{Subsample the dataset and report trends for bias and variance metrics}
    
    \item \textbf{Data Generating Process:} In practice, we will not be able to confidently assert that the data we built our model from and the data to which we will apply our model come from the same distribution. If any part of the data generating process changes (more weakly, if we suddenly start sampling from a different, previously unseen part of the data space), then the guarantees that we have for model performance no longer hold. We will study bias-variance trade-offs in the simple setting where the test data comes from a distribution different from the one from which we sampled our training set.
    
    \todo{Only do for folktables: train on (S1, Y1), sample test data points from (S2, Y1), (S1, Y2) and (S2,Y2) and report bias-variance plots for each. Can we capture a notion of shift? KL between datasets?}
    
    \item \textbf{Learning Paradigm:} We take a second, more detailed look at the effect of data uncertainty on model stability by intervening on the learning paradigm: whether the model sees all the data at once (batch) or is trained iteratively (incremental/online).
    
    \todo{Compare incremental vs batch versions of the same algorithm, i.e., hold the hypothesis space fixed, and report bias-variance plots.}
\end{enumerate} 
