\documentclass{article}


% if you need to pass options to natbib, use, e.g.:
%     \PassOptionsToPackage{numbers, compress}{natbib}
% before loading maeb_2025


% ready for submission
\usepackage[nonatbib, preprint]{maeb_2025}


% to compile a preprint version, e.g., for submission to arXiv, add the
% [preprint] option:
%     \usepackage[preprint]{maeb_2025}


% to compile a camera-ready version, add the [final] option, e.g.:
%     \usepackage[final]{maeb_2025}


% to avoid loading the natbib package, add option nonatbib:
%    \usepackage[nonatbib]{maeb_2025}


\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{minted}
\usepackage{subcaption}
\usepackage{graphicx}


\title{DIGNEA: A tool to generate diverse and discriminatory instance suites for optimisation domains
}


% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to LaTeX to determine where to break the
% lines. Using \AND forces a line break at that point. So, if LaTeX puts 3 of 4
% authors names on the first line, and the last on the second line, try using
% \AND instead of \And before the third author name.


\author{%
  Alejandro Marrero, Eduardo Segredo, Coromoto León \\
  Departamento de Ingeniería Informática y de Sistemas \\ 
  Universidad de La Laguna \\
  San Cristóbal de La Laguna, Spain \\
  \texttt{(amarrerd|esegredo|cleon)@ull.edu.es} \\
  \And
  Emma Hart \\
  School of Computing, Edinburgh Napier University \\
  Edinburgh, United Kingdom  \\
  \texttt{e.hart@napier.ac.uk} \\
}


\begin{document}


\maketitle


\begin{abstract}
  This paper presents DIGNEA \cite{DIGNEA}, a novel tool designed to generate diverse and discriminatory instances for various optimisation domains. DIGNEA uses an evolutionary-algorithm-based framework that incorporates novelty search techniques to ensure that the generated instances are not only varied but also useful for assessing different solvers. The tool, written in C++, is available as a repository and a Docker image, making it easily adaptable to different domains and solver types. A Python version has recently been released to facilitate adoption by the research community. An application to the Knapsack Problem is showcased, demonstrating its effectiveness in generating meaningful test instances.
\end{abstract}


\section{Introduction}

The performance of optimisation algorithms is highly dependent on the nature of the problem instances. Designing effective optimisation algorithms therefore requires diverse and discriminatory test instances that highlight the strengths and weaknesses of different solvers. Traditional approaches to instance generation often fail to provide such instances because their main focus is on increasing instance hardness~\cite{Pisinger2005, SMITHMILES2021105184, michalakgenerating}, where hardness is understood as the computational time that the state-of-the-art exact solver requires to obtain the optimal solution for each instance.
%
Nevertheless, many real-world and academic scenarios require the use of non-exact solvers, and there the lack of diversity in benchmark sets becomes a real issue.
%
In the 1970s, Rice highlighted the need for diverse instance datasets to enable better algorithm evaluation and selection when defining the Algorithm Selection Problem (ASP)~\cite{Rice1976}. Briefly, given a set of instances for some optimisation problem and a portfolio of solvers, the goal is to maximise a performance metric by mapping each instance in the set to the solver in the portfolio that scores best. The ASP has attracted considerable attention in recent years~\cite{kerschke2019automated} with the rise of Machine Learning approaches that predict either the performance of a given algorithm or the label of the best solver using large datasets of instances from the optimisation domain. However, gathering enough instances that both cover the feature space and are discriminatory with respect to the solvers in the portfolio is considerably challenging.
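
The mapping described above can be written compactly. The notation is an assumption made here for illustration only: let $\mathcal{I}$ be the instance set, $\mathcal{A}$ the solver portfolio and $m(a, x)$ the performance of solver $a \in \mathcal{A}$ on instance $x \in \mathcal{I}$, with larger values being better. The ASP can then be sketched as the search for a selector $S : \mathcal{I} \to \mathcal{A}$ that maximises the accumulated performance,
\[
  S^{*} = \arg\max_{S} \sum_{x \in \mathcal{I}} m\bigl(S(x), x\bigr),
  \qquad \mbox{attained by} \qquad
  S^{*}(x) = \arg\max_{a \in \mathcal{A}} m(a, x).
\]
Since $m$ is not known in advance for unseen instances, the selector generally has to be learned from data, which is where the coverage and discriminatory power of the available instances become important.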

Recent research has focused on generating instances that are maximally discriminative with respect to a portfolio of solvers for a specific domain, for example by maximising the performance gap between a target solver and the remaining solvers. Such approaches have been proposed for domains such as Bin Packing (BP), the Travelling Salesman Problem (TSP) and the Knapsack Problem (KP)~\cite{Alissa2019,bossek2019evolving,Plata2019, julstrom2009evolving}. However, they include no explicit mechanism to generate instances that are diverse with respect to the feature or instance space: they focus only on generating instances that are diverse with respect to solver performance.
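
One way to make the notion of a performance gap concrete is the following sketch. The symbols and the exact form are assumptions introduced here for illustration, and the cited works may define the gap differently. Writing $m(a, x)$ for the performance of solver $a$ on instance $x$ (larger is better), $\mathcal{A}$ for the portfolio and $a_t \in \mathcal{A}$ for the target solver, the gap on instance $x$ can be expressed as
\[
  g(x) = m(a_t, x) - \max_{a \in \mathcal{A} \setminus \{a_t\}} m(a, x),
\]
so that $g(x) > 0$ exactly when the target outperforms every other solver in the portfolio on $x$. Maximising $g$ alone rewards discrimination but says nothing about where $x$ lies in the feature or instance space.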
%

DIGNEA is a novel software tool designed to address this challenge by applying Quality Diversity~(QD)~\cite{pugh2016quality} algorithms within an Evolutionary Algorithm (EA) framework to generate instances that are diverse with respect to a feature, instance or performance space and also discriminatory with respect to a set of solvers of the user's choice. This approach allows researchers to better analyse algorithmic behaviour and improve algorithm selection strategies.
%
The software is available in both C++ and Python versions. The C++ version is provided as a repository and a Docker image, making it easily adaptable to various domains and solver types, whereas the Python library (known as DIGNEApy) is available from the Python Package Index (PyPI)\footnote{PyPI: \url{https://pypi.org/}}, from where it can be installed using the standard package manager \texttt{pip}. The tool is designed to facilitate research in algorithm selection, instance space analysis, and performance benchmarking.

\section{DIGNEA: Architecture and Functionalities}

DIGNEA is structured around a modular framework that allows users to define problems, domains and solvers easily. Although the framework contains further classes, the key ones from the user's perspective are the following:
\begin{itemize}
    \item Domain: Establishes the instance generation space for a given optimisation problem. A domain defines all the characteristics of the combinatorial optimisation problem, such as KP or TSP, for which the instances will be generated, e.g., the number of items in an instance, the bounds for each item, and which features describe an instance and how to extract them. Note that a domain must be defined for every optimisation problem the user is interested in.
    
    \item Problem: While the domain specifies the characteristics of the instances for an optimisation problem, the Problem class in DIGNEA describes how to evaluate an instance for a specific domain, i.e., given a solution, how to calculate the constraints, fitness and/or objectives.
    
    \item Solver: Defines algorithms applicable to optimisation problems, from generic heuristics and meta-heuristics, to domain-specific algorithms.
    
\end{itemize}
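
The division of responsibilities among these three classes can be sketched in a few lines of Python. The snippet is a minimal, self-contained illustration and does not use the DIGNEA API: the names \texttt{ToyKnapsackInstance}, \texttt{ToyKnapsackDomain}, \texttt{ToyKnapsackProblem} and \texttt{greedy\_by\_ratio}, as well as the chosen features and the capacity rule, are hypothetical.

\begin{minted}[fontsize=\small]{python}
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKnapsackInstance:
    """Hypothetical KP instance: item weights, item profits and a capacity."""
    weights: tuple
    profits: tuple
    capacity: int


class ToyKnapsackDomain:
    """Domain role: defines the space of instances and their features."""

    def __init__(self, dimension, low=1, high=100):
        self.dimension = dimension        # number of items per instance
        self.low, self.high = low, high   # bounds for weights and profits

    def generate_instance(self, rng):
        n = self.dimension
        weights = tuple(rng.randint(self.low, self.high) for _ in range(n))
        profits = tuple(rng.randint(self.low, self.high) for _ in range(n))
        # Assumption: the capacity is set to half of the total weight.
        return ToyKnapsackInstance(weights, profits, sum(weights) // 2)

    def extract_features(self, instance):
        n = self.dimension
        return (instance.capacity,
                sum(instance.weights) / n,
                sum(instance.profits) / n)


class ToyKnapsackProblem:
    """Problem role: evaluates a candidate solution on one instance."""

    def __init__(self, instance):
        self.instance = instance

    def evaluate(self, solution):
        weight = sum(w for w, x in zip(self.instance.weights, solution) if x)
        profit = sum(p for p, x in zip(self.instance.profits, solution) if x)
        # Solutions that exceed the capacity are given zero fitness.
        return profit if weight <= self.instance.capacity else 0


def greedy_by_ratio(problem):
    """Solver role: packs items in decreasing profit/weight order."""
    inst = problem.instance
    order = sorted(range(len(inst.weights)),
                   key=lambda i: inst.profits[i] / inst.weights[i],
                   reverse=True)
    solution, remaining = [0] * len(inst.weights), inst.capacity
    for i in order:
        if inst.weights[i] <= remaining:
            solution[i] = 1
            remaining -= inst.weights[i]
    return solution, problem.evaluate(solution)


domain = ToyKnapsackDomain(dimension=50)
instance = domain.generate_instance(random.Random(42))
solution, profit = greedy_by_ratio(ToyKnapsackProblem(instance))
\end{minted}

The point of the separation is that the same solver can be run on any instance produced by the domain, while the domain alone decides how instances are generated and described.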

Taken together, the two versions of DIGNEA include several domains and solvers, along with further functionality, including:

\begin{itemize}
    \item Domains: KP, TSP and BP.
    \item Solvers: Parallel-EA, Simulated Annealing, three exact solvers \cite{Pisinger2005, Pisinger1995, Martello1999} and four heuristics for the KP \cite{Plata2019}, 2-Opt and Greedy heuristics for the TSP, four heuristics for the BP \cite{Alissa2019} and a fully customisable evolutionary algorithm through the use of the DEAP library \cite{DEAP_JMLR2012}.
    \item Descriptors: Instances can be described by a set of computed features (domain-dependent), a performance descriptor (portfolio-dependent), or the full definition of the instance. Moreover, several transformers specifically designed for the KP domain allow the search space defined by the feature or instance descriptors to be reduced. In addition, DIGNEA facilitates the creation of customisable novelty search descriptors from any callable that receives an instance as a parameter and returns a collection of values describing that instance according to the user's needs.
    \item Logs, evolution plots and datasets can be automatically generated after a run of DIGNEA.
\end{itemize}
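
A callable descriptor of the kind mentioned above, and the way a novelty search component could consume it, can be sketched as follows. This is an illustration under stated assumptions rather than DIGNEA code: \texttt{mean\_std\_descriptor} and \texttt{novelty\_score} are hypothetical names, the instance is assumed to be a flat sequence of numbers, and novelty is computed in the usual novelty search manner as the mean distance to the $k$ nearest descriptors, which may differ in detail from the tool's implementation.

\begin{minted}[fontsize=\small]{python}
import math


def mean_std_descriptor(instance):
    """Hypothetical descriptor: maps an instance to a tuple of numbers.

    The instance is assumed to be a non-empty flat sequence of numbers.
    """
    n = len(instance)
    mean = sum(instance) / n
    variance = sum((v - mean) ** 2 for v in instance) / n
    return (mean, math.sqrt(variance))


def novelty_score(descriptor, others, k=15):
    """Mean Euclidean distance from descriptor to its k nearest neighbours."""
    if not others:
        return 0.0  # assumption: nothing to compare against yields zero
    distances = sorted(math.dist(descriptor, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Three archived instances and one candidate, all described by the callable.
archive = [mean_std_descriptor(x)
           for x in ([1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3])]
candidate = mean_std_descriptor([10, 10, 10, 10])
score = novelty_score(candidate, archive, k=2)
\end{minted}

Here the candidate lies far from every archived descriptor and therefore receives a high score, whereas a candidate close to its neighbours would score near zero.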

\definecolor{bg}{rgb}{0.95,0.95,0.95}
\begin{figure}[h]
\begin{small}
    \centering
    \begin{minted}[mathescape,
               linenos,
               numbersep=5pt,
               bgcolor=bg,
               gobble=2]{python}
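    from collections import deque
    # DIGNEApy classes and solvers are assumed to be imported already.
    # deque takes a single iterable, so the solvers are passed as a list.
    portfolio = deque([default_kp, map_kp, miw_kp, mpw_kp])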
    kp_domain = KnapsackDomain(dimension=50, capacity_approach='evolved')
    eig = EAGenerator(
        pop_size=128,
        generations=1000,
        domain=kp_domain,
        portfolio=portfolio,
        novelty_approach=NS(Archive(threshold=3), k=15),
        solution_set=Archive(threshold=1),
        repetitions=1,
        descriptor='features',
        replacement=generational_replacement,
    )
    solution_set = eig(verbose=True)
    \end{minted}
    \caption{Source code example to generate KP instances with $N=50$ items in DIGNEA.}
    \label{fig:python_code}
\end{small}
\end{figure}

\begin{figure}[!t]
    \centering
    \begin{subfigure}{0.4\textwidth}
    \centering
    \includegraphics[width=\linewidth]{kp_nsf.png}
    \label{fig:nsf}
    \end{subfigure}
    \hfill
    \begin{subfigure}{0.4\textwidth}
    \centering
    \includegraphics[width=\linewidth]{kp_nsp.png}
    \label{fig:nsp}
    \end{subfigure}
    \caption{KP instances generated through DIGNEA. Colours reflect the target algorithm for which an instance was produced. Instances on the left were generated using a features-based descriptor, while a performance-based descriptor was used to generate instances shown on the right.}
    \label{fig:space}
\end{figure}

DIGNEA's extensibility is ensured through the use of design patterns and modern features of both the C++ and Python programming languages, which enables easy adaptation to various domains. Parallelisation and high-performance computing capabilities are a further goal of the software, supported by libraries such as OpenMP, MPI and NumPy.

\section{Illustrative Example: The Knapsack Problem}

To demonstrate DIGNEA's functionality, the tool was applied to generate instances for the 0/1 KP with $N = 50$ items. A portfolio of deterministic heuristics was used as solvers, and instances were evolved using both feature-based and performance-based descriptors. Figure~\ref{fig:python_code} shows an example of the source code needed to run the experiment with a feature-based descriptor.

The results (Figure~\ref{fig:space}) show that the instances generated with both feature-based and performance-based descriptors are well spread across the feature and performance spaces. %
%while those generated with NSPerformance created clusters corresponding to different solver strengths. 
%
The collection \texttt{solution\_set} (Figure~\ref{fig:python_code}) contains the resulting instances together with all the information required to perform an exhaustive data analysis and to benchmark solver performance.


\section{Impact and Conclusions}
DIGNEA streamlines instance generation for optimisation research. It supports the ASP process by automating instance generation and solver performance evaluation in a single step, reducing computational overhead and human error. The tool's modularity and portability make it applicable across multiple domains. DIGNEA has been deployed on several HPC systems, demonstrating its scalability and efficiency, and has been used in several research publications \cite{Marrero22, Marrero23, Marrero24, Marrero24ecj}. Future developments will expand DIGNEA's applicability to additional domains, further enhancing its utility for researchers and practitioners. The code for both versions is available in a GitHub organisation.\footnote{DIGNEA Org. at GitHub: \url{https://github.com/DIGNEA}}

%such as Archer2 and TeideHPC, demonstrating its scalability and efficiency.


\bibliographystyle{IEEEtran}
\bibliography{references}

\end{document}