Reliability of Indicator-Based Comparison Results of Evolutionary Multi-objective Algorithms

Lie Meng Pang, Hisao Ishibuchi, Yang Nan, Cheng Gong

Published: 2024, Last Modified: 22 Jul 2025, PPSN (4) 2024, CC BY-SA 4.0

Abstract: In evolutionary multi-objective optimization (EMO), performance indicators are often used to measure the quality of non-dominated solution sets obtained by EMO algorithms. However, the reliability of these performance indicators has not been well studied. In this paper, we compare the quality of non-dominated solution sets using four performance indicators: hypervolume (HV), inverted generational distance (IGD), inverted generational distance plus (IGD+), and additive epsilon (\(\epsilon_{+}\)). Our experimental results show that different performance indicators produce similar results when they are applied to commonly used benchmark test problems such as DTLZ1 and DTLZ2. However, for real-world problems, we obtained significantly different comparison results from these indicators. Even when we use the same HV indicator, we obtain significantly different results depending on the reference point specification. These observations suggest the importance of the choice of an indicator for performance comparison of EMO algorithms on real-world problems. When the HV indicator is used, the choice of a reference point is also important. Moreover, our observations suggest the necessity of using multiple indicators (including the HV indicator with multiple reference points) to obtain reliable performance comparison results.
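The dependence of HV-based comparison on the reference point can be illustrated with a minimal two-objective sketch. The function name `hypervolume_2d`, the two toy solution sets, and both reference points below are assumptions made purely for illustration; they are not taken from the paper's experiments. The sketch assumes minimization of both objectives.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (two objectives, minimization)."""
    # Only points that are strictly better than the reference point contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second-objective value seen so far
    for f1, f2 in pts:  # sweep in ascending order of the first objective
        if f2 < best_f2:
            # Add the horizontal strip newly dominated by this point.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


# Hypothetical solution sets (illustrative data only).
set_a = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]        # wide spread, includes extreme points
set_b = [(0.3, 0.6), (0.45, 0.45), (0.6, 0.3)]      # concentrated around the knee region

for ref in [(1.1, 1.1), (3.0, 3.0)]:
    hv_a = hypervolume_2d(set_a, ref)
    hv_b = hypervolume_2d(set_b, ref)
    better = "A" if hv_a > hv_b else "B"
    print(f"ref={ref}: HV(A)={hv_a:.4f}, HV(B)={hv_b:.4f}, better set: {better}")
```

With the reference point close to the front, the concentrated set B obtains the larger HV value; with the reference point far from the front, the widely spread set A does. In this toy case, reporting HV for a single reference point would hide that reversal, which is in line with the abstract's suggestion to use multiple reference points.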