['1c1', '< Title: REALISTIC EVALUATION OF SEMI-SUPERVISED LEARN-ING ALGORITHMS IN OPEN ENVIRONMENTS', '---', '> Title: COMPREHENSIVE AND REALISTIC EVALUATION OF SEMI-SUPERVISED LEARNING ALGORITHMS IN OPEN ENVIRONMENTS', '3c3', '< Abstract: Semi-supervised learning (SSL) is a powerful paradigm for leveraging unlabeled data and has been proven to be successful across various tasks. Conventional SSL studies typically assume close environment scenarios where labeled and unlabeled examples are independently sampled from the same distribution. However, realworld tasks often involve open environment scenarios where the data distribution, label space, and feature space could differ between labeled and unlabeled data. This inconsistency introduces robustness challenges for SSL algorithms. In this paper, we first propose several robustness metrics for SSL based on the Robustness Analysis Curve (RAC), secondly, we establish a theoretical framework for studying the generalization performance and robustness of SSL algorithms in open environments, thirdly, we re-implement widely adopted SSL algorithms within a unified SSL toolkit and evaluate their performance on proposed open environment SSL benchmarks, including both image, text, and tabular datasets. By investigating the empirical and theoretical results, insightful discussions on enhancing the robustness of SSL algorithms in open environments are presented. The re-implementation and benchmark datasets are all publicly available. More details can be found at https://ygzwqzd.github.io/Robust-SSL-Benchmark.', '---', '> Abstract: Semi-supervised learning (SSL) is a powerful paradigm for leveraging unlabeled data, demonstrating success across various tasks. Traditional SSL research often operates under a "closed-environment" assumption, where labeled and unlabeled data originate from the same distribution. 
However, real-world applications frequently encounter "open-environment" scenarios, characterized by disparities in data distribution, label space, and feature space between labeled and unlabeled data. These inconsistencies pose significant robustness challenges for SSL algorithms. To address these challenges, we first introduce novel robustness metrics for SSL based on the Robustness Analysis Curve (RAC). Second, we develop a theoretical framework to analyze the generalization performance and robustness of SSL algorithms in open environments. Third, we re-implement widely adopted SSL algorithms in a unified toolkit and evaluate them on newly proposed open-environment SSL benchmarks encompassing image, text, and tabular datasets. Building on our empirical and theoretical findings, we discuss how the robustness of SSL algorithms can be enhanced in open environments. All re-implementations and benchmark datasets are publicly available at https://ygzwqzd.github.io/Robust-SSL-Benchmark.', '6,13c6', '< Semi-supervised learning (SSL) aims to leverage unlabeled data to improve learning performance when labels are limited or expensive to obtain (Chapelle et al., 2006). SSL algorithms have been repeatedly reported to achieve highly competitive performance to purely supervised learning and save a lot of labeling costs, by exploring the structure of unlabeled data.', '< All of the positive results, however, are based on the close environment assumption where labeled and unlabeled data are sampled from the same distribution independently. However, many practical applications involve open environments (Zhou, 2022) where the data distribution, feature space, and label space could be inconsistent between labeled and unlabeled data. 
SSL methods suffer severe robustness problems in open environments and could be even worse than a simple supervised learning model without exploiting more unlabeled data (Guo & Li, 2018;Oliver et al., 2018;Guo et al., 2020a;Li et al., 2021). Such phenomena undoubtedly go against the expectations of SSL and limit its effectiveness in more practical tasks.', '< The robustness of SSL in open environments has attracted great attention in recent years and various robust SSL algorithms have been proposed from different perspectives, such as inconsistent label space (Guo et al., 2020a;Chen et al., 2020;Yu et al., 2020;Saito et al., 2021;Guo & Li, 2022;Wei et al., 2022), inconsistent data distribution (Guo et al., 2020b;Zhou et al., 2021;Huang et al., 2021;Jia et al., 2023a). However, these algorithms primarily focus on robustness from a singular perspective and overlook the utilization of practical metrics for robustness analysis. Consequently, it remains challenging to ascertain the suitability of SSL algorithms in real-world open environments.', '< In this paper, we first propose several metrics considering different aspects of performance in open environments to achieve a fair and comprehensive evaluation of SSL algorithms. Then, we establish a theoretical framework for studying the generalization performance and robustness of SSL algorithms, and the results show that the generalization error in SSL consists of five components: bias caused by the learner, variance caused by data sampling, and three types of inconsistencies caused by open environments. Finally, we re-implement widely adopted SSL algorithms within a unified SSL toolkit and evaluate their performance on proposed open environment SSL benchmarks, including both image, text, and tabular datasets. 
Some interesting findings include:', '< • Inconsistency between the feature and label space has a more detrimental impact compared to cases where there is inconsistency in data distribution.', '< • Traditional statistical SSL algorithms can often outperform deep SSL algorithms in terms of both performance and robustness when applied to tabular datasets. Thus, more advanced SSL algorithms on tabular datasets should be studied.', '< • Certain robust SSL algorithms currently proposed do not consistently exhibit enhanced robustness and may not surpass ordinary deep SSL algorithms in most scenarios. We argue that the robustness of SSL algorithms should be evaluated under more reasonable metrics.', '< • Inconsistency between labeled and unlabeled data does not invariably result in negative effects. On the contrary, leveraging inconsistent unlabeled examples may improve performance in some cases. Thus, it is important to study how to exploit helpful information from inconsistent unlabeled data.', '---', '> Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data, particularly when labeled data are scarce or costly to acquire (Chapelle et al., 2006). By exploiting the underlying structure of unlabeled data, SSL algorithms have repeatedly achieved performance competitive with purely supervised methods while substantially reducing labeling costs. These successes, however, rest on a critical "closed-environment" assumption: labeled and unlabeled data are independently sampled from the same distribution.', '14a8,22', '> In practice, real-world applications frequently deviate from this idealized scenario and operate instead in "open environments" (Zhou, 2022). In such settings, labeled and unlabeled data can differ significantly in their data distribution, feature space, and even label space. 
These discrepancies pose severe robustness challenges for SSL methods: their performance can degrade to the point of falling below that of a simple supervised model that forgoes the unlabeled data entirely (Guo & Li, 2018; Oliver et al., 2018; Guo et al., 2020a; Li et al., 2021). Such counterintuitive outcomes undermine the core promise of SSL and restrict its applicability to practical tasks.', '> ', '> Recent years have seen growing attention to the robustness of SSL in open environments, leading to the proposal of various robust SSL algorithms that address different facets of inconsistency, such as inconsistent label space (Guo et al., 2020a; Chen et al., 2020; Yu et al., 2020; Saito et al., 2021; Guo & Li, 2022; Wei et al., 2022) and inconsistent data distribution (Guo et al., 2020b; Zhou et al., 2021; Huang et al., 2021; Jia et al., 2023a). Nevertheless, these approaches typically address robustness from a single perspective and lack comprehensive, practical metrics for robustness analysis. As a result, it remains difficult to assess and compare how suitable SSL algorithms are for real-world open environments.', '> ', '> In this paper, we address these limitations by making several key contributions:', '> • First, we propose a set of metrics that capture different aspects of performance in open environments, enabling a fair and comprehensive evaluation of SSL algorithms.', '> • Second, we establish a theoretical framework for analyzing the generalization performance and robustness of SSL algorithms. 
Our analysis shows that the generalization error in SSL comprises five components: bias caused by the learner, variance caused by data sampling, and three types of inconsistency caused by open environments.', '> • Third, we provide a unified re-implementation of widely adopted SSL algorithms within a dedicated toolkit and conduct an extensive evaluation on newly proposed open-environment SSL benchmarks, spanning image, text, and tabular datasets.', '> ', '> Our empirical and theoretical investigations yield several insightful findings:', '> • Inconsistency between labeled and unlabeled data in the feature space or label space is more detrimental to SSL performance than inconsistency in the data distribution.', '> • Traditional statistical SSL algorithms often outperform deep SSL algorithms in both performance and robustness on tabular datasets, indicating that more advanced SSL algorithms for tabular data deserve further study.', '> • Some recently proposed robust SSL algorithms do not consistently exhibit enhanced robustness and often fail to surpass ordinary deep SSL algorithms. This underscores the need to evaluate the robustness of SSL algorithms under more reasonable metrics.', '> • Crucially, inconsistency between labeled and unlabeled data is not universally detrimental: in some cases, leveraging inconsistent unlabeled examples can improve performance, which motivates research into exploiting helpful information from such data.', '> ', '349d356', '< ']
