Abstract: A single implementation choice in experience replay, namely whether replay samples are merged with the current data or processed separately, can flip experimental outcomes. We show that this overlooked detail alone determines whether methods outperform random baselines, with statistically significant differences on CIFAR-10, CIFAR-100, and TinyImageNet. Popular studies and libraries adopt inconsistent defaults, introducing hidden reproducibility gaps into continual learning research. Our results establish this protocol choice as a critical confounder: without explicit reporting of it, fair evaluation of replay methods is impossible.
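To make the contrast concrete, the following is a minimal PyTorch sketch of the two training-step protocols the abstract distinguishes. All names (`model`, `optimizer`, `x_cur`/`y_cur` for the current task batch, `x_rep`/`y_rep` for the replay batch) are illustrative placeholders, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def step_merged(model, optimizer, x_cur, y_cur, x_rep, y_rep):
    """Protocol A: replay samples are merged with the current data.

    Current and replayed examples share one forward pass, so batch-level
    statistics (e.g. BatchNorm) and the loss average are computed jointly
    over the combined batch.
    """
    x = torch.cat([x_cur, x_rep], dim=0)
    y = torch.cat([y_cur, y_rep], dim=0)
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def step_separate(model, optimizer, x_cur, y_cur, x_rep, y_rep):
    """Protocol B: replay samples are processed separately.

    Each stream gets its own forward pass and its own per-batch loss
    average before the two losses are summed, which changes both the
    effective weighting of the streams and any batch-dependent
    normalization statistics relative to Protocol A.
    """
    loss = (F.cross_entropy(model(x_cur), y_cur)
            + F.cross_entropy(model(x_rep), y_rep))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that even with equal batch sizes the two protocols are not equivalent: the merged loss averages over the combined batch while the separate loss sums two per-stream averages, and layers such as BatchNorm see joint versus per-stream statistics. Under the abstract's claim, divergences of exactly this kind are what flip experimental outcomes.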
External IDs: dblp:conf/icann/Krutsylo25