Abstract: Recent findings suggest that consecutive layers of neural networks with the ReLU activation function \emph{fold} the input space during the learning process. While many works hint at this phenomenon, an approach to quantify the folding was only recently proposed, by means of a space folding measure based on the Hamming distance in the ReLU activation space. Moreover, it has been observed that space folding values increase with network depth when the generalization error is low, but decrease when the error increases, suggesting that learned symmetries in the data manifold (visible as space folds) contribute to the network's generalization capacity. Inspired by these findings, we propose a novel regularization scheme that enforces folding early in the training process. Further, we generalize the space folding measure to a wider class of activation functions through the introduction of equivalence classes of input data. We then analyze its mathematical and computational properties and propose an efficient sampling strategy for its implementation. Lastly, we outline the connection between learning with increased folding and contrastive learning, suggesting that the former generalizes the latter. We support our claims with an experimental evaluation.
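To make the Hamming-distance idea concrete, the sketch below computes the binary ReLU on/off pattern of a small two-layer network for two inputs and measures how many units differ. This is only an illustration of the underlying quantity, not the paper's folding measure itself; the network shape, the weight initialization, and the helper names (`activation_pattern`, `hamming`) are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random weights for a two-layer ReLU network (illustration only).
W1 = rng.standard_normal((16, 4))   # first layer: 4 -> 16
W2 = rng.standard_normal((8, 16))   # second layer: 16 -> 8

def activation_pattern(x):
    """Binary on/off pattern of all ReLU units for input x."""
    h1 = W1 @ x
    a1 = np.maximum(h1, 0.0)        # ReLU
    h2 = W2 @ a1
    # Concatenate the per-unit indicators of both layers into one pattern.
    return np.concatenate([(h1 > 0), (h2 > 0)]).astype(int)

def hamming(p, q):
    """Hamming distance: number of units whose on/off state differs."""
    return int(np.sum(p != q))

x = rng.standard_normal(4)
y = rng.standard_normal(4)
d = hamming(activation_pattern(x), activation_pattern(y))
print(d)
```

Two inputs mapped to nearby activation patterns (small Hamming distance) lie in the same or adjacent linear regions of the network; comparing such distances along paths between inputs is what allows folds of the input space to be detected.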
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lechao_Xiao2
Submission Number: 4908