Microscopes and telescopes: Trading in black boxes for a lens with multitexts, network depths, and statistical comparisons
Abstract: Deep neural networks (DNNs) have typically been thought of as black boxes. Work on evaluation and interpretation has often tried to look into or through the box, testing each model under various hyperparameter settings or inspecting the output each model generates. In this paper, we examine the effects of network depth and show how it is also possible to look outside the box and arrive at an interpretation through a meta-evaluation across multiple models. Following a setup similar to Wan (2022), we perform systematically controlled experiments in conditional language modeling with the Transformer on multiway parallel data, varying the number of layers while holding all other hyperparameters constant. We visualize our results and substantiate our interpretation with statistical comparisons, which confirm that deeper models exhibit more instances of significant pairwise differences than shallower models. That is, all else being equal, the deeper models magnify differences in raw data statistics, like microscopes, while the shallower models compress and neutralize them, much like telescopes.
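As an illustration of the kind of meta-evaluation the abstract describes, the sketch below counts significant pairwise differences across models of different depths. It is not the paper's code: the depths, condition names, score distributions, and the `synthetic_scores` helper are hypothetical stand-ins, and the significance test (Wilcoxon signed-rank at alpha = 0.05) is only one plausible choice.

```python
# Minimal sketch (assumed, not the authors' implementation): count how many
# pairs of conditions differ significantly for models of varying depth.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
depths = [1, 2, 6, 12]                  # hypothetical numbers of Transformer layers
conditions = ["de", "fr", "es", "ru"]   # e.g. sides of a multiway parallel corpus

def synthetic_scores(depth, condition, n=50):
    """Placeholder for per-segment scores of one model/condition pair."""
    # Deeper models get a larger condition-dependent offset, mimicking the
    # "microscope" effect; shallower models compress such differences.
    offset = 0.02 * depth * conditions.index(condition)
    return rng.normal(loc=1.0 + offset, scale=0.1, size=n)

for depth in depths:
    scores = {c: synthetic_scores(depth, c) for c in conditions}
    n_sig = 0
    for a, b in itertools.combinations(conditions, 2):
        # Paired test over per-segment scores; alpha = 0.05 is illustrative.
        _, p = stats.wilcoxon(scores[a], scores[b])
        n_sig += p < 0.05
    total = len(conditions) * (len(conditions) - 1) // 2
    print(f"depth={depth:2d}: {n_sig} of {total} pairs significantly different")
```

Run as-is, the synthetic offsets make deeper "models" show more significant pairs, which mirrors the pattern the abstract reports; with real systems, the per-segment scores would come from the trained models themselves.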