TL;DR: We reduce the aspect-ratio bias in eigenspectrum-based model diagnosis using a matrix-subsampling method.
Abstract: Diagnosing deep neural networks (DNNs) through the eigenspectrum of weight matrices has been an active area of research in recent years. At a high level, eigenspectrum analysis of DNNs involves measuring the heavy-tailedness of the empirical spectral densities (ESDs) of weight matrices. It provides insight into how well a model is trained and can guide decisions on assigning better layer-wise training hyperparameters. In this paper, we address a challenge associated with such eigenspectrum methods: the impact of the aspect ratio of weight matrices on estimated heavy-tailedness metrics. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias into estimated heavy-tailedness metrics, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavy-tailedness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that measuring the heavy-tailedness of submatrices with a fixed aspect ratio effectively mitigates the aspect-ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification with computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that, despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment in these application domains. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% compared with the state-of-the-art method.
Lay Summary: We assess the training status of each layer by calculating the heavy-tailedness of the eigenspectrum of the model's weight matrices. However, earlier methods overlook the bias introduced by the differing aspect ratios of weight matrices, which can lead to severe misjudgment of the training status of some layers.
In this work, we adopt a simple fixed-window analysis method named FARMS. We sample many submatrices of the same size from the original weight matrix, much like dividing farmland into many plots for cultivation. We then concatenate the eigenspectra of these submatrices. By analyzing the heavy-tailedness of the concatenated spectrum, we obtain a more accurate estimate of the training status of the weight matrix, eliminating the bias caused by differing aspect ratios.
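To make the idea concrete, here is a minimal NumPy sketch of the fixed-aspect-ratio subsampling step described above. It is a hypothetical illustration, not the authors' implementation: function names, parameters, and the use of a Hill estimator as the heavy-tailedness metric are our own assumptions (the paper's actual metric may differ).

```python
import numpy as np


def farms_spectrum(W, sub_rows=64, aspect_ratio=1.0, n_samples=20, rng=None):
    """Sample fixed-aspect-ratio submatrices of W and pool their ESDs.

    Hypothetical sketch of the FARMS idea: draw n_samples submatrices of
    shape (sub_rows, sub_rows * aspect_ratio), compute each one's
    eigenspectrum (squared singular values), and concatenate them.
    """
    rng = np.random.default_rng(rng)
    sub_cols = int(round(sub_rows * aspect_ratio))
    m, n = W.shape
    assert sub_rows <= m and sub_cols <= n, "submatrix larger than W"
    spectra = []
    for _ in range(n_samples):
        # Random top-left corner of the sampling window.
        i = rng.integers(0, m - sub_rows + 1)
        j = rng.integers(0, n - sub_cols + 1)
        sub = W[i:i + sub_rows, j:j + sub_cols]
        # Eigenvalues of sub^T sub are the squared singular values of sub.
        spectra.append(np.linalg.svd(sub, compute_uv=False) ** 2)
    return np.concatenate(spectra)


def hill_alpha(eigs, k=None):
    """Hill estimator of the power-law tail exponent: one common (but here
    assumed, not the paper's) way to quantify heavy-tailedness of an ESD."""
    x = np.sort(eigs)[::-1]          # descending order
    k = k or max(10, len(x) // 10)   # size of the tail used for the fit
    tail = x[:k]
    return 1.0 + k / np.sum(np.log(tail / tail[-1]))
```

Because every submatrix shares the same aspect ratio, the pooled spectrum is compared on an equal footing across layers whose full weight matrices have very different shapes.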
We validated the effectiveness of our method on various layer-wise optimization approaches. With it, each layer of the model can be optimized more precisely, for example by assigning more accurate learning rates, leading to a more balanced training status across all layers of the trained model.
Link To Code: https://github.com/HUST-AI-HYZ/FARMS
Primary Area: Deep Learning->Algorithms
Keywords: LLM Pruning, Weight Matrix Analysis, Spectral Analysis, Marchenko-Pastur Law, Hyperparameter Tuning
Submission Number: 13188