Abstract: Deep neural networks have demonstrated their superiority in many fields. Their excellent performance relies on a large number of parameters, which leads to a series of problems, including high memory and computation requirements and overfitting, that seriously impede the practical application of deep neural networks. Many model compression methods have been proposed to reduce the number of parameters in networks, among which one family of methods pursues sparsity in deep neural networks. In this paper, we propose to combine the ℓ1,1 and ℓ1,2 norms as the regularization term of the network's objective function. We introduce a group structure over the weights: the ℓ1,1 regularizer can zero out weights at both the inter-group and intra-group levels, while the ℓ1,2 regularizer yields intra-group sparsity and encourages even weight magnitudes across groups. We adopt proximal gradient descent to optimize the objective function with the combined regularizer. Experimental results demonstrate the effectiveness of the proposed regularizer compared with other baseline regularizers.
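The abstract does not spell out the exact form of the combined penalty; the sketch below is only a rough illustration under common conventions, assuming the groups are the columns of a weight matrix, the two norms are combined as a weighted sum, and the coefficients lam1 and lam2 are hypothetical hyperparameters not taken from the paper.

```python
import numpy as np

def l11_penalty(W):
    """l_{1,1} norm: sum of absolute values of all weights (element-wise sparsity)."""
    return np.abs(W).sum()

def l12_penalty(W):
    """l_{1,2} norm: l1 norm inside each group (here, each column of W),
    followed by an l2 norm across groups, which evens out group magnitudes."""
    group_l1 = np.abs(W).sum(axis=0)       # per-group l1 norms
    return np.sqrt(np.sum(group_l1 ** 2))  # l2 norm over the group-wise l1 values

def combined_penalty(W, lam1, lam2):
    """Weighted combination of the two penalties (lam1, lam2 are assumed weights)."""
    return lam1 * l11_penalty(W) + lam2 * l12_penalty(W)
```

In a proximal gradient scheme of the kind the abstract mentions, each iteration would take a gradient step on the data loss and then apply the proximal operator of this combined penalty; the closed form of that operator is not given in the abstract, so the sketch above only shows how the penalty itself could be evaluated.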