Abstract: Overfitting is a well-known problem in the fields of symbolic and connectionist machine learning. It describes the deterioration of the generalisation performance of a trained model. In this paper, we investigate the ability of a novel artificial neural network, bp-som, to avoid overfitting. bp-som is a hybrid neural network which combines a multi-layered feed-forward network (mfn) with Kohonen's self-organising maps (soms). During training, supervised back-propagation learning and unsupervised som learning cooperate in finding adequate hidden-layer representations. We show that bp-som outperforms both standard back-propagation and back-propagation with weight decay in avoiding overfitting. In addition, we show that bp-som succeeds in preserving generalisation performance under hidden-unit pruning, where both other methods fail.

1 On avoiding overfitting

In machine-learning research, the performance of a trained model is often expressed in terms of its generalisation performance, i.e., its capability to correctly process new instances not present in the training set. When the generalisation performance of the trained model is much worse than its performance on the training material (i.e., its ability to reproduce the training material), we speak of overfitting. Overfitting is sometimes due to the sparseness of the training material, e.g., when the training material does not sufficiently cover the characteristics of the classification task. A second cause of overfitting may be a high degree of non-linearity in the training material. In both cases, the learning algorithm may not be able to learn more from the training material than the classification of the training instances themselves (see, e.g., Norris, 1989).

The issue of avoiding overfitting is well known in the fields of symbolic and connectionist machine learning (e.g., Wolpert, 1992; Schaffer, 1993; Jordan and Bishop, 1996). In symbolic machine learning, a commonly used heuristic to avoid overfitting is minimising the size of the induced models (cf. Quinlan's (1993) C4.5 and C4.5rules), in the sense of the minimum-description-length (mdl) principle (Rissanen, 1983). The idea is that smaller (or less complex) models restrict the number of parameters to the minimum required for learning the task at hand.

In connectionist machine learning (neural networks), avoiding overfitting is closely related to finding an optimal network complexity. In this view, two types of methods for avoiding overfitting (or regularisation) can be distinguished: (i) starting with an undersized network and gradually increasing the network's complexity (Fahlman and Lebiere, 1990), and (ii) starting with an oversized network and gradually decreasing its complexity (e.g., Mozer and Smolensky, 1989; Le Cun, Denker, and Solla, 1990; Weigend, Rumelhart, and Huberman, 1991; Hassibi, Stork, and Wolff, 1992; Prechelt, 1994; Weigend, 1994).

In this paper we analyse the overfitting-avoidance behaviour of a novel artificial neural-network architecture (bp-som; Weijters, 1995), which belongs to the second type of connectionist machine-learning methods. In bp-som, the network complexity is reduced by guiding the hidden-layer representations of a multi-layer feed-forward network (mfn; Rumelhart et al., 1986) towards simplified vector representations. To achieve this, bp-som combines the traditional mfn architecture with self-organising maps (soms; Kohonen, 1984): each hidden layer of the mfn is associated with one som (see Figure 1). During training of the weights in the mfn, the corresponding som is trained on the hidden-unit activation patterns. The standard mfn error signal is augmented with information from the soms (a schematic sketch of this combined update is given at the end of this section). The effect of the augmented error signals is that, during learning, the hidden-unit activation patterns of clusters of instances associated with the same class tend to become highly similar. Intuitively speaking, the self-organisation of the som guides the mfn towards adequate hidden-unit representations. We demonstrate that bp-som avoids overfitting by reducing the complexity of the hidden-layer representations.

In Section 2, we provide a description of the bp-som architecture and learning algorithm. Section 3 presents experiments with bp-som trained on three benchmark classification tasks, focusing on its ability to avoid overfitting; in addition, we study the robustness of bp-som to hidden-unit pruning. Our conclusions are given in Section 4.
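Purely as an illustration of how the supervised and unsupervised learning processes can interact, the fragment below sketches one combined update on a single training instance: a standard back-propagation step whose hidden-layer error signal is mixed with a pull towards the best-matching som prototype, followed by an unsupervised som update on the observed hidden-activation pattern. The layer sizes, the mixing factor ALPHA, and the omission of class-reliability weighting and neighbourhood updates are simplifying assumptions made for this sketch only; the exact augmentation rule of bp-som is described in Section 2 and in Weijters (1995).

```python
# Minimal sketch of a bp-som-style training step (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_OUT = 4, 6, 2   # layer sizes (arbitrary for this sketch)
N_SOM = 9                      # number of som elements (a 3x3 map, flattened)
ALPHA = 0.25                   # assumed mixing factor for the som error component
LR_BP, LR_SOM = 0.1, 0.05      # learning rates for the mfn and the som

# mfn weights (biases omitted for brevity)
W1 = rng.normal(0.0, 0.5, (N_HID, N_IN))
W2 = rng.normal(0.0, 0.5, (N_OUT, N_HID))
# som prototypes live in hidden-activation space
som = rng.uniform(0.0, 1.0, (N_SOM, N_HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, target):
    """One combined back-propagation / som update for a single instance."""
    global W1, W2

    # forward pass through the mfn
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # standard back-propagation error signals (squared-error, sigmoid units)
    delta_out = (target - y) * y * (1.0 - y)
    delta_hid_bp = (W2.T @ delta_out) * h * (1.0 - h)

    # som component: pull the hidden activations towards the best-matching
    # prototype (class-reliability weighting and neighbourhood updates of
    # the original algorithm are omitted here)
    winner = int(np.argmin(np.linalg.norm(som - h, axis=1)))
    delta_hid_som = (som[winner] - h) * h * (1.0 - h)

    # augmented hidden error: a mix of the back-propagation and som signals
    delta_hid = (1.0 - ALPHA) * delta_hid_bp + ALPHA * delta_hid_som

    # supervised weight updates for the mfn
    W2 = W2 + LR_BP * np.outer(delta_out, h)
    W1 = W1 + LR_BP * np.outer(delta_hid, x)

    # unsupervised som update on the observed hidden-activation pattern
    som[winner] += LR_SOM * (h - som[winner])

# usage on a single dummy instance
train_step(np.array([0.2, 0.8, 0.1, 0.5]), np.array([1.0, 0.0]))
```

The point of the sketch is the shared hidden-layer error: the som term rewards hidden-unit patterns that cluster around a small number of prototypes, which is how the network complexity is effectively reduced.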