Revisiting Gaussian Neurons for Online Clustering with Unknown Number of Clusters
Abstract: Despite the recent success of artificial neural networks, more biologically plausible learning methods may be needed to resolve the weaknesses of backpropagation-trained models, such as catastrophic forgetting and adversarial attacks. Although these weaknesses are not specifically addressed, a novel local learning rule is presented that performs online clustering with an upper limit on the number of clusters to be found rather than a fixed cluster count. Instead of using orthogonal weight or output activation constraints, activation sparsity is achieved by mutual repulsion of lateral Gaussian neurons, ensuring that multiple neuron centers cannot occupy the same location in the input domain. An update method is also presented for adjusting the widths of the Gaussian neurons in cases where the data samples can be represented by means and variances. The algorithms were applied to the MNIST and CIFAR-10 datasets to create filters capturing the input patterns of pixel patches of various sizes. The experimental results demonstrate stability in the learned parameters across a large number of training samples.
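The attraction/repulsion idea from the abstract can be illustrated with a minimal sketch. This is not the paper's exact learning rule: the nearest-center winner selection, the learning rates, and the repulsion radius below are all hypothetical choices made for illustration. Each online step attracts the best-matching Gaussian neuron center toward the current sample, while nearby centers are pushed apart so that multiple centers cannot settle on the same location, yielding the described sparsity without a fixed cluster count.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim = 8, 2           # upper limit on clusters, input dimension
centers = rng.uniform(0, 1, size=(n_neurons, dim))
lr_attract, lr_repel, radius = 0.1, 0.02, 0.2   # hypothetical settings

def online_step(x):
    """One online update for a single sample x."""
    # Attraction: move the closest center toward the sample.
    dists = np.linalg.norm(centers - x, axis=1)
    k = np.argmin(dists)
    centers[k] += lr_attract * (x - centers[k])
    # Mutual repulsion: push centers that lie too close to the winner away,
    # so two centers cannot occupy the same location in the input domain.
    for j in range(n_neurons):
        if j == k:
            continue
        diff = centers[j] - centers[k]
        d = np.linalg.norm(diff)
        if 0 < d < radius:
            centers[j] += lr_repel * diff / d

# Stream samples drawn around two true cluster locations; only the
# winning centers drift toward them, the rest stay available.
true_means = np.array([[0.2, 0.2], [0.8, 0.8]])
for _ in range(2000):
    m = true_means[rng.integers(2)]
    online_step(m + 0.05 * rng.standard_normal(dim))
```

With an upper limit of eight neurons, only the centers that win samples converge to the two true cluster locations; the repulsion term keeps the remaining centers from collapsing onto them, leaving spare capacity for data from later distributions.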
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Based on feedback from reviewer Vx3w:
- Subfigure 2h now shows an example result of k-means clustering. A paragraph discussing this result was added at the end of the MNIST subsection, and the Supplementary Material was updated to include the source code for creating the subfigure.
- Changed "A novel local learning rule is presented ..." to "Although these weaknesses are not specifically addressed, a novel local learning rule is presented ..." in the abstract.
- Divided the Introduction into two sections: Introduction and Related Work.
- Changed the paragraph starting with "In online clustering, the distribution of the input data might change ..." to "One way to address distribution shifts in online clustering is to utilize overparameterized models that have additional parameters available to model input from subsequent unknown distributions. This will require that the parameters used to capture previous data be less likely to be adjusted, but model susceptibility to catastrophic forgetting would consequently be alleviated.".
- Changed "For a given neuron, the online method has an attraction term ..." to "For a given neuron, the online method has an attraction term toward the current sample, and an inhibition term that ensures reduced overlap between the Gaussian neurons in the same layer to achieve activation sparsity.".
- Replaced "Thus, a maximum number of clusters is set prior to training rather than a fixed number of clusters." with "While the inhibition term does not fully resolve the possibility of catastrophic forgetting, specialized rules can be added since the neurons representing already sampled data can be identified.".
Other changes:
- Changed "Although, the neuron and cost function of this method can result in learned cluster centers that are outside of the input domain." to "The neuron and cost function of this method, however, can result in learned cluster centers that are outside the input domain, and these centers would then poorly represent the true cluster centers. Moreover, the proposed learning rule does not require a softmax function to produce the neuron outputs." to further highlight the differences between our learning rule and the method given in Deco & Obradovic (1995).
Assigned Action Editor: ~Jeffrey_Pennington1
Submission Number: 181