Impact of Sampling on Neural Network Classification Performance in the Context of Repeat Movie Viewing
Abstract: This paper assesses the impact of different sampling approaches on neural network classification performance in the context of repeat moviegoing. The results showed that synthetic oversampling of the minority class, either on its own or combined with undersampling and removal of noisy examples from the majority class, offered the best overall performance. Identifying the best sampling approach for this data set is not trivial, because the answer depends heavily on the metric used: the accuracy rankings of the approaches did not agree across the different accuracy measures. In addition, the findings suggest that including examples generated by the oversampling procedure in the holdout sample leads to a significant overestimation of the neural network's accuracy. Further research is needed to understand the relationship between the degree of synthetic oversampling and the efficacy of the holdout sample as an estimator of neural network accuracy.
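The abstract does not say which synthetic oversampling algorithm was used. As an illustration only, the sketch below assumes a SMOTE-style oversampler that interpolates between nearest minority-class neighbours. It shows the ordering implied by the holdout finding: split off the holdout sample first, then oversample only the training portion, so no synthetic example can reach the holdout sample. The function names (`smote_like`, `stratified_split`, `split_then_oversample`), the stratified 70/30 split, the binary target, and the toy data are all hypothetical. The under-sampling and noise-removal steps are omitted.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority: List[Vector], n_new: int, k: int = 5,
               rng: Optional[random.Random] = None) -> List[Vector]:
    """Hypothetical SMOTE-style oversampler: each synthetic example lies on the
    segment between a random minority example and one of its k nearest
    minority-class neighbours."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    if rng is None:
        rng = random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: squared_distance(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def stratified_split(y: Sequence[int], holdout_fraction: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return (train_indices, holdout_indices), preserving class proportions."""
    train_idx: List[int] = []
    hold_idx: List[int] = []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_hold = round(len(idx) * holdout_fraction)
        hold_idx += idx[:n_hold]
        train_idx += idx[n_hold:]
    return train_idx, hold_idx


def split_then_oversample(X: List[Vector], y: List[int],
                          holdout_fraction: float = 0.3,
                          minority_label: int = 1, seed: int = 0):
    """Split off the holdout sample FIRST, then oversample only the training
    portion, so no synthetic example leaks into the holdout sample.
    Assumes a binary target."""
    rng = random.Random(seed)
    train_idx, hold_idx = stratified_split(y, holdout_fraction, rng)
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_hold = [X[i] for i in hold_idx]
    y_hold = [y[i] for i in hold_idx]

    # Oversample the minority class in the training data up to the majority count.
    minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
    n_majority = len(y_train) - len(minority)
    synthetic = smote_like(minority, n_majority - len(minority), rng=rng)
    X_train += synthetic
    y_train += [minority_label] * len(synthetic)
    return X_train, y_train, X_hold, y_hold


# Toy imbalanced data: 90 majority (label 0) and 10 minority (label 1) examples.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
     + [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)])
y = [0] * 90 + [1] * 10
X_train, y_train, X_hold, y_hold = split_then_oversample(X, y)
print(len(X_hold), y_train.count(0), y_train.count(1))  # 30 63 63
```

Because the holdout sample is drawn before any synthetic examples exist, it contains only original observations. Oversampling the full data set first and then splitting would let interpolated copies of training examples into the holdout sample, which is the kind of leakage the abstract associates with overestimated accuracy.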