Selective AnDE for Large Data Learning: a Low-Bias Memory Constrained Approach

Shenglei Chen

Published: 24 Feb 2017, Last Modified: 07 Feb 2024OpenReview Archive Direct UploadEveryoneCC0 1.0

Abstract: Learning from data that are too big to fit into memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allows for a flexible learning from out-of-core data, by varying the value of n (number of super parents). Hence, AnDE is especially appropriate for learning from large quantities of data. Memory requirement in AnDE, however, increases combinatorially with the number of attributes and the parameter n. In large data learning, number of attributes is often large and we also expect high n to achieve low-bias classification. In order to achieve the lower bias of AnDE with higher n but with less memory requirement, we propose a memory constrained selective AnDE algorithm, in which two passes of learning through training examples are involved. The first pass performs attribute selection on super parents according to available memory, whereas the second one learns an AnDE model with parents only on the selected attributes. Extensive experiments show that the new selective AnDE has considerably lower bias and prediction error relative to An'DE, where n' = n − 1, while maintaining the same space complexity and similar time complexity. The proposed algorithm works well on categorical data. Numerical data sets need to be discretized first.