Abstract: Highlights
• Almost all of the works above adopt the independent and identically distributed (i.i.d.) assumption, i.e., they treat all samples in various computer vision tasks as i.i.d. In practice, however, these samples are not i.i.d., and the samples in many machine learning applications, such as speech recognition, system diagnosis, and market forecasting, have also been shown to be non-i.i.d. In particular, with the advent of the big data era, multi-class learning problems such as text or image classification can involve tens or hundreds of thousands of classes, and the samples are usually spatially connected or temporally correlated with their physically connected neighbours, which cannot be handled directly by these existing methods or algorithms. This implies that many samples do not satisfy the i.i.d. assumption, or that the assumption is very restrictive and cannot be strictly justified in real-world problems. In addition, the AIO methods above are theoretically reasonable (i.e., Fisher consistent), and experiments indicate that they can lead to hypotheses that generalize well, but efficient algorithms are still lacking. Three questions then arise:
• Is the AIO-MSVM algorithm based on non-i.i.d. samples consistent? If it is consistent, what is its learning rate with non-i.i.d. samples? Can an efficient AIO-MSVM algorithm be proposed (or can the AIO-MSVM algorithm with i.i.d. samples be improved)?
• The goal of this paper is to answer the questions above. Its main features are highlighted below.
• The generalization bound of the AIO-MSVM algorithm with uniformly ergodic Markov chain (u.e.M.c.) samples is established, and a fast learning rate is obtained. The AIO-MSVM algorithm with u.e.M.c. samples is proved to be consistent. To our knowledge, these are the first results on this topic.
• A new algorithm, called the AIO-MSVM algorithm based on q-times Markovian resampling (AIO-Mar-q), is proposed.
The effectiveness and efficiency of the proposed algorithm are validated by experiments on public datasets.