Abstract: Gaussian processes are robust and flexible non-parametric statistical models that leverage Bayes' theorem by placing a Gaussian prior distribution over the unknown function. Despite providing high-accuracy predictions, they suffer from high computational costs. Various solutions have been proposed in the literature to deal with this complexity; the main idea is to reduce the training cost, which is cubic in the size of the training set. A distributed Gaussian process is a divide-and-conquer approach that partitions the training data set and uses local approximation to train a Gaussian process expert on each partition. An ensemble technique then combines the local Gaussian experts to provide the final aggregated predictions. Available baselines aggregate local predictions assuming perfect diversity among experts. However, this assumption is often violated in practice and leads to sub-optimal solutions. This thesis addresses the dependency between experts. Aggregation based on experts' interactions improves accuracy and can lead to statistically consistent results. Only a few works have considered modeling dependencies between experts, and despite their theoretical advantages, their prediction step is costly, scaling cubically in the number of experts. We exploit the experts' interactions in both dependence-based and independence-based aggregation. For conventional aggregation methods that combine experts under a conditional independence assumption, we transform the set of experts into clusters of highly correlated experts using spectral clustering; the final aggregation uses these clusters instead of the original experts, which reduces the effect of the independence assumption on the ensemble technique. Moreover, we develop a novel aggregation method for dependent experts based on a latent variable graphical model, defining the target function as a latent variable in a connected undirected graph. In addition, we propose two novel expert selection strategies for distributed learning that improve the efficiency and accuracy of the prediction step by excluding weak experts from the ensemble. The first is a static selection method that assigns a fixed set of experts to all new entry points in the prediction step using a Markov random field model. The second increases the flexibility of the selection step by casting it as a multi-label classification problem, yielding an entry-dependent selection model that assigns the most relevant experts to each data point. We address the theoretical and practical aspects of the proposed solutions. The findings offer valuable insights for distributed learning models and advance the state of the art in several directions. Notably, the proposed solutions do not require restrictive assumptions and can readily be extended to non-Gaussian experts in distributed and federated learning.
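
As a minimal, hypothetical sketch of the divide-and-conquer setting described above, the following Python code trains independent Gaussian process experts on random partitions and combines them with a standard product-of-experts (PoE) aggregation, i.e., the kind of conditional-independence baseline the thesis improves upon. The partitioning scheme, RBF kernel, noise level, and scikit-learn usage are illustrative assumptions, not the thesis's implementation.

```python
# Minimal sketch of a distributed GP with product-of-experts aggregation.
# This illustrates the divide-and-conquer baseline, not the thesis's
# dependence-aware methods; all modeling choices below are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def train_experts(X, y, n_experts):
    """Split the training set into random partitions and fit one GP per partition,
    reducing the O(n^3) global cost to n_experts smaller cubic problems."""
    parts = np.array_split(np.random.permutation(len(X)), n_experts)
    experts = []
    for part in parts:
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
        gp.fit(X[part], y[part])
        experts.append(gp)
    return experts

def poe_predict(experts, X_test):
    """Product-of-experts aggregation: precisions add under the (often violated)
    conditional independence assumption discussed in the abstract."""
    mus, sigmas = zip(*(gp.predict(X_test, return_std=True) for gp in experts))
    precisions = [1.0 / (s ** 2 + 1e-12) for s in sigmas]  # jitter avoids division by zero
    prec_agg = np.sum(precisions, axis=0)
    mu_agg = np.sum([p * m for p, m in zip(precisions, mus)], axis=0) / prec_agg
    return mu_agg, np.sqrt(1.0 / prec_agg)

# Toy usage on synthetic data
X = np.random.uniform(-3, 3, size=(600, 1))
y = np.sin(X).ravel() + 0.1 * np.random.randn(600)
experts = train_experts(X, y, n_experts=6)
mu, std = poe_predict(experts, np.linspace(-3, 3, 50)[:, None])
```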
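The expert-clustering idea can be sketched in the same spirit: the code below groups experts by the correlation of their predictions on a shared evaluation grid and applies spectral clustering, so that the independence assumption acts between clusters of correlated experts rather than between individual experts. The similarity measure, evaluation grid, and cluster-level aggregation are assumptions made for illustration; the sketch reuses `experts` and `poe_predict` from the example above, and the thesis's actual construction may differ.

```python
# Hypothetical sketch of grouping correlated experts via spectral clustering;
# the affinity here (absolute prediction correlation) is an assumption.
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_experts(experts, X_val, n_clusters):
    """Group highly correlated experts so the independence assumption is
    applied between clusters rather than between individual experts."""
    preds = np.stack([gp.predict(X_val) for gp in experts])  # (n_experts, n_val)
    affinity = np.abs(np.corrcoef(preds))                    # pairwise |correlation|
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return [[e for e, l in zip(experts, labels) if l == k] for k in range(n_clusters)]

# Each cluster can then act as a single "meta-expert": aggregate within each
# cluster first (e.g., with poe_predict), then combine the cluster-level
# predictions, which weakens the impact of the independence assumption.
clusters = cluster_experts(experts, np.linspace(-3, 3, 100)[:, None], n_clusters=3)
```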