Bounded Gaussian process with multiple outputs and ensemble combination

Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen

Published: 2024, Last Modified: 07 Oct 2025, Venue: CAI 2024, License: CC BY-SA 4.0

Abstract: Spoken Language Assessment (SLA) is a subjective task, in which different human raters often assign differing scores to the same input. The score range is also often bounded. Prior work applying a Gaussian Process (GP) to SLA uses a Gaussian output, which is unbounded, and does not consider inter-rater uncertainty. This paper investigates using a bounded beta density function as the GP output in SLA, and proposes extending this bounded GP framework to utilise the multiple output samples per input in the training set. In the experiments, various types of Neural Network (NN) and GP models are trained, and ensemble combinations of these GPs and NNs are investigated. Experiments on the speechocean762 dataset show that a beta output predicts the inter-rater uncertainty better than a Gaussian output. Using multiple output samples in the training set further improves the beta-output GP’s inter-rater uncertainty prediction. Combining a GP with an NN yields improvements.

External IDs: dblp:conf/ieeecai/WongZC24