Abstract: This paper proposes a statistical learning approach for estimating occupancy in smart buildings using a set of small, simple, nonintrusive sensors that can serve as alternatives to sensors sometimes perceived as invasive, such as cameras. Such an approach requires large amounts of labelled training data. However, labelling large-scale occupancy data is time-consuming and tedious because it requires the direct involvement of the users. To tackle this challenge, we propose a hybrid approach that combines the recently introduced interactive learning methodology, which collects good-quality training data with minimal user involvement, with a classification approach that we have developed. The classification part is based on the predictive distribution of the generalized Dirichlet (GD) mixture model, which unfortunately has no closed form. To alleviate this issue, we compute a reliable approximation to the predictive distribution by optimizing the parameters of the GD posterior distribution with Bayesian variational inference. The choice of the GD mixture model is motivated by the heterogeneous, non-Gaussian nature of the sensors' outputs. Extensive experiments on both synthetic and real data indicate that our method can achieve promising results, especially with extremely small training sets.
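The building block of the classifier is the GD density. A minimal sketch of the standard (Connor–Mosimann) GD log-density and of a finite GD mixture follows. It is not the authors' implementation. It assumes sensor readings have already been normalized onto the simplex (each x_d > 0 and their sum below 1), and it evaluates the mixture at fixed, hypothetical point parameters, whereas the predictive distribution described above integrates over the variational posterior. The names `gd_logpdf` and `gd_mixture_logpdf` are illustrative.

```python
import math
from typing import Sequence


def gd_logpdf(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Log-density of the generalized Dirichlet (Connor-Mosimann form).

    p(x) = prod_d B(alpha_d, beta_d)^-1 * x_d^(alpha_d - 1) * (1 - sum_{l<=d} x_l)^(gamma_d),
    with gamma_d = beta_d - alpha_{d+1} - beta_{d+1} for d < D and gamma_D = beta_D - 1.
    """
    D = len(x)
    if not (len(alpha) == len(beta) == D):
        raise ValueError("x, alpha and beta must have the same length")
    if any(xi <= 0 for xi in x) or sum(x) >= 1:
        raise ValueError("GD support requires x_d > 0 and sum(x) < 1")
    logp = 0.0
    cumsum = 0.0
    for d in range(D):
        cumsum += x[d]
        # Exponent on the "remaining mass" term couples dimension d to d+1.
        gamma_d = beta[d] - alpha[d + 1] - beta[d + 1] if d < D - 1 else beta[d] - 1.0
        log_norm = math.lgamma(alpha[d] + beta[d]) - math.lgamma(alpha[d]) - math.lgamma(beta[d])
        logp += log_norm + (alpha[d] - 1.0) * math.log(x[d]) + gamma_d * math.log(1.0 - cumsum)
    return logp


def gd_mixture_logpdf(x, weights, alphas, betas) -> float:
    """Log-density of a finite GD mixture, combined with a stable log-sum-exp."""
    terms = [math.log(w) + gd_logpdf(x, a, b) for w, a, b in zip(weights, alphas, betas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

For D = 1 the GD reduces to a Beta distribution, which gives a quick sanity check: `gd_logpdf([0.5], [2.0], [3.0])` equals log 1.5, the Beta(2, 3) density at 0.5. Under the same assumptions, one mixture per occupancy class could be scored and a sample assigned to the class with the highest density; the paper's actual decision rule uses the variational predictive approximation instead.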