Abstract: Mobile sensing apps have proliferated rapidly in recent years. Most of them rely heavily on inference components to detect activities or contexts of interest. Existing work implements these components using traditional models designed for balanced data sets, in which the numbers of interesting (positive) and non-interesting (negative) instances are comparable. In practice, however, positive and negative sensing data are highly imbalanced. For example, a single daily activity such as bicycling or driving usually occupies only a small portion of the day, so positive instances are rare. Models trained on such imbalanced data tend to mislabel positive instances as negative. In this paper, we propose SLIM, a new inference framework that combines several machine learning techniques to accommodate the imbalanced nature of sensing data. Specifically, guided under-sampling is employed to obtain balanced labelled subsets, followed by similarity-based sampling that draws from massive unlabelled data to enhance training. To the best of our knowledge, SLIM is the first model that considers data imbalance in mobile sensing. We prototype two sensing apps, and the experimental results show that SLIM achieves higher recall (activity recognition rate) than five classical models while maintaining precision. In terms of overall recall and precision, SLIM is around $12$ percent better than the compared solutions on average.
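To make the idea of balanced labelled subsets concrete, the following is a minimal sketch of plain random under-sampling, not the guided procedure used by SLIM, whose details the abstract does not give. The function name `balanced_subsets` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Build n_subsets balanced (positive, negative) pairs by random under-sampling.

    Illustrative assumption: this is ordinary random under-sampling of the
    larger class, not SLIM's guided under-sampling.
    """
    rng = random.Random(seed)
    # Every subset holds k instances of each class, where k is the size
    # of the smaller class (normally the rare positive class).
    k = min(len(positives), len(negatives))
    subsets = []
    for _ in range(n_subsets):
        pos_sample = rng.sample(positives, k)  # all positives when they are the minority
        neg_sample = rng.sample(negatives, k)  # a fresh random draw of negatives
        subsets.append((pos_sample, neg_sample))
    return subsets


# Example: 5 rare positive windows versus 100 negative windows.
positives = list(range(5))
negatives = list(range(100, 200))
subsets = balanced_subsets(positives, negatives, n_subsets=3)
```

Each subset can then train one classifier on equally sized classes. Drawing several subsets lets different portions of the negative data contribute to training instead of being discarded.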