Abstract: Prediction over heterogeneous data attracts much attention in urban computing. Recently, satellite imagery provides a new
chance for urban perception but raises the problem of how to fuse visual and non-visual features. So far, the practice is to
concatenate the multimodal features into a vector, which may suppress important features. Therefore, we propose a new
ensemble learning framework: (1) An estimator is developed for each predictor to score its confidence, which is input adaptive.
(2) By applying the output of each predictor to the input of the corresponding estimator as feedback, the estimator learns the
performance of the predictor in the input-output space. When a new input is applied to produce a prediction, the similar
situations will be recalled by the estimator to score the confidence of the prediction. (3) Using end-to-end training, the
estimator learns the weights automatically to minimize the total loss of the neural networks. With the proposed method, data
mining based urban computing and computer vision rendered urban perception can be bridged at the task of commercial
activeness prediction, where the prediction based on satellite images and social context data are fused to yield better prediction
than those based on single view data in the experiments.
0 Replies
Loading