Abstract: Traditional spatial prevalent co-location pattern mining discovers groups of spatial features whose instances frequently appear together in nearby areas. However, it is unsuitable for many real-world applications in which the significance of these instances must be considered. High utility co-location pattern (HUCP) mining was developed to find highly beneficial patterns by taking the importance of spatial instances into account. However, the mining result typically contains many HUCPs, making it difficult for users to absorb, comprehend, and apply. This work proposes a compressed representation of HUCPs, \(\epsilon\)-closed HUCPs, that allows a small user-specified tolerance on the information difference between a pattern and its supersets. If the information difference does not exceed this tolerance, only the supersets need to be kept. Moreover, an efficient algorithm is developed to discover \(\epsilon\)-closed HUCPs. The proposed algorithm avoids examining many unnecessary candidates, which significantly improves the performance of mining \(\epsilon\)-closed HUCPs. Synthetic and real data sets with different numbers of features, numbers of instances, and distributions are used to evaluate the performance of the proposed method thoroughly. The experimental results show that \(\epsilon\)-closed HUCPs balance the compression rate and the UPI error rate, giving a large pattern compression rate within a relatively small range of error rates. Moreover, the proposed algorithm performs well on dense and large data sets.