Abstract: MapReduce provides a novel computing model for decomposing complex jobs and managing their subtasks, supporting cloud computing over large distributed data sets. However, its performance is significantly influenced by how the working data are distributed across those data sets. In this paper, we put forward a novel model that balances data distribution to improve cloud computing performance in data-intensive applications such as distributed data mining. We extend the classic MapReduce model with an agent-aid layer and abstract workload requests for data blocks as tokens, so that agents can reason from previously received tokens about where to send other tokens in order to balance the working tasks and improve system performance. Our key contribution is an efficient token routing algorithm that works even though agents have no knowledge of the global state of data distribution in the cloud. We also built a prototype of our system, and the experimental results show that our approach can significantly improve the efficiency of cloud computing.
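To make the idea of token routing under local knowledge concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a simple heuristic: each agent keeps a per-neighbor load estimate built only from tokens it has sent or received, and forwards a token it cannot serve to the neighbor with the lowest estimate. The names Token, Agent, hops_left, load_estimate, capacity, and simulate are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Token:
    """A workload request for one data block (hypothetical structure)."""
    block_id: int
    hops_left: int = 4  # assumed hop limit so forwarding always terminates


class Agent:
    """An agent that sees only its own queue and the tokens passing through it."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity              # tokens it accepts before forwarding
        self.neighbors = []                   # agents it can forward tokens to
        self.queue = []                       # tokens accepted for local processing
        self.load_estimate = defaultdict(int) # neighbor name -> estimated load

    def receive(self, token, sender=None):
        """Accept the token locally or forward it; return the accepting agent's name."""
        if sender is not None:
            # Assumption: a neighbor that forwards a token is likely busy.
            self.load_estimate[sender.name] += 1

        has_room = len(self.queue) < self.capacity
        if has_room or token.hops_left == 0 or not self.neighbors:
            self.queue.append(token)
            return self.name

        token.hops_left -= 1
        # Avoid bouncing the token straight back to the sender when possible.
        candidates = [n for n in self.neighbors if n is not sender] or self.neighbors
        # Pick the neighbor believed to be least loaded; break ties by name.
        target = min(candidates, key=lambda n: (self.load_estimate[n.name], n.name))
        self.load_estimate[target.name] += 1
        return target.receive(token, sender=self)


def simulate(num_tokens=6, capacity=2):
    """Send every token to agent 'a' and report where each one ends up."""
    a, b, c = (Agent(name, capacity) for name in "abc")
    a.neighbors, b.neighbors, c.neighbors = [b, c], [a, c], [a, b]
    placements = [a.receive(Token(block_id=i)) for i in range(num_tokens)]
    loads = {agent.name: len(agent.queue) for agent in (a, b, c)}
    return placements, loads
```

In this toy setting, calling simulate() sends six tokens to a single agent, and the tokens end up spread evenly over the three agents even though no agent ever inspects another agent's queue. The hop limit is a design choice of the sketch: it trades perfect balance for guaranteed termination when every agent is full.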