Abstract: Flow characteristics describe the patterns and trends of network traffic. They help network operators understand network usage and user behavior, and are especially useful for capacity planning, traffic engineering, and fault handling. Because of the large scale of datacenter networks and the explosive growth of traffic volume, it is hard to collect, store, and analyze Internet traffic on a single machine. Hadoop has become a popular infrastructure for massive data analytics because it provides scalable data processing and storage services on a distributed computing system built from commodity hardware. In this paper, we present a Hadoop-based traffic analysis system that accepts input from multiple data traces and performs flow identification, characteristics mining, and flow clustering. The output of the system provides guidance for resource allocation, flow scheduling, and other tasks. An experiment on a dataset of about 8 GB from a university datacenter network shows that the system can finish flow characteristics mining on a four-node cluster within 23 minutes.
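The flow identification and characteristics mining steps named in the abstract can be pictured as a map step followed by a reduce step. The sketch below is only an illustration in plain Python, not the system described in the paper: the abstract does not say how flows are defined, so the 5-tuple key is an assumption, and the names Packet, map_packet, reduce_flow, and identify_flows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """One packet record from a trace (hypothetical schema)."""
    ts: float       # capture timestamp in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    size: int       # packet size in bytes


# Assumption: a flow is identified by the classic 5-tuple.
FlowKey = Tuple[str, str, int, int, str]


def map_packet(pkt: Packet) -> Tuple[FlowKey, Tuple[float, int]]:
    """Map step: emit (flow key, (timestamp, size)) for one packet."""
    key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)
    return key, (pkt.ts, pkt.size)


def reduce_flow(values: Iterable[Tuple[float, int]]) -> Dict[str, float]:
    """Reduce step: summarize all packets that share one flow key."""
    items = list(values)
    times = [ts for ts, _ in items]
    return {
        "packets": len(items),
        "bytes": sum(size for _, size in items),
        "duration": max(times) - min(times),
    }


def identify_flows(packets: Iterable[Packet]) -> Dict[FlowKey, Dict[str, float]]:
    """Group packets by flow key (the shuffle), then reduce each group."""
    groups: Dict[FlowKey, List[Tuple[float, int]]] = defaultdict(list)
    for pkt in packets:
        key, value = map_packet(pkt)
        groups[key].append(value)
    return {key: reduce_flow(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    trace = [
        Packet(0.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 100),
        Packet(1.5, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 300),
        Packet(2.0, "10.0.0.3", "10.0.0.2", 5555, 53, "udp", 60),
    ]
    for key, stats in identify_flows(trace).items():
        print(key, stats)
```

In a real Hadoop job the grouping in identify_flows would be done by the framework's shuffle phase, and the per-flow statistics could then feed a clustering step.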
External IDs: dblp:conf/cits/CaiWZLS14