Towards efficient server architecture for virtualized network function deployment: Implications and implementations

Published: MICRO 2016
Abstract: Recent years have seen a revolution in network infrastructure brought on by ever-increasing demands for data volume. One promising proposal to emerge from this revolution is Network Functions Virtualization (NFV), which has been widely adopted by service and cloud providers. The essence of NFV is to run network functions as virtualized workloads on commodity Standard High Volume Servers (SHVS), the industry-standard platform. However, our experience with NFV deployed on modern NUMA-based SHVS paints a frustrating picture. Due to the complexity of the NFV data plane and its service function chaining, modern NFV deployments on SHVS exhibit a unique processing pattern, the heterogeneous software pipeline (HSP), in which NFV traffic flows must be processed sequentially by heterogeneous software components from the NIC to the end receiver. Since the end-to-end performance of a flow is jointly determined by the performance of every processing stage, the resource allocation/mapping scheme on NUMA-based SHVS must perform thread-dependence-aware scheduling to trade off the impact of co-located contention against that of remote packet transmission. In this paper, we develop a thread scheduling mechanism that collaboratively places the threads of an HSP to minimize the end-to-end performance slowdown of NFV traffic flows. It employs a dynamic programming-based method to search for the optimal thread mapping with negligible overhead. To support this mechanism, we also develop a performance slowdown estimation model that accurately estimates the slowdown at each stage of the HSP. We implement our collaborative thread scheduling mechanism on a real system and evaluate it with real workloads. On average, our algorithm outperforms state-of-the-art NUMA-aware and contention-aware scheduling policies by at least 7% in CPU utilization and 23% in traffic throughput, with negligible computational overhead (less than 1 second).
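The abstract does not include the algorithm itself; the sketch below is a minimal, hypothetical illustration of the kind of stage-by-stage dynamic program it alludes to. It assumes the slowdown estimation model has already produced a per-stage, per-node slowdown estimate (slowdown[i][n]) and that an inter-node transfer cost matrix (transfer[m][n]) is available; these names, and the simplification that each stage's slowdown is independent of where the other stages are placed, are assumptions for illustration, not the paper's actual model.

from itertools import product

def optimal_mapping(slowdown, transfer):
    # Hypothetical sketch, not the paper's algorithm.
    # slowdown[i][n]: estimated slowdown of HSP stage i when placed on NUMA node n
    # transfer[m][n]: cost of handing packets from node m to node n (0 on the same node)
    # Returns (total_cost, stage-to-node assignment).
    num_stages, num_nodes = len(slowdown), len(slowdown[0])
    dp = list(slowdown[0])  # dp[n]: best cost so far with the current stage on node n
    parent = [[None] * num_nodes for _ in range(num_stages)]
    for i in range(1, num_stages):
        new_dp = [float("inf")] * num_nodes
        for m, n in product(range(num_nodes), range(num_nodes)):
            cost = dp[m] + transfer[m][n] + slowdown[i][n]
            if cost < new_dp[n]:
                new_dp[n], parent[i][n] = cost, m
        dp = new_dp
    # Backtrack from the cheapest final node to recover the full mapping.
    node = min(range(num_nodes), key=dp.__getitem__)
    mapping = [node]
    for i in range(num_stages - 1, 0, -1):
        node = parent[i][node]
        mapping.append(node)
    mapping.reverse()
    return dp[mapping[-1]], mapping

# Example: a three-stage HSP (e.g., NIC RX thread, virtual switch, VNF) on two NUMA nodes.
slowdown = [[1.0, 1.2],
            [2.0, 1.1],
            [1.5, 1.5]]
transfer = [[0.0, 0.8],
            [0.8, 0.0]]
cost, mapping = optimal_mapping(slowdown, transfer)
print(cost, mapping)  # -> 3.8 [1, 1, 1]: keeping the chain on node 1 beats a remote hop here

With K pipeline stages and N NUMA nodes, this search runs in O(K * N^2) time, which is consistent with the abstract's claim of negligible computational overhead.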