Abstract: AI chip virtualization ensures security in multi-user scenarios by partitioning the chip into multiple logically isolated instances. Although each instance has its own computing resources, all instances share the same task scheduler, so they implicitly compete for scheduler time slices, which can degrade overall performance. To address this problem, we propose vLFS, a deep reinforcement learning-based scheduling algorithm that opens up a new multidimensional optimization space for AI chip virtualization. vLFS schedules tasks from different instances by combining task load characteristics with the runtime status of the underlying AI chip, while also considering collaborative scheduling between the host and the device, thereby significantly improving AI chip utilization. We implement vLFS on a real-world system and conduct extensive comparisons against heuristic scheduling methods.