Abstract: The rapid growth of artificial intelligence (AI), particularly in computer vision (CV), necessitates distributed computing for efficient model training. Existing benchmarks often lack adaptability to emerging scenarios or focus on limited applications. To address these gaps, this paper presents a case study of a comprehensive benchmark suite for distributed AI training systems. We classify AI tasks into four categories, Large-Scale, Moderate Complexity, High Load, and High-Performance, based on single-load computation and load concurrency, with representative models evaluated on Ray and DeepSpeed across diverse hardware. The experiments reveal fragmented framework performance. DeepSpeed excels in stability and efficiency for Large-Scale and Moderate Complexity tasks, leveraging advanced memory optimization. Ray outperforms in High Load and High-Performance tasks due to its dynamic resource scheduling, but shows greater variability. These results highlight the need for task-specific framework selection tailored to hardware and performance requirements. This work provides valuable insights for optimizing distributed AI training and addresses limitations in current benchmarks. Future work aims to expand task categories and framework support to align with the evolving demands of distributed AI systems.
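The two-axis task taxonomy described in the abstract can be sketched as a simple quadrant classifier. This is an illustrative sketch only, not code from the paper: the function name, thresholds, and the exact mapping of quadrants to category names are assumptions for demonstration.

```python
# Hypothetical sketch of the four-category task taxonomy: each task is
# scored on two axes -- single-load computation and load concurrency --
# and mapped to one of the four categories. Thresholds and the
# quadrant-to-category mapping are illustrative assumptions.

def classify_task(computation: float, concurrency: float,
                  comp_threshold: float = 0.5,
                  conc_threshold: float = 0.5) -> str:
    """Map a (computation, concurrency) score pair to a task category."""
    heavy = computation >= comp_threshold
    busy = concurrency >= conc_threshold
    if heavy and busy:
        return "High-Performance"    # heavy computation, high concurrency
    if heavy:
        return "Large-Scale"         # heavy computation, low concurrency
    if busy:
        return "High Load"           # light computation, high concurrency
    return "Moderate Complexity"     # light computation, low concurrency

print(classify_task(0.9, 0.2))  # → Large-Scale
print(classify_task(0.3, 0.8))  # → High Load
```

Under this sketch, framework selection could key off the returned category, e.g. preferring DeepSpeed for Large-Scale and Moderate Complexity tasks and Ray for High Load and High-Performance tasks, as the evaluation suggests.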
External IDs: dblp:conf/iwqos/GaoPLLWZG25