Abstract: Miniaturized artificial intelligence (AI) data centers (MAIDCs) built from high-performance embedded AI nodes have shown great promise in accelerating next-generation edge computing applications. However, MAIDCs are hard to design because they face more stringent thermal constraints than conventional data centers: power densities exceeding those of traditional server architectures, limited cooling capacity from compact form factors, and variable operating conditions such as fluctuating ambient temperatures. In this work, we take the first step toward exploring the thermal behavior of MAIDCs and present TriCooling-Sim, a hierarchical and adaptive thermal–computation co-simulation framework for high-density MAIDCs composed of system-on-chip (SoC) nodes. Our design features a novel lightweight physics-guided modeling strategy that enables proactive workload–cooling co-optimization, supporting power-efficient architecture design and intelligent resource management. The framework supports multi-scale thermal simulation spanning six orders of magnitude in time and three orders of magnitude in space without prohibitive overhead. Validation across 16 representative MAIDC configurations shows that TriCooling-Sim attains a mean absolute error of 1.7 \(^\circ\)C compared with reference computational fluid dynamics (CFD) simulations while reducing simulation time by up to two orders of magnitude, enabling both rapid design-space exploration and near-real-time operational decision-making for future MAIDC deployments.
External IDs: dblp:conf/npc/GuoWWHLG25