Abstract: The rapid expansion of the Internet of Things (IoT) industry makes workload characterization increasingly important when evaluating microprocessors tailored for IoT applications. The streamlined yet comprehensive system stack of an IoT system is well suited to software and hardware co-design. This stack comprises several layers, including programming languages, frameworks, runtime environments, instruction set architectures (ISA), operating systems (OS), and microarchitecture. These layers can be grouped into three primary categories: the intermediate representation (IR) layer, the ISA layer, and the microarchitecture layer. Cross-layer workload characterization is therefore the first step in IoT design, especially in co-design. In this paper, we use a cross-layer profiling methodology to analyze IoTBench, an IoT workload benchmark. We examine each layer’s key metrics, including instruction, data, and branch locality. Experimental evaluations were performed on both ARM and x86 architectures. Our findings reveal general patterns in how IoTBench’s metrics vary with different input data. We also observe that the same metric can show different characteristics at different layers, suggesting that analyzing a single layer in isolation may yield incomplete conclusions. In addition, our cross-layer profiling shows that the convolution task, characterized by deeply nested loops, significantly amplified branch locality at the microarchitecture layer on the ARM platform. Notably, optimization with the GNU C++ compiler (G++), intended to boost performance, had the opposite effect: it exacerbated the branch locality issue and degraded performance.