Abstract: Several experiments using a versatile optimizing compiler to evaluate the benefit of four forms of microarchitectural parallelisms (multiple microoperations issued per cycle, multiple result-distribution buses, multiple execution units, and pipelined execution units) are described. The first 14 Livermore loops and 10 of the linpack subroutines are used as the preliminary benchmarks. The compiler generates optimized code for different microarchitecture configurations. It is shown how the compiler can help to derive a balanced design for high performance. For each given set of technology constraints, these experiments can be used to derive a cost-effective microarchitecture to execute each given set of workload programs at high speed.< >
Loading