Abstract: Integrating processing with DRAM offers a potential solution to the memory bottleneck problem. The bandwidth available within the chip is several orders of magnitude higher than that at the memory bus, and the access time is lower. As workloads shift towards data-intensive and multimedia applications, this wide bandwidth can be used effectively by harnessing the parallelism available in these applications. The difficult challenge lies in developing architectures and programming models that expose the available bandwidth to the application. This paper presents the design of an intelligent memory based on a distributed data-parallel architecture with limited support for control parallelism. We investigate some of the relevant design issues and evaluate how well such an architecture supports data-intensive applications. The design is evaluated both as a stand-alone system and as a co-processor acting as a memory access filter. We develop a cycle-accurate simulator and use it to study the performance of the architecture on data-intensive applications, and we compare that performance against a modern superscalar processor.