Abstract: With the significant increase in memory size, in-memory database systems are becoming the dominant way of handling large-scale data analytics, compared with traditional disk-based systems such as data warehouses. Because their physical and logical designs differ significantly, the two kinds of system show very different characteristics on massive data analytics workloads. To examine these differences and the technical reasons behind them, we contrast the performance of disk-based data warehousing and in-memory database systems by comparing two state-of-the-art commercial systems on a large-scale real transportation dataset. This independent performance study reveals several interesting insights. Experimental evaluation shows that the in-memory system achieves competitive performance on most data analytics queries, with lower model maintenance cost and more flexibility, but that it falls short in other cases. We summarise the results of our study and provide guidelines on how to select an appropriate system for a given data analytics task.