Abstract: Analyzing and processing large graphs is of fundamental importance for an ever-growing number of applications. Significant advancements in the last few years at both, systems and algorithmic side, let graph processing become increasingly scalable and efficient. Often, these advances are still not well-known and well-understood outside the systems and algorithms communities. In particular, there is very little understanding of the various trade-offs involved in the usage of particular combinations of algorithms, data structures, and systems. This tutorial will have a particular focus on this aspect, imparting theoretical knowledge intertwined with hands-on experience. Since there is no clearly winning system/algorithm combination that performs best on all the different metrics, it is of utmost importance to understand the pros and cons of the various alternatives. The tutorial will enable application developers in industry and academics, students as well as researchers to make corresponding decisions in an informed way. The participants do neither require any particular a-priori knowledge apart from a basic understanding of core computer science concepts, nor any special equipment apart from their laptop. After a general introduction, we will describe the critical dimensions that need to be tackled together to effectively and efficiently overcome problems in large graph processing: data representation, data storage, acceleration via multi-core programming, and horizontally scalable graph-processing infrastructures. Thereafter, we will provide an overview of existing graph-processing systems and graph databases. This will be followed by hands-on experiences with popular representatives of such systems. Finally, we will provide a detailed description of algorithms used in these systems for fundamental problems like shortest paths and Pagerank, how they are implemented, and how this affects the overall performance. We will also cover basic data structures such as distance oracles that can be built on these systems to efficiently answer distance queries for real-world graphs.
0 Replies
Loading