Abstract: In the last decade, various distributed stream processing engines (DSPEs) were developed in order to process data streams in a flexible, scalable, fast and resilient manner. Coping with the increasingly demanding throughput and latency requirements of modern applications led to a careful investigation and redesign of stream-processing tools. The first generation of tools, such as Apache Hadoop [19], Spark [20], Storm [18] and Kafka [14], were designed to split an incoming data stream into batches and then to synchronously execute their analytical workflows over these data batches. To overcome the limitations (primarily the high latency) of this iterative form of bulk-synchronous processing (BSP), asynchronous stream-processing (ASP) engines such as Apache Flink [17] and Samza [15] have recently emerged.