BigDebug: interactive debugger for big data analytics in Apache SparkDownload PDFOpen Website

2016 (modified: 15 Sept 2022)SIGSOFT FSE 2016Readers: Everyone
Abstract: To process massive quantities of data, developers leverage data-intensive scalable computing (DISC) systems in the cloud, such as Google's MapReduce, Apache Hadoop, and Apache Spark. In terms of debugging, DISC systems support post-mortem log analysis but do not provide interactive debugging features in realtime. This tool demonstration paper showcases a set of concrete usecases on how BigDebug can help debug Big Data Applications by providing interactive, realtime debug primitives. To emulate interactive step-wise debugging without reducing throughput, BigDebug provides simulated breakpoints to enable a user to inspect a program without actually pausing the entire computation. To minimize unnecessary communication and data transfer, BigDebug provides on-demand watchpoints that enable a user to retrieve intermediate data using a guard and transfer the selected data on demand. To support systematic and efficient trial-and-error debugging, BigDebug also enables users to change program logic in response to an error at runtime and replay the execution from that step. BigDebug is available for download at http://web.cs.ucla.edu/~miryung/software.html
0 Replies

Loading