Apache Pig is a platform for creating MapReduce workflows with Hadoop. These workflows are expressed as directed acyclic graphs (DAGs) of tasks that exist at a conceptually higher level than their implementations as series of MapReduce jobs. Pig Latin is the procedural language used for building these workflows, providing syntax similar to the declarative SQL commonly used for relational database systems. In addition to standard SQL operations, Pig can be extended with user-defined functions (UDFs) commonly written in Java. We adopted Pig for our implementation of the correlator to speed up development time, allow for ad hoc workflow changes, and to embrace the Hadoop community׳s migration away from MapReduce towards more generalized DAG processing (Mayer, 2013). Specifically, in the event that future versions of Hadoop are optimized to support paradigms other than MapReduce, Pig scripts could take advantage of these advances without recoding, whereas explicit Java MapReduce jobs would need to be rewritten.
