Abstract: SQL-on-Hadoop engines are considered useful data integration tools for large-scale data. However, they may incur redundant network overhead by redistributing intermediate results multiple times when the query result includes many attributes. We propose an optimization method based on partial materialization that avoids repeatedly redistributing trivial attributes.