Skip to main content

Research Repository

Advanced Search

Efficiently Handling Skew in Outer Joins on Distributed Systems

Cheng, L.; Kotoulas, S.; Ward, T.; Theodoropoulos, T.

Efficiently Handling Skew in Outer Joins on Distributed Systems Thumbnail


Authors

L. Cheng

S. Kotoulas

T. Ward

T. Theodoropoulos



Abstract

Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging as real world datasets are characterized by data skew leading to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is little published work solving the corresponding problem for parallel outer joins. Conventional approaches to this problem such as ones based on hash redistribution often lead to load balancing problems while duplication-based approaches incurs significant overhead in terms of network communication. In this paper, we propose a new algorithm, query with counters (QC), for directly handling skew in outer joins on distributed architectures. We present an efficient implementation of our approach based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skew. Experimental results show that our method is scalable and, in cases of high skew, faster than the state-of-the-art.

Citation

Cheng, L., Kotoulas, S., Ward, T., & Theodoropoulos, T. (2014). Efficiently Handling Skew in Outer Joins on Distributed Systems. In 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014) : Chicago, Illinois, USA, 26-29 May 2014 ; proceedings (295-304). https://doi.org/10.1109/ccgrid.2014.35

Conference Name 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Conference Location Chicago, IL, USA
Start Date May 26, 2014
End Date May 29, 2014
Publication Date May 29, 2014
Deposit Date Apr 21, 2016
Publicly Available Date Apr 28, 2016
Publisher Institute of Electrical and Electronics Engineers
Pages 295-304
Book Title 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014) : Chicago, Illinois, USA, 26-29 May 2014 ; proceedings.
DOI https://doi.org/10.1109/ccgrid.2014.35
Additional Information Date of Conference: 26-29 May 2014

Files

Accepted Conference Proceeding (325 Kb)
PDF

Copyright Statement
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.





You might also like



Downloadable Citations