Efficiently Handling Skew in Outer Joins on Distributed Systems
Cheng, L.; Kotoulas, S.; Ward, T.; Theodoropoulos, T.
Authors
L. Cheng
S. Kotoulas
T. Ward
T. Theodoropoulos
Abstract
Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging because real-world datasets are characterized by data skew, which leads to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is little published work solving the corresponding problem for parallel outer joins. Conventional approaches to this problem, such as those based on hash redistribution, often lead to load balancing problems, while duplication-based approaches incur significant overhead in terms of network communication. In this paper, we propose a new algorithm, query with counters (QC), for directly handling skew in outer joins on distributed architectures. We present an efficient implementation of our approach based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skew. Experimental results show that our method is scalable and, in cases of high skew, faster than the state-of-the-art.
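The load balancing problem that the abstract attributes to hash redistribution can be illustrated with a small sketch. This is not the QC algorithm and not the paper's APGAS implementation. It is a toy model that assumes integer join keys, 16 nodes, and a made-up key distribution, and the names `hash_partition` and `imbalance` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter

NUM_NODES = 16  # assumed node count, chosen only for illustration


def hash_partition(keys, num_nodes):
    """Route each tuple to a node by hashing its join key; return tuples per node."""
    load = Counter()
    for key in keys:
        load[hash(key) % num_nodes] += 1
    return [load[node] for node in range(num_nodes)]


def imbalance(load):
    """Busiest node's load divided by the mean load (1.0 means perfectly balanced)."""
    return max(load) / (sum(load) / len(load))


# Uniform input: every join key appears exactly once.
uniform_keys = list(range(1600))

# Skewed input (made-up data): half of all tuples share the single key 7.
skewed_keys = [7] * 800 + list(range(800))

uniform_load = hash_partition(uniform_keys, NUM_NODES)
skewed_load = hash_partition(skewed_keys, NUM_NODES)

print("uniform imbalance:", imbalance(uniform_load))
print("skewed imbalance:", imbalance(skewed_load))
```

Because every tuple with the same key must land on the same node, the node that owns the hot key receives all of its tuples, and the join cannot finish before that node does. Duplication-based approaches avoid this hotspot by copying tuples to many nodes, which is where the network overhead mentioned in the abstract comes from.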
Citation
Cheng, L., Kotoulas, S., Ward, T., & Theodoropoulos, T. (2014). Efficiently Handling Skew in Outer Joins on Distributed Systems. In 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014) : Chicago, Illinois, USA, 26-29 May 2014 ; proceedings (295-304). https://doi.org/10.1109/ccgrid.2014.35
Field | Value |
---|---|
Conference Name | 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing |
Conference Location | Chicago, IL, USA |
Start Date | May 26, 2014 |
End Date | May 29, 2014 |
Publication Date | May 29, 2014 |
Deposit Date | Apr 21, 2016 |
Publicly Available Date | Apr 28, 2016 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 295-304 |
Book Title | 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014) : Chicago, Illinois, USA, 26-29 May 2014 ; proceedings. |
DOI | https://doi.org/10.1109/ccgrid.2014.35 |
Additional Information | Date of Conference: 26-29 May 2014 |
Files
Accepted Conference Proceeding
(325 Kb)
PDF
Copyright Statement
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.