
Abstract

Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Modeling the performance of a DAG on data-parallel frameworks (e.g., MapReduce) is challenging because the allocation of preemptable system resources among parallel jobs may vary dynamically during execution, which makes the execution time difficult to estimate accurately. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), which estimates the allocation of preemptable resources by identifying the bottleneck and thereby predicts task execution time accurately. For a DAG workflow, we propose a state-based approach that iteratively applies the resource allocation property among stages to estimate the overall execution plan. Furthermore, to handle skew in various jobs, we refine the model with order statistics to improve estimation accuracy. Extensive experiments with HiBench and TPC-H workloads validate these cost models. For task execution time estimation, the BOE model outperforms state-of-the-art models by a factor of five. With the refined skew-aware model, the average prediction error is under 3% when estimating the execution time of 51 hybrid analytics (HiBench) and query (TPC-H) DAG workflows.
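The abstract describes bottleneck-based task estimation, state-based estimation across stages, and the order-statistics skew refinement only at a high level. As a rough illustration of those three ideas, the following is a minimal Python sketch under our own simplifying assumptions. The `Task` type, the resource names, equal sharing of capacity among concurrent tasks, wave-based scheduling, and all function names are hypothetical; this is not the authors' BOE model or its formulas.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch only: every assumption here is ours, not the paper's.


@dataclass
class Task:
    # Hypothetical per-task resource demand, e.g. {"cpu": core-seconds, "disk": MB}.
    demand: Dict[str, float]


def bottleneck_time(task: Task, capacity: Dict[str, float], concurrent: int) -> float:
    """Task duration = time on its most constrained resource, assuming each
    resource's capacity (units per second) is split evenly among `concurrent` tasks."""
    return max(task.demand[r] / (capacity[r] / concurrent) for r in task.demand)


def stage_time(tasks: List[Task], capacity: Dict[str, float], slots: int) -> float:
    """Run tasks in waves of at most `slots` tasks; each wave ends with its slowest task."""
    total = 0.0
    for start in range(0, len(tasks), slots):
        wave = tasks[start:start + slots]
        total += max(bottleneck_time(t, capacity, len(wave)) for t in wave)
    return total


def concurrent_stages_time(work: Dict[str, float], capacity: float) -> float:
    """State-based estimate for independent stages sharing one preemptable resource
    equally; when a stage finishes (a state change), its share goes to the others."""
    remaining = dict(work)
    clock = 0.0
    while remaining:
        share = capacity / len(remaining)
        dt = min(remaining.values()) / share  # time until the next stage finishes
        clock += dt
        # Advance every running stage by dt and drop the ones that have finished.
        remaining = {s: w - share * dt for s, w in remaining.items() if w - share * dt > 1e-9}
    return clock


def expected_max(samples: List[float], n: int) -> float:
    """Expected maximum of n i.i.d. draws from the empirical distribution of `samples`.
    Order statistics: the largest drawn rank is i with probability (i/k)^n - ((i-1)/k)^n."""
    xs = sorted(samples)
    k = len(xs)
    return sum(x * ((i / k) ** n - ((i - 1) / k) ** n) for i, x in enumerate(xs, start=1))


if __name__ == "__main__":
    print(stage_time([Task({"cpu": 4.0})] * 3, {"cpu": 4.0}, slots=2))  # 3.0
    print(concurrent_stages_time({"a": 2.0, "b": 6.0}, capacity=2.0))  # 4.0
    print(expected_max([1.0, 3.0], n=2))  # 2.5
```

Equal sharing is the simplest allocation assumption for a sketch like this; `expected_max` shows why skew matters, since a stage waits for its slowest task and the expected maximum of many task durations grows with the number of tasks even when the mean stays fixed.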

Details

Title
Performance models of data parallel DAG workflows for large scale data analytics
Author
Shi, Juwei 1; Lu, Jiaheng 2

1 Microsoft STCA, Beijing, China
2 University of Helsinki, Helsinki, Finland (GRID:grid.7737.4) (ISNI:0000 0004 0410 2071)
Publication title
Volume
41
Issue
3
Pages
299-329
Publication year
2023
Publication date
Sep 2023
Publisher
Springer Nature B.V.
Place of publication
New York
Country of publication
Netherlands
ISSN
0926-8782
e-ISSN
1573-7578
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2023-05-23
Milestone dates
2023-04-19 (Registration); 2023-04-19 (Accepted)
First posting date
23 May 2023
ProQuest document ID
3255421070
Document URL
https://www.proquest.com/scholarly-journals/performance-models-data-parallel-dag-workflows/docview/3255421070/se-2?accountid=208611
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-09-29
Database
ProQuest One Academic