Abstract
Scheduling a distributed application modeled as a directed acyclic task graph (where nodes represent computation tasks and edges represent precedence constraints/data flow between the tasks) over a set of networked compute nodes is a fundamental problem in distributed computing and has thus received substantial scholarly attention. Most existing solutions, however, fall short of accommodating the dynamic nature of modern dispersed computing systems (e.g., IoT, edge, and robotic systems), where applications and compute networks are not known ahead of time, are constantly changing, and are heavily resource-constrained. We propose solutions that address this gap and show that novel methods for understanding the limitations of existing task scheduling algorithms can inform the design of new algorithms better suited for dynamic, unpredictable, and resource-constrained modern dispersed computing systems.
The goal in task scheduling is to assign computational tasks from the given task graph to different compute nodes so as to optimize some performance metric (e.g., total execution time, energy consumption, or throughput). We focus on the task scheduling problem for heterogeneous task graphs and compute networks with the objective of minimizing makespan (total execution time). In particular, we explore the potential and limitations of both new and existing task scheduling algorithms for constantly changing, unpredictable, and resource-constrained environments.
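To make the objective concrete, the following is a minimal sketch (not taken from the dissertation) of how the makespan of a fixed task-to-node assignment can be computed for a heterogeneous task graph and compute network. The attribute names (`cost`, `data`), the `comp_speed` and `comm_strength` tables, and the omission of queuing/contention on compute nodes and links are illustrative assumptions.

```python
# Minimal, illustrative makespan computation for a fixed assignment.
# Assumes: task_graph is a networkx DiGraph (a DAG) with node attribute "cost"
# (computation amount) and edge attribute "data" (data volume); comp_speed maps
# compute node -> speed; comm_strength maps (src_node, dst_node) -> bandwidth.
import networkx as nx

def makespan(task_graph, comp_speed, comm_strength, assignment):
    """Finish time of the last task under `assignment`, ignoring node/link contention."""
    finish = {}
    for task in nx.topological_sort(task_graph):
        node = assignment[task]
        # A task can start only after all predecessors finish and their data arrives.
        ready = 0.0
        for pred in task_graph.predecessors(task):
            data = task_graph.edges[pred, task]["data"]
            comm = 0.0 if assignment[pred] == node else data / comm_strength[(assignment[pred], node)]
            ready = max(ready, finish[pred] + comm)
        finish[task] = ready + task_graph.nodes[task]["cost"] / comp_speed[node]
    return max(finish.values())
```

A real scheduler must additionally account for contention when multiple tasks share a compute node or link; the sketch omits this for brevity.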
First, we show that many of the most widely used task scheduling algorithms are ill-suited to scenarios with highly dynamic network conditions and propose a novel approach to scheduling in such environments. Leveraging recent advances in machine learning, we show that graph neural networks (GNNs) can be used to quickly produce high-quality schedules in these settings.
Then, we study the problem from a slightly different perspective. In real-world scenarios, it is often the case that the compute network is not given, but rather is something that can be designed ahead of time. For example, IoT system designers can decide where to place sensors and compute nodes in order to best support their application. Because task scheduling is NP-hard, however, we know there must exist problem instances on which even the (empirically) best algorithms perform poorly. In our work on network synthesis, we show that, for this reason, optimizing networks with respect to traditional metrics like communication strength can be detrimental to performance. We propose a framework for studying network synthesis for task scheduling applications and demonstrate its utility with a tactical-IoT-inspired use case.
We also show that the traditional benchmarking approach to comparing existing task scheduling algorithms is insufficient because problem instances on which algorithms perform poorly are often remarkably similar to problem instances on which they perform well. We propose a new approach to comparing task scheduling algorithms that can help us better understand the conditions under which an algorithm performs well or poorly. Using our approach, we are able to find problem instances for which every algorithm we consider performs at least twice as poorly (produces a schedule with twice the makespan) as another algorithm. In fact, for most algorithms, we are able to find conditions under which they perform five times as poorly as another algorithm. These are the first results of their kind in the literature and represent an important step toward understanding the performance boundaries between algorithms.
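One way to picture this kind of comparison is as a search over problem instances for those that maximize the makespan ratio between two schedulers. The sketch below is purely illustrative and is not the dissertation's method; `perturb`, `scheduler_a`, and `scheduler_b` are assumed placeholders (each scheduler returns the makespan of the schedule it produces on an instance).

```python
# Hedged sketch: random local search for instances where scheduler_a performs
# much worse than scheduler_b. All callables are illustrative placeholders.
import random

def find_adversarial_instance(instance, scheduler_a, scheduler_b, perturb, steps=1000, seed=0):
    rng = random.Random(seed)
    best = instance
    best_ratio = scheduler_a(best) / scheduler_b(best)
    for _ in range(steps):
        candidate = perturb(best, rng)                      # small random change to the instance
        ratio = scheduler_a(candidate) / scheduler_b(candidate)
        if ratio > best_ratio:                              # scheduler_a is now relatively worse
            best, best_ratio = candidate, ratio
    return best, best_ratio
```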
Finally, we extend our efforts in comparing task scheduling algorithms to the individual algorithmic components that compose them. We propose a generalized parametric list scheduling approach for studying the individual and combined effects of such algorithmic components. We evaluate 72 algorithms produced by combining five different types of algorithmic components on 20 datasets and present results on their individual and combined effects on average performance and runtime. We also discuss how these results differ for individual datasets, suggesting that the way algorithmic components interact with each other is problem-dependent.
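The idea behind parametric list scheduling can be sketched as assembling a scheduler from interchangeable components. The snippet below is a minimal illustration under assumed component types (a task-ranking function and a node-selection rule), not the dissertation's actual implementation or component set.

```python
# Hedged sketch of a parametric list scheduler: swap components in and out to
# study their individual and combined effects. Component names are placeholders.
def make_list_scheduler(rank_tasks, choose_node):
    """rank_tasks: task_graph -> tasks in scheduling order;
    choose_node: (task, task_graph, network, partial_assignment) -> compute node."""
    def schedule(task_graph, network):
        assignment = {}
        for task in rank_tasks(task_graph):
            assignment[task] = choose_node(task, task_graph, network, assignment)
        return assignment
    return schedule

# Example (placeholder components): rank tasks by a static priority such as
# upward rank, then greedily pick the node minimizing estimated finish time.
# heft_like = make_list_scheduler(upward_rank, earliest_finish_node)
```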
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer