Abstract
Multicore scaling is reaching its limit with the end of Dennard scaling, yet data and processing needs continue to grow rapidly. The new generation of cloud applications has made large-scale application development commonplace, so this software growth shows no sign of slowing. Unlike in the past, we cannot sustain this growth simply by adding more hardware to systems. It is developers' responsibility to write optimized software that uses the underlying hardware efficiently to sustain innovation.
How applications place data relative to where they perform computation can greatly impact performance across diverse hardware, ranging from single-CPU machines to multi-socket NUMA machines to distributed clusters. This dissertation demonstrates, across three application domains (DNN inference, OS kernels, and distributed key-value stores), that while no universal placement strategy exists for all of these domains, it is feasible to develop systematic abstractions that enable the movement, replication, and partitioning of workloads across cores and machines. Such abstractions alleviate the need for the ad-hoc, manual solutions that are currently prevalent.
This dissertation exemplifies this approach with three systems. The first, Packrat, uses controlled data and compute placement to improve DNN inference latency on single-CPU machines. It algorithmically partitions large DNN tasks into smaller ones and places them on CPU cores to improve overall throughput and latency. The second system, NrOS, uses controlled data and compute placement to improve the performance of OS kernels on multi-socket NUMA machines while simplifying kernel development. It replicates kernel data structures across NUMA nodes to avoid costly remote memory accesses, improving throughput and latency for OS services such as the file system and virtual memory. The third system, ASFP, uses controlled data and compute placement to improve the performance of distributed key-value stores. It logically decouples storage functions and, based on resource demands, places them on storage servers and clients to improve overall system throughput and latency.