Abstract

Deep learning models now achieve results that exceed human capabilities in various computer vision tasks, robotics, and simulated environments, so continued scaling of the performance and energy efficiency of deep learning systems is crucial to their deployment in solving complex real-world problems. However, improving the performance scalability and power efficiency of deep learning workloads, by using emerging non-volatile memory (NVM) technologies or by understanding the architectural implications of these workloads, remains an open problem.

In this thesis, we propose novel ways to identify and overcome the limitations of designing scalable and efficient systems for deep learning. These include (1) DeepNVM++, a cross-layer analysis framework to characterize, model, and optimize emerging NVM-based caches in GPU architectures, which address the scalability and efficiency limitations of conventional SRAM-based caches; (2) a holistic system-level analysis that identifies the architectural implications of distributed reinforcement learning training and improves the performance scalability and power efficiency of CPU-GPU systems, rather than approaching the problem solely from the GPU microarchitecture perspective; and (3) QUIDAM, a framework for quantization-aware power, performance, and area modeling of hardware accelerators that enables efficient design space exploration and co-exploration of hardware and machine learning models.

Details

Title
Scalable and Efficient Systems for Deep Learning
Author
Inci, Ahmet Fatih
Publication year
2022
Publisher
ProQuest Dissertations & Theses
ISBN
9798845417459
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2723468951
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.