Abstract

Memory reliability is a critical concern in modern computing systems, where DRAM errors can significantly impact performance and data integrity. Systems employ Error Correction Codes (ECC) as a protection mechanism against memory errors, but these mechanisms cannot correct all errors. The resulting uncorrectable errors present a significant challenge in DRAM systems: they degrade performance and reliability and require costly memory replacements.

To address this, newer mitigation mechanisms have been developed. However, existing research on their effectiveness has primarily focused on operating-system-level mechanisms such as page offlining, while studies of hardware-targeted mechanisms, including Post-Package Repair (PPR) and Adaptive Double Device Data Correction (ADDDC), have been very limited. Additionally, although these actions incur performance and resource overhead, the optimal conditions and timing for triggering them remain unexplored.

We aim to fill this gap by modeling error dynamics with spatial information about error locations, moving towards the ability to predict uncorrectable errors and other events that lead to DRAM replacement, and to select the most efficient mitigation action for each situation. By leveraging a rich dataset collected from a substantial population of enterprise storage systems, this work provides insights into the real-world behavior of memory errors and establishes a foundation for the optimized application of error mitigation strategies, with the goal of enhanced reliability and performance in storage systems.

Details

Title
DRAM Errors in Enterprise Storage Systems: Probabilistic Modeling and Mitigations
Number of pages
41
Publication year
2025
Degree date
2025
School code
0779
Source
MAI 87/1(E), Masters Abstracts International
ISBN
9798290664705
University/institution
University of Toronto (Canada)
Department
Electrical and Computer Engineering
University location
Canada -- Ontario, CA
Degree
M.A.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31839066
ProQuest document ID
3234680447
Document URL
https://www.proquest.com/dissertations-theses/dram-errors-enterprise-storage-systems/docview/3234680447/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic