Efficient storage and processing of sparse data remain fundamental challenges in high-performance computing (HPC), particularly as applications increasingly operate on large-scale graphs and matrices with highly irregular structure. Traditional data layouts, such as the Compressed Sparse Row (CSR) format, provide strong spatial locality for static analytics but lack the flexibility required for dynamic updates. Moreover, the growing performance gap caused by irregular memory access patterns highlights the need for new abstractions that better align data access and computation with the characteristics of the underlying hardware. This dissertation addresses these challenges in two critical HPC domains, dynamic graph storage and sparse general matrix-matrix multiplication (SpGEMM), by introducing novel data structures and algorithms designed from a memory-architecture co-design perspective.
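To make the locality-versus-flexibility trade-off concrete, the following minimal C++ sketch shows the classic three-array CSR layout; the struct and method names are illustrative rather than drawn from any system discussed here.

```cpp
#include <cstdint>
#include <vector>

// Minimal CSR layout for a graph with V vertices and E directed edges.
// row_ptr[v] .. row_ptr[v+1] delimits the neighbors of vertex v inside
// col_idx, so scanning a row is one contiguous, cache-friendly sweep.
struct CSR {
    std::vector<int64_t> row_ptr;  // size V + 1, prefix sums of degrees
    std::vector<int64_t> col_idx;  // size E, neighbor ids packed with no gaps

    // Static analytics benefit from the tight packing ...
    int64_t degree(int64_t v) const { return row_ptr[v + 1] - row_ptr[v]; }

    // ... but dynamic updates do not: inserting one edge (u, w) shifts every
    // entry after u's row and increments all later offsets, an O(E) cost.
    void insert_edge(int64_t u, int64_t w) {
        col_idx.insert(col_idx.begin() + row_ptr[u + 1], w);
        for (size_t v = u + 1; v < row_ptr.size(); ++v) row_ptr[v] += 1;
    }
};
```

PMA-based mutable-CSR designs sidestep this wholesale shifting by spreading gaps throughout the edge array, which is what makes the dynamic formats discussed below viable.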
In the first domain, graph-structured data is ubiquitous, arising in biology, social networks, and recommendation systems, among other areas. Because real-world graphs evolve continuously, dynamic updates must be supported alongside efficient analytics. Classical storage formats such as CSR, adjacency lists, and edge lists, as well as more recent extensions based on the Packed Memory Array (PMA), e.g., VCSR [1] and PCSR, have shown promise in balancing these two demands. This research explores such data structure design on an emerging memory technology: persistent memory (PM). PM devices, such as Intel Optane, offer an appealing blend of large capacity, byte-addressability, and data persistence [2, 3, 4]. This research proposes DGAP [5], a framework that unifies graph storage and computation on PM to exploit its low-latency, high-bandwidth characteristics. DGAP builds on a single mutable-CSR data structure augmented with PM-optimized designs, including a per-section append-only edge log, per-thread undo logging for crash consistency, and intelligent data placement that minimizes in-place writes. Together, these techniques drastically reduce write amplification and the overhead of ensuring persistence. Extensive evaluations show that DGAP outperforms state-of-the-art dynamic graph systems on persistent memory, achieving up to 3.2× faster updates and 3.77× faster analytics. These results demonstrate the efficacy of co-designing data structures with emerging memory hardware to meet the demands of data-intensive graph applications.
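As a rough illustration of one of these designs, the C++ sketch below shows a generic log-then-write undo-logging protocol for PM; it is not DGAP's actual implementation, and persist(), UndoRecord, and ThreadUndoLog are hypothetical names, with persist() standing in for a cache-line write-back plus store fence (e.g., CLWB followed by SFENCE).

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Stand-in for durably flushing [addr, addr + len) to persistent memory.
// On real hardware this would issue CLWB/CLFLUSHOPT per covered cache
// line, followed by an SFENCE; here it is only a compiler-visible fence.
inline void persist(const void* addr, std::size_t len) {
    (void)addr; (void)len;
    std::atomic_thread_fence(std::memory_order_release);
}

struct UndoRecord {
    uint64_t* addr;     // PM word about to be overwritten
    uint64_t  old_val;  // its prior contents
};

struct ThreadUndoLog {
    UndoRecord records[64];  // thread-private log region in PM (bounds
                             // checking elided for brevity)
    uint64_t count = 0;      // number of valid records, persisted last

    // Log-then-write: the old value reaches PM before the in-place update,
    // so a crash at any point leaves enough state to roll the update back.
    void logged_write(uint64_t* pm_addr, uint64_t new_val) {
        records[count] = {pm_addr, *pm_addr};
        persist(&records[count], sizeof(UndoRecord));
        count += 1;
        persist(&count, sizeof(count));
        *pm_addr = new_val;
        persist(pm_addr, sizeof(uint64_t));
    }

    // Truncating the log is the durability point: afterwards the update
    // can no longer be rolled back during recovery.
    void commit() {
        count = 0;
        persist(&count, sizeof(count));
    }
};
```

Because each thread writes only to its own log, the logging fast path needs no synchronization, one reason per-thread logs suit the bandwidth characteristics of PM.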
In the second domain, I investigate SpGEMM, a core kernel in scientific computing and data analytics. SpGEMM performance is often bottlenecked by irregular memory access patterns, particularly when accessing the second input matrix. To improve data reuse and cache efficiency, this research introduces a cluster-wise SpGEMM algorithm [6] that groups structurally similar rows of the first input matrix (A) and processes them together, yielding more cache-efficient access to the second matrix (B). The approach is supported by a new row-clustered CSR format (CSR_Cluster), which enhances spatial locality and reduces memory traffic. Experimental evaluations demonstrate consistent performance improvements, with an average speedup of 1.39× over conventional approaches.
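The sketch below illustrates the Gustavson-style row-wise kernel such a cluster-wise algorithm can build on, assuming the rows of A have already been grouped into clusters of structurally similar rows; cluster construction and the CSR_Cluster layout itself are elided, and all names are illustrative.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct CSRMatrix {
    std::vector<int64_t> row_ptr;  // size rows + 1
    std::vector<int64_t> col_idx;  // column index per nonzero
    std::vector<double>  vals;     // value per nonzero
};

// Compute the rows of C = A * B listed in `cluster` (row indices of A).
// The caller sizes C to A's row count. Rows in one cluster touch
// overlapping sets of B's rows, so the B rows fetched for the first row
// tend to still be cache resident for the rest, which is exactly the
// reuse the cluster-wise algorithm aims to expose.
void spgemm_cluster(const CSRMatrix& A, const CSRMatrix& B, int64_t b_cols,
                    const std::vector<int64_t>& cluster,
                    std::vector<std::vector<std::pair<int64_t, double>>>& C) {
    std::vector<double>  acc(b_cols, 0.0);  // dense accumulator, reused per row
    std::vector<char>    flag(b_cols, 0);   // marks columns touched this row
    std::vector<int64_t> touched;
    for (int64_t i : cluster) {
        touched.clear();
        for (int64_t p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
            int64_t k = A.col_idx[p];       // nonzero A(i,k) selects row k of B
            double a_ik = A.vals[p];
            for (int64_t q = B.row_ptr[k]; q < B.row_ptr[k + 1]; ++q) {
                int64_t j = B.col_idx[q];
                if (!flag[j]) { flag[j] = 1; touched.push_back(j); }
                acc[j] += a_ik * B.vals[q];
            }
        }
        for (int64_t j : touched) {         // gather row i of C, then reset
            C[i].emplace_back(j, acc[j]);
            acc[j] = 0.0;
            flag[j] = 0;
        }
    }
}
```

The clustering objective is then to group rows whose combined column patterns keep the working set of B within cache, while the sparsely reset dense accumulator keeps per-row cost proportional to the actual work in that row.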
Together, the contributions presented in this dissertation demonstrate that substantial gains in sparse data processing can be achieved through the co-design of algorithms, data structures, and the underlying memory architecture. The proposed techniques offer practical pathways for leveraging emerging memory technologies and provide insights that extend to a broad class of HPC applications involving large, irregular, and dynamically evolving sparse data.