Abstract

The Simple Storage Service (S3) protocol has become the de facto standard for large-scale data storage. With the widespread adoption of cloud services, the S3 protocol, initially developed by Amazon, has been quickly adopted by all major vendors with a common set of base functionalities. S3 file operations are performed using a Representational State Transfer (REST) Application Programming Interface (API). This thesis presents the challenges associated with copying large amounts of data across S3 clusters (both on-premise and in the cloud) using native tools such as mc (MinIO Client), rclone, and s3cmd, and proposes the design of an in-memory metadata cache to accelerate S3 operations. The metadata cache first builds the current state of the bucket and persists the operations to disk using PostgreSQL, and then uses S3 bucket notifications to build an incremental view of the changes caused by file operations on the bucket. This solution eliminates the need to rescan the entire contents of the source S3 bucket to determine file changes, which is the current standard in replication tools such as rclone. The cache has been developed in Go and tested on an 8-core Turing Pi System On Chip (SoC) module, and its performance impact has been measured. Performance evaluations show that metadata retrieval time drops from 4 hours with the standard method of listing objects to 6 minutes, making this approach a practical enhancement for on-premise S3 storage solutions.
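
The two-phase approach described in the abstract (seed the cache from one full listing, then apply bucket notifications incrementally) can be illustrated with a minimal Go sketch. This is not the thesis implementation: the `Cache`, `ObjectMeta`, and `Event` types and their methods are hypothetical names introduced here, the event is a simplified stand-in for a real S3 notification record, and the PostgreSQL persistence step is omitted entirely.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// ObjectMeta holds the per-object fields a replication tool would compare.
// (Hypothetical type; the field selection is an assumption.)
type ObjectMeta struct {
	Size         int64
	ETag         string
	LastModified time.Time
}

// Event is a simplified, hypothetical stand-in for one bucket notification
// record. Name follows the S3 event type naming, e.g. "s3:ObjectCreated:Put".
type Event struct {
	Name string
	Key  string
	Meta ObjectMeta
}

// Cache is an in-memory view of a bucket's metadata, keyed by object key.
type Cache struct {
	mu      sync.RWMutex
	objects map[string]ObjectMeta
}

// NewCache returns an empty cache.
func NewCache() *Cache {
	return &Cache{objects: make(map[string]ObjectMeta)}
}

// Seed replaces the cache contents with the result of one full bucket listing.
func (c *Cache) Seed(listing map[string]ObjectMeta) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.objects = make(map[string]ObjectMeta, len(listing))
	for k, v := range listing {
		c.objects[k] = v
	}
}

// Apply updates the cache from a single notification, so no rescan is needed.
func (c *Cache) Apply(ev Event) error {
	// Some servers prefix event names with "s3:", others do not; accept both.
	name := strings.TrimPrefix(ev.Name, "s3:")
	c.mu.Lock()
	defer c.mu.Unlock()
	switch {
	case strings.HasPrefix(name, "ObjectCreated:"):
		c.objects[ev.Key] = ev.Meta
	case strings.HasPrefix(name, "ObjectRemoved:"):
		delete(c.objects, ev.Key)
	default:
		return fmt.Errorf("unhandled event type %q", ev.Name)
	}
	return nil
}

// Get returns the cached metadata for key, if present.
func (c *Cache) Get(key string) (ObjectMeta, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	m, ok := c.objects[key]
	return m, ok
}

// Len returns the number of objects currently tracked.
func (c *Cache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.objects)
}

func main() {
	c := NewCache()
	// Phase 1: seed from a full listing (done once).
	c.Seed(map[string]ObjectMeta{
		"a.txt": {Size: 10, ETag: "e1"},
		"b.txt": {Size: 20, ETag: "e2"},
	})
	// Phase 2: apply incremental changes as notifications arrive.
	_ = c.Apply(Event{Name: "s3:ObjectCreated:Put", Key: "c.txt", Meta: ObjectMeta{Size: 30, ETag: "e3"}})
	_ = c.Apply(Event{Name: "s3:ObjectRemoved:Delete", Key: "a.txt"})
	fmt.Println("objects tracked:", c.Len())
}
```

In this sketch a replication tool would query the cache rather than list the bucket on every run. A real deployment would also need to handle missed or out-of-order notifications, which this example does not attempt.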

Details

1010268
Title
Designing an In-Memory Metadata Cache to Accelerate Object Storage Operations
Number of pages
104
Publication year
2025
Degree date
2025
School code
0010
Source
MAI 87/6(E), Masters Abstracts International
ISBN
9798265465979
Committee member
Crandall, Jedidiah; Zhao, Ming
University/institution
Arizona State University
Department
Computer Engineering
University location
United States -- Arizona
Degree
M.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32284978
ProQuest document ID
3279126541
Document URL
https://www.proquest.com/dissertations-theses/designing-memory-metadata-cache-accelerate-object/docview/3279126541/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic