Abstract

In today’s big data era, applications across diverse domains increasingly involve heterogeneous data represented in different models and stored in various specialized engines. The traditional “one size fits all” approach no longer suffices. Systems that natively support multiple data models and engines and can execute cross-model, cross-engine queries offer distinct advantages: (1) each data model and its corresponding engine is optimized for specific types of queries, enabling the system to leverage their respective strengths; and (2) data can remain in its original form and location, avoiding transformations that are costly or infeasible because of large data volumes or third-party data governance.

This dissertation first presents AWESOME, an analytical tri-store system that supports three native data models along with their associated analytical libraries. A learned cost model is employed to select the optimal execution platform for each fine-grained physical operator. Extensive experiments across three analytical workloads show that AWESOME significantly outperforms single-model relational implementations, Python-based analytics, and a prior state-of-the-art analytical system.

To further optimize an important class of cross-model queries, graph-relational joins, this dissertation then introduces MICRO, a lightweight middleware for optimizing these cross-store queries. To train and evaluate MICRO’s learned optimizer, we construct benchmark datasets and use large language models to generate diverse and realistic workloads. Experimental results demonstrate that MICRO achieves up to a 2× speedup over leading federated relational systems, and its learned optimizer consistently outperforms heuristic and rule-based approaches across all evaluation metrics.

This dissertation summarizes these two systems and addresses key challenges in heterogeneous data management and cross-model query optimization.

Details

Title
Empowering Scalable Heterogeneous Data Analysis
Number of pages
146
Publication year
2025
Degree date
2025
School code
0033
Source
DAI-B 87/3(E), Dissertation Abstracts International
ISBN
9798293884643
Committee member
Deutsch, Alin; Roberts, Margaret
University/institution
University of California, San Diego
Department
Computer Science and Engineering
University location
United States -- California
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32237727
ProQuest document ID
3254335118
Document URL
https://www.proquest.com/dissertations-theses/empowering-scalable-heterogeneous-data-analysis/docview/3254335118/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic