Abstract

SQL extended with machine learning (ML) predicates, commonly referred to as SQL+ML, integrates ML capabilities into traditional SQL processing in databases. To process SQL+ML queries, some methods move data from database (DB) systems to ML systems. Such methods are costly because they maintain two copies of the data, and the data movement also poses security risks. In-database SQL+ML processing addresses these limitations. However, conventional DB optimizers treat ML predicates as user-defined functions (UDFs) and cannot optimize them using the query rewriter and cost models. To boost the efficiency of in-database SQL+ML processing, this paper proposes generating SQL predicates from ML predicates and adding them to SQL+ML queries, which can prune a significant number of irrelevant tuples and thus improve performance. Optimizing SQL+ML queries in this way presents three challenges: (C1) how to generate valid SQL predicates, (C2) how to select high-quality SQL predicates, and (C3) how to optimize the query using SQL predicates. To address these challenges, we propose Smart, which integrates three novel modules into the database optimizer: (1) inference rewrite, which generates tight and valid SQL predicates for logical optimization; (2) progressive inference, which selects SQL predicates with high pruning power but low overhead to prune irrelevant tuples; and (3) cost-optimal inference, which optimizes the cost of the query plan with the selected SQL predicates for physical optimization. We implemented Smart in PostgreSQL and evaluated it on four widely used benchmarks: JOB, TPC-H, SSB, and Flight. Experimental results show that Smart runs up to three orders of magnitude faster than state-of-the-art baselines.
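
The core idea of deriving a SQL predicate from an ML predicate can be illustrated with a minimal sketch. This is not the Smart algorithm described in the paper; it assumes a hypothetical hand-written decision tree (`TREE`) over two made-up columns, `price` and `rating`, and simply collects the range conditions along every root-to-leaf path that predicts the requested label. For a decision tree the resulting predicate is exactly equivalent to the model's output, so it is trivially valid; other model types would presumably need looser bounds.

```python
# Hypothetical toy model (an assumption for illustration, not from the paper).
# Internal node: (column, threshold, left, right); go left when column <= threshold.
# Leaf: the predicted label as a string.
TREE = ("price", 100.0,
        ("rating", 3.5, "low", "high"),
        "high")


def paths_to_label(node, label, conds=()):
    """Yield the SQL conditions along each root-to-leaf path that predicts `label`."""
    if isinstance(node, str):  # leaf
        if node == label:
            yield list(conds)
        return
    col, thr, left, right = node
    yield from paths_to_label(left, label, conds + (f"{col} <= {thr}",))
    yield from paths_to_label(right, label, conds + (f"{col} > {thr}",))


def sql_predicate(tree, label):
    """Build a plain SQL predicate that holds exactly when the tree predicts `label`."""
    disjuncts = [" AND ".join(p) if p else "TRUE"
                 for p in paths_to_label(tree, label)]
    if not disjuncts:
        return "FALSE"  # no leaf predicts this label
    return " OR ".join(f"({d})" for d in disjuncts)


def predict(tree, row):
    """Evaluate the tree on one row (a dict of column values), as a UDF would."""
    node = tree
    while not isinstance(node, str):
        col, thr, left, right = node
        node = left if row[col] <= thr else right
    return node


print(sql_predicate(TREE, "high"))
```

A predicate generated this way could be wrapped in parentheses and added to the `WHERE` clause with `AND`, next to the original ML predicate, so that an ordinary optimizer can use indexes and statistics on `price` and `rating` to discard tuples before the model is invoked.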

Details

Title
In-database query optimization on SQL with ML predicates
Author
Guo, Yunyan 1; Li, Guoliang 1; Hu, Ruilin 1; Wang, Yong 1

1 Tsinghua University, Beijing, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178)
Publication title
Volume
34
Issue
1
Pages
12
Publication year
2025
Publication date
Jan 2025
Publisher
Springer Nature B.V.
Place of publication
New York
Country of publication
Netherlands
Publication subject
ISSN
10668888
e-ISSN
0949877X
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2024-12-23
Milestone dates
2024-03-08 (Received); 2024-10-21 (Revision received); 2024-11-25 (Accepted); 2024-11-26 (Registration)
First posting date
23 Dec 2024
ProQuest document ID
3256782936
Document URL
https://www.proquest.com/scholarly-journals/database-query-optimization-on-sql-with-ml/docview/3256782936/se-2?accountid=208611
Copyright
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
Last updated
2025-10-03
Database
ProQuest One Academic