Abstract

Recent advances in large language models (LLMs) have driven major breakthroughs in Text-to-SQL tasks. However, several challenges still hinder the use of SQL parsers for cross-language tasks. In this article, we introduce FGCSQL, a novel three-stage pipeline framework that addresses three of them: cross-language schema linking, realizing the SQL parsing potential of LLMs, and error propagation in SQL parsers. First, a filtering encoder eliminates irrelevant database schema items. Second, a pre-trained generative large language model, fine-tuned on a carefully structured dataset, performs the SQL parsing. Finally, a correcting decoder addresses error propagation, yielding a robust system for semantic parsing tasks. Tested on the CSpider dataset, FGCSQL shows a substantial improvement in exact-set-match (EM) accuracy and execution accuracy (EX), which supports the effectiveness of the pipeline architecture in mitigating the challenges typically encountered in Text-to-SQL conversion, especially in cross-lingual contexts. FGCSQL outperforms existing methods in execution accuracy, indicating the validity of our proposed method.
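
The three-stage flow described in the abstract (filter schema items, generate SQL, correct errors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword-overlap filter, the `generate` and `correct` callables, the `aliases` mapping, and the execution-based trigger for correction are all hypothetical stand-ins for the learned filtering encoder, the fine-tuned LLM, and the correcting decoder. The `execution_match` helper shows the usual idea behind execution accuracy (EX), assuming a SQLite database.

```python
import sqlite3
from typing import Callable

Schema = dict[str, list[str]]  # table name -> column names


def filter_schema(question: str, schema: Schema, aliases: dict[str, str]) -> Schema:
    """Stage 1 stand-in: keep schema items whose (hypothetical) alias occurs in the question."""
    kept: Schema = {}
    for table, columns in schema.items():
        hit_columns = [c for c in columns if aliases.get(c, c) in question]
        if hit_columns or aliases.get(table, table) in question:
            # If only the table name matched, keep all of its columns.
            kept[table] = hit_columns or columns
    return kept


def run_pipeline(
    question: str,
    schema: Schema,
    aliases: dict[str, str],
    generate: Callable[[str, Schema], str],      # Stage 2 stand-in (e.g. an LLM call)
    correct: Callable[[str, str, Schema], str],  # Stage 3 stand-in (a repair step)
    conn: sqlite3.Connection,
) -> str:
    """Filter the schema, generate SQL, and repair it if execution fails."""
    linked = filter_schema(question, schema, aliases)
    sql = generate(question, linked)
    try:
        conn.execute(sql).fetchall()
    except sqlite3.Error as err:
        # Assumption: correction is triggered by an execution error.
        sql = correct(sql, str(err), linked)
    return sql


def execution_match(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """Return True when both queries yield the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold).fetchall()
    return sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))
```

Passing the stages in as callables keeps the sketch independent of any particular model or prompt format; a real system would replace each stand-in with its trained component.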

Details

1009240
Title
FGCSQL: A Three-Stage Pipeline for Large Language Model-Driven Chinese Text-to-SQL
Author
Jiang, Guanyu 1 ; Li, Weibin 2 ; Yu, Chenglong 2 ; Zhu, Zixuan 1 ; Li, Wei 1

1 Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China; [email protected] (G.J.); [email protected] (Z.Z.); [email protected] (W.L.)
2 School of Artificial Intelligence, Xidian University, Xi’an 710071, China; [email protected]
Publication title
Volume
14
Issue
6
First page
1214
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-03-19
Milestone dates
2025-02-10 (Received); 2025-03-18 (Accepted)
   First posting date
19 Mar 2025
ProQuest document ID
3181456109
Document URL
https://www.proquest.com/scholarly-journals/fgcsql-three-stage-pipeline-large-language-model/docview/3181456109/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-03-27
Database
ProQuest One Academic