Abstract
Intelligent question answering over industrial databases is a challenging task due to the multicolumn context and complex questions. The existing methods need to be improved in terms of SQL generation accuracy. In this paper, we propose a question-aware few-shot Text-to-SQL approach based on the SDCUP pretrained model. Specifically, an attention-based filtering approach is proposed to reduce the redundant information from multiple columns in the industrial database scenario. We further propose an operator semantics enhancement method to improve the ability of identifying complex conditions in queries. Experimental results on the industrial benchmarks in the fields of electric energy and structural inspection show that the proposed model outperforms the baseline models across all few-shot settings.
1. Introduction
With the development of intelligent devices and relational database technologies, many information management systems have been deployed in engineering scenarios [1, 2]. For example, in the field of electric energy, industrial databases store domain-specific data on electricity mechanisms, electricity sales, and prices [3]. To manage the technical condition of transportation infrastructure, structural health monitoring and inspection management systems are widely deployed, and massive structural response and defect data are collected in their databases [4]. Research on natural language processing for engineering scenarios has received much attention. Bhardwaj et al. [5] presented a custom word embedding approach for maintenance records. Zhu et al. [6] introduced a collaborative pretrained language model (PLM) for AIoT question answering (QA). Making full use of industrial data and achieving convenient human–machine data interaction have therefore become important tasks.
In natural language processing research, QA over databases has been a challenging task for many years [7]. This tabular QA task requires the models to understand the natural language questions and then retrieve relevant information from database tables. This process is often formulated as a semantic parsing–based Text-to-SQL task, which acts as a bridge converting user queries into SQL instructions. For this intelligent QA task, Xu et al. [8] proposed AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Deng et al. [9] presented a novel weakly supervised structure-grounded pretraining framework for Text-to-SQL that can effectively learn to capture text-table alignment based on parallel text-table corpus.
In recent years, the emerging PLMs have greatly improved the performance of QA systems. In particular, generative large language models (LLMs), such as ChatGPT [10], LLaMA [11], and BaiChuan [12], have made significant advancements in text understanding and generation. However, training LLMs from scratch requires enormous computational resources and massive data collection. For specific domains, general LLMs also lack sufficient domain knowledge, so they are not accurate enough to handle complex domain-specific tasks and struggle with temporal questions [13]. Moreover, from the perspective of practical applications, most industrial users are not allowed to upload data to these commercial LLMs due to data privacy [14]. Hence, deploying few-shot QA models under low-resource conditions has become an important demand in industrial scenarios.
To date, several Text-to-SQL benchmarks, e.g., WikiSQL [15], Spider [16], and SParC [17], have been released, and many Text-to-SQL neural models have been proposed as well. For example, Li et al. [18] presented a fuzzy semantic to structured query language neural approach for the complex Text-to-SQL task. In addition, based on the emerging "pretraining and fine-tuning" paradigm, several few-shot Text-to-SQL approaches have been proposed, which can be divided into transfer learning–based methods and meta-learning–based methods. Most of these approaches are designed for the tabular QA task in general domains. The difference is that industrial databases often store massive sensing data and management information, so the number of data columns is much larger and the conditions in the WHERE clause are more complex [19]. These specific industrial scenarios pose more challenges to Text-to-SQL models: how to effectively select appropriate candidate units from many columns and how to generate WHERE conditional statements accurately are the key issues that QA systems for industrial databases must address.
In this paper, we propose a novel question-aware few-shot Text-to-SQL approach for industrial databases. Building on top of the pretrained SDCUP model [20], our model uses a self-attention-based module to select the data columns relevant to the question, so as to significantly reduce the encoding space. Then, an operator semantics enhancement method is used to improve the prediction of complex WHERE conditions. The main contributions of this paper are as follows:
1. We propose an attention-based encoding manner to filter columns according to the semantics of questions. The proposed approach effectively reduces redundant information in multiple columns in industrial database scenarios.
2. A novel operator semantics enhancement approach is proposed to improve the capability of recognizing the complex WHERE conditions for the Text-to-SQL task over industrial databases.
3. Experimental results on the industrial benchmarks in the fields of electric energy and structural inspection show that our model outperforms the baseline models in all few-shot settings and achieves the state-of-the-art (SOTA) results.
2. Related Work
Generally, semantic parsing–based tabular QA systems employ a Text-to-SQL phase to convert natural language questions into SQL queries. The generated queries are then fed into the SQL engines of databases to perform data retrieval and computational tasks. Therefore, the quality of the SQL queries directly affects the accuracy of the results. In recent years, many PLM-based Text-to-SQL methods have been presented. Lin et al. [21] proposed a BERT-based sequential architecture for modeling questions and relational databases in cross-DB semantic parsing. Wei et al. [22] utilized multitask learning and reweighted loss strategies to improve the performance of Text-to-SQL. Nevertheless, these methods require large-scale labeled data for fine-tuning. In industrial scenarios, annotating massive data is time-consuming and relies on the participation of domain experts [23]. Hence, Text-to-SQL solutions for industrial databases need to be investigated in few-shot settings.
To date, several few-shot Text-to-SQL approaches have been proposed. Some studies [24, 25] applied prompt engineering to improve the SQL generation performance of LLMs. Yang et al. [26] presented a few-shot semantic parsing approach termed SEQZERO, which decomposes the question into a sequence of subquestions corresponding to the subclauses of the formal language; the language models can then generate short answers using prompts for predicting subclauses. Gu et al. [27] proposed a divide-and-conquer framework for few-shot Text-to-SQL translation: the first stage uses a PLM to generate an SQL structure with placeholders, and the content stage then guides a PLM to populate the placeholders with concrete values. Shin et al. [28] proposed a language model-based semantic parsing method that transforms questions into several controlled sublanguages via dynamic prompts or fine-tuning and constrains the decoding to ensure the legitimacy of the results. These prompt-based methods can effectively improve SQL generation in few-shot settings. However, the design of prompt templates requires domain knowledge, and the effect is affected by data distribution [29]. Chang et al. [30] presented a zero-shot Text-to-SQL approach with an auxiliary task; however, this method requires additional BIO annotation, limiting its applicability to specific domains. GAZP [31] adapts the semantic parser to new domains by synthesizing cycle-consistent data. Nevertheless, most existing approaches use a row-encoding solution, which is limited by the encoding length in Text-to-SQL tasks over the wide tables of industrial scenarios [32]. Consequently, the performance of few-shot Text-to-SQL in the wide-table context still needs to be improved.
On the other hand, meta-learning technologies can adapt to new tasks and data distributions by learning how to learn, improving the generalization and adaptability of the model; this is another direction of few-shot Text-to-SQL [33]. MC-SQL [3] trains the model via content-enhanced meta-learning and employs a two-stage gradient updating strategy to learn general knowledge, so as to improve generalization in the zero-shot setting. MST-SQL [34] is a few-shot Text-to-SQL method based on the column-wise HydraNet; it adopts self-training to learn from readily available unlabeled data. However, MST-SQL uses a column-based encoding manner [35], which suffers from data redundancy and large memory consumption.
The rapid development of LLMs has injected new vitality into the Text-to-SQL field. To improve the performance of LLMs in this task, Zhang et al. [36] proposed SQLfuse, a robust system that integrates open-source LLMs with a suite of tools to enhance the accuracy and usability of Text-to-SQL translation. SQLfuse features four modules: schema mining, schema linking, SQL generation, and a SQL critic module, allowing for the continuous enhancement of SQL query quality. Additionally, Gao et al. [37] presented XiYan-SQL, an innovative framework that employs a multigenerator ensemble strategy to improve candidate generation. XiYan-SQL leverages the significant potential of in-context learning and supervised fine-tuning to enhance the quality of generated SQL queries, achieving excellent performance on multiple benchmarks.
In summary, many few-shot Text-to-SQL efforts have been proposed for general domains, but few solutions target the wide tables of industrial scenarios [38]. Current methods still face many challenges in terms of context-adaptive training strategies and generalization ability. Different from existing methods, our approach filters the table columns to improve the fine-tuning efficiency of PLMs, and operator semantic features are further integrated to improve the performance of few-shot Text-to-SQL in industrial scenarios.
3. Methodology
3.1. Task Formulation
Formally, the Text-to-SQL task takes a natural language question Q and a table schema S = {c1, c2, …, cn} as inputs, where ci denotes the i-th column of the table, and generates the corresponding SQL query Y. For example, given the question "How many bridges are there in Xixia" over the table cbms_bridge_all, the expected output is SELECT COUNT('bridge_name') FROM cbms_bridge_all WHERE 'loc' = "Xixia".
More specifically, the few-shot Text-to-SQL task stipulates that only a limited number of labeled examples can be used to train the model. Following the research work [34], four few-shot settings are adopted, in which only 5, 10, 15, or 20 labeled examples are available for training.
3.2. Task Characteristics in Industrial Scenarios
Significantly different from general application systems, the data tables in industrial systems usually contain more columns. Meanwhile, the questions in industrial scenarios contain more constraints, so the conditions and functions in the WHERE clause are more complex. In order to quantitatively analyze the characteristics of the Text-to-SQL task in industrial scenarios, we compare WikiSQL from the general domain with two industrial datasets, ESQL and SMI-SQL. The ESQL dataset targets the field of electric energy, and SMI-SQL is a Text-to-SQL dataset for structural monitoring and defect inspection.
From the perspective of data context, the datasets from the industrial scenarios involve a larger number of columns, which poses more challenges for the ability to understand the schema. Table 1 shows the number of table columns in the testing sets of the three datasets.
Table 1
The statistical results of the number of table columns.
| Datasets | 1∼5 columns | 6∼10 columns | 11∼15 columns | 16∼20 columns | > 20 columns |
| WikiSQL | 611 (40.73%) | 860 (57.33%) | 29 (1.93%) | 0 (0%) | 0 (0%) |
| ESQL | 0 (0%) | 0 (0%) | 0 (0%) | 300 (20%) | 1200 (80%) |
| SMI-SQL | 14 (0.93%) | 320 (21.33%) | 342 (22.80%) | 547 (36.47%) | 277 (18.47%) |
As shown in Table 1, 40.73% of the examples in WikiSQL involve tables with 1–5 columns, and only 1.93% involve tables with 11–15 columns. In contrast, the tables in industrial databases contain far more columns. For example, in the ESQL dataset, all example tables contain more than 16 columns, and 80% of the tables have more than 20 columns. Only 0.93% of the tables in the SMI-SQL dataset have 5 or fewer columns. Therefore, industrial databases are characterized by wide tables.
From the perspective of SQL queries, QA systems in industrial scenarios need to deal with more complex conditions and aggregation functions. Tables 2 and 3 present the statistical results of conditions in WHERE clauses and functions in SQL queries, respectively. In WikiSQL, 67.87% of the examples contain only one WHERE condition. In the ESQL and SMI-SQL datasets, about 40% of the examples contain two conditions in the WHERE clause, and 274 SMI-SQL examples (18.27%) contain three conditions. On the other hand, 69.53% of the examples in WikiSQL do not contain any arithmetic functions, whereas in industrial scenarios more of the target SQL queries involve Count(), Sum(), Avg(), and other functions. For example, as shown in Table 3, 20.67% of the SMI-SQL examples contain Count(). This poses significant challenges to the model's capability of question understanding and function identification.
Table 2
The statistical results of the conditions in WHERE clauses.
| Datasets | 0 | 1 | 2 | 3 | 4 |
| WikiSQL | 9 (0.60%) | 1018 (67.87%) | 385 (25.76%) | 74 (4.93%) | 14 (0.93%) |
| ESQL | 297 (19.80%) | 532 (35.47%) | 600 (40.00%) | 69 (4.60%) | 2 (0.13%) |
| SMI-SQL | 76 (5.06%) | 540 (36.00%) | 604 (40.27%) | 274 (18.27%) | 6 (0.40%) |
Table 3
The statistical results of the functions in SQL queries.
| Datasets | Null | Max() or Min() | Count() | Sum() | Avg() |
| WikiSQL | 1043 (69.53%) | 196 (13.07%) | 136 (9.07%) | 65 (4.33%) | 60 (4.00%) |
| ESQL | 823 (54.87%) | 99 (6.60%) | 362 (24.13%) | 108 (7.20%) | 108 (7.20%) |
| SMI-SQL | 993 (66.20%) | 88 (5.86%) | 310 (20.67%) | 68 (4.53%) | 41 (2.73%) |
Based on the above analysis, we argue that handling wide tables with many columns and parsing complex questions are the keys to achieving efficient Text-to-SQL for industrial databases. Therefore, we propose a novel approach suited to these task characteristics to improve performance in the few-shot settings.
3.3. Overview of the Pretrained SDCUP Model
SDCUP is the first Chinese table pretraining model, proposed by Alibaba DAMO Academy, and is accompanied by two novel pretraining objectives: schema dependency prediction (SDP) and entity perturbation recovery (EPR). SDP integrates an inductive bias for modeling the interaction between questions and structured tables into the pretraining process, and EPR provides question-aware representations with better generalizability and adaptability.
SDCUP significantly improves the accuracy of table QA by directly predicting the keyword mapping between natural language and structured tables. Specifically, SDCUP draws on the semantic dependency analysis mechanism to model the schema dependency task: it uses a fully connected network to obtain the semantic representation of each node as a parent node or a child node and then uses a biaffine network to predict the probability that each edge exists and the probability of its relationship type. At the pretraining stage, SDCUP uses a schema-aware curriculum learning mechanism that imitates human learning and exploits the valuable information in a noisy corpus to improve model performance.
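To make the edge-scoring step concrete, the following is a minimal sketch of a biaffine scorer of the kind used for such dependency prediction; the dimensionality and the bias handling are assumptions, not the published SDCUP implementation.

```python
import torch
import torch.nn as nn

class BiaffineScorer(nn.Module):
    """Sketch of a biaffine edge scorer for schema dependency prediction (details assumed)."""
    def __init__(self, dim):
        super().__init__()
        # +1 accounts for a bias feature appended to both sides
        self.weight = nn.Parameter(torch.randn(dim + 1, dim + 1))

    def forward(self, parents, children):
        # parents, children: (n, dim) node representations from the FC projections
        ones = torch.ones(parents.size(0), 1)
        p = torch.cat([parents, ones], dim=-1)   # (n, dim + 1)
        c = torch.cat([children, ones], dim=-1)  # (n, dim + 1)
        return p @ self.weight @ c.T             # (n, n) edge existence scores
```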
3.4. Architecture
Instead of training from scratch, we take the pretrained table semantic parsing model SDCUP as the backbone. Figure 1 shows the overall architecture of our model.
[Figure 1 omitted: the overall architecture of the proposed model.]
Overall, the proposed model consists of an operator semantics enhancement (OSE) module, an attention-based column filter (ACF) module, and the pretrained SDCUP model. Following the task definition of Text-to-SQL, the question Q and the table schema S serve as the inputs of the model.
Moreover, we heuristically introduce a predefined operator set containing the comparison operators that may appear in WHERE conditions (e.g., =, !=, >, and <). The OSE module uses this set to locate and enhance the operator spans in the question.
On the other hand, the ACF module takes the question Q and the column names in the schema S as inputs and selects the subset of candidate columns relevant to the question.
After that, the concatenation of the enhanced question representation and the filtered columns is fed into the pretrained SDCUP model to generate the final SQL query.
In the following sections, we elaborate on the algorithms of the two modules: ACF and OSE, respectively.
3.5. ACF
Intuitively, the main purpose of this module is to select candidate columns according to the intention of the input question, so as to reduce redundant information from the many columns in the wide-table scenario. The module is composed of a BERT encoder, a long short-term memory (LSTM) component, and a self-attention component.
Formally, in order to adapt to the input format of BERT, we serialize the question and the column names into a single token sequence delimited by the special tokens [CLS] and [SEP]. We use the pretrained BERT-large model as the encoder to obtain the contextual representation of each token.
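The following sketch illustrates this serialization and encoding step; the checkpoint name and the exact [SEP]-delimited layout are assumptions made for illustration.

```python
import torch
from transformers import BertModel, BertTokenizer

# The checkpoint name below is a placeholder for the Chinese BERT-large encoder.
tokenizer = BertTokenizer.from_pretrained("bert-large-chinese")  # hypothetical name
encoder = BertModel.from_pretrained("bert-large-chinese")

question = "How many bridges are there in Xixia"
columns = ["bridge_name", "loc", "bridge_size"]

# Serialize as: [CLS] question [SEP] col_1 [SEP] col_2 [SEP] ...
tokens = ["[CLS]"] + tokenizer.tokenize(question) + ["[SEP]"]
col_spans = []
for col in columns:
    start = len(tokens)
    tokens += tokenizer.tokenize(col) + ["[SEP]"]
    col_spans.append((start, len(tokens) - 1))  # token span covering each column name

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
hidden = encoder(input_ids).last_hidden_state   # (1, seq_len, hidden_size)
```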
Furthermore, we employ the LSTM component to extract long-range context features from the encoder outputs.
By using the index of each token span, we separate the features of the question tokens from the features of each column. We then calculate the mean values of these token-level features to obtain a single vector representation for the question and for each column.
Next, an attention-based calculation scores each column representation against the question representation. Finally, if the relevance score of a column exceeds a threshold, the column is retained as a candidate; otherwise, it is filtered out.
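Putting these pieces together, the following is a minimal sketch of the ACF scoring head; the scaled dot-product form of the attention and the 0.5 threshold are assumptions, since the paper specifies only the component types and the number of LSTM layers.

```python
import torch
import torch.nn as nn

class ACFHead(nn.Module):
    """Sketch of the attention-based column filter head (hyperparameters assumed)."""
    def __init__(self, hidden_size=1024, lstm_size=512):
        super().__init__()
        # two-layer BiLSTM over the BERT outputs, per the experiment setup
        self.lstm = nn.LSTM(hidden_size, lstm_size, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(0.2)

    def forward(self, bert_hidden, question_span, col_spans):
        ctx, _ = self.lstm(self.dropout(bert_hidden))        # (1, L, 2 * lstm_size)
        q_vec = ctx[0, question_span[0]:question_span[1]].mean(dim=0)
        col_vecs = torch.stack([ctx[0, s:e].mean(dim=0) for s, e in col_spans])
        # scaled dot-product attention between the question vector and each column
        return col_vecs @ q_vec / (q_vec.size(-1) ** 0.5)    # one logit per column

# Columns whose sigmoid score exceeds the (assumed) 0.5 threshold are kept:
# keep_mask = torch.sigmoid(logits) > 0.5
```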
3.6. The OSE Module
The identification of complex operators in questions is the key to efficient Text-to-SQL. In particular, in the industrial database scenario, the WHERE clause contains more complex functions. Therefore, we propose an OSE method to improve the parsing ability.
Besides the question, the input of the OSE module also includes the predefined operator set. Formally, taking the serialized sequence as input, the encoder produces a feature vector for each token, from which the features of the operator spans are extracted.
In order to enhance the semantics of each operator span, we add its feature to itself, doubling its magnitude relative to the surrounding tokens. We then replace the original features of the operator span with the enhanced features before SQL prediction.
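A minimal sketch of this enhancement follows, taking the description literally (each operator span feature is added to itself before replacing the original feature); locating the spans by string matching against the operator set is an assumption.

```python
import torch

def enhance_operator_spans(features: torch.Tensor, op_spans):
    """Double the encoder features inside each operator span (OSE, as described)."""
    # features: (seq_len, hidden) encoder outputs; op_spans: [(start, end), ...]
    enhanced = features.clone()
    for start, end in op_spans:
        enhanced[start:end] = features[start:end] + features[start:end]
    return enhanced
```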
3.7. Model Training
In terms of the training strategy for the ACF module, we set the label of each question-related column to 1 and the labels of irrelevant columns to 0. The binary cross-entropy loss BCEWithLogitsLoss, computed between the predicted column scores and these labels, is used to optimize the classifier. Because we utilize the pretrained SDCUP model as the SQL prediction module, we follow the training manner of SDCUP to perform few-shot training, and the SDCUP loss is optimized jointly with the ACF loss.
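A sketch of one joint training step is shown below, under the assumption that the two losses are summed with equal weight (the weighting is not stated in the paper).

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def training_step(column_logits, column_labels, loss_sdcup, optimizer):
    """One joint update over the ACF loss and the SDCUP loss (equal weighting assumed)."""
    # column_labels: 1.0 for question-related columns, 0.0 for irrelevant ones
    loss = bce(column_logits, column_labels) + loss_sdcup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```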
4. Experiments
Following recent few-shot Text-to-SQL research for industrial databases, we evaluate our model on the ESQL dataset from the field of electric energy. We also construct a few-shot Text-to-SQL dataset for the field of structural monitoring and defect inspection, termed SMI-SQL, to compare the performance with the baseline models.
4.1. Experiment Setup
We conduct the experiments on a server with an Intel i9-10900X CPU, 128 GB RAM, an NVIDIA RTX 3090 GPU, Ubuntu 20.04, and CUDA 11.2. We implement the proposed model based on PyTorch, with the open-source BERT_Chinese_Large serving as the encoder. In terms of hyperparameter settings, the learning rate of model fine-tuning is set to 1e − 5, the learning rate of SQL prediction is set to 1e − 3, and Adam is used as the optimizer. The number of LSTM layers in ACF is set to 2. To prevent overfitting, dropout with a rate of 0.2 is applied after the encoder. The batch size is set to 2 for both datasets.
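The two learning rates can be realized with Adam parameter groups, as in the following sketch; the module names encoder and sql_head are placeholders rather than names from the implementation.

```python
import torch

# 'encoder' and 'sql_head' are placeholder modules standing in for the
# fine-tuned encoder and the SQL prediction components, respectively.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-5},   # model fine-tuning
    {"params": sql_head.parameters(), "lr": 1e-3},  # SQL prediction
])
```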
Following the evaluation approach in MST-SQL, four few-shot settings, i.e., 5-shot, 10-shot, 15-shot, and 20-shot, are used. The logical form (LF) accuracy is adopted as the metric [34]: exact matching (EM) is evaluated between the predicted SQL and the gold answer, and LF accuracy is the proportion of predictions whose logical form exactly matches the gold SQL over all test examples.
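Concretely, LF accuracy under exact matching can be computed as in the following sketch; the whitespace and case normalization is an assumption about the evaluation script.

```python
def lf_accuracy(pred_sqls, gold_sqls):
    """Fraction of predicted SQL queries whose logical form exactly matches the gold SQL."""
    norm = lambda s: " ".join(s.lower().split())  # normalization step is assumed
    hits = sum(norm(p) == norm(g) for p, g in zip(pred_sqls, gold_sqls))
    return hits / len(gold_sqls)
```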
4.2. Comparative Experiments With Baseline Models
In order to evaluate the performance of the proposed method, we compare it with emerging few-shot Text-to-SQL models that have comparable numbers of parameters. These baseline models include SQLova [32], MC-SQL [3], HydraNet [35], and MST-SQL [34]. Among them, MST-SQL, which trains a content-enhanced model via meta-learning, is the previous SOTA model on the ESQL dataset. We use the term QaF-SQL to denote our model.
Tables 4 and 5 show the experimental results on the ESQL dataset and the SMI-SQL dataset, respectively. On both datasets, our model outperforms the baseline models in all few-shot settings.
Table 4
The experimental results on the ESQL benchmark.
| Models | 5-shot | 10-shot | 15-shot | 20-shot |
| SQLova | 22.3 | 39.6 | 52.0 | 54.7 |
| MC-SQL | 36.5 | 53.2 | 60.5 | 67.4 |
| HydraNet | 43.6 | 58.1 | 70.5 | 76.7 |
| MST-SQL | 55.3 | 67.4 | 76.7 | 80.5 |
| QaF-SQL | 58.0 | 78.5 | 83.0 | 88.8 |
Note: The bold values represent the state-of-the-art results under the current settings.
Table 5
The experimental results on the SMI-SQL benchmark.
| Models | 5-shot | 10-shot | 15-shot | 20-shot |
| SQLova | 37.7 | 53.9 | 68.6 | 77.3 |
| MC-SQL | 41.1 | 55.0 | 73.4 | 80.6 |
| HydraNet | 28.3 | 39.2 | 76.5 | 83.5 |
| MST-SQL | 33.0 | 37.3 | 82.3 | 88.2 |
| QaF-SQL | 57.5 | 79.4 | 86.5 | 88.7 |
Note: The bold values represent the state-of-the-art results under the current settings.
In the 5-shot setting, none of the models performs well on the two datasets. Our QaF-SQL achieves LF accuracies of 58.0% and 57.5% on ESQL and SMI-SQL, outperforming the baseline models by a large margin. For example, on the ESQL dataset, our model outperforms MST-SQL by 2.7% and SQLova by 35.7%. On the SMI-SQL dataset, the accuracy of QaF-SQL is 16.4% higher than that of the second-place MC-SQL model.
When the number of training examples is increased to 10, the accuracy improves significantly. For example, on the ESQL dataset, the accuracy of our QaF-SQL improves from 58.0% to 78.5%, which is better than the previous SOTA model MST-SQL by 11.1%.
In the 15-shot setting, QaF-SQL achieves better performance as well. As shown in Tables 4 and 5, the accuracy of QaF-SQL improves to 83.0% and 86.5%, which are better than the performance of MST-SQL by 6.3% and 4.2%, respectively.
When there are 20 labeled samples for model fine-tuning, the accuracy of our model reaches 88.8% and 88.7%, respectively, which is better than all the baseline models participating in the comparison.
In terms of computational efficiency, we analyzed the number of parameters of each model. We further measured the inference time and computational complexity required for a single sample. The experimental results are recorded in Table 6.
Table 6
Number of model parameters and inference efficiency of each model.
| Model | Params (M) | Inference time (ms) | GFLOPs |
| SQLova | 335 | 141 | 350 |
| MC-SQL | 103 | 70 | 92 |
| HydraNet | 355 | 87 | 343 |
| MST-SQL | 357 | 68 | 347 |
| QaF-SQL | 349 | 42 | 334 |
The experimental results show that our model has a similar number of parameters compared with MST-SQL, HydraNet, and the other models. However, our model is more computationally efficient: for example, its average inference time is 42 ms, much lower than that of the baseline models.
4.3. Comparative Experiments With LLMs
Since LLMs have achieved strong performance in many QA tasks, we further perform comparative experiments with an open-source LLM. We choose Vicuna-13B [39], a chatbot trained by fine-tuning LLaMA on 70K user-shared conversations; it reportedly achieves more than 90% of the quality of OpenAI ChatGPT while outperforming models like LLaMA and Stanford Alpaca in more than 90% of cases.
We comprehensively evaluate the performance on nine metrics, including the syntactic compliance of the generated SQL (Syn-SQL), the LF accuracy, the accuracy of the execution results (EX), and the performance on the six subtasks: SELECT column (SC), SELECT aggregation (SA), WHERE number (WN), WHERE column (WC), WHERE operator (WO), and WHERE value (WV).
As shown in Table 7, almost all statements generated by our QaF-SQL model conform to SQL syntax, whereas fewer than 60% of the SQL statements generated by Vicuna-13B conform to SQL syntax and can thus be executed by SQL engines. In addition, our model significantly outperforms Vicuna-13B in the LF and EX evaluations. For the six subtasks of Text-to-SQL, we use the EM values as the evaluation metric. As shown in Table 8, the general-purpose LLM struggles to adapt to Text-to-SQL: the performance of Vicuna-13B is much lower than that of our model on all six subtasks.
Table 7
The Syn-SQL, LF, and EX results of comparative experiments.
| Datasets | Models | Syn-SQL | LF | EX |
| ESQL | Vicuna-13B | 57.7 | 3.1 | 8.6 |
| | QaF-SQL | 99.9 | 58.0 | 60.2 |
| SMI-SQL | Vicuna-13B | 58.6 | 2.9 | 11.0 |
| | QaF-SQL | 99.9 | 57.5 | 59.3 |
Note: The bold values represent the state-of-the-art results under the current settings.
Table 8
The results of comparative experiments on six subtasks.
| Datasets | Models | SC | SA | WN | WC | WO | WV |
| ESQL | Vicuna-13B | 22.3 | 28.9 | 41.4 | 17.6 | 24.4 | 22.9 |
| | QaF-SQL | 98.3 | 84.7 | 82.7 | 77.7 | 76.9 | 62.1 |
| SMI-SQL | Vicuna-13B | 45.3 | 24.5 | 33.7 | 15.6 | 20.3 | 17.1 |
| | QaF-SQL | 97.8 | 96.9 | 76.9 | 73.4 | 76.5 | 65.3 |
Note: The bold values represent the state-of-the-art results under the current settings.
4.4. Ablation Study
To verify the performance impact of the key components, several ablation studies are performed. Table 9 shows the ablation results on ESQL and SMI-SQL across the four few-shot settings, and Table 10 shows the ablation results on the six subtasks of Text-to-SQL in the 5-shot setting. The term "w/o." denotes removing the corresponding module.
Table 9
The results of ablation study across four few-shot settings.
| Datasets | Models | 5-shot | 10-shot | 15-shot | 20-shot |
| ESQL | QaF-SQL | 58.0 | 78.5 | 83.0 | 88.8 |
| | w/o. OSE | 44.6 | 61.3 | 70.0 | 77.4 |
| | w/o. ACF | 36.8 | 58.7 | 63.4 | 76.9 |
| | w/o. OSE & ACF | 35.5 | 56.5 | 60.7 | 76.3 |
| SMI-SQL | QaF-SQL | 57.5 | 79.4 | 86.5 | 88.7 |
| | w/o. OSE | 53.5 | 75.5 | 86.2 | 86.9 |
| | w/o. ACF | 49.4 | 74.6 | 82.6 | 84.8 |
| | w/o. OSE & ACF | 49.1 | 73.2 | 78.6 | 81.2 |
Note: The bold values represent the state-of-the-art results under the current settings.
Table 10
The results of ablation study on six subtasks in 5-shot setting.
| Datasets | Models | SC | SA | WN | WC | WO | WV |
| ESQL | QaF-SQL | 98.3 | 84.7 | 82.7 | 77.7 | 76.9 | 62.1 |
| | w/o. OSE | 97.6 | 83.6 | 83.5 | 75.4 | 73.8 | 58.4 |
| | w/o. ACF | 90.5 | 78.3 | 83.5 | 70.0 | 74.1 | 60.4 |
| | w/o. OSE & ACF | 91.3 | 82.9 | 81.6 | 71.2 | 72.4 | 56.3 |
| SMI-SQL | QaF-SQL | 97.8 | 96.9 | 76.9 | 73.4 | 76.5 | 65.3 |
| | w/o. OSE | 97.5 | 93.7 | 72.9 | 68.8 | 72.5 | 62.4 |
| | w/o. ACF | 94.7 | 91.6 | 74.3 | 64.5 | 75.9 | 62.9 |
| | w/o. OSE & ACF | 90.4 | 92.2 | 71.7 | 66.1 | 71.4 | 58.6 |
Note: The bold values represent the state-of-the-art results under the current settings.
As expected, when model components are removed, the performance decreases. The ablation results also show that the ACF module has the greatest impact on model performance. For example, on the ESQL dataset, when the ACF module is removed, the performance decreases from 58.0% to 36.8% in the 5-shot setting and from 88.8% to 76.9% in the 20-shot setting. The ablation experiments on the six subtasks show similar results. The reason is that the ACF module effectively filters out irrelevant columns according to the input question, which reduces the search space of the answers and improves accuracy.
As shown in Tables 9 and 10, when we remove the OSE module, the performance decreases in all ablation settings. This also shows that incorporating operator semantics can help better understand the questions and improve performance.
When ACF and OSE modules are removed, the model can be regarded as performing SQL generation with the SDCUP pretrained model, so the performance is also significantly degraded.
Upon further analysis of Table 10, it is evident that the ACF module substantially enhances the precision of column name selection (SC) within the SELECT clause. This improvement is attributed to the module’s ability to filter out irrelevant columns, thereby enabling the model to concentrate on those columns that are more pertinent to the query, which in turn diminishes the likelihood of erroneous selections. Concurrently, the ACF module exerts a positive influence on the selection of aggregation functions (SA) and the column names within the WHERE clause (WC), underscoring the pivotal role of column name accuracy in the SQL query generation process.
The OSE module, on the other hand, primarily targets the refinement of operator selection, which subsequently bolsters the precision of conditional expressions and directly impacts the WHERE clause’s operator selection (WO). This enhancement also indirectly ameliorates the selection of column names (WC) and values (WV) in the WHERE clause. The rationale behind this improvement is that a more nuanced selection of operators facilitates the model’s comprehension of the logical conditions, thereby enabling a more accurate determination of column names and values.
4.5. Evaluation of Generalization on Nonindustrial Datasets
In order to verify the performance of the proposed approach in other domains, we further carried out several generalization experiments. We employed the CBANK dataset [40], which contains text–SQL pairs for the financial scenario. The training, validation, and test sets of CBANK contain 14,625, 1603, and 1530 samples, respectively. In order to evaluate performance in the few-shot settings, we reconstructed the training set with 5, 10, 15, and 20 shots. We use the same baseline models for comparative evaluation, and LF accuracy is again employed as the metric. The results of the generalization evaluation are shown in Table 11.
Table 11
The experimental results on the CBANK dataset for financial scenario.
| Models | 5-shot | 10-shot | 15-shot | 20-shot |
| SQLova | 33.6 | 47.8 | 56.4 | 67.5 |
| MC-SQL | 40.1 | 53.6 | 63.8 | 70.2 |
| HydraNet | 37.8 | 55.3 | 67.6 | 76.3 |
| MST-SQL | 48.4 | 60.3 | 74.2 | 79.5 |
| QaF-SQL | 55.7 | 69.2 | 76.4 | 81.6 |
Note: The bold values represent the state-of-the-art results under the current settings.
For the financial domain, our QaF-SQL outperforms other baseline models in all few-shot settings, achieving LF accuracy of 55.7%, 69.2%, 76.4%, and 81.6% in 5-shot, 10-shot, 15-shot, and 20-shot settings, respectively. The experimental results show that our proposed method can also be applied to other nonindustrial scenarios and has better Text-to-SQL performance than the mainstream models.
5. Case Analysis
The experimental results show that QaF-SQL achieves SOTA performance on the ESQL and SMI-SQL datasets in the few-shot settings. We selected several samples from the test sets as cases to analyze the strengths and deficiencies. Since the datasets ESQL and SMI-SQL are both in Chinese, we translate the cases to English. The original golden answers and model output can be found on our GitHub pages. Table 12 presents the results of case analysis, where the rows “Golden Ans.” show the golden answers.
Table 12
The results of case analysis on the ESQL and SMI-SQL datasets.
| Datasets | Models | Output SQL statement |
| ESQL | Question 1 | Which organization is located outside the west and has a sales rate of more than 87.27% |
| | Golden Ans. | SELECT 'dept' FROM table_1 WHERE 'loc' != "west" AND 'sale rate' > "87.27%" |
| | SDCUP | SELECT 'dept' FROM table_1 WHERE 'loc' = "west" AND 'sale rate' > "87.27%" |
| | Vicuna-13B | SELECT AVG('sale rate') FROM table_1 WHERE 'dept' != "west" AND 'sale rate' > "87.27%" |
| | QaF-SQL | SELECT 'dept' FROM table_1 WHERE 'loc' != "west" AND 'sale rate' > "87.27%" |
| | Question 2 | How many companies have a pre-tax net profit that is not equal to 4401.76 |
| | Golden Ans. | SELECT COUNT('company') FROM table_14 WHERE pre-tax net profit != '4401.76' |
| | SDCUP | SELECT company FROM table_14 WHERE pre-tax net profit != '4401.76' |
| | Vicuna-13B | SELECT COUNT('company') FROM table_14 WHERE net profit != '4401.76' |
| | QaF-SQL | SELECT COUNT('company') FROM table_14 WHERE pre-tax net profit != '4401.76' |
| SMI-SQL | Question 1 | How many bridges are there in Xixia |
| | Golden Ans. | SELECT COUNT('bridge_name') FROM cbms_bridge_all WHERE 'loc' = "Xixia" |
| | SDCUP | SELECT COUNT('bridge_name') FROM cbms_bridge_all WHERE 'loc' = "Xixia" AND 'bridge_size' = "Xixia" |
| | Vicuna-13B | SELECT COUNT( |
| | QaF-SQL | SELECT COUNT('bridge_name') FROM cbms_bridge_all WHERE 'loc' = "Xixia" |
| | Question 2 | Which bridges have a technical_level of 1? |
| | Golden Ans. | SELECT 'bridge_name' FROM cbms_evaluate_record WHERE technical_level = '1' |
| | SDCUP | SELECT COUNT('bridge_name') FROM cbms_evaluate_record WHERE technical_level = '1' |
| | Vicuna-13B | SELECT COUNT('bridge_name') FROM cbms_evaluate_record |
| | QaF-SQL | SELECT 'bridge_name' FROM cbms_evaluate_record WHERE technical_level = '1' |
In the case analysis on the ESQL dataset, the input question is "Which organization is located outside the west and has a sales rate of more than 87.27%". As shown in the first part of Table 12, QaF-SQL generates an SQL statement consistent with the gold answer, whereas SDCUP incorrectly predicts the operator as '=' instead of '!=', and Vicuna-13B incorrectly generates an AVG() function.
As shown in the second part of Table 12, on the SMI-SQL dataset, our QaF-SQL correctly generates the SQL statement for the input question "How many bridges are there in Xixia". Nevertheless, the SDCUP model generates a redundant AND condition 'bridge_size' = "Xixia". In addition, Vicuna-13B incorrectly generates the arguments of the COUNT function, and the condition in the WHERE fragment is also faulty. These cases illustrate that our QaF-SQL generates WHERE conditions more accurately.
In order to analyze the shortcomings of our model for further improvement, we selected several error cases of QaF-SQL. As shown in Table 13, on the ESQL dataset, when the question is "What is the project scale that the profit of Dongli company exceeds 636.2", QaF-SQL incorrectly generates a COUNT function. The likely reason is that the meaning of "project scale" in the question is not properly understood by QaF-SQL. On the SMI-SQL dataset, when the input question is "How many tunnels are between 1000 and 3000 in length", QaF-SQL incorrectly uses the phrase "between 1000 and 3000" as a string-match condition instead of splitting it into two comparison conditions. These errors show that the capability of understanding questions with complex conditions should be strengthened in future work.
Table 13
The results of error analysis on the ESQL and SMI-SQL datasets.
| Datasets | Models | Output SQL statement |
| ESQL | Question 1 | What is the project scale that the profit of Dongli company exceeds 636.2 |
| | Golden Ans. | SELECT 'project scale' FROM table_2 WHERE 'organ' = "Dongli company" AND 'profit' > "636.2" |
| | QaF-SQL | SELECT COUNT('project scale') FROM table_2 WHERE 'organ' = "Dongli company" AND 'profit' > "636.2" |
| | Question 2 | Among the departments with sales rates below 56.35% and not located in the northeast, what is the lowest sales rate? |
| | Golden Ans. | SELECT MIN('sales rate') FROM table_1 WHERE 'sales rate' < "56.35%" AND 'loc' != "northeast" |
| | QaF-SQL | SELECT MIN('sales rate') FROM table_1 WHERE 'sales rate' < "56.35%" |
| SMI-SQL | Question 1 | How many tunnels are between 1000 and 3000 in length |
| | Golden Ans. | SELECT COUNT('tunnel_name') FROM tunnel_basic WHERE 'tunnel_length' > "1000" AND 'tunnel_length' < "3000" |
| | QaF-SQL | SELECT COUNT('tunnel_name') FROM tunnel_basic WHERE 'tunnel_length' = "between 1000 and 3000" |
| | Question 2 | How many long tunnels are there in Zhongwei city? |
| | Golden Ans. | SELECT COUNT('tunnel_name') FROM tunnel_basic WHERE 'tunnel_category' = "Long_tunnel" AND 'location_city' = "Zhongwei city" |
| | QaF-SQL | SELECT COUNT('tunnel_name') FROM tunnel_basic WHERE 'tunnel_category' = 'long' AND 'location_city' = "Zhongwei city" |
6. Conclusion
To alleviate the challenge of the few-shot Text-to-SQL task for industrial databases, a novel question-aware neural model named QaF-SQL is proposed in this paper. Taking the pretrained SDCUP model as the backbone, QaF-SQL adopts a novel ACF mechanism to select candidate columns according to the intention of the input question, so that the redundant information from the many columns in the wide-table scenario is reduced. In order to improve the understanding of industrial questions, we introduce an OSE mechanism that facilitates the accurate generation of WHERE conditions. We evaluate the proposed QaF-SQL on the ESQL and SMI-SQL datasets. The experimental results show that QaF-SQL achieves SOTA performance across four few-shot settings. In addition, the analytical experiments show that the proposed model effectively improves performance on industrial databases.
Funding
This work was supported by the Science and Technology Research Program of the Chongqing Municipal Education Commission of China under grant nos. KJZD-M202300703 and KJQN202200720, the Natural Science Foundation of Chongqing, China, under grant no. CSTB2023NSCQ-MSX0145, and the Graduate Student Research Innovation Project of Chongqing under grant no. CYS240499.
[1] T. Schlegl, S. Schlegl, D. Tomaselli, N. West, J. Deuse, "Adaptive Similarity Search for the Retrieval of Rare Events From Large Time Series Databases," Advanced Engineering Informatics, vol. 52, DOI: 10.1016/j.aei.2022.101629, 2022.
[2] Q. Guo, S. Cao, Z. Yi, "A Medical Question Answering System Using Large Language Models and Knowledge Graphs," International Journal of Intelligent Systems, vol. 37 no. 11, pp. 8548-8564, DOI: 10.1002/int.22955, 2022.
[3] Y. Chen, X. Guo, C. Wang, "Leveraging Table Content for Zero-Shot Text-To-SQL with Meta-Learning," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35 no. 5, pp. 3992-4000, DOI: 10.1609/aaai.v35i5.16519, 2021.
[4] R. Li, T. Mo, J. Yang, S. Jiang, T. Li, Y. Liu, "Ontologies-based Domain Knowledge Modeling and Heterogeneous Sensor Data Integration for Bridge Health Monitoring Systems," IEEE Transactions on Industrial Informatics, vol. 17 no. 1, pp. 321-332, DOI: 10.1109/tii.2020.2967561, 2021.
[5] A. S. Bhardwaj, A. Deep, D. Veeramani, S. Zhou, "A Custom Word Embedding Model for Clustering of Maintenance Records," IEEE Transactions on Industrial Informatics, vol. 18 no. 2, pp. 816-826, DOI: 10.1109/tii.2021.3079521, 2022.
[6] H. Zhu, P. Tiwari, A. Ghoneim, M. S. Hossain, "A Collaborative AI-Enabled Pretrained Language Model for AIoT Domain Question Answering," IEEE Transactions on Industrial Informatics, vol. 18 no. 5, pp. 3387-3396, DOI: 10.1109/tii.2021.3097183, 2022.
[7] G. Katsogiannis-Meimarakis, G. Koutrika, "A Survey on Deep Learning Approaches for Text-To-SQL," The VLDB Journal, vol. 32 no. 4, pp. 905-936, DOI: 10.1007/s00778-022-00776-8, 2023.
[8] S. Xu, S. J. Semnani, G. Campagna, M. S. Lam, "AutoQA: From Databases to QA Semantic Parsers with Only Synthetic Training Data," Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 422-434, 2020.
[9] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, M. Richardson, "Structure-Grounded Pretraining for Text-To-SQL," Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1337-1350, 2021.
[10] OpenAI, "ChatGPT," 2022.
[11] H. Touvron, T. Lavril, G. Izacard, "LLaMA: Open and Efficient Foundation Language Models," CoRR, 2023.
[12] A. Yang, B. Xiao, B. Wang, "Baichuan 2: Open Large-Scale Language Models," arXiv preprint arXiv:2309.10305, 2023.
[13] W. X. Zhao, K. Zhou, J. Li, "A Survey of Large Language Models," 2023.
[14] B. Zhou, X. Li, T. Liu, K. Xu, W. Liu, J. Bao, "CausalKGPT: Industrial Structure Causal Knowledge-Enhanced Large Language Model for Cause Analysis of Quality Problems in Aerospace Product Manufacturing," Advanced Engineering Informatics, vol. 59, DOI: 10.1016/j.aei.2023.102333, 2024.
[15] V. Zhong, C. Xiong, R. Socher, "Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning," arXiv preprint arXiv:1709.00103, 2017.
[16] T. Yu, R. Zhang, K. Yang, "Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-To-SQL Task," arXiv preprint arXiv:1809.08887, 2018.
[17] T. Yu, R. Zhang, M. Yasunaga, "SParC: Cross-Domain Semantic Parsing in Context," arXiv preprint arXiv:1906.02285, 2019.
[18] Q. Li, L. Li, Q. Li, J. Zhong, "A Comprehensive Exploration on Spider with Fuzzy Decision Text-To-SQL Model," IEEE Transactions on Industrial Informatics, vol. 16 no. 4, pp. 2542-2550, DOI: 10.1109/tii.2019.2952929, 2020.
[19] W. Zhao, D. Fang, J. Zhang, "An Effective Framework for Semistructured Document Classification via Hierarchical Attention Model," International Journal of Intelligent Systems, vol. 36 no. 9, pp. 5161-5183, DOI: 10.1002/int.22508, 2021.
[20] B. Hui, X. Shi, R. Geng, "Improving Text-To-SQL with Schema Dependency Learning," 2021.
[21] X. V. Lin, R. Socher, C. Xiong, "Bridging Textual and Tabular Data for Cross-Domain Text-To-SQL Semantic Parsing," 2020.
[22] C. Wei, S. Huang, R. Li, "Enhance Text-To-SQL Model Performance with Information Sharing and Reweight Loss," Multimedia Tools and Applications, vol. 81 no. 11, pp. 15205-15217, DOI: 10.1007/s11042-022-12573-0, 2022.
[23] F. Lanfermann, S. Schmitt, "Concept Identification for Complex Engineering Datasets," Advanced Engineering Informatics, vol. 53, DOI: 10.1016/j.aei.2022.101704, 2022.
[24] L. Nan, Y. Zhao, W. Zou, "Enhancing Text-To-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies," 2023.
[25] R. Sun, S. Arik, R. Sinha, "SQLPrompt: In-Context Text-To-SQL with Minimal Labeled Data," 2023.
[26] J. Yang, H. Jiang, Q. Yin, D. Zhang, B. Yin, D. Yang, "SEQZERO: Few-Shot Compositional Semantic Parsing with Sequential Prompts and Zero-Shot Models," 2022.
[27] Z. Gu, J. Fan, N. Tang, "Few-Shot Text-To-SQL Translation Using Structure and Content Prompt Learning," Proceedings of the ACM on Management of Data, vol. 1 no. 2, DOI: 10.1145/3589292, 2023.
[28] R. Shin, C. Lin, S. Thomson, "Constrained Language Models Yield Few-Shot Semantic Parsers," Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7699-7715, .
[29] N. Schucher, S. Reddy, H. de Vries, "The Power of Prompt Tuning for Low-Resource Semantic Parsing," Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 148-156, .
[30] S. Chang, P. Liu, Y. Tang, J. Huang, X. He, B. Zhou, "Zero-Shot Text-To-SQL Learning with Auxiliary Task," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34 no. 05, pp. 7488-7495, DOI: 10.1609/aaai.v34i05.6246, 2020.
[31] V. Zhong, M. Lewis, S. I. Wang, L. Zettlemoyer, "Grounded Adaptation for Zero-Shot Executable Semantic Parsing," Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6869-6882, 2020.
[32] W. Hwang, J. Yim, S. Park, M. Seo, "A Comprehensive Exploration on Wikisql with Table-Aware Word Contextualization," arXiv preprint arXiv:1902.01069, 2019.
[33] Y. Chen, X. Guo, T. Wu, G. Qi, Y. Li, Y. Dong, "Learn from Yesterday: A Semi-supervised Continual Learning Method for Supervision-Limited Text-To-SQL Task Streams," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37 no. 11, pp. 12682-12690, DOI: 10.1609/aaai.v37i11.26492, 2023.
[34] X. Guo, Y. Chen, G. Qi, T. Wu, H. Xu, "Improving Few-Shot Text-To-SQL with Meta Self-Training via Column Specificity," IJCAI, pp. 4150-4156, 2022.
[35] Q. Lyu, K. Chakrabarti, S. Hathi, S. Kundu, J. Zhang, Z. Chen, "Hybrid Ranking Network for Text-To-SQL," arXiv preprint arXiv:2008.04759, 2020.
[36] T. Zhang, C. Chen, C. Liao, "SQLfuse: Enhancing Text-To-SQL Performance through Comprehensive LLM Synergy," arXiv preprint arXiv:2407.14568, 2024.
[37] Y. Gao, Y. Liu, X. Li, "XiYan-SQL: A Multi-Generator Ensemble Framework for Text-To-SQL," arXiv preprint arXiv:2411.08599, 2024.
[38] J. Lyu, X. Wang, G. Chen, H. Zhang, M. Wang, "Chinese Text-To-SQL Model for Industrial Production," Journal of Computer Applications, vol. 42 no. 10, pp. 2996-3002, 2022.
[39] W. L. Chiang, Z. Li, Z. Lin, "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality," 2023.
[40] B. Qin, B. Hui, L. Wang, "Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing," Knowledge-Based Systems, vol. 262, DOI: 10.1016/j.knosys.2023.110264, 2023.
Copyright © 2025 Ren Li et al. International Journal of Intelligent Systems published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/