Arabic WikiTableQA: Benchmarking Question Answering over Arabic Tables Using Large Language Models

Abstract

Table-based question answering (TableQA) has made significant progress in recent years; however, most advancements have focused on English datasets and SQL-based techniques, leaving Arabic TableQA largely unexplored. This gap is especially critical given the widespread use of structured Arabic content in domains such as government, education, and media. The main challenge lies in the absence of benchmark datasets and the difficulty that large language models (LLMs) face when reasoning over long, complex tables in Arabic, due to token limitations and morphological complexity. To address this, we introduce Arabic WikiTableQA, the first large-scale dataset for non-SQL Arabic TableQA, constructed from the WikiTableQuestions dataset and enriched with natural questions and gold-standard answers. We developed three methods to evaluate this dataset: a direct input approach, a sub-table selection strategy using SQL-like filtering, and a knowledge-guided framework that filters the table using semantic graphs. Experimental results with an LLM show that the graph-guided approach outperforms the others, achieving 74% accuracy, compared to 64% for sub-table selection and 45% for direct input, demonstrating its effectiveness in handling long and complex Arabic tables.
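As a rough illustration of the sub-table selection idea mentioned in the abstract, the following minimal sketch keeps only the rows and columns that share a token with the question before serializing the table into a prompt. It is not the authors' implementation: the function names (`tokens`, `select_subtable`, `to_prompt`), the token-overlap heuristic used in place of SQL-like filtering, and the toy table are all assumptions made for illustration, and the table is assumed to be non-empty.

```python
import re

def tokens(text):
    # \w is Unicode-aware in Python 3, so it matches Arabic letters.
    # No stemming, clitic splitting, or diacritic handling is applied here.
    return set(re.findall(r"\w+", str(text).lower()))

def select_subtable(table, question):
    """Keep rows with a cell that shares a token with the question, and
    columns whose header or kept cells share one. Falls back to the full
    set of rows or columns when nothing matches. Hypothetical heuristic."""
    q = tokens(question)
    headers = list(table[0])
    rows = [r for r in table if any(tokens(v) & q for v in r.values())] or table
    cols = [h for h in headers
            if tokens(h) & q or any(tokens(r[h]) & q for r in rows)] or headers
    return [{h: r[h] for h in cols} for r in rows]

def to_prompt(table, question):
    """Serialize a (sub-)table as pipe-delimited text followed by the question."""
    headers = list(table[0])
    lines = [" | ".join(headers)]
    lines += [" | ".join(str(r[h]) for h in headers) for r in table]
    return "\n".join(lines) + "\n\nQuestion: " + question

# Toy data for illustration only (columns: country, year, host).
TABLE = [
    {"الدولة": "مصر", "السنة": "2010", "المضيف": "أنغولا"},
    {"الدولة": "غانا", "السنة": "1982", "المضيف": "ليبيا"},
]
QUESTION = "ما السنة التي فازت فيها مصر؟"

sub = select_subtable(TABLE, QUESTION)
prompt = to_prompt(sub, QUESTION)
```

With this toy input, the filter retains one row and two columns, so the prompt passed to a model is much shorter than the full table. Exact token overlap is brittle for Arabic, where prefixes such as the definite article change the surface form, which is one reason the abstract cites morphological complexity as a challenge.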

Details

Subject

Arabic language;
Questions;
Datasets;
Accuracy;
Metadata;
Large language models;
Graphs;
Benchmarks;
Decomposition;
Natural language processing;
Linguistics;
Complexity;
Web portals;
Queries;
Structured Query Language-SQL;
Knowledge representation;
Semantics;
Query languages

Company / organization

Name:

Wikipedia

NAICS:

513140, 516210

Identifier / keyword

information retrieval; knowledge graph; long table; table question answering

Title

Arabic WikiTableQA: Benchmarking Question Answering over Arabic Tables Using Large Language Models

Author

Fawaz, Alsolami¹; Alrayzah Asmaa²

¹ Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia; [email protected]
² Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 55461, Saudi Arabia

Publication title

Electronics; Basel

Volume

14

Issue

19

First page

3829

Number of pages

Publication year

2025

Publication date

2025

Publisher

MDPI AG

Place of publication

Basel

Country of publication

Switzerland

Publication subject

Electronics

e-ISSN

20799292

Source type

Scholarly Journal

Language of publication

English

Document type

Journal Article

Publication history

Online publication date

2025-09-27

Milestone dates

2025-08-28 (Received); 2025-09-26 (Accepted)

First posting date

27 Sep 2025

DOI

https://doi.org/10.3390/electronics14193829

ProQuest document ID

3261057209

Document URL

https://www.proquest.com/scholarly-journals/arabic-wikitableqa-benchmarking-question/docview/3261057209/se-2?accountid=208611

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Last updated

2025-10-16

Database

ProQuest One Academic
