© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of data generated by current applications requires new data warehousing systems. In the big data setting, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract–Transform–Load (ETL) are that it cannot process very large amounts of data and that its execution time is very high when the data are unstructured. This paper proposes a new model consisting of four layers, Extract–Clean–Load–Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, and this is evaluated experimentally. ECLT is implemented and tested using Spark, a framework used here from Python. Finally, this paper compares the execution time of ECLT with that of other models on two datasets. Experimental results show that for a data size of 1 TB, the execution time of ECLT is 41.8 s, and when the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
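
To make the ordering of the four ECLT layers concrete, the following is a minimal sketch in plain Python, not the authors' Spark implementation. The abstract names only the stage order (Extract, Clean, Load, Transform) and the focus on text; everything else here is an assumption for illustration: the function names, the cleaning rules (whitespace and case normalization), the dictionary standing in for the warehouse, and word counting as the transform step.

```python
import re
import time
from typing import Dict, Iterable, List, Tuple


def extract(sources: Iterable[str]) -> List[str]:
    """Extract: pull raw text records from the sources (in-memory strings here)."""
    return list(sources)


def clean(records: List[str]) -> List[str]:
    """Clean: collapse whitespace, lowercase, and drop empty records (assumed rules)."""
    cleaned = []
    for record in records:
        text = re.sub(r"\s+", " ", record).strip().lower()
        if text:
            cleaned.append(text)
    return cleaned


def load(records: List[str], warehouse: Dict[int, str]) -> Dict[int, str]:
    """Load: store cleaned records; a dict stands in for the warehouse (assumption)."""
    for record in records:
        warehouse[len(warehouse)] = record
    return warehouse


def transform(warehouse: Dict[int, str]) -> Dict[str, int]:
    """Transform: derive an analysis-ready view after loading (word counts here)."""
    counts: Dict[str, int] = {}
    for text in warehouse.values():
        for word in re.findall(r"[a-z0-9']+", text):
            counts[word] = counts.get(word, 0) + 1
    return counts


def run_eclt(sources: Iterable[str]) -> Tuple[Dict[str, int], float]:
    """Run the four stages in ECLT order and report elapsed wall-clock seconds."""
    start = time.perf_counter()
    warehouse = load(clean(extract(sources)), {})
    result = transform(warehouse)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Calling `run_eclt` on a small list of strings returns the word counts together with the elapsed time, which mirrors, at toy scale, the execution-time measurement the abstract reports. The point of the sketch is only the ordering: cleaning happens before loading, and transformation is deferred until the data are already in the store.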

Details

Title
A Model for Enhancing Unstructured Big Data Warehouse Execution Time
Author
Marwa Salah Farhan 1; Youssef, Amira 2; Abdelhamid, Laila 3

 Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Cairo 11795, Egypt or [email protected] (M.S.F.); [email protected] (L.A.); Faculty of Informatics and Computer Science, British University in Egypt, Cairo 11837, Egypt 
 Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Cairo 11795, Egypt or [email protected] (M.S.F.); [email protected] (L.A.); Higher Institute of Computer Science and Information Systems, 5th Settlement, Department of Computer Science, Cairo 11835, Egypt 
 Department of Information Systems, Faculty of Computers and Artificial Intelligence, Helwan University, Cairo 11795, Egypt or [email protected] (M.S.F.); [email protected] (L.A.) 
First page
17
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
25042289
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2930507083