Abstract

With the advent of technology and use of latest devices, they produce voluminous data. Out of it, 80% of the data are unstructured and remaining 20% are structured and semi-structured. The produced data are in heterogeneous format and without following any standards. Among heterogeneous (structured, semi-structured and unstructured) data, textual data are nowadays used by industries for prediction and visualization of future challenges. Extracting useful information from it is really challenging for stakeholders due to lexical and semantic matching. Few studies have been solving this issue by using ontologies and semantic tools, but the main limitations of proposed work were the less coverage of multidimensional terms. To solve this problem, this study aims to produce a novel multidimensional reference model using linguistics categories for heterogeneous textual datasets. The categories in such context, semantic and syntactic clues are focused along with their score. The main contribution of MRM is that it checks each tokens with each term based on indexing of linguistic categories such as synonym, antonym, formal, lexical word order and co-occurrence. The experiments show that the percentage of MRM is better than the state-of-the-art single dimension reference model in terms of more coverage, linguistics categories and heterogeneous datasets.

Details

Title
A Novel Multidimensional Reference Model for Heterogeneous Textual Datasets using Context, Semantic and Syntactic Clues
Author
Kumar, Ganesh; Shuib Basri; Abdullahi Abubakar Imam; Abdullateef Oluwagbemiga Balogun; Hussaini Mamman; Capretz, Luiz Fernando
Publication year
2023
Publication date
2023
Publisher
Science and Information (SAI) Organization Limited
ISSN
2158107X
e-ISSN
21565570
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2893799718
Copyright
© 2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.