© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Governments are embracing an open data philosophy and making their data freely available to the public to encourage innovation and increase transparency. However, the number of available datasets is still limited. Finding relationships between related datasets on different data portals enables users to search the relevant datasets. These datasets are generated from the training data, which need to be curated by the user query. However, relevant dataset retrieval is an expensive operation due to the preparation procedure for each dataset. Moreover, it requires a significant amount of space and time. In this study, we propose a novel framework to identify the relationships between datasets using structural information and semantic information for finding similar datasets. We propose an algorithm to generate the Concept Matrix (CM) and the Dataset Matrix (DM) from the concepts and the datasets, which is then used to curate semantically related datasets in response to the users’ submitted queries. Moreover, we employ the proposed compression, indexing, and caching algorithms in our proposed scheme to reduce the required storage and time while searching the related ranked list of the datasets. Through extensive evaluation, we conclude that the proposed scheme outperforms the existing schemes.
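
The abstract does not define how the Dataset Matrix is built or how datasets are ranked, so the following is only an illustrative sketch and not the authors' algorithm. It assumes a binary dataset-by-concept matrix (1 if a dataset is tagged with a concept) and ranks datasets by cosine similarity to a query expressed over the same concepts. All names (CONCEPTS, DATASETS, build_dataset_matrix, rank_datasets) and the sample data are hypothetical.

```python
from math import sqrt

# Hypothetical concept vocabulary and dataset tags (not from the article).
CONCEPTS = ["air", "health", "transport", "budget"]
DATASETS = {
    "air_quality": {"air", "health"},
    "bus_routes": {"transport"},
    "hospital_spend": {"health", "budget"},
}


def build_dataset_matrix(datasets, concepts):
    """Assumed form of a dataset matrix: one 0/1 row per dataset, one column per concept."""
    return {
        name: [1 if concept in tags else 0 for concept in concepts]
        for name, tags in datasets.items()
    }


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_datasets(query_concepts, datasets, concepts):
    """Return (dataset, score) pairs, highest score first, ties broken by name."""
    matrix = build_dataset_matrix(datasets, concepts)
    query = [1 if concept in query_concepts else 0 for concept in concepts]
    scored = [(name, cosine(query, row)) for name, row in matrix.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, score in rank_datasets({"air", "health"}, DATASETS, CONCEPTS):
        print(f"{name}: {score:.2f}")
```

With this sample data, a query for the concepts "air" and "health" ranks air_quality first, hospital_spend second (one shared concept), and bus_routes last. The compression, indexing, and caching steps mentioned in the abstract are not modelled here.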

Details

Title
An Efficient Framework for Finding Similar Datasets Based on Ontology
Author
Sultana, Tangina 1; Qudus, Umair 2; Umair, Muhammad 3; Hossain, Md Delowar 4

 Department of Computer Science and Engineering, Kyung Hee University, Yongin-si 17104, Republic of Korea; [email protected] (T.S.); [email protected] (M.U.); Department of Electronics and Communication Engineering, Hajee Mohammad Danesh Science & Technology University, Dinajpur 5200, Bangladesh 
 Department of Computer Science, Paderborn University, Warburger Str. 100, 33098 Paderborn, Germany; [email protected] 
 Department of Computer Science and Engineering, Kyung Hee University, Yongin-si 17104, Republic of Korea; [email protected] (T.S.); [email protected] (M.U.) 
 Department of Computer Science and Engineering, Kyung Hee University, Yongin-si 17104, Republic of Korea; [email protected] (T.S.); [email protected] (M.U.); Department of Computer Science and Engineering, Hajee Mohammad Danesh Science & Technology University, Dinajpur 5200, Bangladesh 
First page
4417
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
20799292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3133009411