Purpose
This research aimed to visualize and analyze the co-word network and thematic clusters of the intellectual structure in the field of linked data during 1900–2021.
Design/methodology/approach
This applied research employed a descriptive and analytical method, scientometric indicators, co-word techniques, and social network analysis. VOSviewer, SPSS, Python programming, and UCINet software were used for data analysis and network structure visualization.
Findings
The top ranks of the Web of Science (WOS) subject categorization belonged to various fields of computer science. Besides, the USA was the most prolific country. The keyword ontology had the highest frequency of co-occurrence. Ontology and semantic were the most frequent co-word pairs. In terms of the network structure, nine major topic clusters were identified based on co-occurrence, and 29 thematic clusters were identified based on hierarchical clustering. Comparisons between the two clustering techniques indicated that three clusters, namely semantic bioinformatics, knowledge representation, and semantic tools were in common. The most mature and mainstream thematic clusters were natural language processing techniques to boost modeling and visualization, context-aware knowledge discovery, probabilistic latent semantic analysis (PLSA), semantic tools, latent semantic indexing, web ontology language (OWL) syntax, and ontology-based deep learning.
Originality/value
This study adopted various techniques such as co-word analysis, social network analysis, network structure visualization, and hierarchical clustering to present a suitable, visual, methodical, and comprehensive perspective on linked data.
Introduction
The linked data (LD) initiative is the latest achievement in the natural evolution of the semantic web (Allemang and Hendler, 2011). The interest of libraries, museums, and archives (the cultural heritage context) is expanding to LD because it helps produce data innovatively and with a standardized format to be reusable and discoverable (Guerrini and Possemato, 2013). Generally, LDs are the data published on the web in a machine-readable format (Raza et al., 2019), wherein meaning is precisely defined and linked to other datasets on the web (e.g. DBpedia) and can be linked by other databases as well. In recent years, LD has been the focus of research in library and information science (LIS) in addition to computer science (Southwick, 2015). Berners-Lee introduced the concept of LD in 2006. Published data are based on principles that facilitate the link between databases, elements, and vocabularies (Moulaison and Million, 2014). Vocabularies play a significant role in promoting the semantic expression of LD by defining a schema layer for entity recognition and interconnection for knowledge graphs (Jia, 2021). Theoretically, the LD method refers to a set of best practices for structuring and linking the data available on the web (Bizer et al., 2009).
There is a strong potential in LD that provides solutions for heterogeneous web objects and many library issues, e.g. increasing web searches, authority control, classification, data flexibility, and ambiguity (Zengenene, 2013). Thus, given the growth and importance of this field, surveying the thematic clusters and analyzing topic maturity are essential measures.
This research aimed to visualize and analyze the co-word network and thematic clusters of the intellectual structure in the field of LD during 1900–2021. The following research questions were addressed:
Literature review
Co-word analysis has been a widely used method in various domains, e.g. smart cities (Ebrahiem et al., 2023), knowledge management in tourism and hospitality (Fauzi, 2023), digital marketing (Amiri et al., 2023), big data and performance measurement (Sardi et al., 2020), artificial intelligence (AI) related to the tourism industry (Kong et al., 2023), and iMetrics (Khasseh et al., 2017).
Wang et al. (2014) studied national knowledge discovery from 1992–2013 using highly frequent keywords from related core journals in the CNKI database. The study involved counting the co-occurrences of two frequent keywords in the same journal, constructing a highly frequent keyword matrix, and transforming it into a correlation and dissimilarity matrix. The dissimilarity matrix was analyzed using factor analysis and cluster analysis. They found that the current hotspots in domestic knowledge discovery are focused on six aspects: knowledge discovery based on data research, knowledge discovery algorithm optimization research, the model of knowledge discovery and references research, knowledge management based on domain ontology, expert system construction research, and applied research of the knowledge discovery.
Feng et al. (2017) proposed an improved co-word analysis method that incorporated semantic distance measurements with ontologically-based concept mapping. This method yielded better results in terms of matrix dimensions and clustering outcomes. However, this approach had some limitations; it heavily relied on domain ontology and required further study to improve its efficiency and accuracy during concept mapping. Their method enhanced co-word matrix conditions in two ways. First, by applying concept mapping within the co-word matrix labels, it combined words at the concept level to reduce matrix dimensions and create a more content-rich concept matrix. Second, it integrated logical relationships and concept connotations among studied concepts into a co-word matrix and calculated a semantic distance between concepts based on domain ontology to generate a semantic matrix.
Liu and Liu (2019) investigated research papers on knowledge engineering (2009–2018) from the WOS and used the co-word analysis method. The cluster tree was divided into four sections: smart learning and education, knowledge acquisition, knowledge basic algorithms, and knowledge engineering technology application.
Khasseh et al. (2022) investigated the intellectual structure of knowledge organization studies. They found that the most commonly used keywords in these articles are information retrieval, classification, and ontology. Moreover, certain co-word pairs, e.g. “Ontology*Semantic Web,” “Digital Library*Information Retrieval,” and “Indexing*Information Retrieval” were frequently used. While information retrieval is a major topic in knowledge organization, the study found that theoretical concepts related to knowledge organization were overlooked. The hierarchical clustering results revealed eight main thematic clusters.
Danesh and Ghavidel (2022) conducted a longitudinal study analyzing cluster concepts in knowledge organization (KO) using co-occurrence analysis. In the first period, the cluster with the highest centrality was Knowledge Management, while the cluster with the highest density was Strategic Planning from 2000–2018. In later periods, the cluster with the highest centrality and density was Information Retrieval. The two-dimensional map of KO's thematic topics indicated that in the periods studied, there was a significant overlap in thematic clusters in terms of concept and content.
Niknia and Mirtaheri (2015) used Scopus data to analyze the co-word network of LD over a decade. The results showed that thematic areas focused on common themes related to computer fields, e.g. big data, cloud computing, semantic data, semantic technologies, semantic web, AI, computer programming, and semantic search. Moreover, Kyaw and Wang (2018) analyzed 964 papers on LD extracted from the WOS using the mentioned techniques. The nine main clusters were the internet of Things, entity linking, education, semantic web, LD, web of data, DBpedia, data integration, and ontology.
Hosseini et al. (2021) visualized and analyzed the co-word network and thematic clusters of LD. The keywords linked data and semantic web had the highest frequencies in terms of co-word pairs. The intellectual structure was mapped as five main clusters, while hierarchical clustering (HC) yielded two clusters. The thematic cluster core concepts of the semantic web was identified as the most mature, and linked data usage in the context of cultural heritage was considered a well-developed but isolated cluster. The USA was the top country in terms of publications in the field. Moreover, various sub-categories of computer science accounted for a large share of the publications.
The difference between the cited studies and the present research is that this study covered a wider period and more comprehensive data. Previous studies covered the 1986–2018 period, while the current study spanned the period from 1900 to 2021. The query used in the present study was more complex and included more specialized words in LD, while previous studies only searched the phrase “linked data” in the subject section of the WOS. Therefore, the present study examined a deeper, more comprehensive perspective. Finally, the HC of previous studies contained only 50 high-frequency keywords, so the topics were not discussed in depth. These points exhibit the originality and novelty of the current study.
In terms of the main contribution, this study is broader in data collection and deeper in analysis. It provides considerable insight into the development of research frontiers and directions in the field of LD. It especially identifies the maturity of LD research in library and information sciences and reveals that it has strong potential to be more prolific and push boundaries while focusing on new algorithm-based techniques as observed within recognized clusters.
Methodology
This applied study was conducted with a descriptive and analytical approach. The methodology was divided into four steps. The first step involved data collection and preprocessing. The second step dealt with network analysis and focused on co-word analysis (co-occurrences) in VOSviewer. The third step analyzed the intellectual structure of the network through co-word analysis and employed the HC technique in SPSS. The fourth step analyzed the maturity of the clusters by means of social network analysis and a strategic diagram (SD) plotted in SPSS. More details are provided below.
Data collection and pre-processing
The research population included all the keywords extracted from all the documents about LD indexed in the WOS during 1900–2021. The following query was searched using an advanced search in the WOS Core Collection from Clarivate Analytics in June 2020:
AK=((“linked data” OR “semantic web” OR “semantic web standard*” OR “ontology*” OR “ontolog*” OR “RDF” OR “RDF triple” OR “RDF/XML” OR “GraphDB” OR “Neo4J” OR “RDFS” OR “RDFa” OR “Resource Description Framework” OR “Resource Description Framework Schema” OR “SPARQL” OR “SPARQL Protocol and RDF Query Language” OR “RDF query language” OR “Application Profile*” OR “Linked open Data” OR “LOD” OR “W3C” OR “World wide Web consortium” OR “Web of Data” OR “semantic enrichment” OR “OWL” OR “Web Ontology Language” OR “The W3C Web Ontology Language (OWL)” OR “Semantic*” OR “Semantic Search” OR “Semantic Retrieval” OR “semantic reasoning” OR “semantic Web language” OR “Semantic Modeling” OR “Semantic data” OR “semantic data modeling” OR “semantic modeling for data” OR “Semantic Web technology stack” OR “data integration” OR “metadata schema” OR “Thesauri” OR “Thesaurus” OR “meta thesaurus” OR “interconnected data” OR “data model” OR “knowledge graph*” OR “knowledge Retrieval” OR “NOSQL” OR “JSON-LD” OR “Turtle” OR “RDF serialization*” OR “N3” OR “interlinked data” OR “machine-readable data” OR “LOD cloud” OR “Linked Open Data Cloud”) AND LANGUAGE:(English))
There is no single best scientometric data source. The WOS was chosen since it provided an incredible wealth of information on global scientific content from 1900 on. The study of science, technology, and knowledge has benefited greatly from its extensive coverage, which is valuable in interdisciplinary fields for appropriate literature. It is possible to access the best scholarly publications in humanities, social sciences, arts, and sciences through the WOS Core Collection (All Indexes) (Clarivate, 2023). Moreover, the authors had legal access to this database through the institution.
As a result of the query, 96,179 documents were retrieved. Then, the keyword column was unmixed, refined, and cleaned. Plural and compound words were converted into a singular form based on LD experts' viewpoints [1]. Table 1 presents the changes made to the keywords. The final data of this section answered the first question.
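The cleaning step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the semicolon-separated keyword field and the `CANONICAL` mapping are hypothetical stand-ins for the expert-reviewed changes listed in Table 1.

```python
# Hypothetical mapping from variant forms to one canonical form; the actual
# changes were decided by LD experts and are listed in Table 1.
CANONICAL = {
    "ontologies": "ontology",
    "thesauri": "thesaurus",
    "knowledge graphs": "knowledge graph",
}

def clean_keywords(raw_field):
    """Split one keyword field and normalize each keyword."""
    cleaned = []
    for keyword in raw_field.split(";"):
        keyword = " ".join(keyword.lower().split())  # trim and collapse spaces
        if not keyword:
            continue
        keyword = CANONICAL.get(keyword, keyword)
        if keyword not in cleaned:  # drop duplicates within one document
            cleaned.append(keyword)
    return cleaned

print(clean_keywords("Ontologies; Semantic  Web; ontology; Thesauri"))
# prints ['ontology', 'semantic web', 'thesaurus']
```

Keeping the mapping in one explicit table makes every merge decision reviewable, which matters when plural and compound forms are merged on expert advice rather than by an automatic stemmer.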
The authors chose the longest available search period because the query contained words and phrases that have gone through different periods of growth and development and form the background of the topic, e.g. metadata or thesauri. For example, the Thesaurus of English Words and Phrases, created by Peter Mark Roget in 1852, is the most well-known thesaurus. Still, the term was not used to describe vocabulary lists used in information retrieval until roughly a century later (Aitchison and Clarke, 2004).
Co-word analysis
Co-word analysis was introduced in the 1980s. It is based on the assumption that the use of common terms in two or more documents indicates the proximity of these texts, by which we can delimit the structure, concepts, and components of a scientific field. This method helps visualize the structure of scientific domains (Whittaker, 1989).
Co-word analysis, like co-authorship, bibliographic coupling, and co-citation analysis, is one of the most commonly used bibliometric and scientometric methods. In this method, the co-occurrence of keywords is treated as a type of relationship, and the unit of analysis is the keywords or terms extracted from the titles and abstracts of the documents (Cobo et al., 2011). It also assists in identifying hidden and salient patterns, internal and external relationships of concepts (Osareh et al., 2016), and emerging trends.
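The idea of treating keyword co-occurrence as a relationship can be illustrated with a short Python sketch. It is a minimal example on hypothetical documents, assuming each document is already reduced to a list of cleaned keywords; it is not the counting routine used by VOSviewer.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(documents):
    """Count how many documents contain each unordered keyword pair."""
    pair_counts = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # one vote per document, stable pair order
        pair_counts.update(combinations(unique, 2))
    return pair_counts

# Hypothetical keyword lists for three documents.
documents = [
    ["ontology", "semantic web", "linked data"],
    ["ontology", "semantic web"],
    ["linked data", "rdf"],
]
pairs = count_cooccurrences(documents)
print(pairs[("ontology", "semantic web")])  # prints 2
```

Sorting the keywords before pairing ensures that each pair is always stored under the same key, regardless of the order in which the keywords appear in a document.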
The general links between concepts in clusters can be visualized by using VOSviewer to explore maps made from network data. VOSviewer can be used to construct, visualize, and explore maps based on any kind of network data, even though it is primarily designed to analyze bibliometric networks (Van Eck and Waltman, 2018). The threshold in the software was set by trial and error at a minimum of 10 co-occurrences. Of the 160,099 keywords, 5702 met the threshold, and the resulting network formed nine main clusters, including 490 keywords. The data in this section answered the second question.
In addition, some techniques, e.g. HC, can be adopted for co-word analysis. Hierarchical relationships between words in clusters can be represented by mappings in SPSS, the result of which helped answer the third research question.
Social network analysis
In an SD, the x-axis indicates centrality and the y-axis stands for density. It means that the SD includes four quadrants containing various degrees of density and centrality (Hu et al., 2013).
A square matrix (co-occurrence matrix) and, then, a correlation matrix were created based on the number of keywords for each cluster. Subsequently, the centrality and density of each cluster were measured by using the UCINet software, and an SD was plotted by SPSS, thereby answering the fourth research question (see Figure 1).
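The quadrant logic of the SD can be sketched as follows. The cluster values are hypothetical, and placing the origin at the mean centrality and mean density, with ties assigned to the higher quadrant, is an assumption made for illustration.

```python
def quadrant(centrality, density, origin):
    """Return the strategic-diagram quadrant for one cluster."""
    x0, y0 = origin
    if centrality >= x0 and density >= y0:
        return "I"    # central and dense: mature, mainstream
    if centrality < x0 and density >= y0:
        return "II"   # dense but peripheral: well developed, isolated
    if centrality < x0 and density < y0:
        return "III"  # peripheral and sparse: marginal or emerging
    return "IV"       # central but sparse: basic, still immature

# Hypothetical (centrality, density) values for three clusters.
clusters = {"A": (0.20, 0.50), "B": (0.05, 0.45), "C": (0.18, 0.10)}
origin = (
    sum(c for c, _ in clusters.values()) / len(clusters),
    sum(d for _, d in clusters.values()) / len(clusters),
)
labels = {name: quadrant(c, d, origin) for name, (c, d) in clusters.items()}
print(labels)  # prints {'A': 'I', 'B': 'II', 'C': 'IV'}
```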
Findings
Q1: What are the top scientific publications on LD in terms of publication year, WOS subject categorization, organization, author, country, and research area?
The top ranks of the subject categorization of WOS were related to various fields of computer science (Table 2). The USA had the most outputs in the field of LD.
Q2: How is the intellectual structure of LD analyzed in terms of network structure and thematic clusters based on co-occurrence? What is their status in terms of frequency, link numbers, and total link strength?
The final output by using the algorithms and analyses of VOSviewer included nine main clusters of 144,353 total co-occurrences, total link strength of 245,378, and 50,840 total links.
According to the semantic concepts in the clusters, these nine main clusters were named as represented in Table 3.
Figure 2 depicts the network structure in LD, including the keywords visualized by VOSviewer 1.6.9. As shown in Figure 2, the network consisted of nine clusters in different colors. Figure 2 illustrates the overlay visualization of the network in this field. The colors of this map were determined by their weight in the network. Blue has the lowest score, green indicates the average score, and yellow has the highest score. Thus, movement from blue to yellow indicates more importance and weight due to the greater score and significance of the keyword in the network (Van Eck and Waltman, 2018).
Figure 3 illustrates cluster density visualization. When the color of the network cluster is closer to yellow, a greater density is available and the cluster is more significant (Van Eck and Waltman, 2018).
Q3: How is the intellectual structure of LD analyzed in terms of top co-word pairs, co-occurrence matrix, and HC?
Due to the excessive volume of data (keywords) to answer this question, we had to create a time series to reduce the amount of data only for high co-occurrences (500 co-occurrences). Because of the lack of scientific publications, the years before 1991 did not reach the threshold and were not analyzed.
The data could be processed by the software to export adjacency matrices as co-occurrence square matrices. Therefore, the data of the years were divided based on the output volume of the square matrices into 1991–2005, 2006–2008, 2009–2011, 2012–2014, 2015–2017, and 2018–2021 periods. The matrices were exported with Python code using the NumPy and pandas libraries.
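A minimal sketch of this step is given below. It uses only the Python standard library rather than NumPy and pandas, and the records are hypothetical, so it illustrates the structure of the per-period square matrices rather than reproducing the original export code.

```python
from itertools import combinations

# Period boundaries as listed in the text.
PERIODS = [(1991, 2005), (2006, 2008), (2009, 2011),
           (2012, 2014), (2015, 2017), (2018, 2021)]

def period_of(year):
    """Return the (start, end) period containing year, or None."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None

def cooccurrence_matrix(keyword_lists):
    """Build a symmetric keyword-by-keyword co-occurrence matrix."""
    labels = sorted({k for kws in keyword_lists for k in kws})
    index = {k: i for i, k in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return labels, matrix

# Hypothetical records: (publication year, cleaned keywords).
records = [
    (2007, ["ontology", "semantic web"]),
    (2008, ["ontology", "semantic web", "rdf"]),
    (2019, ["linked data", "ontology"]),
]
by_period = {}
for year, kws in records:
    by_period.setdefault(period_of(year), []).append(kws)

labels, matrix = cooccurrence_matrix(by_period[(2006, 2008)])
print(labels)  # prints ['ontology', 'rdf', 'semantic web']
print(matrix)  # prints [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```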
As a result, the total numbers of co-words of all the scientific outputs from WOS were recorded in square matrices in Microsoft Excel files. Table 4 presents the top and high-frequency co-word pairs during the various time series.
Next, dendrograms were plotted by utilizing the HC technique, focusing on Ward's method and the squared Euclidean distance. Figures A1–A6 in the Appendix display the final dendrograms of the matrices. As a result, 29 final clusters were labeled (Tables 4–9).
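Ward's method on squared Euclidean distances can be sketched in plain Python with the Lance-Williams update. The study itself ran HC in SPSS, so the code below is only an illustration; the small matrix is hypothetical, and placing keyword frequencies on its diagonal is an assumption made so that the row profiles are comparable.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(rows, n_clusters):
    """Agglomerate rows with Ward's criterion until n_clusters remain."""
    members = {i: [i] for i in range(len(rows))}
    dist = {(i, j): squared_euclidean(rows[i], rows[j])
            for i in range(len(rows)) for j in range(i + 1, len(rows))}
    next_id = len(rows)
    while len(members) > n_clusters:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(members[i]), len(members[j])
        merged = members.pop(i) + members.pop(j)
        updated = {}
        for k, group in members.items():
            n_k = len(group)
            d_ki = dist[tuple(sorted((k, i)))]
            d_kj = dist[tuple(sorted((k, j)))]
            # Lance-Williams update for Ward's method.
            updated[(k, next_id)] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(updated)
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(group) for group in members.values())

# Hypothetical co-occurrence profiles (diagonal = keyword frequency).
labels = ["ontology", "semantic web", "owl", "deep learning", "neural network"]
cooc = [
    [9, 7, 6, 1, 0],
    [7, 8, 5, 0, 1],
    [6, 5, 7, 0, 0],
    [1, 0, 0, 8, 6],
    [0, 1, 0, 6, 7],
]
groups = ward_clusters(cooc, 2)
named = [[labels[i] for i in group] for group in groups]
print(named)
# prints [['ontology', 'semantic web', 'owl'], ['deep learning', 'neural network']]
```

Cutting the merge sequence at a chosen number of clusters corresponds to cutting the dendrogram at a given height; the number of clusters retained remains an analyst's decision.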
Q4: How are the clusters of LD visualized and analyzed by the SD in terms of maturity and development?
The SD was plotted by utilizing SNA indicators, e.g. the degree of centrality and density. This technique is also useful for analyzing the maturity and development status of each cluster. Therefore, a co-occurrence matrix and, then, a correlation matrix were formed, and the centrality and density of each cluster were measured by UCINet (Table 10).
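The conversion from a co-occurrence matrix to a correlation matrix can be sketched as below. The use of Pearson correlation between row profiles and the definition of density as the mean off-diagonal tie value are assumptions for illustration; the exact settings used in UCINet are not stated, and the matrix is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def correlation_matrix(matrix):
    """Correlate every row profile with every other row profile."""
    return [[pearson(r1, r2) for r2 in matrix] for r1 in matrix]

def valued_density(matrix):
    """Mean of the off-diagonal entries of a square matrix."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

# Hypothetical co-occurrence profiles (diagonal = keyword frequency).
cooc = [
    [9, 7, 6, 1, 0],
    [7, 8, 5, 0, 1],
    [6, 5, 7, 0, 0],
    [1, 0, 0, 8, 6],
    [0, 1, 0, 6, 7],
]
corr = correlation_matrix(cooc)
cluster = [0, 1, 2]  # indices of the keywords in one hypothetical cluster
sub = [[corr[i][j] for j in cluster] for i in cluster]
density = valued_density(sub)
print(round(density, 3))
```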
The SD was visualized (Figure 4) with its origin placed at the average centrality (0.13) and the average density (0.39).
Discussion
The findings revealed that the top scientific publications in the WOS classification are related to sub-categories of computer science. The USA was the most prolific country. These results are consistent with the results of Niknia and Mirtaheri (2015) and Hosseini et al. (2021). Additionally, information science and library science ranked 10th among research areas and 14th among WOS subject categories, with 2458 records. This indicates that the field has strong potential to become more productive and to extend the boundaries of knowledge on LD. Moreover, scientific publications on LD have been on the rise since 2016.
The keyword ontology had the highest frequency of co-occurrence. It means that the core of the studies in this field commonly use, and are known by, this keyword. Therefore, LD and ontology are complementary, and their interactions help them strengthen and develop the field (Dutta, 2017). Accordingly, ontologies make a major contribution to the field of LD in terms of semantic interoperability (Delgado Azuara et al., 2013), supporting the argumentation and inference of new knowledge and improvement of data quality. Conversely, LDs can modify and improve the development of ontologies. They can also support the recognition of the terminology and field requirements to promote the development of user-oriented ontologies and support ontology data-centric modeling (Dutta, 2017), which is recognized as the central component of the infrastructure needed to understand the semantic web (Berners-Lee and Fischetti, 1999).
Keywords, e.g. semantic, semantic web, linked data, semantic integration, resource description framework (RDF), gene ontology, semantic segmentation, web ontology language (OWL), semantic similarity, natural language processing (NLP), and knowledge representation also had high frequencies of co-occurrences among thematic clusters (Table 11). As a result, they have received a large share of the discussions in this field and are the main sub-categories and sub-clusters.
Across the six time series, ontology and semantic formed the highest-frequency co-word pair, followed by semantic web, which had the highest co-occurrence in the last five time series. Moreover, linked data emerged among high-frequency co-word pairs during 2015–2017 and gene ontology during 2012–2014. Based on Table 12, OWL emerged during 2009–2014, and new trends, e.g. artificial intelligence and convolutional neural network, appeared as top co-word pairs during 2018–2021. These occurrences were also highlighted and confirmed with the largest size and highest density in the network structure, as depicted in Figures 2 and 3.
The intellectual structure in terms of co-occurrence represented nine clusters, focusing on semantic analysis, tools, standards by the linked data approach, algorithm-based techniques, and ontology-driven in the context of bioinformatics, cognition, image segmentation, and formal semantics.
Comparisons between the two clustering techniques (HC and co-occurrence analysis) indicated that three clusters, namely semantic bioinformatics, knowledge representation, and semantic tools, are common to both. This result emphasizes the significance of these discussions in the field, as briefly described and addressed below.
Bioinformatics as one of the most important contexts in LD has a large share of scientific publications and is a major topic leader. The study by Hosseini et al. (2021) also concluded that the health context is mainstream and pioneering in LD compared to other contexts, e.g. cultural heritage.
Moreover, knowledge representation played a vital role and was a key topic in LD to represent various entities, relationships, and properties in different domains and contexts (Lin et al., 2021) to enhance semantic data modeling (Alexopoulos, 2020), ontology development (Dhingra and Bhatia, 2015), and reasoning (Zheng et al., 2021).
A variety of tools are used in the semantic web technology and linked open data environment, namely semantic tools, e.g. semantic web browsers, servers, generators, XPath tools, XML editors, validators, authority tools, triplestore tools, visualization tools, conversion software, vocabulary building platforms, ontologies, meta thesaurus, thesauri (Nowroozi et al., 2018), data management software, metadata management (O'Dell, 2015), BIBFRAME tools (Park et al., 2019), discovery interface platforms, and glossaries (World Wide Web Consortium, 2017). These can develop infrastructure, interface design, best practices, standards, and large-scale discovery services based on curated metadata and semantically annotated data (Cuna and Angeli, 2020; Kalita and Deka, 2021).
The mainstream topics of LD were located in Quadrant I and were defined as mature and central clusters (Table 13). These clusters had the most comprehensive thematic concepts in this field. They were more developed than the other themes, and their concepts were at the core of the field's subjects. They were the strongest and most mature clusters that had central positions in the field: natural language processing techniques to boost modeling and visualization, context-aware knowledge discovery, probabilistic latent semantic analysis (PLSA), semantic tools, latent semantic indexing (LSI), OWL abstract syntax, and ontology-based deep learning. These have been noted in various studies, some of which are mentioned below.
Cluster 17 indicates that NLP and semantic web technologies play separate but complementary roles. Semantic web technologies aim to convert unstructured data into meaningful representations, which greatly benefit from the use of NLP technologies to enhance information visualization, semantic searching, connecting text to linked open data, and modeling user behavior in online platforms (Maynard and Bontcheva, 2016; Kapetanios et al., 2013).
Cluster 22 showed that there are some advances in semantic web tools for various purposes, which can be used by machine learning to make better predictions by exploiting semantic links in knowledge graphs and linked datasets (Kanza and Frey, 2019).
As for Cluster 18, there is a conceptual intersection between context and knowledge discovery, focusing on the LD approach. Tim Berners-Lee suggested a five-star deployment scheme for linked open data. He noted: “Link your data to other people's data to provide context [LD]” (Berners-Lee, 2006). Therefore, the identified interconnected data and semantic relationships in a special context are highly fruitful and vital for facilitating discovery and functionality to improve context-aware knowledge discovery (Cole et al., 2017). Similarly, Wang et al. (2014) revealed that knowledge management based on domain ontology is a hotspot in the knowledge discovery field.
Cluster 21 signifies that PLSA is considered a new statistical approach to develop applications in information retrieval and filtering, NLP, machine learning from text (Hofmann, 2013), and other related fields, e.g. improving text segmentation (Bestgen, 2006), realizing global behavior inference (Li et al., 2008), and representing topic-based multi-document summarization (Hennig, 2009).
Concerning Cluster 23, a better approach is required for users to retrieve information in terms of meaning or the conceptual topic of a document (Luis Morato et al., 2013). LSI is supposed to solve the lexical matching problem by retrieving data using statistically determined conceptual indexes rather than individual words. It assumes that there is a latent structure in word usage that is somewhat concealed by the changeability of word choices (Deerwester et al., 1990; Rosario, 2000). This is accomplished by simulating the inherent higher-order structure in the relationship between words and objects. It offers a potential solution to increase users' access to a wide range of textual resources and objects with textual descriptions (Dumais, 1994). Overall, LSI is useful in text similarity and classification (Sari, 2018; Zhen and Zhang, 2018).
As for Cluster 28, ontology-based techniques focusing on deep learning can be utilized for various purposes, including triple classification (Amador-Domínguez et al., 2021), predicting human behavior in social platforms (Phan et al., 2017), knowledge embeddings (Maldonado et al., 2017), knowledge graph completion (Amador-Domínguez et al., 2020), ontology embedding (Kulmanov et al., 2021), relation extraction and entity linking techniques, and knowledge representation methods (Alam et al., 2022), which can make the represented knowledge in an ontology available to learn features for machine learning models as inputs of a similarity function.
Regarding Cluster 25, OWL is a knowledge representation language for authoring ontologies. The OWL languages are characterized by formal semantics and are built upon RDF, the World Wide Web Consortium's (W3C) standard for describing objects. OWL offers three levels of increasing expressiveness: OWL Lite, OWL DL, and OWL Full. OWL is utilized as an international language in the semantic web for coding and exchanging ontologies to express knowledge based on object-oriented models and relationships between things through ontologies (Hitzler et al., 2012). OWL and RDF/RDFS serve similar purposes, but OWL is a stronger language with more annotation, more vocabulary, and a stronger syntax than RDF/RDFS. Additionally, OWL offers a way to express relationships across several ontologies using a common and standard annotation framework (Baker, 2012). RDF as a crosswalk model can enhance semantic metadata interoperability (Chen, 2015). Following a methodical approach, librarians can effectively take part in the procedure of generating linked data, and the process can enhance the quality of data sources (Vila-Suero and Gómez-Pérez, 2013).
Quadrant II presents clusters that are not axial but developing. In other words, clusters located in Quadrant II (Ivory Tower) have strong internal relationships and a good level of maturity in this field. These are not pivotal but well-developed, important, and isolated clusters. The reason for this isolation is that these clusters are merged, switched, developed, and matured to higher-level topics with new trends and operational considerations as they can be observed in Quadrant I as mainstream themes.
The clusters located in Quadrant III have a relatively discontinuous structure and are called underdeveloped. These are not structured because their topics are either new or evolving. In other words, they are marginal with little attention, in the transition phase, and chaotic due to a lack of internal and external relations.
Quadrant IV shows the central, immature, underdeveloped, basic, and transversal themes, e.g. service-oriented modeling. This topic will be extended more in the future to model software systems aiming to consider paradigms, e.g. micro-services, context-aware architecture, deep learning and machine learning techniques, new trends, application architecture, ontology alignment and merging, and ontology developments.
In terms of practical implications, policy-makers, designers, and developers of semantic technologies can collaborate to utilize the results of this study as a thematic policy map to become fully aware of past and evolving trends in the field. In terms of academic implications, semantic researchers, practitioners, and faculty members can use these insights to improve their understanding and prepare ahead of time to enhance scientific outputs practically, quantitatively, and qualitatively to develop themes in a balanced manner. The results may also be used to discover theme gaps, avoid repeated research, and uncover underlying patterns, core topics, and popular areas of the discipline. The insights offered here help semantic researchers effectively bridge the gap between theory and practice to push boundaries. Other potential beneficiaries include organizations and institutions that work with LD, e.g. libraries, museums, archives, and government agencies, which can use the findings to develop more effective strategies to manage and share their data. Besides, funding agencies can use the findings of this study to make informed decisions about investments in LD initiatives or other areas, e.g. education, healthcare, and bioinformatics. The LD approach can help media companies improve content discovery, advertise more effectively and purposefully through personalized recommendations, and optimize their content strategy, as demonstrated by the BBC, a well-known pioneer in this area.
Additionally, designers, high-tech firms, and semantic developers can make investments to design and power semantic tools for data mining, ontology development, category tagging, semantic search, and knowledge graph visualization, focusing on deep learning, NLP techniques, and algorithms as emphasized by mainstream clusters. The financial services industry can also benefit from applying LD approaches, e.g. risk management, fraud detection, investment analysis, and customer profiling.
This study also had certain limitations. The data were limited to records available in the WOS Core Collection, and the final dataset was influenced by the researcher-made query. Moreover, owing to the large volume of data, only words that had at least 500 co-occurrences were used in the final HC analysis.
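The co-occurrence threshold mentioned above can be illustrated with a minimal sketch. It assumes that a word's co-occurrences are the sum of its pairwise co-occurrences across records; the function names, the toy records, and the threshold of 3 are hypothetical and are not the authors' actual script or data.

```python
from collections import Counter
from itertools import combinations


def pair_counts(records):
    """Count how often each unordered keyword pair appears in the same record."""
    counts = Counter()
    for keywords in records:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def keywords_above_threshold(records, min_cooccurrences):
    """Keep keywords whose summed pairwise co-occurrences reach the threshold.

    Assumption: "co-occurrences of a word" means the sum over all pairs
    that contain the word. The study may have used a different definition.
    """
    totals = Counter()
    for (a, b), n in pair_counts(records).items():
        totals[a] += n
        totals[b] += n
    return {kw for kw, total in totals.items() if total >= min_cooccurrences}


# Toy records (hypothetical, not the study's data)
records = [
    ["ontology", "semantic web", "rdf"],
    ["ontology", "semantic web"],
    ["ontology", "linked data"],
]
kept = keywords_above_threshold(records, min_cooccurrences=3)
```

With the study's threshold, `min_cooccurrences` would be 500; raising it shrinks the keyword set passed to clustering, which is the trade-off the limitation describes.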
Conclusion
This study aimed to analyze thematic clusters and their developments in the context of LD by focusing on co-word analysis to visualize and present inclusive viewpoints about the field.
Nine major topic clusters were identified according to co-occurrences, and 29 thematic clusters were identified based on HC. Comparisons between the two clustering techniques demonstrated that three clusters, namely semantic bioinformatics, knowledge representation, and semantic tools, were common to both. Scientific publications in this field have been on the rise since 2016.
It is concluded that information science and library science is a promising field for opening up knowledge frontiers in LD. Moreover, bioinformatics had a large share of scholarly publications and can be regarded as a significant topic head.
Artificial intelligence and convolutional neural network emerged as current trends during 2018–2021. In addition, keywords such as semantic, semantic web, linked data, semantic integration, RDF, gene ontology, semantic segmentation, OWL, semantic similarity, NLP, and knowledge representation received extensive attention and were hot topics as well as crucial and dominant sub-categories (e.g. Sellami and Zarour, 2022). Moreover, ontology and semantic were the highest-frequency co-word pair.
The most mature and mainstream thematic clusters were natural language processing techniques to boost modeling and visualization, context-aware knowledge discovery, PLSA, semantic tools, LSI, OWL syntax, and ontology-based deep learning. The cluster entitled service-oriented modeling was an immature and underdeveloped theme.
Future studies should analyze the intellectual structure of knowledge in related themes, e.g. semantic web, ontology, semantic interoperability, and knowledge representation, to identify common concepts, clusters, and research gaps in these related disciplines. Based on the identified core and mainstream themes, the following topics are highly recommended: reasoning strategies (e.g. spatial, temporal, and contextual), deep learning on knowledge graphs, the application of formal ontology theories to knowledge representation, probabilistic knowledge graphs, methodological aspects of ontology development, knowledge graphs and deep semantics, conceptual analysis and ontology design, and machine learning in semantic computing. Such studies can investigate the strengths and weaknesses of these techniques and how they can be improved or combined with other methods to enhance their effectiveness. Future studies should also explore best practices and guidelines for organizations and institutions that work with the LD approach in sectors where LD can have the greatest impact on improving services, e.g. education, healthcare, and metadata management.
Note 1. Several LD experts confirmed the authenticity of the keywords, i.e. three subject experts who were interested in LD and had published papers on this subject approved and endorsed the query and the modifications.
Figure 1
Four quadrants in a strategic diagram
[Figure omitted. See PDF]
Figure 2
The network structure of keywords in the field of LD by using VOSviewer clustering
[Figure omitted. See PDF]
Figure 3
Cluster density visualization of keyword network in the field of LD from VOSviewer software
[Figure omitted. See PDF]
Figure 4
Strategic diagram of the 29 clusters by hierarchical clustering
[Figure omitted. See PDF]
Figure A1
Numbers and names of the clusters during 1991–2005 based on hierarchical clustering
[Figure omitted. See PDF]
Figure A2
Numbers and names of the clusters during 2006–2008 based on hierarchical clustering
[Figure omitted. See PDF]
Figure A3
Numbers and names of the clusters during 2009–2011 based on hierarchical clustering
[Figure omitted. See PDF]
Figure A4
Numbers and names of the clusters during 2012–2014 based on hierarchical clustering
[Figure omitted. See PDF]
Figure A5
Numbers and names of the clusters during 2015–2017 based on hierarchical clustering
[Figure omitted. See PDF]
Figure A6
Numbers and names of the clusters during 2018–2021 based on hierarchical clustering
[Figure omitted. See PDF]
Table 1
Keywords' changes after refining
| The word in the keyword column | The word converted after the refining | The word in the keyword column | The word converted after the refining |
|---|---|---|---|
| Databases | Database | Information-retrieval | Information retrieval |
| Thesauri | Thesaurus | Resource description framework | RDF |
| Ontologies | Ontology | Semantic-web | Semantic web |
| RDF data | RDF | Linked-data | Linked data |
| Vocabularies | Vocabulary | Linked (open) data | Linked data |
| Algorithms | Algorithm | Systems | System |
| Open linked data | Linked data | LOD | Linked data |
Source(s): Table 1 by authors
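The refinement in Table 1 amounts to a lookup from variant spellings to one canonical form. The following is a minimal sketch of that idea, assuming lower-cased canonical keywords (as they appear in Table 11); the names `REFINE` and `refine` are hypothetical and the mapping only covers the rows of Table 1.

```python
import re

# Variant -> canonical form, taken from Table 1 and lower-cased.
# Lower-casing the canonical forms is an assumption of this sketch.
REFINE = {
    "databases": "database",
    "thesauri": "thesaurus",
    "ontologies": "ontology",
    "rdf data": "rdf",
    "vocabularies": "vocabulary",
    "algorithms": "algorithm",
    "open linked data": "linked data",
    "information-retrieval": "information retrieval",
    "resource description framework": "rdf",
    "semantic-web": "semantic web",
    "linked-data": "linked data",
    "linked (open) data": "linked data",
    "systems": "system",
    "lod": "linked data",
}


def refine(keyword):
    """Return the canonical form of a keyword, or the cleaned keyword itself."""
    # Trim, lower-case, and collapse internal whitespace before the lookup.
    key = re.sub(r"\s+", " ", keyword.strip().lower())
    # Hyphens are mapped explicitly rather than stripped globally, so that
    # keywords such as "e-learning" or "owl-s" are left untouched.
    return REFINE.get(key, key)


refined = [refine(k) for k in ["Ontologies", "LOD", "Semantic-web", "e-learning"]]
```

An explicit table keeps every merge decision visible and reviewable, which matters when subject experts are asked to approve the modifications.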
Table 2
Top scientific publications in the field of LD from WOS
| Web of Science categories | Computer science information systems (2285) | Computer science theory methods (20,627) | Computer science artificial intelligence (19,604) | Engineering electrical electronic (13,648) | Computer science software engineering (9413) |
|---|---|---|---|---|---|
| Research Areas | Computer Science (51,639) | Engineering (18,963) | Psychology (5480) | Telecommunications (4874) | Linguistics (4766) |
| Publication Year | 2020 (8539) | 2019 (8398) | 2018 (7647) | 2017 (7471) | 2016 (7249) |
| Organization | CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS (1991) | UNIVERSITY OF CALIFORNIA SYSTEM (1566) | CHINESE ACADEMY OF SCIENCES (1479) | UNIVERSITY OF LONDON (1113) | STATE UNIVERSITY SYSTEM OF FLORIDA (766) |
| Authors | Zhang Y (267) | Liu Y (264) | Wang Y (247) | Li Y (220) | Zhang L (217) |
| Countries | USA (19,043) | PEOPLES R CHINA (16,086) | ENGLAND (7368) | GERMANY (6876) | FRANCE (5459) |
Source(s): Table 2 by authors
Table 3
Labels of the nine main clusters
| Cluster | Labels |
|---|---|
| The 1st cluster | Semantic cognition |
| The 2nd cluster | Semantic analysis through algorithm-based techniques |
| The 3rd cluster | Ontology-based knowledge representation |
| The 4th cluster | Semantic image segmentation |
| The 5th cluster | Semantic web technologies |
| The 6th cluster | Semantic bioinformatics |
| The 7th cluster | Semantic web focusing on linked data approach |
| The 8th cluster | Formal and operational semantic |
| The 9th cluster | Semantic standards, databases, and tools |
Source(s): Table 3 by authors
Table 4
Numbers and names of the clusters during 2006–2008 based on hierarchical clustering
| Numbers of the clusters in the time series | Number of the cluster | Label of the cluster | Words in clusters |
|---|---|---|---|
| 2 clusters during 2006–2008 | 8 | Context-based retrieval | Analytic Hierarchical Process, analytical intelligence, Annotation based image retrieval, Bayesian network, semantic web, corrosion data markup language, Cross-language information retrieval, customer knowledge management, data integration, database, Domain specific language, domain specific modeling, domain specific modeling language, enterprise architecture, enterprise service bus, formal ontology, fuzzy c-mean clustering, Granular Soft Computing, graphics processing unit, hidden Markov model, Hierarchical Markov Model Mediator, Hierarchical Network of concept theory, Human Computer Interaction, Human language technology, Information Flow model, information retrieval, intelligent robot, Knowledge Acquisition, knowledge base system, Knowledge Graph, knowledge management system, knowledge point, knowledge representation, knowledge representation, Labeled Transition System, layered depth image, Learning Services, Learning Management System, length of diffusion, Ontology Web Language, ontology web language service, product data management system, product information specification, product lifecycle management, protein function prediction, Quality Function Deployment, rdfs, read data management system, reasoning, relational data model, RFID, semantic, semantic annotations for WSDL, Semantic Business Process Management, Semantic closure, semantic fields, Semantic Interoperability, semantic memory, semantic organization, semantic overlay network, Semantic Search Agent, semantic selection restriction, semantic similarity, semantic structure, semantic subspace projection, Structured Operational semantic, support vector machine, tacit knowledge, technology transfer platform, Text Based Image Retrieval, text mining, the Unified Medical Language System, theory, thesaurus, transcription factor binding site, urban ontology, vagueness, verb semantic, verification, virtual reality, visual analytics, visual language, Web 2.0, Web Accessibility Initiative, Web Content Accessibility Guidelines, Web feature service, web service, word sense (disambiguation), word sense disambiguation, WordNet, Workflow Enactment Tier, XML Linking Language, XQuery |
| | 9 | recommendation systems | reasoning, Basic Formal Ontology, Bayesian network, data warehousing process, data-mining, design management activities, design structure matrix, Discrete Event Simulation, enterprise information system, XML, function-behavior-structure model, formal concept analysis, geographic information system, integrated information platform, intelligent transportation system, Knowledge Node, model driven architecture, multiple attribute decision making, NATO Network Enabled Capabilities, natural language processing, ontology, qualities of service, rdf, RDF Schema, Recommendation System, right expression language, Transforming the Radiologic Interpretation Process, URI |
Source(s): Table 4 by authors
Table 5
Numbers and Names of the Clusters during 1991–2005 based on Hierarchical Clustering
| Numbers of the clusters in the time series | Number of the cluster | Label of the cluster | Words in clusters |
|---|---|---|---|
| 7 clusters during 1991–2005 | 1 | Unified modeling language (UML) | Common Information Model, Substation Automation System, Substation Configuration Language, UML |
| | 2 | Knowledge representation | indirect semantic priming, structural operational semantic, algebraic semantic, approximate fuzzy data model, artificial intelligence course, Bayesian information criterion BIC, business process management, category-based semantic field, computer integrated manufacturing and engineering, Competitive intelligence system, competence ontology, comprehensive spatiotemporal data model, concurrent execution model power set semantic, content-based retrieval, Context focus techniques, Customer Relationship Management, data integration, data mining, database, degree of entity-relationship E-R model, extensible modeling and simulation framework, Formal Description Technique, Formal Description Techniques, FORMAL semantic, fuzzy knowledge and meta knowledge, fuzzy rule-based system, KWIC Key Word In Context display, language, Object-Oriented conceptual model, parametric model-based analysis, Performance Requirements Framework, query optimization, RDFS algebra, RDFS query language, relational semantic, RDF Site Summary, semantic memory, semantic of wh-words, semantic priming, semantic processing, semantic query optimization |
| | 3 | knowledge acquisition | static and dynamic integrity constraints, functional magnetic resonance imaging fMRI, knowledge acquisition computer, latent semantic indexing LSI, neural network computer, semantic, specific language impairment, universal grammar |
| | 4 | Case-based reasoning | Car Life Ontology, Case-Based Reasoning, Intelligent Transportation system ITS, Problem-Solving Ontology, Problem-Solving Method, Symmetrical and Active Service Model |
| | 5 | Science data analysis network | social science data analysis network, geographic information system, national STEM digital library, NSF national science foundation, STEM science, technology, engineering |
| | 6 | Fuzzy-crisp semantic | fuzzy crisp matrix, fuzzy crisp matrix model, fuzzy crisp matrix sequential consequence, fuzzy crisp model, fuzzy crisp semantic, fuzzy crisp semantic sequential consequence, fuzzy crisp valuation |
| | 7 | An entity-relationship model (ERM) | agent service description language, cognitive task analysis, constraint logic programming, content repository management system, DAMLOIL, decision support system generator, Digital Archives, Mining, entity relationship diagram, XML database, graphical user interface, Human-Computer Interaction, Information Retrieval, information sources, Integrated web architecture Mediator and Wrappers, intelligent tutoring system, Korea Educational Metadata, knowledge interchange format, multi-agent system, ontology, ontology inference layer, Product Lifecycle Management PLM, rdf scheme, support vector machine, user interface design, web grid service, web usage mining |
Source(s): Table 5 by authors
Table 6
Numbers and Names of the Clusters during 2009–2011 based on Hierarchical Clustering
| Numbers of the clusters in the time series | Number of the cluster | Label of the cluster | Words in clusters |
|---|---|---|---|
| 6 clusters during 2009–2011 | 10 | Development of different language modeling | (Semi-)structured documents, Activity control model, Acute phase protein, Artificial General Intelligence, Association link network, Basic formal ontology, bioinformatics, Common Item Based Classifier, Computer Aided Design, Digital ecosystem, digital elevation model, Digital Human Model, discourse argumentative/newspaper discourse, Discourse Representation Theory, discovery feature sub-space model, Domain Operating Platform, Domain-Specific Modeling Language, eLearning, Electronic Health Record, Electronic Medical Record, factor analysis, Fault semantic network, Formal Concept Analysis, functional data analysis, Granular Computing, Human-Computer Interaction, Information management for Collaborative network, Information Retrieval, multivariate image analysis, Ontological URI, ontology, Ontology Learning, Ontology-Driven Compositional System, outage management system, oxide thickness fluctuations, part-of-speech disambiguation, Personalized Recommendation in E-commerce, representations (procedural and rule based), (Simple Concurrent Object Oriented Programming), Semantic (associative) network, semantic (European language), Semantic (meta-)hooking, semantic classification, Semantic Normal Form, Semantic priming, Semantic search, semantic space model, Semantic Structure, semantic tableau, Semantic web technology, sentence category, Service discovery, Simple Knowledge Organization System, Simple Sequence Repeat, statistical language modeling, structural equation modeling, term rewriting, thematic (semantic) role labeling, thin buried oxide, Unified Enterprise molding Language, UML, Unifying theories of programming, U-service (Ubiquitous-Service) Ontology, visual semantic algebra, Visual Space Graph, OWL, web service, Web Service Discovery, Weighted Association Rule Mining, wildlife model, Word Lexical Semantic Similarity Measurement, word recognition, WordNet, XML |
| | 11 | Enhancing resource descriptions by application-profiles | DCMI Application Profile, resource description editor |
| | 12 | Developing service-oriented modeling | Description Logic, Enterprise architecture management system, Java Expert System Shell, OWL, service-oriented Model, Small and Medium Enterprises, Web Map Service |
| | 13 | Semantic annotations | annotations for the Semantic Web, modeling language resource and annotations, Open Knowledge Foundation |
| | 14 | virtual knowledge graphs | Electronic patient record, General Relationship Model, multiagent system, Outcomes Assessment (Healthcare), pragmatics, semantic |
| | 15 | Fuzzy Formal Concept Analysis (FFCA) | Contextual question answering, dynamic throughput graph, Fuzzy Formal Concept Analysis, Home Energy Management system, information and computer ethics, integrated object-process modeling, network management system, Qualitative Data Analysis, self-organizing maps, service-oriented architecture, spatial decision support system, Web service resource framework |
Source(s): Table 6 by authors
Table 7
Numbers and Names of the Clusters during 2012–2014 based on Hierarchical Clustering
| Numbers of the clusters in the time series | Number of the cluster | Label of the cluster | Words in clusters |
|---|---|---|---|
| 4 clusters during 2012–2014 | 16 | Semantic Web content creation, annotation, and extraction | Application Programming Interfaces, Artificial intelligence, Artificial neural network, associative network, AudioVisual Description Profile, basic formal ontology, Bayesian Association with Missing Data, Biomedical Research Integrated Domain Group, Bloom's Taxonomy, Brain Computer Interface, Cognitive Informatics, composite semantic web service, Computer aided process planning, Distributed management system, Distributed Missions Operations, Document Object Model, Fuzzy description logic, fuzzy markup language, geospatial information system, Healthcare Information System, Internet/World Wide Web, interoperability, Learning Object Repository, Learning Objects, Library Linked Data Incubator Group: Use Case, Linguistic linked data cloud, linked data, linked data Enabled Bibliographical Data, Linked Data in Linguistics, linked data-Enabled Bibliographical Data, Linked Heritage Project, Linked open Vocabulary, Medical subject headings, metadata, Meter data integration, natural language understanding, Natural Semantic Metalanguage, Neural Network, NoSQL, ontological cognition, Ontological Semantic Theory of Humor, Ontology Learning From Text, Ontology Web Language for service, Ontology-based Data Integration, Semantic network analysis, Semantic overlay network, Semantic processing, Semantic Technology, Semantic Textual Similarity, semantic theory, semantic transparency, Semantic Web service, semantic++ MapReduce, Semi-supervised clustering, service oriented Architecture, SNOMED CT, Social Media, Social Network Ontology Collaborative platform, spatial pyramid, support vector machine, Syntax, Tag-based integrated semantic ontology, Text mining, the method of semantic structures, thesaurus, Web operating system, Web service, Web Service Discovery, Wikipedia, WordNet, XML |
| | 17 | Natural language processing techniques to boost modeling and visualization | Attributed relational graph, Building Information Modeling, Cognitive Internet of Things, cognitive radio network, Geographic Information System, Named Entity Recognition, Natural Language Processing, semantic, Unified process for ontology, User Interface, Visual Analytics |
| | 18 | context-aware knowledge discovery | DARPA Agent Markup Language, Extraction, transformation and loading, Formal Concept Analysis, Functional Requirements for Bibliographic record (FRBR), Knowledge Discovery in database Process, Linking Open Data, Method Oriented Architecture, Quality of service, RDF, Region Based Image Retrieval, Semantic Web, Semantic Web rule language, SPARQL, OWL, XML |
| | 19 | ontology alignment to enhance semantic integration | case based reasoning, Clinical decision support system, computed tomographic images, Computer-Supported Collaborative Work, database, electronic health recording, Health Level Seven International (HL7), Information Extraction, Information technology infrastructure library, Knowledge Management, ontology, ontology alignment, overlapping area matrix, semantic integration, Semantic Web rule language, word sense disambiguation |
Source(s): Table 7 by authors
Table 8
Numbers and Names of the Clusters during 2015–2017 based on Hierarchical Clustering
| Numbers of the clusters in the time series | Number of the cluster | Label of the cluster | Words in clusters |
|---|---|---|---|
| 7 clusters during 2015–2017 | 20 | Ontology engineering | Actionable Knowledge Discovery, Arabic Named Entity Recognition, Arabic WordNet, Arabic WordNet ontology, Artificial Intelligence, Artificial Neural Network Fuzzy Inference System, Assistive technology, Associated value, bag of word vectors, Bayesian Network, Bibliography of Linguistic Literature, big data, Case-based reasoning, Chinese thesaurus, Class/(Deep)Semantic Binary Codes, Content-based image retrieval, Content-based medical image retrieval, context based query weighting, Data Manipulation Language Operations, Electronic Health Record, Encoded Archival Description, event-related brain potentials, extension-based semantic, Formal Concept Analysis, Fuzzy diabetic ontology, Fuzzy inference system, Fuzzy object-oriented database model, Gene annotations, Graphical User Interfaces, Human Computer Interaction, Human Phenotype Ontology, Information Retrieval, Interaction Network Ontology, interference, Interoperability and System Analysis and Design, Knowledge Extraction, Knowledge Base, Knowledge discovery database, Knowledge graph, Knowledge Management, Knowledge Modeling, Knowledge representation, Knowledge sharing, Latent Semantic Analysis, Levels of Information system Interoperability Model, lexical (semantic) fields, Lexical semantic, Medical Subject Headings, Named Entity Recognition, Natural Language Processing, ontology (computer science), Ontology Web Language for service, Ontology-based data access, Ontology-based multimedia information retrieval, political movements (Thesaurus), protein-protein interaction network analysis, RDF graph, RDF(S) and SPARQL, Relation inference, representation, Resource Description and Access, semantic, semantic annotation based clustering, SEMANTIC ASSOCIATION, Semantic interoperability, semantic similarity, Semantic technology, Semantic Web of Things, Sentiment Analysis, Unified Medical Language System, Web service, Web service Description Language, Word Map (Semantic Mapping), Word sense disambiguation, Wordnet |
| | 21 | probabilistic latent semantic analysis (PLSA) | Bag-of-topics, bag-of-words, latent Dirichlet allocation, probabilistic latent semantic analysis, probabilistic topic model |
| | 22 | Semantic tools | Data Quality, Description logic, Korean Cultural Heritage Data Model, Library of Congress Subject Headings, linked data, spatial image information mining, thesaurus |
| | 23 | Latent Semantic Indexing (LSI) | Correlation Topic Modeling, Latent Semantic Indexing, Object-oriented software, Ontology Learning, Probabilistic Latent Semantic Indexing |
| | 24 | bibliographic conceptual ER models | Bibliographic Framework (BIBFRAME), Environmental information system, FRBR Object-Oriented (FRBRoo), Functional Requirements for Bibliographic record (FRBR), Fuzzy Ontology Generation framework, Knowledge Organization system, Semantic Web |
| | 25 | OWL | DARPA Agent Markup Language, XML, eXtensible Style Language Transformations, OWL API, RDF, OWL, XML Schema Definition |
| | 26 | Linked open rules and vocabularies | Basic Formal Ontology, building automation system, building management system, Current Research Information system, feature model ontology, fuzzy markup language, Health Level 7, Internet of Things, Linked Open Rules, Linked Open Vocabulary, Model-driven Interoperability, object-oriented image analysis, ontology, Ontology Definition Metamodel, ontology-driven engineering |
Source(s): Table 8 by authors
Table 9
Numbers and Names of the Clusters during 2018–2021 based on Hierarchical Clustering
| Numbers of the clusters in the time series | Number of the cluster | Label of the cluster | Words in clusters |
|---|---|---|---|
| 3 clusters during 2018–2021 | 27 | Semantic bioinformatics | Action semantic, artificial intelligence, Bag of concept, bag of visual words, Building Information Modeling, cancer biology, data integration, Digital Transformation, Dublin core, Enterprise Metamodel, XML, Fuzzy cognitive maps, Fuzzy semantic representation, Gene ontology, Genetic programming, human visual system, Human-robot interaction, Hypertext transfer protocol, hyperspectral image classification, Knowledge Base, Knowledge graph, knowledge graph completion, knowledge graph embedding, Knowledge Management, Latent Dirichlet allocation, Latent Semantic Analysis, latent semantic indexing, learning artificial intelligence, Named entity recognition, Named Entity Recognizer, Names of medicinal plants phytonyms, Narcotic UNESCO Thesaurus, object-based image analysis, Object-Oriented Ontology, ontology artificial intelligence, Ontology Definition Metamodel ODM, ontology integration, Ontology Web Service Language, Open Data Protocol OData service, Open web application security project, OWL, probabilistic latent semantic analysis, Program for Cooperative Cataloging, Proof-theoretic semantic, Protein-protein interaction, Protein-protein interaction network, Prototype Data Model PDM, Question Answering, RDF, RDFS, Relational Database RDB, Relational, Relational ontology, Relationships, Resource Description and Access RDA entities, semantic analysis, semantic feature extraction, Semantic Text Similarity, Semantic theory of survey response, Semantic Web, Semantic Web Rule Language, semantic-pragmatic cycle, social ontology, SPARQL, Structure Query Language, Subjectivity and sentiment analysis, text analysis, thesaurus, Word-to-word semantic relevance matrix, World Health Organization, W3C |
| | 28 | Ontology-based deep learning | Artificial neural network, augmented reality, information retrieval, Knowledge Organization system, Lexicon Model for ontology Lemon, machine learning, natural language processing, ontology, Open Biomedical Annotator, semantic Web of Things, subject area's information, Web of Things |
| | 29 | Ontology-based data access | attribute profiles, convolutional neural network, Database of Names and Biographies, Deep learning, deep neural network, Deep residual network ResNet, Digital elevation model, Dirichlet process, functional near-infrared spectroscopy, fuzzy class probability, generative adversarial network, Geographical Shared Data Source, graph convolutional network, Human-Object Interactions, Internet-of-Things, Lifelong topic model, machine-to-machine, Natural language understanding, object-based image classification, ontology-based data access, semantic, TF-IDF |
Source(s): Table 9 by authors
Table 10
Degree of centrality and density of the clusters from UCINet
| Cluster | Density | Centrality |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 0.002 | 0.009 |
| 3 | 1 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 1 | 0 |
| 8 | 0.003 | 0.015 |
| 9 | 1 | 0 |
| 10 | 0.001 | 0.025 |
| 11 | 0 | 0 |
| 12 | 0.048 | 0.167 |
| 13 | 0 | 0 |
| 14 | 0 | 0 |
| 15 | 0.015 | 0.091 |
| 16 | 0.002 | 0.013 |
| 17 | 0.673 | 0.278 |
| 18 | 0.661 | 0.255 |
| 19 | 1 | 0 |
| 20 | 0.002 | 0.021 |
| 21 | 0.500 | 0.833 |
| 22 | 0.667 | 0.467 |
| 23 | 0.700 | 0.500 |
| 24 | 0.571 | 0.600 |
| 25 | 0.679 | 0.429 |
| 26 | 1 | 0 |
| 27 | 0.004 | 0.036 |
| 28 | 0.803 | 0.236 |
| 29 | 1 | 0 |
Source(s): Table 10 by authors
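The density and centrality values in Table 10 are the coordinates that place each cluster in the strategic diagram. The sketch below shows one possible way to assign quadrants, assuming the cut-offs are the mean density and mean centrality; the study does not state its cut-offs here, so the output may not reproduce Figure 4 exactly. Quadrant I (central and mature) and Quadrant IV (central but immature) follow the text; the labels for Quadrants II and III follow the conventional strategic-diagram layout and are an assumption. The function name `quadrant` is hypothetical.

```python
from statistics import mean

# (cluster, density, centrality) as reported in Table 10
TABLE_10 = [
    (1, 0, 0), (2, 0.002, 0.009), (3, 1, 0), (4, 0, 0), (5, 0, 0),
    (6, 0, 0), (7, 1, 0), (8, 0.003, 0.015), (9, 1, 0), (10, 0.001, 0.025),
    (11, 0, 0), (12, 0.048, 0.167), (13, 0, 0), (14, 0, 0),
    (15, 0.015, 0.091), (16, 0.002, 0.013), (17, 0.673, 0.278),
    (18, 0.661, 0.255), (19, 1, 0), (20, 0.002, 0.021), (21, 0.500, 0.833),
    (22, 0.667, 0.467), (23, 0.700, 0.500), (24, 0.571, 0.600),
    (25, 0.679, 0.429), (26, 1, 0), (27, 0.004, 0.036), (28, 0.803, 0.236),
    (29, 1, 0),
]


def quadrant(density, centrality, density_cut, centrality_cut):
    """Assign a strategic-diagram quadrant from density and centrality."""
    dense = density > density_cut
    central = centrality > centrality_cut
    if central and dense:
        return "I"    # central and mature (mainstream themes)
    if dense:
        return "II"   # developed but peripheral (assumed conventional label)
    if central:
        return "IV"   # central but immature (basic, transversal themes)
    return "III"      # peripheral and immature (assumed conventional label)


# Assumption: the quadrant boundaries are the means of the two indicators.
density_cut = mean(d for _, d, _ in TABLE_10)
centrality_cut = mean(c for _, _, c in TABLE_10)

quadrants = {
    cluster: quadrant(d, c, density_cut, centrality_cut)
    for cluster, d, c in TABLE_10
}
```

Using the median instead of the mean is an equally common choice and can move clusters that sit near a boundary, so the cut-off should be reported alongside any diagram.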
Table 11
High-frequency keywords in the nine main clusters from VOSviewer software
| Rank | Keyword | Frequency of Co-occurrence | Link | Total link strength |
|---|---|---|---|---|
| The first cluster: 73 keywords/total co-occurrences: 28,323/links: 10,231/total link strength: 48,114 | ||||
| 1 | semantic | 7407 | 462 | 12,408 |
| 2 | semantic memory | 1217 | 147 | 1796 |
| 3 | language | 821 | 242 | 1981 |
| 4 | semantic priming | 586 | 89 | 489 |
| 5 | epistemology | 543 | 118 | 1007 |
| 6 | semantic processing | 486 | 107 | 445 |
| 7 | semantic dementia | 471 | 69 | 607 |
| 8 | FMRI | 469 | 118 | 888 |
| 9 | syntax | 466 | 153 | 1012 |
| 10 | pragmatics | 385 | 110 | 719 |
| The second cluster: 60 keywords/total co-occurrences: 22,578/links: 9384/total link strength: 37,697 | ||||
| 1 | natural language processing | 1341 | 303 | 2628 |
| 2 | semantic similarity | 1307 | 263 | 1676 |
| 3 | information retrieval | 1150 | 283 | 2240 |
| 4 | knowledge graph | 982 | 217 | 1097 |
| 5 | machine learning | 981 | 318 | 2208 |
| 6 | data mining | 825 | 287 | 1820 |
| 7 | semantic analysis | 807 | 247 | 837 |
| 8 | semantic network | 772 | 256 | 873 |
| 9 | semantic annotation | 686 | 225 | 1004 |
| 10 | latent semantic analysis | 615 | 178 | 565 |
| The third cluster: 71 keywords/total co-occurrences: 36,063/links: 8856/total link strength: 56,565 | ||||
| 1 | ontology | 18,456 | 457 | 25,353 |
| 2 | owl | 1285 | 286 | 2599 |
| 3 | knowledge representation | 1119 | 292 | 2129 |
| 4 | knowledge management | 922 | 256 | 1702 |
| 5 | interoperability | 871 | 246 | 1764 |
| 6 | semantic interoperability | 669 | 189 | 973 |
| 7 | domain ontology | 578 | 198 | 691 |
| 8 | description logic | 470 | 161 | 816 |
| 9 | reasoning | 431 | 208 | 992 |
| 10 | semantic technology | 410 | 193 | 631 |
| The fourth cluster: 48 keywords/total co-occurrences: 10,475/links: 4786/total link strength: 23,356 | ||||
| 1 | semantic segmentation | 1520 | 101 | 1649 |
| 2 | deep learning | 1002 | 193 | 1958 |
| 3 | data model | 982 | 256 | 1330 |
| 4 | feature extraction | 542 | 163 | 1972 |
| 5 | visualization | 512 | 241 | 1440 |
| 6 | convolutional neural network | 464 | 99 | 870 |
| 7 | neural network | 430 | 212 | 1017 |
| 8 | task analysis | 331 | 148 | 1554 |
| 9 | image segmentation | 277 | 85 | 941 |
| 10 | semantic information | 275 | 119 | 226 |
| The fifth cluster: 48 keywords/total co-occurrences: 9424/links: 5036/total link strength: 18,429 | ||||
| 1 | web service | 1008 | 262 | 2111 |
| 2 | internet of things | 768 | 223 | 1539 |
| 3 | semantic web service | 647 | 160 | 907 |
| 4 | multi-agent system | 453 | 168 | 785 |
| 5 | context | 450 | 269 | 930 |
| 6 | agent | 294 | 168 | 615 |
| 7 | security | 265 | 174 | 603 |
| 8 | owl-s | 229 | 80 | 416 |
| 9 | service discovery | 217 | 98 | 444 |
| 10 | service composition | 216 | 99 | 429 |
| The sixth cluster: 46 keywords/total co-occurrences: 11,518/links: 3746/total link strength: 14,427 | ||||
| 1 | data integration | 3409 | 322 | 3633 |
| 2 | gene ontology | 1652 | 134 | 1108 |
| 3 | turtle | 880 | 47 | 169 |
| 4 | database | 558 | 251 | 1234 |
| 5 | bioinformatics | 360 | 151 | 815 |
| 6 | taxonomy | 264 | 161 | 472 |
| 7 | modeling | 240 | 15 | 63 |
| 8 | microarray | 212 | 49 | 319 |
| 9 | gene expression | 198 | 77 | 363 |
| 10 | system biology | 183 | 85 | 375 |
| The seventh cluster: 30 keywords/total co-occurrences: 13,513/links: 3686/total link strength: 24,615 | ||||
| 1 | semantic web | 5952 | 401 | 10,820 |
| 2 | linked data | 2869 | 325 | 4060 |
| 3 | metadata | 644 | 237 | 1391 |
| 4 | e-learning | 458 | 163 | 915 |
| 5 | thesaurus | 451 | 172 | 718 |
| 6 | annotation | 263 | 170 | 590 |
| 7 | evaluation | 244 | 179 | 453 |
| 8 | cultural heritage | 174 | 101 | 330 |
| 9 | open data | 164 | 93 | 347 |
| 10 | vocabulary | 162 | 125 | 362 |
| The eighth cluster: 29 keywords/total co-occurrences: 5079/links: 2526/total link strength: 7874 | ||||
| 1 | formal semantic | 468 | 115 | 356 |
| 2 | operational semantic | 399 | 69 | 253 |
| 3 | denotational semantic | 305 | 39 | 167 |
| 4 | algorithm | 276 | 194 | 721 |
| 5 | verification | 243 | 108 | 498 |
| 6 | theory | 237 | 113 | 595 |
| 7 | performance | 234 | 181 | 559 |
| 8 | uml | 228 | 101 | 365 |
| 9 | design | 210 | 178 | 604 |
| 10 | simulation | 199 | 134 | 347 |
| The ninth cluster: 21 keywords/total co-occurrences: 7380/links: 2589/total link strength: 14,301 | ||||
| 1 | rdf | 1769 | 308 | 3101 |
| 2 | big data | 885 | 263 | 1918 |
| 3 | nosql | 772 | 161 | 1206 |
| 4 | sparql | 742 | 224 | 1729 |
| 5 | xml | 506 | 190 | 1018 |
| 6 | cloud computing | 498 | 185 | 939 |
| 7 | relational database | 259 | 110 | 492 |
| 8 | nosql database | 239 | 77 | 279 |
| 9 | mapping | 189 | 134 | 428 |
| 10 | mongodb | 176 | 69 | 349 |
Source(s): Table 11 by authors
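The Link and Total link strength columns in Table 11 follow VOSviewer's item attributes: the number of other keywords an item is linked to, and the summed strength of those links. A minimal sketch of how the two attributes relate to pairwise co-occurrence counts is given below; the function name `link_stats` and the toy weights are hypothetical, and the input is assumed to list each unordered pair only once.

```python
from collections import defaultdict


def link_stats(pair_weights):
    """Return {keyword: (links, total_link_strength)} from pairwise weights.

    pair_weights maps an unordered keyword pair (a, b) to its co-occurrence
    count. Assumption: each pair appears once, and self-pairs are ignored.
    """
    links = defaultdict(int)
    strength = defaultdict(int)
    for (a, b), weight in pair_weights.items():
        if a == b or weight <= 0:
            continue  # a keyword does not link to itself
        for keyword in (a, b):
            links[keyword] += 1          # one more distinct neighbour
            strength[keyword] += weight  # add the strength of this link
    return {keyword: (links[keyword], strength[keyword]) for keyword in links}


# Toy weights (hypothetical, not the study's data)
pair_weights = {
    ("ontology", "semantic web"): 5,
    ("ontology", "rdf"): 2,
    ("rdf", "semantic web"): 1,
}
stats = link_stats(pair_weights)
```

This makes the difference between the two columns visible: a keyword with many weak links can have a high link count but a modest total link strength, as with turtle or modeling in the sixth cluster.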
Table 12
Top high-frequency co-word pairs
| Time period | Square matrix size | Top ten keyword co-occurrences |
|---|---|---|
| 1991–2005 | 262 × 262 | ontology * ontology (1221) |
| | | semantic * semantic (824) |
| | | semantic * ontology (40) |
| | | ontology * semantic (40) |
| | | turtle * turtle (25) |
| | | semantic memory * semantic memory (19) |
| | | NASA thesaurus * NASA thesaurus (9) |
| | | semantic priming * semantic priming (7) |
| | | web service * web service (5) |
| | | linkage -genetics_ * linkage -genetics_ (4) |
| 2006–2008 | 288 × 288 | ontology * ontology (2054) |
| | | semantic * semantic (669) |
| | | semantic * ontology (69) |
| | | ontology * semantic (69) |
| | | semantic web * semantic web (21) |
| | | web service * web service (11) |
| | | limit of detection (linked data) * limit of detection (linked data) (8) |
| | | RDFS * RDFS (7) |
| | | RDF * RDF (6) |
| | | gene ontology * gene ontology (5) |
| 2009–2011 | 367 × 367 | ontology * ontology (1932) |
| | | semantic * semantic (958) |
| | | semantic web * semantic web (715) |
| | | semantic web * ontology (103) |
| | | ontology * semantic web (103) |
| | | ontology * semantic (73) |
| | | semantic * ontology (73) |
| | | OWL * OWL (10) |
| | | semantic memory * semantic memory (8) |
| | | web service * web service (7) |
| 2012–2014 | 266 × 266 | ontology * ontology (2007) |
| | | semantic * semantic (1098) |
| | | semantic web * semantic web (721) |
| | | semantic web * ontology (78) |
| | | ontology * semantic web (78) |
| | | semantic * ontology (64) |
| | | ontology * semantic (64) |
| | | web ontology language (OWL) * web ontology language (OWL) (19) |
| | | RDF * RDF (16) |
| | | gene ontology (GO) * gene ontology (GO) (10) |
| 2015–2017 | 232 × 232 | ontology * ontology (2218) |
| | | semantic * semantic (1301) |
| | | semantic web * semantic web (620) |
| | | linked data * linked data (527) |
| | | ontology * semantic (85) |
| | | semantic * ontology (85) |
| | | semantic web * ontology (60) |
| | | ontology * semantic web (60) |
| | | semantic web * linked data (43) |
| | | linked data * semantic web (43) |
| 2018–2021 | 165 × 165 | semantic * semantic (1537) |
| | | ontology * ontology (1130) |
| | | ontology * semantic (48) |
| | | semantic * ontology (48) |
| | | semantic segmentation * semantic segmentation (36) |
| | | learning -artificial intelligence * learning -artificial intelligence (31) |
| | | convolutional neural network (CNN) * convolutional neural network (CNN) (17) |
| | | ontology -artificial intelligence * ontology -artificial intelligence (15) |
| | | semantic web * semantic web (14) |
| | | convolutional neural network (CNN) * semantic (10) |
Source(s): Table 12 by authors
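The mirrored entries in Table 12 (for example, semantic * ontology and ontology * semantic carrying the same count) are what one would expect from a square, symmetric co-occurrence matrix. The following Python sketch illustrates the general idea only; it is not the authors' procedure. It assumes each record is a list of keywords, treats a diagonal cell as the number of records containing that keyword, and uses invented sample records. The names `cooccurrence_matrix` and `sample_records` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(records):
    """Count keyword co-occurrences over a list of keyword lists.

    Returns (keywords, counts), where counts[(a, b)] is the number of
    records containing both a and b, and counts[(a, a)] is the number
    of records containing a (an assumed convention for the diagonal).
    """
    counts = Counter()
    for record in records:
        unique = sorted(set(record))  # count each keyword once per record
        for kw in unique:
            counts[(kw, kw)] += 1  # diagonal cell
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1  # off-diagonal cell
            counts[(b, a)] += 1  # mirrored cell keeps the matrix symmetric
    keywords = sorted({kw for kw, _ in counts})
    return keywords, counts


# Hypothetical records, invented purely for illustration.
sample_records = [
    ["ontology", "semantic web"],
    ["ontology", "semantic web", "rdf"],
    ["ontology"],
]
keywords, counts = cooccurrence_matrix(sample_records)
```

With these sample records the matrix is 3 × 3, and the cell for ontology and semantic web holds the same value in both orientations, which is why each off-diagonal pair appears twice in a ranking taken over the full matrix.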
Table 13
Quadrants for 29 clusters located in the strategic diagram
| Quadrant II: Developed but isolated | Quadrant I: Mainstream |
|---|---|
| Quadrant III: Chaos/Unstructured/emerging | Quadrant IV: Basic and transversal |
Source(s): Table 13 by authors
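In co-word analysis, a strategic diagram conventionally plots each cluster by centrality (horizontal axis) and density (vertical axis), and the quadrant labels in Table 13 follow from which side of the origin a cluster falls on. The Python sketch below shows one way such an assignment could be made. It assumes the origin is the mean centrality and mean density across clusters (a median is also commonly used); the article does not state which threshold was applied. The function `assign_quadrants` and the sample values are hypothetical.

```python
from statistics import mean

# Keys are (centrality is at or above origin, density is at or above origin).
QUADRANT_LABELS = {
    (True, True): "Quadrant I: Mainstream",
    (False, True): "Quadrant II: Developed but isolated",
    (False, False): "Quadrant III: Chaos/Unstructured/emerging",
    (True, False): "Quadrant IV: Basic and transversal",
}


def assign_quadrants(clusters):
    """Map each cluster name to a quadrant label.

    clusters: dict of name -> (centrality, density).
    The origin is assumed to be the mean of each measure.
    """
    origin_c = mean(c for c, _ in clusters.values())
    origin_d = mean(d for _, d in clusters.values())
    return {
        name: QUADRANT_LABELS[(c >= origin_c, d >= origin_d)]
        for name, (c, d) in clusters.items()
    }


# Hypothetical centrality and density values, invented for illustration.
sample_clusters = {
    "A": (0.9, 0.8),
    "B": (0.1, 0.9),
    "C": (0.2, 0.1),
    "D": (0.8, 0.2),
}
quadrants = assign_quadrants(sample_clusters)
```

Under this convention, a cluster that is both central and dense lands in Quadrant I, while a dense but weakly connected cluster lands in Quadrant II.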
© Emerald Publishing Limited.
