Integrating Node Importance and Network

Full text

Turn on search term navigation

1. Introduction

A complex system can be described as a complex network that consists of many nodes and links, and the processing of information in a complex system can be transformed into the mining of information in a complex network. Therefore, researching complex networks in a scientific way can help us better recognize and understand the internal structure of complex systems. The research contents of complex networks mainly include community detection [1], link prediction [2], important node ranking [3], network evolution [4], etc. Link prediction methods aim to predict a link’s existence between nodes which have no connecting link, based on observed network information. The predicted objects include unknown links and future links [5]. Predicting the existence of missing links is a process of data mining, and the prediction of possible future links is related to the evolution of complex networks.

As one of the important research branches of network science, the applications of link prediction involve many fields, such as biological research, network reconstruction, and label classification. In biological research, link prediction is applied to guide experiments on protein interactions. In network reconstruction, link prediction can predict the true network structure based on known network information. In label classification, link prediction can classify nodes without labels based on the label information of existing nodes in the network. Link prediction also has many challenges, such as the theoretical limits of link prediction accuracy, determination of the Hamiltonian of the network, skillful integration of structural information and external information, dealing with complex types of networks and hyperscale networks, etc.

According to the different methods of network feature extraction, link prediction methods can be divided into network topology structure, likelihood analysis [6], and machine learning [7]. Although the methods based on maximum likelihood probability and machine learning can obtain better prediction accuracy, due to overly complex calculations, the application scope is limited. The methods based on network topology structure are easier to implement when utilizing the network topology to predict the link’s existence. Existing methods based on network topology can be divided into three categories: local similarity, quasi-similarity, and global similarity.

The local similarity methods mainly considers the common neighbors’ information of nodes in a network. The literature [8,9,10] pays more attention to the degree of common neighbors and the number of common neighbors to predict the links of a network. By considering multiple attributes of nodes, a link prediction algorithm was proposed by Gao et al. [11], which uses the degree and clustering coefficient of common neighbors to estimate the likelihood of the existence of links between nodes and achieve good prediction accuracy. Yu et al. [12] proposed a link prediction algorithm which combines degree, clustering coefficient, and node centrality. The advantage of this method based on local similarity is its low computational complexity. However, due to the limited information used, the prediction accuracy is not ideal.

The global similarity methods takes advantage of more information in the network, such as the Katz method [13], which considered the transfer similarity of all paths in the network and improved the prediction performance. Based on the random walk process of particles in the network, a random walk algorithm with restart was proposed by Brin et al. [14], which improved the prediction results. The global similarity methods have better prediction performance but calculation complexity is too high. To further improve the prediction performance of local similarity methods and reduce the computational complexity of the global similarity methods, some scholars proposed methods based on quasi-local similarity, such as the local path method proposed by Lv et al. [15]. This method considered the contribution of high-order paths on the basis of the common neighbor method while achieve better prediction accuracy. Liu et al. [16] proposed the local random walk method. Because only a limited number of steps are used, which is suitable for the link prediction in large-scale networks. Based on the network structure and the strong connection between nodes and considering the contributions of high-order paths to similarity of nodes, Qian et al. [17] proposed three algorithms: TPSR2, TPSR3, and TPSR4. The prediction results show that these algorithms have better prediction accuracy while the TPSR3 algorithm performs the best.

Inspired by the idea of employing local link information for link prediction, Wu et al. [18] proposed a new similarity method, named link prediction with node clustering coefficient (CCLP), which employed more network information. Through comparative experiments in networks from various fields, the results showed that it is effective in predicting missing links and has better prediction accuracy. Based on the degree distribution and local information of nodes to predict the existence of links, Mumin D et al. [19] proposed the common neighbor with degree of nodes (CN2D) algorithm for link prediction, utilizing network structure information to predict the existence of links. The CN2D algorithm has low computational complexity and high prediction accuracy. Liu et al. [20], by considering the initial information of nodes, proposed the initial information contribution of nodes (IICN) algorithm, which aimed to solve the problem of ignoring the initial information size of nodes. The experimental results demonstrate that, compared with mainstream benchmark methods, the IICN algorithm showing better performance advantages.

In this paper, a novel link prediction algorithm was proposed by considering the degree centrality of node pairs and the proximity centrality of nodes, called DCCLP, which predicts the possibility of link existence from two aspects: node importance and node topological feature. The framework of the DCCLP algorithm is shown in Figure 1. Since the network datasets used are undirected and unweighted networks, which lead to the adjacency matrix of the networks being symmetric, we maintain symmetry to calculate the similarity between any two nodes in the networks. The DCCLP algorithm makes up for the deficiency that the TPSR3 algorithm does not fully consider the node importance and achieves better prediction accuracy and performance.

2. The TPSR3 Algorithm

In 2017, Qian et al. [17] utilized the strong connection of the ego network and the close relationship between nodes to describe the similarity and proposed a class of TPSR algorithms. The main ideas of these algorithms are to use the paths within the third order and nodes’ attribute to describe the similarity between nodes. Through comparison the experimental results, it is found that the TPSR3 algorithm performs best.

For two nodes $x, y \in V$ in the undirected unweighted network $G = (V, E)$ , among them, $V$ is the node set and $E$ is the link set. $A = (e_{x y})$ is an adjacency matrix and a symmetric matrix, if $e_{x y} = 1$ , it indicates that there is a connected link between node $x$ and $y$ , and if $e_{x y} = 0$ , it indicates that there is no connected link. Let $L_{x, y}^{3}$ denotes the set of third-order paths of connecting node $x$ and node $y$ , the similarity between nodes $x$ and $y$ is shown in Formula (1) [17].

(1) $s_{x y}^{T P S R 3} = \sum_{z \in Γ (x) ⋂ Γ (y)} (\frac{C_{z}}{k_{z}} + \frac{1}{k_{z}}) + θ |L_{x y}^{3}|$

where

Γ (x), Γ (y)

represent the neighbor node sets of nodes

x

and

y

, respectively, and the parameter

θ

is a small positive number, which is used to adjust the impact of high-order paths on node similarity.

|L_{x y}^{3}|

represents the number of elements contained in

L_{x y}^{3}

, and

k_{z}

is the degree of node

z

C_{z}

denote the clustering coefficient of node

z

, the calculation method is given as:

(2) $C_{z} = \frac{2 t_{z}}{k_{z} (k_{z} - 1)}$

where

t_{z}

represents the number of connected links between neighbor nodes of node

z

According to Formula (1), it can be seen that when measuring the similarity between nodes in the network, the TPSR3 algorithm considered the attributes information of common neighbors of predicted nodes, it also considered the path information between nodes and adjusts the contribution of third-order paths to node similarity by parameter $θ$ .

3. The DCCLP Algorithm

Similar to existing algorithms that considered the network structure, the TPSR3 algorithm did not fully consider the importance factors, the importance of nodes has an important effect on the possibility of a link’s existence between nodes. By analyzing the impact of different centralities on a link’s existence, the literature [21] points out that degree centrality has the most significant effect, and the proximity centrality of nodes can also reflect the importance of nodes in the network. The degree centrality and proximity centrality are integrated into the TPSR3 algorithm in this paper, called DCCLP.

For two nodes $x, y \in V$ in the undirected unweighted network $G = (V, E)$ , among them, $V$ is the node set and $E$ is the link set. $A = (e_{x y})$ is an adjacency matrix and also a symmetric matrix, if $e_{x y} = 1$ , it indicates that there are connected link between node $x$ and $y$ , otherwise $e_{x y} = 0$ . We define the similarity between node $x$ and $y$ as shown in Formula (3).

(3) $s_{x y}^{D C C L P} = α \sum_{z \in Γ (x) \cap Γ (y)} (\frac{C_{z}}{k_{z}} + \frac{1}{k_{z}} + \frac{{C C}_{z}}{k_{z}}) + β {D C}_{x y} + θ |L_{x y}^{3}|$

where

α > 0, β > 0

are adjustable parameters, and

α + β = 1

. The parameter

θ

is used to control the contribution of the third-order paths to the node similarity,

k_{z}

represents the degree of node

z

{D C}_{x y}

represents th degree centrality of node pairs

(x, y)

(4) ${D C}_{x y} = \frac{k_{x} k_{y}}{{(|V| - 1)}^{2}}$

where

|V|

is the number of nodes in the network

G

In the network $G$ , the length of the average shortest path from node $z$ to other nodes is expressed as Formula (5).

(5) $d_{z} = \frac{1}{N - 1} \sum_{j = 1, j \neq z}^{N} d_{z j}$

where

d_{z j}

represents the distance of the shortest path between node

z

and node

j

, and the smaller

d_{z}

, the more important node

z

is in the network

G

The proximity centrality ${C C}_{z}$ of node $z$ in the network $G$ is the inverse of $d_{z}$ , that is,

(6) ${C C}_{z} = \frac{N - 1}{\sum_{j = 1, j \neq z}^{N} d_{z j}}$

4. The Parameter Estimation of DCCLP Algorithm

In 2022, Braik M et al. [22] proposed the White Shark Optimization (WSO) algorithm to solve global optimization problems. The core idea of the WSO algorithm is inspired by the hunting behavior of great white sharks, including their extraordinary hearing and smell when sailing and foraging in the sea. To achieve optimization, they carried out the mathematical modeling of great white sharks’ foraging behavior, to adapt to the full balance between exploration and development of a great white shark, and to help explore and develop every potential area of the space.

The parameters $α$ and $θ$ in the DCCLP algorithm are estimate using the WSO algorithm, and then we obtain the optimal parameter $α^{*}$ and $θ^{*}$ in each network. The steps for estimating the parameters of the DCCLP algorithm are as follows.

Step 1 For the unweighted network, divide it into training set $G_{t r a i n}$ and testing set $G_{t e s t}$ according to a certain partition ratio. Initialize the parameter values using the WSO method, and calculate the AUC indicator in training set using the DCCLP algorithm.

Step 2: We takes the maximum AUC indicator of the training set network as the objective function, and the values of parameters $α$ and $θ$ as the objective constraints, forming the following single objective optimization problem.

(7) ${m a x}_{α, θ} A U C (A_{t r a i n})$

where

A_{t r a i n}

represents the adjacency matrix corresponding to the training network

G_{t r a i n}

Step 3: By using the WSO algorithm to solve the optimization problem (7), the optimal value of parameters α* and θ* are obtained.

5. Experimental Results and Analysis

In order to evaluate the prediction accuracy of the DCCLP algorithm on the networks, Area Under the Curve (AUC) and Precision are selected as evaluation indicators, the definition of two indicators is as follows.

AUC [23] can be understand as the following definition. After obtaining the similarity score, randomly select one link from the testing set and the non-existent link set each time, and compare the scores of the two links, if the score of link in the testing set is higher than that of non-existent link, 1 point will be added; otherwise, 0.5 points will be added. It may be noted that there were $n^{'}$ testing set with high scores on both sides, while there were $n^{″}$ set with no high scores on either side, the calculation method of AUC is shown in Formula (8), $n$ is the number of independent experiments.

(8) $A U C = \frac{n^{'} + 0.5 n^{″}}{n}$

Precision [24] can be understand as the following definition. After obtaining the similarity scores of all links, we rank the scores of the testing set and the non-existent links in descending order, taking the first $L$ links as the prediction list; If there are $m$ links in the testing set among the $L$ links, the calculation method for Precision is shown in Formula (9).

(9) $P r e c i s i o n = \frac{m}{L}$

Next, to verify the effectiveness of the proposed algorithm, we conducted two sets of comparative experiments.

(1). Comparison of prediction accuracy between DCCLP algorithm and algorithms in [17]

To verify the effectiveness and prediction the accuracy of the DCCLP algorithm, we compare it with the nine algorithms mentioned in the literature [17] on different scale networks. Before the experiment, the dataset is preprocessed, that is, the directed network is converted to the undirected network, and the weight of the links in the weighted network is not considered. Table 1 shows the AUC indicator of ten algorithms. Table 2 shows the prediction Precision of ten algorithms. Figure 2 shows the AUC and Precision of ten algorithms on different networks. The black bold values in Table 1 and Table 2 represent the best prediction accuracy among the ten algorithms. The first nine columns in Table 1 and Table 2 are from the literature [17].

According to the above experimental results, our DCCLP algorithm has better prediction accuracy and predictive performance than the other nine compared algorithms.

In the AUC results observed in Table 1, on the nine networks, the DCCLP algorithm obtains the optimal AUC indicator, especially on the Netscience network, the AUC value is improved by 5.89%. In the Precision results observed in Table 2, on nine networks, the DCCLP algorithm obtains the best Precision indicator, especially on the FWEW network, where the Precision is improved by 5.68% compared with the best of the other nine algorithms. By analyzing the advantages of our DCCLP algorithm, there are mainly the following three aspects. Firstly, compared with the local similarity methods, for example, CN, RA, AA, and the TPSR2 methods which mainly use node attribute information to predict links’ existence, our DCCLP algorithm by considering the importance of nodes in the network. Secondly, the global similarity method Katz predicts the existence of links by considering all the paths in the network, the DCCLP algorithm uses less path information to reduce the computational complexity of the algorithm. Thirdly, our DCCLP algorithm considers the degree attribute of the predicted node itself when measuring the similarity between nodes. Our DCCLP algorithm performs best in comparison algorithms.

The test sets contain10% of the links of the whole network, and the results in each network are the average of 100 experiments. For the DCCLP algorithm, the optimal values of adjustable parameters $θ^{*}$ and $α^{*}$ are shown in parentheses in the table, the parameter in LP method and Katz method are set 0.01.

(2). Comparison of the DCCLP algorithm with existing algorithms

To further verify the effectiveness and predictive performance of the DCCLP algorithm, we compared it with the CCNC [12], CCLP [18] and CN2D [19] algorithms which considered the network structure and ignore the influence of node degree centrality on link existence in the network. Twenty-one networks of different scales are collected for comparative experiments. These networks involve social networks, biological networks, citation networks, cooperative networks, aviation networks, and other networks. Table 3 shows the basic statistical characteristics of the networks.

The DCCLP algorithm proposed in this paper, the CCNC, CCLP, and CN2D algorithms, are applied for link prediction in twenty-one networks (shown in Table 3), the prediction results are shown in Table 4 and Table 5, and Figure 3.

Table 4 and Table 5, respectively, show the four algorithms’ results of AUC and Precision on twenty-one networks. The values in parentheses in Table 4 and Table 5 represent the optimal parameters $θ^{*}$ and $α^{*}$ . The experimental data of the first three algorithms are cited in the corresponding literature, indicating that this data is from the literature, and the rest are the results of this paper.

From the above experimental results, among the twenty-one networks shown in Table 3, there are eleven networks with the best AUC indicator, and nine networks with the best Precision indicator. These further show the superiority of the DCCLP algorithm. Compared with the other three algorithms, our DCCLP algorithm performs better in biological networks, cooperative networks, and aviation networks through the introduction of the node importance. The prediction accuracy in social networks is not as good as the CCNC algorithm.

The experimental results show that, compared with CCNC, CCLP, and CN2D algorithms, the DCCLP algorithm has better prediction accuracy and better prediction effect. At the same time, the results also show that the clustering coefficient, degree, proximity centrality of common neighbor nodes and the degree centrality of predicted nodes can promote the possibility of connecting links between nodes and can well reflect the difference of the contribution ability of each neighbor node.

6. Conclusions

The degree centrality and the proximity centrality are integrated into the TPSR3 algorithm, and the link prediction algorithm DCCLP is proposed in this paper. The DCCLP algorithm predicts the existence of unknown links in the network from the perspectives of network structure similarity and node importance. Through the two groups’ comparison experiments, the prediction accuracy AUC and Precision of the DCCLP algorithm outperforms in all comparison algorithms, which verifies the effectiveness and feasibility of the DCCLP algorithm, and further illustrates the importance of nodes to improve the prediction accuracy for link prediction.

The DCCLP algorithm proposed in this paper aimed to predict and analyze undirected and unweighted networks and was not applicable to other type of networks. In addition, the algorithm has high computational complexity and is not suitable for link prediction in large-scale networks. When the network structure is sparse, the prediction accuracy of the DCCLP algorithm not well and cannot mine effective network information. In addition, in the algorithm training phase, the WSO algorithm is used to estimate the parameters, which is time-consuming. In the subsequent research, the metaheuristic with better performance can be selected to improve the calculation efficiency of the algorithm and further optimize the prediction performance of the link prediction.

Author Contributions

Conceptualization, F.D. and F.Z.; methodology, J.Z. and F.D.; software, J.Z.; validation, W.G., F.D. and F.Z.; formal analysis, W.G.; investigation, J.Z.; resources, J.Z.; data curation, J.Z.; writing—original draft preparation, J.Z. and F.D.; writing—review and editing, J.Z., F.D., F.Z. and W.G.; visualization, J.Z.; supervision, F.Z.; project administration, F.D.; funding acquisition, F.D. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Publicly available networks were analyzed in this study. These networks can be found here: http://www.konect.cc/networks (accessed on 8 June 2023); http://vlado.fmf.uni-lj.si/pub/netwroks/data/bio/foodweb/foodweb.htm (accessed on 8 June 2023); http://www-personal.umich.edu/~mejn/netdata/ (accessed on 8 June 2023); http://vladowiki.fmf.uni-lj.si/doku.php?id=pajek:data:urls:index (accessed on 8 June 2023); http://snap.stanford.edu/data/ (accessed on 8 June 2023).

Acknowledgments

The authors would like to thank the reviewers for their valuable suggestions and comments which significantly improved the presentation.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1. The framework of DCCLP algorithm.

View Image - Figure 2. The indicator AUC and Precision of ten algorithms on a set of real-world networks. For almost all networks (expect C. elegans and NetScience), the DCCLP algorithm surpasses all comparison algorithms.

Figure 2. The indicator AUC and Precision of ten algorithms on a set of real-world networks. For almost all networks (expect C. elegans and NetScience), the DCCLP algorithm surpasses all comparison algorithms.

Figure 3. The indicator AUC and Precision of the DCCLP algorithm and the comparative link prediction algorithms.

Table 1

Prediction performance AUC of ten algorithms on a set of real-world networks.

Networks	CCN	TPSR2	TPSR3	TPSR4	CN	AA	RA	LP (0.01)	Katz (0.01)	DCCLP θ, α
Jazz	0.9710	0.9710	0.9220	0.9130	0.9550	0.9620	0.9710	0.9470	0.9420	0.9720 (0.0001, 1)
USAir	0.9530	0.9540	0.9260	0.9200	0.9370	0.9480	0.9530	0.9270	0.9240	0.9703 (0.0013, 0.9798)
C. elegans	0.8980	0.8640	0.8710	0.8630	0.8430	0.8600	0.8630	0.8590	0.8560	0.8602 (0.0422, 0.9242)
FWFW	0.7580	0.6250	0.8080	0.7860	0.6110	0.6130	0.6160	0.6720	0.6790	0.8143 (0.0943, 0.0290)
FWFD	0.7570	0.6200	0.8080	0.7860	0.6100	0.6100	0.6130	0.6720	0.6800	0.8168 (0.0819, 0.0478)
FWEW	0.8120	0.7130	0.8410	0.8320	0.6920	0.6990	0.7070	0.7360	0.7410	0.8505 (0.0713, 0.0042)
FWMW	0.7930	0.7140	0.8180	0.8040	0.7020	0.7070	0.7110	0.7390	0.7400	0.8249 (0.0829, 0.0571)
PB	0.9420	0.9230	0.9320	0.9270	0.9190	0.9210	0.9230	0.9300	0.9240	0.9430 (0.0366, 0.7578)
Netscience	0.9400	0.9350	0.9400	0.9400	0.9360	0.9360	0.9350	0.9400	0.9400	0.9989 (0.0822, 0.0160)

Table 2

Prediction Precision of ten algorithms on a set of real-world networks.

Networks	CCN	TPSR2	TPSR3	TPSR4	CN	AA	RA	LP (0.01)	Katz (0.01)	DCCLP θ, α
Jazz	0.8240	0.8380	0.6500	0.6000	0.8190	0.8380	0.8240	0.7840	0.8090	0.8390 (0.0001, 1)
USAir	0.6270	0.6310	0.5710	0.5610	0.5910	0.6070	0.6270	0.5860	0.5900	0.6374 (0.0013, 0.9798)
C. elegans	0.1430	0.1330	0.1590	0.1520	0.1320	0.1340	0.1330	0.1360	0.1360	0.1587 (0.0422, 0.9242)
FWFW	0.1600	0.0880	0.3420	0.2900	0.0900	0.0900	0.0860	0.1300	0.0980	0.3868 (0.0943, 0.0290)
FWFD	0.1680	0.0910	0.3510	0.2990	0.0900	0.0890	0.0870	0.1310	0.0950	0.3955 (0.0819, 0.0478)
FWEW	0.2600	0.1700	0.3200	0.3060	0.1470	0.1570	0.1650	0.1850	0.1580	0.3768 (0.0713, 0.0042)
FWMW	0.2270	0.1470	0.3340	0.3010	0.1390	0.1440	0.1450	0.1740	0.1470	0.3818 (0.0829, 0.0571)
PB	0.2980	0.2470	0.5030	0.4830	0.4050	0.3610	0.2400	0.4430	0.4130	0.5075 (0.0366, 0.7578)
Netscience	0.9700	0.9720	0.9710	0.9710	0.8160	0.9660	0.9640	0.8100	0.8100	0.8409 (0.0822, 0.0160)

Table 3

Basic topological properties of the twenty-one networks used in the experiments.

Networks	$\|V\|$	$\|E\|$	$⟨k⟩$	$⟨C⟩$	D	H	$⟨d⟩$
Polbook	105	441	8.400	0.4875	0.0808	1.4207	3.0788
Dolphin	62	159	5.129	0.3852	0.0841	1.3243	3.1089
karate	34	78	4.5882	0.6001	0.1390	1.6933	2.4082
FWEW	69	880	25.5072	0.5521	0.3751	1.2746	1.636
FWFW	128	2075	32.4219	0.3346	0.2553	1.2370	1.7763
FWMW	97	1446	29.8144	0.4683	0.3106	1.2656	1.6929
football	115	613	10.6609	0.4032	0.0935	1.0069	2.5082
Grassland	75	114	3.0400	0.8198	0.0411	2.7499	3.1996
Trainbombing	64	243	7.5938	0.7473	0.1205	1.6588	2.691
C. elegans	297	2148	14.4646	0.3429	0.0489	1.8008	2.4553
USAir	332	2126	12.8072	0.7909	0.0387	3.4639	2.7381
Infectious	410	2765	13.4878	0.4802	0.0330	1.3876	3.6309
FWFD	128	2106	32.9063	0.3346	0.2591	1.2307	1.7724
Metabolic	453	2025	8.9404	0.6597	0.0198	4.485	2.6638
Jazz	198	2742	27.6970	0.6427	0.1406	1.3951	2.235
US Roads	49	107	4.3673	0.5171	0.0910	1.1299	4.1633
PB	1222	16,714	27.3552	0.4307	0.0224	2.9707	2.7375
Netscience	1589	2742	3.4512	0.8310	0.0022	2.0105	0.3514
Email	1133	5451	9.6222	0.3535	0.0085	1.9421	3.606
Bio-DM-LC	658	1129	3.4316	0.5166	0.0052	3.1149	3.5637
Bio-CE-GT	924	3239	7.0108	0.6820	0.0076	4.1392	3.3724

$|V|$ is the number of nodes; $|E|$ is the number of links; $⟨k⟩$ is the average degree; $⟨C⟩$ is the average clustering coefficient of the network; D is the network density; H is the degree heterogeneity; $⟨d⟩$ is the average distance.

Table 4

The AUC indicator for link prediction of four algorithms on twenty-one networks.

Networks	CN2D	CCLP	CCNC	DCCLP θ, α
Polbook	0.8791	0.8908	0.9240 [12]	0.8876 (0.0826, 0.7284)
Dolphin	0.7722	0.8020 [18]	0.8360 [12]	0.7981 (0.0883, 0.0735)
karate	0.6441	0.6960	0.7332	0.7925 (0.0960, 0.0784)
FWEW	0.6750	0.7026	0.7216	0.8505 (0.0713, 0.0042)
FWFW	0.6042	0.6362 [18]	0.6470 [12]	0.8143 (0.0943, 0.290)
FWMW	0.7079	0.7229	0.7267	0.8249 (0.0829, 0.0571)
football	0.8535	0.8397	0.8420	0.8472 (0.0834, 0.002)
Grassland	0.8236	0.7900 [18]	0.8987	0.8655 (0.0891, 0.0606)
Trainbombing	0.9283	0.9317	0.9424	0.9328 (0.0296, 0.9077)
C. elegans	0.8613 [19]	0.8658	0.8721	0.8602 (0.0422, 0.9242)
USAir	0.9695 [19]	0.9576	0.9620 [12]	0.9703 (0.0013, 0.9798)
Infectious	0.9393	0.9399	0.9447	0.9579 (0.0156, 0.6831)
FWFD	0.6003	0.6308	0.6318	0.8168 (0.0819, 0.0478)
metabolic	0.9039	0.9507	0.9592	0.9542 (0.0034, 1.0000)
Jazz	0.9685 [19]	0.9600 [18]	0.9740 [12]	0.9716 (0.0001, 1.000)
USRoads	0.9007	0.8647	0.8829	0.8182 (0.0892, 0.0566)
Email	0.8546	0.8570 [18]	0.8578	0.9168 (0.0504, 0.5232)
bio-DM-LC	0.6701	0.6498	0.6707	0.9684 (0.0411, 0.0007)
bio-CE-GT	0.9341	0.9403	0.9547	0.9717 (0.0168, 0.7671)
PB	0.9599 [19]	0.9266 [18]	0.9360 [12]	0.9397 (0.0366, 0.7578)
Netscience	0.9981 [19]	0.9480	0.9920	0.9989 (0.0822, 0.0160)

Table 5

The Precision indicator for link prediction of four algorithms on twenty-one networks.

Networks	CN2D	CCLP	CCNC	DCCLP θ, α
Polbook	0.1145	0.1426	0.1485	0.1257 (0.0826, 0.7284)
Dolphin	0.0646	0.0528	0.0576	0.0551 (0.0883, 0.0735)
karate	0.0307	0.0385	0.0424	0.0489 (0.0960, 0.0784)
FWEW	0.1423	0.1671	0.1978	0.3768 (0.0713, 0.0042)
FWFW	0.0853	0.0970 [18]	0.1068	0.3868 (0.0943, 0.290)
FWMW	0.1320	0.1440	0.1735	0.3818 (0.0829, 0.0571)
football	0.3112	0.2619	0.2738	0.2477 (0.0834, 0.002)
Grassland	0.0441	0.0565	0.0698	0.0465 (0.0891, 0.0606)
Trainbombing	0.1791	0.1912	0.2083	0.1777 (0.0296, 0.9077)
C. elegans	0.1213	0.1356	0.1316	0.1587 (0.0422, 0.9242)
USAir	0.6046	0.6166	0.6503	0.6374 (0.0013, 0.9798)
Infectious	0.3803	0.3592	0.5083	0.2720 (0.0156, 0.6831)
FWFD	0.0862	0.0967	0.1109	0.3955 (0.0819, 0.0478)
metabolic	0.1918	0.2467	0.3188	0.2929 (0.0034, 1.0000)
Jazz	0.8243	0.8590 [18]	0.8297	0.8385 (0.0001, 1.000)
USRoads	0.0800	0.0603	0.0657	0.0238 (0.0892, 0.0566)
Email	0.2953	0.3080 [18]	0.2878	0.1850 (0.0504, 0.5232)
bio-DM-LC	0.0043	0.0456	0.0207	0.4751 (0.0411, 0.0007)
bio-CE-GT	0.1076	0.1911	0.2228	0.2961 (0.0168, 0.7671)
PB	0.4288	0.4040 [18]	0.2732	0.5075 (0.0366, 0.7578)
Netscience	0.8464	0.8982	0.6241	0.8409 (0.0822, 0.0160)

References

1. Rossetti, G.; Cazabet, R. Community discovery in dynamic networks: A survey. ACM Comput. Surv.; 2018; 51, pp. 1-37. [DOI: https://dx.doi.org/10.1145/3172867]

2. Shakibian, H.; Moghadam Charkari, N. Mutual information model for link prediction in heterogeneous complex networks. Sci. Rep.; 2017; 7, 44981. [DOI: https://dx.doi.org/10.1038/srep44981]

3. Zareie, A.; Sheikhahmadi, A.; Jalili, M. Influential node ranking in social networks Based on neighborhood diversity. Future Gener. Comput. Syst.; 2019; 94, pp. 120-129. [DOI: https://dx.doi.org/10.1016/j.future.2018.11.023]

4. Zhang, F.; Liu, J.; Zuo, C. Research of complex network dynamics evolution. Information Engineering and Applications; Springer: London, UK, 2012; pp. 806-815.

5. Lü, L. Link Prediction on Complex networks. J. Univ. Electron. Sci. Technol. China; 2010; 39, pp. 651-661.

6. Clauset, A.; Moore, C.; Newman, M.E.J. Hierarchical structure and the prediction of Missing links in networks. Nature; 2008; 453, pp. 98-101. [DOI: https://dx.doi.org/10.1038/nature06830] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18451861]

7. Zhang, D.; Yin, J.; Zhu, X.; Zhang, C. Network representation learning: A survey. IEEE Trans. Big Data; 2018; 6, pp. 3-28. [DOI: https://dx.doi.org/10.1109/TBDATA.2018.2850013]

8. Rapoport, A. Spread of information through a population with socio-structural bias: I. Assumption of transitivity. Bull. Math. Biol.; 1953; 15, pp. 523-533. [DOI: https://dx.doi.org/10.1007/BF02476440]

9. Adamic, L.A.; Adar, E. Friends and neighbors on the Web. Soc. Netw.; 2003; 25, pp. 211-230. [DOI: https://dx.doi.org/10.1016/S0378-8733(03)00009-1]

10. Zhou, T.; Lü, L.; Zhang, Y.C. Predicting missing links via local information. Eur. Phys. J. B; 2009; 71, pp. 623-630. [DOI: https://dx.doi.org/10.1140/epjb/e2009-00335-8]

11. Gao, Y.; Zhang, Y.P.; Qian, F.L.; Zhao, S. Combined with Node Degree and Node Clustering of Link Prediction Algorithm. J. Chin. Comput. Syst.; 2017; 38, pp. 1436-1441.

12. Yu, Y.; Wang, Y.G.; Luo, Z.G.; Yang, Y.; Wang, X.; Gao, T.; Yu, Q. Link Prediction algorithm based on clustering coefficient and node centrality. J. Tsinghua Univ. (Sci. Technol.); 2022; 62, pp. 98-104.

13. Katz, L. A new status index derived from sociometric analysis. Psychometrika; 1953; 18, pp. 39-43. [DOI: https://dx.doi.org/10.1007/BF02289026]

14. Brin, S.; Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst.; 1998; 30, pp. 107-117. [DOI: https://dx.doi.org/10.1016/S0169-7552(98)00110-X]

15. Lv, L.Y.; Jin, C.H.; Zhou, T. Similarity index based on local paths for link prediction of Complex networks. Phys. Rev. E; 2009; 80, 046122.

16. Liu, W.; Lv, L.Y. Link prediction based on local random walk. Europhys. Lett.; 2010; 89, 58007. [DOI: https://dx.doi.org/10.1209/0295-5075/89/58007]

17. Qian, F.; Gao, Y.; Zhao, S.; Tang, J.; Zhang, Y. Combining topological properties and strong ties for Link prediction. Tsinghua Sci. Technol.; 2017; 22, pp. 595-608. [DOI: https://dx.doi.org/10.23919/TST.2017.8195343]

18. Wu, Z.; Lin, Y.; Wang, J.; Gregory, S. Link prediction with node clustering coefficient. Phys. A Stat. Mech. Its Appl.; 2016; 452, pp. 1-8. [DOI: https://dx.doi.org/10.1016/j.physa.2016.01.038]

19. Mumin, D.; Shi, L.L.; Liu, L. An efficient algorithm for link prediction based on local Information: Considering the effect of node degree. Concurr. Comput. Pract. Exp.; 2022; 34, 6289. [DOI: https://dx.doi.org/10.1002/cpe.6289]

20. Liu, Y.; Liu, S.; Yu, F.; Yang, X. Link prediction algorithm based on the initial information Contribution of nodes. Inf. Sci.; 2022; 608, pp. 1591-1616. [DOI: https://dx.doi.org/10.1016/j.ins.2022.07.030]

21. Fu, X.Y.; Gu, Y.J. Unsupervised Link Prediction Algorithm Fusing Node Importance. Comput. Eng. Appl.; 2022; 58, pp. 94-101.

22. Liu, B.; Xu, S.; Li, T.; Xiao, J.; Xu, X.K. Quantifying the effects of topology and weight for link prediction in weighted complex networks. Entropy; 2018; 20, 363. [DOI: https://dx.doi.org/10.3390/e20050363] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33265453]

23. Samei, Z.; Jalili, M. Application of hyperbolic geometry in link prediction of multiplex networks. Sci. Rep.; 2019; 9, 12604. [DOI: https://dx.doi.org/10.1038/s41598-019-49001-7] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31471541]

24. Braik, M.; Hammouri, A.; Atwan, J.; Al-Betar, M.A.; Awadallah, M.A. White Shark Optimizer: A novel bio-inspired metaheuristic algorithm for global optimization problems. Knowl.-Based Syst.; 2022; 243, 108457. [DOI: https://dx.doi.org/10.1016/j.knosys.2022.108457]

Word count: 4720

Show less

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Link prediction is one of the most important and challenging tasks in complex network analysis, which aims to predict the existence of unknown links based on the known information in the network. As critical topological properties in the network, node’s degree and clustering coefficient are well-suited for describing the tightness of connection between nodes. The importance of node can affect the possibility of link existence to a certain extent. By analyzing the impact of different centrality on links, which concluded that the degree centrality and proximity centrality have the greatest influence on network link prediction. A link prediction algorithm combines importance of node and network topological properties, called DCCLP, is proposed in this paper, the symmetry of the adjacency matrix is considered in the DCCLP link prediction algorithm to further describe the structural similarity of network nodes. In the training phase of the DCCLP algorithm, the maximized AUC indicator in the training set as the objective, and the optimal parameters are estimated by utilizing the White Shark Optimization algorithm. Then the prediction accuracy of the DCCLP algorithm is evaluated in the test set. By experimenting on twenty-one networks with different scales, and comparing with existing algorithms, the experimental results show that the effectiveness and feasibility of DCCLP algorithm, and further illustrate the importance of the degree centrality of node pairs and proximity centrality of nodes to improve the prediction accuracy of link prediction.

Details

Title

Integrating Node Importance and Network Topological Properties for Link Prediction in Complex Network

Author

Zhu, Junxi; Dai, Fang; Zhao, Fengqun; Guo, Wenyan

First page

1492

Publication year

2023

Publication date

2023

Publisher

MDPI AG

e-ISSN

20738994

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/sym15081492

ProQuest document ID

2857426332

Integrating Node Importance and Network Topological Properties for Link Prediction in Complex Network

Jump to:

Full text

Abstract

Details

Suggested sources