This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
1. Introduction
Data publishing of social network is very important for scientific research, commercial purpose, countries, and so on, but social network data includes privacy information and sensitive relations, which can be leaked by publishing directly. How to protect individual privacy, and make the publishing data or graph useful at the same time, has become very important problem of social network data publishing. One of the most important principle is that individual can decide his own privacy whether be published or not, that is to say individual has different privacy protection needs.
Anonymity techniques for data publishing have been used in the relational data for a long time, and make great progress in relational database area, including
Actually, a graph structure is necessary to represent the network vertexes relations rather than a two dimensional representation in relational database [11], degree denotes the relationship between two vertexes, high degrees mean the relationships are more closer among the vertexes, and there are only a small part of vertexes which degrees are high, degrees of most of vertexes are low in big social network. So a limited fraction of vertexes with high degrees bring a lot of data loss and computation cost when using unified anonymity methods and the same privacy protection level [12].
Personalized privacy protection based on data table was proposed firstly by Xiao and Tao in 2006 [13]. They used individual guarding node to set level of self-sensitive attribute and did not set the same anonymity level for all individuals, but rather anonymity according to setting guarding node.
Ever since then, more and more researcher paid more attention to personalized anonymity of data publishing and made modest progress. During the research process of social network privacy protection, because data of social network is more complex than traditional data table, most of social network research used unified anonymity methods and the same privacy protection level. For example, user can create their basic information, Web albums, Web logs, the lists of friends, and so on. But Facebook, Twitter, Wechat, and voov meeting, they were able to decide those information whether can be accessed and viewed by others according to their own privacy level, consequently achieved purpose of preserving privacy to some extent.
The data in social network is more complex than two-dimensional data table in relational database. Privacy protection in social network can be summarized as vertex protection, edge protection, and sensitive attribute protection. Vertex protection is to prevent an attacker from identifying a vertex in an anonymous publishing graph with a high probability. Edge protection is to prevent an attacker from identifying an edge in an anonymous publishing graph with a high probability. Attribute protection is to prevent the attacker from getting vertexes or sensitive attributes of edges with a high probability. We cannot use anonymity methods and technologies, which used into traditional two dimensional data table, into social network directly, and users have personalized protecting privacy requirements (vertex protection, edge protection, and sensitive attribute protection) in the real social network such as the users of Facebook, Twitter, and Wechat, so it has the very high research value that personalized privacy protection methods are used into social network data publishing [14].
2. Problem Definition
2.1. Related Concepts
Definition 1.
Table 1 is said to satisfy
Definition 2. k-Degree anonymity.
A social network graph
Table 1
Example of 2-anonymity,
No. | Nation | Birthday | Gender | ZIP | Salary |
1 | Yellow | 1995 | F | 26118 | 9000 |
2 | Yellow | 1995 | F | 26118 | 5000 |
3 | Yellow | 1994 | M | 26112 | 27000 |
4 | Yellow | 1994 | M | 26112 | 10000 |
5 | Brown | 1993 | F | 26113 | 7500 |
6 | Brown | 1993 | F | 26113 | 30000 |
7 | Brown | 1993 | F | 26113 | 9500 |
Definition 3. Graph isomorphism.
For graphs:
[figure(s) omitted; refer to PDF]
For example, when we delete the node information of (a) and (b) in Figure 1, (a) and (b) are isomorphic [18].
Definition 4.
For a graph
Definition 5. k-Isomorphism vertex group.
Given a
Definition 6. k-Isomorphism edge group.
Given a
Definition 7. Social network graph.
Given a social network graph:
(a) Identifier attribute (ID) of vertex as
(b) Quasi-identifier attribute (QI) of vertex
(c) Sensitive attribute (SA) of vertex
(d) Quasi-identifier attribute (QI) of edge
(e) Sensitive attribute (SA) of edge
(f) Other attributes (OA)
Attribute (QI) of edge denotes by vector pair (
Table 2
The vertex table.
Vid | Name | Age | Gender | ZIP | Salary |
1 | Kya | 25 | F | 261186 | 9000 |
2 | John | 25 | M | 261185 | 5000 |
3 | Mira | 26 | F | 261101 | 27000 |
4 | Maci | 26 | F | 261131 | 10000 |
5 | Cathy | 27 | F | 261131 | 7500 |
6 | Kurt | 27 | M | 261124 | 30000 |
7 | Sage | 27 | F | 261186 | 9500 |
8 | Rain | 37 | M | 261185 | 28000 |
9 | Toni | 35 | M | 261124 | 22000 |
Table 3
The edge table.
Eid | Vid1 | Vid2 | Weighted relationship |
1 | 1 | 2 | 1 |
2 | 1 | 4 | 3 |
3 | 2 | 3 | 2 |
4 | 2 | 4 | 1 |
5 | 2 | 5 | 2 |
6 | 3 | 5 | 3 |
7 | 3 | 7 | 2 |
8 | 4 | 6 | 2 |
9 | 5 | 6 | 1 |
10 | 5 | 9 | 1 |
11 | 7 | 8 | 1 |
12 | 7 | 9 | 3 |
13 | 8 | 9 | 1 |
Table 4
Table of primal personal health information.
Name | Age | Gender | ZIP | Disease |
Kya | 25 | F | 261186 | Cold |
John | 25 | M | 261185 | Hypertension |
Mira | 26 | F | 261101 | AIDS |
Maci | 26 | F | 261131 | Cold |
Cathy | 27 | F | 261131 | Cancer |
Kurt | 27 | M | 261124 | Obesity |
Sage | 27 | F | 261186 | Pneumonia |
Rain | 37 | M | 261185 | Diabetes |
Toni | 35 | M | 261124 | Short breath |
2.2. Sensitive Degree of Friend Relationship (SA) of Vertex and Edge
2.2.1. Sensitive Degree of Friend Relationship (SA) of Vertex
We use the influence matrix to represent the level of influence of vertex-sensitive attributes [19, 20]. We can use the influence matrix to meet the requirements of personalized privacy protection of users.
Influence matrix
The
2.2.2. Sensitive Degree of Friend Relationship (SA) of Edge
We described relationships of simple friend, good friend, and sweetheart friend (boyfriend or girlfriend) among the vertexes in Figure 1 of friend relationship graph. Graph (
2.3.
In order to make it impossible for an attacker to infer the real relationship between targeted individuals and corresponding vertexes with a probability,
Definition 8.
Undirected graph
For example, in Figures 1,
Definition 9.
Undirected graph
Definition 10. Individual information leakage.
Suppose graph
(1) Vertex Leakage. The probability of ascertaining the corresponding relationship between the vertex in the graph
(2) Edge Leakage. The probability of ascertaining the corresponding relationship between the edge in the graph
(3) Leakage of Vertex Sensitive Information. The probability of ascertaining the sensitive information of target individual A in the primal graph
(4) Leakage of Edge Sensitive Information. The probability of ascertaining the sensitive information of the edge in the primal graph
2.4. Personalized
2.4.1. Personalized
Personalized
(1) Personalized
(2)
(3)
(4) If
Here, threshold
2.4.2. Personalized
There is an example which is shown to explain the definition and the process of personalized
[figure(s) omitted; refer to PDF]
Figure 1(a) is the subgraph
In Figure 2, (a) is the initial subgraph in Figure 1, and (b) and (c) are the isomorphism graphs corresponding to (a). From graph
Table 5
9 vertex groups of 3-isomorphism.
VCS | |||
1 | |||
2 | |||
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 |
Table 6
13 edges groups of 3-isomorphism.
ECS | |||
1 | |||
2 | |||
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 | |||
10 | |||
11 | |||
12 | |||
13 |
Now, the 9 3-isomorphism vertex groups are generalized by their identifier attributes according to parameter
[figure(s) omitted; refer to PDF]
The
Table 7
Example of isomorphism groups vertex’s attributes values.
VCS | Num | Race | Occupation | Age | Gender | ZIP | Disease |
1 | Asian | Salesman | 25 | M | 150086 | Flu | |
Asian | Salesman | 35 | F | 150084 | Flu | ||
Black | Teacher | 35 | M | 150081 | Mammary cancer | ||
2 | White | Teacher | 45 | F | 150090 | Lung cancer | |
White | Driver | 45 | M | 150041 | Lung cancer | ||
White | Driver | 50 | M | 150024 | Lung cancer |
Table 8
Example of isomorphism groups’ generalization identifier attributes values.
VCS | Num | Race | Occupation | Age | Gender | ZIP | Disease |
1 | Asian | Salesman | [25,35) | 15008 | Flu | ||
Asian | Salesman | [25,35) | Flu | ||||
Black | Teacher | [25,35) | Mammary cancer | ||||
2 | White | Teacher | [45,50) | F | 1500 | ||
White | Driver | [45,50) | M | ||||
White | Driver | [45,50) | M |
3. Personalized
The basic algorithm principle is that
Algorithm 1: Personalized
Inputs:
Initial anonymous graph G = (V, E),
Sensitivity parameters: k’(k’ ≥2); l(2 ≤ l ≤ k’); m(l ≤ m ≤ |V|);
Node attributes table: AS = {viS,viN(1),…,viN(s),viC(1),…,viC(t)};
Edge attributes table: AS = {vjS,vjN(1),…,vjN(s),vjC(1),…,vjC(t)};
All the classified attribute inheritance tree HC
Input parameters α =0 and β
Outputs:
Anonymous graph Gp = {g1, g2, …, ge};
The whole VCS, ECS and their attribute information;
Steps:
1 anonymous graph Gp, groups VCS and ECS are caught;
2 read α to judge the generalizing type;
3 got the group number:NVCS = |NVP|/k, NECS = |NEP|/k;
4 fori =1 to NVCSdo//QI attributes generalization
5 forj =1 to sdo//numeric type attributes generalization
6 gen(VCSi)[Nj] = [min{v1N(j), …,vkN(j)},max{v1N(j), …,vkN(j)}]
7 end for
8 forj =1 to tdo//t type QI attributes generalization
9 gen(VCSi)[Cj] = {v1C(j),…,vkC(j)}
10 end for
11 while
12 if(sensitive attributes are classified) then
13 forj =0 to kdo
14 vjC is replaced by its parents node in classified inheritance tree of sensitive attribute
15 if
16 end for
17 Else
18 forj =1 to kdo
19 the interval of vjN is changed to its neighborhood;
20 if
21 end for
22 end if
23 end while
24 end for
25 fori =1 to NECSdo
26 forj =1 to pdo
27 gen(ECSi)[Nj] = [min{v1N(j),…,vkN(j)},max{v1N(j), …,vkN(j)}]
28 end for
29 forj =1 to qdo
30 gen(ECSi)[Cj] = {e1C(j),…,ekC(j)}
31 end for
32 while
33 if(sensitive attributes are classified) then
34 forj =1 to kdo
35 vjC is replaced by its parents node in classified inheritance tree of sensitive attribute
36 if
37 end for
38 Else
39 forj =1 to kdo
40 the interval of njN is changed to its neighborhood;
41 if
42 end for
43 end if
44 end while
45 end for
46 anonymous graph Gp is published; all the VCS nodes, ECS edges and their attribute information are published
4. Experiments and Results
The experiments were completed in the PC with Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz, 8 GB memory, and the OS is Microsoft Windows 7. The programs were coded and compiled in VS 2019 IDE.
The vertex (nodes) data set in these experiments are from adults census data set of the UC Irvine Machine Learning Repository [25, 26]. There are two experiments examples, and the vertex numbers of each are 300 and 1000. In these vertexes, 6 attributes are considered in the experiments, which are age, occupation, race, gender, zip, and disease. In these attributes, age is numeric, and the others are category. The attribute disease is sensitive attribute. The edge set in these experiments is created by Pajek software randomly, and the numbers of nodes are, respectively, 5000, 10000, 15000, 20000, and 25000.
Information loss was compared between the algorithm in this paper that we proposed and paper [15]. We use the information loss method from paper [15]. The algorithm in this paper was named as ACIM (anonymous composite improved model) algorithm, and the algorithm in paper [15] was named as ACM (anonymous composite model) algorithm. In personalized
When
Figure 5 shows that some nodes are added to construct the isomorphic graphs, the percentage of adding nodes in all the nodes of the graph is shown in Figure 5, and the situation of edges is shown in Figure 6. With the
[figure(s) omitted; refer to PDF]
In Figures 7–9, the loss of information is shown with
[figure(s) omitted; refer to PDF]
When
[figure(s) omitted; refer to PDF]
From Figure 10, when
Figures 12 and 13 show the comparison of generalization information loss with different
[figure(s) omitted; refer to PDF]
5. Conclusion
The authors study
Acknowledgments
The authors want to thank the helpful comments and suggestions from the anonymous reviewers. This work was supported in part by the Natural Science Foundation of Taizhou University (Grant no. TZXY2019QDJJ008) and the Natural Science Foundation of Heilongjiang Province of China (Grant no. JJ2019LH0048).
[1] X. M. Ren, B. X. Jia, K. C. Wang, J. Cheng, "Research on k-anonymity privacy protection of social network," Applied Mechanics and Materials, vol. 530-531, pp. 701-704, DOI: 10.4028/www.scientific.net/AMM.530-531.701, 2014.
[2] H. Miyajima, N. Shigei, H. Miyajima, Y. Miyanishi, S. Kitagami, N. Shiratori, "New privacy preserving clustering methods for secure multiparty computation," International Journal of Computer Science, vol. 6 no. 1, pp. 270-276, DOI: 10.5430/air.v6n1p27, 2016.
[3] K. Macwan, S. Patel, "Privacy preservation approaches for social network data publishing," Artificial Intelligence for Cyber Security: Methods, Issues and Possible Horizons or Opportunities, pp. 213-233, DOI: 10.1007/978-3-030-72236-4_9, 2021.
[4] R. Kumar, J. Novak, A. Tomkins, "Structure and evolution of online social networks," Proceedings of the 12th ACM SIGKDD international conference on Knowledge Discovery and Data Mining(KDD), pp. 611-617, .
[5] Y. J. Luo, Q. Liu, Y. Wang, "Overview of protecting user privacy in social networks," Application Research of Computers, vol. 10, pp. 3061-3064, 2010.
[6] Y. Xiaowei, Z. Weitao, "On link privacy in randomizing social networks," Knowledge and Information Systems, vol. 28 no. 3, pp. 645-663, DOI: 10.1007/s10115-010-0353-5, 2011.
[7] J. Cheng, A. W. Fu, J. Liu, "K-isomorphism: privacy preserving network publication against structural attacks," Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 459-470, .
[8] K. Liu, E. Terzi, "Towards identity anonymization on graphs//proceedings of the 2008 ACM SIGMOD international conference on management of data," ACM, pp. 93-106, 2008.
[9] A. Campan, T. Truta, "Data and structural k-anonymity in social networks," Privacy, Security, and Trust in KDD, vol. 5456, pp. 33-54, DOI: 10.1007/978-3-642-01718-6_4, 2009.
[10] M. K. SUNG, K. Y. LEE, J.-B. SHIN, Y. D. CHUNG, "A privacy protection method for social network data against content/degree attacks," IEICE Transactions on Information and Systems, vol. E95-D no. 1, pp. 152-160, DOI: 10.1587/transinf.E95.D.152, 2012.
[11] E. Y. Baagyere, Z. Qin, H. Xiong, Q. Zhiguang, "The structural properties of online social networks and their application areas," International Journal of Computer Science, vol. 43 no. 2, pp. 270-276, 2016.
[12] N. Li, X.-L. Zhang, "Research on dynamic social network anonymity technology for protecting community structure," International Journal of Network Security, vol. 23 no. 4, pp. 576-587, 2021.
[13] X. K. Xiao, Y. F. Tao, "Personalized privacy preservation," Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pp. 229-240, .
[14] X. Zhang, J. Liu, H. Bi, J. Li, Y. Wang, "Personalized K-in&out-degree anonymity method for large-scale social networks based on hierarchical community structure," International Journal of Network Security, vol. 23 no. 2, pp. 314-325, 2021.
[15] L. Sweeney, "k-Anonymity: a model for protecting privacy," International Journal of Uncertainty, Fuzziness and Knowlege-Based Systems, vol. 10 no. 5, pp. 557-570, DOI: 10.1142/S0218488502001648, 2002.
[16] H. Wu, J. Zhang, B. Wang, J. Yang, B. Sun, "(d, k)-Anonymity for social networks publication against neighborhood attacks," Journal of Convergence Information Technology, vol. 8 no. 2, pp. 59-67, DOI: 10.4156/jcit.vol8.issue2.8, 2013.
[17] H. W. Wu, Research on Anonymity Techniques for Privacy-Preserving Data Publishing in Social Networks, 2013.
[18] L. Zou, L. Chen, M. T. Ozsu, "k-Automorphism," Proceedings of the VLDB Endowment, vol. 2 no. 1, pp. 946-957, DOI: 10.14778/1687627.1687734, 2009.
[19] X. Ren, J. Yang, F. Wei, "Research on CBK(L,K)-anonymity algorithm," International Journal of Advancements in Computing Technology, vol. 3 no. 4, pp. 165-173, DOI: 10.4156/ijact.vol3.issue4.18, 2011.
[20] L. I. Siyu, Research for Protecting Privacy of Social Network Data Based on Relevance Degree Perception, 2020.
[21] P. Samarati, L. Sweeney, "Generalizing data to provide anonymity when disclosing information," Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1998.
[22] B. Zhou, J. Pei, "The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks," KAIS, vol. 28 no. 1, pp. 47-77, DOI: 10.1007/s10115-010-0311-2, 2011.
[23] X. M. Ren, D. X. Jiang, K. C. Wang, R. A. N. Qi, "A personalized a (d, k)-anonymity for social network," 2017 2nd international conference on computer, Mechatronics and Electronic Engineering, pp. 167-174, .
[24] B. Zhou, J. Pei, "Preserving privacy in social networks against neighborhood attacks," Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08, pp. 506-515, DOI: 10.1109/ICDE.2008.4497459, .
[25] D. J. Newman, S. Hettich, C. L. Blake, C. J. Merz, UCI Repository of Machine Learning Databases, 1998. http://archive.ics.uci.edu/ml/datasets/Adult
[26] J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, A. W. C. Fu, "Utility-based anonymization for privacy preservation with less information loss," Journal of SIGKDD Explorations, vol. 8 no. 2, pp. 21-30, DOI: 10.1145/1233321.1233324, 2006.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2022 Xiangmin Ren and Dexun Jiang. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
By mining the data published on social network, we can discover the hidden value of information including the privacy of individuals and organizations. Protecting privacy of individuals and organizations on social network has become the focus of more and more researchers. Based on the actual privacy protection need of edge sensitive attribute and vertexes sensitive attribute, we propose a new personalized
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer