This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

1. Introduction

Publishing social network data is important for scientific research, commercial applications, government use, and so on, but the data contain private information and sensitive relationships that can be leaked if the data are published directly. How to protect individual privacy while keeping the published data or graph useful has therefore become a central problem of social network data publishing. One of the most important principles is that individuals should be able to decide whether their own private information is published; that is, different individuals have different privacy protection needs.

Anonymity techniques for data publishing have long been used on relational data and have made great progress in the relational database area; they include $k$-anonymity, $l$-diversity, generalization, and so forth [1, 2]. Can the anonymization techniques used for relational data also be applied to social networks? Social network data contain more information than relational data, because network data include vertexes (nodes), edges, relationships between nodes, and various metric features of the graph. Some researchers have tried to use these techniques for social network data publishing [3]. The structure and evolution of social networks were studied in [4], and the categories of attack on social networks can be found in [5]. Graph modification [6], graph partitioning [7], graph isomorphism [8], clustering [9], attribute generalization [10], and other techniques have been applied to social network data publishing, and more and more anonymization techniques for social networks have since appeared in the literature.

A graph structure, rather than the two-dimensional representation used in a relational database, is needed to represent the relationships among network vertexes [11]. The degree of a vertex is the number of relationships it has with other vertexes, so a high degree means that the vertex is closely connected to many others. In a large social network only a small fraction of vertexes have high degree, while most vertexes have low degree. As a result, when a unified anonymity method and a single privacy protection level are applied to all vertexes, this small fraction of high-degree vertexes causes substantial data loss and computation cost [12].

Personalized privacy protection for data tables was first proposed by Xiao and Tao in 2006 [13]. Instead of applying the same anonymity level to all individuals, they let each individual specify a guarding node that sets the protection level of his or her own sensitive attribute, and anonymized the data according to these guarding nodes.

Since then, more and more researchers have paid attention to personalized anonymity in data publishing, and modest progress has been made. In research on social network privacy protection, however, most work has used unified anonymity methods and a single privacy protection level, because social network data are more complex than traditional data tables. In practice, users of services such as Facebook, Twitter, WeChat, and VooV Meeting can create basic profile information, Web albums, Web logs, friend lists, and so on, and can decide according to their own privacy level whether this information may be accessed and viewed by others, which preserves privacy to some extent.

Data in a social network are more complex than a two-dimensional data table in a relational database. Privacy protection in social networks can be summarized as vertex protection, edge protection, and sensitive attribute protection. Vertex protection prevents an attacker from identifying a vertex in the anonymized published graph with high probability. Edge protection prevents an attacker from identifying an edge in the anonymized published graph with high probability. Attribute protection prevents an attacker from learning the sensitive attributes of vertexes or edges with high probability. The anonymity methods and techniques developed for traditional two-dimensional data tables cannot be applied to social networks directly, and users of real social networks such as Facebook, Twitter, and WeChat have personalized requirements for all three kinds of protection. Applying personalized privacy protection methods to social network data publishing therefore has high research value [14].

2. Problem Definition

2.1. Related Concepts

Definition 1. $k$ -Anonymity.

$RT(A_{1}, \dots, A_{n})$ is a table and $QI_{RT}$ is the quasi-identifier of $RT$. $RT$ is said to satisfy $k$-anonymity if and only if each sequence of values in $RT[QI_{RT}]$ appears at least $k$ times in $RT[QI_{RT}]$ [15].

Table 1 satisfies $k$-anonymity with $k = 2$: $QI_{RT}$ consists of nation, birthday, gender, and ZIP, and the sensitive attribute is the remaining column, salary. As can be seen from Table 1, $t_{1}[QI] = t_{2}[QI]$, $t_{3}[QI] = t_{4}[QI]$, and $t_{5}[QI] = t_{6}[QI] = t_{7}[QI]$, so every combination of quasi-identifier values occurs at least twice.

Definition 2. $k$-Degree anonymity.

A social network graph $G(V, E)$ is said to satisfy $k$-degree anonymity if, for each vertex (node), there are at least $k - 1$ other vertexes in the graph with the same degree. Here $V$ denotes the vertexes and $E$ the edges between vertexes [16, 17].

Table 1

Example of 2-anonymity, $QI = \{Nation, Birthday, Gender, ZIP\}$.

No.	Nation	Birthday	Gender	ZIP	Salary
1	Yellow	1995	F	26118 $*$	9000
2	Yellow	1995	F	26118 $*$	5000
3	Yellow	1994	M	26112 $*$	27000
4	Yellow	1994	M	26112 $*$	10000
5	Brown	1993	F	26113 $*$	7500
6	Brown	1993	F	26113 $*$	30000
7	Brown	1993	F	26113 $*$	9500
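Definition 1 can be checked mechanically against Table 1. The following is a minimal sketch, not the authors' code: the function names `anonymity_level` and `is_k_anonymous` are hypothetical, and the rows are copied from Table 1 with the generalized ZIP kept as a string.

```python
from collections import Counter

# Rows of Table 1: (nation, birthday, gender, zip, salary).
TABLE_1 = [
    ("Yellow", 1995, "F", "26118*", 9000),
    ("Yellow", 1995, "F", "26118*", 5000),
    ("Yellow", 1994, "M", "26112*", 27000),
    ("Yellow", 1994, "M", "26112*", 10000),
    ("Brown", 1993, "F", "26113*", 7500),
    ("Brown", 1993, "F", "26113*", 30000),
    ("Brown", 1993, "F", "26113*", 9500),
]

QI_COLUMNS = (0, 1, 2, 3)  # nation, birthday, gender, ZIP


def anonymity_level(rows, qi_columns):
    """Return the largest k for which the (non-empty) table is k-anonymous."""
    groups = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return min(groups.values())


def is_k_anonymous(rows, qi_columns, k):
    """True if every QI value combination occurs at least k times."""
    return anonymity_level(rows, qi_columns) >= k


print(anonymity_level(TABLE_1, QI_COLUMNS))    # 2
print(is_k_anonymous(TABLE_1, QI_COLUMNS, 3))  # False
```

The three equivalence classes have sizes 2, 2, and 3, so the table is 2-anonymous but not 3-anonymous.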

$k$-degree anonymity prevents inference attacks by an adversary who has background knowledge of vertex degrees. In Figure 1, the degree collection of the primal social network graph (a) is $d = \{4, 3, 2, 4, 3, 3, 2, 3, 2\}$; every degree value is shared by at least two vertexes, so the anonymized social network graph (b) in Figure 1 satisfies 2-degree anonymity.
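The degree argument can be reproduced from the edge list given later in Table 3. This is a small sketch under the assumption that Table 3 lists every edge of the example graph; `degrees` and `degree_anonymity_level` are hypothetical helper names, and isolated vertexes (which never appear in an edge list) are not counted.

```python
from collections import Counter

# Undirected edges (Vid1, Vid2) copied from Table 3.
EDGES = [(1, 2), (1, 4), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7),
         (4, 6), (5, 6), (5, 9), (7, 8), (7, 9), (8, 9)]


def degrees(edges):
    """Map each vertex that appears in an edge to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_anonymity_level(edges):
    """Largest k such that every degree value is shared by at least k vertexes."""
    histogram = Counter(degrees(edges).values())
    return min(histogram.values())


print(sorted(degrees(EDGES).values(), reverse=True))  # [4, 4, 3, 3, 3, 3, 2, 2, 2]
print(degree_anonymity_level(EDGES))                  # 2
```

Degree 4 is the rarest value and occurs twice, which is why the graph is 2-degree anonymous but not 3-degree anonymous.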

Definition 3. Graph isomorphism.

For graphs $G_{1} = (V_{1}, E_{1})$ and $G_{2} = (V_{2}, E_{2})$ with $|V_{1}| = |V_{2}|$, if there is a bijection $h$ between $V_{1}$ and $V_{2}$ such that $(u, v) \in E_{1}$ if and only if $(h(u), h(v)) \in E_{2}$, then $G_{1}$ and $G_{2}$ are isomorphic, written $G_{1} ≅ G_{2}$. $V_{i}$ denotes the vertexes (nodes) and $E_{i}$ the edges between vertexes of $G_{i}$.

[figure(s) omitted; refer to PDF]

For example, when we delete the node information of (a) and (b) in Figure 1, (a) and (b) are isomorphic [18].

Definition 4. $k$ -Isomorphism.

For a graph $G = (V, E)$ with $k$ subgraphs $g_{1}, g_{2}, \dots, g_{k}$, if the $g_{i}$ $(1 \leq i \leq k)$ satisfy (1) $\bigcup_{i = 1}^{k} g_{i} = G$; (2) $g_{i} \cap g_{j} = \emptyset$ for $i \neq j$; and (3) $g_{i}$ and $g_{j}$ $(i \neq j)$ are isomorphic, then the graph $G$ is $k$-isomorphic.
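Definitions 3 and 4 can be illustrated on toy graphs. The sketch below is an assumption-laden illustration rather than a practical algorithm: it tests isomorphism by trying every vertex bijection, which is only feasible for a handful of vertexes, and it takes $G$ to be the union of the listed parts so that condition (1) holds by construction. The names `are_isomorphic` and `is_k_isomorphic` are hypothetical.

```python
from itertools import permutations


def are_isomorphic(vertices1, edges1, vertices2, edges2):
    """Brute-force test of Definition 3; only practical for very small graphs."""
    if len(vertices1) != len(vertices2) or len(edges1) != len(edges2):
        return False
    target = {frozenset(e) for e in edges2}
    source = list(vertices1)
    for image in permutations(vertices2):
        h = dict(zip(source, image))  # candidate bijection h: V1 -> V2
        if {frozenset((h[u], h[v])) for u, v in edges1} == target:
            return True
    return False


def is_k_isomorphic(subgraphs):
    """subgraphs: list of (vertex_list, edge_list) parts whose union is G."""
    seen = set()
    for vertices, _edges in subgraphs:
        if seen & set(vertices):  # condition (2): parts must be disjoint
            return False
        seen |= set(vertices)
    first_vertices, first_edges = subgraphs[0]
    # Isomorphism is transitive, so comparing every part with the first suffices.
    return all(are_isomorphic(first_vertices, first_edges, v, e)
               for v, e in subgraphs[1:])


g1 = (["a1", "b1", "c1"], [("a1", "b1"), ("b1", "c1")])                # path a1-b1-c1
g2 = (["a2", "b2", "c2"], [("a2", "c2"), ("c2", "b2")])                # path a2-c2-b2
g3 = (["a3", "b3", "c3"], [("a3", "b3"), ("b3", "c3"), ("a3", "c3")])  # triangle

print(is_k_isomorphic([g1, g2]))      # True
print(is_k_isomorphic([g1, g2, g3]))  # False
```

For graphs of realistic size a dedicated isomorphism routine would be needed; the brute-force loop is used here only because it mirrors the bijection in Definition 3 directly.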

Definition 5. k-Isomorphism vertex group.

Given a $k$-isomorphism publishing graph $G_{p} = (V_{p}, E_{p}) = \{g_{1}, g_{2}, \dots, g_{k}\}$ and any vertex $v_{1} \in g_{1}$, there exist $k - 1$ vertexes $v_{i} \in g_{i}$ $(i = 2, \dots, k)$ that are isomorphic to $v_{1}$. The vertex set consisting of $v_{1}$ and these $k - 1$ vertexes $v_{i}$ $(i = 2, \dots, k)$ is a $k$-isomorphism vertex group, denoted $VCS$, with $|VCS| = k$. Each $VCS$ includes $k$ vertexes, and there are $|V_{p}| / k$ $VCS$ in the $k$-isomorphism graph $G_{p}$.

Definition 6. k-Isomorphism edge group.

Given a $k$-isomorphism publishing graph $G_{p} = (V_{p}, E_{p}) = \{g_{1}, g_{2}, \dots, g_{k}\}$ and any edge $e_{1} \in g_{1}$, there exist $k - 1$ edges $e_{i} \in g_{i}$ $(i = 2, \dots, k)$ that are isomorphic to $e_{1}$. The edge set consisting of $e_{1}$ and these $k - 1$ edges $e_{i}$ $(i = 2, \dots, k)$ is a $k$-isomorphism edge group, denoted $ECS$, with $|ECS| = k$. Each $ECS$ includes $k$ edges, and there are $|E_{p}| / k$ $ECS$ in the $k$-isomorphism graph $G_{p}$.

Definition 7. Social network graph.

Given a social network graph $G = (V, E)$, the vertex set $V$ denotes the social individuals and the edge set $E$ denotes the relationships among the social individuals. Each vertex and edge has its identity and attributes, which include

(a) Identifier attribute (ID) of vertex $v_{i}$ as $v_{i}^{ID}$

(b) Quasi-identifier attribute (QI) of vertex $v_{i}$ as $QI = \{v_{i}^{N_{1}}, \dots, v_{i}^{N_{s}}, v_{i}^{C_{1}}, \dots, v_{i}^{C_{t}}\}$

(d) Quasi-identifier attribute (QI) of edge $e_{j}$ as $QI = \{e_{j}^{N_{1}}, \dots, e_{j}^{N_{p}}, e_{j}^{C_{1}}, \dots, e_{j}^{C_{q}}\}$

(e) Sensitive attribute (SA) of edge $e_{j}$ as $SA = e_{j}^{S_{1}}$

(f) Other attributes (OA)

The attribute of an edge is indexed by the vertex pair ($v_{i}^{ID}, v_{j}^{ID}$), and the total number of vertexes is $N = |V|$. In the superscripts, $N$ marks a numeric QI attribute and $C$ marks a character QI attribute, and $s$, $t$, $p$, and $q$ denote the respective numbers of QI attributes. For example, Figure 1 is a friendship social network in which each vertex is a customer and each edge denotes the relationship between two vertexes. Table 2 gives the primal data of each vertex in Figure 1(a). Table 3 is the edge table: Eid is the sequence number of the edge, Vid1 and Vid2 are the sequence numbers of its two vertexes in Figure 1(b), and the weighted relationship is the relationship between Vid1 and Vid2 in Figure 1(c). Table 4 is another relational data table for the vertexes of Figure 1(a).

Table 2

The vertex table.

Vid	Name	Age	Gender	ZIP	Salary
1	Kya	25	F	261186	9000
2	John	25	M	261185	5000
3	Mira	26	F	261101	27000
4	Maci	26	F	261131	10000
5	Cathy	27	F	261131	7500
6	Kurt	27	M	261124	30000
7	Sage	27	F	261186	9500
8	Rain	37	M	261185	28000
9	Toni	35	M	261124	22000

Table 3

The edge table.

Eid	Vid1	Vid2	Weighted relationship
1	1	2	1
2	1	4	3
3	2	3	2
4	2	4	1
5	2	5	2
6	3	5	3
7	3	7	2
8	4	6	2
9	5	6	1
10	5	9	1
11	7	8	1
12	7	9	3
13	8	9	1

Table 4

Table of primal personal health information.

Name	Age	Gender	ZIP	Disease
Kya	25	F	261186	Cold
John	25	M	261185	Hypertension
Mira	26	F	261101	AIDS
Maci	26	F	261131	Cold
Cathy	27	F	261131	Cancer
Kurt	27	M	261124	Obesity
Sage	27	F	261186	Pneumonia
Rain	37	M	261185	Diabetes
Toni	35	M	261124	Short breath

2.2. Sensitive Degree of Friend Relationship (SA) of Vertex and Edge

2.2.1. Sensitive Degree of Friend Relationship (SA) of Vertex

We use the influence matrix to represent the level of influence of vertex-sensitive attributes [19, 20]. We can use the influence matrix to meet the requirements of personalized privacy protection of users.

$t_{ij}$: the degree of influence that the $j$-th QI attribute of the $i$-th vertex has on that vertex's sensitive attribute.

$b_{i}$: the weight of the sensitive attribute value of the $i$-th vertex.

The influence matrix ${Vertex}_{m}$ has $m$ rows and $n + 1$ columns, where $m$ is the number of vertexes and $n$ is the number of QI attributes, so it can be described as

$${Vertex}_{m} = (t_{ij} \mid b_{i})_{m \times (n + 1)} = \begin{array}{ccccc|c} QI_{1} & QI_{2} & QI_{3} & \dots & QI_{n} & S \\ \hline t_{11} & t_{12} & t_{13} & \dots & t_{1n} & b_{1} \\ \dots & \dots & \dots & \dots & \dots & \dots \\ t_{(m-1)1} & t_{(m-1)2} & t_{(m-1)3} & \dots & t_{(m-1)n} & b_{m-1} \\ t_{m1} & t_{m2} & t_{m3} & \dots & t_{mn} & b_{m} \end{array} \tag{1}$$

The values of $t_{ij}$ and $b_{i}$ come from experts or from experience. For example, the weight of a QI attribute in Table 4 can be divided into 5 grades, 1, 0.8, 0.4, 0.1, and 0, and the weight of $S$ in Table 4 can also be divided into 5 grades, 0.10, 0.60, 0.70, 0.80, and 0.90. The cold is an ordinary disease, so its disease weight can be set to 0.1. Common cold (influenza) may have the character of a regional outbreak, so we define the weight of ZIP as 0.8. Common cold may also have a little to do with gender, so we define the weight of gender as 0.1. We then define the disease weights of obesity, short breath, hypertension, diabetes, pneumonia, cancer, and AIDS as 0.12, 0.31, 0.5, 0.6, 0.7, 0.91, and 0.92, respectively. According to Table 4, the influence matrix is as follows.

$${Vertex}_{m} = (t_{ij} \mid b_{i})_{9 \times (5 + 1)} = \begin{array}{ccccc|c} Nation & Occupation & Birthday & Gender & ZIP & Disease \\ \hline 0 & 0 & 0 & 0.1 & 0.8 & 0.1 \\ \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0.1 & 0.4 & 0 & 0 & 0.6 \\ 0 & 0.1 & 0 & 0 & 0 & 0.31 \end{array} \tag{2}$$
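As a rough illustration of how the rows of the influence matrix can be assembled, the sketch below builds the $S$ column for the nine vertexes of Table 4 from the disease weights stated above, and reproduces the first row of matrix (2). The helper `influence_row` and the dictionary layout are hypothetical; the QI weights of the rows not printed in matrix (2) are unknown and are therefore not filled in.

```python
# Disease weights b_i as stated in the text.
DISEASE_WEIGHT = {
    "Cold": 0.1, "Obesity": 0.12, "Short breath": 0.31, "Hypertension": 0.5,
    "Diabetes": 0.6, "Pneumonia": 0.7, "Cancer": 0.91, "AIDS": 0.92,
}

# (name, disease) pairs in the row order of Table 4.
TABLE_4 = [("Kya", "Cold"), ("John", "Hypertension"), ("Mira", "AIDS"),
           ("Maci", "Cold"), ("Cathy", "Cancer"), ("Kurt", "Obesity"),
           ("Sage", "Pneumonia"), ("Rain", "Diabetes"), ("Toni", "Short breath")]

# Column order of matrix (2).
QI_NAMES = ["Nation", "Occupation", "Birthday", "Gender", "ZIP"]


def influence_row(qi_weights, disease):
    """One row of Vertex_m: the n QI weights t_ij followed by b_i."""
    t = [qi_weights.get(name, 0.0) for name in QI_NAMES]
    return t + [DISEASE_WEIGHT[disease]]


# S column (b_1 ... b_9) for all vertexes of Table 4.
b = [DISEASE_WEIGHT[disease] for _name, disease in TABLE_4]

# First row of matrix (2): a cold, with ZIP weight 0.8 and gender weight 0.1.
first_row = influence_row({"Gender": 0.1, "ZIP": 0.8}, "Cold")

print(first_row)  # [0.0, 0.0, 0.0, 0.1, 0.8, 0.1]
print(b)          # [0.1, 0.5, 0.92, 0.1, 0.91, 0.12, 0.7, 0.6, 0.31]
```

The last two entries of `b` (0.6 for diabetes and 0.31 for short breath) agree with the last two rows printed in matrix (2).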

2.2.2. Sensitive Degree of Friend Relationship (SA) of Edge

We describe three kinds of relationship among the vertexes of the friend relationship graph in Figure 1: simple friend, good friend, and sweetheart (boyfriend or girlfriend). Graph (c) of Figure 1 is an example of a friend relationship graph, in which "1" represents a simple friend relationship between two vertexes, "2" represents a good friend relationship, "3" represents a sweetheart relationship, and "0" represents no relationship. Some relationships are more sensitive than others; for instance, many people in a same-sex sweetheart relationship do not want others to know about it. Different people therefore have different degrees of sensitivity about friend relationships, and personalized privacy protection must be provided according to the practical application.
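One simple way to hold this 0/1/2/3 encoding is a symmetric weight matrix. The sketch below is only an illustration of that data structure, built from the weighted edges of Table 3; `relationship_matrix` is a hypothetical helper and vertex ids are shifted to 0-based indexes.

```python
# (Vid1, Vid2, weight) rows of Table 3; weight 1 = simple friend,
# 2 = good friend, 3 = sweetheart, and 0 (no entry) = no relationship.
WEIGHTED_EDGES = [
    (1, 2, 1), (1, 4, 3), (2, 3, 2), (2, 4, 1), (2, 5, 2), (3, 5, 3), (3, 7, 2),
    (4, 6, 2), (5, 6, 1), (5, 9, 1), (7, 8, 1), (7, 9, 3), (8, 9, 1),
]


def relationship_matrix(n, weighted_edges):
    """Symmetric n x n matrix of relationship weights for an undirected graph."""
    w = [[0] * n for _ in range(n)]
    for u, v, weight in weighted_edges:
        w[u - 1][v - 1] = weight
        w[v - 1][u - 1] = weight
    return w


W = relationship_matrix(9, WEIGHTED_EDGES)
print(W[0])  # [0, 1, 0, 3, 0, 0, 0, 0, 0]
```

Row 0 says that vertex 1 is a simple friend of vertex 2 and has a sweetheart relationship with vertex 4, matching Eid 1 and Eid 2 of Table 3.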

2.3. $α(d, k)$-Anonymity Graph

In order to make it impossible for an attacker to infer, with high probability, the real correspondence between targeted individuals and vertexes, the $k$-anonymity concept of data tables and the new concept of $α(d, k)$-anonymity are introduced.

Definition 8. $α(d, k)$-Anonymity of the vertex.

Given an undirected graph $G = (V, E)$ and its anonymous publishing graph $G_{p} = (V_{p}, E_{p})$, if for a vertex $v \in V$ there are at least $k - 1$ vertexes $u_{1}, u_{2}, \dots, u_{k - 1} \in V_{p}$ in $G_{p}$ such that ${Neighbor}_{d}(v) ≅ {Neighbor}_{d}(u_{i})$ and $v \neq u_{i}$ for $i = 1, 2, \dots, k - 1$, then the vertex $v$ is $(d, k)$-anonymous. The vertex $v$ is $α(d, k)$-anonymous with respect to $α$, where $α$ is the weight of the relationships (edge weights) in the $d$-neighborhood of vertex $v$.

For example, in Figure 1, $α = \{1, 2, 3\}$ for vertex $F$ (Sage) and $α = \{1, 2, 3\}$ for vertex $H$ (Maci), so vertex $F$ and vertex $H$ satisfy $α(1, 2)$-anonymity.
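The edge weights in this example can be recovered from Table 3. The sketch below groups vertexes by the sorted weights of their incident edges; this is a deliberate simplification, since equal weight signatures are only a necessary condition for the $d = 1$ neighborhoods to be isomorphic, not a sufficient one. The function names are hypothetical, and vertex ids 4 and 7 are Maci and Sage in Table 2.

```python
from collections import defaultdict

# (Vid1, Vid2, weight) rows of Table 3.
WEIGHTED_EDGES = [
    (1, 2, 1), (1, 4, 3), (2, 3, 2), (2, 4, 1), (2, 5, 2), (3, 5, 3), (3, 7, 2),
    (4, 6, 2), (5, 6, 1), (5, 9, 1), (7, 8, 1), (7, 9, 3), (8, 9, 1),
]


def incident_weight_signature(weighted_edges):
    """Map each vertex to the sorted tuple of weights on its incident edges."""
    weights = defaultdict(list)
    for u, v, w in weighted_edges:
        weights[u].append(w)
        weights[v].append(w)
    return {vertex: tuple(sorted(ws)) for vertex, ws in weights.items()}


def vertices_by_signature(weighted_edges):
    """Group vertexes that share the same incident-weight signature."""
    groups = defaultdict(list)
    for vertex, sig in incident_weight_signature(weighted_edges).items():
        groups[sig].append(vertex)
    return {sig: sorted(vs) for sig, vs in groups.items()}


groups = vertices_by_signature(WEIGHTED_EDGES)
print(groups[(1, 2, 3)])  # [4, 7]
```

Only Maci (Vid 4) and Sage (Vid 7) have the weight set {1, 2, 3}, which is the pair named in the example; a full check of Definition 8 would also compare the structure of the two neighborhoods.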

Definition 9. $α(d, k)$-Anonymity of the graph.

Given an undirected graph $G = (V, E)$ and its anonymous publishing graph $G_{p} = (V_{p}, E_{p})$, if every vertex $v \in V$ is $(d, k)$-anonymous, then the graph $G_{p}$ is $(d, k)$-anonymous; if every vertex $v \in V$ is $α(d, k)$-anonymous, then the graph $G_{p}$ is $α(d, k)$-anonymous.

Definition 10. Individual information leakage.

Suppose graph $G_{p}$ is the anonymized publishing graph of social network graph $G$. If, for the relative sensitivity coefficients $k$ and $l$, any one of the following four conditions holds, then individual information leakage exists. Conversely, if the graph $G_{p}$ ensures that none of the following circumstances can happen, the anonymized publishing graph $G_{p}$ is regarded as secure. If the graph $G_{p}$ ensures that circumstances (1) and (2) cannot happen, then the anonymized publishing graph $G_{p}$ is $k$-secure [21].

(1) Vertex Leakage. The probability of ascertaining the corresponding relationship between the vertex in the graph $G_{p}$ and the target individual A in the primal graph $G$ is greater than $1 / k$

(2) Edge Leakage. The probability of ascertaining the corresponding relationship between the edge in the graph $G_{p}$ and the edge in the primal graph $G$ is greater than $1 / k$

(3) Leakage of Vertex Sensitive Information. The probability of ascertaining the sensitive information of target individual A in the primal graph $G$ is greater than $1 / l$

(4) Leakage of Edge Sensitive Information. The probability of ascertaining the sensitive information of the edge in the primal graph $G$ is greater than $1 / l$

2.4. Personalized $α, β, l, k$ -Anonymity

2.4.1. Personalized $α, β, l, k$ -Anonymity Model

Personalized $α, β, l, k$ -anonymity satisfies the following conditions:

(1) Personalized $α, β, l, k$-anonymity satisfies $α(d, k)$-anonymity

(2) If $b_{i} < β$ in matrix ${Vertex}_{m}$ for every vertex of a $k$-isomorphism vertex group, all vertexes in that group can be published directly. Otherwise, condition (3) and condition (4) must be satisfied

(3) $L = \sum_{j = 1, \dots, k - 1} count(b_{i} - b_{j} > 0)$, $1 \leq i \leq |VCS|$, where $b_{i}$ and $b_{j}$ are entries of the $S$ column vector of influence matrix ${Vertex}_{m}$ and $i \neq j$; $L$ is the number of different sensitive attribute values

(4) If $P = count({MAX}_{i = 1, \dots, m}(t_{ik}) = 1) > 0$ in influence matrix ${Vertex}_{m}$, then, when $t_{ik}$ is generalized, the generalization hierarchy is promoted under the precondition of anonymity, or the value is suppressed directly [19]. $P$ denotes the degree of sensitivity between $QI_{k}$ and $S$ in influence matrix ${Vertex}_{m}$; $P = 1$ means that $t_{ik}$ influences the sensitivity of $b_{i}$

Here, the threshold $β$ in condition (2) is the importance parameter of the sensitive attribute. If the sensitive attribute weights of an equivalence class ($VCS$) are all less than $β$, the sensitive attributes of the vertexes in this $k$-isomorphism vertex group cannot affect their privacy, and all of its vertexes can be published directly. Otherwise, condition (3) and condition (4) must be satisfied. If $L > 0$, the number of different sensitive attribute values is greater than or equal to 2, so $L$ ensures diversity of the sensitive attribute.
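Conditions (2) and (3) amount to a per-group decision that can be sketched as follows. This is an illustration only: the function name, the returned labels, and the three example groups are hypothetical (the weights are the disease weights of Section 2.2.1, but the grouping is invented for the example), and condition (4) on the QI influence weights is not modeled.

```python
def publish_decision(b_values, sensitive_values, beta, l):
    """Classify one k-isomorphism vertex group (VCS).

    b_values: sensitivity weights b_i of the vertexes in the group
    sensitive_values: the sensitive attribute values of the same vertexes
    """
    if all(b < beta for b in b_values):
        return "publish directly"        # condition (2)
    if len(set(sensitive_values)) >= l:
        return "diverse enough"          # condition (3): at least l distinct values
    return "generalize or suppress"      # sensitive values need further processing


# Hypothetical groups, using the disease weights from Section 2.2.1.
low = publish_decision([0.1, 0.1, 0.12], ["Cold", "Cold", "Obesity"], beta=0.2, l=2)
mixed = publish_decision([0.92, 0.91, 0.7], ["AIDS", "Cancer", "Pneumonia"], beta=0.2, l=2)
same = publish_decision([0.91, 0.91, 0.91], ["Cancer"] * 3, beta=0.2, l=2)

print(low)    # publish directly
print(mixed)  # diverse enough
print(same)   # generalize or suppress
```

The third group is the homogeneous case: every vertex carries the same highly sensitive value, so publishing it unchanged would reveal that value for each member of the group.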

2.4.2. Personalized $α, β, l, k$ -Anonymity Example

The definition and the process of personalized $α, β, l, k$-anonymity are illustrated by the following example, based on Figure 2.

[figure(s) omitted; refer to PDF]

Figure 1(a) is the subgraph $G$ of a social relationship network, for which isomorphic subgraphs are constructed. The resulting 3-isomorphism subgraphs are shown in Figure 2.

In Figure 2, (a) is the initial subgraph from Figure 1, and (b) and (c) are the graphs isomorphic to (a). In the resulting graph, the number of vertexes $|V_{p}|$ is 27 and the number of edges $|E_{p}|$ is 39. Therefore, $27 / 3 = 9$ 3-isomorphism vertex groups and $39 / 3 = 13$ 3-isomorphism edge groups are created; they are listed in Tables 5 and 6.

Table 5

9 vertex groups of 3-isomorphism.

VCS	$g_{1}$	$g_{2}$	$g_{3}$
1	$A_{1}$	$A_{2}$	$A_{3}$
2	$B_{1}$	$B_{2}$	$B_{3}$
3	$C_{1}$	$C_{2}$	$C_{3}$
4	$D_{1}$	$D_{2}$	$D_{3}$
5	$E_{1}$	$E_{2}$	$E_{3}$
6	$F_{1}$	$F_{2}$	$F_{3}$
7	$G_{1}$	$G_{2}$	$G_{3}$
8	$H_{1}$	$H_{2}$	$H_{3}$
9	$I_{1}$	$I_{2}$	$I_{3}$

Table 6

13 edges groups of 3-isomorphism.

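ECS	$g_{1}$	$g_{2}$	$g_{3}$
1	$(A_{1}, B_{1})$	$(A_{2}, B_{2})$	$(A_{3}, B_{3})$
2	$(A_{1}, I_{1})$	$(A_{2}, I_{2})$	$(A_{3}, I_{3})$
3	$(B_{1}, C_{1})$	$(B_{2}, C_{2})$	$(B_{3}, C_{3})$
4	$(B_{1}, G_{1})$	$(B_{2}, G_{2})$	$(B_{3}, G_{3})$
5	$(B_{1}, I_{1})$	$(B_{2}, I_{2})$	$(B_{3}, I_{3})$
6	$(C_{1}, D_{1})$	$(C_{2}, D_{2})$	$(C_{3}, D_{3})$
7	$(C_{1}, G_{1})$	$(C_{2}, G_{2})$	$(C_{3}, G_{3})$
8	$(D_{1}, E_{1})$	$(D_{2}, E_{2})$	$(D_{3}, E_{3})$
9	$(D_{1}, F_{1})$	$(D_{2}, F_{2})$	$(D_{3}, F_{3})$
10	$(E_{1}, F_{1})$	$(E_{2}, F_{2})$	$(E_{3}, F_{3})$
11	$(F_{1}, G_{1})$	$(F_{2}, G_{2})$	$(F_{3}, G_{3})$
12	$(G_{1}, H_{1})$	$(G_{2}, H_{2})$	$(G_{3}, H_{3})$
13	$(H_{1}, I_{1})$	$(H_{2}, I_{2})$	$(H_{3}, I_{3})$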
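The two tables follow mechanically from Definitions 5 and 6 once the three copies are labeled consistently. The sketch below regenerates the groups under that assumption (vertex X of copy i is named Xi, so the isomorphism between copies simply preserves the letter); `vertex_groups` and `edge_groups` are hypothetical helpers.

```python
from string import ascii_uppercase

K = 3
LABELS = list(ascii_uppercase[:9])  # 'A'..'I', the nine vertexes of one copy

# Edges of one copy, as listed in the first column of Table 6.
BASE_EDGES = [("A", "B"), ("A", "I"), ("B", "C"), ("B", "G"), ("B", "I"),
              ("C", "D"), ("C", "G"), ("D", "E"), ("D", "F"), ("E", "F"),
              ("F", "G"), ("G", "H"), ("H", "I")]


def vertex_groups(labels, k):
    """One VCS per base vertex, holding its k isomorphic copies."""
    return [[f"{x}{i}" for i in range(1, k + 1)] for x in labels]


def edge_groups(base_edges, k):
    """One ECS per base edge, holding its k isomorphic copies."""
    return [[(f"{u}{i}", f"{v}{i}") for i in range(1, k + 1)]
            for u, v in base_edges]


vcs = vertex_groups(LABELS, K)
ecs = edge_groups(BASE_EDGES, K)

print(len(vcs), len(ecs))  # 9 13
print(vcs[0])              # ['A1', 'A2', 'A3']
print(ecs[0])              # [('A1', 'B1'), ('A2', 'B2'), ('A3', 'B3')]
```

The group counts are $|V_{p}| / k = 27 / 3$ and $|E_{p}| / k = 39 / 3$, as stated before Table 5.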

Next, the identifier attributes of the nine 3-isomorphism vertex groups are generalized according to parameter $β$, which turns the isomorphism vertex groups VCS into equivalence-class vertex groups QI. Age, gender, and ZIP are identifier attributes, and disease is the sensitive attribute. The inheritance hierarchy tree of ZIP is shown in Figure 3, and the inheritance hierarchy tree of disease is shown in Figure 4 [21].

[figure(s) omitted; refer to PDF]

The attributes of the vertexes in the isomorphism groups VCS 1 ($A_{1}$, $A_{2}$, $A_{3}$) and VCS 2 ($B_{1}$, $B_{2}$, $B_{3}$) are listed in Table 7. After generalization, the identifier attribute values gen(VCS) are as shown in Table 8 [15].

Table 7

Example of isomorphism groups vertex’s attributes values.

VCS	Num	Race	Occupation	Age	Gender	ZIP	Disease
1	$A_{1}$	Asian	Salesman	25	M	150086	Flu
	$A_{2}$	Asian	Salesman	35	F	150084	Flu
	$A_{3}$	Black	Teacher	35	M	150081	Mammary cancer
2	$B_{1}$	White	Teacher	45	F	150090	Lung cancer
	$B_{2}$	White	Driver	45	M	150041	Lung cancer
	$B_{3}$	White	Driver	50	M	150024	Lung cancer

Table 8

Example of isomorphism groups’ generalization identifier attributes values.

VCS	Num	Race	Occupation	Age	Gender	ZIP	Disease
1	$A_{1}$	Asian	Salesman	[25,35)	$*$	15008 $*$	Flu
	$A_{2}$	Asian	Salesman	[25,35)	$*$		Flu
	$A_{3}$	Black	Teacher	[25,35)	$*$		Mammary cancer
2	$B_{1}$	White	Teacher	[45,50)	F	1500 $* *$	$*$
	$B_{2}$	White	Driver	[45,50)	M		$*$
	$B_{3}$	White	Driver	[45,50)	M		$*$
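The age and ZIP columns of Table 8 can be approximated from Table 7 with two small helpers. This is a sketch under stated assumptions, not the authors' implementation: ZIP generalization is modeled as keeping the common leading digits (a stand-in for climbing the hierarchy tree of Figure 3, which is not reproduced here), numeric generalization uses the closed interval [min, max], whereas Table 8 prints the ranges in half-open notation, and the helper names are hypothetical.

```python
def generalize_numeric(values):
    """Cover the group's numeric values with the closed interval [min, max]."""
    return (min(values), max(values))


def common_prefix(strings):
    """Longest prefix shared by all strings in a non-empty list."""
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def generalize_zip(zips):
    """Keep the shared leading digits and mask the remaining ones with '*'."""
    prefix = common_prefix(zips)
    return prefix + "*" * (len(zips[0]) - len(prefix))


# Ages and ZIP codes of VCS 1 and VCS 2, copied from Table 7.
vcs1_age, vcs1_zip = [25, 35, 35], ["150086", "150084", "150081"]
vcs2_age, vcs2_zip = [45, 45, 50], ["150090", "150041", "150024"]

print(generalize_numeric(vcs1_age), generalize_zip(vcs1_zip))  # (25, 35) 15008*
print(generalize_numeric(vcs2_age), generalize_zip(vcs2_zip))  # (45, 50) 1500**
```

The masked ZIP values agree with the ZIP column of Table 8; the less similar the ZIP codes within a group, the more digits are masked and the more information is lost.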

3. Personalized $α, β, l, k$ -Anonymity Algorithm

The basic principle of the algorithm is as follows: the $k$-isomorphism graph $G_{p} = \{g_{1}, \dots, g_{e}\}$ is obtained; the identifier attributes and sensitive attributes of each $k$-isomorphism vertex group VCS are generalized; and the identifier attributes and sensitive attributes of each edge group ECS are generalized. Generalization is not always executed, in particular when the differences in type do not affect $l$-diversity [22]. The input parameter $α \in \{0, 1, 2, 3\}$ indicates the generalization type: when $α = 0$, static generalization is used, and when $α > 0$, generalization is based on graph isomorphism, with $α$ indicating the degree of sensitivity between nodes. When $α \neq 0$, an $α(d, k)$-anonymity graph is obtained, and the $d$-neighborhood attack and the structure attack on the graph can be prevented [23, 24]. When $α = 0$, the input parameter $β$ is the generalization threshold [19, 22]; background knowledge attacks and homogeneity attacks can be prevented effectively by anonymizing the vertex data of the social network, and diversity of the sensitive attribute is achieved. The personalized $α, β, l, k$-anonymity algorithm for $α = 0$ is given below; the algorithm for $α \neq 0$ has been given in another paper published by the authors [23].

Algorithm 1: Personalized $α, β, l, k$-anonymity algorithm.

Inputs:
    Initial anonymous graph G = (V, E);
    Sensitivity parameters: k' (k' ≥ 2); l (2 ≤ l ≤ k'); m (l ≤ m ≤ |V|);
    Node attributes table: AS = {v_i^S, v_i^N(1), …, v_i^N(s), v_i^C(1), …, v_i^C(t)};
    Edge attributes table: AS = {e_j^S, e_j^N(1), …, e_j^N(p), e_j^C(1), …, e_j^C(q)};
    Inheritance trees H_C of all classified attributes;
    Input parameters α = 0 and β.

Outputs:
    Anonymous graph G_p = {g₁, g₂, …, g_e};
    All VCS and ECS groups and their attribute information.

Steps:
    obtain the anonymous graph G_p and the groups VCS and ECS;
    read α to determine the generalization type;
    compute the numbers of groups: N_VCS = |V_P| / k, N_ECS = |E_P| / k;
    for i = 1 to N_VCS do    // QI attribute generalization of vertex groups
        for j = 1 to s do    // numeric QI attributes
            gen(VCS_i)[N_j] = [min{v₁^N(j), …, v_k^N(j)}, max{v₁^N(j), …, v_k^N(j)}]
        end for
        for j = 1 to t do    // classified QI attributes
            gen(VCS_i)[C_j] = {v₁^C(j), …, v_k^C(j)}
        end for
        while |SA(VCS_i)| < l && |SA(VCS_i)| / |SA(V_P)| > β do
            if (sensitive attributes are classified) then
                for j = 1 to k do
                    replace v_j^C by its parent node in the classified inheritance tree of the sensitive attribute;
                    if |SA(VCS_i)| ≥ l then exit the while loop
                end for
            else
                for j = 1 to k do
                    change the interval of v_j^N to its neighborhood;
                    if |SA(VCS_i)| ≥ l then exit the while loop
                end for
            end if
        end while
    end for
    for i = 1 to N_ECS do    // QI attribute generalization of edge groups
        for j = 1 to p do    // numeric QI attributes
            gen(ECS_i)[N_j] = [min{e₁^N(j), …, e_k^N(j)}, max{e₁^N(j), …, e_k^N(j)}]
        end for
        for j = 1 to q do    // classified QI attributes
            gen(ECS_i)[C_j] = {e₁^C(j), …, e_k^C(j)}
        end for
        while |SA(ECS_i)| < l && |SA(ECS_i)| / |SA(E_P)| > β do
            if (sensitive attributes are classified) then
                for j = 1 to k do
                    replace e_j^C by its parent node in the classified inheritance tree of the sensitive attribute;
                    if |SA(ECS_i)| ≥ l then exit the while loop
                end for
            else
                for j = 1 to k do
                    change the interval of e_j^N to its neighborhood;
                    if |SA(ECS_i)| ≥ l then exit the while loop
                end for
            end if
        end while
    end for
    publish the anonymous graph G_p, all VCS nodes and ECS edges, and their attribute information

4. Experiments and Results

The experiments were run on a PC with an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz and 8 GB of memory, running Microsoft Windows 7. The programs were coded and compiled in the VS 2019 IDE.

The vertex (node) data set in these experiments comes from the Adult census data set of the UC Irvine Machine Learning Repository [25, 26]. There are two experimental examples, with 300 and 1000 vertexes, respectively. For these vertexes, six attributes are considered in the experiments: age, occupation, race, gender, ZIP, and disease. Of these attributes, age is numeric and the others are categorical. The attribute disease is the sensitive attribute. The edge set in these experiments is generated randomly with the Pajek software, and the numbers of nodes are, respectively, 5000, 10000, 15000, 20000, and 25000.

Information loss was compared between the algorithm proposed in this paper and the algorithm of paper [15], using the information loss measure from paper [15]. The algorithm in this paper is named the ACIM (anonymous composite improved model) algorithm, and the algorithm in paper [15] is named the ACM (anonymous composite model) algorithm. In the personalized $α, β, l, k$-anonymity algorithm ($α = 0$), parameter $β$ is used to keep the data usable and close to the original: when the sensitive attribute weights of a vertex group are below the threshold $β$, all of its vertexes (data) are published directly, which reduces the degree of data distortion [19].

When $α \neq 0$, a number of nodes and edges must be added to the initial graphs. The more the structures differ, the more nodes and edges must be added, and the larger the information loss becomes.

Some nodes are added to construct the isomorphic graphs. Figure 5 shows the percentage of added nodes among all the nodes of the graph, and Figure 6 shows the corresponding situation for edges. For $k = 5$, $k = 10$, and $k = 15$, the rate at which nodes and edges increase slows down. These added nodes and edges are redundant data.

[figure(s) omitted; refer to PDF]

Figures 7–9 show the information loss for $k = 5$, $k = 10$, and $k = 15$. The degree of information loss increases with the number of nodes, with $k$, and with $d$. The reason is that the candidate set becomes larger as the data scale increases, and finding similar neighborhoods becomes easier [19].

[figure(s) omitted; refer to PDF]

When $α = 0$, the information loss of attribute generalization is compared for $k$ values of 5, 10, 15, 20, and 25. Figures 10 and 11 show the comparison results.

[figure(s) omitted; refer to PDF]

Figure 10 shows that as $k$ increases, the demand for privacy protection becomes higher, which leads to a clear increase in information loss. In the comparison, the loss of ACIM is lower than that of ACM, because ACIM adds threshold judgments and therefore does not generalize every situation of the same type. In Figure 11, where the number of nodes is larger, the generalization information loss is lower, which keeps the node information usable for users.

Figures 12 and 13 compare the generalization information loss for different values of $β$. With a higher $β$, fewer vertex attributes need to be changed, so the information loss rate is lower. That is, parameter $β$ keeps the vertex information usable while meeting personalized needs.

[figure(s) omitted; refer to PDF]

5. Conclusion

We studied $k$-anonymity techniques and their application to relational databases and social networks, and proposed a personalized $α, β, l, k$-anonymity model for social networks. Extensive experiments were carried out with the personalized $α, β, l, k$-anonymity algorithm. The experimental results show that the $d$-neighborhood attack on the graph, the background knowledge attack, and the homogeneity attack can be prevented effectively by anonymizing vertexes and edges and by using the influence matrix based on background knowledge. Diversity of the vertex-sensitive attribute can be achieved, and personalized privacy protection requirements can be met through the parameters $α, β, l, k$.

Acknowledgments

The authors want to thank the helpful comments and suggestions from the anonymous reviewers. This work was supported in part by the Natural Science Foundation of Taizhou University (Grant no. TZXY2019QDJJ008) and the Natural Science Foundation of Heilongjiang Province of China (Grant no. JJ2019LH0048).

References

[1] X. M. Ren, B. X. Jia, K. C. Wang, J. Cheng, "Research on k-anonymity privacy protection of social network," Applied Mechanics and Materials, vol. 530-531, pp. 701-704, DOI: 10.4028/www.scientific.net/AMM.530-531.701, 2014.

[2] H. Miyajima, N. Shigei, H. Miyajima, Y. Miyanishi, S. Kitagami, N. Shiratori, "New privacy preserving clustering methods for secure multiparty computation," International Journal of Computer Science, vol. 6 no. 1, pp. 270-276, DOI: 10.5430/air.v6n1p27, 2016.

[3] K. Macwan, S. Patel, "Privacy preservation approaches for social network data publishing," Artificial Intelligence for Cyber Security: Methods, Issues and Possible Horizons or Opportunities, pp. 213-233, DOI: 10.1007/978-3-030-72236-4_9, 2021.

[4] R. Kumar, J. Novak, A. Tomkins, "Structure and evolution of online social networks," Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 611-617.

[5] Y. J. Luo, Q. Liu, Y. Wang, "Overview of protecting user privacy in social networks," Application Research of Computers, vol. 10, pp. 3061-3064, 2010.

[6] Y. Xiaowei, Z. Weitao, "On link privacy in randomizing social networks," Knowledge and Information Systems, vol. 28 no. 3, pp. 645-663, DOI: 10.1007/s10115-010-0353-5, 2011.

[7] J. Cheng, A. W. Fu, J. Liu, "K-isomorphism: privacy preserving network publication against structural attacks," Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 459-470.

[8] K. Liu, E. Terzi, "Towards identity anonymization on graphs," Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ACM, pp. 93-106, 2008.

[9] A. Campan, T. Truta, "Data and structural k-anonymity in social networks," Privacy, Security, and Trust in KDD, vol. 5456, pp. 33-54, DOI: 10.1007/978-3-642-01718-6_4, 2009.

[10] M. K. Sung, K. Y. Lee, J.-B. Shin, Y. D. Chung, "A privacy protection method for social network data against content/degree attacks," IEICE Transactions on Information and Systems, vol. E95-D no. 1, pp. 152-160, DOI: 10.1587/transinf.E95.D.152, 2012.

[11] E. Y. Baagyere, Z. Qin, H. Xiong, Q. Zhiguang, "The structural properties of online social networks and their application areas," International Journal of Computer Science, vol. 43 no. 2, pp. 270-276, 2016.

[12] N. Li, X.-L. Zhang, "Research on dynamic social network anonymity technology for protecting community structure," International Journal of Network Security, vol. 23 no. 4, pp. 576-587, 2021.

[13] X. K. Xiao, Y. F. Tao, "Personalized privacy preservation," Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 229-240.

[14] X. Zhang, J. Liu, H. Bi, J. Li, Y. Wang, "Personalized K-in&out-degree anonymity method for large-scale social networks based on hierarchical community structure," International Journal of Network Security, vol. 23 no. 2, pp. 314-325, 2021.

[15] L. Sweeney, "k-Anonymity: a model for protecting privacy," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10 no. 5, pp. 557-570, DOI: 10.1142/S0218488502001648, 2002.

[16] H. Wu, J. Zhang, B. Wang, J. Yang, B. Sun, "(d, k)-Anonymity for social networks publication against neighborhood attacks," Journal of Convergence Information Technology, vol. 8 no. 2, pp. 59-67, DOI: 10.4156/jcit.vol8.issue2.8, 2013.

[17] H. W. Wu, Research on Anonymity Techniques for Privacy-Preserving Data Publishing in Social Networks, 2013.

[18] L. Zou, L. Chen, M. T. Ozsu, "k-Automorphism," Proceedings of the VLDB Endowment, vol. 2 no. 1, pp. 946-957, DOI: 10.14778/1687627.1687734, 2009.

[19] X. Ren, J. Yang, F. Wei, "Research on CBK(L,K)-anonymity algorithm," International Journal of Advancements in Computing Technology, vol. 3 no. 4, pp. 165-173, DOI: 10.4156/ijact.vol3.issue4.18, 2011.

[20] L. I. Siyu, Research for Protecting Privacy of Social Network Data Based on Relevance Degree Perception, 2020.

[21] P. Samarati, L. Sweeney, "Generalizing data to provide anonymity when disclosing information," Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1998.

[22] B. Zhou, J. Pei, "The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks," KAIS, vol. 28 no. 1, pp. 47-77, DOI: 10.1007/s10115-010-0311-2, 2011.

[23] X. M. Ren, D. X. Jiang, K. C. Wang, R. A. N. Qi, "A personalized a (d, k)-anonymity for social network," 2017 2nd International Conference on Computer, Mechatronics and Electronic Engineering, pp. 167-174.

[24] B. Zhou, J. Pei, "Preserving privacy in social networks against neighborhood attacks," Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08, pp. 506-515, DOI: 10.1109/ICDE.2008.4497459.

[25] D. J. Newman, S. Hettich, C. L. Blake, C. J. Merz, UCI Repository of Machine Learning Databases, 1998. http://archive.ics.uci.edu/ml/datasets/Adult

[26] J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, A. W. C. Fu, "Utility-based anonymization for privacy preservation with less information loss," Journal of SIGKDD Explorations, vol. 8 no. 2, pp. 21-30, DOI: 10.1145/1233321.1233324, 2006.

Copyright © 2022 Xiangmin Ren and Dexun Jiang.

Abstract

By mining the data published on social networks, one can discover hidden information of value, including private information about individuals and organizations. Protecting the privacy of individuals and organizations on social networks has therefore become the focus of more and more researchers. Based on the actual privacy protection needs for edge-sensitive attributes and vertex-sensitive attributes, we propose a new personalized $α, β, l, k$-anonymity technique for privacy preservation that reduces the distortion of the data during the privacy processing of social network data. Experimental results of the personalized $α, β, l, k$-anonymity algorithm show that the $d$-neighborhood attack on the graph, the background knowledge attack, and the homogeneity attack can be prevented effectively by anonymizing vertexes and edges and by using the influence matrix based on background knowledge. Diversity of the vertex-sensitive attribute can be achieved, and personalized privacy protection requirements can be met through the parameters $α, β, l, k$.

Details

Title

A Personalized α,β,l,k-Anonymity Model of Social Network for Protecting Privacy

Author

Ren, Xiangmin¹; Jiang, Dexun²

¹ College of Computer Science and Technology, Taizhou University, Taizhou, Jiangsu Province, China
² School of Information Engineering, Harbin University, Harbin, Heilongjiang Province, China

Editor

Deepak Kumar Jain

Publication year

2022

Publication date

2022

Publisher

John Wiley & Sons, Inc.

e-ISSN

15308677

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2022/7187528

ProQuest document ID

2658000368
