Lourens J. Waldorp 1 and Verena D. Schmittmann 2
Academic Editor:Frank Werner
1, University of Amsterdam, Weesperplein 4, 1018 XA Amsterdam, Netherlands
2, Tilburg University, Warandelaan 2, 5037 AB Tilburg, Netherlands
Received 13 May 2014; Accepted 8 February 2015; 22 February 2015
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Assortative mixing by node degree (i.e., the number of connections of a node) is the tendency of nodes to be connected to other nodes of similar degree and an important concept in network analysis [1-3]. For example, assortative mixing in social networks could reflect the notion that well-connected people, who know many people, have a tendency to know mainly other well-connected people [4]. Or in a network of actors, where connections indicate that actors worked together on a film, actors with many connections are likely to have worked together with other well-connected actors [5]. As a final example, in symptom networks in psychopathology, where connections represent symptoms belonging to the same disorder, it appears that core symptoms, which are common to multiple disorders, have a high tendency to be mutually connected [6]. Assortative mixing is becoming more and more important, because it points to relevant network characteristics, such as self-similarity and other emergent properties, if it is detected in networks with a power-law degree distribution [1, 7]. Here we propose to compute assortative mixing in undirected networks using linear programming.
Several measures of assortative mixing for undirected graphs exist [5]. One of the most popular ones is the Pearson assortativity coefficient, or degree correlation [figure omitted; refer to PDF] [1], which is Pearson's correlation applied to the degrees of each node in the network. Degree correlation is a normalized metric and obtains values between [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . However, Alderson and Li [8] have shown that the same value of degree correlation can occur from very different configurations of edges (topology) of a network and that the use of degree correlation may lead to incorrect conclusions. The reason for this is that degree correlation is normalized with respect to general networks with a specific degree sequence (number of connections for all nodes) that can have multiple edges, self-loops and can even be disconnected (i.e., consist of multiple disconnected components). However, in many situations the objective is to compare assortativity with networks that are similar, that is, that are connected and simple (no self-loops, connected, and no multiple edges). To remedy this objection to degree correlation, Li et al. [9] proposed the [figure omitted; refer to PDF] -metric (first called [figure omitted; refer to PDF] -metric and later [figure omitted; refer to PDF] -metric), which is a linear transformation of the degree correlation. In the normalized version of the [figure omitted; refer to PDF] -metric, normalization is calculated with the maximal and minimal values of [figure omitted; refer to PDF] with respect to the class of networks that have the desired properties of connectedness and simplicity for the specified degree sequence. The minimal ( [figure omitted; refer to PDF] ) and maximal ( [figure omitted; refer to PDF] ) possible values of [figure omitted; refer to PDF] in the class of simple and connected networks with the same degree sequence are compared to the obtained [figure omitted; refer to PDF] -value for the network at hand. Then, in contrast to the degree correlation coefficient [figure omitted; refer to PDF] which is normalized with respect to unrestricted networks of the same degree sequence, the normalized [figure omitted; refer to PDF] -metric is obtained by comparing [figure omitted; refer to PDF] of the network under consideration to similar networks that are within the same class of networks of the same degree sequence yet are maximal ( [figure omitted; refer to PDF] ) or minimal ( [figure omitted; refer to PDF] ) with respect to assortative mixing.
Obtaining the maximal and minimal network in the class of simple (undirected networks without self-loops and no multiple edges) and connected networks with the same degree sequence is not trivial [8]. The algorithm introduced in Alderson and Li [8] and described in van Mieghem et al. [2] ranks all edges according to the product of degrees of the pair of nodes and then connects nodes according to a nodes degree such that the graph is connected. Unfortunately, this algorithm does not always achieve the exact degree sequence as desired. Alternatively, van Mieghem et al. [2] proposed a rewiring approach increasing (or decreasing) the assortativity, while the degree sequence remains unchanged. The constraint of a connected graph here is sacrificed, however, resulting in possibly disconnected graphs. We propose to use a linear program (LP) to identify [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . Obtaining the minimum and maximum value of [figure omitted; refer to PDF] within the class of graphs with a specific degree sequence that are simple and connected is formulated as a binary integer program (BIP). A binary integer program is a program to optimize a linear objective function given certain linear constraints for a binary (0/1) solution (e.g., [10, 11]). Here the objective function to be optimized is [figure omitted; refer to PDF] and the constraints are that the optimal graph is simple and connected and has the specified degree sequence. The [figure omitted; refer to PDF] -metric problem resembles the traveling salesman problem (e.g., [12]), except that the degree constraints can be different from value 2. As a consequence we cannot use the approach of Held and Karp [13], where the problem is reformulated to a 1-tree problem and was shown to lead under certain circumstances to a linear program with the same solution as the binary integer problem. We prove a weaker result that leads to an efficient solution of the problem by moving the degree constraints to a penalty term in the optimization (Lagrangian relaxation), which is solved by a subgradient algorithm.
2. Correlation and the [figure omitted; refer to PDF] -Metric
Let [figure omitted; refer to PDF] be an undirected graph, where [figure omitted; refer to PDF] is the set of nodes [figure omitted; refer to PDF] and [figure omitted; refer to PDF] is the set of edges [figure omitted; refer to PDF] with size [figure omitted; refer to PDF] . The degree sequence [figure omitted; refer to PDF] of graph [figure omitted; refer to PDF] is the [figure omitted; refer to PDF] vector [figure omitted; refer to PDF] containing for each node the number of connections and the degrees [figure omitted; refer to PDF] of the nodes in [figure omitted; refer to PDF] . The degrees [figure omitted; refer to PDF] are not necessarily ascending or descending. We are mostly interested in simple connected graphs, that is, no self-loops and no multiple edges with a single component. This class of graphs with degree sequence [figure omitted; refer to PDF] is denoted by [figure omitted; refer to PDF] . The unconstrained class of graphs with self-loops, multiple edges, and possibly multiple components having degree sequence [figure omitted; refer to PDF] is denoted by [figure omitted; refer to PDF] . It is immediate that [figure omitted; refer to PDF] .
The degree correlation or assortative mixing of an undirected graph [figure omitted; refer to PDF] is defined in terms of the degree sequence as [1, 8] [figure omitted; refer to PDF] It is equivalent to the Pearson correlation coefficient and its value is between [figure omitted; refer to PDF] . The first term in the numerator calculates assortativity by degree for the graph [figure omitted; refer to PDF] [7]. The first term in the denominator [figure omitted; refer to PDF] can be interpreted as the maximal value for assortative mixing that can be obtained within the class of graphs that may have multiple connections and self-loops and need not be connected in [figure omitted; refer to PDF] . The second term in the numerator and in the denominator [figure omitted; refer to PDF] can be seen as the central point in the class of graphs in [figure omitted; refer to PDF] with minimal assortativity [8]. Hence, the Pearson degree correlation or assortativity can be considered as the normalized assortativity within the general class of graphs that may contain self-loops and multiple edges and need not be connected that are in [figure omitted; refer to PDF] (see the discussion on p. 502 of Li et al. [7] for more details).
Another insightful way of considering the degree correlation is given by van Mieghem et al. [2]. Let [figure omitted; refer to PDF] be the symmetric [figure omitted; refer to PDF] adjacency matrix with a 1 if two nodes are connected and 0 otherwise, and let [figure omitted; refer to PDF] be a diagonal matrix with the degrees on the diagonal and zero otherwise. The matrix [figure omitted; refer to PDF] is called the Laplacian matrix. Then it is shown that the degree correlation is [figure omitted; refer to PDF] From this version of degree correlation, it is easily seen that a regular graph has degree correlation [figure omitted; refer to PDF] , since all degrees are equal. Additionally, it is shown in van Mieghem et al. [2] that a connected Erdös-Renyi random graph has zero assortativity.
For a graph [figure omitted; refer to PDF] the [figure omitted; refer to PDF] -metric is defined by [7, 9] [figure omitted; refer to PDF] It is clear from its definition that the [figure omitted; refer to PDF] -metric can obtain values between [figure omitted; refer to PDF] and [figure omitted; refer to PDF] for the complete graph of size [figure omitted; refer to PDF] , [figure omitted; refer to PDF] . It is therefore convenient to normalize the [figure omitted; refer to PDF] -metric such that it obtains values between [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . The normalized [figure omitted; refer to PDF] -metric of graph [figure omitted; refer to PDF] is defined by [7, 9] [figure omitted; refer to PDF] where [figure omitted; refer to PDF] and [figure omitted; refer to PDF] refer to the minimal or maximal value of [figure omitted; refer to PDF] in a specific class, like [figure omitted; refer to PDF] . The value of the normalized [figure omitted; refer to PDF] -metric is between [figure omitted; refer to PDF] . The first term in the nominator of [figure omitted; refer to PDF] is [figure omitted; refer to PDF] , which is the same as that of degree correlation above. The difference between [figure omitted; refer to PDF] and [figure omitted; refer to PDF] is the normalization. The [figure omitted; refer to PDF] -metric is normalized with respect to the maximum and minimum obtainable in a specific class, whereas degree correlation is always normalized by the central and maximal values of assortativity by degree within the general class of graphs having self-loops, multiple connections and being possibly disconnected in [figure omitted; refer to PDF] .
Example 1.
We generated two topologically different graphs, shown in Figure 1. The first graph, shown in Figure 1(a), has [figure omitted; refer to PDF] nodes, [figure omitted; refer to PDF] edges, and ascending degree sequence [figure omitted; refer to PDF] . The other graph in Figure 1(b) has [figure omitted; refer to PDF] nodes, [figure omitted; refer to PDF] edges, and ascending degree sequence [figure omitted; refer to PDF] . The networks have similar degree sequence but the topology is different, as can be seen in Figure 1. However, both networks have approximately zero degree correlation (assortativity), [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , respectively. This suggests that the networks are similar in topology. The normalized [figure omitted; refer to PDF] -metric does pick up the topological differences. The values of the normalized [figure omitted; refer to PDF] -metric are [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , respectively, indicating that the networks can be correctly distinguished and may have a hub-like core. The values of [figure omitted; refer to PDF] are higher because the normalization is made within the class of simple, connected graphs, whereas the correlation coefficient [figure omitted; refer to PDF] is approximately zero with respect to normalization within the general class of graphs with self-loops and multiple connections that are possibly disconnected. This example shows that there are differences in degree sequence of a connected graph that are not picked up at all by degree correlation but are picked by the [figure omitted; refer to PDF] -metric; and the reason for this is that the normalization for the [figure omitted; refer to PDF] -metric is within [figure omitted; refer to PDF] whereas the normalization of the correlation is within the larger class [figure omitted; refer to PDF] .
Two example graphs with [figure omitted; refer to PDF] nodes and [figure omitted; refer to PDF] edges (a) and [figure omitted; refer to PDF] nodes and [figure omitted; refer to PDF] edges (b) with the degrees inside the nodes.
(a) [figure omitted; refer to PDF] [figure omitted; refer to PDF] , [figure omitted; refer to PDF]
[figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF] [figure omitted; refer to PDF] , [figure omitted; refer to PDF]
[figure omitted; refer to PDF]
The main practical difference between the correlation coefficient [figure omitted; refer to PDF] and normalized [figure omitted; refer to PDF] -metric is that the correlation coefficient does not distinguish certain graphs that are configurationally (topologically) different whereas the normalized [figure omitted; refer to PDF] -metric does distinguish between them. Alderson and Li [8] show that two networks can have a completely different topology but the correlations of both networks are approximately the same. Additionally, there is a large range of values of the coefficient of variation for which the correlation is approximately zero, whereas the [figure omitted; refer to PDF] -metric shows clear changes across the same range of values. There is also a set of graphs for which both the degree corruption and the normalized [figure omitted; refer to PDF] -metric are approximately zero (see Figure 1 in [8]). One solution to this is to compute the local assortativity [14]. Piraveenan et al. [3] show that assortative graphs can heave locally disassortative hubs, and vice versa, suggesting that local information can be relevant.
The main problem in calculating the normalized [figure omitted; refer to PDF] -metric identified by Alderson and Li [8] is that the normalization requires calculation of [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . The algorithm proposed in Li et al. [7] orders all edges [figure omitted; refer to PDF] according to the weights [figure omitted; refer to PDF] . Nodes are then connected corresponding to this ordering as long as the degree sequence remains unchanged and the graph is connected. Such a heuristic greedy search type algorithm seems to work reasonably well for the maximum within [figure omitted; refer to PDF] but does not always achieve the exact degree sequence [figure omitted; refer to PDF] [7]. Additionally, only the maximum is obtained, not the minimum, with this algorithm [8]. The minimum is usually approximated by a lower bound, the minimum in the class [figure omitted; refer to PDF] of unrestricted graphs. The alternative algorithm introduced in van Mieghem et al. [2] randomly rewires two links simultaneously (leaving the degree sequence unchanged) such that nodes with the highest degree are connected and nodes with the lowest degrees are connected, increasing assortativity. Although with this algorithm the correct degree sequence is obtained, the resulting graph is not necessarily connected. In the following sections we present an integer program using a Lagrangian relaxation that obtains both [figure omitted; refer to PDF] and [figure omitted; refer to PDF] for simple, connected graphs that have exactly the specified degree sequence.
Example 2.
We continue with the two graphs from the previous example with [figure omitted; refer to PDF] nodes shown in Figure 1. The graphs that maximize the [figure omitted; refer to PDF] -metric within the class of simple, connected graphs for [figure omitted; refer to PDF] and [figure omitted; refer to PDF] are shown in Figures 2(a) and 2(b). It can be seen that node degree is similar to that of neighbors, as the optimization requires. In contrast, the graphs with minimal value [figure omitted; refer to PDF] in Figures 2(c) and 2(d) have low degree nodes connected to high degree nodes. For [figure omitted; refer to PDF] the value for the graph is [figure omitted; refer to PDF] and the minimal and maximal values within the class [figure omitted; refer to PDF] are [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , so that [figure omitted; refer to PDF] The minimal and maximal values within [figure omitted; refer to PDF] are [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , respectively. For [figure omitted; refer to PDF] the value for the graph is [figure omitted; refer to PDF] and the minimal and maximal values within the class [figure omitted; refer to PDF] are [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , so that [figure omitted; refer to PDF] The minimal and maximal values within [figure omitted; refer to PDF] are [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , respectively. As expected the minimal values within [figure omitted; refer to PDF] , [figure omitted; refer to PDF] , were similar to the minimal values within [figure omitted; refer to PDF] . However, the maximal values are quite different for [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , indicating the importance that the maximal value needs to be determined specifically for [figure omitted; refer to PDF] .
Graphs that have maximal value [figure omitted; refer to PDF] ((a) and (b)) and minimal values ((c) and (d)) for the graphs [figure omitted; refer to PDF] and [figure omitted; refer to PDF] corresponding to the ones in Figure 1. Degrees are inside the nodes.
(a) [figure omitted; refer to PDF] [figure omitted; refer to PDF]
[figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF] [figure omitted; refer to PDF]
[figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF] [figure omitted; refer to PDF]
[figure omitted; refer to PDF]
(d) [figure omitted; refer to PDF] [figure omitted; refer to PDF]
[figure omitted; refer to PDF]
3. Linear and Integer Programming to Obtain [figure omitted; refer to PDF]
In this section we first give the formulation of the [figure omitted; refer to PDF] -metric problem for binary integer programming. We use the cutset formulation to ensure that the resulting graph is connected. We then show that in general the polytope for a linear program is not integral. However, we can relax the constraints and show that while constraining the solution to be integral, we need not constrain it to be binary, which is what we require.
The objective is to find a graph such that the minimal and maximal values of [figure omitted; refer to PDF] within the class of simple and connected graphs with degree sequence [figure omitted; refer to PDF] , [figure omitted; refer to PDF] , can be obtained to compute [figure omitted; refer to PDF] , the normalized [figure omitted; refer to PDF] -metric. We will show that obtaining the minimum (or maximum) can be reformulated into a binary integer program (BIP) with a linear relaxation such that a solution can be obtained within polynomial time.
3.1. Problem Formulation
The objective function [figure omitted; refer to PDF] is to be optimized subject to two constraints to obtain a graph in [figure omitted; refer to PDF] . Since maximization is equivalent to minimizing its negative, we limit our discussion to minimization. The objective function can be rewritten in terms of an incidence vector and the weights [figure omitted; refer to PDF] , the product of degrees for nodes [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . Let [figure omitted; refer to PDF] denote the set of all possible edges for the nodes in [figure omitted; refer to PDF] (Kronecker set). Then for an edge [figure omitted; refer to PDF] in the set [figure omitted; refer to PDF] , [figure omitted; refer to PDF] when an edge is present and [figure omitted; refer to PDF] otherwise. We can then rewrite the objective function [figure omitted; refer to PDF] for preferential attachment as [figure omitted; refer to PDF] To obtain a graph in the class of simple and connected graphs with degree sequence [figure omitted; refer to PDF] in [figure omitted; refer to PDF] , we require several types of constraint. The first is that the degree sequence of the graph should be exactly [figure omitted; refer to PDF] . Define, for any [figure omitted; refer to PDF] , the cut by the partition [figure omitted; refer to PDF] , where [figure omitted; refer to PDF] is the complement of [figure omitted; refer to PDF] . Denote the edges in the cut by [figure omitted; refer to PDF] . Then [figure omitted; refer to PDF] is the incidence set of node [figure omitted; refer to PDF] , and the degree constraint can be defined as [figure omitted; refer to PDF] . The second type of constraint is to ensure that the graph is connected, for which we use the cutset formulation [11, 15, 16]. The idea is that for any nonempty subset [figure omitted; refer to PDF] of nodes at least one edge has to connect both sets of vertices in [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . This implies that for any nonempty [figure omitted; refer to PDF] the cutset [figure omitted; refer to PDF] has to contain at least one edge. Finally, we require that the optimal solution is binary; that is, [figure omitted; refer to PDF] should be in [figure omitted; refer to PDF] . We now have the binary integer program (BIP) as follows: [figure omitted; refer to PDF] where [figure omitted; refer to PDF] means the integer part. The degree constraints in (10) combined with the cutset constraints (11) imply nonnegativity since [figure omitted; refer to PDF] for all [figure omitted; refer to PDF] . The cutset constraints (11) run to size [figure omitted; refer to PDF] since we do not need to check subsets twice. For any cut [figure omitted; refer to PDF] we have [figure omitted; refer to PDF] and so for a cut [figure omitted; refer to PDF] , we have that this [figure omitted; refer to PDF] is the same as [figure omitted; refer to PDF] when [figure omitted; refer to PDF] .
Example 3.
Consider a graph [figure omitted; refer to PDF] with [figure omitted; refer to PDF] nodes and [figure omitted; refer to PDF] edges shown in Figure 3 (solid lines). The edge set [figure omitted; refer to PDF] can be considered as an instance of the incidence vector [figure omitted; refer to PDF] of dimension [figure omitted; refer to PDF] . The incidence vector in this case is [figure omitted; refer to PDF] For this [figure omitted; refer to PDF] we have degree sequence [figure omitted; refer to PDF] and so [figure omitted; refer to PDF] With [figure omitted; refer to PDF] nodes there are [figure omitted; refer to PDF] constraints for the degrees to equal [figure omitted; refer to PDF] , giving the degree constraints (10). For connectedness there are sets [figure omitted; refer to PDF] of size [figure omitted; refer to PDF] , and so the number of cutsets to be checked is [figure omitted; refer to PDF] . For this [figure omitted; refer to PDF] the [figure omitted; refer to PDF] cutsets are shown in Table 1. For example, take the first partition [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . All elements in [figure omitted; refer to PDF] for this partition are in the first row. Since nodes 1 and 2 are in the same set [figure omitted; refer to PDF] , no edge is required here for connectedness; nodes 1 and 3 are in [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , respectively, and so an edge is required for connectedness. This is repeated for all different partitions [figure omitted; refer to PDF] and [figure omitted; refer to PDF] . In total there are 15 constraints from the degrees and the cutset constraints.
Table 1: The incidence vectors [figure omitted; refer to PDF] for different cutsets [figure omitted; refer to PDF] showing edges between [figure omitted; refer to PDF] and its complement [figure omitted; refer to PDF] .
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | |||||||||
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | [figure omitted; refer to PDF] | ||
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
[figure omitted; refer to PDF] | [figure omitted; refer to PDF] | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
Figure 3: Graph of Example 3. [figure omitted; refer to PDF] has [figure omitted; refer to PDF] nodes and [figure omitted; refer to PDF] edges (solid lines); all other possible edges including labels in [figure omitted; refer to PDF] are shown as dotted lines.
[figure omitted; refer to PDF]
Note that the number of constraints for this formulation is exponential in the size of the graph [figure omitted; refer to PDF] , because all nonempty subsets [figure omitted; refer to PDF] are required for connectedness, with [figure omitted; refer to PDF] (see, e.g., [11], p. 471). Specifically, the number of constraints to check whether all subsets [figure omitted; refer to PDF] of size [figure omitted; refer to PDF] connect to at least one other node is [figure omitted; refer to PDF] , where [figure omitted; refer to PDF] . One possibility is to change formulation to obtain a connected graph, for example, to a subtour elimination formulation, as used in minimal spanning trees (e.g., [11, 12, 15]). The subtour elimination formulation for the constraints in optimizing [figure omitted; refer to PDF] is [figure omitted; refer to PDF] However, the subtour elimination formulation is in this case inappropriate since the total number of edges can in general exceed [figure omitted; refer to PDF] , which means we are not dealing with a tree.
3.2. Linear and Integer Program
We are interested in designing a linear program that will result in incidence (binary) vectors such that the constraints in (10) and (11) are satisfied. We therefore need to bound the solution we obtain in [figure omitted; refer to PDF] to be at most 1; that is, [figure omitted; refer to PDF] Let [figure omitted; refer to PDF] (10), (11)} be the polyhedron associated with the [figure omitted; refer to PDF] -metric problem. Let [figure omitted; refer to PDF] (11)} be the polyhedron of the cutset constraints. The polyhedron [figure omitted; refer to PDF] resembles the subtour elimination polytope (SEP), associated with the symmetric traveling salesman problem [12]. The difference is that for the [figure omitted; refer to PDF] -metric problem the degrees are not fixed to be 2 but are equal to the original graph.
Example 4.
Consider a graph with three nodes and two edges, [figure omitted; refer to PDF] with [figure omitted; refer to PDF] and [figure omitted; refer to PDF] ; see Figure 4(a). Here, [figure omitted; refer to PDF] is the incidence vector. The cutset constraints (11) and the degree constraints (10) are in this example: [figure omitted; refer to PDF] This is a description of [figure omitted; refer to PDF] in three dimensions. If we consider the degree constraints only, then we see that the only extreme point is [figure omitted; refer to PDF] . This is because the face [figure omitted; refer to PDF] has dimension [figure omitted; refer to PDF] , since the three degree constraints are supported and its associated matrix has rank 3 (see Figure 4(b)). Note that this is not true for other points like [figure omitted; refer to PDF] . It is also seen that the polytope [figure omitted; refer to PDF] is not integral, since the point [figure omitted; refer to PDF] is an extreme point. This point is in the center of the triangle in Figure 4(c).
The graph [figure omitted; refer to PDF] from Example 4 in (a) and the polytope in (b) obtained from the degree constraints, and the polyhedron [figure omitted; refer to PDF] in (c) from only the cutset including the bound 1 constraints. The degree constraints in (b) show that their intersection at [figure omitted; refer to PDF] is the only possible solution here. The line in (b) indicates where the faces induced by the constraints [figure omitted; refer to PDF] and [figure omitted; refer to PDF] intersect.
(a) [figure omitted; refer to PDF]
(b) [figure omitted; refer to PDF]
(c) [figure omitted; refer to PDF]
From this example it is clear that in general we need to consider points in [figure omitted; refer to PDF] such that we only obtain integral solutions. Our problem is different from the traveling salesman problem approached by Held and Karp [13] as a 1-tree problem, because the degree sequence we seek for the solution is in our case fixed to the one of the original graph. It follows that any simple graph with the number of edges ≥n will not lead to a 1-tree. A general solution to the [figure omitted; refer to PDF] -metric problem cannot therefore be linked to a 1-tree. We can establish a weaker result than integrality of [figure omitted; refer to PDF] , though, which allows us to use a linear program effectively but is not guaranteed to lead to an integral result. When considering only the cutset constraints in [figure omitted; refer to PDF] , we find that the constraints of unity for each [figure omitted; refer to PDF] imply that considering the convex hull of the integral solutions will result in binary solutions.
Theorem 5.
Let [figure omitted; refer to PDF] be the convex hull over the integer points in [figure omitted; refer to PDF] (11), (16)}. An LP can be used to find the optimal solution such that [figure omitted; refer to PDF]
Proof.
We can rewrite the polyhedron [figure omitted; refer to PDF] such that it is associated with a set-covering problem. For such a set-covering problem we use Proposition 6.3 of Nemhauser and Wolsey [11] which, together with the constraint from cutset (11), then proves the claim.
In a set-covering problem the edges cover all nodes and can be described by a minimization over a polyhedron with constraints of the form [figure omitted; refer to PDF] , where [figure omitted; refer to PDF] is a characteristic binary vector representing a graph indexed by [figure omitted; refer to PDF] . Let a set [figure omitted; refer to PDF] of subsets of nodes [figure omitted; refer to PDF] such that (intersection) [figure omitted; refer to PDF] for all [figure omitted; refer to PDF] and (minimality) if [figure omitted; refer to PDF] then [figure omitted; refer to PDF] for some [figure omitted; refer to PDF] [11]. The set [figure omitted; refer to PDF] is called a blocking clutter and [figure omitted; refer to PDF] is called a clutter. Then [figure omitted; refer to PDF] is a subset of the nodes [figure omitted; refer to PDF] which represents the edges that connect a partition [figure omitted; refer to PDF] into [figure omitted; refer to PDF] and its complement [figure omitted; refer to PDF] . For any path from [figure omitted; refer to PDF] to [figure omitted; refer to PDF] the vector [figure omitted; refer to PDF] if it connects one node from [figure omitted; refer to PDF] to another node from [figure omitted; refer to PDF] and 0 otherwise. Let [figure omitted; refer to PDF] if the edge is on a path for some node [figure omitted; refer to PDF] to [figure omitted; refer to PDF] and 0 otherwise. Then the characteristic vectors [figure omitted; refer to PDF] with [figure omitted; refer to PDF] and [figure omitted; refer to PDF] with [figure omitted; refer to PDF] are a blocking clutter and a clutter, respectively. Since a path from [figure omitted; refer to PDF] to [figure omitted; refer to PDF] for some [figure omitted; refer to PDF] coincides with an edge that connects [figure omitted; refer to PDF] to [figure omitted; refer to PDF] for some [figure omitted; refer to PDF] (intersection property), we have that [figure omitted; refer to PDF] for some [figure omitted; refer to PDF] . This means that the [figure omitted; refer to PDF] -metric problem is a set-covering problem [figure omitted; refer to PDF] , where [figure omitted; refer to PDF] .
If we show that the extreme points of [figure omitted; refer to PDF] are exactly the paths from a node [figure omitted; refer to PDF] to a node [figure omitted; refer to PDF] , represented by the vectors [figure omitted; refer to PDF] with [figure omitted; refer to PDF] , then we are done. By the intersection property we find that if [figure omitted; refer to PDF] , then there is an [figure omitted; refer to PDF] such that [figure omitted; refer to PDF] for all [figure omitted; refer to PDF] . This follows because a path from [figure omitted; refer to PDF] to [figure omitted; refer to PDF] coincides with exactly one edge for an [figure omitted; refer to PDF] . This shows that the incidence vectors [figure omitted; refer to PDF] are the extreme points of [figure omitted; refer to PDF] . Since all characteristic vectors [figure omitted; refer to PDF] are incidence vectors and so in [figure omitted; refer to PDF] , the claim of the theorem follows.
This result, together with the fact that [figure omitted; refer to PDF] (10), (11)} is not integral, tells us that we may consider a relaxation of the [figure omitted; refer to PDF] -metric problem by dropping the degree constraints (10) and considering optimizing only over [figure omitted; refer to PDF] (11)}. One way of doing this is to incorporate the degree constraints (10) in the objective function; this is a Lagrangian relaxation of the degree constraints. This is similar to the approach of Held and Karp [13].
4. Relaxation of the Integer Problem
The linear relaxation of the [figure omitted; refer to PDF] -metric problem with the cutset formulation means dropping constraint (12) that [figure omitted; refer to PDF] is 0/1 [17]. The LP problem is then [figure omitted; refer to PDF] . Linear relaxation, however, does not guarantee an integral solution, and in fact we know that [figure omitted; refer to PDF] is not integral (see Example 4). As in the traveling salesman problem, the combinations of the degree (equality) constraints and the cutset constraints make optimization difficult (see, e.g., [12, 13, 18, 19]). Here we consider Lagrangian relaxation of the degree constraints. We use the subgradient method to find the optimal solution with the step size for the subgradient as suggested in Held and Karp [13] (see also Fisher [18]).
4.1. Lagrangian Relaxation
We consider Lagrangian relaxation where the degree constraints (10) are considered hard and so are entered as a penalty in the function to optimize. For fixed [figure omitted; refer to PDF] (positive or negative) we have the Lagrangian relaxation: [figure omitted; refer to PDF] Since we are dealing with equality constraints of degree [figure omitted; refer to PDF] we have that a solution [figure omitted; refer to PDF] for the Lagrange multiplier [figure omitted; refer to PDF] is also a solution to the original IP, because [figure omitted; refer to PDF] for this solution. It follows that [figure omitted; refer to PDF] . To obtain the bound closest to the original problem, the greatest lower bound is of interest, which is the Lagrangian dual [figure omitted; refer to PDF] . Combining this gives the Lagrangian dual [11]: [figure omitted; refer to PDF] This means that we can use an LP to find the Lagrangian dual. We know from Theorem 5 that this results in a binary solution, which is required; we cannot drop the integrality constraint in general since [figure omitted; refer to PDF] is not integral. One way to obtain a solution to the Lagrangian dual is by subgradient optimization.
4.2. Subgradient Optimization
For differentiable functions, minimization over convex functions is relatively straightforward [20]. If [figure omitted; refer to PDF] is differentiable and convex, then [figure omitted; refer to PDF] for all [figure omitted; refer to PDF] , and [figure omitted; refer to PDF] is the gradient of [figure omitted; refer to PDF] at [figure omitted; refer to PDF] . If [figure omitted; refer to PDF] is a local minimum then the gradient is 0 and it follows from convexity that [figure omitted; refer to PDF] is also a global minimum. For nonsmooth and/or nondifferentiable functions the gradient cannot be defined everywhere [18]. In the subgradient algorithm the gradient vector is replaced by a subgradient. Let the vector [figure omitted; refer to PDF] be a subgradient at [figure omitted; refer to PDF] if [figure omitted; refer to PDF] for all [figure omitted; refer to PDF] . Since the function [figure omitted; refer to PDF] is piecewise linear and convex, we can define a subgradient by the degree constraints [figure omitted; refer to PDF] [11, 21].
The essential step in this algorithm is to determine how to change the subgradient [figure omitted; refer to PDF] correctly. Like in the gradient method the subgradient is adjusted at each iteration in the direction most favorable to the constraints. A sequence of subgradients [figure omitted; refer to PDF] is generated by [figure omitted; refer to PDF] We then have that when [figure omitted; refer to PDF] , the set of all subgradients (subdifferential), we have obtained the global minimum. The subdifferential [figure omitted; refer to PDF] contains 0 if the step size [figure omitted; refer to PDF] [22]. The most often used way to determine the step size is by the so-called Held-Karp method (see, e.g., [18, 21]): [figure omitted; refer to PDF] where [figure omitted; refer to PDF] , [figure omitted; refer to PDF] , and [figure omitted; refer to PDF] denotes the Euclidean norm. The sequence [figure omitted; refer to PDF] is halved whenever [figure omitted; refer to PDF] remains unchanged for some number of iterations [18].
The subgradient algorithm for the Lagrangian dual is presented in Algorithm 1.
Algorithm 1: Subgradient algorithm for Lagrangian dual.
(1) [figure omitted; refer to PDF] , [figure omitted; refer to PDF] , and [figure omitted; refer to PDF]
(2) while [figure omitted; refer to PDF] do
(3) solve
[figure omitted; refer to PDF]
and [figure omitted; refer to PDF] is the optimal solution
(4) [figure omitted; refer to PDF]
(5) [figure omitted; refer to PDF] , where
[figure omitted; refer to PDF]
and [figure omitted; refer to PDF]
(6) [figure omitted; refer to PDF]
(7) end while
(8) the optimal solution is [figure omitted; refer to PDF]
(9) end
4.3. Computational Results
Computations were done with lpSolve (version 5.5) in [figure omitted; refer to PDF] [23]. The lpSolve software is a mixed integer linear programming solver that uses (revised) simplex methods for linear programming and branch and bound methods for integer programming [24].
We generated 100 connected random graphs (Erdös-Renyi) of nodes with size [figure omitted; refer to PDF] . Edges between each pair of nodes were determined by a probability of around 0.4 (requiring that the graph be connected), resulting in graphs with [figure omitted; refer to PDF] approximately 4470.
For the subgradient algorithm we used [figure omitted; refer to PDF] and [figure omitted; refer to PDF] , which was halved if the objective value did not improve [18]. The Lagrange multiplier starting value was [figure omitted; refer to PDF] , the maximum of degrees for all nodes.
Similar to Fisher [18] we report the percentage of problems for which the IP solution is exactly the same as the LD solution, the ratio of the average of the LD solutions and the average of the IP solutions, to indicate the sharpness of the bound, and the maximum of the subgradient, which should be (close to) 0. For the 100 simulations the percentage of exact solutions [figure omitted; refer to PDF] is 81.48 and the ratio [figure omitted; refer to PDF] is 99.96. This suggests that in many cases the result is exact and that if it is not exact then the bound using the Lagrangian dual is sharp. The subgradient was in all cases 0 or close to 0; the maximum subgradient over all simulations was [figure omitted; refer to PDF] , indicating that the degree constraints were satisfied in all cases.
5. Application to Psychopathology and the DSM IV
The diagnostic statistical manual (DSM) IV contains criteria to diagnose people as having certain disorders like general anxiety (GA) or major depression (MD) disorder. These criteria are a set of symptoms which need to be ascertained for a candidate patient. Many of the symptoms are part of different disorders and so it is "easy" to have multiple disorders. For instance, both GA and MD have a set of nine symptoms of which any five must be valid for a person to be suffering from that disorder. But GA and MD have four symptoms in common, and hence, many patients suffering from GA also suffer from MD (see [25] for an elaborate discussion on this).
A network can be created from the DSM IV by connecting symptoms that belong to the same disorder [6]. This network contains [figure omitted; refer to PDF] nodes and [figure omitted; refer to PDF] edges. In this network approximately [figure omitted; refer to PDF] of the nodes is connected, giving rise to a giant component. This giant component has [figure omitted; refer to PDF] nodes and [figure omitted; refer to PDF] edges. The network is shown in Figure 5. The range of the degree sequence is between [figure omitted; refer to PDF] and [figure omitted; refer to PDF] with median [figure omitted; refer to PDF] , and its distribution seems to fit an exponential distribution [6]. The degree correlation of the giant component is [figure omitted; refer to PDF] , while the normalized [figure omitted; refer to PDF] -metric is [figure omitted; refer to PDF] , suggesting that the graph is assortative. Again the degree correlation and normalized [figure omitted; refer to PDF] -metric obtain different numbers for the same graph, due to the comparison in either the general class of graphs for the degree correlation or in the restricted class of graphs for the normalized [figure omitted; refer to PDF] -metric.
Figure 5: The DSM IV graph constructed by connecting all symptoms that belong to the same disorder. The [figure omitted; refer to PDF] white nodes belong the giant component and the remaining [figure omitted; refer to PDF] gray nodes consist of [figure omitted; refer to PDF] components.
[figure omitted; refer to PDF]
The DSM IV graph constructed by connecting all symptoms that belong to the same disorder. The [figure omitted; refer to PDF] white nodes belong the giant component and the remaining [figure omitted; refer to PDF] gray nodes consist of [figure omitted; refer to PDF] components.
6. Discussion
Assortativity is for certain classes of graphs best calculated by the normalized [figure omitted; refer to PDF] -metric [8]. The existing algorithms by Li et al. [7] and van Mieghem et al. [2] to determine the normalized [figure omitted; refer to PDF] -metric are not optimal in the sense that the first one does not always obtain the desired degree sequence and the second does not necessarily result in a connected graph. Here we used a linear program to calculate the minimal and maximal value graphs with respect to the [figure omitted; refer to PDF] -metric. This is a binary integer program which we showed can be solved by a linear program. Any solution satisfies the constraints, and hence, the obtained minimal or maximal graph with respect to the [figure omitted; refer to PDF] -metric is simple and connected with specified degree.
In this paper we considered the class of simple and connected graphs with specified degree sequence. Although the ideas used here can be used for other types of graphs, generalization is not immediate for all variations. For instance, if multiple edges were allowed between any pairs of nodes, then showing that a linear program can be used is less straightforward, indicating that some solutions are difficult to find. Also, generalizations to other metrics, like shortest path betweenness, are not straightforward because to compute the shortest path length is not linear in the adjacency matrix.
Since there exists a class of graphs for which both the degree correlation and the [figure omitted; refer to PDF] -metric are approximately zero [8], it may be more appropriate to consider local assortativity as introduced by Piraveenan et al. [14]. In Piraveenan et al. [3] it is shown that many networks that are disassortative have many hubs that are assortative, so revealing interesting structure at a local scale.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
[1] M. E. J. Newman, "Assortative mixing in networks," Physical Review Letters , vol. 89, no. 20, 2002.
[2] P. van Mieghem, H. Wang, X. Ge, S. Tang, F. A. Kuipers, "Influence of assortativity and degree-preserving rewiring on the spectra of networks," The European Physical Journal B , vol. 76, no. 4, pp. 643-652, 2010.
[3] M. Piraveenan, M. Prokopenko, A. Y. Zomaya, "Classifying complex networks using unbiased local assortativity," in Proceedings of the Alife 12th Conference, pp. 329-336, Odense, Denmark, 2010.
[4] A.-L. Barabasi, R. Albert, "Emergence of scaling in random networks," Science , vol. 286, no. 5439, pp. 509-512, 1999.
[5] A. Barrat, M. Barthelemy, A. Vespignani Dynamical Processes on Complex Networks , Cambridge University Press, 2008.
[6] D. Borsboom, A. O. J. Cramer, V. D. Schmittmann, S. Epskamp, L. J. Waldorp, "The small world of psychopathology," PLoS ONE , vol. 6, no. 11, 2011.
[7] L. Li, D. Alderson, J. C. Doyle, W. Willinger, "Towards a theory of scale-free graphs: definition, properties, and implications," Internet Mathematics , vol. 2, no. 4, pp. 431-523, 2005.
[8] D. L. Alderson, L. Li, "Diversity of graphs with highly variable connectivity," Physical Review E , vol. 75, no. 4, 2007.
[9] L. Li, D. Alderson, W. Willinger, J. Doyle, "A first-principles approach to understanding the internet's router-level topology," Computer Communication Review , vol. 34, no. 4 in Proceedings of the 2004 Conference On Applications, Technologies, Architectures, And Protocols For Computer Communications (SIGCOMM '04), pp. 3-14, ACM
[10] A. Schrijver Theory of Linear and Integer Programming , of Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons, Chichester, UK, 1986.
[11] G. Nemhauser, L. Wolsey Integer and Combinatorial Optimization , John Wiley & Sons, 1999.
[12] S. Boyd The subtour polytope of the travelling salesman problem [Ph.D. thesis] , University of Waterloo, 1986.
[13] M. Held, R. M. Karp, "The traveling-salesman problem and minimum spanning trees," Operations Research , vol. 18, no. 6, pp. 1138-1162, 1970.
[14] M. Piraveenan, M. Prokopenko, A. Y. Zomaya, "Local assortativity and growth of Internet," European Physical Journal B-Condensed Matter and Complex Systems , vol. 70, no. 2, pp. 275-285, 2009.
[15] D. Bertsimas, J. Tsitsklis Introduction to Linear Optimization , Athena Scientific and Dynamic Ideas, Belmont, Mass, USA, 1997.
[16] S. Krumke Integer Programming, 2007
[17] A. Schrijver, "Polyhedral proof methods in combinatorial optimization," Discrete Applied Mathematics , vol. 14, no. 2, pp. 111-133, 1986.
[18] M. L. Fisher, "The lagrangian relaxation method for solving integer programming problems," Management Science , vol. 50, no. 12, pp. 1861-1874, 2004.
[19] S. Boyd, P. Elliott-Magwood, S. Iwata, "Structure of the extreme points of the subtour elimination problem of the stsp," Combinatorial Optimization and Discrete Algorithms , pp. 33-47, 2010.
[20] J. R. Magnus, H. Neudecker Matrix Differential Calculus with Applications in Statistics and Econometrics , of Wiley Series in Probability and Statistics, John Wiley & Sons, Chichester, UK, 1999.
[21] M. Held, P. Wolfe, H. P. Crowder, "Validation of subgradient optimization," Mathematical Programming , vol. 6, pp. 62-88, 1974.
[22] M. Held, R. M. Karp, "The traveling-salesman problem and minimum spanning trees. Part II," Mathematical Programming , vol. 1, no. 1, pp. 6-25, 1971.
[23] R Development Core Team http://www.R-project.org/ R: A Language and Environment for Statistical Computing , R Foundation for Statistical Computing, Vienna, Austria, 2012.
[24] M. Berkelaar, K. Eikland, P. Notebaert lpsolve: Open Source (Mixedinteger) Linear Programming System , GNU Lesser General Public License, 2004.
[25] A. O. J. Cramer, L. J. Waldorp, H. L. J. van der Maas, D. Borsboom, "Comorbidity: a network perspective," Behavioral and Brain Sciences , vol. 33, no. 2-3, pp. 137-150, 2010.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2015 Lourens J. Waldorp and Verena D. Schmittmann. Lourens J. Waldorp et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Calculation of assortative mixing by degree in networks indicates whether nodes with similar degree are connected to each other. In networks with scale-free distribution high values of assortative mixing by degree can be an indication of a hub-like core in networks. Degree correlation has generally been used to measure assortative mixing of a network. But it has been shown that degree correlation cannot always distinguish properly between different networks with nodes that have the same degrees. The so-called s -metric has been shown to be a better choice to calculate assortative mixing. The s -metric is normalized with respect to the class of networks without self-loops, multiple edges, and multiple components, while degree correlation is always normalized with respect to unrestricted networks, where self-loops, multiple edges, and multiple components are allowed. The challenge in computing the normalized s -metric is in obtaining the minimum and maximum value within a specific class of networks. We show that this can be solved by using linear programming. We use Lagrangian relaxation and the subgradient algorithm to obtain a solution to the s -metric problem. Several examples are given to illustrate the principles and some simulations indicate that the solutions are generally accurate.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer