Specification and estimation of network formation

1 Introduction

Economic research on social networks and interactions has grown rapidly over the past two decades. In many economic contexts, social networks have been found to be an important channel to disseminate information or facilitate activities.¹ Due to the importance of social networks for a wide range of applications, both academic researchers and practitioners have been and are still interested in understanding how network links are formed. Indeed, the question is not only interesting in its own right, but it is also important for understanding the role of network structures on economic outcomes.

For example, in the context of social interactions, we would like to understand how individuals choose their friends to benefit from peer effects on economic outcomes. In particular, friendship networks may be formed to produce favorable economic consequences; for example, students may prefer choosing high-achieving friends who can help them study. Then, a researcher interested in measuring peer effects on academic achievement needs to correct for possible endogeneity bias due to friendship selection, since friendship selection itself might also be based on academic achievement.

Moreover, endogenous friendship formation may amplify observed peer interactions due to unobserved factors that affect both friendship selection and economic outcomes (Weinberg (2007)). For example, Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016), Johnsson and Moon (forthcoming), and Auerbach (2019) study unobserved driving factors and use them to link network formation and network interactions to economic activities.

In this paper, we propose a unified modeling approach for individuals who form a friendship network inside a group and have their economic behaviors influenced by their friends' behaviors once the network is formed. In particular, we focus on a static model and present a novel approach for examining whether utilities of interactions in certain economic activities (e.g., peer influence on tobacco consumption) play any role in the formation of friendship networks for high school students.² Specifically, we allow for economic choices that are subject to peer effects, for example, smoking decisions, to impact individuals' utilities of forming network links.

Formally, we present a two-stage game. A network is formed in the first stage, and then in the second stage, individuals choose the intensity of their involvement in some economic activities (e.g., tobacco consumption). We focus on a “subgame perfect” equilibrium; we consider individuals who anticipate the second stage of the game when choosing network links in the first stage.

The formation of the network, that is, the first stage, follows the literature regarding the stability and efficiency of social networks (Jackson and Wolinsky (1996), Dutta and Jackson (2000), Jackson (2005), Caulier, Mauleon, and Vannetelbosch (2015)). Accordingly, we adopt a transferable utility framework that allows individuals to make side payments. Indeed, we present a model in which individuals may have preferences over global network features (e.g., popularity, transitive triads, etc.). As such, individuals have a strong incentive to make side payments.

Once a network is formed, individuals choose the intensity of their activities in the second stage. We therefore follow the literature dealing with games on networks (e.g., Ballester, Calvó-Armengol, and Zenou (2006), Calvó-Armengol, Patacchini, and Zenou (2009), Bramoullé, Kranton, and D'amours (2014), Boucher (2016)) and focus on the Nash equilibrium. Specifically, individuals choose the intensity of each activity non-cooperatively, taking the network and the other individuals' choices as given.

The advantages of modeling both network formation and network interactions using a unified framework are twofold. First, we can evaluate the importance of individual incentives for forming friendships that stem from utilities of interactions in activity outcomes, that is, assess how much individuals anticipate that they will be influenced once a network is formed. Second, the use of a jointly coherent model permits controlling for possible friendship selection biases when estimating peer effects on each activity.

A simple empirical approach to model network formation is to assume pairwise independence: the probability of forming each link is independent of other links, conditional on observables.³ For example, Fafchamps and Gubert (2007a, 2007b) and Comola (2007) assume pairwise independence and focus on individual and dyad-specific variables to explain the formation of a network. Other examples include latent position models (Hoff, Raftery, and Handcock (2002), Handcock, Raftery, and Tantrum (2007), McCormick and Zheng (2015)) and models with unobserved (network) degree heterogeneity (Graham (2017), Jochmans (2018)). In these models, individuals are assumed to have unobserved positions (or fixed effects) in the network that reflect heterogeneity of their social or economic status. Those unobserved positions also allow researchers to control the homophily effect in terms of unobserved individual characteristics.
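As a minimal sketch of what pairwise independence means in practice (not the specification of any of the cited papers), the following Python snippet draws every directed link independently from a logistic probability that depends on a hypothetical dyad-level covariate, the absolute difference in a scalar characteristic. The function name `simulate_dyadic_logit` and the coefficients `beta0` and `beta1` are illustrative assumptions.

```python
import numpy as np

def simulate_dyadic_logit(x, beta0, beta1, rng):
    """Draw a directed network with conditionally independent links.

    Assumed form: P(w_ij = 1 | x) = logistic(beta0 + beta1 * |x_i - x_j|).
    """
    m = len(x)
    dist = np.abs(x[:, None] - x[None, :])          # dyad-specific covariate
    prob = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * dist)))
    W = (rng.random((m, m)) < prob).astype(int)     # independent Bernoulli draws
    np.fill_diagonal(W, 0)                          # no self-links
    return W, prob

rng = np.random.default_rng(0)
x = rng.normal(size=30)
W, prob = simulate_dyadic_logit(x, beta0=-1.0, beta1=-1.5, rng=rng)
```

Because each link is drawn separately, no statistic of the rest of the network (reciprocity, popularity, triads) can enter the link probability, which is the limitation discussed next.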

However, as noted by Bramoullé and Fortin (2009), pairwise independence is a strong assumption since it implies that individuals' utility functions are additively separable across links. Therefore, even if such models are flexible enough to replicate many network statistics observed in real data, their microeconomic foundations involve strong assumptions.

In this paper, we go beyond the pairwise independence specification and consider the exponential probability distribution to model network data. The idea is to treat an observed network as one of many possible configurations for links (represented by one or zero) among a population of m individuals. This idea matches the Exponential Random Graph model (ERGM) proposed by Frank and Strauss (1986) or, more generally, the $p^{*}$ model of Wasserman and Pattison (1996) in the statistical literature.

In either an ERGM or a $p^{*}$ model, several selected network statistics, such as the number of reciprocal links, the number of k-stars, $k \geq 2$ , and the number of triangles, are specified using an exponential probability distribution as a way to measure how likely those network structures would appear in a network.⁴ However, the coefficients of those network statistics in ERGMs and $p^{*}$ models do not allow for causal interpretations.
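To make these statistics concrete, here is a short Python sketch that counts them for a directed 0/1 adjacency matrix. The exact definitions are assumptions chosen for illustration (reciprocal pairs, in-k-stars, and transitive triples); directed ERGM applications use several variants, and `network_statistics` is a hypothetical helper name.

```python
import numpy as np
from math import comb

def network_statistics(W, k=2):
    """Return (reciprocal pairs, in-k-stars, transitive triples) of a directed 0/1 matrix."""
    W = np.asarray(W)
    reciprocal = int(np.sum(W * W.T) // 2)               # pairs with w_ij = w_ji = 1
    in_degree = W.sum(axis=0)
    k_stars = sum(comb(int(d), k) for d in in_degree)    # k links pointing to one node
    # ordered triples (i, j, k) with i->j, j->k and i->k
    transitive = int(np.sum((W @ W) * W))
    return reciprocal, k_stars, transitive

W = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 0, 0]])
stats = network_statistics(W)   # (1, 1, 2) for this small network
```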

Contrary to the standard literature on ERGMs, we motivate our model specification using a formal economic model where the probability of the observed network is given by the shape of the unique equilibrium of the game. Meanwhile, we show that previous methods used for controlling unobserved individual heterogeneity through latent variables (Hsieh and Lee (2016)) can still be used in our context. As a result, our proposed network formation model handles three distinguishing features: observed and unobserved individual heterogeneity, global network dependence, and utilities of interactions from endogenous economic activities as incentives in link decisions. To our knowledge, this is the first paper to do so.

The drawback of using a very general and flexible specification is that it complicates the estimation. Indeed, the likelihood function of an ERGM involves an intractable normalizing term in the denominator, which requires the evaluation of the network statistics for all possible network realizations. To handle the intractable normalizing term during the estimation, many suggestions have been proposed; they include, for example, using simulations in a classical estimation setting (Geyer and Thompson (1992), Snijders (2002)) or in a Bayesian setting with auxiliary Markov chain Monte Carlo (MCMC) (Liang (2010), Murray, Ghahramani, and MacKay (2006), Mele (2017b)).
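The size of the problem can be illustrated by brute force on a very small network. The sketch below enumerates all directed networks on m = 3 nodes and computes the normalizing term exactly for a hypothetical two-statistic specification (edges and reciprocal pairs); the function `potential` and the values in `theta` are assumptions made only for this illustration.

```python
import itertools
import numpy as np

def all_networks(m):
    """Yield every directed 0/1 network on m nodes with a zero diagonal."""
    off_diag = [(i, j) for i in range(m) for j in range(m) if i != j]
    for bits in itertools.product((0, 1), repeat=len(off_diag)):
        W = np.zeros((m, m), dtype=int)
        for (i, j), b in zip(off_diag, bits):
            W[i, j] = b
        yield W

def potential(W, theta):
    """Hypothetical exponent: theta[0] * edges + theta[1] * reciprocal pairs."""
    edges = W.sum()
    reciprocal = np.sum(W * W.T) / 2
    return theta[0] * edges + theta[1] * reciprocal

theta = np.array([-0.5, 1.0])
m = 3
values = np.array([potential(W, theta) for W in all_networks(m)])
log_normalizer = np.log(np.sum(np.exp(values)))        # sum over 2**(m*(m-1)) networks
probabilities = np.exp(values - log_normalizer)
n_networks = len(values)
```

With m = 3 the sum has 64 terms; with m = 10 it already has 2^90 terms, which is why exact evaluation is ruled out for realistic group sizes.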

Due to its numerical efficiency, in this paper we adopt a Bayesian method based on a double Metropolis–Hastings (M–H) algorithm (Liang (2010)) to deal with the intractable normalizing term. We also implement the modification of the double M–H proposed by Mele (2017b) to improve convergence of the algorithm. We conduct an extensive simulation study to show that the proposed Bayesian MCMC sampler can successfully recover true model parameters from artificially generated network data. We also examine model misspecification issues in the simulation and provide new evidence of network endogeneity biases within network interaction studies.
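The logic of a double M-H step can be sketched on a toy model. The code below is an illustrative assumption, not the sampler used in this paper: it uses a two-statistic exponential specification (edges and reciprocal pairs), a normal prior, a random-walk proposal, and an auxiliary network simulated from the proposed parameter starting at the observed network. It does not implement the modification of Mele (2017b). All function and argument names are hypothetical.

```python
import numpy as np

def suff_stats(W):
    """Edges and reciprocal pairs of a directed 0/1 network."""
    return np.array([W.sum(), np.sum(W * W.T) / 2.0])

def simulate_network(W_start, theta, n_steps, rng):
    """Inner M-H chain: toggle one random link at a time, targeting exp(theta's(W))."""
    W = W_start.copy()
    m = W.shape[0]
    s = suff_stats(W)
    for _ in range(n_steps):
        i, j = rng.choice(m, size=2, replace=False)
        W[i, j] = 1 - W[i, j]
        s_new = suff_stats(W)
        if np.log(rng.random()) < theta @ (s_new - s):
            s = s_new                      # accept the toggle
        else:
            W[i, j] = 1 - W[i, j]          # reject: undo the toggle
    return W

def double_mh(W_obs, n_draws, inner_steps, step_size, prior_sd, rng):
    """Outer M-H chain on theta; an auxiliary network stands in for the normalizer."""
    theta = np.zeros(2)
    s_obs = suff_stats(W_obs)
    draws = np.empty((n_draws, 2))
    for t in range(n_draws):
        proposal = theta + step_size * rng.normal(size=2)      # symmetric random walk
        W_aux = simulate_network(W_obs, proposal, inner_steps, rng)
        s_aux = suff_stats(W_aux)
        log_prior_ratio = (theta @ theta - proposal @ proposal) / (2 * prior_sd**2)
        # the intractable normalizing terms cancel in this ratio
        log_ratio = log_prior_ratio + (proposal - theta) @ (s_obs - s_aux)
        if np.log(rng.random()) < log_ratio:
            theta = proposal
        draws[t] = theta
    return draws

rng = np.random.default_rng(1)
W_true = simulate_network(np.zeros((8, 8), dtype=int), np.array([-1.0, 1.0]), 2000, rng)
draws = double_mh(W_true, n_draws=200, inner_steps=100, step_size=0.3, prior_sd=3.0, rng=rng)
```

The acceptance ratio only involves unnormalized densities evaluated at the observed and auxiliary networks, so the normalizing term never has to be computed. The run lengths here are far too short for inference and serve only to show the mechanics.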

We apply our model to the study of American high school students' friendship networks using the Add Health data. We focus on two activity variables: students' GPA and smoking habits. We find a significant impact of a student's GPA on the formation of the network, but we observe no effect from their smoking habits. However, we find peer effects for both activities. This suggests that the interaction in academic learning is a factor for building friendships, whereas the interaction in smoking is not.

Our results also reveal significant homophily effects from both the observed and unobserved characteristics in network formation. Unobserved characteristics in network formation have significant influence on activity outcomes. That is, peer effects on GPA and smoking are subject to selection biases due to unobserved characteristics linked to the formation of friendship relations.

This paper contributes mainly to two strands of the literature. First, it contributes to the empirical literature on network formation. Graham (2017), Jochmans (2018), and Dzemski (2019) introduced node specific parameters to capture degree heterogeneity in a pairwise independent link formation model; however, they ignore any possible network externality effect. We capture the network externality effects in our empirical model by allowing for global network effects such as popularity, transitive triads, etc.

Sheng (2014), Miyauchi (2016), and De Paula, Richards-Shubik, and Tamer (2018) specified strategic network formation models and characterize equilibria of their models by the pairwise stability condition (Jackson and Wolinsky (1996)). Instead of imposing equilibrium selection assumptions, they specify incomplete models and utilize a partial identification approach. We show that our model is complete as it corresponds to the unique equilibrium of our two-stage game. Importantly, the structure of the model gives us more flexibility in including relevant observed and unobserved individual characteristics, and network features.

Christakis, Fowler, Imbens, and Kalyanaraman (2010) and Mele (2017b) modeled network formation as a sequential process where in each period a single, randomly selected pair of agents has the opportunity to meet and decide to form or sever a link. This sequential process is equivalent to an equilibrium selection mechanism in the corresponding static model (Jackson and Watts (2002)). In contrast, our equilibrium concept, which allows for side payments, leads to a static random utility model.

More specifically, we contribute to the empirical literature on ERGMs of network formation (e.g., Chandrasekhar and Jackson (2014), Boucher and Mourifié (2017), Mele (2017b), Mele and Zhu (2017), Mele (2017c)). With the exception of Mele (2017c), the literature assumes that econometricians observe all of the payoff relevant variables. We contribute to the literature by allowing for unobserved heterogeneity using latent variables, following the strategy used in Hsieh and Lee (2016) and Hsieh and Van Kippersluis (2018). Also, our transferable utility setting allows us to study a wider range of preferences. In particular, we do not require the existence of a potential function.

Second, this paper contributes to the literature on peer effects in endogenous networks (e.g., Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016)). In particular, this paper contributes to the emerging empirical literature studying the impact of economic actions on the formation of a network.⁵ To our knowledge, the only five existing papers that deal with this aspect are Badev (2013), Boucher (2016), Lewis-Faupel (2016), Battaglini, Patacchini, and Rainone (2019), and Hsieh, König, and Liu (2019).

Badev (2013) focused on a setup where individuals take a single binary action (affecting the preferences on the network structure). He presents a random utility model and an original equilibrium concept—based on stability constraints—that nests a pairwise stable and pairwise Nash network. The equilibrium also follows an ERGM in which all payoff relevant variables are assumed to be observed and where existence is only guaranteed for potential games. Lewis-Faupel (2016) also focused on a setup where individuals take a single binary action, but this binary action is anticipated in the network formation stage, where individuals form heuristic expectations.

Boucher (2016) presented a model of conformism with respect to a single continuous action. Since the model features many equilibria, he assumes that the data is generated by the equilibrium that maximizes the potential function. Solving for such an equilibrium is not feasible, thus the estimation is performed using the maximum of first- and second-order approximations of the potential function. Hsieh, König, and Liu (2019) used a similar framework to Badev (2013) while considering a continuous action under the context of firm R&D collaboration network and R&D expenditure. The microfoundation of their model is based on the stochastic best response dynamics (Blume (1993)) and the equilibrium is also only guaranteed for potential games.

Battaglini, Patacchini, and Rainone (2019) studied the interaction between the intensity of a legislator's social connections and their legislative effectiveness. In particular, they allow for varying connections' strengths and present a novel equilibrium notion. They show that social connections are an important driver of legislators' effectiveness.

In contrast to this literature, the model presented in this paper simultaneously studies multiple activities. We also control for a wide range of unobserved heterogeneity and allow for a large class of preferences. In particular, characterization of our model equilibrium does not require the existence of a potential function.

The remainder of this paper is structured as follows. Section 2 presents a unified modeling approach for both network formation and network interactions on economic activities. A Bayesian estimation method for the proposed model is discussed in Section 3. Section 4 provides an application of the model to high school students' friendship networks and activities with the Add Health data. Section 5 concludes the paper. Appendix A provides a proof of the main proposition in this paper. Some technical details of estimation, a Monte Carlo simulation, and additional empirical results are relegated to the Online Supplementary Appendix (Hsieh, Lee, and Boucher (2020)).

2 Economic model of peer effects in an endogenous network

2.1 Description of the economy

Let W be an $m \times m$ matrix representing the friendship network of m individuals.⁶ The $(i, j)$ th entry of W, denoted as $w_{i j}$ , equals one if individual i has a link to individual j and zero otherwise. The notation $w_{i}$ stands for the ith row of W and $W_{- i}$ stands for W excluding $w_{i}$ . We assume that the links are directed, so it is possible that i has a link to j, while j is not linked to i (i.e., W is not symmetric).⁷ We normalize diagonal elements so that $w_{i i} = 0$ for all $i \in (1, \dots, m)$ . Importantly, we assume that individuals can form at most $\bar{n}$ friendship relations. This assumption may reflect, for example, time constraints (e.g., Boucher (2015) and De Paula, Richards-Shubik, and Tamer (2018)).
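A small Python sketch of this representation follows. It assumes that the cap $\bar{n}$ applies to the number of links an individual sends (the row sums of W); the helper name `is_admissible` is hypothetical.

```python
import numpy as np

def is_admissible(W, n_bar):
    """Check that W is a 0/1 matrix with zero diagonal and at most n_bar links per row."""
    W = np.asarray(W)
    binary = np.isin(W, (0, 1)).all()
    zero_diagonal = not np.diag(W).any()
    capped = (W.sum(axis=1) <= n_bar).all()
    return bool(binary and zero_diagonal and capped)

# Individual 0 links to individual 1, but 1 does not link back: W is not symmetric.
W = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])
```

Here `is_admissible(W, 2)` holds, while `is_admissible(W, 1)` fails because the third individual sends two links.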

Let X be the $m \times k$ matrix of individual characteristics and $x_{i}$ be the ith row of X. Individuals choose the intensity of their involvement in economic activities. For each activity $d \in (1, \dots, \bar{d})$ , let $y_{i, d}$ denote the intensity of individual i in activity d. Let also $Y_{d} = {(y_{1, d}, \dots, y_{m, d})}^{'}$ be the m-dimensional column vector of intensities for activity d and $Y_{- i, d}$ be the $m - 1$ dimensional vector with $y_{i, d}$ removed from $Y_{d}$ . We assume that $y_{i, d}$ is continuous, or continuous on strictly positive values, but we allow for left censoring at zero. Formally, we consider either $y_{i, d} \in R$ or $y_{i, d} \in R_{+}$ .⁸ The impact of left censoring on equilibrium activity intensity is formally described in Proposition 1 in Section 2.3.

2.2 Preferences

As discussed above, we focus on a transferable utility framework in which individuals are allowed to make side payments. How those side payments are implemented is described in Section 2.3. In this section, we describe the individuals' preferences before side payments are allowed. The preference of any given individual i is represented by the utility function: [Image Omitted. See PDF] where $v_{i} (W)$ represents an explicit preference over the structure of network W, while $u_{i, d} (y_{i, d}, Y_{- i, d}, W)$ is the utility derived from activity d when individuals are choosing $Y_{d} = (y_{i, d}, Y_{- i, d})$ and the network structure is given by W. The coefficient $δ_{d} \geq 0$ captures the relative importance (or weight) of the utility of activity d with respect to the utility of the network $v_{i} (W)$ . We call this the incentive effect of activity d on network formation.⁹

It is worth noting that, conditional on W, the utility in equation (1) is separable across activities, so we assume no complementarity across activities. This assumption is quite common in the literature, with the notable exception of Cohen-Cole, Liu, and Zenou (2018).¹⁰ That being said, since the network structure is endogenously determined in our context, the optimal choice for each activity will, at equilibrium, be a function not only of the individual's preference for this particular activity but also of preferences for the other activities, and their explicit network preferences.¹¹ Indeed, the equilibrium value for $(W, Y_{1}, \dots, Y_{\bar{d}})$ is the result of a complex interplay between individuals' preference over the network structure and their preferences regarding each type of activity. This has important consequences for estimation, as will be discussed in Section 4.

For tractability, we follow the literature (e.g., Ballester, Calvó-Armengol, and Zenou (2006), Calvó-Armengol, Patacchini, and Zenou (2009), Bramoullé, Kranton, and D'amours (2014), Boucher (2016)) and assume a linear quadratic specification for the utility of activity conditional on network structure: [Image Omitted. See PDF] where $μ_{i, d}$ captures individual exogenous heterogeneity. The first and second terms of equation (2) represent the private benefit and cost of increasing the intensity of the activity $y_{i, d}$ . The third term reflects an additional social benefit (or cost) of increasing the intensity of the activity for the individual, that is, a complementary (or competitive) effect from peers' activity intensities whenever $λ_{d} \geq 0$ ( $λ_{d} < 0$ ).

We assume that individual i's (explicit) preference for the network structure is given by [Image Omitted. See PDF] where $τ_{i, W}$ is an idiosyncratic shock on the value of the network W for individual i. In equation (3), the local network effects captured by $ψ_{i j}$ give the intrinsic bilateral value (for individual i) of a link between i and j. This value is assumed to be independent from the position of the individual in the network. However, as argued by Bramoullé and Fortin (2009), individuals may also have preferences over the entire network structure. These preferences are captured by the global network effects $ϖ_{i} (w_{i}, W_{- i}) η$ and allow for preferences regarding many features of the networks (e.g., popularity, clustering, etc.). Specifically, $ϖ_{i} (w_{i}, W_{- i})$ is an $\bar{h}$ -dimensional row vector of network statistics that are relevant to individual i's utility, and η is the corresponding vector of coefficients. Note that by considering these global network effects, our network model differs substantially from the pairwise independent network link case (Bramoullé and Fortin (2009)) and connects to ERGMs in the statistical literature. The empirical specification of global network effects used in this paper is discussed in Section 2.4. We now discuss our equilibrium concepts for $(W, Y_{1}, \dots, Y_{\bar{d}})$ and the timing of the game.

2.3 Game

The game occurs in two stages. In the first stage, network links are determined. In the second stage, individuals play a noncooperative game for the choice of the activity intensities $(Y_{1}, \dots, Y_{\bar{d}})$ , conditional on the network structure.

Our equilibrium (stability) concept for the first stage of the game is based on the literature focused on the stability and efficiency of network formation games (e.g., Jackson and Wolinsky (1996), Dutta and Jackson (2000), Jackson (2005), Caulier, Mauleon, and Vannetelbosch (2015)). In this transferable utility setting, individuals are allowed to make side payments. For example, individuals may be willing to spend time or resources so that other individuals want to be linked to them. Although these side payments are not observed, they play an important role in the efficiency of the equilibrium network.

The focus on a transferable utility framework in this paper contrasts with the existing economic literature based on ERGMs (i.e., as in equation (6) below). Indeed, the usual microeconomic foundation is based on Christakis et al. (2010), Badev (2013), and Mele (2017b) where individuals are assumed to periodically have the opportunity to meet one another and revise their friendship status (in a nontransferable utility framework). Importantly, it is assumed that the revision of that friendship relation is done myopically, taking the rest of the network structure as given. The meeting process runs through time, and it is assumed that the observed network is drawn from its steady state distribution.

In contrast, we focus on a static random utility model and do not assume any specific meeting process.¹² This implicitly allows for a richer variety of meeting processes. Under the assumptions of Section 2.2, individuals' preferences not only depend on which friends they have, but also on the global network effects (see equation (3)). Consequently, we might expect individuals to be willing to spend resources to promote certain friendship relations.

To clarify the intuition, consider a very simple example of a population composed of only three individuals (i, j, and k). Assume that the global network effect for i is given by $ϖ_{i} (w_{i}, W_{- i}) η = w_{i j} w_{k j} η > 0$ so that for i the value of a link with j is greater when there exists a link between k and j, as Figure 1 illustrates.

Individual i would therefore be willing to spend resources to compensate individual k for creating a link with j (provided they would not do so otherwise). Of course, this is just a simple example, but the general intuition is the same: global network effects introduce strong incentives for side payments, irrespective of the specification used.
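The three-person example can be written out numerically. In the sketch below, the indices 0, 1, 2 stand for i, j, k, and the value of `eta` is a hypothetical number chosen only to satisfy $η > 0$.

```python
import numpy as np

def global_effect_i(W, eta, i=0, j=1, k=2):
    """Global network effect for i in the example: w_ij * w_kj * eta."""
    return W[i, j] * W[k, j] * eta

eta = 0.8                       # hypothetical value, eta > 0
W_without = np.zeros((3, 3), dtype=int)
W_without[0, 1] = 1             # i -> j only
W_with = W_without.copy()
W_with[2, 1] = 1                # add k -> j

gain_for_i = global_effect_i(W_with, eta) - global_effect_i(W_without, eta)
```

The gain equals `eta`, which bounds the resources i would be willing to transfer to k in exchange for the link from k to j.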

Then, following the literature (see above), we focus on the set of networks that are both efficient and individually stable (or rational), allowing for side payments, where efficiency is defined with respect to the network value denoted by $T (W)$ . Although this quantity can be defined in many ways, we assume (as in Dutta and Jackson (2000)) that the network value is given by the sum of the individuals' utilities, that is, $T (W) = \sum_{i} U_{i} (W)$ .¹³ This definition allows us to view side payments as being made in “utility units.” We focus on strongly efficient networks (Dutta and Jackson (2000)), that is, networks $W^{*}$ such that $T (W^{*}) \geq T (W)$ for all W.

Whether or not strongly efficient networks are individually stable depends on how the network value is shared among individuals, that is, how side payments are made. This is formally described by the allocation rule $\{Λ_{i} (W, T) \mid \sum_{i} Λ_{i} (W, T) = T (W)\}$ , which is the utility received after the side payments have been paid.¹⁴ If no side payments are allowed, the allocation rule is simply given by the individuals' utility: $Λ_{i} (W, T) = U_{i} (W)$ . If there are side payments, the value of the side payments received by i is simply given by the difference between the allocation rule and the utility: $\text{side payments}_{i} = Λ_{i} (W, T) - U_{i} (W)$ . For example, if the value of the network is shared equally among individuals, we get: $Λ_{i} (W, T) = T (W) / m$ . Note that the balance requirement of $\sum_{i} Λ_{i} (W, T) = T (W)$ imposes that no outside resources are used in making side payments. Indeed, using our definition for the network value, we obtain $\sum_{i} Λ_{i} (W, T) = \sum_{i} U_{i} (W)$ .
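As a numerical illustration of the equal-split rule and the balance requirement, the following Python sketch computes the allocation and the implied side payments for a hypothetical vector of utilities. The function name and the utility values are assumptions.

```python
import numpy as np

def equal_split_allocation(utilities):
    """Return the equal-split allocation and the implied side payments."""
    utilities = np.asarray(utilities, dtype=float)
    total_value = utilities.sum()                        # T(W) = sum_i U_i(W)
    allocation = np.full(utilities.shape, total_value / utilities.size)
    side_payments = allocation - utilities               # Lambda_i - U_i
    return allocation, side_payments

U = [3.0, 1.0, 2.0]             # hypothetical utilities U_i(W)
allocation, side_payments = equal_split_allocation(U)
```

With these numbers each individual receives 2, the side payments are (-1, 1, 0), and they sum to zero, so no outside resources are used.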

Let $Ω_{\bar{n}}$ be the set of network structures such that individuals have at most $\bar{n}$ friends. We say that a network $W \in Ω_{\bar{n}}$ is individually stable, relative to $\{Λ_{i}\}_{i = 1}^{m}$ and T, if for all i, $Λ_{i} (W, T) \geq Λ_{i} (\tilde{W}, T)$ for all networks $\tilde{W} \in Ω_{\bar{n}}$ such that ${\tilde{w}}_{j k} = w_{j k}$ for all k and all $j \neq i$ .¹⁵ In essence, the network is individually stable if no individual wants to create or remove links, given the side payments. In what follows, we do not make any assumptions regarding the specific allocation rule used. We merely assume that the allocation rule is selected among the (nonempty) set of allocation rules compatible with both strong efficiency and individual stability.

Once the network is formed, individuals are free to select intensities of activities in which they are involved, conditional on network structure. We follow the extensive literature for games on networks (e.g., Ballester, Calvó-Armengol, and Zenou (2006), Calvó-Armengol, Patacchini, and Zenou (2009), Bramoullé, Kranton, and D'amours (2014), Boucher (2016)) and assume that activity intensity choices are part of a Nash equilibrium.¹⁶ Formally, our equilibrium concept for the two-stage game is defined as follows.

Definition 1

A (subgame perfect) equilibrium of the two-stage game is a collection $(W, Y_{1}, \dots, Y_{\bar{d}})$ such that:

1. $(Y_{1}, \dots, Y_{\bar{d}})$ is in a Nash equilibrium, conditional on $W \in Ω_{\bar{n}}$ . We denote such an equilibrium by $(Y_{1}^{*} (W), \dots, Y_{\bar{d}}^{*} (W))$ .
2. The network value [Image Omitted. See PDF] is strongly efficient and individually stable for networks in $Ω_{\bar{n}}$ under some allocation rules.

Note that the definition of the value of the network in the first stage of the game (i.e., $T_{Y^{*}} (W)$ ) is given by the sum of the individuals' utilities, anticipating that individuals will play the Nash equilibrium $Y_{d}^{*} (W)$ in the second stage. In this sense, therefore, the equilibrium is subgame perfect since it is solved by backward induction. The next proposition follows.

Proposition 1

Assume that $| λ_{d} | < 1 / \bar{n}$ for all $d = 1, \dots, \bar{d}$ and that $τ_{W} \equiv \sum_{i} τ_{i, W}$ is distributed according to a Type I extreme value distribution. Then there exists a (generically) unique equilibrium of the two-stage game. Moreover,

(i) For all d such that $y_{i, d} \in R$ , we have [Image Omitted. See PDF] where $I_{m}$ is an $m \times m$ identity matrix and $μ_{d} = {(μ_{1, d}, \dots, μ_{m, d})}^{'}$ . For all d such that $y_{i, d} \in R_{+}$ , the unique equilibrium is determined as [Image Omitted. See PDF]
(ii) The equilibrium value is given by $T_{Y^{*}} (W) = V (W) + τ_{W}$ , where [Image Omitted. See PDF] Therefore, the probability of W at equilibrium is given by [Image Omitted. See PDF]
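For the uncensored case, the second-stage equilibrium can be sketched in Python. The display equations are not reproduced in this text, so the code assumes the standard linear-quadratic form implied by the matrix expression used later in this section: each individual's best response is $y_{i, d} = μ_{i, d} + λ_{d} \sum_{j} w_{i j} y_{j, d}$ , which stacks to $Y_{d}^{*} (W) = (I_{m} - λ_{d} W)^{-1} μ_{d}$ . The function name and numerical values are hypothetical.

```python
import numpy as np

def nash_uncensored(W, mu, lam):
    """Solve Y = mu + lam * W @ Y, i.e. Y* = (I - lam W)^{-1} mu (assumed form)."""
    m = W.shape[0]
    return np.linalg.solve(np.eye(m) - lam * W, mu)

W = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
n_bar = 2
lam = 0.3                                   # satisfies |lam| < 1 / n_bar
mu = np.array([1.0, 0.5, -0.2])
Y_star = nash_uncensored(W, mu, lam)
best_response = mu + lam * W @ Y_star       # each individual's first-order condition
```

At the solution, `Y_star` coincides with `best_response`, so no individual can gain by changing their own intensity.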

The uniqueness of the Nash equilibrium uses standard arguments (e.g., Ballester, Calvó-Armengol, and Zenou (2006)). Whenever $| λ_{d} | < 1 / \bar{n}$ for all $d \in (1, \dots, \bar{d})$ , the best response functions are contraction mappings, leading to a unique fixed point.¹⁷ Note that the contracting property also has the important numerical advantage of providing an iterative procedure to solve for the equilibrium when $y_{i, d} \in R_{+}$ . See the proof of Proposition 1 in Appendix A.
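The iterative procedure for the censored case can be sketched as follows. Since the corresponding display equation is not reproduced in this text, the code assumes a best response of the form $y_{i, d} = \max \{0, μ_{i, d} + λ_{d} \sum_{j} w_{i j} y_{j, d}\}$ , which is what the linear-quadratic utility implies under a non-negativity constraint. The function name, tolerance, and numerical values are hypothetical.

```python
import numpy as np

def nash_censored(W, mu, lam, tol=1e-12, max_iter=10_000):
    """Iterate the assumed best response y = max(0, mu + lam * W y) to its fixed point."""
    y = np.zeros_like(mu)
    for _ in range(max_iter):
        y_next = np.maximum(0.0, mu + lam * W @ y)
        if np.max(np.abs(y_next - y)) < tol:
            return y_next
        y = y_next
    raise RuntimeError("best-response iteration did not converge")

W = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)      # at most n_bar = 2 links per row
lam = 0.3                                   # |lam| < 1 / n_bar, so the map is a contraction
mu = np.array([1.0, 0.5, -1.0])
y_star = nash_censored(W, mu, lam)
```

In this example the third individual is censored at zero, and the other two intensities solve the two remaining linear best responses.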

Also, since the Nash equilibrium in the second stage is unique, the value of the network is also uniquely determined, that is, $T_{Y^{*}} (W) = T (W)$ . Since $τ_{W} \equiv \sum_{i} τ_{i, W}$ is distributed according to a Type I extreme value distribution, standard arguments show that the probability that a given network maximizes $T (W)$ takes a logistic form. Brock and Durlauf (2001) used the same assumption when specifying the social welfare function. Indeed, while one might prefer to assume that each $τ_{i, W}$ follows a Type I extreme value distribution, this would make the model intractable since the sum of Type I extreme value random variables is not Type I extreme value distributed. Assuming that $τ_{W}$ is distributed according to a Type I extreme value distribution guarantees that the distribution of the unique strongly efficient network is given by part (ii) of Proposition 1.
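The link between extreme value shocks and the logistic form can be checked by simulation. The sketch below assumes that the shocks $τ_{W}$ are independent across candidate networks and uses four hypothetical values of $V (W)$ ; it compares the frequency with which each network maximizes $V (W) + τ_{W}$ to the logit probabilities.

```python
import numpy as np

def logit_probabilities(V):
    """P(W) = exp(V(W)) / sum_W' exp(V(W')), computed stably."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max())
    return expV / expV.sum()

rng = np.random.default_rng(42)
V = np.array([0.0, 1.0, -0.5, 2.0])          # hypothetical V(W) for four candidate networks
n_draws = 200_000
tau = rng.gumbel(size=(n_draws, V.size))     # one Type I extreme value shock per network
winners = np.argmax(V + tau, axis=1)         # strongly efficient network in each draw
frequencies = np.bincount(winners, minlength=V.size) / n_draws
```

Up to simulation noise, `frequencies` matches `logit_probabilities(V)`.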

An example of an allocation rule for which the strongly efficient network is also individually stable is $Λ_{i} (W, T) = T / m$ . Of course, this is not the only admissible allocation rule. In particular, any allocation rule that can be written as a nondecreasing function of the network value is individually stable. More generally, it is also possible to impose additional normative properties on the admissible allocation rules. We refer the interested reader to Dutta and Jackson (2000) for further discussions and results. Jackson (2005) also presented an extensive discussion of the type of allocation rules compatible with both strong efficiency and individual stability.

It is worth noting that the expression $μ_{d}^{'} Y_{d}^{*} (W) - \frac{1}{2} Y_{d}^{*^{'}} (W) Y_{d}^{*} (W) + λ_{d} Y_{d}^{*^{'}} (W) \times W Y_{d}^{*} (W)$ in the equilibrium value for $V (W)$ reduces to $\frac{1}{2} Y_{d}^{*^{'}} (W) Y_{d}^{*} (W)$ when activity d's intensity is uncensored since we can exploit the closed-form solution in equation (4). In that case, the incentive effect of activities is always non-negative (recall that $δ_{d} \geq 0$ in equation (1)). Indeed, as noted by Ballester, Calvó-Armengol, and Zenou (2006), this property holds whenever the choice of activities features complementarities.¹⁸ As such, without explicit network preferences, that is, $v_{i} (W)$ , individuals would want to have as many links as possible, leading to a regular network in which all individuals have $\bar{n}$ links. In Section 2.4, we discuss the specific parametric assumptions of $v_{i} (W)$ and how they prevent the model from generically producing degenerate network structures.
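This reduction can be verified numerically. The check below assumes that the closed form in equation (4) is $Y_{d}^{*} (W) = (I_{m} - λ_{d} W)^{-1} μ_{d}$ , so that $μ_{d} = (I_{m} - λ_{d} W) Y_{d}^{*} (W)$ and the cross terms cancel. The network, the value of `lam`, and the draws of `mu` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_bar = 6, 2
W = np.zeros((m, m))
for i in range(m):
    # each individual sends exactly n_bar links to randomly chosen others
    friends = rng.choice([j for j in range(m) if j != i], size=n_bar, replace=False)
    W[i, friends] = 1.0
lam = 0.4                                        # |lam| < 1 / n_bar
mu = rng.normal(size=m)
Y = np.linalg.solve(np.eye(m) - lam * W, mu)     # assumed closed form of equation (4)

full_expression = mu @ Y - 0.5 * Y @ Y + lam * Y @ W @ Y
reduced_expression = 0.5 * Y @ Y
```

The two quantities agree up to floating-point error, and the reduced expression is a sum of squares, hence non-negative.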

2.4 Parametric specification

We specify individual exogenous heterogeneity in equation (2) via $μ_{i, d} = x_{i} β_{1 d} + \sum_{j = 1}^{m} w_{i j} x_{j} β_{2 d} + α_{d} + ϵ_{i, d}$ , where $α_{d}$ is a constant term, and $ϵ_{i, d}$ is the unobserved heterogeneity of i's preference regarding activity d. Following Proposition 1, the equilibrium for uncensored activities is given by [Image Omitted. See PDF] where $l_{m}$ is the m-dimensional vector of ones, and $ϵ_{d} = {(ϵ_{1, d}, \dots, ϵ_{m, d})}^{'}$ . Equation (7) matches the reduced form of a spatial autoregressive (SAR) model (Bramoullé, Djebbari, and Fortin (2009), Lee, Liu, and Lin (2010), Lin (2010)) for studying social interactions. The coefficient $λ_{d}$ in equation (7) represents the endogenous (peer) effect, which has been the focus of recent literature due to its policy implications (Glaeser, Sacerdote, and Scheinkman (2003)). The vector of coefficients $β_{d} = {(β_{1 d}^{'}, β_{2 d}^{'})}^{'}$ captures the own and contextual effects of individuals' and friends' exogenous characteristics on $Y_{d}$ .
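The reduced form can be simulated directly. Because equation (7) is not reproduced in this text, the sketch assumes it reads $Y_{d} = (I_{m} - λ_{d} W)^{-1} (X β_{1 d} + W X β_{2 d} + α_{d} l_{m} + ϵ_{d})$ , which follows from substituting the specification of $μ_{i, d}$ into the uncensored equilibrium. The network here is drawn exogenously for illustration only, and all numerical values are hypothetical.

```python
import numpy as np

def simulate_sar(W, X, beta1, beta2, alpha, lam, eps):
    """Y = (I - lam W)^{-1} (X beta1 + W X beta2 + alpha * l_m + eps) (assumed form)."""
    m = W.shape[0]
    mu = X @ beta1 + W @ X @ beta2 + alpha * np.ones(m) + eps
    return np.linalg.solve(np.eye(m) - lam * W, mu)

rng = np.random.default_rng(3)
m, k = 50, 2
W = (rng.random((m, m)) < 0.05).astype(float)
np.fill_diagonal(W, 0.0)
n_bar = int(W.sum(axis=1).max())
lam = 0.5 / max(n_bar, 1)                       # keeps |lam| < 1 / n_bar
X = rng.normal(size=(m, k))
beta1, beta2 = np.array([1.0, -0.5]), np.array([0.3, 0.0])
eps = rng.normal(size=m)
Y = simulate_sar(W, X, beta1, beta2, alpha=0.2, lam=lam, eps=eps)
```

The simulated `Y` satisfies the structural equation `Y = lam * W @ Y + X @ beta1 + W @ X @ beta2 + alpha + eps`, which separates the endogenous effect (`lam`), the own effects (`beta1`), and the contextual effects (`beta2`).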

However, a notable departure of our model from the literature is that the network structure W in equation (7) is explicitly endogenous. To understand the source of this endogeneity, recall that from Proposition 1, we have [Image Omitted. See PDF] For the sake of the discussion, assume for the moment that $y_{i, d} \in R$ , so that we have [Image Omitted. See PDF] We see immediately that $V (W)$ is also a function of $ϵ_{d}$ . Then this implies that any shock $ϵ_{d}$ has three conceptually distinct effects. First, it directly affects $Y_{d}^{*}$ , conditional on W. Second, through its effect on $Y_{d}^{*}$ , it indirectly affects network structure W through its effect on $V (W)$ . Third, through its indirect effect on W, it also affects $Y_{\tilde{d}}^{*}$ for the other activities $\tilde{d} \neq d$ .

This endogeneity is exacerbated if there exist unobserved variables that directly affect both network formation and the intensities of activities (e.g., Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016)). We also allow for such unobserved variables to affect individuals' preferences over links in our model. Specifically, we follow Hsieh and Lee (2016) and introduce the multidimensional individual latent variables $z_{i} = {(z_{i 1}, \dots, z_{i \bar{ℓ}})}^{'}$ in the network formation process through the local network effect as follows: [Image Omitted. See PDF] The variables $c_{i}$ and $c_{j}$ in equation (8) are observed $\bar{s}$-dimensional row vectors of individual-specific characteristics, and the variable $c_{i j}$ is an observed $\bar{q}$-dimensional row vector of dyad-specific characteristics, such as whether i and j have the same age, sex, or race.¹⁹ In particular, the individual and dyadic characteristics $C = \{(c_{i}, c_{j}, c_{i j}) : i = 1, \dots, m, j = 1, \dots, m, i \neq j\}$ control for observed homophily in the network formation process (see, e.g., Fafchamps and Gubert (2007a, 2007b) in the context of risk-sharing network formation). The variables $| z_{i ℓ} - z_{j ℓ} |$ for $ℓ = 1, \dots, \bar{ℓ}$ in equation (8) are meant to capture homophily on unobserved characteristics. We expect the coefficients in $γ_{4}$ to be negative, reflecting the fact that larger differences between individuals' unobserved characteristics reduce the likelihood that two individuals become friends. For exposition purposes, we denote $γ = {(γ_{0}, γ_{1}^{'}, γ_{2}^{'}, γ_{3}^{'}, γ_{4}^{'})}^{'}$.

The latent variables also affect the intensities of activities through the unobserved heterogeneity: $ϵ_{d} = Z ρ_{1 d} + W Z ρ_{2 d} + ξ_{d}$, where $Z = {(z_{1}^{'}, \dots, z_{m}^{'})}^{'}$ is an $m \times \bar{ℓ}$ matrix of unobserved (latent) variables. Note that Z is not specific to any activity d. Correspondingly, the activity intensity of equation (7) can be rewritten as [Image Omitted. See PDF] where we assume that $ξ_{d} \sim N_{m} (0, σ_{ξ_{d}}^{2} I_{m})$. Given the role played by the latent variables Z in equation (8) for network formation, Z and WZ that appear in equation (9) can also be interpreted as control functions for solving endogeneity due to individual and contextual unobserved correlated effects (Arcidiacono, Foster, Goodpaster, and Kinsler (2012), Fruehwirth (2014), Hsieh and Van Kippersluis (2018)).²⁰ Note that the latent variables Z are treated as random effects and, therefore, they should be independent of X in equation (9). Moreover, although both WZ and $α_{d}$ capture correlated effects due to unobservables, they differ in that WZ varies across individuals, while $α_{d}$ is constant for all group members.

In our empirical application, we consider the following specification of global network effects (see equation (3)): [Image Omitted. See PDF] We discuss the interpretation of each term in equation (10) in turn and provide a visualization of each effect in Figure 2.²¹

The reciprocity effect implies that (provided that $η_{1} > 0$) i enjoys more utility from a link with j if j also has a link with i. The congestion effect (provided that either $η_{2} < 0$ or $η_{3} < 0$) implies that the value for i of a link with j decreases with the number of links that i has. This may represent the fact that i has limited resources, for example, limited time, energy, or money (Boucher (2015)). Note that we capture this effect through individual i's out-degree and its square, which allows this cost to be concave or convex. The popularity effect captures the fact that i may enjoy more utility (if $η_{4} > 0$) from their link with j if j is popular, that is, receives many links.

The transitive triads effects include preferences for cliques, that is, explicit preferences for the transitivity of the network. When i is considering a link to j, they may take into account that they have a link to k and that k has a link to j. The creation of a link between i and j would therefore close the triad between i, j, and k. There are, of course, other types of transitive triads effects, displayed at the bottom left of Figure 2.

A similar intuition holds for the three-cycle effect, although as noted by Snijders, Van De Bunt, and Steglich (2010), the emergence of more three-cycles in a network implies fewer hierarchical relationships among individuals. Indeed, the three-cycle effect differs from the transitive triads effects in that the links between i, j, and k are circular (see the bottom right of Figure 2). As such, the three-cycle is less hierarchical, since each individual has one inward link and one outward link. As discussed by Davis (1970), social networks usually feature local hierarchy, that is, fewer three-cycles. We therefore expect $η_{6}$ to be negative.²²
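To make the network configurations above concrete, the following sketch counts them from a binary directed adjacency matrix. Equation (10) is not reproduced here, so these are the standard network statistics associated with each effect rather than the exact utility terms of the model; the function name `network_statistics` and the example network are hypothetical.

```python
import numpy as np

def network_statistics(W):
    """Count standard configurations in a directed network.

    W[i, j] = 1 means i sends a link to j; the diagonal is assumed zero.
    """
    W = np.asarray(W, dtype=int)
    W2 = W @ W  # W2[i, j] = number of two-step paths i -> k -> j
    return {
        "out_degree": W.sum(axis=1).tolist(),   # links sent (congestion)
        "in_degree": W.sum(axis=0).tolist(),    # links received (popularity)
        # Each reciprocated pair appears twice in the elementwise product.
        "mutual_dyads": int((W * W.T).sum() // 2),
        # Triples with i -> j, i -> k and k -> j.
        "transitive_triples": int((W2 * W).sum()),
        # trace(W^3) counts each directed three-cycle once per member.
        "three_cycles": int(np.trace(W2 @ W) // 3),
    }

# Example: the cycle 0 -> 1 -> 2 -> 0 plus the extra link 0 -> 2.
W = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
stats = network_statistics(W)
```

In this example, the link 0 -> 2 both reciprocates 2 -> 0 and closes the transitive triple formed with 0 -> 1 and 1 -> 2, while the three original links form a single three-cycle.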

Given the parametric assumption in equation (10), we can write [Image Omitted. See PDF] where in general for an $m \times 1$ vector A, $Diag (A)$ is an $m \times m$ diagonal matrix with its diagonal elements formed by the entries of the vector A. One can see that parameters $η_{51}$ , $η_{52}$ , and $η_{53}$ are not identified separately from equation (11) via the (first stage) game. Hence, we will use $η_{5} = η_{51} + η_{52} + η_{53}$ hereafter. We further denote $η = {(η_{1}, \dots, η_{6})}^{'}$ for the purposes of exposition.

It is important to note that, by including the possible cost captured by the congestion effect, some of the global network effects are expected to produce sufficiently large negative externalities on link formation so that not all individuals will form $\bar{n}$ links (which would result in a regular graph). Moreover, as discussed by Snijders et al. (2006), Bhamidi et al. (2011), Chatterjee, Diaconis et al. (2013), and Mele (2017b), we need negative externalities in ERGMs to produce sparser graphs so that they can be distinguished from Erdős–Rényi random graphs when the number of individuals increases. In our case, $η_{2}$ (or $η_{3}$) and $η_{6}$ are expected to create such negative externalities, and we confirm that these parameter estimates are significantly negative in our empirical study in Section 4. Lastly, in order to summarize our empirical model and provide an intuitive illustration, we use Figure 3 to present the model with two activities, displaying the variables and the interactions between them.

3 Model estimation

3.1 Group heterogeneity

Prior to this section, we assumed that individuals belong to a single, potentially large population. However, in many contexts, such as the high school students in our empirical application, the population can be partitioned into groups such that individuals can only form links within their own group. In this context, it is important to capture the heterogeneity across these groups. We therefore expand the model by assuming that the population is partitioned into G groups, and we use the subscript $g \in \{1, \dots, G\}$ to indicate explicit group heterogeneity.

3.2 Likelihood function of the model

To clarify the intuition of the estimation procedure, and for the purposes of exposition, we assume that $\bar{d} = 1$ and that $y_{i, d} = y_{i}$ is uncensored. Accordingly, we drop the subscript d for clarity.²³ Although our preferred specification for the empirical application has two activities (one censored and one uncensored), its formal description involves additional notation and steps but no new conceptual issues. We refer the interested reader to Online Supplementary Appendix D for details.

Since both $Y_{g}$ and $W_{g}$ are endogenous variables, we focus on the joint likelihood, that is,²⁴ [Image Omitted. See PDF] where $θ_{g} = (γ^{'}, η^{'}, δ, λ, β^{'}, ρ^{'}, σ_{ξ_{g}}^{2})$ . Note that to describe group heterogeneity on activity outcomes at both the mean and variance levels, we use $α_{g}$ and $σ_{ξ_{g}}^{2}$ to capture respectively the group fixed effect and group heteroskedasticity in equation (9). To adhere to the principle of model parsimony, the other coefficients are assumed to be common across groups. Using the parametric forms assumed in Section 2.4, we can write the joint mixed density function of $W_{g}$ and $Y_{g}$ conditional on the latent variable $Z_{g}$ as [Image Omitted. See PDF] where $S_{g} (W_{g}) = I_{m_{g}} - λ W_{g}$ , $ξ_{g} = S_{g} (W_{g}) Y_{g} - X_{g} β_{1} - W_{g} X_{g} β_{2} - Z_{g} ρ_{1} - W_{g} Z_{g} ρ_{2} - l_{m_{g}} α_{g}$ , and [Image Omitted. See PDF] The probability of $W_{g}$ in equation (12) is the logit probability from Proposition 1. As $Z_{g}$ consists of latent variables, the joint density of the observed $(W_{g}, Y_{g})$ would be $P (W_{g}, Y_{g} | θ_{g}, α_{g}) = \int_{Z_{g}} P (W_{g}, Y_{g} | θ_{g}, α_{g}, Z_{g}) d F (Z_{g})$ , where $F (\cdot)$ is the joint distribution of $Z_{g}$ .
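The activity part of this density can be sketched numerically. Equation (12) is not reproduced here, so the code below is only an illustration under stated assumptions: it uses the definitions of $S_{g} (W_{g})$ and $ξ_{g}$ given above, the normality of $ξ_{g}$, and the standard change-of-variables (Jacobian) term $| S_{g} (W_{g}) |$ of a SAR model. The network (ERGM) part of the density is not included, and the function name `sar_loglik` is hypothetical.

```python
import numpy as np

def sar_loglik(Y, W, X, Z, beta1, beta2, rho1, rho2, alpha, lam, sigma2):
    """Gaussian log-density of Y given W and Z in a SAR model (a sketch).

    xi = S Y - X beta1 - W X beta2 - Z rho1 - W Z rho2 - alpha,
    with S = I - lam * W and xi ~ N(0, sigma2 * I).
    """
    m = W.shape[0]
    S = np.eye(m) - lam * W
    xi = S @ Y - X @ beta1 - W @ X @ beta2 - Z @ rho1 - W @ Z @ rho2 - alpha
    sign, logabsdet = np.linalg.slogdet(S)  # log|det S|, the Jacobian term
    if sign == 0:                           # singular S: density undefined
        return -np.inf
    return (logabsdet
            - 0.5 * m * np.log(2.0 * np.pi * sigma2)
            - 0.5 * (xi @ xi) / sigma2)

# Hypothetical check: two mutually linked individuals, all regressors zero.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
zeros = np.zeros((2, 1))
coef = np.zeros(1)
ll = sar_loglik(np.zeros(2), W, zeros, zeros, coef, coef, coef, coef,
                alpha=0.0, lam=0.5, sigma2=1.0)
```

In the check, the residuals are zero and $| S | = 1 - 0.25 = 0.75$, so the value equals $\log 0.75 - \log 2 π$.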

Before discussing the estimation strategy, we briefly comment on the identification of the model. As discussed below, most of the formal identification results follow the literature. The intuition is also illustrated in Figure 3.

First, as seen from the likelihood function of equation (12), the vast majority of parameters in our model are identified from $P (W_{g} | θ_{g}, α_{g}, Z_{g})$ , which is an ERGM. Intuitively, our model is identified under the many networks asymptotics (Mele (2017b)), which assumes that the number of groups grows with the sample size. In such a case, the identification of the parameters is standard and follows the theory for exponential families under the usual regularity conditions as in Lehmann and Casella (2006).

The additional identification issue beyond the ERGM is linked to the latent variables Z in equations (8) and (9). We follow Hsieh and Lee (2016) and Hsieh and Van Kippersluis (2018) and impose the following assumptions on $z_{i ℓ}$: (1) the variance of $z_{i ℓ}$ is normalized to one; (2) $z_{i ℓ}$ is independent across i and ℓ; (3) $z_{i ℓ}$ follows a known distribution, in our case a normal distribution; (4) to distinguish the different dimensions of $z_{i ℓ}$, we further restrict $| γ_{41} | \geq | γ_{42} | \geq \dots \geq | γ_{4 \bar{ℓ}} |$ in equation (8).²⁵

Also note that while the individual specific variables $c_{i}$ in the local network effect of equation (8) can overlap with $x_{i}$ in equation (7), the dyad-specific variables $c_{i j}$ used in equation (8) are naturally excluded from equation (7) because of mismatched dimensions. Thus, the model satisfies the exclusion restriction condition in identifying the effects of Z and WZ in equation (9).

The main challenge in estimating this model is to compute the denominator of the network probability in equation (12). As it sums over all possible network structures, its direct evaluation quickly becomes infeasible, even for small networks. For example, in a network with only five individuals in which individuals can form up to five friends, the number of possible network structures is already $2^{5 (5 - 1)} = 2^{20}$. Hence, any estimation method involving a direct likelihood evaluation is not feasible. This problem applies to all ERGMs for networks (e.g., Badev (2013), Chandrasekhar and Jackson (2014), Mele (2017b), Boucher and Mourifié (2017)) and can be traced back to the spatial analysis in Besag (1974).
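The growth of the normalizing sum is easy to illustrate. The sketch below counts the directed networks on m individuals when each of the m(m − 1) ordered pairs can be linked or not and no cap on the number of links binds; the function name `n_directed_networks` is hypothetical.

```python
def n_directed_networks(m: int) -> int:
    """Number of directed networks on m nodes without self-links."""
    return 2 ** (m * (m - 1))

five = n_directed_networks(5)    # 2**20 = 1048576
ten = n_directed_networks(10)    # 2**90, about 1.2e27
```

Doubling the group size from five to ten individuals already raises the number of terms in the sum from about one million to more than $10^{27}$.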

In this paper, we implement a Bayesian estimation approach using an effective MCMC technique developed to handle the intractable normalizing term in the posterior density function (e.g., Mele (2017b)).²⁶ We start by reviewing the intuition behind the general technique.

3.3 General intuition: Double M–H algorithm

To clarify the intuition, we will (with a slight abuse of notation) use the following simplifying convention: for any variable $A_{g}$, we use $\{A_{g}\}$ to represent the collection of variables $A_{g}$ across the G groups, that is, $\{A_{g}\} := (A_{1}, \dots, A_{G})$. Let $y = (\{Y_{g}\}, \{W_{g}\})$ and $ϑ = (\{θ_{g}\}, \{α_{g}\}, \{Z_{g}\})$. From Section 3.2, the likelihood function of y, given ϑ, has the following form: $P (y | ϑ) = f (y; ϑ) / D (ϑ)$, where $D (ϑ)$ is an intractable normalizing term.

The standard M–H algorithm to simulate random draws of ϑ operates as follows: given an old draw $ϑ_{old}$ , one proposes a new draw $ϑ_{new}$ from a proposal distribution $q (\cdot | ϑ_{old})$ , and then updates the old draw to the new draw with an acceptance ratio $α_{MH} (ϑ_{new}, ϑ_{old})$ given by [Image Omitted. See PDF] where $π (ϑ)$ is the prior distribution of ϑ. One can see that in equation (13), the normalizing terms $D (ϑ_{old})$ and $D (ϑ_{new})$ do not cancel out; thus the evaluation of the acceptance-rejection criterion in equation (13) is intractable.

To solve this problem, Murray, Ghahramani, and MacKay (2006) propose including auxiliary variables $\tilde{y}$ in the acceptance probability, which can then be written as [Image Omitted. See PDF] where $\tilde{y}$ is simulated from the likelihood function $P (\tilde{y} | ϑ_{new}) = f (\tilde{y}; ϑ_{new}) / D (ϑ_{new})$ by exact sampling (Propp and Wilson (1996)).

In the conditional acceptance probability of equation (14), all normalizing terms cancel out, and the remaining terms are computable. This algorithm is called the “exchange algorithm” because a swapping operation between $(ϑ_{old}, y)$ and $(ϑ_{new}, \tilde{y})$ is involved (Geyer (1991)). The exchange algorithm differs from the conventional M–H algorithm by adding a randomization component into the proposal density; this changes $q (ϑ_{new} | ϑ_{old})$ into $q (ϑ_{new} | ϑ_{old}) P (\tilde{y} | ϑ_{new})$ .
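The cancellation can be made explicit. Equations (13) and (14) are not reproduced in this section, so the block below writes out the standard forms of the M–H and exchange acceptance probabilities following Murray, Ghahramani, and MacKay (2006); the notation in the original equations may differ.

```latex
\begin{align*}
\alpha_{MH}(\vartheta_{new}, \vartheta_{old})
 &= \min\left\{ 1,\;
    \frac{\pi(\vartheta_{new})\, f(y; \vartheta_{new})\, q(\vartheta_{old} \mid \vartheta_{new})}
         {\pi(\vartheta_{old})\, f(y; \vartheta_{old})\, q(\vartheta_{new} \mid \vartheta_{old})}
    \cdot \frac{D(\vartheta_{old})}{D(\vartheta_{new})} \right\}, \\[4pt]
\alpha(\tilde{y}, \vartheta_{new}, \vartheta_{old})
 &= \min\left\{ 1,\;
    \frac{\pi(\vartheta_{new})\, f(y; \vartheta_{new})\, q(\vartheta_{old} \mid \vartheta_{new})}
         {\pi(\vartheta_{old})\, f(y; \vartheta_{old})\, q(\vartheta_{new} \mid \vartheta_{old})}
    \cdot \frac{D(\vartheta_{old})}{D(\vartheta_{new})}
    \cdot \frac{f(\tilde{y}; \vartheta_{old}) / D(\vartheta_{old})}
               {f(\tilde{y}; \vartheta_{new}) / D(\vartheta_{new})} \right\} \\
 &= \min\left\{ 1,\;
    \frac{\pi(\vartheta_{new})\, f(y; \vartheta_{new})\, q(\vartheta_{old} \mid \vartheta_{new})\, f(\tilde{y}; \vartheta_{old})}
         {\pi(\vartheta_{old})\, f(y; \vartheta_{old})\, q(\vartheta_{new} \mid \vartheta_{old})\, f(\tilde{y}; \vartheta_{new})} \right\}.
\end{align*}
```

The extra factor is the ratio of the auxiliary data's likelihood under the old and new parameter values, and its normalizing terms are exactly the reciprocal of those left over in the standard M–H ratio.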

The exchange algorithm defines a valid Markov chain for simulating from $P (ϑ | y)$ (Murray, Ghahramani, and MacKay (2006), Liang (2010), Liang, Jin, Song, and Liu (2016)). However, implementing the exchange algorithm is time consuming because it requires the exact sampling of $\tilde{y}$ from $P (\tilde{y} | ϑ_{new})$. To save computation time, Liang (2010) proposed a “double M–H algorithm” that exploits the reversibility condition, and showed that when $\tilde{y}$ is simulated by the M–H algorithm, starting from y and running R iterations, the conditional acceptance probability in equation (14) can be obtained regardless of the value of R. This gives the double M–H algorithm an advantage, as a small value of R can be used, thereby removing the need for exact sampling.

Also note that Mele (2017b) similarly suggests using the double M–H algorithm to estimate ERGMs; in addition, he mixes the conventional random walk proposal with other proposals, such as random block techniques (Chib and Ramamurthy (2010)), to improve the mixing and convergence of the network simulation.²⁷ With this mixed proposal, Mele (2017b) showed that the network simulation can escape from the local maxima in the “low temperature regime” of the ERGM (Bhamidi et al. (2011)), where mixing is problematic. Given its greater computational efficiency compared to exact sampling, we adopt the double M–H algorithm, combined with the improvement of Mele (2017b) for network simulation.

We also provide a technical contribution for the computation of the double M–H algorithm for our model. Using the double M–H algorithm to update ϑ from $P (ϑ | y)$ requires simulating the auxiliary variable $\tilde{y}$. In our context, however, the auxiliary activity variables $\{{\tilde{Y}}_{g}\}$ in $\tilde{y}$ are redundant during simulation, as they can be replaced by either a closed-form function (as in equation (4)) or a contraction mapping (as in equation (5)) of the auxiliary networks and the estimated individual heterogeneity. Thus, we can simplify $\tilde{y}$ to $\tilde{w} = \{{\tilde{W}}_{g}\}$ and modify the conditional acceptance probability in equation (14) to [Image Omitted. See PDF] To evaluate $α (\tilde{w}, ϑ_{new}, ϑ_{old})$ in equation (15), we need only simulate the auxiliary networks $\tilde{w}$ from their probability density function $P (\tilde{w} | ϑ_{new}) = f (\tilde{w}; ϑ_{new}) / D (ϑ_{new})$, which shares the normalizing term $D (ϑ_{new})$ with $P (\tilde{y} | ϑ_{new})$.

3.4 Posterior distributions and the MCMC

We now present the MCMC procedure. As we regard the unobserved latent variables $\{Z_{g}\}$ as individual random effects, we follow Zeger and Karim (1991), Hoff, Raftery, and Handcock (2002), and Handcock, Raftery, and Tantrum (2007) and treat $\{Z_{g}\}$ as parameters to be estimated. By Bayes' theorem, the joint posterior distribution of the parameters and unobservables in the model based on the likelihood function of equation (12) can be written as [Image Omitted. See PDF] where $π (\cdot)$ represents the density function of the prior distribution, and as earlier we suppress the dependence of the likelihood function on $\{X_{g}\}$ and $\{C_{g}\}$ for notational clarity. We discuss the choice of prior distributions in Online Supplementary Appendix D.2.

Obtaining draws directly from the joint posterior distribution of equation (16) is challenging. Thus, we block the unknown parameters and latent variables into subgroups and proceed with the Gibbs sampling. We provide the list of conditional posterior distributions used by the Gibbs sampler in the Online Supplementary Appendix D.3. A subset of the parameters admit closed-form conditional posterior distributions; therefore, they can be drawn directly to improve convergence. However, this is not true for remaining parameters, and therefore we use the double M–H algorithm in Section 3.3 to draw from those relevant conditional posterior distributions. Tierney (1994) and Chib and Greenberg (1996) have shown that the combination of Markov chains (Metropolis-within-Gibbs) remains a Markov chain with the invariant distribution being the correct objective distribution.

To understand the general approach and to compare our algorithm with the literature, we first present a pseudo MCMC algorithm that highlights the main double M–H steps of our formal and full MCMC algorithm. This allows us to present the main steps of the formal algorithm without introducing heavy notation and computational details. The formal and full MCMC algorithm, with one left-censored and one uncensored activity, is presented in Online Supplementary Appendix D.4.

Algorithm 1 Pseudo MCMC

In each iteration t, given $ϑ^{(t - 1)} = \{\{θ_{g}\}^{(t - 1)}, \{α_{g}\}^{(t - 1)}, \{Z_{g}\}^{(t - 1)}\}$ from the previous iteration, perform the following double M–H steps sequentially for each group $g = 1, \dots, G$ and for any variable $Ξ_{g} \in ϑ$:

(a) Propose ${\tilde{Ξ}}_{g}$ from $q (Ξ_{g} | Ξ_{g}^{(t - 1)})$.
(b) Compute the residuals ${\tilde{ξ}}_{g}$ from the activity intensity equation conditional on the proposed ${\tilde{Ξ}}_{g}$ and the values of the other unknown parameters and variables at iteration $t - 1$, that is, ${(ϑ ∖ Ξ_{g})}^{(t - 1)}$.
(c) Simulate the auxiliary network ${\tilde{W}}_{g}$. Set the initial auxiliary network ${\tilde{W}}_{g}^{(0)}$ equal to the observed network $W_{g}$. Conditional on ${\tilde{ξ}}_{g}$, ${\tilde{Ξ}}_{g}$, and ${(ϑ ∖ Ξ_{g})}^{(t - 1)}$, loop over all entries (except the diagonal ones) of ${\tilde{W}}_{g}^{(0)}$ and, for each entry, perform either a local update or (with probability $P_{inv}$) a global update, as described below:
(i) local update: for entry $w_{i j}$, where $j \neq i$, propose ${\tilde{w}}_{i j, g}^{(r)} = 1 - {\tilde{w}}_{i j, g}^{(r - 1)}$. Accept ${\tilde{w}}_{i j, g}^{(r)}$ with the probability [Image Omitted. See PDF] otherwise, set ${\tilde{w}}_{i j, g}^{(r)} = {\tilde{w}}_{i j, g}^{(r - 1)}$.
(ii) global update: propose ${\tilde{W}}_{g}^{(r)}$, which inverts the entire adjacency matrix, that is, ${\tilde{W}}_{g}^{(r)} = l_{m_{g}} l_{m_{g}}^{'} - I_{m_{g}} - {\tilde{W}}_{g}^{(r - 1)}$. Accept ${\tilde{W}}_{g}^{(r)}$ with the probability [Image Omitted. See PDF] otherwise, set ${\tilde{W}}_{g}^{(r)} = {\tilde{W}}_{g}^{(r - 1)}$.

Repeat the described updating procedure R times and obtain the realization of ${\tilde{W}}_{g}^{(R)}$ as the simulation result.²⁸ If the obtained auxiliary network does not belong to $Ω_{\bar{n}, g}$, reject it and rerun the simulation.

(d) Set $Ξ_{g}^{(t)}$ equal to ${\tilde{Ξ}}_{g}$ with the probability [Image Omitted. See PDF] otherwise, set $Ξ_{g}^{(t)} = Ξ_{g}^{(t - 1)}$.

Note that Step (c) of Algorithm 1 represents the simulation of the auxiliary variable $\tilde{w} = \{{\tilde{W}}_{g}\}$ discussed in Section 3.3. Some authors have used similar estimation strategies. For example, Mele (2017b) used a special case of Algorithm 1 where $ϑ = \{θ_{g}\}$ and Step (b) is left out. Note also that Algorithm 1 can be adapted to the case where $y_{i, d}$ is left-censored. To do so, one only has to add the additional latent variable $Ÿ_{g}$ to the list in $Ξ_{g}$.
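The structure of Steps (a), (c), and (d) can be sketched on a toy model. The code below is not the paper's algorithm: it assumes a simple ERGM whose unnormalized log-density contains only an edge term and a reciprocity term, a symmetric random walk proposal, and a normal prior, and it omits the activity residuals of Step (b) and the check against $Ω_{\bar{n}, g}$. All function names and numerical values are hypothetical.

```python
import numpy as np

def log_f(W, theta):
    """Unnormalized log-density of a toy ERGM: edges and mutual dyads."""
    edges = W.sum()
    mutual = (W * W.T).sum() / 2.0
    return theta[0] * edges + theta[1] * mutual

def simulate_auxiliary(W, theta, R, rng, p_inv=0.01):
    """Step (c): R sweeps of M-H updates, started at the observed network."""
    m = W.shape[0]
    W_aux = W.copy()
    for _ in range(R):
        for i in range(m):
            for j in range(m):
                if i == j:
                    continue
                if rng.uniform() < p_inv:
                    # Global update: invert every off-diagonal entry.
                    W_prop = 1 - np.eye(m, dtype=W.dtype) - W_aux
                else:
                    # Local update: flip the single entry (i, j).
                    W_prop = W_aux.copy()
                    W_prop[i, j] = 1 - W_prop[i, j]
                # Both proposals are symmetric, so only f enters the ratio.
                log_acc = log_f(W_prop, theta) - log_f(W_aux, theta)
                if np.log(rng.uniform()) < log_acc:
                    W_aux = W_prop
    return W_aux

def double_mh_step(W, theta_old, step, R, rng, prior_sd=10.0):
    """One double M-H update of theta given the observed network W."""
    # Step (a): symmetric random walk proposal.
    theta_new = theta_old + step * rng.standard_normal(theta_old.shape)
    # Step (c): auxiliary network simulated under the proposed parameters.
    W_aux = simulate_auxiliary(W, theta_new, R, rng)

    def log_prior(t):
        return -0.5 * np.sum(t ** 2) / prior_sd ** 2

    # Step (d): exchange-type ratio; no normalizing term is needed.
    log_ratio = (log_prior(theta_new) - log_prior(theta_old)
                 + log_f(W, theta_new) - log_f(W, theta_old)
                 + log_f(W_aux, theta_old) - log_f(W_aux, theta_new))
    if np.log(rng.uniform()) < log_ratio:
        return theta_new, True
    return theta_old, False

# Hypothetical usage on a small simulated network.
rng = np.random.default_rng(1)
W = (rng.uniform(size=(6, 6)) < 0.3).astype(int)
np.fill_diagonal(W, 0)
theta = np.array([-1.0, 0.5])
draws = []
for _ in range(20):
    theta, accepted = double_mh_step(W, theta, step=0.2, R=2, rng=rng)
    draws.append(theta)
draws = np.array(draws)
```

The acceptance ratio in `double_mh_step` only evaluates `log_f`, which is the point of the method: the intractable normalizing term never has to be computed.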

To examine the finite sample performance of the Bayesian MCMC sampler, we carry out a Monte Carlo simulation experiment to demonstrate that the MCMC sampler can successfully recover the true parameters from the artificially generated network data. We also use this simulation to show the issue of model misspecification and the associated estimation biases. More details of this simulation study can be found in the Online Supplementary Appendix E.

4 Empirical applications: Friendship networks, academic outcomes, and smoking behaviors

We present an empirical application of our model on American high school students' friendship networks within the Add Health data, a national survey based on 132 schools, covering grades 7 through 12 (Udry (2003)). Five waves of the survey were conducted between 1994 and 2018. In the wave I in-school survey, a total of 90,182 students were interviewed. Respondents answered questions regarding their demographic backgrounds, academic performances, and health related behaviors. Most uniquely, students were asked to nominate up to five male and five female friends. This provides detailed information related to their friendship networks.

The in-home surveys from the different waves of the Add Health project collect more information about students' families and neighborhoods; however, this information is only recorded for a subset of individuals. To include most of the students' nominated friends and to mitigate sampling biases as much as possible (e.g., Chandrasekhar and Lewis (2011), Liu (2013)), we use the wave I in-school survey.

We consider two types of activities that may be subject to social interactions and that are relevant for friendship formation.²⁹ The first is the student's academic performance (measured by GPA), which is represented by a continuous (uncensored) variable. The second is the student's smoking habit, or more precisely, how frequently a student smokes in a typical week. The latter variable is represented by a censored variable as we observe a significant fraction of nonsmokers.

In the context of social interactions, students' academic performance and smoking behavior are studied extensively as they have important long term consequences on students' future lives and health. Studies of peer effects on students' academic performance, for example, Hoxby (2000), Sacerdote (2001), Hanushek, Kain, Markman, and Rivkin (2003), and Zimmerman (2003) used the linear-in-means model; whereas Calvó-Armengol, Patacchini, and Zenou (2009), Lin (2010), Boucher, Bramoullé, Djebbari, and Fortin (2014), and Liu, Patacchini, and Zenou (2014) use the network interactions model. For studies of peer effects on students' smoking behaviors, evidence of peer effects can be found in Gaviria and Raphael (2001), Powell, Tauras, and Ross (2005), Lundborg (2006), Clark and Lohéac (2007), Fletcher (2010), and Hsieh and Van Kippersluis (2018).

When studying interaction (peer) effects, researchers face a difficulty in identifying correlated effects from group-level unobservables and endogenous selection into groups, as well as separating the endogenous interaction effect from contextual effects in a linear model (the reflection problem of Manski (1993)). Using various approaches (e.g., randomization, fixed effects, etc.) to avoid these difficulties, researchers generally produce evidence for the existence of peer effects.

Hsieh and Lee (2016) further considered the problem of endogenous friendship selection when estimating peer effects on economic activities by modeling unobservables in both the network interaction and network formation processes. They find that the endogenous effect on academic performance obtained from a SAR model that does not control for the endogeneity of the spatial weight matrix can be upward biased.

In this paper, we follow Hsieh and Lee (2016) in controlling individual unobservables in the formation of friendship networks and activity outcomes. Furthermore, we investigate the incentive effects of activities in network formation and find that the benefit of interactions from academic learning is an important factor for students to form friendships in a school environment.

4.1 Data summary

We use the Add Health wave I in-school survey dataset, in which all students in the sampled schools were expected to participate. We let each school be a group, and we ignore friendship relations between schools. Although there could still be network measurement errors due to students' absences, refusal to cooperate, etc., the issue of missing links in our study is minimized compared to strategies that rely on in-home surveys or define network groups at the grade level. However, to ease the computational burden, we restrict our sample to small schools with fewer than 120 students.

This school-level sample is particularly well adapted to our study, since it is very likely that students know each other. As discussed in Section 2, we assume that friendships are formed conditional on side payments. Students must therefore be sufficiently aware of one another to be able to pay those transfers.

The final sample comprises a total of 1036 respondents from 15 schools (groups).³⁰ The school networks have an average size of 69.29, an average density of 0.076, an average out-degree of 3.752, and an average clustering coefficient of 0.095. Since the average out-degree is far below its top-coded value of 10, the threat of missing links due to the fixed survey design (Kossinets (2006)) can be ignored.

To capture the local network effects in equation (8), we include an individual-specific variable denoting how many years a student has spent in his or her school, as well as three dummy variables: whether a pair of students are of the same age, sex, or race. For the variables used in the activity intensity equation of equation (9), the continuous (uncensored) activity outcome, GPA, is calculated using the average of a respondent's reported grades from several subjects, including language, social science, mathematics, and science (each of which has a value between 1 and 4). The average GPA in the sample is 3.059. The censored activity variable, smoking, is obtained from the student's response to the survey question, “During the past twelve months, how often did you smoke cigarettes?”; the response has a value between 0 and 7. The average smoking frequency is 0.543, with 73.26% of observations censored at zero. We follow Lin (2010), Lee, Liu, and Lin (2010), and Hsieh and Lee (2016) in choosing the independent variables. A complete list of variables and their summary statistics is given in Table 1.

Table 1 Summary statistics.

Variable	Min	Max	Mean	SD
GPA	1	4	3.059	0.742
Smoking	0 $(73.26 %)$	7	0.543	1.715
Age	11	18	13.788	1.641
Male	0	1	0.457	0.498
Female	0	1	0.543	0.498
White	0	1	0.741	0.438
Black	0	1	0.128	0.335
Asian	0	1	0.017	0.131
Hispanic	0	1	0.050	0.218
Other race	0	1	0.063	0.243
Both parents	0	1	0.821	0.383
Less HS	0	1	0.069	0.254
HS	0	1	0.326	0.469
More HS	0	1	0.460	0.499
Edu missing	0	1	0.094	0.291
Professional	0	1	0.273	0.446
Staying home	0	1	0.230	0.421
Other jobs	0	1	0.375	0.484
Job missing	0	1	0.069	0.254
Welfare	0	1	0.002	0.044
Num. of other students at home	0	6	0.591	0.850
Network size	29	101	69.29	23.96
Network density	0.016	0.136	0.076	0.062
Out-degree	0.000	10.000	3.752	2.688
Clustering coefficient	0.025	0.186	0.095	0.048
Sample size	1036
Num. of networks	15

Note: “Both parents” means living with both parents. “Less HS” means mother's education is lower than a high-school level. “HS” means mother's education level is high school. “More HS” means mother's education is above a high-school level. “Edu missing” means mother's education level is missing. “Professional” means mother's occupation is scientist, teacher, executive, director, or the like. “Other jobs” means mother's occupation is not among the Professional or Staying home categories. “Job missing” means the mother's occupation information is missing. “Welfare” means the mother participates in social welfare programs. “Num. of other students at home” means the number of other students from grades 7 to 12 living in the same household as the student. The variables in italics are the omitted categories during the estimation.

4.2 Estimation results

In this empirical study, we specify our full model with the incentive effects from both activity outcomes—GPA and smoking. As discussed in Section 2.2, despite our assumption of conditional separability, omitting a relevant activity would likely bias the estimation. We therefore proceed to estimate the multiple-activity model. Separate estimations for the two single-activity models are provided in Supplementary Appendix Tables F.3 and F.4.³¹ In addition, modeling multiple activity outcomes allows us to explicitly control the correlation of error terms between activities.

Similar to the simulation study in Online Supplementary Appendix E, we compare the estimation results of the full model with those of several possibly misspecified models in order to see how each misspecification affects the estimates, particularly the estimate of the endogenous peer effect (λ) on activity outcomes in equation (9). We present the estimation results in Table 2. The first through fifth columns report, respectively, the results of the full model, the model without the latent variables, the model without the global network effects, the model with only latent variables, and lastly the activity intensity equation alone, as in equation (7), assuming exogenous network links.³²

Table 2 Estimation results based on both GPA and smoking.

	Full		No Latent		No Global		Latent Only		Activity Alone
Local & global & incentive effects
Constant (γ₀)	−4.8531		−5.7071		−3.9329		1.1101		–
	(0.0366)		(0.1194)		(0.1134)		(0.0958)		–
Experience in school (sender) (γ₁)	−0.0405		−0.0159		0.0892		0.0528		–
	(0.0114)		(0.0212)		(0.0211)		(0.0148)		–
Experience in school (receiver) (γ₂)	0.0374		0.0515		0.1586		0.1301		–
	(0.0143)		(0.0198)		(0.0247)		(0.0147)		–
Same age (γ₃₁)	0.3675		0.5340		1.0300		0.4450		–
	(0.0431)		(0.0496)		(0.0773)		(0.0586)		–
Same sex (γ₃₂)	0.3471		0.4889		0.3454		0.5477		–
	(0.0441)		(0.0608)		(0.0686)		(0.0517)		–
Same race (γ₃₃)	0.3116		0.2152		0.3716		0.3961		–
	(0.0430)		(0.0789)		(0.0839)		(0.0595)		–
Latent distance (γ₄₁)	−0.1469		–		−0.3945		−3.9387		–
	(0.0173)		–		(0.0671)		(0.1214)		–
Latent distance (γ₄₂)	−0.0966		–		−0.2231		−2.5592		–
	(0.0414)		–		(0.0662)		(0.1000)		–
Latent distance (γ₄₃)	−0.0125		–		−0.0868		−2.3793		–
	(0.0528)		–		(0.0594)		(0.0887)		–
Reciprocity (η₁)	1.4309		1.4080		–		–		–
	(0.0476)		(0.0552)		–		–		–
Congestion (η₂)	0.2699		0.3521		–		–		–
	(0.0151)		(0.0281)		–		–		–
Congestion (η₃)	−0.0247		−0.0304		–		–		–
	(0.0017)		(0.0022)		–		–		–
Popularity (η₄)	0.0049		0.0015		–		–		–
	(0.0068)		(0.0061)		–		–		–
Trans. triads (η₅)	0.4715		0.4767		–		–		–
	(0.0220)		(0.0189)		–		–		–
Three cycles (η₆)	−0.2071		−0.2083		–		–		–
	(0.0171)		(0.0166)		–		–		–
Incentive from GPA (δ₁)	0.2145		0.1825		0.2921		–		–
	(0.0956)		(0.0344)		(0.1122)		–		–
Incentive from smoking (δ₂)	0.0197		0.0083		0.0214		–		–
	(0.0134)		(0.0053)		(0.0131)		–		–
Activity intensity—GPA
Endogenous (λ)	0.0177		0.0189		0.0162		0.0239		0.0330
	(0.0063)		(0.0070)		(0.0071)		(0.0091)		(0.0105)
	Own	Contex.	Own	Contex.	Own	Contex.	Own	Contex.	Own	Contex.
Age	−0.0327	0.0018	−0.0368	0.0013	−0.0382	0.0009	−0.0388	0.0009	−0.0366	−0.0013
	(0.0164)	(0.0024)	(0.0136)	(0.0024)	(0.0148)	(0.0025)	(0.0202)	(0.0025)	(0.0167)	(0.0026)
Male	−0.1984	−0.0152	−0.1926	−0.0166	−0.1854	−0.0143	−0.1863	−0.0166	−0.1906	−0.0161
	(0.0307)	(0.0215)	(0.0381)	(0.0239)	(0.0335)	(0.0212)	(0.0381)	(0.0232)	(0.0359)	(0.0244)
Black	−0.0014	−0.0636	−0.0564	−0.0564	−0.0265	−0.0608	−0.0109	−0.0489	−0.0607	−0.0577^**
	(0.0473)	(0.0247)	(0.0613)	(0.0234)	(0.0433)	(0.0226)	(0.0716)	(0.0219)	(0.0626)	(0.0212)
Asian	0.0028	−0.1957	−0.0791	−0.2240	−0.0574	−0.1647	−0.0384	−0.1824	−0.0309	−0.2251^***
	(0.0723)	(0.0566)	(0.1004)	(0.0659)	(0.0501)	(0.0393)	(0.0902)	(0.0567)	(0.0826)	(0.0643)
Hispanic	−0.2332	−0.0362	−0.2241	−0.0381	−0.2406	−0.0644	−0.2433	−0.0502	−0.1990	−0.0278
	(0.0360)	(0.0485)	(0.0776)	(0.0576)	(0.0442)	(0.0337)	(0.0737)	(0.0500)	(0.0680)	(0.0526)
Other race	−0.0689	−0.0967	−0.1168	−0.0873	−0.0671	−0.1137	−0.0555	−0.0838	−0.0914	−0.1020^**
	(0.0418)	(0.0449)	(0.0673)	(0.0483)	(0.0524)	(0.0385)	(0.0575)	(0.0473)	(0.0556)	(0.0491)
Both parents	0.1149	−0.0701	0.1140	−0.0598	0.1147	−0.0525	0.1145	−0.0602	0.1126	−0.0637^**
	(0.0368)	(0.0213)	(0.0510)	(0.0271)	(0.0382)	(0.0268)	(0.0427)	(0.0258)	(0.0449)	(0.0293)
Less HS	−0.3424	−0.0889	−0.3466	−0.0895	−0.3074	−0.1027	−0.2972	−0.0924	−0.3070	−0.0839^**
	(0.0586)	(0.0392)	(0.0702)	(0.0450)	(0.0507)	(0.0333)	(0.0623)	(0.0435)	(0.0608)	(0.0378)
More HS	0.1593	0.0485	0.1561	0.0419	0.1686	0.0397	0.1536	0.0261	0.1553	0.0272
	(0.0334)	(0.0211)	(0.0397)	(0.0236)	(0.0324)	(0.0223)	(0.0435)	(0.0248)	(0.0367)	(0.0256)
Edu missing	−0.1739	0.0634	−0.1751	0.0457	−0.1696	0.0355	−0.1624	0.0472	−0.1727	0.0323
	(0.0532)	(0.0334)	(0.0588)	(0.0391)	(0.0362)	(0.0341)	(0.0634)	(0.0366)	(0.0533)	(0.0342)
Welfare	−0.1079	−0.2848	−0.1593	−0.2434	−0.0547	−0.2932	−0.1001	−0.2892	−0.1973	−0.1893^**
	(0.0567)	(0.0723)	(0.0950)	(0.0980)	(0.0634)	(0.0550)	(0.1589)	(0.1173)	(0.1000)	(0.0716)
Job missing	−0.1335	−0.0446	−0.1261	−0.0359	−0.1808	−0.0371	−0.1625	−0.0522	−0.1500	−0.0253
	(0.0369)	(0.0442)	(0.0662)	(0.0476)	(0.0551)	(0.0387)	(0.0503)	(0.0381)	(0.0525)	(0.0464)
Professional	−0.1104	0.0010	−0.1158	0.0075	−0.1162	0.0119	−0.1347	0.0003	−0.1097	0.0076
	(0.0309)	(0.0267)	(0.0459)	(0.0278)	(0.0376)	(0.0209)	(0.0377)	(0.0264)	(0.0387)	(0.0246)
Other jobs	−0.0361	0.0268	−0.0482	0.0283	−0.0610	0.0310	−0.0679	0.0214	−0.0354	0.0289
	(0.0360)	(0.0229)	(0.0412)	(0.0222)	(0.0312)	(0.0167)	(0.0377)	(0.0227)	(0.0392)	(0.0214)
Num. of other students at home	0.0257	0.0123	0.0195	0.0131	0.0220	0.0154	0.0216	0.0096	0.0184	0.0068
	(0.0230)	(0.0104)	(0.0257)	(0.0133)	(0.0212)	(0.0119)	(0.0232)	(0.0140)	(0.0232)	(0.0131)
Latent (ρ₁₁)	0.0211	−0.0198	–	–	0.0067	0.0014	0.0272	−0.0170	–	–
	(0.0548)	(0.0283)	–	–	(0.0275)	(0.0230)	(0.0369)	(0.0112)	–	–
Latent (ρ₁₂)	0.0905	0.0070	–	–	0.0260	−0.0020	0.0721	−0.0235	–	–
	(0.0537)	(0.0254)	–	–	(0.0610)	(0.0242)	(0.0472)	(0.0197)	–	–
Latent (ρ₁₃)	0.0542	0.0101	–	–	0.0545	0.0105	0.0724	−0.0045	–	–
	(0.0481)	(0.0277)	–	–	(0.0454)	(0.0232)	(0.0466)	(0.0188)	–	–
Activity intensity—smoking
Endogenous (λ)	0.1052		0.1103		0.1068		0.1056		0.1125
	(0.0196)		(0.0225)		(0.0193)		(0.0197)		(0.0191)
	Own	Contex.	Own	Contex.	Own	Contex.	Own	Contex.	Own	Contex.
Age	0.1462	−0.0047	0.1613	−0.0054	0.1837	−0.0052	0.2051	−0.0120	0.1655	−0.0042
	(0.0201)	(0.0036)	(0.0248)	(0.0045)	(0.0386)	(0.0040)	(0.0455)	(0.0053)	(0.0212)	(0.0040)
Male	0.0126	−0.0185	−0.0598	0.0390	0.0307	−0.0077	−0.0494	0.0414	−0.0240	0.0160
	(0.0650)	(0.0305)	(0.0691)	(0.0424)	(0.0594)	(0.0427)	(0.0915)	(0.0436)	(0.0624)	(0.0395)
Black	−0.1952	0.0828	−0.3633	0.0726	−0.2332	0.0808	−0.4153	0.0810	−0.0383	0.0939
	(0.0749)	(0.0194)	(0.1395)	(0.0357)	(0.1003)	(0.0375)	(0.2075)	(0.0377)	(0.0954)	(0.0311)
Asian	−1.1052	0.3028	−0.5535	0.1893	−1.0746	0.2965	−1.2797	0.3093	−0.5277	0.2472
	(0.0615)	(0.0444)	(0.1917)	(0.0672)	(0.0857)	(0.0639)	(0.3075)	(0.1120)	(0.2115)	(0.1586)
Hispanic	0.1317	0.0734	−0.0626	0.0204	0.1128	0.0625	−0.0591	0.1509	0.2026	0.0232
	(0.0613)	(0.0472)	(0.1029)	(0.0839)	(0.0862)	(0.0735)	(0.2321)	(0.0849)	(0.0990)	(0.0734)
Other race	0.0610	0.4019	0.3284	0.2831	0.0755	0.3913	0.1891	0.5043	0.2874	0.3868
	(0.0704)	(0.0578)	(0.0894)	(0.0896)	(0.0659)	(0.0535)	(0.1459)	(0.0714)	(0.1449)	(0.0814)
Both parents	−0.0808	−0.0140	0.0250	−0.0431	−0.0101	−0.0564	−0.0588	−0.0351	−0.0061	−0.0437
	(0.0492)	(0.0311)	(0.0658)	(0.0434)	(0.0450)	(0.0415)	(0.0878)	(0.0579)	(0.0672)	(0.0382)
Less HS	0.3986	0.0485	0.1828	0.0409	0.2762	0.0382	0.2837	0.0762	0.1371	0.0426
	(0.0484)	(0.0634)	(0.0919)	(0.0607)	(0.0533)	(0.0478)	(0.0972)	(0.0667)	(0.1032)	(0.0676)
More HS	−0.1975	−0.0357	−0.2182	−0.0329	−0.0548	−0.0381	−0.1502	0.0131	−0.1895	−0.0103
	(0.0515)	(0.0369)	(0.0661)	(0.0479)	(0.0443)	(0.0383)	(0.0906)	(0.0472)	(0.0679)	(0.0512)
Edu missing	−0.0965	−0.0551	−0.1907	0.0275	−0.1202	−0.0620	−0.0775	−0.0273	−0.1388	−0.0216
	(0.0636)	(0.0351)	(0.0750)	(0.0611)	(0.0744)	(0.0433)	(0.0874)	(0.0806)	(0.0877)	(0.0619)
Welfare	1.5846	−0.1062	1.2705	−0.1601	1.5957	−0.0860	1.6348	0.0728	1.7299	−0.5848
	(0.0764)	(0.0550)	(0.1107)	(0.2201)	(0.0559)	(0.0653)	(0.2791)	(0.4136)	(0.2190)	(0.4791)
Job missing	0.4931	0.0860	0.3091	0.1156	0.5466	0.0798	0.5669	0.1639	0.2347	0.0045
	(0.0612)	(0.0542)	(0.1322)	(0.0683)	(0.0496)	(0.0658)	(0.1226)	(0.0907)	(0.1107)	(0.0696)
Professional	0.3740	−0.0013	0.3326	0.0014	0.3079	0.0492	0.4512	0.0344	0.2685	−0.0150
	(0.0464)	(0.0258)	(0.0791)	(0.0500)	(0.0852)	(0.0381)	(0.1044)	(0.0559)	(0.0949)	(0.0436)
Other jobs	0.0037	0.0395	0.1530	0.0315	0.1064	0.0460	0.1472	0.0682	−0.0551	0.0047
	(0.0545)	(0.0319)	(0.0552)	(0.0350)	(0.0615)	(0.0371)	(0.0882)	(0.0450)	(0.0758)	(0.0311)
Num. of other students at home	−0.1551	−0.0105	−0.0745	−0.0028	−0.1057	0.0073	−0.0636	0.0173	−0.0541	0.0009
	(0.0478)	(0.0234)	(0.0401)	(0.0246)	(0.0342)	(0.0195)	(0.0472)	(0.0291)	(0.0378)	(0.0234)
Latent (ρ₂₁)	−0.0921	−0.0180	–	–	0.0052	0.0163	−0.0119	−0.0027	–	–
	(0.0288)	(0.0258)	–	–	(0.0447)	(0.0453)	(0.0510)	(0.0201)	–	–
Latent (ρ₂₂)	0.0074	−0.0057	–	–	0.0147	−0.0044	−0.0332	0.0168	–	–
	(0.0471)	(0.0331)	–	–	(0.0712)	(0.0414)	(0.0574)	(0.0290)	–	–
Latent (ρ₂₃)	0.1055	0.0264	–	–	−0.0125	0.0250	0.0758	−0.0579	–	–
	(0.0520)	(0.0462)	–	–	(0.0632)	(0.0365)	(0.0562)	(0.0216)	–	–
Group fixed effect	Yes		Yes		Yes		Yes		Yes
$σ_{ξ_{u c, g}}^{2 (*)}$ (GPA)	0.4508		0.4801		0.4618		0.4668		0.4809
	(0.1532)		(0.1527)		(0.1501)		(0.1548)		(0.1591)
$σ_{ξ_{c, g}}^{2 (*)}$ (smoking)	3.6320		3.6435		3.5293		3.6085		3.5702
	(3.2199)		(3.2489)		(3.1070)		(3.1239)		(3.1364)
$σ_{ξ_{u c c, g}}^{(*)}$	−0.2522		−0.2452		−0.2448		−0.2542		−0.2474
	(0.2708)		(0.2770)		(0.2738)		(0.2755)		(0.2725)

Note: The full model contains the activity intensity equations for GPA and smoking and the network formation model, where the network formation model involves the latent characteristic variables, the global effect, and the incentive effect. In the second column, we remove the latent variables from the network formation model. In the third column, we remove the global effect from the network formation model. In the fourth column, we remove the global effect and the incentive effect from the network formation model, keeping only the latent variables. In the fifth column, we estimate only the activity intensity equations. The MCMC runs for 100,000 iterations, and the first 50,000 draws are discarded as burn-in. Values in parentheses are standard deviations of the MCMC draws. The parameters $σ_{ξ_{u c, g}}^{2 (*)}$ , $σ_{ξ_{c, g}}^{2 (*)}$ , and $σ_{ξ_{u c c, g}}^{(*)}$ denote the averages across groups of the estimated variances of the error terms in the activity intensity equations of GPA and smoking and of their covariance; the values in parentheses are the averages of the standard deviations. The trace plots of key parameters (λ and δ) and the convergence diagnostics of Geweke (1992) are provided in the Online Supplementary Appendix Figure F.6.

4.2.1 Network Formation

The results on local network effects in the full model are as follows. Staying in the same school for a longer time has a significant negative effect on sending friendship nominations ( $γ_{1}$ ) but a positive effect on receiving them ( $γ_{2}$ ). The exogenous dyad-specific effects are all positive and significant; the effect of the same age ( $γ_{31}$ ) is strongest, followed by the effect of the same sex ( $γ_{32}$ ), and then the effect of the same race ( $γ_{33}$ ). We find that the distances of latent variables ( $γ_{41}$ to $γ_{43}$ ) have significant negative effects on network formation, confirming the existence of homophily with respect to unobservables (Hoff, Raftery, and Handcock (2002), Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2016)).³³

For the global network effects, we find a positive and strong reciprocity effect ( $η_{1}$ ), which is consistent with findings in the literature (Snijders, Van De Bunt, and Steglich (2010), Badev (2013), Mele (2017a)). This reflects the fact that mutual friendship nominations among students are common (45.64% of friendship links) in our sample. The congestion effect is concave in an individual's out-degree: the linear effect ( $η_{2}$ ) is positive, while the quadratic effect ( $η_{3}$ ) is negative. This result confirms our conjecture that limited resources, for example, time, energy, and money, may constrain students from making too many friends. The popularity effect ( $η_{4}$ ) is small and insignificant.

The positive and strong transitive-triads effect ( $η_{5}$ ) shows that students value transitive relationships. As expected, the three-cycles effect ( $η_{6}$ ) is negative. As discussed in Snijders, Van De Bunt, and Steglich (2010), this reveals a certain degree of local hierarchy among students. Our estimates of these global network effects confirm that there are nontrivial negative externalities on link formation that distinguish our model from the Erdős–Rényi random graph model.

The incentive effect from GPA ( $δ_{1}$ ) is strong and significant. Therefore, for high school students in our sample, the utility of interactions in academic achievement influences their friendship decisions. In contrast, the incentive effect from smoking ( $δ_{2}$ ) is small and insignificant. This implies that students in our sample barely consider the utility of interactions in smoking as a factor in their friendship decisions.

4.2.2 Network Interactions on GPA

Our main finding for network interactions on academic performance is that, once we control for network endogeneity through the latent variables and the incentive effect ( $δ_{1}$ ), the estimated endogenous effect (λ) on GPA drops from 0.0330 in the activity intensity equation alone (fifth column) to 0.0177 in the full model (first column). This highlights the effectiveness of our joint modeling approach in correcting the selection bias inherent in the activity intensity equation. Our estimate of the endogenous effect on GPA is close to, but smaller than, the estimate (0.019) found in Hsieh and Lee (2016), probably because they do not incorporate the utility of interactions in their network formation model. We interpret this estimated endogenous peer effect as follows. Through interactions, an individual could raise his/her GPA by 0.0177 units when any one friend improves his/her GPA by one unit. Note also that the overall effect grows with the number of friends: the more friends an individual has, the stronger the overall effect he/she receives. On average, a one-standard-deviation increase in peers' GPA increases an individual's GPA by 0.1530 points. This estimate also implies that the social multiplier effects, as measured by the elements of ${(I_{m_{g}} - λ W_{g})}^{-1} l_{m_{g}}$ across individuals and groups, have an average of 1.0683 and a standard deviation of 0.0546.
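To make the social multiplier concrete, the following minimal sketch computes the elements of $(I - λW)^{-1} l$ for a single group. It assumes that $l_{m_{g}}$ is a vector of ones and that $W_{g}$ is an unnormalized 0/1 adjacency matrix (consistent with the remark that the overall effect grows with the number of friends). The four-student network and the function name `social_multipliers` are hypothetical illustrations and are not taken from the data.

```python
import numpy as np

def social_multipliers(W, lam):
    """Solve (I - lam * W) x = 1, giving one multiplier per individual."""
    m = W.shape[0]
    return np.linalg.solve(np.eye(m) - lam * W, np.ones(m))

# Hypothetical directed network: row i marks the friends nominated by student i.
W = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],  # this student nominates no one
], dtype=float)

lam = 0.0177  # full-model GPA estimate of the endogenous effect in Table 2
mult = social_multipliers(W, lam)
print(mult.round(4), round(float(mult.mean()), 4))
```

A student with no nominated friends has a multiplier of exactly one, and multipliers rise with the number of direct and indirect friends, which is why the reported averages depend on the observed network density.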

Results from the second to the fourth columns in Table 2 show that the correction of bias comes from both the incentive effect ( $δ_{1}$ ) and the unobserved latent characteristic variables ( $ρ_{11}$ to $ρ_{13}$ ). When the model contains only the latent variables (fourth column), only 59.48% of the observed bias is corrected.³⁴ When the model controls for the incentive effect but not the latent variables (second column), we correct for 92.16% of the observed bias. These empirical results also confirm the finding of the simulation study in the Online Supplementary Appendix E that omitting the global network effects from the network formation model can result in an upward bias in the estimated incentive effect (third column) and thus indirectly push the estimated endogenous peer effect (λ) downward.
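The two percentages in this paragraph can be reproduced from the GPA estimates of λ in Table 2. The sketch below follows the ratio described in the footnote attached to the first percentage: the gap closed by a given column, divided by the gap between the fifth and first columns. The dictionary keys and the function name are illustrative labels only.

```python
# Endogenous peer effect (lambda) on GPA, copied from Table 2.
LAMBDA_GPA = {
    "full": 0.0177,
    "no_latent": 0.0189,
    "latent_only": 0.0239,
    "activity_alone": 0.0330,
}

def share_of_bias_corrected(model, estimates):
    """Percent of the (activity alone - full) gap that `model` closes."""
    observed_bias = estimates["activity_alone"] - estimates["full"]
    corrected = estimates["activity_alone"] - estimates[model]
    return 100.0 * corrected / observed_bias

latent_only_share = share_of_bias_corrected("latent_only", LAMBDA_GPA)
no_latent_share = share_of_bias_corrected("no_latent", LAMBDA_GPA)
print(f"{latent_only_share:.2f}% {no_latent_share:.2f}%")
```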

For the contribution of individual characteristics, we observe that students who are older, male, Hispanic, or of other races, whose mother's education level is missing or below high school, whose mother's occupation is missing or professional, or whose mother participates in social welfare programs tend to have lower GPAs. In contrast, students who live with both parents or whose mother has more than a high-school education tend to have higher GPAs. We also see that one latent variable shows a significant positive effect on GPA. The estimated contextual effects of friends who are Black, Asian, or of other races, who live with both parents, whose mother has less than a high-school education, or who receive welfare are negative and significant (see the contextual columns of Table 2). The estimated contextual effects of friends whose mother has more than a high-school education or a missing education level are positive and significant in the full model.

4.2.3 Network Interactions on Smoking

For the smoking outcome, we observe that the estimated endogenous (peer) effect (λ) drops from 0.1125 in the activity intensity equation alone (fifth column) to 0.1052 in the full model (first column). The smaller selection bias for smoking (as opposed to GPA) is likely due to the small incentive effect ( $δ_{2}$ ) for smoking.

Nonetheless, the correction of bias is largely due to the inclusion of latent variables (comparing the fourth column with the fifth column) rather than the inclusion of the incentive effect (comparing the second column with the fifth column). On average, a one-standard-deviation increase in peers' smoking frequency increases an individual's smoking frequency by 0.4092 times. Our estimate also implies that the social multipliers have an average of 1.9692 and a standard deviation of 1.2002.

The effects of individual characteristics show that students who are Black or Asian, whose mother has more than a high-school education, or who have more school-age children at home tend to smoke less than their schoolmates. In contrast, students who are older or Hispanic, or whose mother has less than a high-school education, participates in welfare programs, or has a professional job or missing job information, tend to smoke more than others. For contextual effects, a student may smoke more if they have more friends who are Black, Asian, or of other races, and may smoke less if they have friends whose mothers participate in welfare programs. For the estimated covariances of disturbances in the outcome equations of GPA and smoking, we find that the values are generally negative, with an average of −0.2522 and a standard deviation of 0.2708.

Finally, as an additional robustness check, we estimate the model with correlated random group effects in the activity intensity equations, as discussed in the Online Supplementary Appendix D.2. The estimation results are available in the Online Supplementary Appendix Table F.6. We find that the estimates of the local, global network, and incentive effects in network formation and of the endogenous peer effects in the activity intensity equations remain similar to those in Table 2. Since all of the group averages of $X_{g}$ and $Z_{g}$ used in capturing the mean of the correlated random effects are insignificant, there are no significant changes in the estimated own and contextual effects. As a result, our estimation results are robust across the fixed and random group effect settings.

5 Conclusion

Researchers are interested in network structures to analyze possible impacts of those structures on activity outcomes. As mentioned in Jackson (2010, Section 5), if networks only serve as conduits for diffusion, for example, diseases or ideas, their impacts on outcomes are somewhat mechanical, and one need not worry about any feedback effects from outcomes. However, for studying the impact of a friendship network on outcomes, both the network structure and strategic interactions between the network and outcomes should be considered. This extra consideration should be reflected in a dynamic or static equilibrium model.

In this paper, we propose a static equilibrium model that accounts for those features. We present a complete information game in which students respond to incentives stemming from their interactions with friends that in turn affect their friendship decisions. We also allow for unobserved individual characteristics in network formation and activity outcome equations.

Our empirical results show that American high school students regard the utility of interactions in academic learning as a significant incentive for forming friendships, whereas the utility of interactions in smoking is not. Another novelty of our approach to the social interaction literature is to present a model that allows correcting for possible friendship selection biases in activity outcomes that can be attributed to the specification of incentive effects, latent characteristic variables, or both.

Some issues that are not emphasized in this paper remain important for future extensions. First, we focus on a complete information setup. While this assumption may be appropriate for a school setting, it may be questionable in other economic contexts. Second, we abstract from network games with multiple outcome equilibria. In the paper, we circumvent this issue by focusing on continuous outcome variables. In a multiple equilibria setting with discrete choice outcomes, one could either provide an equilibrium selection rule or characterize the estimation problem with moment inequalities. Finally, an interesting way forward would be to apply our model to the study of other types of networks, for example, criminal networks, physician referral networks, or academic coauthor networks.

Appendix A: Proof of Proposition 1

The existence and uniqueness of the Nash equilibrium for a fixed network structure follow directly from the literature (e.g., Ballester, Calvó-Armengol, and Zenou (2006), Calvó-Armengol, Patacchini, and Zenou (2009)). We nonetheless include a short proof for completeness.

We start with the case where $y_{i, d}$ is uncensored. Taking the first-order conditions of $U_{i} (W, Y_{1}, \dots, Y_{\bar{d}})$ with respect to $y_{i, d}$ leads to [Image Omitted. See PDF] or, rearranging and writing in a matrix form: [Image Omitted. See PDF] where $B_{u} (Y_{d})$ denotes the best response function.

For any $Y_{d}, {\tilde{Y}}_{d}$ , we have ${‖ B_{u} (Y_{d}) - B_{u} ({\tilde{Y}}_{d}) ‖}_{\infty} = | λ_{d} | {‖ W (Y_{d} - {\tilde{Y}}_{d}) ‖}_{\infty} \leq | λ_{d} | \times {‖ W ‖}_{\infty} {‖ Y_{d} - {\tilde{Y}}_{d} ‖}_{\infty}$ . Then $B_{u} (Y_{d})$ is a contraction mapping whenever $| λ_{d} | < 1 / {‖ W ‖}_{\infty} = 1 / \bar{n}$ . By the Banach fixed-point theorem, this implies that there exists a unique Nash equilibrium of $Y_{d}$ such that $B_{u} (Y_{d}) = Y_{d}$ . It also implies that the linear system (A.1) has a unique solution so that $Y_{d}^{*} = {[I_{m} - λ_{d} W]}^{−1} μ_{d}$ , where the inverse is well-defined.
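Because the displays for the first-order condition and for (A.1) are not reproduced here, it may help to spell out a form of the best response function that is consistent with the two relations used in this paragraph. The block below is a hedged reconstruction, not the original display: it assumes an affine best response with intercept vector $μ_{d}$, and it adds the standard Neumann series argument for why the inverse exists under the stated bound.

```latex
% A form of (A.1) consistent with the surrounding text
% (a reconstruction under the stated assumption, not the original display).
\begin{align*}
B_{u}(Y_{d}) &= \mu_{d} + \lambda_{d} W Y_{d}, \\
B_{u}(Y_{d}) - B_{u}(\tilde{Y}_{d}) &= \lambda_{d} W (Y_{d} - \tilde{Y}_{d}), \\
Y_{d}^{*} = B_{u}(Y_{d}^{*}) \;&\Longleftrightarrow\; (I_{m} - \lambda_{d} W)\, Y_{d}^{*} = \mu_{d}, \\
(I_{m} - \lambda_{d} W)^{-1} &= \sum_{k=0}^{\infty} \lambda_{d}^{k} W^{k}
\quad \text{whenever } |\lambda_{d}| \, \| W \|_{\infty} < 1 .
\end{align*}
```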

The case where $y_{i, d}$ is left censored, that is, $y_{i, d} \geq 0$ , is similar. Indeed, since $U_{i} (W, Y_{1}, \dots, Y_{\bar{d}})$ is concave in $y_{i, d}$ , the optimal solution $y_{i, d}^{*} = \arg \max_{y_{i, d} \geq 0} U_{i} (W, Y_{1}, \dots, Y_{\bar{d}})$ is given by 0 or by the first order conditions. Formally, [Image Omitted. See PDF] Then, similar to the case where $y_{i, d}$ is uncensored, we can write the vector-valued best response function $B_{c} (Y_{d}) = {[y_{1, d}^{*}, \dots, y_{m, d}^{*}]}^{'}$ . Now, for any $Y_{d}, {\tilde{Y}}_{d}$ , note that we necessarily have: ${‖ B_{c} (Y_{d}) - B_{c} ({\tilde{Y}}_{d}) ‖}_{\infty} \leq {‖ B_{u} (Y_{d}) - B_{u} ({\tilde{Y}}_{d}) ‖}_{\infty}$ , because censoring at zero cannot move two best responses further apart. This implies that if $B_{u} (Y_{d})$ is a contraction mapping, then so is $B_{c} (Y_{d})$ . Using the same argument as before, there exists a unique Nash equilibrium of the game for left-censored activities.
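The contraction argument also suggests a simple way to compute the equilibrium: iterate the best response until it stops changing. The sketch below assumes an affine uncensored best response $μ + λ W Y$ and a censored best response equal to its elementwise maximum with zero, which matches the description "0 or the first order conditions" but is not copied from the omitted displays. The network, the vector `mu`, and the value of `lam` are hypothetical.

```python
import numpy as np

def best_response_iteration(W, mu, lam, censored=False, tol=1e-12, max_iter=10_000):
    """Iterate y <- B(y) from zero until successive iterates differ by less than tol."""
    y = np.zeros_like(mu)
    for _ in range(max_iter):
        y_new = mu + lam * (W @ y)           # uncensored best response
        if censored:
            y_new = np.maximum(y_new, 0.0)   # left-censoring at zero
        if np.max(np.abs(y_new - y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("best-response iteration did not converge")

W = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
], dtype=float)
mu = np.array([1.0, -0.5, 0.2, 0.8])
lam = 0.3  # below 1 / (maximum out-degree) = 0.5, so the map is a contraction

y_u = best_response_iteration(W, mu, lam)
y_c = best_response_iteration(W, mu, lam, censored=True)
print(y_u.round(4), y_c.round(4))
```

In the uncensored case the iteration reproduces the closed form $[I_{m} - λ_{d} W]^{-1} μ_{d}$; in the censored case no closed form is needed, since the same iteration converges whenever the uncensored map is a contraction.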

We now turn to the first stage of the game. Since there exists a unique Nash equilibrium $(Y_{1}^{*} (W), \dots, Y_{\bar{d}}^{*} (W))$ , the value of the network $T (W)$ is uniquely defined. Also, since $τ_{W}$ is drawn from a Type I extreme value distribution, the probability of having more than one network structure maximizing $T (W)$ is zero. There is, therefore, a generically unique strongly efficient network, and the probability that W maximizes T is given by [Image Omitted. See PDF]
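Since the probability display is omitted above, the following sketch illustrates only the general mechanism under an explicit assumption: if the network value is a deterministic part plus an independent Type I extreme value shock for each network, the probability that a given network is the maximizer takes the familiar logit (exponential) form. The deterministic value function `toy_value` and its coefficients are hypothetical and are not the specification used in the paper. The enumeration also shows why the normalizing constant becomes intractable: a group of m students has 2^(m(m-1)) possible directed networks.

```python
import itertools
import numpy as np

def network_probabilities(m, value_fn):
    """Enumerate all directed networks on m nodes and return logit probabilities."""
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    networks, values = [], []
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        W = np.zeros((m, m))
        for (i, j), b in zip(pairs, bits):
            W[i, j] = b
        networks.append(W)
        values.append(value_fn(W))
    values = np.array(values)
    weights = np.exp(values - values.max())  # subtract the max for numerical stability
    return networks, weights / weights.sum()

def toy_value(W, gamma0=-1.0, eta1=1.5):
    """Hypothetical deterministic value: a cost per link plus a bonus per mutual pair."""
    links = W.sum()
    mutual_pairs = np.sum(W * W.T) / 2
    return gamma0 * links + eta1 * mutual_pairs

networks, probs = network_probabilities(3, toy_value)
print(len(networks), float(probs.max()))
```

With only three students there are already 64 networks to sum over, which is why estimation in practice relies on sampling schemes rather than on direct evaluation of the denominator.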

Existence is guaranteed by letting the allocation rule $Λ_{i} (W, T) = T (W) / m$ for all i, which implies that strongly efficient networks are individually stable. This completes the proof.

A.1 Comment on the contraction mapping assumption

The argument in Appendix A depends on the best response functions being contraction mappings. This is important for uniqueness and for the solution of equation (A.2) to be found iteratively.

First, note that a contraction mapping is only a sufficient condition and in no way necessary. Also, the use of the infinity norm ${‖ \cdot ‖}_{\infty}$ is stronger than what is needed. Indeed, the argument works as long as there exists a (submultiplicative) matrix norm for which the condition holds.

In particular, one could state the condition in terms of the spectral radius of W. We focus on the infinity norm since it is more intuitive. Indeed, otherwise, we would have to require individuals to choose their friends such that W has a bounded spectral radius, which carries much less economic intuition than requiring them to choose at most $\bar{n}$ friends.
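The gap between the two conditions can be seen on a small example. The sketch below uses a hypothetical four-node network and a hypothetical value of λ that violates the infinity-norm bound but respects the spectral-radius bound; it assumes the same affine best response $μ + λ W Y$ as in the proof. Iterating the best response still converges to the solution of the linear system.

```python
import numpy as np

W = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
], dtype=float)

inf_norm = np.abs(W).sum(axis=1).max()                # largest out-degree
spectral_radius = np.abs(np.linalg.eigvals(W)).max()  # never exceeds inf_norm

lam = 0.6  # exceeds 1 / inf_norm, but stays below 1 / spectral_radius
mu = np.array([1.0, -0.5, 0.2, 0.8])

y = np.zeros(4)
for _ in range(500):
    y = mu + lam * (W @ y)  # best-response iteration

y_direct = np.linalg.solve(np.eye(4) - lam * W, mu)
print(inf_norm, round(float(spectral_radius), 4), np.allclose(y, y_direct))
```

The spectral-radius condition therefore admits larger peer effects, at the cost of a restriction on W that is harder to interpret than a cap on the number of friends.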

²For example, job finding and labor force participation (Calvo-Armengol and Jackson (2004, 2007), Bayer, Ross, and Topa (2008)); social learning and knowledge diffusion (Conley and Udry (2001), Conley and Udry (2010)); risk sharing and insurance (Fafchamps and Gubert (2007a, 2007b)); obesity transmission (Christakis and Fowler (2007), Fowler and Christakis (2008)); peer effects on students' academic achievement (Calvó-Armengol, Patacchini, and Zenou (2009)); sport and club participation (Bramoullé, Djebbari, and Fortin (2009), Liu, Patacchini, and Zenou (2014)); and juvenile delinquencies or criminal activities (Patacchini and Zenou (2008), Bayer, Hjalmarsson, and Pozen (2009), Ballester, Zenou, and Calvó-Armengol (2010), Patacchini and Zenou (2012)).

³A static network refers to a cross-sectional case in which only one observation of a network is available. We focus on a static setting because most widely used social network data are cross-sectional, for example, Add Health data (Udry (2003)) and Indian rural village data (Banerjee, Chandrasekhar, Duflo, and Jackson (2013)). Limited network data with a panel structure can be found in the literature dealing with stochastic actor-based dynamic network modeling; for example, see Snijders (2001) and Snijders, Van De Bunt, and Steglich (2010).

⁴Under pairwise independence, the likelihood of the entire network conditional on unobservables is the product of likelihoods from all pairs of agents.

⁵See Wasserman and Pattison (1996), Snijders (2002), and Snijders, Pattison, Robins, and Handcock (2006) for more complete lists of network statistics used in ERGMs.

⁶There is a consequent theoretical literature. We refer the interested reader to Boucher (2016) for a discussion and additional references.

⁷It is called spatial weights matrix, adjacency matrix, or sociomatrix in the literature.

⁸This nonreciprocity is motivated by our empirical application. In fact, 54.36% of friendship links in our dataset (Add Health) are nonreciprocal. This assumption is also present in Hsieh and Lee (2016), Mele (2017b), Jochmans (2018), and others.

⁹We abstract from the activity outcome of discrete choices in this paper as it generally involves multiple equilibria (Krauth (2006), Soetevent and Kooreman (2007)). This is left for future research.

¹⁰Since this is a two-stage game (see Section 2.3), the coefficient $δ_{d}$ can also be interpreted as an activity specific discount factor.

¹¹While our model could be extended to such a setting, it would raise nontrivial identification issues (Cohen-Cole, Liu, and Zenou (2018)), which are left for future research.

¹²We also assume that the unobserved part (for the econometrician) of the utility functions for each activity may be correlated. See details in Section 4.

¹³The randomness is due to the preference shocks $τ_{i, W}$ , which would be unobservable to an econometrician.

¹⁴The dependence of $U_{i}$ on Y is omitted on purpose. The formal definition is presented in Definition 1 below.

¹⁵This definition assumes implicitly that the allocation rule is balanced, as in Dutta and Jackson (2000).

¹⁶This is equivalent to saying that W is individually stable if W is a Nash equilibrium of the game where individuals' payoffs are given by the allocation rule, under the network value $T (W)$ .

¹⁷That is, $(Y_{1}^{*}, \dots, Y_{\bar{d}}^{*})$ such that $y_{i, d}^{*} \in \arg \max_{y_{i, d}} U_{i} (W, Y_{1}^{*}, \dots, y_{i, d}, Y_{- i, d}^{*}, \dots, Y_{\bar{d}}^{*})$ for all i and d.

¹⁸See the proof of Proposition 1 for a discussion of the strength of this assumption.

¹⁹Boucher (2016) studied a model of conformism having an endogenous network. He shows that conformism has non-monotonic effects on the value of network links.

²⁰It is possible to specify $c_{i j} = | c_{i} - c_{j} |$ if $c_{i}$ is continuous. For binary $c_{i}$ , however, we prefer the use of dummy variables $c_{i j}$ taking a value of 1 if i and j have the same value. Of course, this is fully equivalent to taking the distance, which would take a value of 1 if i and j have different values, and 0 otherwise.

²¹Fruehwirth (2014) argued that the rationale of including peer activity outcome into the social interactions model is to proxy for the unobserved peer inputs or characteristics. Therefore, even if the model is identified, the estimated effects of peer characteristics from the social interactions model with the endogenous effect is hard to interpret and used for policy suggestion. Thus, an intuitive way to alleviate such a concern is to distinguish the effect of unobserved peer characteristics from the endogenous effect of peer activity outcome by directly specifying WZ in the model.

²²We also provide additional discussion in the Online Supplementary Appendix B.

²³This is indeed what we find in our empirical study; see Table 2.

²⁴In the Online Supplementary Appendix C, we present the model likelihood function for the censored activity outcome case.

²⁵Of course, assuming independence across groups, the joint likelihood (over all groups) can be written as $\prod_{g = 1}^{G} P (W_{g}, Y_{g} | θ_{g}, α_{g}, Z_{g})$ .

²⁶The justification of these identification restrictions can be found in the Online Supplementary Appendix of Hsieh and Van Kippersluis (2018).

²⁷Alternative estimation approaches include the maximum pseudo likelihood approach (Besag (1974), Strauss and Ikeda (1990), Boucher and Mourifié (2017)) and Monte Carlo maximum likelihood (Geyer and Thompson (1992)).

²⁸Mele (2017b) used the term “approximate exchange algorithm” instead of double M–H algorithm in his paper. He also provides the formal statement of the convergence of the algorithm in Appendix B of his paper.

²⁹Similar local and global updates are suggested in Snijders (2002) and Mele (2017b) to improve the convergence of graph sampling, particularly when the graph distribution exhibits a bimodal shape, one mode having low and the other high graph densities. In practice, we set $R = 2$ and the probability of global update $P_{inv} = 0.01$ in the following simulation and empirical studies.

³⁰Discussions about how academic performance and smoking affect friendship selections can be found in, for example, Kiuru, Burk, Laursen, Salmela-Aro, and Nurmi (2010), Lomi, Snijders, Steglich, and Torló (2011), Flashman (2012), Schaefer, Haas et al. (2013). Other activities may also affect friendship choices. We focus on academic performance and smoking because they are the key subjects of interest discussed in the literature of social interactions.

³¹To clarify, we do not use the Add Health saturation sample (Udry (2003)) which consists of 16 schools. In this saturation sample, all enrolled students in the schools were selected for in-home interviews; thus, it is an ideal sample if information from in-home interviews is needed. However, since we do not use variables from the in-home survey, we do not use the saturation sample.

³³Comparing the results of the multiple-activity model in Table 2 with those of the single-activity models in the Online Supplementary Appendix Tables F.3 and F.4, we find that the latent variables in the local network effects in single-activity models have smaller estimated coefficients. Also, the estimated incentive effect from smoking is higher, and the estimated endogenous effect on smoking is lower in the single-activity model compared to those in Table 2. These differences illustrate the potential concern of omitted-variable biases when activity outcomes are modeled separately.

³⁴In Table 2, the mean and the standard deviation (in parentheses) of the MCMC posterior draws are reported as point estimates for each parameter. We set the hyperparameters in the prior distributions to be identical to those used in the simulation study presented in the Online Supplementary Appendix E.

³⁶Because the exact likelihood value for the full model in equation (12) is unavailable due to the intractable denominator, we cannot directly apply the likelihood-based model selection criteria to choose the number of latent dimensions in the full model. Alternatively, we determine the latent dimensions based on the model in which the global network effects and the incentive effects are taken away. When there are no global network effects or incentive effects in the network formation model, each link becomes conditionally independent given the latent variables. In that case, the likelihood value can be computed, and we can apply the Akaike's information criterion–Monte Carlo (AICM) as proposed by Raftery, Newton, Satagopan, and Krivitsky (2007) to choose the latent dimension (Hsieh and Lee (2016)). We report the estimation results for that model having one to four latent variable dimensions and the corresponding AICM values in the Online Supplementary Appendix Table F.5. Dimension three is chosen due to it having the smallest AICM value.

³⁷This percentage is obtained by dividing the difference between the estimated λ's in the fourth and fifth columns by the difference between the estimated λ's in the first and fifth columns.
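
In symbols, writing $\hat{\lambda}^{(k)}$ for the estimate of λ reported in column $k$ (a notation introduced here only for illustration), the calculation in this footnote can be sketched as:

```latex
\[
  \text{percentage}
  \;=\;
  \frac{\hat{\lambda}^{(4)} - \hat{\lambda}^{(5)}}
       {\hat{\lambda}^{(1)} - \hat{\lambda}^{(5)}}
  \times 100\% .
\]
```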

© 2020. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

We model network formation and interactions under a unified framework in which individuals anticipate the effect of network structure on the utility of network interactions when choosing links. This modeling approach has two advantages. First, we can evaluate whether network interactions drive friendship formation. Second, we can control for friendship selection bias in the estimated interaction effects. We provide microfoundations for this statistical model based on the subgame perfect equilibrium of a two‐stage game and propose a Bayesian MCMC approach for estimating the model. We apply the model to study American high school students' friendship networks using the Add Health dataset. Using two interaction variables, GPA and smoking frequency, we find that the utility of interactions in academic learning is important for friendship formation, whereas the utility of interactions in smoking is not. However, both GPA and smoking frequency are subject to significant peer effects.

Details

Title

Specification and estimation of network formation and network interaction models with the exponential probability distribution

Author

Chih‐Sheng Hsieh¹; Lung‐Fei Lee²; Vincent Boucher³

¹ Department of Economics, National Taiwan University
² Department of Economics, The Ohio State University
³ Department of Economics, Université Laval; CRREP; CREATE

Pages

1349-1390

Section

Original Articles

Publication year

2020

Publication date

Nov 2020

Publisher

John Wiley & Sons, Inc.

ISSN

1759-7323

e-ISSN

1759-7331

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3982/QE944

ProQuest document ID

2643973105
