Abstract

Network homophily, the tendency of similar nodes to be connected, and transitivity, the tendency of two nodes to be connected if they share a common neighbor, are conflated properties in network analysis, since one mechanism can drive the other. Here, we present a generative model and corresponding inference procedure that are capable of distinguishing between the two mechanisms. Our approach is based on a variation of the stochastic block model (SBM) with the addition of triadic closure edges, and its inference can identify the most plausible mechanism responsible for the existence of every edge in the network, in addition to the underlying community structure itself. We show how the method can avoid detecting spurious communities caused solely by the formation of triangles in the network, and how it can improve the performance of edge prediction compared to the pure version of the SBM without triadic closure.
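To make the two mechanisms concrete, here is a toy generator. It first places homophilous edges with a plain SBM (more likely within groups than between them) and then adds triadic closure edges between pairs of nodes that share a neighbor in that SBM "seed" graph. Each edge is labeled with the mechanism that created it, which is the kind of per-edge attribution the inference aims to recover from the final network alone. This is only a simplified sketch loosely based on the description above. It is not the paper's exact model or its inference procedure. The function name and the parameters `p_in`, `p_out` and `closure_prob` are hypothetical.

```python
import itertools
import random


def sample_sbm_with_closure(group_sizes, p_in, p_out, closure_prob, seed=0):
    """Hypothetical toy generator: SBM edges first, then triadic closure edges.

    Returns (groups, edges): groups[u] is the group of node u, and edges maps
    frozenset({u, v}) to "sbm" or "closure", the mechanism that created it.
    """
    rng = random.Random(seed)
    # Assign each node to a group, e.g. [2, 3] -> [0, 0, 1, 1, 1].
    groups = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    n = len(groups)
    edges = {}

    # Step 1 (homophily): connect each pair with a probability that depends
    # only on whether the two nodes belong to the same group.
    for u, v in itertools.combinations(range(n), 2):
        p = p_in if groups[u] == groups[v] else p_out
        if rng.random() < p:
            edges[frozenset((u, v))] = "sbm"

    # Neighborhoods in the SBM seed graph only; closure edges added below
    # do not themselves trigger further closures in this simplified sketch.
    neighbors = {u: set() for u in range(n)}
    for e in edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Step 2 (triadic closure): each open wedge u - w - v in the seed graph
    # is closed with probability closure_prob, regardless of groups.
    for w in range(n):
        for u, v in itertools.combinations(sorted(neighbors[w]), 2):
            key = frozenset((u, v))
            if key not in edges and rng.random() < closure_prob:
                edges[key] = "closure"
    return groups, edges
```

Because closure edges ignore group membership, they add triangles that can span groups, while a high `p_in` also produces many triangles inside groups. Comparing the two label counts illustrates why the observed triangles alone do not reveal which mechanism produced them.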

Plain Language Summary

The network of social connections between friends, the interactions between proteins, metabolic relationships in the cell, links between websites, and many other systems are almost always the result of a mixture of generative mechanisms. These mechanisms often operate at distinct scales (globally or locally) but nevertheless leave traces in the network structure that are difficult to distinguish from one another. Here, we provide a way to distinguish two key generative mechanisms based only on a final snapshot of the system.

In our study, we explore two network processes: homophily (the tendency of two nodes to connect if they share some underlying property) and triadic closure (the tendency of two nodes to connect if they already share a neighbor). Although distinct, these two processes lead to similar observed patterns in the network.

For each link in a network, our method can reveal whether it was more likely the result of triadic closure or homophily. From this, we can decide if dense “communities” in the network are more likely the result of one or the other. Likewise, we can tell if the presence of “triangles” (groups of three nodes all connected to each other) is a direct result of triadic closure or homophily. This has important implications for the interpretation of network data and also for the prediction of missing or unobserved edges in networks.
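The triangles mentioned above are usually summarized by the global transitivity (clustering coefficient). This is the fraction of connected triples u - w - v whose endpoints u and v are also linked, equal to 3 × (number of triangles) / (number of connected triples). The sketch below computes both from a plain adjacency dictionary. The function name is hypothetical. As the summary notes, homophily and triadic closure lead to similar patterns, so a high value of this statistic does not by itself say which mechanism produced the triangles.

```python
from itertools import combinations


def triangles_and_transitivity(adj):
    """adj: dict mapping node -> set of neighbors (undirected, no self-loops).

    Returns (number of triangles, global transitivity).
    """
    closed_wedges = 0  # connected triples whose two endpoints are also linked
    triples = 0        # all connected triples, counted once per center node
    for w, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                closed_wedges += 1
    # Each triangle is a closed wedge at each of its three corners.
    triangles = closed_wedges // 3
    transitivity = closed_wedges / triples if triples else 0.0
    return triangles, transitivity
```

For example, a node 0 linked to nodes 1, 2 and 3, plus an edge between 1 and 2, has one triangle and five connected triples, so its transitivity is 3/5.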

Our methodology paves the way for a general, principled, and effective approach to disentangling local, global, and mesoscopic mechanisms of network formation.

Details

Title
Disentangling Homophily, Community Structure, and Triadic Closure in Networks
Author
Peixoto, Tiago P.
Publication year
2022
Publication date
Jan-Mar 2022
Publisher
American Physical Society
e-ISSN
2160-3308
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2731133322
Copyright
© 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.