Full Text

Turn on search term navigation

Introduction

Nestedness is a statistical property of systems where two kinds of entity interact, which can be represented as bipartite networks. Originally used as a metric for species-site distributions ^1,
2, nestedness has recently gathered much attention as a metric for bipartite species interaction networks, e.g. plant-pollinator mutualisms ^3,
4 and host-virus interactions ^5–
7. Various discussions have considered the sources of nestedness in such systems and its potential implications for ecological dynamics ^4,
8–
13. However, it is unclear how to systematically compare results for different ecological datasets. Furthermore, nestedness is not restricted to ecological datasets, but is a generic property of any bipartite network. Thus, there is a need for measures of nestedness that are context-independent and do not depend on any particular (ecological) interpretation. Multiple methods for measuring nestedness have been used in different studies, along with multiple approaches to calculating statistical significance of the measured values. This provides a large number of ways in which nestedness could be evaluated ^14–
16. Before theoretical investigations of the mechanisms of nestedness can be properly undertaken, robust measures and statistical tests for nestedness are required to allow comparison of results from different studies.

Here we present FALCON – a free software package that allows the user to easily compute several measures of nestedness and associated statistical significances based on a selection of null models. FALCON stands for “Framework for Adaptive ensembLes for the Comparison Of Nestedness”. FALCON operates on any form of bipartite interaction data represented as a matrix of associations and is set up to be deliberately ‘blind’ to the source and interpretation of input data. FALCON is based on the assumption that nestedness is a general statistical property of matrices and therefore its measurement should be independent of context or interpretation. FALCON calculates nestedness as a statistical property of a matrix, by returning the nestedness score for the most-nested configuration of the input matrix. Since calculating statistical significance of nestedness scores can be computationally demanding, involving generation of a large ensemble of matrices from a null distribution, FALCON uses a novel “adaptive ensemble” method to improve efficiency by using the minimal ensemble size sufficient to give robust statistics.

Several software packages for calculating nestedness already exist – including ^{1,
11,
17–
21}, but these are subject to various factors which make the direct comparison of different nestedness measures and the statistical interpretation of returned values difficult to achieve. Several nestedness measures are handled by packages which deliver a single measure, making the comparison difficult. Some are specific to a particular operating system. Some do not make the source codes available for re-implementation, reducing confidence in their outputs and prevent future extensions. Two packages for the R statistical programming language, bipartite ¹⁹ and vegan ²¹, together contain functions for several nestedness measures and associated null models, as well as many other tools for analysis of bipartite ecological networks. However, these packages offer no obvious implementation of significance testing (the principal method for reporting results of nestedness analyses) and they also lack several nestedness measures which have been recently developed. FALCON is designed to address these deficiencies, enabling the calculation of nestedness and statistical significance by using a variety of measures and null models, with open source code provided for several platforms.

The FALCON package is available for three commonly used numerical analysis platforms: MATLAB, Octave and R. MATLAB ( http://www.mathworks.co.uk/products/matlab/) is a commercial software platform, while Octave ( https://www.gnu.org/software/octave/) and R ( http://www.r-project.org/) are both freely available open source platforms. FALCON can be freely downloaded on Github ( http://github.com/sjbeckett/FALCON) or figshare ²² and all code is open and accessible. A guide to downloading, installing and running FALCON accompanies the code. This document describes the assumptions on which FALCON is based, how it calculates nestedness and statistical significance, gives details of the adaptive ensemble method used to improve computational efficiency and provides a case study to demonstrate its usage and outputs.

1 What is nestedness?

Nestedness is a statistical property of bipartite interaction data presented in matrix form. In a perfectly nested matrix, the entries in each successive row are a strict subset of those in the previous row, while the entries in each successive column are a strict subset of those in the previous column ( Figure 1). Interpretation of nestedness depends on context.

Figure 1.

Perfectly nested, weakly nested, and randomly connected matrices.

White squares indicate connections between two kinds of entity arranged in rows and columns.

The concept of nestedness was first described in studies on how species distributions varied between sites ^23–
25, and later defined quantitatively as measuring the ‘amount of order/disorder’ in matrices representing the presence/absence of species in island communities ¹. Used in this way, nestedness is calculated from a matrix of presence-absence data where rows are species and columns are sampling sites along some environmental or spatial gradient. A perfectly “nested” matrix (see Figure 1) would be achieved when the set of species present at each site along the gradient is a subset of the species present at the previous site. Since then, the concept of nestedness has been extended in various directions; see ²⁶ for an historical overview of the nestedness concept. Nestedness has continued to be applied to spatial patterning (e.g. ²⁷) and has been linked with β-diversity ²⁸, but has also been applied to study mutualistic or antagonistic species-species interactions ^29,
30, species-time relationships for a single site ³¹, and several other types of bipartite networks ^{9,
10,
32–
35}. For pairwise interactions (e.g. plant-pollinator or host-parasite systems), nestedness has been interpreted as placing species along a gradient of generalism-specialism in the number of partners they interact with; in this context, perfect nestedness is achieved when species within each class are ordered such that the interaction set (set of partners) for each species is a strict subset of that of the next species, and the most generalised species of one class interact with the most specialised species of the other class.

Nestedness is calculated from a biadjacency matrix representing pairwise interactions between two kinds of entity (one represented by rows, the other by columns). The order of rows and columns for a biadjacency matrix is arbitrary with respect to connectivity; rows and columns can be permuted without affecting the underlying topology of the interaction network. Any non-arbitrary ordering of rows and columns in the matrix representation necessitates the use of additional contextual information to specify which order rows and columns should take. While some datasets may suggest a “natural” ordering to rows and columns in the matrix representation of data (e.g. when one of the dimensions represents an environmental/spatial/temporal gradient), for many applications of nestedness there is no natural ordering (e.g. species interactions).

As stated above, we consider that nestedness should be a context-free metric, so that it can be applied to data without requiring any supplemental information on row/column ordering. This assumption implies that the ordering of rows and columns should not affect the measurement of nestedness. While some nestedness measures are insensitive to row/column ordering, several of the most commonly used measures are highly sensitive to ordering, introducing indeterminacy to the quantification of nestedness when rows/columns are ordered arbitrarily. To avoid this indeterminacy and return a single robust nestedness score for a given input matrix, FALCON can sort the rows and columns such that nestedness (however calculated) is maximised. Since re-ordering rows/columns in a matrix representation does not alter the structural information (node adjacency) of the underlying data, this re-ordering is a reasonable approach and makes the measurement of nestedness more consistent.

2 Measures of nestedness in FALCON

Nestedness is most commonly calculated for binary data representing presence/absence of an interaction between two entities, but can also be calculated for weighted data that indicate the strength of the interaction. The methods used to calculate nestedness vary depending on whether binary or weighted interaction data are provided. The nestedness measures available in FALCON are shown and briefly described below and in Table 1; further details are given in Appendix A.

Table 1.

Nestedness measures available in FALCON.

Name	Shorthand	Binary/ Weighted	Brief description	Reference
Nestedness based on overlap and decreasing fill	NODF	Binary	Pairwise row and column comparisons	[ 36]
Manhattan distance	MD	Binary	Sum of row and column indexes of connections	[ 37]
Nestedness temperature calculator	NTC	Binary	Difference from an ‘isocline of perfect order’	[ 1] [ 21]
Johnson, Domínguez- García & Muñoz	JDM	Binary	Measure of dissassortivity using configuration model	[ 38]
Discrepancy	BR	Binary	Difference from a ‘maximally packed’ matrix	[ 2]
Weighted NODF	WNODF	Weighted	Weighted version of NODF	[ 20]
Spectral radius	SR	Both	Maximum real eigenvalue of adjacency matrix	[ 11]

The nestedness measures considered here are not trivial variations upon each other, but differ significantly in their derivations. However, some similarities can be drawn. Spectral radius (SR) ¹¹ and the measure of Johnson, Domínguez-García, & Muñoz ³⁸ (JDM) are invariant to the ordering of rows and columns in the network and are calculated using the adjacency matrix of the network. On the other hand, discrepancy (BR) ², Manhattan distance (MD) ³⁷ and nestedness based on overlap and decreasing fill (NODF) ³⁶ are all sensitive to row/column ordering and are maximised when rows/columns are ranked by degree. The nestedness temperature calculator (NTC) ^1,
21 involves sorting of rows and columns against the ‘isocline of perfect order’ (see Figure 5) such that it maximises connections above the isocline and minimises connections below the isocline. BR is similarly calculated relative to an idealised ‘maximally packed’ matrix. NODF is found through pairwise comparisons of overlap between subsequent rows and columns, whilst MD is found by assigning a weight to each connection as a sum of it’s row and column indexes. The measures also differ in how nestedness is scored; the degree of nestedness in a network increases with increasing measure score for JDM, NODF and SR, but with decreasing measure score for BR, MD and NTC.

3 Comparison of nestedness scores

Nestedness is strongly sensitive to the size (number of rows and columns) and fill (number of non-zero entries) of the input matrix ¹⁷. This is problematic in practical terms, since we often wish to compare nestedness of matrices that differ in these basic properties; in fact, cases where we compare empirically derived matrices with identical size and fill are an exception. Thus comparison of absolute values of nestedness metrics is not informative and may be misleading. To compare nestedness of matrices with differing size and fill, observed nestedness should always be interpreted in the context of a null distribution of matrices with similar properties. Measuring observed nestedness relative to expected nestedness derived from a null distribution of similar matrices allows determination of both effect size (e.g. as a z-score, which is commonly used to compare different nestedness schemes ^26,
39) and statistical significance (e.g. as a p-value giving the expected frequency of the observed score in the null distribution). This approach necessitates choice of a suitable null model and generation of a distribution of random matrices drawn from it.

In the present context, a null model is a method for creating a distribution of matrices that conserves some properties of the input matrix while varying other properties at random ⁴⁰. We continue the “context-free” approach in our treatment of null models; to allow comparison of nestedness across different scenarios, a good null model should not make assumptions about the mechanisms by which data were generated, but treat the matrix as an independent data structure. However, to be comparable to the input matrix, null matrices must conserve some key matrix properties (such as size and fill) on which nestedness depends. The null models available in FALCON are given in Table 2; further detail is given in Appendix B. FALCON includes some of the more popular null models from the literature, alongside some additional null models that we feel can be useful. Null models vary in whether the original data is binary or quantitative, and in which properties of the original input matrix are preserved.

Table 2.

Null models in FALCON.

Name	Description	Binary/ weighted	Conserved features	Reference
SS	Shuffles positions randomly	Binary	Shape, Fill	[ 11]
FF	Permutations of structure with same node degrees	Binary	Shape, Fill, Degree	[ 41]
CC	Some structure is preserved, rest is shuffled	Binary	Shape, Fill
DD	Determined probabilistically by node degree	Binary	Probabilistic Degree	[ 3]
EE	Determined probabilistically by fill	Binary	Probabilistic Fill
Binary Shuffle	Order of weighted links is swapped	Weighted	Binary positions, weights	[ 11]
CRT	Random weights where row totals conserved	Weighted	Binary positions, row totals
CCT	Random weights where column totals conserved	Weighted	Binary positions, column totals
RCTA	Average of conserve row totals and column totals	Weighted	Binary positions

4 How FALCON works

4.1 Inputs and outputs

FALCON requires several inputs:

an input network in the form of a bipartite matrix

whether binary or quantitative nestedness should be investigated (quantitative matrices can be analysed using binary measures)

whether to sort rows and columns to maximise nestedness score

which nestedness measures should be used

which null models nestedness should be tested under

whether the ensemble of null models should be created with a fixed number or adaptively chosen

whether or not to plot the distributions of nestedness scores

Output is returned to the user in the form of:

the most nested configuration of the input matrix

the nestedness measure(s) of the input matrix

the expected value of nestedness under the null model(s) (as the mean measure of matrices created in the ensemble)

the number of ensemble members used to calculate significance in each null model

the statistical significance of the nestedness of the input matrix against each null model as a p-value

the standard deviation and sample z-scores of the measure in the ensemble as well as other properties.

4.2 What FALCON does

FALCON follows the process shown in Figure 2. First, it sorts the user input matrix into a maximally nested configuration and removes any empty rows/columns before finding the nestedness of this matrix using the users chosen measures. Then, FALCON goes through each of the user specified null models one by one, creating an ensemble of null matrices according to the rules of each null model. Each null matrix is then sorted and measured by each of the chosen nestedness measures. Thus, for each null model, nestedness measures are calculated for each of the null matrices in a single null ensemble, enabling direct comparison of results. The size of the null ensemble is determined by the input choice of using either the fixed or adaptive ensemble size (see Section 4.5). Statistics are computed from the measures found in the null ensemble (and the direction in which that nestedness measure is calculated), before the next null model ensemble is instantiated. Once all null models have been computed, the results are returned to the user.

Figure 2.

FALCON algorithmic procedure.

4.3 Direction of increasing nestedness

For different nestedness measures, increasing scores can represent either increasing or decreasing nestedness as discussed in Section 2. FALCON initially determines whether a higher measure score is related to greater nestedness (or vice versa) in the chosen measure by comparing the scores returned for a highly nested network (see Figure 3A) and a highly non-nested network (a weighted checkerboard configuration; Figure 3B), for which the fill (number of non-zero elements) and element sums are equal. The direction of increasing nestedness for a given measure is used during calculation of statistical significance. This method of determining direction each time the algorithm runs is included to allow easy extensibility; if a new measure is added, FALCON will automatically determine which direction indicates increasing nestedness.

Figure 3.

( A) Weighted nested matrix. ( B) Weighted non-nested matrix (checkerboard). Both ( A) and ( B) have 55 non-zero elements that sum to 220.

4.4 Initial sort

For efficiency, FALCON is set up to initially sort the input matrix by row and column degrees for calculation of BR, MD and NODF, retains this sorted configuration for calculation of JDM and SR, and subsequently re-sorts for NTC in order to find the maximal nestedness of a binary matrix. For quantitative data, FALCON uses the same methods as for binary interactions, but also utilises weight data to break symmetry when two rows (columns) have the same degree; in this case, the row (column) which has greater values for most overlapping elements is ranked highest. Where two or more rows (columns) share the same degree and most overlapping elements, the rows (columns) are ranked according to the total sum of row (column) elements. This sorting does not affect the underlying topology or the relationships in the data. FALCON also allows the user to decide if any sorting is performed, enabling the “context free” assumption to be relaxed (e.g. for investigation of gradient-based nestedness ³⁹).

4.5 Size of null ensemble

FALCON uses a bootstrap method to calculate the statistical significance of a given nestedness score, since the true null distributions of the test statistics are not known. The ensemble size used for this calculation can either be fixed or calculated adaptively by FALCON to improve computational efficiency and reduce undersampling effects. Note that the strongest significance that can be assigned is $P < \frac{1}{N}$ where N is the ensemble size.

Fixed. The number of null matrices used to make up the ensemble is fixed by the user. This method is effective providing that the ensemble is large enough to have statistical power; the larger the ensemble, the more power the test has and the closer the answer will be to the p-value for the (unknown) true null distribution. However, it is not obvious how large the ensemble needs to be; in the literature, amongst others ³⁰, use 1,000 null models in their ensembles, whilst ¹² use 10,000, and ⁶ use 100,000. A large number of different null matrix configurations are possible for a given input matrix and we may wish to avoid undersampling ⁴²; however, at the same time very large ensembles can make the calculation of significance computationally intractable.

Adaptive. FALCON includes a mechanism for adaptive determination of ensemble size. This is intended to ensure robust statistics are achieved, avoiding concerns about undersampling or oversampling ⁴², while minimising computational load. The adaptive method works by creating two ensembles in parallel using the same null model. Starting with a minimum ensemble size of 500 in each group, the ensembles are expanded until they show similar statistical properties. This condition is met when the null hypothesis (both ensembles come from the same distribution) of a Mann-Whitney U-test cannot be rejected at 10% significance. When this occurs, it suggests each group represents a good sample of the underlying distribution, and the two groups are combined to form a single null ensemble used to calculate final statistics. The expansion of the size of the ensemble has an upper limit of 100,000 members in case the null hypothesis is always rejected. The adaptive ensemble methods balances statistical precision with computational efficiency; we conservatively use 1,000 as a minimum final ensemble size such that a p-value as low as 0.001 can be assigned.

4.6 Output statistics

p-value. The p-value is the probability that a matrix drawn from the null distribution will be more nested than the input matrix. Low values ( p → 0) indicate that the input matrix is highly nested relative to the null distribution; commonly a threshold of p ≤ 0.05 or p ≤ 0.01 is used to denote a statistically significant level of nestedness. Here p is calculated by counting the frequency of matrices in the null ensemble that are more nested than the input matrix; for cases where no member of the null ensemble is more nested than the input matrix we conservatively assign $P < \frac{1}{N}$ where N is the ensemble size.

Normalised Temperature. The normalised temperature is inspired by the τ-Temperature ³⁷. It describes the relationship between the nestedness measure found for the input matrix and the expected nestedness measure derived from the null model ensemble. It is described as:

$T = \frac{M e a s u r e}{< M e a s u r e >} (1)$

where < Measure > denotes the expected value. In simple terms, the normalised temperature indicates whether the input matrix is more or less nested than the expectation for a null distribution of similar matrices. Where the measure gives increasing scores with increasing nestedness, T > 1 indicates greater-than-expected nestedness. Where the measure gives decreasing scores with increasing nestedness, T < 1 indicates greater-than-expected nestedness.

Mean. The mean average of the set of nestedness measures found for each of the ensemble members is returned.

Standard Deviation. The standard deviation ( σ) of the set of nestedness measures found for each of the ensemble members is returned.

Sample z-score. The z-score, or standard score, is calculated as the difference between the nestedness measure and its expected value divided by the standard deviation of the sample:

$z = \frac{M e a s u r e - < M e a s u r e >}{σ} (2)$

It is a measure of the number of standard deviations the nestedness measure of the input matrix is above the expected value. Hence, the way it should be interpreted, as with the normalised temperature, depends on whether nestedness increases with increasing measure score.

5 FALCON usage - case study

To demonstrate FALCON we analyse nestedness analysis in a bipartite network representing the hashtags used by a sample of Twitter users. Data were collected using the Twitter API ( https://dev.twitter.com/docs/api) by searching for tweets including the hashtag “#IPCC” in the time period 21st September 2013 – 5th October 2013. A list of all hashtags used by all users found in the search dataset was then used to create a binary bipartite adjacency matrix for users and hashtags. This was then sampled to create a smaller matrix used for this case study by including each row/column with probability of 1.1 and removing any empty rows/columns. The resulting matrix was stored in a comma-separated file called ’IPCC_HTuse_10_10_1_53x27.csv’.

The box below shows the command sequence used to perform a binary nestedness analysis using FALCON in MATLAB. The first line reads in the “.csv” datafile, which includes row and column headers. The second line extracts the adjacency matrix from the imported data. The third line runs FALCON, using two binary nestedness measures (NODF and SR) and two null models (CC and FF, i.e. nulls 2 and 3) using the adaptive solver and displaying histogram plots. The fourth line plots the input matrix in its most nested configuration, as determined by FALCON. The nested configuration of the matrix and output histograms from significance testing are shown in Figure 4, whilst Table 3 shows an example output from the significance testing. Further examples for use of FALCON in R are given in supporting information accompanying the software.

Figure 4.

Example output from FALCON.

( A) the nested arrangement of the UserHashtag data. ( B) Distribution of NODF scores found for an ensemble of FF null models generated for the UserHashtag network ( C) distribution of spectral radius scores found from the same null matrix ensemble. The asterix marks the nestedness score of the input network.

Table 3.

Example output from FALCON showing sample statistics for the UserHashtag network using the operations in the box above.

Nestedness statistics were computed for the FF and CC null models using the NODF and spectral radius measures.

Null model	FF		CC
Nestedness measure	NODF	SR	NODF	SR
Measure	44.4339	7.5552	44.4339	7.5552
Ensemble Size	1000	1000	1000	1000
Mean	45.3207	7.5346	29.9048	6.4755
Standard Deviation	0.8357	0.0336	3.5600	0.2136
z-score	-1.0612	0.6123	4.0812	5.0550
p-value	0.8490	0.2480	<0.001	<0.001
Normalised Temperature	0.9804	1.0027	1.4858	1.1667

1 >> UserHashtag = importdata(‘IPCC_HTuse_10_10_1_53x27.csv’) 2 >> data = UserHashtag.data; 3 >> output = PERFORM_NESTED_TEST(data,1,1,{’NODF’, ’SPECTRAL_RADIUS’},[2,3],[],1) 4 >> MATRIXPLOT(output.NestedConfig.DegreeMatrix)

6 Summary

In this paper we have presented FALCON, a software tool for reliable and efficient calculation of nestedness (and associated effect size and statistical significance) based on a selection of popular nestedness measures and null models used in the literature. FALCON treats nestedness purely as a statistical property of a bipartite matrix and removes any form of interpretation or contextual information from the analysis. This enables FALCON to be used to compare nestedness across a wide variety of application areas, noting that the concept of nestedness has already spread from its origin in island biogeography to include species-species interactions and other scenarios, and is likely to find further applications in other domains. The contribution of FALCON is to enable easy cross-comparison of observed nestedness using different nestedness measures and null models. We hope that this functionality will allow greater methodological uniformity and comparability of studies of nestedness. We are in the process of performing a large comparison study of nestedness metrics using FALCON (Beckett and Williams., in preparation). Uniformity of measurement and comparability of empirical results is an important preliminary step that must be achieved to enable understanding of the mechanistic basis and ecological (and otherwise) implications of nestedness. We hope that FALCON will be of use to other researchers and help illuminate this intriguing property of bipartite networks in many natural systems.

7 Software availability

Software access

The FALCON software package is available from Github: http://github.com/sjbeckett/FALCON

Source code as at the time of publication

https://github.com/F1000Research/FALCON

Archived source as at the time of publication

http://dx.doi.org/10.6084/m9.figshare.999117 ²²

AuthorAffiliation

1 College of Life and Environmental Sciences, University of Exeter, Exeter, EX4 4QE,, UK

Word count: 4036

Show less

Copyright: © 2014 Beckett SJ et al. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Nestedness is a statistical measure used to interpret bipartite interaction data in several ecological and evolutionary contexts, e.g. biogeography (species-site relationships) and species interactions (plant-pollinator and host-parasite networks). Multiple methods have been used to evaluate nestedness, which differ in how the metrics for nestedness are determined. Furthermore, several different null models have been used to calculate statistical significance of nestedness scores. The profusion of measures and null models, many of which give conflicting results, is problematic for comparison of nestedness across different studies.

We developed the FALCON software package to allow easy and efficient comparison of nestedness scores and statistical significances for a given input network, using a selection of the more popular measures and null models from the current literature. FALCON currently includes six measures and five null models for nestedness in binary networks, and two measures and four null models for nestedness in weighted networks. The FALCON software is designed to be efficient and easy to use. FALCON code is offered in three languages (R, MATLAB, Octave) and is designed to be modular and extensible, enabling users to easily expand its functionality by adding further measures and null models. FALCON provides a robust methodology for comparing the strength and significance of nestedness in a given bipartite network using multiple measures and null models. It includes an “adaptive ensemble” method to reduce undersampling of the null distribution when calculating statistical significance. It can work with binary or weighted input networks. FALCON is a response to the proliferation of different nestedness measures and associated null models in the literature. It allows easy and efficient calculation of nestedness scores and statistical significances using different methods, enabling comparison of results from different studies and thereby supporting theoretical study of the causes and implications of nestedness in different biological contexts.

Details

Title

FALCON: a software package for analysis of nestedness in bipartite networks

Author

Beckett, Stephen J; Boulton, Chris A; Williams Hywel T. P.

University/institution

U.S. National Institutes of Health/National Library of Medicine

Publication year

2014

Publication date

2014

Publisher

Faculty of 1000 Ltd.

e-ISSN

20461402

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.12688/f1000research.4831.1

ProQuest document ID

2623274021

FALCON: a software package for analysis of nestedness in bipartite networks

Jump to:

Full Text

Abstract

Details

Suggested sources