Introduction
A longstanding mystery in evolutionary genetics is that the observed levels of genetic variation across sexual species span an unexpectedly narrow range. Under neutral theory, the average number of nucleotide differences between sequences (pairwise diversity, π) is determined by the balance of new mutations and their loss by genetic drift (Kimura and Crow, 1964; Malécot, 1948; Wright, 1931). In particular, expected pairwise diversity at neutral sites in a panmictic population of diploids is , where μ is the per basepair per generation mutation rate. Given that metazoan germline mutation rates only differ 10-fold (10−8–10−9, Kondrashov and Kondrashov, 2010; Lynch, 2010), and census sizes vary over several orders of magnitude, under neutral theory one would expect that pairwise diversity also vary over several orders of magnitude. However, early allozyme surveys revealed that diversity levels across a wide range of species varied just an order of magnitude (Lewontin, 1974, p. 208); this is known as Lewontin’s ‘‘Paradox of Variation’. With modern sequencing-based estimates of π across taxa ranging over only three orders of magnitude (0.01–10%, Leffler et al., 2012), Lewontin’s paradox remains unresolved through the genomics era.
Early on, explanations for Lewontin’s Paradox have been framed in terms of the neutralist–selectionist controversy (Lewontin, 1974; Kimura, 1984; Gillespie, 1991; Gillespie, 2001). The neutralist view is that beneficial alleles are sufficiently rare and deleterious alleles are removed sufficiently quickly, that levels of genetic diversity are shaped predominantly by genetic drift and mutation (Kimura, 1984). Specifically,
Possible resolutions of Lewontin’s Paradox
A resolution of Lewontin’s Paradox would involve a mechanistic description and quantification of the evolutionary processes that prevent diversity from scaling with census sizes across species. This would necessarily connect to the broader literature on the empirical relationship between diversity and population size (Frankham, 1996; Nei and Graur, 1984; Soulé, 1976; Leroy et al., 2021), and the ecological and life history correlates of genetic diversity (Nevo, 1978; Powell, 1975; Nevo et al., 1984). Three categories of processes stand out as potentially capable of decoupling census sizes from diversity: non-equilibrium demography, variance and skew in reproductive success, and selective processes.
It has long been appreciated that effective population sizes are typically less than census population sizes, tracing back to early debates between R.A. Fisher and Sewall Wright (Fisher and Ford, 1947; Wright, 1948). Possible causes of this divergence between effective and census population sizes include demographic history (e.g. population bottlenecks), extinction and recolonization dynamics, or the breeding structure of populations (e.g. the variance in reproductive success and population substructure). Early explanations for Lewontin’s Paradox suggested bottlenecks during the last glacial maximum severely reduced population sizes (Kimura, 1984; Ohta and Kimura, 1973; Nei and Graur, 1984), and emphasized that large populations recover to equilibrium diversity levels more slowly (Nei and Graur, 1984, Kimura, 1984 p. 203–204). Another explanation is that cosmopolitan species repeatedly endure extinction and recolonization events, which reduces effective population size (Maruyama and Kimura, 1980; Slatkin, 1977).
While intermittent demographic events like bottlenecks and recent expansions have long-term impacts on diversity (since mutation-drift equilibrium is reached on the order of size of the population), characteristics of the breeding structure such as high variance () or skew in reproductive success continuously suppress diversity below the levels predicted by the census size (Wright, 1938). For example, in many marine animals, females are highly fecund, and dispersing larvae face extremely low survivorship, leading to high variance in reproductive success (Waples et al., 2018; Waples et al., 2013; Hedgecock and Pudovkin, 2011; Hauser and Carvalho, 2008). Such ‘‘sweepstakes’ reproductive systems can lead to remarkably small ratios of effective to census population size (e.g. can range from 10–6–10–2), since (Hedgecock, 1994; Wright, 1938; Nunney, 1993), and require multiple-merger coalescent processes to describe their genealogies (Eldon and Wakeley, 2006). Overall, these reproductive systems diminish the diversity in some species, but seem unlikely to explain Lewontin’s Paradox broadly across metazoans.
Alternatively, selective processes, and in particular the indirect effects of selection on linked neutral variation, could potentially explain the observed narrow range of diversity. The earliest mathematical model of hitchhiking was proffered as a explanation of Lewontin’s Paradox (Smith and Haigh, 1974). Since, linked selection has been shown to impact diversity levels in a variety of species, as evidenced by the correlation between recombination and diversity (Aguade et al., 1989; Begun and Aquadro, 1992; Cutter and Payseur, 2003; Stephan and Langley, 1998; Cai et al., 2009). Theoretic work to explain this pattern has considered the impact of a steady influx of beneficial mutations (recurrent hitchhiking; Stephan et al., 1992; Stephan, 1995), and purifying selection against deleterious mutations (background selection, BGS; Charlesworth et al., 1993; Nordborg et al., 1996; Hudson and Kaplan, 1994). Indeed, empirical work indicates background selection diminishes diversity around genic regions in a variety of species (McVicker et al., 2009; Hernandez et al., 2011; Charlesworth, 1996), and now efforts have shifted towards teasing apart the effects of positive and negative selection on genomic diversity (Elyashiv et al., 2016).
A class of models that are of particular interest in the context of Lewontin’s Paradox are recurrent hitchhiking models that decouple diversity from the census population size. These models predict diversity levels when strongly selected beneficial mutations regularly enter and sweep through the population, trapping lineages and forcing them to coalesce (Kaplan et al., 1989; Gillespie, 2000). In general, decoupling occurs under these hitchhiking models when the rate of coalescence due to selection is much greater than the rate of neutral coalescence (e.g. Coop and Ralph, 2012, Equation 22). In contrast, under other linked selection models, the resulting effective population size is proportional to population size; these models cannot decouple diversity, all else equal. For example, models of background selection and polygenic fitness variation predict diversity is proportional to population size, mediated by the total recombination map length and the deleterious mutation rate or fitness variation (Charlesworth et al., 1993; Nicolaisen and Desai, 2012; Nordborg et al., 1996; Robertson, 1961; Santiago and Caballero, 1995).
Recent approaches towards resolving Lewontin’s Paradox
Recently, Corbett-Detig et al., 2015 used population genomic data to estimate the reduction in diversity due to background selection and hitchhiking across 40 species, and showed that the impact of selection increases with two proxies of census population size, species range and the inverse of body size. Based on this evidence, they argued that selection could explain Lewontin’s Paradox; however, in a re-analysis, Coop, 2016 demonstrated that the observed magnitude of these reductions is insufficient to explain the orders-of-magnitude shortfall between observed and expected levels of diversity across species. Other recent work has found that life history characteristics related to parental investment, such as propagule size, are good predictors diversity in animals (Romiguier et al., 2014; Chen et al., 2017). Nevertheless, while these diversity correlates are important clues, they do not propose a mechanism by which these traits act to constrain diversity within a few orders of magnitude.
Here, I revisit Lewontin’s Paradox by integrating several data sets in order to compare the observed relationship between diversity and census size with the predicted relationship under different selection models. Prior surveys of genetic diversity either lacked census population size estimates, used allozyme-based measures of heterozygosity, or included fewer species. To address these shortcomings, I first estimate census sizes by combining predictions of population density based on body size with ranges estimated from geographic occurrence data. Using these estimates, I quantify the relationship between census size and previously-published genomic diversity estimates across 172 metazoan taxa within nine phyla, thus characterizing the relationship between π and that underlies Lewontin’s Paradox.
Past work looking at the relationship between π and has been unable to fully account for phylogenetic non-independence across taxa (Felsenstein, 1985). To address this, I use phylogenetic comparative methods (PCMs) with a synthetic time-calibrated phylogeny to account for shared phylogenetic history. Moreover, it is disputed whether considering phylogenetic non-independence is necessary in population genetics, given that coalescent times within species are much less than divergence times (Whitney and Garland, 2010; Lynch, 2011). Using PCMs, I address this by estimating the degree of phylogenetic signal in the diversity census size relationship, and investigating how these traits evolve along the phylogeny.
Finally, I explore whether the predicted reductions of diversity under background selection and recurrent hitchhiking are sufficiently strong to resolve Lewontin’s Paradox. I do so using selection parameters from
Results
Estimates of census population size
An impediment in resolving Lewontin’s Paradox is characterizing the relationship between diversity and census population sizes. This is difficult because census population sizes are unavailable for many taxa, especially for extremely abundant, cosmopolitan species that define the upper limit of ranges. Previous work has surveyed the literature for census size estimates (Nei and Graur, 1984; Soulé, 1976; Frankham, 1996), or used range, body size, or qualitative categories as proxies for census size (Corbett-Detig et al., 2015; Leffler et al., 2012). To quantify the relationship between genomic estimates of diversity and census population sizes, I first approximate census population sizes for 172 metazoan taxa (Figure 1). I estimate population densities based on an empirical linear relationship between body sizes and density that holds across metazoans (see Figure 1—figure supplement 1; Damuth, 1981; Damuth, 1987). Then, from geographic occurrence data, I estimate range sizes. Finally, I estimate population size as the product of these predicted densities and range estimates (see Materials and methods: Macroecological Estimates of Population Size). Note that the relationship between population density and body size is driven by energy budgets, and thus reflects macroecological equilibria (Damuth, 1987). Consequently, population sizes are underestimated for taxa like humans and their domesticated species, and overestimated for species with anthropogenically reduced densities or fragmented ranges. For example, the population size of
Figure 1.
The distribution of approximate census population sizes estimated by this study.
Some phyla containing few species were excluded for clarity.
Figure 1—figure supplement 1.
The relationship between body mass and population density found by Damuth, 1987, which is used to predict population densities.
The source of this data is appendix table of Damuth, 1987; the color indicates Damuth’s original group labels. The dashed line was estimated using a lognormal regression model in Stan. References to each measurement are available in Damuth, 1987.
Figure 1—figure supplement 2.
The fraction of total species per class on earth included in this study’s sample, per class.
The color of the points represents phylum, and the size of the point represents the absolute number of species by class.
Figure 1—figure supplement 3.
Comparison of this paper’s range estimates procedure against the IUCN Red List’s range estimates.
The correspondence between the ranges estimated with the alpha hull method applied to GBIF data used in this paper and IUCN Red List’s Extent of Occurrence for the subset of species in both datasets. Note that the IUCN Red List contains predominantly endangered species, which leads to ascertainment bias; still, the high correlation between the estimated ranges shows the alpha hull method works well.
Figure 1—figure supplement 4.
Validation of this paper’s range estimates against the categorical labels of Leffler et al., 2012.
The estimated ranges using GBIF occurrence data, ordered within and colored by the original range category labels assigned in Leffler et al., 2012.
Figure 1—figure supplement 5.
The relationship between body length (meters) and body mass (grams) in the Romiguier et al., 2014 data set.
The relationship between body length (meters) and body mass (grams) in the Romiguier et al., 2014 data set. This is used to infer body masses for taxa. The gray dashed line is the line of best fit inferred using Stan.
Characterizing the Diversity–Census-size Relationship
To determine which ecological or evolutionary processes could decouple diversity from census population size, we first need to quantify this relationship across a wide variety of taxa. Previous work has found there is a significant relationship between heterozygosity and the logarithm of population size or range size, but these studies relied on heterozygosity measured from allozyme data (Soulé, 1976; Frankham, 1996; Nei and Graur, 1984). I confirm these findings using pairwise diversity estimates from genomic sequence data and the estimated census sizes (Figure 2). The pairwise diversity estimates are from three sources: Leffler et al., 2012, Corbett-Detig et al., 2015, and Romiguier et al., 2014, and are predominantly from either synonymous or non-coding DNA (see Methods and Materials: 4.1 Diversity and Map Length Data). Overall, an ordinary least squares (OLS) relationship on a log-log scale fits the data well (Figure 2, gray dashed line). The OLS slope estimate is significant and implies a 13% percent increase in differences per basepair for every order of magnitude census size grows (95% confidence interval [12%, 14%], adjusted ; see also the OLS fit per-phyla, Figure 2—figure supplement 2).
Figure 2.
A visualization of Lewontin’s Paradox of Variation.
Pairwise diversity (data from Leffler et al., 2012, Corbett-Detig et al., 2015, and Romiguier et al., 2014), which varies over three orders of magnitude, shows a weak relationship with approximate population size, which varies over 12 orders of magnitude. The shaded curve shows the range of expected neutral diversity if were to equal under the four-alleles model, where , for two mutation rates, and , and the light gray dashed line represents the maximum pairwise diversity under the four alleles model. The dark gray dashed line is the OLS regression fit, and the blue dashed line is the regression fit using a phylogenetic mixed-effects model. Points are colored by phylum. The species
Figure 2—figure supplement 1.
A linear-log version of Figure 2.
Points are colored by phylum, and the shaded region is the predicted neutral level of diversity assuming with mutation range ranging between .
Figure 2—figure supplement 2.
A version of Figure 2 with OLS estimates per phylum.
Diversity and approximate population size for 172 taxa, colored by phylum; the dashed lines indicate the non-phylogenetic OLS estimates of the relationship between population size and diversity grouped by phyla.
Figure 2—figure supplement 3.
The posterior distributions and fitted relationship between diversity and both body mass and range size.
The relationship between diversity (differences per basepair) and body mass (left) and range (right) across 172 species. The top row are posterior distributions of parameters estimated using the phylogenetic mixed-effects model using 166 taxa in the synthetic phylogeny for the intercept, slope, and phylogenetic signal from the mixed-effects model. The bottom row contain each species as a point, colored by phyla. The gray dashed line is the non-phylogenetic standard regression estimate, and the blue dashed line is the relationship fit by the phylogenetic mixed-effects model.
Figure 2—figure supplement 4.
Pairwise diversity grouped by the range categories from Leffler et al., 2012, with point size indicating the predicted population density.
The vertical lines are the range category group means.
Notably, this relationship has few outliers and is relatively homoscedastic. This is in part because of the log-log scale, in contrast to previous work (Nei and Graur, 1984; Soulé, 1976); see Figure 2—figure supplement 1 for a version on a log-linear scale. However, it is noteworthy that few taxa have diversity estimates below 10−3.5 differences per basepair. Those that do, lynx (
Phylogenetic non-independence and the population size diversity relationship
One limitation of using ordinary least squares is that shared phylogenetic history can create correlation structure in the residuals, which violates an assumption of the regression model and can lead to bias (Felsenstein, 1985; Revell, 2010). To address this shortcoming, I fit the diversity–census-size relationship using a phylogenetic mixed-effects model, investigated whether there is a signal of phylogenetic non-independence, estimated the continuous trait values on the phylogeny, and explored how diversity and population size evolve. Prior population genetic comparative studies have lacked time-calibrated phylogenies and assumed unit branch lengths (Whitney and Garland, 2010), a shortcoming that has drawn criticism (Lynch, 2011). I use a synthetic time-calibrated phylogeny created from the DateLife project (O’Meara et al., 2020) to account for shared phylogenetic history (see Materials and methods: Phylogenetic Comparative Methods).
Using a phylogenetic mixed-effects model (Lynch, 1991; Hadfield and Nakagawa, 2010; de Villemereuil and Nakagawa, 2014) implemented in Stan (Carpenter et al., 2017; Stan Development Team, 2020), I estimated the linear relationship between diversity and population size (on a log-log scale) accounting for phylogeny, for the 166 taxa without missing data and present in the synthetic chronogram. This type of model is needed because closely-related species may differ from the average trend between and π in similar ways due to shared phylogenetic history, similar life history traits, etc., and thus do not represent independent observations as is assumed by the standard regression model. This is a form of phylogenetic pseudoreplication, and can be accounted for with a phylogenetic mixed-effects model. The phylogenetic mixed-effects model does not assume that there is phylogenetic structure in either or π (which itself is not a violation of the standard regression model, Revell, 2010 and Uyeda et al., 2018), but rather accounts for phylogenetic correlation structure in the residuals if any is present. Importantly, phylogenetic mixed-effects models simultaneously estimate the degree of phylogenetic structure in the residuals while fitting the relationship between and π. If the residuals are distributed independently, the estimated relationship would be similar to that found by ordinary least squares, and the estimated phylogenetic signal would be zero. Overall, this approach is conservative, making no assumptions about the source of the phylogenetic signal while accounting for violations of the regression model due to dependence among the residuals if present (see Revell, 2010 for a discussion of this).
As with the linear regression, I find this relationship is positive and significant (95% credible interval 0.03, 0.11), though somewhat attenuated compared to the OLS estimates (Figure 3B). Since the population size estimates are based on range and body mass, they are essentially a composite trait; fitting phylogenetic mixed-effects models separately on body mass and range indicates these have significant positive and negative effects, respectively (Figure 2—figure supplement 3; see also Figure 2—figure supplement 4 for the relationship between diversity and the range categories of Leffler et al., 2012).
Figure 3.
Phylogenetic comparative models of diversity and population size.
(A) The ancestral continuous trait estimates for the population size and diversity (differences per bp, log scaled) across the phylogeny of 166 taxa. The phyla of the tips are indicated by the color bar in the center. (B) The posterior distributions of the intercept, slope, and phylogenetic signal (λ, de Villemereuil and Nakagawa, 2014) of the phylogenetic mixed-effects model of diversity and population size (log scaled). Also shown are the 90% credible interval (light blue shading), posterior mean (blue line), OLS estimate (gray solid line), and bootstrap OLS confidence intervals (light gray shading). (C) The node-height tests of diversity, population size, and the two components of the population size estimates, body mass, and range (all traits on log scale before contrast was calculated). Each point shows the standardized phylogenetic independent contrast and branching time for a pair of lineages. Red lines are robust regression estimates (and are only shown for statistically significant relationships at the level). Note that some outlier pairs with very high phylogenetic independent contrasts were excluded (in all cases, these outliers were in the genus
Figure 3—figure supplement 1.
The posterior distributions for the parameters of the phylogenetic mixed-effects model of diversity and population size (this is analogous to Figure 3B) fit separately on chordates (), molluscs (), and arthropods ().
The phylogenetic mixed-effects model for chordates indicated the best-fitting model had no residual variance (), so an alternate model without this variance component was used to ensure proper convergence; this model is shown in green. The light blue (green) shaded regions are the 90% credible intervals, the blue (green) lines the posterior averages, the gray shaded regions the OLS bootstrap 95% confidence intervals, and the gray lines the OLS estimate. Note that unlike Figure 3, the OLS estimate uses all taxa, not just those present in the phylogeny, since splitting the data by phyla reduces sample sizes (OLS with just the subset of taxa in the phylogeny is not significant for either chordates and arthropods). The vertical dashed gray line indicates zero.
Figure 3—figure supplement 2.
The ancestral continuous trait estimates for diversity and population size with species labels.
Figure 3—figure supplement 3.
The ancestral continuous trait estimates for recombination map length and diversity and population size with species labels.
Since the phylogenetic mixed-effects model simultaneously estimates the variance of the phylogenetic effect () and the residual variance (), these can be used to estimate the phylogenetic signal, (Lynch, 1991; de Villemereuil and Nakagawa, 2014; see Freckleton et al., 2002 for a comparison to Pagel’s λ). When residuals are free of correlations due to shared phylogenetic history, then and all the variance could be explained by evolution or noise on the tips. In the relationship between population size and diversity, the posterior mean of (90% credible interval ) indicates a majority of the variance perhaps might be due to shared phylogenetic history (Figure 3B).
This high degree of phylogenetic signal substantiates Gillespie's concern (Gillespie, 1991) that the π– relationship may be driven by chordate-arthropod differences. A visual inspection of the estimated ancestral continuous values for diversity and population size on the phylogeny indicates the high phylogenetic signal seems to be driven in part by chordates having low diversity and small population sizes compared to non-chordates (Figure 3A). This problem resembles Felsenstein’s worst-case scenario (Felsenstein, 1985; Uyeda et al., 2018), where a singular event on a lineage separating two clades generates a spurious association between two traits.
To investigate whether clade-level differences dominated the relationship between diversity and population size, I fit phylogenetic mixed-effects models to phyla-level subsets of the data for clades with sufficient sample sizes (see Methods: 4.4 Phylogenetic Comparative Methods). This analysis shows a significant positive relationship between diversity and population size in arthropods, and positive weak relationships in molluscs and chordates (Figure 3—figure supplement 1). Each of the 90% credible intervals for slope overlap, suggesting the relationship between π and is similar across these clades.
Additionally, I have explored the rate of trait change through time using node-height tests (Freckleton and Harvey, 2006). Node-height tests regress the absolute values of the standardized contrasts between lineages against the branching time (since present) of these lineages. Under Brownian Motion (BM), standardized contrasts are estimates of the rate of character evolution (Felsenstein, 1985); if a trait evolves under constant rate BM, this relationship should be flat. For both diversity and population size, node-height tests indicate a significant increase in the rate of evolution towards the present (robust regression p-values 0.023 and 0.00018 respectively; Figure 3C). Considering the constituents of the population size estimate, range and body mass, separately, the rate of evolution of range but not body mass shows a significant increase (p-value 1.03 × 10−7) towards the present.
Interestingly, the diversity node-height test reveals two rate shifts at deeper splits (Figure 3C, top left) around 570 Mya. These nodes represent the branches between tunicates and vertebrates in chordates, and cephalopods and pleistomollusca (bivalves and gastropods) in molluscs. While the cephalopod-pleistomollusca split outlier may be an artifact of having a single cephalopod (
Finally, an important caveat is the increase in rate towards the tips could be caused by measurement noise, or possibly uncertainty or bias in the divergence time estimates deep in the tree. Inspecting the lineage pairs that lead to this increase in rate towards the tips indicates these represent plausible rate shifts, e.g. between cosmopolitan and endemic sister species like
Assessing the impact of linked selection on diversity across taxa
The above analyses reemphasize the drastic shortfall of diversity levels as compared to census sizes. Linked selection has been proposed as the mechanism that acts to reduce diversity levels from what we would expect given census sizes (Smith and Haigh, 1974; Gillespie, 2000; Corbett-Detig et al., 2015). Here, I test this hypothesis by estimating the scale of diversity reductions expected under background selection and recurrent hitchhiking, and comparing these to the observed relationship between π and .
I quantify the effect of linked selection on diversity as the ratio of observed diversity (π) to the estimated diversity in the absence of linked selection (), . Here, would reflect only demographic history and non-heritable variation in reproductive success. There are two difficulties in evaluating whether linked selection could resolve Lewontin’s Paradox. The first difficulty is that is unobserved. Previous work has estimated using methods that exploit the spatial heterogeneity in recombination and functional density across the genome to fit linked selection models that incorporate both hitchhiking and background selection (Elyashiv et al., 2016; Corbett-Detig et al., 2015). The second difficulty is understanding how varies across taxa, since we lack estimates of critical model parameters for most species. Still, I can address a key question: if diversity levels were determined by census sizes (), would the combined effects of background selection and recurrent hitchhiking be sufficient to reduce diversity to observed levels? Furthermore, does the relationship between census size and predicted diversity under linked selection across species, , match the observed relationship in Figure 2?
Since we lack estimates of selection parameters across species, I parameterize the hitchhiking and BGS models using estimates from
(1)
where , is the effect of background selection, and is the rate of coalescence caused by sweeps (Elyashiv et al., 2016, Equation 1, Coop and Ralph, 2012, Equation 20). Under background selection models with recombination, the reduction is where is the per diploid genome per generation deleterious mutation rate, and is the recombination map length in Morgans (Hudson and Kaplan, 1994; Nordborg et al., 1996). This BGS model is similar to models of effective population size under polygenic fitness variation, and can account for other modes of linked selection (Robertson, 1961; Santiago and Caballero, 1995; Santiago and Caballero, 1998, see Appendix 2, Background Selection and Polygenic Fitness Models). The coalescence rate due to sweeps is , where γ is the number of adaptive substitutions per generation, and is the probability a lineage is trapped by sweeps as they occur across the genome ( in Equation 15 of Coop and Ralph, 2012).Parameterizing the model this way, I then set the key parameters that determine the impact of recurrent hitchhiking and background selection (γ, , and ) to strong selection values estimated for
Previous work has found that the impact of linked selection increases with (Corbett-Detig et al., 2015; see Figure 4—figure supplement 4A), and it is often thought that this is driven by higher rates of adaptive substitutions in larger populations (Ohta, 1992), despite equivocal evidence (Galtier, 2016). However, there is another mechanism by which species with larger population sizes might experience a greater impact of linked selection: recombination map length, , is known to correlate with body mass (Burt and Bell, 1987) and thus varies inversely with population size. As this is a critical parameter that determines the genome-wide impact of both hitchhiking and background selection, I examine the relationship between recombination map length () and census population size () across taxa, using available estimates of map lengths across species (Stapley et al., 2017; Corbett-Detig et al., 2015). I find a significant non-linear relationship using phylogenetic mixed-effects models (Figure 4A; see Methods and materials: 4.4 Phylogenetic Comparative Methods). There is also a correlation between map length and genome size (Figure 4—figure supplement 2) and genome size and population size (Figure 4—figure supplement 1). These findings are consistent with the hypothesis that non-adaptive processes increase genome size in small- species (Lynch and Conery, 2003) which in turn could increase map lengths, as well as the hypothesis that map lengths are adaptively longer to more efficiently select against deleterious alleles in smaller populations (Roze, 2021). Overall, the negative relationship between map length and census size indicates linked selection is expected to be stronger in species with short map lengths, which are high- species.
Figure 4.
Predicting the impact of linked selection on diversity.
(A) The observed relationship between recombination map length () and census size () across 136 species with complete data and known phylogeny. Triangle points indicate six social taxa excluded from the model fitting since these have adaptively higher recombination map lengths (Wilfert et al., 2007). The dark gray line is the estimated relationship under a phylogenetic mixed-effects model, and the gray interval is the 95% posterior average. (B) Points indicate the observed π– relationship across taxa shown in Figure 2, and the blue ribbon is the range of predicted diversity were for –, and after accounting for the expected reduction in diversity due to background selection and recurrent hitchhiking under
Figure 4—figure supplement 1.
The relationship between genome size and approximate census population size.
The dashed gray line indicates the OLS fit. Tiger salamander (
Figure 4—figure supplement 2.
The relationship between genome size and recombination map length.
The dashed gray line indicates the OLS fit for all taxa, and the dashed colored dashed lines indicate the linear relationship fit by phyla. Tiger salamander (
Figure 4—figure supplement 3.
The observed π– relationship (points) across species compared to the predicted diversity (ribbons) under different modes of linked selection and parameters, for a range of mutation rates, 10–9–10–8.
In both subplots, the gray ribbon is the expected diversity if . In (A), the predicted impact on diversity for four modes of linked selection are depicted: background selection (purple) and hitchhiking (yellow) individually under the
Figure 4—figure supplement 4.
The relationship between and diversity in the Corbett-Detig et al., 2015 data, and the relationship between estimated reduction in diversity and census size, for three different approaches.
(A) The diversity data from Corbett-Detig et al., 2015 and the census population size estimated here for metazoan taxa. (B) The reductions in diversity, , plotted against census size across species. The red points are the reductions estimated by Corbett-Detig et al., 2015. This confirms Corbett-Detig et al., 2015 finding that the impact of selection () increases with census population size (though, in the original paper size body size and range were used as separate proxy variables for census population size). The green and red points are the predicted reduction in diversity under the recurrent hitchhiking (RHH) and background selection (BGS) model using the
Figure 4—figure supplement 5.
Comparison of the
(A) The estimate of the number of sweeps per basepair, per genome () from Table 2 of Elyashiv et al., 2016 (the studies included are Li and Stephan, 2006; Andolfatto, 2007; Macpherson et al., 2007 and Jensen et al., 2008); the red point is my estimate used in this paper. (B) Points are the data from Shapiro et al., 2007. The blue line is the non-linear least squares fit to the data, and the green dashed line is the sweep model parameterized by the genome-wide average sweep coalescence rate from the classic sweep and background selection model of Elyashiv et al., 2016 (
Then, I predict the expected diversity () under background selection and hitchhiking, assuming and that all species had the rate of sweeps and strength of BGS as
Finally, the poor fit between observed and predicted levels of diversity across species is not remedied by stronger selection parameters. In Figure 4—figure supplement 3B, I increase both selection parameters and γ ten-fold each, and find the same qualitative pattern: on a log-log scale the relationship between and π is linear, while the predicted diversity under linked selection is non-linear with . Under this ten-fold higher selection regime, there is more overlap between observed and predicted levels of diversity, but diversity is severely
Discussion
Nearly fifty years after Lewontin’s description of the Paradox of Variation, how evolutionary, life history, and ecological processes interact to constrain diversity across taxa to a narrow range remains a mystery. I revisit Lewontin’s Paradox by first characterizing the relationship between genomic estimates of pairwise diversity and approximate census population size across 172 metazoan species. Previous surveys have used allozyme-based estimates, fewer taxa, or proxies of population size. My estimates of census population sizes are rough approximates, since they use body size to predict density. An improved estimate might account for vagility (as Soulé, 1976 did), though this is harder to do systematically across many taxa. Future work might also use other ecological information, such as total biomass, or species distribution modeling to improve census size estimates (Bar-On et al., 2018; Mora et al., 2011). Still, it seems more accurate estimates would be unlikely to change the qualitative findings here, which resemble those of early surveys (Nei and Graur, 1984; Soulé, 1976).
One limitation of this study is that diversity estimates are collated from a variety of sources rather than estimated with a single bioinformatic pipeline. This leads to technical noise across diversity estimates; perhaps the relationship between π and found here could be tighter with a standardized bioinformatic pipeline. In addition, there might be systematic bioinformatic sources of bias: for example high-diversity sequences may fail to align to the reference genome and end up unaccounted for, leading to a downward bias. Alternatively, a high-diversity sequences might map to the reference genome, but adjacent mis-matching SNPs might be mistaken for a short insertion or deletion. While these issues might affect estimates in high-diversity species, it is unlikely to change the qualitative relationship between π–.
Macroevolution and Across-Taxa population genomics
Lewontin’s Paradox arises from a comparison of diversity across species, yet it has been disputed whether such comparisons require phylogenetic comparative methods. Extending previous work that has accounted for phylogeny in particular clades (Leffler et al., 2012), or using taxonomical-level averages (Romiguier et al., 2014), I show that the positive relationship between diversity and census size is significant using a mixed-effects model with a time-calibrated phylogeny. Additionally, I find a high degree of phylogenetic signal, evidence of deep shifts in the rate of evolution of genetic diversity, and that arthropods and chordates form clusters. Overall, this suggests that previous concerns about phylogenetic non-independence in comparative population genetic studies were warranted (Gillespie, 1991; Whitney and Garland, 2010). Notably, Lynch, 2011 has argued that PCMs for pairwise diversity are unnecessary, since mutation rate evolution is fast and thus free of phylogenetic inertia, sampling variance should exceed the variance due to phylogenetic shared history, and coalescence times are much shorter than divergence times. Since my findings suggest PCMs are necessary in some cases, it is worthwhile to address these points.
First, Lynch has correctly pointed out that while coalescence times are much less than divergence times and should be free of phylogenetic shared history, the factors that determine coalescence times (e.g. mutation rates and effective population size) may not be (Lynch, 2011). In other words, coalescence times are free from phylogenetic shared history
Finally, beyond just accounting for phylogenetic non-independence, macroevolution and phylogenetic comparative methods are a promising way to approach across-species population genomic questions. For example, one could imagine that diversification processes could contribute to Lewontin’s Paradox. If large- species were to have a rate of speciation that is greater than the rate at which mutation and drift reach equilibrium (which is indeed slower for large species), this could act to decouple diversity from census population size. That is to say, even if the rate of random demographic bottlenecks were constant across taxa, lineage-specific diversification processes could lead certain clades to be systematically further from demographic equilibrium, and thus have lower diversity than expected for their census population size.
How could selection still explain Lewontin’s Paradox?
Even assuming selection parameters estimated from
First, I approximate the reduction in diversity using homogeneous background selection and recurrent hitchhiking models (Kaplan et al., 1989; Hudson and Kaplan, 1995; Coop and Ralph, 2012), when in reality, there is genome-wide heterogeneity in functional density, recombination rates, and the adaptive substitutions across species. Each of these factors mediate how strongly linked selection impacts diversity across the genome. Despite these model simplifications, the predicted reduction in diversity in
Second, my model here only considers hard sweeps, and ignores the contribution of soft sweeps (e.g. from standing variation or recurrent mutations; Hermisson and Pennings, 2005; Pennings and Hermisson, 2006), partial sweeps (e.g those that do not reach fixation), and the interaction of sweeps and spatial processes. While future work exploring these alternative types of sweeps is needed, the predicted reductions in diversity found here under the simplified sweep model are likely relatively robust to these other modes of sweeps for a few reasons. First, the shape of the diversity–recombination curve is equivalent under models of partial sweeps and hard sweeps, though these imply different rates of sweeps (Coop and Ralph, 2012). Second, in the limit where most fitness variation is due to weak soft sweeps from standing variation scattered across the genome (i.e. due to polygenic fitness variation), levels of diversity are well approximated by quantitative genetic linked selection models (Robertson, 1961; Santiago and Caballero, 1995). The reduction in diversity under these models is nearly identical to that under background selection models, in part because deleterious alleles at mutation-selection balance constitute a considerable component of fitness variation (see Appendix Section B; Charlesworth and Hughes, 2000; Charlesworth, 2015). Third, the parameters from Elyashiv et al., 2016 could reflect a mixture of types of sweeps (Elyashiv et al., 2016 p. 14 and p. 19 of their Supplementary Online Materials). Finally, I also disregarded the interaction of sweeps and spatial processes. For populations spread over wide ranges, limited dispersal slows the spread of sweeps, allowing for new beneficial alleles to arise, spread, and compete against other segregating beneficial variants (Ralph and Coop, 2015; Ralph and Coop, 2010). Through limited dispersal should act to ‘‘soften sweeps’ and not impact my findings for the reasons described above, future work could investigate how these processes impact diversity in ways not captured by hard sweep models.
Third, other selective processes, such as fluctuating selection or hard selective events (i.e. selection resulting in a reduction in the population size), could reduce diversity in ways not captured by the background selection and hitchhiking models. Since frequency-independent fluctuating selection reduces diversity under most conditions (Novak and Barton, 2017), this could lead seasonality and other sources of temporal heterogeneity to reduce diversity in large- species with short generation times more than longer-lived species with smaller population sizes. Future work could consider the impact of fluctuating selection on diversity under simple models (Barton, 2000) if estimates of key parameters governing the rate of such fluctuations were known across taxa. Additionally, another mode of selection that could severely reduce diversity across taxa, yet remains unaccounted for in this study, is periodic hard selective events. These selective events could occur regularly in a species’ history yet be indistinguishable from demographic bottlenecks with just population genomic data.
Spatial and demographic processes
One limitation of this study is the inability to quantify the impact of spatial and demographic population genetic processes on the relationship between diversity and census population sizes across taxa. The genomic diversity estimates collated in this study unfortunately lack details about the sampling process and spatial data, which can have a profound impact on population genomic summary statistics (Battey et al., 2020). These issues could systematically bias species-wide diversity estimates; for example, if diversity estimates from a cosmopolitan species were primarily from a single region or subpopulation, diversity would be an underestimate relative to the entire population. However, biased spatial sampling alone seems incapable of explaining the π- divergence in high- taxa. In the extreme scenario in which only one subpopulation was sampled, would need to be close to one for population subdivision alone to sufficiently reduce the total population heterozygosity to explain the orders-of-magnitude shortfall between predicted and observed diversity levels. This can be seen by rearranging the expression for as , where and are the subpopulation and total population heterozygosities; if , then only can reduce several orders of magnitude. Yet, across-taxa surveys indicate that is almost never this high within species (Roux et al., 2016). Future work could quantify the extent to which more realistic spatial processes contribute to Lewontin’s Paradox. For example, high- taxa usually experience range expansions, with repeated founder effects and local extinction/recolonization dynamics that depress diversity (Slatkin, 1977). In particular, with the appropriate data, one could estimate the empirical relationship between dispersal distance, range size, and coalescent effective population size across taxa.
In this study, I have focused entirely on assessing the role of linked selection, rather than demography, in reducing diversity across taxa. In contrast to demographic models, models of linked selection have comparatively fewer parameters and more readily permit rough estimates of diversity reductions across taxa. Given that I find that models of linked selection are incapable of explaining the observed relationship between and π, this supports the hypothesis the diversity across species are shaped primarily by past demographic fluctuations. Still, a full resolution of Lewontin’s Paradox would require understanding how the demographic processes across taxa with incredibly heterogeneous ecologies and life histories transform into . With population genomic data becoming available for more species, this could involve systematically inferring the demographic histories of tens of species and looking for correlations in the frequency and size of bottlenecks with across species.
Measures of effective population size, Timescales, and Lewontin’s Paradox
Lewontin’s Paradox describes the extent to which the effective population sizes implied by diversity, , diverge from census population sizes. However, there are a variety other effective population size estimators calculable from different data and summary statistics (Wang et al., 2016; Caballero, 1994; Galtier and Rousselle, 2020). These include estimators based on the site frequency spectrum, observed decay in linkage disequilibrium, or temporal estimators that use the variance in allele frequency change through time. These various estimators capture different summaries of effective population size on shorter timescales than coalescent-based estimators (see Wang, 2005 for a review), and thus could be used to tease apart processes that impact the - relationship in the more recent past.
Temporal estimators already play an important role in understanding another summary of the - relationship: the ratio , which is an important quantity in conservation genetics (Frankham, 1995; Mace and Lande, 1991) and in understanding evolution in highly fecund marine species. Surveys of the short-term relationship across taxa indicate mean is on order of (Frankham, 1995; Palstra and Ruzzante, 2008; Palstra and Fraser, 2012), though the uncertainty in these estimates is high, and some species with sweepstakes reproduction systems like Pacific Oyster (
Finally, short-term temporal estimators may play an important role in resolving Lewontin’s Paradox. These estimators, along with short-term estimates of the impact of linked selection (Buffalo and Coop, 2019; Buffalo and Coop, 2020), can inform us how much diversity is depressed by selection on shorter timescales, free from the rare strong selective events or severe bottlenecks that impact pairwise diversity. It could be that in any one generation, selection contributes more to the variance of allele frequency changes than drift, yet across-taxa patterns in diversity are better explained processes acting sporadically on longer timescales, such as colonization, founder effects, and bottlenecks. Thus, the pairwise diversity may not give us the best picture of the generation to generation evolutionary processes acting in a population to change allele frequencies. Furthermore, certain observed adaptations occur at a pace that is inexplicable given small effective population sizes implied by diversity, and are only possible if short-term effective population sizes are orders of magnitude larger (Karasov et al., 2010; Barton, 2010).
Conclusions
In
Looking at Lewontin’s Paradox through an macroecological and macroevolutionary lens begets interesting questions outside of the traditional realm of population genetics. Here, I have found that diversity and have a consistent relationship without many outliers, despite the wildly disparate ecologies, life histories, and evolutionary histories of the taxa included. Furthermore, taxa with very large census sizes have surprisingly low diversity. Is this explained by macroevolutionary processes, such as different rates of speciation for large- taxa? Or, are the levels of diversity we observe today an artifact of our timing relative to the last glacial maximum, or the last major extinction? Did large- prehistoric animal populations living in other geological eras have higher levels of diversity than our present taxa? Or, does ecological competition occur on shorter timescales such that strong population size contractions transpire and depress diversity, even if a species is undisturbed by climatic shifts or mass extinctions? Overall, patterns of diversity across taxa are determined by many overlaid evolutionary and ecological processes occurring on vastly different timescales. Lewontin’s Paradox of Variation may persist unresolved for some time because the explanation requires synthesis and model building at the intersection of all these disciplines.
Materials and methods
Diversity and map length data
The data used in this study are collated from a variety of previously published surveys. Of the 172 taxa with diversity estimates, 14 are from Corbett-Detig et al., 2015, 96 are from Leffler et al., 2012, and 62 are from Romiguier et al., 2014. The Corbett-Detig et al. data is estimated from four-fold degenerate sites, the Romiguier et al. data is synonymous sites, and the Leffler et al. data is estimated predominantly from silent, intronic, and non-coding sites. All types of diversity estimates from Leffler et al., 2012 were included to maximize the taxa in the study, since the variability of diversity across functional categories is much less than the diversity across taxa. Multiple diversity estimates per taxa were averaged. The total recombination map length data were from both (Stapley et al., 2017; 127 taxa), and (Corbett-Detig et al., 2015; 9 taxa). Both studies used sex-averaged recombination maps estimated with cross-based approaches; in some cases errors in the original data were found, documented, and corrected. These studies also included genome size estimates used to create Figure 4—figure supplement 2 and Figure 4—figure supplement 1.
Macroecological estimates of population size
A rough approximation for total population size (census size) is , where is the population density in individuals per km2 and is the range size in km2. Since population density estimates are not available for many taxa included in this study, I used the macroecological abundance-body size relationship to predict population density from body size. Since body length measurements are more readily available than body mass, I collated body length data from various sources (see https://github.com/vsbuffalo/paradox_variation; copy archived at swh:1:rev:8fa6b5834f6536319b1e5cd9722ca02d317183df, Buffalo, 2021); body lengths were averaged across sexes for sexually dimorphic species, and if only a range of lengths was available, the midpoint was used.
Then, I re-estimated the relationship between body mass and population density using the data in the appendix table of Damuth, 1987, which includes 696 taxa with body mass and population density measurements across mammals, fish, reptiles, amphibians, aquatic invertebrates, and terrestrial arthropods. Though the abundance-body size relationship can be noisy at small spatial or phylogenetic scales (Chapter 5, Gaston and Blackburn, 2008), across deeply diverged taxa such as those included in this study and Damuth, 1987, the relationship is linear and homoscedastic (see Figure 1—figure supplement 1). Using Stan (Stan Development Team, 2020), I jointly estimated the relationship between body mass from body length using the Romiguier et al., 2014 taxa, and used this relationship to predict body mass for the taxa in this study. These body masses were then used to predict population density simultaneously, using the Damuth, 1981 relationship. The code of this routine (pred_popsize_missing_centered.stan) is available in the GitHub repository (https://github.com/vsbuffalo/paradox_variation/).
To estimate range, I first downloaded occurrence records from Global Biodiversity Information Facility (Global Biodiversity Information Facility, 2020) using the rgbif R package (Chamberlain et al., 2014; Chamberlain and Boettiger, 2017). Using the occurrence locations, I inferred whether a species was marine or terrestrial, based on whether the majority of their recorded occurrences overlapped a continent using rnaturalearth and the sf packages (South, 2017; Pebesma, 2018). For each taxon, I estimated its range by finding the minimum α-shape containing these occurrences. The α parameters were set more permissive for marine species since occurrence data for marine taxa were sparser. Then, I intersected the inferred ranges for terrestrial taxa with continental polygons, so their ranges did not overrun landmasses (and likewise with marine taxa and oceans). I inspected diagnostic plots for each taxa for quality control (all of these plots are available in paradox_variation GitHub repository), and in some cases, I manually adjusted the α parameter or manually corrected the range based on known range maps (these changes are documented in the code data/species_ranges.r and data/species_range_fixes.r). The range of
Population size validation
I validated the approximate census sizes by comparing the implied biomass of these estimates to estimates of the total carbon biomass on earth by phylum (Bar-On et al., 2018). For species with wet body mass
First, I compared the relative carbon biomass in this study to the relative carbon biomass on earth per phylum. This shows that this study’s sample over represents chordate biomass (by a factor of ∼3), and under represents in arthropod biomass (by a factor of 0.02) relative to the proportion of carbon biomass of these phyla on earth (see column eight of Table 1). Second, to check whether the carbon biomass per phylum in the sample was broadly consistent with the total on earth by phylum ( for phylum ), I calculated the expected sample biomass if species were sampled randomly from the total species in a phylum, (, where is the total number of species in the sample in phylum , is the total number of species in phylum on earth). The fraction of total species on earth included in the sample in this study is depicted in Figure 1—figure supplement 2.
Table 1.
How the total carbon biomass estimates by phylum from Bar-On et al., 2018 compare to the implied biomass estimates from this study.
All biomass estimates are carbon biomass, and the proportions are of total biomass with respect to the study. The proportion of biomass in this study compared to the Bar-On et al. estimates Bar-On et al., 2018 indicates chordates are overrepresented and arthropods are underrepresented in the present study; the factor that each phylum is overrepresented is given in the eighth column. Total species by phylum estimates are from Reaka-Kudla et al., 1996; Nicol, 1969; Zhang, 2013; Chapman, 2009. The ratio column is the ratio of total biomass implied by the estimates of each species in a phylum to the actual biomass of that phylum.
| Bar-On et al. | Present study | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| phylum | total species ( | biomass ( | prop. biomass | biomass ( | prop. biomass | num. species ( | factor overrepresented | prop. total species (f=n/T) | factor (b/fB) |
| Arthropoda | 1.26 × 106 | 1.20 | 0.4635 | 2.80 × 10−4 | 0.0102 | 68 | 0.02 | 5.41 × 10−5 | 4.31 |
| Chordata | 5.41 × 104 | 0.87 | 0.3357 | 2.67 × 10−2 | 0.9715 | 68 | 2.89 | 1.26 × 10−3 | 24.40 |
| Annelida | 1.70 × 104 | 0.20 | 0.0772 | 1.23 × 10−5 | 0.0004 | 3 | 0.01 | 1.76 × 10−4 | 0.35 |
| Mollusca | 9.54 × 104 | 0.20 | 0.0772 | 4.56 × 10–4 | 0.0166 | 13 | 0.21 | 1.36 × 10−4 | 16.70 |
| Cnidaria | 1.60 × 104 | 0.10 | 0.0386 | 3.07 × 10−5 | 0.0011 | 2 | 0.03 | 1.25 × 10−4 | 2.45 |
| Nematoda | 2.50 × 104 | 0.02 | 0.0077 | 4.03 × 10−6 | 0.0001 | 1 | 0.02 | 4.00 × 10−5 | 5.03 |
Next, I look at the ratio of sample biomass per phylum, , to this expected biomass per phylum (Table 1). The consistency is quite close for this rough approach and the non-random sample of taxa included in this study. The carbon biomass estimates for chordates implied by the census size estimates are ∼24-fold higher than expected, but is well within reasonable expectations given that the chordate sample includes many larger-bodied domesticated species (and is a biased sample in other ways). Similarly, the implied arthropod carbon biomass is quite close to what one would expect. Overall, these values indicate that the census size estimates here do not lead to implied biomasses per phylum that are outside the range of plausibility. For other population size consistency checks, see Appendix 3.
Phylogenetic comparative methods
Of the full dataset of 172 taxa with diversity and population size estimates, a synthetic calibrated phylogeny was created for 166 species that appear in phylogenies in DateLife project (O’Meara et al., 2020; Sanchez-Reyes and O’Meara, 2019). This calibrated synthetic phylogeny was then subset for the analyses based on what species had complete trait data. The diversity-population size relationship assessed by a linear phylogenetic mixed-effects model implemented in Stan (Stan Development Team, 2020), according to the methods described in de Villemereuil and Nakagawa, 2014, (see stan/phylo_mm_regression.stan in the GitHub repository). This same Stan model was used to estimate the same relationship between arthropod, chordate, and mollusc subsets of the data, though a reduced model was used for the chordate subset due to identifiability issues leading to poor MCMC convergence (Figure 3—figure supplement 1).
The relationship between recombination map length and the logarithm of population size is non-linear and heteroscedastic, and was fit using a lognormal phylogenetic mixed-effects model on the 130 species with complete data. Since social insects have longer recombination map lengths (Wilfert et al., 2007), social taxa were excluded when fitting this model. All Rhat (Vehtari et al., 2019) values were below 1.01 and the effective number of samples was over 1,000, consistent with good mixing; details about the model are available in the GitHub repository (phylo_mm_lognormal.stan). Continuous trait maps (Figure 3A, Figure 3—figure supplement 3, and Figure 3—figure supplement 2) were created using phytools (Revell, 2012). Node-height tests were implemented based on the methods in Geiger (Pennell et al., 2014; Harmon et al., 2008), and use robust regression to fit a linear relationship between phylogenetic independent contrasts and branching times.
Predicted reductions in diversity
The predicted reductions in diversity due to linked selection are approximated using selection and deleterious mutation parameters from
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2021, Buffalo. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Neutral theory predicts that genetic diversity increases with population size, yet observed levels of diversity across metazoans vary only two orders of magnitude while population sizes vary over several. This unexpectedly narrow range of diversity is known as Lewontin’s Paradox of Variation (1974). While some have suggested selection constrains diversity, tests of this hypothesis seem to fall short. Here, I revisit Lewontin’s Paradox to assess whether current models of linked selection are capable of reducing diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine previously-published estimates of pairwise diversity from 172 metazoan taxa with newly derived estimates of census sizes. Using phylogenetic comparative methods, I show this relationship is significant accounting for phylogeny, but with high phylogenetic signal and evidence that some lineages experience shifts in the evolutionary rate of diversity deep in the past. Additionally, I find a negative relationship between recombination map length and census size, suggesting abundant species have less recombination and experience greater reductions in diversity due to linked selection. However, I show that even assuming strong and abundant selection, models of linked selection are unlikely to explain the observed relationship between diversity and census sizes across species.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer




