Introduction
The spread of human influenza A viruses among well-connected and highly populated cities is recognized as the primary process driving viral dissemination over a wide range of geographic regions [1–6]. Such viral diffusion also determines the countrywide geographic patterns of genetic structure of the virus; frequent viral exchange between major urban areas may give rise to high degrees of genetic similarity of viral populations across these regions, resulting in broad-scale geographic patterns of genetic homogeneity of viral populations. Increasing connectivity between populations as a result of rapid development of human transportation networks may also facilitate viral exchange not only from urban to urban areas, but also from urban to local (or less populated regions) and local to local areas [7–10], potentially leading to a countrywide viral mixing. This may be particularly the case for influenza pandemic viruses, such as influenza A/H1N1pdm09 virus that spread globally within a few weeks after its first isolation in Mexico and California in April 2009 [11–14]. The A/H1N1pdm09 virus was highly contagious, particularly during the first year of the pandemic, implying that there might be extensive viral exchange and co-circulation of multiple lineages of the virus over a wide range of geographic regions, which could generate unique spatial patterns of genetic structure over the course of the 2009 pandemic. As yet, however, the spatial genetic structure of the A/H1N1pdm09 virus populations has not been fully investigated in all regions.
The geographic patterns of genetic structure of influenza A viruses, such as the geographic distribution of viral populations and the spatial extent to which each lineage of influenza A virus forms genetic clusters across space, provide insight into the underlying mechanisms of evolution and spread of the virus across the human population landscapes [15–17]. Hemagglutinin (HA) and neuraminidase (NA) are two major glycoproteins on the surface of influenza A viruses, which are associated with entry and release in the virus life cycle. These two glycoproteins are the primary targets of the human immune response to the viruses, implying that both genes evolve as a result of the human-virus interaction [18–20]. This suggests that identifying the spatial genetic structure of HA and NA genes helps to improve our understanding of how influenza A viruses interact with humans to adapt to, evolve, and spread across human population landscapes.
The scarcity of specific geographic locations associated with sequenced influenza samples in public databases is one of the major challenges for such studies. Since its emergence in North America, however, a considerable number of influenza A/H1N1pdm09 viral genetic sequences with specific sampling locations have been collected in mainland China during the pandemic period. The availability of this new data provides a unique opportunity to investigate the spatial patterns of virus genetic structure at finer geographic scales, which allows for disentangling the dynamic interaction between humans and the influenza viruses across the landscape [16, 21–23].
In this study, we investigated geographic heterogeneity of genetic structure of HA and NA genes of the A/H1N1pdm09 virus isolated in mainland China during the first year of the 2009 pandemic. Despite the fact that the virus emerged over a decade ago in 2009, the spatial patterns of genetic differentiation of the virus, as well as the aspect of spatial scale in mainland China, are still poorly understood. To address this gap, we examined the spatial genetic structure of the A/H1N1pdm09 viruses at multiple geographic scales using phylogenetic and Bayesian clustering analyses. We analyzed the genetic sequence data of the virus, along with their sampling locations, to identify the geographic patterns of genetic clusters of the viruses in mainland China. This study aimed to investigate the following questions: 1) how genetically divergent were the A/H1N1pdm09 viruses circulating in mainland China during the first year of the pandemic, 2) what were the spatial extent and scale of genetic populations of the virus across the human population landscape of mainland China, and 3) how were the viral populations distributed as a result of dynamic interaction between humans and the viruses? Determining the spatial genetic structure of A/H1N1pdm09 viruses in the early phase of the pandemic can illuminate whether this form of pandemic influenza followed typical patterns of viral diffusion, with genetic homogeneity as a result of high population connectivity, or with highly localized patterns of spatial genetic structure driven by the strong circulation within the communities. In addition, the spatial genetic structure of the virus allows for identifying the linkage between the human population dynamics and the spread of viral population distribution. 
Better understanding of spatial patterns of the spread of influenza A/H1N1pdm09 virus will help us develop effective intervention strategies for future influenza pandemics, as well as other airborne infectious viruses, such as SARS-CoV-1.
Methods
Sequence samples
Nucleotide sequence datasets for the HA and NA genes of the A/H1N1pdm09 virus isolated between April 2009 and August 2010 in mainland China were obtained from the NCBI Influenza Virus Database (IVDB) (HA: 275, NA: 273; accession date: Oct 29, 2020) and the GISAID EpiFlu database (HA: 646, NA: 646; accession date: Oct 29, 2020) [24, 25]. The datasets were combined and duplicate isolates were removed. Additionally, only sequences that included district-level geographic locations were retained for the study. After removing entries with incomplete sequences in the coding region, the remaining HA and NA gene sequences were aligned in MUSCLE (v3.8.31) [26]. Maximum likelihood (ML) phylogenetic trees for the HA and NA genes were reconstructed using the general time reversible (GTR) + Gamma substitution model in IQ-TREE (v2.2.0) [27]. The resulting ML trees were used to identify temporal outliers in TempEst (1.10.4) [28], and those outliers were subsequently removed. Sampling locations were geocoded to the centroids of the district-level boundaries using the Google Maps API. The final datasets include 413 sequences for the HA gene and 406 sequences for the NA gene, along with the district-level locations and assignment to one of seven geographic regions (Central, East, North, Northeast, Northwest, South, and Southwest) in mainland China (Fig 1), as well as sequence sampling dates (HA: Aug 16, 2009–Aug 4, 2010; NA: Sep 1, 2009–Aug 8, 2010) obtained from the sequence metadata (S1 Table).
[Figure omitted. See PDF.]
Phylogenetic analysis
Temporal phylogenetic trees of the HA and NA genes of the A/H1N1pdm09 virus were reconstructed to estimate viral evolutionary rates, lineage diversification through time, and time to the most recent common ancestor (tMRCA) of viral lineages (S1 and S2 Figs). Nucleotide sequences of the HA and NA genes and their sampling dates were used to infer phylogenetic trees using a Bayesian Markov Chain Monte Carlo (MCMC) approach in the BEAST software package (v1.10.4) [29]. The HKY+G substitution model selected from the maximum likelihood test in MEGAX [30], the uncorrelated lognormal relaxed molecular clock model [31], and the exponential population demographic model were used for the reconstruction of phylogenetic trees [29, 32]. Three independent runs of 100 million generations were performed and sub-sampled every 10,000 generations to ensure MCMC convergence. Tree and log files were combined with a burn-in of 10 million generations per run using LogCombiner (v1.10.4). Convergence of parameters was analyzed using Tracer (v1.7.1) [33], and the effective sample size (ESS) values for all posteriors were greater than 200, indicating sufficient mixing of MCMC chains and no significant autocorrelation in the posterior sample. The tree logs for HA and NA were summarized as maximum clade credibility (MCC) trees in TreeAnnotator (v1.10.4) and visualized in FigTree (v1.4.3) [34].
ML phylogenetic trees with 1,000 bootstrap replicates were reconstructed in IQ-TREE (S3 and S4 Figs) to calculate dissimilarity matrices of patristic distances between the genetic sequences of the HA and NA genes, where the patristic distance is the sum of branch lengths between a pair of samples in the phylogenetic tree [27, 35]. The genetic sequences of A/California/04/2009 (accession numbers: HA, FJ966082; NA, FJ966084) were added as an outgroup for the purpose of rooting and then removed from the trees. Dissimilarity matrices were generated using the ape package (v 4.0) in R [36].
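As a rough illustration of what a patristic distance measures, the sketch below sums branch lengths along the path joining two tips of a small hand-made rooted tree. The tree, the tip names, and the helpers `path_to_root` and `patristic` are hypothetical and are not part of the study, which derived these distances from the IQ-TREE phylogenies with the ape package in R.

```python
# Hypothetical toy tree: each node maps to (parent, branch length to parent).
# The root has parent None. Names and lengths are made up for illustration.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.010),
    "A": ("n1", 0.002),
    "B": ("n1", 0.003),
    "C": ("root", 0.020),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (and node itself) to its distance from node."""
    dist = 0.0
    path = {}
    while node is not None:
        path[node] = dist
        parent, length = tree[node]
        dist += length
        node = parent
    return path


def patristic(tree, tip_a, tip_b):
    """Sum of branch lengths on the path joining two tips."""
    up_a = path_to_root(tree, tip_a)
    up_b = path_to_root(tree, tip_b)
    # The most recent common ancestor is the shared node closest to both tips.
    shared = [n for n in up_a if n in up_b]
    mrca = min(shared, key=lambda n: up_a[n] + up_b[n])
    return up_a[mrca] + up_b[mrca]
```

Calling `patristic(TREE, "A", "B")` adds the two short branches below node `n1`, whereas `patristic(TREE, "A", "C")` passes through the root and is therefore larger.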
Spatial extent of genetic populations
Partial Mantel correlograms were used to analyze the spatial extent and scale of genetic structure of the A/H1N1pdm09 virus and for identifying the role of human population landscape in the spatial genetic structure of the virus in mainland China. A Mantel correlogram is a non-parametric statistical tool to illustrate the association between two distance matrices, such as genetic distance and geographic distance [37, 38]. The Mantel correlogram classifies pairwise dissimilarity of genetic distances into several geographic distance lags and then calculates a Mantel statistic within each lag. A partial Mantel correlogram, an extension of the Mantel correlogram, can utilize three distance matrices simultaneously: a genetic distance matrix as a response variable and two matrices for explanatory variables (e.g., geographic and temporal distances). The partial Mantel correlogram can be used to capture the association between genetic and geographic distances while accounting for potential confounders, such as temporal trends. Statistical significances within each lag were calculated using a permutation test (n = 10,000). Partial Mantel correlograms were constructed using ecodist package [39] in R.
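The partial Mantel statistic described above can be sketched in a few lines. The code below is a minimal illustration, not the ecodist implementation: it computes one global partial correlation between genetic and geographic distance while controlling for temporal distance, and assesses it by permuting the rows and columns of the genetic matrix. A correlogram would repeat a calculation of this kind within each distance lag. All function names, the default number of permutations, and the two-sided test are assumptions made for the example.

```python
import math
import random


def upper(m):
    """Flatten the upper triangle (i < j) of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_mantel_r(gen, geo, time):
    """Correlation of genetic and geographic distance, controlling for time."""
    g, d, t = upper(gen), upper(geo), upper(time)
    r_gd, r_gt, r_dt = pearson(g, d), pearson(g, t), pearson(d, t)
    return (r_gd - r_gt * r_dt) / math.sqrt((1 - r_gt ** 2) * (1 - r_dt ** 2))


def permute(m, order):
    """Reorder rows and columns of a square matrix with the same permutation."""
    return [[m[i][j] for j in order] for i in order]


def partial_mantel_test(gen, geo, time, n_perm=999, seed=0):
    """Return (observed r, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = partial_mantel_r(gen, geo, time)
    order = list(range(len(gen)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        r_perm = partial_mantel_r(permute(gen, order), geo, time)
        if abs(r_perm) >= abs(observed):
            hits += 1
    return observed, hits / (n_perm + 1)
```

Permuting whole rows and columns, rather than individual cells, keeps the dependence among distances that share a sample intact, which is why Mantel-type tests are used instead of an ordinary correlation test.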
Two different geographic distance matrices were generated to identify the spatial scale of genetic structure of the HA and NA genes: Euclidean distance and indexed distance matrices. Euclidean distance, the classical geographic distance measure in ecological studies, was used to investigate the correlation between genetic and geographic distances. Human population distributions can be considered as landscapes to human influenza viruses: the geographic distribution of human populations, their movements, and population characteristics may determine the spatial patterns of genetic structure of the A/H1N1pdm09 virus. In particular, the population landscape of China is highly heterogeneous; large and densely populated cities are located along the east coast, while the population size and density dramatically decline moving westward. The complexity of human population landscapes in mainland China suggests that the spatial genetic structure of viral diversity might not conform to isolation by distance (IBD) patterns, whereby viruses closer in geographic space are more similar than those further apart in geographic space, which is commonly observed in wild animals, fungi, or plants along a gradient of environmental landscapes [40–42]. The partial Mantel correlograms with Euclidean distance were used to identify the spatial extent of genetic differentiation of the HA and NA genes across the highly heterogeneous human population landscape in mainland China. Euclidean distances between district-level sampling locations were measured using ArcMap 10.7 [43].
Meanwhile, the degree of genetic similarity between pairs of viruses may vary across different administrative memberships; high levels of genetic similarity between viruses are expected within small geographic regions (i.e., community, neighborhood, or district), while such patterns may become less obvious at larger geographic scales (i.e., prefecture and province). The indexed distance measure was designed to capture the spatial genetic structure of the virus at each level of administrative division in China. In addition, this measure controls for differences in area among units at the same administrative level, which might be unduly influential in the traditional Euclidean measures of space. For instance, two locations may be far apart in geographic space but still belong to the same administrative unit and, because of this shared membership, interact more with each other than with sites that are geographically closer but administratively more distant. The indexed distance measure thus allows us to identify the degree of genetic similarity at different spatial scales across mainland China while overcoming the potential confounding effects of Euclidean distance. The indexed distance matrices for the HA and NA genes were generated from the sampling addresses by assigning each pair of viral isolates to one of five classes: 1) the two isolates were sampled in the same district-level region; 2) the two isolates were sampled in the same prefectural-level region but not in the same district-level region; 3) the two isolates were sampled in the same province-level region but not in the same prefectural-level region; 4) the two isolates were sampled in the same geographic region but not in the same province-level region; and 5) the two isolates were sampled in different geographic regions.
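The five-class rule can be expressed compactly. The following sketch assumes each sampling location is stored as a dictionary with the hypothetical keys `region`, `province`, `prefecture`, and `district`; the key names and the placeholder labels in the usage note are illustrative and do not come from the study data.

```python
LEVELS = ("region", "province", "prefecture", "district")  # coarsest to finest


def indexed_distance(a, b):
    """Return the class (1-5) for a pair of sampling locations.

    1 = same district, 2 = same prefecture, 3 = same province,
    4 = same geographic region, 5 = different geographic regions.
    """
    shared = 0
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared += 1
    return 5 - shared


def indexed_matrix(locations):
    """Build the full pairwise matrix of indexed distances."""
    return [[indexed_distance(a, b) for b in locations] for a in locations]
```

For example, two isolates recorded as `{"region": "R1", "province": "P1", "prefecture": "C1", "district": "D1"}` and `{"region": "R1", "province": "P1", "prefecture": "C2", "district": "D1"}` fall in class 3: comparing from the coarsest level downward means that districts which merely share a name in different prefectures are not treated as the same district.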
Principal component analysis
The geographic differentiation of the genetic structure of the A/H1N1pdm09 virus among the seven geographic regions of mainland China (Fig 1) was investigated using principal component analysis (PCA). PCA reduces the dimensions of single nucleotide polymorphism (SNP) frequency data by seeking principal components (PCs); the first PC is the summary of SNP frequencies that maximizes the variance of the projected data, and the remaining PCs are orthogonal to the first PC and account for the residual variance in the data [44, 45]. Two matrices of SNP frequencies for each gene were generated and used for the PCA. The first two PCs with the highest eigenvalues were used to visualize the genetic characteristics of viruses from the seven geographic regions of mainland China in two-dimensional space. We also generated 95% inertia ellipses to visualize the genetic differentiation of the A/H1N1pdm09 virus among the seven geographic regions. The PCA was performed using the ade4 package in R [46].
Bayesian clustering analysis
While Mantel correlograms summarize the spatial patterns of genetic structure of the HA and NA genes, this method cannot illustrate the distribution and genetic differentiation of viral populations across space. Bayesian clustering analysis uses the genetic sequence data to calculate the probability that each individual sequence belongs to pre-defined genetic population groups, which allows for classification of the genetic samples into subpopulations and visualization of the probability of the population memberships for each individual sequence in space. By mapping the population structure of the viral samples onto geographic space, the spatially heterogeneous genetic structure of the HA and NA genes can be illustrated across mainland China.
For the clustering analysis, we first identified the optimal number of genetic clusters (K) using an iterative approach in the STRUCTURE software package [47]. Log-likelihood statistics for each value of K, ranging from 1 (all sequences form a single genetic cluster) to 30 (30 genetically heterogeneous groups), were calculated to identify the optimal number of clusters. For the model setting in STRUCTURE, the non-admixture model, which assumes that each individual originates from one of K populations, was used to calculate the proportion of clustering membership assigned to each cluster for individual sequences. Further, we allowed allele frequencies in our sequences to be correlated. Five independent runs of 500 thousand MCMC steps with a burn-in of 50 thousand steps for each iteration were obtained, combined, and plotted. Two different approaches, the non-parametric Wilcoxon test and the ad hoc quantity (ΔK), were used to determine the optimal number of genetic clusters using the combined logs. It is acknowledged that the log-likelihood would plateau or continue to increase slightly, and a high variance between runs would be observed, as K reaches the true value, that is, the optimal number of subpopulations [48, 49]. Based on these two criteria, the optimal values of K for the HA and NA genes were visually determined from the log-likelihood plots. The ΔK plots, which show the second-order rate of change of the log-likelihood across values of K, were further used to confirm the optimal K identified in the log-likelihood plots of the HA and NA genes [50]. After we identified the optimal K for each gene, we conducted ten independent runs of two million MCMC steps with a burn-in of one million steps at that K to obtain the membership probabilities for each individual sequence (membership coefficient, or q value). 
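To make the ΔK criterion concrete, the sketch below computes it from replicate log-likelihoods. It assumes one common reading of the definition in [50]: the absolute second difference of the run-averaged log-likelihoods at K, divided by the standard deviation of the log-likelihoods across runs at K. The function name and the input layout (a dictionary from K to a list of replicate log-likelihoods) are hypothetical.

```python
import statistics


def delta_k(loglik_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    loglik_by_k maps each K to the log-likelihoods of its replicate runs.
    """
    mean = {k: statistics.mean(v) for k, v in loglik_by_k.items()}
    sd = {k: statistics.stdev(v) for k, v in loglik_by_k.items()}
    result = {}
    for k in sorted(loglik_by_k):
        # Delta K is undefined at the smallest and largest K tested,
        # and when the replicate runs at K show no variance.
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            result[k] = second_diff / sd[k]
    return result
```

Because ΔK needs both neighbours, it cannot be evaluated at K = 1, which is one reason the log-likelihood plots are inspected alongside it.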
Ten independent logs of the membership coefficients were averaged using CLUMPP software [51] and the membership coefficient matrices (q-matrix) of the HA and NA genes were obtained. The probabilities of genetic memberships of each gene were visualized using an Inverse Distance Weighted (IDW) interpolation tool with a power of 2 and 12 minimum number of neighbors to highlight local genetic structure of the virus while capturing global patterns of the genetic structure across mainland China. Contour lines ranging from 0.1 to 0.6 by 0.1 interval were generated and the q-values greater than 0.6 were highlighted to identify the boundaries of high q-values for each genetic cluster. Pairwise Fst values among provinces, measuring the degrees of genetic differentiation of viral populations, were calculated to support our finding of geographic distribution of genetic clusters of the A/H1N1pdm09 virus using hierfstat package in R [52]. The membership probability surfaces for the HA and NA genes were generated in ArcMap 10.7 using the world administrative boundaries shapefile obtained from the World Bank Data Catalog (https://datacatalog.worldbank.org/).
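The interpolation step can be illustrated with a small inverse distance weighting function. This is a simplified sketch rather than the ArcMap tool: it assumes planar coordinates, always uses the `k` nearest samples (the study specified a minimum of 12 neighbors), and the function and parameter names are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0, k=12):
    """Inverse-distance-weighted estimate of a q value at point (x, y).

    samples is a list of (sx, sy, value) tuples; only the k nearest are used.
    """
    nearest = sorted(
        (math.hypot(x - sx, y - sy), value) for sx, sy, value in samples
    )[:k]
    # A query point that coincides with a sample takes that sample's value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / dist ** power for dist, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With a power of 2, a sample three times farther away receives one ninth of the weight, so nearby isolates dominate the surface and local clusters remain visible.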
Results
Phylogenetic analyses
Phylogenetic trees of the HA and NA genes of A/H1N1pdm09 virus were reconstructed to understand the evolutionary dynamics of the viruses sampled in mainland China during the first year of the pandemic (Aug 2009–Aug 2010). Branches of phylogenies were color coded by year of isolation, revealing that the viruses were not divergent across the year but intermingled across time in the same clades (Fig 2, S1 and S2 Figs). This is a typical pattern observed in phylogenies of pandemic viruses, demonstrative of an explosive increase in genetic diversity of both genes during the early stages of the pandemic. The mean substitution rates of the HA and NA genes were 5.607 × 10^-3 [95% HPD: 4.66 × 10^-3 to 6.64 × 10^-3] and 4.97 × 10^-3 [95% HPD: 3.84 × 10^-3 to 6.14 × 10^-3] substitutions per site per year, respectively (Table 1). These rates were higher than those from the early phase of the pandemic (HA: 3.67 × 10^-3, NA: 3.65 × 10^-3) [53], but not significantly different from results over a longer study period [54].
[Figure omitted. See PDF.]
Branches were colored in grey (2009) or red (2010) based on year of viral isolation.
[Figure omitted. See PDF.]
Geographic scales of spatial genetic structure
Partial Mantel correlograms based upon two different measures of geographic distance (Euclidean distance and indexed distance) for the HA and NA genes are presented in Fig 3. Statistically significant Mantel r values within each distance lag are symbolized as filled circles, while values that lack significance are represented as hollow circles. Surprisingly, partial Mantel correlograms of the HA and NA genes from Euclidean distance matrices generally present IBD patterns, in which genetic distance is positively correlated with geographic distance, although such patterns are more evident in the HA gene than in the NA gene. Of note, the Mantel r value around the 2,000 km distance lag in the NA gene indicates a significant positive correlation, but this pattern was not seen in the HA gene.
[Figure omitted. See PDF.]
Meanwhile, partial Mantel correlograms with the indexed distance measure illustrate the correlation between genetic sequences across different administrative memberships. The partial Mantel correlogram plot with the indexed distance measure (Fig 3) exhibits high positive correlations among samples of the HA gene in the first, second, and third classes, indicative of frequent and strong gene exchange within the same district, prefecture, and province level regions in mainland China during the period. The Mantel r statistic of the HA gene in the fifth class was negative and statistically significant, indicating the presence of genetic differentiation in the HA gene across the seven geographic regions in mainland China. For the NA gene, the Mantel r values in the first and fourth classes were significant, while those in the second, third, and fifth classes were not statistically significant, implying the absence of genetic structure at these spatial scales. The scatterplots of PCA of the HA and NA genes support these findings, indicating that the geographic patterns of genetic differentiation across the seven geographic regions were more pronounced in the HA gene than in the NA gene (Fig 4). Collectively, high levels of genetic similarity between HA and NA genes within the same district, prefecture, and provincial level regions imply the presence of genetic clusters at these spatial scales, while genetic differentiation of the HA gene was observed across the seven geographic regions.
[Figure omitted. See PDF.]
Bayesian clustering analyses
The geographic patterns of the distribution of subpopulations of the HA and NA genes of the A/H1N1pdm09 virus were investigated using an individual-level Bayesian clustering approach. First, we identified the optimal number of genetic clusters for each of the HA and NA genes using the non-parametric Wilcoxon test and ΔK plots. The log-likelihood plot of the HA gene (S5 Fig) gradually increased from K = 1 and exhibited a large variance at K = 3, and the log-likelihood statistic reached its plateau from K = 4. The ad hoc quantity (ΔK) shows a clear peak at K = 4, supporting our finding of the optimal K for the HA gene. The log-likelihood values of the NA gene reached a plateau at K = 2, with a large variance of log-likelihood statistics between K = 1 and K = 2. In the ad hoc quantity plot of the NA gene, ΔK was maximal at K = 2, supporting our finding of the optimal number of subpopulations of the NA gene from the log-likelihood plot. Taken together, the optimal number of subpopulations of the HA gene was larger than that of the NA gene, suggesting that the HA gene was more genetically diverged than the NA gene during the first year of the pandemic.
The probabilities of genetic membership assigned to each individual sequence of the HA and NA genes of the A/H1N1pdm09 virus (S2 and S3 Tables and S6 Fig) were visualized using IDW interpolation to investigate the spatial patterns of the distribution of genetic subpopulations of the HA and NA genes. Fig 5A illustrates that the HA genes with high membership probabilities for Cluster 1 were widely distributed across East and Central China. Provinces that contain IDW contour lines with q values greater than 0.6 include Sichuan, Chongqing, Hunan, Hubei, Anhui, Henan, Shanxi, and Shandong. Notably, both global and localized spatial genetic clusters are apparent in the map of Cluster 1. Specifically, there is a broad cluster spanning from the Shandong peninsula on the east coast to the inner provinces (Fig 5A), and high degrees of population genetic similarity among these provinces were identified from pairwise Fst values (S4 Table and S7 Fig). Meanwhile, locally confined subpopulations (genetic demes) were observed in several regions, mainly in Gansu, Sichuan, and Guangxi. In contrast, the spatial distribution of HA genes with higher membership probability for Cluster 2 illustrates a more geographically confined pattern that is dominant in South China, particularly in Guangdong and Fujian provinces, although small local subpopulations with Cluster 2 membership were also found in Heilongjiang, Jilin, Shaanxi, and Guangxi (Fig 5B, S4 Table and S7 Fig). Genes with majority membership in Clusters 3 and 4, by comparison, were found in sub-regional clusters. The geographic distribution of HA genes with Cluster 3 q values greater than 0.6 illustrates locally isolated patterns, observed mainly in Heilongjiang, Zhejiang, Guangdong, Guangxi, and Hainan (Fig 5C). Interestingly, analysis of pairwise Fst between Hainan and other provinces indicated the smallest genetic differentiation between HA genes in Hainan and Heilongjiang (S4 Table and S7 Fig). 
Lastly, mapping Cluster 4 membership identified geographic clusters with q > 0.6 only in Yunnan province, and no other significant genetic subpopulation was observed in other regions (Fig 5D).
[Figure omitted. See PDF.]
Thick contour lines represent the boundaries of high q-values greater than 0.6.
The clustering analysis of NA gene sequences identified only two genetic clusters, as visualized in Fig 6. The genetic sequence samples of NA that were classified as Cluster 1 are mainly distributed across North, Northeast, and South China (Fig 6A), while the NA genes with higher membership probability of Cluster 2 were widely observed in East and Central China (Fig 6B). Although the geographic differentiation of the genetic cluster of the NA genes is evident, the spatial patterns of the genetic clusters were determined by only two subpopulations.
[Figure omitted. See PDF.]
Thick contour lines represent the boundaries of high q-values greater than 0.6.
Discussion
Our study used influenza A/H1N1pdm09 virus HA and NA gene sequences with known sampling locations to investigate the geographic patterns of genetic structure and genetic differentiation of the virus in mainland China during the initial active phase of the 2009 H1N1 pandemic. Due to the highly mobile nature of humans and, by extension, the influenza virus, we expected that broad-scale geographic patterns of genetic population structure might be dominant over localized patterns of genetic clustering, as multiple introductions from major cities into local communities can blur the genetic population structure of the virus by increasing the genetic diversity in local areas (see for example [10]). In addition, only a few studies have investigated the geographic distribution of genetic mutations or the spatial genetic structure of the virus at a national scale [55], and most previous studies have focused on influenza trends within a single city or province in China [56–59]. Our study, however, highlights the presence of spatial genetic structure at both national and local scales across mainland China that emerged during the initial stages of the pandemic.
Partial Mantel correlograms with Euclidean distance indicate the presence of IBD patterns in the genetic structure of HA and NA genes, clearly showing the positive correlation between genetic and geographic distances. Given that influenza can travel long distances via multiple transportation networks, frequent viral migration among major municipalities and provinces, as well as gene flow from those regions to local communities, is expected. This may lead to genetic homogeneity among distant but well-connected populations, while multiple introductions of more than two lineages into the local areas may blur the genetic structure of viral populations at smaller geographic scales [10]. Contrary to our hypotheses, however, the results indicate high degrees of genetic similarity among the HA and NA genes sampled within short geographic distances, while spatial genetic differentiation was found across mainland China. These patterns are clearly shown in the partial Mantel correlograms with indexed distance and the scatterplots of PCA, indicating high degrees of genetic similarity between HA and NA genes within the same district regions, and the HA genes in the same prefectural and provincial regions, and the statistically significant genetic differentiation of HA genes across the seven geographic regions of China (Figs 3 and 4). The small genetic demes in the maps of genetic clustering (e.g., Fig 5A) further support our finding of the strong local circulation of the virus during the study period.
The genetic homogeneity of the HA and NA genes at small geographic scales suggests that founder effects might take place at these spatial scales [60, 61]. Though there were likely multiple introductions of different lineages into local areas, only a few successful lineages of the A/H1N1pdm09 virus would become dominant, while the majority of introduced lineages would fail to persist in local or regional populations. Although extensive viral migration, presumably from East China, is supported by geographical clustering (Fig 5A, S7 Fig and S4 Table), country-wide viral exchange and gene flow do not appear to be a predominant driver in establishing a nation-wide pattern of spatial population structure. There appear to be regional differences in which lineage became dominant during the first year of the H1N1 pandemic in China. However, we cannot completely rule out potential bias in these spatial patterns, as only a small proportion of influenza cases were genotyped, and these sequences may not be sufficient to represent the genetic diversity within the regions. Therefore, further study is needed to investigate the genetic characteristics of A/H1N1pdm09 virus at small geographic scales.
Interestingly, the IBD patterns in the Mantel correlogram of the NA gene are more ambiguous than those of the HA gene, particularly in terms of the significant positive Mantel r values near the 2,000 km distance lag. These unclear IBD patterns in the Mantel correlogram of the NA gene may arise because the NA gene was under weaker selection pressure and not as genetically differentiated as the HA gene. Although both HA and NA genes determine the antigenic immune profile of influenza A viruses, the greater neutralizing potential of anti-HA antibodies imposes stronger selection pressures on the HA gene [62–65]. In addition to being a target for host immune neutralization, the HA gene encodes the spike-like receptor-binding glycoprotein on the surface of the virus, which allows the virus to bind to host cells and thus plays an important role in determining the infectivity of viral particles. Although the NA protein is also highly mutable and under diversifying positive selection in response to anti-NA antibody pressure [62, 66], positive selection pressure is generally stronger on the HA gene due to the highest concentration of epitopes in the HA1 sub-domain of the HA protein [54, 67], which is the least conserved segment of influenza virus and the major target of human immunity against the virus. This is further supported by the estimates of the mean substitution rates of the HA and NA genes during the study period (Table 1). The mean substitution rate of the NA gene was estimated to be lower than that of the HA gene, implying that the NA genes were genetically less differentiated. Taken together, the NA genes in the A/H1N1pdm09 virus were less genetically diverged over the first year of the pandemic, so the genetic differentiation of NA genes by geographic distance might be less obvious, particularly within the range of 0–2,500 km, as also shown in the scatterplots of PCA (Fig 4).
We found broad-scale spatial patterns in the distribution of the HA genes with high probability of Cluster 1 membership (Fig 5A). These patterns were also evident in pairwise Fst estimates, indicating low genetic differentiation among provinces in East and Central China (S4 Table and S7 Fig). These patterns may be attributable to the movement of migrant workers from East China to rural areas in Central and Western China. In particular, the Yangtze River Delta, one of the major industrial regions in China, covers a wide range of geographic regions including Jiangsu, Anhui, and Zhejiang provinces and Shanghai municipality, which combined account for more than 20% of rural migrant workers in China [68–70]. Many of the migrant workers who travel to the east originate in inner provinces with large rural populations, such as Sichuan, Henan, Anhui, and Shandong [69]. This large population of migrant workers is also involved in the periodic population movements from east to west during the Lunar New Year holiday, when influenza is most prevalent and incidence is high. These geographic patterns may suggest that the viruses maintained circulation among human populations in East China and then spread to local communities in Central and Southwest China, giving rise to the formation of local genetic clusters of the virus in these regions. It should be noted, however, that the periodic movement of migrant workers may not be the sole driver of the spatial patterns of Cluster 1 subpopulation distributions; other factors may also play a role in the formation of spatial genetic structure in mainland China, such as the climate, socioeconomic, and demographic characteristics of each region, or viral introductions from regions outside mainland China via international air travel [71–73]. Previous studies analyzed the role of human movements during the Lunar New Year holiday in the spread of human infectious diseases across China (e.g., influenza, STDs, and SARS-CoV-1) [74–78]. 
However, this study provides only descriptive interpretations of the results, and the association between human migration and genetic structure was not statistically tested. Therefore, further model testing and data collection are necessary to identify the specific demographic and environmental forces driving these spatial patterns of genetic structure across East and Central China.
The HA genes with high probability of Cluster 2 membership were observed mainly in South China during the study period (Fig 5B). Cultural landscapes in South China are characterized by highly populated urban areas, frequent domestic and international trade, extensive human travel to and from neighboring countries, and a large number of seasonal migrant workers from other provinces [79–81]. In particular, over the past three decades, Guangdong province has remained the largest destination of rural migrant workers, accounting for 44% of the total population in the province in 2004 [69]. The population inflow from rural areas has dramatically increased the population size and density of cities in Guangdong province, such as Shenzhen, one of the destination cities for rural migrant workers in China [82]. These rural migrant workers often live in poor and overcrowded housing, with low incomes, generally low awareness of disease prevention, and poor immunization status, which makes them more vulnerable to respiratory infectious diseases and widespread viral circulation [82–85]. These characteristics of the cultural landscape of South China might reinforce the local circulation of the virus among the large human populations of the region [86, 87].
It remains unclear, however, why the HA genes with high probability of Cluster 2 membership were geographically confined to South China and failed to spread beyond the region. South China has historically been proposed as a human influenza A epicenter responsible for at least two influenza pandemics in the previous century, A/H2N2 in 1957 and A/H3N2 in 1968 [88, 89], implying that viral strains circulating in South China may have greater antigenicity than those circulating in other regions. Contrary to our expectation, however, the results indicate that the strains in South China were relatively confined to this region, presumably because of less competitive antigenicity or infectivity than the viral strains found in East and Central China. The relationship between the genetic and antigenic characteristics of the virus in South China and the geographic constraints acting on its spatial distribution remains unclear. Therefore, identifying the prevalence of influenza A viruses among migrant workers in South China and the local persistence of the virus over time is required to understand the role of the human population landscape in the geographic patterns of genetic structure in the region.
The map of Cluster 3 membership and the pairwise Fst estimates identified high degrees of genetic similarity of the HA genes among Heilongjiang in North China, Zhejiang in East China, and Guangdong, Guangxi, and Hainan in South China (Fig 5C and S7 Fig). Interestingly, analysis of pairwise Fst between Hainan and other provinces indicated that the genetic differentiation between Hainan and Heilongjiang was the smallest, implying frequent viral exchange between these two distant regions. Notably, Northeast China has a northerly continental monsoon climate with long, cold winters that last from November to March, with an average daily high temperature below 27°F (about −3°C). Because of this seasonal variation, it is common for people in Northeast China to vacation in tropical places to escape the harsh winters. Hainan province, in particular, is one of the popular destinations for Northeast Chinese tourists during the Lunar New Year holiday [90], which coincides with the period when influenza is most prevalent. Despite the distance between these two regions, the large volume of tourists returning from South to Northeast China after the Lunar New Year vacation might be sufficient to establish long-distance viral transmission, resulting in a high degree of genetic similarity between viruses from the two regions. Meanwhile, local genetic clusters in Guangxi, Guangdong, and Hunan may form due to their geographic proximity to Hainan. However, a more formal phylogeographic approach is required to identify the gene flow of the A/H1N1pdm09 virus across these regions [54, 91, 92].
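To make the notion of pairwise genetic differentiation between two provinces concrete, the sketch below computes a sequence-based Fst from aligned sequences as one minus the ratio of mean within-group to mean between-group pairwise differences (the estimator of Hudson et al.). This is a swapped-in, simplified estimator for illustration; it is not necessarily the estimator computed by HIERFSTAT [52] in this study, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise(pairs):
    """Mean number of differences over an iterable of sequence pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("each group needs at least two sequences")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Fst = 1 - Hw / Hb, where Hw is the average within-group and Hb the
    between-group mean pairwise difference. Values near 0 indicate little
    differentiation; small samples can yield negative estimates."""
    hw = (mean_pairwise(combinations(pop1, 2)) +
          mean_pairwise(combinations(pop2, 2))) / 2
    hb = mean_pairwise(product(pop1, pop2))
    if hb == 0:
        return 0.0  # all sequences identical: no differentiation
    return 1 - hw / hb


if __name__ == "__main__":
    # Hypothetical toy alignments standing in for two provinces.
    province_a = ["AAAA", "AAAT"]
    province_b = ["GGGG", "GGGC"]
    print(hudson_fst(province_a, province_b))
```

Applied to every pair of provinces, such a function would yield a matrix comparable in layout to S4 and S5 Tables, in which the smallest off-diagonal values mark the provinces with the most similar viral populations.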
The spatial genetic structure of the NA sequences was evident in the cluster maps, which showed high levels of genetic similarity between NA gene sequences sampled in North and South China and a viral population that extends from East to Southwest China. However, we observed only two genetic subpopulations in our sample of NA gene sequences, owing to the low degree of genetic differentiation. This would seem unusual, as anti-NA antibodies in humans help reduce both the replication of the virus and its virulence [93], though it is possible that selection for immune escape within the HA gene overwhelmed similar selection pressures on the NA gene during the early stages of the pandemic.
This study was conducted using genome sequences of the A/H1N1pdm09 virus isolated only in mainland China. This geographically constrained sample may fail to capture viral exchanges between China and other countries, and thus may underestimate the degree to which independent introductions of the virus affect our inferences about the geographic structure of viral sequence diversity. Furthermore, the short sampling period of only one year (August 2009–August 2010) may not be sufficient to examine the evolution and geographic patterns of genetic differentiation of the virus. Moreover, early viral sequences sampled before August 2009, which cover the first wave of the pandemic in early spring, were excluded from this study because they lacked specific geographic locations. Earlier samples might prove important: although geographic patterns of genetic differentiation might not exist at the earliest phase of the pandemic, these viral samples may provide insight into how the A/H1N1pdm09 virus was first introduced to mainland China and how it subsequently spread. Lastly, future studies should address spatially and temporally uneven sampling to avoid potential bias arising from over- or under-sampled provinces.
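One simple way to reduce the influence of over-sampled provinces in future analyses is to subsample at most a fixed number of sequences per province (and, analogously, per time window) before estimating genetic structure. The sketch below illustrates this idea in Python. It was not part of the present analysis, and the record fields (`province`, `isolate`) and the cap per province are hypothetical.

```python
import random
from collections import defaultdict


def downsample_by_group(records, key, per_group, seed=0):
    """Randomly keep at most `per_group` records for each value of `key`.
    Groups with fewer records than the cap are kept in full."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    sample = []
    for name in sorted(groups):  # sorted for a deterministic group order
        members = groups[name]
        k = min(per_group, len(members))
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    # Hypothetical metadata: province A is heavily over-sampled.
    records = ([{"province": "A", "isolate": i} for i in range(10)] +
               [{"province": "B", "isolate": i} for i in range(2)])
    balanced = downsample_by_group(records, "province", per_group=3)
    print(len(balanced))
```

Repeating the downstream analysis over several random subsamples (different `seed` values) would indicate how sensitive the inferred clusters are to the sampling scheme.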
The geographic patterns of the genetic structure of the pandemic influenza viruses presented in this study imply a strong association between human population connections and the spread of the virus. The IBD patterns of spatial genetic structure of the A/H1N1pdm09 viruses were clear, suggesting efficient viral circulation at smaller geographic scales (i.e., district, prefectural, and provincial levels) and genetic differentiation at larger scales. The results also highlight the presence of both global and local patterns of spatial genetic structure in the HA and NA genes of the A/H1N1pdm09 virus. Periodic population movements from provinces along the east coast to inner provinces may contribute to broad-scale geographic patterns of genetic structure, while localized genetic subpopulations may imply that viral transmission from highly populated urban areas to local communities, as well as between local areas, was also significant during the first year of the pandemic.
Our findings are expected to provide a basis for surveillance and intervention strategies before and after the emergence of new pandemic viruses. Active influenza surveillance during the Lunar New Year holiday may help control the viral transmission driven by periodic population movements, while monitoring the viruses circulating in densely populated areas along the east coast of China may allow antigenic variants to be identified in a timely manner. Identifying the geographic distribution of place-specific predominant influenza strains will be helpful for developing influenza vaccines and vaccination plans that maximize the efficiency of disease control. Lastly, further investigation of landscape factors, such as climate, human transportation networks, and socioeconomic characteristics, is necessary to improve our knowledge of the underlying drivers of the genetic differentiation of human influenza viruses in preparation for public health strategies for future pandemics.
Supporting information
S1 Fig. Bayesian temporal phylogeny of HA genes of A/H1N1pdm09 virus with sequence names.
https://doi.org/10.1371/journal.pone.0284716.s001
S2 Fig. Bayesian temporal phylogeny of NA genes of A/H1N1pdm09 virus with sequence names.
https://doi.org/10.1371/journal.pone.0284716.s002
S3 Fig. Maximum likelihood phylogeny of HA genes of A/H1N1pdm09 virus.
Internal nodes with a bootstrap support of 50% or greater (1,000 replications) are indicated.
https://doi.org/10.1371/journal.pone.0284716.s003
S4 Fig. Maximum likelihood phylogeny of NA genes of A/H1N1pdm09 virus.
Internal nodes with a bootstrap support of 50% or greater (1,000 replications) are indicated.
https://doi.org/10.1371/journal.pone.0284716.s004
S5 Fig. The summary of log-likelihood statistics and the estimates of ΔK over K = 1–30 from STRUCTURE.
https://doi.org/10.1371/journal.pone.0284716.s005
S6 Fig. The bar plots of the membership probabilities of HA and NA genes of A/H1N1pdm09 virus by province by geographic region in mainland China.
https://doi.org/10.1371/journal.pone.0284716.s006
S7 Fig. The maps of pairwise Fst estimates of HA genes between Shandong, Guangdong, and Hainan and their top 5 closest provinces in mainland China.
https://doi.org/10.1371/journal.pone.0284716.s007
S1 Table. A list of isolate names, sources, isolate ID, and collection date used in the study.
https://doi.org/10.1371/journal.pone.0284716.s008
(CSV)
S2 Table. Summary of the membership coefficients (K = 4) of HA genes calculated in CLUMPP software.
https://doi.org/10.1371/journal.pone.0284716.s009
(CSV)
S3 Table. Summary of the membership coefficients (K = 2) of NA genes calculated in CLUMPP software.
https://doi.org/10.1371/journal.pone.0284716.s010
(CSV)
S4 Table. Pairwise Fst estimates of HA genes of A/H1N1pdm09 virus among provinces in mainland China.
https://doi.org/10.1371/journal.pone.0284716.s011
(XLSX)
S5 Table. Pairwise Fst estimates of NA genes of A/H1N1pdm09 virus among provinces in mainland China.
https://doi.org/10.1371/journal.pone.0284716.s012
(XLSX)
Citation: Kim S, Carrel M, Kitchen A (2023) Spatial genetic structure of 2009 H1N1 pandemic influenza established as a result of interaction with human populations in mainland China. PLoS ONE 18(5): e0284716. https://doi.org/10.1371/journal.pone.0284716
About the Authors:
Seungwon Kim
Roles: Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing – original draft
E-mail: [email protected]
Current address: Department of Pathology, Johns Hopkins University, Baltimore, Maryland, United States of America
Affiliation: Department of Geographical and Sustainability Sciences, University of Iowa, Iowa City, Iowa, United States of America
ORCID: https://orcid.org/0000-0002-5509-0969
Margaret Carrel
Roles: Conceptualization, Supervision, Writing – review & editing
Affiliations: Department of Geographical and Sustainability Sciences, University of Iowa, Iowa City, Iowa, United States of America, Department of Epidemiology, University of Iowa, Iowa City, Iowa, United States of America
Andrew Kitchen
Roles: Conceptualization, Methodology, Validation, Writing – review & editing
Affiliation: Department of Anthropology, University of Iowa, Iowa City, Iowa, United States of America
1. Viboud C, Bjornstad ON, Smith DL, Simonsen L, Miller MA, Grenfell BT. Synchrony, Waves, and Spatial Hierarchies in the Spread of Influenza. Science. 2006;312: 447–451. pmid:16574822
2. Stark JH, Cummings DAT, Ermentrout B, Ostroff S, Sharma R, Stebbins S, et al. Local Variations in Spatial Synchrony of Influenza Epidemics. Cook AR, editor. PLoS One. 2012;7: e43528. pmid:22916274
3. Pyle GF. The Diffusion of Influenza: Patterns and Paradigms. Totowa, NJ, USA: Rowman & Littlefield; 1986.
4. Sabel CE, Pringle D, Schærström A. Infectious Disease Diffusion. A Companion to Health and Medical Geography. Oxford, UK: Wiley-Blackwell; 2010. pp. 111–132. https://doi.org/10.1002/9781444314762.ch7
5. Riley S. Large-Scale Spatial-Transmission Models of Infectious Disease. Science. 2007;316: 1298–1301. pmid:17540894
6. Russell CA, Jones TC, Barr IG, Cox NJ, Garten RJ, Gregory V, et al. The Global Circulation of Seasonal Influenza A (H3N2) Viruses. Science. 2008;320: 340–346. pmid:18420927
7. Balcan D, Colizza V, Goncalves B, Hu H, Ramasco JJ, Vespignani A. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci. 2009;106: 21484–21489. pmid:20018697
8. Charu V, Zeger S, Gog J, Bjørnstad ON, Kissler S, Simonsen L, et al. Human mobility and the spatial transmission of influenza in the United States. Salathé M, editor. PLOS Comput Biol. 2017;13: e1005382. pmid:28187123
9. Colizza V, Barrat A, Barthelemy M, Vespignani A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc Natl Acad Sci. 2006;103: 2015–2020. pmid:16461461
10. Holmes EC, Ghedin E, Halpin RA, Stockwell TB, Zhang X-Q, Fleming R, et al. Extensive Geographical Mixing of 2009 Human H1N1 Influenza A Virus in a Single University Community. J Virol. 2011;85: 6923–6929. pmid:21593168
11. Smith GJD, Bahl J, Vijaykrishna D, Zhang J, Poon LLM, Chen H, et al. Dating the emergence of pandemic influenza viruses. Proc Natl Acad Sci. 2009;106: 11709–11712. pmid:19597152
12. Dawood FS, Jain S, Finelli L, Shaw MW, Lindstrom S, Garten RJ, et al. Emergence of a Novel Swine-Origin Influenza A (H1N1) Virus in Humans. N Engl J Med. 2009;360: 2605–2615. pmid:19423869
13. Khan K, Arino J, Hu W, Raposo P, Sears J, Calderon F, et al. Spread of a Novel Influenza A (H1N1) Virus via Global Airline Transportation. N Engl J Med. 2009;361: 212–214. pmid:19564630
14. Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Van Kerkhove MD, Hollingsworth TD, et al. Pandemic potential of a strain of influenza A (H1N1): Early findings. Science. 2009;324: 1557–1561. pmid:19433588
15. Young SG, Kitchen A, Kayali G, Carrel M. Unlocking pandemic potential: Prevalence and spatial patterns of key substitutions in avian influenza H5N1 in Egyptian isolates. BMC Infect Dis. 2018;18: 1–13. pmid:29980172
16. Carrel M, Emch M. Genetics: A New Landscape for Medical Geography. Ann Assoc Am Geogr. 2013;103: 1452–1467. pmid:24558292
17. Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 2010;11: 94. pmid:20950446
18. Rambaut A, Pybus OG, Nelson MI, Viboud C, Taubenberger JK, Holmes EC. The genomic and epidemiological dynamics of human influenza A virus. Nature. 2008;453: 615–619. pmid:18418375
19. Fitch WM, Bush RM, Bender CA, Cox NJ. Long term trends in the evolution of H(3) HA1 human influenza type A. Proc Natl Acad Sci U S A. 1997;94: 7712–7718. pmid:9223253
20. Butler J, Hooper KA, Petrie S, Lee R, Maurer-Stroh S, Reh L, et al. Estimating the Fitness Advantage Conferred by Permissive Neuraminidase Mutations in Recent Oseltamivir-Resistant A(H1N1)pdm09 Influenza Viruses. PLoS Pathog. 2014;10. pmid:24699865
21. Manel S, Schwartz MK, Luikart G, Taberlet P. Landscape genetics: Combining landscape ecology and population genetics. Trends Ecol Evol. 2003;18: 189–197.
22. Storfer A, Murphy MA, Evans JS, Goldberg CS, Robinson S, Spear SF, et al. Putting the “landscape” in landscape genetics. Heredity (Edinb). 2007;98: 128–142. pmid:17080024
23. Biek R, Real LA. The landscape genetics of infectious disease emergence and spread. Mol Ecol. 2010;19: 3515–3531. pmid:20618897
24. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22: 2–4. pmid:28382917
25. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, et al. The Influenza Virus Resource at the National Center for Biotechnology Information. J Virol. 2008;82: 596–601. pmid:17942553
26. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32: 1792–1797. pmid:15034147
27. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Teeling E, editor. Mol Biol Evol. 2020;37: 1530–1534. pmid:32011700
28. Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2: vew007. pmid:27774300
29. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7: 214. pmid:17996036
30. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35: 1547–1549. pmid:29722887
31. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4: 699–710. pmid:16683862
32. Kingman JFC. The coalescent. Stoch Process their Appl. 1982;13: 235–248.
33. Rambaut A, Suchard MA, Drummond AJ. Tracer. 2013 [cited 16 May 2017]. Available: http://tree.bio.ed.ac.uk/software/tracer/
34. Drummond AJ, Rambaut A. FigTree. Available: http://tree.bio.ed.ac.uk/software/figtree/
35. Page RDM, Holmes EC. Molecular evolution: A phylogenetic approach. London, UK: Blackwell Science Ltd; 1998.
36. Paradis E, Claude J, Strimmer K. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20: 289–290. pmid:14734327
37. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967;27: 209–220. pmid:6018555
38. Oden NL, Sokal RR. Directional autocorrelation: an extension of spatial correlograms to two dimensions. Syst Zool. 1986;35: 608–617.
39. Goslee SC, Urban DL. The ecodist package for dissimilarity-based analysis of ecological data. J Stat Softw. 2007;22: 1–19.
40. Hellberg ME. Gene Flow and Isolation among Populations of Marine Animals. Annu Rev Ecol Evol Syst. 2009;40: 291–310.
41. Sork VL, Nason J, Campbell DR, Fernandez JF. Landscape approaches to historical and contemporary gene flow in plants. Trends Ecol Evol. 1999;14: 219–224. pmid:10354623
42. Storfer A, Murphy MA, Spear SF, Holderegger R, Waits LP. Landscape genetics: Where are we now? Mol Ecol. 2010;19: 3496–3514. pmid:20723061
43. ESRI. ArcGIS Desktop: Release 10.7.1. Redlands, CA; 2019.
44. Jombart T, Devillard S, Dufour AB, Pontier D. Revealing cryptic spatial patterns in genetic variability by a new multivariate method. Heredity (Edinb). 2008;101: 92–103. pmid:18446182
45. Demšar U, Harris P, Brunsdon C, Fotheringham AS, McLoone S. Principal Component Analysis on Spatial Data: An Overview. Ann Assoc Am Geogr. 2013;103: 106–128.
46. Dray S, Dufour A-B. The ade4 Package: Implementing the Duality Diagram for Ecologists. J Stat Softw. 2007;22: 1–20.
47. Porras-Hurtado L, Ruiz Y, Santos C, Phillips C, Carracedo Á, Lareu M V. An overview of STRUCTURE: applications, parameter settings, and supporting software. Front Genet. 2013;4: 1–13. pmid:23755071
48. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155: 945–959. pmid:10835412
49. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, et al. Genetic Structure of Human Populations. Science. 2002;298: 2381–2385. pmid:12493913
50. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol. 2005;14: 2611–2620. pmid:15969739
51. Jakobsson M, Rosenberg NA. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics. 2007;23: 1801–1806. pmid:17485429
52. Goudet J. HIERFSTAT, a package for R to compute and test hierarchical F-statistics. Mol Ecol Notes. 2005;5: 184–186.
53. Smith GJD, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, et al. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature. 2009;459: 1122–1125. pmid:19516283
54. Su YCF, Bahl J, Joseph U, Butt KM, Peck HA, Koay ESC, et al. Phylodynamics of H1N1/2009 influenza reveals the transition from host adaptation to immune-driven selection. Nat Commun. 2015;6: 7952. pmid:26245473
55. Yang TT, Wang ZG, Li SP, Liu XL, Yi Y, Yang Y, et al. Increased prevalence of a rare mutant of pandemic H1N1 influenza virus in a Eurasian region. Infect Genet Evol. 2011;11: 227–231. pmid:20934538
56. Li T, Fu C, Di B, Wu J, Yang Z, Wang Y, et al. A two-year surveillance of 2009 pandemic influenza a (H1N1) in Guangzhou, China: From pandemic to seasonal influenza? PLoS One. 2011;6: 1–5. pmid:22125653
57. Shen Y, Lu H. Pandemic (H1N1) 2009, Shanghai, China. Emerg Infect Dis. 2010;16: 1011–1013. pmid:20507760
58. Zhao XN, Zhang HJ, Li D, Zhou JN, Chen YY, Sun YH, et al. Whole-genome sequencing reveals origin and evolution of influenza A(H1N1)pdm09 viruses in Lincang, China, from 2014 to 2018. PLoS One. 2020;15: 1–18. pmid:32579578
59. Xiao H, Lin X, Chowell G, Huang C, Gao L, Chen B, et al. Urban structure and the risk of influenza A (H1N1) outbreaks in municipal districts. Chinese Sci Bull. 2014;59: 554–562.
60. Nelson MI, Tan Y, Ghedin E, Wentworth DE, St. George K, Edelman L, et al. Phylogeography of the Spring and Fall Waves of the H1N1/09 Pandemic Influenza Virus in the United States. J Virol. 2011;85: 828–834. pmid:21068250
61. Nelson MI, Spiro D, Wentworth D, Beck E, Fan J, Ghedin E, et al. The early diversification of influenza A/H1N1pdm. PLoS Curr. 2009;1: RRN1126. pmid:20029664
62. Zost SJ, Wu NC, Hensley SE, Wilson IA. Immunodominance and Antigenic Variation of Influenza Virus Hemagglutinin: Implications for Design of Universal Vaccine Immunogens. J Infect Dis. 2019;219: S38–S45. pmid:30535315
63. Petrova VN, Russell CA. The evolution of seasonal influenza viruses. Nat Rev Microbiol. 2018;16: 47–60. pmid:29081496
64. Bush RM, Fitch WM, Bender CA, Cox NJ. Positive selection on the H3 hemagglutinin gene of human influenza virus A. Mol Biol Evol. 1999;16: 1457–1465. pmid:10555276
65. Knossow M, Skehel JJ. Variation and infectivity neutralization in influenza. Immunology. 2006;119: 1–7. pmid:16925526
66. Webster RG, Bean WJ, Gorman OT, Chambers TM, Kawaoka Y. Evolution and ecology of influenza A viruses. Microbiol Rev. 1992;56: 152–179. pmid:1579108
67. Ina Y, Gojobori T. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses. Proc Natl Acad Sci U S A. 1994;91: 8388–8392. pmid:8078892
68. Chen X. A Tale of Two Regions in China. Int J Comp Sociol. 2007;48: 167–201.
69. Shi L. Rural migrant workers in China: scenario, challenges and public policy. 2008. Available: https://www.ilo.int/wcmsp5/groups/public/---dgreports/---integration/documents/publication/wcms_097744.pdf
70. Wei S, Lv X, Cong H. The Spatial Distribution of Industries: From “Massive Economic” to Industrial Cluster. In: Liu Z, Li X, editors. Transition of the Yangtze River Delta. Springer Japan; 2015. pp. 21–48. https://doi.org/10.1007/978-4-431-55178-2_2
71. Xu B, Tian H, Sabel CE, Xu B. Impacts of road traffic network and socioeconomic factors on the diffusion of 2009 pandemic influenza a (H1N1) in mainland China. Int J Environ Res Public Health. 2019;16: 1–14. pmid:30959783
72. Zheng Y, Wang K, Zhang L, Wang L. Study on the relationship between the incidence of influenza and climate indicators and the prediction of influenza incidence. Environ Sci Pollut Res. 2021;28: 473–481. pmid:32815008
73. Liu Y, Wang W, Li X, Wang H, Luo Y, Wu L, et al. Geographic Distribution and Risk Factors of the Initial Adult Hospitalized Cases of 2009 Pandemic Influenza A (H1N1) Virus Infection in Mainland China. PLoS One. 2011;6: e25934. pmid:22022474
74. Soares Magalhães RJ, Zhou X, Jia B, Guo F, Pfeiffer DU, Martin V. Live Poultry Trade in Southern China Provinces and HPAIV H5N1 Infection in Humans and Poultry: The Role of Chinese New Year Festivities. Nishiura H, editor. PLoS One. 2012;7: e49712. pmid:23166751
75. Liu K, Ai S, Song S, Zhu G, Tian F, Li H, et al. Population Movement, City Closure in Wuhan, and Geographical Expansion of the COVID-19 Infection in China in January 2020. Clin Infect Dis. 2020;71: 2045–2051. pmid:32302377
76. Niu X, Yue Y, Zhou X, Zhang X. How Urban Factors Affect the Spatiotemporal Distribution of Infectious Diseases in Addition to Intercity Population Movement in China. ISPRS Int J Geo-Information. 2020;9: 615.
77. Chen H, Chen Y, Lian Z, Wen L, Sun B, Wang P, et al. Correlation between the migration scale index and the number of new confirmed coronavirus disease 2019 cases in China. Epidemiol Infect. 2020. pmid:32423504
78. Smith CJ. Social geography of sexually transmitted diseases in China: Exploring the role of migration and urbanisation. Asia Pac Viewp. 2005;46: 65–80.
79. Chow AT, Cheung S, Yip PK. Wildlife markets in south China. Human-Wildlife Interact. 2014;8: 108–112.
80. Garske T, Yu H, Peng Z, Ye M, Zhou H, Cheng X, et al. Travel Patterns in China. Jones J, editor. PLoS One. 2011;6: e16364. pmid:21311745
81. Fan CC. Migration in a Socialist Transitional Economy: Heterogeneity, Socioeconomic and Spatial Characteristics of Migrants in China and Guangdong Province. Int Migr Rev. 1999;33: 954. pmid:12349707
82. Xie X, Lu SQY, Cheng JQ, Cheng XW, Xu ZH, Mou J, et al. Estimate of 2009 H1N1 influenza cases in Shenzhen—The biggest migratory city in China. Epidemiol Infect. 2012;140: 788–797. pmid:21745428
83. Zhang D, Mou J, Cheng JQ, Griffiths SM. Public health services in Shenzhen: a case study. Public Health. 2011;125: 15–19. pmid:21256365
84. Keung Wong DF, Li CY, Song HX. Rural migrant workers in urban China: Living a marginalised life. Int J Soc Welf. 2007;16: 32–40.
85. Mou J, Griffiths SM, Fong H, Dawes MG. Health of China’s rural-urban migrants and their families: A review of literature from 2000 to 2012. Br Med Bull. 2013;106: 19–43. pmid:23690451
86. Ke C, Lu J, Wu J, Guan D, Zou L, Song T, et al. Circulation of reassortant influenza A(H7N9) viruses in poultry and humans, Guangdong Province, China, 2013. Emerg Infect Dis. 2014;20: 2034–2040. pmid:25418838
87. Webster RG, Guan Y, Peiris M, Walker D, Krauss S, Zhou NN, et al. Characterization of H5N1 influenza viruses that continue to circulate in geese in southeastern China. J Virol. 2002;76: 118–126. pmid:11739677
88. Scholtissek C, Rohde W, Von Hoyningen V, Rott R. On the origin of the human influenza virus subtypes H2N2 and H3N2. Virology. 1978;87: 13–20. pmid:664248
89. Shortridge KF, Stuart-Harris CH. An Influenza Epicentre? Lancet. 1982;320: 812–813. pmid:6126676
90. Wang J, Su Q. Dì èr jūsuǒ lǚjū zhě yǔ dāngdì jūmín shèhuì hùdòng guòchéng jí jīzhì—yǐ sānyà shì wéi lì [The social interaction processes and mechanisms between second home sojourners and local residents: A case study of Sanya city, China]. Dìlǐ yánjiū. 2021;40: 462–476.
91. Lemey P, Rambaut A, Drummond AJ, Suchard MA. Bayesian phylogeography finds its roots. PLoS Comput Biol. 2009;5. pmid:19779555
92. Ma Y, Liu K, Yin Y, Qin J, Zhou YH, Yang J, et al. The Phylodynamics of Seasonal Influenza A/H1N1pdm Virus in China Between 2009 and 2019. Front Microbiol. 2020;11: 1–14. pmid:32457705
93. Murphy BR, Kasel JA, Chanock RM. Association of Serum Anti-Neuraminidase Antibody with Resistance to Influenza in Man. N Engl J Med. 1972;286: 1329–1332. pmid:5027388
© 2023 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Identifying the spatial patterns of genetic structure of influenza A viruses is a key factor for understanding their spread and evolutionary dynamics. In this study, we used phylogenetic and Bayesian clustering analyses of genetic sequences of the A/H1N1pdm09 virus with district-level locations in mainland China to investigate the spatial genetic structure of the A/H1N1pdm09 virus across human population landscapes. Positive correlation between geographic and genetic distances indicates high degrees of genetic similarity among viruses within small geographic regions but broad-scale genetic differentiation, implying that local viral circulation was a more important driver in the formation of the spatial genetic structure of the A/H1N1pdm09 virus than even, countrywide viral mixing and gene flow. Geographic heterogeneity in the distribution of genetic subpopulations of A/H1N1pdm09 virus in mainland China indicates both local to local transmission as well as broad-range viral migration. This combination of both local and global structure suggests that both small-scale and large-scale population circulation in China is responsible for viral genetic structure. Our study provides implications for understanding the evolution and spread of A/H1N1pdm09 virus across the population landscape of mainland China, which can inform disease control strategies for future pandemics.