Population Prediction of Chinese Prefecture-Level

1. Introduction

In China, the world’s most populous country, population has always been an important issue in social development. The latest statistics from China’s National Bureau of Statistics, calculated as of the end of 2019, reveal that China’s birth rate (10.48‰) and natural growth rate (3.34‰) are the lowest since 1949 and 1961, respectively. The total population of Mainland China is 1.405 billion, and the population has increased by 0.33% year on year. The labor force participation rate dropped to a historical low (68%) in 2019, indicating that the demographic dividend may soon be exhausted. In terms of demographic structure, as the proportion of the aged population continues to increase, China has already begun to suffer from rapid population aging [1]. The urbanization rate and aging rate exhibit similar growth patterns [2], while the fertility rate has been decreasing year on year [3]. China has been an aging society since 2000, requiring continuous improvements in healthcare and provisions for ensuring the welfare of aged people, and the proportion of the aged population will continue to rise. Furthermore, the fertility level and fertility pattern of young women of childbearing age, as well as the proportion of the total population who are classed as women of childbearing age, are decreasing at significant rates [4]. The number of children per woman of childbearing age is only 1.047; thus, China may now be approaching the peak of its population size. In terms of migration, due to large differences in the opportunities and welfare offered by cities of different tiers, the inflow of migrants from low-tier cities into high-tier cities continues.

As the driver of social development, population is closely related to development-related issues such as economic development, resource consumption, and ecological environmental protection. Apart from reflecting the fundamental status of the country, as well as its regions and cities, population is also a core element for further research on microenvironmental development [5]. Social problems, such as the overall slowdown of the population growth rate in China, the increasingly serious problems of sub-replacement fertility and an aging population, the year-on-year decrease in the fertility rate [6], and the large numbers of people who have migrated from rural areas into cities, have placed heavy pressure on social care and welfare systems and presented them with severe challenges. These issues may also bring risks that are as yet unforeseen and may threaten the operation of the country’s entire socioeconomic system.

Recently, the competition among the regions and cities has shifted from attracting highly educated and talented people to their areas to vying for young labor instead. The competition to attract a young labor force has intensified, which indicates that the economically developed regions are attracting more people who do not yet possess a household registration for that region (China’s household registration system is a state-implemented population management policy by which the state collects, confirms, and registers citizens’ births, deaths, familial relationships, legal addresses, and other basic information as part of population management. It informs policy decisions in areas such as employment, education, and social welfare provision) [7]. Most developed cities have been able to maintain their economic growth by absorbing young laborers, whose migration has left rural areas with an ever-greater proportion of aged inhabitants [8]. Therefore, there is an urgent need to formulate urban economic and social development plans and regional population development policies that are efficient and reasonable. To formulate relevant plans and policies scientifically, it is particularly important to accurately analyze and predict the development trends of the urban population.

Through a literature review, it is found that the mainstream population theories include Malthus’ population theory, sociological population theory, biological population theory, moderate population theory, population transition theory, population explosion theory, zero population growth theory, and so on [9,10]. Does China’s current population growth conform to the previous population theories? Based on the population agglomeration effect, economic siphon effect, the dialectical materialism of the whole and the part, mean regression, and other theories, this paper proposes a hypothesis about population theory: the future population growth of prefecture-level cities in China generally conforms to the zero population growth theory.

In the following, four models are used to predict the population development trend of 210 prefecture-level cities in China, the most accurate model data is used for further analysis, and some suggestions are put forward for future development.

2. Methodology

The prediction of population involves estimations of the scale, level, and trends of population development at a certain time in the future, using scientific methods based on the current population status and factors that may influence population development. Population prediction provides information that aids decision making. Numerous methods and models have been proposed for population prediction. Seven widely used types of population prediction methods are listed in Table 1 [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Among them, the most commonly used methods are historical trend analysis, classic population analysis, the regression model, and the gray prediction model.

China’s unique national conditions include extensive geographical coverage (land area of 9.6 million square kilometers, ranking third in the world) [28], a large number of cities (34 provincial-level administrative regions and 333 prefecture-level city administrative regions) [29], and a large population base (1.443 billion people, ranking first in the world) [30]. The candidate models were screened by weighing China’s national conditions against the advantages and disadvantages of each model (see Table 1 for details).

With the continuous improvement of science and technology, the timeliness and accuracy of statistical data keep improving. Therefore, using data close to the current year (short- and medium-term data) for prediction can reduce prediction errors to a certain extent and make the analysis results more accurate. The classical population analysis and population prediction models are not suitable for this study because they require large amounts of data accumulated over a long time. The BP neural network and the system dynamics approach are also difficult to implement here: the former requires extensive data for the training procedure [31], and the latter has a higher degree of subjectivity in its parameter-setting process and is prone to problems such as the omission of important variables [32].

However, the Malthusian model, the unary linear regression model, the logistic competitive biological model (or logistic model), and the gray prediction model are suitable for China’s national conditions and the analysis requirements of this study due to the characteristics of requiring small amounts of data, appropriate suitability for short- and medium-term prediction, and strong explanatory ability.

Therefore, this paper proposes a hypothesis about the model: all four models can achieve decent results in population prediction, and among them the gray prediction model has the highest prediction accuracy. Next, prediction and analysis are carried out using these four methods, each of which is introduced below.

2.1. Malthusian Model

This model was proposed by British demographer Thomas Malthus in 1798 and is based on the principle that, under normal circumstances, the population growth rate within a certain period can be approximately regarded as a certain value, and the population is predicted by an exponential growth function. The model is as follows:

(1) $P_{t} = P_{0} {(1 + r)}^{n},$

where $P_{t}$ is the population size at the end of the prediction target year, $P_{0}$ is the population size in the prediction base year, $r$ is the average annual population growth rate, and $n$ is the prediction period in years.
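As a minimal sketch of Equation (1), the projection can be written in a few lines of Python. The function names and the example figures are hypothetical and are not taken from the study; the helper that backs out $r$ from two endpoint populations is only one possible way to obtain an average annual rate, whereas Section 3.1.1 sets $r$ from observed natural growth rates.

```python
def malthus_projection(p0: float, r: float, n: int) -> float:
    """Population after n years under Equation (1): P_t = P_0 * (1 + r) ** n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return p0 * (1.0 + r) ** n


def average_annual_growth_rate(p_start: float, p_end: float, n: int) -> float:
    """Constant annual rate r that turns p_start into p_end after n years.

    Assumption for illustration only: r is recovered by inverting Equation (1).
    """
    if n <= 0 or p_start <= 0:
        raise ValueError("n and p_start must be positive")
    return (p_end / p_start) ** (1.0 / n) - 1.0


# Hypothetical city: 5.00 million people in the base year, growing 0.5% per year.
base_population = 5_000_000.0
projection = [malthus_projection(base_population, 0.005, n) for n in range(21)]
```

Because the growth rate is held constant, the projected curve is exponential, which is why the model is usually restricted to short- and medium-term horizons.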

2.2. Unary Linear Regression Model

The unary linear regression model is as follows:

(2) $Y = a + b X,$

Among the terms above, the estimated values of the parameters are:

(3) $\hat{a} = \bar{Y} - \hat{b} \bar{X},$

(4) $\hat{b} = \frac{\sum (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sum {(X_{i} - \bar{X})}^{2}} = \frac{\sum X_{i} Y_{i} - n \bar{X} \bar{Y}}{\sum {(X_{i} - \bar{X})}^{2}},$

where time $X$ is the control variable and population $Y$ is the state variable. The parameters $a$ and $b$ can be estimated from historical data using Equations (3) and (4), which are the ordinary least squares (OLS) estimates.
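The estimates in Equations (3) and (4) can be sketched directly in Python without any external library. The function name and the sample data below are made up for illustration and do not come from the yearbook data used in this study.

```python
from typing import Sequence, Tuple


def fit_unary_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (a_hat, b_hat) for Y = a + b * X using Equations (3) and (4)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                 # Equation (4)
    a_hat = y_bar - b_hat * x_bar     # Equation (3)
    return a_hat, b_hat


# Made-up series (units: 10,000 persons) that grows by exactly 3 per year.
years = [2014, 2015, 2016, 2017, 2018]
population = [500.0, 503.0, 506.0, 509.0, 512.0]
a_hat, b_hat = fit_unary_linear(years, population)
forecast_2020 = a_hat + b_hat * 2020
```

With perfectly linear input the fitted slope equals the yearly increment, so the forecast simply extends the straight line; real series deviate from this, which is the source of the long-horizon error discussed in this section.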

In modeling population growth, unary linear regression can be used for population estimation when the population growth in each period is relatively close; in other words, when the slopes of the tangent lines at all points on the population development curve are approximately the same, so that growth is approximately linear. Subject to this condition, the model is more suitable for short-term predictions. When it is used for long-term predictions, the errors caused by population changes are gradually amplified, affecting the accuracy of the prediction results.

2.3. Logistic Model

The logistic competitive biological model considers the limitations of population growth and proposes a law of population growth, which says that as the total population grows, the population growth rate will gradually decrease.

Let $p (t)$ be the population size at time $t$. When the population is large, $p (t)$ can be considered continuous or even differentiable. Let $b (t, p)$ represent the birth rate of a single individual in the population per unit of time at time $t$, and let $d (t, p)$ represent the death rate per unit of time at time $t$. Consequently, $r (t, p) = b (t, p) - d (t, p)$ represents the net growth rate per individual at time $t$. Over a short interval $Δ t$, the change in population is:

(5) $Δ p = p (t + Δ t) - p (t) = r (t, p) \, p (t) \, Δ t,$

Let $Δ t \to 0,$ then we have:

(6) $\frac{d p}{d t} = r (t, p) \, p,$

When $r (t, p) = a$ is a constant, then:

(7) $p^{'} = a p, \quad p (t_{0}) = p_{0},$

This is the idea behind the Malthusian model.

Owing to environmental constraints, competition between individuals results in a decrease in $r (t, p)$ as the population grows. Therefore, the Belgian mathematician Verhulst proposed to add the competition term $- b p^{2}$ $(b > 0)$, where the constant $b$ is a competition coefficient distinct from the birth rate $b (t, p)$ above. Thus, we get:

(8) $p^{'} = a p - b p^{2}, \quad p (t_{0}) = p_{0},$

The above equation is the logistic competitive biological model.
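The logistic equation with a linear growth term $a p$ and a quadratic competition term $- b p^{2}$ has a well-known closed-form solution that saturates at the level $a / b$. The following Python sketch evaluates that solution; the function name and the parameter values are illustrative assumptions, not estimates from this study.

```python
import math


def logistic_population(t: float, t0: float, p0: float, a: float, b: float) -> float:
    """Closed-form solution of p' = a*p - b*p**2 with p(t0) = p0."""
    if a <= 0 or b <= 0 or p0 <= 0:
        raise ValueError("a, b and p0 must be positive")
    capacity = a / b  # long-run saturation level of the population
    return capacity / (1.0 + (capacity / p0 - 1.0) * math.exp(-a * (t - t0)))


# Hypothetical parameters: saturation level a / b = 600 (units: 10,000 persons).
a, b = 0.03, 0.00005
path = [logistic_population(year, 2018, 500.0, a, b) for year in range(2018, 2031)]
```

Unlike the exponential curve of the Malthusian model, this path flattens as the population approaches $a / b$, which matches the stated law that the growth rate falls as the total population grows.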

2.4. Gray Prediction Model

Due to certain conditions that are particular to some cities (such as a unique culture, special policies, etc.), the development of the population in some urban areas is a more complicated, variable, and nonstationary random process. Thus, static models, such as linear regression models and exponential growth models, cannot guarantee accurate prediction results. On certain occasions, the confidence interval of the prediction result is extremely large due to errors in the model; thus, the results are of no practical significance. The gray prediction model generates a dynamic or nondynamic white module from a known data sequence according to a certain rule, extracts the underlying regularity from the noisy raw data, and builds a gray model on which the prediction is based. Its low information requirement for modeling, ease of operation, and high accuracy make the model significantly better than the others at reducing the randomness of the data and improving the accuracy of prediction.

With reference to Wang [33], Camelia [34,35], and Gao et al. [36], the GM (1, 1) model, which is a method of the gray prediction model, is suitable for population prediction. The prediction steps taken by this study are as shown below.

Step 1. Set the original sequence $\{X_{t}\} = \{X_{1}, X_{2}, X_{3}, \dots, X_{n}\}$ $(t = 1, 2, 3, \dots, n)$ and conduct accumulative generation on $\{X_{t}\}$ to obtain the new sequence $\{Y_{t}\}$ and the mean-value sequence $\{Z_{t}\}$:
(9) $Y_{t} = \sum_{i = 1}^{t} X_{i}, \quad Z_{t} = \frac{Y_{t} + Y_{t - 1}}{2} \quad (t = 2, 3, \dots, n),$
Step 2. Establish a first-order differential equation:
(10) $\frac{d Y_{t}}{d t} + a Y_{t} = u,$
where $a$ is the gray developmental coefficient and $u$ is the gray action quantity.
Step 3. Conduct parameter estimation using OLS:
(11) ${(\hat{a}, \hat{u})}^{T} = {(B^{T} B)}^{- 1} B^{T} λ,$

(12) $B = {[\begin{matrix} - Z_{2} & - Z_{3} & \dots & - Z_{n} \\ 1 & 1 & \dots & 1 \end{matrix}]}^{T}, \quad λ = {[\begin{matrix} X_{2} & X_{3} & \dots & X_{n} \end{matrix}]}^{T},$
Step 4. Obtain the time function:
(13) ${\hat{Y}}_{t + 1} = (X_{1} - \frac{u}{a}) e^{- a t} + \frac{u}{a},$
and then calculate the estimated value of $X_{t}$:
(14) ${\hat{X}}_{t} = {\hat{Y}}_{t} - {\hat{Y}}_{t - 1} \quad (t = 2, 3, \dots, n), \quad {\hat{X}}_{1} = X_{1},$
Step 5. Test the fitting effect using the after-test (posterior error) rule.
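
The GM (1, 1) steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the least-squares step is solved through the 2×2 normal equations instead of explicit matrix algebra, the input series is made up, and the after-test check of the last step is not included.

```python
import math
from typing import List, Sequence, Tuple


def fit_gm11(x: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimates (a, u) of the GM(1, 1) parameters."""
    n = len(x)
    if n < 4:
        raise ValueError("this sketch expects at least four observations")
    # Accumulated sequence Y_t and mean-value sequence Z_t (t = 2..n).
    y: List[float] = []
    running_total = 0.0
    for value in x:
        running_total += value
        y.append(running_total)
    z = [(y[t] + y[t - 1]) / 2.0 for t in range(1, n)]
    target = list(x[1:])  # X_2, ..., X_n
    # Solve the 2x2 normal equations of X_t = -a * Z_t + u directly.
    m = n - 1
    sum_z = sum(z)
    sum_x = sum(target)
    sum_zz = sum(zi * zi for zi in z)
    sum_zx = sum(zi * xi for zi, xi in zip(z, target))
    det = m * sum_zz - sum_z * sum_z
    if det == 0:
        raise ValueError("degenerate mean-value sequence")
    a = (sum_z * sum_x - m * sum_zx) / det
    u = (sum_zz * sum_x - sum_z * sum_zx) / det
    return a, u


def predict_gm11(x: Sequence[float], horizon: int) -> List[float]:
    """Fitted values for t = 1..n followed by `horizon` forecasts."""
    a, u = fit_gm11(x)
    if a == 0:
        raise ValueError("a = 0: the exponential time function is undefined")

    def y_hat(k: int) -> float:
        # Time function: Y_hat_k = (X_1 - u/a) * exp(-a * (k - 1)) + u/a.
        return (x[0] - u / a) * math.exp(-a * (k - 1)) + u / a

    n = len(x)
    # Restore X_hat_k = Y_hat_k - Y_hat_{k-1}, with X_hat_1 = X_1.
    return [x[0]] + [y_hat(k) - y_hat(k - 1) for k in range(2, n + horizon + 1)]


# Made-up series growing 2% per year (units: 10,000 persons).
history = [100.0 * 1.02 ** k for k in range(8)]
forecast = predict_gm11(history, horizon=3)
```

For a series with a constant growth ratio the fitted values track the data very closely, which illustrates why the model can perform well on short, smooth population series; a negative $a$ corresponds to a growing sequence.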

3. Results

The vast majority of previous studies predicted China’s total population at the national and/or provincial level, while only a small minority focused on predicting city populations using large samples. Although the prefecture-level city is not the smallest administrative division in China (smaller units include counties and districts), prefecture-level cities are important driving forces behind China’s economic development, and policy decisions made at the city level have a large effect on wider regional development. Therefore, it is necessary to analyze the data of prefecture-level cities.

This study focuses on the analysis of urban population data at the prefecture level using data from the China City Statistical Yearbook from 1999 to 2018. First, cities whose population data had missing values or abnormal year-to-year changes in population structure were identified and eliminated from this study. Next, cities that had been merged, split, or renamed during the selected years were eliminated. Finally, the population data of 210 prefecture-level cities from 1999 to 2018 were retained out of 333 prefecture-level cities. Predictions were conducted using the four above-mentioned models, and the results with the highest degree of accuracy were chosen for subsequent analysis. Owing to the large number of cities involved in this research, it is impossible to list the prediction data of all 210 cities. Therefore, only the prediction results of four first-tier cities are listed: Beijing, Shanghai, Guangzhou, and Shenzhen. (The city classification system adopted in this paper mirrors China’s new city classification list. The list evaluates prefecture-level cities according to five dimensions: business resource concentration, urban hub, urban activity, lifestyle diversity, and future plasticity. These dimensions decide the city’s tier: first-tier, new first-tier, second-tier, third-tier, fourth-tier, or fifth-tier. It is sometimes useful to group these city tiers as follows: the high-tier group includes first- and new first-tier cities, the middle-tier group includes second- and third-tier cities, and the low-tier group includes fourth- and fifth-tier cities.)

3.1. Population Prediction Results

3.1.1. Malthusian Model

According to the population development pattern of the four first-tier cities, based on population data on these cities from 1999 to 2018, three scenarios that utilize high, middle, and low natural population growth rates for prediction were selected.

With reference to the model used by Zhang et al. [5] for population prediction, the specific settings of growth rates for the high-, low- and middle-value scenarios are as follows. The high-value scenario uses the maximum natural population growth rate in 1999–2018, and the low-value scenario uses the minimum natural population growth rate in 1999–2018. The middle-value scenario uses the mean value of the growth rates of the high- and low-value scenarios as the annual population growth rate. According to the Malthusian population model, with 2010 as the base period, the predicted values of each city’s total population from 2011 to 2030 can be obtained from the high-, middle- and low-value scenarios. The prediction results are shown in Table 2 below.
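The scenario construction described above (maximum rate, minimum rate, and their mean) can be sketched as follows. The function names, the base population, and the growth rates are hypothetical placeholders, not values from Table 2.

```python
from typing import Dict, List, Sequence


def scenario_rates(natural_growth_rates: Sequence[float]) -> Dict[str, float]:
    """High = maximum observed rate, low = minimum, middle = mean of the two."""
    high = max(natural_growth_rates)
    low = min(natural_growth_rates)
    return {"high": high, "middle": (high + low) / 2.0, "low": low}


def project_scenarios(
    base_population: float, natural_growth_rates: Sequence[float], years: int
) -> Dict[str, List[float]]:
    """Malthusian projections for years 1..`years` after the base year."""
    rates = scenario_rates(natural_growth_rates)
    return {
        name: [base_population * (1.0 + r) ** n for n in range(1, years + 1)]
        for name, r in rates.items()
    }


# Hypothetical natural growth rates observed over the sample period.
observed_rates = [0.0031, 0.0042, 0.0058, 0.0049, 0.0036]
# 20 projected years, e.g. 2011-2030 from a 2010 base of 1000 (10,000 persons).
paths = project_scenarios(1000.0, observed_rates, years=20)
```

The three paths bracket the plausible range under a constant-rate assumption; comparing them against observed years is what the error tests in Section 3.2.1 do.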

3.1.2. Unary Linear Regression Model

With reference to the approach of Jia et al. [37], the population data from 1999 to 2018 are divided into two data sequences of 2009–2018 (10-year sample) and 1999–2018 (20-year sample) according to different sample sizes. The final prediction results are as shown in Table 3.

3.1.3. Logistic Model

Equation (8) is solved by separation of variables and integration. Subsequently, a logistic population prediction model for the 210 cities is established. We again use two data sequences: the 10-year sample from 2009–2018 and the 20-year sample from 1999–2018. The prediction results are shown in Table 4.

3.1.4. Gray Prediction Model

Following the five steps listed in Section 2.4, the prediction results of the GM (1, 1) model are shown in Table 5.

3.2. Error Tests and Analysis

3.2.1. Malthusian Model

To get as accurate a forecast as possible, based on the previous prediction results of the Malthusian model, the predicted values of the high-, middle- and low-value scenarios for each of the cities were compared with the existing population data from 2011 to 2018. Thus, the results of the error tests on the three scenarios were obtained (Table 6).
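A minimal sketch of such an error test is given below. The source does not state the exact error formula, so the sign convention (predicted minus actual, divided by actual) and the function names are assumptions; the numbers are made up.

```python
from typing import List, Sequence


def relative_errors(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Signed relative errors; assumed convention: (predicted - actual) / actual."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [(p - a) / a for p, a in zip(predicted, actual)]


def mean_relative_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of the signed relative errors (positive and negative errors cancel)."""
    errors = relative_errors(predicted, actual)
    return sum(errors) / len(errors)


def mean_absolute_relative_error(
    predicted: Sequence[float], actual: Sequence[float]
) -> float:
    """Average of the absolute relative errors (no cancellation)."""
    errors = relative_errors(predicted, actual)
    return sum(abs(e) for e in errors) / len(errors)


# Made-up predictions and observations for two years (units: 10,000 persons).
predicted = [102.0, 99.0]
actual = [100.0, 100.0]
signed = mean_relative_error(predicted, actual)
unsigned = mean_absolute_relative_error(predicted, actual)
```

A signed average close to zero can hide large errors of opposite sign, so reporting the absolute version alongside it is a common safeguard.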

The comparison reveals that, for all the listed first-tier cities, the population predictions in the high-value scenario are the most accurate. However, when the sample is expanded to 210 cities, the average relative errors of the predictions in the high-, middle- and low-value scenarios are 3.88%, 0.82%, and −2.12%, respectively. Therefore, considering the overall prediction results, the predicted values of the middle-value scenario deviate only slightly from the actual populations and have the smallest overall error. Under this model, the middle-value scenario is the most reliable.

3.2.2. Unary Linear Regression Model

With the prediction steps introduced in Section 3.1, the prediction results of the unary regression model are as follows (Table 7).

The results of the error tests in Table 7 reveal that the prediction model fitted to the 10-year sample has a smaller error (both absolute and relative) than that fitted to the 20-year sample, possibly owing to irregularities caused by a longer data sequence. It also shows that such a model is better suited to short-term population prediction, as the accuracy of the prediction results may decline as the prediction horizon lengthens.

3.2.3. Logistic Model

The following prediction results are obtained from the logistic biological competition model (Table 8).

Table 8 shows that the prediction results for the four first-tier cities fit the actual populations well. The average relative errors of the 10-year and 20-year samples are 0.371% and −0.303%, respectively. The overall prediction accuracy of the logistic competitive biological model is high, and the accuracy of the 20-year sample is somewhat higher. This suggests that the characteristics of the data cannot be fully reflected when a shorter time series is used, and that the use of a longer time series improves the model’s accuracy.

3.2.4. Gray Prediction Model

The average prediction error of the GM (1, 1) model is shown in Table 9 below.

The results in Table 9 show that, compared to the three models and methods above, the average relative error of the gray prediction method is the lowest of the four. For the first-tier cities, being popular immigration destinations, the errors increased slightly with time over the last few years. This is probably caused by increasing population flow into first-tier cities. Through error testing and numerical prediction, the prediction results of the 10-year sample size prove to be more accurate, with an average relative error of only 0.005%, which confirms that the GM (1, 1) model has high accuracy while requiring less data.

As shown in Table 10, a comprehensive comparison of the errors of the four prediction models shows that the GM (1, 1) model fitted to the 10-year sample has the smallest error: only 0.005%. With regard to the hypothesis about the model stated above, all four models make broadly appropriate predictions, and GM (1, 1) has the highest accuracy. Thus, the hypothesis is supported.

Therefore, the prediction data of the GM (1, 1) model were selected for analysis of the future development trends in the remainder of this article.

4. Discussion

The calculations in this section refer to the data analysis methods of Langford [38] and Koenker and Bassett [39]. Based on the quartile analysis method, the box plots of the four sets of data for the years 2000, 2010, 2020, and 2030 are shown to facilitate intuitive analysis, as well as the comparison of the average levels and degrees of variation among multiple data sets. The box plots are shown in Figure 1 below.

The analysis shows that in the four selected years (2000, 2010, 2020, and 2030), among the 210 cities selected, Chongqing has the largest population (3091.09, 3303.45, 3432.93, and 3552.76, respectively; populations here are in units of 10,000 persons), while Jiayuguan has the smallest population (15.96, 21.8, 21.34, and 22.02, respectively). The median urban population was 425.06, 465.71, 491.16, and 522.56, respectively. The data dispersion in each year is relatively high, the population distributions are all right-skewed, and all of them contain extreme values. This is because, in different years, the populations of Chongqing, Shanghai, Chengdu, and Nanyang are extremely large within their respective groups and are judged to be outliers, which raises the average population in each year and leads to the right skewness of the data. The population growth pattern of the cities exhibits spatial differences over the years, while the dispersion of city populations keeps increasing, resulting in enormous numerical gaps between cities in terms of their populations.
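
The quartile summary behind a box plot can be sketched in Python as follows. The source does not state which outlier rule was applied, so the common 1.5 × IQR fences are used here as an assumption; the quantile interpolation scheme, the function names, and the sample values are likewise illustrative.

```python
from typing import Dict, List, Sequence, Union


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)  # floor, since position >= 0
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def box_plot_summary(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Quartiles, mean, and outliers beyond the assumed 1.5 * IQR fences."""
    data = sorted(values)
    q1, median, q3 = (quantile(data, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": sum(data) / len(data),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up city populations (units: 10,000 persons) with one very large city.
summary = box_plot_summary([100.0, 200.0, 300.0, 400.0, 500.0, 3000.0])
```

In this toy sample the single large value is flagged as an outlier and pulls the mean well above the median, which is the right-skew pattern described above.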

The growth in populations has accelerated the speed at which cities change their spatial layout, adopt different development models, and adapt in other ways to accommodate this growth. Cities also profit from the economic benefits of population growth, which drives up demand for goods and services while providing additional labor. Consequently, there is an interaction between population growth and city expansion. As the public calls for regional development, the role of metropolitan areas in delivering development has become increasingly important [22,40].

For a clearer and more intuitive analysis of the development trends of the cities’ populations, the latitude and longitude coordinates of the above-mentioned 210 prefecture-level cities were imported into the geo module of pyecharts (v1.9.0) in Python (v3.7). The population prediction results from the GM (1, 1) model were matched with the imported cities to draw the spatial distribution of the cities’ populations in 2000, 2010, 2020, and 2030.

As shown in Figure 2 and Figure 3, the populations of the sample cities have grown steadily in general but the population growth rates have slowed down significantly. In terms of specific cities, the total populations of the first-tier cities and new first-tier cities are stabilizing, while the growth rates of these cities all show a downward trend. The population growth rates of all cities fell below 1%, except for Guangzhou, Shenzhen, Chengdu, Hangzhou, Xi’an, Changsha, and Dongguan. Among them, the population growth rate of Zhengzhou City was −1.05%. For some cities, including Beijing, due to the implementation of population regulations, as well as factors such as housing prices and living costs, there is a noticeable growth in reverse migration.

As for total population, most first-tier and new first-tier cities have populations between 5 and 10 million. It is expected that the populations of Xi’an and Guangzhou will exceed 10 million in 2023 and 2025, respectively. In addition, China has 19 first-tier and new first-tier cities, and it is expected that the population of seven of these cities will exceed 10 million by 2030.

The population growth rates of second- and third-tier cities are increasing steadily, and those of most of these cities fluctuate around 0.5%. Among them, the population growth rates of Zhuhai, Foshan, Huizhou, Zhongshan, Guiyang, Handan, Langfang, Putian, Linyi, and Yinchuan are higher than those of other second- and third-tier cities.

As for total population, only three second- or third-tier cities (Baoding, Linyi, Nanyang) had a population exceeding 10 million in 2000. Owing to the rapid development of middle-tier cities (second-tier cities and third-tier cities), it is expected that there will be nine second- or third-tier cities (Shijiazhuang, Xuzhou, Handan, Baoding, Fuyang, Ganzhou, Linyi, Nanyang, and Shangqiu) with a population exceeding 10 million in 2030. Almost all of these cities have populations above 1 million; the only one with fewer than 1 million people is Sanya (643,600 people). The population sizes of the fourth- and fifth-tier cities are significantly smaller, but numerous low-tier cities have a population in the range of 2–5 million. In recent years, their population growth has slowed down, and the population growth rates of most of them have been negative. It is expected that in 2030, Yichun (969,000 people), Tongchuan (763,000 people), Shizuishan (739,200 people), Jinchang (431,500 people), Wuhai (316,300 people), Jiayuguan (217,800 people), and Karamay (169,600 people) will still each have a population under 1 million.

Owing to the ever-expanding economic advantages and talent introduction policies of core cities, their agglomeration effects have been continuously enhanced [40]. To further analyze the trend of inter-regional population development in the future, Figure 4 and Figure 5 are drawn by overlaying the original urban agglomeration distribution data upon a map with latitude and longitude coordinates for each city, sourced from Google Maps, to show the distribution of urban agglomerations in China.

The analysis of cities shows that the overall population development of the north and south of the country is relatively stable, but the disparity between cities is obvious. (The geographical boundary between north and south China is the Qinling-Huai River line, along the southeast side of the Qinghai-Tibet Plateau, which coincides with the 800 mm annual precipitation line and the January 0 °C isotherm. The area north of this line is called Northern China, and the area south of this line is called Southern China.) In the sample data of 210 cities, the data from 1999 to 2018 show that the average annual population growth of southern cities (totaling 106) was 3.862 million people, and that of northern cities (totaling 104) was 2.652 million people. The development characteristic can be described as fast in the southern cities and slow in the northern cities. In the prediction data for 2021–2030, the average annual population growth in the southern and northern cities is expected to be 2.0451 million and 1.9081 million, respectively, indicating a population growth trend that is roughly equal in the southern cities and the northern cities.

As for the urban agglomerations, as shown in Figure 4 and Figure 5, the overall population growth of the sample cities is steady but slowing down. Population growth in the northern cities is gradually catching up with that in the southern cities: the ratio of the population growth in the northern urban agglomerations to that of the southern urban agglomerations is expected to increase from 0.73 in 1999–2018 to 0.79 in 2021–2030. Among the agglomerations, the population growth of the urban agglomeration in the middle reaches of the Yangtze River is slowing significantly. The population growth rates of Yichang, Jingmen, Xiaogan, Huanggang, Jingzhou, Xiangtan, and Changde are all negative, and the average annual population growth is expected to be 259,600 from 2021 to 2030. The population growth of the Central Plains urban agglomeration and the Yangtze River Delta urban agglomeration has been progressing steadily. The growth rates are stable at around 1% and 0.5% in the long term, and it is expected that the average annual population growth in the next 10 years will reach 774,900 and 420,500, respectively.

Regarding the metropolitan areas, the core metropolitan areas, including the interlocking metropolitan areas of the Yangtze River Delta, the Guangdong–Hong Kong–Macao Greater Bay Area, and the capital, Beijing, have attracted talent from across the nation because they offer a solid economic base, favorable locations, and policy advantages. Consequently, the population has grown especially rapidly in these areas. It is expected that their average annual population growth in 2021–2030 will exceed 400,000. Among these areas, the average annual population growth of the Guangdong–Hong Kong–Macao Greater Bay Area is expected to exceed 600,000. For the city circles represented by Shenyang, Changchun, Xi’an, and Chengdu, the population growth rates are negative. In particular, the city population growth rates in the northeastern regions have decreased rapidly due to economic depression and the harsh climate there, and there is a serious outflow of population.

Overall, the future population growth of prefecture-level cities in China broadly conforms to the theory of zero population growth, and the hypothesis about population theory stated above is basically valid. This study finds that population growth exhibits inconsistent patterns between different city tiers. The population growth of high-tier cities (first-tier cities and new first-tier cities) is slowing, the populations of low-tier cities (fourth-tier cities and fifth-tier cities) are experiencing negative growth, and the populations of middle-tier cities are growing rapidly. With the continuous enhancement of the regional integration of large urban agglomerations and metropolitan areas, the layout of scattered groups has been replaced by a larger landscape developing at a uniform pace. However, the promotional effect of core urban agglomerations and metropolitan areas on the surrounding areas is still not significant.

5. Conclusions

Population growth is a source of economic and social vitality, the foundation of innovation and entrepreneurship, and a core element in encouraging urban development. China is now facing population problems such as the overall slowdown of the population growth rate, intensifying sub-replacement fertility and population aging, the decrease in the fertility rate, and overly rapid population mobility. Furthermore, the competition to attract labor has turned into a competition to retain labor as migration becomes ever easier.

Based on data from city statistics bureaus, statistical yearbooks, and population censuses, this study used the Malthusian model, the unary linear regression model, the logistic model, and the gray prediction model to predict the populations of 210 prefecture-level cities and analyzed the characteristics of urban population distribution and development. The study found that the pattern of total urban population growth has changed from fast growth in developed cities and slow growth in underdeveloped cities to slow growth in high- and low-tier cities and fast growth in middle-tier cities. City population growth rates have changed from fast in the southern cities and slow in the northern cities to being basically balanced between north and south. Comparing the populations of cities at different tiers, the high-tier cities once dominated the top rankings, but the middle- and low-tier cities are gradually catching up and weakening that dominance. The target of the urban agglomerations and metropolitan areas is gradually shifting from integration to urban cohesion, although the population gaps between cities tend to widen; as the population growth rate declines, China’s total population is likely to be relatively stable in the future. In addition to the population aggregation effect, the economic siphon effect, and other factors, fertility decline may also be an important reason for this phenomenon. Cities with faster economic development attract highly educated people [40], and their high expectations for potential marriage partners, coupled with the high cost of marriage driven by China’s skyrocketing housing prices, may reduce the willingness of highly educated people to marry [41]. Those who are married, especially highly educated married women, often choose to have only one child or to give up work because it is difficult to balance work and family [42], which also contributes to slow urban population growth.
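As a rough illustration of three of the model families named above (the unary linear regression is omitted for brevity), the following Python sketch uses standard textbook forms of the Malthusian, logistic, and GM(1,1) gray models. The function names, parameter values, and the synthetic series are hypothetical and are not taken from the study; the authors' own parameterization and fitting procedure may differ.

```python
import math


def malthus(p0, r, t):
    """Malthusian (exponential) growth: P(t) = P0 * exp(r * t)."""
    return p0 * math.exp(r * t)


def logistic(p0, r, k, t):
    """Logistic growth toward carrying capacity k, starting from p0 at t = 0."""
    return k / (1.0 + (k / p0 - 1.0) * math.exp(-r * t))


def gm11(x0, horizon):
    """Fit a GM(1,1) gray model to x0; return fitted values plus `horizon` forecasts.

    Needs at least three observations and a non-constant series
    (otherwise the development coefficient a is zero or undefined).
    """
    n = len(x0)
    # Accumulated generating sequence x1(k) = x0(0) + ... + x0(k).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[i] + x1[i - 1]) for i in range(1, n)]
    y = list(x0[1:])
    # Least squares for x0(k) = -a * z(k) + b (ordinary simple regression).
    z_mean = sum(z) / len(z)
    y_mean = sum(y) / len(y)
    slope = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y)) / sum(
        (zi - z_mean) ** 2 for zi in z
    )
    a = -slope
    b = y_mean - slope * z_mean
    # Time response of the accumulated series, then difference back to x0.
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n + horizon)]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + horizon)]


# Synthetic example: eight years of 1% exponential growth from 100 units.
series = [malthus(100.0, 0.01, t) for t in range(8)]
forecast = gm11(series, horizon=3)
```

On a smoothly growing series such as this one, the GM(1,1) forecast stays very close to the exponential curve. The models diverge mainly when growth slows toward a ceiling, which is the case the logistic form is meant to capture.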

In the future, the state should accelerate the reform of the household registration system and optimize the employment promotion mechanism. It should follow the pattern of urbanization and guide the population to migrate to small- and medium-sized cities surrounding the densely populated urban agglomerations and metropolitan areas, in order to coordinate regional populations and connect different parts of the nation. The state must also proactively strengthen the leading role of core cities and regions while facilitating the harmonious development of regional economies. It must identify the broadest scope of common interests, target the greatest common denominator of mutual benefit, promote industrial development, upgrade and renew the regional populations, and move high-quality talent to the core regions. Further, the state should focus on building a policy system and social environment conducive to elderly care and filial piety, and remove all policy constraints on childbirth. This should encourage people to marry and have children at an appropriate age, let families make their own reproductive decisions, and further accelerate the construction of a childbirth support system.

This study used microdata on the populations of prefecture-level cities in China to make population predictions, used visualization software to show the development and change of the future population more intuitively, and combined the analysis of prefecture-level cities and urban agglomerations to provide suggestions that can help the government support better urban development. The study has reference value for research on urban population development and urban agglomeration development. Owing to the limitations of the population data system, the data granularity in this study reaches only the prefecture level. With more detailed data, the practical and theoretical significance of this work would be more prominent.

Author Contributions

All the authors equally contributed to this research. Conceptualization, J.D. and L.C.; methodology, J.D., X.L. and L.C.; writing—original draft, L.C. and T.M.; writing—review and editing, L.C. and T.M.; visualization, T.M.; data curation, L.C. and T.M.; investigation, L.C.; supervision, J.D. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 71573244, 71850014, and 71974180).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data can be obtained by email from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures and Tables

Figure 1. Population box plots for the selected four years.

Figure 2. The spatial distribution of the population of the cities (population growth rate): the spatial distribution diagrams of prefecture-level urban populations in 2000, 2010, 2020, and 2030 (Unit: %). Note: this diagram uses the base map that comes with the Python pyecharts package. The base map is not modified. The bold gray lines in the figure indicate the geographical boundary between Northern and Southern China.

Figure 3. The spatial distribution of the population of the cities (population): the spatial distribution diagrams of prefecture-level urban populations in 2000, 2010, 2020, and 2030 (Unit: ten thousand people). Note: this diagram uses the base map that comes with the Python pyecharts package. The base map is not modified. The bold gray lines in the figure indicate the geographical boundary between Northern and Southern China.

Figure 4. Spatial distribution of the urban agglomeration areas. Note: this diagram uses the administrative map of the People’s Republic of China from https://beijing.tianditu.gov.cn/ (accessed on 31 May 2022). The base map is not modified. The urban agglomerations shown in this diagram are the approximate ranges of the urban agglomerations in China, covering all the prefecture-level cities but not differentiating county-level cities, so they might be larger or smaller than the exact areas of the urban agglomerations. The bold black lines in the figure indicate the geographical boundary between Northern and Southern China.

Figure 5. Urban agglomeration areas. Note: this diagram uses the administrative map of the People’s Republic of China from https://beijing.tianditu.gov.cn/ (accessed on 22 May 2022). The base map is not modified. Owing to the unavailability of data and the difficulty of defining urban agglomeration boundaries, the figure shows the approximate range of each urban agglomeration area. The bold gray lines in the figure indicate the geographical boundary between Northern and Southern China. The cities are listed in no particular order.

Table 1

Summary of mainstream population prediction methods.

Method	Core Content	Hypothesis	Pros	Cons
Historical trend method	Historical growth trends, demographic characteristics, inertial measurement of population, Malthus model	Future population would show the same pattern as population in the past	The method is classical and widely used	Long-term prediction accuracy will decline
Classical population analysis	Life table technology, hypothetical cohort analysis, population prediction technology and total fertility rate	Accounting for birth and death of people, the measure of population is not the only factor	High prediction accuracy	The demand for data is high and requires long-term accumulation
Regression model	Logistic model, ARIMA model, Leslie matrix, unary and multiple linear regression models	Sociological researchers assume that population growth has a linear or quasi-linear relationship with certain factors	Less data is required and the method is simple	Long-term prediction errors fluctuate relatively widely
Gray system prediction model	GM (1, 1) model	The trend of population is a fuzzy system, which means the status in the past cannot be used to predict the accurate population growth, only the probability of the status in the future	Less required data, high accuracy of prediction	Only for short- and medium-term forecasts
Population prediction model	Gravity model, radiation model, Song Jian population equation model, population flow model, etc.	Spatial interaction among cities reflects the growth pattern of population	The model form is more intuitive and explicable	The data requirements are strict and the model is complex
BP neural network, artificial intelligence	Nonlinear dynamics systems, mathematical equations, etc.	High-order statistical regularities learned from big data, which cannot be computed from small amounts of data, might be suitable for population prediction	Strong ability of nonlinear mapping and generalization	Slow convergence of the algorithm and poor interpretability
System dynamics	Differential dynamical equations and nonlinear stochastic processes	Population is embedded in the whole system of society and might be correlated with a number of factors	Can find the root cause from the internal structure of the system	Strongly subjective

Table 2

Population prediction results of cities based on the Malthusian model.

The Results of Population Prediction of Beijing
Year	High Scheme	Middle Scheme	Low Scheme	Year	High Scheme	Middle Scheme	Low Scheme
2011	1268.53	1259.52	1250.52	2021	1380.98	1276.89	1179.97
2012	1279.35	1261.25	1243.28	2022	1392.76	1278.63	1173.14
2013	1290.26	1262.98	1236.08	2023	1404.64	1280.39	1166.35
2014	1301.27	1264.71	1228.92	2024	1416.63	1282.14	1159.59
2015	1312.37	1266.44	1221.81	2025	1428.71	1283.90	1152.88
2016	1323.56	1268.17	1214.73	2026	1440.90	1285.66	1146.20
2017	1334.85	1269.91	1207.70	2027	1453.19	1287.42	1139.57
2018	1346.24	1271.65	1200.71	2028	1465.58	1289.18	1132.97
2019	1357.72	1273.39	1193.75	2029	1478.08	1290.95	1126.41
2020	1369.30	1275.14	1186.84	2030	1490.69	1292.72	1119.89
The Results of Population Prediction of Shanghai
Year	High Scheme	Middle Scheme	Low Scheme	Year	High Scheme	Middle Scheme	Low Scheme
2011	1413.01	1410.36	1407.72	2021	1419.95	1390.95	1362.49
2012	1413.70	1408.41	1403.13	2022	1420.65	1389.03	1358.05
2013	1414.40	1406.46	1398.55	2023	1421.34	1387.10	1353.62
2014	1415.09	1404.51	1393.99	2024	1422.04	1385.18	1349.21
2015	1415.78	1402.57	1389.45	2025	1422.74	1383.26	1344.81
2016	1416.48	1400.62	1384.92	2026	1423.43	1381.35	1340.43
2017	1417.17	1398.68	1380.40	2027	1424.13	1379.43	1336.06
2018	1417.87	1396.75	1375.90	2028	1424.83	1377.52	1331.70
2019	1418.56	1394.81	1371.42	2029	1425.53	1375.61	1327.36
2020	1419.26	1392.88	1366.95	2030	1426.23	1373.71	1323.03
The Results of Population Prediction of Guangzhou
Year	High Scheme	Middle Scheme	Low Scheme	Year	High Scheme	Middle Scheme	Low Scheme
2011	818.71	813.33	807.95	2021	955.68	888.81	826.23
2012	831.47	820.58	809.76	2022	970.58	896.74	828.08
2013	844.43	827.89	811.57	2023	985.71	904.73	829.93
2014	857.60	835.27	813.39	2024	1001.08	912.80	831.79
2015	870.97	842.72	815.21	2025	1016.69	920.94	833.66
2016	884.55	850.23	817.04	2026	1032.54	929.15	835.52
2017	898.34	857.81	818.87	2027	1048.63	937.43	837.39
2018	912.34	865.46	820.70	2028	1064.98	945.79	839.27
2019	926.57	873.18	822.54	2029	1081.58	954.22	841.15
2020	941.01	880.96	824.38	2030	1098.45	962.72	843.03
The Results of Population Prediction of Shenzhen
Year	High Scheme	Middle Scheme	Low Scheme	Year	High Scheme	Middle Scheme	Low Scheme
2011	266.41	264.37	262.32	2021	341.63	313.85	288.13
2012	273.12	268.94	264.79	2022	350.23	319.28	290.85
2013	280.00	273.60	267.29	2023	359.05	324.80	293.59
2014	287.05	278.33	269.81	2024	368.09	330.42	296.36
2015	294.28	283.15	272.36	2025	377.36	336.14	299.16
2016	301.69	288.05	274.92	2026	386.86	341.96	301.98
2017	309.28	293.03	277.52	2027	396.61	347.88	304.83
2018	317.07	298.10	280.13	2028	406.59	353.90	307.70
2019	325.06	303.26	282.78	2029	416.83	360.02	310.60
2020	333.24	308.51	285.44	2030	427.33	366.25	313.53
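Within each scheme of the Beijing block in Table 2, consecutive yearly values change by a nearly constant ratio, which is what a constant-rate growth model produces. The following Python sketch recovers the implied annual rate for each scheme from the 2011 and 2030 endpoints and then re-derives the 2020 values for comparison with the table. It assumes annual compounding, P_t = P_2011 * (1 + r)^(t - 2011); the function and variable names are hypothetical, and the study may have fitted the continuous form P_0 * exp(r * t) instead, which yields the same yearly values for an equivalent rate.

```python
# Endpoint values copied from the Beijing block of Table 2 (years 2011 and 2030).
BEIJING = {
    "high": (1268.53, 1490.69),
    "middle": (1259.52, 1292.72),
    "low": (1250.52, 1119.89),
}
START_YEAR, END_YEAR = 2011, 2030


def implied_annual_rate(p_start, p_end, years):
    """Constant annual growth rate that carries p_start to p_end in `years` steps."""
    return (p_end / p_start) ** (1.0 / years) - 1.0


def project(p_start, rate, years):
    """Population after `years` steps of compounding at `rate`."""
    return p_start * (1.0 + rate) ** years


# Implied rate per scheme (assumption: one constant rate over 2011-2030).
rates = {
    name: implied_annual_rate(start, end, END_YEAR - START_YEAR)
    for name, (start, end) in BEIJING.items()
}

# Re-derived 2020 values, to compare against the 2020 row of the table.
beijing_2020 = {
    name: project(BEIJING[name][0], rates[name], 2020 - START_YEAR)
    for name in BEIJING
}

for name in BEIJING:
    print(f"{name}: rate = {rates[name]:.4%}, 2020 projection = {beijing_2020[name]:.2f}")
```

If the re-derived 2020 values agree with the tabulated ones to within rounding, the three schemes can be read as the same model run with three different constant growth rates.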