Abstract-Urban air pollution is a critical global challenge, especially in rapidly industrializing cities, where effective environmental management requires robust probabilistic models. This study evaluates the three-parameter Burr-XII distribution for modeling daily average concentrations of carbon monoxide (CO), sulfur dioxide (SO2), and nitric oxide (NO) in Visakhapatnam, India, using data from January 1, 2018 to December 31, 2022. Various statistical tools, such as skewness-kurtosis plots, probability density functions (PDFs), empirical cumulative distribution functions (ECDFs), and P-P and Q-Q plots, are employed to assess the model's validity. Maximum Likelihood Estimation (MLE), goodness-of-fit tests (Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises), and model selection criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are applied to evaluate the performance of the Burr-XII distribution against the Dagum-I and Log-Logistic distributions. Results show that the Burr-XII distribution consistently provides the best fit: it yields the most favorable error metrics (mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and the coefficient of determination (R2)), goodness-of-fit statistics, and model selection criteria, together with lower standard errors and closer alignment with empirical data, particularly in the tails and extreme values. These findings highlight the robustness of the Burr-XII distribution in capturing the variability and skewness inherent in air pollutant concentrations. The study underscores the potential of the Burr-XII distribution as a reliable tool for air quality modeling, enhancing pollution forecasting and regulatory compliance. By supporting effective environmental monitoring and policy-making, the findings contribute to improved public health protection in urban centers.
Index Terms-Environment, Air Pollution, Burr-XII Distribution, Carbon Monoxide, Sulfur Dioxide, Nitric Oxide, Maximum Likelihood Estimation.
(ProQuest: ... denotes formulae omitted.)
I. INTRODUCTION
THE increasing concentration of airborne contaminants in urban and metropolitan areas poses significant challenges in managing air pollution.
Urban air pollution poses a significant threat to human health and the environment, particularly in rapidly industrializing cities. Pollutants such as carbon monoxide (CO), sulfur dioxide (SO2), and nitric oxide (NO) are known contributors to respiratory and cardiovascular diseases, acid rain, and smog formation. Their persistent presence in urban environments necessitates robust modeling approaches to predict pollutant concentrations accurately and inform mitigation strategies. Visakhapatnam, a major industrial city on India's eastern coast, faces substantial air quality challenges due to dense vehicular traffic, industrial activities, and rapid urbanization. Accurate pollution modeling is essential for understanding pollutant dynamics in such contexts, enabling timely regulatory measures and public health interventions. Traditional statistical methods often fail to capture the complexity and variability of pollution data, especially the extreme values. Hence, the need arises for advanced probabilistic models that can accommodate skewed, heavy-tailed data.
Several studies have demonstrated the utility of statistical distributions in air quality modeling. Marani et al. [1] employed the generalized gamma distribution to model air quality data in Venice, Italy, highlighting its ability to capture the characteristics of observed frequency distributions of air pollutant concentrations. Bell [2] aimed to enhance the geographic and temporal precision of exposure estimates in air quality assessments by using air quality modeling instead of traditional monitoring methods. De Foy et al. [3] demonstrated different statistical methods for analyzing the impact of air pollutants on the health of urban populations. Jiang et al. [4] analyzed components of air pollutants such as SO2, NO2, and PM10 by fitting them to statistical models. Sharma et al. [5] utilized the Type I asymptotic distribution to predict NAAQS violations in Delhi, finding Gumbel's method effective for fitting observed data and managing urban air quality. Benjamin et al. [6] compared the Dagum and GEV distributions for modeling ozone levels in Mexico City. Favarato et al. [7] evaluated a statistical model to analyze the association between NO2 levels and asthma prevalence in children, finding a positive correlation between the two. Ganora and Laio [8] proposed the Burr-XII distribution for modeling stream flows and rainfall, demonstrating negligible errors in quantile estimation and confirming its suitability using flow duration curves from north-western Italy.
Thupeng [9] modeled daily maximum NO2 concentrations in Gaborone using the Burr-XII distribution, finding that it provided the best fit among the distributions considered for extreme air pollution values. Jamaati et al. [10] investigated the trend of air pollution concentrations in Tehran, using data from 22 air quality monitoring stations to analyze pollutants such as CO, NO2, SO2, O3, and PM10, and found a worsening trend in air quality, with an increase in unhealthy days and a decline in healthy days. Lopez-Rodríguez et al. [11] introduced the Dagum distribution for fitting rainfall data, finding it superior to traditional models such as Gumbel and GEV. Muse et al. [12] illustrated the log-logistic tangent (LLT) distribution for analyzing COVID-19 mortality in Somalia, finding it superior to other models for fitting mortality data according to various goodness-of-fit measures. Elbatal et al. [13] introduced the Sine Burr X-G family of probability distributions for modeling extreme environmental factors, designed for both site-specific and multi-site applications. Emam and Tashkandy [14] proposed a new five-parameter modified Alpha-Power Weibull-Weibull distribution for modeling carbon dioxide emissions, demonstrating its superior performance compared to other models using the Kolmogorov-Smirnov test. Chaudhary et al. [15] developed the New Extended Kumaraswamy Exponential Distribution for air quality data in Kathmandu.
Ahmat et al. [16] found that the three-parameter Generalized Extreme Value (GEV) distribution effectively predicted extreme PM10 concentrations in Malaysia, with strong accuracy in forecasting exceedances of the air quality guideline. Suebyat and Pochai [17] performed a numerical simulation of air quality under Bangkok's sky train platforms using a 3D advection-diffusion equation, analyzing pollution from tunnel entrances, wind inflow, and obstacles, with satisfactory results for air pollution control in tunnel environments. Sooknum and Pochai [18] developed a mathematical model using an explicit finite difference technique to assess airborne infection risk among bus passengers, optimizing capacity, ventilation, and air quality for enhanced safety. Khamrot et al. [19] applied the GEV distribution to carbon dioxide emissions data from Phitsanulok, Thailand (2010-2023), identifying rising emission peaks and increased environmental risks from fuels such as Gasohol and LPG.
Building on this foundation, the present study explores the applicability of the three-parameter Burr-XII distribution to model daily average concentrations of CO, SO2, and NO in Visakhapatnam, using data from January 2018 to December 2022. The Burr-XII distribution, known for its ability to handle skewed and heavy-tailed data, is compared with the Dagum-I and Log-Logistic distributions. Parameters are estimated using Maximum Likelihood Estimation (MLE), and model performance is evaluated through goodness-of-fit tests (Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises) and selection criteria (AIC, BIC), along with error metrics (MAE, MSE, RMSE, and R2). On this basis, the best-fitting model for CO, NO, and SO2 concentrations is identified. The comparative analysis seeks to identify the distribution that best represents the statistical properties of air pollution data, offering the most reliable predictions for regulatory and forecasting purposes. By advancing the understanding of pollutant concentration dynamics, this study contributes to better environmental monitoring, more accurate forecasting, and informed policymaking, ultimately protecting public health in rapidly growing urban centers.
II. MONITORING SITE AND DATA DESCRIPTION
A. Research Location and Data Overview
This study focuses on Visakhapatnam, a major industrial and coastal city in Andhra Pradesh, India, located at 17.6868° N and 83.2185° E. Known for its dense population, rapid industrialization, and significant vehicular emissions, Visakhapatnam faces critical air quality challenges. The city is also home to India's Eastern Naval Command and is ranked among the world's 100 fastest-growing cities and among India's top ten wealthiest cities.
The data for this study, spanning January 2018 to December 2022, were obtained from the Continuous Ambient Air Quality Monitoring Station (CAAQMS) operated by the Greater Visakhapatnam Municipal Corporation (GVMC). Daily average concentrations of carbon monoxide (CO), sulfur dioxide (SO2), and nitric oxide (NO) were analyzed, with each dataset comprising 1,627 observations. The Andhra Pradesh Pollution Control Board (APPCB) monitors daily air pollutant concentrations in Visakhapatnam in real time. The CAAQMS records a wide range of pollutants, including NH3, SO2, CO, O3, Benzene, Toluene, Xylene, PM2.5, PM10, NO, NO2, and NOx. The Air Quality Index (AQI) is calculated based on one particulate matter and three gaseous pollutants, reflecting the combined impact of weather conditions and pollution levels. Analysis of CO, NO, and SO2 concentrations offers valuable insights into pollution trends and air quality dynamics in urban cities.
B. Probability Distributions in Air Quality Modeling
Accurately modeling air pollutant concentrations requires selecting an appropriate probability distribution that captures the data's skewness and variability. This study compares the Burr-XII, Dagum-I, and Log-Logistic distributions due to their effectiveness in modeling heavy-tailed and positively skewed environmental data. Each of these distributions possesses unique characteristics that make it suitable for analyzing air quality data.
Burr-XII Distribution. The Burr-XII distribution is widely used for modeling skewed and heavy-tailed environmental data. It offers flexibility in capturing pollutant concentration variations, particularly at extreme values in air quality components like PM10 and O3. This distribution is particularly effective for datasets exhibiting strong positive skewness and high kurtosis, such as air pollution measurements. The cumulative distribution function (CDF) for a positive random variable X is:
(1)
This function describes the probability that X is less than or equal to a given value x. Then the probability density function (PDF) of Burr-XII distribution for X is:
... (2)
Here, r and s are shape parameters, while t serves as the scale parameter. The Burr-XII distribution reduces to the Pareto distribution when r=1 and to the two-parameter Burr distribution when t=1. Increasing r increases right skewness, while large s values indicate heavier tails. Large t values broaden the distribution.
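Because the displayed equations are omitted in this copy, the following Python sketch assumes the common three-parameter form F(x) = 1 - [1 + (x/t)^r]^(-s) for x > 0, with r and s as shape parameters and t as scale; the authors' exact parameterization may differ. The helper names and parameter values are illustrative only, and SciPy's `burr12` distribution (with `c = r`, `d = s`, `scale = t`) serves as a cross-check.

```python
import numpy as np
from scipy import stats

def burr12_cdf(x, r, s, t):
    """Assumed Burr-XII CDF: 1 - [1 + (x/t)^r]^(-s), x > 0."""
    return 1.0 - (1.0 + (x / t) ** r) ** (-s)

def burr12_pdf(x, r, s, t):
    """Derivative of the assumed CDF with respect to x."""
    z = x / t
    return (r * s / t) * z ** (r - 1) * (1.0 + z ** r) ** (-(s + 1))

# Illustrative parameter values, not estimates from the paper.
r, s, t = 2.5, 1.5, 0.8
x = np.linspace(0.1, 5.0, 50)

# SciPy's burr12 uses c and d for the two shape parameters.
ref = stats.burr12(c=r, d=s, scale=t)
max_cdf_gap = np.max(np.abs(burr12_cdf(x, r, s, t) - ref.cdf(x)))
max_pdf_gap = np.max(np.abs(burr12_pdf(x, r, s, t) - ref.pdf(x)))
print(max_cdf_gap, max_pdf_gap)
```

Under this assumed form the upper tail decays like x^(-r·s); if the paper's parameterization differs, the roles of r and s change accordingly.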
Dagum-I Distribution. The Dagum-I distribution is frequently used for modeling socioeconomic and environmental variables with heavy tails. It is known for its ability to represent pollutant concentration fluctuations by accommodating different degrees of skewness. The CDF for the Dagum-I distribution is:
...(3)
An additional parameter determines the type of Dagum distribution (Type I, II, or III), with the specific numerical values ... corresponding to each type. For the Dagum-I distribution (where ...), the CDF simplifies to:
...(4)
The PDF for the Dagum-I distribution is
...(5)
Here, X is the random variable, r and t are shape parameters, and s is a scale parameter. The Dagum-I distribution is particularly useful for extreme pollutant values, often outperforming other distributions in the tail regions.
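As a hedged illustration (the paper's own equations are omitted here), the sketch below assumes the usual Dagum Type I form F(x) = [1 + (x/s)^(-r)]^(-t) for x > 0, matching the text's description of r and t as shape parameters and s as scale. This form coincides with SciPy's `burr` (Burr Type III) distribution with `c = r`, `d = t`, `scale = s`, which is used only as a cross-check; the function names and numbers are illustrative.

```python
import numpy as np
from scipy import stats

def dagum_cdf(x, r, s, t):
    """Assumed Dagum-I CDF: [1 + (x/s)^(-r)]^(-t), x > 0."""
    return (1.0 + (x / s) ** (-r)) ** (-t)

def dagum_pdf(x, r, s, t):
    """Derivative of the assumed CDF with respect to x."""
    z = x / s
    return (r * t / s) * z ** (-r - 1) * (1.0 + z ** (-r)) ** (-t - 1)

# Illustrative parameter values, not estimates from the paper.
r, s, t = 3.0, 1.2, 0.7
x = np.linspace(0.1, 5.0, 50)

ref = stats.burr(c=r, d=t, scale=s)  # Burr Type III = Dagum
dagum_cdf_gap = np.max(np.abs(dagum_cdf(x, r, s, t) - ref.cdf(x)))
dagum_pdf_gap = np.max(np.abs(dagum_pdf(x, r, s, t) - ref.pdf(x)))
print(dagum_cdf_gap, dagum_pdf_gap)
```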
Log-Logistic Distribution. The Log-Logistic distribution is commonly applied in environmental and reliability studies. It provides a good fit for skewed datasets and is especially useful for pollutant data where concentration levels show an initial rise followed by a gradual decline. The CDF of the Log-Logistic distribution is:
... (6)
The PDF of the Log-Logistic distribution is:
... (7)
where X is the random variable, r is the shape parameter, s is the scale parameter, and t is the location parameter. This distribution is particularly useful for modeling extreme pollution levels with declining hazards over time.
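A small sketch of the three-parameter Log-Logistic CDF, assuming the standard form F(x) = 1 / [1 + ((x - t)/s)^(-r)] for x > t (the paper's displayed equation is omitted in this copy). SciPy's `fisk` distribution is used as a cross-check. With t = 0, the same CDF is also the special case of the assumed Burr-XII and Dagum-I forms in which the second shape parameter equals 1, which is one way to see how the three candidates are related. Names and values are illustrative.

```python
import numpy as np
from scipy import stats

def loglogistic_cdf(x, r, s, t=0.0):
    """Assumed Log-Logistic CDF with shape r, scale s, location t (x > t)."""
    z = (x - t) / s
    return 1.0 / (1.0 + z ** (-r))

# Illustrative parameter values, not estimates from the paper.
r, s, t = 2.0, 1.5, 0.5
x = np.linspace(0.6, 5.0, 40)  # all points lie above the location t

# Cross-check against SciPy's log-logistic (Fisk) distribution.
gap_fisk = np.max(np.abs(loglogistic_cdf(x, r, s, t)
                         - stats.fisk(c=r, loc=t, scale=s).cdf(x)))

# With t = 0: Burr-XII with second shape 1 and Dagum with second shape 1.
base = loglogistic_cdf(x, r, s)
gap_burr12 = np.max(np.abs(base - stats.burr12(c=r, d=1.0, scale=s).cdf(x)))
gap_dagum = np.max(np.abs(base - stats.burr(c=r, d=1.0, scale=s).cdf(x)))
print(gap_fisk, gap_burr12, gap_dagum)
```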
C. Evaluation of Goodness-of-Fit of the Distributions
To assess the goodness-of-fit for each model, a range of statistical tests and criteria were employed, as outlined below.
Goodness-of-Fit Tests. Assessing the goodness-of-fit of statistical models is crucial in data analysis. This study employs formal statistical tests, namely the Kolmogorov-Smirnov (KS), Anderson-Darling (AD), and Cramér-von Mises (CvM) tests. Complementing these tests, diagnostic tools such as skewness vs. kurtosis plots, PDFs, empirical CDFs, and P-P and Q-Q plots are used to evaluate model fit.
Kolmogorov-Smirnov Test. This non-parametric test evaluates the maximum discrepancy between the theoretical distribution F0(x) and the empirical distribution function Fn(x). It is calculated as follows:
...(8)
A higher D value indicates a less satisfactory match between the sample data and the proposed distribution F0(x).
Anderson-Darling Test. This test focuses on the tails of the distribution, offering a robust criterion for model evaluation. The Anderson-Darling statistic, A2, is calculated as:
...(9)
A smaller value of A2 indicates a better fit of the fitted distribution F0(x) to the observed data.
Cramér-von Mises Test. Like the Kolmogorov-Smirnov test, the Cramér-von Mises test evaluates the overall fit of the distribution, but it aggregates squared discrepancies between the empirical and theoretical CDFs over the whole range instead of using only the largest one. The Cramér-von Mises statistic, W2, is calculated as follows:
...(10)
A higher W2 indicates greater discrepancies between the observed data and the hypothesized distribution.
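The three statistics can be computed from the sorted values u_i = F0(x_(i)) using their standard computational formulas. The sketch below is an assumption-laden illustration, not the authors' code: the function name, the synthetic sample, and the parameter values are hypothetical, and the formulas are the textbook ones, since the paper's equations are omitted in this copy.

```python
import numpy as np
from scipy import stats

def gof_statistics(sample, cdf):
    """Return (KS D, Anderson-Darling A^2, Cramer-von Mises W^2)."""
    u = np.sort(cdf(np.asarray(sample, dtype=float)))
    n = u.size
    i = np.arange(1, n + 1)
    # KS: largest gap between the empirical step function and F0.
    ks = max(np.max(i / n - u), np.max(u - (i - 1) / n))
    # AD: log terms give extra weight to both tails.
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
    # CvM: sum of squared gaps at the plotting positions (2i-1)/(2n).
    cvm = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
    return ks, ad, cvm

# Synthetic stand-in data; the distribution parameters are illustrative.
dist = stats.burr12(c=2.5, d=1.5, scale=0.8)
sample = dist.rvs(size=500, random_state=np.random.default_rng(0))
ks, ad, cvm = gof_statistics(sample, dist.cdf)
print(ks, ad, cvm)
```

When the parameters of F0 are estimated from the same data, the usual tabulated critical values no longer apply exactly, so the statistics are best read comparatively across candidate models, as the paper does.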
Model Selection Criteria. Model selection is the process of choosing the best model from a set of candidate models, considering several factors.
Log-Likelihood (LL). The log-likelihood measures how well a statistical model fits a given dataset. For a dataset X = {x1, x2, ..., xn} and a probability distribution f(x; θ) with the parameter θ, the log-likelihood is calculated as follows:
...(11)
Given a data set and the parameters r, s, and t, the log-likelihood function of the Burr-XII distribution is:
...(12)
For the Dagum-I distribution, it is:
... (13)
For the Log-Logistic distribution, it is:
... (14)
... (14)
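For orientation, since the displayed equations are omitted in this copy, the general log-likelihood and its Burr-XII instance can be written as below. This assumes the common density f(x; r, s, t) = (rs/t)(x/t)^{r-1}[1 + (x/t)^r]^{-(s+1)}; the authors' parameterization may differ, in which case the terms change accordingly. The Dagum-I and Log-Logistic cases follow in the same way by summing the logarithm of the respective density.

```latex
\ell(\theta) = \sum_{i=1}^{n} \ln f(x_i;\theta)

\ell(r,s,t) = n\ln r + n\ln s - n\ln t
  + (r-1)\sum_{i=1}^{n}\ln\!\left(\frac{x_i}{t}\right)
  - (s+1)\sum_{i=1}^{n}\ln\!\left[1+\left(\frac{x_i}{t}\right)^{r}\right]
```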
Akaike Information Criterion (AIC). The Akaike Information Criterion balances model complexity with goodness of fit. A lower AIC indicates a better model. It is calculated as follows:
...(15)
Bayesian Information Criterion (BIC). The Bayesian Information Criterion is a more conservative measure that applies a larger penalty for models with additional parameters. It is calculated as follows:
...(16)
Lower BIC values indicate better models with a preference for simplicity.
Hannan-Quinn Information Criterion (HQIC). The Hannan-Quinn Information Criterion is employed for model selection, particularly in large sample sizes. It is calculated as follows:
...(17)
Adjusted Bayesian Information Criterion (ABIC). The Adjusted Bayesian Information Criterion modifies the BIC to account for the number of parameters in relation to the sample size. It is calculated as follows:
...(18)
Consistent Akaike Information Criterion (CAIC). The Consistent Akaike Information Criterion is a more conservative version of AIC, applying a greater penalty for the number of parameters. It is calculated as follows:
... (19)
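The formulas for these criteria are omitted in this copy. For reference, the commonly used definitions are sketched below, with k the number of estimated parameters, n the sample size, and \hat{\ell} the maximized log-likelihood. The ABIC form shown is the sample-size-adjusted BIC often attributed to Sclove; whether the authors used exactly these forms for ABIC and CAIC is an assumption.

```latex
\mathrm{AIC}  = 2k - 2\hat{\ell}
\mathrm{BIC}  = k\ln n - 2\hat{\ell}
\mathrm{HQIC} = 2k\ln(\ln n) - 2\hat{\ell}
\mathrm{CAIC} = k(\ln n + 1) - 2\hat{\ell}
\mathrm{ABIC} = k\ln\!\left(\frac{n+2}{24}\right) - 2\hat{\ell}
```

In all five cases a lower value indicates a preferred model; the criteria differ only in how strongly they penalize k.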
Model Evaluation Metrics. Selecting appropriate evaluation metrics is crucial to understanding how well a model performs and which aspects of its predictions need improvement.
Mean Absolute Error (MAE). The mean absolute error measures the average of the absolute errors between actual and predicted values:
...(20)
Mean Squared Error (MSE). The mean squared error calculates the average of the squared differences between actual and predicted values, giving more weight to larger errors.
... (21)
Root Mean Squared Error (RMSE). The square root of the MSE, providing an error measure in the same units as the data:
...(22)
Coefficient of Determination (R2). Measures the proportion of variance in the dependent variable explained by the model:
...(23)
Adjusted R-Squared (Adj. R2). Adjusted R2 corrects R2 for the number of predictors, providing a more reliable measure of model fit:
...(24)
Where ... is the actual value, ... is the predicted value, ... is the mean of the observed data, n is the number of observations, and k is the number of predictors. These metrics provide a detailed understanding of model performance, and the choice among them depends on the objectives of the investigation. For instance, MAE offers a simple way to assess average error, whereas RMSE is helpful for recognizing large prediction errors. A thorough evaluation of a model therefore requires several indicators.
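The metrics above can be sketched in a few lines of Python using their standard definitions. The function name and toy numbers are hypothetical, and the paper does not state in this copy how the "predicted" values were obtained from a fitted distribution, so the inputs here are simply two arrays of equal length.

```python
import numpy as np

def error_metrics(actual, predicted, k):
    """MAE, MSE, RMSE, R^2 and adjusted R^2 for k predictors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = actual.size
    resid = actual - predicted
    mae = np.mean(np.abs(resid))
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(resid ** 2)                       # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "AdjR2": adj_r2}

# Toy values for illustration only.
metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5], k=1)
print(metrics)
```

Note that R2 can be low, or even negative, when predictions track the mean no better than a constant does, even if MAE and RMSE are small in absolute terms.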
III. METHODOLOGY
This study analyzed 24-hour ambient air quality data for CO, NO, and SO2, with 1627 observations for each pollutant. The data underwent preprocessing to handle missing values and outliers. Descriptive statistics, including mean, median, variance, standard deviation, skewness, and kurtosis, were computed to summarize the distributions of CO, SO2, and NO.
To visually assess the data and model fit, we employed various graphical methods, including skewness-kurtosis plots, PDFs, Q-Q plots, ECDFs, and P-P plots. The data were modeled using the Dagum-I, Log-Logistic, and Burr-XII distributions, with parameters estimated via MLE. Goodness-of-fit was evaluated by statistical tests (KS, CvM, and AD), alongside model selection criteria such as LL, AIC, BIC, HQIC, ABIC, and CAIC.
Model performance was further assessed through error metrics, including MAE, MSE, and RMSE, as well as R-squared (R2) and Adjusted R-squared (Adj. R2), which provided insights into the explanatory power of each model. These analyses enabled a rigorous comparison of the statistical models, leading to a deeper understanding of the distribution and behavior of CO, NO, and SO2 pollutants.
Figure 1 presents the methodology used to identify the best-fit distribution for modeling daily average concentrations of CO, NO, and SO2 in Visakhapatnam, using data from January 2018 to December 2022.
A. Descriptive Statistics Assessment
To summarize the data, descriptive statistics were calculated for CO, NO, and SO2 concentrations, including minimum, maximum, range, mean, and median values. Quartile analysis emphasized the interquartile range, while the standard error of the mean assessed precision. Confidence intervals (95%) provided the range for the true mean. The coefficient of variation, variance, and standard deviation quantified relative and overall data variability. The presence of significant skewness and kurtosis indicated non-normal distributions, justifying the use of advanced statistical models capable of handling skewed and heavy-tailed data.
B. Visual Evaluation
The fit of the Burr-XII, Log-Logistic, and Dagum-I distributions for CO, NO, and SO2 data was visually analyzed with several plotting techniques. Skewness vs. kurtosis plots illustrated data asymmetry and peakedness. PDFs compared the fitted distributions with observed data. P-P plots assessed fit by comparing cumulative probabilities, while Q-Q plots compared quantiles of observed and fitted data. CDF plots evaluated the cumulative distribution fit, and probability difference plots highlighted discrepancies between the models and actual data. These visual assessments helped determine the suitability of each distribution for modeling the pollutants.
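The coordinates behind P-P and Q-Q plots can be sketched as follows. This is an illustration under stated assumptions: the plotting positions (i - 0.5)/n are one common convention and may not be the one the authors used, and the function name, synthetic sample, and parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def pp_qq_points(sample, dist):
    """Return ((empirical p, model p), (model quantile, data quantile))."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.5) / n   # assumed plotting positions
    pp = (p, dist.cdf(x))                 # P-P: probabilities at the data
    qq = (dist.ppf(p), x)                 # Q-Q: model quantiles vs. data
    return pp, qq

# Synthetic stand-in data drawn from the model itself.
dist = stats.burr12(c=2.5, d=1.5, scale=0.8)
sample = dist.rvs(size=500, random_state=np.random.default_rng(1))
pp, qq = pp_qq_points(sample, dist)

# For a well-fitting model both point sets lie near the 45-degree line.
max_pp_gap = np.max(np.abs(pp[0] - pp[1]))
print(max_pp_gap)
```

P-P plots are most sensitive to discrepancies in the center of the distribution, while Q-Q plots magnify discrepancies in the tails, which is why the two are usually read together.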
C. Parameter Estimation
Maximum Likelihood Estimation (MLE) was used to estimate the parameters of each distribution. It maximizes the likelihood of the observed data under the given distribution, ensuring that the parameter estimates and their associated standard errors were obtained with high statistical efficiency. The R programming language was employed to perform all computations, leveraging its robust statistical libraries for precise estimation.
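The study performed its estimation in R; the following Python sketch is a swapped-in illustration of the same idea, not the authors' code. It assumes the common Burr-XII density available in SciPy as `burr12`, minimizes the negative log-likelihood over log-transformed parameters (which keeps r, s, and t positive), and uses synthetic data with illustrative parameter values. The function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def burr12_nll(log_params, x):
    """Negative log-likelihood; parameters are passed on the log scale."""
    r, s, t = np.exp(log_params)
    return -np.sum(stats.burr12.logpdf(x, r, s, scale=t))

def fit_burr12_mle(x):
    """Maximize the likelihood with a derivative-free simplex search."""
    x = np.asarray(x, dtype=float)
    start = np.log([1.0, 1.0, np.median(x)])  # crude starting point
    res = optimize.minimize(
        burr12_nll, start, args=(x,), method="Nelder-Mead",
        options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8},
    )
    r, s, t = np.exp(res.x)
    return {"r": r, "s": s, "t": t, "loglik": -res.fun}

# Synthetic data with the same sample size as each pollutant series.
true_params = [2.5, 1.5, 0.8]  # illustrative, not from the paper
data = stats.burr12(c=2.5, d=1.5, scale=0.8).rvs(
    size=1627, random_state=np.random.default_rng(42))

fit = fit_burr12_mle(data)
loglik_true = -burr12_nll(np.log(true_params), data)
print(fit, loglik_true)
```

Standard errors, which the paper reports alongside the estimates, are typically obtained from the inverse of the observed information matrix at the optimum; that step is omitted here for brevity.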
IV. NUMERICAL ILLUSTRATION
This study evaluates the applicability of the Burr-XII distribution using three datasets: CO, SO2, and NO. The datasets consist of daily average ambient concentrations (µg/m3) reported in Visakhapatnam city from January 2018 to December 2022. The Burr-XII distribution is contrasted with other models, namely the Dagum-I and Log-Logistic distributions. With 1627 samples per pollutant, the analysis of air quality data for CO, NO, and SO2, compared to international and Indian regulations, offers significant insights into the concentration levels and potential health impacts of each pollutant.
A. Descriptive Statistics and Data Distribution Analysis
This section analyzes the descriptive statistics of the daily average concentrations of CO, NO, and SO2, which provide insights into the distributional characteristics of these pollutants. Tables I and II provide the key statistics, including measures of central tendency (mean, median), variability (standard deviation (Sd), range), and distribution shape (skewness, kurtosis). The CO data set exhibits a mean concentration of 0.6973 ppm, with a standard deviation of 0.2865 ppm, a range of 2.03 ppm, and a median of 0.66 ppm, indicating moderate variability. A positive skewness of 1.1921 and a kurtosis of 2.5205 suggest a right-skewed distribution with a longer right tail and a sharper peak, confirming the need for models capable of handling heavy tails.
For NO, the average concentration is 14.3733 ppb, with a high standard deviation of 13.2132 ppb, a range of 128.8 ppb, and a median of 11.50 ppb. The data show significant right skewness (3.5171) and high positive kurtosis (21.6140), reflecting an asymmetric, extremely peaked distribution with a long right tail that requires a flexible model such as Burr-XII.
The SO2 concentrations have an average of 12.38 ppb, with a standard deviation of 6.6783 ppb, a range of 99.6 ppb, and a median of 11.30 ppb. The SO2 data are also right-skewed (2.7937) with notable positive kurtosis (23.6105), again indicating a high peak and an extended right tail.
These characteristics confirm that all three datasets deviate significantly from normality, displaying positive skewness and high kurtosis. Such distributions necessitate models that can accommodate asymmetry and heavy tails, justifying the selection of Burr-XII, Dagum-I, and Log-Logistic distributions for analysis.
The histogram and density plots in Figures 3(a)-3(c) illustrate these traits, emphasizing the need for probability distributions like Burr-XII, which is well-suited for positively skewed, heavy-tailed data. The analysis of skewness and kurtosis further supports the appropriateness of Burr-XII for modeling these pollutants, aligning with empirical observations and enhancing the reliability of subsequent modeling efforts. Efficient air quality management is crucial to address occasional high concentrations and ensure public health and safety.
The descriptive parameters of the empirical distributions for CO, NO, and SO2, as shown in Table II, align with the results from Table I. This confirms the strong kurtosis and right skewness observed in the data. These characteristics highlight the critical need for effective air quality management to mitigate health risks associated with high pollution levels.
B. Skewness-Kurtosis plots
Candidate statistical distributions for air quality data, particularly for CO, NO, and SO2, may be identified using Cullen and Frey graphs. These graphs compare theoretical distributions with empirical data by plotting kurtosis against the square of skewness. Figure 2 presents skewness-kurtosis plots to assess the distributional characteristics of CO, NO, and SO2. These plots visually evaluate whether the observed data align with theoretical probability distributions. Figure 2(a) shows that bootstrapped samples (grey) and actual CO data points (black) cluster around the beta distribution region. This suggests that CO data exhibit moderate skewness and peakedness, favoring the Burr-XII distribution due to its flexibility in modeling asymmetric data. In Figure 2(b), the NO data fall between the Beta and Logistic regions, indicating strong positive skewness and heavy-tailed behavior. The Burr-XII and Log-Logistic distributions are appropriate for capturing this trend, with Burr-XII likely performing best due to its ability to accommodate extreme values. Figure 2(c) illustrates that SO2 data points are near the gamma and lognormal distribution zones, reflecting the higher degree of skewness and strong kurtosis typical of these distributions. This alignment further supports the Burr-XII and Dagum-I distributions, as they effectively model datasets with extended right tails.
Overall, all three pollutant datasets exhibit strong positive skewness and high kurtosis, and the Burr-XII distribution aligns closely with the empirical skewness-kurtosis structure, making it the most appropriate model for estimating CO, NO, and SO2 concentrations. This visual assessment supports the statistical findings in Table IV (goodness-of-fit tests), which confirm the superiority of the Burr-XII model and thereby support air quality forecasting and management efforts.
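The coordinates plotted in a skewness-kurtosis (Cullen and Frey) graph can be sketched as below: one point for the observed sample and a cloud of bootstrap replicates. This is an illustrative Python stand-in, not the authors' procedure; the function name, the number of replicates, the use of non-excess (Pearson) kurtosis, and the lognormal stand-in data are all assumptions.

```python
import numpy as np
from scipy import stats

def cullen_frey_points(sample, n_boot=200, seed=0):
    """Return the observed (skewness^2, kurtosis) and bootstrap replicates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)

    def point(v):
        # Pearson (non-excess) kurtosis, so a normal sample sits near (0, 3).
        return stats.skew(v) ** 2, stats.kurtosis(v, fisher=False)

    observed = point(x)
    boots = np.array([
        point(rng.choice(x, size=x.size, replace=True))
        for _ in range(n_boot)
    ])
    return observed, boots

# Right-skewed synthetic stand-in for a pollutant series.
sample = np.random.default_rng(7).lognormal(mean=0.0, sigma=0.5, size=1627)
observed, boots = cullen_frey_points(sample)
print(observed, boots.mean(axis=0))
```

The spread of the bootstrap cloud indicates how much the sample's position on the graph could move under resampling, which matters most for heavy-tailed data where sample kurtosis is unstable.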
C. Parameter Assessments
Table III provides parameter estimates and standard errors for CO, NO, and SO2 characterized by the Dagum-I, Log-Logistic, and Burr-XII distributions. These parameters are crucial for evaluating compliance with 24-hour mean concentration standards set by international and Indian regulations.
The Burr-XII model is effective for CO, with stable parameter estimates and moderate standard errors, supporting the assessment of levels against the WHO's 24-hour limit of 10 mg/m3 and India's 8-hour limit of 2 mg/m3. For NO, it provides consistent estimates with lower standard errors compared to Dagum-I, relevant for the WHO's 1-hour limit of 200 µg/m3 and India's 24-hour limit of 80 µg/m3. For SO2, the Burr-XII distribution shows stable estimates with relatively low standard errors, ensuring reliability in monitoring levels below the WHO's 24-hour limit of 20 µg/m3.
Overall, the Burr-XII distribution consistently exhibits lower standard errors and stable estimates, making it the most reliable model for pollutant concentration modeling, supporting regulatory compliance and public health.
D. Goodness-of-Fit Criteria
Table IV presents the results of the goodness-of-fit tests, including the CvM, AD, and KS tests, along with performance criteria such as LL, AIC, BIC, HQIC, CAIC, and ABIC. These metrics were used to identify the best-fitting distribution for CO, NO, and SO2. A summary of each model's performance metrics is provided below.
The Burr-XII distribution consistently provides the best fit for all three pollutants, confirming its suitability for modeling air quality data:
* CO (Table IV (A)): It shows the lowest CvM (0.1586), KS (0.0272), and AD (1.1651) statistics, along with the lowest AIC (252.9086), BIC (269.0921), and HQIC (258.9131) values.
* NO (Table IV (B)): It exhibits the lowest KS (0.0299), CvM (0.2614), and AD (2.0444) statistics, and the most favorable AIC (11678.24), BIC (11694.43), and HQIC (11684.25) values.
* SO2 (Table IV (C)): It shows the lowest KS (0.0172), CvM (0.0738), and AD (0.7492) statistics, with the lowest AIC (10215.26), BIC (10231.45), and HQIC (10221.27) values.
Thus, the Burr-XII distribution is the most accurate model for estimating CO, NO, and SO2 concentrations. Implementing this model enhances air quality forecasting, aiding in effective pollution mitigation and regulatory compliance, and ultimately protecting public health and the environment.
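As a quick consistency check on the reported CO values, the sketch below backs out -2·LL from the reported AIC and recomputes BIC and HQIC. It assumes the standard definitions AIC = 2k - 2LL, BIC = k·ln(n) - 2LL, and HQIC = 2k·ln(ln n) - 2LL, with k = 3 parameters and n = 1627 observations as stated in the text; the variable names are illustrative.

```python
import math

n, k = 1627, 3             # observations and Burr-XII parameters (from the text)
aic_reported = 252.9086    # CO, Burr-XII, Table IV (A)
bic_reported = 269.0921
hqic_reported = 258.9131

# Recover -2 * log-likelihood from the reported AIC.
neg2ll = aic_reported - 2 * k

# Recompute the other two criteria from the assumed definitions.
bic = neg2ll + k * math.log(n)
hqic = neg2ll + 2 * k * math.log(math.log(n))

print(round(bic, 4), round(hqic, 4))
```

Under these assumed definitions the recomputed values agree with the reported BIC and HQIC to within rounding, which suggests the three criteria were derived from a single log-likelihood in the usual way.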
E. Model Selection Evaluation
Table V presents an evaluation of model performance metrics for CO, NO, and SO2, comparing the efficiency of three continuous distributions: Burr-XII, Log-Logistic, and Dagum-I. The findings help determine the most suitable model for representing the data. A summary of each model's evaluation metrics is provided below. Model prediction accuracy based on Table V:
* CO Prediction: The Burr-XII distribution exhibits the lowest MAE (0.3032), MSE (0.1582), and RMSE (0.3978), outperforming the other models despite low R2 and Adj. R2 values.
* NO Prediction: Burr-XII achieves the lowest MSE (367.2417), RMSE (19.1636), and MAE (11.9272), demonstrating superior predictive performance among the models.
* SO2 Prediction: The Burr-XII distribution again records the lowest MAE (6.5943), MSE (82.8279), and RMSE (9.1010), confirming its accuracy.
Overall, the Burr-XII distribution outperforms the other models for all three pollutants (CO, NO, and SO2), demonstrating its robustness in prediction. The R2 values are relatively low for all models, which suggests room for improvement in capturing extreme variability, but Burr-XII remains the best of the three candidates for air quality forecasting.
F. Model Fit Evaluation
Figure 3 compares the empirical distributions of CO, NO, and SO2 with the fitted Burr-XII, Log-Logistic, and Dagum-I distributions using histograms overlaid with probability density functions (PDFs), empirical cumulative distribution functions (ECDFs), quantile-quantile (Q-Q) plots, and probability-probability (P-P) plots. The results indicate that the Burr-XII distribution provides the best overall fit for all three pollutants, particularly in the tail regions. In Figure 3(a) (CO distribution comparison), both Burr-XII and Log-Logistic closely align with the empirical data, but Dagum-I shows noticeable deviations, particularly in the tails, making it less effective in capturing extreme CO concentrations. Similarly, in Figure 3(b) (NO distribution comparison), the empirical data exhibit strong positive skewness with a heavy right tail, where Log-Logistic underestimates high NO values, leading to tail-fitting discrepancies. The Burr-XII model provides a superior fit, minimizing deviations across quantiles, as confirmed by the Q-Q and P-P plots.
In Figure 3(c) (SO2 distribution comparison), while the Burr-XII and Log-Logistic distributions show comparable performance in central values, Burr-XII proves more effective in capturing extreme SO2 concentrations, whereas Dagum-I exhibits substantial deviations in the upper quantiles. Across all figures, the P-P and Q-Q plots further validate Burr-XII's strong alignment with empirical percentiles, confirming its reliability in modeling skewed air pollution data. These visual comparisons are consistent with the statistical results in Table V, where Burr-XII achieved the lowest error metrics (MAE, MSE, RMSE), reinforcing its suitability as the most robust distribution for air quality modeling. Overall, the Burr-XII distribution provides the best fit, slightly outperforming Log-Logistic, while the Dagum-I distribution is less suitable.
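The coordinates behind Q-Q and P-P plots such as those in Figure 3 can be computed as in the following minimal sketch. It uses synthetic data rather than the study's measurements, and it assumes SciPy's burr12 and fisk (log-logistic) distributions as stand-ins for the fitted models, with the location fixed at zero so that Burr-XII has three free parameters. The helper name qq_pp_points and the plotting positions (i - 0.5)/n are illustrative choices, not the authors' stated procedure.

```python
import numpy as np
from scipy import stats

# Synthetic, positively skewed stand-in for a pollutant series (not the study's data).
rng = np.random.default_rng(0)
sample = stats.burr12.rvs(c=2.0, d=1.5, scale=1.0, size=500, random_state=rng)


def qq_pp_points(data, dist, params):
    """Return (model quantiles, sorted data, model probabilities, empirical probabilities)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    p_emp = (np.arange(1, n + 1) - 0.5) / n
    q_theo = dist.ppf(p_emp, *params)  # Q-Q: model quantile vs. observed value
    p_theo = dist.cdf(x, *params)      # P-P: model CDF vs. empirical probability
    return q_theo, x, p_theo, p_emp


# Maximum likelihood fits with the location parameter fixed at 0.
burr_params = stats.burr12.fit(sample, floc=0)  # (c, d, loc, scale)
fisk_params = stats.fisk.fit(sample, floc=0)    # (c, loc, scale)

for name, dist, params in [
    ("Burr-XII", stats.burr12, burr_params),
    ("Log-Logistic", stats.fisk, fisk_params),
]:
    q_theo, x_sorted, p_theo, p_emp = qq_pp_points(sample, dist, params)
    max_pp_gap = float(np.max(np.abs(p_theo - p_emp)))
    print(f"{name}: largest P-P deviation = {max_pp_gap:.4f}")
```

Plotting x_sorted against q_theo gives the Q-Q plot and p_emp against p_theo gives the P-P plot; points close to the 45-degree line indicate agreement. Q-Q plots magnify discrepancies in the tails, whereas P-P plots are more sensitive near the center of the distribution.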
V. COMPREHENSIVE ANALYSIS AND FUTURE DIRECTIONS
A. Comprehensive Analysis
This research aimed to assess the effectiveness of the Burr-XII distribution in modeling daily average ambient concentrations of NO, SO2, and CO in an urban area. By comparing the Burr-XII distribution with the Dagum-I and Log-Logistic distributions, the study identified the Burr-XII distribution as the most effective model for these pollutants. Descriptive statistics indicated that all datasets exhibited significant positive skewness and kurtosis, suggesting non-normal distributions characterized by extended right tails and sharp peaks. Skewness-kurtosis plots further supported the Burr-XII distribution's suitability, aligning closely with empirical data in these characteristics.
Parameter estimates, based on MLE, revealed that the Burr-XII distribution consistently yielded lower standard errors than the Log-Logistic and Dagum-I distributions. This supports its reliability in assessing pollution levels against both Indian and international standards. Goodness-of-fit evaluations highlighted the Burr-XII distribution's superior performance, with the lowest values for the AD, CvM, and KS statistics, as well as the lowest AIC, BIC, CAIC, ABIC, and HQIC. Additionally, it showed the lowest MAE, MSE, and RMSE, underscoring its accuracy in predicting pollutant concentrations.
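A rough sketch of how such goodness-of-fit statistics and information criteria can be obtained from an MLE fit is given below. It assumes SciPy's burr12 and fisk distributions and synthetic data; the helper names anderson_darling and fit_summary are hypothetical. CAIC and ABIC are left out because their definitions vary between authors and are not restated in this section. Only the test statistics are used: p-values from these tests are not directly valid when the parameters are estimated from the same data.

```python
import numpy as np
from scipy import stats


def anderson_darling(data, cdf):
    """Anderson-Darling A^2 statistic against a continuous CDF."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    # Clip to avoid log(0) for observations deep in either tail.
    u = np.clip(cdf(x), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    return float(-n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1]))))


def fit_summary(data, dist, **fixed):
    """Fit `dist` by maximum likelihood and collect fit statistics.

    Assumes every keyword in `fixed` (e.g. floc=0) pins exactly one parameter.
    """
    params = dist.fit(data, **fixed)
    frozen = dist(*params)
    loglik = float(np.sum(frozen.logpdf(data)))
    n = len(data)
    k = len(params) - len(fixed)  # number of free parameters
    return {
        "KS": float(stats.kstest(data, frozen.cdf).statistic),
        "CvM": float(stats.cramervonmises(data, frozen.cdf).statistic),
        "AD": anderson_darling(data, frozen.cdf),
        "AIC": 2 * k - 2 * loglik,
        "BIC": k * np.log(n) - 2 * loglik,
        "HQIC": 2 * k * np.log(np.log(n)) - 2 * loglik,
    }


# Synthetic, positively skewed data (not the study's measurements).
rng = np.random.default_rng(1)
sample = stats.burr12.rvs(c=2.0, d=1.5, scale=1.0, size=400, random_state=rng)

results = {
    "Burr-XII": fit_summary(sample, stats.burr12, floc=0),
    "Log-Logistic": fit_summary(sample, stats.fisk, floc=0),
}
for name, row in results.items():
    print(name, {key: round(value, 4) for key, value in row.items()})
```

For every statistic and criterion in the summary, smaller values indicate a better fit. AIC, BIC, and HQIC differ only in how strongly they penalize the number of free parameters, so a three-parameter model such as Burr-XII must improve the likelihood enough to offset its extra parameter relative to the two-parameter Log-Logistic.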
B. Conclusions
The study establishes the Burr-XII distribution as the most reliable of the three candidate models for estimating daily average concentrations of carbon monoxide (CO), nitric oxide (NO), and sulfur dioxide (SO2) in urban regions. The Burr-XII distribution consistently outperformed the Dagum-I and Log-Logistic distributions across various statistical tests, selection criteria, and error metrics, demonstrating its robustness in handling positively skewed, heavy-tailed pollutant data. Its lower standard errors, better goodness-of-fit statistics (e.g., Kolmogorov-Smirnov, Anderson-Darling), and minimal prediction errors (MAE, MSE, RMSE) underscore its suitability for modeling urban air quality. The model's strong alignment with empirical data and its compatibility with regulatory air quality standards reinforce its utility for environmental monitoring and policy formulation. By accurately modeling pollutant concentrations, the Burr-XII distribution supports proactive interventions to mitigate air pollution and its adverse effects on public health.
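One way a fitted Burr-XII model can be related to a regulatory limit is through the exceedance probability, i.e. the probability that a daily concentration exceeds a given threshold. The sketch below is illustrative only: the parameter values and the threshold are hypothetical placeholders, not the paper's estimates or any actual standard, and the function name burr12_exceedance is an assumption. It uses the Burr-XII survival function S(x) = [1 + (x/s)^c]^(-k) with zero location and cross-checks it against SciPy's burr12.sf.

```python
from scipy import stats


def burr12_exceedance(threshold, c, k, scale):
    """P(X > threshold) for a three-parameter Burr-XII with zero location."""
    return float((1.0 + (threshold / scale) ** c) ** (-k))


# Hypothetical shape/scale parameters and threshold, for illustration only.
c, k, scale = 2.0, 1.5, 1.0
threshold = 2.0

p_exceed = burr12_exceedance(threshold, c, k, scale)

# Cross-check against SciPy's survival function (shape arguments c and d = k).
p_scipy = float(stats.burr12.sf(threshold, c, k, scale=scale))

# Expected number of exceedance days per year if every day follows this distribution.
expected_days_per_year = 365 * p_exceed
print(f"P(X > {threshold}) = {p_exceed:.4f}, about {expected_days_per_year:.1f} days/year")
```

Because the exceedance probability depends entirely on the upper tail, a model that fits the tail well matters more here than one that only fits the central values.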
C. Future Directions
Future research should explore the Burr-XII distribution's applicability to broader geographic regions and longer temporal datasets, incorporating diverse environmental settings and seasonal variations. Expanding the study to include additional pollutants like PM2.5, PM10, ozone (O3), and nitrogen dioxide (NO2) can provide a more comprehensive understanding of urban air quality dynamics. Integrating meteorological factors (e.g., wind speed, temperature, and humidity) with pollutant modeling can enhance predictions of pollutant dispersion and seasonal trends. Comparing the Burr-XII distribution with machine learning techniques, such as neural networks and ensemble methods, could provide valuable insights into improving prediction accuracy. Real-time monitoring systems and linking pollutant data with health outcome metrics will help bridge the gap between research and policy, enabling more targeted interventions. Additionally, incorporating uncertainty analyses, cross-validation methods, and community-based air quality data collection can improve model reliability and applicability, paving the way for advanced environmental management solutions.
REFERENCES
[1] A. Marani, I. Lavagnini, and C. Buttazzoni, "Statistical study of air pollutant concentrations via generalized gamma distributions," Journal of the Air Pollution Control Association, vol. 36, no. 11, pp. 1250-1254, 1986.
[2] M. L. Bell, "The use of ambient air quality modeling to estimate individual and population exposure for human health research: A case study of ozone in the Northern Georgia Region of the United States," Environment International, vol. 32, no. 5, pp. 586-593, 2006.
[3] B. De Foy, W. Lei, M. Zavala, R. Volkamer, J. Samuelsson, J. Mellqvist, and L. T. Molina, "Modeling constraints on the emission inventory and on vertical dispersion for CO and SO2 in the Mexico City Metropolitan Area using Solar FTIR and zenith sky UV spectroscopy," Atmospheric Chemistry and Physics, vol. 7, no. 3, pp. 781-801, 2007.
[4] X. Jiang, S. Deng, N. Liu, and B. Shen, "The statistical distributions of SO2, NO2, and PM10 concentrations in Xi'an, China," in Proceedings of the International Symposium on Water Resource and Environmental Protection, IEEE, vol. 3, no. 3, pp. 2206-2212, 2011.
[5] P. Sharma, A. Chandra, S. C. Kaushik, P. Sharma, and S. Jain, "Predicting violations of national ambient air quality standards using extreme value theory for Delhi city," Atmospheric Pollution Research, vol. 3, no. 2, pp. 170-179, 2012.
[6] S. M. Benjamin, V. H. Humberto, and C. Arnold B., "Use of the Dagum distribution for modeling tropospheric ozone levels," Journal of Environmental Statistics, vol. 5, no. 5, pp. 1-11, 2013.
[7] G. Favarato, H. R. Anderson, R. Atkinson, G. Fuller, I. Mills, and H. Walton, "Traffic-related pollution and asthma prevalence in children: Quantification of associations with nitrogen dioxide," Air Quality, Atmosphere & Health, vol. 7, no. 4, pp. 459-466, 2014.
[8] D. Ganora and F. Laio, "Hydrological applications of the Burr distribution: Practical method for parameter estimation," Journal of Hydrologic Engineering, vol. 20, no. 11, pp. 04015024, 2015.
[9] W. M. Thupeng, "Use of the three-parameter Burr-XII distribution for modeling ambient daily maximum nitrogen dioxide concentrations in the Gaborone fire brigade," American Scientific Research Journal for Engineering Technology and Sciences (ASRJETS), vol. 26, no. 2, pp. 18-32, 2016.
[10] H. Jamaati, M. Attarchi, S. Hassani, E. Farid, S. M. Seyedmehdi, and P. S. Pormehr, "Investigating air quality status and air pollutant trends over the Metropolitan Area of Tehran, Iran, over the past decade between 2005 and 2014," Environmental Health and Toxicology, vol. 33, no. 2, pp. 1-7, 2018.
[11] F. López-Rodríguez, J. García-Sanz-Calcedo, F. J. Moral-García, and A. J. García-Conde, "Statistical study of rainfall control: The Dagum distribution and applicability to the Southwest of Spain," Water, vol. 11, no. 3, pp. 453-467, 2019.
[12] A. H. Muse, A. H. Tolba, E. Fayad, O. A. Abu Ali, M. Nagy, and M. Yusuf, "Modeling the COVID-19 mortality rate with a new versatile modification of the log-logistic distribution," Computational Intelligence and Neuroscience, vol. 2021, no. 1, pp. 8640794-8640807, 2021.
[13] I. Elbatal, S. Khan, T. Hussain, M. Elgarhy, N. Alotaibi, H. E. Semary, and M. M. Abdelwahab, "A new family of lifetime models: Theoretical developments with applications in biomedical and environmental data," Axioms, vol. 11, no. 8, pp. 361-389, 2022.
[14] W. Emam and Y. Tashkandy, "Modeling the amount of carbon dioxide emissions: Application of a new modified alpha power Weibull-X family of distributions," Symmetry, vol. 15, no. 2, pp. 366-384, 2023.
[15] A. K. Chaudhary, L. B. S. Telee, M. Karki, and V. Kumar, "Statistical analysis of air quality dataset of Kathmandu, Nepal, with a new extended Kumaraswamy exponential distribution," Environmental Science and Pollution Research, vol. 31, no. 1, pp. 21073-21088, 2024.
[16] H. Ahmat, A. S. Yahaya, and N. A. Ramli, "Prediction of PM10 extreme concentrations in urban monitoring stations in Selangor, Malaysia using three parameters extreme value distributions (EVD)," Jurnal Teknologi, vol. 77, no. 32, pp. 37-46, 2015.
[17] K. Suebyat and N. Pochai, "A numerical simulation of a three-dimensional air quality model in an area under a Bangkok sky train platform using an explicit finite difference scheme," IAENG International Journal of Applied Mathematics, vol. 47, no. 4, pp. 471-476, 2017.
[18] J. Sooknum and N. Pochai, "A mathematical model for the evaluation of airborne infection risk for bus passengers," IAENG International Journal of Computer Science, vol. 50, no. 1, pp. 14-22, 2023.
[19] P. Khamrot, N. Phankhieo, P. Wachirawongsakorn, S. Piros, and N. Deetae, "Analysis of Carbon Dioxide Value with Extreme Value Theory Using Generalized Extreme Value Distribution," IAENG International Journal of Applied Mathematics, vol. 54, no. 10, pp. 2108-2117, 2024.
© 2025. This work is published under https://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.