1. Introduction
Koenker and Bassett [1] introduced the quantile regression (QR) model. Whereas ordinary least squares (OLS) regression models only the mean of the response variable, QR estimates regression coefficients at different quantiles of the response, which captures how the association between the covariates and the response varies across its distribution. Owing to their robustness to outliers and wide applicability, quantile regression models are widely used in fields such as finance, biology, and ecology (Baur et al. [2]; Huang et al. [3]; Cade and Noon [4]). Yu and Stander [5] studied Bayesian quantile regression by formulating the QR likelihood through an asymmetric Laplace distribution. Building on this work, Kozumi and Kobayashi [6] proposed an efficient Gibbs sampling method for the QR model by exploiting a location-scale mixture decomposition of the asymmetric Laplace distribution. Alhamzawi [7] explored Bayesian estimation of the composite quantile regression model, while Yuan et al. [8] studied Bayesian composite quantile regression for the single-index model. Additionally, Hu et al. [9] introduced a Bayesian joint quantile regression model. An important aspect of building quantile regression models is the selection of predictor variables. Li et al. [10] approached the regularization of quantile regression from a Bayesian perspective, focusing on variable selection. To address the instability of posterior estimates caused by variable selection in Gibbs sampling and the convergence issues caused by vague priors, Alhamzawi and Yu [11] proposed a stochastic search variable selection (ISSVS) approach within a Bayesian framework. Alhamzawi and Mallick [12] introduced reciprocal lasso (rLasso) regularization, which offers advantages over lasso regularization in estimation, prediction, and variable selection within the Bayesian framework. Considerable progress has thus been made in Bayesian statistical inference and variable selection for quantile regression. However, all the aforementioned studies are based on fully observed data.
Missing data are inevitable in research and can arise from various uncontrollable factors: some measurements may be lost because of machine malfunction, and in questionnaires some respondents may be unwilling to disclose their income. The systematic study of missing data originated in the 1970s, and Little and Rubin [13] defined three missing data mechanisms according to the cause of missingness: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The first two mechanisms, MCAR and MAR, are unrelated to the missing values themselves. The third mechanism, MNAR, is directly associated with the missing values and is the one most often encountered in practice. Ignoring nonignorable missing data, that is, data missing under the MNAR mechanism, can lead to erroneous conclusions. Recognizing the distinctive attributes of such data and the many benefits of quantile regression, several researchers have explored quantile regression models as a means to handle nonignorable missing data. For instance, Yuan and Yin [14] examined Bayesian quantile regression models for longitudinal data with nonignorable missingness. Zhao et al. [15] studied several inverse probability weighting (IPW) estimators for quantile regression models when either the covariates or the response variables are subject to nonignorable missingness. Wang and Tang [16] investigated Bayesian quantile regression with mixed discrete and nonignorable missing covariates. Tang et al. [17] and Tuerde and Tang [18] explored nonlinear dynamic factor analysis models with nonignorable missing data using Bayesian nonparametric and semiparametric approaches, respectively. All of the aforementioned studies adopted the Bayesian framework and carried out inference with the Markov chain Monte Carlo (MCMC) algorithm. However, MCMC has inherent limitations: sampling can be difficult and computationally expensive, especially for large datasets or complex models.
The variational Bayesian algorithm transforms the high-dimensional integration problem in Bayesian inference into an optimization problem, allowing efficient computation of the evidence lower bound and a variational approximation of the posterior distribution, which makes it well suited to analyzing large datasets. The development of variational inference (VI) has also produced tools for high-dimensional data, such as stochastic variational inference (SVI) by Hoffman et al. [19], "black box" variational inference by Ranganath et al. [20], and amortized variational inference by Ganguly et al. [21]. VI has been applied in a range of settings involving missing data, including Bayesian approximate inference for regression models with missing covariates (Faes et al. [22]), parameter uncertainty in dynamic factor models with missing data (Spaanberg [23]), and inference for gene regulatory networks with missing data (Liu et al. [24]). However, none of the existing works have applied the variational Bayesian algorithm to quantile regression (QR) models with missing data.
The main contributions of this paper are the following: (i) a variational Bayesian approach for parameter estimation in quantile regression models; (ii) a variational Bayesian approach for quantile regression models with missing covariates and missing response variables, whose benefits are demonstrated through simulations; and (iii) a variational Bayesian approach for variable selection in quantile regression models with missing data, which identifies the relevant covariates accurately.
This paper is organized as follows: Section 2 presents the variational Bayesian algorithm, the QR model, the lasso penalty, and missing data mechanisms. In Section 3, variational inference is developed for QR in the presence of missing covariates and missing response variables, respectively. In Section 4, variable selection and parameter estimation are carried out for lasso-penalized QR in the presence of missing covariates and missing response variables, respectively. The feasibility of variational Bayes for QR with missing data, and its advantages over the Gibbs sampling algorithm, are illustrated through four experiments in Section 5. Section 6 applies the variational Bayesian algorithm to a real dataset.
2. Model and Notation
2.1. Quantile Regression Model
The quantile regression model was initially introduced by Koenker and Bassett [1], and the model can be represented as follows:
$$y_i = x_i^{\top}\beta_p + \varepsilon_i, \qquad i = 1,\dots,n. \tag{1}$$
Here $x_i$ represents a $k$-dimensional vector of covariates and $y_i$ denotes the response variable. Let $p$ ($0<p<1$) represent the quantile level. The parameter vector $\beta_p$ is a $k$-dimensional vector associated with the quantile level $p$. The random error term is denoted by $\varepsilon_i$; we consider a general distribution for the error term whose $p$-th quantile is zero, so that $P(\varepsilon_i \le 0) = p$. For the quantile regression model, given a specific $p$ and covariate vector $x_i$, the conditional quantile can be expressed as $Q_{y_i}(p \mid x_i) = x_i^{\top}\beta_p$. The estimate of $\beta_p$ can be obtained by minimizing
$$\sum_{i=1}^{n} \rho_p\!\left(y_i - x_i^{\top}\beta\right).$$
The loss function is defined as $\rho_p(u) = u\{p - I(u<0)\}$, where $I(\cdot)$ represents the indicator function. Because the loss function is not differentiable at the origin, obtaining estimates of $\beta_p$ is challenging. Yu and Stander [5] demonstrated that the quantile regression model can be estimated within a Bayesian framework when the error term follows an asymmetric Laplace distribution (ALD). However, the ALD is not a standard distribution, so its posterior density function is complicated and the computational burden inevitably increases. According to Kozumi and Kobayashi [6], the ALD can be expressed as a mixture of exponential and normal distributions. Specifically,
$$\varepsilon_i = k_1 e_i + k_2 \sqrt{\sigma e_i}\, z_i, \tag{2}$$
where $e_i \sim \operatorname{Exp}(\sigma)$ and $z_i \sim N(0,1)$ are independent of each other, with $k_1 = \dfrac{1-2p}{p(1-p)}$ and $k_2^{2} = \dfrac{2}{p(1-p)}$. Finally, the quantile regression model can be expressed as the following hierarchical model:
$$y_i \mid e_i \sim N\!\left(x_i^{\top}\beta_p + k_1 e_i,\; k_2^{2}\sigma e_i\right), \qquad e_i \sim \operatorname{Exp}(\sigma). \tag{3}$$
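To make the mixture representation in (2) and (3) concrete, the following Python sketch draws responses from the hierarchical quantile regression model using the exponential-normal decomposition. It is only an illustration: the function names, the sample size, and the choice σ = 1 are arbitrary, and the constants follow the standard parametrization stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def check_loss(u, p):
    """Quantile check loss rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (u < 0))

def simulate_qr_ald(X, beta, p, sigma=1.0, rng=rng):
    """Draw y_i = x_i' beta + eps_i with eps_i following an asymmetric Laplace
    distribution, generated through the exponential-normal mixture (2)."""
    n = X.shape[0]
    k1 = (1 - 2 * p) / (p * (1 - p))
    k2 = np.sqrt(2 / (p * (1 - p)))
    e = rng.exponential(scale=sigma, size=n)    # latent exponential mixing variables
    z = rng.standard_normal(n)                  # independent standard normal draws
    eps = k1 * e + k2 * np.sqrt(sigma * e) * z  # ALD error via the mixture
    return X @ beta + eps

# Illustrative use: 500 observations, an intercept and one covariate, p = 0.75.
X = np.column_stack([np.ones(500), rng.standard_normal(500)])
y = simulate_qr_ald(X, beta=np.array([1.0, 2.0]), p=0.75)
```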
2.2. Elements of Variational Bayes
Taking into account the latent variables $z$ and the observed values $y$, the joint density function can be expressed as $p(y, z) = p(y \mid z)\,p(z)$.
Variational Bayesian inference no longer relies on sampling to approximate the posterior density of complex problems. Instead, it employs optimization techniques. Within an acceptable range of error, a simpler density function $q(z) \in \mathcal{Q}$ (where $\mathcal{Q}$ represents the space of candidate density functions for the latent variables $z$) is used to approximate the posterior density $p(z \mid y)$. The difference between these two density functions is measured using the Kullback–Leibler divergence
$$\operatorname{KL}\{q(z)\,\|\,p(z \mid y)\} = E_{q}\{\log q(z)\} - E_{q}\{\log p(z \mid y)\}. \tag{4}$$
Alternatively,
$$\operatorname{KL}\{q(z)\,\|\,p(z \mid y)\} = \log p(y) - \big[E_{q}\{\log p(y, z)\} - E_{q}\{\log q(z)\}\big]. \tag{5}$$
The term
$$\operatorname{ELBO}(q) = E_{q}\{\log p(y, z)\} - E_{q}\{\log q(z)\} \tag{6}$$
is referred to as the evidence lower bound. Because $\log p(y)$ does not depend on $q$, minimizing the KL divergence amounts to maximizing the lower bound. This is equivalent to solving the following optimization problem:
$$q^{*}(z) = \arg\max_{q \in \mathcal{Q}} \operatorname{ELBO}(q). \tag{7}$$
By solving the optimization problem coordinate-wise, the best approximation for each factor can be obtained as
$$q_j^{*}(z_j) \propto \exp\!\big[E_{-j}\{\log p(z_j \mid z_{-j}, y)\}\big], \tag{8}$$
or
$$q_j^{*}(z_j) \propto \exp\!\big[E_{-j}\{\log p(y, z)\}\big], \tag{9}$$
where $E_{-j}$ denotes the expectation with respect to all variational factors except $q_j(z_j)$. For convenience, in the following paragraphs this expectation is denoted simply by $E$. Additionally, the complexity of the assumed density of the latent variables determines the complexity of the optimization algorithm. Therefore, we consider the mean-field variational family
$$q(z) = \prod_{j=1}^{J} q_j(z_j).$$
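The coordinate-ascent scheme implied by (7)–(9) can be organized as a generic loop that updates each mean-field factor in turn and monitors the evidence lower bound. The sketch below is purely structural: `update_fns` and `elbo_fn` are placeholders for the model-specific formulas derived in Sections 3 and 4, not part of the paper.

```python
def cavi(init_q, update_fns, elbo_fn, tol=1e-6, max_iter=500):
    """Generic coordinate-ascent variational inference for a mean-field family
    q(z) = prod_j q_j(z_j).

    init_q     : dict of initial variational parameters, one entry per factor.
    update_fns : list of callables; each takes the current dict and returns it
                 with one factor's parameters refreshed (holding the rest fixed).
    elbo_fn    : callable returning the evidence lower bound for the current q.
    """
    q = dict(init_q)
    elbo_old = -float("inf")
    history = []
    for _ in range(max_iter):
        for update in update_fns:          # one sweep over all mean-field factors
            q = update(q)
        elbo = elbo_fn(q)
        history.append(elbo)
        if abs(elbo - elbo_old) < tol:     # stop once the lower bound stabilises
            break
        elbo_old = elbo
    return q, history
```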
2.3. Bayesian Lasso Regularized Quantile Regression
Li and Zhu [25] proposed lasso regularized quantile regression. Specifically, they introduced an $L_1$-norm penalty term to control the complexity of the model, aiming to shrink some regression coefficients to zero. The objective function can be expressed as follows:
$$\min_{\beta}\; \sum_{i=1}^{n} \rho_p\!\left(y_i - x_i^{\top}\beta\right) + \lambda \sum_{j=1}^{k} |\beta_j|. \tag{10}$$
Here, $\lambda \ge 0$ is the regularization parameter used to balance model complexity against fit to the data. The term $\rho_p(\cdot)$ is the check loss proposed by Koenker and Bassett [1], defined as $\rho_p(u) = u\{p - I(u<0)\}$.
Bayesian lasso regularization for quantile regression was introduced by Li et al. [10]; it improves upon standard lasso regularization by incorporating a Bayesian framework, which accommodates outliers in the data and improves robustness. By introducing suitable priors on the regression coefficients, the solution to (10) is equivalent to the Bayesian maximum a posteriori (MAP) estimate.
In this approach, independent Laplace priors are used for the parameters $\beta_j$:
$$\pi(\beta_j \mid \lambda) = \frac{\lambda}{2}\exp\!\left(-\lambda |\beta_j|\right), \qquad j = 1,\dots,k. \tag{11}$$
By using the identity
$$\frac{a}{2}\exp(-a|u|) = \int_0^{\infty} \frac{1}{\sqrt{2\pi s}}\exp\!\left(-\frac{u^{2}}{2s}\right)\frac{a^{2}}{2}\exp\!\left(-\frac{a^{2}s}{2}\right)\mathrm{d}s, \tag{12}$$
where $a>0$, the Laplace prior on $\beta_j$ can be written as a scale mixture of normals with an exponential mixing density:
$$\beta_j \mid s_j \sim N(0, s_j), \qquad s_j \sim \operatorname{Exp}\!\left(\frac{\lambda^{2}}{2}\right), \tag{13}$$
where $\operatorname{Exp}(\lambda^{2}/2)$ denotes the exponential distribution with rate $\lambda^{2}/2$. Considering that the error term follows an asymmetric Laplace distribution (ALD), the posterior distribution of $\beta$ is given by:
(14)
To complete the model, gamma priors are placed on the remaining hyperparameters, resulting in the following hierarchical model:
(15)
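The scale-mixture identity in (12) and (13) can be checked numerically: drawing a variance from an exponential distribution with rate λ²/2 and then a normal deviate with that variance reproduces a Laplace draw with scale 1/λ. A minimal sketch (the value of λ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 1.5                                   # illustrative regularisation parameter
n = 200_000

# Normal-exponential mixture: s ~ Exp(rate = lam^2 / 2), beta | s ~ N(0, s).
s = rng.exponential(scale=2.0 / lam**2, size=n)
beta = rng.normal(loc=0.0, scale=np.sqrt(s))

# Direct Laplace draws with the same scale 1/lam for comparison.
beta_direct = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

print(np.var(beta), np.var(beta_direct))   # both close to 2 / lam**2
```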
2.4. Missing Data
In the context of missing data, a comprehensive study was conducted by Little and Rubin [13], who provided insights into missing data mechanisms and patterns. Missing data mechanisms can be classified into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In the following, we describe each mechanism and introduce the relevant parameters and their priors for modeling the missing data.
- (1). Missing Completely at Random (MCAR): the missingness is related neither to the observed data nor to the missing data themselves; the probability that an entry of the covariates or the response is missing is determined only by the parameters of the missingness model through a link function.
- (2). Missing at Random (MAR): the missingness is related only to the observed data; for example, when a covariate is missing, the probability of missingness is conditional on the observed response and follows a link function.
- (3). Missing Not at Random (MNAR): the missingness is related to the missing data themselves; for example, when a covariate is missing, the probability of missingness depends on that (unobserved) covariate in addition to the observed data and follows a link function.
In the vast majority of previous studies, the emphasis has been on missing at random mechanisms. However, missing data frequently exhibit a connection to the values that are missing, suggesting nonignorable missingness. The study of such nonignorable missing data holds significant value, as disregarding it can lead to erroneous conclusions.
Furthermore, the missing data quantile regression models discussed in Section 3 and Section 4 are hierarchical models that can be represented using probabilistic-directed acyclic graphs (DAGs). In these graphs, the nodes correspond to the parameters in the model, and the arrows illustrate the dependencies between the parameters (Bishop [26]; Wasserman [27]). Specifically, in the DAG presented in this paper, the unshaded nodes correspond to the observed data.
Now, let us return to Equation (3), where the observed values of the response variable and the covariates for the $i$th observation appear. Assuming that both the covariates and the response are prone to nonignorable missingness, but are not missing simultaneously, we can formulate a quantile regression model that accommodates nonignorable missing data. To model the missing data mechanism effectively, we introduce the pertinent parameters and assign their prior distributions; the optimal variational density of each random variable is then derived in Section 3.
3. Variational Bayesian Inference
3.1. Missing Covariate Variables
In Equation (3), we define the matrix:
When the covariate x is missing, we introduce an indicator function to detect if is missing, defined as:
Here, represents the -dimensional subvector consisting of the missing , and . However, may not correspond to the original data in terms of subscripts. Considering the nature of the indicator function, follows a distribution with probability , where denotes the ith row of matrix . We adopt the probit regression model as the link function , thus:
To implement the approach proposed by Albert and Chib [28], we introduce n independent latent variables in the missing data mechanism, where follows a normal distribution with mean and variance 1. For , we define if , and if . The interactions between the regression parameters and the parameters of the missing data mechanism are shown in Figure 1.
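For illustration, the sketch below generates nonignorable missingness indicators from a probit mechanism and evaluates the truncated-normal means of the Albert–Chib latent variables, which are the expectations a variational update for them would require. The design matrix `V` and coefficient vector `phi` are hypothetical stand-ins; the paper's missingness model may condition on a different set of variables.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def simulate_probit_missingness(V, phi, rng=rng):
    """m_i ~ Bernoulli(Phi(v_i' phi)) via the latent a_i ~ N(v_i' phi, 1),
    with m_i = 1 exactly when a_i > 0 (Albert and Chib, 1993)."""
    a = V @ phi + rng.standard_normal(V.shape[0])
    return (a > 0).astype(int)

def truncated_normal_mean(mu, m):
    """E[a_i] when a_i ~ N(mu_i, 1) is truncated to (0, inf) if m_i = 1
    and to (-inf, 0] if m_i = 0."""
    pdf, cdf = norm.pdf(mu), norm.cdf(mu)
    upper = mu + pdf / cdf              # mean of the positive truncation
    lower = mu - pdf / (1.0 - cdf)      # mean of the non-positive truncation
    return np.where(m == 1, upper, lower)

# Illustrative use: missingness of x_i depends on (1, y_i, x_i), i.e. nonignorable.
phi = np.array([-0.5, 0.3, 0.8])
V = np.column_stack([np.ones(1000), rng.standard_normal(1000), rng.standard_normal(1000)])
m = simulate_probit_missingness(V, phi)
Ea = truncated_normal_mean(V @ phi, m)
```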
3.2. Variational Bayesian Inference of QR Model Parameters
Based on the above parameters, we consider the mean field variational family,
In this family, the individual latent variables within each block are mutually independent across observations, so that:
The optimal density function for each parameter is derived as follows. Please refer to Appendix A for a detailed derivation of the optimal density function of each parameter, as well as the calculation of the required expectations.
(16)
(17)
(18)
(19)
(20)
(21)
(22)
For the generalized inverse Gaussian distribution , where is the Bessel function with order p, with
(23)
(24)
(25)
For the inverse gamma distribution, where $\Gamma(\cdot)$ denotes the gamma function, we have
(26)
In addition, the required expectations for the additional variable a are as follows:
(27)
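The expectations in (23)–(26) are routine once the relevant special functions are available. The sketch below computes them under the parametrization GIG(p, a, b) with density proportional to x^(p-1) exp{-(a x + b/x)/2} and the usual shape–rate inverse gamma; the paper's exact parametrizations may differ, so the formulas should be matched to the definitions above before use.

```python
import numpy as np
from scipy.special import kv, digamma  # modified Bessel function K_p and digamma

def gig_moments(p, a, b):
    """E[x] and E[1/x] for GIG(p, a, b), density proportional to
    x^(p-1) exp{-(a*x + b/x)/2}."""
    s = np.sqrt(a * b)
    r = kv(p + 1, s) / kv(p, s)
    mean_x = np.sqrt(b / a) * r
    mean_inv_x = np.sqrt(a / b) * r - 2.0 * p / b
    return mean_x, mean_inv_x

def inverse_gamma_moments(shape, rate):
    """E[x] (finite only for shape > 1), E[1/x] and E[log x] for an inverse gamma
    distribution with density proportional to x^(-shape-1) exp(-rate/x)."""
    mean_x = rate / (shape - 1.0) if shape > 1 else np.inf
    mean_inv_x = shape / rate
    mean_log_x = np.log(rate) - digamma(shape)
    return mean_x, mean_inv_x, mean_log_x
```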
In this model, the evidence lower bound is given by
(28)
where:(29)
The specific algorithm is shown in Algorithm 1.
Algorithm 1: VI with Nonignorable Missing Covariates in QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
3.3. Missing Response Variables
For Equation (3), let us consider the scenario where the response variable contains missing values. Similar to missing covariates, we introduce an indicator function to detect missing response values, defined as:
Here, represents the -dimensional subvector consisting of the missing , and we have . However, does not necessarily correspond to the original data in terms of subscripts.
Similar to Section 3.1, we introduce the indicator function and consider the link function as a logistic regression model. Consequently, we have:
Here, represents the i-th row of the matrix , and is the parameter vector.
Auxiliary Variables
Polson et al. [29] proposed a Bayesian estimation method for logistic regression models by introducing auxiliary variables that follow the Pólya-Gamma distribution. This approach avoids the complex integrals and Metropolis–Hastings (MH) sampling that would otherwise be required.
In their methodology, they consider the following model:
where and . They place a Gaussian prior on , denoted as . To incorporate the auxiliary variable , they introduce it into the model. The posterior density, after adding the auxiliary variable , can be expressed as follows:
Here, represents the likelihood function, is the prior distribution of , is the distribution of the auxiliary variable , and denotes the prior distribution of the covariates .
The specific form of the posterior density and the method of inference depend on the choice of prior distributions, the likelihood function, and the implementation of the Pólya-Gamma auxiliary variables.
where we define the diagonal matrix whose diagonal elements are the auxiliary variables.
Variational Bayesian inference is typically applicable to specific classes of models, particularly conditionally conjugate exponential family models. Logistic regression models with Gaussian priors fall outside this class because there is no conjugacy between the logistic likelihood and the Gaussian prior. To address this limitation, Durante and Rigon [30] proposed introducing an additional variable that follows the Pólya-Gamma distribution. Incorporating this variable into models with logistic components restores conditional conjugacy between the augmented logistic likelihood and the Gaussian prior, which enables Bayesian statistical inference for this class of models with a variational Bayesian algorithm.
In the context of this section, we consider the missing data mechanism as a logistic regression model. Consequently, we introduce additional variables that follow the Pólya-Gamma distribution. The interaction between the parameters is shown in Figure 2.
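Under the conditionally conjugate scheme of Durante and Rigon [30], the optimal variational density of each Pólya-Gamma auxiliary variable is again Pólya-Gamma, and only its mean enters the remaining updates; that mean has the closed form tanh(c/2)/(2c). A small sketch (the argument c stands for the variational tilting parameter of each observation):

```python
import numpy as np

def pg_mean(c):
    """Mean of a Polya-Gamma PG(1, c) random variable: tanh(c/2) / (2c).
    This is the only moment the variational updates for the logistic
    missing-data mechanism need; the limit at c = 0 is 1/4."""
    c = np.asarray(c, dtype=float)
    out = np.full_like(c, 0.25)
    nz = np.abs(c) > 1e-8
    out[nz] = np.tanh(c[nz] / 2.0) / (2.0 * c[nz])
    return out
```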
3.4. Variational Bayesian Inference of QR Model
For the above parameters, we consider the mean field variational family,
Consider the variables , , and . Here, are independent of each other, are independent of each other, and are independent of each other. This follows a similar pattern as discussed in Section 3.1.
In the scenario of missing response variables, the posterior probabilities , , and remain the same as in the presence of missing covariates and are not reiterated here. However, we provide the full conditional probability posterior for , , and . Following the approach of Polson et al. [29], we consider the prior distribution of as .
where , denotes the diagonal matrix with diagonal elements, . The optimal density of each parameter in the missing data mechanism section is as follows:(30)
(31)
where:(32)
(33)
(34)
The optimal density functions of the remaining parameters are as follows; the detailed derivation of the optimal density function for each parameter is similar to that in Section 3.2.
(35)
(36)
(37)
(38)
In this model, the evidence lower bound is given by
(39)
The specific algorithm is shown in Algorithm 2.
Algorithm 2: VI with Nonignorable Missing Response in QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
4. Bayesian Variable Selection of the QR Model
4.1. Covariates with Nonignorable Missing Variable Choices
For Equation (15), the following matrix is defined:
In the context of our analysis, the variable represents the covariate for the i-th observation, while denotes the response variable corresponding to the same observation. When any of the covariates in X, specifically the j-th dimension covariate where j ranges from 1 to k, is missing, we follow a similar approach as described in Section 3.1. We introduce the indicative function to identify the missing covariates and consider the missing data mechanism as a probit regression.
Furthermore, to perform variable selection, we introduce the parameter s and the regularization parameter. The prior distribution of s is exponential, while the regularization parameter is assigned a gamma prior. The parameter interactions are shown in Figure 3.
The optimal densities of the other parameters are similar to those in Algorithm 1 and will not be repeated. In the following, we give the optimal density functions for the remaining parameters, including s.
(1). The full conditional distribution of the parameter is as follows:
where . Therefore, the full conditional distribution of the parameter is , and . From Equation (7), we know that the parameter optimal density function is normally distributed as follows:(40)
where
(2). The full conditional distribution of the parameter s is as follows:
Therefore, the full conditional distribution of the parameter is . From Equation (7), we know that the parameter optimal density function is a generalized inverse Gaussian distribution
(41)
where .
(3). The full conditional distribution of the parameter is as follows:
The optimal density function is known to be the gamma distribution
(42)
where
(4). The full conditional distribution of is as follows:
From the above equation, the optimal density function can be found as
(43)
In this model, the evidence lower bound is given by
(44)
The specific algorithm is shown in Algorithm 3.
4.2. Response with Nonignorable Missing Variable Choices
When only the response variable has nonignorable missing values, similar to Section 3.3, we consider the missing data mechanism as a logistic regression model, again introducing the parameter s and the regularization parameter. The prior distributions for these parameters follow the same approach as described in Section 4.1. The interactions between the parameters are shown in Figure 4.
Algorithm 3: VI with Nonignorable Missing Covariates in Bayesian Lasso-Regularized QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
For the above parameters, we consider the mean-field variational family.
Since the factors are mutually independent, the variational density factorizes accordingly. In addition, the optimal density functions for most parameters are given in Section 4.1, those for the missing-data-mechanism parameters are given in Section 3.3, and only the optimal density function for the remaining parameter is given here, as follows.
From Equation (7), we know that the parameter optimal density function is normally distributed
(45)
The evidence lower bound in this model is as follows, and the specific algorithm is described in Algorithm 4.
Algorithm 4: VI with Nonignorable Missing Response in Bayesian Lasso-Regularized QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
(46)
5. Simulation Studies
In this section, we conducted four simulation experiments to validate the effectiveness of variational inference for parameter estimation in quantile regression models with nonignorable missing data. All computations were run under the Windows operating system on an Intel® Core i5-8400 six-core processor. The experimental data were generated from the following quantile regression models:
In the above equation, and , where . Furthermore, we also performed Gibbs sampling simulations for the different scenarios in Simulations I and II. We compared the results in terms of CPU time, the deviation of the parameter estimates from the true values (BIAS), the root mean square error (RMSE), and the standard deviation (SD). For Simulations I and II, we introduced approximately 10% missing data (denoted M1) and approximately 20% missing data (denoted M2) under different error distributions.
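A small sketch of how such replication summaries can be computed is given below. BIAS is taken here as the absolute difference between the Monte Carlo mean of the estimates and the truth; the paper's exact formulas, which were lost in extraction, may differ slightly.

```python
import numpy as np

def replication_summary(estimates, truth):
    """Summarise Monte Carlo replications.
    estimates : array of shape (R, k), one row of estimates per replication.
    truth     : array of shape (k,), the true parameter values."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    bias = np.abs(estimates.mean(axis=0) - truth)             # |Monte Carlo mean - truth|
    rmse = np.sqrt(((estimates - truth) ** 2).mean(axis=0))   # root mean square error
    sd = estimates.std(axis=0, ddof=1)                        # sampling standard deviation
    return bias, rmse, sd
```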
- Simulation I:
In this simulation, we take or 30, randomly generated from the normal distribution and consider the case of missing covariates . When , we take the parameters , and the truth values of are .
The prior distributions for the parameters and are set as and . We consider different distributions for and examine various deletion cases at different quartiles within each distribution. The details of these cases are as follows: (C1): follows a standard normal distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rate of is about . (C2): follows a mixed normal distribution . The true value of is set as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rate of is about . (C3): follows a chi-squared distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rate of is about . When , We examined the M1 situation, maintaining the parameters and error distributions as previously detailed.
- Simulation II:
In this simulation, we consider a scenario where or 30, . The covariate is randomly generated from a normal distribution . When n = 200, we focus on the case of missing response variable . The true values of the parameters and are set as follows: The prior distributions for the parameters and are set as and . We consider different distributions for and examine various deletion cases at different quartiles within each distribution. The details of these cases are as follows: (C1): follows a standard normal distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rates of are approximately . (C2): follows a mixed normal distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rates of are approximately . (C3): follows a chi-square distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rates of are approximately . When , we examined the M1 situation, maintaining the parameters and error distributions as previously detailed.
By comparing the results with and without the inclusion of the scale parameter in Simulation I and Simulation II, when , we observed the following findings for the different missing cases and error distributions. The parameter estimation results obtained using the VI algorithm for Simulation I and Simulation II are presented in Table 1 and Table 2, respectively, while the parameter estimation results obtained using Gibbs sampling are displayed in Table A1 and Table A2 (Appendix B). The Bayesian estimates obtained by the proposed method perform satisfactorily in that (i) they have smaller BIAS, RMSE, and SD, with the SD and RMSE values close to each other; and (ii) the comparison between Table 1 and Table A1, and between Table 2 and Table A2, reveals that the different error distributions, the various missing cases, and the inclusion of the scale parameter have a significant impact on CPU time. When considering covariates with nonignorable missingness and including the scale parameter, VI is more than ten times faster than Gibbs; without the scale parameter, VI is nearly 50 times faster. Similarly, in the presence of nonignorable missing response variables, VI is nearly 50 times faster than Gibbs when the scale parameter is included and almost 80 times faster without it. This means that our proposed variational Bayesian method is faster than MCMC while maintaining the accuracy of the parameter estimates. When , although VI is almost a hundred times faster than Gibbs, its accuracy is somewhat lower; this difference is even more noticeable when the response variable is missing. Details are given in Table 3.
Based on the good performance of variational inference observed in Simulation I and Simulation II, we employed the Bayesian lasso for variable selection in models with missing covariates and missing response variables in Simulations III and IV, respectively. Furthermore, variational Bayesian inference achieved convergence in both Simulations I and II. For convenience, we provide the convergence results of C1 at different quantiles of M1 when the covariates have missing values and when the response variables have missing values, as depicted in Figure 5; Figure 6 compares the run times.
- Simulation III:
In this simulation, we utilize a Bayesian lasso quantile regression model to perform variable selection and parameter estimation in the presence of missing covariates. The data generation process follows the equation , where we omit the scale parameter . The true values are set as , , , with . The covariates are independently sampled from a normal distribution , and we take with a prior distribution of . The error term is considered under three different distributions, similar to Simulations I and II, when is missing. (C1), the error term follows a standard normal distribution . By setting the true values as , the missing rates of are approximately . (C2), the error term follows a mixed normal distribution . Taking the true values as , the missing rates of are approximately . (C3), the error term follows a chi-square distribution . With the true values , the missing rates of are approximately .
- Simulation IV:
In this simulation, we employ a Bayesian lasso-quantile regression model to handle variable selection and parameter estimation in the presence of missing response variables. The data generation process is similar to Simulation III and does not include the scale parameter . We set and assume a prior distribution of . The error term is considered under three different distributions, following the pattern of Simulation I and Simulation II when is missing. (C1), the error term follows a standard normal distribution . With the true values set as , the missing rates of are approximately . (C2), the error term follows a mixed normal distribution . Taking the true values as , the missing rates of are approximately . (C3), the error term follows a chi-square distribution . With the true values , the missing rates of are about . In Simulation III and Simulation IV, each setting was repeated 100 times, and we evaluated the performance of variable selection and parameter estimation using the following metrics (a computational sketch of these metrics is given after the results below): (1) the L2 distance between the parameter estimates and the true values; (2) the mean square error (MSE); (3) the number of parameters whose true value is zero and that are correctly identified as zero, recorded as "C"; (4) the number of parameters whose true value is not zero but that are incorrectly identified as zero, recorded as "IC".
We investigated three quantile levels (0.25, 0.5, 0.75), as shown in Table 4 and Table 5, and obtained the following results:
(1). The "IC" value is extremely close to zero, and "C" closely approximates the number of zero components in the true parameter vector.
(2). The L2 values at the three quantile levels are below 0.03, indicating a negligible distance from the true parameter values.
(3). The “MSE” value at all three quantile levels is less than 0.09.
These findings demonstrate the strong performance of our proposed VI for variable selection.
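For completeness, the variable-selection bookkeeping used in Simulations III and IV can be computed as in the sketch below. The zero threshold `tol` is an illustrative choice, since the paper does not state how an estimated coefficient is declared zero.

```python
import numpy as np

def selection_counts(beta_hat, beta_true, tol=1e-3):
    """Variable-selection summary for one replication.
    C  : number of truly zero coefficients correctly estimated as (numerically) zero.
    IC : number of truly nonzero coefficients incorrectly estimated as zero."""
    est_zero = np.abs(np.asarray(beta_hat)) < tol
    true_zero = np.abs(np.asarray(beta_true)) < tol
    C = int(np.sum(est_zero & true_zero))
    IC = int(np.sum(est_zero & ~true_zero))
    return C, IC
```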
6. A Real Example
In this section, we demonstrate the application of Algorithm 3 using data obtained from the American Press Institute, specifically the 1995 American University News Report dataset. The dataset can be accessed at
We select the following factors for analysis: the public/private indicator as , the student-to-faculty ratio as , the natural logarithm of the number of applicants accepted as , the average Combined SAT score as , the average ACT score as , the natural logarithm of the number of applications received as , the natural logarithm of the number of new students enrolled as , room and board costs as , and the natural logarithm of instructional expenditure per student as .
To ensure a complete set of observations, we assume , , and y to be fully observed. However, we remove one erroneous case from the response variable y, resulting in a total of 1203 cases. The final missing rates for the remaining variables are as follows: (0.41%), (39.23%), (45.22%), (0.49%), (0.16%), (4.90%), and (2.07%).
Furthermore, to facilitate statistical inferences on the data, we apply two preprocessing steps. First, we take the natural logarithm of the variables , , , , and . Second, we standardize all the variables to eliminate any scale differences among them. These preprocessing steps ensure that the data are appropriately transformed and normalized for subsequent analysis and inference.
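A minimal pandas sketch of these two preprocessing steps is given below; the column names are placeholders, and missing entries are left as NaN so that the missing-data mechanism, rather than ad hoc imputation, handles them.

```python
import numpy as np
import pandas as pd

def preprocess(df, log_cols):
    """Take natural logs of the count-type columns, then standardise every column
    (pandas mean/std skip NaN, so missing entries stay missing)."""
    out = df.copy()
    out[log_cols] = np.log(out[log_cols])
    return (out - out.mean()) / out.std(ddof=0)

# Hypothetical usage with placeholder column names:
# data = preprocess(raw, log_cols=["apps_received", "apps_accepted", "new_enrolled"])
```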
We consider the following quantile regression model.
where an intercept term is included. The covariates with missing values are assumed to follow normal distributions whose means and variances are set to the sample means and variances of the observed data. The missing data mechanism is specified as in Algorithm 3, and the missingness above is treated as nonignorable. Based on the converged lower bound and the final parameter estimates, we find that (1) some covariates have only a small effect on the response variable, so the 0.5 quantile regression can be written with those terms omitted; and (2) among the variables with missing values, the estimated missingness mechanisms indicate MAR for some and MNAR for others. The convergence of the lower bound is shown in Figure 7, and the Bayesian estimates (EST) and 95% confidence intervals (CI) of the parameters are reported in Table 6.

7. Discussion
In existing research on quantile regression models with nonignorable missing data, most studies have relied on the Markov chain Monte Carlo (MCMC) algorithm for Bayesian inference, despite drawbacks such as sampling difficulties and long computation times. In this paper, we propose employing the variational Bayesian algorithm for statistical inference in such models while also incorporating variable selection. Specifically, in the presence of missing covariates we model the missing data mechanism with a probit regression, and for a missing response variable we use a logistic regression; in both cases we employ lasso regularization for variable selection. Simulation studies and a real-data example demonstrate that when covariates or response variables have nonignorable missing values in quantile regression models, the variational Bayesian method maintains inferential accuracy while consuming far less time than MCMC.
In practice, we often encounter situations where both the covariates and the response variables have missing values; the variational Bayesian approach developed above then faces a well-known ill-posed problem. Exploring variational Bayesian parameter estimation for quantile regression models with simultaneously missing covariates and response variables is therefore of interest.
Our proposed method has some potential drawbacks. For instance, as the number of dimensions increases, the convergence speed of variational Bayes may slow down. Furthermore, the calculation of expectations may lack an analytical solution, and assumptions about the correctness of the missing data mechanism must be considered. To solve these issues, we can employ commonly used machine learning techniques like deep learning and neural networks to address missing data mechanisms. Furthermore, in the future, the stochastic variational Bayesian method can enhance the algorithm’s efficiency, while the Bayesian dimensionality reduction approach can tackle issues related to high-dimensional quantile regression problems.
Conceptualization, M.T.; methodology, X.L. and M.T.; software, X.L.; data curation, X.L.; writing—original draft, X.L.; supervision, M.T. and X.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.
Not applicable.
We are grateful to the editor, associated editor, and four referees for their valuable suggestions and comments that greatly improved the article, grateful to both StatLib and the original contributor of the material.
The authors declare no conflict of interest.
Figure 3. DAG for the missing covariates for Bayesian lasso regularized quantile regression model.
Figure 4. DAG for the missing response for Bayesian lasso regularized quantile regression model.
Figure 6. Comparison of CPU time of variational Bayesian over Gibbs when covariates with missing.
Parameter estimation and CPU time cases in Simulation I using VI,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
without | C1 | M1 | 0.25 | 0.006 | 0.078 | 0.062 | 0.138 | 0.061 | 0.114 | 124.4 |
0.5 | 0.049 | 0.040 | 0.079 | 0.109 | 0.062 | 0.101 | 111 | |||
0.75 | 0.104 | 0.0001 | 0.126 | 0.120 | 0.071 | 0.121 | 133.2 | |||
M2 | 0.25 | 0.014 | 0.082 | 0.069 | 0.152 | 0.068 | 0.128 | 185 | ||
0.5 | 0.071 | 0.022 | 0.095 | 0.117 | 0.063 | 0.116 | 176.6 | |||
0.75 | 0.129 | 0.001 | 0.147 | 0.134 | 0.072 | 0.136 | 201.8 | |||
C2 | M1 | 0.25 | 0.066 | 0.076 | 0.088 | 0.132 | 0.059 | 0.109 | 138.4 | |
0.5 | 0.056 | 0.039 | 0.076 | 0.101 | 0.050 | 0.095 | 139.8 | |||
0.75 | 0.036 | 0.030 | 0.079 | 0.105 | 0.072 | 0.101 | 137.4 | |||
M2 | 0.25 | 0.068 | 0.077 | 0.088 | 0.127 | 0.055 | 0.101 | 178.2 | ||
0.5 | 0.063 | 0.050 | 0.084 | 0.123 | 0.053 | 0.109 | 165 | |||
0.75 | 0.053 | 0.039 | 0.083 | 0.120 | 0.064 | 0.113 | 236.6 | |||
C3 | M1 | 0.25 | 0.028 | 0.040 | 0.058 | 0.101 | 0.053 | 0.096 | 124.8 | |
0.5 | 0.081 | 0.015 | 0.142 | 0.111 | 0.073 | 0.126 | 139.6 | |||
0.75 | 0.121 | 0.010 | 0.136 | 0.131 | 0.117 | 0.208 | 136.6 | |||
M2 | 0.25 | 0.045 | 0.019 | 0.121 | 0.130 | 0.062 | 0.103 | 172.2 | ||
0.5 | 0.110 | 0.018 | 0.124 | 0.109 | 0.072 | 0.153 | 158.8 | |||
0.75 | 0.137 | 0.018 | 0.105 | 0.162 | 0.108 | 0.109 | 268.4 | |||
with | C1 | M1 | 0.25 | 0.025 | 0.009 | 0.062 | 0.074 | 0.058 | 0.074 | 387.83 |
0.5 | 0.048 | 0.019 | 0.066 | 0.083 | 0.045 | 0.081 | 342.17 | |||
0.75 | 0.035 | 0.011 | 0.054 | 0.088 | 0.040 | 0.088 | 359.41 | |||
M2 | 0.25 | 0.045 | 0.011 | 0.062 | 0.085 | 0.043 | 0.085 | 1068.91 | ||
0.5 | 0.088 | 0.025 | 0.097 | 0.082 | 0.041 | 0.079 | 963.02 | |||
0.75 | 0.125 | 0.015 | 0.133 | 0.093 | 0.046 | 0.093 | 955.78 | |||
C2 | M1 | 0.25 | 0.098 | 0.009 | 0.105 | 0.074 | 0.038 | 0.074 | 384.02 | |
0.5 | 0.059 | 0.004 | 0.070 | 0.063 | 0.039 | 0.064 | 335.35 | |||
0.75 | 0.039 | 0.004 | 0.057 | 0.072 | 0.041 | 0.073 | 385.50 | |||
M2 | 0.25 | 0.098 | 0.016 | 0.105 | 0.075 | 0.040 | 0.073 | 1019.75 | ||
0.5 | 0.078 | 0.008 | 0.088 | 0.076 | 0.039 | 0.076 | 885.35 | |||
0.75 | 0.069 | 0.013 | 0.084 | 0.083 | 0.048 | 0.083 | 925.44 | |||
C3 | M1 | 0.25 | 0.026 | 0.004 | 0.035 | 0.046 | 0.023 | 0.046 | 320.22 | |
0.5 | 0.072 | 0.004 | 0.085 | 0.075 | 0.046 | 0.075 | 318.81 | |||
0.75 | 0.151 | 0.039 | 0.162 | 0.109 | 0.059 | 0.102 | 414.64 | |||
M2 | 0.25 | 0.045 | 0.006 | 0.054 | 0.050 | 0.029 | 0.050 | 903.49 | ||
0.5 | 0.117 | 0.037 | 0.125 | 0.086 | 0.042 | 0.078 | 865.73 | |||
0.75 | 0.175 | 0.036 | 0.186 | 0.130 | 0.063 | 0.125 | 1109.56 |
Parameter estimation and CPU time cases in Simulation II using VI,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
with | C1 | M1 | 0.25 | 0.002 | 0.051 | 0.042 | 0.095 | 0.042 | 0.081 | 153.4 |
0.5 | 0.063 | 0.019 | 0.074 | 0.072 | 0.038 | 0.070 | 132.2 | |||
0.75 | 0.128 | 0.058 | 0.132 | 0.098 | 0.030 | 0.079 | 152.6 | |||
M2 | 0.25 | 0.011 | 0.025 | 0.035 | 0.086 | 0.033 | 0.082 | 211.2 | ||
0.5 | 0.076 | 0.025 | 0.085 | 0.088 | 0.038 | 0.084 | 190.8 | |||
0.75 | 0.151 | 0.019 | 0.156 | 0.079 | 0.042 | 0.077 | 200 | |||
C2 | M1 | 0.25 | 0.015 | 0.003 | 0.026 | 0.043 | 0.021 | 0.043 | 115.8 | |
0.5 | 0.003 | 0.001 | 0.020 | 0.038 | 0.020 | 0.039 | 103.6 | |||
0.75 | 0.077 | 0.009 | 0.079 | 0.057 | 0.020 | 0.057 | 148.4 | |||
M2 | 0.25 | 0.038 | 0.022 | 0.045 | 0.056 | 0.023 | 0.051 | 168.4 | ||
0.5 | 0.048 | 0.032 | 0.040 | 0.030 | 0.019 | 0.030 | 157.2 | |||
0.75 | 0.117 | 0.030 | 0.012 | 0.064 | 0.029 | 0.055 | 173 | |||
C3 | M1 | 0.25 | 0.015 | 0.003 | 0.026 | 0.043 | 0.021 | 0.043 | 127.8 | |
0.5 | 0.003 | 0.001 | 0.020 | 0.038 | 0.020 | 0.039 | 84 | |||
0.75 | 0.077 | 0.009 | 0.079 | 0.057 | 0.020 | 0.057 | 134.6 | |||
M2 | 0.25 | 0.043 | 0.022 | 0.053 | 0.055 | 0.030 | 0.051 | 150 | ||
0.5 | 0.048 | 0.032 | 0.054 | 0.057 | 0.026 | 0.048 | 134 | |||
0.75 | 0.117 | 0.032 | 0.121 | 0.064 | 0.029 | 0.055 | 214 | |||
without | C1 | M1 | 0.25 | 0.024 | 0.060 | 0.038 | 0.102 | 0.031 | 0.087 | 69.4 |
0.5 | 0.066 | 0.055 | 0.078 | 0.078 | 0.031 | 0.055 | 58.8 | |||
0.75 | 0.057 | 0.061 | 0.065 | 0.088 | 0.030 | 0.064 | 96 | |||
M2 | 0.25 | 0.049 | 0.090 | 0.128 | 0.116 | 0.064 | 0.076 | 88.4 | ||
0.5 | 0.037 | 0.087 | 0.040 | 0.100 | 0.030 | 0.051 | 77.6 | |||
0.75 | 0.049 | 0.018 | 0.057 | 0.185 | 0.028 | 0.046 | 113.5 | |||
C2 | M1 | 0.25 | 0.027 | 0.012 | 0.028 | 0.045 | 0.008 | 0.046 | 65 | |
0.5 | 0.019 | 0.019 | 0.025 | 0.031 | 0.015 | 0.025 | 57.8 | |||
0.75 | 0.002 | 0.013 | 0.017 | 0.037 | 0.017 | 0.035 | 81.6 | |||
M2 | 0.25 | 0.102 | 0.051 | 0.104 | 0.070 | 0.022 | 0.048 | 105.17 | ||
0.5 | 0.078 | 0.050 | 0.081 | 0069 | 0.021 | 0.047 | 97.80 | |||
0.75 | 0.057 | 0.044 | 0.061 | 0.070 | 0.023 | 0.054 | 120.80 | |||
C3 | M1 | 0.25 | 0.010 | 0.036 | 0.023 | 0.072 | 0.022 | 0.066 | 83.8 | |
0.5 | 0.009 | 0.015 | 0.024 | 0.040 | 0.023 | 0.038 | 86.4 | |||
0.75 | 0.076 | 0.027 | 0.080 | 0.059 | 0.025 | 0.052 | 103.2 | |||
M2 | 0.25 | 0.033 | 0.011 | 0.041 | 0.070 | 0.026 | 0.074 | 108 | ||
0.5 | 0.052 | 0.042 | 0.057 | 0.066 | 0.024 | 0.051 | 112.2 | |||
0.75 | 0.122 | 0.055 | 0.012 | 0.083 | 0.026 | 0.062 | 150.6 |
Parameter estimation and CPU time cases in Simulation I and Simulation II,
Simulation | Case | Method | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
Simulation I | C1 | VI | 0.25 | 0.199 | 0.412 | 0.244 | 0.481 | 0.143 | 0.262 | 18.4 |
0.5 | 0.032 | 0.250 | 0.148 | 0.344 | 0.149 | 0.232 | 22.2 | |||
0.75 | 0.033 | 0.264 | 0.111 | 0.346 | 0.108 | 0.226 | 25.6 | |||
Gibbs | 0.25 | 0.033 | 0.065 | 0.123 | 0.267 | 0.128 | 0.264 | 3353.2 | ||
0.5 | 0.022 | 0.070 | 0.104 | 0.261 | 0.108 | 0.253 | 3347.8 | |||
0.75 | 0.035 | 0.141 | 0.132 | 0.268 | 0.105 | 0.223 | 4275 | |||
C2 | VI | 0.25 | 0.086 | 0.333 | 0.145 | 0.116 | 0.113 | 0.256 | 15.8 | |
0.5 | 0.065 | 0.281 | 0.11 | 0.365 | 0.090 | 0.239 | 17.6 | |||
0.75 | 0.007 | 0.299 | 0.099 | 0.357 | 0.093 | 0.257 | 18.4 | |||
Gibbs | 0.25 | 0.038 | 0.062 | 0.125 | 0.266 | 0.123 | 0.251 | 3668.4 | ||
0.5 | 0.018 | 0.132 | 0.100 | 0.230 | 0.119 | 0.230 | 3357.2 | |||
0.75 | 0.117 | 0.112 | 0.192 | 0.264 | 0.169 | 0.235 | 3873 | |||
C3 | VI | 0.25 | 0.145 | 0.433 | 0.186 | 0.493 | 0.121 | 0.243 | 16.8 | |
0.5 | 0.056 | 0.271 | 0.120 | 0.358 | 0.090 | 0.229 | 18 | |||
0.75 | 0.037 | 0.239 | 0.149 | 0.327 | 0.130 | 0.237 | 17.6 | |||
Gibbs | 0.25 | 0.063 | 0.082 | 0.133 | 0.225 | 0.110 | 0.211 | 4150 | ||
0.5 | 0.158 | 0.172 | 0.234 | 0.357 | 0.176 | 0.318 | 3834 | |||
0.75 | 0.127 | 0.192 | 0.361 | 0.464 | 0.359 | 0.425 | 3614 | |||
Simulation II | C1 | VI | 0.25 | 0.124 | 0.350 | 0.178 | 0.442 | 0.121 | 0.277 | 14.7 |
0.5 | 0.046 | 0.275 | 0.138 | 0.418 | 0.131 | 0.315 | 16.7 | |||
0.75 | 0.005 | 0.021 | 0.145 | 0.348 | 0.140 | 0.284 | 17 | |||
Gibbs | 0.25 | 0.030 | 0.040 | 0.088 | 0.116 | 0.074 | 0.126 | 1450.4 | ||
0.5 | 0.007 | 0.017 | 0.050 | 0.090 | 0.060 | 0.091 | 1473.6 | |||
0.75 | 0.007 | 0.030 | 0.077 | 0.145 | 0.078 | 0.146 | 1539.5 | |||
C2 | VI | 0.25 | 0.167 | 0.312 | 0.198 | 0.425 | 0.118 | 0.286 | 15.2 | |
0.5 | 0.009 | 0.269 | 0.155 | 0.341 | 0.155 | 0.225 | 15.8 | |||
0.75 | 0.042 | 0.273 | 0.187 | 0.372 | 0.187 | 0.275 | 16.2 | |||
Gibbs | 0.25 | 0.062 | 0.006 | 0.084 | 0.050 | 0.082 | 0.048 | 1566.17 | ||
0.5 | 0.028 | 0.005 | 0.004 | 0.059 | 0.031 | 0.057 | 1600.80 | |||
0.75 | 0.003 | 0.024 | 0.021 | 0.060 | 0.023 | 0.054 | 1666.80 | |||
C3 | VI | 0.25 | 0.090 | 0.246 | 0.123 | 0.312 | 0.092 | 0.266 | 15.4 | |
0.5 | 0.059 | 0.135 | 0.193 | 0.290 | 0.193 | 0.268 | 16.3 | |||
0.75 | 0.009 | 0.272 | 0.240 | 0.499 | 0.225 | 0.422 | 15.9 | |||
Gibbs | 0.25 | 0.033 | 0.001 | 0.041 | 0.060 | 0.046 | 0.064 | 1600 | ||
0.5 | 0.012 | 0.008 | 0.057 | 0.086 | 0.044 | 0.081 | 1600.2 | |||
0.75 | 0.012 | 0.025 | 0.062 | 0.003 | 0.066 | 0.082 | 1500.6 |
Results of variable selection for Simulation III.
Case | L2 (p = 0.25) | MSE (p = 0.25) | C (p = 0.25) | IC (p = 0.25) | L2 (p = 0.5) | MSE (p = 0.5) | C (p = 0.5) | IC (p = 0.5) | L2 (p = 0.75) | MSE (p = 0.75) | C (p = 0.75) | IC (p = 0.75)
---|---|---|---|---|---|---|---|---|---|---|---|---
C1 | 0.0276 | 0.0072 | 4.85 | 0 | 0.0056 | 0.0022 | 4.94 | 0 | 0.0251 | 0.0080 | 4.94 | 0 |
C2 | 0.0252 | 0.0080 | 4.89 | 0 | 0.0063 | 0.0062 | 4.93 | 0 | 0.0184 | 0.0074 | 4.88 | 0 |
C3 | 0.0134 | 0.0048 | 4.86 | 0 | 0.0048 | 0.0019 | 4.75 | 0 | 0.0124 | 0.0045 | 4.9 | 0 |
Results of variable selection for Simulation IV.
Case | L2 (p = 0.25) | MSE (p = 0.25) | C (p = 0.25) | IC (p = 0.25) | L2 (p = 0.5) | MSE (p = 0.5) | C (p = 0.5) | IC (p = 0.5) | L2 (p = 0.75) | MSE (p = 0.75) | C (p = 0.75) | IC (p = 0.75)
---|---|---|---|---|---|---|---|---|---|---|---|---
C1 | 0.0369 | 0.0118 | 4.93 | 0 | 0.0035 | 0.0011 | 4.93 | 0 | 0.0170 | 0.0053 | 4.95 | 0 |
C2 | 0.0071 | 0.0021 | 4.95 | 0 | 0.0036 | 0.0012 | 4.89 | 0 | 0.0063 | 0.0020 | 4.93 | 0 |
C3 | 0.0174 | 0.0053 | 4.97 | 0 | 0.0044 | 0.0016 | 4.91 | 0 | 0.0054 | 0.0018 | 4.95 | 0 |
Bayesian estimates (EST) and 95% confidence intervals (CI) of the parameters in real example.
Par | Est (CI) | Par | Est (CI)
---|---|---|---
– | −0.0334 (−0.0369,−0.0300) | – | 0.2552 (0.2538,0.2567)
– | 0.3947 (0.3885,0.4009) | – | −0.1527 (−0.1622,0.1432)
– | −0.0500 (−0.1102,0.0105) | – | 0.1066 (0.1066,0.1067)
– | 0.0121 (−0.1023,0.1265) | – | −0.0284 (−0.0307,−0.0261)
– | 0.3185 (0.3114,0.3256) | – | 1.5119 (1.5117,1.5121)
– | 0.0649 (0.0616,0.0682) | – | 0.0036 (0.0028,0.0044)
– | 0.0335 (−0.0428,0.1138) | – | 1.5435 (1.5344,1.5346)
– | −0.0600 (−0.1299,0.0098) | – | 0.0002 (0,0.0003)
– | 0.1480 (0.1387,0.1573) | – | 1.2650 (1.2648,1.2652)
– | 0.0340 (0.0208,0.0472) | – | −0.0416 (−0.0442,−0.0399)
– | 1.5177 (1.5174,1.5180) | – | 1.4132 (1.4130,1.4133)
– | 0.0024 (0.0014,0.0034) | – | −0.0042 (−0.0061,−0.0024)
Appendix A
The derivations of Algorithms 1–4 are similar to each other; thus, we only refine the steps of Algorithm 1. The full conditional probability posterior are as follows:
In Step 4 of Algorithm 1,
In Step 7 of Algorithm 1,
Expressions for
In other words, by replacing
It is known that
Appendix B
Parameter estimation and CPU time cases in Simulation I using Gibbs,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
with | C1 | M1 | 0.25 | 0.009 | 0.034 | 0.059 | 0.107 | 0.059 | 0.102 | 7273 |
0.5 | 0.013 | 0.047 | 0.055 | 0.092 | 0.054 | 0.079 | 8157 | |||
0.75 | 0.039 | 0.114 | 0.076 | 0.162 | 0.066 | 0.115 | 8925 | |||
M2 | 0.25 | 0.010 | 0.078 | 0.070 | 0.141 | 0.070 | 0.119 | 16,166 | ||
0.5 | 0.054 | 0.046 | 0.078 | 0.109 | 0.056 | 0.099 | 16,270 | |||
0.75 | 0.075 | 0.017 | 0.098 | 0.112 | 0.063 | 0.111 | 15,185 | |||
C2 | M1 | 0.25 | 0.065 | 0.045 | 0.089 | 0.120 | 0.062 | 0.112 | 8202 | |
0.5 | 0.011 | 0.051 | 0.056 | 0.122 | 0.055 | 0.111 | 8962 | |||
0.75 | 0.029 | 0.077 | 0.060 | 0.134 | 0.053 | 0.110 | 9922 | |||
M2 | 0.25 | 0.085 | 0.095 | 0.115 | 0.159 | 0.076 | 0.127 | 16,220 | ||
0.5 | 0.048 | 0.145 | 0.080 | 0.179 | 0.063 | 0.105 | 15,989 | |||
0.75 | 0.038 | 0.193 | 0.078 | 0.109 | 0.067 | 0.122 | 15,570 | |||
C3 | M1 | 0.25 | 0.077 | 0.034 | 0.078 | 0.062 | 0.029 | 0.055 | 8646 | |
0.5 | 0.167 | 0.088 | 0.165 | 0.117 | 0.042 | 0.085 | 9535 | |||
0.75 | 0.248 | 0.131 | 0.123 | 0.102 | 0.060 | 0.131 | 8627 | |||
M2 | 0.25 | 0.137 | 0.037 | 0.127 | 0.093 | 0.040 | 0.090 | 16,323 | ||
0.5 | 0.129 | 0.078 | 0.130 | 0.103 | 0.042 | 0.074 | 16,661 | |||
0.75 | 0.139 | 0.105 | 0.168 | 0.126 | 0.044 | 0.081 | 16,555 | |||
without | C1 | M1 | 0.25 | 0.038 | 0.010 | 0.056 | 0.073 | 0.041 | 0.073 | 5721 |
0.5 | 0.071 | 0.005 | 0.081 | 0.070 | 0.038 | 0.070 | 6958 | |||
0.75 | 0.112 | 0.012 | 0.120 | 0.078 | 0.045 | 0.078 | 5717 | |||
M2 | 0.25 | 0.038 | 0.101 | 0.056 | 0.073 | 0.041 | 0.073 | 12,503 | ||
0.5 | 0.071 | 0.005 | 0.081 | 0.070 | 0.081 | 0.070 | 12,324 | |||
0.75 | 0.112 | 0.012 | 0.120 | 0.073 | 0.045 | 0.078 | 12,901 | |||
C2 | M1 | 0.25 | 0.101 | 0.004 | 0.108 | 0.070 | 0.108 | 0.070 | 6989 | |
0.5 | 0.101 | 0.008 | 0.108 | 0.070 | 0.038 | 0.069 | 6934 | |||
0.75 | 0.033 | 0.007 | 0.049 | 0.081 | 0.036 | 0.081 | 6892 | |||
M2 | 0.25 | 0.110 | 0.010 | 0.116 | 0.073 | 0.039 | 0.073 | 12,901 | ||
0.5 | 0.067 | 0.048 | 0.079 | 0.088 | 0.041 | 0.074 | 12,484 | |||
0.75 | 0.083 | 0.046 | 0.092 | 0.100 | 0.042 | 0.089 | 12,560 | |||
C3 | M1 | 0.25 | 0.029 | 0.001 | 0.038 | 0.055 | 0.025 | 0.055 | 6922 | |
0.5 | 0.082 | 0.007 | 0.091 | 0.081 | 0.040 | 0.081 | 6888 | |||
0.75 | 0.142 | 0.008 | 0.153 | 0.105 | 0.058 | 0.105 | 6896 | |||
M2 | 0.25 | 0.047 | 0.016 | 0.056 | 0.052 | 0.031 | 0.049 | 12,359 | ||
0.5 | 0.113 | 0.032 | 0.120 | 0.089 | 0.042 | 0.084 | 12,426 | |||
0.75 | 0.175 | 0.044 | 0.187 | 0.118 | 0.065 | 0.110 | 12,462 |
Parameter estimation and CPU time cases in Simulation II using Gibbs,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
with | C1 | M1 | 0.25 | 0.032 | 0.019 | 0.042 | 0.045 | 0.028 | 0.041 | 6159.5 |
0.5 | 0.001 | 0.044 | 0.023 | 0.055 | 0.023 | 0.033 | 6180.5 | |||
0.75 | 0.047 | 0.12 | 0.054 | 0.13 | 0.027 | 0.05 | 9567.5 | |||
M2 | 0.25 | 0.113 | 0.09 | 0.128 | 0.116 | 0.064 | 0.076 | 9144.5 | ||
0.5 | 0.037 | 0.087 | 0.04 | 0.1 | 0.03 | 0.051 | 10,985 | |||
0.75 | 0.049 | 0.018 | 0.057 | 0.185 | 0.028 | 0.046 | 10,882 | |||
C2 | M1 | 0.25 | 0.025 | 0.027 | 0.026 | 0.034 | 0.01 | 0.021 | 7521 | |
0.5 | 0.007 | 0.02 | 0.012 | 0.024 | 0.009 | 0.013 | 6396.5 | |||
0.75 | 0.006 | 0.032 | 0.012 | 0.037 | 0.01 | 0.018 | 7632 | |||
M2 | 0.25 | 0.083 | 0.087 | 0.084 | 0.087 | 0.028 | 0.05 | 9013 | ||
0.5 | 0.019 | 0.036 | 0.021 | 0.04 | 0.01 | 0.018 | 8655.5 | |||
0.75 | 0.017 | 0.061 | 0.021 | 0.067 | 0.011 | 0.028 | 9447 | |||
C3 | M1 | 0.25 | 0.085 | 0.05 | 0.089 | 0.064 | 0.028 | 0.042 | 7583 | |
0.5 | 0.022 | 0.041 | 0.032 | 0.057 | 0.024 | 0.039 | 6798.5 | |||
0.75 | 0.022 | 0.076 | 0.035 | 0.089 | 0.027 | 0.046 | 7240 | |||
M3 | 0.25 | 0.033 | 0.013 | 0.041 | 0.044 | 0.025 | 0.041 | 11,252.5 | ||
0.5 | 0.029 | 0.083 | 0.035 | 0.091 | 0.018 | 0.036 | 11,173.5 | |||
0.75 | 0.047 | 0.011 | 0.053 | 0.117 | 0.023 | 0.044 | 11,873.5 | |||
without | C1 | M1 | 0.25 | 0.034 | 0.032 | 0.047 | 0.06 | 0.032 | 0.051 | 4545 |
0.5 | 0.019 | 0.013 | 0.034 | 0.047 | 0.028 | 0.045 | 4625 | |||
0.75 | 0.007 | 0.037 | 0.031 | 0.068 | 0.03 | 0.057 | 5358 | |||
M2 | 0.25 | 0.079 | 0.03 | 0.112 | 0.099 | 0.105 | 0.108 | 8190 | ||
0.5 | 0.082 | 0.012 | 0.089 | 0.056 | 0.033 | 0.055 | 8599.5 | |||
0.75 | 0.062 | 0.023 | 0.076 | 0.052 | 0.039 | 0.047 | 8877.5 | |||
C2 | M1 | 0.25 | 0.027 | 0.008 | 0.031 | 0.027 | 0.014 | 0.025 | 4754 | |
0.5 | 0.023 | 0.001 | 0.033 | 0.029 | 0.023 | 0.029 | 4944.5 | |||
0.75 | 0.028 | 0.001 | 0.04 | 0.05 | 0.029 | 0.05 | 5175 | |||
M2 | 0.25 | 0.04 | 0.003 | 0.044 | 0.031 | 0.018 | 0.032 | 8334.5 | ||
0.5 | 0.035 | 0.002 | 0.04 | 0.03 | 0.019 | 0.03 | 8072.5 | |||
0.75 | 0.036 | 0.004 | 0.049 | 0.043 | 0.031 | 0.043 | 8232 | |||
C3 | M1 | 0.25 | 0.028 | 0.01 | 0.031 | 0.026 | 0.014 | 0.024 | 4822 | |
0.5 | 0.023 | 0.002 | 0.032 | 0.029 | 0.023 | 0.029 | 4756 | |||
0.75 | 0.028 | 0.001 | 0.04 | 0.05 | 0.029 | 0.05 | 4955.5 | |||
M2 | 0.25 | 0.04 | 0.02 | 0.041 | 0.029 | 0.01 | 0.023 | 7966 | ||
0.5 | 0.035 | 0.002 | 0.04 | 0.03 | 0.019 | 0.03 | 7895 | |||
0.75 | 0.036 | 0.004 | 0.048 | 0.043 | 0.031 | 0.043 | 8127.5 |
References
1. Koenker, R.; Bassett, G. Regression Quantiles. Econometrica; 1978; 46, pp. 33-50.
2. Baur, D.G.; Dimpfl, T.; Jung, R.C. Stock return autocorrelations revisited: A quantile regression approach. J. Empir. Finance; 2012; 19, pp. 254-265. [DOI: https://dx.doi.org/10.1016/j.jempfin.2011.12.002]
3. Huang, L.; Zhu, W.; Saunders, C.P.; MacLeod, J.N.; Zhou, M.; Stromberg, A.J.; Bathke, A.C. A novel application of quantile regression for identification of biomarkers exemplified by equine cartilage microarray data. BMC Bioinform.; 2008; 9, 300. [DOI: https://dx.doi.org/10.1186/1471-2105-9-300] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18597687]
4. Cade, B.S.; Noon, B.R. A gentle introduction to quantile regression for ecologists. Front. Ecol. Environ.; 2003; 1, pp. 412-420. [DOI: https://dx.doi.org/10.1890/1540-9295(2003)001[0412:AGITQR]2.0.CO;2]
5. Yu, K.; Stander, J. Bayesian analysis of a Tobit quantile regression model. J. Econom.; 2007; 137, pp. 260-276. [DOI: https://dx.doi.org/10.1016/j.jeconom.2005.10.002]
6. Kozumi, H.; Kobayashi, G. Gibbs sampling methods for Bayesian quantile regression. J. Stat. Comput. Simul.; 2011; 81, pp. 1565-1578.
7. Alhamzawi, R. Bayesian Analysis of Composite Quantile Regression. Stat. Biosci.; 2016; 8, pp. 358-373. [DOI: https://dx.doi.org/10.1007/s12561-016-9158-8]
8. Yuan, X.; Xiang, X.; Zhang, X. Bayesian composite quantile regression for the single-index model. PLoS ONE; 2023; 18, e0285277. [DOI: https://dx.doi.org/10.1371/journal.pone.0285277]
9. Hu, Y.; Wang, H.J.; He, X.; Guo, J. Bayesian joint-quantile regression. Comput. Stat.; 2020; 36, pp. 2033-2053. [DOI: https://dx.doi.org/10.1007/s00180-020-00998-w]
10. Li, Q.; Lin, N.; Xi, R. Bayesian regularized quantile regression. Bayesian Anal.; 2010; 5, pp. 533-556. [DOI: https://dx.doi.org/10.1214/10-BA521]
11. Alhamzawi, R.; Yu, K. Variable selection in quantile regression via Gibbs sampling. J. Appl. Stat.; 2012; 39, pp. 799-813. [DOI: https://dx.doi.org/10.1080/02664763.2011.620082]
12. Alhamzawi, R.; Mallick, H. Bayesian reciprocal LASSO quantile regression. Commun. Stat. Simul. Comput.; 2020; 51, pp. 6479-6494. [DOI: https://dx.doi.org/10.1080/03610918.2020.1804585]
13. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; 3rd ed. John Wiley & Sons: New York, NY, USA, 2019.
14. Yuan, Y.; Yin, G. Bayesian Quantile Regression for Longitudinal Studies with Nonignorable Missing Data. Biometrics; 2010; 66, pp. 105-114.
15. Zhao, P.-Y.; Tang, N.-S.; Jiang, D.-P. Efficient inverse probability weighting method for quantile regression with nonignorable missing data. Statistics; 2017; 51, pp. 363-386. [DOI: https://dx.doi.org/10.1080/02331888.2016.1268615]
16. Wang, Z.; Tang, N. Bayesian Quantile Regression with Mixed Discrete and Nonignorable Missing Covariates. Bayesian Anal.; 2020; 15, pp. 579-604. [DOI: https://dx.doi.org/10.1214/19-BA1165]
17. Tang, N.; Chow, S.-M.; Ibrahim, J.G.; Zhu, H. Bayesian Sensitivity Analysis of a Nonlinear Dynamic Factor Analysis Model with Nonparametric Prior and Possible Nonignorable Missingness. Psychometrika; 2017; 82, pp. 875-903. [DOI: https://dx.doi.org/10.1007/s11336-017-9587-4] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29030749]
18. Tuerde, M.; Tang, N. Bayesian semiparametric approach to quantile nonlinear dynamic factor analysis models with mixed ordered and nonignorable missing data. Statistics; 2022; 56, pp. 1166-1192. [DOI: https://dx.doi.org/10.1080/02331888.2022.2121399]
19. Hoffman, M.D.; Blei, D.M.; Wang, C.; Paisley, J. Stochastic variational inference. J. Mach. Learn. Res.; 2013; 14, pp. 1303-1347.
20. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference; University of London, University College London: London, UK, 2003.
21. Ganguly, A.; Jain, S.; Watchareeruetai, U. Amortized Variational Inference: Towards the Mathematical Foundation and Review. arXiv; 2022; arXiv: 2209.10888
22. Faes, C.; Ormerod, J.T.; Wand, M.P. Variational Bayesian Inference for Parametric and Nonparametric Regression with Missing Data. J. Am. Stat. Assoc.; 2011; 106, pp. 959-971. [DOI: https://dx.doi.org/10.1198/jasa.2011.tm10301]
23. Spaanberg, E. Variational Inference of Dynamic Factor Models with Arbitrary Missing Data. arXiv; 2022; arXiv: 2207.01976
24. Liu, Q.; Li, J.; Dong, M.; Liu, M.; Chai, Y. Identification of gene regulatory networks using variational bayesian inference in the presence of missing data. IEEE/ACM Trans. Comput. Biol. Bioinform.; 2022; 20, pp. 399-409. [DOI: https://dx.doi.org/10.1109/TCBB.2022.3144418] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35061589]
25. Li, Y.; Zhu, J. L1-Norm Quantile Regression. J. Comput. Graphical Stat.; 2008; 17, pp. 163-185. [DOI: https://dx.doi.org/10.1198/106186008X289155]
26. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
27. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: New York, NY, USA, 2004.
28. Albert, J.H.; Chib, S. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc.; 1993; 88, pp. 669-679.
29. Polson, N.G.; Scott, J.G.; Windle, J. Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables. J. Am. Stat. Assoc.; 2012; 108, pp. 1339-1349. [DOI: https://dx.doi.org/10.1080/01621459.2013.829001]
30. Durante, D.; Rigon, T. Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Statist. Sci.; 2019; 34, pp. 472-485. [DOI: https://dx.doi.org/10.1214/19-STS712]
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Quantile regression models are remarkable structures for conducting regression analyses when the data are subject to missingness. Missing values occur because of various factors like missing completely at random, missing at random, or missing not at random. All these may result from system malfunction during data collection or human error during data preprocessing. Nevertheless, it is important to deal with missing values before analyzing data since ignoring or omitting missing values may result in biased or misinformed analysis. This paper studies quantile regressions from a Bayesian perspective. By proposing a hierarchical model framework, we develop an alternative approach based on deterministic variational Bayes approximations. Logistic and probit models are adopted to specify the propensity scores for missing responses and covariates, respectively. A Bayesian variable selection method is proposed to recognize significant covariates. Several simulation studies and real examples illustrate the advantages of the proposed methodology and offer some possible future research directions.