1. Introduction
Koenker and Bassett [1] introduced the quantile regression (QR) model. Whereas ordinary least squares (OLS) regression models only the mean of the response variable, QR estimates regression coefficients at different quantiles of the response, which captures how the association between the covariates and the response varies across its distribution. Owing to their robustness to outliers and wide applicability, quantile regression models are widely used in fields such as finance, biology, and ecology (Baur et al. [2]; Huang et al. [3]; Cade and Noon [4]). Yu and Stander [5] studied Bayesian quantile regression by formulating the QR likelihood through an asymmetric Laplace distribution. Building on this work, Kozumi and Kobayashi [6] proposed an efficient Gibbs sampling method for the QR model by exploiting a location-scale mixture decomposition of the asymmetric Laplace distribution. Alhamzawi [7] explored Bayesian estimation of the composite quantile regression model, while Yuan et al. [8] studied Bayesian composite quantile regression for the single-index model. Additionally, Hu et al. [9] introduced a Bayesian joint quantile regression model. An important aspect of building quantile regression models is the selection of predictor variables. Li et al. [10] approached the regularization of quantile regression from a Bayesian perspective, focusing on variable selection. To address the instability of posterior estimates caused by variable selection in Gibbs sampling and the convergence issues caused by vague priors, Alhamzawi and Yu [11] proposed a stochastic search variable selection (ISSVS) approach within a Bayesian framework. Alhamzawi and Mallick [12] introduced reciprocal lasso (rLasso) regularization, which offers advantages over lasso regularization in estimation, prediction, and variable selection within the Bayesian framework. Considerable progress has thus been made in Bayesian statistical inference and variable selection for quantile regression. However, all the aforementioned studies are based on fully observed data.
Missing data are inevitable in research and can arise from various uncontrollable factors: some measurements may be lost because of machine malfunction, and in questionnaires some respondents may be unwilling to disclose their income. The systematic study of missing data originated in the 1970s, and Little and Rubin [13] defined three missing data mechanisms according to the cause of missingness: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The first two mechanisms, MCAR and MAR, are unrelated to the missing values themselves. The third mechanism, MNAR, is directly associated with the missing values and is the one most often encountered in practice. Ignoring nonignorable missing data, that is, data missing under the MNAR mechanism, can lead to erroneous conclusions. Recognizing the distinctive attributes of such data and the many benefits of quantile regression, several researchers have explored quantile regression models as a means to handle nonignorable missing data. For instance, Yuan and Yin [14] examined Bayesian quantile regression models for longitudinal data with nonignorable missingness. Zhao et al. [15] studied several inverse probability weighting (IPW) estimators for quantile regression models when either the covariates or the response variables are subject to nonignorable missingness. Wang and Tang [16] investigated Bayesian quantile regression with mixed discrete and nonignorable missing covariates. Tang et al. [17] and Tuerde and Tang [18] explored nonlinear dynamic factor analysis models with nonignorable missing data using Bayesian nonparametric and semiparametric approaches, respectively. All of the aforementioned studies adopted the Bayesian framework and carried out inference with the Markov chain Monte Carlo (MCMC) algorithm. However, MCMC has inherent limitations: sampling can be difficult and computationally expensive, especially for large datasets or complex models.
The variational Bayesian algorithm transforms the high-dimensional integration problem in Bayesian inference into an optimization problem, allowing efficient computation of the evidence lower bound and a variational approximation of the posterior distribution, which makes it well suited to analyzing large datasets. The development of variational inference (VI) has also produced tools for high-dimensional data, such as stochastic variational inference (SVI) by Hoffman et al. [19], "black box" variational inference by Ranganath et al. [20], and amortized variational inference by Ganguly et al. [21]. VI has been applied in a range of settings involving missing data, including Bayesian approximate inference for regression models with missing covariates (Faes et al. [22]), parameter uncertainty in dynamic factor models with missing data (Spaanberg [23]), and inference for gene regulatory networks with missing data (Liu et al. [24]). However, none of the existing works have applied the variational Bayesian algorithm to quantile regression (QR) models with missing data.
The main contributions of this paper are the following: (i) a variational Bayesian approach for parameter estimation in quantile regression models; (ii) a variational Bayesian approach for quantile regression models with missing covariates and missing response variables, whose benefits are demonstrated through simulations; and (iii) a variational Bayesian approach for variable selection in quantile regression models with missing data, which identifies the relevant covariates accurately.
This paper is organized as follows: Section 2 presents the variational Bayesian algorithm, the QR model, the lasso penalty, and missing data mechanisms. In Section 3, variational inference is developed for QR in the presence of missing covariates and missing response variables, respectively. In Section 4, variable selection and parameter estimation are carried out for lasso-penalized QR in the presence of missing covariates and missing response variables, respectively. The feasibility of variational Bayes for QR with missing data, and its advantages over the Gibbs sampling algorithm, are illustrated through four experiments in Section 5. Section 6 applies the variational Bayesian algorithm to a real dataset.
2. Model and Notation
2.1. Quantile Regression Model
The quantile regression model was initially introduced by Koenker and Bassett [1], and the model can be represented as follows:
$$y_i = x_i^{\top}\beta_p + \varepsilon_i, \qquad i = 1,\dots,n. \tag{1}$$
Here $x_i$ represents a $k$-dimensional vector of covariates and $y_i$ denotes the response variable. Let $p$ ($0<p<1$) represent the quantile level. The parameter vector $\beta_p$ is a $k$-dimensional vector associated with the quantile level $p$. The random error term is denoted by $\varepsilon_i$; we consider a general distribution for the error term whose $p$-th quantile is zero, so that $P(\varepsilon_i \le 0) = p$. For the quantile regression model, given a specific $p$ and covariate vector $x_i$, the conditional quantile can be expressed as $Q_{y_i}(p \mid x_i) = x_i^{\top}\beta_p$. The estimate of $\beta_p$ can be obtained by minimizing
$$\sum_{i=1}^{n} \rho_p\!\left(y_i - x_i^{\top}\beta\right).$$
The loss function is defined as $\rho_p(u) = u\{p - I(u<0)\}$, where $I(\cdot)$ represents the indicator function. Because the loss function is not differentiable at the origin, obtaining estimates of $\beta_p$ is challenging. Yu and Stander [5] demonstrated that the quantile regression model can be estimated within a Bayesian framework when the error term follows an asymmetric Laplace distribution (ALD). However, the ALD is not a standard distribution, so its posterior density function is complicated and the computational burden inevitably increases. According to Kozumi and Kobayashi [6], the ALD can be expressed as a mixture of exponential and normal distributions. Specifically,
$$\varepsilon_i = k_1 e_i + k_2 \sqrt{\sigma e_i}\, z_i, \tag{2}$$
where $e_i \sim \operatorname{Exp}(\sigma)$ and $z_i \sim N(0,1)$ are independent of each other, with $k_1 = \dfrac{1-2p}{p(1-p)}$ and $k_2^{2} = \dfrac{2}{p(1-p)}$. Finally, the quantile regression model can be expressed as the following hierarchical model:
$$y_i \mid e_i \sim N\!\left(x_i^{\top}\beta_p + k_1 e_i,\; k_2^{2}\sigma e_i\right), \qquad e_i \sim \operatorname{Exp}(\sigma). \tag{3}$$
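To make the mixture representation in (2) and (3) concrete, the following Python sketch draws responses from the hierarchical quantile regression model using the exponential-normal decomposition. It is only an illustration: the function names, the sample size, and the choice σ = 1 are arbitrary, and the constants follow the standard parametrization stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def check_loss(u, p):
    """Quantile check loss rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (u < 0))

def simulate_qr_ald(X, beta, p, sigma=1.0, rng=rng):
    """Draw y_i = x_i' beta + eps_i with eps_i following an asymmetric Laplace
    distribution, generated through the exponential-normal mixture (2)."""
    n = X.shape[0]
    k1 = (1 - 2 * p) / (p * (1 - p))
    k2 = np.sqrt(2 / (p * (1 - p)))
    e = rng.exponential(scale=sigma, size=n)    # latent exponential mixing variables
    z = rng.standard_normal(n)                  # independent standard normal draws
    eps = k1 * e + k2 * np.sqrt(sigma * e) * z  # ALD error via the mixture
    return X @ beta + eps

# Illustrative use: 500 observations, an intercept and one covariate, p = 0.75.
X = np.column_stack([np.ones(500), rng.standard_normal(500)])
y = simulate_qr_ald(X, beta=np.array([1.0, 2.0]), p=0.75)
```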
2.2. Elements of Variational Bayes
Taking into account the latent variables $z$ and the observed values $y$, the joint density function can be expressed as $p(y, z) = p(y \mid z)\,p(z)$.
Variational Bayesian inference no longer relies on sampling to approximate the posterior density of complex problems. Instead, it employs optimization techniques. Within an acceptable range of error, a simpler density function $q(z) \in \mathcal{Q}$ (where $\mathcal{Q}$ represents the space of candidate density functions for the latent variables $z$) is used to approximate the posterior density $p(z \mid y)$. The difference between these two density functions is measured using the Kullback–Leibler divergence
$$\operatorname{KL}\{q(z)\,\|\,p(z \mid y)\} = E_{q}\{\log q(z)\} - E_{q}\{\log p(z \mid y)\}. \tag{4}$$
Alternatively,
$$\operatorname{KL}\{q(z)\,\|\,p(z \mid y)\} = \log p(y) - \big[E_{q}\{\log p(y, z)\} - E_{q}\{\log q(z)\}\big]. \tag{5}$$
The term
$$\operatorname{ELBO}(q) = E_{q}\{\log p(y, z)\} - E_{q}\{\log q(z)\} \tag{6}$$
is referred to as the evidence lower bound. Because $\log p(y)$ does not depend on $q$, minimizing the KL divergence amounts to maximizing the lower bound. This is equivalent to solving the following optimization problem:
$$q^{*}(z) = \arg\max_{q \in \mathcal{Q}} \operatorname{ELBO}(q). \tag{7}$$
By solving the optimization problem coordinate-wise, the best approximation for each factor can be obtained as
$$q_j^{*}(z_j) \propto \exp\!\big[E_{-j}\{\log p(z_j \mid z_{-j}, y)\}\big], \tag{8}$$
or
$$q_j^{*}(z_j) \propto \exp\!\big[E_{-j}\{\log p(y, z)\}\big], \tag{9}$$
where $E_{-j}$ denotes the expectation with respect to all variational factors except $q_j(z_j)$. For convenience, in the following paragraphs this expectation is denoted simply by $E$. Additionally, the complexity of the assumed density of the latent variables determines the complexity of the optimization algorithm. Therefore, we consider the mean-field variational family
$$q(z) = \prod_{j=1}^{J} q_j(z_j).$$
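The coordinate-ascent scheme implied by (7)–(9) can be organized as a generic loop that updates each mean-field factor in turn and monitors the evidence lower bound. The sketch below is purely structural: `update_fns` and `elbo_fn` are placeholders for the model-specific formulas derived in Sections 3 and 4, not part of the paper.

```python
def cavi(init_q, update_fns, elbo_fn, tol=1e-6, max_iter=500):
    """Generic coordinate-ascent variational inference for a mean-field family
    q(z) = prod_j q_j(z_j).

    init_q     : dict of initial variational parameters, one entry per factor.
    update_fns : list of callables; each takes the current dict and returns it
                 with one factor's parameters refreshed (holding the rest fixed).
    elbo_fn    : callable returning the evidence lower bound for the current q.
    """
    q = dict(init_q)
    elbo_old = -float("inf")
    history = []
    for _ in range(max_iter):
        for update in update_fns:          # one sweep over all mean-field factors
            q = update(q)
        elbo = elbo_fn(q)
        history.append(elbo)
        if abs(elbo - elbo_old) < tol:     # stop once the lower bound stabilises
            break
        elbo_old = elbo
    return q, history
```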
2.3. Bayesian Lasso Regularized Quantile Regression
Li and Zhu [25] proposed lasso regularized quantile regression. Specifically, they introduced an $L_1$-norm penalty term to control the complexity of the model, aiming to shrink some regression coefficients to zero. The objective function can be expressed as follows:
$$\min_{\beta}\; \sum_{i=1}^{n} \rho_p\!\left(y_i - x_i^{\top}\beta\right) + \lambda \sum_{j=1}^{k} |\beta_j|. \tag{10}$$
Here, $\lambda \ge 0$ is the regularization parameter used to balance model complexity against fit to the data. The term $\rho_p(\cdot)$ is the check loss proposed by Koenker and Bassett [1], defined as $\rho_p(u) = u\{p - I(u<0)\}$.
Bayesian lasso regularization for quantile regression was introduced by Li et al. [10]; it improves upon standard lasso regularization by incorporating a Bayesian framework, which accommodates outliers in the data and improves robustness. By introducing suitable priors on the regression coefficients, the solution to (10) is equivalent to the Bayesian maximum a posteriori (MAP) estimate.
In this approach, independent Laplace priors are used for the parameters $\beta_j$:
$$\pi(\beta_j \mid \lambda) = \frac{\lambda}{2}\exp\!\left(-\lambda |\beta_j|\right), \qquad j = 1,\dots,k. \tag{11}$$
By using the identity
$$\frac{a}{2}\exp(-a|u|) = \int_0^{\infty} \frac{1}{\sqrt{2\pi s}}\exp\!\left(-\frac{u^{2}}{2s}\right)\frac{a^{2}}{2}\exp\!\left(-\frac{a^{2}s}{2}\right)\mathrm{d}s, \tag{12}$$
where $a>0$, the Laplace prior on $\beta_j$ can be written as a scale mixture of normals with an exponential mixing density:
$$\beta_j \mid s_j \sim N(0, s_j), \qquad s_j \sim \operatorname{Exp}\!\left(\frac{\lambda^{2}}{2}\right), \tag{13}$$
where $\operatorname{Exp}(\lambda^{2}/2)$ denotes the exponential distribution with rate $\lambda^{2}/2$. Considering that the error term follows an asymmetric Laplace distribution (ALD), the posterior distribution of $\beta$ is given by:
(14)
To complete the model, gamma priors are placed on the remaining hyperparameters, resulting in the following hierarchical model:
(15)
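The scale-mixture identity in (12) and (13) can be checked numerically: drawing a variance from an exponential distribution with rate λ²/2 and then a normal deviate with that variance reproduces a Laplace draw with scale 1/λ. A minimal sketch (the value of λ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 1.5                                   # illustrative regularisation parameter
n = 200_000

# Normal-exponential mixture: s ~ Exp(rate = lam^2 / 2), beta | s ~ N(0, s).
s = rng.exponential(scale=2.0 / lam**2, size=n)
beta = rng.normal(loc=0.0, scale=np.sqrt(s))

# Direct Laplace draws with the same scale 1/lam for comparison.
beta_direct = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

print(np.var(beta), np.var(beta_direct))   # both close to 2 / lam**2
```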
2.4. Missing Data
In the context of missing data, a comprehensive study was conducted by Little and Rubin [13], who provided insights into missing data mechanisms and patterns. Missing data mechanisms can be classified into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In the following, we describe each mechanism and introduce the relevant parameters and their priors for modeling the missing data.
- (1). Missing Completely at Random (MCAR): the missingness is related neither to the observed data nor to the missing data themselves; the probability that an entry of the covariates or the response is missing is determined only by the parameters of the missingness model through a link function.
- (2). Missing at Random (MAR): the missingness is related only to the observed data; for example, when a covariate is missing, the probability of missingness is conditional on the observed response and follows a link function.
- (3). Missing Not at Random (MNAR): the missingness is related to the missing data themselves; for example, when a covariate is missing, the probability of missingness depends on that (unobserved) covariate in addition to the observed data and follows a link function.
In the vast majority of previous studies, the emphasis has been on missing at random mechanisms. However, missing data frequently exhibit a connection to the values that are missing, suggesting nonignorable missingness. The study of such nonignorable missing data holds significant value, as disregarding it can lead to erroneous conclusions.
Furthermore, the missing data quantile regression models discussed in Section 3 and Section 4 are hierarchical models that can be represented using probabilistic-directed acyclic graphs (DAGs). In these graphs, the nodes correspond to the parameters in the model, and the arrows illustrate the dependencies between the parameters (Bishop [26]; Wasserman [27]). Specifically, in the DAG presented in this paper, the unshaded nodes correspond to the observed data.
Now, let us return to Equation (3), where the observed values of the response variable and the covariates for the $i$th observation appear. Assuming that both the covariates and the response are prone to nonignorable missingness, but are not missing simultaneously, we can formulate a quantile regression model that accommodates nonignorable missing data. To model the missing data mechanism effectively, we introduce the pertinent parameters and assign their prior distributions; the optimal variational density of each random variable is then derived in Section 3.
3. Variational Bayesian Inference
3.1. Missing Covariate Variables
In Equation (3), we define the matrix:
When the covariate x is missing, we introduce an indicator function to detect if is missing, defined as:
Here, represents the -dimensional subvector consisting of the missing , and . However, may not correspond to the original data in terms of subscripts. Considering the nature of the indicator function, follows a distribution with probability , where denotes the ith row of matrix . We adopt the probit regression model as the link function , thus:
To implement the approach proposed by Albert and Chib [28], we introduce n independent latent variables in the missing data mechanism, where follows a normal distribution with mean and variance 1. For , we define if , and if . The interactions between the regression parameters and the parameters of the missing data mechanism are shown in Figure 1.
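For illustration, the sketch below generates nonignorable missingness indicators from a probit mechanism and evaluates the truncated-normal means of the Albert–Chib latent variables, which are the expectations a variational update for them would require. The design matrix `V` and coefficient vector `phi` are hypothetical stand-ins; the paper's missingness model may condition on a different set of variables.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def simulate_probit_missingness(V, phi, rng=rng):
    """m_i ~ Bernoulli(Phi(v_i' phi)) via the latent a_i ~ N(v_i' phi, 1),
    with m_i = 1 exactly when a_i > 0 (Albert and Chib, 1993)."""
    a = V @ phi + rng.standard_normal(V.shape[0])
    return (a > 0).astype(int)

def truncated_normal_mean(mu, m):
    """E[a_i] when a_i ~ N(mu_i, 1) is truncated to (0, inf) if m_i = 1
    and to (-inf, 0] if m_i = 0."""
    pdf, cdf = norm.pdf(mu), norm.cdf(mu)
    upper = mu + pdf / cdf              # mean of the positive truncation
    lower = mu - pdf / (1.0 - cdf)      # mean of the non-positive truncation
    return np.where(m == 1, upper, lower)

# Illustrative use: missingness of x_i depends on (1, y_i, x_i), i.e. nonignorable.
phi = np.array([-0.5, 0.3, 0.8])
V = np.column_stack([np.ones(1000), rng.standard_normal(1000), rng.standard_normal(1000)])
m = simulate_probit_missingness(V, phi)
Ea = truncated_normal_mean(V @ phi, m)
```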
3.2. Variational Bayesian Inference of QR Model Parameters
Based on the above parameters, we consider the mean field variational family,
In this family, the individual latent variables within each block are mutually independent across observations, so that:
The optimal density function for each parameter is derived as follows. Please refer to Appendix A for a detailed derivation of the optimal density function of each parameter, as well as the calculation of the required expectations.
(16)
(17)
(18)
(19)
(20)
(21)
(22)
For the generalized inverse Gaussian distribution , where is the Bessel function with order p, with
(23)
(24)
(25)
For the inverse gamma distribution, where $\Gamma(\cdot)$ denotes the gamma function, we have
(26)
In addition, the required expectations for the additional variable a are as follows:
(27)
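The expectations in (23)–(26) are routine once the relevant special functions are available. The sketch below computes them under the parametrization GIG(p, a, b) with density proportional to x^(p-1) exp{-(a x + b/x)/2} and the usual shape–rate inverse gamma; the paper's exact parametrizations may differ, so the formulas should be matched to the definitions above before use.

```python
import numpy as np
from scipy.special import kv, digamma  # modified Bessel function K_p and digamma

def gig_moments(p, a, b):
    """E[x] and E[1/x] for GIG(p, a, b), density proportional to
    x^(p-1) exp{-(a*x + b/x)/2}."""
    s = np.sqrt(a * b)
    r = kv(p + 1, s) / kv(p, s)
    mean_x = np.sqrt(b / a) * r
    mean_inv_x = np.sqrt(a / b) * r - 2.0 * p / b
    return mean_x, mean_inv_x

def inverse_gamma_moments(shape, rate):
    """E[x] (finite only for shape > 1), E[1/x] and E[log x] for an inverse gamma
    distribution with density proportional to x^(-shape-1) exp(-rate/x)."""
    mean_x = rate / (shape - 1.0) if shape > 1 else np.inf
    mean_inv_x = shape / rate
    mean_log_x = np.log(rate) - digamma(shape)
    return mean_x, mean_inv_x, mean_log_x
```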
In this model, the evidence lower bound is given by
(28)
where:(29)
The specific algorithm is shown in Algorithm 1.
Algorithm 1: VI with Nonignorable Missing Covariates in QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
3.3. Missing Response Variables
For Equation (3), let us consider the scenario where the response variable contains missing values. Similar to missing covariates, we introduce an indicator function to detect missing response values, defined as:
Here, represents the -dimensional subvector consisting of the missing , and we have . However, does not necessarily correspond to the original data in terms of subscripts.
Similar to Section 3.1, we introduce the indicator function and consider the link function as a logistic regression model. Consequently, we have:
Here, represents the i-th row of the matrix , and is the parameter vector.
Auxiliary Variables
Polson et al. [29] proposed a Bayesian estimation method for logistic regression models by introducing auxiliary variables that follow the Pólya-Gamma distribution. This approach avoids the complex integrals and Metropolis–Hastings (MH) sampling that would otherwise be required.
In their methodology, they consider the following model:
where and . They place a Gaussian prior on , denoted as . To incorporate the auxiliary variable , they introduce it into the model. The posterior density, after adding the auxiliary variable , can be expressed as follows:
Here, represents the likelihood function, is the prior distribution of , is the distribution of the auxiliary variable , and denotes the prior distribution of the covariates .
The specific form of the posterior density and the method of inference depend on the choice of prior distributions, the likelihood function, and the implementation of the Pólya-Gamma auxiliary variables.
where we define the diagonal matrix whose diagonal elements are the auxiliary variables.
Variational Bayesian inference is typically applicable to specific classes of models, particularly conditionally conjugate exponential family models. Logistic regression models with Gaussian priors fall outside this class because there is no conjugacy between the logistic likelihood and the Gaussian prior. To address this limitation, Durante and Rigon [30] proposed introducing an additional variable that follows the Pólya-Gamma distribution. Incorporating this variable into models with logistic components restores conditional conjugacy between the augmented logistic likelihood and the Gaussian prior, which enables Bayesian statistical inference for this class of models with a variational Bayesian algorithm.
In the context of this section, we consider the missing data mechanism as a logistic regression model. Consequently, we introduce additional variables that follow the Pólya-Gamma distribution. The interaction between the parameters is shown in Figure 2.
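Under the conditionally conjugate scheme of Durante and Rigon [30], the optimal variational density of each Pólya-Gamma auxiliary variable is again Pólya-Gamma, and only its mean enters the remaining updates; that mean has the closed form tanh(c/2)/(2c). A small sketch (the argument c stands for the variational tilting parameter of each observation):

```python
import numpy as np

def pg_mean(c):
    """Mean of a Polya-Gamma PG(1, c) random variable: tanh(c/2) / (2c).
    This is the only moment the variational updates for the logistic
    missing-data mechanism need; the limit at c = 0 is 1/4."""
    c = np.asarray(c, dtype=float)
    out = np.full_like(c, 0.25)
    nz = np.abs(c) > 1e-8
    out[nz] = np.tanh(c[nz] / 2.0) / (2.0 * c[nz])
    return out
```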
3.4. Variational Bayesian Inference of QR Model
For the above parameters, we consider the mean field variational family,
Consider the variables , , and . Here, are independent of each other, are independent of each other, and are independent of each other. This follows a similar pattern as discussed in Section 3.1.
In the scenario of missing response variables, the posterior probabilities , , and remain the same as in the presence of missing covariates and are not reiterated here. However, we provide the full conditional probability posterior for , , and . Following the approach of Polson et al. [29], we consider the prior distribution of as .
where , denotes the diagonal matrix with diagonal elements, . The optimal density of each parameter in the missing data mechanism section is as follows:(30)
(31)
where:(32)
(33)
(34)
The optimal density functions of the remaining parameters are as follows; the detailed derivation of the optimal density function for each parameter is similar to that in Section 3.2.
(35)
(36)
(37)
(38)
In this model, the evidence lower bound is given by
(39)
The specific algorithm is shown in Algorithm 2.
Algorithm 2: VI with Nonignorable Missing Response in QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
4. Bayesian Variable Selection of the QR Model
4.1. Covariates with Nonignorable Missing Variable Choices
For Equation (15), the following matrix is defined:
In the context of our analysis, the variable represents the covariate for the i-th observation, while denotes the response variable corresponding to the same observation. When any of the covariates in X, specifically the j-th dimension covariate where j ranges from 1 to k, is missing, we follow a similar approach as described in Section 3.1. We introduce the indicative function to identify the missing covariates and consider the missing data mechanism as a probit regression.
Furthermore, to perform variable selection, we introduce the parameter s and the regularization parameter. The prior distribution of s is exponential, while the regularization parameter is assigned a gamma prior. The parameter interactions are shown in Figure 3.
The optimal densities of the other parameters are similar to those in Algorithm 1 and will not be repeated. In the following, we give the optimal density functions for the remaining parameters, including s.
(1). The full conditional distribution of the parameter is as follows:
where . Therefore, the full conditional distribution of the parameter is , and . From Equation (7), we know that the parameter optimal density function is normally distributed as follows:(40)
where
(2). The full conditional distribution of the parameter s is as follows:
Therefore, the full conditional distribution of the parameter is . From Equation (7), we know that the parameter optimal density function is a generalized inverse Gaussian distribution
(41)
where .
(3). The full conditional distribution of the parameter is as follows:
The optimal density function is known to be the gamma distribution
(42)
where
(4). The full conditional distribution of is as follows:
From the above equation, the optimal density function can be found as
(43)
In this model, the evidence lower bound is given by
(44)
The specific algorithm is shown in Algorithm 3.
4.2. Response with Nonignorable Missing Variable Choices
When only the response variable has nonignorable missing values, similar to Section 3.3, we consider the missing data mechanism as a logistic regression model, again introducing the parameter s and the regularization parameter. The prior distributions for these parameters follow the same approach as described in Section 4.1. The interactions between the parameters are shown in Figure 4.
Algorithm 3: VI with Nonignorable Missing Covariates in Bayesian Lasso-Regularized QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
For the above parameters, we consider the mean-field variational family.
Since the factors are mutually independent, the variational density factorizes accordingly. In addition, the optimal density functions for most parameters are given in Section 4.1, those for the missing-data-mechanism parameters are given in Section 3.3, and only the optimal density function for the remaining parameter is given here, as follows.
From Equation (7), we know that the parameter optimal density function is normally distributed
(45)
The evidence lower bound in this model is as follows, and the specific algorithm is described in Algorithm 4.
Algorithm 4: VI with Nonignorable Missing Response in Bayesian Lasso-Regularized QR Models.
Input: the prior distribution of each parameter and a given convergence threshold for the evidence lower bound.
(46)
5. Simulation Studies
In this section, we conducted four simulation experiments to validate the effectiveness of variational inference for parameter estimation in quantile regression models with nonignorable missing data. All computations were run under the Windows operating system on an Intel® Core i5-8400 six-core processor. The experimental data were generated from the following quantile regression models:
In the above equation, and , where . Furthermore, we also performed Gibbs sampling simulations for the different scenarios in Simulations I and II. We compared the results in terms of CPU time, the deviation of the parameter estimates from the true values (BIAS), the root mean square error (RMSE), and the standard deviation (SD). For Simulations I and II, we introduced approximately 10% missing data (denoted M1) and approximately 20% missing data (denoted M2) under different error distributions.
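A small sketch of how such replication summaries can be computed is given below. BIAS is taken here as the absolute difference between the Monte Carlo mean of the estimates and the truth; the paper's exact formulas, which were lost in extraction, may differ slightly.

```python
import numpy as np

def replication_summary(estimates, truth):
    """Summarise Monte Carlo replications.
    estimates : array of shape (R, k), one row of estimates per replication.
    truth     : array of shape (k,), the true parameter values."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    bias = np.abs(estimates.mean(axis=0) - truth)             # |Monte Carlo mean - truth|
    rmse = np.sqrt(((estimates - truth) ** 2).mean(axis=0))   # root mean square error
    sd = estimates.std(axis=0, ddof=1)                        # sampling standard deviation
    return bias, rmse, sd
```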
- Simulation I:
In this simulation, we take or 30, randomly generated from the normal distribution and consider the case of missing covariates . When , we take the parameters , and the truth values of are .
The prior distributions for the parameters and are set as and . We consider different distributions for and examine various deletion cases at different quartiles within each distribution. The details of these cases are as follows: (C1): follows a standard normal distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rate of is about . (C2): follows a mixed normal distribution . The true value of is set as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rate of is about . (C3): follows a chi-squared distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rate of is about . When , We examined the M1 situation, maintaining the parameters and error distributions as previously detailed.
- Simulation II:
In this simulation, we consider a scenario where or 30, . The covariate is randomly generated from a normal distribution . When n = 200, we focus on the case of missing response variable . The true values of the parameters and are set as follows: The prior distributions for the parameters and are set as and . We consider different distributions for and examine various deletion cases at different quartiles within each distribution. The details of these cases are as follows: (C1): follows a standard normal distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rates of are approximately . (C2): follows a mixed normal distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rates of are approximately . (C3): follows a chi-square distribution . We set the true value of as , and the missing rates of are approximately . Alternatively, when we set the true value of as , the missing rates of are approximately . When , we examined the M1 situation, maintaining the parameters and error distributions as previously detailed.
By comparing the results with and without the inclusion of the scale parameter in Simulation I and Simulation II, when , we observed the following findings for the different missing cases and error distributions. The parameter estimation results obtained using the VI algorithm for Simulation I and Simulation II are presented in Table 1 and Table 2, respectively, while the parameter estimation results obtained using Gibbs sampling are displayed in Table A1 and Table A2 (Appendix B). The Bayesian estimates obtained by the proposed method perform satisfactorily in that (i) they have smaller BIAS, RMSE, and SD, with the SD and RMSE values close to each other; and (ii) the comparison between Table 1 and Table A1, and between Table 2 and Table A2, reveals that the different error distributions, the various missing cases, and the inclusion of the scale parameter have a significant impact on CPU time. When considering covariates with nonignorable missingness and including the scale parameter, VI is more than ten times faster than Gibbs; without the scale parameter, VI is nearly 50 times faster. Similarly, in the presence of nonignorable missing response variables, VI is nearly 50 times faster than Gibbs when the scale parameter is included and almost 80 times faster without it. This means that our proposed variational Bayesian method is faster than MCMC while maintaining the accuracy of the parameter estimates. When , although VI is almost a hundred times faster than Gibbs, its accuracy is somewhat lower; this difference is even more noticeable when the response variable is missing. Details are given in Table 3.
Based on the good performance of variational inference observed in Simulation I and Simulation II, we employed the Bayesian lasso for variable selection in models with missing covariates and missing response variables in Simulations III and IV, respectively. Furthermore, variational Bayesian inference achieved convergence in both Simulations I and II. For convenience, we provide the convergence results of C1 at different quantiles of M1 when the covariates have missing values and when the response variables have missing values, as depicted in Figure 5; Figure 6 compares the run times.
- Simulation III:
In this simulation, we utilize a Bayesian lasso quantile regression model to perform variable selection and parameter estimation in the presence of missing covariates. The data generation process follows the equation , where we omit the scale parameter . The true values are set as , , , with . The covariates are independently sampled from a normal distribution , and we take with a prior distribution of . The error term is considered under three different distributions, similar to Simulations I and II, when is missing. (C1), the error term follows a standard normal distribution . By setting the true values as , the missing rates of are approximately . (C2), the error term follows a mixed normal distribution . Taking the true values as , the missing rates of are approximately . (C3), the error term follows a chi-square distribution . With the true values , the missing rates of are approximately .
- Simulation IV:
In this simulation, we employ a Bayesian lasso-quantile regression model to handle variable selection and parameter estimation in the presence of missing response variables. The data generation process is similar to Simulation III and does not include the scale parameter . We set and assume a prior distribution of . The error term is considered under three different distributions, following the pattern of Simulation I and Simulation II when is missing. (C1), the error term follows a standard normal distribution . With the true values set as , the missing rates of are approximately . (C2), the error term follows a mixed normal distribution . Taking the true values as , the missing rates of are approximately . (C3), the error term follows a chi-square distribution . With the true values , the missing rates of are about . In Simulation III and Simulation IV, each setting was repeated 100 times, and we evaluated the performance of variable selection and parameter estimation using the following metrics (a computational sketch of these metrics is given after the results below): (1) the L2 distance between the parameter estimates and the true values; (2) the mean square error (MSE); (3) the number of parameters whose true value is zero and that are correctly identified as zero, recorded as "C"; (4) the number of parameters whose true value is not zero but that are incorrectly identified as zero, recorded as "IC".
We investigated three quantile levels (0.25, 0.5, 0.75), as shown in Table 4 and Table 5, and obtained the following results:
(1). The "IC" value is extremely close to zero, and "C" closely approximates the number of zero components in the true parameter vector.
(2). The L2 values at the three quantile levels are below 0.03, indicating a negligible distance from the true parameter values.
(3). The “MSE” value at all three quantile levels is less than 0.09.
These findings demonstrate the strong performance of our proposed VI for variable selection.
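For completeness, the variable-selection bookkeeping used in Simulations III and IV can be computed as in the sketch below. The zero threshold `tol` is an illustrative choice, since the paper does not state how an estimated coefficient is declared zero.

```python
import numpy as np

def selection_counts(beta_hat, beta_true, tol=1e-3):
    """Variable-selection summary for one replication.
    C  : number of truly zero coefficients correctly estimated as (numerically) zero.
    IC : number of truly nonzero coefficients incorrectly estimated as zero."""
    est_zero = np.abs(np.asarray(beta_hat)) < tol
    true_zero = np.abs(np.asarray(beta_true)) < tol
    C = int(np.sum(est_zero & true_zero))
    IC = int(np.sum(est_zero & ~true_zero))
    return C, IC
```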
6. A Real Example
In this section, we demonstrate the application of Algorithm 3 using data obtained from the American Press Institute, specifically the 1995 American University News Report dataset. The dataset can be accessed at
We select the following factors for analysis: the public/private indicator as , the student-to-faculty ratio as , the natural logarithm of the number of applicants accepted as , the average Combined SAT score as , the average ACT score as , the natural logarithm of the number of applications received as , the natural logarithm of the number of new students enrolled as , room and board costs as , and the natural logarithm of instructional expenditure per student as .
To ensure a complete set of observations, we assume , , and y to be fully observed. However, we remove one erroneous case from the response variable y, resulting in a total of 1203 cases. The final missing rates for the remaining variables are as follows: (0.41%), (39.23%), (45.22%), (0.49%), (0.16%), (4.90%), and (2.07%).
Furthermore, to facilitate statistical inferences on the data, we apply two preprocessing steps. First, we take the natural logarithm of the variables , , , , and . Second, we standardize all the variables to eliminate any scale differences among them. These preprocessing steps ensure that the data are appropriately transformed and normalized for subsequent analysis and inference.
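A minimal pandas sketch of these two preprocessing steps is given below; the column names are placeholders, and missing entries are left as NaN so that the missing-data mechanism, rather than ad hoc imputation, handles them.

```python
import numpy as np
import pandas as pd

def preprocess(df, log_cols):
    """Take natural logs of the count-type columns, then standardise every column
    (pandas mean/std skip NaN, so missing entries stay missing)."""
    out = df.copy()
    out[log_cols] = np.log(out[log_cols])
    return (out - out.mean()) / out.std(ddof=0)

# Hypothetical usage with placeholder column names:
# data = preprocess(raw, log_cols=["apps_received", "apps_accepted", "new_enrolled"])
```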
We consider the following quantile regression model.
where an intercept term is included. The covariates with missing values are assumed to follow normal distributions whose means and variances are set to the sample means and variances of the observed data. The missing data mechanism is specified as in Algorithm 3, and the missingness above is treated as nonignorable. Based on the converged lower bound and the final parameter estimates, we find that (1) some covariates have only a small effect on the response variable, so the 0.5 quantile regression can be written with those terms omitted; and (2) among the variables with missing values, the estimated missingness mechanisms indicate MAR for some and MNAR for others. The convergence of the lower bound is shown in Figure 7, and the Bayesian estimates (EST) and 95% confidence intervals (CI) of the parameters are reported in Table 6.

7. Discussion
In existing research on quantile regression models with nonignorable missing data, most studies have relied on the Markov chain Monte Carlo (MCMC) algorithm for Bayesian inference, despite drawbacks such as sampling difficulties and long computation times. In this paper, we propose employing the variational Bayesian algorithm for statistical inference in such models while also incorporating variable selection. Specifically, in the presence of missing covariates we model the missing data mechanism with a probit regression, and for a missing response variable we use a logistic regression; in both cases we employ lasso regularization for variable selection. Simulation studies and a real-data example demonstrate that when covariates or response variables have nonignorable missing values in quantile regression models, the variational Bayesian method maintains inferential accuracy while consuming far less time than MCMC.
In practice, we often encounter situations where both the covariates and the response variables have missing values; the variational Bayesian approach developed above then faces a well-known ill-posed problem. Exploring variational Bayesian parameter estimation for quantile regression models with simultaneously missing covariates and response variables is therefore of interest.
Our proposed method has some potential drawbacks. For instance, as the number of dimensions increases, the convergence speed of variational Bayes may slow down. Furthermore, the calculation of expectations may lack an analytical solution, and assumptions about the correctness of the missing data mechanism must be considered. To solve these issues, we can employ commonly used machine learning techniques like deep learning and neural networks to address missing data mechanisms. Furthermore, in the future, the stochastic variational Bayesian method can enhance the algorithm’s efficiency, while the Bayesian dimensionality reduction approach can tackle issues related to high-dimensional quantile regression problems.
Conceptualization, M.T.; methodology, X.L. and M.T.; software, X.L.; data curation, X.L.; writing—original draft, X.L.; supervision, M.T. and X.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.
Not applicable.
We are grateful to the editor, associated editor, and four referees for their valuable suggestions and comments that greatly improved the article, grateful to both StatLib and the original contributor of the material.
The authors declare no conflict of interest.
Figure 3. DAG for the missing covariates for Bayesian lasso regularized quantile regression model.
Figure 4. DAG for the missing response for Bayesian lasso regularized quantile regression model.
Figure 6. Comparison of CPU time of variational Bayesian over Gibbs when covariates with missing.
Parameter estimation and CPU time cases in Simulation I using VI,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
without | C1 | M1 | 0.25 | 0.006 | 0.078 | 0.062 | 0.138 | 0.061 | 0.114 | 124.4 |
0.5 | 0.049 | 0.040 | 0.079 | 0.109 | 0.062 | 0.101 | 111 | |||
0.75 | 0.104 | 0.0001 | 0.126 | 0.120 | 0.071 | 0.121 | 133.2 | |||
M2 | 0.25 | 0.014 | 0.082 | 0.069 | 0.152 | 0.068 | 0.128 | 185 | ||
0.5 | 0.071 | 0.022 | 0.095 | 0.117 | 0.063 | 0.116 | 176.6 | |||
0.75 | 0.129 | 0.001 | 0.147 | 0.134 | 0.072 | 0.136 | 201.8 | |||
C2 | M1 | 0.25 | 0.066 | 0.076 | 0.088 | 0.132 | 0.059 | 0.109 | 138.4 | |
0.5 | 0.056 | 0.039 | 0.076 | 0.101 | 0.050 | 0.095 | 139.8 | |||
0.75 | 0.036 | 0.030 | 0.079 | 0.105 | 0.072 | 0.101 | 137.4 | |||
M2 | 0.25 | 0.068 | 0.077 | 0.088 | 0.127 | 0.055 | 0.101 | 178.2 | ||
0.5 | 0.063 | 0.050 | 0.084 | 0.123 | 0.053 | 0.109 | 165 | |||
0.75 | 0.053 | 0.039 | 0.083 | 0.120 | 0.064 | 0.113 | 236.6 | |||
C3 | M1 | 0.25 | 0.028 | 0.040 | 0.058 | 0.101 | 0.053 | 0.096 | 124.8 | |
0.5 | 0.081 | 0.015 | 0.142 | 0.111 | 0.073 | 0.126 | 139.6 | |||
0.75 | 0.121 | 0.010 | 0.136 | 0.131 | 0.117 | 0.208 | 136.6 | |||
M2 | 0.25 | 0.045 | 0.019 | 0.121 | 0.130 | 0.062 | 0.103 | 172.2 | ||
0.5 | 0.110 | 0.018 | 0.124 | 0.109 | 0.072 | 0.153 | 158.8 | |||
0.75 | 0.137 | 0.018 | 0.105 | 0.162 | 0.108 | 0.109 | 268.4 | |||
with | C1 | M1 | 0.25 | 0.025 | 0.009 | 0.062 | 0.074 | 0.058 | 0.074 | 387.83 |
0.5 | 0.048 | 0.019 | 0.066 | 0.083 | 0.045 | 0.081 | 342.17 | |||
0.75 | 0.035 | 0.011 | 0.054 | 0.088 | 0.040 | 0.088 | 359.41 | |||
M2 | 0.25 | 0.045 | 0.011 | 0.062 | 0.085 | 0.043 | 0.085 | 1068.91 | ||
0.5 | 0.088 | 0.025 | 0.097 | 0.082 | 0.041 | 0.079 | 963.02 | |||
0.75 | 0.125 | 0.015 | 0.133 | 0.093 | 0.046 | 0.093 | 955.78 | |||
C2 | M1 | 0.25 | 0.098 | 0.009 | 0.105 | 0.074 | 0.038 | 0.074 | 384.02 | |
0.5 | 0.059 | 0.004 | 0.070 | 0.063 | 0.039 | 0.064 | 335.35 | |||
0.75 | 0.039 | 0.004 | 0.057 | 0.072 | 0.041 | 0.073 | 385.50 | |||
M2 | 0.25 | 0.098 | 0.016 | 0.105 | 0.075 | 0.040 | 0.073 | 1019.75 | ||
0.5 | 0.078 | 0.008 | 0.088 | 0.076 | 0.039 | 0.076 | 885.35 | |||
0.75 | 0.069 | 0.013 | 0.084 | 0.083 | 0.048 | 0.083 | 925.44 | |||
C3 | M1 | 0.25 | 0.026 | 0.004 | 0.035 | 0.046 | 0.023 | 0.046 | 320.22 | |
0.5 | 0.072 | 0.004 | 0.085 | 0.075 | 0.046 | 0.075 | 318.81 | |||
0.75 | 0.151 | 0.039 | 0.162 | 0.109 | 0.059 | 0.102 | 414.64 | |||
M2 | 0.25 | 0.045 | 0.006 | 0.054 | 0.050 | 0.029 | 0.050 | 903.49 | ||
0.5 | 0.117 | 0.037 | 0.125 | 0.086 | 0.042 | 0.078 | 865.73 | |||
0.75 | 0.175 | 0.036 | 0.186 | 0.130 | 0.063 | 0.125 | 1109.56 |
Parameter estimation and CPU time cases in Simulation II using VI,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
with | C1 | M1 | 0.25 | 0.002 | 0.051 | 0.042 | 0.095 | 0.042 | 0.081 | 153.4 |
0.5 | 0.063 | 0.019 | 0.074 | 0.072 | 0.038 | 0.070 | 132.2 | |||
0.75 | 0.128 | 0.058 | 0.132 | 0.098 | 0.030 | 0.079 | 152.6 | |||
M2 | 0.25 | 0.011 | 0.025 | 0.035 | 0.086 | 0.033 | 0.082 | 211.2 | ||
0.5 | 0.076 | 0.025 | 0.085 | 0.088 | 0.038 | 0.084 | 190.8 | |||
0.75 | 0.151 | 0.019 | 0.156 | 0.079 | 0.042 | 0.077 | 200 | |||
C2 | M1 | 0.25 | 0.015 | 0.003 | 0.026 | 0.043 | 0.021 | 0.043 | 115.8 | |
0.5 | 0.003 | 0.001 | 0.020 | 0.038 | 0.020 | 0.039 | 103.6 | |||
0.75 | 0.077 | 0.009 | 0.079 | 0.057 | 0.020 | 0.057 | 148.4 | |||
M2 | 0.25 | 0.038 | 0.022 | 0.045 | 0.056 | 0.023 | 0.051 | 168.4 | ||
0.5 | 0.048 | 0.032 | 0.040 | 0.030 | 0.019 | 0.030 | 157.2 | |||
0.75 | 0.117 | 0.030 | 0.012 | 0.064 | 0.029 | 0.055 | 173 | |||
C3 | M1 | 0.25 | 0.015 | 0.003 | 0.026 | 0.043 | 0.021 | 0.043 | 127.8 | |
0.5 | 0.003 | 0.001 | 0.020 | 0.038 | 0.020 | 0.039 | 84 | |||
0.75 | 0.077 | 0.009 | 0.079 | 0.057 | 0.020 | 0.057 | 134.6 | |||
M2 | 0.25 | 0.043 | 0.022 | 0.053 | 0.055 | 0.030 | 0.051 | 150 | ||
0.5 | 0.048 | 0.032 | 0.054 | 0.057 | 0.026 | 0.048 | 134 | |||
0.75 | 0.117 | 0.032 | 0.121 | 0.064 | 0.029 | 0.055 | 214 | |||
without | C1 | M1 | 0.25 | 0.024 | 0.060 | 0.038 | 0.102 | 0.031 | 0.087 | 69.4 |
0.5 | 0.066 | 0.055 | 0.078 | 0.078 | 0.031 | 0.055 | 58.8 | |||
0.75 | 0.057 | 0.061 | 0.065 | 0.088 | 0.030 | 0.064 | 96 | |||
M2 | 0.25 | 0.049 | 0.090 | 0.128 | 0.116 | 0.064 | 0.076 | 88.4 | ||
0.5 | 0.037 | 0.087 | 0.040 | 0.100 | 0.030 | 0.051 | 77.6 | |||
0.75 | 0.049 | 0.018 | 0.057 | 0.185 | 0.028 | 0.046 | 113.5 | |||
C2 | M1 | 0.25 | 0.027 | 0.012 | 0.028 | 0.045 | 0.008 | 0.046 | 65 | |
0.5 | 0.019 | 0.019 | 0.025 | 0.031 | 0.015 | 0.025 | 57.8 | |||
0.75 | 0.002 | 0.013 | 0.017 | 0.037 | 0.017 | 0.035 | 81.6 | |||
M2 | 0.25 | 0.102 | 0.051 | 0.104 | 0.070 | 0.022 | 0.048 | 105.17 | ||
0.5 | 0.078 | 0.050 | 0.081 | 0069 | 0.021 | 0.047 | 97.80 | |||
0.75 | 0.057 | 0.044 | 0.061 | 0.070 | 0.023 | 0.054 | 120.80 | |||
C3 | M1 | 0.25 | 0.010 | 0.036 | 0.023 | 0.072 | 0.022 | 0.066 | 83.8 | |
0.5 | 0.009 | 0.015 | 0.024 | 0.040 | 0.023 | 0.038 | 86.4 | |||
0.75 | 0.076 | 0.027 | 0.080 | 0.059 | 0.025 | 0.052 | 103.2 | |||
M2 | 0.25 | 0.033 | 0.011 | 0.041 | 0.070 | 0.026 | 0.074 | 108 | ||
0.5 | 0.052 | 0.042 | 0.057 | 0.066 | 0.024 | 0.051 | 112.2 | |||
0.75 | 0.122 | 0.055 | 0.012 | 0.083 | 0.026 | 0.062 | 150.6 |
Parameter estimation and CPU time cases in Simulation I and Simulation II,
Simulation | Case | Method | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
Simulation I | C1 | VI | 0.25 | 0.199 | 0.412 | 0.244 | 0.481 | 0.143 | 0.262 | 18.4 |
0.5 | 0.032 | 0.250 | 0.148 | 0.344 | 0.149 | 0.232 | 22.2 | |||
0.75 | 0.033 | 0.264 | 0.111 | 0.346 | 0.108 | 0.226 | 25.6 | |||
Gibbs | 0.25 | 0.033 | 0.065 | 0.123 | 0.267 | 0.128 | 0.264 | 3353.2 | ||
0.5 | 0.022 | 0.070 | 0.104 | 0.261 | 0.108 | 0.253 | 3347.8 | |||
0.75 | 0.035 | 0.141 | 0.132 | 0.268 | 0.105 | 0.223 | 4275 | |||
C2 | VI | 0.25 | 0.086 | 0.333 | 0.145 | 0.116 | 0.113 | 0.256 | 15.8 | |
0.5 | 0.065 | 0.281 | 0.11 | 0.365 | 0.090 | 0.239 | 17.6 | |||
0.75 | 0.007 | 0.299 | 0.099 | 0.357 | 0.093 | 0.257 | 18.4 | |||
Gibbs | 0.25 | 0.038 | 0.062 | 0.125 | 0.266 | 0.123 | 0.251 | 3668.4 | ||
0.5 | 0.018 | 0.132 | 0.100 | 0.230 | 0.119 | 0.230 | 3357.2 | |||
0.75 | 0.117 | 0.112 | 0.192 | 0.264 | 0.169 | 0.235 | 3873 | |||
C3 | VI | 0.25 | 0.145 | 0.433 | 0.186 | 0.493 | 0.121 | 0.243 | 16.8 | |
0.5 | 0.056 | 0.271 | 0.120 | 0.358 | 0.090 | 0.229 | 18 | |||
0.75 | 0.037 | 0.239 | 0.149 | 0.327 | 0.130 | 0.237 | 17.6 | |||
Gibbs | 0.25 | 0.063 | 0.082 | 0.133 | 0.225 | 0.110 | 0.211 | 4150 | ||
0.5 | 0.158 | 0.172 | 0.234 | 0.357 | 0.176 | 0.318 | 3834 | |||
0.75 | 0.127 | 0.192 | 0.361 | 0.464 | 0.359 | 0.425 | 3614 | |||
Simulation II | C1 | VI | 0.25 | 0.124 | 0.350 | 0.178 | 0.442 | 0.121 | 0.277 | 14.7 |
0.5 | 0.046 | 0.275 | 0.138 | 0.418 | 0.131 | 0.315 | 16.7 | |||
0.75 | 0.005 | 0.021 | 0.145 | 0.348 | 0.140 | 0.284 | 17 | |||
Gibbs | 0.25 | 0.030 | 0.040 | 0.088 | 0.116 | 0.074 | 0.126 | 1450.4 | ||
0.5 | 0.007 | 0.017 | 0.050 | 0.090 | 0.060 | 0.091 | 1473.6 | |||
0.75 | 0.007 | 0.030 | 0.077 | 0.145 | 0.078 | 0.146 | 1539.5 | |||
C2 | VI | 0.25 | 0.167 | 0.312 | 0.198 | 0.425 | 0.118 | 0.286 | 15.2 | |
0.5 | 0.009 | 0.269 | 0.155 | 0.341 | 0.155 | 0.225 | 15.8 | |||
0.75 | 0.042 | 0.273 | 0.187 | 0.372 | 0.187 | 0.275 | 16.2 | |||
Gibbs | 0.25 | 0.062 | 0.006 | 0.084 | 0.050 | 0.082 | 0.048 | 1566.17 | ||
0.5 | 0.028 | 0.005 | 0.004 | 0.059 | 0.031 | 0.057 | 1600.80 | |||
0.75 | 0.003 | 0.024 | 0.021 | 0.060 | 0.023 | 0.054 | 1666.80 | |||
C3 | VI | 0.25 | 0.090 | 0.246 | 0.123 | 0.312 | 0.092 | 0.266 | 15.4 | |
0.5 | 0.059 | 0.135 | 0.193 | 0.290 | 0.193 | 0.268 | 16.3 | |||
0.75 | 0.009 | 0.272 | 0.240 | 0.499 | 0.225 | 0.422 | 15.9 | |||
Gibbs | 0.25 | 0.033 | 0.001 | 0.041 | 0.060 | 0.046 | 0.064 | 1600 | ||
0.5 | 0.012 | 0.008 | 0.057 | 0.086 | 0.044 | 0.081 | 1600.2 | |||
0.75 | 0.012 | 0.025 | 0.062 | 0.003 | 0.066 | 0.082 | 1500.6 |
Results of variable selection for Simulation III.
Case | L2 (p = 0.25) | MSE (p = 0.25) | C (p = 0.25) | IC (p = 0.25) | L2 (p = 0.5) | MSE (p = 0.5) | C (p = 0.5) | IC (p = 0.5) | L2 (p = 0.75) | MSE (p = 0.75) | C (p = 0.75) | IC (p = 0.75)
---|---|---|---|---|---|---|---|---|---|---|---|---
C1 | 0.0276 | 0.0072 | 4.85 | 0 | 0.0056 | 0.0022 | 4.94 | 0 | 0.0251 | 0.0080 | 4.94 | 0 |
C2 | 0.0252 | 0.0080 | 4.89 | 0 | 0.0063 | 0.0062 | 4.93 | 0 | 0.0184 | 0.0074 | 4.88 | 0 |
C3 | 0.0134 | 0.0048 | 4.86 | 0 | 0.0048 | 0.0019 | 4.75 | 0 | 0.0124 | 0.0045 | 4.9 | 0 |
Results of variable selection for Simulation IV.
Case | L2 (p = 0.25) | MSE (p = 0.25) | C (p = 0.25) | IC (p = 0.25) | L2 (p = 0.5) | MSE (p = 0.5) | C (p = 0.5) | IC (p = 0.5) | L2 (p = 0.75) | MSE (p = 0.75) | C (p = 0.75) | IC (p = 0.75)
---|---|---|---|---|---|---|---|---|---|---|---|---
C1 | 0.0369 | 0.0118 | 4.93 | 0 | 0.0035 | 0.0011 | 4.93 | 0 | 0.0170 | 0.0053 | 4.95 | 0 |
C2 | 0.0071 | 0.0021 | 4.95 | 0 | 0.0036 | 0.0012 | 4.89 | 0 | 0.0063 | 0.0020 | 4.93 | 0 |
C3 | 0.0174 | 0.0053 | 4.97 | 0 | 0.0044 | 0.0016 | 4.91 | 0 | 0.0054 | 0.0018 | 4.95 | 0 |
Bayesian estimates (EST) and 95% confidence intervals (CI) of the parameters in real example.
Par | Est (CI) | Par | Est (CI)
---|---|---|---
– | −0.0334 (−0.0369,−0.0300) | – | 0.2552 (0.2538,0.2567)
– | 0.3947 (0.3885,0.4009) | – | −0.1527 (−0.1622,0.1432)
– | −0.0500 (−0.1102,0.0105) | – | 0.1066 (0.1066,0.1067)
– | 0.0121 (−0.1023,0.1265) | – | −0.0284 (−0.0307,−0.0261)
– | 0.3185 (0.3114,0.3256) | – | 1.5119 (1.5117,1.5121)
– | 0.0649 (0.0616,0.0682) | – | 0.0036 (0.0028,0.0044)
– | 0.0335 (−0.0428,0.1138) | – | 1.5435 (1.5344,1.5346)
– | −0.0600 (−0.1299,0.0098) | – | 0.0002 (0,0.0003)
– | 0.1480 (0.1387,0.1573) | – | 1.2650 (1.2648,1.2652)
– | 0.0340 (0.0208,0.0472) | – | −0.0416 (−0.0442,−0.0399)
– | 1.5177 (1.5174,1.5180) | – | 1.4132 (1.4130,1.4133)
– | 0.0024 (0.0014,0.0034) | – | −0.0042 (−0.0061,−0.0024)
Appendix A
The derivations of Algorithms 1–4 are similar to each other; thus, we only refine the steps of Algorithm 1. The full conditional probability posterior are as follows:
In Step 4 of Algorithm 1,
In Step 7 of Algorithm 1,
Expressions for
In other words, by replacing
It is known that
Appendix B
Parameter estimation and CPU time cases in Simulation I using Gibbs,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
with | C1 | M1 | 0.25 | 0.009 | 0.034 | 0.059 | 0.107 | 0.059 | 0.102 | 7273 |
0.5 | 0.013 | 0.047 | 0.055 | 0.092 | 0.054 | 0.079 | 8157 | |||
0.75 | 0.039 | 0.114 | 0.076 | 0.162 | 0.066 | 0.115 | 8925 | |||
M2 | 0.25 | 0.010 | 0.078 | 0.070 | 0.141 | 0.070 | 0.119 | 16,166 | ||
0.5 | 0.054 | 0.046 | 0.078 | 0.109 | 0.056 | 0.099 | 16,270 | |||
0.75 | 0.075 | 0.017 | 0.098 | 0.112 | 0.063 | 0.111 | 15,185 | |||
C2 | M1 | 0.25 | 0.065 | 0.045 | 0.089 | 0.120 | 0.062 | 0.112 | 8202 | |
0.5 | 0.011 | 0.051 | 0.056 | 0.122 | 0.055 | 0.111 | 8962 | |||
0.75 | 0.029 | 0.077 | 0.060 | 0.134 | 0.053 | 0.110 | 9922 | |||
M2 | 0.25 | 0.085 | 0.095 | 0.115 | 0.159 | 0.076 | 0.127 | 16,220 | ||
0.5 | 0.048 | 0.145 | 0.080 | 0.179 | 0.063 | 0.105 | 15,989 | |||
0.75 | 0.038 | 0.193 | 0.078 | 0.109 | 0.067 | 0.122 | 15,570 | |||
C3 | M1 | 0.25 | 0.077 | 0.034 | 0.078 | 0.062 | 0.029 | 0.055 | 8646 | |
0.5 | 0.167 | 0.088 | 0.165 | 0.117 | 0.042 | 0.085 | 9535 | |||
0.75 | 0.248 | 0.131 | 0.123 | 0.102 | 0.060 | 0.131 | 8627 | |||
M2 | 0.25 | 0.137 | 0.037 | 0.127 | 0.093 | 0.040 | 0.090 | 16,323 | ||
0.5 | 0.129 | 0.078 | 0.130 | 0.103 | 0.042 | 0.074 | 16,661 | |||
0.75 | 0.139 | 0.105 | 0.168 | 0.126 | 0.044 | 0.081 | 16,555 | |||
without | C1 | M1 | 0.25 | 0.038 | 0.010 | 0.056 | 0.073 | 0.041 | 0.073 | 5721 |
0.5 | 0.071 | 0.005 | 0.081 | 0.070 | 0.038 | 0.070 | 6958 | |||
0.75 | 0.112 | 0.012 | 0.120 | 0.078 | 0.045 | 0.078 | 5717 | |||
M2 | 0.25 | 0.038 | 0.101 | 0.056 | 0.073 | 0.041 | 0.073 | 12,503 | ||
0.5 | 0.071 | 0.005 | 0.081 | 0.070 | 0.081 | 0.070 | 12,324 | |||
0.75 | 0.112 | 0.012 | 0.120 | 0.073 | 0.045 | 0.078 | 12,901 | |||
C2 | M1 | 0.25 | 0.101 | 0.004 | 0.108 | 0.070 | 0.108 | 0.070 | 6989 | |
0.5 | 0.101 | 0.008 | 0.108 | 0.070 | 0.038 | 0.069 | 6934 | |||
0.75 | 0.033 | 0.007 | 0.049 | 0.081 | 0.036 | 0.081 | 6892 | |||
M2 | 0.25 | 0.110 | 0.010 | 0.116 | 0.073 | 0.039 | 0.073 | 12,901 | ||
0.5 | 0.067 | 0.048 | 0.079 | 0.088 | 0.041 | 0.074 | 12,484 | |||
0.75 | 0.083 | 0.046 | 0.092 | 0.100 | 0.042 | 0.089 | 12,560 | |||
C3 | M1 | 0.25 | 0.029 | 0.001 | 0.038 | 0.055 | 0.025 | 0.055 | 6922 | |
0.5 | 0.082 | 0.007 | 0.091 | 0.081 | 0.040 | 0.081 | 6888 | |||
0.75 | 0.142 | 0.008 | 0.153 | 0.105 | 0.058 | 0.105 | 6896 | |||
M2 | 0.25 | 0.047 | 0.016 | 0.056 | 0.052 | 0.031 | 0.049 | 12,359 | ||
0.5 | 0.113 | 0.032 | 0.120 | 0.089 | 0.042 | 0.084 | 12,426 | |||
0.75 | 0.175 | 0.044 | 0.187 | 0.118 | 0.065 | 0.110 | 12,462 |
Parameter estimation and CPU time cases in Simulation II using Gibbs,
Method | Case | Missing | p | BIAS (β1) | BIAS (β2) | RMSE (β1) | RMSE (β2) | SD (β1) | SD (β2) | CPU Time
---|---|---|---|---|---|---|---|---|---|---
with | C1 | M1 | 0.25 | 0.032 | 0.019 | 0.042 | 0.045 | 0.028 | 0.041 | 6159.5 |
0.5 | 0.001 | 0.044 | 0.023 | 0.055 | 0.023 | 0.033 | 6180.5 | |||
0.75 | 0.047 | 0.12 | 0.054 | 0.13 | 0.027 | 0.05 | 9567.5 | |||
M2 | 0.25 | 0.113 | 0.09 | 0.128 | 0.116 | 0.064 | 0.076 | 9144.5 | ||
0.5 | 0.037 | 0.087 | 0.04 | 0.1 | 0.03 | 0.051 | 10,985 | |||
0.75 | 0.049 | 0.018 | 0.057 | 0.185 | 0.028 | 0.046 | 10,882 | |||
C2 | M1 | 0.25 | 0.025 | 0.027 | 0.026 | 0.034 | 0.01 | 0.021 | 7521 | |
0.5 | 0.007 | 0.02 | 0.012 | 0.024 | 0.009 | 0.013 | 6396.5 | |||
0.75 | 0.006 | 0.032 | 0.012 | 0.037 | 0.01 | 0.018 | 7632 | |||
M2 | 0.25 | 0.083 | 0.087 | 0.084 | 0.087 | 0.028 | 0.05 | 9013 | ||
0.5 | 0.019 | 0.036 | 0.021 | 0.04 | 0.01 | 0.018 | 8655.5 | |||
0.75 | 0.017 | 0.061 | 0.021 | 0.067 | 0.011 | 0.028 | 9447 | |||
C3 | M1 | 0.25 | 0.085 | 0.05 | 0.089 | 0.064 | 0.028 | 0.042 | 7583 | |
0.5 | 0.022 | 0.041 | 0.032 | 0.057 | 0.024 | 0.039 | 6798.5 | |||
0.75 | 0.022 | 0.076 | 0.035 | 0.089 | 0.027 | 0.046 | 7240 | |||
M3 | 0.25 | 0.033 | 0.013 | 0.041 | 0.044 | 0.025 | 0.041 | 11,252.5 | ||
0.5 | 0.029 | 0.083 | 0.035 | 0.091 | 0.018 | 0.036 | 11,173.5 | |||
0.75 | 0.047 | 0.011 | 0.053 | 0.117 | 0.023 | 0.044 | 11,873.5 | |||
without | C1 | M1 | 0.25 | 0.034 | 0.032 | 0.047 | 0.06 | 0.032 | 0.051 | 4545 |
0.5 | 0.019 | 0.013 | 0.034 | 0.047 | 0.028 | 0.045 | 4625 | |||
0.75 | 0.007 | 0.037 | 0.031 | 0.068 | 0.03 | 0.057 | 5358 | |||
M2 | 0.25 | 0.079 | 0.03 | 0.112 | 0.099 | 0.105 | 0.108 | 8190 | ||
0.5 | 0.082 | 0.012 | 0.089 | 0.056 | 0.033 | 0.055 | 8599.5 | |||
0.75 | 0.062 | 0.023 | 0.076 | 0.052 | 0.039 | 0.047 | 8877.5 | |||
C2 | M1 | 0.25 | 0.027 | 0.008 | 0.031 | 0.027 | 0.014 | 0.025 | 4754 | |
0.5 | 0.023 | 0.001 | 0.033 | 0.029 | 0.023 | 0.029 | 4944.5 | |||
0.75 | 0.028 | 0.001 | 0.04 | 0.05 | 0.029 | 0.05 | 5175 | |||
M2 | 0.25 | 0.04 | 0.003 | 0.044 | 0.031 | 0.018 | 0.032 | 8334.5 | ||
0.5 | 0.035 | 0.002 | 0.04 | 0.03 | 0.019 | 0.03 | 8072.5 | |||
0.75 | 0.036 | 0.004 | 0.049 | 0.043 | 0.031 | 0.043 | 8232 | |||
C3 | M1 | 0.25 | 0.028 | 0.01 | 0.031 | 0.026 | 0.014 | 0.024 | 4822 | |
0.5 | 0.023 | 0.002 | 0.032 | 0.029 | 0.023 | 0.029 | 4756 | |||
0.75 | 0.028 | 0.001 | 0.04 | 0.05 | 0.029 | 0.05 | 4955.5 | |||
M2 | 0.25 | 0.04 | 0.02 | 0.041 | 0.029 | 0.01 | 0.023 | 7966 | ||
0.5 | 0.035 | 0.002 | 0.04 | 0.03 | 0.019 | 0.03 | 7895 | |||
0.75 | 0.036 | 0.004 | 0.048 | 0.043 | 0.031 | 0.043 | 8127.5 |
References
1. Koenker, R.; Bassett, G. Regression Quantiles. Econometrica; 1978; 46, pp. 33-50.
2. Baur, D.G.; Dimpfl, T.; Jung, R.C. Stock return autocorrelations revisited: A quantile regression approach. J. Empir. Finance; 2012; 19, pp. 254-265. [DOI: https://dx.doi.org/10.1016/j.jempfin.2011.12.002]
3. Huang, L.; Zhu, W.; Saunders, C.P.; MacLeod, J.N.; Zhou, M.; Stromberg, A.J.; Bathke, A.C. A novel application of quantile regression for identification of biomarkers exemplified by equine cartilage microarray data. BMC Bioinform.; 2008; 9, 300. [DOI: https://dx.doi.org/10.1186/1471-2105-9-300] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18597687]
4. Cade, B.S.; Noon, B.R. A gentle introduction to quantile regression for ecologists. Front. Ecol. Environ.; 2003; 1, pp. 412-420. [DOI: https://dx.doi.org/10.1890/1540-9295(2003)001[0412:AGITQR]2.0.CO;2]
5. Yu, K.; Stander, J. Bayesian analysis of a Tobit quantile regression model. J. Econom.; 2007; 137, pp. 260-276. [DOI: https://dx.doi.org/10.1016/j.jeconom.2005.10.002]
6. Kozumi, H.; Kobayashi, G. Gibbs sampling methods for Bayesian quantile regression. J. Stat. Comput. Simul.; 2011; 81, pp. 1565-1578.
7. Alhamzawi, R. Bayesian Analysis of Composite Quantile Regression. Stat. Biosci.; 2016; 8, pp. 358-373. [DOI: https://dx.doi.org/10.1007/s12561-016-9158-8]
8. Yuan, X.; Xiang, X.; Zhang, X. Bayesian composite quantile regression for the single-index model. PLoS ONE; 2023; 18, e0285277. [DOI: https://dx.doi.org/10.1371/journal.pone.0285277]
9. Hu, Y.; Wang, H.J.; He, X.; Guo, J. Bayesian joint-quantile regression. Comput. Stat.; 2020; 36, pp. 2033-2053. [DOI: https://dx.doi.org/10.1007/s00180-020-00998-w]
10. Li, Q.; Lin, N.; Xi, R. Bayesian regularized quantile regression. Bayesian Anal.; 2010; 5, pp. 533-556. [DOI: https://dx.doi.org/10.1214/10-BA521]
11. Alhamzawi, R.; Yu, K. Variable selection in quantile regression via Gibbs sampling. J. Appl. Stat.; 2012; 39, pp. 799-813. [DOI: https://dx.doi.org/10.1080/02664763.2011.620082]
12. Alhamzawi, R.; Mallick, H. Bayesian reciprocal LASSO quantile regression. Commun. Stat. Simul. Comput.; 2020; 51, pp. 6479-6494. [DOI: https://dx.doi.org/10.1080/03610918.2020.1804585]
13. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; 3rd ed. John Wiley & Sons: New York, NY, USA, 2019.
14. Yuan, Y.; Yin, G. Bayesian Quantile Regression for Longitudinal Studies with Nonignorable Missing Data. Biometrics; 2010; 66, pp. 105-114.
15. Zhao, P.-Y.; Tang, N.-S.; Jiang, D.-P. Efficient inverse probability weighting method for quantile regression with nonignorable missing data. Statistics; 2017; 51, pp. 363-386. [DOI: https://dx.doi.org/10.1080/02331888.2016.1268615]
16. Wang, Z.; Tang, N. Bayesian Quantile Regression with Mixed Discrete and Nonignorable Missing Covariates. Bayesian Anal.; 2020; 15, pp. 579-604. [DOI: https://dx.doi.org/10.1214/19-BA1165]
17. Tang, N.; Chow, S.-M.; Ibrahim, J.G.; Zhu, H. Bayesian Sensitivity Analysis of a Nonlinear Dynamic Factor Analysis Model with Nonparametric Prior and Possible Nonignorable Missingness. Psychometrika; 2017; 82, pp. 875-903. [DOI: https://dx.doi.org/10.1007/s11336-017-9587-4] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29030749]
18. Tuerde, M.; Tang, N. Bayesian semiparametric approach to quantile nonlinear dynamic factor analysis models with mixed ordered and nonignorable missing data. Statistics; 2022; 56, pp. 1166-1192. [DOI: https://dx.doi.org/10.1080/02331888.2022.2121399]
19. Hoffman, M.D.; Blei, D.M.; Wang, C.; Paisley, J. Stochastic variational inference. J. Mach. Learn. Res.; 2013; 14, pp. 1303-1347.
20. Beal, M.J. Variational Algorithms for Approximate Bayesian Inference; University of London, University College London: London, UK, 2003.
21. Ganguly, A.; Jain, S.; Watchareeruetai, U. Amortized Variational Inference: Towards the Mathematical Foundation and Review. arXiv; 2022; arXiv: 2209.10888
22. Faes, C.; Ormerod, J.T.; Wand, M.P. Variational Bayesian Inference for Parametric and Nonparametric Regression with Missing Data. J. Am. Stat. Assoc.; 2011; 106, pp. 959-971. [DOI: https://dx.doi.org/10.1198/jasa.2011.tm10301]
23. Spaanberg, E. Variational Inference of Dynamic Factor Models with Arbitrary Missing Data. arXiv; 2022; arXiv: 2207.01976
24. Liu, Q.; Li, J.; Dong, M.; Liu, M.; Chai, Y. Identification of gene regulatory networks using variational bayesian inference in the presence of missing data. IEEE/ACM Trans. Comput. Biol. Bioinform.; 2022; 20, pp. 399-409. [DOI: https://dx.doi.org/10.1109/TCBB.2022.3144418] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35061589]
25. Li, Y.; Zhu, J. L1-Norm Quantile Regression. J. Comput. Graphical Stat.; 2008; 17, pp. 163-185. [DOI: https://dx.doi.org/10.1198/106186008X289155]
26. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
27. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: New York, NY, USA, 2004.
28. Albert, J.H.; Chib, S. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc.; 1993; 88, pp. 669-679.
29. Polson, N.G.; Scott, J.G.; Windle, J. Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables. J. Am. Stat. Assoc.; 2012; 108, pp. 1339-1349. [DOI: https://dx.doi.org/10.1080/01621459.2013.829001]
30. Durante, D.; Rigon, T. Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Statist. Sci.; 2019; 34, pp. 472-485. [DOI: https://dx.doi.org/10.1214/19-STS712]
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Quantile regression models are remarkable structures for conducting regression analyses when the data are subject to missingness. Missing values occur because of various factors like missing completely at random, missing at random, or missing not at random. All these may result from system malfunction during data collection or human error during data preprocessing. Nevertheless, it is important to deal with missing values before analyzing data since ignoring or omitting missing values may result in biased or misinformed analysis. This paper studies quantile regressions from a Bayesian perspective. By proposing a hierarchical model framework, we develop an alternative approach based on deterministic variational Bayes approximations. Logistic and probit models are adopted to specify the propensity scores for missing responses and covariates, respectively. A Bayesian variable selection method is proposed to recognize significant covariates. Several simulation studies and real examples illustrate the advantages of the proposed methodology and offer some possible future research directions.