
Abstract

In this paper, we propose a smoothing estimation method for censored quantile regression models. The method combines convolutional smoothing with the quantile loss function: with a non-negative kernel, the smoothed loss is twice differentiable and globally convex, so the parameters of the regression model can be computed by a gradient-based iterative algorithm. We establish the convergence rate and asymptotic properties of the smoothing estimation for large samples in high dimensions. Numerical simulations show that the smoothing estimation method for censored quantile regression models improves the estimation accuracy, computational speed, and robustness over the classical parameter estimation method. The simulation results also show that parametric methods outperform the KM method in estimating the distribution function of the censored variables. Even when the assumed distribution is misspecified, the smoothing estimation does not fluctuate much.

1. Introduction

Since the regression quantile has robustness similar to that of the sample quantile, the quantile regression model can well characterize the conditional distribution of the response variable $y$ given the covariates $x$, and thus the link between the two [1]. Given a response variable $y\in\mathbb{R}$ and $p$-dimensional covariates $x=(x_1,\ldots,x_p)^{T}$, let $F_{y|x}(\cdot)$ be the conditional distribution function of $y$ given $x$. The classical quantile regression model can be expressed as

(1) $F_{y|x}^{-1}(\tau)=[\beta(\tau)]^{T}x,$

where $\beta=\beta(\tau)=(\beta_1(\tau),\ldots,\beta_p(\tau))^{T}\in\mathbb{R}^p$ is the regression coefficient. Koenker, R. et al. [2,3] provided a detailed discussion of quantile regression modeling in terms of methodology, theory, and computation. Because the loss function of quantile regression is not smooth, its computational complexity increases dramatically. Since the loss function is not differentiable, Horowitz [4] used a smooth function to approximate the indicator function in order to smooth the objective function. This approach has also been applied to other quantile regression problems: de Castro, L. et al. [5] used smoothed moment estimating equations to estimate the parameters of quantile regression; Chen et al. [6] used a smoothing method to study quantile regression models with constraints; Galvao, A.F. and Kato, K. [7] studied the smoothing estimation of fixed effects in quantile regression models for panel data; and Whang [8] discussed empirical likelihood estimation of quantile regression models by a smoothing approach. Although the above literature solves the non-differentiability of the objective function, it cannot guarantee the convexity of the objective function; thus, there is no guarantee that the result is a global optimum. Recently, Fernandes et al. [9] proposed a convolutional smoothing method for estimating the fixed-dimensional parameters of the quantile regression model, under which the loss function is twice differentiable and convex, and the method outperforms the earlier smoothing estimators in terms of estimation accuracy. For completely observed data, He [10] used the convolutional smoothing method to estimate the parameters of the high-dimensional quantile regression model, where the smoothed loss function is again twice differentiable and convex.
When numerically solving for the minimizer of the smoothed objective function, a gradient descent algorithm [10] is used, whose iterations reduce to least-squares-type updates; this effectively shortens the computation time and improves the estimation accuracy.

In empirical studies, it is frequently observed that variables of interest are subject to censoring. For instance, in a study [11] on the survival times of AIDS patients, 43% of the data were right-censored. For parameter estimation in quantile regression models for right-censored data, we refer to Ying, Z. et al. [12], Honoré, B. et al. [13], Portnoy, S. [14], Peng, L. [15], and Yuan, X. et al. [16]. Parameter estimation in quantile regression models for censored data [17,18,19], including right-censored data, has been extensively studied. In quantile regression for right-censored data, the problem of non-smooth loss functions persists, and it has already been studied [20,21,22,23]. For the right-censored quantile regression model with fixed-dimensional parameters, Peng, L. and Huang, Y. [20], Xu, G. et al. [21], Cai, Z. and Sit, T. [22], and Kim, K.H. [23] considered smoothing estimation methods, respectively. For the high-dimensional large-sample case of the censored quantile regression model, Wu, Y. et al. [24], He, X. et al. [25], and Fei, Z. et al. [26] used a method similar to that of Fernandes et al. [9] to smooth the estimating equations, which improved the estimation accuracy over classical parameter estimation methods. However, these smoothing estimation methods for high-dimensional censored quantile regression need to grid the quantiles as $0=\tau_1<\cdots<\tau_j<\tau_{j+1}<\cdots<\tau_k=1$ to ensure that the approximation error of the estimating equation does not become too large. Furthermore, the estimates of $\beta(\tau_1),\ldots,\beta(\tau_j)$ are needed before estimating $\beta(\tau_{j+1})$, and the estimation accuracy depends on the number of grid points, which also increases the computational complexity.

Considering censored data, this paper extends the convolutional smoothing method [10] to the high-dimensional censored quantile regression model and proposes a coefficient estimator for the censored quantile regression model based on convolutional smoothing. Under certain conditions, the loss function of the smoothed censored quantile regression model is twice differentiable and globally convex, so a gradient-based iterative algorithm can be used to compute the regression parameters. In this paper, the bias of the smoothing estimation for censored quantile regression is characterized under certain conditions, and the convergence rate, the Bahadur–Kiefer representation, and the Berry–Esseen upper bound of the smoothing estimation are established under high-dimensional, large-sample conditions. Moreover, whereas the classical parameter estimation method has different estimation accuracies at different quantile levels, the smoothing estimation method maintains essentially the same accuracy at each quantile level, and the proposed method greatly reduces the computation time in the high-dimensional, large-sample setting. In summary, the key contributions and significance of this article are as follows: (1) To our knowledge, this article is the first to apply the convolutional smoothing method to high-dimensional censored data regression analysis. In contrast to the non-differentiability of the objective function in classical censored quantile regression, the objective function of this method is twice differentiable, which greatly facilitates the construction of gradient-based algorithms and the study of theoretical properties.
(2) For high-dimensional data scenarios, under certain regularity conditions, this paper also establishes the asymptotic consistency, asymptotic normality, and other theoretical properties of the proposed smoothed estimator, ensuring good properties for statistical inference.

This paper is organized as follows. Section 2 proposes a convolutional smoothing estimation method for high-dimensional censored quantile regression models and gives the asymptotic properties of the smoothing estimation. In Section 3, numerical simulations of the smoothing estimation and the classical parameter estimation are carried out for the low- and high-dimensional cases, and the estimation accuracy and computational speed of the smoothing method in censored quantile regression analysis are discussed. The discussion and conclusions are given in Section 4 and Section 5, and a detailed proof of the asymptotic properties is given in Appendix A.

2. Methods

In this paper, we consider a right-censored quantile regression model; i.e., we assume that the response variable $y$ in model (1) is right-censored, in which case the observed variables are $z=\min(y,c)$ and $\delta=I(y\le c)$, where $c$ is the censored variable. The observations are denoted as $\{(z_i,x_i,\delta_i),\,i=1,\ldots,n\}$. Then, the estimation of the parameters in the censored quantile regression model is defined as

(2) $\hat\beta(\tau)\in\arg\min_{\beta\in\mathbb{R}^p}Q(\beta)=\arg\min_{\beta\in\mathbb{R}^p}\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\,\rho_\tau(z_i-\beta^{T}x_i),$

where $\rho_\tau(u)=u\{\tau-I(u<0)\}$ is the $\tau$-quantile loss function, $G(\cdot)$ is the distribution function of the censored variable $c_i$, and $\hat G(\cdot)$ is an estimate of $G(\cdot)$.

Convolutional smoothing estimation for high-dimensional quantile regression models has been studied in the literature [10] by He et al. In this paper, we apply the method to censored data. Let K(·) be the kernel function with integral 1, and h be the window width. Denote the following:

$K_h(u)=h^{-1}K(u/h),\qquad \bar K_h(u)=\bar K(u/h),\qquad \bar K(u)=\int_{-\infty}^{u}K(v)\,dv,\qquad u\in\mathbb{R}.$

Then, the objective function for the smoothing estimation of the censored quantile regression can be written as

(3) $Q_h(\beta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\,\ell_h(z_i-\beta^{T}x_i),$

where $\ell_h(u)=(\rho_\tau * K_h)(u)=\int_{-\infty}^{+\infty}\rho_\tau(v)K_h(v-u)\,dv$ and “∗” denotes the convolution operator. The convolutional smoothing estimate (denoted SCQ) of the censored quantile regression model is defined as the minimizer $\hat\beta_h$ of $Q_h(\beta)$. For any $\beta\in\mathbb{R}^p$, write $Q_h^*(\beta)=E\,Q_h(\beta)$, with

(4) $\beta_h^*(\tau)\in\arg\min_{\beta\in\mathbb{R}^p}Q_h^*(\beta).$

For ease of presentation and without ambiguity, $\beta^*(\tau)\in\arg\min_{\beta\in\mathbb{R}^p}E\,Q(\beta)$ and $\beta_h^*(\tau)$ are abbreviated as $\beta^*$ and $\beta_h^*$, respectively.
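
As a quick sanity check on these definitions, the smoothed loss $\ell_h$ can be approximated by numerical quadrature. The sketch below (an illustration we added, not the paper's code) uses a Gaussian kernel and verifies that $\ell_h$ rounds the kink of $\rho_\tau$ and has derivative $\ell_h'(u)=\tau-\bar K(-u/h)$, which follows from differentiating the convolution:

```python
import numpy as np
from math import erf

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def smoothed_loss(u, tau, h, span=50.0, n_grid=40001):
    """Convolution-smoothed loss l_h(u) = (rho_tau * K_h)(u), approximated
    on a fine grid with a Gaussian kernel K (endpoint values are ~0, so a
    simple Riemann sum is accurate)."""
    v = np.linspace(-span, span, n_grid)
    kh = np.exp(-0.5 * ((v - u) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return np.sum(check_loss(v, tau) * kh) * (v[1] - v[0])

def norm_cdf(t):
    """Standard normal CDF = K_bar for the Gaussian kernel."""
    return 0.5 * (1.0 + erf(t / np.sqrt(2.0)))

tau, h = 0.3, 0.5
# l_h dominates rho_tau at the kink u = 0 (smoothing rounds the corner):
assert smoothed_loss(0.0, tau, h) > check_loss(0.0, tau)
# Its derivative is l_h'(u) = tau - K_bar(-u/h); check by central differences:
u0, eps = 0.7, 1e-4
fd = (smoothed_loss(u0 + eps, tau, h) - smoothed_loss(u0 - eps, tau, h)) / (2 * eps)
assert abs(fd - (tau - norm_cdf(-u0 / h))) < 1e-3
# Far from the kink (relative to h), l_h essentially coincides with rho_tau:
assert abs(smoothed_loss(2.0, tau, 0.1) - check_loss(2.0, tau)) < 1e-2
```

The derivative identity is exactly the per-observation term appearing in the gradient formula (5) below.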

It is easy to see that the objective function in (3) is twice differentiable, and its gradient and Hessian matrix can be expressed, respectively, as follows:

(5) $\nabla Q_h(\beta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\{\bar K_h(\beta^{T}x_i-z_i)-\tau\}x_i,$

$\nabla^2 Q_h(\beta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}K_h(z_i-\beta^{T}x_i)\,x_i x_i^{T}.$

As long as the kernel function $K(\cdot)$ is non-negative, $Q_h(\beta)$ is a convex function for any window width $h>0$, and $\beta_h^*=\beta_h^*(\tau)$ satisfies the first-order condition $\nabla Q_h^*(\beta_h^*)=0$.
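
Because the smoothed objective is convex with the explicit gradient above, plain gradient descent suffices to compute the estimate. The following minimal sketch (our own illustration under simplifying assumptions, not the paper's implementation; `scq_gd` and `G_hat` are hypothetical names) implements the gradient with a Gaussian kernel, for which $\bar K$ is the standard normal CDF:

```python
import numpy as np
from math import erf

def norm_cdf_vec(t):
    """Vectorized standard normal CDF (K_bar for a Gaussian kernel)."""
    return np.vectorize(lambda s: 0.5 * (1.0 + erf(s / np.sqrt(2.0))))(t)

def scq_gd(z, x, delta, G_hat, tau, h, lr=0.5, n_iter=2000):
    """Minimize Q_h(beta) by gradient descent using the gradient in (5):
       (1/n) sum_i [delta_i / (1 - G_hat(z_i))]
                   * {K_bar((x_i^T beta - z_i)/h) - tau} * x_i."""
    n, p = x.shape
    w = delta / (1.0 - G_hat(z))      # inverse-probability-of-censoring weights
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = x @ beta - z
        grad = ((w * (norm_cdf_vec(r / h) - tau)) @ x) / n
        beta -= lr * grad
    return beta

# Toy check with no censoring (delta = 1, G_hat = 0): the routine reduces to
# smoothed median regression and should recover the true coefficients.
rng = np.random.default_rng(0)
n, p, tau = 500, 3, 0.5
x = rng.uniform(-1.0, 1.0, size=(n, p))
beta_true = np.ones(p)
z = x @ beta_true + rng.standard_normal(n)   # error has median 0
beta_hat = scq_gd(z, x, np.ones(n), lambda t: np.zeros_like(t), tau, h=0.3)
assert np.max(np.abs(beta_hat - beta_true)) < 0.4
```

A fixed step size works here because the Hessian in (5) is bounded; in practice a line search or the step-size rules of [10] would be used.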

Remark 1. 

When estimating the distribution of the censored variable $c_i$, we can use the KM (Kaplan–Meier) estimator for $\hat G(\cdot)$. If instead we assume that the form of the distribution $G(\cdot)$ of $c_i$ is known, then, even under a mis-specification of this distribution, the subsequent simulations show that estimating the distribution of the censored variable parametrically before smoothing the estimation of the regression parameter performs better than using the KM estimator in most cases. We therefore write the distribution function of $c_i$ as $\hat G(\cdot)=G(\cdot,\hat\theta_n)$, where $\theta_n\in\mathbb{R}^{d}$ is the parameter vector and $\hat\theta_n$ is its maximum likelihood estimate. The parametric distribution form is used in both the proofs and the assumptions.
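
The KM route mentioned above amounts to applying the Kaplan–Meier product-limit estimator with the event indicator flipped, since an observation is an "event" for the censoring variable exactly when $y$ is censored. A minimal sketch (our illustration; it ignores ties and the left-limit convention $1-\hat G(z^-)$ often used in IPCW weighting):

```python
import numpy as np

def km_censoring_survival(z, delta):
    """Kaplan-Meier estimate of 1 - G(t), the survival function of the
    censoring variable c, using events d_i = 1 - delta_i.
    Assumes no ties in z; tie handling is omitted for brevity."""
    order = np.argsort(z)
    z_sorted = z[order]
    d = 1 - delta[order]                    # events for the censoring variable
    at_risk = len(z) - np.arange(len(z))    # risk-set size at each z_(i)
    surv = np.cumprod(1.0 - d / at_risk)    # product-limit estimate
    return z_sorted, surv

# Tiny deterministic example: three observations, censored only at z = 2.
z = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 0, 1])                 # delta = 0 <=> c observed
zs, s = km_censoring_survival(z, delta)
assert np.allclose(s, [1.0, 0.5, 0.5])      # S_c drops by (1 - 1/2) at z = 2
```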

To state the theoretical results, we assume that the covariate $x$ has been centered. Given vectors $u,v\in\mathbb{R}^p$, $u^{T}v$ and $\langle u,v\rangle$ both denote their inner product, and $a\vee b=\max\{a,b\}$ for constants $a$ and $b$. $\|\cdot\|_q$ ($1\le q<\infty$) denotes the $\ell_q$-norm, i.e., $\|u\|_q=(\sum_{i=1}^{p}|u_i|^{q})^{1/q}$, and $\|u\|_\infty=\max_{1\le i\le p}|u_i|$, where $u_i$ denotes the $i$th element of the $p$-dimensional real vector $u$. Given a positive semidefinite matrix $\Sigma\in\mathbb{R}^{p\times p}$, define $\|u\|_\Sigma=\|\Sigma^{1/2}u\|_2$ for any vector $u\in\mathbb{R}^p$. For all real numbers $r\ge 0$, define $\mathbb{B}^p(r)=\{\beta\in\mathbb{R}^p:\|\beta\|_2\le r\}$ and $\mathbb{S}^{p-1}(r)=\{\beta\in\mathbb{R}^p:\|\beta\|_2=r\}$. For two non-negative sequences $\{a_n\}_{n\ge 1}$ and $\{b_n\}_{n\ge 1}$, $a_n\lesssim b_n$ denotes the existence of a constant $C>0$ independent of $n$ such that $a_n\le Cb_n$; $a_n\gtrsim b_n$ is equivalent to $b_n\lesssim a_n$; and $a_n\asymp b_n$ means that $a_n\lesssim b_n$ and $b_n\lesssim a_n$ hold simultaneously. The assumptions required for the theorems are as follows.

A1. The non-negative kernel function $K(\cdot)$ is symmetric, $K(-u)=K(u)$, with upper bound $\kappa:=\sup_{u\in\mathbb{R}}K(u)<+\infty$ and $\kappa_k:=\int_{-\infty}^{+\infty}|u|^{k}K(u)\,du<+\infty$ for $k=1,2$.

A2. The conditional density $f_{\varepsilon|x}(\cdot)$ of the regression error term $\varepsilon$ given $x$ satisfies the Lipschitz condition; i.e., there exists a constant $L>0$ such that for all $u,v\in\mathbb{R}$, $|f_{\varepsilon|x}(u)-f_{\varepsilon|x}(v)|\le L|u-v|$ holds almost everywhere. There exists a real number $\underline{f}>0$ such that $f_{\varepsilon|x}(0)\ge \underline{f}$ holds almost everywhere for any $x$.

A3. There exist a positive constant $C_1$ and functions $K_j(\cdot)$ such that

(6) $\left|\frac{\partial G(z;\theta)}{\partial\theta_j}\right|\le K_j(z),\qquad E\{K_j^{2}(z)\}\le C_1<\infty\quad(j=1,\ldots,d).$

Denote by $z_{(i)}$ the $i$th order statistic of $\{z_i\}$ and by $\delta_{(i)}$ the corresponding indicator. They satisfy

(7) $P(\delta_{(n)}=1\mid z_{(n)})>0.$

A4. The covariate $x$ follows a sub-exponential distribution; i.e., there exists $\upsilon_0>0$ such that for any $u\in\mathbb{S}^{p-1}$ and $t\ge 0$, $P\{|\langle u,\omega\rangle|\ge \upsilon_0 t\}\le e^{-t}$, where $\Sigma=E\{xx^{T}\}$ is positive definite and $\omega=\Sigma^{-1/2}x$.

For $\omega=\Sigma^{-1/2}x$ and a positive integer $k$, define $m_k=\sup_{u\in\mathbb{S}^{p-1}}E|\langle u,\omega\rangle|^{k}$. The following theorems can be obtained.

Theorem 1 

(Upper bound on the estimation error). Suppose conditions A1–A4 hold. For any real number $t>0$, if $h$ satisfies the constraint $\underline{f}^{-1}m_3^{1/2}\upsilon_0^{*}P_d\sqrt{(p+t)/n}\lesssim h\lesssim \underline{f}\,m_3^{-1/2}$, where $P_d=\sum_{i}\delta_i/n$ is the uncensored proportion, then the convolutional smoothing estimate $\hat\beta_h$ satisfies

(8) $P\left\{\|\hat\beta_h-\beta^*\|_{\Sigma}\le R\,\underline{f}^{-1}\left(\upsilon_0^{*}\sqrt{\frac{\log\log(1\vee h^{-1})+p+t}{n}}+L\kappa_2 h^{2}\right)\right\}\ge 1-2e^{-t},$

where $\upsilon_0^{*}=C_0\upsilon_0$, and $C_0$ and $R$ are positive constants.

The two terms in the upper bound, $\upsilon_0^{*}\sqrt{(\log\log(1\vee h^{-1})+p+t)/n}$ and $L\kappa_2 h^{2}$, can be interpreted as the stochastic estimation error and the smoothing bias, respectively. A smaller $h$ leads to a smaller bias after smoothing, but an $h$ that is too small could result in overfitting and a slow convergence rate. According to Theorem 1, $h$ should satisfy $\sqrt{(p+t)/n}\lesssim h\lesssim 1$. In order to obtain a non-asymptotic Bahadur representation of the smoothing estimation, we replace A4 with A4*.

A4*. The covariate $x$ follows a sub-Gaussian distribution; i.e., there exists $\upsilon_1>0$ such that for any $u\in\mathbb{S}^{p-1}$ and $t\ge 0$, $P\{|\langle u,\omega\rangle|\ge \upsilon_1 t\}\le 2e^{-t^{2}/2}$, where $\Sigma=E\{xx^{T}\}$ is positive definite and $\omega=\Sigma^{-1/2}x$.

Theorem 2 

(Non-asymptotic Bahadur representation). Assume conditions A1–A3 and A4* hold, and that $\sup_{u\in\mathbb{R}}f_{\varepsilon|x}(u)\le \bar f$ holds almost everywhere. For any real number $t>0$, suppose $h$ satisfies the constraint $\underline{f}^{-1}m_3^{1/2}\upsilon_1^{*}P_d\sqrt{(p+t)/n}\lesssim h\lesssim \underline{f}\,m_3^{-1/2}$. Let $J_h=\nabla^2 Q_h^*(\beta^*)=E\left\{\frac{\delta}{1-G(z)}K_h(\varepsilon)\,xx^{T}\right\}$; then,

(9) $P\left\{\left\|J_h(\hat\beta_h-\beta^*)-\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-G(z_i)}\{\tau-\bar K_h(-\varepsilon_i)\}x_i\right\|_{\Sigma}\le R\left(\frac{p+t}{nh^{1/2}}+h^{3/2}\sqrt{\frac{p+t}{n}}+h^{4}\right)\right\}\ge 1-3e^{-t},$

where the real number R>0 is a constant independent of p and n.

Theorem 2 allows for establishing the limiting distribution of the estimators. Based on the non-asymptotic representation in Theorem 2, we establish the Berry–Esseen upper bound for smoothing estimators.

Theorem 3 

(Berry–Esseen upper bound). Assume that conditions A1–A3 and A4* hold, that $\sup_{u\in\mathbb{R}}f_{\varepsilon|x}(u)\le \bar f$ holds almost everywhere, and that for any real number $t>0$, $h$ satisfies $\sqrt{(p+t)/n}\lesssim h\lesssim 1$. Then,

(10) $\Lambda_{n,p}(h):=\sup_{x\in\mathbb{R},\,a\in\mathbb{R}^p}\left|P\left(n^{1/2}\sigma_h^{-1}\langle a,\hat\beta_h-\beta^*\rangle\le x\right)-\Phi(x)\right|\lesssim \frac{p+\log n}{(nh)^{1/2}}+n^{1/2}h^{2},$

where $\sigma_h^{2}=\sigma_h^{2}(a)=a^{T}J_h^{-1}E\left[\left(\frac{\delta}{1-G(z)}\right)^{2}\{\bar K_h(-\varepsilon)-\tau\}^{2}xx^{T}\right]J_h^{-1}a$, and $\Phi(\cdot)$ denotes the standard normal distribution function.

Further, if $f_{\varepsilon|x}(\cdot)$ is twice continuously differentiable and its derivative satisfies $|f'_{\varepsilon|x}(u)-f'_{\varepsilon|x}(v)|\le l_2(x)|u-v|$ for any real numbers $u,v\in\mathbb{R}$, where the function $l_2:\mathbb{R}^p\to\mathbb{R}_{+}$ satisfies $E\{l_2^{2}(x)\}\le C$ for some positive constant $C$, then

$\sup_{x\in\mathbb{R},\,a\in\mathbb{R}^p}\left|P\left(n^{1/2}\sigma_h^{-1}\left\langle a,\,\hat\beta_h-\beta^*+0.5\,\kappa_2 h^{2}J_h^{-1}E\{f'_{\varepsilon|x}(0)\,x\}\right\rangle\le x\right)-\Phi(x)\right|$

(11) $\lesssim \frac{p+\log n}{(nh)^{1/2}}+(p+\log n)^{1/2}h^{3/2}+n^{1/2}h^{4}.$

Theorem 3 shows that when $h$ is chosen in the appropriate range and $n,p\to\infty$, linear combinations of $\hat\beta_h$ are asymptotically normal. According to Theorem 3, the optimal choice $h\asymp\{(p+\log n)/n\}^{2/5}$ minimizes the right-hand side of (10), and the resulting error is of order $(p+\log n)^{4/5}n^{-3/10}$. Hence, if $p^{8}/n^{3}\to 0$, then for any given vector $a\in\mathbb{R}^p$, $n^{1/2}\langle a,\hat\beta_h-\beta^*\rangle$ is asymptotically normal.

Remark 2. 

The assumptions A1, A2, A4, and A4* are commonly used in the convolutional smoothing estimation of high-dimensional quantile regression models with fully observed data [10]. Condition A3 concerns the distribution of the censored variable $c$. Note that $P(\delta_{(n)}=1\mid z_{(n)})=P(z_{(n)}\le c\mid z_{(n)})=\int_{z_{(n)}}^{+\infty}dG(s)=1-G(z_{(n)})$, so assuming $P(\delta_{(n)}=1\mid z_{(n)})>0$ is equivalent to $G(z_{(n)})<1$; i.e., the probability that the largest observation $z_{(n)}$ equals the true variable of interest is not zero. This is a commonly used condition in statistical inference for censored data [27,28], and it rules out the situation where a large number of observations are censored. Assumption (6) provides a local smoothness condition for $G(\cdot)$ in a neighborhood of $\theta$, and its validity can be verified directly for many commonly used distribution functions $G(\cdot)$.

3. Numerical Simulation

In this section, the smoothing estimation and the classical parameter estimation of the quantile regression model for censored data are compared numerically, considering both low- and high-dimensional cases. The estimator (2), as proposed in the literature [17], is chosen as the classical parameter estimation of the censored quantile regression model. Notice that the objective function of the classical parameter estimation for the censored quantile regression model can be rewritten as $Q(\beta)=\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\left[\frac{\delta_i}{1-G(z_i)}(z_i-\beta^{T}x_i)\right]$. Therefore, when computing the regression parameters, the censoring problem is transformed into a non-censoring problem, and the objective function of the smoothing estimation for censored quantile regression is rewritten accordingly as $Q_h(\beta)=\frac{1}{n}\sum_{i=1}^{n}\ell_h\left[\frac{\delta_i}{1-G(z_i)}(z_i-\beta^{T}x_i)\right]$. The Gaussian kernel is used, with window width $h=\{(p+\log n)/n\}^{2/5}$, for the smoothing estimation of the censored quantile regression.
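
The rewriting above relies on the positive homogeneity of the check loss, $\rho_\tau(wu)=w\,\rho_\tau(u)$ for any weight $w\ge 0$; for the smoothed loss $\ell_h$ the identity is only approximate, since convolution does not commute with rescaling. A small sketch (our illustration) verifies the identity and computes the window width rule used throughout the simulations:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

# Positive homogeneity: rho_tau(w * u) = w * rho_tau(u) for w >= 0, which is
# what lets the IPCW weight delta_i / (1 - G(z_i)) be pulled inside the loss.
u = np.array([-2.0, -0.5, 0.0, 1.5])
for w in (0.0, 0.7, 3.2):
    assert np.allclose(check_loss(w * u, 0.25), w * check_loss(u, 0.25))

def bandwidth(n, p):
    """Window width h = ((p + log n) / n)^(2/5) used for SCQ."""
    return ((p + np.log(n)) / n) ** 0.4

# The width shrinks as n grows with p fixed:
assert bandwidth(500, 5) < bandwidth(100, 5)
```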

3.1. Model Setting and Evaluation Indicators

In the simulation, the covariates $x_i=(x_{i1},\ldots,x_{ip})^{T}\in\mathbb{R}^p$ are generated from different distributions to mimic the types of variables commonly found in real data. The error term $\varepsilon_i$ is generated from three different distributions; specifically, i.i.d. random variables $\tilde\varepsilon_i$ of sample size $n$ are drawn, and $\varepsilon_i=\tilde\varepsilon_i-F_{\tilde\varepsilon}^{-1}(\tau)$, where $\tilde\varepsilon_i$ follows one of the distributions (i) $t(4)$; (ii) $\chi^2(1)$; (iii) Laplace(0,1). Let the regression coefficient be $\beta=(1,\ldots,1)^{T}\in\mathbb{R}^p$, given the quantile level $\tau\in(0,1)$; then, the response variable $y_i$ is generated by

$y_i=\beta_1 x_{i1}+\beta_2 x_{i2}+\cdots+\beta_p x_{ip}+\varepsilon_i.$

For both the low- and high-dimensional models, the right-censoring variable is set as $c_i\sim \mathrm{const}+\mathrm{Exp}(p_a)$, where $\mathrm{const}$ and $p_a$ are parameters that can take different values so that the censoring ratio of the response variable $y_i$ reaches the target 15%, 30%, or 45%. In the actual simulation, in order to obtain the value of $G(z_i)$, the parameters $\mathrm{const}$ and $p_a$ are estimated by maximum likelihood in the simulations of Section 3.2 and Section 3.3. In Section 3.4, we discuss the smoothing estimation under misspecification of the distribution of the censored variables; KM estimation is also taken into consideration. Let the number of simulation repetitions be $K$; for the parameter estimate $\hat\beta$ in the $k$th replication, write

$SE_k=\frac{1}{p}\|\hat\beta-\beta\|_2^{2}.$

Then, we can use

$DMSE=\frac{1}{K}\sum_{k=1}^{K}SE_k,$

to evaluate the performance of classical parameter estimation for censored quantile regression models (CQ) and smoothing estimation methods (SCQ). In the actual simulation, we set K=500.
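
The evaluation metric can be computed as follows (a small sketch we added; the array shapes are assumptions):

```python
import numpy as np

def dmse(beta_hats, beta_true):
    """DMSE = (1/K) * sum_k SE_k, with SE_k = (1/p) * ||beta_hat_k - beta||_2^2."""
    beta_hats = np.asarray(beta_hats, dtype=float)   # shape (K, p)
    p = beta_hats.shape[1]
    se = np.sum((beta_hats - beta_true) ** 2, axis=1) / p
    return se.mean()

# Two replications, p = 2: estimation errors (0.1, 0) and (0, 0.2).
beta = np.ones(2)
est = [[1.1, 1.0], [1.0, 1.2]]
# SE_1 = 0.01/2 = 0.005, SE_2 = 0.04/2 = 0.02, DMSE = (0.005 + 0.02)/2 = 0.0125
assert abs(dmse(est, beta) - 0.0125) < 1e-12
```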

3.2. Low-Dimensional Performance of Regression Smoothing Estimates for Censored Quantiles

In the low-dimensional numerical study of the smoothing estimation, the number of covariates is set to $p=5$, and the sample sizes are 100, 200, and 500. To assess the performance of the smoothing method in the low-dimensional case, the covariates are generated in three cases.

  • Case 1: The $p$-dimensional covariates are generated from the multivariate uniform distribution on $[10,20]^p$, with identity covariance matrix;

  • Case 2: The $p$-dimensional covariates are generated from the multivariate uniform distribution on $[10,20]^p$, with covariance matrix $\Sigma=(0.5^{|j-k|})_{1\le j,k\le p}$;

  • Case 3: The $p$-dimensional covariates consist of a mixture of distributions: the first two components are generated from the multivariate uniform distribution on $[10,20]^2$ with covariance matrix $\Sigma_1=(0.5^{|j-k|})_{1\le j,k\le 2}$, and the last three components are generated from $N(\mu,\Sigma_2)$ with mean $\mu=(11,12,13)$ and covariance matrix $\Sigma_2=(\sigma_{jk})$, where

    $\sigma_{jk}=\begin{cases}0.2^{|j-k|}, & j\ne k,\\ 3, & j=k.\end{cases}$

Table 1, Table 2 and Table 3 show the simulation results when the covariates are generated according to the three scenarios, where CP denotes the censoring ratio of the response variable, n is the sample size, and columns 3–12 show the results of CQ and SCQ at different quantiles. From the estimation results, when the regression errors are generated by symmetric distributions, i.e., t and Laplace distributions, SCQ has higher accuracy than CQ, especially at the lower and higher quantiles. When the regression error term is generated by the χ2 distribution, the estimation accuracy of CQ decreases as τ increases from a global perspective, and the estimation accuracy of SCQ is much better than that of CQ at the higher quantiles, although CQ is better than SCQ at the lower quantiles. This may be because the density function of the χ2 distribution is biased and the observations are excessively clustered in the lower quantiles, so that CQ decreases in estimation accuracy as the number of quantiles increases, while SCQ maintains better estimation accuracy. It can be seen from Table 1, Table 2 and Table 3 that SCQ is more stable than CQ, regardless of whether the error terms follow symmetric or asymmetric distribution. Specifically, the estimation accuracy of SCQ is almost the same in all quantiles, while that of CQ fluctuates with the change of τ, especially in the case of the asymmetric distribution of the error terms. Overall, the estimation accuracy of CQ depends greatly on the value of τ, the size of the censoring ratio, and the distribution of the error term, while the estimation effect of SCQ is minimally affected by these factors and shows good robustness.

3.3. High-Dimensional Performance of Smoothing Estimators of Censored Quantile Regression

In the high-dimensional large-sample numerical studies, the ratio of sample size to dimension is fixed at $n/p=20$, and the sample size ranges from 1000 to 5000 in steps of 500. To examine the smoothing estimation of censored quantile regression as dimension and sample size vary, the covariates are generated in three cases.

  • Case 1: The $p$-dimensional covariates are generated from the multivariate uniform distribution on $[10,20]^p$, with covariance matrix $\Sigma=(0.5^{|j-k|})_{1\le j,k\le p}$.

  • Case 2: The $p$-dimensional covariates are generated from the multivariate uniform distribution on $[10,20]^p$, with covariance matrix $\Sigma=(\sigma_{jk})_{1\le j,k\le p}$, where

    $\sigma_{jk}=\begin{cases}0.2^{|j-k|}, & 1\le j\ne k\le p,\\ 3, & 1\le j=k\le p.\end{cases}$

  • Case 3: The $p$-dimensional covariates consist of a mixture of distributions: the first $[p/2]$ components are generated from the multivariate uniform distribution on $[10,20]^{[p/2]}$ with covariance matrix $\Sigma_1=(0.5^{|j-k|})_{1\le j,k\le [p/2]}$; the remaining $[p/2]+1$ components are generated from $N(\mu,\Sigma_2)$ with mean $\mu=(11,\ldots,30)$, whose $j$th component is $\mu_j=11+20(j-1)/([p/2]+1)$ $(j=1,\ldots,[p/2]+1)$, and covariance matrix $\Sigma_2=(\sigma_{jk})_{1\le j,k\le [p/2]+1}$, where

    $\sigma_{jk}=\begin{cases}0.2^{|j-k|}, & 1\le j\ne k\le [p/2]+1,\\ 3, & 1\le j=k\le [p/2]+1.\end{cases}$

The ratio of the DMSE of CQ to that of SCQ is calculated first, and the simulation results for covariates generated under Case 1 are displayed in Figure 1. Since the results for the three covariate generation cases are very similar, we do not show the results for Case 2 and Case 3. The results in Figure 1 show that the DMSE ratio of CQ to SCQ is not significantly affected by changes in sample size and dimensionality. When the regression error terms are generated from symmetric distributions, i.e., the t and Laplace distributions, the DMSE ratios of the regression coefficient estimators remain above one. This indicates that SCQ has higher precision than CQ; the difference is most pronounced at the lower quantile $\tau=0.1$ and the upper quantile $\tau=0.9$, where the DMSE ratio can reach three. When the error term is generated from the asymmetric $\chi^2$ distribution, the DMSE ratio of CQ to SCQ is less than one at the low quantile levels, where CQ is superior to SCQ. However, the ratio increases with $\tau$ and reaches around 10 when $\tau=0.9$. To clarify the reason for this phenomenon, we calculate the DMSE of the two estimation methods when the error term is generated from the $\chi^2$ distribution. As shown in Figure 2, CQ performs better when $\tau=0.1$. As $\tau$ increases, the DMSE of CQ increases at a growing rate, and it is significantly higher than that of SCQ when $\tau=0.9$, whereas the DMSE of SCQ stays essentially flat across the quantile levels. This observation aligns with its low-dimensional counterpart, underscoring that the estimation accuracy of CQ depends on the magnitude of $\tau$, the censoring ratio of the response variable, and the distribution of the error term.
In contrast, the estimation accuracy of the SCQ exhibits robustness against variations in these factors.

To assess the computational efficiency of the smoothing estimation, we compare both the DMSE of the CQ and SCQ estimates in the high-dimensional simulations and the computation time of each estimation method. In Figure 3, the computation time ratios of CQ to SCQ are all greater than 1 and tend to increase with dimensionality and sample size; the ratio exceeds 10 in all cases when the sample size is $n=5000$. The results for covariate generation Cases 2 and 3, which are not shown here, are similar to those for Case 1. Combined with the previous study, it is clear that, compared to CQ, SCQ significantly decreases the computation time and increases the estimation accuracy in the majority of circumstances.

3.4. Robustness of Smoothing Estimates

In order to compare the effects under the misspecification for the distribution of censored variables and KM estimation of censored distributions on the smoothing estimation of parameters, we choose Case 1 of the covariate generation method in the low- and high-dimensional simulations for numerical research.

When the sample size is 200, the model coefficients are estimated by convolutional smoothing after the distribution of the censored variables is deliberately misspecified as Normal, Weibull, or Lognormal, and also after estimating $G$ by KM, under different quantile levels $\tau$, censoring proportions, and error terms. The simulation results in Table 4 show that the smoothing estimation using a misspecified distribution of the censored variables is more robust than the smoothing estimation using the KM estimate of the distribution of the censored variables. Similar results are obtained for a sample size of 3000, as shown in Table 5.

The simulation shows that although there is a possibility of misspecification in estimating the distribution of the censored variables parametrically via $\hat G(\cdot)=G(\cdot,\hat\theta_n)$, this method is better and more robust than smoothing the regression model after estimating the distribution of the censored variables by KM estimation. From Table 4 and Table 5, it is also found that when different distributional forms are used for the censored variables, the smoothing estimation errors of the model coefficients vary only minimally, and the estimation accuracy remains higher than when the KM estimate of the censoring distribution is used before the smoothing estimation. Regarding computational efficiency, the running time ratio of the smoothing estimation under distributional misspecification to the SCQ fluctuates within the range 0.9 to 1.1, whereas the smoothing estimation preceded by the KM estimation incurs a significantly longer running time.

4. Discussion

In the smoothing estimation of censored quantile regression models, the distribution of the censored variables is usually unknown. This problem can be addressed by estimating the density of the censored variables with existing nonparametric methods, determining the type of the censoring distribution with a goodness-of-fit test, and then estimating the unknown parameters of the distribution by maximum likelihood. In the numerical simulations, this paper also uses such a procedure to fit the unknown censoring distribution $G(\cdot)$, obtaining $\hat G(\cdot)=G(\cdot,\hat\theta_n)$, and then estimates the regression parameter $\beta$.

We also discuss the parametric smoothing estimation method in the case of misspecification of the distribution of the censored variables, and compare it with the smoothing estimation method of the parameters after estimating the distribution of the censored variables G^(·) with KM estimation. The simulation results show that the smoothing estimation method is still more robust than the method of estimating the distribution of censored variables G^(·) with KM estimation even if there is a misspecification in the estimation of the distribution of the censored variables. Meanwhile, the smoothing estimation method is more robust than the classical censored quantile regression model.

Our research has certain limitations, and some issues need to be explored further. Firstly, we have analyzed parameter smoothing estimation for quantile linear models; further research can address parameter estimation and interval estimation for more complex models such as generalized linear models. Secondly, our proofs rest on the assumption that the form of the censoring distribution is known; a theoretical treatment using a nonparametric estimate of the censoring distribution remains a challenging task, requiring tools from nonparametric statistics and probability limit theory.

5. Conclusions

In this paper, a convolutional smoothing estimation method for the censored quantile regression model is proposed to address the problem that the loss function is not differentiable. Our method combines convolutional smoothing with the loss function of censored quantile regression, which, in contrast to classical censored quantile regression estimation, makes the objective twice differentiable. Moreover, the smoothing estimation method for censored quantile regression models improves the estimation accuracy, computational speed, and robustness over the classical parameter estimation method. The contributions and significance of this paper can be summarized as follows:

  1. The method links the convolutional smoothing method with the loss function of the censored quantile regression model, and the use of a non-negative kernel function ensures that the smoothed loss function is twice differentiable and globally convex, so the computational speed can be improved by a gradient-based iterative algorithm.

  2. Theoretically, we characterize the bias of the smoothing estimation for censored quantile regression and establish the convergence rate, Bahadur–Kiefer representation, and Berry–Esseen upper bound of the smoothing estimation under high-dimensional and large-sample conditions.

  3. The numerical simulations show that the smoothing estimation method greatly reduces the computation time and improves the estimation accuracy in most cases, compared with the classical parameter estimation. In addition, the accuracy of the CQ estimator is highly dependent on $\tau$, the censoring ratio (CP), and the distribution of the error term, whereas the SCQ estimator is robust to these factors.

Author Contributions

Conceptualization, M.W.; methodology, M.W.; software, M.W.; validation, M.W., X.Z. and Q.G.; formal analysis, M.W.; investigation, M.W.; resources, M.W.; data curation, M.W.; writing—original draft preparation, M.W.; writing—review and editing, X.W., X.M. and J.W.; visualization, M.W.; supervision, X.Z. and Q.G.; project administration, X.Z. and Q.G.; funding acquisition, X.W., X.Z. and Q.G. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The datasets used and analyzed in this study are available from the corresponding author(s) on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables
Figure 1. Estimation results for the high-dimensional model when covariate X is generated from Case 1. The horizontal coordinates of the plots indicate the sample size (in thousands), and the vertical coordinates indicate the ratio of DMSE for regression coefficients’ estimators between CQ and SCQ.

Figure 2. The DMSE of CQ and SCQ for three high-dimensional covariate generation cases when the error terms obey the [Formula omitted. See PDF.] distribution, with the horizontal coordinates denoting the different quantile levels, and the vertical axes denoting the DMSEs of the CQ and SCQ estimations scaled up by a factor of 10, where the solid line denotes the SCQ, and the dashed line denotes the CQ.

Figure 3. Simulation results under the high-dimensional model when the covariates are generated from Case 1, where the horizontal coordinate denotes the sample size (in thousands), and the vertical coordinate denotes the ratio of estimated running time between CQ and SCQ.

The DMSE of CQ and SCQ estimators for the low-dimensional model with covariate X generated from Case 1, where the values in columns 3–12 are DMSE ×104.

CP % n τ = 0.1 τ = 0.3 τ = 0.5 τ = 0.7 τ = 0.9
CQ SCQ CQ SCQ CQ SCQ CQ SCQ CQ SCQ
ε̃_i ~ t(4)
100 397 116 143 112 112 109 137 116 387 114
15 200 180 58 63 54 51 56 66 52 179 56
500 71 24 28 23 21 22 27 21 73 21
100 489 151 190 134 144 133 170 131 518 135
30 200 228 74 85 70 65 69 82 67 230 65
500 95 34 30 27 25 27 32 26 91 25
100 687 200 224 179 181 178 220 178 628 173
45 200 302 97 103 91 84 88 107 81 291 77
500 118 45 38 39 34 35 39 31 118 32
ε̃_i ~ χ²(1)
100 2 128 13 135 62 126 212 122 1103 123
15 200 0 78 6 78 29 79 118 75 591 77
500 0 39 2 40 13 39 48 39 236 37
100 2 164 17 155 77 164 270 155 1397 144
30 200 0 97 7 98 37 97 136 95 718 90
500 0 51 3 54 15 52 55 48 305 44
100 3 208 23 198 89 204 321 189 1746 194
45 200 1 127 10 120 46 124 162 117 860 107
500 0 68 4 69 20 62 73 59 366 55
ε̃_i ~ Laplace(0, 1)
100 603 235 246 230 213 235 258 223 564 234
15 200 304 119 125 115 109 114 129 114 299 111
500 119 52 50 49 43 46 52 45 109 46
100 743 322 343 302 271 291 336 288 762 270
30 200 365 171 159 149 124 147 163 136 385 289
500 140 74 62 66 53 60 59 54 138 54
100 983 441 410 374 370 364 435 361 964 330
45 200 461 210 200 184 165 171 200 175 456 163
500 195 101 76 80 65 81 76 71 187 64

The DMSE of CQ and SCQ estimators for the low-dimensional model with covariate X generated from Case 2, where the values in columns 3–12 are DMSE ×104.

CP % n τ = 0.1 τ = 0.3 τ = 0.5 τ = 0.7 τ = 0.9
CQ SCQ CQ SCQ CQ SCQ CQ SCQ CQ SCQ
ε̃_i ~ t(4)
100 661 173 241 168 180 179 245 183 667 173
15 200 323 88 113 80 89 80 108 78 303 79
500 128 33 46 32 36 32 47 34 119 33
100 840 230 266 219 226 205 288 208 847 209
30 200 390 110 137 105 112 99 145 103 413 106
500 160 47 53 42 44 42 55 42 151 40
100 1101 310 423 302 344 295 422 265 1210 275
45 200 521 150 180 143 145 136 175 135 506 125
500 186 62 69 53 55 52 74 50 210 52
ε̃_i ~ χ²(1)
100 3 190 21 195 97 199 352 194 1889 196
15 200 1 118 10 113 50 114 195 110 996 115
500 0 60 4 59 21 58 81 59 450 60
100 4 252 29 253 124 243 465 236 2219 236
30 200 1 146 13 145 60 142 221 139 1181 136
500 0 78 5 74 25 73 94 72 521 68
100 9 321 41 330 157 303 571 301 2653 307
45 200 1 179 17 184 75 193 271 187 1578 179
500 0 93 7 92 31 91 121 93 671 90
ε̃_i ~ Laplace(0, 1)
100 1027 363 434 363 365 361 462 347 1028 356
15 200 506 179 227 168 179 183 218 174 484 172
500 195 75 86 67 76 68 85 70 204 71
100 1267 449 546 453 481 414 583 437 1366 443
30 200 560 224 270 208 215 209 273 221 574 196
500 241 93 112 85 94 86 100 86 244 83
100 1656 615 702 586 646 608 784 584 1642 564
45 200 783 302 389 280 296 277 348 266 830 255
500 305 122 133 115 114 116 136 105 313 101

The DMSE of CQ and SCQ estimators for the low-dimensional model with covariate X generated from Case 3, where the values in columns 3–12 are DMSE ×104.

CP % n τ = 0.1 τ = 0.3 τ = 0.5 τ = 0.7 τ = 0.9
CQ SCQ CQ SCQ CQ SCQ CQ SCQ CQ SCQ
ε̃_i ~ t(4)
100 863 259 315 255 257 253 302 252 841 249
15 200 405 131 144 128 118 126 148 125 442 122
500 166 52 56 50 43 49 54 48 164 46
100 1082 331 371 321 311 315 390 311 104 304
30 200 490 168 172 159 141 153 181 149 528 144
500 203 73 71 67 54 64 66 61 187 58
100 1443 467 518 442 421 431 522 425 1372 415
45 200 653 226 233 209 187 202 247 196 712 186
500 260 101 91 89 70 83 89 78 248 73
ε̃_i ~ χ²(1)
100 3 286 30 286 127 284 468 282 2529 277
15 200 1 180 12 180 65 179 262 177 1290 172
500 0 87 5 87 26 86 101 85 535 82
100 5 358 36 357 154 354 566 348 2992 335
30 200 1 218 15 216 76 212 295 208 1551 199
500 0 118 6 116 32 114 122 109 688 100
100 9 467 51 465 210 457 709 450 382 431
45 200 2 281 21 279 97 274 374 265 1866 246
500 0 154 8 152 41 148 161 139 907 124
ε̃_i ~ Laplace(0, 1)
100 1372 553 619 545 537 539 615 535 1337 527
15 200 638 259 276 252 229 247 269 244 649 241
500 260 117 103 112 89 109 107 106 255 103
100 1757 718 780 693 655 678 766 664 1675 639
30 200 780 358 334 337 295 325 336 315 819 304
500 311 170 133 154 114 146 137 139 318 130
100 2274 985 1007 939 865 907 945 874 2145 822
45 200 999 483 452 446 388 424 433 407 1064 386
500 405 238 172 210 150 196 178 183 415 165

Smoothing estimators of the regression model parameters after misspecifying the distribution of the censored variables as Normal, Weibull, or Lognormal, and after estimating G using KM estimation. The covariate X in the low-dimensional model is generated from Case 1, where the values in columns 2–13 are DMSE ×105. The sample size of the simulation is fixed at 200.

τ Normal Weibull Lognormal KM Estimation
15% 30% 45% 15% 30% 45% 15% 30% 45% 15% 30% 45%
ε̃_i ~ t(4)
0.1 588 766 991 567 722 938 564 709 916 1366 1404 1448
0.3 554 682 861 552 677 858 552 677 858 1158 1192 1203
0.5 537 644 808 543 655 823 545 660 832 1123 1128 1157
0.7 524 618 774 536 639 800 539 648 815 1037 1051 1100
0.9 511 594 754 529 620 776 534 631 793 976 1000 1091
ε̃_i ~ χ²(1)
0.1 808 1006 1270 793 974 1227 791 966 1212 1786 1847 1982
0.3 803 995 1250 790 969 1215 789 962 1202 1788 1810 1902
0.5 792 969 1212 785 955 1191 785 953 1185 1716 1778 1856
0.7 772 922 1150 776 929 1151 778 933 1155 1521 1684 1820
0.9 740 858 1076 759 886 1088 765 898 1102 1305 1409 1472
ε̃_i ~ Laplace(0, 1)
0.1 1213 1671 2138 1160 1560 2047 1151 1525 1999 1778 2023 2348
0.3 1141 1472 1835 1126 1451 1851 1125 1444 1851 1370 1595 1897
0.5 1102 1372 1695 1106 1392 1754 1110 1400 1777 1208 1434 1731
0.7 1071 1295 1593 1092 1344 1678 1099 1363 1717 1116 1326 1609
0.9 1034 1215 1513 1072 1285 1604 1083 1314 1655 1078 1304 1598

Smoothing estimates of the regression model parameters after misspecifying the distribution of the censored variables as Normal, Weibull, or Lognormal, and after estimating G using KM estimation. The covariate X in the high-dimensional model is generated from Case 1, where the values in columns 2–13 are DMSE ×104. The sample size of the simulation is fixed at 3000.

τ Normal Weibull Lognormal KM Estimation
15% 30% 45% 15% 30% 45% 15% 30% 45% 15% 30% 45%
ε̃_i ~ t(4)
0.1 1962 2272 2891 1952 2270 2945 2020 2274 2965 2634 3593 4324
0.3 1951 2260 2937 1940 2258 2925 2022 2263 2943 2570 3532 4343
0.5 1945 2253 2891 1935 2251 2913 2026 2255 2936 2489 3459 4373
0.7 1943 2249 2902 1933 2247 2900 2019 2253 2931 2394 3381 4273
0.9 1943 2239 2892 1932 2236 2876 2013 2239 2912 2369 3306 4172
ε̃_i ~ χ²(1)
0.1 2411 2427 2855 2391 2415 2844 2421 2445 2868 3605 3613 4105
0.3 2402 2439 2864 2392 2437 2844 2416 2448 2885 3390 3438 4035
0.5 2421 2435 2851 2391 2426 2842 2432 2447 2865 3308 3350 4062
0.7 2412 2446 2854 2392 2434 2843 2425 2465 2864 3307 3353 3955
0.9 2409 2442 2832 2390 2431 2825 2413 2444 2850 3262 3271 4009
ε̃_i ~ Laplace(0, 1)
0.1 3857 4742 6153 3846 4739 6143 3861 4757 6165 3640 4442 5388
0.3 3840 4724 6096 3834 4711 6085 3853 4742 6101 3602 4349 5403
0.5 3838 4710 6051 3827 4692 6045 3841 4740 6067 3728 4252 5332
0.7 3835 4682 6018 3819 4675 6006 3851 4695 6024 3789 4169 5236
0.9 3834 4662 5963 3816 4654 5951 3849 4686 5975 3610 4097 5195
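The last four columns of the two tables above estimate the censoring distribution G by the Kaplan–Meier (KM) method. As a minimal illustrative sketch (not the authors' code; the function name and interface are assumptions), the KM estimate of the censoring survival function 1 − G, which supplies the inverse-probability-of-censoring weights δ_i/(1 − Ĝ(z_i)), treats the censored observations (δ_i = 0) as the "events":

```python
import numpy as np


def km_censoring_survival(z, delta):
    """Kaplan-Meier estimate of the censoring survival function 1 - G.

    z: observed times min(y, c); delta: 1 if uncensored, 0 if censored.
    A censoring event is delta == 0.  Returns a right-continuous step
    function evaluable at arbitrary time points.
    """
    order = np.argsort(z)
    z_sorted, d_sorted = z[order], delta[order]
    n = len(z)
    times, surv = [], []
    s = 1.0
    at_risk = n
    i = 0
    while i < n:
        t = z_sorted[i]
        j = i
        d_cens = 0
        while j < n and z_sorted[j] == t:      # handle ties at time t
            d_cens += (d_sorted[j] == 0)
            j += 1
        if d_cens > 0:
            s *= 1.0 - d_cens / at_risk        # KM product-limit step
            times.append(t)
            surv.append(s)
        at_risk -= (j - i)
        i = j

    def one_minus_G(t):
        # product over censoring times <= t
        idx = np.searchsorted(np.array(times), t, side="right")
        return 1.0 if idx == 0 else surv[idx - 1]

    return one_minus_G
```

In the tables, this nonparametric estimate is compared against fitting a parametric family (Normal, Weibull, Lognormal) for G; the parametric fits give smaller DMSE even when the family is misspecified.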

Appendix A

Suppose x = (x_1, ..., x_p)^T, where E(x_j) = 0, j = 1, 2, ..., p, and Σ = E(xx^T) is positive definite. For u ∈ R^p, write ‖u‖_Σ := (u^TΣu)^{1/2}. For a real number r ≥ 0, define the sets Θ(r) = {u ∈ R^p : ‖u‖_Σ ≤ r} and ∂Θ(r) = {u ∈ R^p : ‖u‖_Σ = r}.

In order to prove Theorem 1, two lemmas are given first.

Lemma A1. Assume conditions A1 and A2 hold and m_3 = sup_{u∈S^{p−1}} E|⟨u, ω⟩|^3 < ∞, where ω = Σ^{−1/2}x, and let the window width satisfy 0 < h < f̲/[L{κ_1 + (m_3κ_2)^{1/2}}]. Then β_h^* is the unique minimizer of Q_h^*(β) and satisfies θ_h := ‖β_h^* − β^*‖_Σ ≤ Lκ_2h^2/(f̲ − Lκ_1h).

Moreover, assume that f_{ε|x}(·) is continuously differentiable and that, for a constant l > 0, |f'_{ε|x}(u) − f'_{ε|x}(0)| ≤ l|u| holds almost everywhere in x. Then we have ‖Σ^{1/2}J(β_h^* − β^*) + (1/2)κ_2h^2·Σ^{−1/2}E{f'_{ε|x}(0)x}‖_2 ≤ ((1/6)lκ_3h^3 + (1/2)Lm_3θ_h^2 + Lκ_1hθ_h)(1 + O(n^{−1/2})), where J = E{δf_{ε|x}(0)xx^T/(1 − G(z))}.

To prove (A1), we define θ_h = β_h^* − β^* ∈ R^p and θ_h = ‖θ_h‖_Σ. By the convexity of the loss function Q_h(β) and the fact that β_h^* is an optimal solution of Q_h^*(β) (i.e., the first-order optimality condition ∇Q_h^*(β_h^*) = 0 holds), we have 0 ≤ ⟨∇Q_h^*(β_h^*) − ∇Q_h^*(β^*), β_h^* − β^*⟩ = −⟨∇Q_h^*(β^*), β_h^* − β^*⟩ ≤ ‖Σ^{−1/2}∇Q_h^*(β^*)‖_2·‖θ_h‖_Σ. (A4)

The last step of (A4) is given by the Hölder inequality.

By using Taylor’s formula, we obtain 1/(1 − G(z_i; θ̂_n)) − 1/(1 − G(z_i; θ_n)) = (1/(1 − G(z_i; θ_n))^2)·[∂G(z_i; θ_n^*)/∂θ]^T(θ̂_n − θ_n), where ‖θ_n^* − θ_n‖_2 ≤ ‖θ̂_n − θ_n‖_2. Combined with A3 and the assumption that θ̂_n is the maximum likelihood estimate of θ_n, we have ‖θ̂_n − θ_n‖_2 = O_p(1/√n). When the number of parameters is finite, the estimated distribution of the censored variable converges to its true distribution with probability one. For ease of exposition, the equals sign is used in what follows.

Notice that ∇Q_h^*(β^*) = E{δ[K(ε/h) − τ]x/(1 − Ĝ(z))}. Expanding by Taylor’s formula yields E{δ[K(ε/h) − τ]/(1 − Ĝ(z)) | x} = ∫_{−∞}^{+∞}[(1 − G(t + β^Tx))/(1 − Ĝ(t + β^Tx))]{K(t/h) − τ}dF_{ε|x}(t) = ∫_{−∞}^{+∞}K(u)∫_0^{hu}{f_{ε|x}(t) − f_{ε|x}(0)}dt du·(1 + O_p(n^{−1/2})).

Combined with the above equation, we can obtain ‖Σ^{−1/2}∇Q_h^*(β^*)‖_2 = sup_{u∈S^{p−1}}E{δ[K(ε/h) − τ]/(1 − Ĝ(z))·⟨u, Σ^{−1/2}x⟩} ≤ (1/2)Lκ_2h^2(1 + O(n^{−1/2})). (A5)

For the left-hand side of (A4), the mean value theorem for vector-valued functions gives ∇Q_h^*(β_h^*) − ∇Q_h^*(β^*) = ∫_0^1∇²Q_h^*(β^* + tθ_h)dt·θ_h, (A6) where ∇²Q_h^*(β) = E{δK_h(z − β^Tx)xx^T/(1 − Ĝ(z))}. Writing θ = β − β^* and using the Lipschitz condition, we obtain E{δK_h(z − β^Tx)/(1 − Ĝ(z)) | x} = ∫K(ν)f_{ε|x}(θ^Tx + hν)dν = (f_{ε|x}(0) + R_h^*(θ))(1 + O(n^{−1/2})), (A7) where R_h^*(θ) satisfies |R_h^*(θ)| ≤ L(|θ^Tx| + κ_1h). Combining (A6) and (A7) and the assumption that f_{ε|x}(0) ≥ f̲ > 0, ⟨∇Q_h^*(β_h^*) − ∇Q_h^*(β^*), β_h^* − β^*⟩ ≥ (f̲θ_h^2 − (1/2)Lm_3θ_h^3 − Lκ_1hθ_h^2)(1 + O(n^{−1/2})). (A8)

Combining (A4), (A5), and (A8), it can be found that θ_h ≥ 0 satisfies (1/2)Lm_3θ_h^2 − (f̲ − Lκ_1h)θ_h + (1/2)Lκ_2h^2 ≥ 0. Under the condition L{κ_1 + (m_3κ_2)^{1/2}}h < f̲, solving this inequality yields either θ_h ≤ Lκ_2h^2/(f̲ − Lκ_1h + Δ_h^{1/2}) (A9) or θ_h ≥ (f̲ − Lκ_1h + Δ_h^{1/2})/(Lm_3), (A10) where Δ_h := (f̲ − Lκ_1h)^2 − L^2m_3κ_2h^2; it remains to exclude case (A10).

Suppose θ_h satisfies (A10); then θ_h > L(m_3κ_2)^{1/2}h/(Lm_3) = (κ_2/m_3)^{1/2}h =: r_0. There must then exist η ∈ (0, 1) such that β̃ := (1 − η)β^* + ηβ_h^* satisfies ‖β̃ − β^*‖_Σ = ηθ_h = r_0. By the convexity of Q_h^*(β), it can be shown that ⟨∇Q_h^*(β̃) − ∇Q_h^*(β^*), β̃ − β^*⟩ ≤ η⟨∇Q_h^*(β_h^*) − ∇Q_h^*(β^*), β_h^* − β^*⟩ = −⟨∇Q_h^*(β^*), β̃ − β^*⟩.

Repeating the analysis of (A5) and (A8), the right-hand side of the above inequality is at most (1/2)Lκ_2h^2·r_0, while the left-hand side is at least f̲·r_0^2 − (1/2)Lm_3·r_0^3 − Lκ_1h·r_0^2 = {f̲ − Lκ_1h − (1/2)L(m_3κ_2)^{1/2}h}r_0^2. Eliminating the common factor r_0 on both sides gives r_0 ≤ (1/2)Lκ_2h^2/{f̲ − Lκ_1h − (1/2)L(m_3κ_2)^{1/2}h} < (1/2)Lκ_2h^2/{(1/2)L(m_3κ_2)^{1/2}h} = (κ_2/m_3)^{1/2}h = r_0.

This leads to a contradiction. Therefore, θ_h must satisfy (A9), the first inequality constraint.

In order to demonstrate (A3), the bias of the estimator is examined below. We define Δ = Σ^{−1/2}{∇Q_h^*(β_h^*) − ∇Q_h^*(β^*) − J(β_h^* − β^*)} and the matrix H = Σ^{−1/2}JΣ^{−1/2} = E{δf_{ε|x}(0)ωω^T/(1 − Ĝ(z))}, where ω = Σ^{−1/2}x. Similarly, according to the mean value theorem for vector-valued functions, it follows that Δ = [∫_0^1Σ^{−1/2}∇²Q_h^*(β^* + tθ_h)Σ^{−1/2}dt − H]Σ^{1/2}θ_h, (A11) where θ_h = β_h^* − β^*. Combining the Lipschitz continuity of f_{ε|x}(·) and (A11), ‖Δ‖_2 ≤ L((1/2)m_3‖θ_h‖_Σ + κ_1h)‖θ_h‖_Σ(1 + O(n^{−1/2})). (A12)

Furthermore, expanding f_{ε|x}(·) by a second-order Taylor series, we obtain ‖Σ^{−1/2}∇Q_h^*(β^*) − (1/2)κ_2h^2·Σ^{−1/2}E{f'_{ε|x}(0)x}‖_2 ≤ (1/6)lκ_3h^3(1 + O(n^{−1/2})). (A13)

Combining (A12) and (A13) proves that Lemma A1 holds. □

Lemma A1 discusses the connection between β_h^* and β^*. Lemma A2 must be established before discussing the relationship between β̂_h and β^*. Define D_h(θ) = Q_h(β^* + θ) − Q_h(β^*), R_h(θ) = D_h(θ) − ⟨∇Q_h(β^*), θ⟩, D_h^*(θ) = Q_h^*(β^* + θ) − Q_h^*(β^*), and R_h^*(θ) = D_h^*(θ) − ⟨∇Q_h^*(β^*), θ⟩.

Lemma A2. For any u ≥ 0 and any given r ≥ 0, the following holds: P(sup_{θ∈Θ(r)}{D_h^*(θ) − D_h(θ)} ≤ 3τ̄υ_0^*r·{(P_d·u/n)^{1/2} + u/n}) ≥ 1 − e^{4p−u}, (A14) where P_d = Σ_{i=1}^n δ_i/n is the uncensored proportion, τ̄ = max{τ, 1 − τ}, and υ_0^* = C_0υ_0. Furthermore, given r_u > r_l > 0, for any u ≥ 0, the probability inequality P(D_h^*(θ) − D_h(θ) ≤ 4.25τ̄υ_0^*‖θ‖_Σ·{(P_d·u/n)^{1/2} + u/n}) ≥ 1 − ⌈e·log(r_u/r_l)⌉e^{4p−u} (A15) holds uniformly over θ satisfying r_l ≤ ‖θ‖_Σ ≤ r_u.

For each sample s_i = (x_i, ε_i), define the loss-function difference d_h(θ; s_i) = [δ_i/(1 − Ĝ(z_i))]{ℓ_h(ε_i − ⟨x_i, θ⟩) − ℓ_h(ε_i)}, so that D_h(θ) = (1/n)Σ_{i=1}^n d_h(θ; s_i). Since the loss function ℓ_h(u) is Lipschitz continuous, d_h(θ; s_i) is also τ̄-Lipschitz continuous in ⟨x_i, θ⟩, i.e., for any s_i and θ, θ' ∈ R^p, |d_h(θ; s_i) − d_h(θ'; s_i)| ≤ [δ_i/(1 − Ĝ(z_i))]τ̄|⟨x_i, θ⟩ − ⟨x_i, θ'⟩|.

For any given r > 0 and some ϵ ∈ (0, 1), define Δ_ϵ(r) = n(1 − ϵ)sup_{θ∈Θ(r)}{D_h^*(θ) − D_h(θ)}/(2τ̄r), where D_h^*(θ) = ED_h(θ). Utilizing Chernoff’s inequality, we obtain that P{Δ_ϵ(r) ≥ u} ≤ exp(−sup_{λ≥0}{λu − log E e^{λΔ_ϵ(r)}}). (A16)

Bounding the moment generating function E e^{λΔ_ϵ(r)} through Rademacher symmetrization, we have E e^{λΔ_ϵ(r)} ≤ E exp(2λ(1 − ϵ)sup_{θ∈Θ(r)}(1/(2τ̄r))Σ_{δ_i=1}π_id_h(θ; s_i)), where π_1, ..., π_n are independent Rademacher random variables. If ⟨x_i, θ⟩ = 0, then d_h(θ; s_i) = 0; considering that d_h(θ; s_i) is τ̄-Lipschitz continuous in ⟨x_i, θ⟩, the contraction inequality yields E exp(2λ(1 − ϵ)sup_{θ∈Θ(r)}(1/(2τ̄r))Σ_{δ_i=1}π_id_h(θ; s_i)) ≤ E exp((λ(1 − ϵ)/r)sup_{θ∈Θ(r)}Σ_{δ_i=1}π_iδ_i⟨x_i, θ⟩/(1 − Ĝ(y_i))). For such an ϵ ∈ (0, 1), there exists an ϵ-net {u_1, ..., u_{N_ϵ}} of S^{p−1} with N_ϵ ≤ (1 + 2/ϵ)^p such that ‖Σ_{δ_i=1}π_iδ_iω_i/(1 − Ĝ(y_i))‖_2 ≤ (1 − ϵ)^{−1}max_{1≤j≤N_ϵ}Σ_{δ_i=1}δ_iπ_iu_j^Tω_i/(1 − Ĝ(y_i)), where ω_i = Σ^{−1/2}x_i. This suggests that E exp(λ(1 − ϵ)‖Σ_{δ_i=1}π_iδ_iω_i/(1 − Ĝ(y_i))‖_2) ≤ Σ_{j=1}^{N_ϵ}E exp(λΣ_{δ_i=1}π_iu_j^Tδ_iω_i/(1 − Ĝ(y_i))).

Define S_j = Σ_{δ_i=1}π_iu_j^Tδ_iω_i/(1 − Ĝ(y_i)), and notice that π_i ∈ {−1, 1} is symmetric. From condition A3, there exists a constant C_0 such that 1/(1 − G(z)) ≤ C_0; then we have P(|⟨u, δω/(1 − G(z))⟩| ≥ υ_0^*t) ≤ P(C_0|u^Tω| ≥ υ_0^*t) = P(|u^Tω| ≥ υ_0t) ≤ e^{−t}.

Taking υ_0^* = C_0υ_0, for any k ≥ 3, E|u_j^Tδ_iω_i/(1 − Ĝ(y_i))|^k ≤ (υ_0^*)^k·k∫_0^{+∞}t^{k−1}e^{−t}dt = (υ_0^*)^k k!(1 + O(n^{−1/2})). Since π_i is symmetric, the odd moments vanish; thus, for all 0 ≤ c ≤ 1/(2υ_0^*), E e^{cπ_iu_j^Tδ_iω_i/(1−Ĝ(y_i))} = 1 + (c^2/2)E(u_j^Tδ_iω_i/(1 − Ĝ(y_i)))^2 + Σ_{ℓ=2}^{∞}(c^{2ℓ}/(2ℓ)!)E(u_j^Tδ_iω_i/(1 − Ĝ(y_i)))^{2ℓ} ≤ 1 + c^2υ_0^{*2} + Σ_{ℓ=2}^{∞}(cυ_0^*)^{2ℓ} ≤ 1 + 2(cυ_0^*)^2(1 + O(n^{−1/2})).

It follows that for every 0 < λ < 1/(2υ_0^*) and j = 1, ..., N_ϵ, log E e^{λS_j} ≤ (n_0 + O(√n_0))υ_0^{*2}λ^2/(1 − 2υ_0^*λ), so that log E e^{λS_j} ≲ n_0υ_0^{*2}λ^2/(1 − 2υ_0^*λ), where n_0 is the number of samples satisfying δ_i = 1. For any u ≥ 0, notice that sup_{λ≥0}{λu − log E e^{λΔ_ϵ(r)}} ≥ −log N_ϵ + sup_{λ∈(0,(2υ_0^*)^{−1})}{λu − n_0υ_0^{*2}λ^2/(1 − 2υ_0^*λ)}.

Substituting the above inequality into (A16), it can be shown that, when ϵ = 2/(e^4 − 1), P(sup_{θ∈Θ(r)}{D_h^*(θ) − D_h(θ)} ≤ (2√2/(1 − ϵ))τ̄υ_0^*r·{(P_d·u/n)^{1/2} + u/n}) ≥ 1 − e^{4p−u}, (A17) where P_d = n_0/n. The following proves inequality (A15). Write Θ(r_l, r_u) := {υ ∈ R^p : r_l ≤ ‖υ‖_Σ ≤ r_u}. For some γ > 1 and positive integers k = 1, ..., N := ⌈log(r_u/r_l)/log(γ)⌉, define the sets Θ_k = {υ ∈ R^p : γ^{k−1}r_l ≤ ‖υ‖_Σ ≤ γ^kr_l}, so that Θ(r_l, r_u) ⊆ ∪_{k=1}^{N}Θ_k. Then, repeating (A17) with r = γ^kr_l, k = 1, ..., N, P(∃ θ ∈ Θ(r_l, r_u) s.t. D_h^*(θ) − D_h(θ) > (2√2γ/(1 − ϵ))τ̄υ_0^*‖θ‖_Σ·{(P_d·u/n)^{1/2} + u/n}) ≤ Σ_{k=1}^{N}P(sup_{θ∈Θ(γ^kr_l)}{D_h^*(θ) − D_h(θ)} > (2√2/(1 − ϵ))τ̄υ_0^*γ^kr_l·{(P_d·u/n)^{1/2} + u/n}) ≤ Σ_{k=1}^{N}exp{p log(1 + 2/ϵ) − u} ≤ ⌈log(r_u/r_l)/log(γ)⌉exp{p log(1 + 2/ϵ) − u}.

The required bound is obtained by taking ϵ = 2/(e^4 − 1) and γ = e^{1/e}. □

According to the definition of D_h(θ) and (A5), it can be found that D_h(θ) = ⟨∇Q_h^*(β^*), θ⟩ + R_h^*(θ) + {D_h(θ) − D_h^*(θ)} ≥ R_h^*(θ) − ‖Σ^{−1/2}∇Q_h^*(β^*)‖_2‖θ‖_Σ − {D_h^*(θ) − D_h(θ)}, (A18) where {D_h^*(θ) − D_h(θ)} is the sampling error. From (A8), it follows again that, for each θ ∈ R^p, R_h^*(θ) ≥ (1/2)(f̲ − Lκ_1h − (1/2)Lm_3·‖θ‖_Σ)·‖θ‖_Σ^2(1 + O(n^{−1/2})). (A19)

Take r_0 = (2κ_2/m_3)^{1/2}h as the radius of convergence. For any θ ∈ ∂Θ(r_0), combining (A18) and (A19), there holds R_h^*(θ) − (1/2)Lκ_2h^2·‖θ‖_Σ(1 + O(n^{−1/2})) ≥ (1/2)(f̲ − Lκ_1h − L(2κ_2m_3)^{1/2}h)r_0^2(1 + O(n^{−1/2})). (A20)

To control the last term on the right-hand side of (A18), Lemma A2 gives a uniform law of large numbers for {D_h(θ) − D_h^*(θ) : θ ∈ Θ(r)}. First, applying (A14) with r = r_0 and u = 4p + t, we obtain the following variant: P(sup_{θ∈Θ(r_0)}{D_h^*(θ) − D_h(θ)} ≤ 3τ̄υ_0^*r_0·{(P_d(4p + t)/n)^{1/2} + (4p + t)/n}) ≥ 1 − e^{−t}. (A21)

Combining the inequalities (A18) and (A20) with the upper bound in (A21), it is sufficient that the window width h satisfies f̲^{−1}m_3^{1/2}υ_0^*(P_d(p + t)/n)^{1/2} ≲ h ≲ f̲m_3^{−1/2}. Then, for all θ ∈ ∂Θ(r_0), P(D_h(θ) > 0) ≥ 1 − e^{−t}. (A22)

Using the condition (A22), D_h(0) = 0, θ̂ = argmin_θ D_h(θ), and the convexity of D_h(·), it follows that ‖θ̂‖_Σ ≤ r_0 holds with high probability. On the other hand, according to the optimality of β̂_h, θ̂ := β̂_h − β^* satisfies D_h(θ̂) ≤ 0.

Secondly, when ‖θ̂‖_Σ has a lower bound, i.e., ‖θ̂‖_Σ falls into an interval, the rate of convergence of β̂_h needs to be re-derived. Consider the annulus Θ(r_l, r_0) = {θ ∈ R^p : r_l ≤ ‖θ‖_Σ ≤ r_0} with r_l = r_0h. If θ̂ ∉ Θ(r_l, r_0), then θ̂ ∈ Θ(r_l), and the argument reduces to the first case. Assume, therefore, that θ̂ ∈ Θ(r_l, r_0). Using inequality (A15) with (r_l, r_u) = (r_0h, r_0), u = log(e·log h^{−1}) + 4p + t, and r_1 = 4.25τ̄υ_0^*{(P_d(log(e·log h^{−1}) + 4p + t)/n)^{1/2} + (log(e·log h^{−1}) + 4p + t)/n}, we get P(D_h^*(θ) − D_h(θ) ≤ ‖θ‖_Σ·r_1) ≥ 1 − e^{−t}.

The above inequality holds for all θ ∈ Θ(r_l, r_0), hence in particular for θ̂. Applying it together with (A18) and (A19) and the fact that D_h(θ̂) ≤ 0, we obtain (f̲ − Lκ_1h)‖θ̂‖_Σ^2(1 + O(n^{−1/2})) ≤ (2r_1 + Lκ_2h^2)‖θ̂‖_Σ + (1/2)Lm_3‖θ̂‖_Σ^3 ≤ (2r_1 + 2Lκ_2h^2)‖θ̂‖_Σ(1 + O(n^{−1/2})).

The conclusion of Theorem 1 can be proved by eliminating ‖θ̂‖_Σ from the leftmost and rightmost sides of the above inequality. The proof of Theorem 1 differs from reference [10] in that ‖θ̂‖_Σ carries an additional convergence order of O(n^{−1/2}); however, since h is related to the sample size, only the slower-converging orders are retained, so the result coincides with that of the non-censored case. □

The proof of Theorem 2 retains the notation used in the proof of Theorem 1. For any t ≥ 0, let r = r(n, p, t) ≍ ((p + t)/n)^{1/2} + h^2 > 0; as long as ((p + t)/n)^{1/2} ≲ h ≲ 1, there holds P{β̂_h ∈ β^* + Θ(r)} ≥ 1 − 2e^{−t}. Define a vector-valued stochastic process φ(θ) = Σ^{−1/2}{∇Q_h(β^* + θ) − ∇Q_h(β^*) − J_hθ}, where J_h = ∇²Q_h^*(β^*) denotes the population Hessian matrix at β^*. Since β̂_h falls in a neighborhood of β^* with high probability, it suffices to bound sup_{θ∈Θ(r)}‖φ(θ)‖_2. By the triangle inequality, sup_{θ∈Θ(r)}‖φ(θ)‖_2 ≤ sup_{θ∈Θ(r)}‖Eφ(θ)‖_2 + sup_{θ∈Θ(r)}‖φ(θ) − Eφ(θ)‖_2 =: M + N.

It is now necessary to give upper bounds for M and N, respectively. Consider first the upper bound for M. Using the mean value theorem, Eφ(θ) = [∫_0^1Σ^{−1/2}∇²Q_h^*(β^* + tθ)Σ^{−1/2}dt − H_h]Σ^{1/2}θ, where H_h := Σ^{−1/2}J_hΣ^{−1/2} = E{δK_h(ε)ωω^T/(1 − Ĝ(z))}. Writing υ = Σ^{1/2}θ, for all θ ∈ Θ(r) we have Σ^{−1/2}∇²Q_h^*(β^* + tθ)Σ^{−1/2} = E{∫K(u)f_{ε|x}(t⟨ω, υ⟩ + hu)du·ωω^T}.

Utilizing the Lipschitz continuity of f_{ε|x}(·), ‖Σ^{−1/2}∫_0^1∇²Q_h^*(β^* + tθ)dt Σ^{−1/2} − H_h‖_2 = ‖E{∫_0^1∫K(u){f_{ε|x}(t⟨ω, υ⟩ + hu) − f_{ε|x}(hu)}du dt·ωω^T}‖_2 ≤ (1/2)Lm_3r(1 + O(n^{−1/2})).

The last inequality follows from the Cauchy–Schwarz inequality. Therefore, sup_{θ∈Θ(r)}‖Eφ(θ)‖_2 ≤ (1/2)Lm_3r^2(1 + O(n^{−1/2})). The proof of the upper bound for N, i.e., the bound on the zero-mean stochastic process φ(θ) − Eφ(θ) in the ℓ_2-norm sense, can be found in [10,29]. □

Let a ∈ R^p be a p-dimensional real vector. Given h = h_n > 0, define S_n = (1/n)Σ_{i=1}^{n}γ_iη_i and the centered variable S_n^0 = S_n − ES_n, where η_i = δ_i{τ − K(ε_i/h)}/(1 − Ĝ(z_i)) and γ_i = ⟨J_h^{−1}a, x_i⟩. According to the Lipschitz continuity of f_{ε_i|x_i}(·) and the fundamental theorem of calculus, we obtain |E(η_i|x_i)| ≤ (1/2)Lκ_2h^2(1 + O(n^{−1/2})), and then |E(γ_iη_i)| ≤ (1/2)Lκ_2‖J_h^{−1}a‖_Σ·h^2.

The proof procedure of Theorem 3 can be found in [10]. □

Appendix B

Estimation results for the high-dimensional model when covariate X is generated from Case 2 (Figure A1) and Case 3 (Figure A2). Simulation results under the high-dimensional model when the covariates are generated from Case 2 (Figure A3) and Case 3 (Figure A4).

Figure A1. Estimation results for the high-dimensional model when covariate X is generated from Case 2. The horizontal coordinates of the plots indicate the sample size (in thousands), and the vertical coordinates indicate the ratio of DMSE for regression coefficients’ estimators between CQ and SCQ.

Figure A2. Estimation results for the high-dimensional model when covariate X is generated from Case 3. The horizontal coordinates of the plots indicate the sample size (in thousands), and the vertical coordinates indicate the ratio of DMSE for regression coefficients’ estimators between CQ and SCQ.

Figure A3. Simulation results under the high-dimensional model when the covariates are generated from Case 2, where the horizontal coordinate denotes the sample size (in thousands), and the vertical coordinate denotes the ratio of estimated running time between CQ and SCQ.

Figure A4. Simulation results under the high-dimensional model when the covariates are generated from Case 3, where the horizontal coordinate denotes the sample size (in thousands), and the vertical coordinate denotes the ratio of estimated running time between CQ and SCQ.

References

1. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica; 1978; 46, pp. 33-50. [DOI: https://dx.doi.org/10.2307/1913643]

2. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005.

3. Koenker, R.; Chernozhukov, V.; He, X.M.; Peng, L. Handbook of Quantile Regression; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017.

4. Horowitz, J.L. Bootstrap methods for median regression models. Econometrica; 1998; 66, pp. 1327-1351. [DOI: https://dx.doi.org/10.2307/2999619]

5. de Castro, L.; Galvao, A.F.; Kaplan, D.M.; Liu, X. Smoothed GMM for quantile models. J. Econom.; 2019; 213, pp. 121-144. [DOI: https://dx.doi.org/10.1016/j.jeconom.2019.04.008]

6. Chen, X.; Liu, W.; Zhang, Y. Quantile regression under memory constraint. Ann. Stat.; 2019; 47, pp. 3244-3273. [DOI: https://dx.doi.org/10.1214/18-AOS1777]

7. Galvao, A.F.; Kato, K. Smoothed quantile regression for panel data. J. Econom.; 2016; 193, pp. 92-112. [DOI: https://dx.doi.org/10.1016/j.jeconom.2016.01.008]

8. Whang, Y.J. Smoothed empirical likelihood methods for quantile regression models. Econ. Theory; 2006; 22, pp. 173-205. [DOI: https://dx.doi.org/10.1017/S0266466606060087]

9. Fernandes, M.; Guerre, E.; Horta, E. Smoothing quantile regressions. J. Bus. Econom Stat.; 2021; 39, pp. 338-357. [DOI: https://dx.doi.org/10.1080/07350015.2019.1660177]

10. He, X.; Pan, X.; Tan, K.M.; Zhou, W.X. Smoothed quantile regression with large-scale inference. J. Econom.; 2023; 232, pp. 367-388. [DOI: https://dx.doi.org/10.1016/j.jeconom.2021.07.010] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36776480]

11. Bayarassou, N.; Hamrani, F.; Ould, S.E. Nonparametric relative error estimation of the regression function for left truncated and right censored time series data. J. Nonparametr. Stat.; 2023; 36, pp. 706-729. [DOI: https://dx.doi.org/10.1080/10485252.2023.2241572]

12. Ying, Z.; Jung, S.H.; Wei, L.J. Survival analysis with median regression models. J. Am. Stat. Assoc.; 1995; 90, pp. 178-184. [DOI: https://dx.doi.org/10.1080/01621459.1995.10476500]

13. Honoré, B.; Khan, S.; Powell, J.L. Quantile regression under random censoring. J. Econom.; 2002; 109, pp. 67-105. [DOI: https://dx.doi.org/10.1016/S0304-4076(01)00142-7]

14. Portnoy, S. Censored regression quantiles. J. Am. Stat. Assoc.; 2003; 98, pp. 1001-1012. [DOI: https://dx.doi.org/10.1198/016214503000000954]

15. Peng, L. Self-consistent estimation of censored quantile regression. J. Multivar. Anal.; 2012; 105, pp. 368-379. [DOI: https://dx.doi.org/10.1016/j.jmva.2011.10.005]

16. Yuan, X.; Zhang, X.; Guo, W.; Hu, Q. An adapted loss function for composite quantile regression with censored data. Comput. Stat.; 2024; 39, pp. 1371-1401. [DOI: https://dx.doi.org/10.1007/s00180-023-01352-6]

17. Gao, Q.; Zhou, X.; Feng, Y.; Du, X.; Liu, X. An empirical likelihood method for quantile regression models with censored data. Metrika; 2021; 84, pp. 75-96. [DOI: https://dx.doi.org/10.1007/s00184-020-00775-1]

18. Hao, R.; Weng, C.; Liu, X.; Yang, X. Data augmentation based estimation for the censored quantile regression neural network model. Expert Syst. Appl.; 2023; 214, 119097. [DOI: https://dx.doi.org/10.1016/j.eswa.2022.119097]

19. Yang, X.; Narisetty, N.N.; He, X. A new approach to censored quantile regression estimation. J. Comput. Graph. Stat.; 2018; 27, pp. 417-425. [DOI: https://dx.doi.org/10.1080/10618600.2017.1385469]

20. Peng, L.; Huang, Y. Survival analysis with quantile regression models. J. Am. Stat. Assoc.; 2008; 103, pp. 637-649. [DOI: https://dx.doi.org/10.1198/016214508000000355]

21. Xu, G.; Sit, T.; Wang, L.; Huang, C.Y. Estimation and inference of quantile regression for survival data under biased sampling. J. Am. Stat. Assoc.; 2017; 112, pp. 1571-1586. [DOI: https://dx.doi.org/10.1080/01621459.2016.1222286] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30078919]

22. Cai, Z.; Sit, T. On interquantile smoothness of censored quantile regression with induced smoothing. Biometrics; 2023; 79, pp. 3549-3563. [DOI: https://dx.doi.org/10.1111/biom.13892] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37382567]

23. Kim, K.H.; Caplan, D.J.; Kang, S. Smoothed quantile regression for censored residual life. Comput Stat.; 2023; 38, pp. 1001-1022. [DOI: https://dx.doi.org/10.1007/s00180-022-01262-z]

24. Wu, Y.; Ma, Y.; Yin, G. Smoothed and corrected score approach to censored quantile regression with measurement errors. J. Am. Stat. Assoc.; 2015; 110, pp. 1670-1683. [DOI: https://dx.doi.org/10.1080/01621459.2014.989323]

25. He, X.; Pan, X.; Tan, K.M.; Zhou, W.X. Scalable estimation and inference for censored quantile regression process. Ann. Stat.; 2022; 50, pp. 2899-2924. [DOI: https://dx.doi.org/10.1214/22-AOS2214]

26. Fei, Z.; Zheng, Q.; Hong, H.G.; Li, Y. Inference for high-dimensional censored quantile regression. J. Am. Stat. Assoc.; 2023; 118, pp. 898-912. [DOI: https://dx.doi.org/10.1080/01621459.2021.1957900] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37309513]

27. De Backer, M.; Ghouch, A.E.; Van Keilegom, I. An adapted loss function for censored quantile regression. J. Am. Stat. Assoc.; 2019; 114, pp. 1126-1137. [DOI: https://dx.doi.org/10.1080/01621459.2018.1469996]

28. Leng, C.; Tong, X. Censored quantile regression via Box-Cox transformation under conditional independence. Stat. Sin.; 2014; 24, pp. 221-249. [DOI: https://dx.doi.org/10.5705/ss.2012.089]

29. Spokoiny, V. Bernstein–von Mises theorem for growing parameter dimension. arXiv; 2013; [DOI: https://dx.doi.org/10.48550/arXiv.1302.3430] arXiv: 1302.3430

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).