1. Introduction
Since the regression quantile has robustness similar to that of the sample quantile, the quantile regression model can characterize the conditional distribution of the response variable y given the covariates, thus characterizing the link between the two [1]. Given a response variable $y$ and $p$-dimensional covariates $x$, let $F(\cdot\mid x)$ denote the conditional distribution function of $y$ given $x$. The classical quantile regression model can be expressed as

$$Q_{y}(\tau\mid x)=F^{-1}(\tau\mid x)=x^{\top}\beta(\tau),\qquad \tau\in(0,1),\tag{1}$$
where $\beta(\tau)$ is the regression coefficient. Koenker, R. et al. [2,3] provided a detailed discussion of quantile regression modeling in terms of methodology, theory, and computation. Because the loss function of quantile regression is not smooth, its computational complexity increases dramatically. Considering that the loss function is not differentiable, Horowitz [4] used a smooth function to approximate the indicator function in order to smooth the objective function. This approach has also been applied to other quantile-regression-related problems: de Castro, L. et al. [5] used smoothed moment estimating equations to estimate the parameters in quantile regression; Chen et al. [6] used a smoothing method to study quantile regression under memory constraints; Galvao, A.F. and Kato, K. [7] studied the smoothed estimation of fixed effects in quantile regression models for panel data; and Whang [8] discussed smoothed empirical likelihood estimation of quantile regression models. Although the above literature solves the problem that the objective function is not differentiable, it cannot guarantee the convexity of the objective function, so there is no guarantee that the result is a global optimum. Recently, Fernandes et al. [9] proposed a convolutional smoothing method for estimating the fixed-dimensional parameters of the quantile regression model, under which the loss function is twice differentiable and convex, and the method outperforms the earlier smoothed estimation in terms of estimation accuracy. For fully observed data, He et al. [10] used the convolutional smoothing method to estimate the parameters of the high-dimensional quantile regression model, with a smoothed loss function that is twice differentiable and convex. When numerically solving for the minimizer of the smoothed objective function, the gradient-based algorithm of [10] replaces the quantile regression computation with least-squares-type updates, which effectively shortens the computation time and improves the estimation accuracy.

In empirical studies, it is frequently observed that variables of interest are subject to censoring. For instance, in the study [11] on the survival times of AIDS patients, it was found that 43% of the data were right-censored. For parameter estimation in quantile regression models for right-censored data, we refer to Ying, Z. et al. [12], Honoré, B. et al. [13], Portnoy, S. [14], Peng, L. [15], and Yuan, X. et al. [16]. Parameter estimation in quantile regression models for censored data [17,18,19], including right-censored data, has been extensively studied. In quantile regression for right-censored data, the problem of non-smooth loss functions persists and has already been studied [20,21,22,23]. For the right-censored quantile regression model with fixed-dimensional parameters, Peng, L. and Huang, Y. [20], Xu, G. et al. [21], Cai, Z. and Sit, T. [22], and Kim, K.H. [23] considered smoothed estimation methods. For the high-dimensional, large-sample case of the censored quantile regression model, Wu, Y. et al. [24], He, X. et al. [25], and Fei, Z. et al. [26] used methods similar to that of Fernandes et al. [9] to smooth the estimating equations, which improved the estimation accuracy over classical parameter estimation methods.
However, these smoothing estimation methods for high-dimensional censored quantile regression need to place a grid on the quantile levels to ensure that the approximation error of the estimating equation does not become too large. Furthermore, the estimates at the lower grid points are needed before the parameter at a given quantile level can be estimated, and the estimation accuracy depends on the number of grid points, which also increases the computational complexity.
Considering censored data, this paper extends the convolutional smoothing method [10] to the high-dimensional censored quantile regression model and proposes a coefficient estimator for the censored quantile regression model based on convolutional smoothing. Under certain conditions, the loss function of the smoothed censored quantile regression model is twice differentiable and globally convex, and a gradient-based iterative algorithm can be used to compute the regression parameters. In this paper, the bias of the smoothing estimation of the censored quantile regression is characterized under certain conditions, and the rate of convergence, the Bahadur–Kiefer representation, and the Berry–Esseen upper bound of the smoothing estimation are established under high-dimensional, large-sample conditions. Moreover, whereas the classical parameter estimation method has different estimation accuracies at different quantile levels, the smoothing estimation method maintains essentially the same estimation accuracy at each quantile level, and the proposed method greatly reduces the computation time under high-dimensional, large-sample conditions. In summary, the key contributions of this article are as follows: (1) To our knowledge, this article is the first to apply the convolutional smoothing method to high-dimensional censored regression analysis. In contrast to the non-differentiability of the objective function in classical censored quantile regression, the objective function of this method is twice differentiable, which greatly facilitates the construction of gradient-based algorithms and the study of theoretical properties. (2) For high-dimensional data scenarios, under certain regularity conditions, this paper establishes the asymptotic consistency, asymptotic normality, and other theoretical properties of the proposed smoothed estimator, ensuring good properties for statistical inference.
This paper is organized as follows. Section 2 proposes a convolutional smoothing estimation method for high-dimensional censored quantile regression models and gives the asymptotic properties of the smoothing estimation. In Section 3, numerical simulations of the smoothing estimation and the classical parameter estimation are carried out for the low- and high-dimensional cases, and the estimation accuracy and computational speed of the smoothing method in censored quantile regression analysis are discussed. The discussion and conclusion are given in Section 4 and Section 5, and a detailed proof of the asymptotic properties is given in Appendix A.
2. Methods
In this paper, we consider a right-censored quantile regression model; i.e., we assume that the response variable y in model (1) is right-censored, in which case the observed variables are $z=\min(y,c)$ and $\delta=\mathbb{1}\{y\le c\}$, where $c$ is the censoring variable. The observations are denoted as $\{(x_i,z_i,\delta_i)\}_{i=1}^{n}$. Then, the estimator of the parameters in the censored quantile regression model is defined as
$$\hat\beta(\tau)=\operatorname*{arg\,min}_{\beta}\ \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\,\rho_\tau\!\left(z_i-x_i^{\top}\beta\right),\tag{2}$$

where $\rho_\tau(u)=u\,(\tau-\mathbb{1}\{u<0\})$ is the quantile loss function, $G$ is the distribution function of the censoring variable $c$, and $\hat G$ is the estimate of $G$. Convolutional smoothing estimation for high-dimensional quantile regression models has been studied by He et al. [10]. In this paper, we apply the method to censored data. Let $K$ be a kernel function that integrates to 1, and let $h>0$ be the window width. Denote $K_h(u)=h^{-1}K(u/h)$.
Then, the objective function for the smoothing estimation of the censored quantile regression can be written as

$$\hat Q_h(\beta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\,(\rho_\tau * K_h)\!\left(z_i-x_i^{\top}\beta\right),\tag{3}$$

where “∗” denotes the convolution operator. The convolutional smoothing estimate (denoted SCQ) of the censored quantile regression model is defined as the minimizer $\hat\beta_h(\tau)=\operatorname*{arg\,min}_{\beta}\hat Q_h(\beta)$. For any $u\in\mathbb{R}$, write

$$\ell_h(u)=(\rho_\tau * K_h)(u)=\int_{-\infty}^{\infty}\rho_\tau(v)\,K_h(v-u)\,dv.\tag{4}$$
For ease of presentation and without ambiguity, $\hat\beta_h(\tau)$ and $\beta(\tau)$ are abbreviated as $\hat\beta_h$ and $\beta$, respectively.
It is easy to see that the loss function in (3) is twice differentiable, and its gradient and Hessian matrix can be expressed, respectively, as

$$\nabla\hat Q_h(\beta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\left\{\bar K\!\left(\frac{x_i^{\top}\beta-z_i}{h}\right)-\tau\right\}x_i,\qquad \nabla^2\hat Q_h(\beta)=\frac{1}{nh}\sum_{i=1}^{n}\frac{\delta_i}{1-\hat G(z_i)}\,K\!\left(\frac{z_i-x_i^{\top}\beta}{h}\right)x_i x_i^{\top},\tag{5}$$

where $\bar K(u)=\int_{-\infty}^{u}K(v)\,dv$. As long as the kernel function is non-negative, for any window width $h>0$, $\hat Q_h(\beta)$ is a convex function, and the SCQ estimator satisfies the first-order condition $\nabla\hat Q_h(\hat\beta_h)=0$.
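To make these formulas concrete, the following sketch implements (3)–(5) in Python for the Gaussian kernel, for which the convolution admits the closed form $(\rho_\tau * K_h)(u)=h\,\phi(u/h)+u\,\{\tau-1+\Phi(u/h)\}$. The helper names and the inverse-probability weights `w` (our reading of $\delta_i/\{1-\hat G(z_i)\}$) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def smoothed_check_loss(u, tau, h):
    """Gaussian-kernel convolution of the check loss:
    (rho_tau * K_h)(u) = h*phi(u/h) + u*(tau - 1 + Phi(u/h))."""
    s = u / h
    return h * norm.pdf(s) + u * (tau - 1.0 + norm.cdf(s))

def objective(beta, X, z, w, tau, h):
    """Weighted smoothed objective -- a reading of Equation (3)."""
    u = z - X @ beta
    return np.mean(w * smoothed_check_loss(u, tau, h))

def gradient(beta, X, z, w, tau, h):
    """Gradient of the smoothed objective, cf. Equation (5)."""
    u = z - X @ beta
    psi = norm.cdf(u / h) - (1.0 - tau)          # derivative of the loss in u
    return -(X * (w * psi)[:, None]).mean(axis=0)

def hessian(beta, X, z, w, tau, h):
    """Hessian (1/(n h)) * sum_i w_i K((z_i - x_i'beta)/h) x_i x_i';
    positive semidefinite whenever the kernel and weights are non-negative."""
    u = z - X @ beta
    k = w * norm.pdf(u / h) / h
    return (X.T * k) @ X / len(z)
```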
When estimating the distribution of the censoring variable $c$, the Kaplan–Meier (KM) estimator can be used. Alternatively, one may assume a parametric form for the distribution of the censoring variable; even if the distributional form is misspecified, our subsequent simulations show that, in most cases, estimating the censoring distribution parametrically before smoothing the estimation of the regression parameters performs better than using the KM estimator. Thus, we write the distribution function of the censoring variable in a parametric form, with the parameter vector estimated by maximum likelihood. The parametric distribution form is used in both the proofs and the assumptions.
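For the nonparametric alternative mentioned above, a minimal sketch of the KM estimate of the censoring distribution follows; it reverses the roles of events and censorings. The helper name `km_censoring_cdf` is hypothetical, and ties between event and censoring times receive no special handling.

```python
import numpy as np

def km_censoring_cdf(z, delta):
    """Kaplan-Meier estimate of G(t) = P(c <= t) evaluated at each z_i,
    treating the censoring indicators (delta == 0) as the 'events'."""
    n = len(z)
    order = np.argsort(z)
    event = 1.0 - np.asarray(delta, dtype=float)[order]  # 1 = censoring time
    at_risk = n - np.arange(n)              # risk-set size at each ordered time
    surv_sorted = np.cumprod(1.0 - event / at_risk)      # estimated P(c > t)
    G = np.empty(n)
    G[order] = 1.0 - surv_sorted            # map back to the input ordering
    return G
```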
To give theoretical results, we assume that the covariates have been centered. Given vectors $a$ and $b$, $\langle a,b\rangle=a^{\top}b$ denotes their inner product. $\|\cdot\|_q$ denotes the $\ell_q$-norm, i.e., $\|v\|_q=\big(\sum_{i=1}^{p}|v_i|^q\big)^{1/q}$, where $v_i$ denotes the ith element of the p-dimensional real vector $v$. Given a positive semidefinite matrix $A$, define $\|v\|_A=\|A^{1/2}v\|_2$ for any vector $v$. For all real numbers $a$ and $b$, define $a\vee b=\max(a,b)$ and $a\wedge b=\min(a,b)$. For two non-negative sequences $\{a_n\}$ and $\{b_n\}$, $a_n\lesssim b_n$ denotes the existence of a constant $C$ independent of $n$ such that $a_n\le Cb_n$; $a_n\gtrsim b_n$ is equivalent to $b_n\lesssim a_n$; and $a_n\asymp b_n$ means that $a_n\lesssim b_n$ and $b_n\lesssim a_n$ hold simultaneously. The assumptions required for the theorems are as follows.
The non-negative kernel function satisfies , with upper bound , and , where .
The conditional density function of the regression error term satisfies the Lipschitz condition; i.e., there exists a constant such that holds almost everywhere. There exist real numbers such that holds almost everywhere for any .
. There exist positive constants and such that
(6)
Denote as the ith order statistic of , and as the corresponding indicator function. and satisfy
(7)
The covariates obey a subexponential distribution; i.e., there exists such that for any and , we have , where is positive definite with .
With and positive integer k, we define . The following theorems can be obtained.
Theorem 1 (Upper bound on the estimation error). Suppose that conditions – hold for any real number . If h satisfies the constraints , where is the uncensored proportion, then the convolutional smoothing estimate satisfies the bounds
(8)
where and R are positive constants. The upper bounds on the estimation error can be interpreted in terms of the smoothing bias and the rate of convergence. A smaller h leads to a smaller bias after smoothing, but an h that is too small could result in overfitting and a slow convergence rate. According to Theorem 1, h satisfies the condition . In order to obtain a non-asymptotic Bahadur representation of the smoothing estimation, we replace with .
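For intuition only, a bandwidth of the order commonly used in the smoothed quantile regression literature [10] can be coded as follows; the exponent 2/5 and the floor 0.05 are illustrative defaults borrowed from that literature, not this paper's prescription.

```python
import numpy as np

def default_bandwidth(n, p):
    """Bandwidth of order ((p + log n) / n)^{2/5}, floored away from zero
    to avoid degenerate smoothing -- an illustrative default."""
    return max(((p + np.log(n)) / n) ** 0.4, 0.05)
```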
The covariates obey a sub-Gaussian distribution; i.e., there exists such that for any and , we have , where is positive definite with .
Theorem 2 (Non-asymptotic Bahadur representation). Assume that conditions – and hold and that holds almost everywhere. For any real number , let h satisfy the constraint , and let ; then,
(9)
where the real number is a constant independent of p and n. Theorem 2 makes it possible to establish the limiting distribution of the estimators. Based on the non-asymptotic representation in Theorem 2, we establish the Berry–Esseen upper bound for the smoothing estimators.
Theorem 3 (Berry–Esseen upper bound). Assume that conditions – and hold, that holds almost everywhere, and that h satisfies the condition that for any real number , there exists . Then,
(10)
where denotes the standard normal distribution function. Further, if is twice continuously differentiable and satisfies for any real number , where the function satisfies for some positive constant C, then
(11)
Theorem 3 shows that when h is chosen in the appropriate range and , the linear combination of is asymptotically normal. According to Theorem 3, the optimal h is the one that minimizes the right-hand side of the bound, and the error is then approximated as . If , then for any given vector , is asymptotically normal.
The assumptions , , and are commonly used in the convolutional smoothing estimation of high-dimensional quantile regression models with fully observed data [10]. Condition concerns the distribution of the censoring variable c. Note that assuming is equivalent to —i.e., the probability that the largest observation equals the true variable of interest is not zero. This is a commonly used condition in statistical inference for censored data [27,28], and it avoids the situation where a large number of observations are censored. The assumption provides a local smoothness condition for in a neighborhood of θ, and its validity can be verified directly for many commonly used distribution functions.
3. Numerical Simulation
In this section, the smoothing estimation and the classical parameter estimation of the quantile regression model for censored data are numerically simulated, considering both the low- and high-dimensional cases. The estimator (2), as proposed in [17], is chosen as the classical parameter estimator for the censored quantile regression model. Notice that the objective function of the classical parameter estimation for the censored quantile regression model can be rewritten in weighted form, so that, when computing the regression parameters, the censoring problem is transformed into a non-censoring problem; the objective function of the smoothing estimation for censored quantile regression is rewritten accordingly, as in (3). The Gaussian kernel is taken as the kernel function, with a correspondingly chosen window width, for the smoothing estimation of the censored quantile regression; a sketch of the resulting gradient-based fit is given below.
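Assuming the `objective` and `gradient` helpers from the sketch in Section 2, the gradient-based fit can be mimicked as follows; the warm start at a weighted least-squares solution and the name `fit_scq` are our illustrative choices, not the exact algorithm of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def fit_scq(X, z, w, tau, h):
    """Minimize the weighted smoothed objective with a quasi-Newton method,
    warm-started at the weighted least-squares fit."""
    sw = np.sqrt(w)
    beta0, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    res = minimize(objective, beta0, jac=gradient,
                   args=(X, z, w, tau, h), method="BFGS")
    return res.x
```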
3.1. Model Setting and Evaluation Indicators
In the simulation, the covariates are generated from different distributions to mimic the types of variables commonly found in real data. The error term is generated from three different distributions; specifically, we draw n independent and identically distributed random numbers and let , where obeys one of the following distributions: ; ; . Given the regression coefficient and the quantile level , the response variable is generated by
For both the low- and high-dimensional models, the right-censoring variable is set as , where are unknown parameters, which can take different values so that the censoring ratio of the response variable reaches the set levels of 15%, 30%, or 45%. In the actual simulation, in order to obtain the value of , the parameters are estimated by maximum likelihood in the simulations of Section 3.2 and Section 3.3. In Section 3.4, we discuss the smoothing estimation under misspecification of the distribution of the censoring variable; KM estimation is also taken into consideration. Let the number of simulation repetitions be K, and for the parameter estimates in the kth repetition, write
Then, we can use
to evaluate the performance of the classical parameter estimation for censored quantile regression models (CQ) and the smoothing estimation method (SCQ). In the actual simulation, we set .

3.2. Low-Dimensional Performance of Regression Smoothing Estimates for Censored Quantiles
In the low-dimensional numerical study of the smoothing estimation, the number of covariates is set to and the sample sizes are 100, 200, and 500. In order to assess the performance of the smoothing method in the low-dimensional case, the generation of the covariates is categorized into three cases; a sketch of the Case 1 Monte Carlo loop is given after the list.
Case 1: The p-dimensional covariates are generated from a multivariate uniform distribution on , and the covariance matrix is the identity matrix;
Case 2: The p-dimensional covariates are generated from a multivariate uniform distribution on , where is the covariance matrix;
Case 3: The p-dimensional covariates consist of a mixture of distributions: the first two dimensions are generated from a multivariate uniform distribution on with covariance matrix , and the remaining three dimensions are generated from a multivariate normal distribution with mean and covariance matrix .
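For concreteness, the sketch below wires the previous helpers (`km_censoring_cdf`, `fit_scq`) into a Case 1 Monte Carlo loop. The true coefficients, the error and censoring laws, the bandwidth, and the reading of DMSE as the average squared estimation error over the K repetitions are illustrative stand-ins for settings whose exact values are elided in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def dmse(estimates, beta_true):
    """Average squared estimation error over the repetitions -- our reading
    of the DMSE criterion."""
    diffs = np.asarray(estimates) - beta_true
    return np.mean(np.sum(diffs ** 2, axis=1))

def simulate_case1(n=200, p=5, tau=0.5, h=0.3, K=100):
    """One Monte Carlo experiment for Case 1 covariates."""
    beta_true = np.ones(p)                        # illustrative coefficients
    estimates = []
    for _ in range(K):
        X = rng.uniform(-1.0, 1.0, size=(n, p))   # Case 1: independent uniforms
        y = X @ beta_true + rng.standard_t(df=3, size=n)   # illustrative errors
        c = rng.uniform(0.0, 6.0, size=n)         # illustrative censoring variable
        z = np.minimum(y, c)
        delta = (y <= c).astype(float)
        G = km_censoring_cdf(z, delta)            # or a parametric estimate of G
        w = delta / np.clip(1.0 - G, 1e-3, None)  # IPCW-type weights
        estimates.append(fit_scq(X, z, w, tau, h))
    return dmse(estimates, beta_true)
```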
Table 1, Table 2 and Table 3 show the simulation results when the covariates are generated according to the three cases, where CP denotes the censoring ratio of the response variable, n is the sample size, and columns 3–12 show the results of CQ and SCQ at different quantile levels. From the estimation results, when the regression errors are generated by symmetric distributions, i.e., the t and Laplace distributions, SCQ has higher accuracy than CQ, especially at the lower and higher quantiles. When the regression error term is generated by the asymmetric distribution, the estimation accuracy of CQ decreases globally as τ increases, and the estimation accuracy of SCQ is much better than that of CQ at the higher quantiles, although CQ is better than SCQ at the lower quantiles. This may be because the density function of that distribution is skewed and the observations are excessively clustered at the lower quantiles, so that the estimation accuracy of CQ decreases as the quantile level increases, while SCQ maintains good estimation accuracy. It can be seen from Table 1, Table 2 and Table 3 that SCQ is more stable than CQ, regardless of whether the error terms follow a symmetric or an asymmetric distribution. Specifically, the estimation accuracy of SCQ is almost the same at all quantile levels, while that of CQ fluctuates with τ, especially when the error terms are asymmetrically distributed. Overall, the estimation accuracy of CQ depends greatly on the value of τ, the size of the censoring ratio, and the distribution of the error term, while SCQ is minimally affected by these factors and shows good robustness.
3.3. High-Dimensional Performance of Smoothing Estimators of Censored Quantile Regression
In the high-dimensional, large-sample numerical study, the ratio of sample size to dimension is fixed at , and the sample size increases from 1000 to 5000 in steps of 500. In order to study the smoothing estimation of censored quantile regression as the dimension and sample size change, the covariate generation is categorized into three cases.
Case 1: The p-dimensional covariate is generated from a multivariate uniform distribution on , with covariance matrix .
Case 2: The p-dimensional covariate is generated from a multivariate uniform distribution on , with covariance matrix , where
Case 3: The p-dimensional covariates consist of a mixture of distributions, where the first -dimensional covariates are generated from a multivariate uniform distribution on with covariance matrix , and the remaining -dimensional covariates are generated from a multivariate normal distribution with mean , whose jth component is , and covariance matrix , where
The ratio of the DMSE of CQ to that of SCQ is first calculated, and the simulation results for covariates under Case 1 are displayed in Figure 1. Since the results of the three covariate generation cases are very similar, we do not show the results of Case 2 and Case 3. The results in Figure 1 show that the DMSE ratio of CQ to SCQ is not significantly affected by changes in sample size and dimensionality. When the regression error terms are generated by the symmetric distributions, i.e., the t and Laplace distributions, the DMSE ratios of the regression coefficient estimators between CQ and SCQ remain above one. This indicates that SCQ has higher precision than CQ, especially at the lower and upper quantiles, where the difference is more pronounced and the DMSE ratio between CQ and SCQ can reach three. When the error term is generated by the asymmetric distribution, the DMSE ratio between CQ and SCQ is less than one at the low quantile levels, and CQ is superior to SCQ in this case. However, the ratio between the two methods increases with τ and reaches around 10 at the highest quantile level considered. In order to clarify the reason for this phenomenon, we calculate the DMSE of the two estimation methods when the error term is generated by the asymmetric distribution. As shown in Figure 2, CQ performs better when τ = 0.1. As τ increases, the DMSE of CQ grows at an increasing rate, and it is significantly higher than that of SCQ at the upper quantiles, whereas the DMSE of SCQ remains essentially flat across quantile levels. This phenomenon aligns with its low-dimensional counterpart, underscoring that the estimation accuracy of CQ depends on the magnitude of τ, the censoring ratio of the response variable, and the distribution of the error term. In contrast, the estimation accuracy of SCQ is robust to variations in these factors.
To assess the computational efficiency of the smoothing estimation, we compare not only the DMSE of the CQ and SCQ estimators in the high-dimensional simulations but also the time expenditure of each estimation method. In Figure 3, the computational time ratios of CQ to SCQ are all greater than 1 and tend to increase with the dimensionality and the sample size; the ratios exceed 10 in all cases when the sample size is . The results for covariate generation under Case 2 and Case 3, which are similar to those of Case 1, are not shown here. Combined with the preceding results, it is clear that, compared with CQ, SCQ significantly decreases the computation time and increases the estimation accuracy in the majority of cases.
3.4. Robustness of Smoothing Estimates
In order to compare the effects on the parameter smoothing estimation of misspecifying the distribution of the censoring variable and of estimating the censoring distribution by KM estimation, we choose Case 1 of the covariate generation in both the low- and high-dimensional simulations for the numerical study.
When the sample size is 200, the model coefficients are estimated using convolutional smoothing after the distribution of the censoring variable has been misspecified as Normal, Weibull, or Lognormal, and also after estimating G by KM estimation, under different quantile levels, censoring proportions, and error terms. The simulations in Table 4 show that the smoothing estimation based on a misspecified parametric censoring distribution is more robust than the smoothing estimation that uses the KM estimate of the censoring distribution. Similar results are obtained for a sample size of 3000, as shown in Table 5.
The simulations show that although the parametric estimation of the distribution of the censoring variable may be misspecified, this approach is better and more robust than smoothing the regression model after estimating the censoring distribution by KM estimation. Tables 4 and 5 also show that when the censoring variable is fitted with different distributional forms, the smoothing estimation errors of the model coefficients vary only minimally, and the estimation accuracy is higher than that obtained by first estimating the censoring distribution with the KM estimator and then carrying out the smoothing estimation. Regarding computational efficiency, the smoothing estimation method under distributional misspecification has a running time ratio to the SCQ that fluctuates between 0.9 and 1.1, whereas the smoothing estimation preceded by KM estimation incurs a significantly longer running time.
4. Discussion
In the smoothing estimation of censored quantile regression models, the distribution of the censoring variable is usually unknown. This problem can be addressed by estimating the density of the censoring variable with existing nonparametric methods, determining the type of the censoring distribution using a goodness-of-fit test, and then estimating the unknown parameters of the distribution by maximum likelihood. In the numerical simulations, this paper also uses such a procedure to fit the unknown censoring distribution and then estimate the regression parameters; a sketch of the selection step is given below.
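The sketch below illustrates this selection step, under the simplifying assumption that realized censoring times are available to fit against; the candidate families mirror those of Section 3.4, and `select_censoring_family` is a hypothetical helper.

```python
from scipy import stats

def select_censoring_family(c_obs):
    """Fit candidate parametric families by maximum likelihood and keep the
    one with the best Kolmogorov-Smirnov goodness-of-fit p-value.
    Weibull and Lognormal require positive data."""
    candidates = {"Normal": stats.norm,
                  "Weibull": stats.weibull_min,
                  "Lognormal": stats.lognorm}
    best_name, best_pval, best_law = None, -1.0, None
    for name, family in candidates.items():
        params = family.fit(c_obs)                 # maximum likelihood fit
        frozen = family(*params)
        pval = stats.kstest(c_obs, frozen.cdf).pvalue
        if pval > best_pval:
            best_name, best_pval, best_law = name, pval, frozen
    return best_name, best_law
```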
We also discuss the parametric smoothing estimation method under misspecification of the distribution of the censoring variable, and compare it with the smoothing estimation of the parameters after estimating the censoring distribution by KM estimation. The simulation results show that the smoothing estimation method is still more robust than the KM-based approach even when the parametric form of the censoring distribution is misspecified. Meanwhile, the smoothing estimation method is more robust than the classical censored quantile regression estimation.
Our research has certain limitations, and some issues need further exploration. First, we have analyzed parameter smoothing estimation for quantile linear models; further research could address parameter estimation and interval estimation for more complex models, such as generalized linear models. Second, our proofs rest on the assumption that the form of the censoring distribution is known, and a theoretical treatment that estimates the censoring distribution nonparametrically remains a challenging task, requiring tools from nonparametric statistics and probability limit theory.
5. Conclusions
In this paper, a convolutional smoothing estimation method for the censored quantile regression model is proposed to address the problem that the loss function is not differentiable. Our method associates convolutional smoothing with the loss function of censored quantile regression, yielding a twice differentiable objective, in contrast to the classical censored quantile regression estimation. Moreover, the smoothing estimation method for censored quantile regression models improves the estimation accuracy, computational speed, and robustness over the classical parameter estimation method. The contribution and significance of this paper can be summarized as follows:
The method establishes links between the convolutional smoothing method and the loss function of the censored quantile regression model, and the use of a non-negative kernel function ensures that the smoothed loss function is twice differentiable and globally convex, so that a gradient-based iterative algorithm can be used to improve the computational speed.
Theoretically, we characterize the bias of the smoothing estimation for censored quantile regression and establish the convergence rate, Bahadur–Kiefer representation, and Berry–Esseen upper bound of the smoothing estimation under high-dimensional and large-sample conditions.
The numerical simulations show that, compared with the classical parameter estimation, the smoothing estimation method greatly reduces the computation time and improves the estimation accuracy in most cases. In addition, the accuracy of the CQ estimator is highly dependent on τ, the censoring ratio (CP), and the distribution of the error term, whereas the SCQ estimator is robust to these factors.
Author Contributions: Conceptualization, M.W.; methodology, M.W.; software, M.W.; validation, M.W., X.Z. and Q.G.; formal analysis, M.W.; investigation, M.W.; resources, M.W.; data curation, M.W.; writing—original draft preparation, M.W.; writing—review and editing, X.W., X.M. and J.W.; visualization, M.W.; supervision, X.Z. and Q.G.; project administration, X.Z. and Q.G.; funding acquisition, X.W., X.Z. and Q.G. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: The datasets used and analyzed in this study are available from the corresponding author(s) on reasonable request.
Conflicts of Interest: The authors declare no conflicts of interest.
Figure 1. Estimation results for the high-dimensional model when covariate X is generated from Case 1. The horizontal coordinates of the plots indicate the sample size (in thousands), and the vertical coordinates indicate the ratio of DMSE for regression coefficients’ estimators between CQ and SCQ.
Figure 2. The DMSE of CQ and SCQ for the three high-dimensional covariate generation cases when the error terms obey the asymmetric distribution, with the horizontal coordinates denoting the different quantile levels and the vertical axes denoting the DMSEs of the CQ and SCQ estimations scaled up by a factor of 10; the solid line denotes SCQ, and the dashed line denotes CQ.
Figure 3. Simulation results under the high-dimensional model when the covariates are generated from Case 1, where the horizontal coordinate denotes the sample size (in thousands), and the vertical coordinate denotes the ratio of estimated running time between CQ and SCQ.
The DMSE of CQ and SCQ estimators for the low-dimensional model with covariate X generated from Case 1, where the values in columns 3–12 are DMSE ×
| CP (%) | n | CQ τ=0.1 | SCQ τ=0.1 | CQ τ=0.3 | SCQ τ=0.3 | CQ τ=0.5 | SCQ τ=0.5 | CQ τ=0.7 | SCQ τ=0.7 | CQ τ=0.9 | SCQ τ=0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| *Error distribution 1* | | | | | | | | | | | |
| 15 | 100 | 397 | 116 | 143 | 112 | 112 | 109 | 137 | 116 | 387 | 114 |
| 15 | 200 | 180 | 58 | 63 | 54 | 51 | 56 | 66 | 52 | 179 | 56 |
| 15 | 500 | 71 | 24 | 28 | 23 | 21 | 22 | 27 | 21 | 73 | 21 |
| 30 | 100 | 489 | 151 | 190 | 134 | 144 | 133 | 170 | 131 | 518 | 135 |
| 30 | 200 | 228 | 74 | 85 | 70 | 65 | 69 | 82 | 67 | 230 | 65 |
| 30 | 500 | 95 | 34 | 30 | 27 | 25 | 27 | 32 | 26 | 91 | 25 |
| 45 | 100 | 687 | 200 | 224 | 179 | 181 | 178 | 220 | 178 | 628 | 173 |
| 45 | 200 | 302 | 97 | 103 | 91 | 84 | 88 | 107 | 81 | 291 | 77 |
| 45 | 500 | 118 | 45 | 38 | 39 | 34 | 35 | 39 | 31 | 118 | 32 |
| *Error distribution 2* | | | | | | | | | | | |
| 15 | 100 | 2 | 128 | 13 | 135 | 62 | 126 | 212 | 122 | 1103 | 123 |
| 15 | 200 | 0 | 78 | 6 | 78 | 29 | 79 | 118 | 75 | 591 | 77 |
| 15 | 500 | 0 | 39 | 2 | 40 | 13 | 39 | 48 | 39 | 236 | 37 |
| 30 | 100 | 2 | 164 | 17 | 155 | 77 | 164 | 270 | 155 | 1397 | 144 |
| 30 | 200 | 0 | 97 | 7 | 98 | 37 | 97 | 136 | 95 | 718 | 90 |
| 30 | 500 | 0 | 51 | 3 | 54 | 15 | 52 | 55 | 48 | 305 | 44 |
| 45 | 100 | 3 | 208 | 23 | 198 | 89 | 204 | 321 | 189 | 1746 | 194 |
| 45 | 200 | 1 | 127 | 10 | 120 | 46 | 124 | 162 | 117 | 860 | 107 |
| 45 | 500 | 0 | 68 | 4 | 69 | 20 | 62 | 73 | 59 | 366 | 55 |
| *Error distribution 3* | | | | | | | | | | | |
| 15 | 100 | 603 | 235 | 246 | 230 | 213 | 235 | 258 | 223 | 564 | 234 |
| 15 | 200 | 304 | 119 | 125 | 115 | 109 | 114 | 129 | 114 | 299 | 111 |
| 15 | 500 | 119 | 52 | 50 | 49 | 43 | 46 | 52 | 45 | 109 | 46 |
| 30 | 100 | 743 | 322 | 343 | 302 | 271 | 291 | 336 | 288 | 762 | 270 |
| 30 | 200 | 365 | 171 | 159 | 149 | 124 | 147 | 163 | 136 | 385 | 289 |
| 30 | 500 | 140 | 74 | 62 | 66 | 53 | 60 | 59 | 54 | 138 | 54 |
| 45 | 100 | 983 | 441 | 410 | 374 | 370 | 364 | 435 | 361 | 964 | 330 |
| 45 | 200 | 461 | 210 | 200 | 184 | 165 | 171 | 200 | 175 | 456 | 163 |
| 45 | 500 | 195 | 101 | 76 | 80 | 65 | 81 | 76 | 71 | 187 | 64 |
The DMSE of CQ and SCQ estimators for the low-dimensional model with covariate X generated from Case 2, where the values in columns 3–12 are DMSE ×
| CP (%) | n | CQ τ=0.1 | SCQ τ=0.1 | CQ τ=0.3 | SCQ τ=0.3 | CQ τ=0.5 | SCQ τ=0.5 | CQ τ=0.7 | SCQ τ=0.7 | CQ τ=0.9 | SCQ τ=0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| *Error distribution 1* | | | | | | | | | | | |
| 15 | 100 | 661 | 173 | 241 | 168 | 180 | 179 | 245 | 183 | 667 | 173 |
| 15 | 200 | 323 | 88 | 113 | 80 | 89 | 80 | 108 | 78 | 303 | 79 |
| 15 | 500 | 128 | 33 | 46 | 32 | 36 | 32 | 47 | 34 | 119 | 33 |
| 30 | 100 | 840 | 230 | 266 | 219 | 226 | 205 | 288 | 208 | 847 | 209 |
| 30 | 200 | 390 | 110 | 137 | 105 | 112 | 99 | 145 | 103 | 413 | 106 |
| 30 | 500 | 160 | 47 | 53 | 42 | 44 | 42 | 55 | 42 | 151 | 40 |
| 45 | 100 | 1101 | 310 | 423 | 302 | 344 | 295 | 422 | 265 | 1210 | 275 |
| 45 | 200 | 521 | 150 | 180 | 143 | 145 | 136 | 175 | 135 | 506 | 125 |
| 45 | 500 | 186 | 62 | 69 | 53 | 55 | 52 | 74 | 50 | 210 | 52 |
| *Error distribution 2* | | | | | | | | | | | |
| 15 | 100 | 3 | 190 | 21 | 195 | 97 | 199 | 352 | 194 | 1889 | 196 |
| 15 | 200 | 1 | 118 | 10 | 113 | 50 | 114 | 195 | 110 | 996 | 115 |
| 15 | 500 | 0 | 60 | 4 | 59 | 21 | 58 | 81 | 59 | 450 | 60 |
| 30 | 100 | 4 | 252 | 29 | 253 | 124 | 243 | 465 | 236 | 2219 | 236 |
| 30 | 200 | 1 | 146 | 13 | 145 | 60 | 142 | 221 | 139 | 1181 | 136 |
| 30 | 500 | 0 | 78 | 5 | 74 | 25 | 73 | 94 | 72 | 521 | 68 |
| 45 | 100 | 9 | 321 | 41 | 330 | 157 | 303 | 571 | 301 | 2653 | 307 |
| 45 | 200 | 1 | 179 | 17 | 184 | 75 | 193 | 271 | 187 | 1578 | 179 |
| 45 | 500 | 0 | 93 | 7 | 92 | 31 | 91 | 121 | 93 | 671 | 90 |
| *Error distribution 3* | | | | | | | | | | | |
| 15 | 100 | 1027 | 363 | 434 | 363 | 365 | 361 | 462 | 347 | 1028 | 356 |
| 15 | 200 | 506 | 179 | 227 | 168 | 179 | 183 | 218 | 174 | 484 | 172 |
| 15 | 500 | 195 | 75 | 86 | 67 | 76 | 68 | 85 | 70 | 204 | 71 |
| 30 | 100 | 1267 | 449 | 546 | 453 | 481 | 414 | 583 | 437 | 1366 | 443 |
| 30 | 200 | 560 | 224 | 270 | 208 | 215 | 209 | 273 | 221 | 574 | 196 |
| 30 | 500 | 241 | 93 | 112 | 85 | 94 | 86 | 100 | 86 | 244 | 83 |
| 45 | 100 | 1656 | 615 | 702 | 586 | 646 | 608 | 784 | 584 | 1642 | 564 |
| 45 | 200 | 783 | 302 | 389 | 280 | 296 | 277 | 348 | 266 | 830 | 255 |
| 45 | 500 | 305 | 122 | 133 | 115 | 114 | 116 | 136 | 105 | 313 | 101 |
The DMSE of CQ and SCQ estimators for the low-dimensional model with covariate X generated from Case 3, where the values in columns 3–12 are DMSE ×
| CP (%) | n | CQ τ=0.1 | SCQ τ=0.1 | CQ τ=0.3 | SCQ τ=0.3 | CQ τ=0.5 | SCQ τ=0.5 | CQ τ=0.7 | SCQ τ=0.7 | CQ τ=0.9 | SCQ τ=0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| *Error distribution 1* | | | | | | | | | | | |
| 15 | 100 | 863 | 259 | 315 | 255 | 257 | 253 | 302 | 252 | 841 | 249 |
| 15 | 200 | 405 | 131 | 144 | 128 | 118 | 126 | 148 | 125 | 442 | 122 |
| 15 | 500 | 166 | 52 | 56 | 50 | 43 | 49 | 54 | 48 | 164 | 46 |
| 30 | 100 | 1082 | 331 | 371 | 321 | 311 | 315 | 390 | 311 | 104 | 304 |
| 30 | 200 | 490 | 168 | 172 | 159 | 141 | 153 | 181 | 149 | 528 | 144 |
| 30 | 500 | 203 | 73 | 71 | 67 | 54 | 64 | 66 | 61 | 187 | 58 |
| 45 | 100 | 1443 | 467 | 518 | 442 | 421 | 431 | 522 | 425 | 1372 | 415 |
| 45 | 200 | 653 | 226 | 233 | 209 | 187 | 202 | 247 | 196 | 712 | 186 |
| 45 | 500 | 260 | 101 | 91 | 89 | 70 | 83 | 89 | 78 | 248 | 73 |
| *Error distribution 2* | | | | | | | | | | | |
| 15 | 100 | 3 | 286 | 30 | 286 | 127 | 284 | 468 | 282 | 2529 | 277 |
| 15 | 200 | 1 | 180 | 12 | 180 | 65 | 179 | 262 | 177 | 1290 | 172 |
| 15 | 500 | 0 | 87 | 5 | 87 | 26 | 86 | 101 | 85 | 535 | 82 |
| 30 | 100 | 5 | 358 | 36 | 357 | 154 | 354 | 566 | 348 | 2992 | 335 |
| 30 | 200 | 1 | 218 | 15 | 216 | 76 | 212 | 295 | 208 | 1551 | 199 |
| 30 | 500 | 0 | 118 | 6 | 116 | 32 | 114 | 122 | 109 | 688 | 100 |
| 45 | 100 | 9 | 467 | 51 | 465 | 210 | 457 | 709 | 450 | 382 | 431 |
| 45 | 200 | 2 | 281 | 21 | 279 | 97 | 274 | 374 | 265 | 1866 | 246 |
| 45 | 500 | 0 | 154 | 8 | 152 | 41 | 148 | 161 | 139 | 907 | 124 |
| *Error distribution 3* | | | | | | | | | | | |
| 15 | 100 | 1372 | 553 | 619 | 545 | 537 | 539 | 615 | 535 | 1337 | 527 |
| 15 | 200 | 638 | 259 | 276 | 252 | 229 | 247 | 269 | 244 | 649 | 241 |
| 15 | 500 | 260 | 117 | 103 | 112 | 89 | 109 | 107 | 106 | 255 | 103 |
| 30 | 100 | 1757 | 718 | 780 | 693 | 655 | 678 | 766 | 664 | 1675 | 639 |
| 30 | 200 | 780 | 358 | 334 | 337 | 295 | 325 | 336 | 315 | 819 | 304 |
| 30 | 500 | 311 | 170 | 133 | 154 | 114 | 146 | 137 | 139 | 318 | 130 |
| 45 | 100 | 2274 | 985 | 1007 | 939 | 865 | 907 | 945 | 874 | 2145 | 822 |
| 45 | 200 | 999 | 483 | 452 | 446 | 388 | 424 | 433 | 407 | 1064 | 386 |
| 45 | 500 | 405 | 238 | 172 | 210 | 150 | 196 | 178 | 183 | 415 | 165 |
Smoothing estimators of the regression model parameters after misspecifying the distribution of the censoring variable as Normal, Weibull, or Lognormal, and after estimating G using KM estimation. The covariate X in the low-dimensional model is generated from Case 1, where the values in columns 2–13 are DMSE ×
| τ | Normal 15% | Normal 30% | Normal 45% | Weibull 15% | Weibull 30% | Weibull 45% | Lognormal 15% | Lognormal 30% | Lognormal 45% | KM 15% | KM 30% | KM 45% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *Error distribution 1* | | | | | | | | | | | | |
| 0.1 | 588 | 766 | 991 | 567 | 722 | 938 | 564 | 709 | 916 | 1366 | 1404 | 1448 |
| 0.3 | 554 | 682 | 861 | 552 | 677 | 858 | 552 | 677 | 858 | 1158 | 1192 | 1203 |
| 0.5 | 537 | 644 | 808 | 543 | 655 | 823 | 545 | 660 | 832 | 1123 | 1128 | 1157 |
| 0.7 | 524 | 618 | 774 | 536 | 639 | 800 | 539 | 648 | 815 | 1037 | 1051 | 1100 |
| 0.9 | 511 | 594 | 754 | 529 | 620 | 776 | 534 | 631 | 793 | 976 | 1000 | 1091 |
| *Error distribution 2* | | | | | | | | | | | | |
| 0.1 | 808 | 1006 | 1270 | 793 | 974 | 1227 | 791 | 966 | 1212 | 1786 | 1847 | 1982 |
| 0.3 | 803 | 995 | 1250 | 790 | 969 | 1215 | 789 | 962 | 1202 | 1788 | 1810 | 1902 |
| 0.5 | 792 | 969 | 1212 | 785 | 955 | 1191 | 785 | 953 | 1185 | 1716 | 1778 | 1856 |
| 0.7 | 772 | 922 | 1150 | 776 | 929 | 1151 | 778 | 933 | 1155 | 1521 | 1684 | 1820 |
| 0.9 | 740 | 858 | 1076 | 759 | 886 | 1088 | 765 | 898 | 1102 | 1305 | 1409 | 1472 |
| *Error distribution 3* | | | | | | | | | | | | |
| 0.1 | 1213 | 1671 | 2138 | 1160 | 1560 | 2047 | 1151 | 1525 | 1999 | 1778 | 2023 | 2348 |
| 0.3 | 1141 | 1472 | 1835 | 1126 | 1451 | 1851 | 1125 | 1444 | 1851 | 1370 | 1595 | 1897 |
| 0.5 | 1102 | 1372 | 1695 | 1106 | 1392 | 1754 | 1110 | 1400 | 1777 | 1208 | 1434 | 1731 |
| 0.7 | 1071 | 1295 | 1593 | 1092 | 1344 | 1678 | 1099 | 1363 | 1717 | 1116 | 1326 | 1609 |
| 0.9 | 1034 | 1215 | 1513 | 1072 | 1285 | 1604 | 1083 | 1314 | 1655 | 1078 | 1304 | 1598 |
Smoothing estimates of the regression model parameters after misspecifying the distribution of the censoring variable as Normal, Weibull, or Lognormal, and after estimating G using KM estimation. The covariate X in the high-dimensional model is generated from Case 1, where the values in columns 2–13 are DMSE ×
| τ | Normal 15% | Normal 30% | Normal 45% | Weibull 15% | Weibull 30% | Weibull 45% | Lognormal 15% | Lognormal 30% | Lognormal 45% | KM 15% | KM 30% | KM 45% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *Error distribution 1* | | | | | | | | | | | | |
| 0.1 | 1962 | 2272 | 2891 | 1952 | 2270 | 2945 | 2020 | 2274 | 2965 | 2634 | 3593 | 4324 |
| 0.3 | 1951 | 2260 | 2937 | 1940 | 2258 | 2925 | 2022 | 2263 | 2943 | 2570 | 3532 | 4343 |
| 0.5 | 1945 | 2253 | 2891 | 1935 | 2251 | 2913 | 2026 | 2255 | 2936 | 2489 | 3459 | 4373 |
| 0.7 | 1943 | 2249 | 2902 | 1933 | 2247 | 2900 | 2019 | 2253 | 2931 | 2394 | 3381 | 4273 |
| 0.9 | 1943 | 2239 | 2892 | 1932 | 2236 | 2876 | 2013 | 2239 | 2912 | 2369 | 3306 | 4172 |
| *Error distribution 2* | | | | | | | | | | | | |
| 0.1 | 2411 | 2427 | 2855 | 2391 | 2415 | 2844 | 2421 | 2445 | 2868 | 3605 | 3613 | 4105 |
| 0.3 | 2402 | 2439 | 2864 | 2392 | 2437 | 2844 | 2416 | 2448 | 2885 | 3390 | 3438 | 4035 |
| 0.5 | 2421 | 2435 | 2851 | 2391 | 2426 | 2842 | 2432 | 2447 | 2865 | 3308 | 3350 | 4062 |
| 0.7 | 2412 | 2446 | 2854 | 2392 | 2434 | 2843 | 2425 | 2465 | 2864 | 3307 | 3353 | 3955 |
| 0.9 | 2409 | 2442 | 2832 | 2390 | 2431 | 2825 | 2413 | 2444 | 2850 | 3262 | 3271 | 4009 |
| *Error distribution 3* | | | | | | | | | | | | |
| 0.1 | 3857 | 4742 | 6153 | 3846 | 4739 | 6143 | 3861 | 4757 | 6165 | 3640 | 4442 | 5388 |
| 0.3 | 3840 | 4724 | 6096 | 3834 | 4711 | 6085 | 3853 | 4742 | 6101 | 3602 | 4349 | 5403 |
| 0.5 | 3838 | 4710 | 6051 | 3827 | 4692 | 6045 | 3841 | 4740 | 6067 | 3728 | 4252 | 5332 |
| 0.7 | 3835 | 4682 | 6018 | 3819 | 4675 | 6006 | 3851 | 4695 | 6024 | 3789 | 4169 | 5236 |
| 0.9 | 3834 | 4662 | 5963 | 3816 | 4654 | 5951 | 3849 | 4686 | 5975 | 3610 | 4097 | 5195 |
Appendix A
Suppose
In order to prove Theorem 1, two lemmas are given first.
Assuming conditions
Moreover, assuming that
To prove (
The last step of (
By using Taylor’s formula, we obtain
Notice that
Combined with the above equation, we can obtain
For the left-hand side of Equation (
Combining (
Suppose
Repeating the analysis of (
This leads to a contradiction. Therefore, h must be satisfied to fulfill (
In order to demonstrate (A3), the bias of the estimator is examined below. We define the variables
Furthermore, expanding
Combining (
Lemma A1 discusses the connection between
For any
For each sample
For any given
Through Rademacher’s symmetric control moment function
Define
Taking
It follows that for every
Substituting the above inequality into (
The required constraint can be obtained by taking
According to the definition of
Take
To control the last term on the right-hand side of (
Combining the inequalities (
Using the condition (
Secondly, when
The above inequality in
The conclusion of Theorem 1 can be proved by eliminating
The procedure for the proof of Theorem 2 will retain the notation used in the proof of Theorem 1, for any
It is now necessary to give upper bounds for M and N, respectively. Consider first the upper bound for M; by the mean value theorem,
Utilizing the Lipschitz continuity of
The last inequality can be obtained from the Cauchy–Schwarz inequality. Therefore,
Let
The proof procedure of Theorem 3 can be found in [
Appendix B
Figure A1. Estimation results for the high-dimensional model when covariate X is generated from Case 2. The horizontal coordinates of the plots indicate the sample size (in thousands), and the vertical coordinates indicate the ratio of DMSE for regression coefficients’ estimators between CQ and SCQ.
Figure A2. Estimation results for the high-dimensional model when covariate X is generated from Case 3. The horizontal coordinates of the plots indicate the sample size (in thousands), and the vertical coordinates indicate the ratio of DMSE for regression coefficients’ estimators between CQ and SCQ.
Figure A3. Simulation results under the high-dimensional model when the covariates are generated from Case 2, where the horizontal coordinate denotes the sample size (in thousands), and the vertical coordinate denotes the ratio of estimated running time between CQ and SCQ.
Figure A4. Simulation results under the high-dimensional model when the covariates are generated from Case 3, where the horizontal coordinate denotes the sample size (in thousands), and the vertical coordinate denotes the ratio of estimated running time between CQ and SCQ.
References
1. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica; 1978; 46, pp. 33-50. [DOI: https://dx.doi.org/10.2307/1913643]
2. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005.
3. Koenker, R.; Chernozhukov, V.; He, X.M.; Peng, L. Handbook of Quantile Regression; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017.
4. Horowitz, J.L. Bootstrap methods for median regression models. Econometrica; 1998; 66, pp. 1327-1351. [DOI: https://dx.doi.org/10.2307/2999619]
5. de Castro, L.; Galvao, A.F.; Kaplan, D.M.; Liu, X. Smoothed GMM for quantile models. J. Econom.; 2019; 213, pp. 121-144. [DOI: https://dx.doi.org/10.1016/j.jeconom.2019.04.008]
6. Chen, X.; Liu, W.; Zhang, Y. Quantile regression under memory constraint. Ann. Stat.; 2019; 47, pp. 3244-3273. [DOI: https://dx.doi.org/10.1214/18-AOS1777]
7. Galvao, A.F.; Kato, K. Smoothed quantile regression for panel data. J. Econom.; 2016; 193, pp. 92-112. [DOI: https://dx.doi.org/10.1016/j.jeconom.2016.01.008]
8. Whang, Y.J. Smoothed empirical likelihood methods for quantile regression models. Econ. Theory; 2006; 22, pp. 173-205. [DOI: https://dx.doi.org/10.1017/S0266466606060087]
9. Fernandes, M.; Guerre, E.; Horta, E. Smoothing quantile regressions. J. Bus. Econom Stat.; 2021; 39, pp. 338-357. [DOI: https://dx.doi.org/10.1080/07350015.2019.1660177]
10. He, X.; Pan, X.; Tan, K.M.; Zhou, W.X. Smoothed quantile regression with large-scale inference. J. Econom.; 2023; 232, pp. 367-388. [DOI: https://dx.doi.org/10.1016/j.jeconom.2021.07.010] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36776480]
11. Bayarassou, N.; Hamrani, F.; Ould, S.E. Nonparametric relative error estimation of the regression function for left truncated and right censored time series data. J. Nonparametr. Stat.; 2023; 36, pp. 706-729. [DOI: https://dx.doi.org/10.1080/10485252.2023.2241572]
12. Ying, Z.; Jung, S.H.; Wei, L.J. Survival analysis with median regression models. J. Am. Stat. Assoc.; 1995; 90, pp. 178-184. [DOI: https://dx.doi.org/10.1080/01621459.1995.10476500]
13. Honoré, B.; Khan, S.; Powell, J.L. Quantile regression under random censoring. J. Econom.; 2002; 109, pp. 67-105. [DOI: https://dx.doi.org/10.1016/S0304-4076(01)00142-7]
14. Portnoy, S. Censored regression quantiles. J. Am. Stat. Assoc.; 2003; 98, pp. 1001-1012. [DOI: https://dx.doi.org/10.1198/016214503000000954]
15. Peng, L. Self-consistent estimation of censored quantile regression. J. Multivar. Anal.; 2012; 105, pp. 368-379. [DOI: https://dx.doi.org/10.1016/j.jmva.2011.10.005]
16. Yuan, X.; Zhang, X.; Guo, W.; Hu, Q. An adapted loss function for composite quantile regression with censored data. Comput. Stat.; 2024; 39, pp. 1371-1401. [DOI: https://dx.doi.org/10.1007/s00180-023-01352-6]
17. Gao, Q.; Zhou, X.; Feng, Y.; Du, X.; Liu, X. An empirical likelihood method for quantile regression models with censored data. Metrika; 2021; 84, pp. 75-96. [DOI: https://dx.doi.org/10.1007/s00184-020-00775-1]
18. Hao, R.; Weng, C.; Liu, X.; Yang, X. Data augmentation based estimation for the censored quantile regression neural network model. Expert Syst. Appl.; 2023; 214, 119097. [DOI: https://dx.doi.org/10.1016/j.eswa.2022.119097]
19. Yang, X.; Narisetty, N.N.; He, X. A new approach to censored quantile regression estimation. J. Comput. Graph. Stat.; 2018; 27, pp. 417-425. [DOI: https://dx.doi.org/10.1080/10618600.2017.1385469]
20. Peng, L.; Huang, Y. Survival analysis with quantile regression models. J. Am. Stat. Assoc.; 2008; 103, pp. 637-649. [DOI: https://dx.doi.org/10.1198/016214508000000355]
21. Xu, G.; Sit, T.; Wang, L.; Huang, C.Y. Estimation and inference of quantile regression for survival data under biased sampling. J. Am. Stat. Assoc.; 2017; 112, pp. 1571-1586. [DOI: https://dx.doi.org/10.1080/01621459.2016.1222286] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30078919]
22. Cai, Z.; Sit, T. On interquantile smoothness of censored quantile regression with induced smoothing. Biometrics; 2023; 79, pp. 3549-3563. [DOI: https://dx.doi.org/10.1111/biom.13892] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37382567]
23. Kim, K.H.; Caplan, D.J.; Kang, S. Smoothed quantile regression for censored residual life. Comput Stat.; 2023; 38, pp. 1001-1022. [DOI: https://dx.doi.org/10.1007/s00180-022-01262-z]
24. Wu, Y.; Ma, Y.; Yin, G. Smoothed and corrected score approach to censored quantile regression with measurement errors. J. Am. Stat. Assoc.; 2015; 110, pp. 1670-1683. [DOI: https://dx.doi.org/10.1080/01621459.2014.989323]
25. He, X.; Pan, X.; Tan, K.M.; Zhou, W.X. Scalable estimation and inference for censored quantile regression process. Ann. Stat.; 2022; 50, pp. 2899-2924. [DOI: https://dx.doi.org/10.1214/22-AOS2214]
26. Fei, Z.; Zheng, Q.; Hong, H.G.; Li, Y. Inference for high-dimensional censored quantile regression. J. Am. Stat. Assoc.; 2023; 118, pp. 898-912. [DOI: https://dx.doi.org/10.1080/01621459.2021.1957900] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37309513]
27. De Backer, M.; Ghouch, A.E.; Van, K.I. An adapted loss function for censored quantile regression. J. Am. Stat. Assoc.; 2019; 114, pp. 1126-1137. [DOI: https://dx.doi.org/10.1080/01621459.2018.1469996]
28. Leng, C.; Tong, X. Censored quantile regression via Box-Cox transformation under conditional independence. Stat. Sin.; 2014; 24, pp. 221-249. [DOI: https://dx.doi.org/10.5705/ss.2012.089]
29. Spokoiny, V. Bernstein–von Mises theorem for growing parameter dimension. arXiv; 2013; [DOI: https://dx.doi.org/10.48550/arXiv.1302.3430] arXiv: 1302.3430
Abstract
In this paper, we propose a smoothing estimation method for censored quantile regression models. The method associates convolutional smoothing with the loss function, which becomes twice differentiable and globally convex when a non-negative kernel function is used. Thus, the parameters of the regression model can be computed by a gradient-based iterative algorithm. We establish the convergence rate and the asymptotic properties of the smoothing estimation for large samples in high dimensions. Numerical simulations show that the smoothing estimation method for censored quantile regression models improves the estimation accuracy, computational speed, and robustness over the classical parameter estimation method. The simulation results also show that parametric methods perform better than the KM method in estimating the distribution function of the censoring variable. Even if the distribution is misspecified, the smoothing estimation does not fluctuate much.
1 School of Mathematical Sciences, Nanjing Normal University, Nanjing 210023, China;
2 College of International Languages and Cultures, Hohai University, Nanjing 211100, China;