1. Introduction
This work proposes a general variable selection method for length-biased and interval-censored failure time data under the classical proportional hazards (PH) model. Interval-censored data arise when each failure time of interest cannot be measured exactly but is only known to lie in a certain time interval formed by periodical follow-ups [1]. Such data are frequently encountered in many scientific studies, including clinical trials and epidemiological surveys, and their regression analysis has been discussed extensively in the literature; see [2,3,4,5,6,7,8] for details. Specifically, Zeng et al. [4], Wang et al. [6] and Zeng et al. [7] investigated inference procedures for the additive hazards, PH and transformation models, respectively.
In addition to interval censoring, left truncation is also frequently encountered in prospective cohort studies, inducing non-randomly selected samples from the target population. A typical example of left truncation occurs in the PLCO study where individuals with any of the PLCO cancers at the onset of the study were not enrolled [6,9]. In particular, when the truncation times follow the uniform distribution (also known as the length-biased or stationarity assumption), the left-truncated data reduce to the length-biased data discussed by many authors, including but not limited to Wang [10], Shen et al. [11] and Ning et al. [12].
The analysis of length-biased data under right censoring has been investigated extensively in the literature [11,13,14,15,16]. To name a few examples, Shen et al. [11] presented unbiased estimating equation approaches for the transformation and accelerated failure time models. Qin and Shen [13] developed an inverse weighted estimating equation approach for the PH model. Qin et al. [14] developed new expectation-maximization (EM) algorithms to estimate the survival function of the failure time. For the length-biased and interval-censored data, Gao and Chan [15] developed an EM algorithm for the PH model via two-stage data augmentation. Further, Shen et al. [16] considered the mixture PH model with a nonsusceptible or cured fraction.
In many practical applications, one may collect a large number of candidate covariates, but in general, only a few covariates are useful for modeling the failure time of interest. In such cases, penalized variable selection provides a useful tool to eliminate irrelevant variables and further enhance the estimation accuracy. Popular penalty functions include LASSO [17], SCAD [18], adaptive LASSO (ALASSO) [19], SICA [20], SELO [21], MCP [22] and BAR [23,24]. In particular, Fan et al. [25] provided a comprehensive review of variable selection methods and the corresponding algorithms. In recent years, machine learning-based methods have also gained considerable attention due to their ability to identify relevant features. To name a few examples, Garavand et al. [26] used clinical examination features and compared different machine learning algorithms in developing a model for the early diagnosis of coronary artery disease. Hosseini et al. [27] used blood microscopic images and a convolutional neural network algorithm for detecting and classifying B-acute lymphoblastic leukemia. Garavand et al. [28] conducted a systematic review of advanced techniques to facilitate the rapid diagnosis of coronary artery disease. Ghaderzadeh and Aria [29] conducted a systematic review of Artificial Intelligence techniques for COVID-19 detection.
Regarding the left-truncated failure time data, a number of variable selection methods have been proposed. In particular, Chen [30] considered the right-censored data and developed a variable selection method for the additive hazards model with covariate measurement errors. He et al. [31] also considered the right-censored data and performed variable selection with penalized estimating equations for the accelerated failure time model. Li et al. [32] developed a conditional likelihood-based variable selection method for left-truncated and interval-censored data under the PH model. However, it is worth noting that the work of Li et al. [32] only involved the ALASSO penalty and their method can be anticipated to lose some efficiency due to ignoring the distribution information of the truncation times.
In this paper, we offer an efficient penalized likelihood method to achieve variable selection in the PH model with length-biased and interval-censored data. Compared with the traditional conditional likelihood method in Li et al. [32], the proposed method yields an efficiency gain via fully taking into account the distribution information of the truncation times. In particular, to optimize the penalized likelihood function with an intractable form, we develop a penalized EM algorithm by introducing pseudo-left-truncated data and Poisson random variables. The proposed method is easy to implement and computationally stable and has desirable advantages over the variable selection method based on the penalized conditional likelihood [32]. An application to a real data set arising from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial demonstrates the practical usefulness of the proposed method.
The PLCO cancer screening trial is a large-scale multicenter trial conducted to screen the PLCO cancers and investigate cancer-related mortality. To date, motivated by the rich data structure in the PLCO database, various statistical methods have already been proposed in the literature. To name a few examples, Wang et al. [6] developed an EM algorithm to estimate the spline-based PH model with interval-censored data. Sun et al. [33] considered variable selection in a semiparametric nonmixture cure model with interval-censored data. Li and Peng [34] investigated instrumental variable estimation of complier causal treatment effect with interval-censored data. Withana Gamage et al. [35] considered the estimation of the PH model with left-truncated and arbitrarily interval-censored data.
The remainder of this paper is organized as follows. In Section 2, we first introduce the notation, assumption and corresponding likelihood. Section 3 presents the proposed penalized EM algorithm, and Section 4 establishes the oracle property of the proposed estimators. In Section 5, a simulation study is conducted to assess the variable selection performance and the estimation accuracy of the proposed method, followed by an application in Section 6. Some discussions and conclusions are given in Section 7. Section 8 provides several potential future research directions.
2. Notation, Model and Penalized Likelihood
For the target population, let $\tilde{T}$, $\tilde{A}$ and $\tilde{Z}$ denote the failure time of interest (e.g., the time to the onset of the failure event), the truncation time (e.g., the time to the study enrollment) and the p-dimensional vector of covariates, respectively. Given $\tilde{Z}$, the PH model specifies that the conditional cumulative hazard function of $\tilde{T}$ takes the form

$$\Lambda(t \mid \tilde{Z}) = \Lambda_0(t) \exp(\tilde{Z}^\top \beta), \qquad (1)$$

where $\beta$ is the p-dimensional vector of the unknown regression coefficients and $\Lambda_0(\cdot)$ is an unknown increasing cumulative baseline hazard function. Let d denote the number of nonzero components in $\beta$, and let $\beta_{(1)}$ and $\beta_{(2)}$ denote the vectors of the d nonzero components and the remaining zero components, respectively.

Under the left-truncation scheme, only individuals with $\tilde{T} \ge \tilde{A}$ are enrolled in the study, and the failure time, truncation time and covariate vector of an enrolled individual are denoted by T, A and Z, respectively. Then, we know that $(T, A, Z)$ has the same joint distribution as $(\tilde{T}, \tilde{A}, \tilde{Z})$ given $\tilde{T} \ge \tilde{A}$ [36]. As mentioned above, if the truncation time is further assumed to follow the uniform distribution on $[0, \tau]$, where $[0, \tau]$ is the support of $\tilde{A}$ (also known as the length-biased or stationarity assumption), we have the length-biased sampling mechanism [10,14]. Let $f(t \mid Z)$ and $S(t \mid Z)$ denote the density and survival functions of $\tilde{T}$ given $\tilde{Z} = Z$, respectively. Under the assumption that $\tilde{A}$ is independent of $\tilde{T}$ given $\tilde{Z}$, the joint density function of $(T, A)$ given Z evaluated at $(t, a)$ is

$$p(t, a \mid Z) = \frac{f(t \mid Z)\, g(a \mid Z)}{\Pr(\tilde{T} \ge \tilde{A} \mid \tilde{Z} = Z)}, \qquad t \ge a > 0, \qquad (2)$$
where $g(a \mid Z)$ denotes the density function of $\tilde{A}$ given $\tilde{Z} = Z$ at time a and equals $1/\tau$ under the length-biased sampling scheme.

Consider a failure time study that recruits n subjects, where each failure time suffers from interval censoring due to the periodical examinations for the occurrence of the failure event. For $i = 1, \ldots, n$, denote by $T_i$, $A_i$ and $Z_i$ the failure time, truncation time and covariate vector of the ith subject in the study, respectively. We assume that there exists a sequence of examination times $A_i = U_{i0} < U_{i1} < \cdots < U_{iK_i}$ for subject i, where $K_i$ is a random positive integer, and define $U_{i,K_i+1} = \infty$. Let $L_i$ and $R_i$ denote the endpoints of the smallest interval $(L_i, R_i]$ formed by the examination times that brackets $T_i$; that is, $L_i = \max\{U_{ik} : U_{ik} < T_i\}$ and $R_i = \min\{U_{ik} : U_{ik} \ge T_i\}$. Clearly, $T_i$ is left-censored if $L_i = A_i$, and $T_i$ is right-censored if $R_i = \infty$. Then, the observed data consist of $\{O_i = (A_i, L_i, R_i, Z_i) : i = 1, \ldots, n\}$. Under the assumption that the examination times are independent of the failure and truncation times given the covariates, the likelihood function based on the observed data can be written as

$$L_n(\beta, \Lambda_0) = \prod_{i=1}^{n} \frac{S(L_i \mid Z_i) - S(R_i \mid Z_i)}{\mu(Z_i)}, \qquad (3)$$

where $\mu(Z_i) = \int_0^{\tau} S(t \mid Z_i)\, dt$. Essentially, the likelihood (3) is the product of the marginal likelihood and the conditional likelihood, that is, $L_n = L_M \times L_C$, where

$$L_M = \prod_{i=1}^{n} \frac{S(A_i \mid Z_i)}{\mu(Z_i)} \quad \text{and} \quad L_C = \prod_{i=1}^{n} \frac{S(L_i \mid Z_i) - S(R_i \mid Z_i)}{S(A_i \mid Z_i)}.$$

In the above, $L_M$ is the marginal likelihood of the $A_i$'s given the $Z_i$'s, and $L_C$ is the conditional likelihood of the $(L_i, R_i)$'s given the $A_i$'s. Notably, the commonly used conditional likelihood method only utilizes $L_C$ for inference, which can be anticipated to lose some estimation efficiency because $L_M$ also involves the parameters in model (1).
For the nuisance function $\Lambda_0$, we propose to approximate it with a step function that has non-negative jumps at the unique examination times. Specifically, let $0 < t_1 < \cdots < t_K$ denote the ordered unique values of $\{L_i, R_i : R_i < \infty,\ i = 1, \ldots, n\}$, where K is an integer determined by the observed data. For $k = 1, \ldots, K$, denote by $\lambda_k$ the non-negative jump size of $\Lambda_0$ at $t_k$. Then, we have $\Lambda_0(t) = \sum_{k: t_k \le t} \lambda_k$, and the likelihood function (3) can be rewritten as

$$L_n(\beta, \lambda) = \prod_{i=1}^{n} \frac{\exp\{-\Lambda_0(L_i)\, e^{Z_i^\top \beta}\} - \exp\{-\Lambda_0(R_i)\, e^{Z_i^\top \beta}\}}{\mu(Z_i)}, \qquad (4)$$

where $\lambda = (\lambda_1, \ldots, \lambda_K)^\top$ and $\mu(Z_i) = \int_0^{\tau} \exp\{-\Lambda_0(t)\, e^{Z_i^\top \beta}\}\, dt$.

To accomplish variable selection and estimate the nonzero parameters simultaneously, we propose to maximize the following penalized log-likelihood:

$$\ell_P(\beta, \lambda) = \log L_n(\beta, \lambda) - n \sum_{j=1}^{p} p_{\gamma}(|\beta_j|), \qquad (5)$$

where $p_{\gamma}(\cdot)$ denotes a penalty function that depends on the tuning parameter $\gamma$. In what follows, we provide a general maximization procedure for (5) under various commonly adopted penalty functions, such as LASSO, ALASSO, SCAD, SELO, SICA, MCP and BAR [17,18,19,20,21,22,23,24]. Because the penalized log-likelihood (5) has an intractable form, performing direct maximization with existing software is extremely difficult and unstable; this is the case even without the penalty term, as shown in Gao and Chan [15]. In the next section, we propose a reliable and stable penalized EM algorithm to overcome this computational challenge.

3. Estimation Procedure
The proposed penalized EM algorithm involves two layers of data augmentation, which aim at simplifying the form of (4) and obtaining a tractable objective function. In the first stage of data augmentation, for the ith subject, we introduce a set of independent pseudo-truncated failure times, also referred to as "ghost data" [37], whose random number follows a negative binomial distribution with parameters determined by the model.

Given the number of ghost data for subject i, each pseudo-truncated failure time falls into one of the intervals determined by the jump points, so that the counts jointly follow a multinomial distribution whose cell probabilities are functions of the model parameters. In the above, $\tau$ is the finite upper bound of the support of $\tilde{A}$, and its value can be specified from the observed data in practice [14]. After deleting some constants that are irrelevant to the parameters to be estimated, the augmented likelihood function based on the observed and pseudo-data is
(6)
In the second stage, for the ith subject, we introduce independent latent Poisson random variables, one attached to each jump point $t_k$, with means proportional to the jump sizes $\lambda_k$. Then, the likelihood function (6) can be re-expressed in terms of these Poisson variables, and by treating the latent variables as observable, we obtain a complete-data likelihood of a simple product form, in which the latent counts are constrained to be consistent with the observed censoring interval $(L_i, R_i]$ of each subject.

Let $\theta = (\beta^\top, \lambda^\top)^\top$, and let $\theta^{(m)}$ be the update of $\theta$ at the mth iteration of the algorithm. Based on the complete-data likelihood, we can present the expectation step (E-step) and maximization step (M-step) of the proposed algorithm. In the E-step, we calculate the conditional expectations of the latent variables given the observed data and $\theta^{(m)}$. This step yields
(7)
In particular, at the mth iteration of the algorithm, the conditional expectations in the E-step have closed-form expressions; for notational simplicity, we omit their conditioning arguments, including the observed data and the current estimates of the parameters. In the M-step of the algorithm, by setting the derivative of the expected complete-data log-likelihood with respect to each $\lambda_k$ to zero, we have a closed-form expression for the update of $\lambda_k$, which is given by
(8)
Next, by plugging (8) into (7), we obtain an objective function that only involves the unknown parameter $\beta$. To obtain the sparse estimator of $\beta$, we propose to minimize the following penalized objective function:

(9)
For LASSO and ALASSO, the modified shooting algorithm given in Zhang and Lu [38] and others can be adopted to minimize (9). For BAR, a closed-form solution for the update of $\beta$ is available [39]. For the other penalties, after applying a local linear approximation to the penalty function [40], one can also adopt the modified shooting algorithm to minimize (9).
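To make this step concrete, the sketch below combines a local linear approximation (LLA) of the penalty with coordinate-wise "shooting" updates. It uses a least-squares surrogate in place of the paper's objective (9), the conventional SCAD constant a = 3.7, and illustrative names (`lla_shooting`, `scad_deriv`); none of these are the paper's actual implementation.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty at |t| (a = 3.7 is the usual default)."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def soft_threshold(z, w):
    """Soft-thresholding: argmin_b 0.5*(b - z)^2 + w*|b|, up to scaling."""
    return np.sign(z) * np.maximum(np.abs(z) - w, 0.0)

def lla_shooting(X, y, lam, n_sweeps=50, tol=1e-6):
    """LLA + shooting on a least-squares surrogate: each sweep freezes
    SCAD-derivative weights at the current iterate (the LLA step), then
    updates one coefficient at a time by soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)           # per-coordinate curvature
    for _ in range(n_sweeps):
        beta_old = beta.copy()
        w = n * scad_deriv(beta, lam)       # LLA weights for this sweep
        r = y - X @ beta
        for j in range(p):
            r += X[:, j] * beta[j]          # remove coordinate j's fit
            zj = X[:, j] @ r                # partial residual correlation
            beta[j] = soft_threshold(zj, w[j]) / col_ss[j]
            r -= X[:, j] * beta[j]
        if np.abs(beta - beta_old).sum() < tol:
            break
    return beta
```

Because the SCAD derivative vanishes for large coefficients, the LLA weights leave strong signals essentially unpenalized after the first sweep, which is what drives the oracle-type behavior discussed later.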
In summary, for a given tuning parameter $\gamma$ and initial estimator $\theta^{(0)}$, we repeat the E-step and M-step until the convergence criterion is satisfied, rendering the sparse estimators of the regression parameters. It is worth pointing out that the proposed algorithm is insensitive to the choice of the initial value $\theta^{(0)}$. In practice, one can simply set the initial value of each component of $\beta$ to 0 and the initial value of each $\lambda_k$ to a common small positive value, for $k = 1, \ldots, K$. The proposed algorithm is declared to have converged when the sum of the absolute differences in the estimates between two successive iterations is less than a small prespecified positive number.
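The outer loop just described — alternate E- and M-steps until the sum of absolute parameter changes falls below a small threshold — can be sketched as a generic driver. The toy E/M steps in the illustration (imputing one missing observation by the current mean) are purely illustrative, not the paper's updates.

```python
def run_em(init, e_step, m_step, tol=1e-4, max_iter=1000):
    """Generic EM driver using the stopping rule described above: stop
    when the sum of absolute differences between successive parameter
    vectors is below `tol`."""
    theta = list(init)
    for _ in range(max_iter):
        theta_new = m_step(e_step(theta))
        if sum(abs(a - b) for a, b in zip(theta_new, theta)) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy illustration (not the paper's E/M steps): estimate a mean when one
# of three observations is missing, imputing it in the E-step.
impute = lambda theta: [1.0, 2.0, theta[0]]
average = lambda xs: [sum(xs) / len(xs)]
theta_hat = run_em([0.0], impute, average)   # converges to 1.5
```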
To select the optimal $\gamma$, we follow Li et al. [39] and others and adopt the BIC criterion, which is defined as

$$\mathrm{BIC}(\gamma) = -2\, \ell_n(\hat{\beta}, \hat{\lambda}) + q \log n,$$

where $\hat{\beta}$ is the final estimator of $\beta$, $\hat{\lambda}$ is the final estimator of $\lambda$, $\ell_n$ denotes the logarithm of (4) and q is the total number of the nonzero estimates in $\hat{\beta}$ and $\hat{\lambda}$. For a given set of candidate values of $\gamma$, the optimal $\gamma$ can be set as the one that yields the smallest BIC.

4. Asymptotic Properties
Without loss of generality, we write $\beta = (\beta_{(1)}^\top, \beta_{(2)}^\top)^\top$, where $\beta_{(1)}$ includes the first d components of $\beta$ that are nonzero and $\beta_{(2)}$ consists of the remaining zero components. Denote the true value of $\beta$ by $\beta_0 = (\beta_{0(1)}^\top, \beta_{0(2)}^\top)^\top$, where $\beta_{0(1)}$ is the true value of $\beta_{(1)}$ and $\beta_{0(2)}$ is the true value of $\beta_{(2)}$. Let $\hat{\beta} = (\hat{\beta}_{(1)}^\top, \hat{\beta}_{(2)}^\top)^\top$ be the estimator of $\beta$ obtained from the method proposed above, where $\hat{\beta}_{(1)}$ denotes the estimate of $\beta_{(1)}$ and $\hat{\beta}_{(2)}$ denotes the estimate of $\beta_{(2)}$. In what follows, we establish the asymptotic properties of $\hat{\beta}$.

For any penalty function $p_{\gamma}(\cdot)$ with tuning parameter $\gamma$, we assume that the suitably rescaled penalty belongs to the function class considered in Lv and Fan [20]. This class is quite general and includes the penalty functions considered in this work. To establish the asymptotic properties of $\hat{\beta}$, we need the following regularity conditions.
(C1). The true regression parameter $\beta_0$ lies in a compact set of $\mathbb{R}^p$, and the true cumulative baseline hazard function $\Lambda_0$ is continuously differentiable and positive over the union of the supports of L and R, with $\Lambda_0(\tau) < \infty$.

(C2). The covariate vector Z is bounded with probability one, and the covariance matrix of Z is positive definite.

(C3). The number of examination times, M, is positive and bounded. Additionally, there exists a positive constant $\eta$ such that any two adjacent examination times are separated by at least $\eta$. Furthermore, there exists a probability measure $\mu^*$ such that the bivariate distribution function of (L, R) conditional on Z is dominated by $\mu^*$, and its Radon–Nikodym derivative is positive and has twice-continuous derivatives with respect to its two time arguments.
Conditions (C1) and (C2) are standard in failure time data analysis [7]. Condition (C3) pertains to the joint distribution of the examination times and ensures that two adjacent examination times are separated by at least a positive constant; otherwise, the data may contain exactly observed failure times, which would require a different theoretical treatment. Conditions (C1)–(C3) are used to establish the root-n consistency of the unpenalized maximum likelihood estimator of the regression vector [7], which is required for handling the penalty term in the penalized likelihood. These conditions also ensure that the log profile likelihood admits a quadratic expansion around the true parameter value [7].
Theorem 1 (root-n consistency). Under conditions (C1) to (C3), if the tuning parameter $\gamma_n$ tends to zero sufficiently fast, then $\|\hat{\beta} - \beta_0\| = O_p(n^{-1/2})$, where $\|\cdot\|$ denotes the Euclidean norm for a given vector.

Theorem 2 (oracle property). Under conditions (C1) to (C3) and appropriate rate conditions on the tuning parameter $\gamma_n$, $\hat{\beta}$ has the following properties:

1. (Sparsity) $\Pr(\hat{\beta}_{(2)} = 0) \to 1$ as $n \to \infty$;

2. (Asymptotic normality) $\sqrt{n}(\hat{\beta}_{(1)} - \beta_{0(1)})$ converges in distribution to a mean-zero normal distribution whose covariance matrix is the inverse of the upper-left $d \times d$ sub-matrix of the efficient Fisher information matrix for β.
Theorem 1 indicates that $\hat{\beta}$ is consistent, and Theorem 2 (i) implies that $\hat{\beta}$ is sparse and has the selection consistency property, that is, $\Pr(\hat{\beta}_{(2)} = 0) \to 1$. Theorem 2 (ii) implies that the estimators of the nonzero regression parameters are semiparametrically efficient. The detailed proofs of Theorems 1 and 2 under the ALASSO penalty are given in Appendix A. For other penalty functions in the considered class, one can prove the above two theorems with analogous techniques, which are omitted in this paper.
5. A Simulation Study
We conducted a simulation study to evaluate the finite-sample performance of the proposed penalized EM algorithm. We first assumed that there exist 10 covariates following the marginal standard normal distribution with a common pairwise correlation. We set the true value of β, denoted by $\beta_0$, to have three nonzero components, taking either large values (large effects) or small values (weak effects), with the remaining components set to zero. The truncation time followed the uniform distribution on $[0, \tau]$ for a fixed τ. The failure time of interest was generated from model (1) with a prespecified baseline cumulative hazard function. Because we considered length-biased sampling, only pairs satisfying $\tilde{T} \ge \tilde{A}$ were kept in the simulated data, denoted as $(T_i, A_i)$.
To construct interval censoring, for subject i, we generated a series of potential examination times following the truncation time $A_i$. Then, $(L_i, R_i]$ was defined as the smallest interval formed by the examination times that brackets $T_i$. On average, we had about 30–44% left-censored observations and 19–30% right-censored ones. We considered some classical penalty functions, including LASSO, ALASSO, SCAD, SELO, SICA, MCP and BAR [17,18,20,21,22,24]. To find the optimal $\gamma$ for each penalty, we considered 20 equally spaced points over an interval $[a, b]$ and selected the one that minimizes the BIC; in particular, b was chosen to guarantee that all the regression parameter estimates were penalized to zero, while a was chosen to ensure that all the covariates were selected. The following results are based on two sample sizes (the larger being n = 400) and 100 replications.
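A minimal sketch of this data-generating scheme is given below. The unit-exponential baseline hazard, τ = 5, the number of examinations and the gap distribution are all assumptions chosen for illustration — the paper's exact specifications were not recoverable from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def gen_length_biased_ic(n, beta, tau=5.0, n_exams=8, gap=0.5):
    """Sketch of the simulation design: uniform truncation times on
    [0, tau], PH failure times with an assumed unit-exponential baseline,
    length-biased acceptance (keep only T >= A), and periodic
    examinations after study entry that bracket T in (L, R]."""
    p = len(beta)
    data = []
    while len(data) < n:
        z = rng.standard_normal(p)
        a = rng.uniform(0.0, tau)                  # truncation time A
        # PH with Lambda_0(t) = t: T is exponential with rate exp(z'beta)
        t = rng.exponential(1.0 / np.exp(z @ beta))
        if t < a:                                  # length-biased sampling:
            continue                               # discard T < A
        exams = a + gap * rng.uniform(0.5, 1.5, n_exams).cumsum()
        l, r = a, np.inf                           # bracket T by the exams
        for u in exams:
            if u < t:
                l = u
            else:
                r = u
                break
        data.append((a, l, r, z))
    return data

sample = gen_length_biased_ic(500, beta=np.array([0.8, -0.8, 0.0]))
right_censored = np.mean([np.isinf(r) for _, _, r, _ in sample])
```

Note that the failure time itself is discarded after the bracket is formed, so the generated records contain only the observable quantities $(A_i, L_i, R_i, Z_i)$.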
To assess the variable selection performance, we calculated the false positive (FP) and true positive (TP) counts, defined as the average number of selected covariates whose true coefficients are zero and the average number of selected covariates whose true coefficients are nonzero, respectively. To measure the estimation accuracy, we reported the median of the mean squared errors (MMSE) and the standard deviation of the mean squared errors (SD), where the mean squared error is defined as $(\hat{\beta} - \beta_0)^\top \Sigma (\hat{\beta} - \beta_0)$ and Σ denotes the population covariance matrix of the covariate vector Z.
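As a small illustration, the helpers below compute this covariance-weighted squared error; the compound-symmetric form of Σ mirrors the stated common pairwise correlation, with the correlation value itself left as a user-supplied assumption since it was not recoverable from the text.

```python
import numpy as np

def compound_sym_sigma(p, rho):
    """Compound-symmetric covariance: unit variances and a common
    pairwise correlation rho (the paper's rho value is assumed)."""
    return rho * np.ones((p, p)) + (1.0 - rho) * np.eye(p)

def mse(beta_hat, beta0, sigma):
    """Covariance-weighted squared error summarized by the MMSE:
    (beta_hat - beta0)' Sigma (beta_hat - beta0)."""
    d = np.asarray(beta_hat, dtype=float) - np.asarray(beta0, dtype=float)
    return float(d @ sigma @ d)
```

The MMSE reported in the tables is then the median of `mse(...)` across the 100 replications, and the SD is the corresponding standard deviation.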
Table 1 and Table 2 present the results obtained by the proposed method under large and weak covariate effects, respectively. In both tables, we also present the results of the oracle estimation and of the analysis method without variable selection. The results in Table 1 and Table 2 show that almost all the penalty functions gave a similar variable selection performance; the only exception is LASSO, which yielded a slightly larger FP. This observation is expected because LASSO often selects more noise variables than the other penalty functions [39]. In addition, one can see from the tables that, except for LASSO, the MMSEs yielded by the other penalty functions were close to those of the oracle estimation and smaller than those of the analysis method without variable selection. As the sample size increased, the variable selection performance and estimation accuracy improved for all the penalty functions.
For comparison, we also include in Table 1 and Table 2 the results obtained by the variable selection method based on the penalized conditional likelihood (PCL). The detailed implementation of the PCL method is given in the Appendix B. Notably, Li et al. [32] also maximized the PCL to conduct variable selection but only considered the ALASSO penalty. It is clear that, compared with the PCL approach, the proposed method yielded smaller MMSEs, implying a more accurate estimation performance. Furthermore, the SDs obtained by the proposed method are smaller than those of the PCL method, because the PCL method ignores the distribution information of the truncation times and thus loses some estimation efficiency.
In this study, we also considered other simulation settings with larger numbers of covariates, up to p = 50. Specifically, we kept three components of $\beta_0$ nonzero, set the remaining components to zero and let the other simulation specifications be the same as above. The simulation results are presented in Table 3 and Table 4 and show similar conclusions as above. In particular, the proposed method with ALASSO, SCAD, SELO, SICA, MCP and BAR yielded much smaller MMSEs than the analysis method without variable selection as p increased. This clearly demonstrates the necessity of conducting variable selection in the presence of a large number of covariates.
6. An Application
6.1. Background and Analysis Methods
We applied the proposed method to a set of real data arising from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial [9,41]. Sponsored by the National Cancer Institute, the PLCO cancer screening trial was initiated in 1993 and recruited participants who had not previously taken part in any other cancer screening trials at ten screening centers nationwide. The recruited participants were aged from 55 to 74. In particular, the participants who were randomly assigned to the screening group received the Prostate-Specific Antigen (PSA) test periodically over 13 years. If abnormally high PSA levels were found, a prostate biopsy was conducted to determine the occurrence status of prostate cancer. In this study, we focused on the prostate cancer screening data in the screening group and aimed at identifying the important risk factors of the development of prostate cancer. This is because prostate cancer in general causes no signs or symptoms in the early stages, but as the disease progresses, it can cause serious complications, such as urination problems and anemia. Therefore, exploring the risk factors of prostate cancer exhibits a pressing need and is also beneficial to conduct early prevention for males. To this end, the failure time of interest was defined as the age at onset of prostate cancer. Because the participants were only examined intermittently, only interval-censored observations could be obtained for the onset of the prostate cancer. In addition, because the study excluded individuals who had already developed prostate cancer at the study recruitment, the age at the onset of prostate cancer suffered from left truncation with the truncation time being the age the individual enrolled in the study.
We considered seven potential risk factors: Race (1 for African American and 0 otherwise), Education (1 for at least college and 0 otherwise), Cancer (1 for having an immediate family member with any PLCO cancer and 0 otherwise), ProsCancer (1 for having an immediate family member with prostate cancer and 0 otherwise), Diabetes (1 for having diabetes and 0 otherwise), Stroke (1 for having had a stroke and 0 otherwise) and Gallblad (1 for having gall bladder stones and 0 otherwise). The sample size was n = 32,897, and the observations were subject to both left and right censoring.
To achieve variable selection, we implemented the proposed method with LASSO, ALASSO, SCAD, SELO, SICA, MCP and BAR, as in the simulation study. To select the optimal $\gamma$ for each penalty, we utilized a two-step method. In the first step, we examined a coarse range of points to roughly identify a narrower interval containing the optimal tuning parameter; here, as in the simulation study, the lower endpoint was selected to ensure that all the covariates were selected, while the upper endpoint was chosen to ensure that all the regression parameter estimates were penalized to zero. Next, we considered 20 equally spaced points within the narrower interval and selected the optimal $\gamma$ that minimizes the BIC. In addition, we employed the BIC to select the best penalty among all the penalties considered for the data, and the PH model with SCAD and MCP yielded the smallest BIC value. To calculate the standard errors, we used the nonparametric bootstrap with 100 bootstrap samples. For comparison, we also considered the variable selection method based on the penalized conditional likelihood (PCL).
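The two-step tuning search can be sketched as follows; `fit_bic` is a hypothetical callable mapping a candidate tuning value to its BIC (in practice, one full run of the penalized EM algorithm), and the coarse-grid size is an assumption.

```python
import numpy as np

def two_step_gamma(fit_bic, a, b, n_coarse=10, n_fine=20):
    """Two-step tuning-parameter search described above: a coarse scan
    over [a, b] locates a promising neighborhood, then a fine grid of
    `n_fine` points inside it is searched for the smallest BIC."""
    coarse = np.linspace(a, b, n_coarse)
    scores = [fit_bic(g) for g in coarse]
    k = int(np.argmin(scores))
    lo = coarse[max(k - 1, 0)]              # bracket the coarse winner
    hi = coarse[min(k + 1, n_coarse - 1)]
    fine = np.linspace(lo, hi, n_fine)
    return min(fine, key=fit_bic)
```

The coarse pass keeps the number of expensive model fits manageable, which matters here since each BIC evaluation requires running the penalized EM algorithm to convergence on more than 30,000 subjects.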
6.2. Results
We summarize in Table 5 the results obtained by the proposed and PCL methods. The results indicate that, except for LASSO, the proposed method with the other penalties identified Race, ProsCancer and Diabetes as significant risk factors for prostate cancer. Specifically, being African American and having an immediate family member with prostate cancer increased the risk of developing prostate cancer, while having diabetes was associated with a lower risk of developing prostate cancer. These findings are in accordance with the conclusions of Meister [42], Pierce [43] and others. In addition, the results in Table 5 show that the PCL method yielded relatively larger standard error estimates than the proposed method. This finding again demonstrates the efficiency gain of the proposed method from taking into account the distribution information of the truncation times in the inference procedure.
7. Discussion and Conclusions
In this article, we considered length-biased and interval-censored data and developed a penalized analysis procedure to choose important variables among a large number of covariates in the PH model. The main contribution of this work is the development of a novel penalized EM algorithm via two-stage data augmentation, which greatly simplifies the penalized nonparametric maximum likelihood estimation. Specifically, by introducing pseudo-truncated data and Poisson random variables, the possibly high-dimensional parameters involved in $\Lambda_0$ have explicit solutions, making the proposed algorithm simple and computationally stable. In contrast to the work of Li et al. [32], which only involved the ALASSO penalty, we proposed to jointly utilize the local linear approximation and the modified shooting algorithm, yielding sparse estimators of the regression parameters under various popular penalty functions. Thus, the proposed method offers flexible options for the data analyst. The numerical results from the simulation study showed the satisfactory performance and desirable advantages of the proposed method in finite samples. Moreover, by legitimately taking into account the distribution information of the truncation times, the proposed method is more efficient than the traditional penalized conditional likelihood approach (e.g., Li et al.'s method [32]).
Notably, the findings of our prostate cancer data analysis may have certain public health implications. Specifically, African Americans and individuals who have immediate family members with prostate cancer constitute specific population groups that may benefit from early prevention efforts (e.g., cancer screening) to reduce the risk of developing prostate cancer.
8. Suggestions for Future Work
Notably, the proposed method only investigated variable selection in the setting where p is smaller than n. In some practical applications, such as gene expression studies, p is usually much larger than n, and future efforts will be devoted to extending the proposed method to handle this high-dimensional case. In addition, generalizations of the proposed method to other regression models (e.g., transformation and additive hazards models [7,44]) and to multivariate interval censoring [45] warrant further research.
Methodology, F.F.; Writing—review & editing, G.C.; Supervision, J.S. All authors have read and agreed to the published version of the manuscript.
The data used in the paper are not publicly available but can be requested from
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
EM | expectation-maximization |
LASSO | the least absolute shrinkage and selection operator penalty |
ALASSO | the adaptive LASSO penalty |
SCAD | the smoothly clipped absolute deviation penalty |
SELO | the seamless-L0 penalty |
SICA | the smooth integration of counting and absolute deviation penalty |
MCP | the minimax concave penalty |
BAR | the broken adaptive ridge penalty |
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Simulation results for the variable selection and estimation accuracy with large effects.
Method | Penalty | TP | FP | MMSE (SD) | TP | FP | MMSE (SD)
---|---|---|---|---|---|---|---
Proposed method | LASSO | 3 | 1.27 | 0.163 (0.107) | 3 | 1.15 | 0.092 (0.060) |
ALASSO | 3 | 0.12 | 0.051 (0.058) | 3 | 0.11 | 0.025 (0.022) | |
SCAD | 3 | 0.09 | 0.025 (0.046) | 3 | 0.07 | 0.012 (0.014) | |
SELO | 3 | 0.14 | 0.030 (0.042) | 3 | 0.07 | 0.013 (0.016) | |
SICA | 3 | 0.16 | 0.030 (0.042) | 3 | 0.07 | 0.013 (0.015) | |
MCP | 3 | 0.16 | 0.025 (0.042) | 3 | 0.07 | 0.012 (0.015) | |
BAR | 3 | 0.13 | 0.032 (0.040) | 3 | 0.11 | 0.014 (0.016) | |
Oracle | - | - | 0.024 (0.041) | - | - | 0.011 (0.014) | |
Without VS | - | - | 0.057 (0.055) | - | - | 0.026 (0.020) | |
PCL method | LASSO | 3 | 1.42 | 0.163 (0.162) | 3 | 1.29 | 0.089 (0.074) |
ALASSO | 3 | 0.20 | 0.081 (0.108) | 3 | 0.13 | 0.033 (0.041) | |
SCAD | 3 | 0.12 | 0.056 (0.116) | 3 | 0.08 | 0.025 (0.041) | |
SELO | 3 | 0.18 | 0.065 (0.087) | 3 | 0.10 | 0.021 (0.038) | |
SICA | 3 | 0.18 | 0.066 (0.087) | 3 | 0.10 | 0.021 (0.037) | |
MCP | 3 | 0.13 | 0.054 (0.116) | 3 | 0.10 | 0.025 (0.043) | |
BAR | 3 | 0.19 | 0.065 (0.089) | 3 | 0.13 | 0.023 (0.035) |
“Proposed method” denotes the proposed penalized EM algorithm; “Without VS” denotes the analysis method without conducting variable selection; and “PCL method” denotes the variable selection method based on the penalized conditional likelihood.
Simulation results for the variable selection and estimation accuracy with weak covariate effects.
Method | Penalty | TP | FP | MMSE (SD) | TP | FP | MMSE (SD)
---|---|---|---|---|---|---|---
Proposed method | LASSO | 3 | 0.80 | 0.047 (0.040) | 3 | 0.73 | 0.024 (0.024) |
ALASSO | 3 | 0.25 | 0.027 (0.024) | 3 | 0.12 | 0.009 (0.010) | |
SCAD | 3 | 0.36 | 0.030 (0.028) | 3 | 0.17 | 0.010 (0.011) | |
SELO | 3 | 0.23 | 0.017 (0.019) | 3 | 0.10 | 0.008 (0.009) | |
SICA | 2.99 | 0.22 | 0.017 (0.021) | 3 | 0.07 | 0.008 (0.008) | |
MCP | 2.98 | 0.25 | 0.018 (0.025) | 3 | 0.10 | 0.007 (0.009) | |
BAR | 3 | 0.22 | 0.015 (0.018) | 3 | 0.12 | 0.008 (0.009) | |
Oracle | - | - | 0.013 (0.014) | - | - | 0.007 (0.007) | |
Without VS | - | - | 0.043 (0.031) | - | - | 0.020 (0.013) | |
PCL method | LASSO | 2.93 | 1.18 | 0.065 (0.075) | 3 | 0.72 | 0.039 (0.040) |
ALASSO | 2.85 | 0.50 | 0.060 (0.062) | 2.98 | 0.05 | 0.015 (0.028) | |
SCAD | 2.86 | 0.55 | 0.076 (0.063) | 2.99 | 0.15 | 0.017 (0.026) | |
SELO | 2.87 | 0.47 | 0.053 (0.060) | 2.98 | 0.06 | 0.018 (0.028) | |
SICA | 2.88 | 0.48 | 0.053 (0.059) | 2.98 | 0.07 | 0.015 (0.027) | |
MCP | 2.84 | 0.43 | 0.072 (0.068) | 2.98 | 0.05 | 0.015 (0.028) | |
BAR | 2.87 | 0.37 | 0.052 (0.059) | 2.97 | 0.08 | 0.014 (0.028) |
“Proposed method” denotes the proposed penalized EM algorithm; “Without VS” denotes the analysis method without conducting variable selection; and “PCL method” denotes the variable selection method based on the penalized conditional likelihood.
Simulation results for the variable selection and estimation accuracy with a larger number of covariates.
| Method | Penalty | TP | FP | MMSE (SD) | TP | FP | MMSE (SD) |
|---|---|---|---|---|---|---|---|
| Proposed method | LASSO | 3 | 1.59 | 0.307 (0.143) | 3 | 1.55 | 0.184 (0.082) |
| | ALASSO | 3 | 0.45 | 0.085 (0.075) | 3 | 0.13 | 0.028 (0.030) |
| | SCAD | 3 | 0.26 | 0.041 (0.073) | 3 | 0.15 | 0.011 (0.017) |
| | SELO | 3 | 0.35 | 0.046 (0.051) | 3 | 0.12 | 0.014 (0.019) |
| | SICA | 3 | 0.31 | 0.045 (0.051) | 3 | 0.12 | 0.015 (0.019) |
| | MCP | 3 | 0.14 | 0.030 (0.043) | 3 | 0.13 | 0.012 (0.017) |
| | BAR | 3 | 0.46 | 0.042 (0.044) | 3 | 0.23 | 0.015 (0.018) |
| Oracle | - | - | - | 0.026 (0.035) | - | - | 0.010 (0.017) |
| Without VS | - | - | - | 0.216 (0.148) | - | - | 0.087 (0.043) |
| PCL method | LASSO | 3 | 1.87 | 0.362 (0.223) | 3 | 1.33 | 0.220 (0.129) |
| | ALASSO | 3 | 0.85 | 0.114 (0.124) | 3 | 0.24 | 0.027 (0.046) |
| | SCAD | 2.97 | 0.41 | 0.080 (0.134) | 3 | 0.14 | 0.028 (0.038) |
| | SELO | 2.99 | 0.80 | 0.094 (0.106) | 3 | 0.20 | 0.025 (0.035) |
| | SICA | 3 | 0.50 | 0.091 (0.112) | 3 | 0.18 | 0.024 (0.035) |
| | MCP | 2.98 | 0.44 | 0.073 (0.146) | 3 | 0.15 | 0.021 (0.044) |
| | BAR | 2.99 | 0.77 | 0.100 (0.136) | 3 | 0.37 | 0.026 (0.037) |
“Proposed method” denotes the proposed penalized EM algorithm; “Without VS” denotes the analysis method without conducting variable selection; and “PCL method” denotes the variable selection method based on the penalized conditional likelihood.
Simulation results for the variable selection and estimation accuracy with
| Method | Penalty | TP | FP | MMSE (SD) | TP | FP | MMSE (SD) |
|---|---|---|---|---|---|---|---|
| Proposed method | LASSO | 3 | 1.54 | 0.394 (0.117) | 3 | 1.33 | 0.269 (0.091) |
| | ALASSO | 3 | 0.45 | 0.130 (0.076) | 3 | 0.21 | 0.037 (0.033) |
| | SCAD | 3 | 0.29 | 0.038 (0.061) | 3 | 0.03 | 0.014 (0.023) |
| | SELO | 3 | 0.27 | 0.049 (0.041) | 3 | 0.14 | 0.018 (0.022) |
| | SICA | 3 | 0.27 | 0.048 (0.047) | 3 | 0.20 | 0.016 (0.020) |
| | MCP | 3 | 0.14 | 0.028 (0.046) | 3 | 0.06 | 0.014 (0.017) |
| | BAR | 3 | 0.58 | 0.045 (0.044) | 3 | 0.37 | 0.017 (0.020) |
| Oracle | - | - | - | 0.021 (0.030) | - | - | 0.011 (0.016) |
| Without VS | - | - | - | 0.637 (0.453) | - | - | 0.169 (0.070) |
| PCL method | LASSO | 3 | 1.74 | 0.469 (0.263) | 3 | 1.58 | 0.297 (0.139) |
| | ALASSO | 2.99 | 0.82 | 0.155 (0.160) | 3 | 0.38 | 0.037 (0.045) |
| | SCAD | 2.99 | 0.61 | 0.101 (0.152) | 3 | 0.20 | 0.024 (0.035) |
| | SELO | 2.99 | 0.70 | 0.088 (0.105) | 3 | 0.54 | 0.023 (0.051) |
| | SICA | 2.99 | 0.65 | 0.089 (0.103) | 3 | 0.57 | 0.022 (0.051) |
| | MCP | 2.99 | 0.56 | 0.093 (0.157) | 3 | 0.29 | 0.025 (0.041) |
| | BAR | 3 | 1.02 | 0.097 (0.164) | 3 | 0.62 | 0.036 (0.042) |
“Proposed method” denotes the proposed penalized EM algorithm; “Without VS” denotes the analysis method without conducting variable selection; and “PCL method” denotes the variable selection method based on the penalized conditional likelihood.
Analysis results of the prostate cancer screening data.
| Method | Covariate | LASSO | ALASSO | SCAD | SELO | SICA | MCP | BAR | Without VS |
|---|---|---|---|---|---|---|---|---|---|
| Proposed method | Race | 0.332 (0.084) | 0.364 * (0.088) | 0.444 * (0.072) | 0.408 * (0.082) | 0.408 * (0.083) | 0.444 * (0.072) | 0.412 * (0.081) | 0.458 * (0.064) |
| | Education | 0.047 (0.027) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0.070 * (0.031) |
| | Cancer | 0.032 (0.029) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0.047 (0.032) |
| | ProsCancer | 0.346 * (0.053) | 0.377 * (0.058) | 0.421 * (0.049) | 0.397 * (0.053) | 0.398 * (0.056) | 0.421 * (0.049) | 0.405 * (0.052) | 0.394 * (0.049) |
| | Diabetes | −0.310 * (0.060) | −0.336 * (0.068) | −0.416 * (0.059) | −0.373 * (0.065) | −0.374 * (0.068) | −0.416 * (0.059) | −0.384 * (0.062) | −0.402 * (0.064) |
| | Stroke | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | −0.244 * (0.109) |
| | Gallblad | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | −0.064 (0.058) |
| PCL method | Race | 0.307 * (0.087) | 0.399 * (0.114) | 0.420 * (0.102) | 0.385 * (0.102) | 0.394 * (0.105) | 0.420 * (0.102) | 0.377 * (0.114) | 0.430 * (0.067) |
| | Education | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | −0.005 (0.032) |
| | Cancer | 0.075 * (0.033) | 0.065 (0.049) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0.089 * (0.033) |
| | ProsCancer | 0.360 * (0.062) | 0.407 * (0.062) | 0.453 * (0.062) | 0.436 * (0.063) | 0.441 * (0.064) | 0.453 * (0.062) | 0.437 * (0.063) | 0.407 * (0.051) |
| | Diabetes | −0.216 * (0.073) | −0.286 * (0.078) | −0.316 * (0.077) | −0.266 * (0.089) | −0.279 * (0.093) | −0.316 * (0.077) | −0.266 * (0.087) | −0.320 * (0.065) |
| | Stroke | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | −0.011 (0.109) |
| | Gallblad | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0 (-) | 0.090 (0.060) |
“Proposed method” denotes the proposed penalized EM algorithm; “PCL method” denotes the variable selection method based on the penalized conditional likelihood; “Without VS” denotes the analysis method without conducting variable selection; and “*” indicates that the covariate effect is significant at the level of 0.05.
Appendix A. Proofs of The Asymptotic Results
Appendix A.1. Proof of Theorem 1
The penalized log-likelihood function can be written as
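The displayed equation did not survive extraction. As a generic illustration only (the notation here is assumed, not taken verbatim from the paper), a penalized log-likelihood with penalty function $p_{\lambda}(\cdot)$ typically takes the form

```latex
\ell_{p}(\boldsymbol{\beta}, \Lambda)
  \;=\; \ell_{n}(\boldsymbol{\beta}, \Lambda)
  \;-\; n \sum_{j=1}^{p} p_{\lambda}\bigl(|\beta_{j}|\bigr),
```

where $\ell_{n}(\boldsymbol{\beta}, \Lambda)$ denotes the observed-data log-likelihood under the PH model, $\Lambda$ is the cumulative baseline hazard, and $\lambda$ is a tuning parameter.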
According to Proposition 3.1 and the discussion in Section 4.4 of Huang and Wellner [47],
To prove (A3), we have
Appendix A.2. Proof of Theorem 2
(i) According to Theorem 1, we have
For any
(ii) We next prove the asymptotic normality of
Let
Because
On the other hand, because
Appendix B. The Penalized Conditional Likelihood Method
In this section, we provide the EM algorithm for implementing the maximum conditional likelihood approach. The conditional likelihood function can be written as
To present the EM algorithm, for the ith subject, we introduce a set of new independent latent variables
In the E-step of the EM algorithm, we need to determine the conditional expectation of the log-likelihood function
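As a small illustration of the kind of conditional expectation the E-step requires: in Poisson-based data augmentation schemes for interval-censored PH models, a latent count $W \sim \text{Poisson}(\mu)$ with $\mu = \Lambda(t)\exp(\boldsymbol{\beta}^{\top}\mathbf{x})$ is often observed only through the event $\{W \geq 1\}$, and its conditional mean has the closed form $E[W \mid W \geq 1] = \mu / (1 - e^{-\mu})$. The sketch below (function names and the specific rate are ours, not the paper's) computes this E-step weight and verifies it by simulation:

```python
import math
import random

def expected_positive_poisson(mu: float) -> float:
    """Conditional mean E[W | W >= 1] for a latent W ~ Poisson(mu).

    P(W >= 1) = 1 - exp(-mu), and E[W * 1{W >= 1}] = E[W] = mu because
    the W = 0 term contributes nothing; the ratio gives the E-step weight.
    """
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return mu / (1.0 - math.exp(-mu))

def mc_check(mu: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[W | W >= 1] to sanity-check the closed form."""
    rng = random.Random(seed)
    total = kept = 0
    for _ in range(n):
        # Knuth's method for a Poisson(mu) draw (adequate for small mu).
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        if k >= 1:  # keep only draws satisfying the conditioning event
            total += k
            kept += 1
    return total / kept
```

For an assumed rate such as mu = 1.3, the Monte Carlo estimate agrees with the closed form to within simulation error; as mu grows, the conditioning event becomes almost sure and the weight approaches mu itself.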
References
1. Sun, J. The Statistical Analysis of Interval-Censored Failure Time Data; Springer: New York, NY, USA, 2006.
2. Huang, J. Efficient estimation for the proportional hazards model with interval censoring. Ann. Stat.; 1996; 24, pp. 540-568. [DOI: https://dx.doi.org/10.1214/aos/1032894452]
3. Shen, X. Proportional odds regression and sieve maximum likelihood estimation. Biometrika; 1998; 85, pp. 165-177. [DOI: https://dx.doi.org/10.1093/biomet/85.1.165]
4. Zeng, D.; Cai, J.; Shen, Y. Semiparametric additive risks model for interval-censored data. Stat. Sin.; 2006; 16, pp. 287-302.
5. Zhang, Y.; Hua, L.; Huang, J. A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scand. J. Stat.; 2010; 37, pp. 338-354. [DOI: https://dx.doi.org/10.1111/j.1467-9469.2009.00680.x]
6. Wang, L.; McMahan, C.S.; Hudgens, M.G.; Qureshi, Z.P. A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics; 2016; 72, pp. 222-231. [DOI: https://dx.doi.org/10.1111/biom.12389] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26393917]
7. Zeng, D.; Mao, L.; Lin, D.Y. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika; 2016; 103, pp. 253-271. [DOI: https://dx.doi.org/10.1093/biomet/asw013]
8. Zhou, Q.; Hu, T.; Sun, J. A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. J. Am. Stat. Assoc.; 2017; 112, pp. 664-672. [DOI: https://dx.doi.org/10.1080/01621459.2016.1158113]
9. Prorok, P.C.; Andriole, G.L.; Bresalier, R.S.; Buys, S.S.; Chia, D.; Crawford, E.D.; Fogel, R.; Gelmann, E.P.; Gilbert, F.; Gohagan, J.K. Design of the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control. Clin. Trials; 2000; 21, pp. 273S-309S. [DOI: https://dx.doi.org/10.1016/S0197-2456(00)00098-2]
10. Wang, M.C. Nonparametric estimation from cross-sectional survival data. J. Am. Stat. Assoc.; 1991; 86, pp. 130-143. [DOI: https://dx.doi.org/10.1080/01621459.1991.10475011]
11. Shen, Y.; Ning, J.; Qin, J. Analyzing length-biased data with semiparametric transformation and accelerated failure time models. J. Am. Stat. Assoc.; 2009; 104, pp. 1192-1202. [DOI: https://dx.doi.org/10.1198/jasa.2009.tm08614]
12. Ning, J.; Qin, J.; Shen, Y. Semiparametric accelerated failure time model for length-biased data with application to dementia study. Stat. Sin.; 2014; 24, pp. 313-333. [DOI: https://dx.doi.org/10.5705/ss.2011.197] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24478570]
13. Qin, J.; Shen, Y. Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics; 2010; 66, pp. 382-392. [DOI: https://dx.doi.org/10.1111/j.1541-0420.2009.01287.x] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19522872]
14. Qin, J.; Ning, J.; Liu, H.; Shen, Y. Maximum likelihood estimations and EM algorithms with length-biased data. J. Am. Stat. Assoc.; 2011; 106, pp. 1434-1449. [DOI: https://dx.doi.org/10.1198/jasa.2011.tm10156] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22323840]
15. Gao, F.; Chan, K.C.G. Semiparametric regression analysis of length-biased interval-censored data. Biometrics; 2019; 75, pp. 121-132. [DOI: https://dx.doi.org/10.1111/biom.12970]
16. Shen, P.S.; Peng, Y.; Chen, H.J.; Chen, C.M. Maximum likelihood estimation for length-biased and interval-censored data with a nonsusceptible fraction. Lifetime Data Anal.; 2022; 28, pp. 68-88. [DOI: https://dx.doi.org/10.1007/s10985-021-09536-2]
17. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.); 1996; 58, pp. 267-288. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1996.tb02080.x]
18. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc.; 2001; 96, pp. 1348-1360. [DOI: https://dx.doi.org/10.1198/016214501753382273]
19. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc.; 2006; 101, pp. 1418-1429. [DOI: https://dx.doi.org/10.1198/016214506000000735]
20. Lv, J.; Fan, Y. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat.; 2009; 37, pp. 3498-3528. [DOI: https://dx.doi.org/10.1214/09-AOS683]
21. Dicker, L.; Huang, B.; Lin, X. Variable selection and estimation with the seamless-L0 penalty. Stat. Sin.; 2013; 23, pp. 929-962.
22. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat.; 2010; 38, pp. 894-942. [DOI: https://dx.doi.org/10.1214/09-AOS729]
23. Liu, Z.; Li, G. Efficient regularized regression with L0 penalty for variable selection and network construction. Comput. Math. Methods Med.; 2016; 2016, 3456153. [DOI: https://dx.doi.org/10.1155/2016/3456153]
24. Dai, L.; Chen, K.; Sun, Z.; Liu, Z.; Li, G. Broken adaptive ridge regression and its asymptotic properties. J. Multivar. Anal.; 2018; 168, pp. 334-351. [DOI: https://dx.doi.org/10.1016/j.jmva.2018.08.007] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30911202]
25. Fan, J.; Li, R.; Zhang, C.H.; Zou, H. Statistical Foundations of Data Science; Chapman and Hall/CRC: New York, NY, USA, 2020.
26. Garavand, A.; Salehnasab, C.; Behmanesh, A.; Aslani, N.; Zadeh, A.; Ghaderzadeh, M. Efficient Model for Coronary Artery Disease Diagnosis: A Comparative Study of Several Machine Learning Algorithms. J. Healthc. Eng.; 2022; 2022, 5359540. [DOI: https://dx.doi.org/10.1155/2022/5359540] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36304749]
27. Hosseini, A.; Eshraghi, M.A.; Taami, T.; Sadeghsalehi, H.; Hoseinzadeh, Z.; Ghaderzadeh, M.; Rafiee, M. A mobile application based on efficient lightweight CNN model for classification of B-ALL cancer from non-cancerous cells: A design and implementation study. Inform. Med. Unlocked; 2023; 39, 101244. [DOI: https://dx.doi.org/10.1016/j.imu.2023.101244]
28. Garavand, A.; Behmanesh, A.; Aslani, N.; Sadeghsalehi, H.; Ghaderzadeh, M. Towards Diagnostic Aided Systems in Coronary Artery Disease Detection: A Comprehensive Multiview Survey of the State of the Art. Int. J. Intell. Syst.; 2023; 2023, 6442756. [DOI: https://dx.doi.org/10.1155/2023/6442756]
29. Ghaderzadeh, M.; Aria, M. Management of Covid-19 Detection Using Artificial Intelligence in 2020 Pandemic. Proceedings of the ICMHI ’21: 5th International Conference on Medical and Health Informatics; Kyoto, Japan, 14–16 May 2021; pp. 32-38.
30. Chen, L.P. Variable selection and estimation for the additive hazards model subject to left-truncation, right-censoring and measurement error in covariates. J. Stat. Comput. Simul.; 2020; 90, pp. 3261-3300. [DOI: https://dx.doi.org/10.1080/00949655.2020.1800705]
31. He, D.; Zhou, Y.; Zou, H. High-dimensional variable selection with right-censored length-biased data. Stat. Sin.; 2020; 30, pp. 193-215. [DOI: https://dx.doi.org/10.5705/ss.202018.0089]
32. Li, C.; Pak, D.; Todem, D. Adaptive lasso for the Cox regression with interval censored and possibly left truncated data. Stat. Methods Med. Res.; 2020; 29, pp. 1243-1255. [DOI: https://dx.doi.org/10.1177/0962280219856238]
33. Withana Gamage, P.; McMahan, C.; Wang, L. Variable selection in semiparametric nonmixture cure model with interval-censored failure time data. Stat. Med.; 2019; 38, pp. 3026-3039.
34. Li, S.; Peng, L. Instrumental Variable Estimation of Complier Causal Treatment Effect with Interval-Censored Data. Biometrics; 2023; 79, pp. 253-263. [DOI: https://dx.doi.org/10.1111/biom.13565]
35. Withana Gamage, P.; McMahan, C.; Wang, L. A flexible parametric approach for analyzing arbitrarily censored data that are potentially subject to left truncation under the proportional hazards model. Lifetime Data Anal.; 2023; 29, pp. 188-212. [DOI: https://dx.doi.org/10.1007/s10985-022-09579-z]
36. Huang, C.Y.; Qin, J. Semiparametric estimation for the additive hazards model with left-truncated and right-censored data. Biometrika; 2013; 100, pp. 877-888. [DOI: https://dx.doi.org/10.1093/biomet/ast039]
37. Turnbull, B.W. The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Stat. Soc. Ser. B (Methodol.); 1976; 38, pp. 290-295. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1976.tb01597.x]
38. Zhang, H.H.; Lu, W. Adaptive Lasso for Cox’s proportional hazards model. Biometrika; 2007; 94, pp. 691-703. [DOI: https://dx.doi.org/10.1093/biomet/asm037]
39. Li, S.; Wu, Q.; Sun, J. Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer’s disease. Stat. Methods Med. Res.; 2020; 29, pp. 2151-2166. [DOI: https://dx.doi.org/10.1177/0962280219884720] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31718478]
40. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat.; 2008; 36, pp. 1509-1533.
41. Andriole, G.L.; Crawford, E.D.; Grubb, R.L.; Buys, S.S.; Chia, D.; Church, T.R.; Fouad, M.N.; Isaacs, C.; Prorok, P. Prostate cancer screening in the randomized prostate, lung, colorectal, and ovarian cancer screening trial: Mortality results after 13 years of follow-up. J. Natl. Cancer Inst.; 2012; 104, pp. 125-132. [DOI: https://dx.doi.org/10.1093/jnci/djr500] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22228146]
42. Meister, K. Risk Factors for Prostate Cancer; American Council on Science and Health: New York, NY, USA, 2002.
43. Pierce, B.L. Why are diabetics at reduced risk for prostate cancer? A review of the epidemiologic evidence. Urol. Oncol. Semin. Orig. Investig.; 2012; 30, pp. 735-743. [DOI: https://dx.doi.org/10.1016/j.urolonc.2012.07.008]
44. Lu, T.; Li, S.; Sun, L. Combined estimating equation approaches for the additive hazards model with left-truncated and interval-censored data. Lifetime Data Anal.; 2023; 29, pp. 672-697. [DOI: https://dx.doi.org/10.1007/s10985-023-09596-6]
45. Sun, L.; Li, S.; Wang, L.; Song, X.; Sui, X. Simultaneous variable selection in regression analysis of multivariate interval-censored data. Biometrics; 2022; 78, pp. 1402-1413. [DOI: https://dx.doi.org/10.1111/biom.13548] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34407218]
46. Murphy, S.A.; Van Der Vaart, A.W. On profile likelihood. J. Am. Stat. Assoc.; 2000; 95, pp. 449-465. [DOI: https://dx.doi.org/10.1080/01621459.2000.10474219]
47. Huang, J.; Wellner, J.A. Interval censored survival data: A review of recent progress. Proceedings of the First Seattle Symposium in Biostatistics; Lin, D.Y.; Fleming, T.R. Springer: New York, NY, USA, 1997; pp. 123-169.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Length-biased failure time data arise frequently in biomedical fields, including clinical trials, epidemiological cohort studies and genome-wide association studies, and their analysis has attracted a surge of interest. In practical applications, because one may collect a large number of candidate covariates for the failure event of interest, variable selection becomes a useful tool for identifying the important risk factors and enhancing estimation accuracy. In this paper, we consider Cox’s proportional hazards model and develop a penalized variable selection technique with various popular penalty functions for length-biased data in which the failure event of interest suffers from interval censoring. Specifically, a computationally stable and reliable penalized expectation-maximization (EM) algorithm based on two-stage data augmentation is developed to overcome the challenge of maximizing the intractable penalized likelihood. We establish the oracle property of the proposed method and present simulation results suggesting that it outperforms the traditional variable selection method based on the conditional likelihood. The proposed method is then applied to real data from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial. The analysis shows that being African American and having immediate family members with prostate cancer significantly increase the risk of developing prostate cancer, while having diabetes is associated with a significantly lower risk.
Details
1 School of Mathematics, Jilin University, Changchun 130012, China;
2 Guangzhou Institute of International Finance, Guangzhou University, Guangzhou 510006, China
3 Department of Statistics, University of Missouri, Columbia, MO 65211, USA;