1. Introduction
Biomedical research often involves multivariate survival data, such as cancer patients facing local recurrence or repeated hospitalizations in the context of chronic disease management. A key characteristic of these data is that the survival times of the same individual are correlated. Because the dependence structure among survival times is complex, theoretical development has been relatively slow, and researchers have mainly focused on modeling the marginal distributions of survival times (see Liang [1], Lin [2], and Spiekerman [3]), leaving the dependence unspecified.
The most widely used approach is the WLW method [4]. Suppose that there is a random sample of n subjects. Let $T_{ik}$ be the $k$th survival time of the $i$th subject, where $i = 1, \dots, n$ and $k = 1, \dots, K$. In the WLW method, the marginal hazard function of $T_{ik}$ is assumed to take the form
$$\lambda_{ik}(t) = \lambda_{0k}(t)\exp\{\beta_{0k}^{\top} Z_{ik}(t)\},$$
where $Z_{ik}(t)$ is a $p$-dimensional, possibly time-dependent covariate vector, $\lambda_{0k}(t)$ is the unspecified baseline hazard function, and $\beta_{0k}$ represents the true values of the unknown regression coefficients. Let $C_{ik}$ denote the censoring time, and let $X_{ik} = \min(T_{ik}, C_{ik})$ and $\Delta_{ik} = I(T_{ik} \le C_{ik})$. Assume that $T_{ik}$ and $C_{ik}$ are independent given the covariates $Z_{ik}$. Then, the marginal partial likelihood ([5,6]) for the $k$th event type is
$$L_k(\beta) = \prod_{i=1}^{n}\left[\frac{\exp\{\beta^{\top} Z_{ik}(X_{ik})\}}{\sum_{j \in R_k(X_{ik})} \exp\{\beta^{\top} Z_{jk}(X_{ik})\}}\right]^{\Delta_{ik}},$$
and the corresponding score function is
$$U_k(\beta) = \sum_{i=1}^{n} \Delta_{ik}\left\{ Z_{ik}(X_{ik}) - \bar{Z}_k(\beta, X_{ik}) \right\}, \qquad (1)$$
where $R_k(t) = \{j : X_{jk} \ge t\}$ is the risk set at $t$ and $\bar{Z}_k(\beta, t) = \sum_{j \in R_k(t)} Z_{jk}(t)\, e^{\beta^{\top} Z_{jk}(t)} \big/ \sum_{j \in R_k(t)} e^{\beta^{\top} Z_{jk}(t)}$. Thus, we have $K$ sets of estimating equations as follows:
$$U_k(\beta_k) = 0, \qquad k = 1, \dots, K. \qquad (2)$$
We can obtain the estimators $\hat{\beta}_1, \dots, \hat{\beta}_K$. Let $\hat{\beta} = (\hat{\beta}_1^{\top}, \dots, \hat{\beta}_K^{\top})^{\top}$, which is known as the WLW estimator.
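In practice, each of the K marginal score equations in (2) can be solved separately by Newton's method applied to the Cox partial likelihood. The following NumPy sketch handles one event type, assuming no tied event times and time-fixed covariates; the function names are illustrative, not from the paper:

```python
import numpy as np

def cox_score_info(beta, X, time, status):
    """Score U(beta) and observed information for the Cox partial
    likelihood (continuous times, no ties, time-fixed covariates)."""
    order = np.argsort(-time)            # process in decreasing time order
    X, status = X[order], status[order]
    w = np.exp(X @ beta)
    S0 = np.cumsum(w)                    # risk-set sums S^(0)(beta, t)
    S1 = np.cumsum(w[:, None] * X, axis=0)                        # S^(1)
    S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    Zbar = S1 / S0[:, None]              # risk-set covariate average
    U = ((X - Zbar) * status[:, None]).sum(axis=0)
    V = (status[:, None, None] * (S2 / S0[:, None, None]
         - Zbar[:, :, None] * Zbar[:, None, :])).sum(axis=0)
    return U, V

def fit_marginal_cox(X, time, status, n_iter=25):
    """Solve U_k(beta) = 0 by Newton-Raphson, starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        U, V = cox_score_info(beta, X, time, status)
        beta = beta + np.linalg.solve(V, U)   # Newton step
    return beta
```

Applying `fit_marginal_cox` to each of the K event types and stacking the K solutions gives a working-independence (WLW-type) estimator.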
Cui [7] proposed a new method to improve the efficiency of the WLW method, called the partition method or partition-estimating equation in this paper. The main idea of the method is to partition the score function into small blocks. Cui further explored the situation where the number of blocks increases with the sample size and established the asymptotic normality of the resulting estimators. The method describes and exploits the dependency information among survival times, and simulation results showed that it outperforms the WLW method.
In practice, it is often difficult for investigators to identify significant covariates when the number of covariates is large, and variable selection studies increasingly involve the analysis of survival data with high-dimensional covariates to address this difficulty. Tibshirani [8] applied the LASSO penalty to the Cox model; Zou [9] proposed the adaptive LASSO, which Zhang [10] studied in the Cox model; and Zhang [11] proposed the minimax concave penalty. Several studies have focused on variable selection for multivariate survival data. For example, Cai [12] proposed a variable selection method allowing a growing number of regression coefficients; Liu [13] proposed a multivariate varying-coefficient hazard model; and Sun [14] developed a variable selection technique for multivariate interval-censored data.
Fan and Li [15] proposed a new type of penalty function called the smoothly clipped absolute deviation (SCAD) penalty, which combines characteristics of the LASSO and least squares. It shrinks the model's coefficients through a penalty function such that some are compressed exactly to 0, thereby achieving variable selection, while the larger coefficients are estimated in an asymptotically unbiased way. Moreover, Fan and Li established the oracle property and later introduced the SCAD penalty into the Cox model [16].
In this paper, we aim to further improve the partition method to propose a new variable selection method for multivariate survival data. Based on the partition method, we make better use of the dependency information among survival times compared to the WLW method. Moreover, we directly achieve the purpose of variable selection using estimating equations. We construct our method with the SCAD penalty function and prove that the obtained estimators possess the oracle property. Numerical studies show that the proposed method performs well.
The rest of this paper is organized as follows. In Section 2, we present the notation and assumptions. In Section 3, we introduce our method and establish its asymptotic and oracle properties. We address implementation issues in Section 4, while simulations and an application of our method to real data are given in Section 5 and Section 6. The proofs are collected in Appendix A.
2. Notation and Assumptions
The notation is the same as in Section 1. To simplify it further, let
(3)
which is a martingale with respect to the -filtration . For , let
where E denotes expectation. For a column vector , , , and . The preliminary estimators of the baseline cumulative hazard functions are given by
where . We obtain the following estimated martingales: where are the WLW estimators. We consider only events up to such that for all k. Let
Cui [7] introduced a partition. For the kth event, partition into intervals is expressed as follows:
Let define partition , as follows:
(4)
Following Cui [7], we break into pieces as follows:
where, for , let . Cui [7] introduced the following notation:
Cui [7] focused on the situation in which varies with sample size n. Let
such that . In addition, he proposed the following estimating equations:
(5)
We impose the following conditions: a.s. for , and some constant is a continuous function of , and there exist constants and such that There exists a neighborhood of such that, for and ,
(k = 1, …, K; d = 0, 1, 2) is a continuous function of uniformly in and is bounded on , , is a constant, , , and
for , and . is a positive definite matrix on ,
and represents the minimum eigenvalue of the matrix. For all sufficiently large n, there exists . We use to denote the partition index for partition , is an increasing positive sequence, and there exists a constant such that
where we assume that , and . Let and . Let
and assume that there exists a constant such thatAssume that the penalty function satisfies
for all . Furthermore, we assume that there exists a constant such that, for nonzero and , . Conditions 1 and 2 are also used by Andersen et al. [17]. Conditions 4–8, which are adapted from Cai et al. [18] and Cui et al. [7], guarantee the asymptotic normality of the penalized partition estimator. Condition 3 is satisfied for most commonly used distributions in survival analysis [19]. Condition 9 is also used by Cai et al. [12].
3. Main Results
3.1. Construction of Estimators
We introduce the penalized partition-estimating Equation (PPEE) as follows:
(6)
where , and is a penalty function. We consider the differential form of the SCAD penalty proposed by Fan and Li [15,20], defined by for some . Then, we can obtain the estimators by solving the penalized partition-estimating equation (6). The SCAD penalty involves two tuning parameters, a and . We explain how to choose these parameters and how to obtain the estimators in Section 4.
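For reference, the standard differential form of the SCAD penalty from Fan and Li is $p'_{\lambda}(\theta) = \lambda\{ I(\theta \le \lambda) + (a\lambda - \theta)_{+} / ((a-1)\lambda)\, I(\theta > \lambda) \}$ for $\theta > 0$. A small vectorized sketch (function and variable names are illustrative):

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(|theta|) for the SCAD penalty (Fan & Li), elementwise."""
    t = np.abs(theta)
    small = t <= lam                      # L1-like zone: constant slope lam
    mid = (t > lam) & (t <= a * lam)      # linearly decaying slope
    out = np.zeros_like(t, dtype=float)
    out[small] = lam
    out[mid] = (a * lam - t[mid]) / (a - 1.0)
    return out                            # zero beyond a*lam: no shrinkage
```

For example, `scad_derivative(np.array([0.05, 0.2, 1.0]), lam=0.1)` yields roughly `[0.1, 0.063, 0.0]`: full shrinkage near zero, tapering shrinkage in the middle zone, and none for large coefficients.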
3.2. Asymptotic and Oracle Properties of the Proposed Estimator
Fan and Li proposed the oracle property [21], which means that the estimator has the same limiting distribution as an estimator that knows the true model a priori. In this section, we will provide the asymptotic properties for and show that it achieves the oracle property.
We consider the situation mentioned by Cai [12], where the regression coefficient varies with the sample size n, that is, , where tends to ∞ as and . Let denote the true value of . Furthermore, we let and , respectively, denote the nonzero and zero components of . Without loss of generality, we write and suppose that for and for , which means that consists of the nonzero components of . Let
(7)
and(8)
In this section, we will use instead of to emphasize that depends on n. For simplicity, we let and .
Here, we discuss the compatibility of the conditions imposed on the tuning parameters. We first require that the SCAD penalty function possess the following property: for nonzero fixed θ, and . This can be satisfied by appropriately choosing the regularization parameter . If we choose and , the above property holds because for ; thus, and can be satisfied. Furthermore, for any given constant M, , which means that condition (I) can be satisfied. Therefore, the conditions given in the upcoming theorems do not contradict each other.
Based on Cui’s research [7], we have the following theorem and lemma:
Theorem 1. Under conditions 1–8, let be the partition sequence that satisfies the following condition:
Then, there exists a matrix W such that
and
Lemma 1. Under conditions 1–8, we can obtain that
The above theorem and lemma were proven by Cui [7], so the proofs are not repeated in this paper. With Lemma 1, we can obtain the following theorem, which shows that there exists a penalized partition estimate that converges at the rate .
Theorem 2. Under conditions 1–8, if , , , and as , there exists an approximate zero-crossing of such that .
From Theorem 2, if , which can be achieved by selecting the appropriate , then there exists a -consistent approximate zero-crossing of . Let
and . Then, we can obtain Theorem 3, which shows that the proposed estimator achieves the oracle property.
Theorem 3. Under conditions 1–9, if , , , and as , and if , , and , then under the conditions of Theorem 2, with probability tending to 1, there exists a -consistent approximate zero-crossing in Theorem 2 such that
(Sparsity); (Asymptotic normality) (9)
The two theorems above indicate that, with the SCAD penalty function, we have , , and for sufficiently large n, so that
Thus, possesses the same sampling property as the oracle estimate, which knows the true submodel in advance. Hence, the penalized partition estimator that we propose achieves the oracle property.
4. Implementation
4.1. Solution of Penalized Partition-Estimating Equation
In Section 3, we constructed the penalized partition-estimating equation. Here, we provide a method for solving it. First, we need to establish a reasonable partition . We make the number of partitions corresponding to each event the same, that is, . To ensure that the penalized partition-estimating equation is effective, each interval of the partition must contain a certain number of event times or failure times. Hence, a reasonable partition method is as follows: for the kth event, sort the observed failure times in ascending order and use the -th failure time as the cut-point. If is not an integer (), round it to an integer.
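The cut-point rule above can be sketched by taking (roughly) evenly spaced order statistics of the observed failure times. The rounding convention below is one plausible reading of the rule, and all names are illustrative:

```python
import numpy as np

def partition_cutpoints(failure_times, n_blocks):
    """Split the time axis into n_blocks intervals containing roughly
    equal numbers of failure times, using order statistics as cut-points."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    m = len(t)
    # index of the (j*m/n_blocks)-th order statistic, rounded to an integer
    idx = [int(round(j * m / n_blocks)) - 1 for j in range(1, n_blocks)]
    return t[idx]
```

With failure times 1, …, 10 and two blocks, the single cut-point is the 5th order statistic; with five blocks, the cut-points are the 2nd, 4th, 6th, and 8th order statistics.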
As the derivative function of the SCAD penalty is discontinuous near 0, we need to obtain an approximate differential penalty function . We rewrite the penalized partition-estimating equation as follows:
(10)
Then, we need to solve the equation above to obtain the penalized partition estimator. In this study, we use the gradient descent method to solve (10).
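Concretely, a gradient-descent-style solution repeatedly steps in the direction of the estimating function until it is driven to (approximately) zero. A generic sketch follows, shown with an illustrative linear estimating function rather than the penalized partition score itself:

```python
import numpy as np

def solve_estimating_eq(G, beta0, step=0.1, tol=1e-8, max_iter=10000):
    """Drive an estimating function G(beta) to (approximately) zero by
    taking small steps in the direction of G."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        g = G(beta)
        if np.max(np.abs(g)) < tol:       # approximate zero-crossing found
            break
        beta = beta + step * g
    return beta
```

In our setting, `G` would be the left-hand side of (10); the fixed step size is a simplification, and in practice a decreasing or line-searched step is more robust.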
4.2. Abnormal Condition Handling Within Zero Neighborhood
We need to address the issue mentioned above, that is, the approximation of the differential penalty function when a component of approaches 0 (abnormal condition handling within the zero neighborhood). In practice, for a very small , it can be assumed that if . For the SCAD penalty , its derivative function is discontinuous near 0. We use a linear function in a small neighborhood near 0 to obtain the approximate differential penalty function :
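The exact form of the approximation is not reproduced here; one plausible choice, which replaces the SCAD derivative by a linear function on a small interval [0, ε) so that the signed penalty term passes continuously through 0, is the following (ε and the function name are our assumptions):

```python
import numpy as np

def approx_scad_derivative(theta, lam, a=3.7, eps=1e-4):
    """SCAD derivative with a linear ramp on [0, eps) so that the signed
    penalty term p'(|t|) * sign(t) is continuous through 0."""
    t = np.abs(np.asarray(theta, dtype=float))
    out = np.where(t <= lam, lam,
          np.where(t <= a * lam, (a * lam - t) / (a - 1.0), 0.0))
    ramp = t < eps
    out = np.where(ramp, lam * t / eps, out)   # linear through the origin
    return out * np.sign(theta)
```

At |θ| = ε the ramp meets the usual SCAD slope λ, so the approximation only modifies the penalty inside the small neighborhood.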
This approximation has little impact on the SCAD penalty function: since the data and model contain random errors, it will not significantly affect the model's performance as long as the effect of the small neighborhood on the SCAD penalty does not exceed those random errors. However, issues remain. When , although can be approximated as 0 in the analysis of the estimation, variation in may cause significant variation in , resulting in variation in the other components of and making the zero point unstable. If we instead fix all at 0 and prohibit their variation, then although remains stable, another deficiency arises: all components close to 0 can no longer escape once they fall into this interval. This can bias the zero point and may even make it impossible to solve for. Therefore, we introduce Algorithm 1 to solve this problem.
Algorithm 1 Abnormal condition handling within the zero neighborhood

Input: Output:
1. , , where I denotes a vector indicating whether is within a small neighborhood around 0.
2. , where denotes the variation in in the gradient descent method.
3. , where M denotes the indices of the components of that fall into the small neighborhood around 0.
4. if then
5.   if then
6.     ,
7.     
8.     ,
9.   end if
10. end if
11. return
This method sets a small neighborhood with "attraction" near 0. When a component falls into this neighborhood, it is temporarily "attracted" to 0. If that component takes a large value and can leave this special zero neighborhood in the next step, it is not approximated as 0. Otherwise, when all the components "attracted" to 0 cannot leave this special zero neighborhood in the next step, we approximate these components as 0, calculate , perform gradient descent on the remaining components, and set the next update of the attracted components to 0.
Condition handling within the zero neighborhood was also discussed by Fan and Li [21]. They used a quadratic function to approximate the penalty function, with the result that coefficients forced to 0 can no longer leave the small neighborhood [21]. Our proposed abnormal condition handling within the zero neighborhood solves this problem by setting an appropriate threshold for the "attraction" near 0.
4.3. Tuning Parameter Selection
In order to achieve effective variable selection in the solution of (10), it is necessary to first select appropriate regularization parameters. In their simulations, Fan and Li found that, when , the SCAD method provided the best variable selection and coefficient estimation performance in penalized least-squares estimation [21]. Subsequently, many scholars continued to use in the field of multivariate survival analysis (see Cai [12], Liu [13], and Cai [22]). They pointed out that, from a Bayesian statistical point of view, this choice is suggested, as the Bayes risk cannot be reduced much with other choices of a. In this paper, we also take . Therefore, only the value of needs to be chosen in the following parameter selection.
Next, we use k-fold cross-validation (k-fold CV) to select ; specifically, we use 10 folds. As it is hard to derive the concrete form of the objective function of our method, we cannot construct a cross-validation statistic directly. However, we can apply the 10-fold CV method to the WLW method with the SCAD penalty and select the appropriate regularization parameter from it. The justification is that the WLW method is the special case of partition estimation in which the number of partitions is set to 1, so the estimators obtained by the WLW method and the partition-estimating equation have values and errors on the same scale. Hence, it is reasonable to assume that the corresponding optimal regularization parameters are of similar magnitude and will not differ significantly.
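The 10-fold CV selection can be organized generically: split the subjects into folds, fit on nine folds for each candidate λ, score on the held-out fold, and keep the minimizer. In the sketch below, `cv_loss` stands in for the WLW-with-SCAD fit-and-score step, which the paper does not spell out:

```python
import numpy as np

def choose_lambda(lam_grid, data_index, cv_loss, n_folds=10, seed=1):
    """Pick the lambda minimizing average held-out loss over n_folds.
    cv_loss(lam, train_idx, test_idx) is the user's fit-and-score routine."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data_index)
    folds = np.array_split(idx, n_folds)
    avg = []
    for lam in lam_grid:
        losses = []
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            losses.append(cv_loss(lam, train, test))
        avg.append(np.mean(losses))
    return lam_grid[int(np.argmin(avg))]
```

The same scaffold works for any penalized fit; only `cv_loss` changes.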
5. Simulation Study
In this section, we describe our evaluation of the performance of the proposed method based on the results of simulation studies. Raftery [23,24] introduced a bivariate exponential model in 1984. Based on this model, we generate bivariate survival data as follows:
where the covariate Z is a p-dimensional 0–1 vector indicating the presence or absence of the feature corresponding to each variable; and are the bivariate failure times; and follow a standard exponential distribution; , and are independent random variables; and and are pre-set parameters used to adjust the correlation between the bivariate exponential distributions. In addition, , , U, , and are independent.

5.1. Different Numbers of Partitions
This section presents simulation experiments evaluating the parameter estimation performance of the proposed method for different numbers of partitions . We considered settings with , , and . In each simulation, each component of and had a probability of of taking a value of . No censoring was imposed, and the simulation was repeated 1000 times. Figure 1 shows the results for different numbers of partitions; the case of one partition corresponds to the WLW method. The results show that the MSE of parameter estimation decreases as the number of partitions grows, indicating better performance of our method with more partitions. Initially, the MSE decreases sharply as the number of partitions increases, and the advantage of the proposed method grows rapidly. Once the number of partitions is large, however, further increases reduce the MSE more slowly, and the improvement in estimation performance becomes marginal.
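For reproducing experiments like this one, a pair of correlated unit-exponential failure times can be generated in several ways. The sketch below uses a Gaussian copula as a stand-in for the Raftery-type construction used in the paper; the copula and all names are our assumption, not the paper's exact scheme:

```python
import numpy as np
from scipy.stats import norm

def bivariate_exponential(n, rho, seed=0):
    """Correlated unit-exponential pair via a Gaussian copula.
    (A stand-in for the Raftery-type construction in the paper.)"""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = norm.cdf(z)                      # correlated Uniform(0,1) pair
    return -np.log(1.0 - u)              # inverse-CDF: Exp(1) margins
```

Both margins are exactly standard exponential, and `rho` controls the strength of the dependence between the two failure times.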
5.2. Different Correlations Between Various Events in Multivariate Survival Data
This section presents simulation experiments evaluating the parameter estimation and variable selection performance of the proposed method for different correlations between the events in multivariate survival data. We considered settings with p = 8, n = 200, 1000, and . Let the true values be . In a simulation study, Cui [7] observed that the partition-estimating function method was superior to the WLW method when there was a significant correlation between the events in multivariate survival data. We let . Censoring times were generated from a uniform distribution over , and we chose to change the censoring rate. Each configuration had 1000 replications.
We assess the performance of our method using the model error (ME), similar to Fan and Li [16].
(11)
In addition, we use the oracle estimator to define the relative model error (RME) as follows:
(12)
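With the true coefficient vector and the covariate second-moment matrix in hand, the ME and RME in (11) and (12) can be computed directly. The quadratic-form version below follows the Fan-Li convention and is a sketch: Σ = E[ZZᵀ] must be supplied or estimated, and the names are illustrative:

```python
import numpy as np

def model_error(beta_hat, beta0, sigma):
    """Fan-Li-style model error (beta_hat - beta0)' Sigma (beta_hat - beta0),
    with Sigma = E[Z Z'] the covariate second-moment matrix."""
    d = np.asarray(beta_hat) - np.asarray(beta0)
    return float(d @ sigma @ d)

def relative_model_error(beta_hat, beta_oracle, beta0, sigma):
    """RME: model error of a method relative to the oracle estimator."""
    return model_error(beta_hat, beta0, sigma) / model_error(beta_oracle, beta0, sigma)
```

An RME above 1 means the method's estimation error exceeds the oracle's; the medians of such ratios over replications are what Table 1 reports.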
We compared our method (PPEE) with the WLW-with-LASSO and WLW-with-SCAD methods. The results for 1000 simulated datasets are given in Table 1. The column labeled "RME" provides the median of the 1000 RMEs, the column labeled "C" reports the average number of coefficients correctly estimated as 0, and the column "IC" presents the average number of coefficients erroneously estimated as 0. From Table 1, we can see that our method performs well in terms of variable selection. When there is a high correlation between and , our method performs particularly well; when the correlation is weak, its performance is only slightly better than that of WLW+SCAD but still much better than that of WLW+LASSO. This indicates that our method performs better the greater the correlation between and . In practical analyses, there is some correlation between failure times in multivariate survival data, which is precisely the advantage of our method over classical methods. In addition, none of the methods set the coefficients of significant variables to 0, for either sample size; thus, there is no under-fitting.
6. The Colon Cancer Study
In this section, we report the results of applying our method to the dataset collected in the Colon Cancer Study [25]. This study was initiated in the 1980s and included 929 patients with stage C disease randomly assigned to observation, levamisole alone, or levamisole combined with fluorouracil. The observation (Obs), levamisole alone (Lev), and levamisole combined with fluorouracil (Lev + 5-FU) groups comprise 315, 310, and 304 patients, respectively. There are multiple failure outcomes, such as the time to cancer recurrence and the survival time. By the end of the study, 155 patients in Obs, 144 in Lev, and 103 in Lev + 5-FU had experienced recurrences, and 114, 109, and 78 had died, respectively. We are interested in the following risk factors: sex, where 1 is male and 0 is female; age; obstruction of the colon by the tumor (obstruct); adherence to nearby organs (adhere); differentiation of the tumor (differ); perforation of the colon (perfor); number of lymph nodes with detectable cancer (nodes); extent of local spread (extent); more than four positive lymph nodes (node4), coded as 1 if true and 0 otherwise; and time from surgery to registration (surg), coded as 1 for long and 0 for short. In addition, similar to Lin [2], we created two dummy variables as follows: Lev, coded as 1 for the levamisole-alone treatment group and 0 for others, and Lev + 5FU, coded as 1 for the levamisole combined with the fluorouracil treatment group and 0 otherwise.
Table 2 shows the estimated coefficients and standard errors for the Colon Cancer Study using different methods, including the unpenalized method (UNM), LASSO, and our method (PPEE). From Table 2, we can see that our method only keeps five significant variables. In the column “LASSO”, we observe that certain variables may have an impact on one failure event but not on another. According to our method, “Lev + 5FU”, “Extent”, and “Node4” all have a significant impact on the two failure events. “Lev + 5FU” has a negative impact on both death and recurrence, which is consistent with Moertel’s study [26]; that is, levamisole combined with fluorouracil is effective in reducing the mortality rate of colon cancer. In addition, Lin [2] found that “extent” and “node4” increased the risk of colon cancer, and our method is consistent with this.
7. Discussion and Conclusions
We proposed a penalized partition-estimating equation for variable selection in multivariate survival data. The partition-estimating equation was originally proposed by Cui [7]; we developed this method into the penalized partition-estimating equation, which simultaneously estimates the parameters and selects variables. Compared with the classic Cox regression method, our method selects variables and estimates coefficients more effectively, as reflected in our simulation experiments. Moreover, our method makes use of the dependency information among survival times and performs better the greater the correlation between failure times, which is an advantage over classical methods, since in practice the failure times in multivariate survival data are correlated to some extent. Finally, we proved the asymptotic and oracle properties of the proposed method.
Future studies can extend this work. Our method performed well when the failure events were strongly correlated, so one future task is to examine whether it still performs well when there is no obvious correlation between events and, if not, to explore how this problem can be solved. Another interesting direction is ultrahigh-dimensional variable selection.
Methodology, W.C.; Validation, W.C. and W.T.; formal analysis, W.C. and W.T.; writing—original draft preparation, W.T.; writing—review and editing, W.C. All authors have read and agreed to the published version of the manuscript.
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
The authors declare no conflicts of interest.
Figure 1. Mean squared error (MSE) of parameter estimates for different numbers of partitions.
Table 1. Relative model error (RME) and average numbers of coefficients correctly (C) and erroneously (IC) estimated as zero.

Method | q | RME (c = 1) | C (c = 1) | IC (c = 1) | RME (c = 5) | C (c = 5) | IC (c = 5)
---|---|---|---|---|---|---|---
n = 200 | | | | | | |
LASSO | 0.98 | 0.473 | 10.458 | 0.036 | 0.531 | 10.513 | 0.006
SCAD | 0.98 | 0.677 | 10.703 | 0.024 | 0.745 | 10.783 | 0.004
PPEE | 0.98 | 0.733 | 10.712 | 0.024 | 0.801 | 10.785 | 0.004
LASSO | 0.8 | 0.469 | 10.471 | 0.030 | 0.518 | 10.543 | 0.005
SCAD | 0.8 | 0.691 | 10.710 | 0.030 | 0.753 | 10.770 | 0.006
PPEE | 0.8 | 0.703 | 10.705 | 0.036 | 0.783 | 10.776 | 0.004
LASSO | 0.25 | 0.479 | 10.443 | 0.036 | 0.520 | 10.412 | 0.004
SCAD | 0.25 | 0.682 | 10.702 | 0.026 | 0.747 | 10.773 | 0.002
PPEE | 0.25 | 0.691 | 10.719 | 0.024 | 0.756 | 10.775 | 0.002
n = 1000 | | | | | | |
LASSO | 0.98 | 0.583 | 10.498 | 0.000 | 0.612 | 10.532 | 0.000
SCAD | 0.98 | 0.752 | 10.811 | 0.000 | 0.796 | 10.848 | 0.000
PPEE | 0.98 | 0.813 | 10.819 | 0.000 | 0.857 | 10.855 | 0.000
LASSO | 0.8 | 0.590 | 10.473 | 0.000 | 0.593 | 10.528 | 0.000
SCAD | 0.8 | 0.743 | 10.802 | 0.000 | 0.790 | 10.842 | 0.000
PPEE | 0.8 | 0.792 | 10.811 | 0.000 | 0.824 | 10.847 | 0.000
LASSO | 0.25 | 0.591 | 10.482 | 0.000 | 0.588 | 10.541 | 0.000
SCAD | 0.25 | 0.755 | 10.807 | 0.000 | 0.781 | 10.837 | 0.000
PPEE | 0.25 | 0.761 | 10.811 | 0.000 | 0.792 | 10.841 | 0.000
Table 2. Estimated coefficients and standard errors for the Colon Cancer Study using different methods.
Effect | UNM | LASSO | SCAD | PPEE |
---|---|---|---|---|
Recurrence | ||||
Lev | −0.026 (0.111) | 0.000 | 0.000 | 0.000 |
Lev + 5FU | −0.499 (0.122) | −0.441 (0.108) | −0.416 (0.108) | −0.428 (0.107) |
Sex | −0.138 (0.096) | 0.000 | 0.000 | 0.000 |
Age | −0.003 (0.004) | 0.000 | 0.000 | 0.000 |
Obstruct | 0.194 (0.119) | 0.061 (0.095) | 0.050 (0.134) | 0.048 (0.103) |
Perfor | 0.211 (0.257) | 0.000 | 0.000 | 0.000 |
Adhere | 0.161 (0.130) | 0.028 (0.137) | 0.028 (0.136) | 0.000 |
Nodes | 0.038 (0.015) | 0.037 (0.017) | 0.000 | 0.000 |
Differ | 0.153 (0.098) | 0.118 (0.108) | 0.036 (0.106) | 0.024 (0.105) |
Extent | 0.451 (0.119) | 0.414 (0.120) | 0.393 (0.119) | 0.532 (0.116) |
Surg | 0.240 (0.104) | 0.072 (0.110) | 0.084 (0.108) | 0.000 |
Node4 | 0.591 (0.141) | 0.641 (0.103) | 0.772 (0.146) | 0.751 (0.106) |
Death | ||||
Lev | −0.041 (0.114) | 0.000 | 0.000 | 0.000 |
Lev + 5FU | −0.362 (0.122) | −0.294 (0.109) | −0.209 (0.108) | −0.226 (0.107) |
Sex | 0.007 (0.097) | 0.000 | 0.000 | 0.000 |
Age | 0.008 (0.004) | 0.006 (0.004) | 0.002 (0.004) | 0.000 |
Obstruct | 0.269 (0.120) | 0.118 (0.135) | 0.098 (0.135) | 0.094 (0.131) |
Perfor | 0.017 (0.270) | 0.000 | 0.000 | 0.000 |
Adhere | 0.170 (0.131) | 0.138 (0.145) | 0.130 (0.141) | 0.135 (0.126) |
Nodes | 0.044 (0.015) | 0.043 (0.014) | 0.000 | 0.000 |
Differ | 0.138 (0.101) | 0.106 (0.110) | 0.007 (0.106) | 0.003 (0.110) |
Extent | 0.446 (0.118) | 0.420 (0.114) | 0.377 (0.111) | 0.427 (0.112) |
Surg | 0.240 (0.106) | 0.021 (0.113) | 0.079 (0.110) | 0.000 |
Node4 | 0.667 (0.143) | 0.641 (0.128) | 0.657 (0.143) | 0.899 (0.153) |
Appendix A. Proofs
For simplicity, we let
To prove Theorem 2, it is sufficient to show that
This means that the probability of a local maximum existing in the ball
Note that
Firstly, we consider
According to the Taylor expansion,
Next, we consider
From Lemma 1, we can obtain
Consequently,
Thus,
Using the Cauchy–Schwarz inequality,
Furthermore,
To prove Theorem 3, we introduce a lemma first.
Lemma A1. Under the conditions of Theorem 3, with a probability tending to 1, for any given
It is sufficient to show that for any
Using the Taylor expansion, we have
Next, we consider
Thus,
Given the assumption that
We can immediately prove part 1 through Lemma A1. To prove part 2, it is sufficient to show that
According to condition (F) and Cui [
Thus, we need to prove that (
Using the Taylor expansion to
From condition (I), it follows that
Therefore, (
1. Liang, K.Y.; Self, S.G.; Chang, Y.C. Modelling marginal hazards in multivariate failure time data. J. R. Stat. Soc. Ser. B Stat. Methodol.; 1993; 55, pp. 441-453. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1993.tb01914.x]
2. Lin, D.Y. Cox regression analysis of multivariate failure time data: The marginal approach. Stat. Med.; 1994; 13, pp. 2233-2247. [DOI: https://dx.doi.org/10.1002/sim.4780132105] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/7846422]
3. Spiekerman, C.F.; Lin, D.Y. Marginal regression models for multivariate failure time data. J. Am. Stat. Assoc.; 1998; 93, pp. 1164-1175. [DOI: https://dx.doi.org/10.1080/01621459.1998.10473777]
4. Wei, L.J.; Lin, D.Y.; Weissfeld, L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Am. Stat. Assoc.; 1989; 84, pp. 1065-1073. [DOI: https://dx.doi.org/10.1080/01621459.1989.10478873]
5. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B Methodol.; 1972; 34, pp. 187-202. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1972.tb00899.x]
6. Cox, D.R. Partial likelihood. Biometrika; 1975; 62, pp. 269-276. [DOI: https://dx.doi.org/10.1093/biomet/62.2.269]
7. Cui, W.Q.; Ying, Z.L.; Zhao, L.C. A simple construction of optimal estimation in multivariate marginal Cox regression. Sci. China Math.; 2012; 55, pp. 1827-1857. [DOI: https://dx.doi.org/10.1007/s11425-012-4400-4]
8. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med.; 1997; 16, pp. 385-395. [DOI: https://dx.doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3]
9. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc.; 2006; 101, pp. 1418-1429. [DOI: https://dx.doi.org/10.1198/016214506000000735]
10. Zhang, H.H.; Lu, W. Adaptive Lasso for Cox’s proportional hazards model. Biometrika; 2007; 94, pp. 691-703. [DOI: https://dx.doi.org/10.1093/biomet/asm037]
11. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Statist.; 2010; 38, pp. 894-942. [DOI: https://dx.doi.org/10.1214/09-AOS729] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17244211]
12. Cai, J.W.; Fan, J.Q.; Li, R.Z.; Zhou, H.B. Variable selection for multivariate failure time data. Biometrika; 2005; 92, pp. 303-316. [DOI: https://dx.doi.org/10.1093/biomet/92.2.303] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19458784]
13. Liu, J.C.; Zhang, R.Q.; Zhao, W.H.; Lv, Y.Z. Variable selection in semiparametric hazard regression for multivariate survival data. J. Multivar. Anal.; 2015; 142, pp. 26-40. [DOI: https://dx.doi.org/10.1016/j.jmva.2015.07.015]
14. Sun, L.; Li, S.; Wang, L.; Song, X.; Sui, X. Simultaneous variable selection in regression analysis of multivariate interval-censored data. Biometrics; 2022; 78, pp. 1402-1413. [DOI: https://dx.doi.org/10.1111/biom.13548]
15. Fan, J.Q.; Li, R.Z. Variable Selection via Penalized Likelihood; Department of Statistics, UCLA: Los Angeles, CA, USA, 1999.
16. Fan, J.Q.; Li, R.Z. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat.; 2002; 30, pp. 74-99. [DOI: https://dx.doi.org/10.1214/aos/1015362185]
17. Andersen, P.K.; Gill, R.D. Cox’s regression model for counting processes: A large sample study. Ann. Stat.; 1982; 10, pp. 1100-1120. [DOI: https://dx.doi.org/10.1214/aos/1176345976]
18. Cai, J.W. Hypothesis testing of hazard ratio parameters in marginal models for multivariate failure time data. Lifetime Data Anal.; 1999; 5, pp. 39-53. [DOI: https://dx.doi.org/10.1023/A:1009679032314]
19. Cui, W.Q. Analysis of Multivariate Survival Data by Marginal Proportional Hazards Regression Models. Ph.D. Thesis; University of Science and Technology of China: Hefei, China, 2004; (In Chinese)
20. Fan, J.Q.; Li, R.Z. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc.; 2004; 99, pp. 710-723. [DOI: https://dx.doi.org/10.1198/016214504000001060]
21. Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc.; 2001; 96, pp. 1348-1360. [DOI: https://dx.doi.org/10.1198/016214501753382273]
22. Cai, K. Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics. Ph.D. Thesis; University of Calgary: Calgary, AB, Canada, 2019.
23. Raftery, A.E. A continuous multivariate exponential distribution. Commun. Stat. Theory Methods; 1984; 13, pp. 947-965. [DOI: https://dx.doi.org/10.1080/03610928408828733]
24. Raftery, A.E. Some properties of a new continuous bivariate exponential distribution. Stat. Decis. Suppl. Issue; 1985; 2, pp. 53-58.
25. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Goodman, P.J.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.
26. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Tangen, C.M.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In this paper, we propose a new variable selection method using a partitioning-based estimating equation for multivariate survival data to simultaneously perform variable selection and parameter estimation. The main idea of the partitioning-based estimating equation is to partition the score function into small blocks. We construct our method using the SCAD penalty function and achieve the purpose of directly selecting variables through the estimating equation. We further establish asymptotic normality and prove that our method achieves the oracle property. Moreover, we use a simple approximation of the penalty function such that our method can be implemented algorithmically. We conducted simulation studies to validate the performance of our method and analyzed the dataset from the Colon Cancer Study.