1. Introduction
Biomedical research often involves multivariate survival data, such as cancer patients facing local recurrence or repeated hospitalizations in the context of chronic disease management. A key characteristic of these data is that the survival times of the same individual are correlated. Because the dependence structure among survival times is complex, theoretical development has been relatively slow, and researchers have mainly focused on modeling the marginal distributions of survival times (see Liang [1], Lin [2], and Spiekerman [3]), leaving the dependence unspecified.
The most widely used approach is the WLW method [4]. Suppose that there is a random sample of n subjects. Let $T_{ik}$ be the $k$th survival time of the $i$th subject, where $i = 1, \dots, n$ and $k = 1, \dots, K$. In the WLW method, the marginal hazard function of $T_{ik}$ is assumed to take the form
$$\lambda_{ik}(t) = \lambda_{0k}(t)\exp\{\beta_{0k}^{\top} Z_{ik}(t)\},$$
where $Z_{ik}(t)$ is a $p$-dimensional, possibly time-dependent covariate vector, $\lambda_{0k}(t)$ is the unspecified baseline hazard function, and $\beta_{0k}$ represents the true values of the unknown regression coefficients. Let $C_{ik}$ denote the censoring time, and let $X_{ik} = \min(T_{ik}, C_{ik})$ and $\Delta_{ik} = I(T_{ik} \le C_{ik})$. Assume that $T_{ik}$ and $C_{ik}$ are independent given the covariates $Z_{ik}$. Then, the marginal partial likelihood ([5,6]) for the $k$th event type is
$$L_k(\beta) = \prod_{i=1}^{n}\left[\frac{\exp\{\beta^{\top} Z_{ik}(X_{ik})\}}{\sum_{j \in R_k(X_{ik})} \exp\{\beta^{\top} Z_{jk}(X_{ik})\}}\right]^{\Delta_{ik}},$$
and the corresponding score function is
$$U_k(\beta) = \sum_{i=1}^{n} \Delta_{ik}\left\{ Z_{ik}(X_{ik}) - \bar{Z}_k(\beta, X_{ik}) \right\}, \qquad (1)$$
where $R_k(t) = \{j : X_{jk} \ge t\}$ is the risk set at $t$ and $\bar{Z}_k(\beta, t) = \sum_{j \in R_k(t)} Z_{jk}(t)\, e^{\beta^{\top} Z_{jk}(t)} \big/ \sum_{j \in R_k(t)} e^{\beta^{\top} Z_{jk}(t)}$. Thus, we have $K$ sets of estimating equations as follows:
$$U_k(\beta_k) = 0, \qquad k = 1, \dots, K. \qquad (2)$$
We can obtain the estimators $\hat{\beta}_1, \dots, \hat{\beta}_K$. Let $\hat{\beta} = (\hat{\beta}_1^{\top}, \dots, \hat{\beta}_K^{\top})^{\top}$, which is known as the WLW estimator.
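In practice, each of the K marginal score equations in (2) can be solved separately by Newton's method applied to the Cox partial likelihood. The following NumPy sketch handles one event type, assuming no tied event times and time-fixed covariates; the function names are illustrative, not from the paper:

```python
import numpy as np

def cox_score_info(beta, X, time, status):
    """Score U(beta) and observed information for the Cox partial
    likelihood (continuous times, no ties, time-fixed covariates)."""
    order = np.argsort(-time)            # process in decreasing time order
    X, status = X[order], status[order]
    w = np.exp(X @ beta)
    S0 = np.cumsum(w)                    # risk-set sums S^(0)(beta, t)
    S1 = np.cumsum(w[:, None] * X, axis=0)                        # S^(1)
    S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    Zbar = S1 / S0[:, None]              # risk-set covariate average
    U = ((X - Zbar) * status[:, None]).sum(axis=0)
    V = (status[:, None, None] * (S2 / S0[:, None, None]
         - Zbar[:, :, None] * Zbar[:, None, :])).sum(axis=0)
    return U, V

def fit_marginal_cox(X, time, status, n_iter=25):
    """Solve U_k(beta) = 0 by Newton-Raphson, starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        U, V = cox_score_info(beta, X, time, status)
        beta = beta + np.linalg.solve(V, U)   # Newton step
    return beta
```

Applying `fit_marginal_cox` to each of the K event types and stacking the K solutions gives a working-independence (WLW-type) estimator.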
Cui [7] proposed a new method to improve the efficiency of the WLW method, called the partition method or partition-estimating equation in this paper. The main idea of the method is to partition the score function into small blocks. Cui further explored the situation where the number of blocks increases with the sample size and established the asymptotic normality of the resulting estimators. The method describes and exploits the dependency information among survival times, and simulation results showed that it outperforms the WLW method.
In practice, it is often difficult for investigators to identify significant covariates when the number of covariates is large, and variable selection studies increasingly involve the analysis of survival data with high-dimensional covariates to address this difficulty. Tibshirani [8] applied the LASSO penalty to the Cox model; Zou [9] proposed the adaptive LASSO, which Zhang [10] studied in the Cox model; and Zhang [11] proposed the minimax concave penalty. Several studies have focused on variable selection for multivariate survival data. For example, Cai [12] proposed a variable selection method allowing a growing number of regression coefficients; Liu [13] proposed a multivariate varying-coefficient hazard model; and Sun [14] developed a variable selection technique for multivariate interval-censored data.
Fan and Li [15] proposed a new type of penalty function called the smoothly clipped absolute deviation (SCAD) penalty, which combines characteristics of the LASSO and least squares. It shrinks the model's coefficients through a penalty function such that some are compressed exactly to 0, thereby achieving variable selection, while the larger coefficients are estimated in an asymptotically unbiased way. Moreover, Fan and Li established the oracle property and later introduced the SCAD penalty into the Cox model [16].
In this paper, we aim to further improve the partition method to propose a new variable selection method for multivariate survival data. Based on the partition method, we make better use of the dependency information among survival times compared to the WLW method. Moreover, we directly achieve the purpose of variable selection using estimating equations. We construct our method with the SCAD penalty function and prove that the obtained estimators possess the oracle property. Numerical studies show that the proposed method performs well.
The rest of this paper is organized as follows. In Section 2, we present the notation and assumptions. In Section 3, we introduce our method and establish its asymptotic and oracle properties. We address implementation issues in Section 4, while simulations and an application of our method to real data are given in Section 5 and Section 6. The proofs are collected in Appendix A.
2. Notation and Assumptions
The notation is the same as in Section 1. To simplify it further, let
(3)
which is a martingale with respect to the -filtration . For , let
where E denotes expectation. For a column vector , , , and . The preliminary estimators of the baseline cumulative hazard functions are given by
where . We obtain the following estimated martingales: where are the WLW estimators. We consider only events up to such that for all k. Let
Cui [7] introduced a partition. For the kth event, partition into intervals is expressed as follows:
Let define partition , as follows:
(4)
Following Cui [7], we break into pieces as follows:
where, for , let . Cui [7] introduced the following notation:
Cui [7] focused on the situation in which varies with sample size n. Let
such that . In addition, he proposed the following estimating equations:
(5)
We impose the following conditions: a.s. for , and some constant is a continuous function of , and there exist constants and such that There exists a neighborhood of such that, for and ,
(k = 1, …, K; d = 0, 1, 2) is a continuous function of uniformly in and is bounded on , , is a constant, , , and
for , and . is a positive definite matrix on ,
and represents the minimum eigenvalue of the matrix. For all sufficiently large n, there exists . We use to denote the partition index for partition , is an increasing positive sequence, and there exists a constant such that
where we assume that , and . Let and . Let
and assume that there exists a constant such thatAssume that the penalty function satisfies
for all . Furthermore, we assume that there exists a constant such that, for nonzero and , . Conditions 1 and 2 are also used by Andersen et al. [17]. Conditions 4–8, which are adapted from Cai et al. [18] and Cui et al. [7], guarantee the asymptotic normality of the penalized partition estimator. Condition 3 is satisfied for most commonly used distributions in survival analysis [19]. Condition 9 is also used by Cai et al. [12].
3. Main Results
3.1. Construction of Estimators
We introduce the penalized partition-estimating Equation (PPEE) as follows:
(6)
where , and is a penalty function. We consider the differential form of the SCAD penalty proposed by Fan and Li [15,20], defined by for some . Then, we can obtain the estimators by solving the penalized partition-estimating equation (6). The SCAD penalty involves two tuning parameters, a and . We explain how to choose these parameters and how to obtain the estimators in Section 4.
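For reference, the standard differential form of the SCAD penalty from Fan and Li is $p'_{\lambda}(\theta) = \lambda\{ I(\theta \le \lambda) + (a\lambda - \theta)_{+} / ((a-1)\lambda)\, I(\theta > \lambda) \}$ for $\theta > 0$. A small vectorized sketch (function and variable names are illustrative):

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(|theta|) for the SCAD penalty (Fan & Li), elementwise."""
    t = np.abs(theta)
    small = t <= lam                      # L1-like zone: constant slope lam
    mid = (t > lam) & (t <= a * lam)      # linearly decaying slope
    out = np.zeros_like(t, dtype=float)
    out[small] = lam
    out[mid] = (a * lam - t[mid]) / (a - 1.0)
    return out                            # zero beyond a*lam: no shrinkage
```

For example, `scad_derivative(np.array([0.05, 0.2, 1.0]), lam=0.1)` yields roughly `[0.1, 0.063, 0.0]`: full shrinkage near zero, tapering shrinkage in the middle zone, and none for large coefficients.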
3.2. Asymptotic and Oracle Properties of the Proposed Estimator
Fan and Li proposed the oracle property [21], which means that the estimator has the same limiting distribution as an estimator that knows the true model a priori. In this section, we will provide the asymptotic properties for and show that it achieves the oracle property.
We consider the situation mentioned by Cai [12], where the regression coefficient varies with the sample size n, that is, , where tends to ∞ as and . Let denote the true value of . Furthermore, we let and , respectively, denote the nonzero and zero components of . Without loss of generality, we write and suppose that for and for , which means that consists of the nonzero components of . Let
(7)
and(8)
In this section, we will use instead of to emphasize that depends on n. For simplicity, we let and .
Here, we discuss the compatibility of the conditions imposed on the tuning parameters. We first require that the SCAD penalty function possess the following property: for nonzero fixed θ, and . This can be satisfied by appropriately choosing the regularization parameter . If we choose and , the above property holds because for ; thus, and can be satisfied. Furthermore, for any given constant M, , which means that condition (I) can be satisfied. Therefore, the conditions given in the upcoming theorems do not contradict each other.
Based on Cui’s research [7], we have the following theorem and lemma:
Theorem 1. Under conditions 1–8, let be the partition sequence that satisfies the following condition:
Then, there exists a matrix W such that
and
Lemma 1. Under conditions 1–8, we can obtain that
The above theorem and lemma were proven by Cui [7], so the proofs are not repeated in this paper. With Lemma 1, we can obtain the following theorem, which shows that there exists a penalized partition estimate that converges at the rate .
Theorem 2. Under conditions 1–8, if , , , and as , there exists an approximate zero-crossing of such that .
From Theorem 2, if , which can be achieved by selecting the appropriate , then there exists a -consistent approximate zero-crossing of . Let
and . Then, we can obtain Theorem 3, which shows that the proposed estimator achieves the oracle property.
Theorem 3. Under conditions 1–9, if , , , and as , and if , , and , then under the conditions of Theorem 2, with probability tending to 1, there exists a -consistent approximate zero-crossing in Theorem 2 such that
(Sparsity); (Asymptotic normality) (9)
The two theorems above indicate that, with the SCAD penalty function, we have , , and for sufficiently large n, so that
Thus, possesses the same sampling property as the oracle estimate, which knows the true submodel in advance. Hence, the penalized partition estimator that we propose achieves the oracle property.
4. Implementation
4.1. Solution of Penalized Partition-Estimating Equation
In Section 3, we constructed the penalized partition-estimating equation. Here, we provide a method for solving it. First, we need to establish a reasonable partition . We make the number of partitions corresponding to each event the same, that is, . To ensure that the penalized partition-estimating equation is effective, each interval of the partition must contain a certain number of event times or failure times. Hence, a reasonable partition method is as follows: for the kth event, sort the observed failure times in ascending order and use the -th failure time as the cut-point. If is not an integer (), round it to an integer.
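The cut-point rule above can be sketched by taking (roughly) evenly spaced order statistics of the observed failure times. The rounding convention below is one plausible reading of the rule, and all names are illustrative:

```python
import numpy as np

def partition_cutpoints(failure_times, n_blocks):
    """Split the time axis into n_blocks intervals containing roughly
    equal numbers of failure times, using order statistics as cut-points."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    m = len(t)
    # index of the (j*m/n_blocks)-th order statistic, rounded to an integer
    idx = [int(round(j * m / n_blocks)) - 1 for j in range(1, n_blocks)]
    return t[idx]
```

With failure times 1, …, 10 and two blocks, the single cut-point is the 5th order statistic; with five blocks, the cut-points are the 2nd, 4th, 6th, and 8th order statistics.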
As the derivative function of the SCAD penalty is discontinuous near 0, we need to obtain an approximate differential penalty function . We rewrite the penalized partition-estimating equation as follows:
(10)
Then, we need to solve the equation above to obtain the penalized partition estimator. In this study, we use the gradient descent method to solve (10).
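Concretely, a gradient-descent-style solution repeatedly steps in the direction of the estimating function until it is driven to (approximately) zero. A generic sketch follows, shown with an illustrative linear estimating function rather than the penalized partition score itself:

```python
import numpy as np

def solve_estimating_eq(G, beta0, step=0.1, tol=1e-8, max_iter=10000):
    """Drive an estimating function G(beta) to (approximately) zero by
    taking small steps in the direction of G."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        g = G(beta)
        if np.max(np.abs(g)) < tol:       # approximate zero-crossing found
            break
        beta = beta + step * g
    return beta
```

In our setting, `G` would be the left-hand side of (10); the fixed step size is a simplification, and in practice a decreasing or line-searched step is more robust.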
4.2. Abnormal Condition Handling Within Zero Neighborhood
We need to address the issue mentioned above, that is, the approximation of the differential penalty function when a component of approaches 0 (abnormal condition handling within the zero neighborhood). In practice, for a very small , it can be assumed that if . For the SCAD penalty , its derivative function is discontinuous near 0. We use a linear function in a small neighborhood near 0 to obtain the approximate differential penalty function :
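The exact form of the approximation is not reproduced here; one plausible choice, which replaces the SCAD derivative by a linear function on a small interval [0, ε) so that the signed penalty term passes continuously through 0, is the following (ε and the function name are our assumptions):

```python
import numpy as np

def approx_scad_derivative(theta, lam, a=3.7, eps=1e-4):
    """SCAD derivative with a linear ramp on [0, eps) so that the signed
    penalty term p'(|t|) * sign(t) is continuous through 0."""
    t = np.abs(np.asarray(theta, dtype=float))
    out = np.where(t <= lam, lam,
          np.where(t <= a * lam, (a * lam - t) / (a - 1.0), 0.0))
    ramp = t < eps
    out = np.where(ramp, lam * t / eps, out)   # linear through the origin
    return out * np.sign(theta)
```

At |θ| = ε the ramp meets the usual SCAD slope λ, so the approximation only modifies the penalty inside the small neighborhood.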
This approximation has little impact on the SCAD penalty function: since the data and model contain random errors, it will not significantly affect the model's performance as long as the effect of the small neighborhood on the SCAD penalty does not exceed those random errors. However, issues remain. When , although can be approximated as 0 in the analysis of the estimation, variation in may cause significant variation in , resulting in variation in the other components of and making the zero point unstable. If we instead fix all at 0 and prohibit their variation, then although remains stable, another deficiency arises: all components close to 0 can no longer escape once they fall into this interval. This can bias the zero point and may even make it impossible to solve for. Therefore, we introduce Algorithm 1 to solve this problem.
Algorithm 1 Abnormal condition handling within the zero neighborhood

Input: Output:
1. , , where I denotes a vector indicating whether is within a small neighborhood around 0.
2. , where denotes the variation in in the gradient descent method.
3. , where M denotes the indices of the components of that fall into the small neighborhood around 0.
4. if then
5.   if then
6.     ,
7.     
8.     ,
9.   end if
10. end if
11. return
This method sets a small neighborhood with "attraction" near 0. When a component falls into this neighborhood, it is temporarily "attracted" to 0. If that component takes a large value and can leave this special zero neighborhood in the next step, it is not approximated as 0. Otherwise, when all the components "attracted" to 0 cannot leave this special zero neighborhood in the next step, we approximate these components as 0, calculate , perform gradient descent on the remaining components, and set the next update of the attracted components to 0.
Condition handling within the zero neighborhood was also discussed by Fan and Li [21]. They used a quadratic function to approximate the penalty function, with the result that coefficients forced to 0 can no longer leave the small neighborhood [21]. Our proposed abnormal condition handling within the zero neighborhood solves this problem by setting an appropriate threshold for the "attraction" near 0.
4.3. Tuning Parameter Selection
In order to achieve effective variable selection in the solution of (10), it is necessary to first select appropriate regularization parameters. In their simulations, Fan and Li found that, when , the SCAD method provided the best variable selection and coefficient estimation performance in penalized least-squares estimation [21]. Subsequently, many scholars continued to use in the field of multivariate survival analysis (see Cai [12], Liu [13], and Cai [22]). They pointed out that, from a Bayesian statistical point of view, this choice is suggested, as the Bayes risk cannot be reduced much with other choices of a. In this paper, we also take . Therefore, only the value of needs to be chosen in the following parameter selection.
Next, we use k-fold cross-validation (k-fold CV) to select ; specifically, we use 10 folds. As it is hard to derive the concrete form of the objective function of our method, we cannot construct a cross-validation statistic directly. However, we can apply the 10-fold CV method to the WLW method with the SCAD penalty and select the appropriate regularization parameter from it. The justification is that the WLW method is the special case of partition estimation in which the number of partitions is set to 1, so the estimators obtained by the WLW method and the partition-estimating equation have values and errors on the same scale. Hence, it is reasonable to assume that the corresponding optimal regularization parameters are of similar magnitude and will not differ significantly.
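The 10-fold CV selection can be organized generically: split the subjects into folds, fit on nine folds for each candidate λ, score on the held-out fold, and keep the minimizer. In the sketch below, `cv_loss` stands in for the WLW-with-SCAD fit-and-score step, which the paper does not spell out:

```python
import numpy as np

def choose_lambda(lam_grid, data_index, cv_loss, n_folds=10, seed=1):
    """Pick the lambda minimizing average held-out loss over n_folds.
    cv_loss(lam, train_idx, test_idx) is the user's fit-and-score routine."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data_index)
    folds = np.array_split(idx, n_folds)
    avg = []
    for lam in lam_grid:
        losses = []
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            losses.append(cv_loss(lam, train, test))
        avg.append(np.mean(losses))
    return lam_grid[int(np.argmin(avg))]
```

The same scaffold works for any penalized fit; only `cv_loss` changes.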
5. Simulation Study
In this section, we describe our evaluation of the performance of the proposed method based on the results of simulation studies. Raftery [23,24] introduced a bivariate exponential model in 1984. Based on this model, we generate bivariate survival data as follows:
where the covariate Z is a p-dimensional 0–1 vector indicating the presence or absence of the feature corresponding to each variable; and are the bivariate failure times; and follow a standard exponential distribution; , and are independent random variables; and and are pre-set parameters used to adjust the correlation between the bivariate exponential distributions. In addition, , , U, , and are independent.

5.1. Different Numbers of Partitions
This section presents simulation experiments evaluating the parameter estimation performance of the proposed method for different numbers of partitions . We considered settings with , , and . In each simulation, each component of and had a probability of of taking a value of . No censoring was imposed, and the simulation was repeated 1000 times. Figure 1 shows the results for different numbers of partitions; the case of one partition corresponds to the WLW method. The results show that the MSE of parameter estimation decreases as the number of partitions grows, indicating better performance of our method with more partitions. Initially, the MSE decreases sharply as the number of partitions increases, and the advantage of the proposed method grows rapidly. Once the number of partitions is large, however, further increases reduce the MSE more slowly, and the improvement in estimation performance becomes marginal.
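For reproducing experiments like this one, a pair of correlated unit-exponential failure times can be generated in several ways. The sketch below uses a Gaussian copula as a stand-in for the Raftery-type construction used in the paper; the copula and all names are our assumption, not the paper's exact scheme:

```python
import numpy as np
from scipy.stats import norm

def bivariate_exponential(n, rho, seed=0):
    """Correlated unit-exponential pair via a Gaussian copula.
    (A stand-in for the Raftery-type construction in the paper.)"""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = norm.cdf(z)                      # correlated Uniform(0,1) pair
    return -np.log(1.0 - u)              # inverse-CDF: Exp(1) margins
```

Both margins are exactly standard exponential, and `rho` controls the strength of the dependence between the two failure times.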
5.2. Different Correlations Between Various Events in Multivariate Survival Data
This section presents simulation experiments evaluating the parameter estimation and variable selection performance of the proposed method for different correlations between the events in multivariate survival data. We considered settings with p = 8, n = 200, 1000, and . Let the true values be . In a simulation study, Cui [7] observed that the partition-estimating function method was superior to the WLW method when there was a significant correlation between the events in multivariate survival data. We let . Censoring times were generated from a uniform distribution over , and we chose to change the censoring rate. Each configuration had 1000 replications.
We assess the performance of our method using the model error (ME), similar to Fan and Li [16].
(11)
In addition, we use the oracle estimator to define the relative model error (RME) as follows:
(12)
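With the true coefficient vector and the covariate second-moment matrix in hand, the ME and RME in (11) and (12) can be computed directly. The quadratic-form version below follows the Fan-Li convention and is a sketch: Σ = E[ZZᵀ] must be supplied or estimated, and the names are illustrative:

```python
import numpy as np

def model_error(beta_hat, beta0, sigma):
    """Fan-Li-style model error (beta_hat - beta0)' Sigma (beta_hat - beta0),
    with Sigma = E[Z Z'] the covariate second-moment matrix."""
    d = np.asarray(beta_hat) - np.asarray(beta0)
    return float(d @ sigma @ d)

def relative_model_error(beta_hat, beta_oracle, beta0, sigma):
    """RME: model error of a method relative to the oracle estimator."""
    return model_error(beta_hat, beta0, sigma) / model_error(beta_oracle, beta0, sigma)
```

An RME above 1 means the method's estimation error exceeds the oracle's; the medians of such ratios over replications are what Table 1 reports.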
We compared our method (PPEE) with the WLW-with-LASSO and WLW-with-SCAD methods. The results for 1000 simulated datasets are given in Table 1. The column labeled "RME" provides the median of the 1000 RMEs, the column labeled "C" reports the average number of coefficients correctly estimated as 0, and the column "IC" presents the average number of coefficients erroneously estimated as 0. From Table 1, we can see that our method performs well in terms of variable selection. When there is a high correlation between and , our method performs particularly well; when the correlation is weak, its performance is only slightly better than that of WLW+SCAD but still much better than that of WLW+LASSO. This indicates that our method performs better the greater the correlation between and . In practical analyses, there is some correlation between failure times in multivariate survival data, which is precisely the advantage of our method over classical methods. In addition, none of the methods set the coefficients of significant variables to 0, for either sample size; thus, there is no under-fitting.
6. The Colon Cancer Study
In this section, we report the results of applying our method to the dataset collected in the Colon Cancer Study [25]. This study was initiated in the 1980s and included 929 patients with stage C disease randomly assigned to observation, levamisole alone, or levamisole combined with fluorouracil. The observation (Obs), levamisole alone (Lev), and levamisole combined with fluorouracil (Lev + 5-FU) groups comprise 315, 310, and 304 patients, respectively. There are multiple failure outcomes, such as the time to cancer recurrence and the survival time. By the end of the study, 155 patients in Obs, 144 in Lev, and 103 in Lev + 5-FU had experienced recurrences, and 114, 109, and 78 had died, respectively. We are interested in the following risk factors: sex, where 1 is male and 0 is female; age; obstruction of the colon by the tumor (obstruct); adherence to nearby organs (adhere); differentiation of the tumor (differ); perforation of the colon (perfor); number of lymph nodes with detectable cancer (nodes); extent of local spread (extent); more than four positive lymph nodes (node4), coded as 1 if true and 0 otherwise; and time from surgery to registration (surg), coded as 1 for long and 0 for short. In addition, similar to Lin [2], we created two dummy variables as follows: Lev, coded as 1 for the levamisole-alone treatment group and 0 for others, and Lev + 5FU, coded as 1 for the levamisole combined with the fluorouracil treatment group and 0 otherwise.
Table 2 shows the estimated coefficients and standard errors for the Colon Cancer Study using different methods, including the unpenalized method (UNM), LASSO, and our method (PPEE). From Table 2, we can see that our method only keeps five significant variables. In the column “LASSO”, we observe that certain variables may have an impact on one failure event but not on another. According to our method, “Lev + 5FU”, “Extent”, and “Node4” all have a significant impact on the two failure events. “Lev + 5FU” has a negative impact on both death and recurrence, which is consistent with Moertel’s study [26]; that is, levamisole combined with fluorouracil is effective in reducing the mortality rate of colon cancer. In addition, Lin [2] found that “extent” and “node4” increased the risk of colon cancer, and our method is consistent with this.
7. Discussion and Conclusions
We proposed a penalized partition-estimating equation for variable selection in multivariate survival data. The partition-estimating equation was originally proposed by Cui [7]; we developed this method into the penalized partition-estimating equation, which simultaneously estimates the parameters and selects variables. Compared with the classic Cox regression method, our method selects variables and estimates coefficients more effectively, as reflected in our simulation experiments. Moreover, our method makes use of the dependency information among survival times and performs better the greater the correlation between failure times, which is an advantage over classical methods, since in practice the failure times in multivariate survival data are correlated to some extent. Finally, we proved the asymptotic and oracle properties of the proposed method.
Future studies can extend this work. Our method performed well when the failure events were strongly correlated, so one future task is to examine whether it still performs well when there is no obvious correlation between events and, if not, to explore how this problem can be solved. Another interesting direction is ultrahigh-dimensional variable selection.
Methodology, W.C.; Validation, W.C. and W.T.; formal analysis, W.C. and W.T.; writing—original draft preparation, W.T.; writing—review and editing, W.C. All authors have read and agreed to the published version of the manuscript.
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
The authors declare no conflicts of interest.
Figure 1. Mean squared error (MSE) of parameter estimates for different numbers of partitions.
Table 1. Relative model error (RME) and average numbers of coefficients correctly (C) and erroneously (IC) estimated as zero.

Method | q | RME (c = 1) | C (c = 1) | IC (c = 1) | RME (c = 5) | C (c = 5) | IC (c = 5)
---|---|---|---|---|---|---|---
n = 200 | | | | | | |
LASSO | 0.98 | 0.473 | 10.458 | 0.036 | 0.531 | 10.513 | 0.006
SCAD | 0.98 | 0.677 | 10.703 | 0.024 | 0.745 | 10.783 | 0.004
PPEE | 0.98 | 0.733 | 10.712 | 0.024 | 0.801 | 10.785 | 0.004
LASSO | 0.8 | 0.469 | 10.471 | 0.030 | 0.518 | 10.543 | 0.005
SCAD | 0.8 | 0.691 | 10.710 | 0.030 | 0.753 | 10.770 | 0.006
PPEE | 0.8 | 0.703 | 10.705 | 0.036 | 0.783 | 10.776 | 0.004
LASSO | 0.25 | 0.479 | 10.443 | 0.036 | 0.520 | 10.412 | 0.004
SCAD | 0.25 | 0.682 | 10.702 | 0.026 | 0.747 | 10.773 | 0.002
PPEE | 0.25 | 0.691 | 10.719 | 0.024 | 0.756 | 10.775 | 0.002
n = 1000 | | | | | | |
LASSO | 0.98 | 0.583 | 10.498 | 0.000 | 0.612 | 10.532 | 0.000
SCAD | 0.98 | 0.752 | 10.811 | 0.000 | 0.796 | 10.848 | 0.000
PPEE | 0.98 | 0.813 | 10.819 | 0.000 | 0.857 | 10.855 | 0.000
LASSO | 0.8 | 0.590 | 10.473 | 0.000 | 0.593 | 10.528 | 0.000
SCAD | 0.8 | 0.743 | 10.802 | 0.000 | 0.790 | 10.842 | 0.000
PPEE | 0.8 | 0.792 | 10.811 | 0.000 | 0.824 | 10.847 | 0.000
LASSO | 0.25 | 0.591 | 10.482 | 0.000 | 0.588 | 10.541 | 0.000
SCAD | 0.25 | 0.755 | 10.807 | 0.000 | 0.781 | 10.837 | 0.000
PPEE | 0.25 | 0.761 | 10.811 | 0.000 | 0.792 | 10.841 | 0.000
Table 2. Estimated coefficients and standard errors for the Colon Cancer Study using different methods.
Effect | UNM | LASSO | SCAD | PPEE |
---|---|---|---|---|
Recurrence | ||||
Lev | −0.026 (0.111) | 0.000 | 0.000 | 0.000 |
Lev + 5FU | −0.499 (0.122) | −0.441 (0.108) | −0.416 (0.108) | −0.428 (0.107) |
Sex | −0.138 (0.096) | 0.000 | 0.000 | 0.000 |
Age | −0.003 (0.004) | 0.000 | 0.000 | 0.000 |
Obstruct | 0.194 (0.119) | 0.061 (0.095) | 0.050 (0.134) | 0.048 (0.103) |
Perfor | 0.211 (0.257) | 0.000 | 0.000 | 0.000 |
Adhere | 0.161 (0.130) | 0.028 (0.137) | 0.028 (0.136) | 0.000 |
Nodes | 0.038 (0.015) | 0.037 (0.017) | 0.000 | 0.000 |
Differ | 0.153 (0.098) | 0.118 (0.108) | 0.036 (0.106) | 0.024 (0.105) |
Extent | 0.451 (0.119) | 0.414 (0.120) | 0.393 (0.119) | 0.532 (0.116) |
Surg | 0.240 (0.104) | 0.072 (0.110) | 0.084 (0.108) | 0.000 |
Node4 | 0.591 (0.141) | 0.641 (0.103) | 0.772 (0.146) | 0.751 (0.106) |
Death | ||||
Lev | −0.041 (0.114) | 0.000 | 0.000 | 0.000 |
Lev + 5FU | −0.362 (0.122) | −0.294 (0.109) | −0.209 (0.108) | −0.226 (0.107) |
Sex | 0.007 (0.097) | 0.000 | 0.000 | 0.000 |
Age | 0.008 (0.004) | 0.006 (0.004) | 0.002 (0.004) | 0.000 |
Obstruct | 0.269 (0.120) | 0.118 (0.135) | 0.098 (0.135) | 0.094 (0.131) |
Perfor | 0.017 (0.270) | 0.000 | 0.000 | 0.000 |
Adhere | 0.170 (0.131) | 0.138 (0.145) | 0.130 (0.141) | 0.135 (0.126) |
Nodes | 0.044 (0.015) | 0.043 (0.014) | 0.000 | 0.000 |
Differ | 0.138 (0.101) | 0.106 (0.110) | 0.007 (0.106) | 0.003 (0.110) |
Extent | 0.446 (0.118) | 0.420 (0.114) | 0.377 (0.111) | 0.427 (0.112) |
Surg | 0.240 (0.106) | 0.021 (0.113) | 0.079 (0.110) | 0.000 |
Node4 | 0.667 (0.143) | 0.641 (0.128) | 0.657 (0.143) | 0.899 (0.153) |
Appendix A. Proofs
For simplicity, we let
To prove Theorem 2, it is sufficient to show that
This means that the probability of a local maximum existing in the ball
Note that
Firstly, we consider
According to the Taylor expansion,
Next, we consider
From Lemma 1, we can obtain
Consequently,
Thus,
Using the Cauchy–Schwarz inequality,
Furthermore,
To prove Theorem 3, we introduce a lemma first.
Lemma A1. Under the conditions of Theorem 3, with a probability tending to 1, for any given
It is sufficient to show that for any
Using the Taylor expansion, we have
Next, we consider
Thus,
Given the assumption that
We can immediately prove part 1 through Lemma A1. To prove part 2, it is sufficient to show that
According to condition (F) and Cui [
Thus, we need to prove that (
Using the Taylor expansion to
From condition (I), it follows that
Therefore, (
1. Liang, K.Y.; Self, S.G.; Chang, Y.C. Modelling marginal hazards in multivariate failure time data. J. R. Stat. Soc. Ser. B Stat. Methodol.; 1993; 55, pp. 441-453. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1993.tb01914.x]
2. Lin, D.Y. Cox regression analysis of multivariate failure time data: The marginal approach. Stat. Med.; 1994; 13, pp. 2233-2247. [DOI: https://dx.doi.org/10.1002/sim.4780132105] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/7846422]
3. Spiekerman, C.F.; Lin, D.Y. Marginal regression models for multivariate failure time data. J. Am. Stat. Assoc.; 1998; 93, pp. 1164-1175. [DOI: https://dx.doi.org/10.1080/01621459.1998.10473777]
4. Wei, L.J.; Lin, D.Y.; Weissfeld, L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Am. Stat. Assoc.; 1989; 84, pp. 1065-1073. [DOI: https://dx.doi.org/10.1080/01621459.1989.10478873]
5. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B Methodol.; 1972; 34, pp. 187-202. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1972.tb00899.x]
6. Cox, D.R. Partial likelihood. Biometrika; 1975; 62, pp. 269-276. [DOI: https://dx.doi.org/10.1093/biomet/62.2.269]
7. Cui, W.Q.; Ying, Z.L.; Zhao, L.C. A simple construction of optimal estimation in multivariate marginal Cox regression. Sci. China Math.; 2012; 55, pp. 1827-1857. [DOI: https://dx.doi.org/10.1007/s11425-012-4400-4]
8. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med.; 1997; 16, pp. 385-395. [DOI: https://dx.doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3]
9. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc.; 2006; 101, pp. 1418-1429. [DOI: https://dx.doi.org/10.1198/016214506000000735]
10. Zhang, H.H.; Lu, W. Adaptive Lasso for Cox’s proportional hazards model. Biometrika; 2007; 94, pp. 691-703. [DOI: https://dx.doi.org/10.1093/biomet/asm037]
11. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Statist.; 2010; 38, pp. 894-942. [DOI: https://dx.doi.org/10.1214/09-AOS729] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17244211]
12. Cai, J.W.; Fan, J.Q.; Li, R.Z.; Zhou, H.B. Variable selection for multivariate failure time data. Biometrika; 2005; 92, pp. 303-316. [DOI: https://dx.doi.org/10.1093/biomet/92.2.303] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19458784]
13. Liu, J.C.; Zhang, R.Q.; Zhao, W.H.; Lv, Y.Z. Variable selection in semiparametric hazard regression for multivariate survival data. J. Multivar. Anal.; 2015; 142, pp. 26-40. [DOI: https://dx.doi.org/10.1016/j.jmva.2015.07.015]
14. Sun, L.; Li, S.; Wang, L.; Song, X.; Sui, X. Simultaneous variable selection in regression analysis of multivariate interval-censored data. Biometrics; 2022; 78, pp. 1402-1413. [DOI: https://dx.doi.org/10.1111/biom.13548]
15. Fan, J.Q.; Li, R.Z. Variable Selection via Penalized Likelihood; Department of Statistics, UCLA: Los Angeles, CA, USA, 1999.
16. Fan, J.Q.; Li, R.Z. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat.; 2002; 30, pp. 74-99. [DOI: https://dx.doi.org/10.1214/aos/1015362185]
17. Andersen, P.K.; Gill, R.D. Cox’s regression model for counting processes: A large sample study. Ann. Stat.; 1982; 10, pp. 1100-1120. [DOI: https://dx.doi.org/10.1214/aos/1176345976]
18. Cai, J.W. Hypothesis testing of hazard ratio parameters in marginal models for multivariate failure time data. Lifetime Data Anal.; 1999; 5, pp. 39-53. [DOI: https://dx.doi.org/10.1023/A:1009679032314]
19. Cui, W.Q. Analysis of Multivariate Survival Data by Marginal Proportional Hazards Regression Models. Ph.D. Thesis; University of Science and Technology of China: Hefei, China, 2004; (In Chinese)
20. Fan, J.Q.; Li, R.Z. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc.; 2004; 99, pp. 710-723. [DOI: https://dx.doi.org/10.1198/016214504000001060]
21. Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc.; 2001; 96, pp. 1348-1360. [DOI: https://dx.doi.org/10.1198/016214501753382273]
22. Cai, K. Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics. Ph.D. Thesis; University of Calgary: Calgary, AB, Canada, 2019.
23. Raftery, A.E. A continuous multivariate exponential distribution. Commun. Stat. Theory Methods; 1984; 13, pp. 947-965. [DOI: https://dx.doi.org/10.1080/03610928408828733]
24. Raftery, A.E. Some properties of a new continuous bivariate exponential distribution. Stat. Decis. Suppl. Issue; 1985; 2, pp. 53-58.
25. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Goodman, P.J.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.
26. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Tangen, C.M.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In this paper, we propose a new variable selection method using a partitioning-based estimating equation for multivariate survival data to simultaneously perform variable selection and parameter estimation. The main idea of the partitioning-based estimating equation is to partition the score function into small blocks. We construct our method using the SCAD penalty function and achieve the purpose of directly selecting variables through the estimating equation. We further establish asymptotic normality and prove that our method achieves the oracle property. Moreover, we use a simple approximation of the penalty function such that our method can be implemented algorithmically. We conducted simulation studies to validate the performance of our method and analyzed the dataset from the Colon Cancer Study.