1. Introduction
The semiparametric partially linear model (PLM), which was proposed by [1], has attracted extensive attention in statistics because it combines the interpretability of the linear model with the flexibility of the nonparametric model. A large body of literature has explored estimation methods for this model, covering both the parametric part and the nonparametric function; see, e.g., [2,3,4,5]. The premise of these methods is that the model is correctly specified. However, in real data analysis, researchers typically collect a large number of variables and are unsure which of them belong in the true model. This kind of uncertainty is generally referred to as model uncertainty, and it complicates statistical analysis considerably.
Refs. [6,7,8] pointed out that model selection and model averaging are two mainstream methods to deal with model uncertainty. Model selection, which has a long history, selects a model from a series of candidate models through selection criteria, for example, the Akaike information criterion (AIC [9]), the Bayesian information criterion (BIC [10]), the focused information criterion (FIC [11]), and so on. In addition, shrinkage-estimation-based variable selection has also been applied to determine which variables are needed to build a PLM (see, e.g., [12,13,14], among others). These model-selection methods can be viewed as combining a series of candidate models by assigning a weight of 1 to the selected model and 0 to all others.
As an important alternative to model selection, model averaging incorporates model uncertainty into statistical analysis by assigning nonzero weights to a set of candidate models, which frequently leads to more effective results (see [15]). Bayesian model averaging, an important branch of model averaging, has been fully developed over the past decades; see [16] for details. In the current paper, we focus on model-averaging approaches for the PLM from a frequentist perspective. Since ref. [17] pioneered the use of the Mallows criterion for weight choice in model averaging, research on asymptotically optimal model averaging has grown rapidly. Different kinds of optimal model-averaging methods have been proposed, including jackknife model averaging (JMA [7]), Kullback–Leibler model averaging [18], generalized least-squares model averaging [19], leave-subject-out cross-validation [20], K-fold cross-validation [21], and so on. In addition, optimal model-averaging methods have been extended to quantile regression [22], semiparametric models [23,24], missing data [25,26], functional data [27], measurement error data [28], and high-dimensional data [29,30].
Censored data are ubiquitous in biomedicine, industry, econometrics, and other fields. For example, in biomedicine, when some sampled individuals are lost to follow-up before the end of the study or drop out during the study, the survival time is subject to censoring. Compared with complete data, censoring makes some observations only partially observable, which increases the difficulty of statistical analysis. Although an extensive body of literature has developed estimation methods for censored data, such as [31,32], there is little work on model-averaging approaches with censored data. Based on the FIC advocated in [11], refs. [33,34,35] developed model-averaging methods for different regression models with censored data under the local misspecification framework, where the weights of the model-averaging estimators were constructed from information criterion values rather than selected in a data-driven fashion. Moreover, this framework requires the distance between each candidate model and the true model to be $O(n^{-1/2})$. This means every candidate model is close to the true model when the sample size is large, which is unrealistic.
Without the local misspecification framework, ref. [36] constructed an optimal model-averaging estimator for a high-dimensional linear model with censored data by adapting a leave-one-out cross-validation criterion. Ref. [37] studied a Mallows model-averaging method for linear models with censored responses, and the resulting model-averaging estimator was proved to be asymptotically optimal in terms of minimizing the squared error loss. The optimal model-averaging methods for censored data mentioned above are based on classical linear models. The primary objective of the current paper is to construct an optimal model-averaging estimator for the semiparametric PLM with censored responses, in which the weight vector is selected by minimizing a leave-one-out cross-validation criterion. Compared with [36,37], we confront two major challenges. First, the nonparametric function in the PLM significantly complicates the construction of the model-averaging estimator and the development of the weight choice criterion. Second, our proof of optimality cannot follow the approach used for the linear model, since those proof techniques do not apply directly when a nonparametric part is present.
The plan of this article is as follows. In Section 2, we describe the model setup and introduce the parametric estimation method for the candidate PLMs. Section 3 constructs the model-averaging estimator and proposes a weight choice criterion. Section 4 establishes the asymptotic optimality of the model-averaging estimator. Section 5 explores the finite sample performance of our method through a simulation study. Section 6 applies the proposed method to real datasets. Section 7 gives some conclusions. Proofs are given in Appendix A.
2. Model Setup and Parametric Estimation
To facilitate presentation, we first list the basic notations used in this paper in Table 1. Then, we consider the following PLM:
$$Y_i = X_i^{\top}\beta + g(U_i) + \varepsilon_i, \quad i = 1, \dots, n, \qquad (1)$$
where $Y$ is a response variable with a continuous distribution function $F$, the countably infinite covariate vector $X$ is linearly related to $Y$ with coefficient vector $\beta$, $U$ is a covariate nonlinearly related to $Y$, $g(\cdot)$ is an unknown smooth function, and $\varepsilon$ is the model error with $E(\varepsilon \mid X, U) = 0$. The covariate $U$ is distributed on a compact interval. Without loss of generality, we take this interval to be $[0, 1]$. The conditional expectation of the response is denoted as $\mu = E(Y \mid X, U)$. In survival analysis, we assume $Y$ to be a known monotonic transformation of the survival time $T$, for example, the commonly used logarithm $Y = \log T$. $Y$ may be censored by a censoring time $C$ and hence cannot be observed completely. We consider a sample of independent observations $\{(Z_i, \delta_i, X_i, U_i)\}$ for $i = 1, \dots, n$, where $Z_i = \min(Y_i, C_i)$ and $\delta_i = I(Y_i \le C_i)$ is the censoring indicator.
Let $G$ be the cumulative distribution function of the censoring time $C$, and let $Y_{iG} = \delta_i Z_i / \{1 - G(Z_i)\}$ be a synthetic response. Then, from [38], it is not difficult to verify that $E(Y_{iG} \mid X_i, U_i) = E(Y_i \mid X_i, U_i) = \mu_i$. Therefore, under model (1), we obtain
$$Y_{iG} = X_i^{\top}\beta + g(U_i) + e_i, \quad i = 1, \dots, n, \qquad (2)$$
where $e_i = Y_{iG} - \mu_i$ and $E(e_i \mid X_i, U_i) = 0$. Model (2) can be expressed in matrix form as
$$Y_G = \mu + e, \qquad (3)$$
where $Y_G = (Y_{1G}, \dots, Y_{nG})^{\top}$ is an n-dimensional synthetic response vector, the conditional mean $\mu = (\mu_1, \dots, \mu_n)^{\top}$ is an n-dimensional vector, $X = (X_1, \dots, X_n)^{\top}$ is the linear covariate matrix, $g = (g(U_1), \dots, g(U_n))^{\top}$, and $e = (e_1, \dots, e_n)^{\top}$ is an n-dimensional error vector satisfying $E(e \mid X, U) = 0$ and $E(ee^{\top} \mid X, U) = \Omega$. Assume that we have a total of $S$ candidate PLMs to approximate the true data-generating process, where $S$ is allowed to go to infinity. Suppose the sth candidate model is
$$Y_G = X_{(s)}\beta_{(s)} + g_{(s)} + e_{(s)}, \qquad (4)$$
where $X_{(s)}$ is an $n \times p_s$ covariate matrix which includes $p_s$ columns of $X$ and has full column rank, $\beta_{(s)}$ is the corresponding unknown linear regression coefficient vector, $g_{(s)}$ is an unknown nonparametric function vector, and $e_{(s)}$ is the model error. To obtain the estimator of $\mu$ under the sth candidate model, we should first estimate the coefficient vector $\beta_{(s)}$ and the nonparametric function vector $g_{(s)}$. There are many estimation methods for model (4), including kernel smoothing, polynomial spline smoothing, and so on. Recently, ref. [39] pointed out that using B-splines to approximate nonparametric functions has great advantages in the model-averaging setting; therefore, in this paper, we adopt the spline technique to estimate the unknowns in model (4).
Denote $0 = u_0 < u_1 < \cdots < u_{K_n} < u_{K_n+1} = 1$ as a partition of $[0, 1]$, where $K_n$ is the number of interior knots. Let $\mathcal{S}_n$ be the polynomial spline space on the interval $[0, 1]$ of degree $r$. From [40], the nonparametric function in the PLM can be well estimated by a B-spline expansion. Then, for $s = 1, \dots, S$, one can write
$$g_{(s)}(u) \approx B_{(s)}(u)^{\top}\gamma_{(s)}, \qquad (5)$$
where $B_{(s)}(u) = (B_{s,1}(u), \dots, B_{s,q_n}(u))^{\top}$ is the normalized B-spline basis function vector in the sth candidate model, $q_n = K_n + r + 1$, and $\gamma_{(s)}$ is the vector of spline coefficients. Define the matrix $B_{(s)} = (B_{(s)}(U_1), \dots, B_{(s)}(U_n))^{\top}$. Therefore, there exists a design matrix $Z_{(s)} = (X_{(s)}, B_{(s)})$ and the corresponding unknown parameter vector $\theta_{(s)} = (\beta_{(s)}^{\top}, \gamma_{(s)}^{\top})^{\top}$ such that
$$\mu \approx Z_{(s)}\theta_{(s)}, \qquad (6)$$
where $Z_{(s)}$ is supposed to be of full column rank. By regressing $Y_G$ on $Z_{(s)}$, we can obtain the least-squares estimators of $\beta_{(s)}$ and $\gamma_{(s)}$:
$$\hat{\beta}_{(s)} = \{X_{(s)}^{\top}(I_n - P_{B,(s)})X_{(s)}\}^{-1} X_{(s)}^{\top}(I_n - P_{B,(s)})Y_G \qquad (7)$$
and
$$\hat{\gamma}_{(s)} = (B_{(s)}^{\top}B_{(s)})^{-1}B_{(s)}^{\top}(Y_G - X_{(s)}\hat{\beta}_{(s)}), \qquad (8)$$
where $P_{B,(s)} = B_{(s)}(B_{(s)}^{\top}B_{(s)})^{-1}B_{(s)}^{\top}$. Therefore, the estimator of $\mu$ under the sth candidate model is given by
$$\hat{\mu}_{(s)} = Z_{(s)}(Z_{(s)}^{\top}Z_{(s)})^{-1}Z_{(s)}^{\top}Y_G = P_{(s)}Y_G, \qquad (9)$$
where $P_{(s)} = Z_{(s)}(Z_{(s)}^{\top}Z_{(s)})^{-1}Z_{(s)}^{\top}$. From Equation (9), we find that $\hat{\mu}_{(s)}$ is linearly dependent on $Y_G$.
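To make the estimation steps above concrete, the following R sketch fits one candidate model along the lines of Equations (5)–(9), assuming a synthetic response vector yG (its feasible construction appears in Section 3). This is a minimal illustration, not the authors' code: the function name fit_candidate, the default of one interior knot, and the use of splines::bs() for the basis are our own choices.

```r
## Sketch: least-squares fit of one candidate PLM, Eqs. (5)-(9).
## `yG`: synthetic responses; `Xs`: n x p_s candidate covariate matrix;
## `u`: nonparametric covariate on [0, 1]. Names are ours, not the paper's.
library(splines)

fit_candidate <- function(yG, Xs, u, n_knots = 1, degree = 3) {
  # B-spline basis B_(s) with `n_knots` interior knots (Eq. (5))
  knots <- seq(0, 1, length.out = n_knots + 2)[-c(1, n_knots + 2)]
  B <- bs(u, knots = knots, degree = degree, intercept = TRUE)
  Z <- cbind(Xs, B)                      # design matrix Z_(s) of Eq. (6)
  ZtZ_inv <- solve(crossprod(Z))
  theta <- ZtZ_inv %*% crossprod(Z, yG)  # joint LS estimator (Eqs. (7)-(8))
  P <- Z %*% ZtZ_inv %*% t(Z)            # hat matrix P_(s)
  list(mu_hat = P %*% yG,                # fitted mean, Eq. (9)
       h = diag(P),                      # leverages, used later in Eq. (16)
       beta = theta[seq_len(ncol(Xs))])
}
```

The returned leverages diag(P_(s)) are exactly the quantities needed for the computational shortcut (16) below.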
3. Model-Averaging Estimator and Weight Choice Criterion

Let $w = (w_1, \dots, w_S)^{\top}$ be a weight vector belonging to the set $W = \{w \in [0, 1]^S : \sum_{s=1}^S w_s = 1\}$; then the model-averaging estimator of $\mu$ can be formulated as
$$\hat{\mu}(w) = \sum_{s=1}^S w_s \hat{\mu}_{(s)} = P(w)Y_G, \qquad (10)$$
where $P(w) = \sum_{s=1}^S w_s P_{(s)}$. Motivated by [7], we propose a leave-one-out cross-validation criterion to select the weight vector for Equation (10) in the PLM framework. Let $Z_{(s),-i}$ and $Y_{G,-i}$ be the matrix $Z_{(s)}$ and the vector $Y_G$ with the ith row deleted. The leave-one-out estimator of $\mu_i$ in the sth candidate model is given by
$$\tilde{\mu}_{(s),i} = Z_{(s),i}^{\top}(Z_{(s),-i}^{\top}Z_{(s),-i})^{-1}Z_{(s),-i}^{\top}Y_{G,-i}, \qquad (11)$$
where $Z_{(s),i}^{\top}$ denotes the ith row of $Z_{(s)}$. Denote the sth jackknife estimator and the corresponding jackknife version of the averaging estimator as $\tilde{\mu}_{(s)} = (\tilde{\mu}_{(s),1}, \dots, \tilde{\mu}_{(s),n})^{\top}$ and $\tilde{\mu}(w) = \sum_{s=1}^S w_s \tilde{\mu}_{(s)}$. The leave-one-out cross-validation weight choice criterion is
$$CV(w) = \frac{1}{n}\|Y_G - \tilde{\mu}(w)\|^2, \qquad (12)$$
then minimizing $CV(w)$ over the space $W$ yields the optimal weight vector. However, in practice, such a minimization is computationally infeasible because the cumulative distribution function $G$ in Equation (12) is unknown and needs to be estimated. Similar to [41], we can estimate $G$ by the commonly used Kaplan–Meier estimator
$$1 - \hat{G}(z) = \prod_{i:\, Z_{(i)} \le z} \left(\frac{n-i}{n-i+1}\right)^{1-\delta_{(i)}}, \qquad (13)$$
where $Z_{(1)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, \dots, Z_n$, and $\delta_{(i)}$ is the indicator corresponding to $Z_{(i)}$. In what follows, a letter subscripted by $\hat{G}$ denotes that it is obtained by replacing $G$ in its corresponding estimator with $\hat{G}$. For instance, $Y_{\hat{G}}$ is obtained by replacing $G$ with its estimator $\hat{G}$ in $Y_G$. Then a feasible counterpart of $CV(w)$ is given by
$$CV_{\hat{G}}(w) = \frac{1}{n}\|Y_{\hat{G}} - \tilde{\mu}_{\hat{G}}(w)\|^2, \qquad (14)$$
where $Y_{\hat{G}} = (Y_{1\hat{G}}, \dots, Y_{n\hat{G}})^{\top}$ and $\tilde{\mu}_{\hat{G}}(w) = \sum_{s=1}^S w_s \tilde{\mu}_{\hat{G},(s)}$. Minimizing $CV_{\hat{G}}(w)$ with respect to $w$ over the set $W$ leads to the jackknife choice of weight vector
$$\hat{w} = \arg\min_{w \in W} CV_{\hat{G}}(w). \qquad (15)$$
Plugging $\hat{w}$ and $\hat{G}$ into Equation (10) yields the model-averaging estimator of $\mu$, written as $\hat{\mu}_{\hat{G}}(\hat{w})$, which is named the censored partially linear model-averaging (CPLMA) estimator hereafter.
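As a minimal sketch of how the feasible quantities can be computed, the following R function builds the Kaplan–Meier estimator (13) of the censoring distribution and the resulting synthetic responses. The helper name and the safeguard against division by zero are ours.

```r
## Sketch: Kaplan-Meier estimator of G (Eq. (13)) and the feasible synthetic
## responses Y_{i,Ghat}. `z`: observed times; `delta`: censoring indicators.
synthetic_response <- function(z, delta) {
  n <- length(z)
  ord <- order(z)
  d_o <- delta[ord]
  # 1 - Ghat(Z_(i)) = prod_{j <= i} ((n-j)/(n-j+1))^{1 - delta_(j)}
  surv_sorted <- cumprod(((n - seq_len(n)) / (n - seq_len(n) + 1))^(1 - d_o))
  one_minus_G <- numeric(n)
  one_minus_G[ord] <- surv_sorted
  # Y_{i,Ghat} = delta_i * Z_i / {1 - Ghat(Z_i)}; censored cases contribute 0
  ifelse(delta == 1, z / pmax(one_minus_G, .Machine$double.eps), 0)
}
```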
However, minimizing the weight choice criterion (14) is not easy because the computation of $\tilde{\mu}_{\hat{G},(s)}$ requires n separate regressions, which is especially cumbersome when the number of candidate models and the sample size are large. Motivated by the computationally efficient cross-validation criterion introduced by [7] for the linear regression model, we express $\tilde{\mu}_{(s)}$ in a simple form which yields an enormous reduction in computation time. Let $p_{(s),ii}$ be the ith diagonal entry of $P_{(s)}$. From [20,42], $\tilde{\mu}_{(s)}$ can be conveniently written as
$$\tilde{\mu}_{(s)} = \tilde{P}_{(s)}Y_G, \qquad (16)$$
where $\tilde{P}_{(s)} = I_n - D_{(s)}(I_n - P_{(s)})$, $D_{(s)} = \mathrm{diag}\{(1 - p_{(s),11})^{-1}, \dots, (1 - p_{(s),nn})^{-1}\}$, and $I_n$ is the $n \times n$ identity matrix. The shortcut formula (16) indicates that all elements of $\tilde{\mu}_{(s)}$ can be calculated simultaneously from all observations, which is much more convenient and time-saving than the standard method based on Equation (11). Let $\tilde{P}(w) = \sum_{s=1}^S w_s \tilde{P}_{(s)}$ and $\tilde{e}_{\hat{G},(s)} = Y_{\hat{G}} - \tilde{\mu}_{\hat{G},(s)}$. The corresponding computational shortcut formula for the feasible jackknife criterion (14) then follows as
$$CV_{\hat{G}}(w) = \frac{1}{n}\, w^{\top}\tilde{E}_{\hat{G}}^{\top}\tilde{E}_{\hat{G}}\, w, \qquad (17)$$
where $\tilde{E}_{\hat{G}} = (\tilde{e}_{\hat{G},(1)}, \dots, \tilde{e}_{\hat{G},(S)})$ is an $n \times S$ matrix. From Equation (17), we observe that the minimization of $CV_{\hat{G}}(w)$ is a standard quadratic programming problem, which can be performed by various existing software packages, for example, the quadprog package in R [43].
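The following R sketch assembles the complete weight-selection step, assuming a list of candidate fits such as those returned by fit_candidate() above: the jackknife residual matrix is formed via the shortcut (16), and problem (15) is solved with quadprog::solve.QP(). The small ridge term added to keep the matrix numerically positive definite is our own safeguard, not part of the criterion.

```r
## Sketch: shortcut jackknife residuals (Eq. (16)) and the QP (15)/(17).
## `fits`: list of candidate fits; `yG`: synthetic response vector.
library(quadprog)

cplma_weights <- function(fits, yG) {
  # sth column: jackknife residuals (yG - mu_hat_s)/(1 - h_s), per Eq. (16)
  E <- sapply(fits, function(f) (yG - f$mu_hat) / (1 - f$h))
  S <- ncol(E)
  D <- crossprod(E) + 1e-8 * diag(S)   # ridge keeps D positive definite
  A <- cbind(rep(1, S), diag(S))       # sum(w) = 1 (equality), w >= 0
  sol <- solve.QP(Dmat = D, dvec = rep(0, S),
                  Amat = A, bvec = c(1, rep(0, S)), meq = 1)
  sol$solution                         # the CPLMA weight vector w-hat
}
```

Because solve.QP minimizes (1/2) w'Dw, any positive rescaling of D (here, dropping the 1/n factor of (17)) leaves the minimizer unchanged.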
4. Asymptotic Optimality

In this section, we demonstrate that the weight vector $\hat{w}$, which is obtained by minimizing the weight choice criterion $CV_{\hat{G}}(w)$, is asymptotically optimal under some mild conditions.
Define the squared loss as $L_n(w) = \|\hat{\mu}_{\hat{G}}(w) - \mu\|^2$, and the corresponding risk function as $R_n(w) = E\{L_n(w) \mid X, U\}$. Let $\xi_n = \inf_{w \in W} R_n(w)$, let $k_s$ denote the column dimension of $Z_{(s)}$ with $\bar{k}_n = \max_{1 \le s \le S} k_s$, and let $w_s^0$ be a vector with the sth entry taking on 1 and the others taking on 0. To prove the asymptotic optimality of the model-averaging estimator $\hat{\mu}_{\hat{G}}(\hat{w})$, we list the following regularity conditions, where all limiting processes correspond to $n \to \infty$.
- (Condition (C.1)) $\tau_F < \tau_G$, where $\tau_L = \inf\{t : L(t) = 1\}$ for any distribution function $L$, and $F$ and $G$ denote the distribution functions of $Y$ and $C$, respectively.
- (Condition (C.2)) $\lambda_{\max}(\Omega) \le \kappa_0 < \infty$, a.s., where $\lambda_{\max}(\cdot)$ denotes the maximum singular value of a matrix, $\Omega = E(ee^{\top} \mid X, U)$, and $\kappa_0$ is a constant.
- (Condition (C.3)) Condition (21) of [45] holds, a.s.; it restricts the rate at which the minimum risk $\xi_n$ may grow relative to the candidate-model risks $R_n(w_s^0)$.
- (Condition (C.4)) The dimensions of the candidate models and of the spline basis grow sufficiently slowly relative to $n$, a.s., in the sense of condition (22) in [45].
- (Condition (C.5)) $n^{-1}\sum_{i=1}^{n}\mu_i^2 \le c_0 < \infty$, a.s., where $c_0$ is a constant.
- (Condition (C.6)) $\max_{1 \le s \le S}\max_{1 \le i \le n} p_{(s),ii} \le c_1 \bar{k}_n / n$, a.s., where $c_1$ is a constant.
- (Condition (C.7)) The function $g$ belongs to a class of functions $\mathcal{G}$ whose $r$th derivative $g^{(r)}$ exists and is Lipschitz of order $v$. That is,
$$|g^{(r)}(u_1) - g^{(r)}(u_2)| \le C_0|u_1 - u_2|^{v} \quad \text{for all } u_1, u_2 \in \mathcal{U}$$
for some positive constant $C_0$, where $\mathcal{U}$ is the support of $U$, $r$ is a nonnegative integer, and $v \in (0, 1]$ such that $p = r + v > 0.5$.
Condition (C.1), which is the same as condition (C5) in [35], is widely used to ensure the uniform convergence of the Kaplan–Meier estimator in studies of censored data. Condition (C.2) imposes a mild restriction on the maximum singular value of the covariance matrix $\Omega$ and is also used by [44]. Condition (C.3), which is from condition (21) of [45], is less restrictive than the condition $S\xi_n^{-2G}\sum_{s=1}^S\{R_n(w_s^0)\}^G \to 0$, a.s., for some constant $G \ge 1$, commonly used in the model-averaging references. Condition (C.4) places constraints on the growth rates of the candidate-model and spline-basis dimensions, similar to condition (22) in [45]. Condition (C.5) bounds the average of the squared conditional means $\mu_i$ and is frequently used in the model-averaging literature, such as [23,24]. Condition (C.6) is a common assumption utilized to guarantee the asymptotic optimality of cross-validation; see [7,24], for instance. Conditions (C.3)–(C.6) require almost sure convergence, which ensures that result (18) holds whether the covariates $X$ and $U$ are random or not. Specifically, when $X$ and $U$ are nonstochastic, we only need to assume convergence in probability in Conditions (C.3)–(C.6); see [46]. Otherwise, we impose almost sure convergence to guarantee that the proof method used in the nonstochastic case remains effective. Condition (C.7) is required for the B-spline approximation in the PLM; see [39,47].
Theorem 1 indicates that the CPLMA estimator proposed in this paper is asymptotically optimal in the sense that its squared error loss is asymptotically equivalent to that of the infeasible best possible model-averaging estimator in the PLM framework. The proof of Theorem 1 is given in Appendix A.
Theorem 1. Under Conditions (C.1)–(C.7), we have
$$\frac{L_n(\hat{w})}{\inf_{w \in W} L_n(w)} \to 1 \qquad (18)$$
in probability as $n \to \infty$.

5. A Simulation Study
In this section, a simulation experiment is conducted to investigate the finite sample performance of the CPLMA estimator, which arises from the proposed leave-one-out cross-validation weight choice approach, in PLM with censored responses. We compare it with several popular information-criterion-based model-selection methods as well as other model-averaging procedures.
5.1. The Design of Simulation
The data-generating process in this part is similar to the infinite-order regression model proposed by [17], except that responses are subject to censoring and a nonparametric function is included in addition to the linear part. Specifically, the data are generated by the following regression model:
$$Y_i = \sum_{j=1}^{\infty}\beta_j x_{ij} + g(U_i) + \varepsilon_i, \qquad (19)$$
where the covariates $x_{ij}$ in the linear component follow a multivariate normal distribution with mean 0 and a decaying covariance between $x_{ij}$ and $x_{ik}$. The coefficients $\beta_j$ of the linear part decay polynomially in $j$ at a rate governed by a parameter $\alpha$; a larger $\alpha$ implies that the coefficients decay more quickly as $j$ increases. The nonparametric function $g$ is a smooth function of $U_i$, where $U_i$ is generated from the uniform distribution. The model error $\varepsilon_i$ follows a normal distribution $N(0, \sigma^2)$. We choose the value of $\sigma^2$ so that $R^2$ varies over a grid from low to high values. In addition, the censoring variable is generated from a uniform distribution whose bounds are selected to yield one of two censoring rates (CR). In order to evaluate the performance of the methods as comprehensively as possible, we consider two designs and set the sample size $n = 50$, 75, 100, 200, 300, and 400; a code sketch of this data-generating process, with illustrative stand-in values, appears after the two designs below.

Design 1 (non-nested setting). The linear part of each candidate model is a subset of the leading covariates (with the remaining covariates ignored), so the candidate models comprise all possible subsets of those covariates.
Design 2 (nested setting). The sth candidate model includes the first $s$ linear variables. The number of candidate models is determined by $S = [3n^{1/3}]$, where $[b]$ denotes the nearest integer to $b$. Therefore, $S = 11$, 13, 14, 18, 20, and 22 for $n = 50$, 75, 100, 200, 300, and 400, respectively.
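The following R sketch mimics the data-generating process (19). The paper's exact coefficient decay, covariate correlation, censoring bounds, and nonparametric function are not recoverable from this text, so the values below (the AR(1)-type correlation rho, the $j^{-\alpha}$ decay, the sine function, and the uniform censoring bound) are illustrative stand-ins only.

```r
## Illustrative stand-in for the DGP (19); parameter values are ours.
gen_data <- function(n, p = 50, alpha = 1, rho = 0.5, cens_upper = 8) {
  Sigma <- rho^abs(outer(1:p, 1:p, "-"))   # AR(1)-type covariate correlation
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  beta <- (1:p)^(-alpha)                   # coefficients decaying in j
  u <- runif(n)
  y <- drop(X %*% beta) + sin(2 * pi * u) + rnorm(n)
  cens <- runif(n, 0, cens_upper)          # uniform censoring times
  list(z = pmin(y, cens), delta = as.numeric(y <= cens), X = X, u = u)
}
```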
5.2. Estimation and Comparison
As suggested by [35], the cubic B-spline is used to approximate the nonparametric function, and the spline basis matrix is generated by the bs() function in the splines package of the R project [48]. To select the number of interior knots, we vary it over a grid and investigate its impact on the risk of the CPLMA estimator under different scenarios. Figure 1 shows how the mean risk varies with the number of knots, over 500 replications, for the four combinations of designs and CRs considered. From Figure 1, we see that in almost all cases 1 knot yields the smallest mean risk, except that in the lower-left panel 1 knot leads to a mean risk second only to 2 knots but best among the remaining choices. In addition, in all cases the mean risk increases with the number of knots once it exceeds 2. This observation coincides with the finding in [39] that a larger number of knots results in a more serious overfitting effect. Therefore, the number of knots is set to 1 in the simulation studies.
We compare the performance of the CPLMA method with two traditional model-selection methods (AIC and BIC) and two model-averaging methods based on smoothed information criterion scores (SAIC and SBIC). For the sth model, we calculate the AIC and BIC scores by
$$\mathrm{AIC}_s = n\log(\hat{\sigma}_{(s)}^2) + 2k_s \quad \text{and} \quad \mathrm{BIC}_s = n\log(\hat{\sigma}_{(s)}^2) + k_s\log n,$$
respectively, where $\hat{\sigma}_{(s)}^2 = n^{-1}\|Y_{\hat{G}} - \hat{\mu}_{\hat{G},(s)}\|^2$, $k_s = \mathrm{tr}(P_{(s)})$, and $\hat{\mu}_{\hat{G},(s)}$ is obtained by replacing $G$ in Equation (9) with $\hat{G}$. Both methods pick the model with the smallest information criterion score. For SAIC and SBIC, the weights of the sth model are defined as
$$w_s^{\mathrm{SAIC}} = \frac{\exp(-\mathrm{AIC}_s/2)}{\sum_{t=1}^S \exp(-\mathrm{AIC}_t/2)} \quad \text{and} \quad w_s^{\mathrm{SBIC}} = \frac{\exp(-\mathrm{BIC}_s/2)}{\sum_{t=1}^S \exp(-\mathrm{BIC}_t/2)},$$
respectively. To evaluate these five methods, we draw 500 independent samples of size $n$ and compute the risk of each estimator of $\mu$. For ease of comparison, the risks of all estimators are normalized by the risk produced by the AIC method.
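For completeness, a short R sketch of the four competing methods, using the conventional score and smoothing formulas stated above; the helper names are ours, and the score formulas should be read as the standard stand-ins rather than the paper's exact expressions.

```r
## Sketch: AIC/BIC selection and SAIC/SBIC smoothed weights.
## `fits`: list of candidate fits from fit_candidate(); `yG`: responses.
ic_weights <- function(fits, yG) {
  n <- length(yG)
  sigma2 <- sapply(fits, function(f) mean((yG - f$mu_hat)^2))
  k <- sapply(fits, function(f) sum(f$h))   # effective dimension tr(P_(s))
  aic <- n * log(sigma2) + 2 * k
  bic <- n * log(sigma2) + log(n) * k
  # exp(-IC/2) smoothing; subtracting min(ic) is a numerical safeguard
  w_from <- function(ic) { w <- exp(-(ic - min(ic)) / 2); w / sum(w) }
  list(aic_pick = which.min(aic), bic_pick = which.min(bic),
       saic = w_from(aic), sbic = w_from(bic))
}
```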
5.3. Results

The simulation results for Design 1 are presented in Figures 2 and 3 for the lower censoring rate and in Figures 4 and 5 for the higher censoring rate. These four figures show that our CPLMA method leads to the smallest risk in most cases, except that both SAIC and SBIC sometimes have marginal advantages over ours when $R^2$ is large, and the advantage of SBIC is more obvious when $n$ is small. In particular, comparing the simulation results across the two values of the decay parameter $\alpha$, we find that the performance of CPLMA is better when $\alpha$ is small. As expected, SAIC and SBIC invariably produce vastly more accurate outcomes than their respective model-selection counterparts.
The simulation results for Design 2 are depicted in Figures 6–9 for the two censoring rates, from which we see that in most cases our proposed CPLMA method still outperforms its rivals in terms of risk. The superiority of the CPLMA method over the other methods is more apparent than in Design 1. Additionally, we find that the BIC-based model-selection and model-averaging estimators have much worse risk performance than the other three estimators when $R^2$ is small, which differs from the simulation results in Design 1. We also note that SAIC and our CPLMA method perform almost equally well when $R^2$ is very large.
In summary, whether the candidate models are nested or not, our proposal, CPLMA, is superior to the traditional model-selection and model-averaging methods for all the combinations of censoring rates and sample sizes considered.
6. Real Data Analysis
In this section, we apply the proposed CPLMA method to analyze two real datasets using R. The first dataset can be found in the R package "survival" [49], and the other is publicly available online.
6.1. Primary Biliary Cirrhosis Dataset Study
The primary biliary cirrhosis (PBC) dataset includes information on 424 patients, collected at the Mayo Clinic from January 1974 to May 1984, and has been extensively explored by [34,35,37,50,51]. Following the related literature, we restrict our attention to the 276 patients without missing observations, for each of whom 17 covariates are recorded. There are 111 deaths among the 276 patients, which leads to a censoring rate of about 60%.
In this dataset, the dependent variable is the log number of days between registration and the earlier of death or the study analysis time in 1986. The 17 covariates include age (in years), albumin (serum albumin in g/dL), alk.phos (alkaline phosphatase in U/L), bili (serum bilirubin in mg/dL), chol (serum cholesterol in mg/dL), copper (urine copper in ug/day), platelet (platelet count), protime (standardized blood clotting time in seconds), ast (aspartate aminotransferase, once called SGOT, in U/mL), trig (triglycerides in mg/dL), ascites (presence of ascites: no, yes), edema (no; yes, but responded to diuretic treatment; yes, did not respond to treatment), hepato (presence of hepatomegaly: no, yes), sex (male, female), spiders (presence of spiders: no, yes), stage (histologic stage of disease, graded 1, 2, 3, or 4), and trt (treatment code: D-penicillamine, placebo). The first 10 variables are continuous and are standardized to have mean 0 and variance 1 in the analysis.
A total of 17 covariates leads to a huge number of candidate models, which brings a heavy computational burden. Ref. [50] pointed out that only eight covariates, that is, age, edema, bili, albumin, copper, ast, protime, and stage, have a significant impact on the response variable, and ref. [35] found that albumin has a functional impact on the response variable. Thus, we only consider these eight significant covariates. Specifically, we assign albumin to the nonparametric part while including the others in the linear part of the PLM, and we run model selection and averaging over the covariates in the linear part. Accordingly, all subsets of the seven linear covariates are prepared as candidate models. Similar to [35], we also use the cubic B-spline with two knots to approximate the nonparametric component.
To evaluate the prediction performance of the two model-selection methods (AIC and BIC) and three model-averaging methods (SAIC, SBIC, and CPLMA), we randomly separate the data into a training sample and a test sample. Let $n_1$ be the size of the training sample and $n_2 = 276 - n_1$ the size of the test sample. We set $n_1$ to 140, 160, 180, 200, 220, and 240. The mean-squared prediction error (MSPE) is used to describe the out-of-sample prediction performance of the proposed CPLMA and its competitors. We further calculate the mean and the median of the MSPE for each method based on 1000 replications. Specifically,
$$\mathrm{MSPE}_{\mathrm{mean}} = \frac{1}{1000}\sum_{d=1}^{1000}\mathrm{MSPE}^{(d)} \qquad (20)$$
and
$$\mathrm{MSPE}_{\mathrm{median}} = \mathrm{median}\{\mathrm{MSPE}^{(1)}, \dots, \mathrm{MSPE}^{(1000)}\}, \qquad (21)$$
where
$$\mathrm{MSPE}^{(d)} = \frac{1}{n_2}\sum_{i \in \text{test}}\left(Y_{i\hat{G}} - \hat{\mu}_i^{(d)}\right)^2 \qquad (22)$$
and $\hat{\mu}_i^{(d)}$ is the predicted value of $Y_{i\hat{G}}$ in the dth repetition. To facilitate comparison, we calculate the ratio of the MSPE for a given method to the MSPE produced by AIC, which is referred to as the relative MSPE (RMSPE). Table 2 reports the mean and median of the RMSPE across 1000 repetitions. We see that our proposed CPLMA always yields the lowest mean and median RMSPE for all considered training sample sizes. In all cases, the RMSPE values for BIC are bigger than 1 and those for the three model-averaging methods are smaller than 1, which indicates that, in terms of prediction performance, AIC dominates BIC among the two model-selection methods, and the model-averaging methods outperform the model-selection methods.
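A sketch of one train/test repetition behind Equations (20)–(22) follows; predict_fun is a hypothetical placeholder standing for any of the five fitting-and-prediction procedures.

```r
## Sketch of one repetition of Eqs. (20)-(22); `predict_fun` is hypothetical.
mspe_once <- function(z, delta, X, u, n_train, predict_fun) {
  idx <- sample(length(z), n_train)          # random training indices
  yG <- synthetic_response(z, delta)         # feasible responses Y_Ghat
  mu_pred <- predict_fun(yG[idx], X[idx, , drop = FALSE], u[idx],
                         X[-idx, , drop = FALSE], u[-idx])
  mean((yG[-idx] - mu_pred)^2)               # MSPE of this repetition
}
## Eqs. (20)-(21): summarize over 1000 repetitions, e.g.
## reps <- replicate(1000, mspe_once(z, delta, X, u, 200, predict_fun))
## c(mean = mean(reps), median = median(reps))
```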
Table 3 presents the Diebold–Mariano (DM) [52] test results for the differences in MSPE, where a positive DM statistic implies that the method in the numerator yields a larger MSPE than the method in the denominator. The results in columns 6, 9, 11, and 12 indicate that the differences between CPLMA and its competitors are statistically significant in most cases and that our method consistently produces a smaller MSPE than the other four methods, which again demonstrates the superiority of our proposal. The results in column 3 show that AIC is significantly better than BIC, which coincides with the finding in Table 2. Columns 4 and 8 indicate that SAIC and SBIC are significantly different from their respective model-selection counterparts.
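For reference, a textbook-style DM statistic on the squared prediction errors of two methods across repetitions can be computed as follows; this is a generic sketch, not the paper's implementation.

```r
## Sketch: Diebold-Mariano-type test on squared prediction errors.
## A positive statistic means method 1 has the larger MSPE.
dm_test <- function(err2_m1, err2_m2) {
  d <- err2_m1 - err2_m2                    # loss differentials
  dm <- mean(d) / sqrt(var(d) / length(d))
  c(DM = dm, p.value = 2 * pnorm(-abs(dm))) # two-sided normal p-value
}
```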
6.2. Mantle Cell Lymphoma Data Analysis
The mantle cell lymphoma (MCL) dataset contains 92 patients who were classified as having MCL based on established morphologic and immunophenotypic criteria. Since 2003, this dataset has been widely studied; see [53,54]. The response variable of interest is time (follow-up time in years). The variable status denotes patient status at follow-up (death or censored). The six covariates include an indicator of INK4a/ARF deletion (yes, no), an indicator of ATM deletion (yes, no), an indicator of P53 deletion (yes, no), the cyclin D1 Taqman result, BMI expression, and the proliferation signature average. After removing seven records with missing covariates, we focus on the information of the remaining 85 patients, whose survival times are subject to censoring.
Ref. [54] found that BMI expression has a functional impact on the response variable; therefore, we establish a full PLM with BMI expression as the nonparametric variable and the other covariates as linear variables. All subsets of the five linear covariates then serve as candidate models for model selection and model averaging. Let the size of the training sample be 55 or 65; the mean and median of the RMSPE across 1000 repetitions are shown in Table 4. It can be seen from Table 4 that, in terms of both mean and median, our CPLMA method is clearly superior to the other competing methods. Figure 10 shows that the variation of the MSPE for CPLMA is small relative to that of the other methods, regardless of whether the training sample size is 55 or 65.
7. Conclusions
In the context of semiparametric partially linear models with censored responses, we develop a jackknife model-averaging method that selects the weights by minimizing a leave-one-out cross-validation criterion, in which B-splines are used to approximate the nonparametric function and least-squares estimation is applied to estimate the unknown parameters in each candidate model. The resulting model-averaging estimator, the CPLMA estimator, is shown to be asymptotically optimal. A simulation study and two real data examples indicate that our method possesses advantages over other model-selection and model-averaging methods.
Based on the results in this paper, we can further explore the optimal model averaging for the semiparametric partially linear quantile regression models with censored data. In addition, it is worthwhile to apply other optimal model-averaging methods, such as the model-averaging method based on Kullback–Leibler distance, to generalized partially linear models with censored responses.
Conceptualization, W.C.; methodology, G.H., W.C. and J.Z.; software, G.H. and J.Z.; supervision, W.C. and J.Z.; writing—original draft, G.H.; writing—review and editing, J.Z. All authors have read and agreed to the published version of the manuscript.
The PBC dataset is available in the R package "survival", and the MCL dataset is publicly available online.
The authors would like to thank the reviewers and editors for their careful reading and constructive comments.
The authors declare no conflict of interest.
Figure 1. Mean risk as a function of the number of knots, over 500 replications.
Figure 2, Figure 3, Figure 4 and Figure 5. Risk comparisons for Design 1 under the four combinations of the coefficient-decay parameter and censoring rate considered.
Figure 6, Figure 7, Figure 8 and Figure 9. Risk comparisons for Design 2 under the same four combinations of the coefficient-decay parameter and censoring rate.
The basic notations used in this paper.

Notation | Description
---|---
$T_i$ | The survival time of the ith subject
$Y_i$ | The response variable, a transformation of $T_i$
$X_i$ | The covariate vector of the ith subject
$\delta_i$ | The censoring indicator of the ith subject
$C_i$ | The censoring (last follow-up) time of the ith subject
$Z_i$ | The observed time, equal to $\min(Y_i, C_i)$
$G$ | The cumulative distribution function of $C_i$
$Y_{iG}$ | The synthetic response $\delta_i Z_i/\{1 - G(Z_i)\}$
$\mu_i$ | The conditional mean $E(Y_i \mid X_i, U_i)$
$\hat{G}$ | The Kaplan–Meier estimator of $G$
$\hat{\beta}_{(s)}$ | The estimator of $\beta_{(s)}$
$\hat{\gamma}_{(s)}$ | The estimator of $\gamma_{(s)}$
$\hat{\mu}_{(s)}$ | The estimator of $\mu$ under the sth candidate model
$\hat{\mu}(w)$ | The model-averaging estimator of $\mu$
$\tilde{\mu}_{(s)}$ | The sth jackknife estimator of $\mu$
$\tilde{\mu}(w)$ | The jackknife model-averaging estimator of $\mu$
The mean and median of RMSPE across 1000 repetitions (PBC data).

$n_1$ | Statistic | BIC | SAIC | SBIC | CPLMA
---|---|---|---|---|---
140 | mean | 1.005 | 0.987 | 0.983 | 0.979
 | median | 1.014 | 0.992 | 0.995 | 0.987
160 | mean | 1.005 | 0.987 | 0.983 | 0.982
 | median | 1.006 | 0.986 | 0.988 | 0.984
180 | mean | 1.011 | 0.991 | 0.991 | 0.986
 | median | 1.011 | 0.990 | 0.990 | 0.981
200 | mean | 1.014 | 0.993 | 0.995 | 0.989
 | median | 1.012 | 0.984 | 0.990 | 0.976
220 | mean | 1.012 | 0.995 | 0.997 | 0.994
 | median | 1.020 | 0.994 | 1.003 | 0.993
240 | mean | 1.008 | 0.995 | 0.998 | 0.993
 | median | 1.017 | 0.996 | 0.999 | 0.988
Diebold–Mariano test results for the differences in MSPE. Each column heading A/B gives the DM statistic with method A in the numerator and method B in the denominator.

$n_1$ | | AIC/BIC | AIC/SAIC | AIC/SBIC | AIC/CPLMA | BIC/SAIC | BIC/SBIC | BIC/CPLMA | SAIC/SBIC | SAIC/CPLMA | SBIC/CPLMA
---|---|---|---|---|---|---|---|---|---|---|---
140 | DM | −3.013 | 15.130 | 12.335 | 14.858 | 16.743 | 25.341 | 16.633 | 4.973 | 6.361 | 2.834
 | p-value | 0.003 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.005
160 | DM | −3.014 | 16.607 | 12.490 | 14.862 | 17.196 | 30.331 | 18.474 | 4.995 | 5.538 | 0.942
 | p-value | 0.003 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.347
180 | DM | −8.082 | 12.355 | 7.874 | 11.238 | 22.554 | 34.561 | 21.914 | −0.348 | 4.439 | 5.679
 | p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.728 | 0.000 | 0.000
200 | DM | −11.473 | 12.320 | 4.744 | 11.393 | 23.962 | 32.286 | 22.550 | −3.721 | 5.288 | 8.690
 | p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
220 | DM | −9.509 | 8.500 | 2.308 | 5.587 | 19.004 | 22.316 | 17.152 | −4.011 | 1.085 | 5.059
 | p-value | 0.000 | 0.000 | 0.021 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.278 | 0.000
240 | DM | −5.484 | 7.332 | 1.441 | 5.848 | 12.901 | 15.427 | 12.998 | −4.175 | 2.110 | 6.561
 | p-value | 0.000 | 0.000 | 0.150 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.035 | 0.000
The mean and median of RMSPE across 1000 repetitions (MCL data).

$n_1$ | Statistic | BIC | SAIC | SBIC | CPLMA
---|---|---|---|---|---
55 | mean | 1.011 | 0.982 | 0.988 | 0.923
 | median | 0.951 | 0.965 | 0.937 | 0.918
65 | mean | 0.992 | 0.976 | 0.987 | 0.947
 | median | 0.982 | 0.970 | 0.973 | 0.939
Appendix A

To prove Theorem 1, we first give some notation used in what follows and collect three lemmas.

Lemma A1. Under Conditions (C.2)–(C.7), a set of auxiliary convergence results for the candidate-model hat matrices and risks holds; these follow from the proofs of (A.44), (A.45), and (A.48) in [45].

Lemma A2. If Conditions (C.4)–(C.7) hold, then the maximum singular value of $\tilde{P}(w)$ is bounded uniformly over $w \in W$. By the inequalities for maximum singular values and the definition of $\tilde{P}_{(s)}$, the bound follows from Lemma A1. The proof of Lemma A2 is completed. □

Lemma A3. Assuming that Conditions (C.1), (C.2), and (C.5) are satisfied, the Kaplan–Meier estimator $\hat{G}$ is uniformly consistent for $G$, so that the feasible quantities subscripted by $\hat{G}$ are asymptotically equivalent to their counterparts based on $G$. This result follows from the uniform convergence of the Kaplan–Meier estimator.

Proof of Theorem 1. By Lemma A1 and Conditions (C.2)–(C.6), it suffices to show that, up to a term that does not depend on $w$, $CV_{\hat{G}}(w)$ agrees with $L_n(w)$ uniformly over $W$ at a rate negligible relative to $\xi_n$. A simple calculation decomposes $CV_{\hat{G}}(w)$ into $L_n(w)$, a $w$-free term, and cross-product remainders. Using the Cauchy–Schwarz inequality, Lemma A1, and Lemma A3, each remainder is shown to be uniformly negligible relative to $\xi_n$. Thus, we obtain (18). □
References
1. Engle, R.F.; Granger, C.W.J.; Rice, J.; Weiss, A. Semiparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc.; 1986; 81, pp. 310-320. [DOI: https://dx.doi.org/10.1080/01621459.1986.10478274]
2. Speckman, P. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B Stat. Methodol.; 1988; 50, pp. 413-436. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1988.tb01738.x]
3. Heckman, N.E. Spline smoothing in a partly linear model. J. R. Stat. Soc. Ser. B Stat. Methodol.; 1986; 48, pp. 244-248. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1986.tb01407.x]
4. Shi, J.; Lau, T. Empirical likelihood for partially linear models. J. Multivar. Anal.; 2000; 72, pp. 132-148. [DOI: https://dx.doi.org/10.1006/jmva.1999.1866]
5. Härdle, W.; Liang, H.; Gao, J. Partially Linear Models; Springer Science & Business Media: Berlin, Germany, 2000.
6. Claeskens, G.; Hjort, N.L. Model Selection and Model Averaging; Cambridge University Press: Cambridge, UK, 2008.
7. Hansen, B.E.; Racine, J.S. Jackknife model averaging. J. Econom.; 2012; 167, pp. 38-46. [DOI: https://dx.doi.org/10.1016/j.jeconom.2011.06.019]
8. Racine, J.S.; Li, Q.; Yu, D.; Zheng, L. Optimal model averaging of mixed-data kernel-weighted spline regressions. J. Bus. Econ. Stat.; 2022; in press. [DOI: https://dx.doi.org/10.1080/07350015.2022.2118126]
9. Akaike, H. Statistical predictor identification. Ann. Inst. Statist. Math.; 1970; 22, pp. 203-217. [DOI: https://dx.doi.org/10.1007/BF02506337]
10. Schwarz, G. Estimating the dimension of a model. Ann. Statist.; 1978; 6, pp. 461-464. [DOI: https://dx.doi.org/10.1214/aos/1176344136]
11. Claeskens, G.; Hjort, N.L. The focused information criterion. J. Am. Stat. Assoc.; 2003; 98, pp. 900-916. [DOI: https://dx.doi.org/10.1198/016214503000000819]
12. Ni, X.; Zhang, H.; Zhang, D. Automatic model selection for partially linear models. J. Multivar. Anal.; 2009; 100, pp. 2100-2111. [DOI: https://dx.doi.org/10.1016/j.jmva.2009.06.009]
13. Raheem, S.E.; Ahmed, S.E.; Doksum, K.A. Absolute penalty and shrinkage estimation in partially linear models. Comput. Stat. Data Anal.; 2012; 56, pp. 874-891. [DOI: https://dx.doi.org/10.1016/j.csda.2011.09.021]
14. Xie, H.; Huang, J. SCAD-penalized regression in high-dimensional partially linear models. Ann. Statist.; 2009; 37, pp. 673-696. [DOI: https://dx.doi.org/10.1214/07-AOS580]
15. Peng, J.; Yang, Y. On improvability of model selection by model averaging. J. Econom.; 2022; 229, pp. 246-262. [DOI: https://dx.doi.org/10.1016/j.jeconom.2020.12.003]
16. Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian model averaging: A tutorial. Statist. Sci.; 1999; 14, pp. 382-417.
17. Hansen, B.E. Least squares model averaging. Econometrica; 2007; 75, pp. 1175-1189. [DOI: https://dx.doi.org/10.1111/j.1468-0262.2007.00785.x]
18. Zhang, X.; Zou, G.; Carroll, R.J. Model averaging based on Kullback-Leibler distance. Stat. Sin.; 2015; 25, pp. 1583-1598. [DOI: https://dx.doi.org/10.5705/ss.2013.326] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27761098]
19. Liu, Q.; Okui, R.; Yoshimura, A. Generalized least squares model averaging. Economet. Rev.; 2016; 35, pp. 1692-1752. [DOI: https://dx.doi.org/10.1080/07474938.2015.1092817]
20. Gao, Y.; Zhang, X.; Wang, S.; Zou, G. Model averaging based on leave-subject-out cross-validation. J. Econom.; 2016; 192, pp. 139-151. [DOI: https://dx.doi.org/10.1016/j.jeconom.2015.07.006]
21. Zhang, X.; Liu, C. Model averaging prediction by K-fold cross-validation. J. Econom.; 2022; in press.
22. Lu, X.; Su, L. Jackknife model averaging for quantile regressions. J. Econom.; 2015; 188, pp. 40-58. [DOI: https://dx.doi.org/10.1016/j.jeconom.2014.11.005]
23. Zhang, X.; Wang, W. Optimal model averaging estimation for partially linear models. Stat. Sin.; 2019; 29, pp. 693-718. [DOI: https://dx.doi.org/10.5705/ss.202015.0392]
24. Zhu, R.; Wan, A.T.K.; Zhang, X.; Zou, G. A Mallows-type model averaging estimator for the varying-coefficient partially linear model. J. Am. Stat. Assoc.; 2019; 114, pp. 882-892. [DOI: https://dx.doi.org/10.1080/01621459.2018.1456936]
25. Xie, J.; Yan, X.; Tang, N. A model-averaging method for high-dimensional regression with missing responses at random. Stat. Sin.; 2021; 31, pp. 1005-1026. [DOI: https://dx.doi.org/10.5705/ss.202018.0297]
26. Wei, Y.; Wang, Q.; Liu, W. Model averaging for linear models with responses missing at random. Ann. Inst. Statist. Math.; 2021; 73, pp. 535-553. [DOI: https://dx.doi.org/10.1007/s10463-020-00759-y]
27. Zhang, X.; Chiou, J.; Ma, Y. Functional prediction through averaging estimated functional linear regression models. Biometrika; 2018; 105, pp. 945-962. [DOI: https://dx.doi.org/10.1093/biomet/asy041]
28. Zhang, X.; Ma, Y.; Carroll, R.J. MALMEM: Model averaging in linear measurement error models. J. R. Stat. Soc. Ser. B Stat. Methodol.; 2019; 81, pp. 763-779. [DOI: https://dx.doi.org/10.1111/rssb.12317]
29. Ando, T.; Li, K.C. A model-averaging approach for high-dimensional regression. J. Am. Stat. Assoc.; 2014; 109, pp. 254-265. [DOI: https://dx.doi.org/10.1080/01621459.2013.838168]
30. Ando, T.; Li, K.C. A weight-relaxed model averaging approach for high-dimensional generalized linear models. Ann. Statist.; 2017; 45, pp. 2654-2679. [DOI: https://dx.doi.org/10.1214/17-AOS1538]
31. Zeng, D.; Lin, D. Efficient estimation for the accelerated failure time model. J. Am. Stat. Assoc.; 2007; 102, pp. 1387-1396. [DOI: https://dx.doi.org/10.1198/016214507000001085]
32. Wang, H.J.; Wang, L. Locally weighted censored quantile regression. J. Am. Stat. Assoc.; 2009; 104, pp. 1117-1128. [DOI: https://dx.doi.org/10.1198/jasa.2009.tm08230]
33. Hjort, N.L.; Claeskens, G. Focused information criteria and model averaging for the Cox hazard regression model. J. Am. Stat. Assoc.; 2006; 101, pp. 1449-1464. [DOI: https://dx.doi.org/10.1198/016214506000000069]
34. Du, J.; Zhang, Z.; Xie, T. Focused information criterion and model averaging in censored quantile regression. Metrika; 2017; 80, pp. 547-570. [DOI: https://dx.doi.org/10.1007/s00184-017-0616-1]
35. Sun, Z.; Sun, L.; Lu, X.; Zhu, J.; Li, Y. Frequentist model averaging estimation for the censored partial linear quantile regression model. J. Statist. Plann. Inference; 2017; 189, pp. 1-15. [DOI: https://dx.doi.org/10.1016/j.jspi.2017.04.001]
36. Yan, X.; Wang, H.; Wang, W.; Xie, J.; Ren, Y.; Wang, X. Optimal model averaging forecasting in high-dimensional survival analysis. Int. J. Forecast.; 2021; 37, pp. 1147-1155. [DOI: https://dx.doi.org/10.1016/j.ijforecast.2020.12.004]
37. Liang, Z.; Chen, X.; Zhou, Y. Mallows model averaging estimation for linear regression model with right censored data. Acta Math. Appl. Sin. E.; 2022; 38, pp. 5-23. [DOI: https://dx.doi.org/10.1007/s10255-022-1054-z]
38. Koul, H.; Susarla, V.; Ryzin, J.V. Regression analysis with randomly right-censored data. Ann. Statist.; 1981; 9, pp. 1276-1288. [DOI: https://dx.doi.org/10.1214/aos/1176345644]
39. Xia, X. Model averaging prediction for nonparametric varying-coefficient models with B-spline smoothing. Stat. Pap.; 2021; 62, pp. 2885-2905. [DOI: https://dx.doi.org/10.1007/s00362-020-01218-9]
40. De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 2001.
41. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc.; 1958; 53, pp. 457-481. [DOI: https://dx.doi.org/10.1080/01621459.1958.10501452]
42. Hu, G.; Cheng, W.; Zeng, J. Model averaging by jackknife criterion for varying-coefficient partially linear models. Comm. Statist. Theory Methods; 2020; 49, pp. 2671-2689. [DOI: https://dx.doi.org/10.1080/03610926.2019.1580736]
43. Turlach, B.A.; Weingessel, A.; Moler, C. Quadprog: Functions to Solve Quadratic Programming Problems. R Package Version 1.5-8. 2019; Available online: https://CRAN.R-project.org/package=quadprog (accessed on 16 December 2022).
44. Wei, Y.; Wang, Q. Cross-validation-based model averaging in linear models with response missing at random. Stat. Probab. Lett.; 2021; 171, 108990. [DOI: https://dx.doi.org/10.1016/j.spl.2020.108990]
45. Zhang, X.; Wan, A.T.K.; Zou, G. Model averaging by jackknife criterion in models with dependent data. J. Econom.; 2013; 174, pp. 82-94. [DOI: https://dx.doi.org/10.1016/j.jeconom.2013.01.004]
46. Wan, A.T.; Zhang, X.; Zou, G. Least squares model averaging by Mallows criterion. J. Econom.; 2010; 156, pp. 277-283. [DOI: https://dx.doi.org/10.1016/j.jeconom.2009.10.030]
47. Fan, J.; Ma, Y.; Dai, W. Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J. Am. Stat. Assoc.; 2014; 109, pp. 1270-1284. [DOI: https://dx.doi.org/10.1080/01621459.2013.879828] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25309009]
48. Bates, D.M.; Venables, W.N. Splines: Regression Spline Functions and Classes. R Package Version 3.6-1. 2019; Available online: https://CRAN.R-project.org/package=splines (accessed on 15 December 2022).
49. Therneau, T.M.; Lumley, T.; Elizabeth, A.; Cynthia, C. Survival: Survival Analysis. R Package Version 3.4-0. 2022; Available online: https://CRAN.R-project.org/package=survival (accessed on 15 December 2022).
50. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med.; 1997; 16, pp. 385-395. [DOI: https://dx.doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3]
51. Shows, J.H.; Lu, W.; Zhang, H.H. Sparse estimation and inference for censored median regression. J. Statist. Plann. Inference; 2010; 140, pp. 1903-1917. [DOI: https://dx.doi.org/10.1016/j.jspi.2010.01.043] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20607110]
52. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat.; 1995; 13, pp. 253-263.
53. Rosenwald, A.; Wright, G.; Wiestner, A.; Chan, W.C.; Connors, J.M.; Campo, E.; Gascoyne, R.D.; Grogan, T.M.; Muller-Hermelink, H.K.; Smeland, E.B. et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell; 2003; 3, pp. 185-197. [DOI: https://dx.doi.org/10.1016/S1535-6108(03)00028-X] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12620412]
54. Ma, S.; Du, P. Variable selection in partly linear regression model with diverging dimensions for right censored data. Stat. Sin.; 2012; 22, pp. 1003-1020. [DOI: https://dx.doi.org/10.5705/ss.2010.267]
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In the past few decades, model averaging has received extensive attention and has been regarded as a feasible alternative to model selection. However, most of this work is based on a parametric model framework and complete datasets. This paper develops a frequentist model-averaging estimation for semiparametric partially linear models with censored responses. The nonparametric function is approximated by B-splines, and the weights of the model-averaging estimator are chosen by minimizing a leave-one-out cross-validation criterion. The resulting model-averaging estimator is proved to be asymptotically optimal in the sense of achieving the lowest possible squared error loss. A simulation study demonstrates that the proposed method is superior to traditional model-selection and model-averaging methods. Finally, as an illustration, the proposed procedure is applied to two real datasets.