Empirical-Likelihood-Based Inference for

1. Introduction

In many practical situations, linear models may not be complex enough to capture the underlying relation between the response variable and some associated covariates, especially when the response variable Y is not linearly related to all the covariates. For example, suppose one is interested in estimating the relationship between an outcome variable Y and vectors of variables X and Z. The researcher is comfortable modeling a linear function in X but hesitates to extend the linearity to Z. One example given by Engle et al. [1] is the effect of temperature on electricity consumption for four cities. They modeled the average monthly electricity consumption as the sum of a smooth function of the monthly temperatures and a linear function of the monthly price of electricity, income and 11 other monthly dummy variables. It is natural to impose linearity on the part of the regression function involving household characteristics and a nonlinear function involving temperature since electricity consumption tends to be higher at extreme temperatures but lower at moderate temperatures. A partially linear model provides a good fit for these types of data because it allows for a regression function that maintains linearity in some variables and also extends the effect of other variables to be nonlinear.

A partially linear regression model is defined as

(1) $Y_{i} = X_{i} β + g (Z_{i}) + ε_{i}, i = 1, \dots, n,$

where the $Y_{i}$'s are scalar response variables, $X_{i} = (x_{i 1}, \dots, x_{i p})$ are known p-variate covariates, $Z_{i}$ is a scalar explanatory variable, and $g (\cdot)$ is the smooth part of the model, which is assumed to represent a smooth unparameterized functional relationship. $β = (β_{1}, \dots, β_{p})$ is a vector of unknown parameters and $ε_{1}, \dots, ε_{n}$ are independent random errors with mean zero and finite variance $σ^{2}$ given the covariates X and Z.

The partial linear model Equation (1) is a semiparametric model since it contains both parametric and nonparametric components. The partially linear model is more flexible to interpret the effect of each linear covariate and allows one to focus on particular variables that can have nonlinear effects. It may be preferable to a completely nonparametric model because of the well-known “curse of dimensionality”. Computationally, partially linear models are remarkably easier than additive models, in which iterative approaches such as a backfitting algorithm [2] or marginal integration [3] are necessary.

Partially linear models are widely used in biometrics, econometrics, social sciences and other fields (see [1,4]), and have been studied extensively for estimating $β$ and $g (\cdot)$ . For example, Wahba [5], Engle et al. [1], and Green et al. [6] described penalized spline estimates of $β$ and $g (\cdot)$ . Heckman [7] and Rice [8] proposed the polynomial method. Speckman [9] described the kernel method. Chen and Shiau [10] used a smoothing spline method. Chen [11] proposed the projection method. For more discussions about partially linear models, we refer to Härdle et al. [12] for a summary.

In most cases, investigators are more interested in the parameter $β$ and take $g (\cdot)$ as a nuisance parameter [13]. Estimating the confidence interval for the parametric components in partially linear models using a backfitting algorithm or marginal integration can be computationally heavy. Severini and Staniswalis [14] derived the asymptotic properties for their proposed estimators of $β$ and $g (\cdot)$ under mild regularity conditions. These asymptotic properties serve as a foundation for constructing confidence intervals that are asymptotically accurate for the parameters. However, in practice, the finite-sample performance of these confidence intervals may be less satisfactory because of the complex structure of the covariance matrix, requiring estimates to be plugged in for multiple parameters. The linear components in the partially linear models can also be estimated using the generalized additive models [15], but the results depend on the distribution family used in the gam function in R. When a wrong distribution family is chosen, the results could be very biased. Confidence interval for the parametric components in the partially linear models can also be constructed based on the asymptotic normal distribution; however, this may not hold when the normality assumption fails or when the sample size is small.

Empirical likelihood provides a good alternative among the nonparametric methods that can be used to make statistical inference when the normality assumption fails or when the distribution is unspecified. The advantages of empirical likelihood compared to the bootstrap method and the jackknife method arise from it being a nonparametric method of inference based on a data-driven likelihood ratio function. As a combination of a nonparametric method and the likelihood method, on one hand, it does not require any specification of a family of distributions for the data; on the other hand, like parametric likelihood methods, it makes an automatic determination of the shape of confidence regions [16]. This property makes it a serious competitor with other nonparametric methods such as the bootstrap method and the jackknife method. Although empirical likelihood can be a very useful tool for deriving statistical inference, the use of a conventional empirical likelihood method or the profile empirical likelihood has limitations when constructing confidence intervals for each element of a large parameter vector.

Motivated by the above-mentioned concerns, this paper develops an empirical-likelihood-based procedure which can be used to make inferences for a large parameter vector $β$ in the partially linear model in Equation (1) by incorporating the projection method. The proposed method has two main advantages. First, it does not require distribution assumptions. Second, we provide theoretical justification that the proposed method can be applied to partially linear models, and the computation requirements are relatively straightforward because it does not require an asymptotic variance estimation. After the Bartlett correction, the coverage probability of the confidence interval is improved and better than normal-approximation-based methods in most cases.

The structure of the paper is as follows. Section 2 gives the model formulation of the empirical likelihood for the parameter of interest and the Bartlett correction procedure for the proposed method. Section 3 studies the performance of the proposed methods through simulation studies and illustrates the method by a real study example. Section 4 gives the conclusion. All the proofs are given in the Appendix A.

2. Materials and Methods

2.1. Model Formulation

Since the interest in this paper is in obtaining inference for $β$ only in the partially linear model, the nuisance parameter $g (\cdot)$ needs to be removed first. This is implemented by using the projection principle [17,18]. Y and $X$ first need to be regressed on Z using a nonparametric regression method, where $Y = (Y_{1}, \dots, Y_{n})$, $Z = (Z_{1}, \dots, Z_{n})$, and $X$ is an $n \times p$ covariate matrix. Denote the nonparametric regressions of Y on Z and $X$ on Z by $m_{Y} (Z)$ and $m_{X} (Z)$, respectively. Here, without loss of generality, let X be a one-dimensional vector (for a multidimensional vector, $E (X_{i} | Z)$ can be obtained for each column $X_{i}$ of $X$, respectively), and then, the effect of Z on Y and X can be removed by using the regression residual of Y and X given Z. For simplicity of notation, the matrix form of the partially linear model is used here:

(2) $Y = X β + g (Z) + ε .$

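The first step is to regress Y and X onto Z. Taking conditional expectations given Z on both sides of Equation (2), and using the fact that the errors have mean zero given the covariates, gives the following equation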

(3) $m_{Y} (Z) = m_{X} (Z) β + g (Z) .$

Then, Equation (3) is subtracted from the original model (2), and the residual model is obtained as follows:

(4) $Y - m_{Y} (Z) = \{X - m_{X} (Z)\} β + ε .$

Denote $A^{⨂ 2} = A A^{T}$ and $\tilde{ζ} = ζ - m_{ζ} (Z)$ . For example, $\tilde{X} = X - m_{X} (Z) .$ Assuming $\tilde{X}$ has full rank, based on Speckman [9], the estimator of $β$ can then be given by Equation (5) by the least squares method if $m_{X} (Z)$ and $m_{Y} (Z)$ are known:

(5) $\hat{β} = {\left[\sum_{i = 1}^{n} {(X_{i} - m_{X} (Z_{i}))}^{⨂ 2}\right]}^{- 1} \left[\sum_{i = 1}^{n} \{X_{i} - m_{X} (Z_{i})\} \{Y_{i} - m_{Y} (Z_{i})\}\right] .$

The formula above cannot be applied directly since $m_{X} (Z)$ and $m_{Y} (Z)$ need to be estimated appropriately. There are many methods for estimating $m_{X} (Z)$ and $m_{Y} (Z)$, including local constant smoothers [9], higher-order local polynomial estimators [19], kernel methods with varying bandwidths, smoothing and regression splines, etc. Fan and Gijbels [19] showed that within the class of linear estimators which include kernel and spline estimates, the local linear estimates achieve the best possible rates of convergence. Due to these desirable properties, local linear smoothers with fixed bandwidths were used for estimating the nonparametric regression of Y and X on Z. Let ${\hat{m}}_{X} (Z)$ and ${\hat{m}}_{Y} (Z)$ be the local linear nonparametric regression estimators for $m_{X} (Z)$ and $m_{Y} (Z)$, $K (\cdot)$ be a symmetric density function, h be a suitable bandwidth, and define $K_{h} (z) = K (z / h) / h$; then, the estimators take the form given by Fan and Gijbels [19]:

(6) ${\hat{m}}_{X} (Z) = \frac{\sum_{i = 1}^{n} w_{i} X_{i}}{\sum_{i = 1}^{n} w_{i}} \quad \text{and} \quad {\hat{m}}_{Y} (Z) = \frac{\sum_{i = 1}^{n} w_{i} Y_{i}}{\sum_{i = 1}^{n} w_{i}},$

$w_{i} = K_{h} (Z_{i} - Z) \{S_{n, 2} - (Z_{i} - Z) S_{n, 1}\},$

where $S_{n, j} = \sum_{i = 1}^{n} K_{h} (Z_{i} - Z) {(Z_{i} - Z)}^{j}$. $m_{X} (Z)$ and $m_{Y} (Z)$ are then replaced with their corresponding estimates ${\hat{m}}_{X} (Z), {\hat{m}}_{Y} (Z)$ in the estimating procedure (a Gaussian kernel is an example of kernel function used in the estimating procedure) and the empirical likelihood estimator of $β$ needs to satisfy the following estimating equation:

(7) $\sum_{i = 1}^{n} {\{X_{i} - {\hat{m}}_{X} (Z_{i})\}}^{T} [Y_{i} - {\hat{m}}_{Y} (Z_{i}) - \{X_{i} - {\hat{m}}_{X} (Z_{i})\} β] = 0 .$

This implies that the estimator of $β$ can be obtained by

(8) ${\hat{β}}^{*} = {\left[\sum_{i = 1}^{n} {(X_{i} - {\hat{m}}_{X} (Z_{i}))}^{⨂ 2}\right]}^{- 1} \left[\sum_{i = 1}^{n} \{X_{i} - {\hat{m}}_{X} (Z_{i})\} \{Y_{i} - {\hat{m}}_{Y} (Z_{i})\}\right] .$
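The smoothing and partial-residual steps in Equations (6) and (8) can be sketched in a few lines for the scalar-X case. This is a minimal illustration under assumptions that are not part of the paper: the function names, the toy data, and the fixed bandwidth `h = 0.1` are all hypothetical (the paper selects the bandwidth with a direct plug-in rule), and only the Python standard library is used.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear(z0, z, v, h):
    """Local linear estimate of E[v | Z = z0] with the weights w_i of Equation (6)."""
    k = [gaussian_kernel((zi - z0) / h) / h for zi in z]       # K_h(Z_i - z0)
    s1 = sum(ki * (zi - z0) for ki, zi in zip(k, z))           # S_{n,1}
    s2 = sum(ki * (zi - z0) ** 2 for ki, zi in zip(k, z))      # S_{n,2}
    w = [ki * (s2 - (zi - z0) * s1) for ki, zi in zip(k, z)]   # w_i
    return sum(wi * vi for wi, vi in zip(w, v)) / sum(w)


def partial_residual_beta(x, y, z, h):
    """Equation (8) for scalar X: least squares on the partial residuals."""
    x_res = [xi - local_linear(zi, z, x, h) for xi, zi in zip(x, z)]
    y_res = [yi - local_linear(zi, z, y, h) for yi, zi in zip(y, z)]
    beta = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
    return beta, x_res, y_res


# Hypothetical toy data: Y = 2 X + sin(Z) + noise, with X depending on Z.
rng = random.Random(0)
n = 200
z = [rng.random() for _ in range(n)]
x = [5.0 * zi + rng.gauss(0.0, 0.5) for zi in z]
y = [2.0 * xi + math.sin(zi) + rng.gauss(0.0, 0.2) for xi, zi in zip(x, z)]
beta_hat, x_res, y_res = partial_residual_beta(x, y, z, h=0.1)
print(round(beta_hat, 3))
```

Because the weights satisfy $\sum_{i} w_{i} (Z_{i} - Z) = 0$, the smoother reproduces linear functions of Z exactly, which is a convenient sanity check for an implementation.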

Next, the empirical likelihood principle was applied to construct statistical inference for $β$ . Let $p_{i}$ be the probability assigned to $(X_{i}, Z_{i}, Y_{i})$ . The empirical likelihood ratio function for $β$ can be expressed as:

(9) $\begin{matrix} R_{n} (β) = \sup_{p_{i}} \Big\{ \prod_{i = 1}^{n} n p_{i} \;\Big|\; \sum_{i = 1}^{n} p_{i} {\{X_{i} - {\hat{m}}_{X} (Z_{i})\}}^{T} [Y_{i} - {\hat{m}}_{Y} (Z_{i}) - \{X_{i} - {\hat{m}}_{X} (Z_{i})\} β] = 0, \\ p_{i} \geq 0, \sum_{i = 1}^{n} p_{i} = 1 \Big\} . \end{matrix}$

We establish the asymptotic distribution of $- 2 \log \{R_{n} (β)\}$ under the following assumptions:

Assumption 1.

$E (‖ X ‖^{4}) < \infty$ , $E (‖ X ‖^{2} Y^{2}) < \infty$ , and $E (X^{T} X)$ is nonsingular; X and Z are correlated.

Assumption 2.

The bandwidths used in estimating $m_{x} (Z)$ and $m_{y} (Z)$ are of order $n^{- 1 / 5}$ .

Assumption 3.

The function $K (.)$ is a bounded symmetric density function with compact support and satisfies $\int K (u) d u = 1, \int K (u) u d u = 0$ and $\int u^{2} K (u) d u = 1$ .

Assumption 4.

The functions $m_{x} (Z)$ and $m_{y} (Z)$ have bounded and continuous second derivatives.

Assumption 5.

The density function of $Z, f_{z} (Z)$ is bounded away from zero and has bounded continuous second derivatives.

Theorem 1.

$- 2 \log \{R_{n} (β)\}$ converges to a chi-squared distribution with p degrees of freedom under Assumptions 1–5.

The proof of Theorem 1 is given in Appendix A. A confidence region for $β$ can be constructed based on Theorem 1 and further adjusted by using the Bartlett correction [20].
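For a scalar parameter, the statistic $- 2 \log \{R_{n} (β)\}$ in Equation (9) can be computed through the usual dual form, $p_{i} = 1 / \{n (1 + a Ω_{i})\}$ with the multiplier a solving $\sum_{i} Ω_{i} / (1 + a Ω_{i}) = 0$. The sketch below is one possible way to do this, assuming p = 1 and that the partial residuals have already been computed; the function names and the bisection solver are illustrative choices, not the authors' implementation.

```python
import math


def neg2_log_el_ratio(omega, n_iter=100):
    """-2 log R_n for scalar estimating-function values omega_i = Omega_i(beta)."""
    lo_val, hi_val = min(omega), max(omega)
    if lo_val == 0.0 and hi_val == 0.0:
        return 0.0  # every term is zero, so p_i = 1/n satisfies the constraint
    if not (lo_val < 0.0 < hi_val):
        return math.inf  # zero is outside the convex hull, so R_n(beta) = 0
    # All weights p_i = 1 / (n (1 + a omega_i)) are positive iff a lies in (lo, hi).
    lo, hi = -1.0 / hi_val, -1.0 / lo_val
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(n_iter):
        a = 0.5 * (lo + hi)
        score = sum(w / (1.0 + a * w) for w in omega)  # strictly decreasing in a
        if score > 0.0:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(a * w) for w in omega)


def el_statistic(beta, x_res, y_res):
    """Evaluate -2 log R_n(beta) from partial residuals of X and Y (scalar X)."""
    omega = [xr * (yr - xr * beta) for xr, yr in zip(x_res, y_res)]
    return neg2_log_el_ratio(omega)


# Hypothetical residuals; the statistic vanishes at the least squares estimate.
x_demo = [1.0, 2.0, 3.0]
y_demo = [1.1, 1.9, 3.2]
beta_ls = sum(a * b for a, b in zip(x_demo, y_demo)) / sum(a * a for a in x_demo)
print(el_statistic(beta_ls, x_demo, y_demo), el_statistic(1.2, x_demo, y_demo))
```

Bisection is used instead of Newton's method because the score is monotone in a on the admissible interval, so the search cannot leave the region where all weights are positive.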

When $β$ is a vector (or when $X$ is an $n \times p$ matrix), and we are interested in a subset of the parameter vector $β$ , say the first element $β_{1}$ , we can apply the projection method again, i.e., we regress ${\hat{X}}_{1}$ , the first column of $\hat{X}$ , which is $X - {\hat{m}}_{X} (Z)$ , onto the space of ${\hat{X}}_{- 1}$ , which is the remaining columns of $\hat{X}$ . Similarly, we apply the same projection principle from $\hat{Y} = Y - {\hat{m}}_{Y} (Z)$ to ${\hat{X}}_{- 1}$ . Then, we obtain a new residual model, i.e., $β_{1}$ should satisfy the estimating equation as follows:

(10) $\frac{1}{n} \sum_{i = 1}^{n} {\{{\hat{X}}_{1} - \hat{E} ({\hat{X}}_{1} | {\hat{X}}_{- 1})\}}^{T} [\hat{Y} - \hat{E} (\hat{Y} | {\hat{X}}_{- 1}) - \{{\hat{X}}_{1} - \hat{E} ({\hat{X}}_{1} | {\hat{X}}_{- 1})\} β_{1}] = 0 .$

Let $p_{i}$ be the probability assigned to $(Y_{i}, X_{i}, Z_{i})$, where $p_{i}$ could be different from the $p_{i}$ in Theorem 1. The empirical likelihood ratio function for $β_{1}$ can be written as:

(11) $\begin{matrix} R_{n} (β_{1}) = \sup_{p_{i}} \Big\{ \prod_{i = 1}^{n} n p_{i} \;\Big|\; \sum_{i = 1}^{n} p_{i} {\{{\hat{X}}_{1} - \hat{E} ({\hat{X}}_{1} | {\hat{X}}_{- 1})\}}^{T} [\hat{Y} - \hat{E} (\hat{Y} | {\hat{X}}_{- 1}) \\ - \{{\hat{X}}_{1} - \hat{E} ({\hat{X}}_{1} | {\hat{X}}_{- 1})\} β_{1}] = 0, p_{i} \geq 0, \sum_{i = 1}^{n} p_{i} = 1 \Big\} . \end{matrix}$

Theorem 2.

$- 2 \log \{R_{n} (β_{1})\}$ converges to a chi-squared distribution with $1$ degree of freedom under Assumptions 1–5.

The proof of Theorem 2 is given in Appendix A. Based on Theorem 2, the $100 (1 - α) \%$ empirical likelihood confidence interval for $β_{1}$ can be obtained by:

(12) $\{β_{1} : - 2 \log \{R_{n} (β_{1})\} \leq c_{α}\},$

where $c_{α}$ is the upper-$α$ quantile of the chi-squared distribution with 1 degree of freedom. The $100 (1 - α) \%$ confidence interval for other components of $β$ can be constructed similarly.
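The interval in Equation (12) has no closed form, but its endpoints can be found numerically. The sketch below assumes that the projected residuals from Equation (10) are already available as two lists, `u` (covariate residual) and `v` (response residual); these names, the toy data, and the expand-then-bisect search are illustrative assumptions rather than the procedure used in the paper. The chi-squared quantile with 1 degree of freedom is obtained as the square of a standard normal quantile.

```python
import math
import random
from statistics import NormalDist


def neg2_log_el_ratio(omega, n_iter=100):
    """-2 log R_n for scalar estimating-function values (dual form, bisection)."""
    lo_val, hi_val = min(omega), max(omega)
    if lo_val == 0.0 and hi_val == 0.0:
        return 0.0
    if not (lo_val < 0.0 < hi_val):
        return math.inf
    lo, hi = -1.0 / hi_val, -1.0 / lo_val
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(n_iter):
        a = 0.5 * (lo + hi)
        if sum(w / (1.0 + a * w) for w in omega) > 0.0:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(a * w) for w in omega)


def el_confidence_interval(u, v, level=0.95):
    """Endpoints of {b : -2 log R_n(b) <= c_alpha} for projected residuals u, v."""
    c_alpha = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2  # chi-squared(1) quantile
    beta_hat = sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)

    def stat(b):
        return neg2_log_el_ratio([ui * (vi - ui * b) for ui, vi in zip(u, v)])

    def endpoint(sign):
        step = 1e-3 * (abs(beta_hat) + 1.0)
        while stat(beta_hat + sign * step) <= c_alpha:  # expand until outside
            step *= 2.0
        inside, outside = 0.0, step
        for _ in range(60):                             # bisect on the boundary
            mid = 0.5 * (inside + outside)
            if stat(beta_hat + sign * mid) <= c_alpha:
                inside = mid
            else:
                outside = mid
        return beta_hat + sign * inside

    return endpoint(-1.0), beta_hat, endpoint(+1.0)


# Hypothetical projected residuals with true slope 2.
rng = random.Random(2)
u = [rng.gauss(0.0, 1.0) for _ in range(60)]
v = [2.0 * ui + rng.gauss(0.0, 1.0) for ui in u]
lower, beta_hat, upper = el_confidence_interval(u, v)
print(lower, beta_hat, upper)
```

The two endpoints are searched separately, so the resulting interval need not be symmetric around the point estimate.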

2.2. Bartlett Correction

To further improve the accuracy of the inference, the empirical likelihood ratio may be Bartlett corrected, which reduces the coverage error to a smaller order than the usual $O (n^{- 1})$ [20]. The Bartlett correction can effectively control the coverage error of the confidence interval, bringing the actual coverage probability closer to the nominal level. The basic idea is to multiply the $χ^{2}$ threshold by a constant $(1 + B_{c} / n)$ instead of 1, where $B_{c}$ is the Bartlett correction constant. Because it is very difficult to obtain an exact expression for $B_{c}$, we give an estimator of $(1 + B_{c} / n)$ by using the bootstrap procedure, which has successfully been applied in a more complex setting by Chen and Cui [21].

The Bartlett correction of the empirical likelihood confidence interval for a parameter of interest $β_{1}$ in a partially linear model in Equation (1) is constructed by the following procedures. The procedures for another component of $β$ , say $β_{2}$ , would be similar.

First, the nonparametric regression method is used to regress Y and X on the nonparametric component Z. The reduced partial residuals follow a linear model of the form $Y - m_{Y} (Z) = \{X - m_{X} (Z)\} β + ε .$ We use ${\hat{m}}_{Y} (Z)$ and ${\hat{m}}_{X} (Z)$ to replace $m_{Y} (Z)$ and $m_{X} (Z)$ in the estimating procedure.
Then, the first column of $\hat{X}$ (denoted by ${\hat{X}}_{1}$) is regressed on the rest of the columns (denoted by ${\hat{X}}_{- 1}$). The residual serves as the new fixed covariate for $β_{1}$, and the residual of regressing $\hat{Y}$ on ${\hat{X}}_{- 1}$ serves as the new response variable. The residual model is obtained and given by
$\{\hat{Y} - E (\hat{Y} | {\hat{X}}_{- 1})\} = \{{\hat{X}}_{1} - E ({\hat{X}}_{1} | {\hat{X}}_{- 1})\} β_{1} + ε .$
We treat the residual model as the new linear model. The bootstrap procedure of estimating the Bartlett correction factor in the new linear model follows the procedure shown below:
- (a) Generate bootstrap resamples of size n by sampling with replacement from the sample ${\{\hat{Y} - E (\hat{Y} | {\hat{X}}_{- 1})\}}_{1}^{n}$ and ${\{{\hat{X}}_{1} - E ({\hat{X}}_{1} | {\hat{X}}_{- 1})\}}_{1}^{n}$, respectively, after the projection; then, calculate $- 2 \log \{R_{n}^{*} ({\hat{β}}_{1})\}$ based on the resamples, where ${\hat{β}}_{1}$ is the global maximum empirical likelihood estimator of $β_{1}$ based on the original sample ${\{\hat{Y} - E (\hat{Y} | {\hat{X}}_{- 1})\}}_{1}^{n}$ and ${\{{\hat{X}}_{1} - E ({\hat{X}}_{1} | {\hat{X}}_{- 1})\}}_{1}^{n}$.
- (b) Repeat (a) B times to obtain $- 2 \log \{R_{n}^{* 1} ({\hat{β}}_{1})\}, - 2 \log \{R_{n}^{* 2} ({\hat{β}}_{1})\}, \dots, - 2 \log \{R_{n}^{* B} ({\hat{β}}_{1})\}$ and $B^{- 1} \sum_{b = 1}^{B} - 2 \log \{R_{n}^{* b} ({\hat{β}}_{1})\}$, which is the bootstrap estimator of $E [- 2 \log \{R_{n} ({\hat{β}}_{1})\}]$.

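Write $τ = 1 + B_{c} / n$ for the Bartlett correction factor. Since the limiting chi-squared distribution with 1 degree of freedom has mean 1, the expectation of the empirical likelihood ratio statistic is approximately $τ$, so the bootstrap estimator of $τ$ is $\hat{τ} = B^{- 1} \sum_{b = 1}^{B} - 2 \log \{R_{n}^{* b} ({\hat{β}}_{1})\}$. In consequence, the Bartlett corrected confidence region is constructed by

$C I_{α} = \{β_{1} : - 2 \log \{R_{n} (β_{1})\} \leq \hat{τ} c_{α}\} .$

The Bartlett corrected confidence interval for $β_{1}$ is thus constructed.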
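Steps (a) and (b) above can be sketched as follows. The code assumes that the projected residuals are stored in lists `u` and `v`, that the pairs $(u_{i}, v_{i})$ are resampled jointly, and that resamples for which the statistic is infinite are dropped; all three are assumptions made for this illustration, as are the function names and the toy data.

```python
import math
import random
from statistics import NormalDist


def neg2_log_el_ratio(omega, n_iter=100):
    """-2 log R_n for scalar estimating-function values (dual form, bisection)."""
    lo_val, hi_val = min(omega), max(omega)
    if lo_val == 0.0 and hi_val == 0.0:
        return 0.0
    if not (lo_val < 0.0 < hi_val):
        return math.inf
    lo, hi = -1.0 / hi_val, -1.0 / lo_val
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(n_iter):
        a = 0.5 * (lo + hi)
        if sum(w / (1.0 + a * w) for w in omega) > 0.0:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(a * w) for w in omega)


def bartlett_factor(u, v, n_boot=200, seed=0):
    """Bootstrap estimate of tau: mean of -2 log R_n^*(beta_hat) over resamples."""
    rng = random.Random(seed)
    n = len(u)
    # For this just-identified equation the maximum EL estimator solves
    # sum_i Omega_i(beta) = 0, i.e. it equals the least squares estimate.
    beta_hat = sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]        # resample pairs jointly
        omega = [u[i] * (v[i] - u[i] * beta_hat) for i in idx]
        s = neg2_log_el_ratio(omega)
        if math.isfinite(s):                              # assumption: drop degenerate resamples
            stats.append(s)
    return sum(stats) / len(stats)


# Hypothetical projected residuals with true slope 2.
rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(80)]
v = [2.0 * ui + rng.gauss(0.0, 1.0) for ui in u]
tau_hat = bartlett_factor(u, v)
c_alpha = NormalDist().inv_cdf(0.975) ** 2
print(tau_hat, tau_hat * c_alpha)  # corrected threshold replaces c_alpha
```

The corrected interval is then obtained exactly as the uncorrected one, with the threshold $c_{α}$ replaced by $\hat{τ} c_{α}$.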

3. Results

3.1. Simulation Studies

In the simulation studies, we studied the performance of the proposed method in getting the inference of the parameter of interest $β$ in the partially linear model (1). We first simulated Z from a $\mathrm{Unif} (0, 1)$ distribution with sample size n. The true $β$ value was set to be $β = (2, 5, 7, 4)$, and we aimed to estimate the first component of $β$. X was set to be the sum of two matrices $\Sigma_{1}$ and $\Sigma_{2}$, where $\Sigma_{1}$ was the matrix composed of vectors $1.5 \exp (1.5 z), 5 z, 5 \sqrt{z}$, and $3 z + z^{2}$. $\Sigma_{2}$ was the matrix of error terms composed of n samples from the scaled multivariate normal distribution with zero mean and a compound symmetry covariance matrix with diagonal 1 and off-diagonal $0.4$; the scale parameter was $0.5$. The columns of the X matrix were functions of Z and were thus correlated. The nonparametric component $g (Z)$ took the function $g (Z) = \sin (Z)$. Two cases for the distribution of the error term $ε$ were considered:

Case 1: $ε$ follows a normal distribution with mean 0 and variance $σ^{2} = 1$.
Case 2: $ε$ follows the scaled log-normal distribution such that $ε$ has mean 0 and variance $σ^{2} = 1$.
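The data-generating design described above can be sketched as follows. Two points are assumptions made for this illustration because the text does not spell them out: the scale parameter $0.5$ is taken to multiply the multivariate normal draws, and the scaled log-normal error is taken to be $\{\exp (W) - e^{1 / 2}\} / \sqrt{e (e - 1)}$ with W standard normal, which has mean 0 and variance 1. Function names are hypothetical.

```python
import math
import random

BETA = (2.0, 5.0, 7.0, 4.0)


def draw_error(kind, rng):
    """Error term with mean 0 and variance 1 (Case 1: normal, Case 2: log-normal)."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    # exp(W) with W ~ N(0, 1) has mean e^{1/2} and variance e (e - 1).
    w = rng.gauss(0.0, 1.0)
    return (math.exp(w) - math.exp(0.5)) / math.sqrt(math.e * (math.e - 1.0))


def simulate(n, kind="normal", seed=0):
    """Return a list of (y, x, z) with x a list of four covariates."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random()                                   # Z ~ Unif(0, 1)
        mean_part = (1.5 * math.exp(1.5 * z), 5.0 * z,
                     5.0 * math.sqrt(z), 3.0 * z + z * z)
        # Compound symmetry (variance 1, correlation 0.4) via a shared factor,
        # then scaled by 0.5 (assumed meaning of the scale parameter).
        shared = rng.gauss(0.0, 1.0)
        noise = [0.5 * (math.sqrt(0.4) * shared + math.sqrt(0.6) * rng.gauss(0.0, 1.0))
                 for _ in range(4)]
        x = [m + e for m, e in zip(mean_part, noise)]
        y = sum(b * xj for b, xj in zip(BETA, x)) + math.sin(z) + draw_error(kind, rng)
        rows.append((y, x, z))
    return rows


rows = simulate(50, kind="lognormal", seed=0)
print(len(rows), rows[0])
```

The shared-factor construction gives each pair of columns covariance $0.4$ and each column variance 1 before scaling, which matches the stated compound symmetry structure without a matrix factorization.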

In the simulations, the sample sizes were considered to be 50, 100, and 200. In each simulation, we generated 1000 independent data sets and constructed the $95\%$ confidence interval for each data set. In estimating the nonparametric regression of $m_{y} (Z)$ and $m_{x} (Z)$, the direct plug-in method was used to select the bandwidth of a local linear Gaussian kernel regression estimate, as described by Ruppert, Sheather, and Wand [22]. The proposed method was compared with the normal-based method and the generalized additive model method (gam) [15].

Table 1 gives the average results from the 1000 simulations (the endpoints of the confidence intervals were obtained by the medians of the 1000 simulation results, and the confidence interval lengths were computed using the difference of the two endpoints). In Table 1, Est refers to the estimated $β_{1}$ value; Norm, Gam, EL, and ELb refer to the normal-based method, the gam function in R, the empirical-likelihood-based method without Bartlett correction, and the empirical-likelihood-based method with Bartlett correction, respectively. Length and coverage probability refer to the respective length and coverage probability of the confidence intervals constructed using the four different methods. It is worth mentioning that each confidence interval based on the normal approximation is symmetric while the confidence interval based on empirical likelihood is not symmetric. In the simulation, the Gaussian distribution was used as the distribution family within the gam function under both error cases.

The simulation results from Table 1 indicate that the Bartlett correction indeed improved the statistical inference. The coverage probability was improved after the Bartlett correction, especially when the sample size is small, where the normal approximation method may not be appropriate. When the sample size is small (for example $n = 50$), our proposed method tends to enlarge the confidence interval to have a better coverage probability for the true parameter. In that case, the length of the confidence interval for the Bartlett correction is larger than that of the normal approximation, the gam method, and the empirical likelihood without Bartlett correction, but the coverage probability is the closest to the nominal level $95\%$. When the sample size becomes larger, the length of the confidence interval using the proposed method tends to be close to or shorter than the confidence interval of the normal approximation method and yet still has slightly better or equally good coverage probability compared to the normal approximation method and gam method.

3.2. A Real Study Example

The proposed method is illustrated by an application to the Boston housing data set, which was obtained from the StatLib archive and has been extensively used in regression analysis. The data set consists of the median value of owner-occupied homes in 506 US census tracts in the Boston area in 1970, as well as several variables which might explain the variation in housing values. Based on the correlations and multicollinearity analysis, we fit a partially linear model with the variable of interest MEDV (median value of owner-occupied home in USD 1000) linearly related with predictor PTRATIO (pupil–teacher ratio by town), RM (number of rooms per dwelling), and nonlinearly related with variable LSTAT (% lower status of the population). The partially linear model has the following form:

$\mathrm{MEDV} = β_{0} + β_{1} \mathrm{PTRATIO} + β_{2} \mathrm{RM} + g (\mathrm{LSTAT}) + ε .$

The proposed method was used to construct the 95% confidence interval for $β_{1}$ . The proposed empirical-likelihood-based Bartlett corrected 95% confidence interval for $β_{1}$ was (2.375, 4.656), and the normal-based 95% confidence interval was (2.406, 4.502). Both methods indicated a positive linear relationship between PTRATIO and MEDV, with the proposed method’s confidence interval slightly wider than the normal-based confidence interval. Based on our simulation results for the coverage probability under a large sample size, the confidence interval obtained from the proposed method was comparable with the normal-based confidence interval and was trustworthy.

4. Discussion

In this paper, an empirical-likelihood-based method to construct the confidence interval for the linear components in partially linear models was proposed. Simulation studies showed that the length of the confidence interval for the proposed empirical likelihood with Bartlett correction method was larger than the normal approximation when the sample size was small, but the coverage probability was the closest to the nominal $95\%$ level. When the sample size was larger, the confidence interval for the proposed empirical likelihood with Bartlett correction method had a slightly shorter length and a coverage probability similar to that of the normal-based method and gam method, which indicated the confidence interval constructed by the proposed method was more desirable in estimating the parameter of interest. The above findings mostly hold under both normally distributed and non-normally distributed error terms. This supports the numerical robustness of our proposed method, which also makes it a practically useful tool in real studies where we usually do not know the distribution of the data. The trade-off of the proposed method is that it requires more computation than the normal-approximation method.

In summary, this proposed method gives better inference in terms of the length and coverage probabilities of the confidence intervals compared to the normal-approximation-based method. It does not impose any restrictions on the data distribution, and the computations are relatively straightforward for partially linear models. This proposed method is recommended for estimating and constructing confidence intervals for the linear components in partially linear models, particularly when the sample size is small.

Author Contributions

Methodology, H.S.; software, H.S. and L.C.; formal analysis, H.S. and L.C.; writing—original draft, revision, H.S.; writing—review and editing, L.C. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The Boston housing data used in the paper were obtained from the StatLib archive (http://lib.stat.cmu.edu/datasets/boston, accessed on 1 November 2023).

Acknowledgments

The authors would like to thank the editor and three referees for their insightful comments that significantly improved an earlier version of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Table 1

Confidence interval and coverage probability for partially linear models; $β = (2, 5, 7, 4)$, $σ = 1$, and $β_{1}$ is the parameter of interest.

Error	n	Est	Length (Norm)	Length (Gam)	Length (EL)	Length (ELb)	Coverage (Norm)	Coverage (Gam)	Coverage (EL)	Coverage (ELb)
Norm	50	1.957	1.496	1.498	1.437	1.520	0.935	0.936	0.924	0.945
Norm	100	2.047	1.052	1.085	1.016	1.049	0.963	0.963	0.949	0.953
Norm	200	2.026	0.706	0.704	0.689	0.701	0.946	0.950	0.944	0.950
Non-norm	50	1.975	1.368	1.289	1.278	1.386	0.946	0.934	0.944	0.950
Non-norm	100	2.05	1.012	1.051	0.980	1.008	0.973	0.962	0.954	0.956
Non-norm	200	2.027	0.681	0.689	0.668	0.677	0.944	0.940	0.930	0.946

Appendix A

Proof of Theorem 1.

First, we give the following fact, which is used later in the proof of the theorem. Its proof can be shown by Assumptions 2–5: (A1) ${\hat{m}}_{x} (Z) - m_{x} (Z) = o_{p} (n^{- 1 / 4}), {\hat{m}}_{y} (Z) - m_{y} (Z) = o_{p} (n^{- 1 / 4})$

From $Y = X β + g (Z) + ε$, we have $Y - m_{y} (Z) = \{X - m_{x} (Z)\} β + ε .$ In the estimating process, it is rewritten as $\hat{Y} = \hat{X} β + ε$ with the notation that ${\hat{m}}_{ξ} (Z)$ is the local linear kernel regression estimator of $m_{ξ} (Z)$ and $\hat{ξ} = ξ - {\hat{m}}_{ξ} (Z)$ (for example, the kernel function can be the Gaussian kernel function, and the bandwidth h can be determined by using the direct plug-in method of Ruppert, Sheather, and Wand [22] for a local linear Gaussian kernel regression estimate).

Let $Ω_{i} = {\hat{X}}_{i}^{T} ({\hat{Y}}_{i} - {\hat{X}}_{i} β)$ and ${\tilde{Ω}}_{i} = {\tilde{X}}_{i}^{T} ({\tilde{Y}}_{i} - {\tilde{X}}_{i} β) .$ A standard simplification as in Owen [16] yields $p_{i} = \frac{1}{n (1 + a^{T} {\tilde{Ω}}_{i})}, i = 1, \dots, n,$ where a is the solution of the equation $n^{- 1} \sum_{i = 1}^{n} \frac{{\tilde{Ω}}_{i}}{1 + a^{T} {\tilde{Ω}}_{i}} = 0 .$ A direct calculation yields (A2) $\begin{aligned} {\tilde{Ω}}_{i} - Ω_{i} &= {\tilde{X}}_{i}^{T} ({\tilde{Y}}_{i} - {\tilde{X}}_{i} β) - {({\hat{X}}_{i} - {\tilde{X}}_{i} + {\tilde{X}}_{i})}^{T} ({\hat{Y}}_{i} - {\tilde{Y}}_{i} + {\tilde{Y}}_{i} - {\hat{X}}_{i} β + {\tilde{X}}_{i} β - {\tilde{X}}_{i} β) \\ &= {\tilde{X}}_{i}^{T} ({\tilde{Y}}_{i} - {\hat{Y}}_{i}) + {\tilde{X}}_{i}^{T} ({\hat{X}}_{i} - {\tilde{X}}_{i}) β - {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\hat{Y}}_{i} - {\tilde{Y}}_{i}) \\ &\quad + {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\hat{X}}_{i} - {\tilde{X}}_{i}) β - {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\tilde{Y}}_{i} - {\tilde{X}}_{i} β) \\ &= o_{p} (1), \end{aligned}$ where $o_{p} (1)$ is independent of the index i. The above expression is of order $o_{p} (1)$ because ${\tilde{X}}_{i}$ is $O_{p} (1)$, and ${\tilde{Y}}_{i} - {\hat{Y}}_{i}$ and ${\tilde{X}}_{i} - {\hat{X}}_{i}$ are of order $o_{p} (n^{- 1 / 4})$ by (A1).

Using arguments similar to those in the proof of Theorem 3.2 of Owen [16], we have $‖ a ‖ = O_{p} (n^{- 1 / 2}) \text{ and } \max_{1 \leq i \leq n} ‖ {\tilde{Ω}}_{i} ‖ = o_{p} (n^{1 / 2}) .$ With (A2), we have $\max_{1 \leq i \leq n} ‖ Ω_{i} ‖ \leq \max_{1 \leq i \leq n} ‖ {\tilde{Ω}}_{i} ‖ + o_{p} (1) = o_{p} (n^{1 / 2}) .$ Using the same argument as in the proof of Theorem 4 in Liang et al. [23], we have $\begin{aligned} - 2 \log \{R_{n} (β)\} &= \sum_{i = 1}^{n} a^{T} {\tilde{Ω}}_{i} {\tilde{Ω}}_{i}^{T} a + o_{p} (1) \\ &= {(n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i})}^{T} {(n^{- 1} \sum_{i = 1}^{n} {\tilde{Ω}}_{i} {\tilde{Ω}}_{i}^{T})}^{- 1} (n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i}) + o_{p} (1) . \end{aligned}$ Now, we need to show that the above equation still holds when ${\tilde{Ω}}_{i}$ is replaced with $Ω_{i}$. To show that, we first show that $n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} ({\tilde{Y}}_{i} - {\hat{Y}}_{i}) = o_{p} (1)$. Given $ϵ > 0$ and some constant c, let $a_{i} = {\tilde{Y}}_{i} - {\hat{Y}}_{i} = {\hat{m}}_{y} (Z_{i}) - m_{y} (Z_{i})$. We have $\begin{aligned} & P (n^{- 1 / 2} | \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} ({\tilde{Y}}_{i} - {\hat{Y}}_{i}) | > ϵ) \\ &\leq P (n^{- 1 / 2} | \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} a_{i} | > ϵ, | a_{i} | \leq c n^{- 1 / 4}) + P (| a_{i} | > c n^{- 1 / 4}) \\ &\leq P (n^{- 1 / 2} \cdot c n^{- 1 / 4} | \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} | > ϵ) + o_{p} (1) \\ &= P (n^{- 1 / 2} | \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} | > c^{- 1} n^{1 / 4} ϵ) + o_{p} (1) \\ &= o_{p} (1) + o_{p} (1) \\ &= o_{p} (1) . \end{aligned}$ In the above equations, $P (n^{- 1 / 2} | \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} | > c^{- 1} n^{1 / 4} ϵ) = o_{p} (1)$ because $n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T}$ converges in distribution to $N (0, v (X | Z))$, where $v (X | Z)$ is the covariance matrix of $X - E (X | Z)$, while the threshold $c^{- 1} n^{1 / 4} ϵ$ diverges. Similarly, we can show $\begin{aligned} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {\tilde{X}}_{i}^{T} ({\tilde{X}}_{i} - {\hat{X}}_{i}) β &= o_{p} (1), \\ \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\tilde{Y}}_{i} - {\tilde{X}}_{i} β) &= o_{p} (1) . \end{aligned}$

To show $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\hat{Y}}_{i} - {\tilde{Y}}_{i}) = o_{p} (1)$, note that (A3) $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\hat{Y}}_{i} - {\tilde{Y}}_{i}) = n^{- 1 / 2} o_{p} (n^{- 1 / 4}) \cdot o_{p} (n^{- 1 / 4}) \cdot n = o_{p} (1) .$ The first equal sign in (A3) holds because $\sup_{Z_{i}} | {\hat{m}}_{w} (Z_{i}) - m_{w} (Z_{i}) | = o_{p} (n^{- 1 / 4})$, where $w = X$ or $w = Y$, so that we can take $o_{p} (n^{- 1 / 4})$ out of the summation. With the same procedure, we can also show that $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} {({\hat{X}}_{i} - {\tilde{X}}_{i})}^{T} ({\hat{X}}_{i} - {\tilde{X}}_{i}) β = o_{p} (1) .$

These arguments imply that $n^{- 1 / 2} \sum_{i = 1}^{n} Ω_{i}$ and $n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i}$ asymptotically have the same limiting normal distribution, and $n^{- 1} \sum_{i = 1}^{n} Ω_{i} Ω_{i}^{T}$ and $n^{- 1} \sum_{i = 1}^{n} {\tilde{Ω}}_{i} {\tilde{Ω}}_{i}^{T}$ have the same limiting value. Since ${(n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i})}^{T} {(n^{- 1} \sum_{i = 1}^{n} {\tilde{Ω}}_{i} {\tilde{Ω}}_{i}^{T})}^{- 1} (n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i})$ converges in distribution to $χ_{p}^{2}$, so does ${(n^{- 1 / 2} \sum_{i = 1}^{n} Ω_{i})}^{T} {(n^{- 1} \sum_{i = 1}^{n} Ω_{i} Ω_{i}^{T})}^{- 1} (n^{- 1 / 2} \sum_{i = 1}^{n} Ω_{i}) .$ The proof is thus complete. □

Proof of Theorem 2.

We continue to use the notations $\tilde{ξ} = ξ - m_{ξ} (Z)$, $\hat{ξ} = ξ - {\hat{m}}_{ξ} (Z)$ for any random vector $ξ$. First, denote $Ω_{i} = {\{{\hat{X}}_{1 i} - \hat{E} {({\hat{X}}_{1} | {\hat{X}}_{- 1})}_{i}\}}^{T} [{\hat{Y}}_{i} - \hat{E} ({\hat{Y}}_{i} | {\hat{X}}_{- 1 i}) - \{{\hat{X}}_{1 i} - \hat{E} {({\hat{X}}_{1} | {\hat{X}}_{- 1})}_{i}\} β_{1}]$, ${\hat{Ω}}_{i} = {\{{\hat{X}}_{1 i} - E {({\hat{X}}_{1} | {\hat{X}}_{- 1})}_{i}\}}^{T} [{\hat{Y}}_{i} - E ({\hat{Y}}_{i} | {\hat{X}}_{- 1 i}) - \{{\hat{X}}_{1 i} - E {({\hat{X}}_{1} | {\hat{X}}_{- 1})}_{i}\} β_{1}]$, and ${\tilde{Ω}}_{i} = {\{{\tilde{X}}_{1 i} - E {({\tilde{X}}_{1} | {\tilde{X}}_{- 1})}_{i}\}}^{T} [{\tilde{Y}}_{i} - E ({\tilde{Y}}_{i} | {\tilde{X}}_{- 1 i}) - \{{\tilde{X}}_{1 i} - E {({\tilde{X}}_{1} | {\tilde{X}}_{- 1})}_{i}\} β_{1}] .$ We first need to show that $Ω_{i} = {\tilde{Ω}}_{i} + o_{p} (1)$, and that $n^{- 1 / 2} \sum_{i = 1}^{n} Ω_{i}$ and $n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i}$ asymptotically have the same limiting distribution.

Since, in the linear model case, we have proved that $Ω_{i} - {\hat{Ω}}_{i} = o_{p} (1)$ and that $n^{- 1 / 2} \sum_{i = 1}^{n} Ω_{i}$ and $n^{- 1 / 2} \sum_{i = 1}^{n} {\hat{Ω}}_{i}$ asymptotically have the same limiting distribution, we now only need to show ${\hat{Ω}}_{i} - {\tilde{Ω}}_{i} = o_{p} (1)$ and (A4) $n^{- 1 / 2} \sum_{i = 1}^{n} {\hat{Ω}}_{i} - n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Ω}}_{i} = o_{p} (1) .$ Assume $E (\tilde{Y} | {\tilde{X}}_{- 1}) = {\tilde{X}}_{- 1} η,$ $E ({\tilde{X}}_{1} | {\tilde{X}}_{- 1}) = {\tilde{X}}_{- 1} γ .$ Recall that in the estimating procedure we replaced $\tilde{Y}, \tilde{X}$ with $\hat{Y}, \hat{X}$ , so we have $E (\hat{Y} | {\hat{X}}_{- 1}) = {\hat{X}}_{- 1} η,$ $E ({\hat{X}}_{1} | {\hat{X}}_{- 1}) = {\hat{X}}_{- 1} γ .$ We first show that with this replacement, ${\tilde{Ω}}_{i} = {\hat{Ω}}_{i} + o_{p} (1)$ holds. By Equation (A1), where $m_{v} (Z) - {\hat{m}}_{v} (Z) = o_{p} (1)$ for $v = X$ or $v = Y$ , and because ${\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ$ and ${\tilde{Y}}_{i} - {\tilde{X}}_{- 1 i} η$ are $O_{p} (1)$ random variables, we have

$\begin{aligned} {\tilde{Ω}}_{i} - {\hat{Ω}}_{i} = {} & {({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ)}^{T} \{ {\tilde{Y}}_{i} - {\hat{Y}}_{i} - ({\tilde{X}}_{- 1 i} - {\hat{X}}_{- 1 i}) η - ({\tilde{X}}_{1 i} - {\hat{X}}_{1 i}) β_{1} + ({\tilde{X}}_{- 1 i} - {\hat{X}}_{- 1 i}) γ β_{1} \} \\ & - \{ ({\hat{X}}_{1 i} - {\tilde{X}}_{1 i}) - ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ \}^{T} [ {\hat{Y}}_{i} - {\tilde{Y}}_{i} - ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) η + {\tilde{Y}}_{i} - {\tilde{X}}_{- 1 i} η \\ & \quad - \{ {\hat{X}}_{1 i} - {\tilde{X}}_{1 i} - ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ + {\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ \} β_{1} ] \\ = {} & O_{p} (1) \{ o_{p} (n^{- 1 / 4}) - o_{p} (n^{- 1 / 4}) η - o_{p} (n^{- 1 / 4}) β_{1} + o_{p} (n^{- 1 / 4}) γ β_{1} \} \\ & - \{ o_{p} (n^{- 1 / 4}) - o_{p} (n^{- 1 / 4}) γ \} [ o_{p} (n^{- 1 / 4}) - o_{p} (n^{- 1 / 4}) η + O_{p} (1) \\ & \quad - \{ o_{p} (n^{- 1 / 4}) - o_{p} (n^{- 1 / 4}) γ + O_{p} (1) \} β_{1} ] \\ = {} & o_{p} (1) . \end{aligned}$

To show Equation (A4), we first need to show $n^{- 1 / 2} \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) ({\tilde{Y}}_{i} - {\hat{Y}}_{i}) = o_{p} (1) .$ For a given $ϵ > 0$ and some constant $c$, we have (A5)

$\begin{aligned} & P \left\{ n^{- 1 / 2} \left| \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) ({\tilde{Y}}_{i} - {\hat{Y}}_{i}) \right| > ϵ \right\} \\ & \quad \leq P \left( n^{- 1 / 2} \left| \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) ({\tilde{Y}}_{i} - {\hat{Y}}_{i}) \right| > ϵ, \; | {\tilde{Y}}_{i} - {\hat{Y}}_{i} | \leq c n^{- 1 / 4} \right) + P \left( | {\tilde{Y}}_{i} - {\hat{Y}}_{i} | > c n^{- 1 / 4} \right) \\ & \quad \leq P \left( n^{- 1 / 2} c n^{- 1 / 4} \left| \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) \right| > ϵ \right) + o_{p} (1) \\ & \quad = o_{p} (1) . \end{aligned}$

Equation (A5) holds because $n^{- 1 / 2} \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ)$ converges to $N (0, v ({\tilde{X}}_{1} | {\tilde{X}}_{- 1}))$ , where $v ({\tilde{X}}_{1} | {\tilde{X}}_{- 1})$ is the variance of $({\tilde{X}}_{1} - {\tilde{X}}_{- 1} γ)$ . Using a similar proof, we have

$\begin{aligned} n^{- 1 / 2} \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) ({\tilde{X}}_{- 1 i} - {\hat{X}}_{- 1 i}) η & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) ({\tilde{X}}_{1 i} - {\hat{X}}_{1 i}) β_{1} & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) ({\tilde{X}}_{- 1 i} - {\hat{X}}_{- 1 i}) γ β_{1} & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{1 i} - {\tilde{X}}_{1 i}) ({\tilde{Y}}_{i} - {\tilde{X}}_{- 1 i} η) & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{1 i} - {\tilde{X}}_{1 i}) ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ ({\tilde{Y}}_{i} - {\tilde{X}}_{- 1 i} η) & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ ({\tilde{X}}_{1 i} - {\tilde{X}}_{- 1 i} γ) & = o_{p} (1) . \end{aligned}$

With the same proof as in Equation (A3), we have

$\begin{aligned} n^{- 1 / 2} \sum_{i = 1}^{n} {({\hat{X}}_{1 i} - {\tilde{X}}_{1 i})}^{T} ({\hat{Y}}_{i} - {\tilde{Y}}_{i}) & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} {({\hat{X}}_{1 i} - {\tilde{X}}_{1 i})}^{T} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) η & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} {({\hat{X}}_{1 i} - {\tilde{X}}_{1 i})}^{T} ({\hat{X}}_{1 i} - {\tilde{X}}_{1 i}) β_{1} & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} {({\hat{X}}_{1 i} - {\tilde{X}}_{1 i})}^{T} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ β_{1} & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ^{T} ({\hat{Y}}_{i} - {\tilde{Y}}_{i}) & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ^{T} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) η & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ^{T} ({\hat{X}}_{1 i} - {\tilde{X}}_{1 i}) β_{1} & = o_{p} (1) , \\ n^{- 1 / 2} \sum_{i = 1}^{n} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ^{T} ({\hat{X}}_{- 1 i} - {\tilde{X}}_{- 1 i}) γ β_{1} & = o_{p} (1) . \end{aligned}$

With the above equations, Equation (A4) holds. The proof is thus completed following the same procedure as in proving Theorem 1. □
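
The quantities handled in this proof can be illustrated numerically. The sketch below removes the dependence on $Z$ by kernel smoothing and then projects out the remaining covariates by least squares to form values playing the role of $Ω_{i}$ at a candidate $β_{1}$. It is only an illustration under stated assumptions: the Nadaraya-Watson smoother with a Gaussian kernel, the fixed bandwidth, the simulated data, and the names `nw_smooth` and `projected_omega` are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def nw_smooth(z, w, bandwidth):
    """Nadaraya-Watson estimate of E(w | Z = z_i) at each sample point.

    Gaussian kernel; an assumed smoother chosen for illustration.
    """
    diff = (z[:, None] - z[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diff**2)
    weights = kernel / kernel.sum(axis=1, keepdims=True)
    return weights @ w


def projected_omega(y, x, z, beta1, bandwidth=0.05):
    """Estimating-function values for the first coefficient (hypothetical helper)."""
    # Step 1: remove the nonparametric part, giving X-hat and Y-hat.
    x_hat = x - nw_smooth(z, x, bandwidth)
    y_hat = y - nw_smooth(z, y, bandwidth)
    # Step 2: linear projections on the remaining covariates X-hat_{-1}.
    x1, x_rest = x_hat[:, 0], x_hat[:, 1:]
    gamma, *_ = np.linalg.lstsq(x_rest, x1, rcond=None)
    eta, *_ = np.linalg.lstsq(x_rest, y_hat, rcond=None)
    x1_res = x1 - x_rest @ gamma
    y_res = y_hat - x_rest @ eta
    # Step 3: residualized covariate times residualized error at beta1.
    return x1_res * (y_res - x1_res * beta1)


# Simulated partially linear data (assumed design, not from the paper).
rng = np.random.default_rng(1)
n = 400
z = rng.uniform(0.0, 1.0, size=n)
x = rng.normal(size=(n, 3)) + np.sin(2 * np.pi * z)[:, None]
beta = np.array([1.0, -0.5, 0.25])
y = x @ beta + np.cos(2 * np.pi * z) + 0.5 * rng.normal(size=n)

omega = projected_omega(y, x, z, beta1=1.0, bandwidth=0.05)
```

The returned values are linear in `beta1`, so evaluating them over a grid of candidate values is inexpensive. In practice the bandwidth would be chosen by a data-driven selector rather than fixed as it is here.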

References

1. Engle, R.; Granger, C.; Rice, J.; Weiss, A. Nonparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc.; 1986; 81, pp. 310-320. [DOI: https://dx.doi.org/10.1080/01621459.1986.10478274]

2. Hastie, T.; Tibshirani, R. Generalized Additive Models; Chapman and Hall: London, UK, 1990.

3. Linton, O.; Nielsen, J. A Kernel Method of Estimating Structured Nonparametric Regression Based on Marginal Integration. Biometrika; 1995; 82, pp. 93-100. [DOI: https://dx.doi.org/10.1093/biomet/82.1.93]

4. Gray, R. Spline-based tests in survival analysis. Biometrics; 1994; 50, pp. 640-652. [DOI: https://dx.doi.org/10.2307/2532779]

5. Wahba, G. Cross validated spline methods for the estimation of multivariate functions from data on functionals. Proceedings of the Iowa State University Statistical Laboratory 50th Anniversary Conference, Ames, IA, USA, 13–15 June 1984; David, H.A.; David, H.T. The Iowa State University Press: Ames, IA, USA, 1984; pp. 205-235.

6. Green, P.; Jennison, C.; Seheult, A. Analysis of field experiments by least squares smoothing. J. R. Stat. Soc. Ser. B; 1985; 47, pp. 299-315. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1985.tb01358.x]

7. Heckman, N. Spline smoothing in a partly linear model. J. R. Stat. Soc. Ser. B; 1986; 48, pp. 244-248. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1986.tb01407.x]

8. Rice, J. Convergence rates for partially splined models. Stat. Probab. Lett.; 1986; 4, pp. 203-208. [DOI: https://dx.doi.org/10.1016/0167-7152(86)90067-2]

9. Speckman, P. Kernel Smoothing in Partial Linear Models. J. R. Stat. Soc. Ser. B; 1988; 50, pp. 413-436. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1988.tb01738.x]

10. Chen, H.; Shiau, J.J.H. Data-Driven Efficient Estimators for a Partially Linear Model. Ann. Stat.; 1994; 22, pp. 211-237. [DOI: https://dx.doi.org/10.1214/aos/1176325366]

11. Chen, H. Convergence rates for parametric components in a partly linear model. Ann. Stat.; 1988; 16, pp. 136-146. [DOI: https://dx.doi.org/10.1214/aos/1176350695]

12. Härdle, W.; Liang, H.; Gao, J. Partially Linear Models; Springer Physica: Heidelberg, Germany, 2000.

13. Liang, H. Estimation in Partially Linear Models and Numerical Comparisons. Comput. Stat. Data Anal.; 2006; 50, pp. 675-687. [DOI: https://dx.doi.org/10.1016/j.csda.2004.10.007] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20174596]

14. Severini, T.; Staniswalis, J. Quasilikelihood estimation in semiparametric models. J. Am. Stat. Assoc.; 1994; 89, pp. 501-511. [DOI: https://dx.doi.org/10.1080/01621459.1994.10476774]

15. Hastie, T.J. Generalized Additive Models; Wadsworth and Brooks/Cole: Pacific Grove, CA, USA, 1992.

16. Owen, A. Empirical Likelihood; Chapman and Hall/CRC: London, UK, 2001.

17. Robinson, P.M. Root-n-consistent semiparametric regression. Econometrica; 1988; 56, pp. 931-954. [DOI: https://dx.doi.org/10.2307/1912705]

18. Su, H.; Liang, H. An empirical likelihood-based method for comparison of treatment effects-test of equality of coefficients in linear models. Comput. Stat. Data Anal.; 2010; 54, pp. 1079-1088. [DOI: https://dx.doi.org/10.1016/j.csda.2009.10.018] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20161586]

19. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Chapman and Hall/CRC: London, UK, 1996.

20. DiCiccio, T.; Hall, P.; Romano, J. Empirical likelihood is Bartlett-correctable. Ann. Stat.; 1991; 19, pp. 1053-1061. [DOI: https://dx.doi.org/10.1214/aos/1176348137]

21. Chen, S.; Cui, H. On Bartlett correction of empirical likelihood in the presence of nuisance parameters. Biometrika; 2006; 93, pp. 215-220. [DOI: https://dx.doi.org/10.1093/biomet/93.1.215]

22. Ruppert, D.; Sheather, S.J.; Wand, M.P. An effective bandwidth selector for local least squares regression. J. Am. Stat. Assoc.; 1995; 90, pp. 1257-1270. [DOI: https://dx.doi.org/10.1080/01621459.1995.10476630]

23. Liang, H.; Wang, S.; Carroll, R. Partially linear models with missing response variables and error-prone covariates. Biometrika; 2007; 94, pp. 185-198.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract

Partially linear models find extensive application in biometrics, econometrics, social sciences, and various other fields due to their versatility in accommodating both parametric and nonparametric elements. This study aims to establish statistical inference for the parametric component effects within these models, employing a nonparametric empirical likelihood approach. The proposed method involves a projection step to eliminate the nuisance nonparametric component and utilizes an empirical-likelihood-based technique, along with the Bartlett correction, to enhance the coverage probability of the confidence interval for the parameter of interest. The method is robust to both normally and non-normally distributed errors. The proposed empirical likelihood ratio statistic converges to a limiting chi-square distribution under certain regularity conditions. Simulation studies demonstrate that this method provides better inference in terms of coverage probabilities than the conventional normal-approximation-based method. The proposed method is illustrated by analyzing the Boston housing data.

Details

Title: Empirical-Likelihood-Based Inference for Partially Linear Models

Author: Su, Haiyan¹; Chen, Linlin²

¹ School of Computing, Montclair State University, Montclair, NJ 07043, USA
² Department of Mathematics and Statistics, Rochester Institute of Technology, Rochester, NY 14632, USA

First page: 162
Publication year: 2024
Publication date: 2024
Publisher: MDPI AG
e-ISSN: 2227-7390
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.3390/math12010162
ProQuest document ID: 2912642879
