Abstract

In high-dimensional data analysis, main effects and interaction effects often coexist, especially when complex nonlinear relationships are present. Effective variable selection is crucial for avoiding the curse of dimensionality and enhancing the predictive performance of a model. In this paper, we introduce a nonlinear interaction structure into the additive quantile regression model and propose an innovative penalization method. This method accounts for the complexity and smoothness of the additive model and incorporates heredity constraints on main effects and interaction effects through an improved regularization algorithm under the marginality principle. We also establish the asymptotic properties of the penalized estimator and provide the corresponding excess risk bound. Monte Carlo simulations illustrate the proposed model and method, which we then apply to the analysis of Parkinson's disease rating scores, further verifying the effectiveness of a novel Parkinson's disease (PD) treatment.

1. Introduction

Quantile regression [1] has become a widely used tool in both empirical studies and the theoretical analysis of conditional quantile functions. Additive models offer greater flexibility by expressing linear predictors as the sum of nonparametric functions of each covariate, which generally results in lower variance compared to fully nonparametric models. However, many practical problems require the consideration of interactions between covariates, an aspect that has been extensively explored in linear and generalized linear models [2,3], where incorporating interactions has been shown to improve prediction accuracy. Despite this, there is limited research on additive quantile regression models that explicitly account for interaction structures, particularly in the context of variable selection.

Nonparametric additive quantile regression models have been extensively studied in the literature [4,5,6,7,8], with recent advancements in the analysis of longitudinal data [9] and dynamic component modeling [10]. Specifically, Ref. [9] explored a partially linear additive model for longitudinal data within the quantile regression framework, utilizing quadratic inference functions to account for within-subject correlations. In parallel, Ref. [10] proposed a quantile additive model with dynamic component functions, introducing a penalization-based approach to identify non-dynamic components. Despite these advances, the integration of interaction effects remains relatively underexplored in the context of nonparametric additive quantile regression models. Building upon the existing literature, this paper considers an additive quantile regression model with a nonlinear interaction structure.

When p, the number of main effect terms and interaction effect terms, is large, many additive methods are not feasible, since their implementation requires storing and manipulating the entire $O(p^2)\times n$ design matrix; thus, variable selection becomes necessary. Refs. [11,12] proposed the component selection and smoothing operator (COSSO) and an adaptive version of the COSSO algorithm (ACOSSO), respectively, to fit the additive model with interaction structures. However, these methods violate the heredity condition, whereas [13] took the heredity constraint into account and proposed a penalty $\lambda\big(\sum_{j=1}^{p}\|f_j\|_2+\sum_{j=1}^{p}\sum_{k=j+1}^{p}\|f_{jk}\|_2\big)$ on the empirical $L_2$ norms of the main effects and interaction effects. This method shrinks the interactions, depending on the main effects that are already present in the model [9,10].

Several regularization penalties, especially the Lasso, have been widely used to shrink the coefficients of main effects and interaction effects to achieve a sparse model. However, existing methods treat interaction terms and main terms alike and may therefore select an interaction term without the corresponding main terms, making the resulting model difficult to interpret in practice. There is a natural hierarchy among the variables in a model with interaction structures. Refs. [14,15] proposed a two-stage Lasso method to select important main and interaction terms. However, this approach is inefficient, as the solution path in the second stage heavily depends on the selection results from the first stage. To address this issue, several regularization methods [16,17,18] have been proposed that employ special reparametrizations of regression coefficients and penalty functions under the hierarchical principle. Ref. [19] introduced strong and weak heredity constraints into models with interactions, proposing a Lasso penalty with convex constraints that produces sparse coefficients while ensuring that strong or weak heredity is satisfied. However, the associated algorithms often suffer from slow convergence, even with a moderate number of predictors. Ref. [20] proposed a group-regularized estimation method under both strong and weak heredity constraints and developed a computational algorithm that guarantees the convergence of the iterates. Ref. [21] extended the linear interaction model to a quadratic model, accounting for all two-term interactions, and proposed the Regularization Path Algorithm under the Marginality Principle (RAMP), which efficiently computes the regularization solution path while maintaining strong or weak heredity constraints.

Recent advances in Bayesian hierarchical modeling for interaction selection have been driven by innovative prior constructions. Ref. [22] developed mixture priors that link interaction inclusion probabilities to main effect strengths, automatically enforcing heredity constraints. Ref. [23] introduced structured shrinkage priors using hierarchical Laplace distributions to facilitate group-level interaction selection. Ref. [24] incorporated heredity principles into SSVS frameworks through carefully designed spike-and-slab priors. These methodological developments have enhanced the ability to capture complex dependency structures in interaction selection while preserving Bayesian advantages, with demonstrated applications across various domains, including spatiotemporal analysis.

However, all these works assumed that the effects of all covariates can be captured in a simple linear form, which does not always hold in practice. To handle this issue, several authors have proposed nonparametric and semiparametric interaction models. For example, Refs. [25,26] considered variable selection and estimation procedures for single-index models with interactions for functional and longitudinal data. Some recent work on semiparametric single-index models with interactions includes [27,28], among others.

Driven by both practical needs and theoretical considerations, this paper makes the following contributions: (1) we propose a method for variable selection in quantile regression models that incorporates nonlinear interaction structures; (2) we modify the RAMP algorithm to extend its applicability to additive models with nonlinear interactions, enabling it to handle complex nonlinear interactions while preserving the hierarchical relationship between main effects and interaction effects throughout the variable selection process. This work addresses the challenges posed by complex nonlinear interactions while ensuring that the inherent hierarchical structure is maintained, thereby providing a robust framework for more accurate and interpretable models.

The rest of this paper is organized as follows. In Section 2, we present the additive quantile regression with nonlinear interaction structures and discuss the properties of the oracle estimator. In Section 3, we present a sparsity smooth penalized method fitted by a regularization algorithm under the marginality principle and derive its oracle property. In Section 4, simulation studies are provided to demonstrate the superior performance of the proposed method. We illustrate an application of the proposed method on a real dataset in Section 5, and a conclusion is presented in Section 6.

2. Additive Quantile Regression with Nonlinear Interaction Structures

Suppose that $\{(Y_i,x_i): i=1,\ldots,n\}$ is an independent and identically distributed (iid) sample, where $x_i=(x_{i1},\ldots,x_{ip})^\top$ is a p-dimensional vector of covariates. The $\tau$th ($0<\tau<1$) conditional quantile of $Y_i$ given $x_i$ is defined as $Q_{Y_i|x_i}(\tau)=\inf\{t: F(t|x_i)\ge\tau\}$, where $F(\cdot|x_i)$ is the conditional distribution function of $Y_i$ given $x_i$. We consider the following additive nonlinear interaction model for the conditional quantile function:

(1) $Q_{Y_i|x_i}(\tau)=\sum_{j=1}^{p}f_j(x_{ij})+\sum_{1\le j<k\le p}f_{jk}(x_{ij},x_{ik}),$

where the unknown real-valued bivariate functions $f_{jk}(x_{ij},x_{ik})$ represent the pairwise interaction terms between $x_{ij}$ and $x_{ik}$. We assume that $f_{jk}(a,b)=f_{kj}(b,a)$ for all $a$ and $b$ and all $j\neq k$, and that $x_{ij}\in[0,1]$ for all $i$ and $j$. Let $\epsilon_i=Y_i-Q_{Y_i|x_i}(\tau)$; then, $\epsilon_i$ satisfies $P(\epsilon_i\le 0|x_i)=\tau$, and we may also write $Y_i=\sum_{j=1}^{p}f_j(x_{ij})+\sum_{1\le j<k\le p}f_{jk}(x_{ij},x_{ik})+\epsilon_i$, where $\epsilon_i$ is the random error. For identification, we assume the $\tau$th quantile of $f_j(x_{ij})$ and of $f_{jk}(x_{ij},x_{ik})$ for $1\le j<k\le p$ to be zero (see [29]).

Let $\{\varphi_1,\varphi_2,\ldots\}$ denote a preselected orthonormal basis with respect to the Lebesgue measure on the unit interval. We center all the candidate functions, so the basis functions are centered as well. We approximate each function $f_j(x_{ij})$ by a linear combination of the basis functions, i.e.,

$f_j(x_{ij})\approx\sum_{l=1}^{L_n}\varphi_{jl}(x_{ij})\beta_{jl},$

where $\{\varphi_{jl}, l=1,\ldots,L_n\}$ are centered orthonormal bases. Let $\Psi_{i,j}=(\varphi_{j1}(x_{ij}),\ldots,\varphi_{jL_n}(x_{ij}))^\top$; the main terms can then be expressed as $f_j(x_{ij})\approx\Psi_{i,j}^\top\beta_j$, where $\beta_j=(\beta_{j1},\ldots,\beta_{jL_n})^\top$ denotes the $L_n$-dimensional vector of basis coefficients for the $j$th main term. We consider the tensor product of the interaction terms: $f_{jk}$ is a surface that can be approximated by

$f_{jk}(x_{ij},x_{ik})\approx\sum_{s=1}^{L_n}\sum_{d=1}^{L_n}\varphi_{js}(x_{ij})\varphi_{kd}(x_{ik})\beta_{jk,sd}.$

To simplify the expression, it is computationally efficient to re-express the surface in vector notation as $f_{jk}(x_{ij},x_{ik})\approx\Phi_{i,jk}^\top\beta_{jk}$, where $\Phi_{i,jk}=\Psi_{i,j}\otimes\Psi_{i,k}$ and $\beta_{jk}=\mathrm{vec}(\Gamma_{jk})$, with $\Gamma_{jk}=[\beta_{jk,sd}]$, $s=1,\ldots,L_n$, $d=1,\ldots,L_n$, the matrix with elements $\beta_{jk,sd}$. Here, $\otimes$ denotes the Kronecker product. We can now express (1) as

(2) $Q_{Y_i|x_i}(\tau)=\sum_{j=1}^{p}\Psi_{i,j}^\top\beta_j+\sum_{1\le j<k\le p}\Phi_{i,jk}^\top\beta_{jk}.$
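To make the construction concrete, the following Python sketch builds the centered main-effect blocks $\Psi_{i,j}$ and the Kronecker-product interaction blocks $\Phi_{i,jk}$. It is illustrative only (not the authors' implementation), and the cosine system is our assumed choice of orthonormal basis on $[0,1]$:

```python
# A minimal sketch of the basis construction in (2). Any centered
# orthonormal system on [0, 1] works; the cosine basis is one assumed choice.
import numpy as np

def centered_basis(x, L):
    """Evaluate L orthonormal cosine basis functions at x and center them."""
    B = np.column_stack([np.sqrt(2.0) * np.cos(np.pi * l * x)
                         for l in range(1, L + 1)])
    return B - B.mean(axis=0)            # empirical centering

def design_blocks(X, L):
    """Main-effect blocks Psi_j (n x L) and interaction blocks Phi_jk (n x L^2)."""
    n, p = X.shape
    Psi = [centered_basis(X[:, j], L) for j in range(p)]
    # Phi_{i,jk} = Psi_{i,j} (Kronecker) Psi_{i,k}, row by row
    Phi = {(j, k): np.stack([np.kron(Psi[j][i], Psi[k][i]) for i in range(n)])
           for j in range(p) for k in range(j + 1, p)}
    return Psi, Phi
```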

2.1. Oracle Estimator

For high-dimensional inference, it is often assumed that p is large, but only a small fraction of the main effects and interaction terms are present in the true model. Let $\mathcal{M}=\{j: f_j\neq 0 \text{ for } 1\le j\le p\}$ and $\mathcal{I}=\{(j,k): 1\le j<k\le p \text{ and } f_{jk}\neq 0\}$ be the index sets of nonzero main effects and interaction effects, and let $q=|\mathcal{M}|$ and $s=|\mathcal{I}|$ be the cardinalities of $\mathcal{M}$ and $\mathcal{I}$, respectively. That is, we assume that the first $q$ of the $\beta_j$, $j=1,\ldots,p$, are nonzero and the remaining components are zero. Hence, we can write $\beta_0=(\beta_{01}^\top,0^\top_{(p-q)L_n})^\top$, where $\beta_{01}=(\beta_1^\top,\ldots,\beta_q^\top)^\top$. Further, we assume that the first $s$ pairwise interactions $\beta_{jk}$ corresponding to $\beta_{01}$ are nonzero and the remaining components are zero. Thus, we write $\beta_{00}=(\beta_{001}^\top,0^\top_{\frac{p(p-1)-s(s-1)}{2}L_n^2})^\top$, where $\beta_{001}=(\beta_{(12)}^\top,\ldots,\beta_{(1s)}^\top,\ldots,\beta_{((s-1)1)}^\top,\ldots,\beta_{((s-1)s)}^\top)^\top$. Here, $\beta_{(jk)}$ denotes the component $\beta_{jk}$ with $\{(j,k): (j,k)\in\mathcal{I},\ j\in\mathcal{M},\ k\in\mathcal{M}\}$ under strong heredity or $\{(j,k): (j,k)\in\mathcal{I};\ j\in\mathcal{M}\text{ or }k\in\mathcal{M}\}$ under weak heredity. Let $\Pi=(\Pi_1,\ldots,\Pi_n)^\top$ be an $n\times\big(pL_n+\frac{p(p-1)}{2}L_n^2\big)$ design matrix with $\Pi_i=(\Psi_i^\top,\Phi_i^\top)^\top$, where $\Psi_i=(\Psi_{i,1}^\top,\ldots,\Psi_{i,p}^\top)^\top\in\mathbb{R}^{pL_n}$ and $\Phi_i=(\Phi_{i,12}^\top,\ldots,\Phi_{i,1p}^\top,\ldots,\Phi_{i,(p-1)p}^\top)^\top\in\mathbb{R}^{\frac{p(p-1)}{2}L_n^2}$. Likewise, we write $\Pi_{Ai}=(\Psi_{Mi}^\top,\Phi_{Ii}^\top)^\top\in\mathbb{R}^{qL_n+sL_n^2}$, where $\Psi_{Mi}$ is the subvector consisting of the first $qL_n$ elements of $\Psi_i$ (the active covariates) and $\Phi_{Ii}$ is the subvector consisting of the first $sL_n^2$ elements of $\Phi_i$ (the active interaction covariates). We first investigate the estimator obtained when the index sets $\mathcal{M}$ and $\mathcal{I}$ are known in advance, which we refer to as the oracle estimator. Now, we consider the quantile regression with the oracle information. Let

(3) $(\hat\beta_{01},\hat\beta_{001})=\operatorname*{arg\,min}_{\beta_{01},\beta_{001}}\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\big(Y_i-\Psi_{Mi}^\top\beta_{01}-\Phi_{Ii}^\top\beta_{001}\big).$

The oracle estimators for $\beta_0$ and $\beta_{00}$ are $(\hat\beta_{01}^\top,0^\top_{(p-q)L_n})^\top$ and $(\hat\beta_{001}^\top,0^\top_{\frac{p(p-1)-s(s-1)}{2}L_n^2})^\top$, respectively. Accordingly, the oracle estimators for the nonparametric functions $f_j(x_{ij})$ and $f_{jk}(x_{ij},x_{ik})$ are $\hat f_j(x_{ij})=\Psi_{Mi}^\top\hat\beta_{01}$ and $\hat f_{jk}(x_{ij},x_{ik})=\Phi_{Ii}^\top\hat\beta_{001}$, respectively.
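As an illustration, the oracle problem (3) can be solved numerically. The sketch below is a simplified stand-in (a derivative-free optimizer applied to the nonsmooth check loss), not the solver used in the paper; in practice one would use the linear-programming formulation of quantile regression:

```python
# Minimal sketch of the oracle estimator in (3): minimize the empirical
# check loss over the coefficients of the known active blocks.
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def oracle_fit(y, Pi_A, tau):
    """Pi_A: n x (q*Ln + s*Ln^2) design of the active main/interaction blocks."""
    def objective(gamma):
        return np.mean(check_loss(y - Pi_A @ gamma, tau))
    gamma0 = np.zeros(Pi_A.shape[1])
    res = minimize(objective, gamma0, method="Nelder-Mead",
                   options={"maxiter": 50000, "fatol": 1e-8})
    return res.x
```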

2.2. Asymptotic Properties

We next present the asymptotic properties of the oracle estimators. Similar to [13], we assume that all true main effects belong to the second-order Sobolev space: $\sum_{l=1}^{\infty}\beta_{jl}^2\,l^4<C^2$, and we impose the same requirement on each true interaction function, $\sum_{s=1}^{\infty}\beta_{jk,sd}^2\,s^4<C^2$ and $\sum_{d=1}^{\infty}\beta_{jk,sd}^2\,d^4<C^2$ for each $j,k$ with $1\le j<k\le p$, where $C$ is some constant.

To establish the asymptotic properties of the estimators, the following regularity conditions are needed:

(A1). The distribution of x is absolutely continuous with density f(x), which is bounded away from 0;

(A2). The conditional distribution Fε|x(·|x) of the error ε, given x=x, has a density fε|x(·|x), which satisfies the following two conditions:

(1). $\sup_{\varepsilon,x}f_{\varepsilon|x}(\varepsilon|x)<\infty$;

(2). There exist positive constants $b_1$ and $b_2$ such that $\inf_x\inf_{|\varepsilon|<b_1}f_{\varepsilon|x}(\varepsilon|x)\ge b_2$;

(A3). The basis functions $\varphi_1,\ldots,\varphi_{L_n}$ satisfy the following:

(1). $L_n\asymp n^{1/(2r+1)}$, $r>1/2$;

(2). $\sup_{x\in[0,1]^p}\|\Pi\|_2=O(L_n)$, and the eigenvalues of $H_n=\frac{1}{n}\sum_{i=1}^{n}\Pi_i\Pi_i^\top$ are uniformly bounded away from $0$ and $\infty$;

(3). There is a vector $\gamma=(\beta_0^\top,\beta_{00}^\top)^\top$ such that $\sup_{x\in[0,1]^p}|m(x)-\Pi^\top\gamma|=O(L_n^{-r})$, where $m(x)=\sum_{j=1}^{p}f_j(x_j)+\sum_{1\le j<k\le p}f_{jk}(x_j,x_k)$ and $x_j=(x_{1j},\ldots,x_{nj})^\top$;

(A4). $q=O(n^{C_1})$ for some $C_1<\frac{1}{3}$.

It should be noted that (A1) and (A2) are typical assumptions in nonparametric quantile regression (see [30]). Condition (3) in (A3) requires that the true component functions $f_j(x_j)$ and $f_{jk}(x_j,x_k)$ can be uniformly well approximated by the bases $\varphi_{jl}(x)$, $l=1,\ldots,L_n$, and $\varphi_{js}(x)\varphi_{kd}(x)$, $s=1,\ldots,L_n$, $d=1,\ldots,L_n$. Finally, Condition (A4) controls the model size.

Theorem 1. 

Let $\gamma_{01}=(\beta_{01}^\top,\beta_{001}^\top)^\top$. Assume that Conditions (A1)–(A4) hold. Then, the following are true:

(a) $\|\hat\gamma_{01}-\gamma_{01}\|_2=o_p(n^{-1/2}L_n)$;

(b) $\|\hat m(x)-m(x)\|_{L_2}=O_p\big(n^{-(2r-1)/(2(2r+1))}\big)$, where $m(x_i)=\sum_{j=1}^{p}f_j(x_{ij})+\sum_{1\le j<k\le p}f_{jk}(x_{ij},x_{ik})$.

Theorem 1 summarizes the rate of convergence for the oracle estimator, which is the same as the optimal convergence rate in the additive model [30] due to the merit of the additive structure. The proof of Theorem 1 can be found in Appendix A.1.

3. Penalized Estimation for Additive Quantile Regression with Nonlinear Interaction Structures

3.1. Penalized Estimator

Model (2) contains a large number of main effects $f_j(x_{ij})$ and interaction effects $f_{jk}(x_{ij},x_{ik})$, which significantly increases the computational burden, especially when p (the number of features) is large. It is necessary to use penalty methods to ensure computational efficiency and prevent overfitting. Applying sparsity penalties (such as the Lasso or group Lasso) to each $f_j$ can help to automatically select important variables and reduce the complexity of the model.

However, when using a large number of basis functions to fit complex relationships, a simple sparse penalty may oversimplify the model, ignore potential patterns in the data, and introduce unstable fluctuations. Therefore, in addition to sparse penalties, it is also necessary to introduce smoothness penalties (such as second-order derivative penalties) to avoid drastic fluctuations in the function and ensure smooth and stable fitting results. This method helps to control model complexity, reduce overfitting, and improve the model’s generalization ability.

Therefore, we minimize the following penalized objective function for (β0,β00):

(4) $\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j=1}^{p}\Psi_{i,j}^\top\beta_j-\sum_{1\le j<k\le p}\Phi_{i,jk}^\top\beta_{jk}\Big)+P(f),$

where

$P(f)=\lambda_1\Big[\sum_{j=1}^{p}\big(\|f_j(x_j)\|_n^2+\lambda_2 J_2^1(f_j(x_j))\big)^{1/2}+\sum_{1\le j<k\le p}\big(\|f_{jk}(x_j,x_k)\|_n^2+\lambda_2 J_2^1(f_{jk}(x_j,x_k))\big)^{1/2}\Big],$

is a sparsity-smooth penalty function, where the first and second terms penalize the main and interaction effects, respectively. We employ $\|f\|_n^2=\frac{1}{n}\sum_{i=1}^{n}f_i^2$ as a sparsity penalty, which encourages sparsity at the function level, and $J_2^1(f)=\int f''(x)^2\,dx$ as a roughness penalty, which controls the model's complexity by ensuring that the estimated function remains smooth while fitting the data, thus preventing the high-frequency fluctuations caused by an excessive number of basis functions. The two tuning parameters $\lambda_1,\lambda_2\ge 0$ control the amount of penalization.

It can be shown that $J_2^1(f_j(x_j))=\beta_j^\top\Omega_j\beta_j$, where $\Omega_j$ is a band diagonal matrix of known coefficients whose $(l_1,l_2)$th element is $\Omega_{j,l_1,l_2}=\int\varphi''_{jl_1}(x_j)\varphi''_{jl_2}(x_j)\,dx$, $l_1,l_2\in\{1,\ldots,L_n\}$; see [31] for details. According to [32], the penalty $J_2^1(f_{jk}(x_j,x_k))$ can be represented as

$\beta_{jk}^\top\big(\Lambda_{jj}\otimes I_{L_n}+I_{L_n}\otimes\Lambda_{kk}\big)\beta_{jk}\equiv\beta_{jk}^\top\Lambda_{jk}\beta_{jk},$

where $\Lambda_{jj}=\int (f''_j)^2\,dx_j$ and $\Lambda_{kk}=\int (f''_k)^2\,dx_k$, with $f''_j$ and $f''_k$ being the second derivatives of $f(x_j,x_k)$ with respect to $x_j$ and $x_k$, respectively. Hence, the penalty function can be rewritten as

$P(f)=\lambda_1\Big(\sum_{j=1}^{p}\sqrt{\beta_j^\top M_j\beta_j}+\sum_{1\le j<k\le p}\sqrt{\beta_{jk}^\top K_{jk}\beta_{jk}}\Big),$

where $M_j=\frac{1}{n}\psi_j^\top\psi_j+\lambda_2\Omega_j$ and $K_{jk}=\frac{1}{n}\phi_{jk}^\top\phi_{jk}+\lambda_2\Lambda_{jk}$, with $\psi_j$ denoting the $n\times L_n$ matrix whose $(i,l)$th entry is $\varphi_{jl}(x_{ij})$ and $\phi_{jk}$ the $n\times L_n^2$ matrix whose rows are $\Phi_{i,jk}^\top$. Similar to [33], we decompose $M_j=R_j^\top R_j$ and $K_{jk}=Q_{jk}^\top Q_{jk}$ for some square $L_n\times L_n$ matrix $R_j$ and $L_n^2\times L_n^2$ matrix $Q_{jk}$. Then, Model (4) can be represented as

(5) $\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j=1}^{p}\Psi_{i,j}^\top\beta_j-\sum_{1\le j<k\le p}\Phi_{i,jk}^\top\beta_{jk}\Big)+\lambda_1\Big(\sum_{j=1}^{p}\|R_j\beta_j\|_2+\sum_{1\le j<k\le p}\|Q_{jk}\beta_{jk}\|_2\Big).$
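For computation, the factors $R_j$ and $Q_{jk}$ can be obtained as Cholesky-type square roots of $M_j$ and $K_{jk}$. A minimal sketch, assuming the roughness matrix $\Omega_j$ (and likewise $\Lambda_{jk}$) has been precomputed from the basis second derivatives:

```python
# Sketch of the penalty in (5): form M_j = psi_j' psi_j / n + lam2 * Omega_j,
# take its Cholesky square root R_j, and evaluate the group penalty
# lam1 * sum_j ||R_j beta_j||_2 (interaction blocks are handled identically).
import numpy as np

def penalty_factor(psi_j, Omega_j, lam2):
    """Upper-triangular R_j with R_j' R_j = M_j."""
    n = psi_j.shape[0]
    M_j = psi_j.T @ psi_j / n + lam2 * Omega_j
    return np.linalg.cholesky(M_j).T

def group_penalty(betas, factors, lam1):
    """lam1 * sum of ||R beta||_2 over the supplied blocks."""
    return lam1 * sum(np.linalg.norm(R @ b) for R, b in zip(factors, betas))
```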

However, the above penalty treats the main and interaction effects similarly. The entry of an interaction into the model generally adds more predictors than the entry of a main effect, which usually demands a high computational cost and makes models with many interactions hard to interpret. To deal with this difficulty, we propose adding a set of heredity restrictions that produce sparse interaction models in which an interaction is included only if one or both of its parent variables are marginally important.

Naturally, $\mathcal{T}\subseteq\mathcal{S}^2$, where $\mathcal{S}$ denotes the index set of active main effects and $\mathcal{T}$ the index set of active interactions. Our target is to estimate $\mathcal{S}$ and $\mathcal{T}$ from the data consistently or sign consistently, as in Theorem 1 of [21].

There are two types of heredity restrictions, which are called strong and weak hierarchy:

Strong hierarchy: $\hat\beta_{jk}\neq 0\ \Rightarrow\ \|\hat\beta_j\|\,\|\hat\beta_k\|\neq 0;$

Weak hierarchy: $\hat\beta_{jk}\neq 0\ \Rightarrow\ \max\{\|\hat\beta_j\|,\|\hat\beta_k\|\}\neq 0.$

To ensure the strong and weak hierarchy conditions on $\hat\beta_{jk}$ and $\hat\beta_j,\hat\beta_k$, we construct the following penalties. To enforce the strong hierarchy, we use

(6) $Q^P(\beta_j,\beta_{jk})=\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j=1}^{p}\Psi_{i,j}^\top\beta_j-\sum_{1\le j<k\le p}\Phi_{i,jk}^\top\beta_{jk}\Big)+\lambda_1\Big(\sum_{j=1}^{p}\|R_j\beta_j\|_2+\sum_{1\le k<j\le p}I_{\{\beta_{jk}\neq 0\}}\big(\|R_j\beta_j\|_2+\|R_k\beta_k\|_2\big)^{1/2}+\sum_{1\le k<j\le p}\|Q_{jk}\beta_{jk}\|_2\Big),$

and, for the weak hierarchy,

(7) $Q^P(\beta_j,\beta_{jk})=\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j=1}^{p}\Psi_{i,j}^\top\beta_j-\sum_{1\le j<k\le p}\Phi_{i,jk}^\top\beta_{jk}\Big)+\lambda_1\Big(\sum_{j=1}^{p}\|R_j\beta_j\|_2+\sum_{1\le k<j\le p}I_{\{\beta_{jk}\neq 0\}}\big(\min(\|R_j\beta_j\|_2,\|R_k\beta_k\|_2)\big)^{1/2}+\sum_{1\le k<j\le p}\|Q_{jk}\beta_{jk}\|_2\Big).$

These structures ensure hierarchy through the introduction of an indicator penalty. Specifically, the indicator function $I_{\{\beta_{jk}\neq 0\}}$ activates the penalty only when $\beta_{jk}\neq 0$, thus enforcing the strong condition: both corresponding main effects must be present if the interaction term is nonzero. In (7), the term $\min(\|R_j\beta_j\|_2,\|R_k\beta_k\|_2)$ applies the penalty to the smaller of the two norms, thereby enforcing the weak condition: at least one of the corresponding main effects must be present for an interaction term to be included.
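As a summary, the two constraints act as simple predicates on the fitted coefficient blocks; the following sketch is an illustrative post-fit check, not part of the estimation procedure itself:

```python
# Heredity predicates on fitted coefficient blocks: strong heredity requires
# both parent main effects of an active interaction; weak heredity requires
# at least one.
import numpy as np

def active(beta, tol=1e-8):
    return np.linalg.norm(beta) > tol

def obeys_heredity(beta_j, beta_k, beta_jk, kind="strong"):
    if not active(beta_jk):
        return True                              # nothing to enforce
    if kind == "strong":
        return active(beta_j) and active(beta_k)
    return active(beta_j) or active(beta_k)      # weak heredity
```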

3.2. Algorithm

The regularization path algorithm under the marginality principle (RAMP, [21]) is a method for variable selection in high-dimensional quadratic regression models, designed to maintain the hierarchical structure between main effects and interaction effects during the selection process. However, the RAMP algorithm primarily focuses on linear interaction terms and considers interactions of variables with themselves, making it unsuitable for directly handling nonlinear interaction terms.

To address this limitation, we modified the RAMP algorithm, extending its application to additive models that include nonlinear interaction terms. With this enhancement, we can not only handle complex nonlinear interactions but also preserve the hierarchical structure between main effects and interaction effects during variable selection. The detailed steps of the modified algorithm are as follows.

Let $\mathcal{P}=\{1,2,\ldots,p\}$ and $\mathcal{Q}=\{(j,k):1\le j<k\le p\}$ be the index sets for main effects and interaction effects, respectively. For an index set $\mathcal{A}\subseteq\mathcal{P}$, define $\mathcal{A}^2=\mathcal{A}\otimes\mathcal{A}=\{(j,k):j\neq k;\ j,k\in\mathcal{A}\}\cap\mathcal{Q}$ and $\mathcal{A}\otimes\mathcal{P}=\{(j,k):j\neq k;\ j\in\mathcal{A}\text{ or }k\in\mathcal{A}\}\cap\mathcal{Q}$.

We fix a sequence of values $\{\lambda_{1,s}\}_{s=1}^{S}$ between $\lambda_{1,\max}$ and $\lambda_{1,\min}$. Following [21], we set $\lambda_{1,\max}=n^{-1}\max_j|X_j^\top y|$ and $\lambda_{1,\min}=\zeta\lambda_{1,\max}$ for some small $\zeta>0$. At step $s-1$, we denote the current active main-effect set by $\mathcal{P}_{s-1}$ and the active interaction-effect set by $\mathcal{Q}_{s-1}$. Define $\mathcal{H}_{s-1}$ as the parent set of $\mathcal{Q}_{s-1}$, which contains the main effects that have at least one interaction effect in $\mathcal{Q}_{s-1}$. Set $\mathcal{H}^c_{s-1}=\mathcal{P}\setminus\mathcal{H}_{s-1}$. Then, the algorithm proceeds as follows:

Step 1. Generate a decreasing sequence $\lambda_{1,\max}=\lambda_{1,1}>\lambda_{1,2}>\cdots>\lambda_{1,S}=\lambda_{1,\min}$, and set $\lambda_{2,s}=\lambda_{1,s}^2$ for $s=1,\ldots,S$.

We use a warm-start strategy, where the solution at step $s-1$ is used as the starting value for step $s$;

Step 2. Given $\mathcal{P}_{s-1}$, $\mathcal{Q}_{s-1}$, and $\mathcal{H}_{s-1}$, add the possible interactions among the main effects in $\mathcal{P}_{s-1}$ to the current model. Then, we minimize the penalized loss function

$\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j\in\mathcal{P}}\Psi_{i,j}^\top\beta_j-\sum_{(j,k)\in\mathcal{P}^2_{s-1}}\Phi_{i,jk}^\top\beta_{jk}\Big)+\lambda_{1,s}\Big(\sum_{j\in\mathcal{P}}I_{\{j\in\mathcal{H}^c_{s-1}\}}\|R_j\beta_j\|_2+\sum_{(j,k)\in\mathcal{P}^2_{s-1}}\|Q_{jk}\beta_{jk}\|_2\Big),$

where the penalty is imposed on the candidate interaction effects and on the main effects in $\mathcal{H}^c_{s-1}$, i.e., those not forced into the model by the strong heredity constraint. The penalty encourages smoothness and sparsity in the estimated functional components;

Step 3. Record $\mathcal{P}_s$, $\mathcal{Q}_s$, and $\mathcal{H}_s$ according to the above solution, and add the main effects corresponding to $\mathcal{Q}_s$ into $\mathcal{P}_s$;

Step 4. Calculate the quantile estimates based on the current model:

$(\hat\beta^{(s)}_j,\hat\beta^{(s)}_{jk})=\operatorname*{arg\,min}\ \frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j\in\mathcal{P}_s}\Psi_{i,j}^\top\beta_j-\sum_{(j,k)\in\mathcal{Q}_s}\Phi_{i,jk}^\top\beta_{jk}\Big);$

Step 5. Repeat Steps 2–4 for $s=1,\ldots,S$ and determine the active sets $\mathcal{M}_s$ and $\mathcal{I}_s$ according to the $\lambda_s$ that minimizes the GIC.

Ref. [34] proposed the following GIC criterion:

$\mathrm{GIC}(\lambda_{1,s})=\log\Big(\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j\in\mathcal{P}_s}\Psi_{i,j}^\top\hat\beta^{(s)}_j-\sum_{(j,k)\in\mathcal{Q}_s}\Phi_{i,jk}^\top\hat\beta^{(s)}_{jk}\Big)\Big)+C_n\,\kappa_s\,df_\lambda,$

where $\hat\beta^{(s)}_j$ and $\hat\beta^{(s)}_{jk}$ are the penalized estimators obtained in Step 4, $\kappa_s=|\mathcal{P}_s|+|\mathcal{Q}_s|$ is the cardinality of the index set of nonzero coefficients in the main effects and interaction effects, $df_\lambda$ represents the degrees of freedom of the model, and $C_n$ is a sequence of positive constants diverging to infinity as $n$ increases. We take $C_n=\log(\log(n))$ in our simulation studies and real data analysis.
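For concreteness, a minimal sketch of this GIC computation, with $C_n=\log(\log n)$ and with $\kappa_s$ and $df_\lambda$ supplied by the caller:

```python
# Sketch of the GIC criterion in Step 5. `fitted` holds the estimated
# conditional quantiles from Step 4; kappa_s and df_lam are taken as given.
import numpy as np

def check_loss(u, tau):
    return u * (tau - (u < 0))

def gic(y, fitted, tau, kappa_s, df_lam):
    n = len(y)
    Cn = np.log(np.log(n))                 # C_n = log(log(n)) as in the text
    return np.log(np.sum(check_loss(y - fitted, tau))) + Cn * kappa_s * df_lam
```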

Following the same strategy, under the weak heredity condition, we use the set $\mathcal{P}_{s-1}\otimes\mathcal{P}$ instead of $\mathcal{P}^2_{s-1}$ and solve the following optimization problem:

$\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\Big(Y_i-\sum_{j\in\mathcal{P}}\Psi_{i,j}^\top\beta_j-\sum_{(j,k)\in\mathcal{P}_{s-1}\otimes\mathcal{P}}\Phi_{i,jk}^\top\beta_{jk}\Big)+\lambda_{1,s}\Big(\sum_{j\in\mathcal{P}}I_{\{j\in\mathcal{H}^c_{s-1}\}}\|R_j\beta_j\|_2+\sum_{(j,k)\in\mathcal{P}_{s-1}\otimes\mathcal{P}}\|Q_{jk}\beta_{jk}\|_2\Big),$

with respect to $\beta_j$ and $\beta_{jk}$. That is, an interaction effect can enter the model for selection if at least one of its parents was selected in a previous step.
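Putting Steps 1–5 together, the modified path algorithm has the following high-level shape. This is a structural sketch only: `fit_penalized` stands in for the group-penalized quantile solver of Step 2 and `gic_fn` for the criterion above, both assumed rather than implemented here:

```python
# High-level sketch of the modified RAMP path. Under strong heredity the
# candidate interactions are P_{s-1}^2; a weak heredity variant would use
# P_{s-1} (x) P, i.e., pairs with at least one parent active.
import numpy as np

def ramp_path(y, Psi, Phi, lambdas, tau, fit_penalized, gic_fn):
    P_s, Q_s, H_s = set(), set(), set()
    beta = None                                   # warm start across the path
    best_score, best_sets = np.inf, (set(), set())
    for lam1 in lambdas:                          # decreasing sequence
        lam2 = lam1 ** 2                          # lambda_{2,s} = lambda_{1,s}^2
        cand = {(j, k) for j in P_s for k in P_s if j < k}   # P_{s-1}^2
        beta, P_s, Q_s = fit_penalized(y, Psi, Phi, cand, H_s,
                                       lam1, lam2, tau, warm=beta)
        H_s = {j for pair in Q_s for j in pair}   # parents of active interactions
        P_s |= H_s                                # keep the hierarchy intact
        score = gic_fn(beta, P_s, Q_s)
        if score < best_score:
            best_score, best_sets = score, (set(P_s), set(Q_s))
    return best_sets
```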

3.3. Asymptotic Theory

Let $\Xi_n(\lambda)$ be the set of local minima of $Q^P(\beta_j,\beta_{jk})$. The following result shows that, with probability approaching one, the oracle estimator belongs to the set $\Xi_n(\lambda)$.

Theorem 2. 

Assume that Conditions (A1)–(A4) are satisfied and consider the penalized objective function under strong heredity. Let $\hat\gamma=(\hat\beta_0^\top,\hat\beta_{00}^\top)^\top$ be the oracle estimator. If $\lambda_1=o(1)$ and $n^{-1}L_n^3\lambda_1^{-1}\to 0$ as $n\to\infty$, then

$P(\hat\gamma\in\Xi_n(\lambda))\to 1,\quad\text{as } n\to\infty.$

Remark 1. 

The penalty parameter $\lambda_1$ decays to zero as $n\to\infty$, allowing the oracle estimator to dominate. The rate condition $n^{-1}L_n^3\lambda_1^{-1}\to 0$ guarantees that the approximation error from the basis expansion of size $L_n$ is controlled by the penalty strength.

The proof of Theorem 2 can be found in Appendix A.2.

Define $R_n(m)=\frac{1}{n}\sum_{i=1}^{n}\rho_\tau(Y_i-m(x_i))$ and $R(m)=E\big(\rho_\tau(Y-m(x))\big)$ as the empirical risk and the predictive risk under the quantile loss, respectively. Moreover,

$\mathcal{M}_n=\Big\{m:\ m(x)=\sum_{j=1}^{p}\psi_j(x_j)\beta_j+\sum_{1\le j<k\le p}\phi_{jk}(x_j,x_k)\beta_{jk}:\ E(\psi_j)=0,\ E(\phi_{jk})=0,\ E(\psi_j^2)=1,\ E(\phi_{jk}^2)=1\Big\}$

is a functional class. Let $\tilde m=\operatorname*{arg\,min}_{m\in\mathcal{M}_n}R(m)$ denote the predictive oracle, i.e., the minimizer of the predictive risk over $\mathcal{M}_n$, and let $\hat m$ be the minimizer of Equation (6) over $\mathcal{M}_n$. Following [35], we say that an estimator $\hat m$ is persistent (risk consistent) relative to a class of functions $\mathcal{M}_n$ if $R(\hat m)-R(\tilde m)\stackrel{P}{\to}0$. Then, we have the following result.

Theorem 3. 

Under Conditions (A1)–(A4), if $p=o(n)$, then, for some constant $t>0$,

$P\Big(R(\hat m)-R(\tilde m)\le\big(2\sqrt{2}\,t^2+10t^{2/3}\big)\,n^{-(2r-1)/(2(2r+1))}\Big)\ge 1-\exp(-nt^2).$

The above theorem establishes the convergence rate in terms of the excess risk of the estimator, which shows the prediction accuracy and consistency of the proposed estimator. The proof of Theorem 3 can be found in Appendix A.3.

4. Simulation Studies

We conduct extensive Monte Carlo simulations to assess the performance of our proposed method in comparison with two established approaches, hirNet [19] and RAMP [21], across five quantile levels: $\tau\in\{0.1,0.25,0.5,0.75,0.90\}$. In each simulation scenario, we generate 500 independent training datasets with two sample sizes, $n=100$ and $n=300$, and $p=15$ main effects, resulting in a total of $15+15\times 14/2=120$ potential terms (15 main effects and 105 pairwise interactions). All methods are implemented using pseudo-spline basis functions to ensure a consistent comparison. For the regularization parameters, we adopt the relationship $\lambda_2=\lambda_1^2$ based on [13], which provides superior empirical performance.

The covariates $x_{ij}$ are generated independently from a uniform distribution $U(0,1)$. We specify five nonlinear main-effect functions, which are standardized before model fitting:

$f_1(x)=10x,\qquad f_2(x)=\frac{1}{1+x}+20x,$

$f_3(x)=20\sin(x),\qquad f_4(x)=20\exp(x),\qquad f_5(x)=10x^2.$

Each $f_j$ is thus standardized, and the interaction functions are generated by multiplying together the standardized main effects:

$f_{jk}(x_j,x_k)=f_j(x_j)\times f_k(x_k),\quad 1\le j<k\le p.$

The response variables are generated through quantile-specific models that incorporate either strong or weak heredity constraints. Under strong heredity, interactions are only included when all parent main effects are active, while weak heredity requires just one parent main effect.

Example 1. 

Strong heredity. Building on the quantile-specific sparsity framework established by [36], we incorporate strong heredity constraints into our modeling approach. This integration ensures two key aspects: (1) the set of active predictors varies depending on the quantile, and (2) all selected variables preserve strict hierarchical relationships. The resulting nonlinear data-generating process is as follows:

τ=0.1 (Sparse):

$Y_i=f_1(x_{i1})+f_2(x_{i2})+f_3(x_{i3})+\varepsilon_i.$

τ=0.25,0.50,0.75 (Dense):

$Y_i=f_1(x_{i1})+f_2(x_{i2})+f_3(x_{i3})+f_4(x_{i4})+f_5(x_{i5})+f_{12}(x_{i1},x_{i2})+f_{13}(x_{i1},x_{i3})+\varepsilon_i.$

τ=0.9 (Sparse):

$Y_i=f_1(x_{i1})+f_3(x_{i3})+f_5(x_{i5})+\varepsilon_i.$

Example 2. 

Weak heredity. In contrast to strong heredity, we also incorporate weak heredity constraints into our modeling approach by extending the quantile-specific sparsity framework. The resulting nonlinear data-generating process is as follows:

$\tau\in\{0.1,0.9\}$ (Sparse):

$Y_i=f_1(x_{i1})+f_2(x_{i2})+f_3(x_{i3})+\varepsilon_i.$

τ=0.25,0.50,0.75 (Dense):

$Y_i=f_1(x_{i1})+f_2(x_{i2})+f_3(x_{i3})+f_{14}(x_{i1},x_{i4})+f_{15}(x_{i1},x_{i5})+\varepsilon_i.$

To assess robustness across different error conditions, we consider four error distributions:

1. Gaussian: the error terms $\varepsilon_i$ follow a normal distribution with mean 0 and variance 1, i.e., $\varepsilon_i\sim N(0,1)$;

2. Heavy-tailed: the error terms $\varepsilon_i$ follow a Student's t-distribution with 3 degrees of freedom, i.e., $\varepsilon_i\sim t(3)$;

3. Skewed: the error terms $\varepsilon_i$ follow a chi-squared distribution with 2 degrees of freedom, i.e., $\varepsilon_i\sim\chi^2(2)$;

4. Heteroscedastic [37]: the error terms $\varepsilon_i$ are modeled as follows:

$\tau=0.1$: $\varepsilon_i=(x_{i1}+x_{i2})e_i$;

$\tau\in\{0.25,0.50,0.75\}$: $\varepsilon_i=(x_{i1}+x_{i5})e_i$;

$\tau=0.9$ (strong heredity): $\varepsilon_i=(x_{i1}+x_{i3})e_i$;

$\tau=0.9$ (weak heredity): $\varepsilon_i=(x_{i1}+x_{i4})e_i$.
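For reference, a sketch of the Example 1 data-generating process at $\tau=0.5$ with Gaussian errors follows; the form of $f_2$ relies on the reconstructed formula above and is an assumption, and standardization is abbreviated to empirical centering and scaling:

```python
# Sketch of the Example 1 (strong heredity) DGP at tau = 0.5 with N(0,1)
# errors: five standardized main effects plus the f_12 and f_13 interactions.
import numpy as np

rng = np.random.default_rng(0)

def standardize(v):
    return (v - v.mean()) / v.std()

def generate(n=300, p=15):
    X = rng.uniform(0.0, 1.0, size=(n, p))
    f = [10 * X[:, 0],
         1 / (1 + X[:, 1]) + 20 * X[:, 1],   # assumed reading of f_2
         20 * np.sin(X[:, 2]),
         20 * np.exp(X[:, 3]),
         10 * X[:, 4] ** 2]
    f = [standardize(v) for v in f]
    eps = rng.standard_normal(n)
    y = sum(f) + f[0] * f[1] + f[0] * f[2] + eps
    return X, y
```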

Recall that $\mathcal{M}=\{j: f_j\neq 0\}$ and $\mathcal{I}=\{(j,k): f_{jk}\neq 0\}$ are the active sets of main and interaction effects. For each example, we run $R=500$ Monte Carlo replications and denote the estimated sets by $\hat{\mathcal{M}}^{(r)}$ and $\hat{\mathcal{I}}^{(r)}$, $r=1,\ldots,R$. We evaluate variable-selection performance using the following criteria (see the sketch after this list):

(1). True positive rate and false discovery rate of main effects (mTPR, mFDR);

(2). True positive rate and false discovery rate of interaction effects (iTPR, iFDR);

(3). Main-effect coverage percentage ($P_m$): $R^{-1}\sum_{r=1}^{R}I(\mathcal{M}\subseteq\hat{\mathcal{M}}^{(r)})$;

(4). Interaction-effect coverage percentage ($P_i$): $R^{-1}\sum_{r=1}^{R}I(\mathcal{I}\subseteq\hat{\mathcal{I}}^{(r)})$;

(5). Coverage rate of each single variable in the main effects and interaction effects;

(6). Model size (Size): $R^{-1}\sum_{r=1}^{R}\big(|\hat{\mathcal{M}}^{(r)}|+|\hat{\mathcal{I}}^{(r)}|\big)$;

(7). Root mean squared error (RMSE): $R^{-1}\sum_{r=1}^{R}\big[\frac{1}{n}\sum_{i=1}^{n}\rho_\tau\big(Y_i-\hat Q^{(r)}_\tau(Y_i|x_i)\big)^2\big]^{1/2}$, where $\hat Q^{(r)}_\tau(Y_i|x_i)$ is obtained from the estimated coefficients $\hat\beta^{(r)}_j$, $j\in\hat{\mathcal{M}}^{(r)}$, and $\hat\beta^{(r)}_{jk}$, $(j,k)\in\hat{\mathcal{I}}^{(r)}$.
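The criteria above reduce to a few set operations per replication; a minimal sketch:

```python
# Sketch of the selection metrics: TPR/FDR for main and interaction effects,
# coverage percentages Pm and Pi, and average model size over R replications.
import numpy as np

def tpr_fdr(true_set, est_set):
    tp = len(true_set & est_set)
    tpr = tp / len(true_set) if true_set else float("nan")
    fdr = (len(est_set) - tp) / len(est_set) if est_set else 0.0
    return tpr, fdr

def summarize(true_M, true_I, est_Ms, est_Is):
    mtpr, mfdr = np.mean([tpr_fdr(true_M, M) for M in est_Ms], axis=0)
    itpr, ifdr = np.mean([tpr_fdr(true_I, I) for I in est_Is], axis=0)
    Pm = np.mean([true_M <= M for M in est_Ms])   # coverage: M in M_hat
    Pi = np.mean([true_I <= I for I in est_Is])
    size = np.mean([len(M) + len(I) for M, I in zip(est_Ms, est_Is)])
    return {"mTPR": mtpr, "mFDR": mfdr, "iTPR": itpr, "iFDR": ifdr,
            "Pm": Pm, "Pi": Pi, "Size": size}
```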

The results under strong heredity for $n=100$ and $n=300$ are shown in Table 1 and Table 2, respectively. For small samples, the performance of the proposed method is slightly worse than that of the other two methods in terms of the true positive rates and single-variable coverage percentages for the main and interaction effects. This is because the high complexity and degrees of freedom of nonparametric additive interaction models make them more prone to overfitting, leading to an inaccurate reflection of the underlying data structure and, consequently, a lower true positive rate (TPR) in variable selection. In contrast, methods like hirNet and RAMP, by adopting regularization and relatively simple linear structures, can more effectively prevent overfitting, although this may come at the cost of some loss in model interpretability.

In large sample scenarios, our proposed method not only demonstrates higher true positive rates and broader coverage of main effects and interaction effects but also significantly reduces false positive rates and generates a more parsimonious model. These results indicate that the method can effectively handle nonlinear additive interaction models, accurately capturing complex relationships in the data while maintaining a low false discovery rate and higher model simplicity.

The results from weak heredity are shown in Table 3 and Table 4. In small sample cases, the hirNet method is effective at capturing both main and interaction effects but suffers from a high rate of false selections, leading to increased model complexity. On the other hand, the RAMP method selects fewer variables, reducing false selections but also under-selecting important variables, which negatively affects predictive accuracy. In contrast, the proposed method in this paper offers a better balance, with higher precision and stability in selecting both main and interaction effects, while significantly reducing false selections. This method results in a moderate model size and the lowest prediction error, demonstrating an optimal trade-off between model complexity and predictive performance.

In large sample cases, the proposed method excels in identifying both main and interaction effects, with the main-effect true positive rate (mTPR) and interaction-effect true positive rate (iTPR) approaching 1.000, indicating nearly perfect identification of true positives. The method also maintains low mFDR and iFDR, minimizing false positives. In contrast, although the hirNet method performs well in terms of mTPR and iTPR, it tends to include more noisy variables in the model, resulting in higher RMSE values and lower predictive accuracy.

From the perspective of quantile-specific sparsity, the proposed method demonstrates varying levels of sparsity across quantiles ($\tau\in\{0.1,0.25,0.5,0.75,0.90\}$). At the $\tau=0.1$ and $\tau=0.9$ quantiles, fewer variables are selected, indicating higher sparsity, while at $\tau=0.25,0.50,0.75$, more variables and interaction effects are included, showing that the model adapts its selection to the distribution of the response variable. Simulation results further show that, when $\tau=0.9$ and the DGP consists only of main effects, our method achieves near-perfect variable selection: the TPR for main effects approaches 1 (all true effects retained) and the FDR is close to 0 (almost no false positives), leading to model sizes nearly identical to the true DGP and minimal prediction errors, as reflected in the low RMSE. These results are consistent with [38]'s work, which provides theoretical support for this outcome. Moreover, the RMSE patterns suggest that the method maintains strong predictive performance even when more variables are selected, thereby confirming the effectiveness of the quantile-specific sparsity approach.

5. Applications

Parkinson's disease (PD) is a common degenerative disease of the nervous system that leads to shaking, stiffness, and difficulty with walking, balance, and coordination. PD symptom monitoring with the Unified Parkinson's Disease Rating Scale (UPDRS) is very important but costly and logistically inconvenient, as it requires patients' presence in clinics and time-consuming physical examinations by trained medical staff. A new treatment technology based on the rapid and remote replication of UPDRS assessments from more than a dozen biomedical voice measurement indicators has recently been proposed. The dataset consists of 5875 UPDRS assessments from patients with early-stage PD and 16 corresponding biomedical voice measurement indicators. Reference [39] analyzed a PD dataset and mapped the clinically relevant properties of the speech signals of PD patients to the UPDRS using linear and nonlinear regression models, with the aim of verifying the feasibility of frequent, remote, and accurate UPDRS tracking and its effectiveness in telemonitoring frameworks that enable large-scale clinical trials of novel PD treatments. However, existing analyses of the UPDRS scores based on the 16 voice measures with a model including only main effects may fail to reflect the relationship between the UPDRS and the 16 voice measures. For example, fundamental frequency and amplitude can act together on the voice and may thus have an interactive effect on the UPDRS. In this situation, it is necessary to consider a model with pairwise interactions. Furthermore, the effects of these indicators on the UPDRS may be complicated and need further investigation, including their nonlinear, even nonparametric, dependency, so that both main effects and interaction effects may exist. As Figure 1 (left) shows, the distribution of the UPDRS is typically skewed, with asymmetry between its $\tau$-quantile-level and $(1-\tau)$-quantile-level scores ($0<\tau<1$); the distribution tends to be asymmetric and heavy-tailed, so one should examine how any quantile, including the median (rather than the mean), of the conditional distribution is affected by changes in the 16 voice measures before making a reliable decision about the new treatment technology. Therefore, the proposed additive quantile regression with nonlinear interaction structures is applied to the analysis of these data.

The 16 biomedical voice measures in the PD data include several measurements of variation in fundamental frequency (Jitter, Jitter (Abs), Jitter:RAP, Jitter:PPQ5, and Jitter:DDP), several measurements of variation in amplitude (Shimmer, Shimmer (dB), Shimmer:APQ3, Shimmer:APQ5, Shimmer:APQ11, and Shimmer:DDA), two measurements of the ratio of noise to tonal components in the voice (NHR and HNR), a nonlinear dynamical complexity measurement (RPDE), a signal fractal scaling exponent (DFA), and a nonlinear measure of fundamental frequency variation (PPE). Our interest is to identify the biomedical voice measurements and their interactions that may effectively affect the UPDRS in Parkinson's disease patients. Figure 1 (right) shows the correlations among the 16 biomedical voice measurements. As we can see, except for the variable DFA, there are strong correlations among the covariates, which makes variable selection more challenging. Therefore, the proposed additive quantile regression model with nonlinear interaction structures may provide a complete picture of how those factors and their interactions affect the distribution of the UPDRS.

We first randomly generated 100 partitions of the data into training and test sets: for each partition, 5475 observations form the training set and the remaining 400 observations form the test set. The dataset is normalized. Quantile regression is used to fit the model selected by the proposed method; for comparison, we use least squares to fit the models selected by hirNet and RAMP on the training set. Finally, we evaluate the performance of these models, using the fixed active sets, on the corresponding test sets.

Table 5 summarizes the covariates selected by the different methods. The proposed quantile-specific method reveals distinct covariate patterns across different UPDRS severity levels. At lower quantiles ($\tau=0.1$ and $\tau=0.25$), the model highlights DFA and its interactions (e.g., HNR*DFA, DFA*PPE), suggesting that these acoustic features may be particularly relevant in the early stages of the disease. The middle quantile ($\tau=0.5$) incorporates additional speech markers (APQ11, RPDE), which align with conventional methods, while higher quantiles ($\tau=0.75$ and $\tau=0.9$) progressively simplify to core features (HNR, dB), indicating reduced covariate dependence in advanced stages. This dynamic selection demonstrates how quantile-specific modeling can capture stage-dependent biological mechanisms while maintaining model parsimony, selecting just 3–5 main effects and 1–2 interactions per quantile, compared with hirNet's fixed 12-variable approach. The shifting importance of the HNR and DFA interactions across quantiles particularly underscores their complex, nonlinear relationships with symptom progression.

Figure 2 and Figure 3 show the estimation results of main effects and interaction effects at τ=0.50. The solid lines represent the estimated effects, while the dashed lines indicate the 95% confidence intervals obtained through a simulation-based approach. Specifically, for the chosen model, we conducted 100 repeated simulations on the dataset to derive point-wise confidence intervals for each covariate’s effect.

From Figure 2, it can be observed that, as APQ11 increases, the UPDRS score initially rises and then declines, peaking at a specific value. This inverted U-shaped relationship suggests that moderate amplitude variations are linked to more severe symptoms. For HNR, the UPDRS score decreases as the noise ratio increases, indicating that higher noise levels correlate with disease worsening. The confidence intervals around these trends confirm their statistical significance across the simulated datasets. RPDE shows a U-shaped curve, meaning that intermediate complexity levels are associated with more severe symptoms, while extreme values (high or low) indicate milder symptoms. The confidence intervals further support this nonlinear relationship, demonstrating its robustness to the data. DFA has a negative correlation with the UPDRS score: as fundamental frequency variability increases, the UPDRS score decreases, suggesting that greater variability is linked to symptom relief. The narrow confidence intervals around this trend highlight its consistency across the simulations.

Figure 3 illustrates the interaction effect between HNR and RPDE on the UPDRS score. It shows that, as HNR increases, the impact on the UPDRS score becomes more negative, especially when RPDE is at lower values, indicating a worsening of symptoms with a higher noise-to-harmonics ratio. Conversely, for higher values of RPDE, the interaction effect stabilizes, suggesting less influence on the UPDRS score. The plot reveals a valley-like structure around the center, where both HNR and RPDE are near zero, signifying a minimal interaction effect. Notably, the most significant interactions occur at extreme values of either HNR or RPDE, highlighting the complex, nonlinear relationship between these variables and Parkinson's disease symptom severity. The estimation results of the main effects and interaction effects at $\tau=0.1,0.25,0.75,0.9$ are presented in Appendix B.

From the results in Appendix B, it can be seen that, although the main effects selected at τ=0.25 and τ=0.75 are the same, their impacts on UPDRS scores differ due to nonlinear relationships or complex interactions between variables. This highlights the advantage of quantile regression, which not only captures the average trend of the dependent variable but also reveals how these effects vary across different quantiles. This method provides a more comprehensive understanding of the complex relationships between variables and helps to analyze the specific effects of variables on UPDRS under different conditions.

From the perspective of quantile-specific sparsity, the proposed method demonstrates varying levels of covariate selection across different quantiles. At lower quantiles (τ=0.1 and τ=0.25), the model tends to select fewer covariates, focusing primarily on HNR, DFA, and PPE, which suggests a higher degree of sparsity. As the quantile increases to τ=0.50, the model selects more covariates, including APQ11 and RPDE, indicating a reduction in sparsity. At higher quantiles (τ=0.75 and τ=0.9), the model again shows increased sparsity by selecting only a few key covariates such as HNR, DFA, and PPE. This pattern reflects the adaptability of the proposed method in capturing the specific characteristics of the data distribution at different quantiles, thereby achieving optimal sparsity tailored to each quantile level.

Table 6 compares the average RMSE and model sizes across 100 datasets for three methods: the proposed quantile-based approach (evaluated at τ=0.1,0.25,0.5,0.75,0.9), RAMP, and hirNet. The results show that our method outperforms the others in predictive accuracy, particularly at τ=0.75, where it achieves the lowest RMSE of 0.858, outperforming RAMP (0.878) and hirNet (0.887). Additionally, the proposed method uses much smaller models, with only 2.3–5.0 variables, compared to hirNet’s larger model with 18.39 variables. This reduction in model size highlights the method’s efficiency without sacrificing predictive power. The method also performs consistently well across different quantiles, with a particularly strong result at τ=0.25 (RMSE = 0.873), though the RMSE at τ=0.5 is slightly higher (1.00), showing some variation in performance across quantiles. Furthermore, models that include interaction terms perform better than those with only main effects, emphasizing the importance of interaction effects in the model. In conclusion, the proposed method strikes an optimal balance between prediction accuracy, model simplicity, and computational efficiency, making it ideal for applications that require both interpretability and strong predictive performance.

6. Conclusions

We explore variable selection for additive quantile regression with interactions. We fit the model using a sparsity smooth penalty function and adapt the regularization algorithm under the marginality principle to the additive quantile regression with interactions, which can select main and interaction effects simultaneously while maintaining either the strong or the weak heredity constraint. We establish the theoretical properties of the proposed method, and simulation studies demonstrate the good performance of the proposed model and method.

We also applied the proposed method to the Parkinson's disease (PD) data; our method validates the important finding in the literature that frequent, remote, and accurate UPDRS tracking, as a novel PD treatment technology, could be effective in telemonitoring frameworks for PD symptom monitoring and large-scale clinical trials.

Author Contributions

Y.B.: Conceptualization, Formal analysis, Methodology, Software, Writing—original draft, Data curation; J.J.: Formal analysis, Methodology, Writing—review and editing; M.T.: Supervision, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The dataset used in this study is publicly available at https://www.worldbank.org/en/home (accessed on 1 June 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 (Left) Histogram and density curve of UPDRS; (right) Correlation between voice measurements.

Figure 2 Estimated main effects for the APQ11, HNR, RPDE, and DFA variables at τ=0.5.

Figure 3 Estimated interaction term for the HNR and RPDE variables at τ=0.5.

Table 1. Selection and estimation results for strong heredity with n=100.

Method Pm x1 x2 x3 x4 x5 mTPR mFDR Pi x1*x2 x1*x3 iTPR iFDR Size RMSE
N(0,1)
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.045 1.000 1.000 1.000 1.000 0.710 12.140 6.056
RAMP 0.920 1.000 1.000 1.000 1.000 0.920 0.984 0.046 1.000 1.000 1.000 1.000 0.382 8.400 7.200
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.038 - - - - - 3.420 1.030
proposed (τ=0.25) 0.400 1.000 1.000 1.000 0.920 0.400 0.864 0.068 0.340 0.920 0.920 0.350 0.426 7.580 4.251
proposed (τ=0.50) 0.700 1.000 1.000 1.000 0.980 0.700 0.936 0.052 0.940 0.980 0.940 0.960 0.200 7.360 2.919
proposed (τ=0.75) 0.640 1.000 1.000 1.000 1.000 0.640 0.928 0.021 0.980 0.980 1.000 0.990 0.232 7.320 3.216
proposed (τ=0.9) 1.000 1.000 - 1.000 - 1.000 1.000 0.044 - - - - - 3.920 1.042
t(3)
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.110 1.000 1.000 1.000 1.000 0.727 12.960 7.392
RAMP 0.900 1.000 1.000 1.000 1.000 0.900 0.980 0.050 1.000 1.000 1.000 1.000 0.363 8.300 8.978
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.056 - - - - - 3.480 2.994
proposed (τ=0.25) 0.380 1.000 1.000 1.000 0.940 0.380 0.864 0.056 0.520 0.880 0.880 0.550 0.388 7.440 7.718
proposed (τ=0.50) 0.620 1.000 1.000 1.000 1.000 0.620 0.924 0.057 0.940 0.960 0.960 0.960 0.213 7.360 6.033
proposed (τ=0.75) 0.660 1.000 1.000 1.000 1.000 0.660 0.932 0.025 1.000 1.000 1.000 1.000 0.253 7.460 6.146
proposed (τ=0.9) 0.960 0.960 - 1.000 - 1.000 0.986 0.019 - - - - - 3.480 2.827
χ²(2)
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.128 1.000 1.000 1.000 1.000 0.735 13.300 8.663
RAMP 0.860 1.000 1.000 1.000 1.000 0.860 0.972 0.061 1.000 1.000 1.000 1.000 0.425 8.680 10.217
proposed (τ=0.1) 0.980 0.980 1.000 1.000 - - 0.993 0.032 - - - - - 3.160 3.931
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 0.100 0.820 0.820 0.100 0.411 7.180 7.638
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 0.940 1.000 0.940 0.000 0.210 7.260 8.583
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 0.180 0.307 7.640 8.942
proposed (τ=0.9) 0.860 0.860 - 1.000 - 1.000 0.953 0.046 - - - - - 3.900 6.049
σ(x_i)e_i
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.031 1.000 1.000 1.000 1.000 0.717 12.240 6.199
RAMP 0.900 1.000 1.000 1.000 1.000 0.900 0.980 0.057 1.000 1.000 1.000 1.000 0.411 8.600 7.441
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.026 - - - - - 3.380 1.306
proposed (τ=0.25) 0.404 1.000 1.000 1.000 0.957 0.404 0.872 0.072 0.914 0.936 0.978 0.957 0.403 7.914 2.893
proposed (τ=0.50) 0.640 1.000 1.000 1.000 0.980 0.640 0.924 0.057 0.940 0.980 0.940 0.960 0.219 7.380 2.355
proposed (τ=0.75) 0.660 1.000 1.000 1.000 1.000 0.660 0.932 0.029 0.980 0.980 1.000 0.990 0.232 7.380 2.706
proposed (τ=0.9) 1.000 1.000 - 1.000 - 1.000 1.000 0.063 - - - - - 4.080 1.344

Table 2. Selection and estimation results for strong heredity with n=300.

Method Pm x1 x2 x3 x4 x5 mTPR mFDR Pi x1*x2 x1*x3 iTPR iFDR Size RMSE
N(0,1)
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.695 11.560 6.158
RAMP 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.629 10.840 2.058
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.000 - - - - - 3.000 0.978
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.091 7.200 2.520
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.000 7.000 2.472
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.038 7.080 1.120
proposed (τ=0.9) 1.000 1.000 - 1.000 - 1.000 1.000 0.000 - - - - - 3.040 0.958
t(3)
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.696 11.580 8.741
RAMP 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.635 11.320 4.641
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.000 - - - - - 3.020 3.765
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.074 7.160 4.684
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.000 7.000 4.664
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.029 7.060 4.786
proposed (τ=0.9) 1.000 1.000 - 1.000 - 1.000 1.000 0.000 - - - - - 3.060 3.658
χ²(2)
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.698 11.660 9.979
RAMP 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.375 1.000 1.000 1.000 1.000 0.646 11.660 5.843
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.000 - - - - - 3.000 4.913
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.056 7.120 5.702
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.029 7.060 5.777
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.056 7.120 5.787
proposed (τ=0.9) 0.940 0.940 - 1.000 - 1.000 0.980 0.000 - - - - - 3.100 6.053
σ(x_i)e_i
hirNet 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.696 11.660 6.282
RAMP 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.100 1.000 1.000 1.000 1.000 0.625 10.900 2.181
proposed (τ=0.1) 1.000 1.000 1.000 1.000 - - 1.000 0.000 - - - - - 3.040 1.346
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.090 7.200 2.777
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.010 7.020 2.693
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 1.000 1.000 1.000 1.000 0.010 7.020 2.782
proposed (τ=0.9) 1.000 1.000 - 1.000 - 1.000 1.000 0.000 - - - - - 3.080 1.357

Table 3. Selection and estimation results for weak heredity with n=100.

Method Pm x1 x2 x3 mTPR mFDR Pi x1*x4 x1*x5 iTPR iFDR Size RMSE
N(0,1)
hirNet 0.980 1.000 1.000 1.000 1.000 0.446 1.000 1.000 1.000 1.000 0.781 14.58 31.954
RAMP 0.060 1.000 0.260 0.240 0.500 0.050 1.000 1.000 1.000 1.000 0.619 7.140 67.840
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.047 - - - - - 3.500 1.067
proposed (τ=0.25) 0.340 1.000 0.580 0.480 0.686 0.507 0.720 1.000 0.720 0.860 0.505 7.860 48.135
proposed (τ=0.50) 0.320 1.000 0.520 0.500 0.673 0.504 0.740 1.000 0.740 0.870 0.462 7.480 42.688
proposed (τ=0.75) 0.340 1.000 0.520 0.520 0.680 0.507 0.780 1.000 0.780 0.890 0.476 7.780 41.641
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.062 - - - - - 4.000 1.058
t(3)
hirNet 1.000 1.000 1.000 1.000 1.000 0.450 1.000 1.000 1.000 1.000 0.788 14.920 32.742
RAMP 0.060 1.000 0.280 0.240 0.506 0.037 1.000 1.000 1.000 1.000 0.619 7.140 69.608
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.006 - - - - - 7.060 1.710
proposed (τ=0.25) 0.340 1.000 0.500 0.520 0.673 0.502 0.760 1.000 0.760 0.880 0.443 7.380 46.631
proposed (τ=0.50) 0.360 1.000 0.540 0.540 0.693 0.497 0.720 1.000 0.720 0.860 0.459 7.440 32.514
proposed (τ=0.75) 0.360 1.000 0.560 0.540 0.700 0.492 0.720 1.000 0.720 0.860 0.679 7.680 52.425
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.010 - - - - - 7.000 1.765
χ²(2)
hirNet 0.980 1.000 1.000 1.000 1.000 0.468 1.000 1.000 1.000 1.000 0.794 15.360 33.515
RAMP 0.040 1.000 0.240 0.220 0.486 0.087 1.000 1.000 1.000 1.000 0.628 7.300 69.560
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.051 - - - - - 6.940 2.821
proposed (τ=0.25) 0.280 1.000 0.500 0.400 0.633 0.515 0.620 0.980 0.620 0.810 0.683 7.280 41.928
proposed (τ=0.50) 0.340 1.000 0.540 0.460 0.666 0.504 0.640 0.980 0.640 0.810 0.480 7.380 42.535
proposed (τ=0.75) 0.360 1.000 0.580 0.500 0.693 0.500 0.680 0.980 0.680 0.830 0.522 7.920 58.314
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.011 - - - - - 6.166 2.929
σ(x_i)e_i
hirNet 1.000 1.000 1.000 1.000 1.000 0.444 1.000 1.000 1.000 1.000 0.784 14.700 32.115
RAMP 0.000 0.300 0.380 0.120 0.266 0.259 1.000 1.000 0.300 0.650 0.821 8.360 13.755
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.124 - - - - - 7.775 0.681
proposed (τ=0.25) 0.333 1.000 0.562 0.500 0.687 0.505 0.750 1.000 0.750 0.875 0.478 7.687 24.665
proposed (τ=0.50) 0.354 1.000 0.604 0.479 0.694 0.502 0.750 1.000 0.750 0.875 0.481 7.750 26.570
proposed (τ=0.75) 0.354 1.000 0.625 0.520 0.715 0.500 0.791 1.000 0.791 0.895 0.500 8.083 29.956
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.013 - - - - - 6.560 0.760

Table 4. Selection and estimation results for weak heredity with n=300.

Method Pm x1 x2 x3 mTPR mFDR Pi x1*x4 x1*x5 iTPR iFDR Size RMSE
N(0,1)
hirNet 1.000 1.000 1.000 1.000 1.000 0.401 0.160 0.900 0.160 0.530 0.047 19.600 32.574
RAMP 1.000 1.000 0.700 0.720 1.000 0.000 1.000 1.000 1.000 1.000 0.695 9.360 68.139
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.620 0.908
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.107 7.280 3.944
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 0.402 1.000 1.000 1.000 1.000 0.065 7.160 3.288
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.137 7.360 3.912
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.033 0.921
t(3)
hirNet 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.676 11.200 32.903
RAMP 1.000 1.000 0.700 0.680 1.000 0.000 1.000 1.000 1.000 1.000 0.696 9.360 70.609
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.320 3.572
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.074 7.180 5.371
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.056 7.160 5.568
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 0.400 0.980 1.000 0.980 0.990 0.091 7.200 5.145
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.200 3.560
χ²(2)
hirNet 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.679 11.260 34.494
RAMP 0.000 1.000 0.760 0.600 0.666 0.000 1.000 1.000 1.000 1.000 0.682 9.060 72.785
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.011 - - - - - 4.766 4.723
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 0.400 0.900 1.000 0.900 0.950 0.251 7.620 5.462
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 0.402 0.940 1.000 0.940 0.970 0.163 7.400 5.814
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 0.400 0.920 1.000 0.920 0.960 0.150 7.340 6.005
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.566 4.820
σ(x_i)e_i
hirNet 1.000 1.000 1.000 1.000 1.000 0.402 1.000 1.000 1.000 1.000 0.673 11.140 30.643
RAMP 0.840 1.000 0.920 0.900 0.940 0.241 1.000 1.000 0.860 0.930 0.845 15.860 7.298
proposed (τ=0.1) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.440 1.055
proposed (τ=0.25) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.010 7.020 3.950
proposed (τ=0.50) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.029 7.060 3.948
proposed (τ=0.75) 1.000 1.000 1.000 1.000 1.000 0.400 1.000 1.000 1.000 1.000 0.029 7.060 3.948
proposed (τ=0.9) 1.000 1.000 1.000 1.000 1.000 0.000 - - - - - 3.166 1.052

Table 5. The covariates selected by the different methods.

Method Covariates
RAMP APQ11 HNR RPDE DFA PPE
HNR*RPDE HNR*DFA DFA*PPE
hirNet PPQ5 APQ11 NHR HNR RPDE
DFA PPE APQ11*PPE APQ11*DFA HNR*RPDE
RPDE*PPE DFA*PPE
Proposed method
τ = 0.1 HNR RPDE DFA HNR*DFA
τ = 0.25 HNR DFA PPE DFA*PPE
τ = 0.5 APQ11 HNR RPDE DFA HNR*RPDE
τ = 0.75 HNR DFA PPE HNR*DFA HNR*PPE
τ = 0.9 dB HNR

The symbol * represents the interaction between the two main effects.

Table 6. The average RMSE and model size over the 100 data sets.

Method RMSE Size Method Size RMSE Method τ Size RMSE
τ = 0.1 3 0.891
τ = 0.25 4.7 0.873
RAMP 0.878 9.5 hirNet 0.887 18.39 proposed method τ = 0.5 4.9 1.00
τ = 0.75 5.0 0.858
τ = 0.9 2.3 0.971
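The RMSE column is the usual root mean squared prediction error averaged over the 100 random splits. A minimal sketch follows; `y_test`, `y_pred`, `splits`, and `fit_and_predict` are hypothetical names standing in for any of the compared methods.

```python
import numpy as np

def rmse(y_test, y_pred):
    """Root mean squared prediction error on one held-out split."""
    return float(np.sqrt(np.mean((np.asarray(y_test) - np.asarray(y_pred)) ** 2)))

# Hypothetical evaluation loop over 100 random train/test splits:
# avg_rmse = np.mean([rmse(y_te, fit_and_predict(X_tr, y_tr, X_te))
#                     for (X_tr, y_tr, X_te, y_te) in splits])
```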

Appendix A. Proofs of the Theorems

Appendix A.1. Proof of Theorem 1

Throughout the proofs, if $v=(v_1,\ldots,v_k)^\top$ is a vector, we use the norms $\|v\|_2=\sqrt{\sum_{j=1}^k v_j^2}$ and $\|v\|_\infty=\max_j|v_j|$. For a function $f$ on $[0,1]$, we denote its $L_2(P)$ norm by $\|f\|_{L_2}=\sqrt{\int_0^1 f^2(x)\,dP(x)}=\sqrt{E\{f^2(X)\}}$. We write $\gamma_{01}=(\beta_{01}^\top,\beta_{001}^\top)^\top$ and $\tilde\gamma_{01}=(\tilde\beta_{01}^\top,\tilde\beta_{001}^\top)^\top$ in the same fashion. Let $\theta=\sqrt{n}(\gamma_{01}-\tilde\gamma_{01})$. Note that we can express $\hat Q_{Y_i|x_i}(\tau)=\Psi_{Mi}^\top\hat\beta_{01}+\Phi_{Ii}^\top\hat\beta_{001}$; alternatively, we can express it as $\hat Q_{Y_i|x_i}(\tau)=\Psi_{Mi}^\top\tilde\beta_{01}+\Phi_{Ii}^\top\tilde\beta_{001}$. By the identifiability of the model, we must have $\hat\gamma_{01}=\tilde\gamma_{01}$.

Notice that
$$\frac1n\sum_{i=1}^n\rho_\tau\big\{Y_i-(\Psi_{Mi}^\top\beta_{01}+\Phi_{Ii}^\top\beta_{001})\big\}=\frac1n\sum_{i=1}^n\rho_\tau(\epsilon_i-\tilde\Pi_{Ai}^\top\theta-U_{ni}),$$
where $\tilde\Pi_{Ai}=n^{-1/2}\Pi_{Ai}$ and $U_{ni}=\Pi_{Ai}^\top\tilde\gamma_{01}-m(x_i)$. Define the minimizer under this transformation as
$$\hat\theta=\arg\min_\theta\frac1n\sum_{i=1}^n\rho_\tau(\epsilon_i-\tilde\Pi_{Ai}^\top\theta-U_{ni}).$$
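As an aside for intuition, the unpenalized spline-basis check-loss minimization above can be computed exactly as a linear program. The sketch below is illustrative only and is not the paper's penalized algorithm: `check_loss_fit` is a hypothetical helper, the penalty and heredity constraints are omitted, and a crude polynomial basis stands in for the B-spline/tensor basis $\Pi$.

```python
# Illustrative sketch: min_gamma sum_i rho_tau(y_i - Pi_i' gamma) as an LP,
# using the standard split of the residual into positive/negative parts.
import numpy as np
from scipy.optimize import linprog

def check_loss_fit(Pi, y, tau):
    """Solve min_gamma sum_i rho_tau(y_i - Pi_i' gamma) exactly via an LP.

    Decision variables: gamma (free), u >= 0, v >= 0, with
    Pi @ gamma + u - v = y and objective tau*sum(u) + (1-tau)*sum(v).
    """
    n, d = Pi.shape
    c = np.concatenate([np.zeros(d), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([Pi, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:d]

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(200)
Pi = np.vander(x, 6)                 # crude basis standing in for B-splines
gamma_hat = check_loss_fit(Pi, y, tau=0.5)   # conditional-median fit
```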

Let $a_n$ be a sequence of positive numbers and define
$$Q_i(a_n)=\rho_\tau(\epsilon_i-a_n\tilde\Pi_{Ai}^\top\theta-U_{ni}).$$

Define
$$D_i(\theta,a_n)=Q_i(a_n)-Q_i(0)-E\{Q_i(a_n)-Q_i(0)\mid x_i\}+a_n\tilde\Pi_{Ai}^\top\theta\,\varphi_\tau(\epsilon_i)$$
and
$$\tilde Q_i(\theta,a_n)=Q_i(a_n)-Q_i(0)+a_n\tilde\Pi_{Ai}^\top\theta\,\varphi_\tau(\epsilon_i),$$
where $\varphi_\tau(\epsilon_i)=\tau-I(\epsilon_i<0)$.

Lemma A1. Let $q_n=\sqrt{qL_n+sL_n^2}$. If Conditions (A1)–(A4) are satisfied, then, for any positive constant $L$,
$$q_n^{-1}\sup_{\|\theta\|_2\le L}\Big|\sum_{i=1}^n D_i(\theta,q_n)\Big|=o_p(1).$$

Proof. Note that $D_i(\theta,q_n)=\tilde Q_i(\theta,q_n)-E\{\tilde Q_i(\theta,q_n)\mid x_i\}$.

Using Knight's identity,
$$\rho_\tau(r-s)-\rho_\tau(r)=-s\{\tau-I(r\le 0)\}+\int_0^s\{I(r\le t)-I(r\le 0)\}\,dt,$$
we have
$$\tilde Q_i(\theta,q_n)=\rho_\tau(\epsilon_i-q_n\tilde\Pi_{Ai}^\top\theta-U_{ni})-\rho_\tau(\epsilon_i-U_{ni})+q_n\tilde\Pi_{Ai}^\top\theta\,\varphi_\tau(\epsilon_i)=\int_0^{q_n\tilde\Pi_{Ai}^\top\theta}\{I(\epsilon_i-U_{ni}<t)-I(\epsilon_i-U_{ni}<0)\}\,dt.$$
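Knight's identity can be spot-checked numerically; the snippet below is a quick illustration (not part of the proof), with `rho` a hypothetical helper implementing the check loss and a signed Riemann sum approximating the integral.

```python
# Numerical spot-check of Knight's identity:
# rho_tau(r - s) - rho_tau(r)
#   = -s * (tau - I(r <= 0)) + integral_0^s {I(r <= t) - I(r <= 0)} dt.
import numpy as np

def rho(r, tau):
    """Check loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (r < 0))

rng = np.random.default_rng(1)
tau = 0.25
for r, s in rng.normal(size=(5, 2)):
    lhs = rho(r - s, tau) - rho(r, tau)
    ts = np.linspace(0.0, s, 100_001)        # signed grid from 0 to s
    integral = float(np.sum(((r <= ts[:-1]).astype(float) - float(r <= 0.0))
                            * np.diff(ts)))
    rhs = -s * (tau - float(r <= 0.0)) + integral
    assert abs(lhs - rhs) < 1e-3, (lhs, rhs)
print("Knight's identity verified numerically.")
```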

Therefore, $\mathrm{Var}\{D_i(\theta,q_n)\}=\mathrm{Var}\{\tilde Q_i(\theta,q_n)\}\le E\{\tilde Q_i^2(\theta,q_n)\}$. We have
$$\begin{aligned}
\sum_{i=1}^n E\{\tilde Q_i^2(\theta,q_n)\mid x_i\}
&\le Cq_n n^{-1/2}\sum_{i=1}^n\int_0^{q_n\tilde\Pi_{Ai}^\top\theta}\{F_i(t+U_{ni})-F_i(U_{ni})\}\,dt\\
&\le Cq_n^2 n^{-1/2}\,\theta^\top\Big\{\sum_{i=1}^n f_i(0)\tilde\Pi_{Ai}\tilde\Pi_{Ai}^\top\Big\}\theta\,(1+o(1))\\
&\le Cq_n^2 n^{-1/2}\|\theta\|_2^2\,\lambda_{\max}\big(n^{-1}\Pi_A^\top B_n\Pi_A\big)\le Cq_n^2 n^{-1/2}(1+o(1))
\end{aligned}$$
for some positive constant $C$, where $B_n=\mathrm{diag}(f_1(0),\ldots,f_n(0))$ is an $n\times n$ diagonal matrix with $f_i(0)$ denoting the conditional density of $\epsilon_i$ given $x_i$ evaluated at zero. Therefore, $\sum_{i=1}^n\mathrm{Var}\{D_i(\theta,q_n)\}\le Cq_n^2n^{-1/2}$ for some positive constant $C$ and all $n$ sufficiently large. By Bernstein's inequality, for all $n$ sufficiently large,
$$P\Big(q_n^{-1}\Big|\sum_{i=1}^n D_i(\theta,q_n)\Big|>\nu\;\Big|\;x_1,\ldots,x_n\Big)\le 2\exp\Big\{-\frac{\nu^2}{Cq_n^2n^{-1/2}+C\nu n^{-1/2}q_n}\Big\}\le 2\exp\big(-Cn^{1/2}q_n^{-2}\big),$$
which converges to 0 as $n\to\infty$ by Conditions (A3) and (A4). Note that the upper bound does not depend on $x_i$, so the above bound also holds unconditionally. □
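For completeness, the form of Bernstein's inequality invoked above is the classical tail bound for independent, mean-zero random variables $X_1,\ldots,X_n$ with $|X_i|\le M$ almost surely:
$$P\Big(\Big|\sum_{i=1}^n X_i\Big|>t\Big)\le 2\exp\Big(-\frac{t^2/2}{\sum_{i=1}^n E(X_i^2)+Mt/3}\Big),\qquad t>0.$$
Applying it conditionally on $x_1,\ldots,x_n$ with $X_i=D_i(\theta,q_n)$, $t=\nu q_n$, and the variance bound $\sum_{i=1}^n E(X_i^2)\le Cq_n^2n^{-1/2}$ derived above yields the displayed tail probability.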

Lemma A2. Suppose that Conditions (A1)–(A4) hold. Then, for any sequence $\{b_n\}$ with $1\le b_n\le L_n^{\xi/10}$ and $0<\xi<(r-1/2)/(2r+1)$, we have
$$\sup_{\theta^\top\tilde\Pi_A^\top\tilde\Pi_A\theta\le b_n^2L_n}\Big|\sum_{i=1}^n\Big[\rho_\tau(\epsilon_i-\tilde\Pi_{Ai}^\top\theta-U_{ni})-\rho_\tau(\epsilon_i-U_{ni})+\tilde\Pi_{Ai}^\top\theta\{\tau-I(\epsilon_i<0)\}-E_{\epsilon_i|x_i}\big\{\rho_\tau(\epsilon_i-\tilde\Pi_{Ai}^\top\theta-U_{ni})-\rho_\tau(\epsilon_i-U_{ni})\big\}\Big]\Big|=o_p(L_n).$$

Lemma A2 can be proved using arguments similar to those used to establish Lemma 3.2 in [40].

To prove Theorem 1(a), we first show that, for any $\eta>0$, there exists a constant $L>0$ such that
$$P\Big(\inf_{\|\theta\|_2=L}\,q_n^{-1}\sum_{i=1}^n\{Q_i(q_n)-Q_i(0)\}>0\Big)\ge 1-\eta.$$

Note that
$$q_n^{-1}\sum_{i=1}^n\{Q_i(q_n)-Q_i(0)\}=q_n^{-1}\sum_{i=1}^n D_i(\theta,q_n)+q_n^{-1}\sum_{i=1}^n E\{Q_i(q_n)-Q_i(0)\mid x_i\}-q_n^{-1/2}\sum_{i=1}^n\tilde\Pi_{Ai}^\top\theta\,\varphi_\tau(\epsilon_i)\equiv G_{n1}+G_{n2}+G_{n3},$$
where the definitions of $G_{ni}$, $i=1,2,3$, are clear from the context. First, $\sup_{\|\theta\|_2\le L}|G_{n1}|=o_p(1)$ by Lemma A1.

For $G_{n2}$, we have
$$\begin{aligned}
q_n^{-1}\sum_{i=1}^n E\{Q_i(q_n)-Q_i(0)\}
&=q_n^{-1}\sum_{i=1}^n E\big\{\rho_\tau(\epsilon_i-q_n\tilde\Pi_{Ai}^\top\theta-U_{ni})-\rho_\tau(\epsilon_i-U_{ni})\big\}\\
&=q_n^{-1}\sum_{i=1}^n E\Big(-q_n\tilde\Pi_{Ai}^\top\theta\{\tau-I(\epsilon_i-U_{ni}\le 0)\}+\int_0^{q_n\tilde\Pi_{Ai}^\top\theta}\{I(\epsilon_i-U_{ni}<s)-I(\epsilon_i-U_{ni}<0)\}\,ds\;\Big|\;x_i\Big)\\
&=-q_n^{-1/2}\sum_{i=1}^n E\big[\tilde\Pi_{Ai}^\top\theta\{\tau-I(\epsilon_i-U_{ni}\le 0)\}\big]+q_n^{-1}\sum_{i=1}^n E\Big(\int_{U_{ni}}^{q_n\tilde\Pi_{Ai}^\top\theta+U_{ni}}\{I(\epsilon_i<s)-I(\epsilon_i<0)\}\,ds\;\Big|\;x_i\Big)\\
&\equiv W_{n1}(\theta)+W_{n2}(\theta),
\end{aligned}$$
where the definitions of $W_{ni}(\theta)$, $i=1,2$, are clear from the context. Note that $|F_{\epsilon|x}(0\mid x)-F_{\epsilon|x}(U_{ni}\mid x)|\le B|U_{ni}|$ for all $x$, where $B$ is the constant in Condition (A2). Let $U_n=(U_{n1},\ldots,U_{nn})^\top$. By Condition (A3), we have $\|U_n\|_2=O(L_n^{-r})$. Consequently, we can take a constant $M>0$ such that
$$\sup_{\|\theta\|_2\le L}|W_{n1}(\theta)|\le Mq_n^{-1/2}n^{-1/2}\|\theta\|_2\,\|\Pi_A\|_2\,\|U_n\|_2=O_p\big(q_n^{-1/2}n^{1/2}L_n^{-r}\big)\|\theta\|_2=O_p(\|\theta\|_2)$$
by Condition (A3) and Lemma A2. For $W_{n2}(\theta)$, we have
$$\begin{aligned}
W_{n2}(\theta)&=q_n^{-1}\sum_{i=1}^n\int_{U_{ni}}^{q_n\tilde\Pi_{Ai}^\top\theta+U_{ni}}f_i(0)\,s\,ds\,(1+o(1))\\
&=q_n^{-1}\sum_{i=1}^n f_i(0)\Big\{\tfrac12 q_n^2(\tilde\Pi_{Ai}^\top\theta)^2+U_{ni}\,q_n\tilde\Pi_{Ai}^\top\theta\Big\}(1+o(1))\\
&=C\,\theta^\top\Big(n^{-1}\sum_{i=1}^n f_i(0)\Pi_{Ai}\Pi_{Ai}^\top\Big)\theta\,(1+o(1))+q_n^{-1/2}\sum_{i=1}^n f_i(0)U_{ni}\tilde\Pi_{Ai}^\top\theta\\
&=C\,\theta^\top K_n\theta\,(1+o(1))+q_n^{-1/2}\sum_{i=1}^n f_i(0)U_{ni}\tilde\Pi_{Ai}^\top\theta,
\end{aligned}$$
where $K_n=\frac1n\Pi_A^\top B_n\Pi_A$. Based on Condition (A3), there exists a finite constant $c>0$ such that $C\theta^\top K_n\theta\,(1+o(1))\ge c\|\theta\|_2^2$ with probability approaching one. Combining Condition (A2) and the Cauchy–Schwarz inequality, we obtain
$$\Big|q_n^{-1/2}\sum_{i=1}^n f_i(0)U_{ni}\tilde\Pi_{Ai}^\top\theta\Big|=\big|q_n^{-1/2}n^{-1/2}\theta^\top\Pi_A^\top B_nU_n\big|\le q_n^{-1/2}n^{-1/2}\|\theta\|_2\,\|\Pi_A\|_2\,\|B_nU_n\|_2=O_p\big(q_n^{-1/2}n^{1/2}L_n^{-r}\big)\|\theta\|_2=O_p(\|\theta\|_2).$$

We next evaluate $G_{n3}$, noting that $E(G_{n3})=0$ and
$$E(G_{n3}^2)\le Cq_n^{-1}E\Big\{\theta^\top\Big(n^{-1}\sum_{i=1}^n\Pi_{Ai}\Pi_{Ai}^\top\Big)\theta\Big\}=O\big(q_n^{-1}\|\theta\|_2^2\big).$$

Therefore, $G_{n3}=O_p(q_n^{-1/2}\|\theta\|_2)$. Hence, for $L$ sufficiently large, the quadratic term dominates, and $q_n^{-1}\sum_{i=1}^n\{Q_i(q_n)-Q_i(0)\}$ asymptotically has the lower bound $cL^2$. By convexity, this implies $\|\hat\theta\|_2=O_p(q_n)$. From the definition of $\hat\theta$, it follows that $\|\hat\gamma_{01}-\gamma_{01}\|_2=O_p(n^{-1/2}q_n)$. This completes the proof of Theorem 1(a).

The proof of Theorem 1(b) is immediate, since $\|\hat m(x)-m(x)\|_{L_2}=\|\hat\gamma_{01}-\gamma_{01}\|_2$ and $\sup_{x\in[0,1]^p}\|\Pi(x)\|_2=O(L_n)$. □

Appendix A.2. Proof of Theorem 2

Note that $\hat\gamma=(\hat\beta_0^\top,\hat\beta_{00}^\top)^\top$ is the oracle estimator. Our goal is to prove that $\hat\gamma$ is a local minimizer of $Q_P(\beta_j,\beta_{jk})$. Because the check loss $\rho_\tau$ is not differentiable at zero, the gradient of $\sum_{i=1}^n\rho_\tau\big(Y_i-\sum_{j=1}^p\Psi_{i,j}^\top\beta_j-\sum_{1\le j<k\le p}\Phi_{i,jk}^\top\beta_{jk}\big)$ is not available in the proof; instead, we argue directly from a lower bound on the difference of two check losses.
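The lower bound invoked in the next display is a direct consequence of Knight's identity: taking $r=v$ and $s=v-u$ gives
$$\rho_\tau(u)-\rho_\tau(v)=(u-v)\{\tau-I(v\le 0)\}+\int_0^{v-u}\{I(v\le t)-I(v\le 0)\}\,dt\ \ge\ (u-v)\{\tau-I(v\le 0)\},$$
since the integrand $I(v\le t)-I(v\le 0)$ has the same sign as $t$, so the oriented integral is always nonnegative.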

Suppose that there exist indices $j_0\in M^c$ and $k_0\in M^c$ with $(j_0,k_0)\in I^c$ such that $\hat f_{j_0}\neq 0$ and $\hat f_{j_0k_0}\neq 0$; that is, $\hat\beta_{j_0}\neq 0$ and $\hat\beta_{j_0k_0}\neq 0$. Let $\hat\gamma^*=(\hat\beta_0^{*\top},\hat\beta_{00}^{*\top})^\top$ be the vector obtained from $\hat\gamma$ by replacing $\hat\beta_{j_0}$ and $\hat\beta_{j_0k_0}$ with $0$. Write
$$\mathcal{J}(\gamma)=\sum_{j=1}^pR_j^2\|\beta_j\|_2+\sum_{1\le k<j\le p}I(\beta_{jk}\neq 0)\big(R_j^2\|\beta_j\|_2^2+R_k^2\|\beta_k\|_2^2\big)^{1/2}+\sum_{1\le k<j\le p}Q_{jk}^2\|\beta_{jk}\|_2$$
for the penalty term. Since $\rho_\tau(u)-\rho_\tau(v)\ge\{\tau-I(v\le 0)\}(u-v)$ for any $u,v\in\mathbb{R}$, we have
$$\begin{aligned}
&Q_P(\hat\beta_j,\hat\beta_{jk})-Q_P(\hat\beta_j^*,\hat\beta_{jk}^*)\\
&\quad\ge-\frac1n\sum_{i=1}^n\{\tau-I(Y_i\le\Pi_i^\top\hat\gamma^*)\}\Pi_i^\top(\hat\gamma-\hat\gamma^*)+\lambda_1\{\mathcal{J}(\hat\gamma)-\mathcal{J}(\hat\gamma^*)\}\\
&\quad=-\frac1n\sum_{i=1}^n\{\tau-I(\epsilon_i\le 0)\}\Pi_i^\top(\hat\gamma-\hat\gamma^*)-\frac1n\sum_{i=1}^n\{I(\epsilon_i\le 0)-I(\epsilon_i\le r_{ni})\}\Pi_i^\top(\hat\gamma-\hat\gamma^*)+\lambda_1\{\mathcal{J}(\hat\gamma)-\mathcal{J}(\hat\gamma^*)\}\\
&\quad\ge-\frac1n\Big\|\sum_{i=1}^n\{\tau-I(\epsilon_i\le 0)\}\Pi_i\Big\|_2\|\hat\gamma-\hat\gamma^*\|_2-\frac1n\Big\|\sum_{i=1}^n\{I(\epsilon_i\le 0)-I(\epsilon_i\le r_{ni})\}\Pi_i\Big\|_2\|\hat\gamma-\hat\gamma^*\|_2+\lambda_1\{\mathcal{J}(\hat\gamma)-\mathcal{J}(\hat\gamma^*)\}\\
&\quad\equiv-T_{n1}-T_{n2}+T_{n3},
\end{aligned}$$
where $r_{ni}=U_{ni}-\Pi_i^\top(\hat\gamma-\hat\gamma^*)$. By simple calculation, $\big\|\sum_{i=1}^n\{\tau-I(\epsilon_i\le 0)\}\Pi_i\big\|_2=O_p(n^{1/2}L_n)$; therefore, $T_{n1}=O_p(n^{-1}L_n^2)$. For $T_{n2}$, from Conditions (A2) and (A3), we have
$$\begin{aligned}
E\Big\|\sum_{i=1}^n\{I(\epsilon_i\le 0)-I(\epsilon_i\le r_{ni})\}\Pi_i\Big\|^2
&\le n\sum_{i=1}^nE\big\|\{I(\epsilon_i\le 0)-I(\epsilon_i\le r_{ni})\}\Pi_i\big\|^2\\
&\le n\sum_{i=1}^nE\big[\Pi_i^\top\Pi_i\,|I(\epsilon_i\le r_{ni})-I(\epsilon_i\le 0)|\big]\\
&\le n\sum_{i=1}^nE\big[s(n)^2I(0\le|\epsilon_i|\le|r_{ni}|)\big]\le L_n\sum_{i=1}^n\int_{-|r_{ni}|}^{|r_{ni}|}f_i(s)\,ds=O_p(n^{1/2}L_n^3),
\end{aligned}$$
where $s(n)=\max_i\|\varphi(x_i)\|_2\le c_1n^{1/2}L_n$ for some positive constant $c_1$. The last equality follows by observing that $\max_{1\le i\le n}|r_{ni}|\le O(L_n^{-r})+L_n\|\hat\gamma-\hat\gamma^*\|_2=O_p(n^{-1/2}L_n^2)$. This implies that $\big\|\sum_{i=1}^n\{I(\epsilon_i\le 0)-I(\epsilon_i\le r_{ni})\}\Pi_i\big\|_2=O_p(n^{1/2}L_n^3)$ and, therefore, $T_{n2}=O_p(n^{-1/2}L_n^3)$. By the orders of $T_{n1}$ and $T_{n2}$ and the condition $n^{-1}L_n^3\lambda_1^{-1}\to 0$, the penalty term $T_{n3}$ dominates. Therefore, $Q_P(\hat\beta_j,\hat\beta_{jk})-Q_P(\hat\beta_j^*,\hat\beta_{jk}^*)>0$ with probability tending to one, which contradicts the fact that $\hat\gamma$ is the minimizer of $Q_P(\beta_j,\beta_{jk})$. The same argument applies under weak heredity. This completes the proof of Theorem 2. □

Appendix A.3. Proof of Theorem 3

Lemma A3. Let $z_1,\ldots,z_n$ be independent random variables taking values in some space $\mathcal{Z}$, and let $\Gamma$ be a class of real-valued functions on $\mathcal{Z}$ satisfying, for some positive constants $\eta_n$ and $\tau_n$,
$$\|\gamma\|_\infty\le\eta_n\quad\text{and}\quad\frac1n\sum_{i=1}^n\mathrm{Var}\{\gamma(z_i)\}\le\tau_n^2,\qquad\forall\,\gamma\in\Gamma.$$
Define $Z:=\sup_{\gamma\in\Gamma}\big|\frac1n\sum_{i=1}^n\{\gamma(z_i)-E\gamma(z_i)\}\big|$. Then, for $t>0$,
$$P\Big(Z\ge E(Z)+\sqrt{2t^2\big(\tau_n^2+2\eta_nE(Z)\big)}+\frac{2\eta_nt^2}{3}\Big)\le\exp(-nt^2).$$

For the details of this proof, see [41].

Define $\Gamma=\{\gamma(z):\gamma(z)=\rho_\tau(Y-\hat m(x))-\rho_\tau(Y-\tilde m(x))\}$. We can write
$$[R(\hat m)-R(\tilde m)]-[R_n(\hat m)-R_n(\tilde m)]=E\gamma(z)-\frac1n\sum_{i=1}^n\gamma(z_i),\qquad\gamma\in\Gamma.$$
By Lemma A3, we have
$$Z\le E(Z)+\sqrt{2t^2\big(\tau_n^2+2\eta_nE(Z)\big)}+\frac{2\eta_nt^2}{3}$$
with probability at least $1-\exp(-nt^2)$ for $t>0$. Based on subadditivity and the inequality $\sqrt{xy}\le(x+y)/2$ for $x,y\ge 0$, we have
$$\sqrt{2t^2\big(\tau_n^2+2\eta_nE(Z)\big)}\le\sqrt{2t^2\tau_n^2}+\sqrt{4t^2\eta_nE(Z)}\le\sqrt{2t^2\tau_n^2}+2E(Z)+t^2\eta_n.$$

Then, with probability at least $1-\exp(-nt^2)$, we have
$$Z\le 3E(Z)+\sqrt{2t^2\tau_n^2}+\frac{5t^2\eta_n}{3}.\tag{A1}$$

Let $\gamma(x)$ denote the collection of all differences $\rho_\tau(Y-\hat m(x))-\rho_\tau(Y-\tilde m(x))$. The bracketing number $N_{[\,]}(\delta,\mathcal{M}_n)$ is the minimum number of $\epsilon$-brackets $[l_i,u_i]$ needed to cover $\mathcal{M}_n$, where $\|u_j-l_j\|\le\epsilon$, $1\le j\le k$. We can construct $2\epsilon$-brackets over $\gamma(x)$ by taking the differences $[l_i-u_j,u_i-l_j]$ of the upper and lower bounds. Therefore, the bracketing numbers $N_{[\,]}(\epsilon,\gamma(x))$ are bounded by the squares of the bracketing numbers $N_{[\,]}(\epsilon/2,\mathcal{M}_n)$. For $\delta>0$, by Theorem 19.5 in [42], there exists a finite number $a(\delta)$ such that
$$E\sup_{\gamma\in\Gamma}\Big|\frac1n\sum_{i=1}^n\gamma(x_i)-E\gamma(z)\Big|\lesssim\frac{J_{[\,]}(\delta,\mathcal{M}_n)}{\sqrt n}+E\big[M\,1\{M>a(\delta)\sqrt n\}\big],$$
where $J_{[\,]}(\delta,\mathcal{M}_n)=\int_0^\delta\sqrt{\log N_{[\,]}(\epsilon,\mathcal{M}_n)}\,d\epsilon$ is the bracketing integral. The envelope function $M$ can be taken equal to the supremum of the absolute values of the upper and lower bounds of finitely many brackets over $\mathcal{M}_n$. Based on Theorem 19.5 in [42], the second term on the right is bounded by $a(\delta)^{-1}E\big[M^2\,1\{M>a(\delta)\sqrt n\}\big]$ and hence converges to zero as $n\to\infty$. There exists a constant $K>0$ such that, for the Sobolev space $S_{C_2}([0,1])$,
$$\log N_{[\,]}\big(\delta,S_{C_2}([0,1])\big)\le K\Big(\frac1\delta\Big)^{1/2}.$$
Note that $\sum_{j\in M}\varphi_j^\top\beta_j\in\mathcal{M}_n(L_n)$ and $\sum_{(jk)\in I}\phi_{jk}^\top\beta_{jk}\in\mathcal{M}_n(L_n)$, where $\varphi_j\in S_{C_2}([0,1])$ and $\phi_{jk}\in S_{C_2}([0,1]^2)$. Reference [43] implies that the bracketing integral of the Sobolev spaces $S_{C_2}([0,1])$ and $S_{C_2}([0,1]^2)$ is bounded. Then,
$$J_{[\,]}(\delta,\mathcal{M}_n)=O\big((\log p)^{1/2}+(2\log p)^{1/4}\big)=O\big(\sqrt{\log p}\big).$$
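To see concretely that the bracketing integral is finite, insert the entropy bound into its definition: for any $\delta>0$,
$$J_{[\,]}\big(\delta,S_{C_2}([0,1])\big)=\int_0^\delta\sqrt{\log N_{[\,]}\big(\epsilon,S_{C_2}([0,1])\big)}\,d\epsilon\le\sqrt K\int_0^\delta\epsilon^{-1/4}\,d\epsilon=\frac43\sqrt K\,\delta^{3/4}<\infty,$$
the singularity at $\epsilon=0$ being integrable because the exponent $-1/4$ exceeds $-1$.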

By the convexity of $\rho_\tau$, there exists $C>0$ such that $E\{\gamma(z)\}^2\le 2\|\hat m(x)-\tilde m(x)\|_{L_2}^2$ and $\|\gamma\|_\infty\le C\|\hat m(x)-\tilde m(x)\|_\infty\lesssim\|\hat m(x)-\tilde m(x)\|_{L_2}$. Let
$$\tau_n^2=\|\hat m(x)-\tilde m(x)\|_{L_2}^2\le 2\big(\|\hat m(x)-m(x)\|_{L_2}^2+\|\tilde m(x)-m(x)\|_{L_2}^2\big)=O\big(n^{-(2r-1)/(2r+1)}\big)$$
and
$$\eta_n=\|\hat m(x)-\tilde m(x)\|_{L_2}\le\|\hat m(x)-m(x)\|_{L_2}+\|\tilde m(x)-m(x)\|_{L_2}=O\big(n^{-(2r-1)/(2(2r+1))}\big).$$
Then, Equation (A1) implies that
$$Z\lesssim\frac{\sqrt{\log p}}{\sqrt n}+\sqrt{2t^2}\,\|\hat m(x)-\tilde m(x)\|_{L_2}+\frac{5t^2}{3}\|\hat m(x)-\tilde m(x)\|_{L_2}\le\frac{\sqrt{\log p}}{\sqrt n}+\Big(\sqrt{2t^2}+\frac{5t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}\le\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}$$
with probability at least $1-\exp(-nt^2)$ as $n\to\infty$. From the above, we have
$$R(\hat m)-R(\tilde m)\le R_n(\hat m)-R_n(\tilde m)+\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))},$$
and there exists $D_n$ such that
$$P\big(R(\hat m)-R(\tilde m)\le D_n\big)\ge P\Big(R_n(\hat m)-R_n(\tilde m)+\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}\le D_n\Big).$$
Under the assumption that the regression function is bounded, it follows, for $\varphi_j\in S_{C_2}([0,1])$ and $\phi_{jk}\in S_{C_2}([0,1]^2)$, that
$$\sum_{j=1}^p\|\beta_j\|_2+\sum_{1\le k<j\le p}I(\beta_{jk}\neq 0)\big(\|\beta_j\|_2^2+\|\beta_k\|_2^2\big)^{1/2}+\sum_{1\le k<j\le p}\|\beta_{jk}\|_2\le L_n,$$
where $L_n=o\big([n/\log(n)]^{1/4}\big)$. Reference [13] also shows that the Lasso is persistent when $p$ grows polynomially in $n$. Furthermore, according to the definition of $\hat m(x)$, we have $R_n(\hat m)-R_n(\tilde m)\le\lambda_1L_n$ and
$$R(\hat m)-R(\tilde m)\le\lambda_1L_n+\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}.$$
Let $D_n=\lambda_1L_n+(2\sqrt{2t^2}+10t^2/3)\,n^{-(2r-1)/(2(2r+1))}$. Since $R_n(\hat m)-R_n(\tilde m)\le\lambda_1L_n$ always holds, the probability
$$P\Big(R_n(\hat m)-R_n(\tilde m)+\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}\le\lambda_1L_n+\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}\Big)$$
equals one. Note that $\lambda_1L_n+(2\sqrt{2t^2}+10t^2/3)\,n^{-(2r-1)/(2(2r+1))}=O\big((2\sqrt{2t^2}+10t^2/3)\,n^{-(2r-1)/(2(2r+1))}\big)$. Therefore,
$$R(\hat m)-R(\tilde m)\le\Big(2\sqrt{2t^2}+\frac{10t^2}{3}\Big)n^{-(2r-1)/(2(2r+1))}$$
with probability at least $1-\exp(-nt^2)$. The same result is proved under weak heredity. □

Appendix B. Estimated Results at τ = 0.1, 0.25, 0.75, 0.9 in Applications

Figure A1 illustrates the effects of HNR, RPDE, and DFA on the UPDRS score at τ=0.1. As HNR increases, the UPDRS score decreases markedly and nonlinearly, so higher harmonics-to-noise ratios are associated with milder symptoms. RPDE exhibits a clear U-shaped curve: extreme values correspond to more severe symptoms, while intermediate values are associated with milder ones. DFA is negatively associated with the UPDRS score, again with a nonlinear trend, suggesting that greater fundamental frequency variability is associated with milder symptoms.

Figure A2 shows the interaction effect between HNR and DFA at τ=0.1, revealing a nonlinear relationship. When HNR is low and DFA is high, the UPDRS score is higher; as HNR increases, the UPDRS score decreases. However, when HNR is high and DFA is low, the score increases again. This suggests that the effects of HNR and DFA on the UPDRS score are complex and interdependent, highlighting the importance of understanding and analyzing the interactions between these variables.

Figure A1 Estimated main effects for the HNR, RPDE, and DFA variables at τ=0.10.


Figure A2 Estimated interaction between HNR and DFA at τ=0.1.


Figure A3 illustrates the estimated main effects of three acoustic features, HNR, DFA, and PPE, at τ=0.25. As HNR increases, its estimated effect on the UPDRS score first rises slightly and then drops rapidly, indicating that the purity of the speech signal is informative but with diminishing returns beyond a certain point. DFA reveals an optimal fluctuation range, beyond which its ability to reflect health status declines. PPE likewise shows that the randomness and complexity of the speech signal have an optimal range, outside of which the effect weakens. Overall, the variations in these features highlight their potential in applications such as disease diagnosis, where analyzing these acoustic characteristics can support medical assessment and improve accuracy.

Figure A4 illustrates the estimated interaction between DFA and PPE at τ=0.25, presented through a 3D surface plot. As DFA and PPE values vary, their interaction significantly impacts the overall effect. Specifically, when DFA is at a low level, the overall effect initially increases slowly and then drops rapidly as PPE increases. In contrast, when DFA is high, this trend becomes more gradual, indicating a more complex interaction pattern. This suggests that the influence of their interaction varies significantly across different DFA and PPE combinations. For instance, in certain combinations, their interaction can more accurately reflect changes in health status within speech signals, which is important for disease diagnosis and improving speech processing technologies. Overall, the figure highlights the nonlinear interaction between DFA and PPE and its potential applications.

Figure A3 Estimated main effects for the HNR, DFA, and PPE variables at τ=0.25.


Figure A4 Estimated interaction between DFA and PPE at τ=0.25.


Figure A5 illustrates the estimated main effects of the three variables HNR, DFA, and PPE at τ=0.75. As the HNR value increases, its effect first rises and then rapidly decreases. DFA shows a similar trend, but with a more gradual decline, while PPE exhibits a distinct peak before gradually decreasing. These patterns underscore the relevance of the three features in practical applications, particularly in disease diagnosis and speech processing.

Figure A6 illustrates the estimated interaction effects between HNR and DFA, as well as between HNR and PPE at τ=0.75. From the figure, it can be observed that the interaction between HNR and DFA exhibits a complex nonlinear relationship, with the overall effect first increasing and then decreasing as the HNR value rises, showing a distinct fluctuation pattern. Similarly, the interaction between HNR and PPE also displays a complex pattern, but with more pronounced changes, particularly when the PPE value is high, leading to more significant variations in the overall effect.

Figure A5 Estimated main effects for the HNR, DFA, and PPE variables at τ=0.75.


Figure A6 Estimated interaction effects between HNR and DFA and between HNR and PPE at τ=0.75.


Figure A7 presents the estimated main effects of dB and HNR on the UPDRS score at τ=0.9. The main effect of dB follows a U-shaped curve: both extremely high and extremely low dB values lead to higher UPDRS scores, with lower scores at intermediate values, emphasizing the nonlinear role of moderate sound levels. HNR shows an inverted U-shaped curve, indicating that moderate HNR values are associated with the most severe symptoms, a nonlinear relationship that is particularly prominent at this upper quantile. These findings highlight the complex nonlinear effects of the covariates on the UPDRS score and provide useful insight into Parkinson's disease symptoms.

Figure A7 Estimated main effects for the dB and HNR variables at τ=0.90.


References

1. Koenker, R.; Bassett, G. Regression quantiles. Econometrica; 1978; 46, pp. 33-50. [DOI: https://dx.doi.org/10.2307/1913643]

2. Nelder, J.A. A reformulation of linear models. J. R. Stat. Soc. Ser. A (Gen.); 1977; 140, pp. 48-63. [DOI: https://dx.doi.org/10.2307/2344517]

3. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1989.

4. De Gooijer, J.G.; Zerom, D. On additive conditional quantiles with high-dimensional covariates. J. Am. Stat. Assoc.; 2003; 98, pp. 135-146. [DOI: https://dx.doi.org/10.1198/016214503388619166]

5. Cheng, Y.; De Gooijer, J.G.; Zerom, D. Efficient estimation of an additive quantile regression model. Scand. J. Stat.; 2011; 38, pp. 46-62. [DOI: https://dx.doi.org/10.1111/j.1467-9469.2010.00706.x]

6. Lee, Y.K.; Mammen, E.; Park, B.U. Backfitting and smooth backfitting for additive quantile models. Ann. Stat.; 2010; 38, pp. 2857-2883. [DOI: https://dx.doi.org/10.1214/10-AOS808]

7. Horowitz, J.L.; Lee, S. Nonparametric estimation of an additive quantile regression model. J. Am. Stat. Assoc.; 2005; 100, pp. 1238-1249. [DOI: https://dx.doi.org/10.1198/016214505000000583]

8. Sherwood, B.; Maidman, A. Additive nonlinear quantile regression in ultra-high dimension. J. Mach. Learn. Res.; 2022; 23, pp. 1-47.

9. Zhao, W.; Li, R.; Lian, H. Estimation and variable selection of quantile partially linear additive models for correlated data. J. Stat. Comput. Simul.; 2024; 94, pp. 315-345. [DOI: https://dx.doi.org/10.1080/00949655.2023.2245098]

10. Cui, X.; Zhao, W. Pursuit of dynamic structure in quantile additive models with longitudinal data. Comput. Stat. Data Anal.; 2019; 130, pp. 42-60. [DOI: https://dx.doi.org/10.1016/j.csda.2018.08.017]

11. Lin, Y.; Zhang, H.H. Component selection and smoothing in multivariate nonparametric regression. Ann. Stat.; 2006; 34, pp. 2272-2297. [DOI: https://dx.doi.org/10.1214/009053606000000722]

12. Storlie, C.B.; Bondell, H.D.; Reich, B.J.; Zhang, H.H. Surface estimation, variable selection, and the nonparametric oracle property. Stat. Sin.; 2011; 21, pp. 679-705. [DOI: https://dx.doi.org/10.5705/ss.2011.030a] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21603586]

13. Radchenko, P.; James, G.M. Variable selection using adaptive nonlinear interaction structures in high dimensions. J. Am. Stat. Assoc.; 2010; 105, pp. 1541-1553. [DOI: https://dx.doi.org/10.1198/jasa.2010.tm10130]

14. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat.; 2004; 32, pp. 407-499. [DOI: https://dx.doi.org/10.1214/009053604000000067]

15. Wu, T.T.; Chen, Y.F.; Hastie, T.; Sobel, E.; Lange, K. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics; 2009; 25, pp. 714-721. [DOI: https://dx.doi.org/10.1093/bioinformatics/btp041]

16. Zhao, P.; Rocha, G.; Yu, B. The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat.; 2009; 37, pp. 3468-3497. [DOI: https://dx.doi.org/10.1214/07-AOS584]

17. Yuan, M.; Joseph, V.R.; Zou, H. Structured variable selection and estimation. Ann. Appl. Stat.; 2009; 3, pp. 1738-1757. [DOI: https://dx.doi.org/10.1214/09-AOAS254]

18. Choi, N.H.; Li, W.; Zhu, J. Variable selection with the strong heredity constraint and its oracle property. J. Am. Stat. Assoc.; 2010; 105, pp. 354-364. [DOI: https://dx.doi.org/10.1198/jasa.2010.tm08281]

19. Bien, J.; Taylor, J.; Tibshirani, R. A lasso for hierarchical interactions. Ann. Stat.; 2013; 41, pp. 1111-1141. [DOI: https://dx.doi.org/10.1214/13-AOS1096]

20. She, Y.; Wang, Z.F.; Jiang, H. Group regularized estimation under structural hierarchy. J. Am. Stat. Assoc.; 2018; 113, pp. 445-454. [DOI: https://dx.doi.org/10.1080/01621459.2016.1260470]

21. Hao, N.; Feng, Y.; Zhang, H.H. Model Selection for High Dimensional Quadratic Regression via Regularization. J. Am. Stat. Assoc.; 2018; 113, pp. 615-625. [DOI: https://dx.doi.org/10.1080/01621459.2016.1264956]

22. Liu, C.; Ma, J.; Amos, C.I. Bayesian variable selection for hierarchical gene–environment and gene–gene interactions. Hum. Genet.; 2015; 134, pp. 23-36. [DOI: https://dx.doi.org/10.1007/s00439-014-1478-5] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25154630]

23. Yang, Y.; Basu, S.; Zhang, L. A Bayesian hierarchical variable selection prior for pathway-based GWAS using summary statistics. Stat. Med.; 2020; 39, pp. 724-739. [DOI: https://dx.doi.org/10.1002/sim.8442] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31777110]

24. Kim, J.E.A. Bayesian variable selection with strong heredity constraints. J. Korean Stat. Soc.; 2018; 47, pp. 314-329. [DOI: https://dx.doi.org/10.1016/j.jkss.2018.03.003]

25. Li, Y.; Wang, N.; Carroll, R.J. Generalized functional linear models with semiparametric single-index interactions. J. Am. Stat. Assoc.; 2010; 105, pp. 621-633. [DOI: https://dx.doi.org/10.1198/jasa.2010.tm09313]

26. Li, Y.; Liu, J.S. Robust variable and interaction selection for logistic regression and multiple index models. J. Am. Stat. Assoc.; 2019; 114, pp. 271-286. [DOI: https://dx.doi.org/10.1080/01621459.2017.1401541]

27. Liu, Y.; Li, Y.; Carroll, R.J. Predictive functional linear models with diverging number of semiparametric single-index interactions. J. Econom.; 2021; 230, pp. 221-239. [DOI: https://dx.doi.org/10.1016/j.jeconom.2021.03.010]

28. Liu, H.; You, J.; Cao, J. A Dynamic Interaction Semiparametric Function-on-Scalar Model. J. Am. Stat. Assoc.; 2021; 118, pp. 360-373. [DOI: https://dx.doi.org/10.1080/01621459.2021.1933496]

29. Yu, K.; Lu, Z. Local linear additive quantile regression. Scand. J. Stat.; 2004; 31, pp. 333-346. [DOI: https://dx.doi.org/10.1111/j.1467-9469.2004.03_035.x]

30. Noh, H.; Lee, E.R. Component selection in additive quantile regression models. J. Korean Stat. Soc.; 2014; 43, pp. 439-452. [DOI: https://dx.doi.org/10.1016/j.jkss.2014.01.002]

31. Wood, S.N. P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Stat. Comput.; 2017; 27, pp. 985-989. [DOI: https://dx.doi.org/10.1007/s11222-016-9666-x]

32. Eilers, P.H.C.; Marx, B.D. Flexible smoothing with B-splines and penalties. Stat. Sci.; 1996; 11, pp. 89-102. [DOI: https://dx.doi.org/10.1214/ss/1038425655]

33. Meier, L.; van de Geer, S.; Buehlmann, P. High-dimensional additive modeling. Ann. Stat.; 2009; 37, pp. 3779-3821. [DOI: https://dx.doi.org/10.1214/09-AOS692]

34. Fan, Y.; Tang, C.Y. Tuning parameter selection in high dimensional penalized likelihood. J. R. Stat. Soc. Ser. B; 2013; 75, pp. 531-552. [DOI: https://dx.doi.org/10.1111/rssb.12001]

35. Greenshtein, E.; Ritov, Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli; 2004; 10, pp. 971-988. [DOI: https://dx.doi.org/10.3150/bj/1106314846]

36. Jiang, L.; Bondell, H.; Wang, H. Interquantile shrinkage and variable selection in quantile regression. Comput. Stat. Data Anal.; 2014; 69, pp. 208-219. [DOI: https://dx.doi.org/10.1016/j.csda.2013.08.006]

37. Zou, H.; Ming, Y. Composite quantile regression and the oracle model selection theory. Ann. Stat.; 2008; 36, pp. 1108-1126. [DOI: https://dx.doi.org/10.1214/07-AOS507]

38. Kohns, D.; Szendrei, T. Horseshoe prior Bayesian quantile regression. J. R. Stat. Soc. Ser. C Appl. Stat.; 2024; 73, pp. 193-220. [DOI: https://dx.doi.org/10.1093/jrsssc/qlad091]

39. Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests. IEEE Trans. Bio-Med. Eng.; 2010; 57, pp. 884-893. [DOI: https://dx.doi.org/10.1109/TBME.2009.2036000]

40. He, X.M.; Shi, P. Convergence rate of B-spline estimators of nonparametric conditional quantile functions. J. Nonparametr. Stat.; 1994; 3, pp. 299-308. [DOI: https://dx.doi.org/10.1080/10485259408832589]

41. Bousquet, O. A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Math.; 2002; 334, pp. 495-500. [DOI: https://dx.doi.org/10.1016/S1631-073X(02)02292-6]

42. van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.

43. Birman, M.; Solomjak, M.Z. Piecewise-polynomial approximations of functions of the classes. Math. USSR-Sb.; 1967; 2, pp. 295-317. [DOI: https://dx.doi.org/10.1070/SM1967v002n03ABEH002343]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).