1. Introduction and Motivations
U-statistics, first introduced in [1] and building on earlier work in [2], represent an essential class of statistical tools serving as unbiased estimators. Specifically, U-statistics of order m with kernel h are constructed from a sequence of random variables $X_1, X_2, \ldots$ defined on a measurable space $(S, \mathcal{S})$, using a measurable function $h : S^m \to \mathbb{R}$. They are given by

$$U_n(h) = \frac{(n-m)!}{n!} \sum_{(i_1, \ldots, i_m) \in I(m,n)} h\big(X_{i_1}, \ldots, X_{i_m}\big),$$
where $I(m,n) = \{(i_1, \ldots, i_m) : 1 \le i_j \le n,\ i_j \ne i_k \text{ if } j \ne k\}$. Notice that $U_n(h)$ serves as the nonparametric uniformly minimum variance unbiased estimator of $\theta(F) = \mathbb{E}\big[h(X_1, \ldots, X_m)\big]$, minimizing the quadratic risk $\mathbb{E}\big[(W_n - \theta(F))^2\big]$ over all unbiased estimators $W_n$, of which $U_n(h)$ is the minimizer. Common estimators based on U-statistics include the empirical variance, Gini’s mean difference, and Kendall’s rank correlation coefficient. A notable example is the Wilcoxon signed-rank test, a classical nonparametric method for testing whether a location parameter is zero, as discussed in [3], Example 12.4. Asymptotic properties of U-statistics for independent and identically distributed (i.i.d.) random variables were initially developed in [1] and subsequently advanced in [4,5,6,7]. Parallel results for V-statistics were presented in [8,9]. Comprehensive reviews of the U-statistics literature can be found in [5,10,11], with a more extensive discussion in [12]. U-processes, a generalization of U-statistics in which the statistics are indexed by a family of kernels, extend U-statistics to infinite-dimensional settings. These processes serve as nonlinear extensions of empirical processes and are crucial for tackling complex problems in areas such as density estimation, nonparametric regression, and goodness-of-fit testing. Transitioning from empirical processes to U-processes involves specialized techniques, particularly in stationary contexts, and finds wide application in estimator analysis, particularly for functions of varying smoothness. Notable applications include testing qualitative features in nonparametric statistics [13,14], cross-validation in density estimation [15], and deriving limiting distributions for M-estimators [10,16,17]. Ref. [10] provided necessary and sufficient conditions for the law of large numbers and the central limit theorem for U-processes. More recent applications include normality tests using U-processes [18] and the new normality tests proposed in [19], which utilize weighted distances between the standard normal density and local U-statistics derived from standardized observations.
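To make the definition concrete, the following minimal Python sketch evaluates an order-two U-statistic by averaging a symmetric kernel over all pairs of distinct indices; the helper names and the toy sample are illustrative, not taken from the cited works.

```python
# Illustrative sketch: order-two U-statistics, averaging a symmetric kernel
# over all pairs of distinct indices (names and data are hypothetical).
from itertools import permutations

def u_statistic(sample, h, m=2):
    """Average h over all m-tuples of distinct indices of the sample."""
    tuples = list(permutations(range(len(sample)), m))
    return sum(h(*(sample[i] for i in t)) for t in tuples) / len(tuples)

var_kernel = lambda x, y: 0.5 * (x - y) ** 2   # kernel of the empirical variance
gini_kernel = lambda x, y: abs(x - y)          # kernel of Gini's mean difference

data = [1.0, 2.0, 4.0, 7.0]
print(u_statistic(data, var_kernel))   # -> 7.0, the unbiased sample variance
print(u_statistic(data, gini_kernel))  # Gini's mean difference
```

Because the kernel is averaged over distinct index tuples, the variance kernel above reproduces exactly the usual unbiased sample variance with the $1/(n-1)$ normalization.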
Using a characterization of symmetry in terms of extremal order statistics, the authors of [20] developed several new nonparametric tests of symmetry based on novel U-empirical processes. In a related work, ref. [21] considered a class of U-statistics with kernels indexed by a multivariate parameter and, as an application, discussed an empirical characteristic function-based transformation to symmetry. The median-of-means approach, which leverages U-statistics, was introduced in [22] for estimating the mean of multivariate functions in heavy-tailed distributions. A thorough overview of the theory behind U-processes was provided in [17]. The broad utility of U-statistics extends to various fields, including random graph theory for counting subgraphs such as triangles [23] and machine learning applications like clustering, image recognition, and graph-based learning. Even with random kernels of diverging orders, U-statistics remain relevant, as explored in [24,25,26,27]. Infinite-order U-statistics are also employed to construct simultaneous prediction intervals, addressing uncertainties in ensemble methods like subbagging and random forests [28]. Specific applications of U-statistics include the MeanNN method for estimating differential entropy [29] and novel test statistics for goodness of fit [30]. In genetics, ref. [31] used U-statistics for model-free clustering and classification, while [32] applied them to analyze random compressed sensing matrices. Ref. [33] evaluated independence tests in functional data using Kendall statistics, a specific type of U-statistics. In high-dimensional clustering, ref. [34] developed a U-statistics-based framework for group classification and partition significance, while [35] focused on dimension-agnostic inference methods using variational representations of test statistics. 
Other innovations include U-statistics-based empirical risk minimization [36] and asymmetric U-statistics for pattern matching in random strings and permutations [37]. Ref. [38] proposed U-statistics under left truncation and right censoring for nonparametric independence tests between time to failure and failure cause in competing risks, while [39] explored quadruplet U-statistics for network analysis. The extension of U-statistics to conditional empirical U-processes presents both practical advantages and technical challenges. We introduce Stute’s estimators for a sequence of random elements $\{(X_i, Y_i) : i \ge 1\}$, where $X_i$ takes values in $\mathbb{R}^d$ and $Y_i$ takes values in a Polish space $\mathcal{Y}$, with the same distribution as $(X, Y)$. For a measurable function $\varphi : \mathcal{Y}^m \to \mathbb{R}$, our goal is to estimate the conditional expectation, or regression function, for $\mathbf{t} = (t_1, \ldots, t_m)$, given by

$$r^{(m)}(\varphi, \mathbf{t}) = \mathbb{E}\big[\varphi(Y_1, \ldots, Y_m) \mid (X_1, \ldots, X_m) = \mathbf{t}\big],$$
whenever it exists, i.e., when $\mathbb{E}\big[|\varphi(Y_1, \ldots, Y_m)|\big] < \infty$. We let $K$ be a kernel function with compact support, satisfying $\sup_u |K(u)| < \infty$ and $\int K(u)\,du = 1$. Stute’s class of estimators for $r^{(m)}(\varphi, \mathbf{t})$, known as conditional U-statistics, is defined for each $\mathbf{t}$ as

$$\widehat{r}^{(m)}_n(\varphi, \mathbf{t}; h_n) = \frac{\sum_{(i_1, \ldots, i_m) \in I(m,n)} \varphi(Y_{i_1}, \ldots, Y_{i_m}) \prod_{j=1}^{m} K\big((t_j - X_{i_j})/h_n\big)}{\sum_{(i_1, \ldots, i_m) \in I(m,n)} \prod_{j=1}^{m} K\big((t_j - X_{i_j})/h_n\big)},$$

where $I(m,n)$ is the set of all m-tuples of distinct integers between 1 and n, and $(h_n)_{n \ge 1}$ is a sequence of positive constants converging to zero such that $n h_n^d \to \infty$. In the particular case where $m = 1$, the regression function simplifies to $r^{(1)}(\varphi, t) = \mathbb{E}[\varphi(Y) \mid X = t]$, and Stute’s estimator corresponds to the well-known Nadaraya–Watson estimator of [40,41], as elaborated in [42,43]. The seminal work of [44] concentrated on determining the uniform convergence rate of $\widehat{r}^{(m)}_n$ to $r^{(m)}$ as a function of the bandwidth. Subsequently, ref. [45] investigated the asymptotic distribution of $\widehat{r}^{(m)}_n$, drawing comparisons with the findings of Stute. Under appropriate mixing conditions, ref. [46] extended Stute’s framework to weakly dependent data, demonstrating the Bayes risk consistency of the corresponding classification rules. As an alternative to traditional kernel-based approaches, ref. [47] introduced symmetrized Nearest Neighbor conditional U-statistics. In a complementary vein, ref. [48] analyzed functional conditional U-statistics, establishing the asymptotic normality of their finite-dimensional distributions. Despite the critical role of nonparametric estimation of conditional U-statistics in functional data analysis, this area has remained relatively underexplored. Recent developments in this field, highlighted in [49,50,51,52,53], address key challenges, particularly regarding the uniform consistency of bandwidth selection. There has been increasing interest in regression models where the response variable is real-valued and the explanatory variables consist of smooth functions that can vary arbitrarily across observations. This form of data, referred to as functional data, is encountered across diverse fields, including climatology, medicine, economics, and linguistics.
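For orientation, the $m = 1$ case can be sketched in a few lines: the estimator reduces to a locally weighted average of the responses. The Epanechnikov kernel, the function names, and the toy data below are illustrative choices, not the paper's.

```python
# Minimal sketch of the m = 1 conditional U-statistic: the Nadaraya-Watson
# estimator, a kernel-weighted local average of the responses.
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def nadaraya_watson(x, xs, ys, h):
    """Locally weighted average estimating E[Y | X = x] with bandwidth h."""
    w = [epanechnikov((x - xi) / h) for xi in xs]
    sw = sum(w)
    return sum(wi * yi for wi, yi in zip(w, ys)) / sw if sw > 0 else float("nan")

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 1.0, 2.0, 3.0, 4.0]   # here Y = 2X exactly, so the fit is near 2x
print(nadaraya_watson(1.0, xs, ys, h=0.8))  # -> 2.0
```

Since the design points are symmetric around $x = 1$ and the responses are linear, the weighted average recovers the true regression value exactly at that point.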
Functional time series emerge when continuous processes are divided into smaller segments such as daily intervals where each intraday curve is treated as a functional random variable. This paper delves into functional data by investigating the theory of U-processes. For an in-depth introduction to functional data analysis, we direct readers to foundational works [54,55,56], which present case studies from various disciplines alongside core analytical methods. It is important to note that the extension of probability theory to random variables within normed vector spaces, such as Banach and Hilbert spaces, precedes many contemporary developments in the field of functional data, as highlighted in [57]. Ref. [58] explored density and mode estimation in normed vector spaces, addressing the challenges posed by the curse of dimensionality in functional data. In the context of regression, ref. [56] examined nonparametric models, while key foundational contributions to the theoretical framework and practical applications can be found in works [59,60,61]. Recent advances in functional data analysis include [62], the authors of which provided consistency rates for various functionals of the conditional distribution, uniformly over a subset of the explanatory variable. Building on this, ref. [63] extended these results by establishing uniform in bandwidth (UIB) consistency rates for nonparametric models, covering key functionals such as the regression function, conditional distribution, conditional density, and conditional hazard function. Further contributions of [64] investigated local linear estimation of the regression function when the regressor is functional, demonstrating strong convergence uniformly in bandwidth parameters. Additionally, ref. [65] explored k-Nearest Neighbors (kNN) estimation for nonparametric regression models using strongly mixing functional time series data, establishing uniform nearly complete convergence rates under moderate conditions. 
For more recent advancements in the field, works [66,67,68,69,70,71] provide further insights and developments. Recent literature has increasingly advocated for the integration of dimension reduction techniques in regression models. One prominent approach is the use of single-index models, which reduce the influence of multiple predictors to a single index—a projection in a specific direction—coupled with a nonparametric link function. These models effectively capture the essential relationship between predictors and the response while mitigating the curse of dimensionality by focusing on a one-dimensional index. The nonparametric link function, applied to this index, allows for greater flexibility than traditional linear models, extending the scope of linear regression, where the link function is simply the identity function (see [72,73,74,75,76]). Recent advances in Functional Data Analysis (FDA) have further highlighted the importance of addressing high-dimensionality in functional regression models (see [77,78,79] for comprehensive reviews). Semiparametric models, particularly functional single-index models (FSIMs), have emerged as powerful tools in this context. Refs. [80,81,82] explored the FSIM framework, extending single-index models to accommodate functional data. Ref. [83] introduced a functional single-index composite quantile regression model, utilizing B-spline basis functions to estimate both the unknown slope and link functions. Recent papers include [84], the authors of which proposed a compact FSIM where the coefficient function is restricted to be non-zero in a subregion, and [85], the authors of which developed methods for estimating general FSIMs where the conditional distribution of the response is governed by a functional predictor through a single-index structure.
Ref. [86] advanced this framework by combining functional principal component analysis (FPCA) for the functional predictors with B-spline modeling for the parameters, applying profile estimation techniques for unknown functions and parameters. Additionally, refs. [87,88] investigated FSIMs in the presence of randomly missing responses in strongly mixing time series data. Ref. [89] introduced a functional single-index varying coefficient model, wherein the functional predictor forms the single index. By applying FPCA and basis function approximations, the authors proposed an iterative estimation procedure for slope and coefficient functions. In a similar vein, ref. [90] developed an automatic, location-adaptive estimation procedure for FSIM using kNN techniques. Inspired by imaging data analysis, ref. [91] introduced a novel functional varying-coefficient single-index model to analyze functional response data with covariates of interest. Meanwhile, ref. [92] explored the nonparametric estimation of conditional cumulative distribution functions using a functional Hilbertian regressor in the single-index framework, later extending this methodology to the multi-index case in [93]. This approach avoids fixing the true parameter within a pre-specified sieve and offers a rigorous theoretical analysis of a direct kernel-based estimation method, demonstrating polynomial convergence rates. These contributions reflect the ongoing evolution of FSIMs as versatile and robust tools in the analysis of complex, high-dimensional functional data. The principal objective of this study is to establish a comprehensive framework and rigorously investigate the weak and uniform convergence properties of k-Nearest Neighbor (kNN) single-index conditional U-processes within a regular sequence of random functions. This research is driven by the fundamental statistical significance of the kNN method, a widely adopted and versatile approach.
The kNN method identifies the k closest neighbors of a given point x using a prescribed distance metric d. A notable feature of the kNN approach is its random, locally adaptive bandwidth, which adjusts to the underlying structure of the data—a critical aspect, especially in infinite-dimensional settings. The kNN method, originally introduced in [94] and further examined in [95], has its roots in nonparametric discrimination and was later explored in [96]. For a comprehensive exposition of the method, readers are referred to [97]. In practice, the kNN method is widely used, as demonstrated in [56], and is favored for its simplicity as it involves only one parameter, k, which dictates the number of Nearest Neighbors. This parameter is typically selected from a finite set, and the method’s local adaptability enables it to respond to the specific data structure at any point. Extensive research has been conducted on the kNN method in finite-dimensional spaces. Notable contributions include works [98,99,100,101,102]. However, in infinite-dimensional spaces, particularly within functional frameworks, three primary approaches to kNN regression estimation have emerged. The first, introduced in [103], focuses on a kNN kernel estimate where the functional variable resides in a separable Hilbert space $\mathcal{H}$. In this setting, ref. [103] established weak consistency by projecting the infinite-dimensional space onto a finite-dimensional subspace. This was achieved by considering the first ℓ coefficients of an expansion of X in an orthonormal basis of $\mathcal{H}$ and subsequently applying multivariate kNN regression techniques to the projected data. The second approach integrated kNN methods with functional local linear estimation. Consistency and convergence rate results for this method were presented in [104,105], further enriching the theoretical underpinnings of kNN in infinite-dimensional contexts.
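The random, locally adaptive bandwidth described above can be sketched directly: it is the distance from a query point to its k-th nearest sample point, so it shrinks in densely sampled regions and widens in sparse ones. The function name and the toy sample are hypothetical.

```python
# Sketch of the kNN random bandwidth: the smallest radius h such that the
# ball B(x, h) contains k sample points (illustrative names and data).
def knn_bandwidth(x, sample, k, dist=lambda a, b: abs(a - b)):
    """Distance from x to its k-th nearest neighbor in the sample."""
    return sorted(dist(x, xi) for xi in sample)[k - 1]

sample = [0.0, 0.2, 0.9, 1.1, 3.0]
print(knn_bandwidth(1.0, sample, k=3))  # densely sampled region -> small radius
print(knn_bandwidth(3.0, sample, k=3))  # sparse region -> larger radius
```

The two calls illustrate the local adaptivity: the same k yields a radius of 0.8 near the cluster around 1.0 but 2.1 at the isolated point 3.0.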
These advances underscore the growing importance of kNN methods in functional data analysis and their adaptability to high-dimensional and complex data structures.
In this paper, we first aim to establish almost sure uniform consistency and almost sure uniform-in-the-number-of-neighbors (UINN) consistency for nonparametric functional single-index regression estimators and functional single-index conditional U-processes in the dependent setting. The concept of uniform-in-bandwidth (UIB) consistency was originally introduced in [15] for kernel density estimators using empirical process methods. This investigation is motivated by a series of foundational works, including [49,51,106,107], which established UIB consistency for similar estimators in the i.i.d. finite-dimensional framework, with the bandwidth varying within intervals indexed by n. In the realm of Functional Data Analysis (FDA), numerous studies have focused on nonparametric functional estimators. For example, ref. [62] provided consistency rates for several functionals of the conditional distribution, such as the regression function, conditional cumulative distribution, and conditional density, uniformly over specific subsets of the explanatory variables. Ref. [63] extended these results by establishing uniform consistency rates for conditional models, including regression functions, conditional distributions, densities, and hazard functions. Refs. [68,108] demonstrated almost complete convergence of k-Nearest Neighbor (kNN) estimators—uniform in the number of neighbors—under classical assumptions on kernel functions, small ball probabilities of the functional variable, and an entropy condition to control space complexity. Similarly, ref. [64] studied local linear estimation of the regression function in the context of functional covariates and established strong UIB convergence. Ref. [65] explored kNN estimation for nonparametric regression models with strongly mixing functional time series data, achieving uniform almost sure convergence rates under mild conditions, while [90] provided new uniform asymptotic results for kernel estimates within functional single-index models.
Despite this rich body of literature, most studies have focused exclusively on either UIB or UINN consistency or on uniform consistency over specific functional subsets. However, the simultaneous investigation of both UIB and UINN consistency, particularly in a dependent setting, remains an open problem. This gap, which was addressed in [51] in the independent case, forms the central focus of our research. By integrating results from both Functional Data Analysis and empirical process theory, we aim to advance the understanding of consistency in more general settings. A second major challenge we address concerns the weak convergence of these estimators. This problem is inherently complex, as it requires controlling asymptotic equicontinuity under minimal conditions—a challenge that remains unresolved in the current literature. Our approach draws on seminal works [109,110], combined with strategies outlined in [111,112], to handle functional data. However, our contribution extends beyond simply merging existing concepts; it involves developing intricate mathematical derivations tailored to the specific nature of functional data. The effective application of large sample theory, particularly results related to dependent empirical processes, is essential to this endeavor, and we rely heavily on the theoretical foundations laid out in [109,110,112]. Notably, even in the i.i.d. setting, weak convergence for kNN single-index conditional U-processes has not yet been established—a gap we aim to fill with this work.
In addition, this work focuses on estimating the regression function via the kNN method in the single index nonparametric regression model under the mixing dependence condition. We emphasize that the kNN method is a fundamental statistical technique with several advantages. Generally, it is computationally fast and does not require extensive parameter tuning. One of its key features is its nonparametric nature, enabling it to adapt automatically to any continuous underlying distribution without relying on specific models. For significant statistical problems, including density estimation, classification, and regression, kNN methods are proven consistent when k is appropriately selected. kNN methods are one of the main paradigms in machine learning. Historically, the kNN method was first introduced in [94]—see also [95]—in the context of nonparametric discrimination and further investigated in [96]. For more details, refer to [97]. The investigation of single index models, popular in econometrics, is motivated by two primary concerns: dimension reduction and the interpretability of the index in these models. For more on this, refer to [72,80,113] in infinite-dimensional settings. Therefore, the single functional index model combines the advantages of single index models and the potential of functional linear models in applications (see [114,115,116]). In essence, the single functional index model reduces dimensionality effects while capturing as much information as possible from the data. Functional data analysis is a challenging research field in the statistical community. The kNN method considers the k sample points nearest to x with respect to a distance d. The local bandwidth of the kNN method is random and depends on the data, respecting the local structure of the data, which is essential in infinite dimensions.
It is commonly used in practice (see [56]) and is simple to handle because the user controls only one parameter: the number k of Nearest Neighbors, valued in a finite set. Additionally, it allows building a neighborhood adapted to the data at any point. The kNN method is widely studied when the explanatory variable lies in a finite-dimensional space (see [98,99,100,101,102]). In an infinite-dimensional space, i.e., a functional framework, three approaches exist for kNN regression estimation. The first, published in [103], examines a kNN kernel estimate when the functional variable is an element of a separable Hilbert space $\mathcal{H}$. In [103], a weak consistency result was established by reducing the infinite dimension of $\mathcal{H}$ through projection onto a finite-dimensional subspace, considering only the first ℓ coefficients of an expansion of X in an orthonormal system of $\mathcal{H}$ and then applying multivariate techniques on the projected data for kNN regression. The second approach, based on the kNN procedure and functional local linear estimation, achieved consistency with convergence rates (see [104,105,117]). The third approach, in [118], is a purely functional method.
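The projection strategy of the first approach can be sketched as follows: each discretized curve is reduced to its first ℓ coefficients in an orthonormal system, and multivariate kNN regression is then run on the projected data. The cosine system, all names, and the synthetic amplitude-regression data are illustrative assumptions, not the construction used in the cited work.

```python
# Sketch of projection-based functional kNN regression: project curves onto
# the first ell coefficients of a cosine system, then average the responses
# of the k nearest projected neighbors (illustrative names and data).
import math

def project(curve, grid, ell):
    """First ell coefficients of a discretized curve in a cosine system."""
    n = len(grid)
    return [sum(c * math.cos(math.pi * j * t) for c, t in zip(curve, grid)) / n
            for j in range(ell)]

def knn_predict(x_coef, coefs, ys, k):
    """Multivariate kNN regression on the projected coefficients."""
    order = sorted(range(len(ys)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(coefs[i], x_coef)))
    return sum(ys[i] for i in order[:k]) / k

grid = [i / 100 for i in range(100)]
curves = [[a * math.cos(math.pi * t) for t in grid] for a in (0.5, 1.0, 1.5, 2.0)]
ys = [0.5, 1.0, 1.5, 2.0]                 # response = amplitude of the curve
coefs = [project(c, grid, ell=3) for c in curves]
new = [1.1 * math.cos(math.pi * t) for t in grid]
print(knn_predict(project(new, grid, 3), coefs, ys, k=2))  # -> 1.25
```

Because the coefficients scale linearly with the amplitude, the two nearest neighbors of the query curve are the amplitude-1.0 and amplitude-1.5 curves, and their averaged responses give 1.25.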
1.1. Contribution
This paper addresses challenges in high-dimensional functional data analysis by advancing classical kernel estimators for stationary random processes. It focuses on the k-Nearest Neighbor (kNN) single-index kernel estimator, examining its regression performance and uniform consistency within functional single-index regression. Introducing the concepts of Uniform In Bandwidth (UIB) and Uniform In Nearest Neighbor (UINN) consistency, the study aims to provide robust estimates for accurate relative error predictions. It extends the kNN methodology to functional single-index conditional U-statistics, analyzing their uniform and UINN consistency, asymptotic properties, and limiting behavior. A key contribution is establishing a uniform central limit theorem for function classes meeting certain moment conditions, applicable to both bounded and unbounded functions. The methodologies employed include advanced techniques such as the kNN framework, small-ball probability, Hoeffding decompositions, decoupling methods, and modern empirical process theory indexed by function classes. The results are derived under general conditions, enhancing their practical relevance. The findings have versatile applications in statistics, including time series forecasting, set-indexed conditional U-statistics, and estimation of the Kendall rank correlation coefficient. The theoretical framework utilizes maximal moment inequalities for U-processes and $\beta$-mixing results as established in [119], providing a robust foundation for further exploration of high-dimensional functional data and sophisticated statistical models.
1.2. Organization of the Paper
The structure of this article is organized as follows. Section 2 introduces the functional framework and presents the necessary definitions for our analysis, alongside a thorough discussion of the assumptions underlying our asymptotic results. In Section 3, we investigate strong uniform convergence rates, providing a detailed exploration of the conditions that ensure such convergence. Section 4.1 is devoted to the weak convergence of empirical processes within the functional data setting, laying the foundation for subsequent theoretical developments. The main theoretical contributions, including the uniform Central Limit Theorem (CLT) for conditional U-processes, are articulated in Section 4.2, where we establish the key results of the study. In Section 5, we explore potential applications of our findings, covering set-indexed conditional U-statistics (Section 5.1), the Kendall rank correlation coefficient (Section 5.2), and discrimination problems (Section 5.3). Section 6 addresses practical concerns related to bandwidth selection, offering insights into its implementation in empirical studies. Concluding remarks, along with recommendations for future research directions, are provided in Section 7. For the sake of clarity and coherence, all proofs, which primarily draw upon modern empirical process theory, are consolidated in Section 8, with a focus on the core arguments given the technical length of the demonstrations. Additionally, a collection of pertinent technical results is included in the Appendix section to support the main text.
2. The Functional Framework
2.1. Generality on the Model
Statistical data often exhibit some degree of dependence, which can profoundly impact the accuracy of statistical inference. Neglecting to account for this dependence during analysis can lead to erroneous conclusions. The concept of mixing serves as a valuable tool for quantifying the level of dependence within a sequence of random variables, enabling the extension of classical results for independent sequences to those that exhibit weak dependence or mixing behavior. This study focuses specifically on $\beta$-mixing, or absolute regularity, a measure of dependence originally introduced in [120,121]. A sequence is defined as $\beta$-mixing if the conditional probabilities converge to the unconditional probabilities in a specific manner as the lag between observations increases. In the present analysis, we assume that the sequence of random elements $\{(X_i, Y_i) : i \ge 1\}$ is absolutely regular, which ensures that the dependence structure of the data is controlled and behaves predictably. For instance, Markov chains are known to be $\beta$-mixing under the mild Harris recurrence condition, particularly in scenarios where the underlying state space is finite [27,122,123,124]. We let $\langle \cdot, \cdot \rangle$ denote the inner product and $\|\cdot\|$ the corresponding norm in the Hilbert space $\mathcal{H}$. We utilize $\{e_j\}_{j \ge 1}$ as a complete orthonormal system for $\mathcal{H}$. We consider a sequence $\{(X_i, Y_i) : i \ge 1\}$ of stationary random copies of the random vector $(X, Y)$, where X assumes values in the abstract space $\mathcal{H}$ and Y takes values in the abstract space $\mathcal{Y}$. The Hilbert space $\mathcal{H}$ is equipped with a semi-metric $d(\cdot, \cdot)$, which induces a topology to measure the proximity between two elements within $\mathcal{H}$. Importantly, this semi-metric is independent of the specific definition of X to circumvent issues related to measurability. Furthermore, we define a semi-metric associated with a single index $\theta \in \mathcal{H}$ given by $d_\theta(u, v) = |\langle \theta, u - v \rangle|$ for $u, v \in \mathcal{H}$. This construction facilitates the analysis of functional data through a single-index structure while accounting for the dependence introduced by the $\beta$-mixing condition.
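The single-index construction can be sketched numerically, assuming the usual single-index semi-metric $d_\theta(u, v) = |\langle \theta, u - v \rangle|$ and a Riemann approximation of the inner product on discretized curves; all names and data below are illustrative. The sketch also shows why $d_\theta$ is only a semi-metric: two different curves with the same projection on $\theta$ are at distance zero.

```python
# Sketch of the single-index semi-metric on discretized curves:
# d_theta(u, v) = |<theta, u - v>|, with a Riemann-sum inner product.
def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) / len(u)

def d_theta(theta, u, v):
    return abs(inner(theta, [a - b for a, b in zip(u, v)]))

theta = [1.0] * 100                              # direction: the mean level
u  = [2.0] * 100                                 # flat curve at level 2
v  = [2.0 + (-1) ** i for i in range(100)]       # oscillates, same mean level
v2 = [2.5] * 100                                 # flat curve at level 2.5

print(d_theta(theta, u, v))    # -> 0.0: distinct curves, identical projection
print(d_theta(theta, u, v2))   # -> 0.5: projections differ by the level shift
```

This zero-distance pair is exactly the measurability-friendly topology discussed above: proximity is judged only through the index $\langle \theta, \cdot \rangle$, not through the full curves.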
We assume that a fixed direction $\theta \in \mathcal{H}$ exists such that the observations satisfy the following relation:

$$Y = r(\langle \theta, X \rangle) + \varepsilon, \qquad (1)$$

where $\varepsilon$ is a real random variable satisfying $\mathbb{E}[\varepsilon \mid X] = 0$ almost surely. To ensure identifiability, we impose the assumption that the regression function $r(\cdot)$ is differentiable, and for $\theta$, we require that $\langle \theta, e_1 \rangle > 0$, where $e_1$ denotes the first element of the orthonormal basis in $\mathcal{H}$. This condition ensures the identifiability of the single functional index model. For a comprehensive discussion on the identifiability issues related to single functional index models, we refer to [80]. The primary objective of this study is to establish the weak convergence of the single-index conditional U-process, which is constructed from the single-index conditional U-statistic in the kNN framework introduced in [51]. This investigation is pivotal for advancing the theoretical understanding of nonparametric methods in high-dimensional functional data, particularly within the context of U-processes indexed by single indices. This kNN estimator is defined as

$$\widehat{r}^{(m)}_n(\varphi, \mathbf{t}; \theta, k) = \frac{\sum_{(i_1, \ldots, i_m) \in I(m,n)} \varphi(Y_{i_1}, \ldots, Y_{i_m}) \prod_{j=1}^{m} K\big(d_\theta(t_j, X_{i_j}) / H_{n,k}(t_j)\big)}{\sum_{(i_1, \ldots, i_m) \in I(m,n)} \prod_{j=1}^{m} K\big(d_\theta(t_j, X_{i_j}) / H_{n,k}(t_j)\big)}, \qquad (2)$$

which serves as an estimator for the multivariate regression function

$$r^{(m)}(\varphi, \mathbf{t}; \theta) = \mathbb{E}\big[\varphi(Y_1, \ldots, Y_m) \mid \langle \theta, X_j \rangle = \langle \theta, t_j \rangle,\ 1 \le j \le m\big].$$

Here, $K$ is the kernel function and $\varphi$ is a symmetric measurable function from the class $\mathcal{F}_m$. Further, $(H_{n,k}(t_1), \ldots, H_{n,k}(t_m))$ is a vector of positive random variables depending on $(X_1, \ldots, X_n)$, defined for $t \in \mathcal{H}$ by

$$H_{n,k}(t) = \min\Big\{ h > 0 : \sum_{i=1}^{n} \mathbb{1}_{B_\theta(t, h)}(X_i) = k \Big\},$$

where $B_\theta(t, h) = \{u \in \mathcal{H} : d_\theta(t, u) \le h\}$ is a ball in $(\mathcal{H}, d_\theta)$ and $\mathbb{1}_A$ denotes the indicator function of the set A. This kNN estimator generalizes the functional conditional U-statistic defined as

$$\widetilde{r}^{(m)}_n(\varphi, \mathbf{t}; \theta, h_n) = \frac{\sum_{(i_1, \ldots, i_m) \in I(m,n)} \varphi(Y_{i_1}, \ldots, Y_{i_m}) \prod_{j=1}^{m} K\big(d_\theta(t_j, X_{i_j}) / h_n\big)}{\sum_{(i_1, \ldots, i_m) \in I(m,n)} \prod_{j=1}^{m} K\big(d_\theta(t_j, X_{i_j}) / h_n\big)}, \qquad (3)$$
where $(h_n)_{n \ge 1}$ are positive real numbers decreasing to zero as $n \to \infty$. This estimator traces its origins to the Akaike–Parzen–Rosenblatt kernel density estimation [125,126,127], with early applications discussed in [94] and later republished in [95]. The extensive body of literature surrounding this topic includes seminal works such as [128,129,130,131], among others. The k-Nearest Neighbor (kNN) method remains a fundamental statistical tool, valued for its computational efficiency and minimal requirement for parameter tuning. Our objective is to extend the results from [49,112,132,133] to a multivariate framework, with a particular focus on VC-subgraph classes and small-ball probabilities, as introduced in [56,58,111]. For a comprehensive understanding of VC-theory, we refer readers to the foundational work in [134] and related literature. This extension is critical for advancing the theoretical understanding and practical application of nonparametric methods in multivariate functional data contexts. A class $\mathcal{C}$ of subsets of a set C is called a VC-class if there exists a polynomial $P(\cdot)$ such that, for every set of N points in C, the class selects at most $P(N)$ distinct subsets.
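Putting the ingredients together, the sketch below evaluates an order-two conditional U-statistic with kNN bandwidths, using scalar covariates as a stand-in for projected functional data. The triangular kernel, the concordance kernel, and all data are hypothetical choices, not the paper's.

```python
# Sketch of an order-two kNN conditional U-statistic: pairs (Y_i, Y_j) are
# weighted by a product of kernels whose bandwidth at each conditioning point
# is the distance to its k-th nearest covariate (illustrative throughout).
from itertools import permutations

def tri_kernel(u):
    return max(0.0, 1.0 - abs(u))

def knn_cond_u(phi, t, xs, ys, k):
    # kNN bandwidth at each conditioning point t_j
    h = [sorted(abs(tj - xi) for xi in xs)[k - 1] for tj in t]
    num = den = 0.0
    for i, j in permutations(range(len(xs)), 2):   # distinct indices
        w = tri_kernel((t[0] - xs[i]) / h[0]) * tri_kernel((t[1] - xs[j]) / h[1])
        num += w * phi(ys[i], ys[j])
        den += w
    return num / den if den > 0 else float("nan")

xs = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]          # two covariate clusters
ys = [1.0, 2.0, 3.0, 9.0, 10.0, 11.0]        # small Y near 0.2, large near 1.1
concordant = lambda a, b: 1.0 if a < b else 0.0   # estimates P(Y_1 < Y_2 | .)
print(knn_cond_u(concordant, (0.2, 1.1), xs, ys, k=2))  # -> 1.0
```

Conditioning the first slot near the low-response cluster and the second near the high-response cluster, the weighted pairs are all concordant, so the estimate of $P(Y_1 < Y_2 \mid \cdot)$ is 1.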
A class of functions $\mathcal{F}$ is called a VC-subgraph class if the graphs of the functions in $\mathcal{F}$ form a VC-class of sets. Specifically, if we define the subgraph of a real-valued function f on S as the subset $G_f$ of $S \times \mathbb{R}$ such that

$$G_f = \{(s, u) \in S \times \mathbb{R} : u < f(s)\},$$

then the class $\{G_f : f \in \mathcal{F}\}$ forms a VC-class of sets on $S \times \mathbb{R}$. Informally, a VC-class of functions is characterized by having a polynomial covering number, which is the minimal number of functions required to cover the entire class.
Let $S_{\mathcal{H}}$ be a subset of a semi-metric space $(\mathcal{H}, d)$, and let N be a positive integer. A finite set of points $\{x_1, \ldots, x_N\} \subset \mathcal{H}$ is called an ε-net of $S_{\mathcal{H}}$ for a given $\varepsilon > 0$ if

$$S_{\mathcal{H}} \subseteq \bigcup_{j=1}^{N} B(x_j, \varepsilon),$$

where $B(x_j, \varepsilon)$ represents a ball of radius ε around the point $x_j$. If $N_\varepsilon(S_{\mathcal{H}})$ is the cardinality of the smallest ε-net, i.e., the minimal number of open balls of radius ε needed to cover $S_{\mathcal{H}}$, then the Kolmogorov entropy (metric entropy) of the set $S_{\mathcal{H}}$ is defined as

$$\psi_{S_{\mathcal{H}}}(\varepsilon) := \log N_\varepsilon(S_{\mathcal{H}}).$$
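An ε-net and the resulting entropy bound can be computed greedily for a finite point cloud. The greedy covering below is an illustrative assumption: it yields a valid (though not necessarily minimal) net, so the logarithm of its size is an upper bound on the Kolmogorov entropy.

```python
# Sketch: greedy epsilon-net of a finite point set, giving an upper bound
# log(len(net)) on the Kolmogorov (metric) entropy (illustrative names).
import math

def greedy_eps_net(points, eps, dist):
    net = []
    for p in points:
        if all(dist(p, c) >= eps for c in net):
            net.append(p)      # p is not covered yet: open a new ball at p
    return net

pts = [i / 10 for i in range(11)]          # [0, 1] sampled at step 0.1
net = greedy_eps_net(pts, 0.25, lambda a, b: abs(a - b))
print(len(net), math.log(len(net)))        # net size and its entropy bound
```

For the unit interval this reproduces the familiar order $N_\varepsilon \asymp 1/\varepsilon$, i.e., entropy of order $\log(1/\varepsilon)$, the benchmark against which infinite-dimensional sets (whose entropy grows much faster) are compared.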
The concept of metric entropy, introduced by Kolmogorov (cf. [135]), has been the subject of extensive research across various metric spaces. Dudley [136] applied this concept to establish sufficient conditions for the continuity of Gaussian processes, thereby laying the foundation for significant generalizations of Donsker’s theorem concerning the weak convergence of empirical processes. Let $S_{\mathcal{H},1}$ and $S_{\mathcal{H},2}$ represent two subsets of the semi-metric space $(\mathcal{H}, d)$, and denote the Kolmogorov entropy for a given radius ε by $\psi_{S_{\mathcal{H},1}}(\varepsilon)$ and $\psi_{S_{\mathcal{H},2}}(\varepsilon)$, respectively. The Kolmogorov entropy for the product subset $S_{\mathcal{H},1} \times S_{\mathcal{H},2}$ within the semi-metric space $\mathcal{H}^2$ is expressed as

$$\psi_{S_{\mathcal{H},1} \times S_{\mathcal{H},2}}(\varepsilon) = \psi_{S_{\mathcal{H},1}}(\varepsilon) + \psi_{S_{\mathcal{H},2}}(\varepsilon).$$

Consequently, $\psi_{S_{\mathcal{H}}^m}(\varepsilon)$ denotes the Kolmogorov entropy of the subset $S_{\mathcal{H}}^m$ within the semi-metric space $\mathcal{H}^m$. Given a semi-metric d on $\mathcal{H}$, define a semi-metric $d_m$ on $\mathcal{H}^m$ as

$$d_m(\mathbf{u}, \mathbf{v}) = \max_{1 \le j \le m} d(u_j, v_j),$$

where $\mathbf{u} = (u_1, \ldots, u_m)$ and $\mathbf{v} = (v_1, \ldots, v_m)$ are elements of $\mathcal{H}^m$. Furthermore, it is possible to define another semi-metric, $d_{m,\theta}$, on $\mathcal{H}^m$, parameterized by $\theta$ in $\mathcal{H}$ and based on the individual semi-metrics $d_\theta$. The selection of the appropriate semi-metric is critical in such analyses, as it significantly impacts the theoretical properties and results. For an in-depth discussion on these considerations, refer to [56], particularly Chapters 3 and 13, which provide comprehensive insights into the role of semi-metrics in functional data analysis.

2.2. Conditions and Comments
Let us present the conditions that we need in our analysis in the case of a known direction $\theta$.
- (C.1.)
On the distributions/small-ball probabilities
- (C.1.1)
For and , we have
- (C.1.2)
For let , we have
where is a non-negative function, and as satisfying is bounded.
- (C.2.)
On the smoothness of the model
- (C.2.1)
The regression function satisfies, for and some ,
- (C.2.2)
The conditional variance, defined for is continuous in some neighborhood of ,
Further, we assume that for some and
is continuous in some neighborhood of
- (C.2.3)
For the function does not depend on and is continuous in some neighborhood of ,
- (C.3)
On the kernel function
- (C.3.1)
The kernel function is supported within , and there exist some constants such that
and
- (C.3.2)
The kernel K is a positive and differentiable function on its support, with derivative K′ such that
(4)
- (C.4)
On the classes of functions
- (C.4.1)
The class of functions is bounded and its envelope function satisfies, for some ,
- (C.4.2)
The class of functions is unbounded and its envelope function satisfies, for some ,
- (C.4.3)
The metric entropy of the class satisfies, for some ,
where
- (C.4.4)
The class of functions is assumed to be of VC type, with the envelope function defined previously. Hence, there exist two finite constants b and such that
for any and each probability measure such that .
- (C.5)
On the dependence of the random variables
- (C.5.1)
Absolute regularity
for some and
- (C.5.2)
There is a sequence of positive integers such that, as ,
- (C.6)
On the entropy
For n large enough and for some , Kolmogorov’s entropy satisfies
(5)
(6)
where and is the minimal number of open balls of radius in needed to cover .
- (C.7.)
The sequences and (respectively, and ) verify
(7)
- (C.8.)
There exist sequences , and () and constants , such that
and
(8)
(9)
(10)
(11)
Additional/alternative conditions
- (C.1’.)
For
- (C.1’.1)
We have
with , and is absolutely continuous in a neighborhood of the origin.
- (C.1’.2)
We have
where is a non-negative function, , and as satisfying is bounded.
- (C.2’.)
The kernel function is supported within , and there exist some constants such that for
Comments
In our nonparametric functional regression model, a significant theoretical challenge arises in establishing functional central limit theorems for conditional empirical processes and conditional U-processes under functional absolute regularity. Additionally, we incorporate random (or data-dependent) bandwidths based on the k-Nearest Neighbor (kNN) approach. Traditional statistical methods cannot be directly applied in this functional setting, so many of the conditions we impose are tailored to the characteristics of infinite-dimensional spaces, such as the topological structure of , the probability distribution of X, and the concept of measurability for the classes and . It is worth noting that these conditions draw inspiration from [56,58,65,111,112]. We begin with assumption (C.1.1), adapted from [111], which itself is influenced by [58]. As explained in [111], when , Condition (C.1.1) aligns with the fundamental axioms of probability calculus. Moreover, if is an infinite-dimensional Hilbert space, then can decay exponentially as . Condition (C.1.1) is a standard assumption on small ball probability, which controls the behavior of around zero. This condition allows us to express the small ball probability as the product of two independent functions, and ; for further details, see, for example, ref. [137] for diffusion processes, [138] for Gaussian measures, and [139] for general Gaussian processes. A widely recognized result in the literature expresses small ball probability in the form , where , with and . This formulation corresponds to various processes, such as Ornstein–Uhlenbeck and general diffusion processes (with and ) and fractal processes (with and ). For additional examples, refer to [140]. When dealing with functional data, it is crucial to gather information about the variability of the small ball probability to adapt it to the bias of nonparametric estimators. This information is typically obtained by assuming
- (C.1.1”)
It is important to highlight that condition (C.4.2) may be substituted with more general assumptions regarding the moments of , as detailed in [141]. Specifically:
- (M.1)″ We let be a nonnegative continuous function, increasing on , such that for some , as ,
(12)
For each , we define such that . We assume further that
The following choices of are of particular interest:
- (i). for some ;
- (ii). for some .
In the following section, we comprehensively present the main results concerning the uniform consistency of the regression estimators and the conditional U-statistics. This includes detailed discussions on the theoretical foundations and the conditions under which these estimators achieve uniform consistency. We also explore the implications of these findings for practical applications, highlighting how they enhance the reliability and accuracy of statistical inference in regression analysis and conditional U-statistics.
3. Uniform Consistency
For simplicity, Condition (C.1.1) on the small ball probability is replaced by
- (H.1)
For and
(13)
(14)
which is similar to Condition (C.1) used in [49,142], so we deal with instead of (whenever we encounter a similar situation in the proofs). This approach does not merely serve as a notational convenience but also facilitates the integration of the UIB and UINN results, enhancing the theoretical coherence between the two frameworks.
3.1. Uniform Consistency of the kNN Kernel Estimator for Regression
In this section, we investigate the uniform consistency of the functional regression operator in its general form, which is expressed for all as
(15)
where depends on n and . In fact, the kNN operator presented in (15) can be regarded as a natural generalization of the traditional kernel regression estimator, broadening its applicability and enhancing its flexibility in handling more complex, high-dimensional functional data:
(16)
where the bandwidth depends on n (but does not depend on t).
3.1.1. UIB Consistency for Functional Regression
Recall the bandwidths and given in Condition (C.7.). The following theorem plays an instrumental role in the sequel.
Under Assumptions (H.1.) , (C.2.1) , (C.3.1) , (C.4.1) , (C.4.3) , (C.6.) and (C.7.) (for ), we have, as ,
(17)
The following result gives uniform consistency when the class of functions is unbounded.
Under Assumptions (H.1.) , (C.2.1) , (C.3.1) , (C.4.2) , (C.4.3) , (C.6.) and (C.7.) (for ), we have
(18)
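To fix ideas, here is a minimal numerical sketch of a kNN-bandwidth kernel regression estimator in the spirit of (15): the bandwidth is the distance to the k-th nearest curve, plugged into a Nadaraya–Watson-type ratio. The toy one-factor curves, the discretized L2 semi-metric, the triangular kernel, and all function names are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

def knn_kernel_regression(X, Y, x0, k, dist, K=lambda u: np.maximum(1.0 - u, 0.0)):
    """kNN analogue of the kernel regression estimator: the bandwidth is the
    distance from x0 to its k-th nearest curve, so it is random (data-driven)
    and adapts to the local design density around x0."""
    d = np.array([dist(x, x0) for x in X])
    h = np.sort(d)[k - 1] + 1e-12            # k-th nearest-neighbour radius
    w = K(d / h)                             # kernel weights
    return np.sum(w * Y) / np.sum(w)         # Nadaraya-Watson-type ratio

rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 50)
a = rng.uniform(0.0, 1.0, 200)
X = [ai * np.sin(t) for ai in a]             # toy curves X_i(t) = a_i sin(t)
Y = a.copy()                                 # noiseless response m(X_i) = a_i
L2 = lambda u, v: np.sqrt(np.mean((u - v) ** 2))   # discretized L2 semi-metric

m_hat = knn_kernel_regression(X, Y, 0.5 * np.sin(t), k=15, dist=L2)
```

Here the estimate at the curve 0.5 sin(t) recovers a value close to 0.5, since the k nearest curves in the L2 semi-metric are those with factors near 0.5.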
3.1.2. UINN Consistency for Functional Regression
Now, we can state the main results of this section concerning the kNN functional regression. Recall the bandwidths and given in Condition (C.8.).
Under Assumptions (H.1.) , (C.2.1) , (C.3.1) , (C.4.1) , (C.4.3) , (C.6.) and (C.7.) (for ), if, in addition, Condition (C.8.) is satisfied, we have
The following result gives uniform consistency when the class of functions is unbounded.
Under Assumptions (H.1.) , (C.2.1) , (C.3.1) , (C.4.2) , (C.4.3) , (C.6.) and (C.7.) (for ) and if Condition (C.8.) is satisfied, we have
(19)
3.2. Relative-Error Prediction
Recall that the operator is typically estimated by minimizing the expected squared loss function . While this loss function is commonly used to assess prediction performance, it may not be suitable in all contexts. Least-squares regression assigns equal weight to all observations, potentially making the results highly sensitive to outliers. In this paper, we address the limitations of classical regression by proposing the estimation of the operator m through the minimization of the mean squared relative error (MSRE), which offers a more suitable alternative:
(20)
This criterion provides a more meaningful measure of prediction performance than the least-squares error, particularly in cases where the range of predicted values is large. Moreover, the solution to (20) can be explicitly formulated as the ratio of the first two conditional inverse moments of Y given X. To develop a regression estimator that optimally minimizes the MSRE, we assume that the first two conditional inverse moments of Y given X, defined as for , exist and are finite almost surely. As demonstrated in [143,144,145], the optimal mean squared relative error predictor of Y given X is . Therefore, we estimate the regression operator , which minimizes the MSRE, using
(21)
and
(22)
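As a sanity check on the ratio-of-inverse-moments form of the MSRE-optimal predictor, the following sketch estimates the two conditional inverse moments with uniform kNN weights on simulated data with multiplicative positive noise, where relative error is the natural loss. The model, the uniform weights, and the function name are illustrative assumptions.

```python
import numpy as np

def relative_error_predictor(X, Y, x0, k):
    """Ratio-of-inverse-moments predictor minimizing the mean squared relative
    error E[((Y - m(x))/Y)^2 | X = x]: the minimizer is
    E[Y^{-1} | X = x] / E[Y^{-2} | X = x], estimated with uniform kNN weights."""
    idx = np.argsort(np.abs(X - x0))[:k]     # k nearest design points
    inv1 = np.mean(1.0 / Y[idx])             # first conditional inverse moment
    inv2 = np.mean(1.0 / Y[idx] ** 2)        # second conditional inverse moment
    return inv1 / inv2

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 2.0, 2000)
Y = X * rng.uniform(0.9, 1.1, 2000)          # multiplicative positive noise
pred = relative_error_predictor(X, Y, 1.5, k=100)
```

With this design, the predictor at x0 = 1.5 lands close to 1.5, slightly shrunk by the ratio E[U^{-1}]/E[U^{-2}] of the noise inverse moments.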
By considering the special cases and in Corollaries 1 and 2, we obtain results that complement the work in [145,146,147].
Under Assumptions (H.1) , (C.2.1) , (C.3.1) , (C.4.2) , (C.4.3) , (C.6) , and (C.7) (for ), we have
(23)
This result has not been previously addressed in the literature.
Under Assumptions (H.1.) , (C.2.1) , (C.3.1) , (C.4.2) , (C.4.3) , (C.6.) and (C.7.) (for ), and if Condition (C.8.) is satisfied, we have
(24)
3.3. Uniform Consistency of the kNN Functional Conditional U-Statistics
In addition to the conditions previously established, the following assumptions are crucial for obtaining exponential inequalities for dependent data which are used later in the proofs:
- (A1)
Assume that the sequence is strictly stationary, and there exists an absolute constant such that, for any , the -mixing coefficient corresponding to satisfies .
- (A2)
Assume that, uniformly, for any integer J such that and arbitrary indices , the sequence , conditional on , satisfies, for the corresponding -mixing coefficient,
where represents the conditional probability. In particular, for the -mixing coefficient of the sequence , one obtains
The functional conditional U-statistic , given by
is a classical U-statistic with the U-kernel . The investigation of the uniform consistency of relative to presents notable challenges due to the randomness inherent in the bandwidth vector , which introduces significant technical complexities. To address these difficulties, we begin by analyzing the uniform consistency of , where represents a fixed multivariate bandwidth independent of and k. Our analysis extends to both the uniform consistency and UIB consistency of relative to , in cases where as well as when . Furthermore, we propose a more suitable centering factor than the expectation , defined as follows: The second step involves utilizing a general lemma from [51], suitably adapted to our specific framework, following an approach akin to that in [152] (see Appendix A.1). This allows us to rigorously derive the necessary results for the bandwidth .
Uniform Consistency and UIB Consistency for a Multivariate Bandwidth
We now present the Uniform In Bandwidth (UIB) results for all , , and . To begin, we establish the result concerning the uniform deviation of the estimator from under the condition that the class of functions is bounded.
We suppose that Conditions (H.1) , (C.3.1) , (C.4.1) , (C.4.4) , (C.6) , and (C.7) hold. Then, as , we have
(25)
The following theorem establishes the uniform deviation of the estimator from its expectation , specifically for an unbounded class of functions that satisfy general moment conditions.
Suppose that Conditions (H.1) , (C.3.1) , (C.4.2) , (C.4.4) , (C.6) , and (C.7) hold. For all , as , we have
(26)
The following theorem provides the uniform deviation result for the estimate from for both bounded and unbounded classes of functions satisfying general moment conditions.
We suppose that Conditions (H.1) , (C.3.1) , (C.4.1) , (C.4.4) , (C.6) , and (C.7) (or alternatively (H.1) , (C.3.1) , (C.4.2) , (C.4.4) , (C.6) , and (C.7) ) hold. For all , as , we have
(27)
We suppose that Conditions (H.1) , (C.2.1) , (C.3.1) , and (C.6) hold. For all , with as , we have
(28)
Under the assumptions of Theorems 5 and 6, it follows that, as ,
(29)
Following the methodology outlined in [110,112,153], we decompose the U-statistics into distinct components. Certain components can be approximated by U-statistics derived from independent blocks, while others, when conditioned on a particular block, behave as empirical processes of independent blocks. To demonstrate the insignificance of the nonlinear terms, we utilize symmetrization techniques and apply maximal inequalities, as thoroughly discussed in [10,154].
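For intuition, the conditional U-statistic under study can be sketched for order m = 2 as a kernel-weighted average of a U-kernel over pairs, localized at a pair of covariate values with a fixed bandwidth. The Gini-type kernel |y1 - y2|, the Epanechnikov-style weight, the scalar covariate, and the toy model are illustrative choices, not the paper's specification.

```python
import numpy as np
from itertools import permutations

def cond_u_stat(X, Y, phi, t, h, K=lambda u: np.maximum(1.0 - u**2, 0.0)):
    """Nadaraya-Watson-type conditional U-statistic of order m = 2:
    a kernel-weighted average of phi(Y_i, Y_j) over ordered pairs i != j,
    localized at t = (t1, t2) with a fixed bandwidth h."""
    num = den = 0.0
    for i, j in permutations(range(len(X)), 2):
        w = K(abs(X[i] - t[0]) / h) * K(abs(X[j] - t[1]) / h)
        num += w * phi(Y[i], Y[j])
        den += w
    return num / den

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, 300)
Y = X + rng.normal(0.0, 0.05, 300)           # toy model: Y = X + small noise
# Gini-type U-kernel phi(y1, y2) = |y1 - y2|, evaluated at t = (0.2, 0.8):
u_hat = cond_u_stat(X, Y, lambda a, b: abs(a - b), t=(0.2, 0.8), h=0.1)
```

At (t1, t2) = (0.2, 0.8) the target is close to |0.2 - 0.8| = 0.6, which the estimate reproduces up to smoothing bias and noise.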
3.4. Uniform Consistency and UINN Consistency of Functional Conditional U-Statistics
Let be constants and let be a sequence chosen such that
and 
The following theorem addresses the uniform deviation of the estimator from its expected value , under the assumption that the class of functions is bounded.
Suppose that Conditions (H.1.) , (C.3.1) , (C.4.1) , (C.4.4) , (C.6.) , and (C.7.) are satisfied. Additionally, if Assumption (C.8.) holds, then, as ,
(30)
The following theorem addresses the uniform deviation of the estimator from its expected value when the class of functions is unbounded but adheres to general moment conditions.
Suppose that Conditions (H.1.) , (C.3.1) , (C.4.2) , (C.4.4) , (C.6.) , and (C.7.) hold. Additionally, if Assumption (C.8.) is satisfied, then, as ,
(31)
The subsequent results are centered on establishing the uniform consistency of the estimator for both bounded and unbounded classes of functions, ensuring that the consistency holds across varying function spaces.
Suppose that Conditions (H.1.) , (C.3.1) , (C.4.1) , (C.4.4) , (C.6.) , and (C.7.) (or alternatively, (H.1.) , (C.3.1) , (C.4.2) , (C.4.4) , (C.6.) , and (C.7.) ) hold. If, in addition, Assumption (C.8.) is satisfied, then, as ,
(32)
Suppose that Conditions (H.1.) , (C.2.1) , (C.3.1) , and (C.6.) hold. If, in addition, Assumption (C.8.) is satisfied, then, as ,
(33)
Under the assumptions of Theorems 9 and 10, it follows that, as ,
(34)
The selection of parameters and , as defined analogously in Condition (C.8.) , is crucial in determining the convergence rate of the kNN estimator. These parameters are chosen based on the small ball probability function , offering flexibility in adjusting the estimator to enhance performance, depending on the specific characteristics of the data.
This study builds upon the foundational work in [49,51,155], introducing several important advancements. One of the primary distinctions of our approach is the minimal restrictions placed on the choice of the kernel function, with only mild conditions required, as detailed later in the paper. However, determining the optimal bandwidth or the number of neighbors presents a more complex challenge. The selection of bandwidth is pivotal, as it directly impacts the estimator’s consistency rate and bias. Our goal is to identify bandwidth and neighbor parameters that strike an optimal balance between bias and variance. Unlike traditional approaches, we propose a more flexible framework in which bandwidth and the number of neighbors are determined based on specific criteria, data characteristics, and local considerations. For a deeper exploration of this topic, readers are referred to works such as [156] and related literature. In this section, we address these challenges within the context of Functional Data Analysis (FDA), particularly under the assumption of dependent data. Specifically, we extend the analysis to single-index models and, for the first time, explore the application of conditional U-statistics in this setting. Our methodology involves decomposing U-statistics into several components: some of these components are approximated by U-statistics derived from independent blocks, while others, when conditioned on a single block, behave as empirical processes of independent blocks. While this decomposition is a powerful tool, it complicates the proof process—a necessary trade-off for extending the methodology to dependent structures. By utilizing this decomposition, we are able to adapt techniques traditionally applied to independent variables, drawing significantly from the results in [51,155]. 
Furthermore, this paper introduces a novel exponential inequality specifically designed for the dependent data setting, as presented in [150], representing a notable contribution to the existing literature.
In the following section, we provide a comprehensive presentation of the main results concerning uniform central limit theorems for both regression processes and conditional U-processes. This includes detailed statements of the theorems as well as thorough and meticulous explanations of the methodologies employed in proving these results.
4. Uniform Central Limit Theorems
4.1. kNN Conditional Empirical Process
We define the functional conditional empirical process for univariate bandwidth by
(35)
where designates (3) when , and refers to Regression Function (1), with . If for , where is the probability measure, and for each , then is a random element with values in , consisting of all functionals on such that . Then, it is important to investigate the following weak convergence: It is well established that weak convergence to a Gaussian limit, with uniformly bounded and uniformly continuous paths (with respect to the norm), is equivalent to the combination of finite-dimensional convergence and the existence of a pseudo-metric on . This pseudo-metric ensures that forms a totally bounded pseudo-metric space:
(36)
Below, we denote to indicate that the random vector Z follows a normal distribution with mean vector and covariance matrix , and use to denote convergence in distribution. The following theorem, adapted from [111] for the context of kNN estimators, aims to investigate central limit theorems for the functional conditional empirical process defined as
(37)
Let us consider the class of functions and suppose that Conditions (C.1’.) , (C.1.2) , (C.2.1) , (C.2.2) , (C.3’.) , (C.5.) , and (C.8.) hold. If the smoothing parameter k satisfies, for all ,
we obtain, for
where is the covariance matrix with
We suppose that Conditions (C.3.1) , (C.4.2)–(C.5.1) , and (C.8.) hold, and for each
Then, we have
The two previous theorems can be summarized as follows:
Under Conditions (C.1’.) , (C.1.2) , (C.2.) , (C.3.1) , (C.3’.) , (C.4.4) , (C.5.1) , (C.5.2) , and (C.8.) , the process, as ,
converges in law to a Gaussian process that admits a version with uniformly bounded and uniformly continuous paths with respect to the norm.
We note that additional applications can be derived from Theorem 13, including the conditional distribution, conditional density, and conditional hazard function. However, these and other relevant applications are not addressed here due to space constraints.
Paper [157] introduces a comprehensive semiparametric functional regression model designed for cases where the predictor sets include a combination of functional and multivariate elements within the i.i.d. framework. This model integrates single-index techniques to handle functional predictors and partial–linear methods to address multivariate predictors. Similarly, ref. [90] conducts an in-depth exploration of the functional semiparametric model (FSIM), providing general asymptotic results for the kNN procedure with a focus on uniformity across all model parameters. In contrast, our study distinguishes itself from the aforementioned works in several key ways. First, while prior studies operate within an independent data setting, our research addresses a mixing data framework. Additionally, we extend the analysis by incorporating uniformity considerations for both classes of functions and the parameter “t”, as presented in Corollaries 4 and 1. Allowing the function φ to vary within a class of functions enables results for the conditional distribution to emerge as a special case, among others. Another significant difference lies in our inclusion of weak convergence, as detailed in Theorem 13, which hinges on the intricate tightness result outlined in Theorem 12, accompanied by rigorous and lengthy proofs. Furthermore, while earlier studies primarily focus on the case where , extending the results to the nonlinear domain where requires a range of techniques from empirical process theory and U-processes; moreover, the conditional setting poses a more complex and fascinating challenge than unconditional processes. This paper presents all results for arbitrary values of m. Finally, we explore the domain of censored data, a subject of independent interest that merits further investigation.
4.2. kNN Conditional U-Processes
In this section, we focus on investigating the weak convergence of conditional U-processes under the assumption of absolute regular observations. The class of functions under consideration is , as defined in Section 2. The conditional U-process, indexed by , is given by
(38)
The U-empirical process is defined by It is important to note that, in order to establish the weak convergence of (38), it is first necessary to address the convergence of (40) below. In fact, we elaborate on certain key details that are utilized later in the analysis. Since Condition (C.6.) holds, for each , we have
(39)
We can write the U-statistic as follows:
(40)
We refer to the first term on the right-hand side of (40) as the truncated part, denoted by , and the second term as the remainder part, denoted by . Our initial focus is on . By applying Hoeffding’s decomposition, we obtain
(41)
where is a sequence of i.i.d. r.v.s with for each i, and and are defined, respectively, as and . In view of (41), we have
(42)
The stationarity assumption and some algebra show that . Therefore,
(43)
Since is -canonical, we have to show that
So, to establish the weak convergence of the U-process , it is enough to show that where is a Gaussian process indexed by , and for , We then have to prove that the remainder part is negligible, in the sense that Nevertheless, when dealing with finite-dimensional convergence, the truncation becomes inconsequential. This implies that establishing the finite-dimensional convergence of is equivalent to establishing that of .
(a) Under Conditions (C.1’.) , (C.1.2) , (C.2.) , (C.3’.) , (C.5.1) , (C.5.2) , and if is continuous at , then, as ,
(44)
where(45)
(b) If, in addition, the smoothing parameter k satisfies Condition (C.8.) , then we have, as ,
(46)
where is defined as in (45), with 
Under Conditions (C.1’.) , (C.1.2) , (C.2.) , (C.3’.) , (C.5.) , and if as , then we infer that
(47)
Under Conditions (C.1’.) , (C.1.2) , (C.2.) , (C.3’.) , (C.5.1) , (C.5.2) , (C.8.) and as , we let be a measurable VC-subgraph class of functions from such that Condition (C.4.2) is satisfied and, if the β-coefficients of the mixing stationary sequence fulfill
(48)
for some , then converges in law to a Gaussian process which has a version with uniformly bounded and uniformly continuous paths with respect to the norm.
The distinctive characteristics of kNN-based estimators introduce inherent complexities in analyzing their asymptotic properties. Specifically, the random variable , which depends on , adds technical intricacies to the proofs. A crucial observation is that the random elements involved in (2) for cannot be simply decomposed into sums of independent variables, as is typically the case with kernel-based estimators. This challenge requires more advanced probabilistic techniques, extending beyond the scope of standard limit theorems for sums of i.i.d. variables. Moreover, the direct application of Hoeffding’s decomposition—a fundamental tool for studying U-statistics—to (2) is not feasible in this context. The first step in proving Theorem 15 involves extending the work in [110,112] to address the multivariate bandwidth problem. It is also important to note that this paper extends the framework of [155] to single-index models. Additionally, we explore new applications, including the set-indexed conditional U-statistic, Kendall’s rank correlation coefficient, and time series prediction based on a continuous set of past values. Another layer of complexity arises from the fact that certain maximal inequalities and symmetrization techniques, as presented in [10,154], cannot be directly applied within this framework. This limitation necessitates a lengthier proof, particularly in establishing the equicontinuity of the empirical processes, a critical component in our analysis.
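As a point of reference, the classical Hoeffding decomposition invoked throughout this section can be verified numerically for a simple order-2 kernel: the U-statistic splits exactly into its mean, a linear (projection) part, and a degenerate (canonical) part. The kernel h(x, y) = xy and the uniform design are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)

# Hoeffding decomposition for the order-2 kernel h(x, y) = x * y with
# X_i uniform on [0, 1], so mu = E[X] = 1/2 and theta = E[h] = mu^2.
mu = 0.5
theta = mu**2
h = lambda x, y: x * y
h1 = lambda x: mu * x - theta                       # projection: E[h(x, X)] - theta
h2 = lambda x, y: h(x, y) - h1(x) - h1(y) - theta   # degenerate (canonical) part

x = rng.uniform(0.0, 1.0, 50)
pairs = list(combinations(range(len(x)), 2))
U = np.mean([h(x[i], x[j]) for i, j in pairs])                # the U-statistic
linear = theta + 2.0 * np.mean([h1(xi) for xi in x])          # theta + (2/n) sum h1
degenerate = np.mean([h2(x[i], x[j]) for i, j in pairs])      # degenerate U-statistic
resid = U - (linear + degenerate)   # exact algebraic identity, up to rounding
```

The identity U_n = theta + (2/n) Σ h1(X_i) + U_n(h2) holds exactly, for any sample; only the degenerate term requires the blocking and symmetrization arguments discussed above.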
It is straightforward to adapt the proofs of our results to demonstrate that they remain valid when the entropy condition is replaced by the bracketing condition. Specifically, for some and ,
For the definition of , refer to p. 270 of [3].
We can also consider the scenario where , with satisfying the condition , where . For each , we assume , where converges to zero, as discussed in [90].
The set of functional directions, , is constructed using an approach similar to that of [81,90], as outlined below:
(i) Each direction is derived from a -dimensional space spanned by -spline basis functions, denoted by . Thus, we represent directions as
(49)
(ii) The set of coefficient vectors in (49), denoted by , is generated through the following steps:
- Step 1
For each , where represents a set of J ‘seed-coefficients’, we construct the initial functional direction as
- Step 2
For each from Step 1 that satisfies , where is a fixed value in the domain of , we compute and normalize to obtain .
- Step 3
We define as the collection of vectors obtained in Step 2. Consequently, the final set of admissible functional directions is given by
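The three-step construction above can be sketched as follows. For brevity, a cosine basis stands in for the B-spline basis of Step (i), the sign constraint is imposed at the left endpoint of the grid, and the seed set and all names are illustrative assumptions.

```python
import numpy as np
from itertools import product

def build_directions(basis, seeds, t0_idx):
    """Steps 1-3 above, with a generic basis matrix (rows = basis functions
    evaluated on a grid): every J-tuple of seed coefficients spans a candidate
    direction theta = sum_j c_j e_j; keep those positive at t0 and normalize."""
    directions = []
    for c in product(seeds, repeat=basis.shape[0]):
        theta = np.asarray(c) @ basis                  # Step 1: initial direction
        if theta[t0_idx] > 0:                          # Step 2: sign constraint at t0
            directions.append(theta / np.sqrt(np.mean(theta**2)))  # normalize
    return directions                                  # Step 3: admissible set

t = np.linspace(0.0, 1.0, 101)
# Cosine basis standing in for the B-spline basis (illustration only).
basis = np.vstack([np.ones_like(t), np.cos(np.pi * t), np.cos(2 * np.pi * t)])
dirs = build_directions(basis, seeds=(-1.0, 0.0, 1.0), t0_idx=0)
```

With three seeds and three basis functions, 27 candidate directions are generated, of which those positive at t0 are kept and rescaled to unit (discretized L2) norm.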
5. Some Potential Applications
Although only six examples are presented here, they serve as archetypes for a wide range of problems that can be explored using similar methods.
5.1. Set Indexed Conditional U-Statistics
Our objective is to analyze the relationships between X and Y by estimating functional operators associated with the conditional distribution of Y given X. This includes estimating operators such as the regression operator for within a class of sets . Specifically, for , the conditional distribution is defined as
We define the metric entropy of the class of sets by considering the covering number for each given by The logarithm of this covering number is referred to as the metric entropy of the class with respect to . Estimates for such covering numbers are available for various classes of sets (see, e.g., [158]). Often, we assume that the behavior of the covering number is governed by powers of . Specifically, Condition holds if where for some constants A and . As shown in [159], Condition with is satisfied for intervals, rectangles, balls, ellipsoids, and classes formed through finite applications of set operations like unions, intersections, and complements. In the case of convex sets in for , the condition holds with . Further examples of sets satisfying for are found in [158]. An important application of this theory is the estimation of using the following empirical estimator: Using Corollary 6, we conclude that
An alternative perspective involves considering a compact set , where
The estimator for is given by
As , Corollary 6 implies that
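A minimal sketch of the set-indexed estimator: the conditional probability of a set C is estimated by a local average of the indicators 1{Y_i in C}. The choice C = (-inf, 0.5], the uniform kNN weights, and the toy model are illustrative assumptions.

```python
import numpy as np

def cond_set_probability(X, Y, in_C, x0, k):
    """Set-indexed conditional estimator: P(Y in C | X = x0) estimated by the
    kNN average of the indicators 1{Y_i in C}."""
    idx = np.argsort(np.abs(X - x0))[:k]     # k nearest design points to x0
    return np.mean(in_C(Y[idx]))

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, 2000)
Y = X + rng.normal(0.0, 0.1, 2000)
# C = (-inf, 0.5]; at x0 = 0.5 the target probability is 1/2 by symmetry.
p = cond_set_probability(X, Y, lambda y: y <= 0.5, 0.5, k=200)
```

Letting C range over a VC class of sets (intervals, balls, rectangles) gives the set-indexed conditional U-process in the simplest, order-one case.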
5.2. Kendall Rank Correlation Coefficient
To assess the independence between the one-dimensional random variables and , Kendall [160] proposed a method based on a U-statistic, , with the kernel function:
The rejection region for the independence test is defined as . In this example, we extend Kendall’s method to the multivariate setting. Specifically, to test the conditional independence of and (where ) given X, we introduce a method based on the conditional U-statistic: where and is Kendall’s kernel. We assume that and are - and -dimensional random vectors, respectively, with . Suppose we have observations of , and we wish to test the following hypothesis: Let be a unit vector, where , , and . Let and be the distribution functions of and , respectively. Assume that and are continuous for any unit vector , where and . Here, denotes the transpose of the vector . For the case where , let and , where and for . Define By applying Corollary 6, we conclude that, as ,
5.3. Discrimination Problems
We now apply the results to the discrimination problem outlined in Section 3 of [161], with reference to [162]. We adopt similar notation and framework. We let be a function that takes a finite number of values, say . The sets
partition the feature space. Predicting the value of is equivalent to predicting the partition set to which belongs. For any discrimination rule g, we have the following inequality: where This inequality becomes an equality if where is the Bayes rule. The associated probability of error, known as the Bayes risk, is given by Each of the unknown functions can be consistently estimated using the methods discussed earlier. For , the estimator is defined as We define the estimated rule as and introduce the estimated probability of error: The discrimination rule is asymptotically Bayes risk consistent, meaning that, as , This result follows from Corollary 6 and the fact that
5.4. Generalized U-Statistics
The extension to the case of multiple samples is straightforward. Consider k independent collections of independent observations:
Let, for , be defined as where is assumed, without loss of generality, to be symmetric within each of the k blocks of its arguments. The conditional U-statistic for estimating , corresponding to the kernel and assuming , is given by where and is a vector of positive random variables depending on the sets Here, denotes a set of distinct elements from the set for , and represents summation over all such combinations. The extension of Hoeffding’s treatment of one-sample U-statistics to the multi-sample case is due to [163,164]. Using Corollary 6, it is possible to infer that
5.5. Conditional U-Statistics for Censored Data
We consider a triple of random variables defined on , where Y is the variable of interest, C is a censoring variable, and X is a concomitant variable. We work with a sample consisting of independent and identically distributed replications of , with . In the right-censorship model, the pairs , are not directly observed. Instead, the available information is provided by and for . Therefore, the observed sample is
Such censoring occurs frequently in practical settings, for example, in survival data from clinical trials or failure time data in reliability studies. Often, even under well-controlled conditions, samples are incomplete. For instance, clinical data tracking survival from many diseases are typically censored due to competing risks, like death from other causes. In the following, we impose several assumptions on the distribution of . For , we let denote the right-continuous distribution functions of Y, C, and Z, respectively. For any right-continuous distribution function defined on , we let be the upper point of the corresponding distribution. Now, we consider a pointwise measurable class of real-valued measurable functions defined on and assume that is of VC type. We recall the regression function of evaluated at , for and , given by in the case where Y is right-censored. To estimate , we use the Inverse Probability of Censoring Weighted (I.P.C.W.) estimators, which have gained popularity in the censored data literature (see [165,166]). The key concept behind I.P.C.W. estimators involves introducing the real-valued function defined on as(50)
This function incorporates both the observed data and the censoring mechanism to provide consistent estimates in the presence of censoring. Assuming that the function is known, we observe that is directly observable for every . Furthermore, under Assumption (), outlined below,
- ()
C and are independent.
We have
(51)
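Identity (51) underlying the I.P.C.W. construction can be checked numerically: re-weighting each uncensored observation by the inverse of the censoring survival function recovers the mean of φ(Y). In the sketch below, the censoring survival is estimated by a Kaplan–Meier-type product-limit estimator (applied to the censoring times, i.e., the observations with indicator zero); the function names and the tiny toy data set are illustrative assumptions.

```python
import numpy as np

def km_censoring_survival(Z, delta):
    """Product-limit (Kaplan-Meier-type) estimate of t -> P(C > t), the
    censoring survival function: censoring times are the Z_i with delta_i = 0."""
    times = np.sort(np.unique(Z[delta == 0]))
    def G(t):
        s = 1.0
        for u in times:
            if u <= t:
                at_risk = np.sum(Z >= u)              # still under observation at u-
                d = np.sum((Z == u) & (delta == 0))   # censoring events at u
                s *= 1.0 - d / at_risk
        return s
    return G

def ipcw_mean(Z, delta, phi=lambda z: z):
    """I.P.C.W. estimate of E[phi(Y)]: each uncensored observation is
    re-weighted by 1 / G_bar(Z_i^-), in the spirit of (50)-(51)."""
    G = km_censoring_survival(Z, delta)
    eps = 1e-9                                        # evaluate G just before Z_i
    w = delta / np.array([max(G(z - eps), 1e-12) for z in Z])
    return np.mean(w * phi(Z))

Z = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 1])        # the second observation is censored
est = ipcw_mean(Z, delta)             # hand-checkable: (1 + 0 + 4.5 + 6)/4
```

On this four-point example the weights can be verified by hand: the single censoring event at time 2 gives G_bar = 2/3 thereafter, so the last two observations receive weight 3/2.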
Therefore, any estimate of , which is constructed based on fully observed data, also serves as an estimate for . This key property allows most statistical procedures used to estimate the regression function in the uncensored case to be naturally extended to the censored case. For example, kernel-type estimates can be readily constructed. For , , and , we define
(52)
Based on Equations (50)–(52), when is known, a kernel estimator for is given by
(53)
Function is generally unknown and has to be estimated. We denote by the Kaplan–Meier estimator of function [167]. Namely, adopting the conventions and , and setting , we have Given this notation, we investigate the following estimator of
(54)
referring to [165,168]. Adopting the convention that , the given quantity is well-defined, since only if and , where represents the kth order statistic from the sample for , and corresponds to such that . A right-censored version of an unconditional U-statistic, with a kernel of degree , can be introduced via the principle of a mean-preserving reweighting scheme, as outlined in [169]. The almost sure convergence of multi-sample U-statistics under random censorship was established by the authors of [170], who also explored the consistency of a new class of tests designed to evaluate equality in distribution. To mitigate potential biases from right-censoring and confounding covariates, ref. [171] proposed modifications to classical U-statistics. Moreover, ref. [172] introduced an alternative estimation method for the U-statistic by employing a substitution estimator for the conditional kernel, derived from observed data. For additional references, see [42]. To the best of our knowledge, the problem of estimating conditional U-statistics in the presence of censoring—particularly when using variable bandwidths—remains an unresolved issue. This provides the primary motivation for the present study. A natural extension of the function defined in (50) is given by
(55)
From this, we have an analogous relation to (51) given by(56)
An analog estimator to (3) in the censored case is given by(57)
where, for ,(58)
The estimator that we investigate is given by(59)
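To make the construction concrete, here is a minimal numerical sketch of the Kaplan–Meier (product-limit) estimator of a survival function, which underlies the censoring-adjusted weights in the estimators above. The function name and interface are illustrative; this is not the exact Estimator (59), only the standard product-limit scheme for right-censored data (observed durations together with event indicators).

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of S(t) = P(T > t).

    times  : observed (possibly censored) durations
    events : 1 if the event was observed, 0 if right-censored
    Returns the distinct event times and S evaluated just after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    s, event_times, surv = 1.0, [], []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                # still under observation
        d = np.sum((times == t) & (events == 1))    # events occurring at t
        s *= 1.0 - d / at_risk                      # product-limit update
        event_times.append(t)
        surv.append(s)
    return np.array(event_times), np.array(surv)
```

With no censoring, the estimator reduces to the empirical survival function; censored observations contribute to the risk sets but trigger no downward jump.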
The main result of this section is given in the following corollary.
Under the assumptions of Theorems 9 and 10, it follows that, as ,
(60)
This last result is a direct consequence of Corollary 6, and the law of iterated logarithm for established in [173] ensures that
For more details, refer to [131].
5.6. Conditional U-Statistics for Left Truncated and Right Censored Data
Building on the notation from the previous section, we now introduce a truncation variable, denoted by L, and assume that is independent of Y. We consider a scenario where we have random vectors , with . In this section, our goal is to define conditional U-statistics for data that are left truncated and right censored (LTRC), drawing on ideas from [38], which dealt with the unconditional setting. To this end, we propose the following extension of Function for LTRC data:
According to , we determine that An analog estimator to (3) for LTRC data can be expressed as follows:(61)
where is defined as in (58). Since is unknown, it needs to be estimated. To achieve this, we introduce and , representing the counting processes for the variable of interest and the censoring variable, respectively. Additionally, we define and . We introduce the risk indicators as and , where represents the risk set at time t, consisting of subjects who entered the study before time t and are still under observation at that time. Importantly, is a local sub-martingale with respect to the appropriate filtration . The martingale associated with the censoring counting process under filtration is given by where represents the hazard function associated with the censoring variable C under left truncation. The cumulative hazard function for the censoring variable is defined as . We let . Next, we define the sub-distribution function of corresponding to and as follows: We let where . Also, we denote Then, an estimate for the survival function of the censoring variable C under left truncation, denoted as —see [174]—can be formulated as follows:(62)
Similar to the Nelson–Aalen estimator— for instance, see [175]—the estimator for the cumulative hazard function of the censoring variable C under left truncation is represented as(63)
In both the definitions presented in (62) and (63), we make the assumption that is non-zero with probability one. The interrelation between and can be expressed as We let and denote the left and right endpoints of the support. For LTRC data, as in [176], is identifiable if and . By Corollary 2.2. [176], for , we readily infer that(64)
From the above, Estimator (61) can be rewritten directly as follows:(65)
The last estimator is the conditional version of that studied in [38]. Following the same reasoning as in Corollary 8, one can infer that, as ,
(66)
Remark 11.
-
In the study by the authors of [177], the practical utility of U-statistics is exemplified through an analysis of data from a case–control study among African Americans. The researchers investigated whether the similarity of haplotypes between a case and a control participant differs from the similarity between two control participants. By employing U-statistics, they effectively measured and compared haplotype similarities, highlighting the method’s applicability in genetic association studies and its capacity to handle complex genetic data.
-
Expanding on the traditional use of U-statistics, the authors of [178] introduced a new family that includes the well-known Wilcoxon signed-rank statistic and extends to other statistics with substantially higher power in sensitivity analyses of observational studies. These enhanced statistics were applied in three diverse examples drawn from epidemiology, clinical medicine, and genetic toxicology. The results demonstrated not only improved performance but also the flexibility of these new U-statistics in addressing various research questions across different fields.
-
In another significant contribution, the authors of [179] combined the properties of U-statistics with large-dimensional random matrix theory to identify group structures among numerous variables. By analyzing the eigenvalues and eigenvectors of the scaled sample matrix, they uncovered underlying patterns and groupings within high-dimensional data sets. This approach showcases the power of U-statistics in dimensionality reduction and the exploration of complex data structures, which are common challenges in modern data analysis.
-
Furthermore, the authors of [180] proposed the heterogeneity-weighted U (HWU)-method for association analyses that consider genetic heterogeneity. This innovative method leverages U-statistics to provide a computationally efficient solution suitable for high-dimensional genetic data and various types of phenotypes. The HWU method addresses the significant challenge of genetic heterogeneity in association studies, offering researchers a powerful tool to detect associations that might be overlooked by traditional methods.
5.7. Examples of Classes of Functions
The set of all indicator functions representing intervals in satisfies the following inequality:
for any probability measure and for . We observe that
For further details, refer to Example 2.5.4 in [181] and [182] (p. 157). The covering numbers for the class of intervals in higher dimensions also satisfy a similar bound, though with a higher power of ; see Theorem 9.19 in [182].
(Classes of functions that are Lipschitz in a parameter, Section 2.7.4 in [181]). We let be the class of functions that are Lipschitz continuous with respect to the parameter . We suppose
for some metric d on the index set T and a function defined on the sample space , for all x. According to Theorem 2.7.11 in [181] and Lemma 9.18 in [182], it follows that for any norm on Thus, if satisfies the result holds for .
We consider the class of functions that are smooth up to order α, as discussed in Section 2.7.1 of [181] and Section 2 of [183]. For , we let denote the greatest integer smaller than α. For any vector of d integers, we define the differential operator:
where . For a function , we define the norm
where the suprema are taken over all in the interior of . We let denote the set of continuous functions such that . For , this class consists of bounded functions satisfying a Lipschitz condition. The entropy of this class for the uniform norm was computed in [135], and [183] showed that there exists a constant K, depending on α, d, and the diameter of , such that for every measure γ and every ,
where denotes the bracketing number (see Definition 2.1.6 in [181]). A variant of this inequality is given in Theorem 2.7.1 of [181]. Lemma 9.18 of [182] also implies
5.8. Examples of U-Kernels
In this section, we present some classical U-kernels.
Ref. [1] introduced the parameter
where and is the joint distribution function of and . Parameter Δ has the property that if and only if and are independent. As shown in [11], an alternative expression for Δ can be developed by introducing the following functions:
and
This leads to the expression
(Hoeffding’s D). The symmetric kernel
recovers Hoeffding’s D statistic, a rank-based U-statistic of order 5, which leads to Hoeffding’s D correlation measure.
(Blum–Kiefer–Rosenblatt’s R). The symmetric kernel
yields Blum–Kiefer–Rosenblatt’s R statistic [184,185,186,187].
(Bergsma–Dassios–Yanagimoto’s ). In [188], a rank correlation statistic was introduced as a U-statistic of order 4 with the symmetric kernel:
Here, denotes the indicator that both and .
(The Wilcoxon Statistic). We suppose is symmetric around zero. As an estimator of the quantity
we consider the Wilcoxon statistic, which is useful for testing whether the mean μ is located at zero.
(The Takens Estimator). We let denote the Euclidean norm on . In [189], the following estimator of the correlation integral,
is given by When a scaling law holds, i.e., for some , the U-statistic provides the Takens estimator of the correlation dimension α.
We let denote the oriented angle between and , where T is the unit circle in . We define
Ref. [190] used this kernel to propose a U-process for testing uniformity on the circle.
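The Takens estimator discussed above admits a compact numerical sketch. Both functions below are illustrative implementations under the stated scaling-law assumption; the function names and the choice of threshold r0 are ours, not the paper's.

```python
import numpy as np
from itertools import combinations

def correlation_integral(X, r):
    """Order-2 U-statistic with kernel h(x, y) = 1{||x - y|| <= r},
    estimating the correlation integral C(r) = P(||X - X'|| <= r)."""
    X = np.asarray(X, dtype=float)
    return np.mean([np.linalg.norm(X[i] - X[j]) <= r
                    for i, j in combinations(range(len(X)), 2)])

def takens_estimator(X, r0):
    """Takens estimator of the correlation dimension alpha, assuming the
    scaling law C(r) ~ r**alpha for r <= r0: the maximum-likelihood form
    alpha_hat = -1 / mean(log(d_ij / r0)) over pairs with 0 < d_ij <= r0."""
    X = np.asarray(X, dtype=float)
    d = np.array([np.linalg.norm(X[i] - X[j])
                  for i, j in combinations(range(len(X)), 2)])
    d = d[(d > 0) & (d <= r0)]
    return -1.0 / np.mean(np.log(d / r0))
```

For points spread along a line segment, the estimated dimension is close to 1; for a filled planar region, it approaches 2.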
For , we let
We have
and the corresponding conditional U-statistic can be considered a conditional analog of the Hollander–Proschan test statistic [191], which tests whether the conditional distribution of given is exponential or of the New-Better-than-Used type.
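The Wilcoxon statistic introduced earlier in this section can likewise be computed directly as an order-2 U-statistic. In the sketch below, the kernel h(x1, x2) = 1{x1 + x2 > 0} is used, so the estimated parameter equals 1/2 under symmetry about zero; this is a standard form of the statistic, and the function name is illustrative.

```python
import numpy as np
from itertools import combinations

def wilcoxon_u(x):
    """Order-2 U-statistic with kernel h(x1, x2) = 1{x1 + x2 > 0}.

    Estimates theta = P(X1 + X2 > 0), which equals 1/2 when the
    distribution is symmetric about zero; large deviations from 1/2
    signal a location shift (the basis of the signed-rank test)."""
    x = np.asarray(x, dtype=float)
    return np.mean([float(x[i] + x[j] > 0)
                    for i, j in combinations(range(len(x)), 2)])
```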
(The Gini Mean Difference). The Gini index, another popular measure of dispersion, corresponds to the case where and . The Gini statistic is given by
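As a concrete instance of the order-2 kernel just described, the Gini mean difference can be computed by averaging |x_i − x_j| over all distinct pairs; a minimal sketch (function name illustrative):

```python
import numpy as np
from itertools import combinations

def gini_mean_difference(x):
    """Order-2 U-statistic with kernel h(x1, x2) = |x1 - x2|:
    an unbiased estimator of the Gini mean difference E|X1 - X2|."""
    x = np.asarray(x, dtype=float)
    return np.mean([abs(x[i] - x[j])
                    for i, j in combinations(range(len(x)), 2)])
```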
In the next section, we provide more details on how several neighborhood selection methodologies from the literature can be combined with our results.
6. The Bandwidth Selection Criterion
Numerous methodologies have been proposed for constructing asymptotically optimal bandwidth selection rules, particularly in the context of the Nadaraya–Watson regression estimator. Prominent references in this area include [42,51,52,192,193,194]. Accurate bandwidth selection, whether in a finite- or infinite-dimensional setting, is critical to achieving optimal performance in practical applications. However, to the best of our knowledge, no studies have yet addressed the general functional conditional U-statistic. In contrast to the real-valued case, where the selection of the number k has been thoroughly explored in [195], we propose an extension of the leave-one-out cross-validation procedure. For any fixed , we define
(67)
where . This expression defines the leave-one-out- estimator for functional regression and serves as a predictor for . To minimize the quadratic loss function, we introduce the following criterion, where represents a known non-negative weight function:(68)
where . Similarly, we define(69)
Following the ideas in [81,90,194], an intuitive approach to bandwidth selection is to minimize this criterion. We let and We let minimize over , and let minimize over , We now state the following corollary:
-
Under the assumptions of Corollary 5, as ,
(70)
-
Under the assumptions of Corollary 6, as ,
(71)
A key advantage of our results lies in deriving asymptotics for data-driven parameters. We let be a density function in and let denote the number of neighborhoods associated with the values. The conditional density can be estimated as follows:
The leave-one-out estimator is given by Although cross-validation procedures aim to approximate the quadratic estimation errors, alternative methods for selecting smoothing parameters can be introduced to optimize predictive power. The criterion is given by
As emphasized in [90], the range of values for parameter h, denoted as , is typically chosen by minimizing a criterion function, such as the cross-validation (CV) function, over a broad interval. This ensures that any reasonable values of h satisfying the technical conditions are included. While the automatic determination of the interval remains unresolved in one-dimensional nonparametric statistics, it is often of lesser concern since the criterion function typically exhibits flat behavior around its minimum. Early works, such as [193,196], suggest selecting an interval allowing bandwidths to capture up to 95% of the sample. As discussed in [90], this remains effective in the functional framework. Similarly, for parameter k, the set of values is determined by similar principles, with global kernel estimate considerations applying equally to local bandwidth estimates (see [197] for early developments). The same recommendation applies to the selection of .
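For the scalar Nadaraya–Watson special case, the leave-one-out criterion described above can be sketched as follows. This is a minimal illustration with a Gaussian kernel and uniform weights; the function names and the candidate grid are ours, and the paper's actual criterion involves the conditional U-statistic with functional covariates.

```python
import numpy as np

def nw_estimate(x0, X, Y, h, leave_out=None):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel; if
    leave_out is an index i, observation i is excluded (leave-one-out)."""
    mask = np.ones(len(X), dtype=bool)
    if leave_out is not None:
        mask[leave_out] = False
    w = np.exp(-0.5 * ((X[mask] - x0) / h) ** 2)
    return np.sum(w * Y[mask]) / np.sum(w)

def cv_bandwidth(X, Y, grid):
    """Selects the bandwidth minimising the leave-one-out criterion
    CV(h) = sum_i (Y_i - m_hat_{-i}(X_i))^2 over a candidate grid."""
    def cv(h):
        return sum((Y[i] - nw_estimate(X[i], X, Y, h, leave_out=i)) ** 2
                   for i in range(len(X)))
    return min(grid, key=cv)
```

In practice the grid is taken wide enough that bandwidths capturing up to 95% of the sample are included, in line with the recommendation recalled above.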
In the following section, we provide a comprehensive summary of our main findings, highlighting the key results. Additionally, we outline potential directions for future work, suggesting areas where further investigation could be beneficial.
7. Concluding Remarks
This paper is centered on the k-Nearest Neighbor (kNN) kernel-type estimator for single-index conditional U-statistics, with the single-index Nadaraya–Watson estimator serving as a special case within a functional framework applied to regular datasets. Our results are predicated on specific regularity conditions related to conditional U-statistics and conditional moments, along with decay rates governing the probability distribution of variables within diminishing open balls and favorable decreasing rates for mixing coefficients. The conditional moment assumption plays a critical role, as it allows us to work with unbounded function classes. The proof of weak convergence adheres to a classical methodology, which entails establishing finite-dimensional convergence and rigorously controlling the equicontinuity of conditional U-processes. Finite-dimensional convergence is achieved through a block decomposition technique that ensures independence, after which a central limit theorem for independent variables is proven. However, the challenge of controlling equicontinuity necessitates a more sophisticated approach, particularly given the generality and complexity of the framework we consider. Full details of these proofs are provided in the subsequent sections. It is important to note that while we assume mixing—a form of asymptotic independence—primarily for simplicity, this assumption may not fully capture scenarios involving strong data dependence. As highlighted in [198], -mixing is regarded as the minimal assumption that allows for a “complete” empirical process theory, incorporating maximal inequalities and uniform central limit theorems. Explicit upper bounds for -mixing coefficients are available in the context of Markov chains (cf. [199]) and for V-geometric mixing coefficients (cf. [200]). Various stationary time series models, including linear processes (see [201] for -mixing), ARMA models (cf. [202]), and nonlinear autoregressive models (cf. 
[203]), have been extensively studied under similar conditions. A common assumption across these works is the continuity of either the observed process or the innovations associated with it. The application of nonparametric functional methods to general dependence structures remains a relatively unexplored domain; see [204,205,206,207,208,209]. Notably, our study employs an ergodic framework that circumvents the need for strong mixing conditions and their variations, thus avoiding the intricate probabilistic computations these conditions typically entail. While extending our results to encompass functional ergodic data would undoubtedly be a valuable direction for future research, it would necessitate significant mathematical advancements and falls beyond the scope of this paper. The primary challenge lies in the development of new probabilistic tools analogous to those employed in this study, but specifically adapted for -mixing samples. Finally, it would be of great interest to investigate the potential connection between our results and change-point detection, a method widely utilized for identifying abrupt shifts in stochastic systems caused by external perturbations. This technique has found applications across diverse scientific disciplines (see [210]), and exploring how our findings may relate to change-point problems presents a promising avenue for future inquiry. It would also be of interest to explore connections between the present work and the papers [211,212].
8. Mathematical Developments
This section is dedicated to the detailed presentation of the proofs underlying our results, maintaining consistency with the notation established earlier. Our proof techniques are inspired by those developed in [155], which we adapt and extend to the single-index framework. Moreover, we incorporate several of the more intricate arguments from [109], as used in prior works such as [110,112].
8.1. Proofs of Uniform Consistency Results
8.1.1. Proof of Theorem 1
To derive the convergence rates, it is essential to introduce the following notation. For every , we define
This allows us to write Now, let us consider the following decomposition:
Therefore, the proof of (A2) is based on Lemmas A2, A3, and A9.
8.1.2. Proof of Theorem 2
In a manner analogous to [51], verifying the conditions of Lemma A1 for the case of is crucial for establishing Theorem 2. To achieve this, we first introduce the following designations: , , and . Subsequently, we define
We then define and such that(72)
(73)
We denote and define for any increasing sequence such that . It is important to observe that for all , , and , we haveUsing Condition (8), it follows that the bandwidths and both belong to the interval
The remainder of the proof follows the same steps as those in [155] and is therefore omitted. □
8.2. Preliminaries of the Proofs
This section is primarily devoted to the analysis of functional conditional U-statistics. As in the case of , where is covered by
for some radius , it follows that for each , there exists , where , such that Thus, for each , the closest center is , and the corresponding ball with the closest center is The consistency proofs for the uniform integrated bandwidth (UIB) concerning the multivariate bandwidths follow the same approach as the UIB consistency proofs for the univariate smoothing parameter in [49,51]. Furthermore, analogous to the proof of Theorem 1, we partition the sequence into alternating blocks, where the block sizes and differ, satisfying(74)
and for , we set
8.2.1. Proof of Theorem 3
In this section, we consider a bandwidth . To establish Theorem 3, we represent the U-statistic for each as follows:
(75)
(76)
Let us begin with the term . We have By applying a telescoping binomial expansion, we obtain(77)
(78)
From Condition (C.4.1), we can claim that Similarly, we have So, (77) satisfies where Therefore, we infer that(79)
(80)
The transition from Equations (79) to (80) is enabled by the Lipschitz continuity of the kernel function . Uniformly for all and , we derive the following: by (14), where and , with component by component. The idea is to apply Lemma A21 on function which satisfies, for all and , It is important to observe that Constant appearing on the final right-hand side of the preceding inequality is derived from Condition (7). At this stage, we can invoke Lemma A21 with , yielding(81)
(82)
such that . By rigorously performing the calculations while adhering to the specified conditions, particularly (C.6) and (C.7), we derive the following result:(83)
The study of term is deduced from the previous one. In fact,(84)
(85)
To facilitate the transition from (84) to (85), we apply Jensen’s inequality, leveraging specific properties of the absolute value function. By proceeding along the same steps as previously outlined, we obtain where Notice that we have That implies This offers(86)
Continuing now with , and assuming symmetry of the kernel function , we decompose the U-statistic using Hoeffding’s decomposition [1], leading to the following expression:(87)
We define new classes of functions for , , and These classes are VC-type classes of functions with the same characteristics and the envelope function satisfying Let us start with the linear term of (87), which is From Hoeffding’s projection, we have One can see that is an empirical process based on a VC-type class of functions contained in with the same characteristics, and the elements are defined by Hence, the proof of this part is similar to that of the Lemma A2, and then Now, let us proceed to the nonlinear terms. The objective is to demonstrate that, for(88)
To achieve that, we need to decompose the interval into smaller intervals. First, let us consider the intervals for all . We note where and , and we set and We can observe that and(89)
Now, we set the following new classes for , , and , , and Thus, to prove (88), we need to prove that for , and , Notice that for each , and , we have At this stage, we focus on analyzing the aforementioned equation for the case of to simplify the proof (the same reasoning applies for ). Thus, we obtain and(90)
We begin by analyzing the term . Assuming that the sequence of independent blocks has a size of , an application of [119] demonstrates that We retain the choice of and such that , which implies that as . Consequently, the primary term under consideration is the second summand. The central idea is to employ Lemma A19. It is clear that the class of functions is uniformly bounded, i.e., Furthermore, by applying Proposition 2.6 from [10], we obtain the following for every and for Rademacher variables :(91)
where and We see that Given that is a VC-type class of functions that satisfies (C.4.4), it follows that the class is also a VC-type class of functions with similar characteristics to . Therefore,(92)
All the conditions of Lemma A19 are satisfied; thus, a direct application yields, for each ,(93)
where Next, let us study the same blocks . We have Following the same argument as in blocks , we obtain(94)
where andAgain, using Lemma A19, we readily obtain
(95)
(96)
The results for the remaining blocks can be derived by following the same approach as described above. Consequently, we obtain where With this, the proof of the theorem is concluded. □
8.2.2. Proof of Theorem 5
We observe that
Given the imposed assumptions and the results derived earlier, and for certain constants and , we obtain that Hence, we can now apply Theorem 3 to handle and Theorems 3 and 4 to handle depending on whether class satisfies (C.4.1) or (C.4.2). As a result, we obtain, for some , with probability 1,Thus, the proof is concluded. □
8.2.3. Proof of Theorem 6
Given Condition (C.3.1), we have
Considering Hypotheses (H.1), (C.2.1), (C.3.1), and (C.6.), we have and where With this, the proof of the theorem is concluded. □
8.2.4. Proof of Corollary 6
In this section, we establish Corollary 6 by applying Lemma A1. Following the same reasoning as in the case of functional regression, we adopt the following notation: , , ,
We choose and such that for all where are increasing sequences that belong to and We denote where We can easily see that, for all(97)
(98)
Using Condition one determines that, for all there exist constants such that We put , and thus, and belong to the interval We denote and ; therefore, We also note, for all and Ultimately, we can choose constants and a sequence while satisfying Condition (C.8) such that , , and It is evident that is fulfilled due to Condition (C.3.1), and from (97) and (98), we can readily verify that the construction of and satisfies Condition . □
8.3. Proofs of Weak Convergence Results
8.3.1. Preliminaries of the Proofs
As previously noted, a direct approach proves ineffective when dealing with random bandwidths. Consequently, we often rely on general lemmas (see, for instance, [51]) to facilitate the application of results established for non-random bandwidths. In this section, we introduce key results from [111,112], which were obtained for a positive bandwidth . These results are instrumental in the proofs that follow. We denote the bias term and the centered variate as the following quantities:
(99)
Decomposition (99) plays a pivotal role in our proof. Following the methodology employed in [111], we show that convergence in the quadratic mean to one is achieved and that the bias satisfies
Proof of Theorem 11
By applying the Cramér–Wold device, it is sufficient to demonstrate the convergence of the one-dimensional distribution to establish Theorem 11. In fact, leveraging the linearity of is enough to show that
for all of the form Hence, we solely illustrate the convergence in a single dimension. Recall that we are dealing with(100)
We set To obtain the desired result, we write where(101)
and(102)
To derive the desired results, we adopt the strategy of [213].
Proof of Theorem 12
In this section, we utilize the same methodology as in [112] and previously in [109]. This approach involves applying the blocking method, which partitions a strictly stationary sequence into equal-sized blocks, each of length . It is important to refer to the notation outlined in the proof of Lemma A2. The primary objective is to establish the asymptotic equicontinuity of the conditional empirical process
Let us introduce, for any and(103)
(104)
Then, we have(105)
where for , we have We investigate the asymptotic equicontinuity of each of the preceding terms. For a given class of functions , we let denote an empirical process based on , indexed by : and for a measurable function and , , we set That implies Again, keeping in mind that when , we establish the asymptotic equicontinuity of which means, for every , that where This approach entails working with the independent block sequence in place of the dependent sequence, facilitated by the results in [119]. Accordingly, we obtain(106)
where is defined by(107)
We choose Note that in our context, is given by the following expression: Utilizing Condition (C.5.1), we have as . Consequently, we focus on the right-hand side term of (106). We begin by assuming that the blocks are independent. Symmetrization is performed using a sequence of i.i.d. Rademacher variables, i.e., random variables satisfying . It is important to note that the sequence is independent of the sequence . Thus, we need to establish the following for all : Again, using the fact that when and [155], it suffices to show that Since the -conditional moment satisfies (C.4.2), we can truncate and obtain, for each , as ,(108)
(109)
Hence, Then, it suffices to show We have the following: This is achieved by using the chaining method. In [109], , was provided, where(110)
and we let class of measurable functions of There is a map that takes each to its closest function in such that We apply the chaining method(111)
We let be in such a way that(112)
We let r be chosen so small in such a way that Therefore, from (111), we readily infer that By the fact that the terms composing are bounded by , and by applying the Bernstein inequality, we obtain By using (110), we have that means(113)
In view of (112), we assume that . In a similar way, we have Finally, by (110), it suffices to prove, for each , Making use of the square root trick (Lemma 5.2 [214])—see also [215] in a similar way to that in [109]—we obtain(114)
Let us introduce the semi-norm and the covering number defined for any class of functions by By utilizing the latter approach, we can bound (the detailed calculations can be found in [110]). Similarly, as in [110] and earlier in [109], leveraging the independence between the blocks and Condition (C.4.3), we apply Lemma 5.2 from [214] to obtain Consequently, the theorem is established. □
8.3.2. Proof of Theorem 14
As previously mentioned, the examination of the weak convergence of the conditional U-process is grounded in the analysis of two components: the truncated part and the remainder part.
Let be a uniformly bounded class of measurable canonical functions from , where . Assume that there exist finite constants and such that the covering number of satisfies
(115)
for every and every probability measure Q. If the mixing coefficient β of the stationary sequence fulfills(116)
for some , then
8.3.3. Proof of Theorem 15
It is well established that the weak convergence of an empirical process can be derived from its finite-dimensional convergence and its asymptotic equicontinuity, provided certain conditions are satisfied. Theorem 14 establishes the finite-dimensional convergence of the conditional U-process . Thus, the remaining task is to demonstrate its asymptotic equicontinuity. We decompose the U-process into two components: the truncated part and the remainder part.
Following the same reasoning as in [155], we also have(117)
Therefore, it suffices to establish the weak convergence of and rather than directly studying . The steps of the proof closely follow those in [112], with the distinction that we now consider a multivariate bandwidth instead of a univariate bandwidth . In this section, we present the proof for , with the same approach applicable to . As shown earlier, the truncated part is decomposed following Hoeffding’s decomposition: We first investigate the linear term . Notice that We can write We need to introduce a new function Hence, The linear term of the process is given by Thus, the linear term of the U-process is an empirical process indexed by the class of functions , defined as follows: Hence, its weak convergence can be established in a manner similar to the proof of Theorem 14. It is evident that . Now, turning to the nonlinear part, we must demonstrate that This is a consequence of Lemma 1. It is important to note that the selection of the number and size of the blocks must be performed in such a way that the terms converge to 0. We need to prove thatAgain, we confine our discussion to the case for clarity. We have
We employ blocking arguments to handle the resulting terms. We begin by examining the first term . We have Notice that (48) readily implies that and recall that for all ,By the symmetry of Function , it holds that
(118)
Employing Chebyshev’s inequality and Hoeffding’s trick while maintaining order, we obtain
(119)
Under (C.6.) we have, for each , which tends to 0 as . Terms , and are handled in the same manner as the first, with the exception that for , we do not need to apply Hoeffding’s trick. This is because our variables (or for ) are in the same blocks. For term , we deduce its study from those of and . Let us consider term . As for the truncated part, we have(120)
We also have Since Equation (118) is still satisfied, the problem is reduced to and we follow the same procedure as in (119). The remaining terms have been shown to be asymptotically negligible. Consequently, process converges in distribution to a Gaussian process, which admits a version with uniformly bounded and uniformly continuous paths with respect to the -norm. By applying the same reasoning, the same result holds for process . Thus, according to (117), it follows that process also converges in distribution to a Gaussian process, which has a version with uniformly bounded and uniformly continuous paths with respect to the -norm. Similarly, process is treated in the same manner, and for , , and , the analysis follows the same steps as in the proof of Theorem 14. □
No new data were created or analyzed in this study.
The author extends his sincere gratitude to the Editor-in-Chief, the Associate Editor, and the three referees for their invaluable feedback and for pointing out a number of oversights in the version initially submitted. Their insightful comments have greatly refined and focused the original work, resulting in a markedly improved presentation.
The author declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
1. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat.; 1948; 19, pp. 293-325. [DOI: https://dx.doi.org/10.1214/aoms/1177730196]
2. Halmos, P.R. The theory of unbiased estimation. Ann. Math. Statist.; 1946; 17, pp. 34-43. [DOI: https://dx.doi.org/10.1214/aoms/1177731020]
3. van der Vaart, A. Asymptotic Statistics; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998; Volume 3, xvi+443. [DOI: https://dx.doi.org/10.1017/CBO9780511802256]
4. Rubin, H.; Vitale, R.A. Asymptotic distribution of symmetric statistics. Ann. Statist.; 1980; 8, pp. 165-170. [DOI: https://dx.doi.org/10.1214/aos/1176344898]
5. Serfling, R.J. Approximation Theorems of Mathematical Statistics; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1980; xiv+371.
6. Dynkin, E.B.; Mandelbaum, A. Symmetric statistics, Poisson point processes, and multiple Wiener integrals. Ann. Statist.; 1983; 11, pp. 739-745. [DOI: https://dx.doi.org/10.1214/aos/1176346241]
7. Bretagnolle, J. Lois limites du bootstrap de certaines fonctionnelles. Ann. Inst. H. Poincaré Sect. B; 1983; 19, pp. 281-296.
8. von Mises, R. On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat.; 1947; 18, pp. 309-348. [DOI: https://dx.doi.org/10.1214/aoms/1177730385]
9. Filippova, A.A. Mises theorem on the limit behaviour of functionals derived from empirical distribution functions. Dokl. Akad. Nauk SSSR; 1959; 129, pp. 44-47.
10. Arcones, M.A.; Giné, E. Limit theorems for U-processes. Ann. Probab.; 1993; 21, pp. 1494-1542. [DOI: https://dx.doi.org/10.1214/aop/1176989128]
11. Lee, A.J. U-Statistics; Statistics: Textbooks and Monographs; Marcel Dekker, Inc.: New York, NY, USA, 1990; Volume 110, xii+302.
12. Borovskikh, Y.V. U-Statistics in Banach Spaces; VSP: Utrecht, The Netherlands, 1996; xii+420.
13. Ghosal, S.; Sen, A.; van der Vaart, A.W. Testing monotonicity of regression. Ann. Statist.; 2000; 28, pp. 1054-1082. [DOI: https://dx.doi.org/10.1214/aos/1015956707]
14. Abrevaya, J.; Jiang, W. A nonparametric approach to measuring and testing curvature. J. Bus. Econom. Statist.; 2005; 23, pp. 1-19. [DOI: https://dx.doi.org/10.1198/073500104000000316]
15. Nolan, D.; Pollard, D. U-processes: Rates of convergence. Ann. Statist.; 1987; 15, pp. 780-799. [DOI: https://dx.doi.org/10.1214/aos/1176350374]
16. Sherman, R.P. Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Statist.; 1994; 22, pp. 439-459. [DOI: https://dx.doi.org/10.1214/aos/1176325377]
17. de la Peña, V.H.; Giné, E. Decoupling: From Dependence to Independence. Randomly Stopped Processes, U-Statistics and Processes, Martingales and Beyond; Probability and Its Applications (New York); Springer: New York, NY, USA, 1999; xvi+392. [DOI: https://dx.doi.org/10.1007/978-1-4612-0537-1]
18. Arcones, M.A.; Wang, Y. Some new tests for normality based on U-processes. Statist. Probab. Lett.; 2006; 76, pp. 69-82. [DOI: https://dx.doi.org/10.1016/j.spl.2005.07.003]
19. Schick, A.; Wang, Y.; Wefelmeyer, W. Tests for normality based on density estimators of convolutions. Statist. Probab. Lett.; 2011; 81, pp. 337-343. [DOI: https://dx.doi.org/10.1016/j.spl.2010.10.022]
20. Nikitin, Y.Y.; Ahsanullah, M. New U-empirical tests of symmetry based on extremal order statistics, and their efficiencies. Mathematical Statistics and Limit Theorems; Springer: Cham, Switzerland, 2015; pp. 231-248.
21. Yeo, I.K.; Johnson, R.A. A uniform strong law of large numbers for U-statistics with application to transforming to near symmetry. Statist. Probab. Lett.; 2001; 51, pp. 63-69. [DOI: https://dx.doi.org/10.1016/S0167-7152(00)00143-7]
22. Joly, E.; Lugosi, G. Robust estimation of U-statistics. Stoch. Process. Appl.; 2016; 126, pp. 3760-3773. [DOI: https://dx.doi.org/10.1016/j.spa.2016.04.021]
23. Janson, S. A functional limit theorem for random graphs with applications to subgraph count statistics. Random Struct. Algorithms; 1990; 1, pp. 15-37. [DOI: https://dx.doi.org/10.1002/rsa.3240010103]
24. Frees, E.W. Infinite order U-statistics. Scand. J. Statist.; 1989; 16, pp. 29-45.
25. Heilig, C.; Nolan, D. Limit theorems for the infinite-degree U-process. Statist. Sin.; 2001; 11, pp. 289-302.
26. Song, Y.; Chen, X.; Kato, K. Approximating high-dimensional infinite-order U-statistics: Statistical and computational guarantees. Electron. J. Stat.; 2019; 13, pp. 4794-4848. [DOI: https://dx.doi.org/10.1214/19-EJS1643]
27. Soukarieh, I.; Bouzebda, S. Renewal type bootstrap for increasing degree U-process of a Markov chain. J. Multivar. Anal.; 2023; 195, 105143. [DOI: https://dx.doi.org/10.1016/j.jmva.2022.105143]
28. Peng, W.; Coleman, T.; Mentch, L. Rates of convergence for random forests via generalized U-statistics. Electron. J. Stat.; 2022; 16, pp. 232-292. [DOI: https://dx.doi.org/10.1214/21-EJS1958]
29. Faivishevsky, L.; Goldberger, J. ICA based on a Smooth Estimation of the Differential Entropy. Proceedings of the Advances in Neural Information Processing Systems; Koller, D.; Schuurmans, D.; Bengio, Y.; Bottou, L., Eds.; Curran Associates, Inc.: New York, NY, USA, 2008; Volume 21.
30. Liu, Q.; Lee, J.; Jordan, M. A Kernelized Stein Discrepancy for Goodness-of-fit Tests. Proceedings of the 33rd International Conference on Machine Learning; New York, NY, USA, 20–22 June 2016; Balcan, M.F.; Weinberger, K.Q., Eds.; Volume 48, pp. 276-284.
31. Cybis, G.B.; Valk, M.; Lopes, S.R.C. Clustering and classification problems in genetics through U-statistics. J. Stat. Comput. Simul.; 2018; 88, pp. 1882-1902. [DOI: https://dx.doi.org/10.1080/00949655.2017.1374387]
32. Lim, F.; Stojanovic, V.M. On U-statistics and compressed sensing I: Non-asymptotic average-case analysis. IEEE Trans. Signal Process.; 2013; 61, pp. 2473-2485. [DOI: https://dx.doi.org/10.1109/TSP.2013.2247598]
33. Jadhav, S.; Ma, S. An association test for functional data based on Kendall’s tau. J. Multivar. Anal.; 2021; 184, 104740. [DOI: https://dx.doi.org/10.1016/j.jmva.2021.104740]
34. Bello, D.Z.; Valk, M.; Cybis, G.B. Towards U-statistics clustering inference for multiple groups. J. Stat. Comput. Simul.; 2024; 94, pp. 204-222. [DOI: https://dx.doi.org/10.1080/00949655.2023.2239978]
35. Kim, I.; Ramdas, A. Dimension-agnostic inference using cross U-statistics. Bernoulli; 2024; 30, pp. 683-711. [DOI: https://dx.doi.org/10.3150/23-BEJ1613]
36. Chen, L.; Wan, A.T.K.; Zhang, S.; Zhou, Y. Distributed algorithms for U-statistics-based empirical risk minimization. J. Mach. Learn. Res.; 2023; 24, pp. 1-43.
37. Janson, S. Asymptotic normality for m-dependent and constrained U-statistics, with applications to pattern matching in random strings and permutations. Adv. Appl. Probab.; 2023; 55, pp. 841-894. [DOI: https://dx.doi.org/10.1017/apr.2022.51]
38. Sudheesh, K.K.; Anjana, S.; Xie, M. U-statistics for left truncated and right censored data. Statistics; 2023; 57, pp. 900-917. [DOI: https://dx.doi.org/10.1080/02331888.2023.2217314]
39. Le Minh, T. U-statistics on bipartite exchangeable networks. ESAIM Probab. Stat.; 2023; 27, pp. 576-620. [DOI: https://dx.doi.org/10.1051/ps/2023010]
40. Nadaraja, E.A. On a regression estimate. Teor. Verojatnost. I Primenen.; 1964; 9, pp. 157-159.
41. Watson, G.S. Smooth regression analysis. Sankhyā Ser. A; 1964; 26, pp. 359-372.
42. Bouzebda, S.; Taachouche, N. Rates of the Strong Uniform Consistency for the Kernel-Type Regression Function Estimators with General Kernels on Manifolds. Math. Methods Statist.; 2023; 32, pp. 27-80. [DOI: https://dx.doi.org/10.3103/S1066530723010027]
43. Bouzebda, S.; Taachouche, N. Rates of the strong uniform consistency with rates for conditional U-statistics estimators with general kernels on manifolds. Math. Methods Statist.; 2024; 33, pp. 95-153. [DOI: https://dx.doi.org/10.3103/S1066530724700066]
44. Sen, A. Uniform strong consistency rates for conditional U-statistics. Sankhyā Ser. A; 1994; 56, pp. 179-194.
45. Prakasa Rao, B.L.S.; Sen, A. Limit distributions of conditional U-statistics. J. Theoret. Probab.; 1995; 8, pp. 261-301. [DOI: https://dx.doi.org/10.1007/BF02212880]
46. Harel, M.; Puri, M.L. Conditional U-statistics for dependent random variables. J. Multivar. Anal.; 1996; 57, pp. 84-100. [DOI: https://dx.doi.org/10.1006/jmva.1996.0023]
47. Stute, W. Symmetrized NN-conditional U-statistics. Research Developments in Probability and Statistics; VSP: Utrecht, The Netherlands, 1996; pp. 231-237.
48. Fu, K.A. An application of U-statistics to nonparametric functional data analysis. Comm. Statist. Theory Methods; 2012; 41, pp. 1532-1542. [DOI: https://dx.doi.org/10.1080/03610926.2010.526747]
49. Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat.; 2020; 32, pp. 452-509. [DOI: https://dx.doi.org/10.1080/10485252.2020.1759597]
50. Bouzebda, S.; Elhattab, I.; Nemouchi, B. On the uniform-in-bandwidth consistency of the general conditional U-statistics based on the copula representation. J. Nonparametr. Stat.; 2021; 33, pp. 321-358. [DOI: https://dx.doi.org/10.1080/10485252.2021.1937621]
51. Bouzebda, S.; Nezzal, A. Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci.; 2022; 5, pp. 431-533. [DOI: https://dx.doi.org/10.1007/s42081-022-00161-3]
52. Bouzebda, S.; Nezzal, A. Asymptotic properties of conditional U-statistics using delta sequences. Comm. Statist. Theory Methods; 2024; 53, pp. 4602-4657. [DOI: https://dx.doi.org/10.1080/03610926.2023.2179887]
53. Bouzebda, S.; Nezzal, A.; Zari, T. Uniform Consistency for Functional Conditional U-Statistics Using Delta-Sequences. Mathematics; 2023; 11, 161. [DOI: https://dx.doi.org/10.3390/math11010161]
54. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005; xx+426.
55. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis: Methods and Case Studies; Springer Series in Statistics; Springer: New York, NY, USA, 2002; x+190. [DOI: https://dx.doi.org/10.1007/b98886]
56. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006; xx+258.
57. Araujo, A.; Giné, E. The Central Limit Theorem for Real and Banach Valued Random Variables; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA; Chichester, UK; Brisbane, Australia, 1980; xiv+233.
58. Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol.; 1998; 60, pp. 681-691. [DOI: https://dx.doi.org/10.1111/1467-9868.00148]
59. Bosq, D. Linear Processes in Function Spaces: Theory and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2000; Volume 149, xiv+283. [DOI: https://dx.doi.org/10.1007/978-1-4612-1154-9]
60. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; xiv+422. [DOI: https://dx.doi.org/10.1007/978-1-4614-3655-3]
61. Kokoszka, P.; Reimherr, M. Introduction to Functional Data Analysis; Texts in Statistical Science Series; CRC Press: Boca Raton, FL, USA, 2017; xvi+290.
62. Ferraty, F.; Laksaci, A.; Tadj, A.; Vieu, P. Rate of uniform consistency for nonparametric estimates with functional variables. J. Statist. Plann. Inference; 2010; 140, pp. 335-352. [DOI: https://dx.doi.org/10.1016/j.jspi.2009.07.019]
63. Kara-Zaitri, L.; Laksaci, A.; Rachdi, M.; Vieu, P. Uniform in bandwidth consistency for various kernel estimators involving functional data. J. Nonparametr. Stat.; 2017; 29, pp. 85-107. [DOI: https://dx.doi.org/10.1080/10485252.2016.1254780]
64. Attouch, M.; Laksaci, A.; Rafaa, F. On the local linear estimate for functional regression: Uniform in bandwidth consistency. Comm. Statist. Theory Methods; 2019; 48, pp. 1836-1853. [DOI: https://dx.doi.org/10.1080/03610926.2018.1440308]
65. Ling, N.; Meng, S.; Vieu, P. Uniform consistency rate of kNN regression estimation for functional time series data. J. Nonparametr. Stat.; 2019; 31, pp. 451-468. [DOI: https://dx.doi.org/10.1080/10485252.2019.1583338]
66. Mohammedi, M.; Bouzebda, S.; Laksaci, A. The consistency and asymptotic normality of the kernel type expectile regression estimator for functional data. J. Multivar. Anal.; 2021; 181, 104673. [DOI: https://dx.doi.org/10.1016/j.jmva.2020.104673]
67. Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Nonparametric estimation of expectile regression in functional dependent data. J. Nonparametr. Stat.; 2022; 34, pp. 250-281. [DOI: https://dx.doi.org/10.1080/10485252.2022.2027412]
68. Almanjahie, I.M.; Bouzebda, S.; Chikr Elmezouar, Z.; Laksaci, A. The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Stat. Risk Model.; 2022; 38, pp. 47-63. [DOI: https://dx.doi.org/10.1515/strm-2019-0029]
69. Bouzebda, S.; Laksaci, A.; Mohammedi, M. Single index regression model for functional quasi-associated time series data. Revstat; 2022; 20, pp. 605-631.
70. Bouzebda, S.; Laksaci, A.; Mohammedi, M. The k-nearest neighbors method in single index regression model for functional quasi-associated time series data. Rev. Mat. Complut.; 2023; 36, pp. 361-391. [DOI: https://dx.doi.org/10.1007/s13163-022-00436-z]
71. Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. The local linear functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Metrika; 2024; 87, pp. 1007-1035. [DOI: https://dx.doi.org/10.1007/s00184-023-00942-0]
72. Härdle, W.; Hall, P.; Ichimura, H. Optimal smoothing in single-index models. Ann. Statist.; 1993; 21, pp. 157-178. [DOI: https://dx.doi.org/10.1214/aos/1176349020]
73. Bhattacharjee, S.; Müller, H.G. Single index Fréchet regression. Ann. Statist.; 2023; 51, pp. 1770-1798. [DOI: https://dx.doi.org/10.1214/23-AOS2307]
74. Liang, H.; Liu, X.; Li, R.; Tsai, C.L. Estimation and testing for partially linear single-index models. Ann. Statist.; 2010; 38, pp. 3811-3836. [DOI: https://dx.doi.org/10.1214/10-AOS835]
75. Stute, W.; Zhu, L.X. Nonparametric checks for single-index models. Ann. Statist.; 2005; 33, pp. 1048-1083. [DOI: https://dx.doi.org/10.1214/009053605000000020]
76. Gu, L.; Yang, L. Oracally efficient estimation for single-index link function with simultaneous confidence band. Electron. J. Stat.; 2015; 9, pp. 1540-1561. [DOI: https://dx.doi.org/10.1214/15-EJS1051]
77. Cuevas, A. A partial overview of the theory of statistics with functional data. J. Statist. Plann. Inference; 2014; 147, pp. 1-23. [DOI: https://dx.doi.org/10.1016/j.jspi.2013.04.002]
78. Goia, A.; Vieu, P. An introduction to recent advances in high/infinite dimensional statistics [Editorial]. J. Multivar. Anal.; 2016; 146, pp. 1-6. [DOI: https://dx.doi.org/10.1016/j.jmva.2015.12.001]
79. Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics; 2018; 52, pp. 934-949. [DOI: https://dx.doi.org/10.1080/02331888.2018.1487120]
80. Ferraty, F.; Peuch, A.; Vieu, P. Modèle à indice fonctionnel simple. C. R. Math. Acad. Sci. Paris; 2003; 336, pp. 1025-1028. [DOI: https://dx.doi.org/10.1016/S1631-073X(03)00239-5]
81. Ait-Saïdi, A.; Ferraty, F.; Kassa, R.; Vieu, P. Cross-validated estimations in the single-functional index model. Statistics; 2008; 42, pp. 475-494. [DOI: https://dx.doi.org/10.1080/02331880801980377]
82. Attaoui, S.; Bentat, B.; Bouzebda, S.; Laksaci, A. The strong consistency and asymptotic normality of the kernel estimator type in functional single index model in presence of censored data. AIMS Math.; 2024; 9, pp. 7340-7371. [DOI: https://dx.doi.org/10.3934/math.2024356]
83. Jiang, Z.; Huang, Z.; Zhang, J. Functional single-index composite quantile regression. Metrika; 2023; 86, pp. 595-603. [DOI: https://dx.doi.org/10.1007/s00184-022-00887-w]
84. Nie, Y.; Wang, L.; Cao, J. Estimating functional single index models with compact support. Environmetrics; 2023; 34, e2784. [DOI: https://dx.doi.org/10.1002/env.2784]
85. Zhu, H.; Zhang, R.; Liu, Y.; Ding, H. Robust estimation for a general functional single index model via quantile regression. J. Korean Statist. Soc.; 2022; 51, pp. 1041-1070. [DOI: https://dx.doi.org/10.1007/s42952-022-00174-4]
86. Tang, Q.; Kong, L.; Rupper, D.; Karunamuni, R.J. Partial functional partially linear single-index models. Statist. Sin.; 2021; 31, pp. 107-133. [DOI: https://dx.doi.org/10.5705/ss.202018.0316]
87. Ling, N.; Cheng, L.; Vieu, P.; Ding, H. Missing responses at random in functional single index model for time series data. Statist. Pap.; 2022; 63, pp. 665-692. [DOI: https://dx.doi.org/10.1007/s00362-021-01251-2]
88. Ling, N.; Cheng, L.; Vieu, P. Single functional index model under responses MAR and dependent observations. Functional and High-Dimensional Statistics and Related Fields; Contrib. Stat.; Springer: Cham, Switzerland, 2020; pp. 161-168. [DOI: https://dx.doi.org/10.1007/978-3-030-47756-1_22]
89. Feng, S.; Tian, P.; Hu, Y.; Li, G. Estimation in functional single-index varying coefficient model. J. Statist. Plann. Inference; 2021; 214, pp. 62-75. [DOI: https://dx.doi.org/10.1016/j.jspi.2021.01.003]
90. Novo, S.; Aneiros, G.; Vieu, P. Automatic and location-adaptive estimation in functional single-index regression. J. Nonparametr. Stat.; 2019; 31, pp. 364-392. [DOI: https://dx.doi.org/10.1080/10485252.2019.1567726]
91. Li, J.; Huang, C.; Zhu, H. A functional varying-coefficient single-index model for functional response data. J. Amer. Statist. Assoc.; 2017; 112, pp. 1169-1181. [DOI: https://dx.doi.org/10.1080/01621459.2016.1195742] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29200540]
92. Attaoui, S.; Ling, N. Asymptotic results of a nonparametric conditional cumulative distribution estimator in the single functional index modeling for time series data with applications. Metrika; 2016; 79, pp. 485-511. [DOI: https://dx.doi.org/10.1007/s00184-015-0564-6]
93. Chen, D.; Hall, P.; Müller, H.G. Single and multiple index functional regression models with nonparametric link. Ann. Statist.; 2011; 39, pp. 1720-1747. [DOI: https://dx.doi.org/10.1214/11-AOS882]
94. Fix, E.; Hodges, J.L.J. Discriminatory Analysis—Nonparametric Discrimination: Consistency Properties; Technical Report, Project 21-49-004, Report 4; USAF School of Aviation Medicine, Randolph Field: San Antonio, TX, USA, 1951.
95. Fix, E.; Hodges, J.L.J. Discriminatory analysis–nonparametric discrimination: Consistency properties. Int. Stat. Rev.; 1989; 57, pp. 238-247. [DOI: https://dx.doi.org/10.2307/1403797]
96. Loftsgaarden, D.O.; Quesenberry, C.P. A nonparametric estimate of a multivariate density function. Ann. Math. Statist.; 1965; 36, pp. 1049-1051. [DOI: https://dx.doi.org/10.1214/aoms/1177700079]
97. Biau, G.; Devroye, L. Lectures on the Nearest Neighbor Method; Springer Series in the Data Sciences; Springer: Cham, Switzerland, 2015; ix+290. [DOI: https://dx.doi.org/10.1007/978-3-319-25388-6]
98. Collomb, G. Estimation de la régression par la méthode des k points les plus proches avec noyau: Quelques propriétés de convergence ponctuelle. Proceedings of the Nonparametric Asymptotic Statistics (Proc. Conf., Rouen, 1979) (French); Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1980; Volume 821, pp. 159-175.
99. Mack, Y.P. Local Properties of k-NN Regression Estimates. SIAM J. Algebr. Discret. Methods; 1981; 2, pp. 311-323. [DOI: https://dx.doi.org/10.1137/0602035]
100. Györfi, L. The rate of convergence of k-NN regression estimation and classification. IEEE Trans. Inform. Theory.; 1981; 27, pp. 500-509. [DOI: https://dx.doi.org/10.1109/TIT.1981.1056344]
101. Bhattacharya, P.K.; Mack, Y.P. Weak convergence of k-NN density and regression estimators with varying k and applications. Ann. Statist.; 1987; 15, pp. 976-994. [DOI: https://dx.doi.org/10.1214/aos/1176350487]
102. Devroye, L.; Györfi, L.; Krzyzak, A.; Lugosi, G. On the Strong Universal Consistency of Nearest Neighbor Regression Function Estimates. Ann. Statist.; 1994; 22, pp. 1371-1385. [DOI: https://dx.doi.org/10.1214/aos/1176325633]
103. Laloë, T. A k-nearest neighbor approach for functional regression. Statist. Probab. Lett.; 2008; 78, pp. 1189-1193. [DOI: https://dx.doi.org/10.1016/j.spl.2007.11.014]
104. Chikr-Elmezouar, Z.; Almanjahie, I.M.; Laksaci, A.; Rachdi, M. FDA: Strong consistency of the kNN local linear estimation of the functional conditional density and mode. J. Nonparametr. Stat.; 2019; 31, pp. 175-195. [DOI: https://dx.doi.org/10.1080/10485252.2018.1538450]
105. Ling, N.; Aneiros, G.; Vieu, P. kNN estimation in functional partial linear modeling. Statist. Pap.; 2020; 61, pp. 423-444. [DOI: https://dx.doi.org/10.1007/s00362-017-0946-0]
106. Einmahl, U.; Mason, D.M. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist.; 2005; 33, pp. 1380-1403. [DOI: https://dx.doi.org/10.1214/009053605000000129]
107. Dony, J.; Einmahl, U. Uniform in bandwidth consistency of kernel regression estimators at a fixed point. High Dimensional Probability V: The Luminy Volume; Institute of Mathematical Statistics: Beachwood, OH, USA, 2009; Volume 5, pp. 308-325. [DOI: https://dx.doi.org/10.1214/09-IMSCOLL520]
108. Kara, L.Z.; Laksaci, A.; Rachdi, M.; Vieu, P. Data-driven kNN estimation in nonparametric functional data analysis. J. Multivar. Anal.; 2017; 153, pp. 176-188. [DOI: https://dx.doi.org/10.1016/j.jmva.2016.09.016]
109. Arcones, M.A.; Yu, B. Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theoret. Probab.; 1994; 7, pp. 47-71. [DOI: https://dx.doi.org/10.1007/BF02213360]
110. Bouzebda, S.; Nemouchi, B. Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Statist.; 2019; 28, pp. 169-207. [DOI: https://dx.doi.org/10.3103/S1066530719030013]
111. Masry, E. Nonparametric regression estimation for dependent functional data: Asymptotic normality. Stoch. Process. Appl.; 2005; 115, pp. 155-177. [DOI: https://dx.doi.org/10.1016/j.spa.2004.07.006]
112. Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process.; 2023; 26, pp. 33-88. [DOI: https://dx.doi.org/10.1007/s11203-022-09276-6]
113. Hristache, M.; Juditsky, A.; Spokoiny, V. Direct estimation of the index coefficient in a single-index model. Ann. Statist.; 2001; 29, pp. 595-623. [DOI: https://dx.doi.org/10.1214/aos/1009210682]
114. Ferraty, F.; Vieu, P. The functional nonparametric model and application to spectrometric data. Comput. Statist.; 2002; 17, pp. 545-564. [DOI: https://dx.doi.org/10.1007/s001800200126]
115. Ferraty, F.; Park, J.; Vieu, P. Estimation of a functional single index model. Recent Advances in Functional Data Analysis and Related Topics; Contrib. Statist.; Physica-Verlag/Springer: Berlin/Heidelberg, Germany, 2011; pp. 111-116. [DOI: https://dx.doi.org/10.1007/978-3-7908-2736-1_17]
116. Geenens, G. Curse of dimensionality and related issues in nonparametric functional regression. Stat. Surv.; 2011; 5, pp. 30-43. [DOI: https://dx.doi.org/10.1214/09-SS049]
117. Attouch, M.; Laksaci, A.; Rafaa, F. Estimation locale linéaire de la régression non paramétrique fonctionnelle par la méthode des k plus proches voisins. C. R. Math. Acad. Sci. Paris; 2017; 355, pp. 824-829. [DOI: https://dx.doi.org/10.1016/j.crma.2017.05.007]
118. Burba, F.; Ferraty, F.; Vieu, P. k-nearest neighbour method in functional nonparametric regression. J. Nonparametr. Stat.; 2009; 21, pp. 453-469. [DOI: https://dx.doi.org/10.1080/10485250802668909]
119. Eberlein, E. Weak convergence of partial sums of absolutely regular sequences. Statist. Probab. Lett.; 1984; 2, pp. 291-293. [DOI: https://dx.doi.org/10.1016/0167-7152(84)90067-1]
120. Volkonski, V.; Rozanov, Y. Some limit theorems for random functions, Part I. Teor. Veroyatn. Primen.; 1959; 4, pp. 186-207; English translation: Theory Probab. Appl.; 1959; 4, pp. 178-197. [DOI: https://dx.doi.org/10.1137/1104015]
121. Rosenblatt, M. A central limit theorem and a strong mixing condition. Proc. Natl. Acad. Sci. USA; 1956; 42, pp. 43-47. [DOI: https://dx.doi.org/10.1073/pnas.42.1.43]
122. Davydov, J.A. Mixing conditions for Markov chains. Teor. Verojatnost. I Primenen.; 1973; 18, pp. 321-338. [DOI: https://dx.doi.org/10.1137/1118033]
123. Bolthausen, E. The Berry-Esseen theorem for functionals of discrete Markov chains. Z. Wahrsch. Verw. Geb.; 1980; 54, pp. 59-73. [DOI: https://dx.doi.org/10.1007/BF00535354]
124. Bouzebda, S.; Soukarieh, I. Renewal type bootstrap for U-process Markov chains. Markov Process. Relat. Fields; 2022; 28, pp. 673-735.
125. Akaike, H. An approximation to the density function. Ann. Inst. Statist. Math. Tokyo; 1954; 6, pp. 127-132. [DOI: https://dx.doi.org/10.1007/BF02900741]
126. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Statist.; 1962; 33, pp. 1065-1076. [DOI: https://dx.doi.org/10.1214/aoms/1177704472]
127. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist.; 1956; 27, pp. 832-837. [DOI: https://dx.doi.org/10.1214/aoms/1177728190]
128. Devroye, L. A Course in Density Estimation; Progress in Probability and Statistics; Birkhäuser Boston, Inc.: Boston, MA, USA, 1987; Volume 14, xx+183.
129. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1986; x+175. [DOI: https://dx.doi.org/10.1007/978-1-4899-3324-9]
130. Bouzebda, S. General tests of conditional independence based on empirical processes indexed by functions. Jpn. J. Stat. Data Sci.; 2023; 6, pp. 115-177. [DOI: https://dx.doi.org/10.1007/s42081-023-00193-3]
131. Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat.; 2023; 52, pp. 1303-1348. [DOI: https://dx.doi.org/10.15672/hujms.1134334]
132. Bouzebda, S.; Taachouche, N. Oracle inequalities and upper bounds for kernel conditional U-statistics estimators on manifolds and more general metric spaces associated with operators. Stochastics; 2024; pp. 1-64. [DOI: https://dx.doi.org/10.1080/17442508.2024.2391898]
133. Bouzebda, S. Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics; 2024; 12, 1996. [DOI: https://dx.doi.org/10.3390/math12131996]
134. Vapnik, V.N.; Chervonenkis, A.J. The uniform convergence of frequencies of the appearance of events to their probabilities. Teor. Verojatnost. I Primenen.; 1971; 16, pp. 264-279.
135. Kolmogorov, A.N.; Tikhomirov, V.M. ε-entropy and ε-capacity of sets in function spaces. Uspekhi Mat. Nauk; 1959; 14, pp. 3-86.
136. Dudley, R.M. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal.; 1967; 1, pp. 290-330. [DOI: https://dx.doi.org/10.1016/0022-1236(67)90017-1]
137. Mayer-Wolf, E.; Zeitouni, O. The probability of small Gaussian ellipsoids and associated conditional moments. Ann. Probab.; 1993; 21, pp. 14-24. [DOI: https://dx.doi.org/10.1214/aop/1176989391]
138. Bogachev, V.I. Gaussian Measures; Mathematical Surveys and Monographs; American Mathematical Society: Providence, RI, USA, 1998; Volume 62, xii+433. [DOI: https://dx.doi.org/10.1090/surv/062]
139. Li, W.V.; Shao, Q.M. Gaussian processes: Inequalities, small ball probabilities and applications. Stochastic Processes: Theory and Methods; North-Holland: Amsterdam, The Netherlands, 2001; Volume 19, pp. 533-597. [DOI: https://dx.doi.org/10.1016/S0169-7161(01)19019-X]
140. Ferraty, F.; Mas, A.; Vieu, P. Nonparametric regression on functional data: Inference and practical aspects. Aust. N. Z. J. Stat.; 2007; 49, pp. 267-286. [DOI: https://dx.doi.org/10.1111/j.1467-842X.2007.00480.x]
141. Deheuvels, P. One bootstrap suffices to generate sharp uniform bounds in functional estimation. Kybernetika; 2011; 47, pp. 855-865.
142. Soukarieh, I.; Bouzebda, S. Weak convergence of the conditional U-statistics for locally stationary functional time series. Stat. Inference Stoch. Process.; 2024; 27, pp. 227-304. [DOI: https://dx.doi.org/10.1007/s11203-023-09305-y]
143. Park, H.; Stefanski, L.A. Relative-error prediction. Statist. Probab. Lett.; 1998; 40, pp. 227-236. [DOI: https://dx.doi.org/10.1016/S0167-7152(98)00088-1]
144. Jones, M.C.; Park, H.; Shin, K.I.; Vines, S.K.; Jeong, S.O. Relative error prediction via kernel regression smoothers. J. Statist. Plann. Inference; 2008; 138, pp. 2887-2898. [DOI: https://dx.doi.org/10.1016/j.jspi.2007.11.001]
145. Demongeot, J.; Hamie, A.; Laksaci, A.; Rachdi, M. Relative-error prediction in nonparametric functional statistics: Theory and practice. J. Multivar. Anal.; 2016; 146, pp. 261-268. [DOI: https://dx.doi.org/10.1016/j.jmva.2015.09.019]
146. Bouhadjera, F.; Lemdani, M.; Ould Saïd, E. Strong uniform consistency of the local linear relative error regression estimator under left truncation. Statist. Pap.; 2023; 64, pp. 421-447. [DOI: https://dx.doi.org/10.1007/s00362-022-01325-9]
147. Bouhadjera, F.; Ould Saïd, E. Strong consistency of the local linear relative regression estimator for censored data. Opusc. Math.; 2022; 42, pp. 805-832. [DOI: https://dx.doi.org/10.7494/OpMath.2022.42.6.805]
148. Dehling, H.; Wendler, M. Central limit theorem and the bootstrap for U-statistics of strongly mixing data. J. Multivar. Anal.; 2010; 101, pp. 126-137. [DOI: https://dx.doi.org/10.1016/j.jmva.2009.06.002]
149. Yoshihara, K.I. Limiting behavior of U-statistics for stationary, absolutely regular processes. Z. Wahrscheinlichkeitstheorie Und Verw. Geb.; 1976; 35, pp. 237-252. [DOI: https://dx.doi.org/10.1007/BF00532676]
150. Han, F. An exponential inequality for U-statistics under mixing conditions. J. Theoret. Probab.; 2018; 31, pp. 556-578. [DOI: https://dx.doi.org/10.1007/s10959-016-0722-4]
151. Merlevède, F.; Peligrad, M.; Rio, E. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Relat. Fields; 2011; 151, pp. 435-474. [DOI: https://dx.doi.org/10.1007/s00440-010-0304-9]
152. Kudraszow, N.L.; Vieu, P. Uniform consistency of kNN regressors for functional variables. Statist. Probab. Lett.; 2013; 83, pp. 1863-1870. [DOI: https://dx.doi.org/10.1016/j.spl.2013.04.017]
153. Arcones, M.A. A Bernstein-type inequality for U-statistics and U-processes. Statist. Probab. Lett.; 1995; 22, pp. 239-247. [DOI: https://dx.doi.org/10.1016/0167-7152(94)00072-G]
154. de la Peña, V.H. Decoupling and Khintchine’s inequalities for U-statistics. Ann. Probab.; 1992; 20, pp. 1877-1892. [DOI: https://dx.doi.org/10.1214/aop/1176989533]
155. Bouzebda, S.; Nezzal, A. Uniform in number of neighbors consistency and weak convergence of kNN empirical conditional processes and kNN conditional U-processes involving functional mixing data. AIMS Math.; 2024; 9, pp. 4427-4550. [DOI: https://dx.doi.org/10.3934/math.2024218]
156. Mason, D.M. Proving consistency of non-standard kernel estimators. Stat. Inference Stoch. Process.; 2012; 15, pp. 151-176. [DOI: https://dx.doi.org/10.1007/s11203-012-9068-4]
157. Novo, S.; Aneiros, G.; Vieu, P. A kNN procedure in semiparametric functional data analysis. Statist. Probab. Lett.; 2021; 171, 109028. [DOI: https://dx.doi.org/10.1016/j.spl.2020.109028]
158. Dudley, R.M. A course on empirical processes. École D’été de Probabilités de Saint-Flour, XII—1982; Springer: Berlin/Heidelberg, Germany, 1984; Volume 1097, pp. 1-142. [DOI: https://dx.doi.org/10.1007/BFb0099432]
159. Polonik, W.; Yao, Q. Set-indexed conditional empirical and quantile processes based on dependent data. J. Multivar. Anal.; 2002; 80, pp. 234-255. [DOI: https://dx.doi.org/10.1006/jmva.2001.1988]
160. Kendall, M.G. A New Measure of Rank Correlation. Biometrika; 1938; 30, pp. 81-93. [DOI: https://dx.doi.org/10.1093/biomet/30.1-2.81]
161. Stute, W. Universally consistent conditional U-statistics. Ann. Statist.; 1994; 22, pp. 460-473. [DOI: https://dx.doi.org/10.1214/aos/1176325378]
162. Stute, W. Lp-convergence of conditional U-statistics. J. Multivar. Anal.; 1994; 51, pp. 71-82. [DOI: https://dx.doi.org/10.1006/jmva.1994.1050]
163. Lehmann, E.L. A general concept of unbiasedness. Ann. Math. Stat.; 1951; 22, pp. 587-592. [DOI: https://dx.doi.org/10.1214/aoms/1177729549]
164. Dwass, M. The large-sample power of rank order tests in the two-sample problem. Ann. Math. Statist.; 1956; 27, pp. 352-374. [DOI: https://dx.doi.org/10.1214/aoms/1177728263]
165. Kohler, M.; Máthé, K.; Pintér, M. Prediction from randomly right censored data. J. Multivar. Anal.; 2002; 80, pp. 73-100. [DOI: https://dx.doi.org/10.1006/jmva.2000.1973]
166. Carbonez, A.; Györfi, L.; van der Meulen, E.C. Partitioning-estimates of a regression function under random censoring. Statist. Decis.; 1995; 13, pp. 21-37. [DOI: https://dx.doi.org/10.1524/strm.1995.13.1.21]
167. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Statist. Assoc.; 1958; 53, pp. 457-481. [DOI: https://dx.doi.org/10.1080/01621459.1958.10501452]
168. Maillot, B.; Viallon, V. Uniform limit laws of the logarithm for nonparametric estimators of the regression function in presence of censored data. Math. Methods Statist.; 2009; 18, pp. 159-184. [DOI: https://dx.doi.org/10.3103/S1066530709020045]
169. Datta, S.; Bandyopadhyay, D.; Satten, G.A. Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scand. J. Stat.; 2010; 37, pp. 680-700. [DOI: https://dx.doi.org/10.1111/j.1467-9469.2010.00697.x]
170. Stute, W.; Wang, J.L. Multi-sample U-statistics for censored data. Scand. J. Statist.; 1993; 20, pp. 369-374.
171. Chen, Y.; Datta, S. Adjustments of multi-sample U-statistics to right censored data and confounding covariates. Comput. Statist. Data Anal.; 2019; 135, pp. 1-14. [DOI: https://dx.doi.org/10.1016/j.csda.2019.01.012]
172. Yuan, A.; Giurcanu, M.; Luta, G.; Tan, M.T. U-statistics with conditional kernels for incomplete data models. Ann. Inst. Statist. Math.; 2017; 69, pp. 271-302. [DOI: https://dx.doi.org/10.1007/s10463-015-0537-6]
173. Földes, A.; Rejto, L. A LIL type result for the product limit estimator. Z. Wahrsch. Verw. Geb.; 1981; 56, pp. 75-86. [DOI: https://dx.doi.org/10.1007/BF00531975]
174. Tsai, W.Y.; Jewell, N.P.; Wang, M.C. A note on the product-limit estimator under right censoring and left truncation. Biometrika; 1987; 74, pp. 883-886. [DOI: https://dx.doi.org/10.1093/biomet/74.4.883]
175. Andersen, P.K.; Borgan, Ø.; Gill, R.D.; Keiding, N. Statistical Models Based on Counting Processes; Springer Series in Statistics Springer: New York, NY, USA, 1993; xii+767. [DOI: https://dx.doi.org/10.1007/978-1-4612-4348-9]
176. Zhou, Y.; Yip, P.S.F. A strong representation of the product-limit estimator for left truncated and right censored data. J. Multivar. Anal.; 1999; 69, pp. 261-280. [DOI: https://dx.doi.org/10.1006/jmva.1998.1806]
177. Satten, G.A.; Kong, M.; Datta, S. Multisample adjusted U-statistics that account for confounding covariates. Stat. Med.; 2018; 37, pp. 3357-3372. [DOI: https://dx.doi.org/10.1002/sim.7825]
178. Rosenbaum, P.R. A new u-statistic with superior design sensitivity in matched observational studies. Biometrics; 2011; 67, pp. 1017-1027. [DOI: https://dx.doi.org/10.1111/j.1541-0420.2010.01535.x]
179. Zhang, W.; Jin, B.; Bai, Z. Learning block structures in U-statistic-based matrices. Biometrika; 2021; 108, pp. 933-946. [DOI: https://dx.doi.org/10.1093/biomet/asaa099]
180. Wei, C.; Elston, R.C.; Lu, Q. A weighted U statistic for association analyses considering genetic heterogeneity. Stat. Med.; 2016; 35, pp. 2802-2814. [DOI: https://dx.doi.org/10.1002/sim.6877] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26833871]
181. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer Series in Statistics Springer: New York, NY, USA, 1996; xvi+508. [DOI: https://dx.doi.org/10.1007/978-1-4757-2545-2]
182. Kosorok, M.R. Introduction to Empirical Processes and Semiparametric Inference; Springer Series in Statistics Springer: New York, NY, USA, 2008; xiv+483. [DOI: https://dx.doi.org/10.1007/978-0-387-74978-5]
183. van der Vaart, A. New Donsker classes. Ann. Probab.; 1996; 24, pp. 2128-2140. [DOI: https://dx.doi.org/10.1214/aop/1041903221]
184. Blum, J.R.; Kiefer, J.; Rosenblatt, M. Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist.; 1961; 32, pp. 485-498. [DOI: https://dx.doi.org/10.1214/aoms/1177705055]
185. Bouzebda, S.; Keziou, A.; Zari, T. K-sample problem using strong approximations of empirical copula processes. Math. Methods Statist.; 2011; 20, pp. 14-29. [DOI: https://dx.doi.org/10.3103/S1066530711010029]
186. Bouzebda, S.; Keziou, A. New estimates and tests of independence in semiparametric copula models. Kybernetika; 2010; 46, pp. 178-201.
187. Bouzebda, S.; Keziou, A. A new test procedure of independence in copula models via χ2-divergence. Comm. Statist. Theory Methods; 2010; 39, pp. 1-20. [DOI: https://dx.doi.org/10.1080/03610920802645379]
188. Bergsma, W.; Dassios, A. A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli; 2014; 20, pp. 1006-1028. [DOI: https://dx.doi.org/10.3150/13-BEJ514]
189. Borovkova, S.; Burton, R.; Dehling, H. Consistency of the Takens estimator for the correlation dimension. Ann. Appl. Probab.; 1999; 9, pp. 376-390. [DOI: https://dx.doi.org/10.1214/aoap/1029962747]
190. Silverman, B.W. Distances on circles, toruses and spheres. J. Appl. Probab.; 1978; 15, pp. 136-143. [DOI: https://dx.doi.org/10.2307/3213243]
191. Hollander, M.; Proschan, F. Testing whether new is better than used. Ann. Math. Statist.; 1972; 43, pp. 1136-1146. [DOI: https://dx.doi.org/10.1214/aoms/1177692466]
192. Hall, P. Asymptotic properties of integrated square error and cross-validation for kernel estimation of a regression function. Z. Wahrsch. Verw. Geb.; 1984; 67, pp. 175-196. [DOI: https://dx.doi.org/10.1007/BF00535267]
193. Härdle, W.; Marron, J.S. Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist.; 1985; 13, pp. 1465-1481. [DOI: https://dx.doi.org/10.1214/aos/1176349748]
194. Rachdi, M.; Vieu, P. Nonparametric regression for functional data: Automatic smoothing parameter selection. J. Statist. Plann. Inference; 2007; 137, pp. 2784-2801. [DOI: https://dx.doi.org/10.1016/j.jspi.2006.10.001]
195. Dony, J.; Mason, D.M. Uniform in bandwidth consistency of conditional U-statistics. Bernoulli; 2008; 14, pp. 1108-1133. [DOI: https://dx.doi.org/10.3150/08-BEJ136]
196. Marron, J.S. An asymptotically efficient solution to the bandwidth problem of kernel density estimation. Ann. Statist.; 1985; 13, pp. 1011-1023. [DOI: https://dx.doi.org/10.1214/aos/1176349653]
197. Vieu, P. Nonparametric regression: Optimal local bandwidth choice. J. Roy. Statist. Soc. Ser. B; 1991; 53, pp. 453-464. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1991.tb01837.x]
198. Dedecker, J.; Louhichi, S. Maximal inequalities and empirical central limit theorems. Empirical Process Techniques for Dependent Data; Birkhäuser Boston: Boston, MA, USA, 2002; pp. 137-159.
199. Heinrich, L. Bounds for the absolute regularity coefficient of a stationary renewal process. Yokohama Math. J.; 1992; 40, pp. 25-33.
200. Robinson, P.M. Large-sample inference for nonparametric regression with dependent errors. Ann. Statist.; 1997; 25, pp. 2054-2083. [DOI: https://dx.doi.org/10.1214/aos/1069362387]
201. Yajima, Y. On estimation of a regression model with long-memory stationary errors. Ann. Statist.; 1988; 16, pp. 791-807. [DOI: https://dx.doi.org/10.1214/aos/1176350837]
202. Toussoun, O. Mémoire sur L’histoire du Nil; Mémoires de l’Institut d’Egypte, Institut d’Egypte: Cairo, Egypt, 1925; Volume 9.
203. Karlsen, H.A.; Tjøstheim, D. Nonparametric estimation in null recurrent time series. Ann. Statist.; 2001; 29, pp. 372-416. [DOI: https://dx.doi.org/10.1214/aos/1009210546]
204. Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods; 2017; 46, pp. 2454-2493. [DOI: https://dx.doi.org/10.1080/03610926.2015.1048882]
205. Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods; 2017; 46, pp. 1367-1406. [DOI: https://dx.doi.org/10.1080/03610926.2015.1019144]
206. Bouanani, O.; Bouzebda, S. Limit theorems for local polynomial estimation of regression for functional dependent data. AIMS Math.; 2024; 9, pp. 23651-23691. [DOI: https://dx.doi.org/10.3934/math.20241150]
207. Lasota, A. Dynamical Systems on Measures [Układy dynamiczne na miarach]; Wydawnictwa Uniwersytetu Śląskiego: Katowice, Poland, 2008; (In Polish)
208. Mitkowski, P.J. Mathematical Structures of Ergodicity and Chaos in Population Dynamics; Studies in Systems, Decision and Control Springer: Cham, Switzerland, 2021; Volume 312, xii+97. [DOI: https://dx.doi.org/10.1007/978-3-030-57678-3]
209. Cornfeld, I.P.; Fomin, S.V.; Sinai, Y.G. Ergodic Theory; Transl. from the Russian by A.B. Sossinskii; Grundlehren Math. Wiss. Springer: Cham, Switzerland, 1982; Volume 245.
210. Bouzebda, S.; Limnios, N. On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J. Multivar. Anal.; 2013; 116, pp. 52-62. [DOI: https://dx.doi.org/10.1016/j.jmva.2012.11.008]
211. Cai, G.; Shehu, Y.; Iyiola, O.S. Inertial Tseng’s extragradient method for solving variational inequality problems of pseudo-monotone and non-Lipschitz operators. J. Ind. Manag. Optim.; 2022; 18, pp. 2873-2902. [DOI: https://dx.doi.org/10.3934/jimo.2021095]
212. Song, Y.; Ullah, M.Z. On a sparse and stable solver on graded meshes for solving high-dimensional parabolic pricing PDEs. Comput. Math. Appl.; 2023; 143, pp. 224-233. [DOI: https://dx.doi.org/10.1016/j.camwa.2023.05.008]
213. Mohammedi, M.; Bouzebda, S.; Laksaci, A.; Bouanani, O. Asymptotic normality of the k-NN single index regression estimator for functional weak dependence data. Comm. Statist. Theory Methods; 2024; 53, pp. 3143-3168. [DOI: https://dx.doi.org/10.1080/03610926.2022.2150823]
214. Giné, E.; Zinn, J. Some limit theorems for empirical processes. Ann. Probab.; 1984; 12, pp. 929-998. [DOI: https://dx.doi.org/10.1214/aop/1176993138]
215. LeCam, L. A remark on empirical measures. A Festschrift for Erich Lehmann in Honor of His Sixty-Fifth Birthday; Wadsworth Statist./Probab. Ser. Wadsworth: Belmont, CA, USA, 1983; pp. 305-327.
216. Davydov, J.A. Convergence of distributions generated by stationary stochastic processes. Theory Probab. Appl.; 1968; 13, pp. 691-696. [DOI: https://dx.doi.org/10.1137/1113086]
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
U-statistics are fundamental in modeling statistical measures that involve responses from multiple subjects. They generalize the empirical mean of a random variable X to summations over every m-tuple of distinct observations of X. W. Stute introduced conditional U-statistics, extending the Nadaraya–Watson estimator of the regression function, and demonstrated their strong pointwise consistency with the conditional expectation
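The construction summarized above — averaging a symmetric kernel over all m-tuples of distinct observations — can be sketched in a few lines of Python. The function name `u_statistic` and the variance kernel below are illustrative choices, not notation from the paper; the variance example corresponds to the classical fact that the order-2 kernel h(x, y) = (x − y)²/2 yields the unbiased sample variance.

```python
import itertools
import statistics

def u_statistic(sample, kernel, m=2):
    """Order-m U-statistic: the average of `kernel` over all
    m-element subsets of distinct observations in `sample`."""
    combos = list(itertools.combinations(sample, m))
    return sum(kernel(*c) for c in combos) / len(combos)

# With the kernel h(x, y) = (x - y)**2 / 2, the order-2 U-statistic
# is an unbiased estimator of the variance and coincides with the
# usual sample variance (denominator n - 1).
data = [1.0, 2.0, 3.0, 4.0]
u_var = u_statistic(data, lambda x, y: (x - y) ** 2 / 2)
assert abs(u_var - statistics.variance(data)) < 1e-12
```

Other classical examples fit the same template by swapping the kernel, e.g. Gini's mean difference with h(x, y) = |x − y|, or Kendall's tau on paired observations with a sign kernel.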