Introduction
For many parametric distributions, so-called Stein identities are available, which rely on moments of functional expressions of a corresponding random variable. These identities are named after Charles Stein, who developed the idea of uniquely characterizing a distribution family by such a moment identity [see 21, 22]. Many examples of both continuous and discrete distributions together with their Stein characterizations can be found in Stein et al. [23], Sudheesh [24], Sudheesh and Tibiletti [25], Landsman and Valdez [15], Weiß and Aleksandrov [30], Anastasiou et al. [1] and the references therein. Stein identities are not merely a tool of probability theory. In recent years, there has also been considerable research activity on statistical applications of Stein identities, for example to goodness-of-fit (GoF) tests [4, 31], sequential change-point monitoring by control charts [29], and shrinkage estimation [14]. For references on these and further statistical applications related to Stein’s method, see Betsch et al. [4], Anastasiou et al. [1], Kubokawa [14]. In the present article, a further statistical application of Stein identities is investigated and exemplified, namely the development of closed-form parameter estimators for continuous or discrete distributions. The idea of constructing generalized types of method-of-moments (MM) estimators based on an appropriate type of Stein identity plus weighting function, referred to as Stein-MM estimators, was first explored in some applications by Arnold et al. [2] and Wang and Weiß [28]. Recently, Ebner et al. [6] discussed the Stein-MM approach in a much broader way, and also the present article provides a comprehensive treatment of Stein-MM estimators for various distributions. The main motivation for considering Stein-MM estimation is that the weighting function might be chosen in such a way that the resulting estimator shows better properties (e. g., a reduced bias or mean squared error (MSE)) than the default MM estimator or other existing estimators. Despite the additional flexibility offered by the weighting function, the Stein-MM estimators are computed from simple closed-form expressions, and consistency and asymptotic normality are easily established, also see Ebner et al. [6].
In what follows, we apply the proposed Stein-MM estimation to three different distribution families. We start with the illustrative example of the exponential (Exp) distribution in Sect. 2. This simple one-parameter distribution mainly serves to demonstrate the general approach for deriving the Stein-MM estimator and its asymptotics, and it also indicates the potential benefits of using the Stein-MM approach for parameter estimation. Afterward, in Sect. 3, we examine a more sophisticated type of continuous distribution, namely the two-parameter inverse Gaussian (IG) distribution. In Sect. 4, we then turn to a discrete distribution family, namely the two-parameter negative-binomial (NB) distribution. Illustrative real-world data examples are also presented in Sects. 3–4. Note that neither the exponential distribution nor any discrete distribution has been considered by Ebner et al. [6], and their approach to the Stein-MM estimation of the IG-distribution differs from the one proposed here, see the details below. Neither did Arnold et al. [2] nor Wang and Weiß [28] discuss any of the aforementioned distributions. Finally, we conclude in Sect. 5 and outline topics for future research.
Stein Estimation of Exponential Distribution
The exponential distribution is the most well-known lifetime distribution, which is characterized by the property of being memory-less. It has positive support and depends on the parameter $\lambda > 0$, where its probability density function (pdf) is given by $f(x) = \lambda\,e^{-\lambda x}$ for $x > 0$ and zero otherwise. A detailed survey about the properties of and estimators for the $\mathrm{Exp}(\lambda)$-distribution can be found in Johnson et al. [10, Chapter 19]. Given the independent and identically distributed (i. i. d.) sample $X_1,\ldots,X_n$ with $X_i \sim \mathrm{Exp}(\lambda)$ for $i = 1,\ldots,n$, the default estimator of $\lambda$, which is an MM estimator and the maximum likelihood (ML) estimator at the same time, is given by $\hat\lambda = 1/\bar X$, where $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ denotes the sample mean. This estimator is known to be neither unbiased nor optimal in terms of the MSE, see Elfessi & Reineke [7]. To derive a generalized MM estimator with perhaps improved bias or MSE properties, we consider the exponential Stein identity according to Stein et al. [23, Example 1.6], which states that
$$\mathbb{E}\big[f'(X)\big] \;=\; \lambda\,\mathbb{E}\big[f(X)\big] \qquad (2.1)$$
for any piecewise differentiable function $f$ with $f(0) = 0$ such that $\mathbb{E}|f(X)|$, $\mathbb{E}|f'(X)|$ exist. Solving (2.1) in $\lambda$ and using the sample moments $\overline{f'(X)} = \frac{1}{n}\sum_{i=1}^n f'(X_i)$ and $\overline{f(X)} = \frac{1}{n}\sum_{i=1}^n f(X_i)$ instead of the population moments, the class of Stein-MM estimators for $\lambda$ is obtained as
$$\hat\lambda_f \;=\; \overline{f'(X)}\,\big/\,\overline{f(X)}. \qquad (2.2)$$
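To make the construction concrete, the following sketch implements (2.2) in Python; the function names and the simulation settings are our own illustrative choices, not part of the original derivation.

```python
import numpy as np

def stein_mm_exp(x, f, fprime):
    """Stein-MM estimator (2.2) of the exponential rate lambda,
    for a weight function f with f(0) = 0 and derivative fprime."""
    x = np.asarray(x, dtype=float)
    return np.mean(fprime(x)) / np.mean(f(x))

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=50)   # Exp(1) sample (scale = 1/lambda)

# f(x) = x recovers the default MM/ML estimator 1/mean(x):
lam_default = stein_mm_exp(x, lambda t: t, lambda t: np.ones_like(t))
# a sublinear weight, here f(x) = x**0.9:
lam_power = stein_mm_exp(x, lambda t: t**0.9, lambda t: 0.9 * t**(-0.1))
```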
Note that the choice $f(x) = x$ leads to the default estimator $\hat\lambda = 1/\bar X$. Generally, $f$ might be interpreted as a kind of weighting function, which assigns different weights to large or low values of $X$ than the identity function does. For deriving the asymptotic distribution of the general Stein-MM estimator $\hat\lambda_f$, we first define the vectors $\mathbf{Z}_1,\ldots,\mathbf{Z}_n$ with
$$\mathbf{Z}_i \;=\; \big(f(X_i),\; f'(X_i)\big)^{\!\top}, \quad i = 1,\ldots,n. \qquad (2.3)$$
Their mean equals
$$\boldsymbol{\mu} \;:=\; \mathbb{E}[\mathbf{Z}_i] \;=\; \big(\mu_f,\; \lambda\,\mu_f\big)^{\!\top}, \qquad (2.4)$$
where we define $\mu_g := \mathbb{E}[g(X)]$ for any function $g$. Then, the following central limit theorem (CLT) holds.

Theorem 2.1
If $X_1,\ldots,X_n$ are i. i. d. according to $\mathrm{Exp}(\lambda)$, then the sample mean $\bar{\mathbf{Z}} = \frac{1}{n}\sum_{i=1}^n \mathbf{Z}_i$ of the vectors according to (2.3) is asymptotically normally distributed as
$$\sqrt{n}\,\big(\bar{\mathbf{Z}} - \boldsymbol{\mu}\big) \;\xrightarrow{d}\; \mathcal{N}\big(\mathbf{0},\, \boldsymbol{\Sigma}\big),$$
where $\mathcal{N}$ denotes the multivariate normal distribution, and where the covariances of $\boldsymbol{\Sigma} = (\sigma_{ij})$ are given as
$$\sigma_{11} = \mathbb{E}\big[f^2(X)\big] - \mu_f^2, \quad \sigma_{12} = \mathbb{E}\big[f(X)\,f'(X)\big] - \lambda\,\mu_f^2, \quad \sigma_{22} = \mathbb{E}\big[\big(f'(X)\big)^2\big] - \lambda^2\mu_f^2.$$
The proof of Theorem 2.1 is provided by Appendix A.1. In the second step of deriving the asymptotics of $\hat\lambda_f$, we define the function $g(z_1, z_2) = z_2/z_1$. Then, $g(\boldsymbol{\mu}) = \lambda$ and $\hat\lambda_f = g(\bar{\mathbf{Z}})$. Applying the Delta method [18] to Theorem 2.1, the following result follows.
Theorem 2.2
If $X_1,\ldots,X_n$ are i. i. d. according to $\mathrm{Exp}(\lambda)$, then $\hat\lambda_f$ is asymptotically normally distributed, where the asymptotic variance and bias are given by
$$\sigma_f^2 \;=\; \frac{\lambda^2\sigma_{11} - 2\lambda\,\sigma_{12} + \sigma_{22}}{\mu_f^2} \qquad\text{and}\qquad \mathbb{E}\big[\hat\lambda_f\big] - \lambda \;\approx\; \frac{\lambda\,\sigma_{11} - \sigma_{12}}{n\,\mu_f^2}.$$
The proof of Theorem 2.2 is provided by Appendix A.2. Note that the moments involved in Theorems 2.1 and 2.2 can sometimes be derived explicitly, see the subsequent examples, while they can be computed by using numerical integration otherwise.
After having derived the asymptotic variance and bias without explicitly specifying the function f, let us now consider some special cases of this weighting function. Here, our general strategy is as follows. We first specify a parametric class of functions, where the actual parameter value(s) are determined only in a second step. In this second step, we compute asymptotic properties of the Stein-MM estimator such as bias, variance, and MSE, and then we select the parameter value(s) such that some of the aforementioned properties are minimized within the considered class of functions. Detailed examples are presented below. The chosen parametric class of functions should be sufficiently flexible in the sense that, by modifying its parameter value(s), it should be possible to move the weight to quite different regions of the real numbers. Its choice may also be guided by the aim of covering existing parameter estimators as special cases. For the exponential distribution discussed here in Sect. 2, we already pointed out that the default ML and MM estimator corresponds to $f(x) = x$, so a parametric class of functions including $f(x) = x$ appears reasonable. This leads to our first illustrative example, namely the choice $f(x) = x^a$, where $a = 1$ leads to the default estimator $\hat\lambda = 1/\bar X$. Here, we have to restrict to $a > 1/2$ to ensure that the moment conditions required by (2.1) and Theorem 2.2 hold. Using that
$$\mathbb{E}\big[X^{s}\big] \;=\; \Gamma(s+1)\big/\lambda^{s} \quad\text{for } s > -1, \qquad (2.5)$$
the following corollary to Theorem 2.2 is derived.

Corollary 2.3
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{Exp}(\lambda)$, and let $f(x) = x^a$ with $a > 1/2$. Then, $\hat\lambda_a$ is asymptotically normally distributed, where the asymptotic variance and bias are given by
$$\sigma_a^2 \;=\; \frac{a}{2\,(2a-1)}\binom{2a}{a}\,\lambda^2 \qquad\text{and}\qquad \mathbb{E}\big[\hat\lambda_a\big] - \lambda \;\approx\; \frac{1}{2n}\binom{2a}{a}\,\lambda.$$
Furthermore, the MSE equals
$$\mathrm{MSE}\big[\hat\lambda_a\big] \;\approx\; \frac{\lambda^2}{n}\,\binom{2a}{a}\left(\frac{a}{2\,(2a-1)} + \frac{1}{4n}\binom{2a}{a}\right).$$
The proof of (2.5) and Corollary 2.3 is provided by Appendix A.3. Note that in Corollary 2.3, $\binom{2a}{a}$ denotes the generalized binomial coefficient given by $\binom{2a}{a} = \Gamma(2a+1)\big/\Gamma(a+1)^2$.
Fig. 1 (image not available): Plot of (a) asymptotic variance and bias of $\hat\lambda_a$, and (b) MSE for different $a$ and $n$. Points indicate minimal MSE values. Dotted line at $a = 1$ corresponds to the default estimator.
In Fig. 1a, the asymptotic variance and bias of $\hat\lambda_a$ according to Corollary 2.3 are presented. While the variance is minimal for $a = 1$ (i. e., for the ordinary MM and ML estimator), the bias decreases with decreasing $a$ (i. e., bias reductions are achieved for sublinear choices of $f$). Hence, an MSE-optimal choice of $a$ is obtained for some $a < 1$. This is illustrated by Fig. 1b, where the MSE of Corollary 2.3 is presented for different sample sizes $n$. The corresponding optimal values of $a$ are determined by numerical optimization as 0.952, 0.978, 0.988, and 0.994, respectively. As a result, especially for small $n$, we achieve a reduction of the MSE (and of the bias as well) if using a “true” Stein-MM estimator (i. e., with $a \neq 1$). Certainly, if the focus is mainly on bias reduction, then an even smaller choice of $a$ would be beneficial.
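The numerical determination of the MSE-optimal a can be sketched as follows, based on the MSE expression of Corollary 2.3; the helper name and the search bracket are our own illustrative choices. Since $\lambda^2$ only enters as a factor, it can be dropped from the objective, so the optimal a depends on n alone.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def rel_mse(a, n):
    """MSE of the Stein-MM estimator divided by lambda^2 (Corollary 2.3)."""
    binom2a = np.exp(gammaln(2*a + 1) - 2*gammaln(a + 1))  # generalized binom(2a, a)
    return binom2a * (a/(2*(2*a - 1)) + binom2a/(4*n)) / n

for n in (10, 25, 50, 100):
    opt = minimize_scalar(rel_mse, bounds=(0.55, 1.5), args=(n,), method="bounded")
    print(n, round(opt.x, 3))   # reproduces approx. 0.952, 0.978, 0.988, 0.994
```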
As a second illustrative example, let us consider the functions $f_u(x) = 1 - u^x$ with $u \in (0,1)$, which are again sublinear, but this time also bounded from above by one. Again, we can derive a corollary to Theorem 2.2, this time by using the moment formula
$$\mathbb{E}\big[u^{X}\big] \;=\; \frac{\lambda}{\lambda - \ln u} \quad\text{for } u \in (0,1). \qquad (2.6)$$
Corollary 2.4
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{Exp}(\lambda)$, and let $f_u(x) = 1 - u^x$ with $u \in (0,1)$. Then, $\hat\lambda_u$ is asymptotically normally distributed, where the asymptotic variance and bias are given by
$$\sigma_u^2 \;=\; \frac{\lambda\,(\lambda - \ln u)^2}{\lambda - 2\ln u} \qquad\text{and}\qquad \mathbb{E}\big[\hat\lambda_u\big] - \lambda \;\approx\; \frac{\lambda\,(\lambda - \ln u)}{n\,(\lambda - 2\ln u)}.$$
Furthermore, the MSE equals
$$\mathrm{MSE}\big[\hat\lambda_u\big] \;\approx\; \frac{\lambda\,(\lambda - \ln u)^2}{n\,(\lambda - 2\ln u)} \;+\; \frac{\lambda^2\,(\lambda - \ln u)^2}{n^2\,(\lambda - 2\ln u)^2}.$$
The proof of (2.6) and Corollary 2.4 is provided by Appendix A.4.
Fig. 2 (image not available): Plot of the MSE of $\hat\lambda_u$, for (a) different $n$ and fixed $\lambda$, and (b) different $\lambda$ and fixed $n$. Points indicate minimal MSE values.
This time, the variance decreases for increasing $u$, whereas the bias decreases for decreasing $u$. As a consequence, an MSE-optimal choice is expected for some $u$ inside the interval $(0;1)$. This is illustrated by Fig. 2a, where the minima are attained for values of $u$ close to one (0.963, 0.981, and 0.990 for the three largest sample sizes, respectively). The major difference between the two types of weighting functions in Corollaries 2.3 and 2.4 is the role of $\lambda$ within the expression for the MSE. For $f(x) = x^a$ in Corollary 2.3, $\lambda^2$ occurs as a simple factor such that the optimal choice of $a$ is the same across different $\lambda$. Hence, the optimal $a$ is simply a function of the sample size $n$, which is very attractive for applications in practice. For $f_u$ in Corollary 2.4, by contrast, the MSE depends in a more sophisticated way on $\lambda$, and the optimal $u$ differs for different $\lambda$ as illustrated by Fig. 2b. Thus, if one wants to use the weighting function $f_u$ in practice, a two-step procedure appears reasonable, where an initial estimate is computed via $f(x) = x$, which is then refined by $f_u$ with $u$ being determined by plugging the initial estimate into the MSE formula instead of $\lambda$ (also see Section 2.2 in Ebner et al. [6] for an analogous idea), as sketched below.
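A minimal sketch of this two-step procedure, using the MSE expression of Corollary 2.4; all function names and the optimization bracket for u are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mse_u(u, lam, n):
    """Asymptotic MSE of the Stein-MM estimator for f_u(x) = 1 - u**x (Corollary 2.4)."""
    c = -np.log(u)
    return lam*(lam + c)**2/(n*(lam + 2*c)) + (lam*(lam + c)/(n*(lam + 2*c)))**2

def lambda_two_step(x):
    x = np.asarray(x, dtype=float)
    lam1, n = 1.0/np.mean(x), len(x)          # step 1: default estimator via f(x) = x
    u = minimize_scalar(mse_u, bounds=(1e-6, 1 - 1e-6),
                        args=(lam1, n), method="bounded").x
    # step 2: refined Stein-MM estimate with the plug-in optimal u
    return -np.log(u)*np.mean(u**x)/(1.0 - np.mean(u**x))
```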
We conclude this section by pointing out two further application scenarios for the use of Stein-MM estimators $\hat\lambda_f$. First, in analogy to recent Stein-based GoF-tests such as in Weiß et al. [31], $\hat\lambda_f$ might be used for GoF applications. More precisely, the idea could be to select a set $\mathcal{F}$ of weighting functions, and to compute $\hat\lambda_f$ for all $f \in \mathcal{F}$. As any $\hat\lambda_f$ is a consistent estimator of $\lambda$ according to Theorem 2.2, the obtained values should vary closely around $\lambda$. For other continuous distributions with positive support, such as the IG-distribution considered in the next Sect. 3, we cannot expect that $\hat\lambda_f$ has an (asymptotically) unique mean for different $f$, see Remark 3.1, so a larger variation among the values computed from $\mathcal{F}$ is expected. Such a discrepancy in variation might give rise to a formal exponential GoF-test. But as the focus of this article is on parameter estimation, we postpone a detailed investigation of this GoF application to future research.
Table 1. Empirical bias and MSE of $\hat\lambda_f$ from simulated i. i. d. exponential samples, where about 10 % of the observations were randomly selected for an additive outlier “+5”. The left four columns contain the bias, the right four the MSE; within each block, the first three columns belong to sublinear weight functions (among them $\ln(1+x)$), the fourth to the default weight $f(x) = x$.

| n | Bias |  |  | (f(x)=x) | MSE |  |  | (f(x)=x) |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.280 | 0.277 | 0.206 | 0.304 | 0.107 | 0.105 | 0.100 | 0.114 |
| 25 | 0.343 | 0.341 | 0.271 | 0.366 | 0.126 | 0.125 | 0.091 | 0.140 |
| 50 | 0.305 | 0.303 | 0.238 | 0.328 | 0.098 | 0.097 | 0.067 | 0.111 |
| 100 | 0.308 | 0.306 | 0.241 | 0.330 | 0.098 | 0.096 | 0.063 | 0.111 |
A second type of application is illustrated by Table 1, which refers to a simulation experiment. For simulated i. i. d. exponential samples of sizes $n \in \{10, 25, 50, 100\}$, about 10 % of the observations were randomly selected and contaminated by an additive outlier, namely by adding 5 to the selected observations. Note that the topic of outliers in exponential data has received considerable interest in the literature [10, pp. 528–530]. Then, different estimators $\hat\lambda_f$ are computed from the contaminated data, where the first three choices of the weighting function $f$ are characterized by a sublinear increase, whereas the fourth function, $f(x) = x$, corresponds to the default estimator. Table 1 shows that all MM estimators are affected by the outliers, e. g., in terms of the strong negative bias. But comparing the four columns of bias and MSE values, respectively, it becomes clear that the novel Stein-MM estimators are more robust against the outliers, having both lower bias and MSE than the default estimator. Especially the choice $f(x) = \ln(1+x)$, a logarithmic weighting scheme, leads to a rather robust estimator. The relatively good performance of the Stein-MM estimators can be explained by the fact that the weighting functions increase sublinearly (which is also beneficial for bias reduction in non-contaminated data, recall the above discussion), so the effect of large observations is damped. To sum up, by choosing an appropriate weighting function $f$ within the Stein-MM estimator $\hat\lambda_f$, one can not only achieve a reduced bias and MSE, but also a reduced sensitivity towards outlying observations.
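The following re-creation of this experiment illustrates the damping effect; the replication count, the value $\lambda = 1$, and the exact contamination scheme are our assumptions based on the description above.

```python
import numpy as np
rng = np.random.default_rng(123)

def stein_mm_exp(x, f, fprime):
    return np.mean(fprime(x)) / np.mean(f(x))

n, reps, lam = 50, 10000, 1.0
res = []
for _ in range(reps):
    x = rng.exponential(1.0/lam, size=n)
    idx = rng.choice(n, size=n//10, replace=False)   # contaminate 10% of the sample
    x[idx] += 5.0                                    # additive outlier "+5"
    res.append((stein_mm_exp(x, np.log1p, lambda t: 1.0/(1.0 + t)),  # f(x) = ln(1+x)
                1.0/np.mean(x)))                                     # default estimator
res = np.array(res)
print("bias:", res.mean(axis=0) - lam)
print("MSE: ", ((res - lam)**2).mean(axis=0))
```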
Stein Estimation of Inverse Gaussian Distribution
Like the exponential distribution considered in the previous Sect. 2, the IG-distribution with parameters $\mu, \lambda > 0$, abbreviated as $\mathrm{IG}(\mu,\lambda)$, has positive support, where the pdf is given by
$$f(x) \;=\; \sqrt{\frac{\lambda}{2\pi x^3}}\; \exp\!\left(-\frac{\lambda\,(x-\mu)^2}{2\mu^2 x}\right) \quad\text{for } x > 0.$$
The IG-distribution is commonly used as a lifetime model (as it can be related to the first-passage time in random walks), but it may also simply serve as a distribution with positive skewness and, thus, as an alternative to, e. g., the lognormal distribution. Detailed surveys about the properties and applications of $\mathrm{IG}(\mu,\lambda)$, and many further references, can be found in Folks & Chhikara [8], Seshadri [19] as well as in Johnson et al. [10, Chapter 15]. In what follows, the moment properties of $\mathrm{IG}(\mu,\lambda)$ are particularly relevant. We have $\mathbb{E}[X] = \mu$, $\mathrm{Var}[X] = \mu^3/\lambda$, and $\mathbb{E}[1/X] = 1/\mu + 1/\lambda$. In particular, positive and negative moments are related to each other by
$$\mathbb{E}\big[X^{-r}\big] \;=\; \mathbb{E}\big[X^{\,r+1}\big]\big/\mu^{\,2r+1} \quad\text{for } r \geq 0, \qquad (3.1)$$
see Tweedie [26, p. 372] as well as the aforementioned surveys.

Remark 3.1

At this point, let us briefly recall the discussion in Sect. 2, where we highlighted the property that for i. i. d. exponential samples, the quotient $\overline{f'(X)}\big/\overline{f(X)}$ has an (asymptotically) unique mean for different $f$. From counterexamples, it is easily seen that this property does not hold for $\mathrm{IG}(\mu,\lambda)$-data. The Delta method implies that the mean of $\overline{f'(X)}\big/\overline{f(X)}$ is asymptotically equal to $\mathbb{E}[f'(X)]\big/\mathbb{E}[f(X)]$, which equals, for instance, $1/\mu$ for $f(x) = x$, but $2\lambda\big/\big(\mu\,(\lambda+\mu)\big)$ for $f(x) = x^2$.
From now on, let $X_1,\ldots,X_n$ be an i. i. d. sample from $\mathrm{IG}(\mu,\lambda)$, which shall be used for parameter estimation. Here, one obviously estimates $\mu$ by the sample mean $\bar X$, but the estimation of $\lambda$ is more demanding. In the literature, the MM and ML estimation of $\lambda$ have been discussed (see the details below), while our aim is to derive a generalized MM estimator with improved bias and MSE properties based on a Stein identity. In fact, as we shall see, our proposed approach can be understood as a unifying framework that covers the ordinary MM and ML estimators as special cases. A Stein identity for $\mathrm{IG}(\mu,\lambda)$ has been derived by Koudou & Ley [13, p. 172], which states that
$$\lambda\,\mathbb{E}\!\left[\left(\frac{X^2}{\mu^2} - 1\right) f(X)\right] \;=\; \mathbb{E}\big[X\,f(X)\big] \;+\; 2\,\mathbb{E}\big[X^2 f'(X)\big] \qquad (3.2)$$
holds for all differentiable functions $f$ such that the involved expectations exist. Solving (3.2) in $\lambda$, using the sample moments $\overline{h(X)} = \frac{1}{n}\sum_{i=1}^n h(X_i)$ instead of $\mathbb{E}[h(X)]$ (where $h$ might be any of the functions involved in (3.2)), and estimating $\mu$ by $\bar X$, the class of Stein-MM estimators for $\lambda$ is obtained as
$$\hat\lambda_f \;=\; \frac{\overline{X\,f(X)} \,+\, 2\,\overline{X^2 f'(X)}}{\overline{X^2 f(X)}\big/\bar X^2 \,-\, \overline{f(X)}}\,. \qquad (3.3)$$
Here, the ordinary MM estimator of $\lambda$, i. e., $\hat\lambda_{\mathrm{MM}} = \bar X^3/\hat S^2$ with $\hat S^2$ denoting the empirical variance [27], is included as the special case $f(x) = 1$, whereas the ML estimator $\hat\lambda_{\mathrm{ML}} = \big(\overline{1/X} - 1/\bar X\big)^{-1}$ [26] follows for $f(x) = 1/x$. Hence, (3.3) provides a unifying estimation approach that covers the established estimators as special cases.

Remark 3.2
At this point, a reference to Example 2.9 in Ebner et al. [6] is necessary. As already mentioned in Sect. 1, also Ebner et al. [6] proposed a Stein-MM estimator for the IG-distribution, which, however, differs from the one developed here. The crucial difference is that Ebner et al. [6] attempted a joint estimation of $(\mu, \lambda)$ based on (3.2), namely by jointly solving two equations that are implied by (3.2) if using two different weight functions. The resulting class of estimators, however, does not cover the existing MM and ML estimators, so Ebner et al. [6] did not pursue the Stein-MM estimation of the IG-distribution further. By contrast, as we did not see notable potential for improving the estimation of $\mu$ by $\bar X$ (recall the diverse optimality properties of the sample mean as an estimator of the population mean [e. g., 20]), we used (3.2) to only derive an estimator for $\lambda$. In this way, we were able to recover both the MM and ML estimators of $\lambda$ within (3.3).
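For illustration, the following sketch evaluates (3.3) for arbitrary weight functions. It presupposes the form of (3.2) as reconstructed above, so it should be read as an illustrative implementation under that assumption rather than a verbatim reproduction of the authors' code.

```python
import numpy as np

def stein_mm_ig(x, f, fprime):
    """Stein-MM estimator (3.3) of the IG parameter lambda,
    with mu estimated by the sample mean."""
    x = np.asarray(x, dtype=float)
    num = np.mean(x*f(x)) + 2.0*np.mean(x**2 * fprime(x))
    den = np.mean(x**2 * f(x))/np.mean(x)**2 - np.mean(f(x))
    return num/den

rng = np.random.default_rng(7)
x = rng.wald(1.0, 3.0, size=500)     # IG(mu = 1, lambda = 3)

lam_ml = stein_mm_ig(x, lambda t: 1.0/t, lambda t: -1.0/t**2)   # f(x) = 1/x -> ML
lam_mm = stein_mm_ig(x, lambda t: np.ones_like(t),
                     lambda t: np.zeros_like(t))                # f(x) = 1   -> MM
```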
For deriving the asymptotic distribution of our general Stein-MM estimator $\hat\lambda_f$ from (3.3), we first define the vectors $\mathbf{Z}_1,\ldots,\mathbf{Z}_n$ collecting the sample-moment ingredients of (3.3), i. e.,
$$\mathbf{Z}_i \;=\; \Big(X_i,\; f(X_i),\; X_i\,f(X_i),\; X_i^2\,f(X_i),\; X_i^2\,f'(X_i)\Big)^{\!\top}. \qquad (3.4)$$
Their mean equals
$$\boldsymbol{\mu} \;=\; \mathbb{E}[\mathbf{Z}_i] \;=\; \Big(\mu,\; \mathbb{E}[f(X)],\; \mathbb{E}[X f(X)],\; \mathbb{E}[X^2 f(X)],\; \mathbb{E}[X^2 f'(X)]\Big)^{\!\top}. \qquad (3.5)$$
Then, the following CLT holds.

Theorem 3.3

If $X_1,\ldots,X_n$ are i. i. d. according to $\mathrm{IG}(\mu,\lambda)$, then the sample mean $\bar{\mathbf{Z}}$ of the vectors according to (3.4) is asymptotically normally distributed as
$$\sqrt{n}\,\big(\bar{\mathbf{Z}} - \boldsymbol{\mu}\big) \;\xrightarrow{d}\; \mathcal{N}\big(\mathbf{0},\, \boldsymbol{\Sigma}\big),$$
where $\mathcal{N}$ denotes the multivariate normal distribution, and where the covariances are given as $\sigma_{ij} = \operatorname{Cov}\big[(\mathbf{Z}_1)_i,\, (\mathbf{Z}_1)_j\big]$.
The proof of Theorem 3.3 is provided by Appendix A.5.
In the second step of deriving the asymptotics of $\hat\lambda_f$, we define the function $g$ that expresses (3.3) in terms of the components of $\bar{\mathbf{Z}}$. Then, $g(\boldsymbol{\mu}) = \lambda$ and $\hat\lambda_f = g(\bar{\mathbf{Z}})$. Applying the Delta method [18] to Theorem 3.3, the following result follows.
Theorem 3.4
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{IG}(\mu,\lambda)$, and define $\hat\lambda_f = g(\bar{\mathbf{Z}})$. Then, $\hat\lambda_f$ is asymptotically normally distributed, where the asymptotic variance and bias, respectively, are given by
$$\sigma_f^2 \;=\; \nabla g(\boldsymbol{\mu})^{\!\top}\, \boldsymbol{\Sigma}\; \nabla g(\boldsymbol{\mu}) \qquad\text{and}\qquad \mathbb{E}\big[\hat\lambda_f\big] - \lambda \;\approx\; \frac{1}{2n}\,\operatorname{tr}\big(\boldsymbol{\Sigma}\,\nabla^2 g(\boldsymbol{\mu})\big).$$
The proof of Theorem 3.4 is provided by Appendix A.6.
Before we discuss the effect of f on bias and MSE of $\hat\lambda_f$, let us first consider the special cases of the ordinary MM and ML estimator. Their asymptotics are immediate consequences of Theorem 3.4. For the MM estimator $\hat\lambda_{\mathrm{MM}}$, we have to choose $f(x) = 1$ such that $f' \equiv 0$. As a consequence, the moments involved in Theorem 3.4 reduce to plain moments of X, which can all be expressed by (3.1), see (3.6) in Appendix A.7. This leads to a considerable simplification of Theorem 3.4, which is summarized in the following corollary.

Corollary 3.5
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{IG}(\mu,\lambda)$, then $\hat\lambda_{\mathrm{MM}}$ is asymptotically normally distributed with asymptotic variance $2\lambda^2 + 6\mu\lambda$ and bias $\approx 3\,(\lambda + 3\mu)/n$.
While we are not aware of a reference providing these asymptotics, they can be verified by using Tweedie [27, p. 704]. There, normal asymptotics for the reciprocal $1/\hat\lambda_{\mathrm{MM}}$ are provided. Applying the Delta method with the transformation $x \mapsto 1/x$ to them, we conclude that $\hat\lambda_{\mathrm{MM}}$ has the asymptotic variance stated in Corollary 3.5.
Next, we consider the special case of the ML estimator $\hat\lambda_{\mathrm{ML}}$, which follows by choosing $f(x) = 1/x$ such that $f'(x) = -1/x^2$. Again, the joint moments simplify a lot, see (3.7) in Appendix A.8. Together with Theorem 3.4, we get the following corollary.

Corollary 3.6
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{IG}(\mu,\lambda)$, then $\hat\lambda_{\mathrm{ML}}$ is asymptotically normally distributed with asymptotic variance $2\lambda^2$ and bias $\approx 3\lambda/n$.
Comparing Corollaries 3.5 and 3.6, it is interesting to note that the MM estimator has larger asymptotic bias and variance than the ML estimator: $2\lambda^2 + 6\mu\lambda > 2\lambda^2$ and $3\,(\lambda + 3\mu)/n > 3\lambda/n$. To verify the asymptotics of Corollary 3.6, note that the ML estimator has been shown to follow an inverted-$\chi^2$ distribution: $\hat\lambda_{\mathrm{ML}} \overset{d}{=} n\lambda/\chi^2_{n-1}$ [see 26, p. 368]. Using the formulae for mean and variance of the inverted-$\chi^2$ distribution [see 3, p. 431], we get
$$\mathbb{E}\big[\hat\lambda_{\mathrm{ML}}\big] \;=\; \frac{n\,\lambda}{n-3} \;\approx\; \lambda\Big(1 + \frac{3}{n}\Big), \qquad \mathrm{Var}\big[\hat\lambda_{\mathrm{ML}}\big] \;=\; \frac{2\,n^2\lambda^2}{(n-3)^2\,(n-5)} \;\approx\; \frac{2\lambda^2}{n}$$
for large n, which agrees with Corollary 3.6.
Table 2. Simulated vs. asymptotic mean and standard deviation of the estimator $\hat\lambda_f$ from (3.3). Left half: $\mu = 1$; right half: $\mu = 3$. Upper half: $\lambda = 1$; lower half: $\lambda = 3$.
| f(x) | n | Mean (Sim) | Mean (Asym) | Std. dev. (Sim) | Std. dev. (Asym) | f(x) | n | Mean (Sim) | Mean (Asym) | Std. dev. (Sim) | Std. dev. (Asym) |
|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 100 | 1.103 | 1.120 | 0.258 | 0.283 | 1 | 100 | 1.213 | 1.300 | 0.365 | 0.447 |
250 | 1.045 | 1.048 | 0.169 | 0.179 | 250 | 1.098 | 1.120 | 0.245 | 0.283 | ||
500 | 1.023 | 1.024 | 0.122 | 0.126 | 500 | 1.053 | 1.060 | 0.180 | 0.200 | ||
100 | 1.125 | 1.138 | 0.306 | 0.312 | 100 | 1.243 | 1.294 | 0.442 | 0.451 | ||
250 | 1.053 | 1.055 | 0.193 | 0.198 | 250 | 1.104 | 1.118 | 0.274 | 0.285 | ||
500 | 1.027 | 1.028 | 0.138 | 0.140 | 500 | 1.055 | 1.059 | 0.194 | 0.202 | ||
100 | 1.026 | 1.025 | 0.158 | 0.151 | 100 | 1.020 | 1.019 | 0.160 | 0.156 | ||
250 | 1.010 | 1.010 | 0.098 | 0.096 | 250 | 1.008 | 1.008 | 0.100 | 0.099 | ||
500 | 1.006 | 1.005 | 0.069 | 0.068 | 500 | 1.004 | 1.004 | 0.070 | 0.070 | ||
1/x | 100 | 1.031 | 1.030 | 0.149 | 0.141 | 1/x | 100 | 1.031 | 1.030 | 0.149 | 0.141 |
250 | 1.012 | 1.012 | 0.091 | 0.089 | 250 | 1.012 | 1.012 | 0.091 | 0.089 | ||
500 | 1.006 | 1.006 | 0.064 | 0.063 | 500 | 1.006 | 1.006 | 0.064 | 0.063 | ||
100 | 1.032 | 1.031 | 0.151 | 0.143 | 100 | 1.033 | 1.032 | 0.154 | 0.146 | ||
250 | 1.013 | 1.013 | 0.093 | 0.091 | 250 | 1.013 | 1.013 | 0.094 | 0.093 | ||
500 | 1.007 | 1.006 | 0.065 | 0.064 | 500 | 1.007 | 1.006 | 0.066 | 0.065 | ||
1 | 100 | 3.172 | 3.180 | 0.595 | 0.600 | 1 | 100 | 3.309 | 3.360 | 0.775 | 0.849 |
250 | 3.071 | 3.072 | 0.378 | 0.379 | 250 | 3.134 | 3.144 | 0.506 | 0.537 | ||
500 | 3.036 | 3.036 | 0.267 | 0.268 | 500 | 3.069 | 3.072 | 0.366 | 0.379 | ||
100 | 3.207 | 3.216 | 0.677 | 0.675 | 100 | 3.374 | 3.415 | 0.918 | 0.937 | ||
250 | 3.085 | 3.086 | 0.427 | 0.427 | 250 | 3.159 | 3.166 | 0.580 | 0.593 | ||
500 | 3.043 | 3.043 | 0.301 | 0.302 | 500 | 3.081 | 3.083 | 0.413 | 0.419 | ||
100 | 3.087 | 3.085 | 0.462 | 0.440 | 100 | 3.078 | 3.076 | 0.475 | 0.454 | ||
250 | 3.035 | 3.034 | 0.284 | 0.278 | 250 | 3.031 | 3.031 | 0.293 | 0.287 | ||
500 | 3.018 | 3.017 | 0.199 | 0.197 | 500 | 3.017 | 3.015 | 0.206 | 0.203 | ||
1/x | 100 | 3.093 | 3.090 | 0.448 | 0.424 | 1/x | 100 | 3.093 | 3.090 | 0.448 | 0.424 |
250 | 3.037 | 3.036 | 0.274 | 0.268 | 250 | 3.037 | 3.036 | 0.274 | 0.268 | ||
500 | 3.019 | 3.018 | 0.192 | 0.190 | 500 | 3.019 | 3.018 | 0.192 | 0.190 | ||
100 | 3.095 | 3.092 | 0.449 | 0.426 | 100 | 3.097 | 3.094 | 0.453 | 0.430 | ||
250 | 3.038 | 3.037 | 0.275 | 0.269 | 250 | 3.039 | 3.038 | 0.278 | 0.272 | ||
500 | 3.020 | 3.018 | 0.193 | 0.191 | 500 | 3.020 | 3.019 | 0.195 | 0.192 |
Remark 3.7
To analyze the performance of the asymptotics provided by Theorem 3.4 (and that of the special cases discussed in Corollaries 3.5 and 3.6), when used as approximations to the true distribution of $\hat\lambda_f$ for finite sample size n, we did a simulation experiment. The obtained results for various choices of $(\mu, \lambda)$ and f(x) are summarized in Table 2. It can be recognized that the asymptotic approximations for mean and standard deviation generally agree quite well with their simulated counterparts. Only for the case $f(x) = 1$ (the default MM estimator) and sample size $n = 100$, we sometimes observe stronger deviations. But in the large majority of estimation scenarios, we have a close agreement such that the conclusions derived from the asymptotic expressions are meaningful for finite sample sizes as well.
In analogy to our discussion in Sect. 2, let us now analyze the performance of the Stein-MM estimator $\hat\lambda_a$ for the weight functions $f(x) = x^a$. Recall that this class of weight functions covers the default MM estimator for $a = 0$ and the ML estimator for $a = -1$. The choice $a = -1/2$ (right in the middle between these two special cases) has to be excluded as it leads to a degenerate estimator according to (3.3). For this reason, the subsequent analyses in Figs. 3 and 4 are done separately for $a < -1/2$ (plots on left-hand side, covering the ML estimator) and $a > -1/2$ (plots on right-hand side, covering the MM estimator).
Fig. 3 (image not available): Plots of asymptotic variance and bias of $\hat\lambda_a$, where points indicate minimal variance and bias values. Two parameter scenarios: (a)–(b) and (c)–(d). Dotted lines at $a = -1$ and $a = 0$ correspond to the default ML and MM estimator, respectively.
Let us start with the analysis of asymptotic bias and variance in Fig. 3. The upper and lower panels consider two different example situations, while the left-hand and right-hand sides are separated by the pole at $a = -1/2$. The right-hand side shows that the default MM estimator is neither (locally) optimal in terms of asymptotic bias nor in terms of variance; in fact, the optimal a on the right-hand side differs between the two scenarios. However, comparing the actual values at the Y-axis to those of the plots on the left-hand side, we recognize that the asymptotic bias and variance get considerably smaller for some region with $a < -1/2$. In particular, the ML estimator is clearly superior to the MM estimator, and as shown by Figs. 3a and c, the ML estimator is even optimal in terms of the asymptotic variance. It has to be noted, however, that the curve corresponding to the asymptotic variance is rather flat around $a = -1$, so moderate deviations from $a = -1$ do not have a notable effect on the variance. Thus, it is important to also consider the optimum bias, which is reached for some a larger than $-1$ in both (a) and (c). So it appears advisable to choose an a between these two optima for good overall estimation performance.
Fig. 4 (image not available): Plots of the MSE of $\hat\lambda_a$, where points indicate minimal MSE values. Two parameter scenarios: (a)–(b) and (c)–(d). Dotted lines at $a = -1$ and $a = 0$ correspond to the default ML and MM estimator, respectively.
This is confirmed by Fig. 4, where the asymptotic MSE is shown for various sample sizes n and the same scenarios as in Fig. 3. While the ML estimator approaches the MSE-optimum for increasing n, we get an improved performance for smaller sample sizes if choosing a appropriately. Generally, an analogous recommendation holds for the right-hand side in parts (b) and (d), with scenario-dependent MSE-optima, but much smaller MSE values can be reached for $a < -1/2$. To sum up, the default MM estimator (and more generally, Stein-MM estimators with $a > -1/2$) are not recommended for practice due to their rather large bias, variance, and thus MSE, while the ML estimator constitutes at least a good initial choice for estimating $\lambda$, being optimal in terms of asymptotic variance. However, unless the sample size n is very large, an improved MSE performance can be achieved by adjusting a appropriately in the second step and by computing the corresponding Stein-MM estimate $\hat\lambda_a$.
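Since the closed-form MSE expressions are rather involved, a simulation-based search for the MSE-optimal a on the ML side ($a < -1/2$) can be sketched as follows; the grid, the scenario, and the replication count are illustrative assumptions, and the estimator form is the one reconstructed in (3.3).

```python
import numpy as np
rng = np.random.default_rng(42)

def lam_hat_power(x, a):
    """Stein-MM estimator (3.3) with weight f(x) = x**a (a != -1/2)."""
    xbar = np.mean(x)
    num = (1.0 + 2.0*a)*np.mean(x**(a + 1))
    den = np.mean(x**(a + 2))/xbar**2 - np.mean(x**a)
    return num/den

mu, lam, n, reps = 1.0, 1.0, 100, 2000
grid = np.linspace(-1.6, -0.7, 19)
mse = [np.mean([(lam_hat_power(rng.wald(mu, lam, n), a) - lam)**2
                for _ in range(reps)]) for a in grid]
print("MSE-optimal a on the ML side:", grid[int(np.argmin(mse))])
```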
Table 3. Runoff data from Example 3.8: Stein-MM estimates $\hat\lambda_a$ for different choices of the weight function $f(x) = x^a$
| a | –1.5 | –1 | –0.668 | 0.125 | 0.109 | 0 | 0.5 |
|---|---|---|---|---|---|---|---|
| Notes |  | (i) | (ii) | (iii) | (iv) | (v) |  |
| $\hat\lambda_a$ | 1.423 | 1.440 | 1.429 | 1.511 | 1.511 | 1.512 | 1.529 |

(i) ML estimate, $\sigma^2$-optimal; (ii) bias-optimal; (iii) $\sigma^2$-optimal given $a > -1/2$; (iv) bias-optimal given $a > -1/2$; (v) MM estimate
Example 3.8
As an illustrative data example, let us consider the runoff amounts at Jug Bridge in Maryland [see 8]. The mean $\mu$ is estimated by the sample mean, and using the ML estimator as an initial estimator for $\lambda$, we get the value $\hat\lambda_{\mathrm{ML}} = 1.440$. As outlined before, this initial model fit might now be used for searching estimators with improved performance. Some examples (together with further estimates for comparative purposes) are summarized in Table 3. The ML estimator ($a = -1$) is also optimal in asymptotic variance, whereas the bias-optimal choice is obtained for a somewhat larger value of a, namely $a = -0.668$. The corresponding estimate is slightly lower than the ML estimate, similar to the value for $a = -1.5$, and can thus be seen as a fine-tuning of the initial estimate. By contrast, a notable change in the estimate happens if we turn to $a > -1/2$. The “constrained-optimal” choices (optimal given that $a > -1/2$) as well as the MM estimate lead to nearly the same values (around 1.51) and are thus visibly larger than the actually preferable estimates for $a < -1/2$. Also, their variance and bias are about 2.5 times larger than those of the estimates for $a < -1/2$.
Stein Estimation of Negative-binomial Distribution
While the previous sections (and also the research by Ebner et al. [6]) solely focussed on continuous distributions, let us now turn to the case of discrete-valued random variables. Here, the most relevant type are count random variables X, having a quantitative range contained in $\mathbb{N}_0 = \{0, 1, \ldots\}$. The probably most well-known distributions for counts are the Poisson and binomial distributions, both depending on the (normalized) mean as their only model parameter. But as already discussed in Sect. 3, there is hardly any potential for finding a better estimator of the mean than the sample mean, so we do not further discuss these distributions. Instead, we focus on another popular count distribution, namely the NB-distribution with parameters $\nu > 0$ and $\pi \in (0,1)$, abbreviated as $\mathrm{NB}(\nu,\pi)$. Such $X \sim \mathrm{NB}(\nu,\pi)$ has the range $\mathbb{N}_0$, probability mass function (pmf) $P(X = k) = \binom{k+\nu-1}{k}\,\pi^{\nu}\,(1-\pi)^{k}$, and mean $\mu = \nu\,(1-\pi)/\pi$. By contrast to the equidispersed Poisson distribution, its variance is always larger than the mean (overdispersion), which is an important property for applications in practice. A detailed survey about the properties of and estimators for the $\mathrm{NB}(\nu,\pi)$-distribution can be found in Johnson et al. [11, Chapter 5]. Instead of the original parametrization by $(\nu,\pi)$, it is often advantageous to consider either $(\mu,\pi)$ or $(\mu,\nu)$, where $\pi$ or $\nu$, respectively, serve as an additional dispersion parameter once the mean $\mu$ has been fixed. In case of the $(\mu,\pi)$-parametrization, it holds that $\nu = \mu\,\pi/(1-\pi)$ and $\mathrm{Var}[X] = \mu/\pi$, whereas we get $\pi = \nu/(\nu+\mu)$ and $\mathrm{Var}[X] = \mu\,(1 + \mu/\nu)$ for the $(\mu,\nu)$-parametrization. Besides the ease of interpretation, these parametrizations are advantageous in terms of parameter estimation. While MM estimation is rather obvious, namely by $\hat\mu = \bar X$ together with $\hat\pi = \bar X/\hat S^2$ or $\hat\nu = \bar X^2/(\hat S^2 - \bar X)$, respectively, ML estimation is generally demanding as there does not exist a closed-form solution, see the discussion by Kemp & Kemp [12], i. e., numerical optimization is necessary. However, there is an important exception: the NB's ML estimator of the mean is given by $\bar X$ [12, p. 867], i. e., $\bar X$ is both the MM and the ML estimator with its known appealing performance. So it suffices to find an adequate estimator for $\pi$ or $\nu$, respectively, the ML estimators of which do not have a closed-form expression.
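For later comparison, a minimal sketch of these default MM estimators, based on the standard moment relations stated above (the function name is our own choice):

```python
import numpy as np

def nb_mm(x):
    """Default MM estimators, using mean = nu*(1-pi)/pi and Var = mean/pi."""
    xbar, s2 = np.mean(x), np.var(x, ddof=1)
    return xbar**2/(s2 - xbar), xbar/s2   # (nu_hat, pi_hat); requires s2 > xbar

x = np.random.default_rng(3).negative_binomial(2, 0.5, size=500)
nu_hat, pi_hat = nb_mm(x)
```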
These difficulties in estimating $\pi$ or $\nu$, respectively, serve as our motivation for deriving a generalized MM estimator. For this purpose, we consider the NB's Stein identity according to Brown & Phillips [5, Lemma 1], which can be expressed as either
$$\pi\,\mathbb{E}\big[(X-\mu)\,f(X)\big] \;=\; \mu\,\pi\,\mathbb{E}\big[\Delta f(X)\big] \;+\; (1-\pi)\,\mathbb{E}\big[X\,\Delta f(X)\big] \qquad (4.1)$$
or
$$\nu\,\mathbb{E}\big[(X-\mu)\,f(X)\big] \;=\; \mu\,\mathbb{E}\big[(\nu + X)\,\Delta f(X)\big] \qquad (4.2)$$
for any function f such that the involved expectations exist. Note that the discrete difference $\Delta f(x) := f(x+1) - f(x)$ in (4.1) and (4.2) plays a similar role as the continuous derivative in the previous Stein identities (2.1) and (3.2). Stein-MM estimators are now derived by solving (4.1) in $\pi$ or (4.2) in $\nu$, respectively, and by using again sample moments instead of the involved population moments (with $\mu$ being estimated by $\bar X$). As a result, the (closed-form) classes of Stein-MM estimators for $\pi$ and $\nu$ are obtained as
$$\hat\pi_f \;=\; \frac{\overline{X\,\Delta f(X)}}{\overline{X\,\Delta f(X)} \,+\, \overline{(X-\bar X)\,f(X)} \,-\, \bar X\;\overline{\Delta f(X)}}\,, \qquad \hat\nu_f \;=\; \frac{\bar X\;\overline{X\,\Delta f(X)}}{\overline{(X-\bar X)\,f(X)} \,-\, \bar X\;\overline{\Delta f(X)}}\,. \qquad (4.3)$$
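A small sketch of (4.3), under the reconstructed form of the identities given above (the function names are ours); with f(x) = x, both outputs reduce to the default MM estimators, which provides a quick consistency check.

```python
import numpy as np

def nb_stein_mm(x, f):
    """Stein-MM estimators (4.3); Df is the forward difference of f."""
    x = np.asarray(x, dtype=float)
    xbar = np.mean(x)
    Df = f(x + 1.0) - f(x)
    A = np.mean(x*Df)                              # sample analog of E[X * Delta f(X)]
    B = np.mean((x - xbar)*f(x)) - xbar*np.mean(Df)
    return A/(A + B), xbar*A/B                     # (pi_hat, nu_hat)

x = np.random.default_rng(5).negative_binomial(2, 0.5, size=500)
pi_hat, nu_hat = nb_stein_mm(x, lambda t: 0.75**t)   # weight f(x) = c**x with c = 0.75
pi_mm, nu_mm = nb_stein_mm(x, lambda t: t)           # f(x) = x gives the MM estimators
```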
Note that the choice $f(x) = x$ (hence $\Delta f \equiv 1$) leads to the default MM estimators given above. The ML estimators are not covered by (4.3) this time, because they do not have a closed-form expression at all. Note, however, that the so-called “weighted-mean estimator” in (2.6) of Kemp & Kemp [12], which was motivated as a kind of approximate ML estimator, is covered by (4.3) for an appropriate choice of f. It is also worth pointing to Savani & Zhigljavsky [17], who define an estimator of $\nu$ based on the moment $\mathbb{E}[f(X)]$ for some specified f; their approach, however, usually does not lead to a closed-form estimator. For deriving the asymptotic distribution of the general Stein-MM estimator $\hat\pi_f$ or $\hat\nu_f$, respectively, we first define the vectors $\mathbf{Z}_1,\ldots,\mathbf{Z}_n$ with
$$\mathbf{Z}_i \;=\; \Big(X_i,\; f(X_i),\; X_i\,f(X_i),\; \Delta f(X_i),\; X_i\,\Delta f(X_i)\Big)^{\!\top}. \qquad (4.4)$$
Their mean equals
$$\boldsymbol{\mu} \;=\; \big(\mu,\; \mu_f,\; \mu_{x f},\; \mu_{\Delta f},\; \mu_{x\,\Delta f}\big)^{\!\top}, \qquad (4.5)$$
where we define $\mu_g := \mathbb{E}[g(X)]$ for any function $g$. Then, the following CLT holds.

Theorem 4.1

If $X_1,\ldots,X_n$ are i. i. d. according to a negative-binomial distribution, then the sample mean $\bar{\mathbf{Z}}$ of the vectors according to (4.4) is asymptotically normally distributed as
$$\sqrt{n}\,\big(\bar{\mathbf{Z}} - \boldsymbol{\mu}\big) \;\xrightarrow{d}\; \mathcal{N}\big(\mathbf{0},\, \boldsymbol{\Sigma}\big),$$
where $\mathcal{N}$ denotes the multivariate normal distribution, and where the covariances are given as $\sigma_{ij} = \operatorname{Cov}\big[(\mathbf{Z}_1)_i,\, (\mathbf{Z}_1)_j\big]$.
The proof of Theorem 4.1 is provided by Appendix A.9.
In the second step of deriving the Stein-MM estimators’ asymptotics, we define the function g that expresses the respective estimator from (4.3) through the components of $\bar{\mathbf{Z}}$: one version, say $g_\pi$, for the estimator $\hat\pi_f$ resulting from (4.1), and another version, $g_\nu$, for the estimator $\hat\nu_f$ resulting from (4.2).
Theorem 4.2
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{NB}(\nu,\pi)$, and define $\hat\pi_f = g_\pi(\bar{\mathbf{Z}})$. Then, $\hat\pi_f$ is asymptotically normally distributed, where the asymptotic variance and bias, respectively, are given by
$$\sigma_{\pi,f}^2 \;=\; \nabla g_\pi(\boldsymbol{\mu})^{\!\top}\, \boldsymbol{\Sigma}\; \nabla g_\pi(\boldsymbol{\mu}) \qquad\text{and}\qquad \mathbb{E}\big[\hat\pi_f\big] - \pi \;\approx\; \frac{1}{2n}\,\operatorname{tr}\big(\boldsymbol{\Sigma}\,\nabla^2 g_\pi(\boldsymbol{\mu})\big).$$
The proof of Theorem 4.2 is provided by Appendix A.10.
Theorem 4.3
Let $X_1,\ldots,X_n$ be i. i. d. according to $\mathrm{NB}(\nu,\pi)$, and define $\hat\nu_f = g_\nu(\bar{\mathbf{Z}})$. Then, $\hat\nu_f$ is asymptotically normally distributed, where the asymptotic variance and bias, respectively, are given by
$$\sigma_{\nu,f}^2 \;=\; \nabla g_\nu(\boldsymbol{\mu})^{\!\top}\, \boldsymbol{\Sigma}\; \nabla g_\nu(\boldsymbol{\mu}) \qquad\text{and}\qquad \mathbb{E}\big[\hat\nu_f\big] - \nu \;\approx\; \frac{1}{2n}\,\operatorname{tr}\big(\boldsymbol{\Sigma}\,\nabla^2 g_\nu(\boldsymbol{\mu})\big).$$
The proof of Theorem 4.3 is provided by Appendix A.11.
Our first special case shall be the function $f(x) = c^x$ with $c \in (0,1)$, which is inspired by Kemp & Kemp [12]. For evaluating the asymptotics in Theorems 4.1–4.3, we need to compute mixed moments of the form $\mathbb{E}\big[X^m\,c^{X}\big]$. As shown in the following, this can be done by explicit closed-form expressions. The idea is to utilize the probability generating function (pgf) of the NB-distribution,
$$\mathrm{pgf}(z) \;=\; \mathbb{E}\big[z^{X}\big] \;=\; \left(\frac{\pi}{1 - (1-\pi)\,z}\right)^{\!\nu},$$
together with the following property:
$$\frac{\mathrm{d}^k}{\mathrm{d}z^k}\,\mathrm{pgf}(z) \;=\; \mathbb{E}\big[X^{(k)}\, z^{X-k}\big],$$
where $x^{(k)} = x\,(x-1)\cdots(x-k+1)$ for $k \in \mathbb{N}$ denote the falling factorials. The main result is summarized by the following lemma.
Lemma 4.4
Let $X \sim \mathrm{NB}(\nu,\pi)$. For the mixed factorial moments, we have
$$\mathbb{E}\big[X^{(k)}\, c^{X}\big] \;=\; \nu\,(\nu+1)\cdots(\nu+k-1)\; \frac{\big(c\,(1-\pi)\big)^{k}\,\pi^{\nu}}{\big(1 - (1-\pi)\,c\big)^{\nu+k}}, \qquad c \in (0,1),\; k \in \mathbb{N}_0.$$
The proof of Lemma 4.4 is provided by Appendix A.12. The factorial moments are easily transformed into raw moments by using the relation $x^m = \sum_{k=0}^{m} S(m,k)\, x^{(k)}$, where the $S(m,k)$ are the Stirling numbers of the second kind (see [11], p. 12). Then, $\mathbb{E}\big[X^m\,c^{X}\big]$ follows by plugging into Lemma 4.4.
Fig. 5 (image not available): Stein-MM estimator $\hat\pi_f$. Plots of asymptotic variance and bias for parametrization (4.1), where points indicate minimal variance and bias values. Weighting function (a)–(b) $f(x) = c^x$ with $c \in (0,1)$, and (c)–(d) $f(x) = (x+1)^a$. The gray graphs in (c)–(d) correspond to the comparative choice $f(x) = x^a$, which leads to the default MM estimator for $a = 1$ (dotted lines).
While general closed-form formulae are possible in this way for $f(x) = c^x$ as well as for Theorems 4.1–4.3, the obtained results are very complex such that we decided to omit the final expressions. Instead, we compute the required moments and, thus, the expressions of Theorems 4.1–4.3 numerically. This is easily done in practice, in fact for any reasonable choice of the function f, by computing
$$\mathbb{E}\big[g(X)\big] \;\approx\; \sum_{k=0}^{M} g(k)\,P(X = k),$$
where the upper summation limit M is chosen sufficiently large, e. g., such that $P(X > M)$ falls below a specified tolerance limit. In this way, we generated the illustrative graphs in Figs. 5 (estimator $\hat\pi_f$) and 6 (estimator $\hat\nu_f$). There, parts (a)–(b) always refer to the above choice $f(x) = c^x$, and clear minima for variance and bias can be recognized for certain $c \in (0,1)$. To be able to compare with the respective default MM estimator, we did analogous computations for $f(x) = x^a$ (where $a = 1$ yields the default MM estimator), which, however, is only defined for $a > 0$ as X becomes zero with positive probability. As can be seen from the gray curves in parts (c)–(d), variance and bias usually do not attain a local minimum in this case. Therefore, parts (c)–(d) mainly focus on a slight modification of the weight function, namely $f(x) = (x+1)^a$, which is also well-defined for negative a.
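This truncated-summation scheme can be sketched as follows, using scipy.stats.nbinom for the NB pmf; the function name and the tolerance default are illustrative choices.

```python
import numpy as np
from scipy.stats import nbinom

def nb_moment(g, nu, pi, tol=1e-12):
    """E[g(X)] for X ~ NB(nu, pi) via truncated summation; the limit M is
    chosen such that P(X > M) falls below the tolerance, as described above."""
    M = int(nbinom.ppf(1.0 - tol, nu, pi)) + 1
    k = np.arange(M + 1)
    return float(np.sum(g(k)*nbinom.pmf(k, nu, pi)))

# e.g., E[c**X], which can be cross-checked against the pgf value:
val = nb_moment(lambda k: 0.75**k, nu=2.0, pi=0.5)
pgf = (0.5/(1.0 - 0.5*0.75))**2.0     # (pi/(1-(1-pi)*c))**nu
```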
Fig. 6 (image not available): Stein-MM estimator $\hat\nu_f$. Plots of asymptotic variance and bias for parametrization (4.2), where points indicate minimal variance and bias values. Weighting function (a)–(b) $f(x) = c^x$ with $c \in (0,1)$, and (c)–(d) $f(x) = (x+1)^a$. The gray graphs in (c)–(d) correspond to the comparative choice $f(x) = x^a$, which leads to the default MM estimator for $a = 1$ (dotted lines).
Table 4. Stein-MM estimators $\hat\pi_f$ and $\hat\nu_f$: optimal choices for c or a, respectively (columns “opt”), and corresponding minimal values of variance and bias (columns “min”). Upper block: $f(x) = c^x$; lower block: $f(x) = (x+1)^a$.
| f(x) | ν | π | $\hat\pi_f$: Var opt | min | Bias opt | min | $\hat\nu_f$: Var opt | min | Bias opt | min |
|---|---|---|---|---|---|---|---|---|---|---|
| $c^x$ | 1.0 | 0.286 | 0.751 | 5.095 | 0.620 | 5.093 | 0.751 | 0.271 | 0.544 | 0.920 |
|  | 1.5 | 0.375 | 0.771 | 14.271 | 0.668 | 10.102 | 0.771 | 0.407 | 0.595 | 1.144 |
|  | 2.5 | 0.500 | 0.805 | 59.113 | 0.727 | 26.022 | 0.805 | 0.641 | 0.650 | 1.475 |
| $(x+1)^a$ | 1.0 | 0.286 | 0.489 | 5.002 | 0.990 | 5.311 | 0.489 | 0.267 | 0.990 | 0.985 |
|  | 1.5 | 0.375 | 0.332 | 14.130 | 0.861 | 10.440 | 0.332 | 0.404 | 0.990 | 1.205 |
|  | 2.5 | 0.500 | 0.097 | 58.908 | 0.470 | 26.680 | 0.097 | 0.639 | 0.896 | 1.544 |
The optimal choices for c and a, respectively, lead to very similar variance and bias values, see Table 4. While $f(x) = c^x$ leads to a slightly larger variance than $f(x) = (x+1)^a$, its optimal bias is visibly lower. For both choices of f, however, the optimal Stein-MM estimators perform clearly better than the default MM estimator, see the dotted lines at $a = 1$ in parts (c)–(d) of Figs. 5 and 6. Altogether, also in view of the fact that explicit closed-form expressions are possible for $f(x) = c^x$ (although being rather complex), we prefer to use $f(x) = c^x$ as the weighting function, in accordance with Kemp & Kemp [12]. For this choice, we also did a simulation experiment (in analogy to Remark 3.7), in order to check the finite-sample performance of the asymptotic expressions for variance and bias. We generally observed a very good agreement between asymptotic and simulated values. Especially for the estimator $\hat\pi_f$, the asymptotic approximations show an excellent performance, whereas the estimator $\hat\nu_f$ sometimes leads to extreme estimates in a few scenarios. But except for these few outlying estimates, also $\hat\nu_f$ is well described by the asymptotic formulae. Detailed simulation results are available from the authors upon request.
Table 5. Counts of red mites on apple leaves from Example 4.5: Stein-MM estimates $\hat\nu_f$ (upper part) and $\hat\pi_f$ (lower part) for different choices of the weight $f(x) = c^x$; the last column is the respective MM estimate.

| c | 0.25 | 0.5 | 0.530 (i) | 0.690 (ii) | 0.75 | – (iii) |
|---|---|---|---|---|---|---|
| $\hat\nu_f$ | 0.967 | 0.963 | 0.967 | 1.009 | 1.032 | 1.167 |

(i) $\sigma^2$-optimal; (ii) bias-optimal; (iii) MM estimate

| c | 0.222 (i) | 0.25 | 0.5 | 0.690 (ii) | 0.75 | – (iii) |
|---|---|---|---|---|---|---|
| $\hat\pi_f$ | 0.459 | 0.457 | 0.456 | 0.468 | 0.474 | 0.504 |

(i) $\sigma^2$-optimal; (ii) bias-optimal; (iii) MM estimate
Example 4.5
As an illustrative data example, let us consider the counts of red mites on apple leaves analyzed by Rueda & O’Reilly [16, p. 271], who confirmed “a good fit of the negative binomial” for these data. The mean $\mu$ is estimated by the sample mean. In case of the $(\mu,\nu)$-parametrization, we use the ordinary MM estimator as an initial estimator for $\nu$, leading to the value $\hat\nu = 1.167$. Based on this initial model fit, we search for Stein-MM estimators $\hat\nu_f$ having an improved performance. The resulting estimates (together with further estimates for comparative purposes) are summarized in the upper part of Table 5. It can be seen that the initial estimate (last column) is corrected downwards to a value close to 1 (i. e., we essentially end up with the special case of a geometric distribution). Here, it is interesting to note that the numerically computed ML estimate as reported in Rueda & O’Reilly [16] also leads to such a result, namely to the value 1.025. In this context, we also recall Kemp & Kemp [12], who proposed a particular choice of f to get a closed-form approximate ML estimator for $\nu$.
We repeated the aforementioned estimation procedure for the $(\mu,\pi)$-parametrization, starting with the initial MM estimate $\hat\pi = 0.504$ for $\pi$, see the lower part of Table 5. Again, the initial estimate is corrected downwards, namely to a value around 0.46.
Conclusions
In this article, we demonstrated how Stein characterizations of (continuous or discrete) distributions can be utilized to derive improved moment estimators of model parameters. The main idea is to first choose an appropriate parametric class of weighting functions. Then, the final parameter values are determined such that the resulting Stein-MM estimator has asymptotically optimal bias, variance, or MSE properties within the considered class, and, ideally, also lower variance and bias than existing parameter estimators. Here, the optimal choice from the given class of weighting functions is implemented based on closed-form expressions for the asymptotic distributions, possibly together with numerical optimization routines. The whole procedure was exemplified for three types of distributions: the continuous exponential and inverse Gaussian distributions, as well as the discrete negative-binomial distribution. For all these distribution families, we observed an appealing performance in various respects, and we also demonstrated the application of our findings to real-world data examples.
Taking together the present article and the contributions by Arnold et al. [2], Wang & Weiß [28], and Ebner et al. [6], Stein-MM estimators are now available for a wide class of continuous distributions. For discrete distributions, however, only the negative-binomial distribution (see Sect. 4) and the (discrete) Lindley distribution [see 28] have been considered so far. Thus, future research should be directed towards Stein-MM estimators for further common types of discrete distributions. Our research also gives rise to several further directions for future research. While our main focus was on selecting the weight function with respect to minimal bias or variance, we also briefly pointed out in Sect. 2 that such a choice could be motivated by robustness to outliers as well. In fact, there are some analogies to “M-estimation” as introduced by Huber [9]. It appears promising to analyze whether robustified MM estimators can be achieved by suitable classes of weighting functions. As another direction for future research (briefly sketched in Sect. 2), the performance of GoF-tests based on Stein-MM estimators should be investigated. Finally, one should analyze Stein-MM estimators in a regression or time-series context.
Acknowledgements
The authors thank the two referees for their useful comments on an earlier draft of this article.
Author Contributions
SN and CHW contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Projektnummer 437270842.
Availability of data and materials
The datasets analyzed during the current study are available in Folks & Chhikara [8, p. 272] and Rueda & O’Reilly [16, p. 271], respectively.
Declarations
Conflict of interest
The authors declare they have no conflict of interest.
Consent for publication
The authors hereby consent to publication of their work upon its acceptance.
References
1. Anastasiou, A; Barp, A; Briol, F-X; Ebner, B; Gaunt, RE; Ghaderinezhad, F; Gorham, J; Gretton, A; Ley, C; Liu, Q; Mackey, L; Oates, CJ; Reinert, G; Swan, Y. Stein’s method meets computational statistics: a review of some recent developments. Stat. Sci.; 2023; 38,
2. Arnold, BC; Castillo, E; Sarabia, JM. A multivariate version of Stein’s identity with applications to moment calculations and estimation of conditionally specified distributions. Commun. Stat. Theory Methods; 2001; 30,
3. Bernardo, JM; Smith, AFM. Bayesian Theory; 1994; New York, John Wiley & Sons Inc: [DOI: https://dx.doi.org/10.1002/9780470316870]
4. Betsch, S; Ebner, B; Nestmann, F. Characterizations of non-normalized discrete probability distributions and their application in statistics. Electronic Journal of Statistics; 2022; 16,
5. Brown, TC; Phillips, MJ. Negative binomial approximation with Stein’s method. Methodol. Comput. Appl. Probab.; 1999; 1,
6. Ebner, B., Fischer, A., Gaunt, R.E., Picker, B., Swan, Y.: Point estimation through Stein’s method. arXiv:2305.19031 (2023)
7. Elfessi, A; Reineke, DM. A Bayesian look at classical estimation: the exponential distribution. J. Stat. Educ.; 2001; 9,
8. Folks, JL; Chhikara, RS. The inverse Gaussian distribution and its statistical application – a review. J. Roy. Stat. Soc. B; 1978; 40,
9. Huber, PJ. Robust estimation of a location parameter. Ann. Math. Stat.; 1964; 35,
10. Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, Volume 1. 2nd edition, John Wiley & Sons, Inc., New York (1995)
11. Johnson, N.L., Kemp, A.W., Kotz, S.: Univariate Discrete Distributions. 3rd edition, John Wiley & Sons, Inc., New York (2005)
12. Kemp, AW; Kemp, CD. A rapid and efficient estimation procedure for the negative binomial distribution. Biom. J.; 1987; 29,
13. Koudou, AE; Ley, C. Characterizations of GIG laws: a survey. Probab. Surv.; 2014; 11, pp. 161–176. [DOI: https://dx.doi.org/10.1214/13-PS227]
14. Kubokawa, T.: Stein’s identities and the related topics: an instructive explanation on shrinkage, characterization, normal approximation and goodness-of-fit. Japanese Journal of Statistics and Data Science, in press (2024)
15. Landsman, Z; Valdez, EA. The tail Stein’s identity with applications to risk measures. N. Am. Actuar. J.; 2016; 20,
16. Rueda, R; O’Reilly, F. Tests of fit for discrete distributions based on the probability generating function. Commun. Stat.-Simul. Comput.; 1999; 28,
17. Savani, V; Zhigljavsky, AA. Efficient estimation of parameters of the negative binomial distribution. Commun. Stat.-Theory Methods; 2006; 35,
18. Serfling, RJ. Approximation Theorems of Mathematical Statistics; 1980; New York, John Wiley & Sons Inc: [DOI: https://dx.doi.org/10.1002/9780470316481]
19. Seshadri, V. The Inverse Gaussian Distribution; 1999; New York, Springer: [DOI: https://dx.doi.org/10.1007/978-1-4612-1456-4]
20. Shuster, JJ. Nonparametric optimality of the sample mean and sample variance. Am. Stat.; 1982; 36,
21. Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Sympos. Math. Stat. Probab.; 1972; 2, pp. 583–602.
22. Stein, C.: Approximate Computation of Expectations. IMS Lecture Notes, Volume 7, Hayward, California (1986)
23. Stein, C., Diaconis, P., Holmes, S., Reinert, G.: Use of exchangeable pairs in the analysis of simulations. In P. Diaconis & S. Holmes (eds): Stein’s Method: Expository Lectures and Applications, IMS Lecture Notes, Vol. 46, 1–25 (2004)
24. Sudheesh, KK. On Stein’s identity and its applications. Stat. Probab. Lett.; 2009; 79,
25. Sudheesh, KK; Tibiletti, L. Moment identity for discrete random variable and its applications. Statistics; 2012; 46,
26. Tweedie, MCK. Statistical properties of inverse Gaussian distributions. I. Ann. Math. Stat.; 1957; 28,
27. Tweedie, MCK. Statistical properties of inverse Gaussian distributions. II. Ann. Math. Stat.; 1957; 28,
28. Wang, S; Weiß, CH. New characterizations of the (discrete) Lindley distribution and their applications. Math. Comput. Simul.; 2023; 212, pp. 310–322. [DOI: https://dx.doi.org/10.1016/j.matcom.2023.05.003]
29. Weiß, C.H.: Control charts for Poisson counts based on the Stein–Chen identity. Advanced Statistical Methods in Statistical Process Monitoring, Finance, and Environmental Science, Springer, in press (2023)
30. Weiß, CH; Aleksandrov, B. Computing (bivariate) Poisson moments using Stein-Chen identities. Am. Stat.; 2022; 76,
31. Weiß, CH; Puig, P; Aleksandrov, B. Optimal Stein-type goodness-of-fit tests for count data. Biom. J.; 2023; 65,
© The Author(s) 2024, corrected publication 2024. This work is published under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
For parameter estimation of continuous and discrete distributions, we propose a generalization of the method of moments (MM), where Stein identities are utilized for improved estimation performance. The construction of these Stein-type MM-estimators makes use of a weight function as implied by an appropriate form of the Stein identity. Our general approach as well as potential benefits thereof are first illustrated by the simple example of the exponential distribution. Afterward, we investigate the more sophisticated two-parameter inverse Gaussian distribution and the two-parameter negative-binomial distribution in great detail, together with illustrative real-world data examples. Given an appropriate choice of the respective weight functions, their Stein-MM estimators, which are defined by simple closed-form formulas and allow for closed-form asymptotic computations, exhibit a better performance regarding bias and mean squared error than competing estimators.