Ding et al. Journal of Statistical Distributions and Applications (2015) 2:5. DOI 10.1186/s40488-015-0029-5
METHODOLOGY (Open Access)
Multivariate zero-truncated/adjusted Charlier series distributions with applications
Xiqian Ding, Da Ju and Guo-Liang Tian*
*Correspondence: [email protected]. Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong, P. R. China
Abstract
Although the univariate Charlier series distribution (Biom. J. 30(8):1003-1009, 1988) and the bivariate Charlier series distribution (Biom. J. 37(1):105-117, 1995; J. Appl. Stat. 30(1):63-77, 2003) can be easily generalized to the multivariate version via the method of stochastic representation (SR), the multivariate zero-truncated Charlier series (ZTCS) distribution is not available to date. The first aim of this paper is to propose the multivariate ZTCS distribution by developing its important distributional properties and providing efficient likelihood-based inference methods via a novel data augmentation in the framework of the expectation-maximization (EM) algorithm. Since the joint marginal distribution of any r-dimensional sub-vector of the multivariate ZTCS random vector of dimension m is an r-dimensional zero-deflated Charlier series (ZDCS) distribution ($1 \leq r < m$), the second objective of the paper is to introduce a new family of multivariate zero-adjusted Charlier series (ZACS) distributions (including the multivariate ZDCS distribution as a special member) with a more flexible correlation structure by accounting for both inflation and deflation at zero. The corresponding distributional properties are explored and the associated maximum likelihood estimation method via the EM algorithm is provided for analyzing correlated count data. Some simulation studies are performed and two real data sets are used to illustrate the proposed methods.
Mathematics subject classification: Primary 62E15; Secondary 62F10
Keywords: EM algorithm; Multivariate zero-adjusted Charlier series distributions; Multivariate zero-truncated Charlier series distribution; Univariate Charlier series distribution
1 Introduction
The univariate Charlier series (CS) distribution was first introduced by Ong (1988) in the consideration of the conditional distribution of a bivariate Poisson distribution. The CS distribution is a convolution of a binomial variate and a Poisson variate. Let $X_0 \sim \text{Binomial}(K, \theta)$, $X_1 \sim \text{Poisson}(\lambda)$, and let $X_0$ and $X_1$ be mutually independent (denoted by $X_0 \perp X_1$). Then a discrete non-negative random variable $X$ is said to follow the CS distribution with parameters $K \in \mathbb{N} = \{1, 2, \ldots\}$, $\theta \in [0, 1)$ and $\lambda \in \mathbb{R}_+$, denoted by $X \sim \text{CS}(K, \theta; \lambda)$, if it can be stochastically represented by $X = X_0 + X_1$. Its probability mass function (pmf) is given by

$$\Pr(X = x) = \sum_{k=0}^{\min(K,x)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda^{x-k} e^{-\lambda}}{(x-k)!}, \qquad x = 0, 1, \ldots. \tag{1.1}$$
© 2015 Ding et al. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
The mean and variance of $X$ are given by

$$E(X) = K\theta + \lambda \quad \text{and} \quad \text{Var}(X) = K\theta(1-\theta) + \lambda. \tag{1.2}$$

Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{CS}(K, \theta; \lambda)$ and the observed data be $Y_{\text{obs}} = \{x_1, \ldots, x_n\}$, where $x_1, \ldots, x_n$ are the realizations of $X_1, \ldots, X_n$. Let $\bar{x}$ and $s^2$ be the sample mean and sample variance, respectively. Assuming $K$ is known, Ong (1988) derived the moment estimates of the parameters in the univariate CS distribution as follows:

$$\hat{\theta} = \left( \frac{\bar{x} - s^2}{K} \right)^{1/2} \quad \text{and} \quad \hat{\lambda} = \bar{x} - K\hat{\theta}. \tag{1.3}$$
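As a quick numerical check of (1.2) and (1.3), the sketch below (plain Python; the parameter values and the sampler names `rpois` and `rcs` are ours, chosen for illustration) simulates the stochastic representation $X = X_0 + X_1$ and recovers the moment estimates:

```python
import math
import random
import statistics

random.seed(1)

def rpois(lam):
    # Poisson draw via Knuth's product-of-uniforms method (fine for moderate lam)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rcs(K, theta, lam):
    # SR of the CS distribution: X = X0 + X1
    x0 = sum(random.random() < theta for _ in range(K))  # X0 ~ Binomial(K, theta)
    return x0 + rpois(lam)                               # X1 ~ Poisson(lam)

K, theta, lam = 5, 0.5, 3.0
xs = [rcs(K, theta, lam) for _ in range(50_000)]
xbar, s2 = statistics.fmean(xs), statistics.variance(xs)

# Moment estimates (1.3); note xbar - s2 estimates K*theta^2 > 0 by (1.2)
theta_mm = math.sqrt(max(xbar - s2, 0.0) / K)
lam_mm = xbar - K * theta_mm
```

With these values, $E(X) = K\theta + \lambda = 5.5$ and $\text{Var}(X) = K\theta(1-\theta) + \lambda = 4.25$, so the moment estimates should land near $0.5$ and $3$.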
Next, Papageorgiou and Loukas (1995) proposed a bivariate CS distribution which arises as the conditional distribution from a trivariate Poisson distribution studied by Loukas and Papageorgiou (1991) and Loukas (1993). Let $X_0 \sim \text{Binomial}(K, \theta)$ and $X_{i0} \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_i)$, $i = 1, 2$, and define $X_i = X_0 + X_{i0}$, $i = 1, 2$. Then a discrete non-negative random vector $\mathbf{x} = (X_1, X_2)^\top$ is said to follow a bivariate CS distribution with parameters $K \in \mathbb{N}$, $\theta \in [0, 1)$ and $\lambda_i \in \mathbb{R}_+$, $i = 1, 2$. We denote it by $\mathbf{x} \sim \text{CS}(K, \theta; \lambda_1, \lambda_2)$. Its probability generating function, marginal means and the covariance are given by

$$G_{\mathbf{x}}(\mathbf{z}) = E\left( z_1^{X_1} z_2^{X_2} \right) = \exp\{\lambda_1(z_1 - 1) + \lambda_2(z_2 - 1)\}\,[(1-\theta) + \theta z_1 z_2]^K, \tag{1.4}$$

$$E(X_i) = K\theta + \lambda_i \quad \text{and} \quad \text{Cov}(X_1, X_2) = K\theta(1-\theta), \qquad i = 1, 2. \tag{1.5}$$

Let $\mathbf{x}_1, \ldots, \mathbf{x}_n \overset{\text{iid}}{\sim} \text{CS}(K, \theta; \lambda_1, \lambda_2)$, where $\mathbf{x}_j = (X_{1j}, X_{2j})^\top$ for $j = 1, \ldots, n$, and the observed data be $Y_{\text{obs}} = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, where $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are the realizations of the random vectors. Let $\bar{x}_1, \bar{x}_2$ be the sample means for $X_1$ and $X_2$ and $m_{11}$ be the sample covariance, respectively. Assuming $K$ is known, Papageorgiou and Loukas (1995) obtained the moment estimates of the three parameters as follows:

$$\hat{\theta} = \frac{1}{2}\left[ 1 - \left( 1 - \frac{4 m_{11}}{K} \right)^{1/2} \right], \quad \hat{\lambda}_1 = \bar{x}_1 - K\hat{\theta} \quad \text{and} \quad \hat{\lambda}_2 = \bar{x}_2 - K\hat{\theta}. \tag{1.6}$$

In addition, Papageorgiou and Loukas (1995) also discussed the method of ratio of frequencies and the maximum likelihood estimation method.

Although the univariate Charlier series distribution (Ong 1988) and the bivariate Charlier series distribution (Karlis 2003; Papageorgiou and Loukas 1995) can be easily generalized to the multivariate version via the method of stochastic representation (SR), the multivariate zero-truncated Charlier series (ZTCS) distribution is not available to date. The first aim of this paper is to propose the multivariate ZTCS distribution by developing its important distributional properties and providing efficient likelihood-based inference methods via a novel data augmentation in the framework of the expectation-maximization (EM) algorithm. Since the joint marginal distribution of any r-dimensional sub-vector of the multivariate ZTCS random vector of dimension m is an r-dimensional zero-deflated Charlier series (ZDCS) distribution ($1 \leq r < m$), the second objective of the paper is to introduce a new family of multivariate zero-adjusted Charlier series (ZACS) distributions (including the multivariate ZDCS distribution as a special member) with a more flexible correlation structure by accounting for both inflation and deflation at zero. The corresponding distributional properties are explored and the associated maximum likelihood estimation method via the EM algorithm is provided for analyzing correlated count data.
The rest of the paper is organized as follows. In Section 2, the multivariate ZTCS distribution is proposed and some important distributional properties are explored. In Section 3, likelihood-based methods are developed for the multivariate ZTCS distribution. In Sections 4 and 5, we introduce the multivariate ZACS distribution, explore its distributional properties and provide the associated likelihood-based methods for the case without covariates. In Section 6, some simulation studies are performed to evaluate the proposed methods. In Section 7, two real data sets are used to illustrate the proposed methods. Section 8 provides some concluding remarks.
2 Multivariate zero-truncated Charlier series distribution
Let $X_{00} \sim \text{Binomial}(K, \theta)$, $\{X_{i0}\}_{i=1}^m \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_i)$, $X_{00} \perp \{X_{10}, \ldots, X_{m0}\}$, and define

$$X_i = X_{00} + X_{i0}, \qquad i = 1, \ldots, m.$$

A discrete non-negative random vector $\mathbf{x} = (X_1, \ldots, X_m)^\top$ is said to follow an m-dimensional CS distribution with parameters $K \in \mathbb{N} = \{1, 2, \ldots\}$, $\theta \in [0, 1)$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^\top \in \mathbb{R}_+^m$, denoted by $\mathbf{x} \sim \text{CS}(K, \theta; \lambda_1, \ldots, \lambda_m)$ or $\mathbf{x} \sim \text{CS}_m(K, \theta; \boldsymbol{\lambda})$, accordingly. The joint pmf of $\mathbf{x}$ is

$$\Pr(\mathbf{x} = \mathbf{x}) = \sum_{k=0}^{\min(K, \mathbf{x})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{x_i - k} e^{-\lambda_i}}{(x_i - k)!} =: Q_{\mathbf{x}}(K, \theta, \boldsymbol{\lambda}), \tag{2.1}$$

where $\mathbf{x} = (x_1, \ldots, x_m)^\top$, $\{x_i\}_{i=1}^m$ are the corresponding realizations of $\{X_i\}_{i=1}^m$, and $\min(K, \mathbf{x}) = \min(K, x_1, \ldots, x_m)$.

In particular, as $K \to \infty$ while $K\theta$ remains finitely large (say, $K\theta \to \lambda_0$), the distribution of $\text{Binomial}(K, \theta)$ tends to the distribution of $\text{Poisson}(\lambda_0)$, so the above m-dimensional CS distribution approaches the m-dimensional Poisson distribution $\text{MP}(\lambda_0, \lambda_1, \ldots, \lambda_m)$. Furthermore, if $\theta = 0$, then $\Pr(X_{00} = 0) = 1$ (i.e., $X_{00}$ follows the degenerate distribution with all mass at zero, denoted by $X_{00} \sim \text{Degenerate}(0)$) and $\lambda_0 = 0$, so the m-dimensional CS distribution becomes the product of m independent $\text{Poisson}(\lambda_i)$ distributions.

Motivated by the Type II multivariate zero-truncated Poisson (ZTP) distribution developed recently by Tian et al. (2014), in this paper we propose a new multivariate zero-truncated Charlier series (ZTCS) distribution, whose limiting form reduces to the Type II multivariate ZTP distribution.

Definition 1. Let $\mathbf{x} \sim \text{CS}(K, \theta; \lambda_1, \ldots, \lambda_m)$. A discrete non-zero random vector $\mathbf{w} = (W_1, \ldots, W_m)^\top$ is said to have the multivariate ZTCS distribution with the parameters $(K, \theta)$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^\top$, denoted by $\mathbf{w} \sim \text{ZTCS}_m(K, \theta; \boldsymbol{\lambda})$ or $\mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, if

$$\mathbf{x} \overset{d}{=} U \mathbf{w} = \begin{cases} \mathbf{0}, & \text{with probability } \varphi, \\ \mathbf{w}, & \text{with probability } 1 - \varphi, \end{cases} \tag{2.2}$$

where $U \sim \text{Bernoulli}(1 - \varphi)$ with $\varphi = (1-\theta)^K e^{-\lambda_+}$, $\lambda_+ = \sum_{i=1}^m \lambda_i$, and $U \perp \mathbf{w}$.

Let $\mathbf{w} \sim \text{ZTCS}_m(K, \theta; \boldsymbol{\lambda})$; then we have $\Pr(\mathbf{w} = \mathbf{0}) = 0$ and

$$\mathbf{w} \overset{d}{=} \mathbf{x} \,|\, (\mathbf{x} \neq \mathbf{0}), \tag{2.3}$$
where $\mathbf{x}$ is specified in Definition 1. The SR (2.3) can be used to generate the ZTCS random vector $\mathbf{w}$ via the generation of the random vector $\mathbf{x}$ from the multivariate CS distribution, while the SR (2.2) is useful in deriving important distributional properties in the following subsections and in developing an EM algorithm in Section 3.1. Moreover, besides coming from the missing zero vector, the correlation between any two components of $\mathbf{w}$ may come from the common random variable $X_{00} \sim \text{Binomial}(K, \theta)$.
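The rejection scheme implied by the SR (2.3) can be sketched as follows (plain Python; the parameter values are illustrative and the function names are ours): draw $\mathbf{x}$ from the multivariate CS distribution via the shared binomial variable and retain it only if $\mathbf{x} \neq \mathbf{0}$.

```python
import math
import random

random.seed(2)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rcs_vec(K, theta, lams):
    # m-dimensional CS draw: X_i = X00 + X_i0 with a shared X00 ~ Binomial(K, theta)
    x00 = sum(random.random() < theta for _ in range(K))
    return tuple(x00 + rpois(l) for l in lams)

def rztcs(K, theta, lams):
    # SR (2.3): w =d x | (x != 0), implemented by rejection of the zero vector
    while True:
        x = rcs_vec(K, theta, lams)
        if any(x):
            return x

K, theta, lams = 4, 0.3, (0.5, 0.8)
ws = [rztcs(K, theta, lams) for _ in range(20_000)]

# Pr(w = 0) = 0 by construction; the shared X00 induces positive dependence here
n = len(ws)
m1 = sum(w[0] for w in ws) / n
m2 = sum(w[1] for w in ws) / n
cov12 = sum((w[0] - m1) * (w[1] - m2) for w in ws) / (n - 1)
```

For these parameter values the theoretical covariance (from the variance formula of Section 2.1) is about 0.64, so the sample covariance should be clearly positive.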
2.1 Joint probability mass function and mixed moments
From the SR (2.2), the joint pmf of $\mathbf{w} \sim \text{ZTCS}_m(K, \theta; \boldsymbol{\lambda})$ is

$$f(\mathbf{w}; K, \theta, \boldsymbol{\lambda}) = \Pr(\mathbf{w} = \mathbf{w}) \overset{(2.2)}{=} \frac{\Pr(\mathbf{x} = \mathbf{w})}{\Pr(U = 1)} = \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min(K, \mathbf{w})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!}, \tag{2.4}$$

where $\mathbf{w} \neq \mathbf{0}$. From (2.2), it is easy to show that

$$E(\mathbf{w}) = \frac{\boldsymbol{\lambda} + K\theta \mathbf{1}}{1 - \varphi},$$

$$E(\mathbf{w}\mathbf{w}^\top) = \frac{\text{diag}(\boldsymbol{\lambda}) + \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K\theta(1 - \theta + K\theta)\mathbf{1}\mathbf{1}^\top}{1 - \varphi},$$

$$\text{Var}(\mathbf{w}) = \frac{1}{1 - \varphi} \left[ \text{diag}(\boldsymbol{\lambda}) + K\theta(1-\theta)\mathbf{1}\mathbf{1}^\top \right] - \frac{\varphi}{(1 - \varphi)^2} \left[ \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K^2\theta^2 \mathbf{1}\mathbf{1}^\top \right], \tag{2.5}$$

where $\mathbf{1} = \mathbf{1}_m = (1, \ldots, 1)^\top$. Thus we have

$$\text{Corr}(W_i, W_j) = \frac{K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda_i + K\theta)(\lambda_j + K\theta)}{\sqrt{\left[ \lambda_i + K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda_i + K\theta)^2 \right] \left[ \lambda_j + K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda_j + K\theta)^2 \right]}}, \tag{2.6}$$

for $i \neq j$. In particular, when $\theta = 0$, (2.6) becomes

$$\text{Corr}(W_i, W_j) = \frac{-\sqrt{\lambda_i \lambda_j}}{\sqrt{(e^{\lambda_+} - 1 - \lambda_i)(e^{\lambda_+} - 1 - \lambda_j)}}, \qquad i \neq j.$$

In (2.6), letting $\lambda_i = \lambda_j = \lambda$, we obtain

$$\text{Corr}(W_i, W_j) = \frac{K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda + K\theta)^2}{\lambda + K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda + K\theta)^2}, \qquad i \neq j.$$

For any $r_1, \ldots, r_m \geq 0$, the mixed moments of $\mathbf{w}$ are given by

$$E\left( \prod_{i=1}^m W_i^{r_i} \right) = (1 - \varphi)^{-1} E\left( \prod_{i=1}^m X_i^{r_i} \right) = (1 - \varphi)^{-1} E\left[ \prod_{i=1}^m (X_{00} + X_{i0})^{r_i} \right]. \tag{2.7}$$
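As a numerical sanity check (a plain-Python sketch with parameter values of our own choosing, not code from the paper), the pmf (2.4) should sum to one over the non-zero lattice, and the first component of (2.5) gives $E(W_1) = (\lambda_1 + K\theta)/(1 - \varphi)$:

```python
import math

K, theta, lams = 3, 0.4, (1.0, 1.5)
lam_plus = sum(lams)
phi = (1 - theta)**K * math.exp(-lam_plus)   # Pr(x = 0) under the parent CS model

def ztcs_pmf(w):
    # joint pmf (2.4) at w != 0; min(K, w) = min(K, w_1, ..., w_m)
    s = 0.0
    for k in range(min(K, min(w)) + 1):
        term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
        for wi, li in zip(w, lams):
            term *= li**(wi - k) * math.exp(-li) / math.factorial(wi - k)
        s += term
    return s / (1 - phi)

grid = [(w1, w2) for w1 in range(30) for w2 in range(30) if (w1, w2) != (0, 0)]
total = sum(ztcs_pmf(w) for w in grid)
mean_w1 = sum(w[0] * ztcs_pmf(w) for w in grid)
```

With the tail mass beyond the 30 x 30 grid being negligible for these small rates, `total` should be very close to 1 and `mean_w1` close to $(1.0 + 1.2)/(1 - \varphi)$.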
2.2 Moment generating function
Using the identity $E(\cdot) = E[E(\cdot|U)]$, the moment generating function (mgf) of $\mathbf{x}$ is

$$M_{\mathbf{x}}(\mathbf{t}) = E[\exp(\mathbf{t}^\top \mathbf{x})] = E[\exp(U \mathbf{t}^\top \mathbf{w})] = E\{E[\exp(U \mathbf{t}^\top \mathbf{w})|U]\} = E[M_{\mathbf{w}}(U\mathbf{t})] = \varphi M_{\mathbf{w}}(\mathbf{0}) + (1-\varphi) M_{\mathbf{w}}(\mathbf{t}) = \varphi + (1-\varphi) M_{\mathbf{w}}(\mathbf{t}).$$

Thus the mgf of $\mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$ is given by

$$M_{\mathbf{w}}(\mathbf{t}) = \frac{M_{\mathbf{x}}(\mathbf{t}) - \varphi}{1 - \varphi} = \frac{M_{X_{00}}(t_+) \prod_{i=1}^m M_{X_{i0}}(t_i) - \varphi}{1 - \varphi} = \frac{\left( \theta e^{t_+} + 1 - \theta \right)^K \exp\left( \sum_{i=1}^m \lambda_i e^{t_i} - \lambda_+ \right) - (1-\theta)^K e^{-\lambda_+}}{1 - (1-\theta)^K e^{-\lambda_+}},$$

where $t_+ = \sum_{i=1}^m t_i$.
2.3 Marginal distributions
2.3.1 Marginal distribution for each random component
Let $\mathbf{w} = (W_1, \ldots, W_m)^\top \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$. We first derive the marginal distribution of $W_i$ with realization $w_i$ for $i = 1, \ldots, m$. If $w_i > 0$, then

$$\begin{aligned}
\Pr(W_i = w_i) &= \sum_{w_1=0}^{\infty} \cdots \sum_{w_{i-1}=0}^{\infty} \sum_{w_{i+1}=0}^{\infty} \cdots \sum_{w_m=0}^{\infty} \Pr(\mathbf{w} = \mathbf{w}) \\
&= \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{w_1=0}^{\infty} \cdots \sum_{w_{i-1}=0}^{\infty} \sum_{w_{i+1}=0}^{\infty} \cdots \sum_{w_m=0}^{\infty} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{w_j - k} e^{-\lambda_j}}{(w_j - k)!} I(w_j - k \geq 0) \\
&= \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!} \prod_{j=1, j \neq i}^m \left[ \sum_{w_j=0}^{\infty} \frac{\lambda_j^{w_j - k} e^{-\lambda_j}}{(w_j - k)!} I(w_j - k \geq 0) \right] \\
&= \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!} \qquad (2.8) \\
&= \frac{1 - \varphi_i}{1 - (1-\theta)^K e^{-\lambda_i}} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!}, \qquad (2.9)
\end{aligned}$$

where

$$1 - \varphi_i = \frac{1 - (1-\theta)^K e^{-\lambda_i}}{1 - (1-\theta)^K e^{-\lambda_+}}. \tag{2.10}$$
Hence,

$$\begin{aligned}
\Pr(W_i = 0) &= 1 - \sum_{w_i=1}^{\infty} \Pr(W_i = w_i) \\
&\overset{(2.8)}{=} 1 - \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{w_i=1}^{\infty} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!} \\
&= 1 - \frac{1 - (1-\theta)^K e^{-\lambda_i}}{1 - (1-\theta)^K e^{-\lambda_+}} \\
&\overset{(2.10)}{=} \varphi_i \tag{2.11} \\
&= \frac{(1-\theta)^K \left( e^{-\lambda_i} - e^{-\lambda_+} \right)}{1 - (1-\theta)^K e^{-\lambda_+}} \in \left( 0,\ (1-\theta)^K e^{-\lambda_i} \right) \subset (0, 1).
\end{aligned}$$

By combining (2.11) with (2.9) and noting that a ZDCS distribution is a special case of a ZACS distribution (4.2), we obtain

$$W_i \sim \text{ZDCS}(\varphi_i; K, \theta, \lambda_i). \tag{2.12}$$
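The closed form (2.11) for $\varphi_i$ can be checked against the joint pmf (2.4) directly; the sketch below (plain Python, with illustrative parameters of our own choosing) sums $\Pr(W_1 = 0)$ numerically for $m = 2$:

```python
import math

K, theta, lams = 3, 0.4, (1.0, 1.5)
lam_plus = sum(lams)
phi = (1 - theta)**K * math.exp(-lam_plus)

def ztcs_pmf(w):
    # joint pmf (2.4) for this parameter set
    s = 0.0
    for k in range(min(K, min(w)) + 1):
        term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
        for wi, li in zip(w, lams):
            term *= li**(wi - k) * math.exp(-li) / math.factorial(wi - k)
        s += term
    return s / (1 - phi)

# Pr(W1 = 0) obtained by summing out W2 >= 1 ...
p_zero_numeric = sum(ztcs_pmf((0, w2)) for w2 in range(1, 60))
# ... against the closed form (2.11)
phi_1 = (1 - theta)**K * (math.exp(-lams[0]) - math.exp(-lam_plus)) / (1 - phi)
```

The two quantities should agree to numerical precision, and $\varphi_1$ should lie in $(0, (1-\theta)^K e^{-\lambda_1})$ as stated.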
2.3.2 Marginal distribution for an arbitrary random sub-vector
Second, the marginal distribution for an arbitrary random sub-vector will be considered. Before that, the so-called multivariate zero-adjusted Charlier series distribution needs to be introduced; its formal definition is given in Definition 2 in Section 4. We now consider the marginal distributions of $\mathbf{w}^{(1)}$ and $\mathbf{w}^{(2)}$, where

$$\mathbf{w}^{(1)} = \begin{pmatrix} W_1 \\ \vdots \\ W_r \end{pmatrix}, \quad \mathbf{w}^{(2)} = \begin{pmatrix} W_{r+1} \\ \vdots \\ W_m \end{pmatrix} \quad \text{and} \quad \mathbf{w} = \begin{pmatrix} \mathbf{w}^{(1)} \\ \mathbf{w}^{(2)} \end{pmatrix}.$$

Using the multivariate zero-adjusted Charlier series distribution to be introduced in Section 4, it can be shown that

$$\mathbf{w}^{(1)} \sim \text{ZDCS}(\varphi^{(1)}; K, \theta, \lambda_1, \ldots, \lambda_r) \quad \text{and} \quad \mathbf{w}^{(2)} \sim \text{ZDCS}(\varphi^{(2)}; K, \theta, \lambda_{r+1}, \ldots, \lambda_m), \tag{2.13}$$

where

$$\varphi^{(i)} = \frac{(1-\theta)^K \left( e^{-\lambda^{(i)}_+} - e^{-\lambda_+} \right)}{1 - (1-\theta)^K e^{-\lambda_+}} \in \left( 0,\ (1-\theta)^K e^{-\lambda^{(i)}_+} \right) \subset (0, 1), \qquad i = 1, 2, \tag{2.14}$$

$\lambda^{(1)}_+ = \sum_{i=1}^r \lambda_i$ and $\lambda^{(2)}_+ = \sum_{i=r+1}^m \lambda_i$.

In fact, for any positive integers $i_1, \ldots, i_r$ satisfying $1 \leq i_1 < \cdots < i_r \leq m$, we have

$$\begin{pmatrix} W_{i_1} \\ \vdots \\ W_{i_r} \end{pmatrix} \sim \text{ZDCS}(\varphi^*; K, \theta, \lambda_{i_1}, \ldots, \lambda_{i_r}), \tag{2.15}$$

where

$$\varphi^* = \frac{(1-\theta)^K \left( e^{-(\lambda_{i_1} + \cdots + \lambda_{i_r})} - e^{-\lambda_+} \right)}{1 - (1-\theta)^K e^{-\lambda_+}} \in \left( 0,\ (1-\theta)^K e^{-(\lambda_{i_1} + \cdots + \lambda_{i_r})} \right) \subset (0, 1). \tag{2.16}$$
2.4 Conditional distributions
2.4.1 Conditional distribution of w(1)|w(2)
From (2.4), (2.13) and (4.4), the conditional distribution of $\mathbf{w}^{(1)}|\mathbf{w}^{(2)}$ is given by

$$\Pr(\mathbf{w}^{(1)} = \mathbf{w}^{(1)} | \mathbf{w}^{(2)} = \mathbf{w}^{(2)}) = \frac{f(\mathbf{w}; K, \theta, \boldsymbol{\lambda})}{\Pr(\mathbf{w}^{(2)} = \mathbf{w}^{(2)})} = \frac{\dfrac{1}{1 - (1-\theta)^K e^{-\lambda_+}}\, Q_{\mathbf{w}}(K, \theta, \boldsymbol{\lambda})}{\varphi^{(2)} I(\mathbf{w}^{(2)} = \mathbf{0}) + \dfrac{1 - \varphi^{(2)}}{1 - (1-\theta)^K e^{-\lambda^{(2)}_+}}\, Q_{\mathbf{w}^{(2)}}(K, \theta, \boldsymbol{\lambda}^{(2)})\, I(\mathbf{w}^{(2)} \neq \mathbf{0})}, \tag{2.17}$$

where $\mathbf{w}^{(2)} = (w_{r+1}, \ldots, w_m)^\top$, $\boldsymbol{\lambda}^{(2)} = (\lambda_{r+1}, \ldots, \lambda_m)^\top$,

$$Q_{\mathbf{w}}(K, \theta, \boldsymbol{\lambda}) = \sum_{k=0}^{\min\{K, \mathbf{w}\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{w_j - k} e^{-\lambda_j}}{(w_j - k)!}$$

and

$$Q_{\mathbf{w}^{(2)}}(K, \theta, \boldsymbol{\lambda}^{(2)}) = \sum_{l=0}^{\min\{K, \mathbf{w}^{(2)}\}} \binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{p=r+1}^m \frac{\lambda_p^{w_p - l} e^{-\lambda_p}}{(w_p - l)!}.$$

We first consider Case I: $\mathbf{w}^{(2)} \neq \mathbf{0}$. Under Case I, it is possible that $\mathbf{w}^{(1)} = \mathbf{0}$ or $\mathbf{w}^{(1)} \neq \mathbf{0}$. From (2.17), it is easy to obtain

$$\Pr(\mathbf{w}^{(1)} = \mathbf{w}^{(1)} | \mathbf{w}^{(2)} = \mathbf{w}^{(2)}) = \frac{e^{-\lambda^{(1)}_+} \displaystyle\sum_{k=0}^{\min\{K, \mathbf{w}\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{w_j - k}}{(w_j - k)!}}{\displaystyle\sum_{l=0}^{\min\{K, \mathbf{w}^{(2)}\}} \binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{p=r+1}^m \frac{\lambda_p^{w_p - l}}{(w_p - l)!}}. \tag{2.18}$$

Case II: $\mathbf{w}^{(2)} = \mathbf{0}$. Under Case II, obviously $\mathbf{w}^{(1)} \neq \mathbf{0}$ and the shared binomial variable equals zero. Thus we have

$$\Pr(\mathbf{w}^{(1)} = \mathbf{w}^{(1)} | \mathbf{w}^{(2)} = \mathbf{0}) = \frac{1}{1 - e^{-\lambda^{(1)}_+}} \prod_{i=1}^r \frac{\lambda_i^{w_i} e^{-\lambda_i}}{w_i!}.$$

This implies

$$\mathbf{w}^{(1)} \,|\, (\mathbf{w}^{(2)} = \mathbf{0}) \sim \text{ZTP}^{(\text{I})}(\lambda_1, \ldots, \lambda_r). \tag{2.19}$$
2.4.2 Conditional distribution of X0|(w, U)
The stochastic representation (2.2) can be rewritten as

$$(X_1, \ldots, X_m)^\top = (X_0 + X_1, \ldots, X_0 + X_m)^\top \overset{d}{=} U\mathbf{w},$$

where $X_0 \sim \text{Binomial}(K, \theta)$ and $\{X_i\}_{i=1}^m \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_i)$. To obtain the conditional distribution of $X_0|(\mathbf{w}, U)$, we consider two cases: $U = 1$ and $U = 0$. When $U = 1$, the conditional distribution of $X_0|(\mathbf{w}, U)$ is given by

$$\Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 1) = \frac{\Pr(X_0 = l, X_1 = w_1 - l, \ldots, X_m = w_m - l)}{\Pr(X_1 = w_1, \ldots, X_m = w_m)} = \frac{\binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{i=1}^m \frac{\lambda_i^{w_i - l}}{(w_i - l)!}}{\sum_{k=0}^{\min(K, \mathbf{w})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k}}{(w_i - k)!}} =: q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda}), \tag{2.20}$$

for $l = 0, 1, \ldots, \min(K, \mathbf{w})$, which implies

$$X_0 \,|\, (\mathbf{w} = \mathbf{w}, U = 1) \sim \text{Finite}(l, q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda});\ l = 0, 1, \ldots, \min(K, \mathbf{w})). \tag{2.21}$$

When $U = 0$, we obtain $\Pr(X_0 = 0 | \mathbf{w} = \mathbf{w}, U = 0) = 1$, i.e.,

$$X_0 \,|\, (\mathbf{w} = \mathbf{w}, U = 0) \sim \text{Degenerate}(0). \tag{2.22}$$

Hence, for any $l$, we have

$$\Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 0) = I(l = 0). \tag{2.23}$$

Thus, the conditional distribution of $X_0|(\mathbf{w}, U)$ is given by

$$X_0 \,|\, (\mathbf{w}, U) \sim \begin{cases} \text{Finite}(l, q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda});\ l = 0, 1, \ldots, \min(K, \mathbf{w})), & \text{if } U = 1, \\ \text{Degenerate}(0), & \text{if } U = 0, \end{cases} \tag{2.24}$$

where $q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda})$ is defined by (2.20).

2.4.3 Conditional distribution of X0|w
By using (2.24) and the independence of $U$ and $\mathbf{w}$, the conditional distribution of $X_0|\mathbf{w}$ is

$$\begin{aligned}
\Pr(X_0 = l | \mathbf{w} = \mathbf{w}) &= \sum_{u=0}^{1} \Pr(X_0 = l, U = u | \mathbf{w} = \mathbf{w}) \\
&= \sum_{u=0}^{1} \Pr(U = u | \mathbf{w} = \mathbf{w}) \Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = u) \\
&= \Pr(U = 0) \Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 0) + \Pr(U = 1) \Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 1) \\
&\overset{(2.24)}{=} (1-\theta)^K e^{-\lambda_+} I(l = 0) + \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda}) \\
&=: p_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda}), \tag{2.25}
\end{aligned}$$

for $l = 0, 1, \ldots, \min(K, \mathbf{w})$, where $q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda})$ is defined by (2.20). Thus,

$$X_0 \,|\, (\mathbf{w} = \mathbf{w}) \sim \text{Finite}(l, p_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda});\ l = 0, 1, \ldots, \min(K, \mathbf{w})). \tag{2.26}$$

Especially, when $\min(K, \mathbf{w}) = 0$, we have $X_0 \,|\, (\mathbf{w} = \mathbf{w}) \sim \text{Degenerate}(0)$. Thus, the conditional expectation of $X_0|\mathbf{w}$ is given by

$$E(X_0 | \mathbf{w} = \mathbf{w}) = \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \frac{\sum_{k=1}^{\min(K, \mathbf{w})} k \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k}}{(w_i - k)!}}{\sum_{k=0}^{\min(K, \mathbf{w})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k}}{(w_i - k)!}} \, I(\min(\mathbf{w}) \geq 1). \tag{2.27}$$

3 Likelihood-based methods for the multivariate ZTCS distribution
Suppose that $\mathbf{w}_j \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, where $\mathbf{w}_j = (W_{1j}, \ldots, W_{mj})^\top$ for $j = 1, \ldots, n$. Let $\mathbf{w}_j = (w_{1j}, \ldots, w_{mj})^\top$ denote the realization of the random vector $\mathbf{w}_j$, and $Y_{\text{obs}} =$
$\{\mathbf{w}_j\}_{j=1}^n$ be the observed data. We consider $K$ as a known positive integer. Then, the observed-data likelihood function for $(\theta, \boldsymbol{\lambda})$ is

$$L(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = \prod_{j=1}^n \frac{e^{-\lambda_+}}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min(K, \mathbf{w}_j)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k}}{(w_{ij} - k)!},$$

so that the log-likelihood function is

$$\ell(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = -n \log\left[ e^{\lambda_+} - (1-\theta)^K \right] + \sum_{j=1}^n \log \left[ \sum_{k=0}^{\min(K, \mathbf{w}_j)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k}}{(w_{ij} - k)!} \right]. \tag{3.1}$$
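The log-likelihood (3.1) is straightforward to evaluate numerically; a sketch in plain Python (the function name and the tiny data set are ours, for illustration only):

```python
import math

def loglik_ztcs(data, K, theta, lams):
    # observed-data log-likelihood (3.1); data is a list of m-tuples, each != 0
    lam_plus = sum(lams)
    ll = -len(data) * math.log(math.exp(lam_plus) - (1 - theta)**K)
    for w in data:
        inner = 0.0
        for k in range(min(K, min(w)) + 1):
            term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
            for wi, li in zip(w, lams):
                term *= li**(wi - k) / math.factorial(wi - k)
            inner += term
        ll += math.log(inner)
    return ll

data = [(1, 0), (2, 3), (0, 1), (4, 2)]
ll = loglik_ztcs(data, K=3, theta=0.4, lams=(1.0, 1.5))
```

As described in Section 3.1, this function can then be profiled over a grid of K values, keeping the K that maximizes $\ell(\hat{\theta}, \hat{\boldsymbol{\lambda}} | Y_{\text{obs}})$ (as in Tables 1 and 3).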
3.1 MLEs via the EM algorithm
The SR (2.2) motivates a novel EM algorithm in which some latent variables are independent of the observed variables. For each $\mathbf{w}_j = (w_{1j}, \ldots, w_{mj})^\top$, we introduce latent variables $U_j \overset{\text{iid}}{\sim} \text{Bernoulli}(1 - \varphi)$ with $\varphi = (1-\theta)^K e^{-\lambda_+}$, $X_{0j} \overset{\text{iid}}{\sim} \text{Binomial}(K, \theta)$, $X_{ij} \overset{\text{iid}}{\sim} \text{Poisson}(\lambda_i)$ for $i = 1, \ldots, m$, and $X_{0j} \perp X_{ij}$, such that

$$(x_{0j} + x_{1j}, \ldots, x_{0j} + x_{mj})^\top = u_j \mathbf{w}_j,$$

where $u_j$ and $x_{ij}$ denote the realizations of $U_j$ and $X_{ij}$, respectively. We denote the latent/missing data by $Y_{\text{mis}} = \{u_j, x_{0j}, x_{1j}, \ldots, x_{mj}\}_{j=1}^n$, so that the complete data are

$$Y_{\text{com}} = Y_{\text{obs}} \cup Y_{\text{mis}} = \{\mathbf{w}_j, u_j, x_{0j}, x_{1j}, \ldots, x_{mj}\}_{j=1}^n = \{x_{0j}, u_j, \mathbf{w}_j\}_{j=1}^n,$$

where $x_{ij} = u_j w_{ij} - x_{0j}$ for $j = 1, \ldots, n$ and $i = 1, \ldots, m$. Thus, the complete-data likelihood function is given by

$$\begin{aligned}
L(\theta, \boldsymbol{\lambda} | Y_{\text{com}}) &= \prod_{j=1}^n \binom{K}{x_{0j}} \theta^{x_{0j}} (1-\theta)^{K - x_{0j}} \prod_{i=1}^m \frac{\lambda_i^{x_{ij}} e^{-\lambda_i}}{x_{ij}!} \\
&= \prod_{j=1}^n \binom{K}{x_{0j}} \theta^{x_{0j}} (1-\theta)^{K - x_{0j}} \prod_{i=1}^m \frac{\lambda_i^{u_j w_{ij} - x_{0j}} e^{-\lambda_i}}{(u_j w_{ij} - x_{0j})!} \\
&\propto \theta^{n\bar{x}_0} (1-\theta)^{nK - n\bar{x}_0} \prod_{i=1}^m \lambda_i^{\sum_{j=1}^n u_j w_{ij} - n\bar{x}_0} e^{-n\lambda_i}, \tag{3.2}
\end{aligned}$$

where $\bar{x}_0 = (1/n) \sum_{j=1}^n x_{0j}$. The complete-data log-likelihood function is

$$\ell(\theta, \boldsymbol{\lambda} | Y_{\text{com}}) = n\bar{x}_0 \log\theta + (nK - n\bar{x}_0) \log(1-\theta) + \sum_{i=1}^m \left[ \left( \sum_{j=1}^n u_j w_{ij} - n\bar{x}_0 \right) \log\lambda_i - n\lambda_i \right].$$

The M-step is to calculate the complete-data maximum likelihood estimates (MLEs):

$$\hat{\theta} = \frac{\bar{x}_0}{K} \quad \text{and} \quad \hat{\lambda}_i = \frac{\sum_{j=1}^n u_j w_{ij}}{n} - K\hat{\theta}, \qquad i = 1, \ldots, m, \tag{3.3}$$

and the E-step is to replace $\{u_j\}_{j=1}^n$ and $\{x_{0j}\}_{j=1}^n$ in (3.3) by their conditional expectations:

$$E(U_j | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) = E(U_j) = 1 - (1-\theta)^K e^{-\lambda_+}, \quad \text{and} \tag{3.4}$$

$$E(X_{0j} | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) \overset{(2.27)}{=} \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \frac{\sum_{k_j=1}^{\min(K, \mathbf{w}_j)} k_j \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k_j}}{(w_{ij} - k_j)!}}{\sum_{k_j=0}^{\min(K, \mathbf{w}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k_j}}{(w_{ij} - k_j)!}} \, I(\min(\mathbf{w}_j) \geq 1), \tag{3.5}$$

respectively. An important feature of this EM algorithm is that the latent variables $\{U_j\}_{j=1}^n$ are independent of the observed variables $\{\mathbf{w}_j\}_{j=1}^n$.

Also note that here we assume that $K$ is a known positive integer. In practice, $K \in \{1, 2, \ldots, N\}$, say $N = 100$. For a given $K$, we first use the EM iteration (3.3)-(3.5) to find the MLEs of $\theta$ and $\boldsymbol{\lambda}$, denoted by $\hat{\theta}$ and $\hat{\boldsymbol{\lambda}}$. Then, we can calculate $\ell(\hat{\theta}, \hat{\boldsymbol{\lambda}} | Y_{\text{obs}})$ and choose the $K$ that maximizes $\ell(\hat{\theta}, \hat{\boldsymbol{\lambda}} | Y_{\text{obs}})$.
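A minimal sketch of the EM iteration (3.3)-(3.5) in plain Python (our own implementation on synthetic data; $K$ is taken as known, and the starting values, iteration count, and sample size are arbitrary choices, not the paper's):

```python
import math
import random

random.seed(5)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rztcs(K, theta, lams):
    # draw from ZTCS via SR (2.3): rejection of the zero vector
    while True:
        x0 = sum(random.random() < theta for _ in range(K))
        x = tuple(x0 + rpois(l) for l in lams)
        if any(x):
            return x

def e_x0(w, K, theta, lams):
    # E(X_{0j} | Y_obs, theta, lambda) per (3.5)/(2.27); zero if min(w) = 0
    if min(w) < 1:
        return 0.0
    phi = (1 - theta)**K * math.exp(-sum(lams))
    num = den = 0.0
    for k in range(min(K, min(w)) + 1):
        term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
        for wi, li in zip(w, lams):
            term *= li**(wi - k) / math.factorial(wi - k)
        num += k * term
        den += term
    return (1 - phi) * num / den

def em_ztcs(data, K, iters=300):
    m, n = len(data[0]), len(data)
    theta, lams = 0.3, [1.0] * m
    wbar = [sum(w[i] for w in data) / n for i in range(m)]
    for _ in range(iters):
        # E-step: (3.4) is free of w_j; (3.5) uses the current (theta, lambda)
        Eu = 1 - (1 - theta)**K * math.exp(-sum(lams))
        x0bar = sum(e_x0(w, K, theta, lams) for w in data) / n
        # M-step (3.3)
        theta = x0bar / K
        lams = [max(Eu * wbar[i] - K * theta, 1e-8) for i in range(m)]
    return theta, lams

K_true, theta_true, lams_true = 3, 0.4, (2.0, 3.0)
data = [rztcs(K_true, theta_true, lams_true) for _ in range(2000)]
theta_hat, lams_hat = em_ztcs(data, K_true)
```

With $n = 2000$ the estimates should land reasonably close to the generating values; repeating the fit over a grid of K values and comparing log-likelihoods (3.1) reproduces the model-selection step described above.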
3.2 Bootstrap confidence intervals
When other approaches are not available, the bootstrap method is a useful tool to find confidence intervals (CIs) for an arbitrary function of $(\theta, \boldsymbol{\lambda})$, say, $\psi = h(\theta, \boldsymbol{\lambda})$. Let $(\hat{\theta}, \hat{\boldsymbol{\lambda}})$ be the MLEs of $(\theta, \boldsymbol{\lambda})$ calculated by the EM algorithm (3.3)-(3.5); then $\hat{\psi} = h(\hat{\theta}, \hat{\boldsymbol{\lambda}})$ is the MLE of $\psi$. Based on $(\hat{\theta}, \hat{\boldsymbol{\lambda}})$, we can generate $\mathbf{w}_j^* \overset{\text{iid}}{\sim} \text{ZTCS}(K, \hat{\theta}; \hat{\lambda}_1, \ldots, \hat{\lambda}_m)$ via the SR (2.2) for $j = 1, \ldots, n$. Having obtained $Y_{\text{obs}}^* = \{\mathbf{w}_1^*, \ldots, \mathbf{w}_n^*\}$, we can calculate the bootstrap replication $(\hat{\theta}^*, \hat{\boldsymbol{\lambda}}^*)$ and get $\hat{\psi}^* = h(\hat{\theta}^*, \hat{\boldsymbol{\lambda}}^*)$. Independently repeating this process $G$ times, we obtain $G$ bootstrap replications $\{\hat{\psi}_g^*\}_{g=1}^G$. Consequently, the standard error, $\text{se}(\hat{\psi})$, of $\hat{\psi}$ can be estimated by the sample standard deviation of the $G$ replications, i.e.,

$$\widehat{\text{se}}(\hat{\psi}) = \left\{ \frac{1}{G-1} \sum_{g=1}^G \left[ \hat{\psi}_g^* - \left( \hat{\psi}_1^* + \cdots + \hat{\psi}_G^* \right)/G \right]^2 \right\}^{1/2}. \tag{3.6}$$

If $\{\hat{\psi}_g^*\}_{g=1}^G$ is approximately normally distributed, the first $(1-\alpha)100\,\%$ bootstrap CI for $\psi$ is

$$\left[ \hat{\psi} - z_{\alpha/2}\, \widehat{\text{se}}(\hat{\psi}),\ \hat{\psi} + z_{\alpha/2}\, \widehat{\text{se}}(\hat{\psi}) \right]. \tag{3.7}$$

Alternatively, if $\{\hat{\psi}_g^*\}_{g=1}^G$ is non-normally distributed, the second $(1-\alpha)100\,\%$ bootstrap CI of $\psi$ can be obtained as

$$[\hat{\psi}_L,\ \hat{\psi}_U], \tag{3.8}$$

where $\hat{\psi}_L$ and $\hat{\psi}_U$ are the $100(\alpha/2)$ and $100(1 - \alpha/2)$ percentiles of $\{\hat{\psi}_g^*\}_{g=1}^G$, respectively.
4 Multivariate zero-adjusted Charlier series distribution
To introduce the multivariate zero-adjusted Charlier series (ZACS) distribution, we first define the univariate ZACS distribution. A non-negative discrete random variable $Y$ is said to have a ZACS distribution with parameters $\varphi' \in [0, 1)$, $K \in \mathbb{N}$, $\theta \in [0, 1)$ and $\lambda > 0$, denoted by $Y \sim \text{ZACS}(\varphi', K, \theta, \lambda)$, if

$$Y \overset{d}{=} Z' W, \tag{4.1}$$

where $Z' \sim \text{Bernoulli}(1 - \varphi')$, $W \sim \text{ZTCS}(K, \theta; \lambda)$, and $Z' \perp W$. It is clear that the pmf of $Y$ is given by

$$\Pr(Y = y) = \varphi' I(y = 0) + \frac{1 - \varphi'}{1 - (1-\theta)^K e^{-\lambda}}\, Q_y(K, \theta, \lambda)\, I(y \neq 0), \tag{4.2}$$

where

$$Q_y(K, \theta, \lambda) = \sum_{k=0}^{\min(K, y)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda^{y-k} e^{-\lambda}}{(y-k)!}.$$

Motivated by (4.1), we naturally have the following multivariate generalization.

Definition 2. A discrete random vector $\mathbf{y} = (Y_1, \ldots, Y_m)^\top$ is said to have the multivariate ZACS distribution with parameters $\varphi' \in [0, 1)$, $K \in \mathbb{N}$, $\theta \in [0, 1)$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^\top \in \mathbb{R}_+^m$, denoted by $\mathbf{y} \sim \text{ZACS}_m(\varphi'; K, \theta, \boldsymbol{\lambda})$ or $\mathbf{y} \sim \text{ZACS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$, if

$$\mathbf{y} \overset{d}{=} Z' \mathbf{w} = \begin{cases} \mathbf{0}, & \text{with probability } \varphi', \\ \mathbf{w}, & \text{with probability } 1 - \varphi', \end{cases} \tag{4.3}$$

where $Z' \sim \text{Bernoulli}(1 - \varphi')$, $\mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, and $Z' \perp \mathbf{w}$. The random vector $\mathbf{w}$ is called the base vector of $\mathbf{y}$.

It is easy to show that the joint pmf of $\mathbf{y} \sim \text{ZACS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$ is

$$\Pr(\mathbf{y} = \mathbf{y}) = \varphi' I(\mathbf{y} = \mathbf{0}) + \frac{1 - \varphi'}{1 - (1-\theta)^K e^{-\lambda_+}}\, Q_{\mathbf{y}}(K, \theta, \boldsymbol{\lambda})\, I(\mathbf{y} \neq \mathbf{0}), \tag{4.4}$$

where

$$Q_{\mathbf{y}}(K, \theta, \boldsymbol{\lambda}) = \sum_{k=0}^{\min(K, \mathbf{y})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{y_i - k} e^{-\lambda_i}}{(y_i - k)!}.$$

We consider several special cases of (4.3) or (4.4):

(i) If $\varphi' = 0$, then $\mathbf{y} \overset{d}{=} \mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, i.e., the multivariate ZTCS distribution is a special member of the family of multivariate ZACS distributions. Thus, studying the multivariate ZTCS distribution is a basis for studying the multivariate ZACS distribution;
(ii) If $\varphi' \in (0, (1-\theta)^K e^{-\lambda_+})$, then $\mathbf{y}$ follows the multivariate zero-deflated Charlier series (ZDCS) distribution with parameters $(\varphi', K, \theta, \boldsymbol{\lambda})$, denoted by $\mathbf{y} \sim \text{ZDCS}_m(\varphi'; K, \theta, \boldsymbol{\lambda})$ or $\mathbf{y} \sim \text{ZDCS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$;
(iii) If $\varphi' = (1-\theta)^K e^{-\lambda_+}$, then $\mathbf{y} \sim \text{CS}_m(K, \theta; \boldsymbol{\lambda})$;
(iv) If $\varphi' \in ((1-\theta)^K e^{-\lambda_+}, 1)$, then $\mathbf{y}$ follows the multivariate zero-inflated Charlier series (ZICS) distribution with parameters $(\varphi', K, \theta, \boldsymbol{\lambda})$, denoted by $\mathbf{y} \sim \text{ZICS}_m(\varphi'; K, \theta, \boldsymbol{\lambda})$ or $\mathbf{y} \sim \text{ZICS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$.
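The SR (4.3) makes sampling trivial. A plain-Python sketch (parameter values are ours, for illustration) that also checks $\Pr(\mathbf{y} = \mathbf{0}) = \varphi'$ empirically and classifies the case against $(1-\theta)^K e^{-\lambda_+}$:

```python
import math
import random

random.seed(7)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rztcs(K, theta, lams):
    # ZTCS draw by rejection of the zero vector
    while True:
        x0 = sum(random.random() < theta for _ in range(K))
        x = tuple(x0 + rpois(l) for l in lams)
        if any(x):
            return x

K, theta, lams = 3, 0.4, (1.0, 1.5)
phi = (1 - theta)**K * math.exp(-sum(lams))  # zero probability of the parent CS model
phi_prime = 0.25                             # chosen > phi here, so y is ZICS (case iv)

def rzacs(phi_prime, K, theta, lams):
    # SR (4.3): y = Z' * w with Z' ~ Bernoulli(1 - phi_prime), w ~ ZTCS
    if random.random() < phi_prime:
        return (0,) * len(lams)
    return rztcs(K, theta, lams)

n = 40_000
ys = [rzacs(phi_prime, K, theta, lams) for _ in range(n)]
zero_frac = sum(1 for y in ys if not any(y)) / n
```

Since the base ZTCS vector is never zero, the empirical zero fraction should match $\varphi'$ up to Monte Carlo error.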
4.1 Mixed moments and moment generating function
From (4.1) and (2.2), we immediately have

$$E(\mathbf{y}) = \frac{1 - \varphi'}{1 - \varphi} (\boldsymbol{\lambda} + K\theta \mathbf{1}),$$

$$E(\mathbf{y}\mathbf{y}^\top) = \frac{1 - \varphi'}{1 - \varphi} \left[ \text{diag}(\boldsymbol{\lambda}) + \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K\theta(1 - \theta + K\theta)\mathbf{1}\mathbf{1}^\top \right],$$

$$\text{Var}(\mathbf{y}) = \frac{1 - \varphi'}{1 - \varphi} \left[ \text{diag}(\boldsymbol{\lambda}) + K\theta(1-\theta)\mathbf{1}\mathbf{1}^\top \right] - \frac{(1 - \varphi')(\varphi - \varphi')}{(1 - \varphi)^2} \left[ \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K^2\theta^2\mathbf{1}\mathbf{1}^\top \right]. \tag{4.5}$$

Thus, we have

$$\text{Corr}(Y_i, Y_j) = \frac{K\theta(1-\theta) - \frac{\varphi - \varphi'}{1 - \varphi}(\lambda_i + K\theta)(\lambda_j + K\theta)}{\sqrt{\left[ \lambda_i + K\theta(1-\theta) - \frac{\varphi - \varphi'}{1 - \varphi}(\lambda_i + K\theta)^2 \right] \left[ \lambda_j + K\theta(1-\theta) - \frac{\varphi - \varphi'}{1 - \varphi}(\lambda_j + K\theta)^2 \right]}},$$

for $i \neq j$. In particular, if $\theta = 0$, we obtain

$$\text{Corr}(Y_i, Y_j) = \frac{-\lambda_i \lambda_j (\varphi - \varphi')/(1 - \varphi)}{\sqrt{\left[ \lambda_i - \lambda_i^2 (\varphi - \varphi')/(1 - \varphi) \right] \left[ \lambda_j - \lambda_j^2 (\varphi - \varphi')/(1 - \varphi) \right]}}.$$

Furthermore, if $\lambda_i = \lambda_j = \lambda$, then

$$\text{Corr}(Y_i, Y_j) = \frac{-\lambda (\varphi - \varphi')/(1 - \varphi)}{1 - \lambda (\varphi - \varphi')/(1 - \varphi)}.$$

Clearly, $\text{Corr}(Y_i, Y_j)$ could be either positive or negative, depending on the values of $\varphi'$, $K$, $\theta$ and $\boldsymbol{\lambda}$.

For any $r_1, \ldots, r_m \geq 0$, the mixed moments of $\mathbf{y}$ are given by

$$E\left( \prod_{i=1}^m Y_i^{r_i} \right) = (1 - \varphi') E\left( \prod_{i=1}^m W_i^{r_i} \right) = \frac{1 - \varphi'}{1 - \varphi} E\left( \prod_{i=1}^m X_i^{r_i} \right). \tag{4.6}$$

By using the formula $E(\cdot) = E[E(\cdot|Z')]$, the mgf of $\mathbf{y}$ is

$$\begin{aligned}
M_{\mathbf{y}}(\mathbf{t}) &= E[\exp(\mathbf{t}^\top \mathbf{y})] = E[\exp(Z' \mathbf{t}^\top \mathbf{w})] = E\{E[\exp(Z' \mathbf{t}^\top \mathbf{w})|Z']\} \\
&= E[M_{\mathbf{w}}(Z'\mathbf{t})] = \varphi' M_{\mathbf{w}}(\mathbf{0}) + (1 - \varphi') M_{\mathbf{w}}(\mathbf{t}) = \varphi' + (1 - \varphi') M_{\mathbf{w}}(\mathbf{t}) \\
&= \varphi' + \frac{1 - \varphi'}{1 - \varphi} \left[ \left( \theta e^{t_+} + 1 - \theta \right)^K \exp\left( \sum_{i=1}^m \lambda_i e^{t_i} - \lambda_+ \right) - (1-\theta)^K e^{-\lambda_+} \right], \tag{4.7}
\end{aligned}$$

where $t_+ = \sum_{i=1}^m t_i$.
4.2 Marginal distributions
Now we consider the marginal distributions of $\mathbf{y}^{(1)}$ and $\mathbf{y}^{(2)}$, where

$$\mathbf{y}^{(1)} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_r \end{pmatrix}, \quad \mathbf{y}^{(2)} = \begin{pmatrix} Y_{r+1} \\ \vdots \\ Y_m \end{pmatrix} \quad \text{and} \quad \mathbf{y} = \begin{pmatrix} \mathbf{y}^{(1)} \\ \mathbf{y}^{(2)} \end{pmatrix}.$$

Based on (4.1) and (2.13), we have

$$\mathbf{y}^{(k)} \overset{d}{=} Z' \mathbf{w}^{(k)} \overset{d}{=} Z' Z^{(k)} \boldsymbol{\omega}^{(k)}, \qquad k = 1, 2,$$

where $Z' \sim \text{Bernoulli}(1 - \varphi')$, $Z^{(k)} \sim \text{Bernoulli}(1 - \varphi^{(k)})$ with $\varphi^{(k)}$ given by (2.14), $\boldsymbol{\omega}^{(1)} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_r)$ and $\boldsymbol{\omega}^{(2)} \sim \text{ZTCS}(K, \theta; \lambda_{r+1}, \ldots, \lambda_m)$. Note that $Z' Z^{(k)} \perp \boldsymbol{\omega}^{(k)}$ and $Z' Z^{(k)} \sim \text{Bernoulli}((1 - \varphi')(1 - \varphi^{(k)}))$. According to the SR (4.3), we obtain

$$\mathbf{y}^{(1)} \sim \text{ZACS}(\varphi'^{(1)}; K, \theta, \lambda_1, \ldots, \lambda_r) \quad \text{and} \quad \mathbf{y}^{(2)} \sim \text{ZACS}(\varphi'^{(2)}; K, \theta, \lambda_{r+1}, \ldots, \lambda_m), \tag{4.8}$$

where

$$\varphi'^{(k)} = 1 - (1 - \varphi')(1 - \varphi^{(k)}) = 1 - (1 - \varphi') \cdot \frac{1 - (1-\theta)^K e^{-\lambda^{(k)}_+}}{1 - (1-\theta)^K e^{-\lambda_+}} \in (0, 1), \qquad k = 1, 2, \tag{4.9}$$

$\lambda^{(1)}_+ = \sum_{i=1}^r \lambda_i$ and $\lambda^{(2)}_+ = \sum_{i=r+1}^m \lambda_i$.

In fact, for any positive integers $i_1, \ldots, i_r$ satisfying $1 \leq i_1 < \cdots < i_r \leq m$, we have

$$\begin{pmatrix} Y_{i_1} \\ \vdots \\ Y_{i_r} \end{pmatrix} \sim \text{ZACS}(\varphi'^*; K, \theta, \lambda_{i_1}, \ldots, \lambda_{i_r}), \tag{4.10}$$

where $\varphi^*$ is given by (2.16) and

$$\varphi'^* = 1 - (1 - \varphi')(1 - \varphi^*) = 1 - (1 - \varphi') \cdot \frac{1 - (1-\theta)^K e^{-(\lambda_{i_1} + \cdots + \lambda_{i_r})}}{1 - (1-\theta)^K e^{-\lambda_+}} \in (0, 1). \tag{4.11}$$
4.3 Conditional distributions
4.3.1 Conditional distribution of y(1)|y(2)
From (4.4) and (4.8), the conditional distribution of $\mathbf{y}^{(1)}|\mathbf{y}^{(2)}$ is given by

$$\Pr(\mathbf{y}^{(1)} = \mathbf{y}^{(1)} | \mathbf{y}^{(2)} = \mathbf{y}^{(2)}) = \frac{\Pr(\mathbf{y} = \mathbf{y})}{\Pr(\mathbf{y}^{(2)} = \mathbf{y}^{(2)})} = \frac{\varphi' I(\mathbf{y} = \mathbf{0}) + R(\mathbf{y}, K, \theta, \varphi', \boldsymbol{\lambda})\, I(\mathbf{y} \neq \mathbf{0})}{\varphi'^{(2)} I(\mathbf{y}^{(2)} = \mathbf{0}) + S(\mathbf{y}^{(2)}, K, \theta, \varphi', \boldsymbol{\lambda}^{(2)})\, I(\mathbf{y}^{(2)} \neq \mathbf{0})}, \tag{4.12}$$

where

$$R(\mathbf{y}, K, \theta, \varphi', \boldsymbol{\lambda}) = \frac{1 - \varphi'}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min(K, \mathbf{y})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{y_i - k} e^{-\lambda_i}}{(y_i - k)!}$$

and

$$S(\mathbf{y}^{(2)}, K, \theta, \varphi', \boldsymbol{\lambda}^{(2)}) = \frac{1 - \varphi'^{(2)}}{1 - (1-\theta)^K e^{-\lambda^{(2)}_+}} \sum_{k=0}^{\min(K, \mathbf{y}^{(2)})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=r+1}^m \frac{\lambda_i^{y_i - k} e^{-\lambda_i}}{(y_i - k)!}.$$

We first consider Case I: $\mathbf{y}^{(2)} \neq \mathbf{0}$. Under Case I, it is clear that $\mathbf{y} \neq \mathbf{0}$. From (4.12), it is easy to obtain

$$\Pr(\mathbf{y}^{(1)} = \mathbf{y}^{(1)} | \mathbf{y}^{(2)} = \mathbf{y}^{(2)}) = \frac{e^{-\lambda^{(1)}_+} \displaystyle\sum_{k=0}^{\min\{K, \mathbf{y}\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{y_j - k}}{(y_j - k)!}}{\displaystyle\sum_{l=0}^{\min\{K, \mathbf{y}^{(2)}\}} \binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{p=r+1}^m \frac{\lambda_p^{y_p - l}}{(y_p - l)!}},$$

which has the same form as (2.18).

Case II: $\mathbf{y}^{(2)} = \mathbf{0}$. Under Case II, it is possible that $\mathbf{y}^{(1)} = \mathbf{0}$ or $\mathbf{y}^{(1)} \neq \mathbf{0}$. When $\mathbf{y}^{(1)} = \mathbf{0}$, from (4.12), we obtain

$$\Pr(\mathbf{y}^{(1)} = \mathbf{0} | \mathbf{y}^{(2)} = \mathbf{0}) = \frac{\varphi'}{\varphi'^{(2)}}.$$

When $\mathbf{y}^{(1)} \neq \mathbf{0}$, from (4.12), we have

$$\Pr(\mathbf{y}^{(1)} = \mathbf{y}^{(1)} | \mathbf{y}^{(2)} = \mathbf{0}) = \frac{(1 - \varphi')(1-\theta)^K e^{-\lambda^{(2)}_+}}{\varphi'^{(2)} \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right]} \prod_{i=1}^r \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}.$$

4.3.2 Conditional distribution of Z'|y
Since $Z' \sim \text{Bernoulli}(1 - \varphi')$, $Z'$ only takes the value 0 or 1. Note that $\mathbf{y} = \mathbf{0}$ is equivalent to $Z' = 0$ (since $\mathbf{w} \neq \mathbf{0}$ almost surely). Thus, $\Pr(Z' = 0 | \mathbf{y} = \mathbf{0}) = \Pr(Z' = 0)/\Pr(\mathbf{y} = \mathbf{0}) = 1$. And when $\mathbf{y} \neq \mathbf{0}$, we have $\Pr(Z' = 1 | \mathbf{y} = \mathbf{y}) = \Pr(Z' = 1, \mathbf{w} = \mathbf{y})/\Pr(\mathbf{y} = \mathbf{y}) = 1$. Therefore,

$$Z' \,|\, (\mathbf{y} = \mathbf{y}) \sim \begin{cases} \text{Degenerate}(0), & \text{if } \mathbf{y} = \mathbf{0}, \\ \text{Degenerate}(1), & \text{if } \mathbf{y} \neq \mathbf{0}, \end{cases} \tag{4.13}$$

i.e., $Z' \,|\, (\mathbf{y} = \mathbf{y}) \sim \text{Degenerate}(I(\mathbf{y} \neq \mathbf{0}))$.

4.3.3 Conditional distribution of w|(y = y ≠ 0)
If $\mathbf{y} \neq \mathbf{0}$, we have

$$\Pr(\mathbf{w} = \mathbf{w} | \mathbf{y} = \mathbf{y}) = \frac{\Pr(\mathbf{w} = \mathbf{w}, \mathbf{y} = \mathbf{y})}{\Pr(\mathbf{y} = \mathbf{y})} = \frac{\Pr(\mathbf{w} = \mathbf{y}, Z' = 1)}{\Pr(\mathbf{y} = \mathbf{y})} = I(\mathbf{w} = \mathbf{y}).$$

Thus, given $\mathbf{y} = \mathbf{y} \neq \mathbf{0}$, we have

$$\mathbf{w} \,|\, (\mathbf{y} = \mathbf{y} \neq \mathbf{0}) \sim \text{Degenerate}(\mathbf{y}). \tag{4.14}$$
5 Likelihood-based methods for multivariate ZACS distribution without covariates
Suppose that $\mathbf{y}_j \overset{\text{iid}}{\sim} \text{ZACS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$, where $\mathbf{y}_j = (Y_{1j}, \ldots, Y_{mj})^\top$ for $j = 1, \ldots, n$. Let $\mathbf{y}_j = (y_{1j}, \ldots, y_{mj})^\top$ denote the realization of the random vector $\mathbf{y}_j$, and $Y_{\text{obs}} = \{\mathbf{y}_j\}_{j=1}^n$ be the observed data. Furthermore, let $J = \{j : \mathbf{y}_j = \mathbf{0},\ j = 1, \ldots, n\}$ and $m_0 = \sum_{j=1}^n I(\mathbf{y}_j = \mathbf{0})$ denote the number of elements in $J$. We assume that $K$ is a known positive integer. Therefore, the observed-data likelihood function is proportional to

$$L(\varphi', \theta, \boldsymbol{\lambda} | Y_{\text{obs}}) \propto \varphi'^{m_0} (1 - \varphi')^{n - m_0} \prod_{j \notin J} \frac{e^{-\lambda_+}}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k_j=0}^{\min(K, \mathbf{y}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!}.$$

Thus, we can write the log-likelihood function in two parts:

$$\ell(\varphi', \theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = \ell_1(\varphi' | Y_{\text{obs}}) + \ell_2(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}), \tag{5.1}$$

where

$$\ell_1(\varphi' | Y_{\text{obs}}) = m_0 \log\varphi' + (n - m_0) \log(1 - \varphi') \quad \text{and}$$

$$\ell_2(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = -(n - m_0) \left\{ \lambda_+ + \log\left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \right\} + \sum_{j \notin J} \log \left[ \sum_{k_j=0}^{\min(K, \mathbf{y}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!} \right].$$

In other words, the parameter $\varphi'$ and the parameter vector $(\theta, \boldsymbol{\lambda})$ can be estimated separately. Obviously, the MLE of $\varphi'$ has the explicit solution

$$\hat{\varphi}' = \frac{m_0}{n}, \tag{5.2}$$

but closed-form MLEs of $(\theta, \boldsymbol{\lambda})$ are not available.
5.1 MLEs via the EM algorithm and bootstrap CIs
The objective of this section is to find the MLEs of $(\theta, \boldsymbol{\lambda})$ based on (5.1). For the log-likelihood function (3.1), the corresponding EM iteration for finding the MLEs of $(\theta, \boldsymbol{\lambda})$ is defined by (3.3)-(3.5). By comparing (3.1) with (5.1), if we replace the sums $\sum_{j=1}^n$ in (3.3)-(3.5) with $\sum_{j \notin J}$ (and $n$ with $n - m_0$), we promptly obtain the MLEs of $(\theta, \boldsymbol{\lambda})$ by using the EM algorithm. The M-step is to calculate the complete-data MLEs:

$$\hat{\theta} = \frac{\sum_{j \notin J} x_{0j}}{(n - m_0)K} \quad \text{and} \quad \hat{\lambda}_i = \frac{\sum_{j \notin J} u_j y_{ij}}{n - m_0} - K\hat{\theta}, \qquad i = 1, \ldots, m, \tag{5.3}$$

and the E-step is to replace $\{u_j\}_{j \notin J}$ and $\{x_{0j}\}_{j \notin J}$ in (5.3) by their conditional expectations:

$$E(U_j | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) = E(U_j) = 1 - (1-\theta)^K e^{-\lambda_+}, \quad \text{and} \tag{5.4}$$

$$E(X_{0j} | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) = \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \frac{\sum_{k_j=1}^{\min(K, \mathbf{y}_j)} k_j \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!}}{\sum_{k_j=0}^{\min(K, \mathbf{y}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!}} \, I(\min(\mathbf{y}_j) \geq 1), \tag{5.5}$$

respectively.

The procedure of constructing bootstrap CIs for an arbitrary function of $(\varphi', \theta, \boldsymbol{\lambda})$, say $\psi = h(\varphi', \theta, \boldsymbol{\lambda})$, is very similar to that presented in Section 3.2.
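The two-part structure (5.1)-(5.2) separates cleanly in code: $\hat{\varphi}'$ is the observed zero fraction, and the Section 3 EM runs on the non-zero vectors only. A plain-Python sketch on synthetic data (our own implementation; parameters, starting values and iteration counts are illustrative):

```python
import math
import random

random.seed(8)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rzacs(phi_prime, K, theta, lams):
    # SR (4.3): zero vector w.p. phi_prime, otherwise a ZTCS draw (rejection)
    if random.random() < phi_prime:
        return (0,) * len(lams)
    while True:
        x0 = sum(random.random() < theta for _ in range(K))
        x = tuple(x0 + rpois(l) for l in lams)
        if any(x):
            return x

def fit_zacs(data, K, iters=200):
    # Part 1: (5.2) -- the MLE of phi' is the observed zero fraction
    nonzero = [y for y in data if any(y)]
    phi_prime_hat = 1 - len(nonzero) / len(data)
    # Part 2: EM (5.3)-(5.5) for (theta, lambda) over the non-zero vectors
    m, n1 = len(data[0]), len(nonzero)
    theta, lams = 0.3, [1.0] * m
    ybar = [sum(y[i] for y in nonzero) / n1 for i in range(m)]
    for _ in range(iters):
        phi = (1 - theta)**K * math.exp(-sum(lams))
        Eu = 1 - phi                                  # (5.4)
        x0sum = 0.0
        for y in nonzero:                             # (5.5)
            if min(y) < 1:
                continue
            num = den = 0.0
            for k in range(min(K, min(y)) + 1):
                term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
                for yi, li in zip(y, lams):
                    term *= li**(yi - k) / math.factorial(yi - k)
                num += k * term
                den += term
            x0sum += (1 - phi) * num / den
        theta = x0sum / (n1 * K)                      # (5.3)
        lams = [max(Eu * ybar[i] - K * theta, 1e-8) for i in range(m)]
    return phi_prime_hat, theta, lams

K_true = 3
data = [rzacs(0.2, K_true, 0.4, (2.0, 3.0)) for _ in range(3000)]
phi_prime_hat, theta_hat, lams_hat = fit_zacs(data, K_true)
```

The zero-fraction estimate converges at the usual root-n rate independently of the EM part, which is the practical payoff of the likelihood factorization (5.1).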
6 Simulation studies
To evaluate the performance of the proposed methods in Section 3, we investigate the accuracy of MLEs and confidence interval estimators of the parameters in the multivariate ZTCS distribution. We consider two cases for the dimension with m = 2 and m = 3.
6.1 Experiment 1: m = 2
When $m = 2$, the parameters $(K, \theta; \lambda_1, \lambda_2)$ are set to be $(5, 0.5; 3, 5)$. We generate $\{\mathbf{w}_j\}_{j=1}^n \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \lambda_2)$ with $n = 200$. Based on this simulated data set, for different $K$ values we first calculate the MLEs of $\theta$ and $(\lambda_1, \lambda_2)$ by using the EM algorithm (3.3)-(3.5) and then calculate the estimated log-likelihood. We choose $K = 5$, which maximizes the log-likelihood among all $K$ values. These results are reported in Table 1.

Table 1 Finding the value of K by maximizing the log-likelihood function for m = 2

K    $\hat{\theta}$    $\hat{\lambda}_1$    $\hat{\lambda}_2$    Log-likelihood
3    0.66742           3.4739056            5.4464959            -884.6629
4    0.58617           3.1314781            5.1040629            -883.4288
5    0.51145           2.9188539            4.8914319            -883.0151
6    0.44308           2.8176227            4.7901937            -883.1228
For this fixed value of $K = 5$, we first calculate the MLEs of $(\theta, \lambda_1, \lambda_2)$ by using the EM algorithm (3.3)-(3.5), the bootstrap standard deviations (stds) of these MLEs, the corresponding mean square errors (MSEs) and two 95 % bootstrap confidence intervals (CIs) of these parameters with $G = 1000$ by the bootstrap method presented in Section 3.2. Then, we independently repeat the above process 1000 times. The resulting average MLE, std, MSE and two coverage probabilities (CPs) based on the normal-based and non-normal-based bootstrap samples, respectively, are displayed in Table 2.
From Table 2, we can see that the average MSE of $\hat{\theta}$ is very small, while the average MSEs of $(\hat{\lambda}_1, \hat{\lambda}_2)$ are reasonably small. The two bootstrap coverage probabilities are close to but less than 0.95.
6.2 Experiment 2: m = 3
When $m = 3$, the parameters $(K, \theta; \lambda_1, \lambda_2, \lambda_3)$ are set to be $(4, 0.3; 2, 4, 6)$. We generate $\{\mathbf{w}_j\}_{j=1}^n \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \lambda_2, \lambda_3)$ with $n = 200$. Based on this simulated data set, for different $K$ values we first calculate the MLEs of $\theta$ and $(\lambda_1, \lambda_2, \lambda_3)$ by using the EM algorithm (3.3)-(3.5) and then calculate the estimated log-likelihood. We choose $K = 4$, which maximizes the log-likelihood among all $K$ values. These results are reported in Table 3.
For this fixed value of $K = 4$, we first calculate the MLEs of $(\theta, \lambda_1, \lambda_2, \lambda_3)$ by using the EM algorithm (3.3)-(3.5), the bootstrap stds of these MLEs, the corresponding MSEs and two 95 % bootstrap CIs of these parameters with $G = 1000$ by the bootstrap method presented in Section 3.2. Then, we independently repeat the above process 1000 times. The resulting average MLE, std, MSE and two CPs based on the normal-based and non-normal-based bootstrap samples, respectively, are displayed in Table 4.
From Table 4, we can see that the average MSEs of $(\hat{\theta}, \hat{\lambda}_1, \hat{\lambda}_2, \hat{\lambda}_3)$ are very small. The two bootstrap coverage probabilities are close to 0.95.
7 Two real examples
7.1 Students' absenteeism data
In this section, we use the data set on the number of absences of 113 students from a lecture course in two successive semesters, reported by Karlis (2003), to illustrate the proposed statistical methods for the multivariate ZTCS distribution. Let $W_1$ denote the number of absences in the first semester and $W_2$ denote the number of absences in the second semester. The data are displayed in Table 5 below.
For the purpose of illustration, we artificially remove the (0, 0) cell counts from Table 5 and the updated data are shown in Table 6.
Let $\mathbf{w}_j = (W_{1j}, W_{2j})^\top \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \lambda_2)$ for $j = 1, \ldots, n$ with $n = 98$. Let $\mathbf{w}_j = (w_{1j}, w_{2j})^\top$ denote the realization of the random vector $\mathbf{w}_j$, and $Y_{\text{obs}} = \{\mathbf{w}_j\}_{j=1}^n$ be the observed data. The parameter $K$ of the binomial distribution is considered unknown
Table 2 The average MLE, std, MSE and two CPs of $(\theta, \lambda_1, \lambda_2)$ for m = 2 and K = 5

Parameter      True value    Average MLE    Average std_B    Average MSE    CP-n    CP-p
$\theta$       0.5           0.511457       0.06417          0.004209       0.930   0.932
$\lambda_1$    3             2.918853       0.33051          0.114730       0.927   0.932
$\lambda_2$    5             4.891431       0.35926          0.139568       0.921   0.928

std_B: the sample standard deviation for the bootstrap samples
CP-n: normal-based bootstrap CP; CP-p: non-normal-based bootstrap CP
Table 3 Finding the value of K by maximizing the log-likelihood function for m = 3

K   MLE of α   MLE of λ1  MLE of λ2  MLE of λ3  Log-likelihood
3   0.4037377  1.9887834  4.2987810  5.9287793  -650.5100604
4   0.3203847  1.9184570  4.2284540  5.8584519  -649.8799727
5   0.2583191  1.9084000  4.2183967  5.8483944  -650.1021029
6   0.2137641  1.9174109  4.2274075  5.8574051  -650.1044541
Table 4 The average MLE, std, MSE and two CPs of (α, λ1, λ2, λ3) for m = 3 and K = 4

Parameter  True value  Average MLE  Average stdB  Average MSE  CP(N)  CP(NN)
α          0.3         0.320384     0.0541052     0.00295      0.937  0.939
λ1         2           1.918457     0.222460      0.05689      0.925  0.932
λ2         4           4.228454     0.242082      0.06476      0.921  0.925
λ3         6           5.858451     0.255456      0.09402      0.954  0.948

stdB: The sample standard deviation for the bootstrap samples
CP(N): Normal-based bootstrap CP
CP(NN): Non-normal-based bootstrap CP
Table 5 Cross-tabulation of the students' absenteeism data (Karlis 2003)

W1\W2    0   1   2   3   4   5   6   7   8  Total
0       15  10   4   4   2   0   0   0   0     35
1        6  11   9   4   2   0   0   0   0     32
2        5   7   6   5   0   0   0   0   0     23
3        1   3   2   4   3   1   0   0   0     14
4        1   0   2   0   1   0   0   0   0      4
5        0   0   0   0   0   1   1   0   0      2
6        0   0   0   0   0   0   2   0   0      2
7        0   0   0   0   0   0   0   0   0      0
8        0   0   0   0   0   0   0   0   0      0
9        1   0   0   0   0   0   0   0   0      1
Total   29  31  23  17   8   2   3   0   0    113
Table 6 The number of absences of 113 students from a course in two successive semesters, without the (0, 0) cell count (Karlis 2003)

W1\W2    0   1   2   3   4   5   6   7   8  Total
0        -  10   4   4   2   0   0   0   0     20
1        6  11   9   4   2   0   0   0   0     32
2        5   7   6   5   0   0   0   0   0     23
3        1   3   2   4   3   1   0   0   0     14
4        1   0   2   0   1   0   0   0   0      4
5        0   0   0   0   0   1   1   0   0      2
6        0   0   0   0   0   0   2   0   0      2
7        0   0   0   0   0   0   0   0   0      0
8        0   0   0   0   0   0   0   0   0      0
9        1   0   0   0   0   0   0   0   0      1
Total   14  31  23  17   8   2   3   0   0     98
and we attempt to estimate it. Based on the data in Table 6, for different K values we first calculate the MLEs of α and (λ1, λ2) by using the EM algorithm (3.3)–(3.5) and then calculate the estimated values of the log-likelihood function. These results are reported in Table 7.
We should choose the K that maximizes the log-likelihood among all K values. From Table 7, we observe that the log-likelihood increases monotonically as K → ∞.
On the other hand, K must be larger than or equal to max(W1, W2). From Table 6, we have max(W1, W2) = 9. To illustrate how to obtain the confidence intervals of the parameters, it seems reasonable to choose K = 10. With G = 6000 bootstrap replications, we calculate the bootstrap average MLEs, the bootstrap stds of the MLEs of (α, λ1, λ2) and two 95 % bootstrap CIs of (α, λ1, λ2). These results are listed in Table 8.
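The two interval types can be sketched as follows. Whether Section 3.2 uses exactly the percentile construction for the non-normal-based CI is our assumption, suggested by the asymmetric intervals in Table 8:

```python
import numpy as np
from scipy.stats import norm

def bootstrap_cis(theta_hat, boot, level=0.95):
    """Two bootstrap CIs for a scalar parameter: a normal-based interval
    theta_hat +/- z * std(boot), and a percentile (non-normal-based) interval
    from the empirical quantiles of the G replicates."""
    z = norm.ppf(0.5 + level / 2)
    se = np.std(boot, ddof=1)
    lo, hi = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    return (theta_hat - z * se, theta_hat + z * se), (lo, hi)

# Example with G = 6000 synthetic replicates around a point estimate 0.044
# (illustrative numbers only, not the paper's actual bootstrap output):
boot = np.random.default_rng(3).normal(0.044, 0.018, 6000)
ci_normal, ci_pct = bootstrap_cis(0.044, boot)
```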
7.2 Road accident data of Athens
The numbers of accidents in 24 roads of Athens for the period 1987–1991 were reported and analyzed by Karlis (2003) with a multivariate Poisson distribution. Since only accidents that caused injuries are included, as shown in Table 9, we want to fit the data set by the multivariate ZTCS model.
Let wj = (W1j, . . . , W5j)⊤ iid ∼ ZTCS(K, α; λ1, . . . , λ5), where W1j, . . . , W5j denote the average numbers of accidents per kilometer reported in the j-th road for 1987 to 1991, respectively, for j = 1, . . . , n (n = 24). For example, when j = 1, we have tj = t1 = 1.2 and
(W11, . . . , W51)⊤ = (11, 33, 25, 23, 6)⊤/1.2.
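The per-kilometre transformation for road 1, and for the road that attains the overall maximum used later to bound K, can be checked directly:

```python
import numpy as np

# Road 1 (Akadimias): counts for 1987-1991 divided by its length t_1 = 1.2 km.
W_road1 = np.array([11, 33, 25, 23, 6]) / 1.2
# Road 21 (Panepistimiou, t_21 = 1.1 km) contains the largest entry of the
# whole table, 58 / 1.1 = 52.7, which later forces K >= 53.
W_road21 = np.array([24, 58, 40, 36, 5]) / 1.1
```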
The unknown parameter K is assumed to be a positive integer. Based on the data in Table 9, for different K values we first calculate the MLEs of α and λ = (λ1, . . . , λ5)⊤ by using the EM algorithm (3.3)–(3.5) and then calculate the estimated values of the log-likelihood function. These results are reported in Table 10.
Table 7 Finding the value of K by maximizing the log-likelihood function for fitting the data of Table 6 by the multivariate ZTCS distribution

K     MLE of α   MLE of λ1  MLE of λ2  Log-likelihood
2     0.1220034  1.394314   1.600330   -328.8195
3     0.1024329  1.326026   1.531414   -328.1680
4     0.0869704  1.281892   1.486834   -327.7241
5     0.0748624  1.252965   1.457593   -327.4137
8     0.0517887  1.208834   1.412942   -326.8897
9     0.0468272  1.200906   1.404915   -326.7862
10    0.0427041  1.194675   1.398604   -326.7022
14    0.0314939  1.179167   1.382890   -326.4824
15    0.0295422  1.176680   1.380369   -326.4453
20    0.0225358  1.168154   1.371725   -326.3146
30    0.0152633  1.160049   1.363504   -326.1831
50    0.0092682  1.153806   1.357169   -326.0775
75    0.0062141  1.150805   1.354123   -326.0247
100   0.0046735  1.149330   1.352626   -325.9982
150   0.0031242  1.147871   1.351145   -325.9718
250   0.0018786  1.146717   1.349972   -325.9507
350   0.0013431  1.146225   1.349473   -325.9416
Table 8 MLEs and confidence intervals of parameters for the students' absenteeism data

Parameter  MLEB      stdB      95 % bootstrap CI(N)  95 % bootstrap CI(NN)
α          0.043961  0.017672  [0.009325, 0.078598]  [0.007857, 0.078343]
λ1         1.175550  0.210425  [0.763118, 1.587982]  [0.785992, 1.606149]
λ2         1.377584  0.218819  [0.948699, 1.806468]  [0.965662, 1.824147]

MLEB: The average MLE for the bootstrap samples
stdB: The sample standard deviation for the bootstrap samples
CI(N): Normal-based bootstrap CI
CI(NN): Non-normal-based bootstrap CI
We should choose the K that maximizes the log-likelihood among all K values. From Table 10, we observe that the log-likelihood increases monotonically as K → ∞.
On the other hand, K must be larger than or equal to max{Wij : 1 ≤ i ≤ 5, 1 ≤ j ≤ 24}. From Table 9, we have max{Wij : 1 ≤ i ≤ 5, 1 ≤ j ≤ 24} = 52.7. To illustrate how to obtain the confidence intervals of the parameters, it seems reasonable to choose K = 53. With G = 6000 bootstrap replications, we calculate the bootstrap average MLEs, the bootstrap stds of the MLEs of (α, λ1, . . . , λ5) and two 95 % bootstrap CIs of (α, λ1, . . . , λ5). These results are reported in Table 11.
Based on the data in Table 9, we calculate the sample correlation coefficient matrix, which is given by

R = | 1.0000  0.8038  0.7643  0.8089  0.5746 |
    | 0.8038  1.0000  0.8326  0.8297  0.4084 |
    | 0.7643  0.8326  1.0000  0.9058  0.5768 |
    | 0.8089  0.8297  0.9058  1.0000  0.6557 |
    | 0.5746  0.4084  0.5768  0.6557  1.0000 |
Table 9 Accident data of 24 roads in Athens for the period 1987–1991 (Karlis 2003)

Road               j    1987  1988  1989  1990  1991  Length (km) tj
Akadimias          1      11    33    25    23     6   1.2
Alexandras         2      41    63    91    77    29   2.6
Amfitheas          3       5    35    44    21    13   2.4
Aharnon            4      44    79    91    88    33   5.5
Vas. Olgas         5       5     3     4     4     0   0.5
Vas. Konstantinou  6       8    15    26    13     7   1.3
Vas. Sofias        7      34    63    81    67    23   2.6
Vouliagmenis       8      17    16    24    24     4   2.1
G Septemvriou      9      16    24    30    30    13   1.7
Galatsioy          10     13    13    15    17     9   1.1
Iera Odos          11      7    15    20    19     8   2.7
Kalirois           12     15    24    39    32     7   2.6
Katehaki           13      2     3    27    24     7   1.4
Kifisias           14     22    23    38    22    11   1.4
Kifisou            15     38    48    60    53    24   7.9
Leof. Kavalas      16      4     6    12     9     3   2.0
Lenorman           17     19    30    37    48    22   2.0
Leof. Athinon      18     15    11    16    21    28   6.1
Mesogeion          19     20    30    33    28     9   1.5
P. Ralli           20     13    14    13    17     9   2.6
Panepistimiou      21     24    58    40    36     5   1.1
Patision           22     80   108   114   113    86   4.1
Peiraios           23     86    89   109    90    49   8.0
Sigrou             24     60    61    87    86    29   4.8
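The sample correlation coefficient matrix can be reproduced from Table 9. The sketch below assumes, following the definition of Wij in this section, that the per-kilometre averages are the working data:

```python
import numpy as np

# Table 9, transcribed as (yearly counts 1987-1991, road length t_j in km).
roads = [
    ([11, 33, 25, 23, 6], 1.2),   ([41, 63, 91, 77, 29], 2.6),
    ([5, 35, 44, 21, 13], 2.4),   ([44, 79, 91, 88, 33], 5.5),
    ([5, 3, 4, 4, 0], 0.5),       ([8, 15, 26, 13, 7], 1.3),
    ([34, 63, 81, 67, 23], 2.6),  ([17, 16, 24, 24, 4], 2.1),
    ([16, 24, 30, 30, 13], 1.7),  ([13, 13, 15, 17, 9], 1.1),
    ([7, 15, 20, 19, 8], 2.7),    ([15, 24, 39, 32, 7], 2.6),
    ([2, 3, 27, 24, 7], 1.4),     ([22, 23, 38, 22, 11], 1.4),
    ([38, 48, 60, 53, 24], 7.9),  ([4, 6, 12, 9, 3], 2.0),
    ([19, 30, 37, 48, 22], 2.0),  ([15, 11, 16, 21, 28], 6.1),
    ([20, 30, 33, 28, 9], 1.5),   ([13, 14, 13, 17, 9], 2.6),
    ([24, 58, 40, 36, 5], 1.1),   ([80, 108, 114, 113, 86], 4.1),
    ([86, 89, 109, 90, 49], 8.0), ([60, 61, 87, 86, 29], 4.8),
]
counts = np.array([c for c, _ in roads], dtype=float)   # 24 x 5
t = np.array([length for _, length in roads])
W = counts / t[:, None]             # per-kilometre yearly averages W_1j..W_5j
R = np.corrcoef(W, rowvar=False)    # 5 x 5 sample correlation matrix
```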
Table 10 Finding the value of K by maximizing the log-likelihood function for fitting the data of Table 9 by the multivariate ZTCS distribution

K     MLE of α  MLE of λ1  MLE of λ2  MLE of λ3  MLE of λ4  MLE of λ5  Log-likelihood
2     0.396     8.405      13.520     16.667     14.513     5.262      -498.992
3     0.383     8.046      13.161     16.308     14.154     4.903      -492.749
4     0.372     7.707      12.822     15.969     13.815     4.564      -487.185
5     0.349     7.452      12.567     15.714     13.559     4.309      -482.820
8     0.292     6.858      11.973     15.120     12.966     3.715      -473.529
9     0.276     6.710      11.825     14.972     12.818     3.567      -471.336
10    0.261     6.585      11.700     14.847     12.693     3.442      -469.490
14    0.209     6.271      11.387     14.533     12.379     3.128      -464.604
15    0.198     6.226      11.341     14.488     12.333     3.083      -463.810
20    0.155     6.092      11.207     14.354     12.200     2.949      -461.170
30    0.106     5.997      11.112     14.259     12.105     2.854      -458.806
50    0.065     5.944      11.059     14.206     12.051     2.800      -457.125
51    0.064     5.942      11.057     14.204     12.050     2.799      -457.078
52    0.062     5.941      11.056     14.203     12.049     2.798      -457.033
53    0.061     5.940      11.055     14.202     12.047     2.797      -456.990
54    0.060     5.939      11.054     14.201     12.046     2.795      -456.949
75    0.043     5.923      11.038     14.185     12.030     2.779      -456.350
100   0.032     5.913      11.028     14.175     12.021     2.770      -455.978
150   0.021     5.905      11.020     14.167     12.012     2.762      -455.617
250   0.013     5.898      11.014     14.160     12.006     2.755      -455.334
350   0.009     5.896      11.011     14.158     12.003     2.753      -455.215
while the population correlation coefficient matrix ρ, based on (2.6), is estimated to be

ρ̂ = | 1.0000  0.2703  0.2444  0.2612  0.4199 |
    | 0.2703  1.0000  0.1951  0.2085  0.3352 |
    | 0.2444  0.1951  1.0000  0.1886  0.3031 |
    | 0.2612  0.2085  0.1886  1.0000  0.3240 |
    | 0.4199  0.3352  0.3031  0.3240  1.0000 |

It can be easily seen that ρ̂ is very close to R.
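Equation (2.6) is not reproduced in this excerpt. As an illustration only, under a shared-binomial construction Wi = X0 + Yi (our assumption about the SR), the moments are Cov(Wi, Wj) = Kα(1 − α) for i ≠ j and Var(Wi) = Kα(1 − α) + λi, and a plug-in estimate of ρ would be computed as:

```python
import numpy as np

# Plug-in values: K = 53 and the bootstrap-average MLEs from Table 11.
# The correlation formula below follows from the assumed SR, and may differ
# in detail from the paper's equation (2.6).
K, alpha = 53, 0.0663
lam = np.array([5.8610, 10.926, 14.118, 11.954, 2.7533])
c = K * alpha * (1 - alpha)             # common covariance K*alpha*(1-alpha)
v = c + lam                             # variances Var(W_i)
rho = c / np.sqrt(np.outer(v, v))       # rho_ij = c / sqrt(v_i * v_j)
np.fill_diagonal(rho, 1.0)
```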
Table 11 MLEs and confidence intervals of parameters for the road accident data of Athens

Parameter  MLEB     stdB   95 % bootstrap CI(N)  95 % bootstrap CI(NN)
α          0.0663   0.017  [0.0327, 0.1000]      [0.0356, 0.1027]
λ1         5.8610   1.217  [3.4750, 8.2470]      [3.8330, 8.4930]
λ2         10.926   2.347  [6.3260, 15.526]      [7.1450, 16.313]
λ3         14.118   1.824  [10.543, 17.694]      [10.844, 18.022]
λ4         11.954   1.624  [8.7700, 15.138]      [9.2400, 15.478]
λ5         2.7533   0.604  [1.5676, 3.9390]      [1.7988, 4.0889]

MLEB: The average MLE for the bootstrap samples
stdB: The sample standard deviation for the bootstrap samples
CI(N): Normal-based bootstrap CI
CI(NN): Non-normal-based bootstrap CI
8 Concluding remarks
In this paper, we first proposed the multivariate ZTCS distribution and studied its distributional properties. Since the joint marginal distribution of any r-dimensional sub-vector of the m-dimensional multivariate ZTCS random vector is an r-dimensional ZDCS distribution, we then proposed the multivariate ZACS distribution. Note that the multivariate ZTCS distribution is a special case of the multivariate ZACS distribution. The EM algorithm is used to obtain the MLEs of the parameters in the multivariate ZACS distribution. The multivariate ZTCS distribution can be used when other distributions, such as the multivariate zero-truncated Poisson distribution, do not fit a real data set well. Meanwhile, the multivariate ZACS distribution, as a more general form, applies in a much wider range of settings and can serve as a good substitute for the Type II multivariate ZTP distribution (Tian et al. 2014).
Endnote
1 A discrete random variable X is said to have the general finite distribution, denoted by X ∼ Finite(xl, pl; l = 1, . . . , n), if Pr(X = xl) = pl ∈ [0, 1] and ∑_{l=1}^{n} pl = 1.
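Sampling from such a finite distribution is a one-line weighted choice; the support `x` and probabilities `p` below are arbitrary illustrative values:

```python
import numpy as np

# Draw from Finite(x_l, p_l; l = 1..3) with x = (0, 1, 3), p = (0.2, 0.5, 0.3).
rng = np.random.default_rng(0)
x, p = [0, 1, 3], [0.2, 0.5, 0.3]
draws = rng.choice(x, size=1000, p=p)
```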
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
XD contributed most of the work, DJ contributed to the simulation studies (Section 6) and the second application (Section 7.2), and GLT supervised and reviewed all work, from the initial idea through preparation of the manuscript to the final version. All authors read and approved the final manuscript.
Received: 15 May 2015 Accepted: 20 July 2015
References
Karlis, D: An EM algorithm for multivariate Poisson distribution and related models. J. Appl. Stat. 30(1), 63–77 (2003)
Loukas, S: Some methods of estimation for a trivariate Poisson distribution. Zastoswania Math. 21, 503–510 (1993)
Loukas, S, Papageorgiou, H: On a trivariate Poisson distribution. Appl. Math. 36, 432–439 (1991)
Ong, SH: A discrete Charlier series distribution. Biom. J. 30(8), 1003–1009 (1988)
Papageorgiou, H, Loukas, S: A bivariate discrete Charlier series distribution. Biom. J. 37(1), 105–117 (1995)
Tian, GL, Liu, Y, Tan, MT: Type II multivariate zero-truncated/adjusted Poisson models and their applications. Technical Report, Department of Statistics and Actuarial Science, The University of Hong Kong (2014)
© The Author(s) 2015
Mathematics Subject Classification: Primary 62E15; Secondary 62F10