Ding et al. Journal of Statistical Distributions and Applications (2015) 2:5. DOI 10.1186/s40488-015-0029-5
METHODOLOGY (Open Access)
Multivariate zero-truncated/adjusted Charlier series distributions with applications
Xiqian Ding, Da Ju and Guo-Liang Tian*
*Correspondence: [email protected]. Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong, P. R. China
Abstract
Although the univariate Charlier series distribution (Biom. J. 30(8):1003-1009, 1988) and the bivariate Charlier series distribution (Biom. J. 37(1):105-117, 1995; J. Appl. Stat. 30(1):63-77, 2003) can be easily generalized to the multivariate version via the method of stochastic representation (SR), the multivariate zero-truncated Charlier series (ZTCS) distribution is not available to date. The first aim of this paper is to propose the multivariate ZTCS distribution by developing its important distributional properties and providing efficient likelihood-based inference methods via a novel data augmentation in the framework of the expectation-maximization (EM) algorithm. Since the joint marginal distribution of any r-dimensional sub-vector of the multivariate ZTCS random vector of dimension m is an r-dimensional zero-deflated Charlier series (ZDCS) distribution ($1 \leq r < m$), the second objective of the paper is to introduce a new family of multivariate zero-adjusted Charlier series (ZACS) distributions (including the multivariate ZDCS distribution as a special member) with a more flexible correlation structure by accounting for both inflation and deflation at zero. The corresponding distributional properties are explored and the associated maximum likelihood estimation method via the EM algorithm is provided for analyzing correlated count data. Some simulation studies are performed and two real data sets are used to illustrate the proposed methods.
Mathematics subject classification: Primary 62E15; Secondary 62F10
Keywords: EM algorithm; Multivariate zero-adjusted Charlier series distributions; Multivariate zero-truncated Charlier series distribution; Univariate Charlier series distribution
1 Introduction
The univariate Charlier series (CS) distribution was first introduced by Ong (1988) in the consideration of the conditional distribution of a bivariate Poisson distribution. The CS distribution is a convolution of a binomial variate and a Poisson variate. Let $X_0 \sim \text{Binomial}(K, \theta)$, $X_1 \sim \text{Poisson}(\lambda)$, and let $X_0$ and $X_1$ be mutually independent (denoted by $X_0 \perp X_1$). Then a discrete non-negative random variable $X$ is said to follow the CS distribution with parameters $K \in \mathbb{N} = \{1, 2, \ldots\}$, $\theta \in [0, 1)$ and $\lambda \in \mathbb{R}_+$, denoted by $X \sim \text{CS}(K, \theta; \lambda)$, if it can be stochastically represented by $X = X_0 + X_1$. Its probability mass function (pmf) is given by

$$\Pr(X = x) = \sum_{k=0}^{\min(K,x)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda^{x-k} e^{-\lambda}}{(x-k)!}, \qquad x = 0, 1, \ldots. \tag{1.1}$$
© 2015 Ding et al. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
The mean and variance of $X$ are given by

$$E(X) = K\theta + \lambda \quad \text{and} \quad \text{Var}(X) = K\theta(1-\theta) + \lambda. \tag{1.2}$$

Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{CS}(K, \theta; \lambda)$ and the observed data be $Y_{\text{obs}} = \{x_1, \ldots, x_n\}$, where $x_1, \ldots, x_n$ are the realizations of $X_1, \ldots, X_n$. Let $\bar{x}$ and $s^2$ be the sample mean and sample variance, respectively. Assuming $K$ is known, Ong (1988) derived the moment estimates of the parameters in the univariate CS distribution as follows:

$$\hat{\theta} = \left( \frac{\bar{x} - s^2}{K} \right)^{1/2} \quad \text{and} \quad \hat{\lambda} = \bar{x} - K\hat{\theta}. \tag{1.3}$$
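As a quick numerical check of (1.2) and (1.3), the sketch below (plain Python; the parameter values and the sampler names `rpois` and `rcs` are ours, chosen for illustration) simulates the stochastic representation $X = X_0 + X_1$ and recovers the moment estimates:

```python
import math
import random
import statistics

random.seed(1)

def rpois(lam):
    # Poisson draw via Knuth's product-of-uniforms method (fine for moderate lam)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rcs(K, theta, lam):
    # SR of the CS distribution: X = X0 + X1
    x0 = sum(random.random() < theta for _ in range(K))  # X0 ~ Binomial(K, theta)
    return x0 + rpois(lam)                               # X1 ~ Poisson(lam)

K, theta, lam = 5, 0.5, 3.0
xs = [rcs(K, theta, lam) for _ in range(50_000)]
xbar, s2 = statistics.fmean(xs), statistics.variance(xs)

# Moment estimates (1.3); note xbar - s2 estimates K*theta^2 > 0 by (1.2)
theta_mm = math.sqrt(max(xbar - s2, 0.0) / K)
lam_mm = xbar - K * theta_mm
```

With these values, $E(X) = K\theta + \lambda = 5.5$ and $\text{Var}(X) = K\theta(1-\theta) + \lambda = 4.25$, so the moment estimates should land near $0.5$ and $3$.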
Next, Papageorgiou and Loukas (1995) proposed a bivariate CS distribution which arises as the conditional distribution from a trivariate Poisson distribution studied by Loukas and Papageorgiou (1991) and Loukas (1993). Let $X_0 \sim \text{Binomial}(K, \theta)$ and $X_{i0} \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_i)$, $i = 1, 2$, and define $X_i = X_0 + X_{i0}$, $i = 1, 2$. Then a discrete non-negative random vector $\mathbf{x} = (X_1, X_2)^\top$ is said to follow a bivariate CS distribution with parameters $K \in \mathbb{N}$, $\theta \in [0, 1)$ and $\lambda_i \in \mathbb{R}_+$, $i = 1, 2$. We denote it by $\mathbf{x} \sim \text{CS}(K, \theta; \lambda_1, \lambda_2)$. Its probability generating function, marginal means and the covariance are given by

$$G_{\mathbf{x}}(\mathbf{z}) = E\left( z_1^{X_1} z_2^{X_2} \right) = \exp\{\lambda_1(z_1 - 1) + \lambda_2(z_2 - 1)\}\,[(1-\theta) + \theta z_1 z_2]^K, \tag{1.4}$$

$$E(X_i) = K\theta + \lambda_i \quad \text{and} \quad \text{Cov}(X_1, X_2) = K\theta(1-\theta), \qquad i = 1, 2. \tag{1.5}$$

Let $\mathbf{x}_1, \ldots, \mathbf{x}_n \overset{\text{iid}}{\sim} \text{CS}(K, \theta; \lambda_1, \lambda_2)$, where $\mathbf{x}_j = (X_{1j}, X_{2j})^\top$ for $j = 1, \ldots, n$, and the observed data be $Y_{\text{obs}} = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, where $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are the realizations of the random vectors. Let $\bar{x}_1, \bar{x}_2$ be the sample means for $X_1$ and $X_2$ and $m_{11}$ be the sample covariance, respectively. Assuming $K$ is known, Papageorgiou and Loukas (1995) obtained the moment estimates of the three parameters as follows:

$$\hat{\theta} = \frac{1}{2}\left[ 1 - \left( 1 - \frac{4 m_{11}}{K} \right)^{1/2} \right], \quad \hat{\lambda}_1 = \bar{x}_1 - K\hat{\theta} \quad \text{and} \quad \hat{\lambda}_2 = \bar{x}_2 - K\hat{\theta}. \tag{1.6}$$

In addition, Papageorgiou and Loukas (1995) also discussed the method of ratio of frequencies and the maximum likelihood estimation method.

Although the univariate Charlier series distribution (Ong 1988) and the bivariate Charlier series distribution (Karlis 2003; Papageorgiou and Loukas 1995) can be easily generalized to the multivariate version via the method of stochastic representation (SR), the multivariate zero-truncated Charlier series (ZTCS) distribution is not available to date. The first aim of this paper is to propose the multivariate ZTCS distribution by developing its important distributional properties and providing efficient likelihood-based inference methods via a novel data augmentation in the framework of the expectation-maximization (EM) algorithm. Since the joint marginal distribution of any r-dimensional sub-vector of the multivariate ZTCS random vector of dimension m is an r-dimensional zero-deflated Charlier series (ZDCS) distribution ($1 \leq r < m$), the second objective of the paper is to introduce a new family of multivariate zero-adjusted Charlier series (ZACS) distributions (including the multivariate ZDCS distribution as a special member) with a more flexible correlation structure by accounting for both inflation and deflation at zero. The corresponding distributional properties are explored and the associated maximum likelihood estimation method via the EM algorithm is provided for analyzing correlated count data.
The rest of the paper is organized as follows. In Section 2, the multivariate ZTCS distribution is proposed and some important distributional properties are explored. In Section 3, likelihood-based methods are developed for the multivariate ZTCS distribution. In Sections 4 and 5, we introduce the multivariate ZACS distribution, explore its distributional properties and provide the associated likelihood-based methods for the case without covariates. In Section 6, some simulation studies are performed to evaluate the proposed methods. In Section 7, two real data sets are used to illustrate the proposed methods. Section 8 provides some concluding remarks.
2 Multivariate zero-truncated Charlier series distribution
Let $X_{00} \sim \text{Binomial}(K, \theta)$, $\{X_{i0}\}_{i=1}^m \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_i)$, $X_{00} \perp \{X_{10}, \ldots, X_{m0}\}$, and define

$$X_i = X_{00} + X_{i0}, \qquad i = 1, \ldots, m.$$

A discrete non-negative random vector $\mathbf{x} = (X_1, \ldots, X_m)^\top$ is said to follow an m-dimensional CS distribution with parameters $K \in \mathbb{N} = \{1, 2, \ldots\}$, $\theta \in [0, 1)$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^\top \in \mathbb{R}_+^m$, denoted by $\mathbf{x} \sim \text{CS}(K, \theta; \lambda_1, \ldots, \lambda_m)$ or $\mathbf{x} \sim \text{CS}_m(K, \theta; \boldsymbol{\lambda})$, accordingly. The joint pmf of $\mathbf{x}$ is

$$\Pr(\mathbf{x} = \mathbf{x}) = \sum_{k=0}^{\min(K, \mathbf{x})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{x_i - k} e^{-\lambda_i}}{(x_i - k)!} =: Q_{\mathbf{x}}(K, \theta, \boldsymbol{\lambda}), \tag{2.1}$$

where $\mathbf{x} = (x_1, \ldots, x_m)^\top$, $\{x_i\}_{i=1}^m$ are the corresponding realizations of $\{X_i\}_{i=1}^m$, and $\min(K, \mathbf{x}) = \min(K, x_1, \ldots, x_m)$.

In particular, as $K \to \infty$ while $K\theta$ remains finitely large (say, $K\theta \to \lambda_0$), the distribution of $\text{Binomial}(K, \theta)$ tends to the distribution of $\text{Poisson}(\lambda_0)$, so the above m-dimensional CS distribution approaches the m-dimensional Poisson distribution $\text{MP}(\lambda_0, \lambda_1, \ldots, \lambda_m)$. Furthermore, if $\theta = 0$, then $\Pr(X_{00} = 0) = 1$ (i.e., $X_{00}$ follows the degenerate distribution with all mass at zero, denoted by $X_{00} \sim \text{Degenerate}(0)$) and $\lambda_0 = 0$, so the m-dimensional CS distribution becomes the product of m independent $\text{Poisson}(\lambda_i)$ distributions.

Motivated by the Type II multivariate zero-truncated Poisson (ZTP) distribution developed recently by Tian et al. (2014), in this paper we propose a new multivariate zero-truncated Charlier series (ZTCS) distribution, whose limiting form reduces to the Type II multivariate ZTP distribution.

Definition 1. Let $\mathbf{x} \sim \text{CS}(K, \theta; \lambda_1, \ldots, \lambda_m)$. A discrete non-zero random vector $\mathbf{w} = (W_1, \ldots, W_m)^\top$ is said to have the multivariate ZTCS distribution with the parameters $(K, \theta)$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^\top$, denoted by $\mathbf{w} \sim \text{ZTCS}_m(K, \theta; \boldsymbol{\lambda})$ or $\mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, if

$$\mathbf{x} \overset{d}{=} U \mathbf{w} = \begin{cases} \mathbf{0}, & \text{with probability } \varphi, \\ \mathbf{w}, & \text{with probability } 1 - \varphi, \end{cases} \tag{2.2}$$

where $U \sim \text{Bernoulli}(1 - \varphi)$ with $\varphi = (1-\theta)^K e^{-\lambda_+}$, $\lambda_+ = \sum_{i=1}^m \lambda_i$, and $U \perp \mathbf{w}$.

Let $\mathbf{w} \sim \text{ZTCS}_m(K, \theta; \boldsymbol{\lambda})$; then we have $\Pr(\mathbf{w} = \mathbf{0}) = 0$ and

$$\mathbf{w} \overset{d}{=} \mathbf{x} \,|\, (\mathbf{x} \neq \mathbf{0}), \tag{2.3}$$
where $\mathbf{x}$ is specified in Definition 1. The SR (2.3) can be used to generate the ZTCS random vector $\mathbf{w}$ via the generation of the random vector $\mathbf{x}$ from the multivariate CS distribution, while the SR (2.2) is useful in deriving important distributional properties in the following subsections and in developing an EM algorithm in Section 3.1. Moreover, besides coming from the missing zero vector, the correlation between any two components of $\mathbf{w}$ may come from the common random variable $X_{00} \sim \text{Binomial}(K, \theta)$.
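The rejection scheme implied by the SR (2.3) can be sketched as follows (plain Python; the parameter values are illustrative and the function names are ours): draw $\mathbf{x}$ from the multivariate CS distribution via the shared binomial variable and retain it only if $\mathbf{x} \neq \mathbf{0}$.

```python
import math
import random

random.seed(2)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rcs_vec(K, theta, lams):
    # m-dimensional CS draw: X_i = X00 + X_i0 with a shared X00 ~ Binomial(K, theta)
    x00 = sum(random.random() < theta for _ in range(K))
    return tuple(x00 + rpois(l) for l in lams)

def rztcs(K, theta, lams):
    # SR (2.3): w =d x | (x != 0), implemented by rejection of the zero vector
    while True:
        x = rcs_vec(K, theta, lams)
        if any(x):
            return x

K, theta, lams = 4, 0.3, (0.5, 0.8)
ws = [rztcs(K, theta, lams) for _ in range(20_000)]

# Pr(w = 0) = 0 by construction; the shared X00 induces positive dependence here
n = len(ws)
m1 = sum(w[0] for w in ws) / n
m2 = sum(w[1] for w in ws) / n
cov12 = sum((w[0] - m1) * (w[1] - m2) for w in ws) / (n - 1)
```

For these parameter values the theoretical covariance (from the variance formula of Section 2.1) is about 0.64, so the sample covariance should be clearly positive.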
2.1 Joint probability mass function and mixed moments
From the SR (2.2), the joint pmf of $\mathbf{w} \sim \text{ZTCS}_m(K, \theta; \boldsymbol{\lambda})$ is

$$f(\mathbf{w}; K, \theta, \boldsymbol{\lambda}) = \Pr(\mathbf{w} = \mathbf{w}) \overset{(2.2)}{=} \frac{\Pr(\mathbf{x} = \mathbf{w})}{\Pr(U = 1)} = \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min(K, \mathbf{w})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!}, \tag{2.4}$$

where $\mathbf{w} \neq \mathbf{0}$. From (2.2), it is easy to show that

$$E(\mathbf{w}) = \frac{\boldsymbol{\lambda} + K\theta \mathbf{1}}{1 - \varphi},$$

$$E(\mathbf{w}\mathbf{w}^\top) = \frac{\text{diag}(\boldsymbol{\lambda}) + \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K\theta(1 - \theta + K\theta)\mathbf{1}\mathbf{1}^\top}{1 - \varphi},$$

$$\text{Var}(\mathbf{w}) = \frac{1}{1 - \varphi} \left[ \text{diag}(\boldsymbol{\lambda}) + K\theta(1-\theta)\mathbf{1}\mathbf{1}^\top \right] - \frac{\varphi}{(1 - \varphi)^2} \left[ \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K^2\theta^2 \mathbf{1}\mathbf{1}^\top \right], \tag{2.5}$$

where $\mathbf{1} = \mathbf{1}_m = (1, \ldots, 1)^\top$. Thus we have

$$\text{Corr}(W_i, W_j) = \frac{K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda_i + K\theta)(\lambda_j + K\theta)}{\sqrt{\left[ \lambda_i + K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda_i + K\theta)^2 \right] \left[ \lambda_j + K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda_j + K\theta)^2 \right]}}, \tag{2.6}$$

for $i \neq j$. In particular, when $\theta = 0$, (2.6) becomes

$$\text{Corr}(W_i, W_j) = \frac{-\sqrt{\lambda_i \lambda_j}}{\sqrt{(e^{\lambda_+} - 1 - \lambda_i)(e^{\lambda_+} - 1 - \lambda_j)}}, \qquad i \neq j.$$

In (2.6), letting $\lambda_i = \lambda_j = \lambda$, we obtain

$$\text{Corr}(W_i, W_j) = \frac{K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda + K\theta)^2}{\lambda + K\theta(1-\theta) - \frac{\varphi}{1-\varphi}(\lambda + K\theta)^2}, \qquad i \neq j.$$

For any $r_1, \ldots, r_m \geq 0$, the mixed moments of $\mathbf{w}$ are given by

$$E\left( \prod_{i=1}^m W_i^{r_i} \right) = (1 - \varphi)^{-1} E\left( \prod_{i=1}^m X_i^{r_i} \right) = (1 - \varphi)^{-1} E\left[ \prod_{i=1}^m (X_{00} + X_{i0})^{r_i} \right]. \tag{2.7}$$
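As a numerical sanity check (a plain-Python sketch with parameter values of our own choosing, not code from the paper), the pmf (2.4) should sum to one over the non-zero lattice, and the first component of (2.5) gives $E(W_1) = (\lambda_1 + K\theta)/(1 - \varphi)$:

```python
import math

K, theta, lams = 3, 0.4, (1.0, 1.5)
lam_plus = sum(lams)
phi = (1 - theta)**K * math.exp(-lam_plus)   # Pr(x = 0) under the parent CS model

def ztcs_pmf(w):
    # joint pmf (2.4) at w != 0; min(K, w) = min(K, w_1, ..., w_m)
    s = 0.0
    for k in range(min(K, min(w)) + 1):
        term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
        for wi, li in zip(w, lams):
            term *= li**(wi - k) * math.exp(-li) / math.factorial(wi - k)
        s += term
    return s / (1 - phi)

grid = [(w1, w2) for w1 in range(30) for w2 in range(30) if (w1, w2) != (0, 0)]
total = sum(ztcs_pmf(w) for w in grid)
mean_w1 = sum(w[0] * ztcs_pmf(w) for w in grid)
```

With the tail mass beyond the 30 x 30 grid being negligible for these small rates, `total` should be very close to 1 and `mean_w1` close to $(1.0 + 1.2)/(1 - \varphi)$.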
2.2 Moment generating function
Using the identity $E(\cdot) = E[E(\cdot|U)]$, the moment generating function (mgf) of $\mathbf{x}$ is

$$M_{\mathbf{x}}(\mathbf{t}) = E[\exp(\mathbf{t}^\top \mathbf{x})] = E[\exp(U \mathbf{t}^\top \mathbf{w})] = E\{E[\exp(U \mathbf{t}^\top \mathbf{w})|U]\} = E[M_{\mathbf{w}}(U\mathbf{t})] = \varphi M_{\mathbf{w}}(\mathbf{0}) + (1-\varphi) M_{\mathbf{w}}(\mathbf{t}) = \varphi + (1-\varphi) M_{\mathbf{w}}(\mathbf{t}).$$

Thus the mgf of $\mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$ is given by

$$M_{\mathbf{w}}(\mathbf{t}) = \frac{M_{\mathbf{x}}(\mathbf{t}) - \varphi}{1 - \varphi} = \frac{M_{X_{00}}(t_+) \prod_{i=1}^m M_{X_{i0}}(t_i) - \varphi}{1 - \varphi} = \frac{\left( \theta e^{t_+} + 1 - \theta \right)^K \exp\left( \sum_{i=1}^m \lambda_i e^{t_i} - \lambda_+ \right) - (1-\theta)^K e^{-\lambda_+}}{1 - (1-\theta)^K e^{-\lambda_+}},$$

where $t_+ = \sum_{i=1}^m t_i$.
2.3 Marginal distributions
2.3.1 Marginal distribution for each random component
Let $\mathbf{w} = (W_1, \ldots, W_m)^\top \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$. We first derive the marginal distribution of $W_i$ with realization $w_i$ for $i = 1, \ldots, m$. If $w_i > 0$, then

$$\begin{aligned}
\Pr(W_i = w_i) &= \sum_{w_1=0}^{\infty} \cdots \sum_{w_{i-1}=0}^{\infty} \sum_{w_{i+1}=0}^{\infty} \cdots \sum_{w_m=0}^{\infty} \Pr(\mathbf{w} = \mathbf{w}) \\
&= \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{w_1=0}^{\infty} \cdots \sum_{w_{i-1}=0}^{\infty} \sum_{w_{i+1}=0}^{\infty} \cdots \sum_{w_m=0}^{\infty} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{w_j - k} e^{-\lambda_j}}{(w_j - k)!} I(w_j - k \geq 0) \\
&= \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!} \prod_{j=1, j \neq i}^m \left[ \sum_{w_j=0}^{\infty} \frac{\lambda_j^{w_j - k} e^{-\lambda_j}}{(w_j - k)!} I(w_j - k \geq 0) \right] \\
&= \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!} \qquad (2.8) \\
&= \frac{1 - \varphi_i}{1 - (1-\theta)^K e^{-\lambda_i}} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!}, \qquad (2.9)
\end{aligned}$$

where

$$1 - \varphi_i = \frac{1 - (1-\theta)^K e^{-\lambda_i}}{1 - (1-\theta)^K e^{-\lambda_+}}. \tag{2.10}$$
Hence,

$$\begin{aligned}
\Pr(W_i = 0) &= 1 - \sum_{w_i=1}^{\infty} \Pr(W_i = w_i) \\
&\overset{(2.8)}{=} 1 - \frac{1}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{w_i=1}^{\infty} \sum_{k=0}^{\min\{K, w_i\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda_i^{w_i - k} e^{-\lambda_i}}{(w_i - k)!} \\
&= 1 - \frac{1 - (1-\theta)^K e^{-\lambda_i}}{1 - (1-\theta)^K e^{-\lambda_+}} \\
&\overset{(2.10)}{=} \varphi_i \tag{2.11} \\
&= \frac{(1-\theta)^K \left( e^{-\lambda_i} - e^{-\lambda_+} \right)}{1 - (1-\theta)^K e^{-\lambda_+}} \in \left( 0,\ (1-\theta)^K e^{-\lambda_i} \right) \subset (0, 1).
\end{aligned}$$

By combining (2.11) with (2.9) and noting that a ZDCS distribution is a special case of a ZACS distribution (4.2), we obtain

$$W_i \sim \text{ZDCS}(\varphi_i; K, \theta, \lambda_i). \tag{2.12}$$
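The closed form (2.11) for $\varphi_i$ can be checked against the joint pmf (2.4) directly; the sketch below (plain Python, with illustrative parameters of our own choosing) sums $\Pr(W_1 = 0)$ numerically for $m = 2$:

```python
import math

K, theta, lams = 3, 0.4, (1.0, 1.5)
lam_plus = sum(lams)
phi = (1 - theta)**K * math.exp(-lam_plus)

def ztcs_pmf(w):
    # joint pmf (2.4) for this parameter set
    s = 0.0
    for k in range(min(K, min(w)) + 1):
        term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
        for wi, li in zip(w, lams):
            term *= li**(wi - k) * math.exp(-li) / math.factorial(wi - k)
        s += term
    return s / (1 - phi)

# Pr(W1 = 0) obtained by summing out W2 >= 1 ...
p_zero_numeric = sum(ztcs_pmf((0, w2)) for w2 in range(1, 60))
# ... against the closed form (2.11)
phi_1 = (1 - theta)**K * (math.exp(-lams[0]) - math.exp(-lam_plus)) / (1 - phi)
```

The two quantities should agree to numerical precision, and $\varphi_1$ should lie in $(0, (1-\theta)^K e^{-\lambda_1})$ as stated.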
2.3.2 Marginal distribution for an arbitrary random sub-vector
Second, the marginal distribution for an arbitrary random sub-vector will be considered. Before that, the so-called multivariate zero-adjusted Charlier series distribution needs to be introduced; its formal definition is given in Definition 2 in Section 4. We now consider the marginal distributions of $\mathbf{w}^{(1)}$ and $\mathbf{w}^{(2)}$, where

$$\mathbf{w}^{(1)} = \begin{pmatrix} W_1 \\ \vdots \\ W_r \end{pmatrix}, \quad \mathbf{w}^{(2)} = \begin{pmatrix} W_{r+1} \\ \vdots \\ W_m \end{pmatrix} \quad \text{and} \quad \mathbf{w} = \begin{pmatrix} \mathbf{w}^{(1)} \\ \mathbf{w}^{(2)} \end{pmatrix}.$$

Using the multivariate zero-adjusted Charlier series distribution to be introduced in Section 4, it can be shown that

$$\mathbf{w}^{(1)} \sim \text{ZDCS}(\varphi^{(1)}; K, \theta, \lambda_1, \ldots, \lambda_r) \quad \text{and} \quad \mathbf{w}^{(2)} \sim \text{ZDCS}(\varphi^{(2)}; K, \theta, \lambda_{r+1}, \ldots, \lambda_m), \tag{2.13}$$

where

$$\varphi^{(i)} = \frac{(1-\theta)^K \left( e^{-\lambda^{(i)}_+} - e^{-\lambda_+} \right)}{1 - (1-\theta)^K e^{-\lambda_+}} \in \left( 0,\ (1-\theta)^K e^{-\lambda^{(i)}_+} \right) \subset (0, 1), \qquad i = 1, 2, \tag{2.14}$$

$\lambda^{(1)}_+ = \sum_{i=1}^r \lambda_i$ and $\lambda^{(2)}_+ = \sum_{i=r+1}^m \lambda_i$.

In fact, for any positive integers $i_1, \ldots, i_r$ satisfying $1 \leq i_1 < \cdots < i_r \leq m$, we have

$$\begin{pmatrix} W_{i_1} \\ \vdots \\ W_{i_r} \end{pmatrix} \sim \text{ZDCS}(\varphi^*; K, \theta, \lambda_{i_1}, \ldots, \lambda_{i_r}), \tag{2.15}$$

where

$$\varphi^* = \frac{(1-\theta)^K \left( e^{-(\lambda_{i_1} + \cdots + \lambda_{i_r})} - e^{-\lambda_+} \right)}{1 - (1-\theta)^K e^{-\lambda_+}} \in \left( 0,\ (1-\theta)^K e^{-(\lambda_{i_1} + \cdots + \lambda_{i_r})} \right) \subset (0, 1). \tag{2.16}$$
2.4 Conditional distributions
2.4.1 Conditional distribution of w(1)|w(2)
From (2.4), (2.13) and (4.4), the conditional distribution of $\mathbf{w}^{(1)}|\mathbf{w}^{(2)}$ is given by

$$\Pr(\mathbf{w}^{(1)} = \mathbf{w}^{(1)} | \mathbf{w}^{(2)} = \mathbf{w}^{(2)}) = \frac{f(\mathbf{w}; K, \theta, \boldsymbol{\lambda})}{\Pr(\mathbf{w}^{(2)} = \mathbf{w}^{(2)})} = \frac{\dfrac{1}{1 - (1-\theta)^K e^{-\lambda_+}}\, Q_{\mathbf{w}}(K, \theta, \boldsymbol{\lambda})}{\varphi^{(2)} I(\mathbf{w}^{(2)} = \mathbf{0}) + \dfrac{1 - \varphi^{(2)}}{1 - (1-\theta)^K e^{-\lambda^{(2)}_+}}\, Q_{\mathbf{w}^{(2)}}(K, \theta, \boldsymbol{\lambda}^{(2)})\, I(\mathbf{w}^{(2)} \neq \mathbf{0})}, \tag{2.17}$$

where $\mathbf{w}^{(2)} = (w_{r+1}, \ldots, w_m)^\top$, $\boldsymbol{\lambda}^{(2)} = (\lambda_{r+1}, \ldots, \lambda_m)^\top$,

$$Q_{\mathbf{w}}(K, \theta, \boldsymbol{\lambda}) = \sum_{k=0}^{\min\{K, \mathbf{w}\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{w_j - k} e^{-\lambda_j}}{(w_j - k)!}$$

and

$$Q_{\mathbf{w}^{(2)}}(K, \theta, \boldsymbol{\lambda}^{(2)}) = \sum_{l=0}^{\min\{K, \mathbf{w}^{(2)}\}} \binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{p=r+1}^m \frac{\lambda_p^{w_p - l} e^{-\lambda_p}}{(w_p - l)!}.$$

We first consider Case I: $\mathbf{w}^{(2)} \neq \mathbf{0}$. Under Case I, it is possible that $\mathbf{w}^{(1)} = \mathbf{0}$ or $\mathbf{w}^{(1)} \neq \mathbf{0}$. From (2.17), it is easy to obtain

$$\Pr(\mathbf{w}^{(1)} = \mathbf{w}^{(1)} | \mathbf{w}^{(2)} = \mathbf{w}^{(2)}) = \frac{e^{-\lambda^{(1)}_+} \displaystyle\sum_{k=0}^{\min\{K, \mathbf{w}\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{w_j - k}}{(w_j - k)!}}{\displaystyle\sum_{l=0}^{\min\{K, \mathbf{w}^{(2)}\}} \binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{p=r+1}^m \frac{\lambda_p^{w_p - l}}{(w_p - l)!}}. \tag{2.18}$$

Case II: $\mathbf{w}^{(2)} = \mathbf{0}$. Under Case II, obviously $\mathbf{w}^{(1)} \neq \mathbf{0}$ and the shared binomial variable equals zero. Thus we have

$$\Pr(\mathbf{w}^{(1)} = \mathbf{w}^{(1)} | \mathbf{w}^{(2)} = \mathbf{0}) = \frac{1}{1 - e^{-\lambda^{(1)}_+}} \prod_{i=1}^r \frac{\lambda_i^{w_i} e^{-\lambda_i}}{w_i!}.$$

This implies

$$\mathbf{w}^{(1)} \,|\, (\mathbf{w}^{(2)} = \mathbf{0}) \sim \text{ZTP}^{(\text{I})}(\lambda_1, \ldots, \lambda_r). \tag{2.19}$$
2.4.2 Conditional distribution of X0|(w, U)
The stochastic representation (2.2) can be rewritten as

$$(X_1, \ldots, X_m)^\top = (X_0 + X_1, \ldots, X_0 + X_m)^\top \overset{d}{=} U\mathbf{w},$$

where $X_0 \sim \text{Binomial}(K, \theta)$ and $\{X_i\}_{i=1}^m \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_i)$. To obtain the conditional distribution of $X_0|(\mathbf{w}, U)$, we consider two cases: $U = 1$ and $U = 0$. When $U = 1$, the conditional distribution of $X_0|(\mathbf{w}, U)$ is given by

$$\Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 1) = \frac{\Pr(X_0 = l, X_1 = w_1 - l, \ldots, X_m = w_m - l)}{\Pr(X_1 = w_1, \ldots, X_m = w_m)} = \frac{\binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{i=1}^m \frac{\lambda_i^{w_i - l}}{(w_i - l)!}}{\sum_{k=0}^{\min(K, \mathbf{w})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k}}{(w_i - k)!}} =: q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda}), \tag{2.20}$$

for $l = 0, 1, \ldots, \min(K, \mathbf{w})$, which implies

$$X_0 \,|\, (\mathbf{w} = \mathbf{w}, U = 1) \sim \text{Finite}(l, q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda});\ l = 0, 1, \ldots, \min(K, \mathbf{w})). \tag{2.21}$$

When $U = 0$, we obtain $\Pr(X_0 = 0 | \mathbf{w} = \mathbf{w}, U = 0) = 1$, i.e.,

$$X_0 \,|\, (\mathbf{w} = \mathbf{w}, U = 0) \sim \text{Degenerate}(0). \tag{2.22}$$

Hence, for any $l$, we have

$$\Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 0) = I(l = 0). \tag{2.23}$$

Thus, the conditional distribution of $X_0|(\mathbf{w}, U)$ is given by

$$X_0 \,|\, (\mathbf{w}, U) \sim \begin{cases} \text{Finite}(l, q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda});\ l = 0, 1, \ldots, \min(K, \mathbf{w})), & \text{if } U = 1, \\ \text{Degenerate}(0), & \text{if } U = 0, \end{cases} \tag{2.24}$$

where $q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda})$ is defined by (2.20).

2.4.3 Conditional distribution of X0|w
By using (2.24) and the independence of $U$ and $\mathbf{w}$, the conditional distribution of $X_0|\mathbf{w}$ is

$$\begin{aligned}
\Pr(X_0 = l | \mathbf{w} = \mathbf{w}) &= \sum_{u=0}^{1} \Pr(X_0 = l, U = u | \mathbf{w} = \mathbf{w}) \\
&= \sum_{u=0}^{1} \Pr(U = u | \mathbf{w} = \mathbf{w}) \Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = u) \\
&= \Pr(U = 0) \Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 0) + \Pr(U = 1) \Pr(X_0 = l | \mathbf{w} = \mathbf{w}, U = 1) \\
&\overset{(2.24)}{=} (1-\theta)^K e^{-\lambda_+} I(l = 0) + \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda}) \\
&=: p_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda}), \tag{2.25}
\end{aligned}$$

for $l = 0, 1, \ldots, \min(K, \mathbf{w})$, where $q_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda})$ is defined by (2.20). Thus,

$$X_0 \,|\, (\mathbf{w} = \mathbf{w}) \sim \text{Finite}(l, p_l(\mathbf{w}, K, \theta, \boldsymbol{\lambda});\ l = 0, 1, \ldots, \min(K, \mathbf{w})). \tag{2.26}$$

Especially, when $\min(K, \mathbf{w}) = 0$, we have $X_0 \,|\, (\mathbf{w} = \mathbf{w}) \sim \text{Degenerate}(0)$. Thus, the conditional expectation of $X_0|\mathbf{w}$ is given by

$$E(X_0 | \mathbf{w} = \mathbf{w}) = \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \frac{\sum_{k=1}^{\min(K, \mathbf{w})} k \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k}}{(w_i - k)!}}{\sum_{k=0}^{\min(K, \mathbf{w})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_i - k}}{(w_i - k)!}} \, I(\min(\mathbf{w}) \geq 1). \tag{2.27}$$

3 Likelihood-based methods for the multivariate ZTCS distribution
Suppose that $\mathbf{w}_j \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, where $\mathbf{w}_j = (W_{1j}, \ldots, W_{mj})^\top$ for $j = 1, \ldots, n$. Let $\mathbf{w}_j = (w_{1j}, \ldots, w_{mj})^\top$ denote the realization of the random vector $\mathbf{w}_j$, and $Y_{\text{obs}} =$
$\{\mathbf{w}_j\}_{j=1}^n$ be the observed data. We consider $K$ as a known positive integer. Then, the observed-data likelihood function for $(\theta, \boldsymbol{\lambda})$ is

$$L(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = \prod_{j=1}^n \frac{e^{-\lambda_+}}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min(K, \mathbf{w}_j)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k}}{(w_{ij} - k)!},$$

so that the log-likelihood function is

$$\ell(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = -n \log\left[ e^{\lambda_+} - (1-\theta)^K \right] + \sum_{j=1}^n \log \left[ \sum_{k=0}^{\min(K, \mathbf{w}_j)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k}}{(w_{ij} - k)!} \right]. \tag{3.1}$$
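The log-likelihood (3.1) is straightforward to evaluate numerically; a sketch in plain Python (the function name and the tiny data set are ours, for illustration only):

```python
import math

def loglik_ztcs(data, K, theta, lams):
    # observed-data log-likelihood (3.1); data is a list of m-tuples, each != 0
    lam_plus = sum(lams)
    ll = -len(data) * math.log(math.exp(lam_plus) - (1 - theta)**K)
    for w in data:
        inner = 0.0
        for k in range(min(K, min(w)) + 1):
            term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
            for wi, li in zip(w, lams):
                term *= li**(wi - k) / math.factorial(wi - k)
            inner += term
        ll += math.log(inner)
    return ll

data = [(1, 0), (2, 3), (0, 1), (4, 2)]
ll = loglik_ztcs(data, K=3, theta=0.4, lams=(1.0, 1.5))
```

As described in Section 3.1, this function can then be profiled over a grid of K values, keeping the K that maximizes $\ell(\hat{\theta}, \hat{\boldsymbol{\lambda}} | Y_{\text{obs}})$ (as in Tables 1 and 3).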
3.1 MLEs via the EM algorithm
The SR (2.2) motivates a novel EM algorithm in which some latent variables are independent of the observed variables. For each $\mathbf{w}_j = (w_{1j}, \ldots, w_{mj})^\top$, we introduce latent variables $U_j \overset{\text{iid}}{\sim} \text{Bernoulli}(1 - \varphi)$ with $\varphi = (1-\theta)^K e^{-\lambda_+}$, $X_{0j} \overset{\text{iid}}{\sim} \text{Binomial}(K, \theta)$, $X_{ij} \overset{\text{iid}}{\sim} \text{Poisson}(\lambda_i)$ for $i = 1, \ldots, m$, and $X_{0j} \perp X_{ij}$, such that

$$(x_{0j} + x_{1j}, \ldots, x_{0j} + x_{mj})^\top = u_j \mathbf{w}_j,$$

where $u_j$ and $x_{ij}$ denote the realizations of $U_j$ and $X_{ij}$, respectively. We denote the latent/missing data by $Y_{\text{mis}} = \{u_j, x_{0j}, x_{1j}, \ldots, x_{mj}\}_{j=1}^n$, so that the complete data are

$$Y_{\text{com}} = Y_{\text{obs}} \cup Y_{\text{mis}} = \{\mathbf{w}_j, u_j, x_{0j}, x_{1j}, \ldots, x_{mj}\}_{j=1}^n = \{x_{0j}, u_j, \mathbf{w}_j\}_{j=1}^n,$$

where $x_{ij} = u_j w_{ij} - x_{0j}$ for $j = 1, \ldots, n$ and $i = 1, \ldots, m$. Thus, the complete-data likelihood function is given by

$$\begin{aligned}
L(\theta, \boldsymbol{\lambda} | Y_{\text{com}}) &= \prod_{j=1}^n \binom{K}{x_{0j}} \theta^{x_{0j}} (1-\theta)^{K - x_{0j}} \prod_{i=1}^m \frac{\lambda_i^{x_{ij}} e^{-\lambda_i}}{x_{ij}!} \\
&= \prod_{j=1}^n \binom{K}{x_{0j}} \theta^{x_{0j}} (1-\theta)^{K - x_{0j}} \prod_{i=1}^m \frac{\lambda_i^{u_j w_{ij} - x_{0j}} e^{-\lambda_i}}{(u_j w_{ij} - x_{0j})!} \\
&\propto \theta^{n\bar{x}_0} (1-\theta)^{nK - n\bar{x}_0} \prod_{i=1}^m \lambda_i^{\sum_{j=1}^n u_j w_{ij} - n\bar{x}_0} e^{-n\lambda_i}, \tag{3.2}
\end{aligned}$$

where $\bar{x}_0 = (1/n) \sum_{j=1}^n x_{0j}$. The complete-data log-likelihood function is

$$\ell(\theta, \boldsymbol{\lambda} | Y_{\text{com}}) = n\bar{x}_0 \log\theta + (nK - n\bar{x}_0) \log(1-\theta) + \sum_{i=1}^m \left[ \left( \sum_{j=1}^n u_j w_{ij} - n\bar{x}_0 \right) \log\lambda_i - n\lambda_i \right].$$

The M-step is to calculate the complete-data maximum likelihood estimates (MLEs):

$$\hat{\theta} = \frac{\bar{x}_0}{K} \quad \text{and} \quad \hat{\lambda}_i = \frac{\sum_{j=1}^n u_j w_{ij}}{n} - K\hat{\theta}, \qquad i = 1, \ldots, m, \tag{3.3}$$

and the E-step is to replace $\{u_j\}_{j=1}^n$ and $\{x_{0j}\}_{j=1}^n$ in (3.3) by their conditional expectations:

$$E(U_j | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) = E(U_j) = 1 - (1-\theta)^K e^{-\lambda_+}, \quad \text{and} \tag{3.4}$$

$$E(X_{0j} | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) \overset{(2.27)}{=} \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \frac{\sum_{k_j=1}^{\min(K, \mathbf{w}_j)} k_j \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k_j}}{(w_{ij} - k_j)!}}{\sum_{k_j=0}^{\min(K, \mathbf{w}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{w_{ij} - k_j}}{(w_{ij} - k_j)!}} \, I(\min(\mathbf{w}_j) \geq 1), \tag{3.5}$$

respectively. An important feature of this EM algorithm is that the latent variables $\{U_j\}_{j=1}^n$ are independent of the observed variables $\{\mathbf{w}_j\}_{j=1}^n$.

Also note that here we assume that $K$ is a known positive integer. In practice, $K \in \{1, 2, \ldots, N\}$, say $N = 100$. For a given $K$, we first use the EM iteration (3.3)-(3.5) to find the MLEs of $\theta$ and $\boldsymbol{\lambda}$, denoted by $\hat{\theta}$ and $\hat{\boldsymbol{\lambda}}$. Then, we can calculate $\ell(\hat{\theta}, \hat{\boldsymbol{\lambda}} | Y_{\text{obs}})$ and choose the $K$ that maximizes $\ell(\hat{\theta}, \hat{\boldsymbol{\lambda}} | Y_{\text{obs}})$.
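A minimal sketch of the EM iteration (3.3)-(3.5) in plain Python (our own implementation on synthetic data; $K$ is taken as known, and the starting values, iteration count, and sample size are arbitrary choices, not the paper's):

```python
import math
import random

random.seed(5)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rztcs(K, theta, lams):
    # draw from ZTCS via SR (2.3): rejection of the zero vector
    while True:
        x0 = sum(random.random() < theta for _ in range(K))
        x = tuple(x0 + rpois(l) for l in lams)
        if any(x):
            return x

def e_x0(w, K, theta, lams):
    # E(X_{0j} | Y_obs, theta, lambda) per (3.5)/(2.27); zero if min(w) = 0
    if min(w) < 1:
        return 0.0
    phi = (1 - theta)**K * math.exp(-sum(lams))
    num = den = 0.0
    for k in range(min(K, min(w)) + 1):
        term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
        for wi, li in zip(w, lams):
            term *= li**(wi - k) / math.factorial(wi - k)
        num += k * term
        den += term
    return (1 - phi) * num / den

def em_ztcs(data, K, iters=300):
    m, n = len(data[0]), len(data)
    theta, lams = 0.3, [1.0] * m
    wbar = [sum(w[i] for w in data) / n for i in range(m)]
    for _ in range(iters):
        # E-step: (3.4) is free of w_j; (3.5) uses the current (theta, lambda)
        Eu = 1 - (1 - theta)**K * math.exp(-sum(lams))
        x0bar = sum(e_x0(w, K, theta, lams) for w in data) / n
        # M-step (3.3)
        theta = x0bar / K
        lams = [max(Eu * wbar[i] - K * theta, 1e-8) for i in range(m)]
    return theta, lams

K_true, theta_true, lams_true = 3, 0.4, (2.0, 3.0)
data = [rztcs(K_true, theta_true, lams_true) for _ in range(2000)]
theta_hat, lams_hat = em_ztcs(data, K_true)
```

With $n = 2000$ the estimates should land reasonably close to the generating values; repeating the fit over a grid of K values and comparing log-likelihoods (3.1) reproduces the model-selection step described above.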
3.2 Bootstrap confidence intervals
When other approaches are not available, the bootstrap method is a useful tool to find confidence intervals (CIs) for an arbitrary function of $(\theta, \boldsymbol{\lambda})$, say, $\psi = h(\theta, \boldsymbol{\lambda})$. Let $(\hat{\theta}, \hat{\boldsymbol{\lambda}})$ be the MLEs of $(\theta, \boldsymbol{\lambda})$ calculated by the EM algorithm (3.3)-(3.5); then $\hat{\psi} = h(\hat{\theta}, \hat{\boldsymbol{\lambda}})$ is the MLE of $\psi$. Based on $(\hat{\theta}, \hat{\boldsymbol{\lambda}})$, we can generate $\mathbf{w}_j^* \overset{\text{iid}}{\sim} \text{ZTCS}(K, \hat{\theta}; \hat{\lambda}_1, \ldots, \hat{\lambda}_m)$ via the SR (2.2) for $j = 1, \ldots, n$. Having obtained $Y_{\text{obs}}^* = \{\mathbf{w}_1^*, \ldots, \mathbf{w}_n^*\}$, we can calculate the bootstrap replication $(\hat{\theta}^*, \hat{\boldsymbol{\lambda}}^*)$ and get $\hat{\psi}^* = h(\hat{\theta}^*, \hat{\boldsymbol{\lambda}}^*)$. Independently repeating this process $G$ times, we obtain $G$ bootstrap replications $\{\hat{\psi}_g^*\}_{g=1}^G$. Consequently, the standard error, $\text{se}(\hat{\psi})$, of $\hat{\psi}$ can be estimated by the sample standard deviation of the $G$ replications, i.e.,

$$\widehat{\text{se}}(\hat{\psi}) = \left\{ \frac{1}{G-1} \sum_{g=1}^G \left[ \hat{\psi}_g^* - \left( \hat{\psi}_1^* + \cdots + \hat{\psi}_G^* \right)/G \right]^2 \right\}^{1/2}. \tag{3.6}$$

If $\{\hat{\psi}_g^*\}_{g=1}^G$ is approximately normally distributed, the first $(1-\alpha)100\,\%$ bootstrap CI for $\psi$ is

$$\left[ \hat{\psi} - z_{\alpha/2}\, \widehat{\text{se}}(\hat{\psi}),\ \hat{\psi} + z_{\alpha/2}\, \widehat{\text{se}}(\hat{\psi}) \right]. \tag{3.7}$$

Alternatively, if $\{\hat{\psi}_g^*\}_{g=1}^G$ is non-normally distributed, the second $(1-\alpha)100\,\%$ bootstrap CI of $\psi$ can be obtained as

$$[\hat{\psi}_L,\ \hat{\psi}_U], \tag{3.8}$$

where $\hat{\psi}_L$ and $\hat{\psi}_U$ are the $100(\alpha/2)$ and $100(1 - \alpha/2)$ percentiles of $\{\hat{\psi}_g^*\}_{g=1}^G$, respectively.
4 Multivariate zero-adjusted Charlier series distribution
To introduce the multivariate zero-adjusted Charlier series (ZACS) distribution, we first define the univariate ZACS distribution. A non-negative discrete random variable $Y$ is said to have a ZACS distribution with parameters $\varphi' \in [0, 1)$, $K \in \mathbb{N}$, $\theta \in [0, 1)$ and $\lambda > 0$, denoted by $Y \sim \text{ZACS}(\varphi', K, \theta, \lambda)$, if

$$Y \overset{d}{=} Z' W, \tag{4.1}$$

where $Z' \sim \text{Bernoulli}(1 - \varphi')$, $W \sim \text{ZTCS}(K, \theta; \lambda)$, and $Z' \perp W$. It is clear that the pmf of $Y$ is given by

$$\Pr(Y = y) = \varphi' I(y = 0) + \frac{1 - \varphi'}{1 - (1-\theta)^K e^{-\lambda}}\, Q_y(K, \theta, \lambda)\, I(y \neq 0), \tag{4.2}$$

where

$$Q_y(K, \theta, \lambda) = \sum_{k=0}^{\min(K, y)} \binom{K}{k} \theta^k (1-\theta)^{K-k} \frac{\lambda^{y-k} e^{-\lambda}}{(y-k)!}.$$

Motivated by (4.1), we naturally have the following multivariate generalization.

Definition 2. A discrete random vector $\mathbf{y} = (Y_1, \ldots, Y_m)^\top$ is said to have the multivariate ZACS distribution with parameters $\varphi' \in [0, 1)$, $K \in \mathbb{N}$, $\theta \in [0, 1)$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^\top \in \mathbb{R}_+^m$, denoted by $\mathbf{y} \sim \text{ZACS}_m(\varphi'; K, \theta, \boldsymbol{\lambda})$ or $\mathbf{y} \sim \text{ZACS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$, if

$$\mathbf{y} \overset{d}{=} Z' \mathbf{w} = \begin{cases} \mathbf{0}, & \text{with probability } \varphi', \\ \mathbf{w}, & \text{with probability } 1 - \varphi', \end{cases} \tag{4.3}$$

where $Z' \sim \text{Bernoulli}(1 - \varphi')$, $\mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, and $Z' \perp \mathbf{w}$. The random vector $\mathbf{w}$ is called the base vector of $\mathbf{y}$.

It is easy to show that the joint pmf of $\mathbf{y} \sim \text{ZACS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$ is

$$\Pr(\mathbf{y} = \mathbf{y}) = \varphi' I(\mathbf{y} = \mathbf{0}) + \frac{1 - \varphi'}{1 - (1-\theta)^K e^{-\lambda_+}}\, Q_{\mathbf{y}}(K, \theta, \boldsymbol{\lambda})\, I(\mathbf{y} \neq \mathbf{0}), \tag{4.4}$$

where

$$Q_{\mathbf{y}}(K, \theta, \boldsymbol{\lambda}) = \sum_{k=0}^{\min(K, \mathbf{y})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{y_i - k} e^{-\lambda_i}}{(y_i - k)!}.$$

We consider several special cases of (4.3) or (4.4):

(i) If $\varphi' = 0$, then $\mathbf{y} \overset{d}{=} \mathbf{w} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_m)$, i.e., the multivariate ZTCS distribution is a special member of the family of multivariate ZACS distributions. Thus, studying the multivariate ZTCS distribution is a basis for studying the multivariate ZACS distribution;
(ii) If $\varphi' \in (0, (1-\theta)^K e^{-\lambda_+})$, then $\mathbf{y}$ follows the multivariate zero-deflated Charlier series (ZDCS) distribution with parameters $(\varphi', K, \theta, \boldsymbol{\lambda})$, denoted by $\mathbf{y} \sim \text{ZDCS}_m(\varphi'; K, \theta, \boldsymbol{\lambda})$ or $\mathbf{y} \sim \text{ZDCS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$;
(iii) If $\varphi' = (1-\theta)^K e^{-\lambda_+}$, then $\mathbf{y} \sim \text{CS}_m(K, \theta; \boldsymbol{\lambda})$;
(iv) If $\varphi' \in ((1-\theta)^K e^{-\lambda_+}, 1)$, then $\mathbf{y}$ follows the multivariate zero-inflated Charlier series (ZICS) distribution with parameters $(\varphi', K, \theta, \boldsymbol{\lambda})$, denoted by $\mathbf{y} \sim \text{ZICS}_m(\varphi'; K, \theta, \boldsymbol{\lambda})$ or $\mathbf{y} \sim \text{ZICS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$.
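The SR (4.3) makes sampling trivial. A plain-Python sketch (parameter values are ours, for illustration) that also checks $\Pr(\mathbf{y} = \mathbf{0}) = \varphi'$ empirically and classifies the case against $(1-\theta)^K e^{-\lambda_+}$:

```python
import math
import random

random.seed(7)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rztcs(K, theta, lams):
    # ZTCS draw by rejection of the zero vector
    while True:
        x0 = sum(random.random() < theta for _ in range(K))
        x = tuple(x0 + rpois(l) for l in lams)
        if any(x):
            return x

K, theta, lams = 3, 0.4, (1.0, 1.5)
phi = (1 - theta)**K * math.exp(-sum(lams))  # zero probability of the parent CS model
phi_prime = 0.25                             # chosen > phi here, so y is ZICS (case iv)

def rzacs(phi_prime, K, theta, lams):
    # SR (4.3): y = Z' * w with Z' ~ Bernoulli(1 - phi_prime), w ~ ZTCS
    if random.random() < phi_prime:
        return (0,) * len(lams)
    return rztcs(K, theta, lams)

n = 40_000
ys = [rzacs(phi_prime, K, theta, lams) for _ in range(n)]
zero_frac = sum(1 for y in ys if not any(y)) / n
```

Since the base ZTCS vector is never zero, the empirical zero fraction should match $\varphi'$ up to Monte Carlo error.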
4.1 Mixed moments and moment generating function
From (4.1) and (2.2), we immediately have

$$E(\mathbf{y}) = \frac{1 - \varphi'}{1 - \varphi} (\boldsymbol{\lambda} + K\theta \mathbf{1}),$$

$$E(\mathbf{y}\mathbf{y}^\top) = \frac{1 - \varphi'}{1 - \varphi} \left[ \text{diag}(\boldsymbol{\lambda}) + \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K\theta(1 - \theta + K\theta)\mathbf{1}\mathbf{1}^\top \right],$$

$$\text{Var}(\mathbf{y}) = \frac{1 - \varphi'}{1 - \varphi} \left[ \text{diag}(\boldsymbol{\lambda}) + K\theta(1-\theta)\mathbf{1}\mathbf{1}^\top \right] - \frac{(1 - \varphi')(\varphi - \varphi')}{(1 - \varphi)^2} \left[ \boldsymbol{\lambda}\boldsymbol{\lambda}^\top + K\theta(\boldsymbol{\lambda}\mathbf{1}^\top + \mathbf{1}\boldsymbol{\lambda}^\top) + K^2\theta^2\mathbf{1}\mathbf{1}^\top \right]. \tag{4.5}$$

Thus, we have

$$\text{Corr}(Y_i, Y_j) = \frac{K\theta(1-\theta) - \frac{\varphi - \varphi'}{1 - \varphi}(\lambda_i + K\theta)(\lambda_j + K\theta)}{\sqrt{\left[ \lambda_i + K\theta(1-\theta) - \frac{\varphi - \varphi'}{1 - \varphi}(\lambda_i + K\theta)^2 \right] \left[ \lambda_j + K\theta(1-\theta) - \frac{\varphi - \varphi'}{1 - \varphi}(\lambda_j + K\theta)^2 \right]}},$$

for $i \neq j$. In particular, if $\theta = 0$, we obtain

$$\text{Corr}(Y_i, Y_j) = \frac{-\lambda_i \lambda_j (\varphi - \varphi')/(1 - \varphi)}{\sqrt{\left[ \lambda_i - \lambda_i^2 (\varphi - \varphi')/(1 - \varphi) \right] \left[ \lambda_j - \lambda_j^2 (\varphi - \varphi')/(1 - \varphi) \right]}}.$$

Furthermore, if $\lambda_i = \lambda_j = \lambda$, then

$$\text{Corr}(Y_i, Y_j) = \frac{-\lambda (\varphi - \varphi')/(1 - \varphi)}{1 - \lambda (\varphi - \varphi')/(1 - \varphi)}.$$

Clearly, $\text{Corr}(Y_i, Y_j)$ could be either positive or negative, depending on the values of $\varphi'$, $K$, $\theta$ and $\boldsymbol{\lambda}$.

For any $r_1, \ldots, r_m \geq 0$, the mixed moments of $\mathbf{y}$ are given by

$$E\left( \prod_{i=1}^m Y_i^{r_i} \right) = (1 - \varphi') E\left( \prod_{i=1}^m W_i^{r_i} \right) = \frac{1 - \varphi'}{1 - \varphi} E\left( \prod_{i=1}^m X_i^{r_i} \right). \tag{4.6}$$

By using the formula $E(\cdot) = E[E(\cdot|Z')]$, the mgf of $\mathbf{y}$ is

$$\begin{aligned}
M_{\mathbf{y}}(\mathbf{t}) &= E[\exp(\mathbf{t}^\top \mathbf{y})] = E[\exp(Z' \mathbf{t}^\top \mathbf{w})] = E\{E[\exp(Z' \mathbf{t}^\top \mathbf{w})|Z']\} \\
&= E[M_{\mathbf{w}}(Z'\mathbf{t})] = \varphi' M_{\mathbf{w}}(\mathbf{0}) + (1 - \varphi') M_{\mathbf{w}}(\mathbf{t}) = \varphi' + (1 - \varphi') M_{\mathbf{w}}(\mathbf{t}) \\
&= \varphi' + \frac{1 - \varphi'}{1 - \varphi} \left[ \left( \theta e^{t_+} + 1 - \theta \right)^K \exp\left( \sum_{i=1}^m \lambda_i e^{t_i} - \lambda_+ \right) - (1-\theta)^K e^{-\lambda_+} \right], \tag{4.7}
\end{aligned}$$

where $t_+ = \sum_{i=1}^m t_i$.
4.2 Marginal distributions
Now we consider the marginal distributions of $\mathbf{y}^{(1)}$ and $\mathbf{y}^{(2)}$, where

$$\mathbf{y}^{(1)} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_r \end{pmatrix}, \quad \mathbf{y}^{(2)} = \begin{pmatrix} Y_{r+1} \\ \vdots \\ Y_m \end{pmatrix} \quad \text{and} \quad \mathbf{y} = \begin{pmatrix} \mathbf{y}^{(1)} \\ \mathbf{y}^{(2)} \end{pmatrix}.$$

Based on (4.1) and (2.13), we have

$$\mathbf{y}^{(k)} \overset{d}{=} Z' \mathbf{w}^{(k)} \overset{d}{=} Z' Z^{(k)} \boldsymbol{\omega}^{(k)}, \qquad k = 1, 2,$$

where $Z' \sim \text{Bernoulli}(1 - \varphi')$, $Z^{(k)} \sim \text{Bernoulli}(1 - \varphi^{(k)})$ with $\varphi^{(k)}$ given by (2.14), $\boldsymbol{\omega}^{(1)} \sim \text{ZTCS}(K, \theta; \lambda_1, \ldots, \lambda_r)$ and $\boldsymbol{\omega}^{(2)} \sim \text{ZTCS}(K, \theta; \lambda_{r+1}, \ldots, \lambda_m)$. Note that $Z' Z^{(k)} \perp \boldsymbol{\omega}^{(k)}$ and $Z' Z^{(k)} \sim \text{Bernoulli}((1 - \varphi')(1 - \varphi^{(k)}))$. According to the SR (4.3), we obtain

$$\mathbf{y}^{(1)} \sim \text{ZACS}(\varphi'^{(1)}; K, \theta, \lambda_1, \ldots, \lambda_r) \quad \text{and} \quad \mathbf{y}^{(2)} \sim \text{ZACS}(\varphi'^{(2)}; K, \theta, \lambda_{r+1}, \ldots, \lambda_m), \tag{4.8}$$

where

$$\varphi'^{(k)} = 1 - (1 - \varphi')(1 - \varphi^{(k)}) = 1 - (1 - \varphi') \cdot \frac{1 - (1-\theta)^K e^{-\lambda^{(k)}_+}}{1 - (1-\theta)^K e^{-\lambda_+}} \in (0, 1), \qquad k = 1, 2, \tag{4.9}$$

$\lambda^{(1)}_+ = \sum_{i=1}^r \lambda_i$ and $\lambda^{(2)}_+ = \sum_{i=r+1}^m \lambda_i$.

In fact, for any positive integers $i_1, \ldots, i_r$ satisfying $1 \leq i_1 < \cdots < i_r \leq m$, we have

$$\begin{pmatrix} Y_{i_1} \\ \vdots \\ Y_{i_r} \end{pmatrix} \sim \text{ZACS}(\varphi'^*; K, \theta, \lambda_{i_1}, \ldots, \lambda_{i_r}), \tag{4.10}$$

where $\varphi^*$ is given by (2.16) and

$$\varphi'^* = 1 - (1 - \varphi')(1 - \varphi^*) = 1 - (1 - \varphi') \cdot \frac{1 - (1-\theta)^K e^{-(\lambda_{i_1} + \cdots + \lambda_{i_r})}}{1 - (1-\theta)^K e^{-\lambda_+}} \in (0, 1). \tag{4.11}$$
4.3 Conditional distributions
4.3.1 Conditional distribution of y(1)|y(2)
From (4.4) and (4.8), the conditional distribution of $\mathbf{y}^{(1)}|\mathbf{y}^{(2)}$ is given by

$$\Pr(\mathbf{y}^{(1)} = \mathbf{y}^{(1)} | \mathbf{y}^{(2)} = \mathbf{y}^{(2)}) = \frac{\Pr(\mathbf{y} = \mathbf{y})}{\Pr(\mathbf{y}^{(2)} = \mathbf{y}^{(2)})} = \frac{\varphi' I(\mathbf{y} = \mathbf{0}) + R(\mathbf{y}, K, \theta, \varphi', \boldsymbol{\lambda})\, I(\mathbf{y} \neq \mathbf{0})}{\varphi'^{(2)} I(\mathbf{y}^{(2)} = \mathbf{0}) + S(\mathbf{y}^{(2)}, K, \theta, \varphi', \boldsymbol{\lambda}^{(2)})\, I(\mathbf{y}^{(2)} \neq \mathbf{0})}, \tag{4.12}$$

where

$$R(\mathbf{y}, K, \theta, \varphi', \boldsymbol{\lambda}) = \frac{1 - \varphi'}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k=0}^{\min(K, \mathbf{y})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=1}^m \frac{\lambda_i^{y_i - k} e^{-\lambda_i}}{(y_i - k)!}$$

and

$$S(\mathbf{y}^{(2)}, K, \theta, \varphi', \boldsymbol{\lambda}^{(2)}) = \frac{1 - \varphi'^{(2)}}{1 - (1-\theta)^K e^{-\lambda^{(2)}_+}} \sum_{k=0}^{\min(K, \mathbf{y}^{(2)})} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{i=r+1}^m \frac{\lambda_i^{y_i - k} e^{-\lambda_i}}{(y_i - k)!}.$$

We first consider Case I: $\mathbf{y}^{(2)} \neq \mathbf{0}$. Under Case I, it is clear that $\mathbf{y} \neq \mathbf{0}$. From (4.12), it is easy to obtain

$$\Pr(\mathbf{y}^{(1)} = \mathbf{y}^{(1)} | \mathbf{y}^{(2)} = \mathbf{y}^{(2)}) = \frac{e^{-\lambda^{(1)}_+} \displaystyle\sum_{k=0}^{\min\{K, \mathbf{y}\}} \binom{K}{k} \theta^k (1-\theta)^{K-k} \prod_{j=1}^m \frac{\lambda_j^{y_j - k}}{(y_j - k)!}}{\displaystyle\sum_{l=0}^{\min\{K, \mathbf{y}^{(2)}\}} \binom{K}{l} \theta^l (1-\theta)^{K-l} \prod_{p=r+1}^m \frac{\lambda_p^{y_p - l}}{(y_p - l)!}},$$

which has the same form as (2.18).

Case II: $\mathbf{y}^{(2)} = \mathbf{0}$. Under Case II, it is possible that $\mathbf{y}^{(1)} = \mathbf{0}$ or $\mathbf{y}^{(1)} \neq \mathbf{0}$. When $\mathbf{y}^{(1)} = \mathbf{0}$, from (4.12), we obtain

$$\Pr(\mathbf{y}^{(1)} = \mathbf{0} | \mathbf{y}^{(2)} = \mathbf{0}) = \frac{\varphi'}{\varphi'^{(2)}}.$$

When $\mathbf{y}^{(1)} \neq \mathbf{0}$, from (4.12), we have

$$\Pr(\mathbf{y}^{(1)} = \mathbf{y}^{(1)} | \mathbf{y}^{(2)} = \mathbf{0}) = \frac{(1 - \varphi')(1-\theta)^K e^{-\lambda^{(2)}_+}}{\varphi'^{(2)} \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right]} \prod_{i=1}^r \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!}.$$

4.3.2 Conditional distribution of Z'|y
Since $Z' \sim \text{Bernoulli}(1 - \varphi')$, $Z'$ only takes the value 0 or 1. Note that $\mathbf{y} = \mathbf{0}$ is equivalent to $Z' = 0$ (since $\mathbf{w} \neq \mathbf{0}$ almost surely). Thus, $\Pr(Z' = 0 | \mathbf{y} = \mathbf{0}) = \Pr(Z' = 0)/\Pr(\mathbf{y} = \mathbf{0}) = 1$. And when $\mathbf{y} \neq \mathbf{0}$, we have $\Pr(Z' = 1 | \mathbf{y} = \mathbf{y}) = \Pr(Z' = 1, \mathbf{w} = \mathbf{y})/\Pr(\mathbf{y} = \mathbf{y}) = 1$. Therefore,

$$Z' \,|\, (\mathbf{y} = \mathbf{y}) \sim \begin{cases} \text{Degenerate}(0), & \text{if } \mathbf{y} = \mathbf{0}, \\ \text{Degenerate}(1), & \text{if } \mathbf{y} \neq \mathbf{0}, \end{cases} \tag{4.13}$$

i.e., $Z' \,|\, (\mathbf{y} = \mathbf{y}) \sim \text{Degenerate}(I(\mathbf{y} \neq \mathbf{0}))$.

4.3.3 Conditional distribution of w|(y = y ≠ 0)
If $\mathbf{y} \neq \mathbf{0}$, we have

$$\Pr(\mathbf{w} = \mathbf{w} | \mathbf{y} = \mathbf{y}) = \frac{\Pr(\mathbf{w} = \mathbf{w}, \mathbf{y} = \mathbf{y})}{\Pr(\mathbf{y} = \mathbf{y})} = \frac{\Pr(\mathbf{w} = \mathbf{y}, Z' = 1)}{\Pr(\mathbf{y} = \mathbf{y})} = I(\mathbf{w} = \mathbf{y}).$$

Thus, given $\mathbf{y} = \mathbf{y} \neq \mathbf{0}$, we have

$$\mathbf{w} \,|\, (\mathbf{y} = \mathbf{y} \neq \mathbf{0}) \sim \text{Degenerate}(\mathbf{y}). \tag{4.14}$$
5 Likelihood-based methods for multivariate ZACS distribution without covariates
Suppose that $\mathbf{y}_j \overset{\text{iid}}{\sim} \text{ZACS}(\varphi'; K, \theta, \lambda_1, \ldots, \lambda_m)$, where $\mathbf{y}_j = (Y_{1j}, \ldots, Y_{mj})^\top$ for $j = 1, \ldots, n$. Let $\mathbf{y}_j = (y_{1j}, \ldots, y_{mj})^\top$ denote the realization of the random vector $\mathbf{y}_j$, and $Y_{\text{obs}} = \{\mathbf{y}_j\}_{j=1}^n$ be the observed data. Furthermore, let $J = \{j : \mathbf{y}_j = \mathbf{0},\ j = 1, \ldots, n\}$ and $m_0 = \sum_{j=1}^n I(\mathbf{y}_j = \mathbf{0})$ denote the number of elements in $J$. We assume that $K$ is a known positive integer. Therefore, the observed-data likelihood function is proportional to

$$L(\varphi', \theta, \boldsymbol{\lambda} | Y_{\text{obs}}) \propto \varphi'^{m_0} (1 - \varphi')^{n - m_0} \prod_{j \notin J} \frac{e^{-\lambda_+}}{1 - (1-\theta)^K e^{-\lambda_+}} \sum_{k_j=0}^{\min(K, \mathbf{y}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!}.$$

Thus, we can write the log-likelihood function in two parts:

$$\ell(\varphi', \theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = \ell_1(\varphi' | Y_{\text{obs}}) + \ell_2(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}), \tag{5.1}$$

where

$$\ell_1(\varphi' | Y_{\text{obs}}) = m_0 \log\varphi' + (n - m_0) \log(1 - \varphi') \quad \text{and}$$

$$\ell_2(\theta, \boldsymbol{\lambda} | Y_{\text{obs}}) = -(n - m_0) \left\{ \lambda_+ + \log\left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \right\} + \sum_{j \notin J} \log \left[ \sum_{k_j=0}^{\min(K, \mathbf{y}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!} \right].$$

In other words, the parameter $\varphi'$ and the parameter vector $(\theta, \boldsymbol{\lambda})$ can be estimated separately. Obviously, the MLE of $\varphi'$ has the explicit solution

$$\hat{\varphi}' = \frac{m_0}{n}, \tag{5.2}$$

but closed-form MLEs of $(\theta, \boldsymbol{\lambda})$ are not available.
5.1 MLEs via the EM algorithm and bootstrap CIs
The objective of this section is to find the MLEs of $(\theta, \boldsymbol{\lambda})$ based on (5.1). For the log-likelihood function (3.1), the corresponding EM iteration for finding the MLEs of $(\theta, \boldsymbol{\lambda})$ is defined by (3.3)-(3.5). By comparing (3.1) with (5.1), if we replace the sums $\sum_{j=1}^n$ in (3.3)-(3.5) with $\sum_{j \notin J}$ (and $n$ with $n - m_0$), we promptly obtain the MLEs of $(\theta, \boldsymbol{\lambda})$ by using the EM algorithm. The M-step is to calculate the complete-data MLEs:

$$\hat{\theta} = \frac{\sum_{j \notin J} x_{0j}}{(n - m_0)K} \quad \text{and} \quad \hat{\lambda}_i = \frac{\sum_{j \notin J} u_j y_{ij}}{n - m_0} - K\hat{\theta}, \qquad i = 1, \ldots, m, \tag{5.3}$$

and the E-step is to replace $\{u_j\}_{j \notin J}$ and $\{x_{0j}\}_{j \notin J}$ in (5.3) by their conditional expectations:

$$E(U_j | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) = E(U_j) = 1 - (1-\theta)^K e^{-\lambda_+}, \quad \text{and} \tag{5.4}$$

$$E(X_{0j} | Y_{\text{obs}}, \theta, \boldsymbol{\lambda}) = \left[ 1 - (1-\theta)^K e^{-\lambda_+} \right] \frac{\sum_{k_j=1}^{\min(K, \mathbf{y}_j)} k_j \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!}}{\sum_{k_j=0}^{\min(K, \mathbf{y}_j)} \binom{K}{k_j} \theta^{k_j} (1-\theta)^{K-k_j} \prod_{i=1}^m \frac{\lambda_i^{y_{ij} - k_j}}{(y_{ij} - k_j)!}} \, I(\min(\mathbf{y}_j) \geq 1), \tag{5.5}$$

respectively.

The procedure of constructing bootstrap CIs for an arbitrary function of $(\varphi', \theta, \boldsymbol{\lambda})$, say $\psi = h(\varphi', \theta, \boldsymbol{\lambda})$, is very similar to that presented in Section 3.2.
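The two-part structure (5.1)-(5.2) separates cleanly in code: $\hat{\varphi}'$ is the observed zero fraction, and the Section 3 EM runs on the non-zero vectors only. A plain-Python sketch on synthetic data (our own implementation; parameters, starting values and iteration counts are illustrative):

```python
import math
import random

random.seed(8)

def rpois(lam):
    # Knuth's method for a Poisson draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def rzacs(phi_prime, K, theta, lams):
    # SR (4.3): zero vector w.p. phi_prime, otherwise a ZTCS draw (rejection)
    if random.random() < phi_prime:
        return (0,) * len(lams)
    while True:
        x0 = sum(random.random() < theta for _ in range(K))
        x = tuple(x0 + rpois(l) for l in lams)
        if any(x):
            return x

def fit_zacs(data, K, iters=200):
    # Part 1: (5.2) -- the MLE of phi' is the observed zero fraction
    nonzero = [y for y in data if any(y)]
    phi_prime_hat = 1 - len(nonzero) / len(data)
    # Part 2: EM (5.3)-(5.5) for (theta, lambda) over the non-zero vectors
    m, n1 = len(data[0]), len(nonzero)
    theta, lams = 0.3, [1.0] * m
    ybar = [sum(y[i] for y in nonzero) / n1 for i in range(m)]
    for _ in range(iters):
        phi = (1 - theta)**K * math.exp(-sum(lams))
        Eu = 1 - phi                                  # (5.4)
        x0sum = 0.0
        for y in nonzero:                             # (5.5)
            if min(y) < 1:
                continue
            num = den = 0.0
            for k in range(min(K, min(y)) + 1):
                term = math.comb(K, k) * theta**k * (1 - theta)**(K - k)
                for yi, li in zip(y, lams):
                    term *= li**(yi - k) / math.factorial(yi - k)
                num += k * term
                den += term
            x0sum += (1 - phi) * num / den
        theta = x0sum / (n1 * K)                      # (5.3)
        lams = [max(Eu * ybar[i] - K * theta, 1e-8) for i in range(m)]
    return phi_prime_hat, theta, lams

K_true = 3
data = [rzacs(0.2, K_true, 0.4, (2.0, 3.0)) for _ in range(3000)]
phi_prime_hat, theta_hat, lams_hat = fit_zacs(data, K_true)
```

The zero-fraction estimate converges at the usual root-n rate independently of the EM part, which is the practical payoff of the likelihood factorization (5.1).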
6 Simulation studies
To evaluate the performance of the proposed methods in Section 3, we investigate the accuracy of MLEs and confidence interval estimators of the parameters in the multivariate ZTCS distribution. We consider two cases for the dimension with m = 2 and m = 3.
6.1 Experiment 1: m = 2
When $m = 2$, the parameters $(K, \theta; \lambda_1, \lambda_2)$ are set to be $(5, 0.5; 3, 5)$. We generate $\{\mathbf{w}_j\}_{j=1}^n \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \lambda_2)$ with $n = 200$. Based on this simulated data set, for different $K$ values we first calculate the MLEs of $\theta$ and $(\lambda_1, \lambda_2)$ by using the EM algorithm (3.3)-(3.5) and then calculate the estimated log-likelihood. We choose $K = 5$, which maximizes the log-likelihood among all $K$ values. These results are reported in Table 1.

Table 1 Finding the value of K by maximizing the log-likelihood function for m = 2

K    $\hat{\theta}$    $\hat{\lambda}_1$    $\hat{\lambda}_2$    Log-likelihood
3    0.66742           3.4739056            5.4464959            -884.6629
4    0.58617           3.1314781            5.1040629            -883.4288
5    0.51145           2.9188539            4.8914319            -883.0151
6    0.44308           2.8176227            4.7901937            -883.1228
For this fixed value of $K = 5$, we first calculate the MLEs of $(\theta, \lambda_1, \lambda_2)$ by using the EM algorithm (3.3)-(3.5), the bootstrap standard deviations (stds) of these MLEs, the corresponding mean square errors (MSEs) and two 95 % bootstrap confidence intervals (CIs) of these parameters with $G = 1000$ by the bootstrap method presented in Section 3.2. Then, we independently repeat the above process 1000 times. The resulting average MLE, std, MSE and two coverage probabilities (CPs) based on the normal-based and non-normal-based bootstrap samples, respectively, are displayed in Table 2.
From Table 2, we can see that the average MSE of $\hat{\theta}$ is very small, while the average MSEs of $(\hat{\lambda}_1, \hat{\lambda}_2)$ are reasonably small. The two bootstrap coverage probabilities are close to but less than 0.95.
6.2 Experiment 2: m = 3
When $m = 3$, the parameters $(K, \theta; \lambda_1, \lambda_2, \lambda_3)$ are set to be $(4, 0.3; 2, 4, 6)$. We generate $\{\mathbf{w}_j\}_{j=1}^n \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \lambda_2, \lambda_3)$ with $n = 200$. Based on this simulated data set, for different $K$ values we first calculate the MLEs of $\theta$ and $(\lambda_1, \lambda_2, \lambda_3)$ by using the EM algorithm (3.3)-(3.5) and then calculate the estimated log-likelihood. We choose $K = 4$, which maximizes the log-likelihood among all $K$ values. These results are reported in Table 3.
For this fixed value of $K = 4$, we first calculate the MLEs of $(\theta, \lambda_1, \lambda_2, \lambda_3)$ by using the EM algorithm (3.3)-(3.5), the bootstrap stds of these MLEs, the corresponding MSEs and two 95 % bootstrap CIs of these parameters with $G = 1000$ by the bootstrap method presented in Section 3.2. Then, we independently repeat the above process 1000 times. The resulting average MLE, std, MSE and two CPs based on the normal-based and non-normal-based bootstrap samples, respectively, are displayed in Table 4.
From Table 4, we can see that the average MSEs of $(\hat{\theta}, \hat{\lambda}_1, \hat{\lambda}_2, \hat{\lambda}_3)$ are very small. The two bootstrap coverage probabilities are close to 0.95.
7 Two real examples
7.1 Students' absenteeism data
In this section, we use the data set on the number of absences of 113 students from a lecture course in two successive semesters, reported by Karlis (2003), to illustrate the proposed statistical methods for the multivariate ZTCS distribution. Let $W_1$ denote the number of absences in the first semester and $W_2$ denote the number of absences in the second semester. The data are displayed in Table 5 below.
For the purpose of illustration, we artificially remove the (0, 0) cell counts from Table 5 and the updated data are shown in Table 6.
Let $\mathbf{w}_j = (W_{1j}, W_{2j})^\top \overset{\text{iid}}{\sim} \text{ZTCS}(K, \theta; \lambda_1, \lambda_2)$ for $j = 1, \ldots, n$ with $n = 98$. Let $\mathbf{w}_j = (w_{1j}, w_{2j})^\top$ denote the realization of the random vector $\mathbf{w}_j$, and $Y_{\text{obs}} = \{\mathbf{w}_j\}_{j=1}^n$ be the observed data. The parameter $K$ of the binomial distribution is considered unknown
Table 2 The average MLE, std, MSE and two CPs of $(\theta, \lambda_1, \lambda_2)$ for m = 2 and K = 5

Parameter      True value    Average MLE    Average std_B    Average MSE    CP-n    CP-p
$\theta$       0.5           0.511457       0.06417          0.004209       0.930   0.932
$\lambda_1$    3             2.918853       0.33051          0.114730       0.927   0.932
$\lambda_2$    5             4.891431       0.35926          0.139568       0.921   0.928

std_B: the sample standard deviation for the bootstrap samples
CP-n: normal-based bootstrap CP; CP-p: non-normal-based bootstrap CP
Table 3 Finding the value of K by maximizing the log-likelihood function for m = 3

K   MLE of α   MLE of λ1  MLE of λ2  MLE of λ3  Log-likelihood
3   0.4037377  1.9887834  4.2987810  5.9287793  -650.5100604
4   0.3203847  1.9184570  4.2284540  5.8584519  -649.8799727
5   0.2583191  1.9084000  4.2183967  5.8483944  -650.1021029
6   0.2137641  1.9174109  4.2274075  5.8574051  -650.1044541
Table 4 The average MLE, std, MSE and two CPs of (α, λ1, λ2, λ3) for m = 3 and K = 4

Parameter  True value  Average MLE  Average stdB  Average MSE  CP(N)  CP(NN)
α          0.3         0.320384     0.0541052     0.00295      0.937  0.939
λ1         2           1.918457     0.222460      0.05689      0.925  0.932
λ2         4           4.228454     0.242082      0.06476      0.921  0.925
λ3         6           5.858451     0.255456      0.09402      0.954  0.948

stdB: The sample standard deviation for the bootstrap samples
CP(N): Normal-based bootstrap CP
CP(NN): Non-normal-based bootstrap CP
Table 5 Cross-tabulation of the students' absenteeism data (Karlis 2003)

W1\W2    0   1   2   3   4   5   6   7   8  Total
0       15  10   4   4   2   0   0   0   0     35
1        6  11   9   4   2   0   0   0   0     32
2        5   7   6   5   0   0   0   0   0     23
3        1   3   2   4   3   1   0   0   0     14
4        1   0   2   0   1   0   0   0   0      4
5        0   0   0   0   0   1   1   0   0      2
6        0   0   0   0   0   0   2   0   0      2
7        0   0   0   0   0   0   0   0   0      0
8        0   0   0   0   0   0   0   0   0      0
9        1   0   0   0   0   0   0   0   0      1
Total   29  31  23  17   8   2   3   0   0    113
Table 6 The number of absences of 113 students from a course in two successive semesters, without the (0, 0) cell count (Karlis 2003)

W1\W2    0   1   2   3   4   5   6   7   8  Total
0        -  10   4   4   2   0   0   0   0     20
1        6  11   9   4   2   0   0   0   0     32
2        5   7   6   5   0   0   0   0   0     23
3        1   3   2   4   3   1   0   0   0     14
4        1   0   2   0   1   0   0   0   0      4
5        0   0   0   0   0   1   1   0   0      2
6        0   0   0   0   0   0   2   0   0      2
7        0   0   0   0   0   0   0   0   0      0
8        0   0   0   0   0   0   0   0   0      0
9        1   0   0   0   0   0   0   0   0      1
Total   14  31  23  17   8   2   3   0   0     98
and we attempt to estimate it. Based on the data in Table 6, for different K values we first calculate the MLEs of α and (λ1, λ2) by using the EM algorithm (3.3)–(3.5) and then calculate the estimated values of the log-likelihood function. These results are reported in Table 7.
We should choose the K that maximizes the log-likelihood among all K values. From Table 7, we observe that the log-likelihood increases monotonically as K → ∞.
On the other hand, K must be larger than or equal to max(W1, W2). From Table 6, we have max(W1, W2) = 9. To illustrate how to obtain the confidence intervals of the parameters, it seems reasonable to choose K = 10. With G = 6000 bootstrap replications, we calculate the bootstrap average MLEs, the bootstrap stds of the MLEs of (α, λ1, λ2) and two 95 % bootstrap CIs of (α, λ1, λ2). These results are listed in Table 8.
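The two interval types can be sketched as follows. Whether Section 3.2 uses exactly the percentile construction for the non-normal-based CI is our assumption, suggested by the asymmetric intervals in Table 8:

```python
import numpy as np
from scipy.stats import norm

def bootstrap_cis(theta_hat, boot, level=0.95):
    """Two bootstrap CIs for a scalar parameter: a normal-based interval
    theta_hat +/- z * std(boot), and a percentile (non-normal-based) interval
    from the empirical quantiles of the G replicates."""
    z = norm.ppf(0.5 + level / 2)
    se = np.std(boot, ddof=1)
    lo, hi = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    return (theta_hat - z * se, theta_hat + z * se), (lo, hi)

# Example with G = 6000 synthetic replicates around a point estimate 0.044
# (illustrative numbers only, not the paper's actual bootstrap output):
boot = np.random.default_rng(3).normal(0.044, 0.018, 6000)
ci_normal, ci_pct = bootstrap_cis(0.044, boot)
```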
7.2 Road accident data of Athens
The numbers of accidents in 24 roads of Athens for the period 1987–1991 were reported and analyzed by Karlis (2003) with a multivariate Poisson distribution. Since only accidents that caused injuries are included, as shown in Table 9, we want to fit the data set by the multivariate ZTCS model.
Let wj = (W1j, . . . , W5j)⊤ iid ∼ ZTCS(K, α; λ1, . . . , λ5), where W1j, . . . , W5j denote the average numbers of accidents per kilometer reported in the j-th road for 1987 to 1991, respectively, for j = 1, . . . , n (n = 24). For example, when j = 1, we have tj = t1 = 1.2 and
(W11, . . . , W51)⊤ = (11, 33, 25, 23, 6)⊤/1.2.
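The per-kilometre transformation for road 1, and for the road that attains the overall maximum used later to bound K, can be checked directly:

```python
import numpy as np

# Road 1 (Akadimias): counts for 1987-1991 divided by its length t_1 = 1.2 km.
W_road1 = np.array([11, 33, 25, 23, 6]) / 1.2
# Road 21 (Panepistimiou, t_21 = 1.1 km) contains the largest entry of the
# whole table, 58 / 1.1 = 52.7, which later forces K >= 53.
W_road21 = np.array([24, 58, 40, 36, 5]) / 1.1
```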
The unknown parameter K is assumed to be a positive integer. Based on the data in Table 9, for different K values we first calculate the MLEs of α and λ = (λ1, . . . , λ5)⊤ by using the EM algorithm (3.3)–(3.5) and then calculate the estimated values of the log-likelihood function. These results are reported in Table 10.
Table 7 Finding the value of K by maximizing the log-likelihood function for fitting the data of Table 6 by the multivariate ZTCS distribution

K     MLE of α   MLE of λ1  MLE of λ2  Log-likelihood
2     0.1220034  1.394314   1.600330   -328.8195
3     0.1024329  1.326026   1.531414   -328.1680
4     0.0869704  1.281892   1.486834   -327.7241
5     0.0748624  1.252965   1.457593   -327.4137
8     0.0517887  1.208834   1.412942   -326.8897
9     0.0468272  1.200906   1.404915   -326.7862
10    0.0427041  1.194675   1.398604   -326.7022
14    0.0314939  1.179167   1.382890   -326.4824
15    0.0295422  1.176680   1.380369   -326.4453
20    0.0225358  1.168154   1.371725   -326.3146
30    0.0152633  1.160049   1.363504   -326.1831
50    0.0092682  1.153806   1.357169   -326.0775
75    0.0062141  1.150805   1.354123   -326.0247
100   0.0046735  1.149330   1.352626   -325.9982
150   0.0031242  1.147871   1.351145   -325.9718
250   0.0018786  1.146717   1.349972   -325.9507
350   0.0013431  1.146225   1.349473   -325.9416
Table 8 MLEs and confidence intervals of parameters for the students' absenteeism data

Parameter  MLEB      stdB      95 % bootstrap CI(N)  95 % bootstrap CI(NN)
α          0.043961  0.017672  [0.009325, 0.078598]  [0.007857, 0.078343]
λ1         1.175550  0.210425  [0.763118, 1.587982]  [0.785992, 1.606149]
λ2         1.377584  0.218819  [0.948699, 1.806468]  [0.965662, 1.824147]

MLEB: The average MLE for the bootstrap samples
stdB: The sample standard deviation for the bootstrap samples
CI(N): Normal-based bootstrap CI
CI(NN): Non-normal-based bootstrap CI
We should choose the K that maximizes the log-likelihood among all K values. From Table 10, we observe that the log-likelihood increases monotonically as K → ∞.
On the other hand, K must be larger than or equal to max{Wij : 1 ≤ i ≤ 5, 1 ≤ j ≤ 24}. From Table 9, we have max{Wij : 1 ≤ i ≤ 5, 1 ≤ j ≤ 24} = 52.7. To illustrate how to obtain the confidence intervals of the parameters, it seems reasonable to choose K = 53. With G = 6000 bootstrap replications, we calculate the bootstrap average MLEs, the bootstrap stds of the MLEs of (α, λ1, . . . , λ5) and two 95 % bootstrap CIs of (α, λ1, . . . , λ5). These results are reported in Table 11.
Based on the data in Table 9, we calculate the sample correlation coefficient matrix, which is given by

R = | 1.0000  0.8038  0.7643  0.8089  0.5746 |
    | 0.8038  1.0000  0.8326  0.8297  0.4084 |
    | 0.7643  0.8326  1.0000  0.9058  0.5768 |
    | 0.8089  0.8297  0.9058  1.0000  0.6557 |
    | 0.5746  0.4084  0.5768  0.6557  1.0000 |
Table 9 Accident data of 24 roads in Athens for the period 1987–1991 (Karlis 2003)

Road               j    1987  1988  1989  1990  1991  Length (km) tj
Akadimias          1      11    33    25    23     6   1.2
Alexandras         2      41    63    91    77    29   2.6
Amfitheas          3       5    35    44    21    13   2.4
Aharnon            4      44    79    91    88    33   5.5
Vas. Olgas         5       5     3     4     4     0   0.5
Vas. Konstantinou  6       8    15    26    13     7   1.3
Vas. Sofias        7      34    63    81    67    23   2.6
Vouliagmenis       8      17    16    24    24     4   2.1
G Septemvriou      9      16    24    30    30    13   1.7
Galatsioy          10     13    13    15    17     9   1.1
Iera Odos          11      7    15    20    19     8   2.7
Kalirois           12     15    24    39    32     7   2.6
Katehaki           13      2     3    27    24     7   1.4
Kifisias           14     22    23    38    22    11   1.4
Kifisou            15     38    48    60    53    24   7.9
Leof. Kavalas      16      4     6    12     9     3   2.0
Lenorman           17     19    30    37    48    22   2.0
Leof. Athinon      18     15    11    16    21    28   6.1
Mesogeion          19     20    30    33    28     9   1.5
P. Ralli           20     13    14    13    17     9   2.6
Panepistimiou      21     24    58    40    36     5   1.1
Patision           22     80   108   114   113    86   4.1
Peiraios           23     86    89   109    90    49   8.0
Sigrou             24     60    61    87    86    29   4.8
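The sample correlation coefficient matrix can be reproduced from Table 9. The sketch below assumes, following the definition of Wij in this section, that the per-kilometre averages are the working data:

```python
import numpy as np

# Table 9, transcribed as (yearly counts 1987-1991, road length t_j in km).
roads = [
    ([11, 33, 25, 23, 6], 1.2),   ([41, 63, 91, 77, 29], 2.6),
    ([5, 35, 44, 21, 13], 2.4),   ([44, 79, 91, 88, 33], 5.5),
    ([5, 3, 4, 4, 0], 0.5),       ([8, 15, 26, 13, 7], 1.3),
    ([34, 63, 81, 67, 23], 2.6),  ([17, 16, 24, 24, 4], 2.1),
    ([16, 24, 30, 30, 13], 1.7),  ([13, 13, 15, 17, 9], 1.1),
    ([7, 15, 20, 19, 8], 2.7),    ([15, 24, 39, 32, 7], 2.6),
    ([2, 3, 27, 24, 7], 1.4),     ([22, 23, 38, 22, 11], 1.4),
    ([38, 48, 60, 53, 24], 7.9),  ([4, 6, 12, 9, 3], 2.0),
    ([19, 30, 37, 48, 22], 2.0),  ([15, 11, 16, 21, 28], 6.1),
    ([20, 30, 33, 28, 9], 1.5),   ([13, 14, 13, 17, 9], 2.6),
    ([24, 58, 40, 36, 5], 1.1),   ([80, 108, 114, 113, 86], 4.1),
    ([86, 89, 109, 90, 49], 8.0), ([60, 61, 87, 86, 29], 4.8),
]
counts = np.array([c for c, _ in roads], dtype=float)   # 24 x 5
t = np.array([length for _, length in roads])
W = counts / t[:, None]             # per-kilometre yearly averages W_1j..W_5j
R = np.corrcoef(W, rowvar=False)    # 5 x 5 sample correlation matrix
```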
Table 10 Finding the value of K by maximizing the log-likelihood function for fitting the data of Table 9 by the multivariate ZTCS distribution

K     MLE of α  MLE of λ1  MLE of λ2  MLE of λ3  MLE of λ4  MLE of λ5  Log-likelihood
2     0.396     8.405      13.520     16.667     14.513     5.262      -498.992
3     0.383     8.046      13.161     16.308     14.154     4.903      -492.749
4     0.372     7.707      12.822     15.969     13.815     4.564      -487.185
5     0.349     7.452      12.567     15.714     13.559     4.309      -482.820
8     0.292     6.858      11.973     15.120     12.966     3.715      -473.529
9     0.276     6.710      11.825     14.972     12.818     3.567      -471.336
10    0.261     6.585      11.700     14.847     12.693     3.442      -469.490
14    0.209     6.271      11.387     14.533     12.379     3.128      -464.604
15    0.198     6.226      11.341     14.488     12.333     3.083      -463.810
20    0.155     6.092      11.207     14.354     12.200     2.949      -461.170
30    0.106     5.997      11.112     14.259     12.105     2.854      -458.806
50    0.065     5.944      11.059     14.206     12.051     2.800      -457.125
51    0.064     5.942      11.057     14.204     12.050     2.799      -457.078
52    0.062     5.941      11.056     14.203     12.049     2.798      -457.033
53    0.061     5.940      11.055     14.202     12.047     2.797      -456.990
54    0.060     5.939      11.054     14.201     12.046     2.795      -456.949
75    0.043     5.923      11.038     14.185     12.030     2.779      -456.350
100   0.032     5.913      11.028     14.175     12.021     2.770      -455.978
150   0.021     5.905      11.020     14.167     12.012     2.762      -455.617
250   0.013     5.898      11.014     14.160     12.006     2.755      -455.334
350   0.009     5.896      11.011     14.158     12.003     2.753      -455.215
while the population correlation coefficient matrix ρ, based on (2.6), is estimated to be

ρ̂ = | 1.0000  0.2703  0.2444  0.2612  0.4199 |
    | 0.2703  1.0000  0.1951  0.2085  0.3352 |
    | 0.2444  0.1951  1.0000  0.1886  0.3031 |
    | 0.2612  0.2085  0.1886  1.0000  0.3240 |
    | 0.4199  0.3352  0.3031  0.3240  1.0000 |

It can be easily seen that ρ̂ is very close to R.
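Equation (2.6) is not reproduced in this excerpt. As an illustration only, under a shared-binomial construction Wi = X0 + Yi (our assumption about the SR), the moments are Cov(Wi, Wj) = Kα(1 − α) for i ≠ j and Var(Wi) = Kα(1 − α) + λi, and a plug-in estimate of ρ would be computed as:

```python
import numpy as np

# Plug-in values: K = 53 and the bootstrap-average MLEs from Table 11.
# The correlation formula below follows from the assumed SR, and may differ
# in detail from the paper's equation (2.6).
K, alpha = 53, 0.0663
lam = np.array([5.8610, 10.926, 14.118, 11.954, 2.7533])
c = K * alpha * (1 - alpha)             # common covariance K*alpha*(1-alpha)
v = c + lam                             # variances Var(W_i)
rho = c / np.sqrt(np.outer(v, v))       # rho_ij = c / sqrt(v_i * v_j)
np.fill_diagonal(rho, 1.0)
```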
Table 11 MLEs and confidence intervals of parameters for the road accident data of Athens

Parameter  MLEB     stdB   95 % bootstrap CI(N)  95 % bootstrap CI(NN)
α          0.0663   0.017  [0.0327, 0.1000]      [0.0356, 0.1027]
λ1         5.8610   1.217  [3.4750, 8.2470]      [3.8330, 8.4930]
λ2         10.926   2.347  [6.3260, 15.526]      [7.1450, 16.313]
λ3         14.118   1.824  [10.543, 17.694]      [10.844, 18.022]
λ4         11.954   1.624  [8.7700, 15.138]      [9.2400, 15.478]
λ5         2.7533   0.604  [1.5676, 3.9390]      [1.7988, 4.0889]

MLEB: The average MLE for the bootstrap samples
stdB: The sample standard deviation for the bootstrap samples
CI(N): Normal-based bootstrap CI
CI(NN): Non-normal-based bootstrap CI
8 Concluding remarks
In this paper, we first proposed the multivariate ZTCS distribution and studied its distributional properties. Since the joint marginal distribution of any r-dimensional sub-vector of the m-dimensional multivariate ZTCS random vector is an r-dimensional ZDCS distribution, we then proposed the multivariate ZACS distribution. Note that the multivariate ZTCS distribution is a special case of the multivariate ZACS distribution. The EM algorithm is used to obtain the MLEs of the parameters in the multivariate ZACS distribution. The multivariate ZTCS distribution can be used when other distributions, such as the multivariate zero-truncated Poisson distribution, do not fit a real data set well. Meanwhile, the multivariate ZACS distribution, as a more general form, applies in a much wider range of settings and can serve as a good substitute for the Type II multivariate ZTP distribution (Tian et al. 2014).
Endnote
1 A discrete random variable X is said to have the general finite distribution, denoted by X ∼ Finite(xl, pl; l = 1, . . . , n), if Pr(X = xl) = pl ∈ [0, 1] and ∑_{l=1}^{n} pl = 1.
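Sampling from such a finite distribution is a one-line weighted choice; the support `x` and probabilities `p` below are arbitrary illustrative values:

```python
import numpy as np

# Draw from Finite(x_l, p_l; l = 1..3) with x = (0, 1, 3), p = (0.2, 0.5, 0.3).
rng = np.random.default_rng(0)
x, p = [0, 1, 3], [0.2, 0.5, 0.3]
draws = rng.choice(x, size=1000, p=p)
```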
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
XD contributed most of the work, DJ contributed to the simulation studies (Section 6) and the second application (Section 7.2), and GLT supervised and reviewed all work, from the initial idea through preparation of the manuscript to the final version. All authors read and approved the final manuscript.
Received: 15 May 2015 Accepted: 20 July 2015
References
Karlis, D: An EM algorithm for multivariate Poisson distribution and related models. J. Appl. Stat. 30(1), 63–77 (2003)
Loukas, S: Some methods of estimation for a trivariate Poisson distribution. Zastoswania Math. 21, 503–510 (1993)
Loukas, S, Papageorgiou, H: On a trivariate Poisson distribution. Appl. Math. 36, 432–439 (1991)
Ong, SH: A discrete Charlier series distribution. Biom. J. 30(8), 1003–1009 (1988)
Papageorgiou, H, Loukas, S: A bivariate discrete Charlier series distribution. Biom. J. 37(1), 105–117 (1995)
Tian, GL, Liu, Y, Tan, MT: Type II multivariate zero-truncated/adjusted Poisson models and their applications. Technical Report, Department of Statistics and Actuarial Science, The University of Hong Kong (2014)
© The Author(s) 2015
Mathematics Subject Classification: Primary 62E15; Secondary 62F10