A New Bivariate INAR(1) Model with Time-Dependent

1. Introduction

Bivariate count data occur in many contexts, often as the counts of two events, objects or individuals during a certain period of time. For example, such counts occur in epidemiology when two kinds of related diseases are examined, in criminology when two kinds of crimes are committed, in business when the sales volumes of two correlated products are observed or in manufacturing when two similar products are produced.

In real applications, the observed time series data are often discrete, over-dispersed (the empirical variance is greater than the empirical mean) and exhibit other features such as time dependence. Many univariate models have been proposed to deal with integer-valued time series data based on the univariate binomial thinning operator $“ \circ ”$ , which was proposed by Steutel and van Harn [1]:

(1) $\begin{matrix} α \circ X = \sum_{i = 1}^{X} W_{i}, \end{matrix}$

where X is a non-negative integer-valued random variable and $P (W_{i} = 1) = 1 - P (W_{i} = 0) = α$ . The INAR(1) model [2], the BAR(1) model [3], the INAR(p) model [4], the PDINAR(1) model [5] and the BARIO model [6] are very popular in analyzing non-negative integer-valued time series; see Weiß [7], Scotto et al. [8] and Davis et al. [9] for recent reviews on this topic. Motivated by infinite-patch metapopulation models discussed in Buckley and Pollett [10], Weiß [11] proposed an extension to the popular Poisson INAR(1) model, which is characterized by time-dependent innovations, i.e., the mean of the innovation increases linearly with the previous population size. An important advantage of this model is that it gives a reasonable interpretation for immigration, which becomes more attractive if the current population is large; see Weiß [11] for an application to iceberg order data.
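The thinning operator in (1) can be illustrated with a minimal Python sketch. It assumes, as is standard for binomial thinning, that the $W_{i}$ are independent Bernoulli($α$) variables; the function name `thin` and the numerical values are hypothetical choices made only for illustration.

```python
import random

def thin(alpha, x, rng):
    """Binomial thinning alpha ∘ x: sum of x independent Bernoulli(alpha) draws."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

rng = random.Random(0)      # fixed seed so the run is reproducible
alpha, x = 0.3, 20          # illustrative values, not taken from the paper
draws = [thin(alpha, x, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)

# Given X = x, alpha ∘ x is Binomial(x, alpha), so the sample mean and variance
# should be near alpha * x and alpha * (1 - alpha) * x, respectively.
print(mean, alpha * x)
print(var, alpha * (1 - alpha) * x)
```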

Univariate models are extensively investigated in the literature, but relatively few multivariate models, especially bivariate ones, have been studied in detail. Franke and Rao [12] proposed a multivariate INAR(1) model, which was generalized to the p-order case by Latour [13]. Pedeli and Karlis [14] discussed a tractable bivariate INAR(1) model, which can be used to deal with bivariate count data with equi-dispersion and over-dispersion, but with limited flexibility. See Pedeli and Karlis [15] for the estimation of the BINAR model and Pedeli and Karlis [16] for a further discussion of the properties of the multivariate INAR(1) model. Based on hierarchical dynamic models, Ravishanker et al. [17] described a Bayesian framework for estimation and prediction for multivariate time series of counts. Popović [18] proposed a bivariate INAR(1) model with random coefficients based on different binomial thinning operators. The above models assume that the innovations of their marginal models are independent and identically distributed. Based on a finite range of counts, Scotto et al. [19] considered density-dependent bivariate binomial autoregressive models by using their state-dependent thinning concept. Li et al. [20] proposed a bivariate random coefficient INAR(1) model with asymmetric Hermite innovations.

Inspired by Weiß [11], we aim to provide a bivariate INAR model for analyzing bivariate time series with time dependence and cross-correlation. The first contribution is that this paper gives a practical method to capture the time-dependence trend by incorporating past information into the distribution of the innovation vector, which in turn induces cross-correlation between the two entries of the innovation vector. The second contribution is that the new model allows not only autocorrelation but also cross-correlation. The third contribution is that this paper establishes the stationarity and ergodicity of the extended bivariate INAR process and its two subprocesses, which is important for deriving the consistency and asymptotic normality of the CML estimator.

The remainder of the paper is organized as follows. In Section 2, we first give brief reviews of the bivariate Poisson distribution and the bivariate binomial thinning operator, based on which we give the definition of the new bivariate INAR(1) model. Conditional maximum likelihood (CML) estimates and the asymptotic properties of unknown parameters are discussed in Section 3. A simulation and two real data examples that show the effectiveness of the new model are given in Section 4 and Section 5, respectively. Conclusions are made in Section 6.

2. A New Bivariate INAR(1) Model

For readability, we first give a brief review of the bivariate Poisson distribution.

Definition 1.

If the joint probability mass function (pmf) of $(X, Y)$ satisfies

(2) $\begin{matrix} P (X = x, Y = y) \\ = e^{- (λ_{1} + λ_{2} - ϕ)} \frac{{(λ_{1} - ϕ)}^{x}}{x!} \frac{{(λ_{2} - ϕ)}^{y}}{y!} \sum_{i = 0}^{min (x, y)} (\binom{x}{i}) (\binom{y}{i}) i! {[\frac{ϕ}{(λ_{1} - ϕ) (λ_{2} - ϕ)}]}^{i}, \end{matrix}$

where $λ_{1}, λ_{2} > 0$ and $ϕ \in (0, min (λ_{1}, λ_{2}))$ , then $(X, Y)$ is called a bivariate Poisson random variable with parameters $(λ_{1}, λ_{2}, ϕ)$ , denoted by BP $(λ_{1}, λ_{2}, ϕ)$ .

From Kocherlakota and Kocherlakota [21], if $(X, Y)$ follows BP $(λ_{1}, λ_{2}, ϕ)$ , there exist three mutually independent random variables $Z_{1}$ , $Z_{2}$ , $Z_{3}$ such that $X = Z_{1} + Z_{3}$ and $Y = Z_{2} + Z_{3}$ , where $Z_{1}, Z_{2}$ and $Z_{3}$ follow Poisson $(λ_{1} - ϕ)$ , Poisson $(λ_{2} - ϕ)$ and Poisson $(ϕ)$ , respectively. It follows that $Cov (X, Y) = ϕ$ . In addition, $P (X = x, Y = y)$ , given in (2), is continuous and differentiable with respect to the parameters. For convenience, we denote $f (x, y, λ_{1}, λ_{2}, ϕ) = P (X = x, Y = y)$ . By using Lemma A3 in Li et al. [20], we obtain that

(3) $\begin{matrix} \frac{\partial f (x, y, λ_{1}, λ_{2}, ϕ)}{\partial λ_{1}} & = f (x - 1, y, λ_{1}, λ_{2}, ϕ) - f (x, y, λ_{1}, λ_{2}, ϕ), \end{matrix}$

(4) $\begin{matrix} \frac{\partial f (x, y, λ_{1}, λ_{2}, ϕ)}{\partial λ_{2}} & = f (x, y - 1, λ_{1}, λ_{2}, ϕ) - f (x, y, λ_{1}, λ_{2}, ϕ), \end{matrix}$

(5) $\begin{matrix} \frac{\partial f (x, y, λ_{1}, λ_{2}, ϕ)}{\partial ϕ} & = f (x, y, λ_{1}, λ_{2}, ϕ) - f (x - 1, y, λ_{1}, λ_{2}, ϕ) - f (x, y - 1, λ_{1}, λ_{2}, ϕ) \\ + f (x - 1, y - 1, λ_{1}, λ_{2}, ϕ) . \end{matrix}$
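As a numerical sanity check, the pmf (2) and the identities (3) and (5) can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name `bp_pmf`, the parameter values, the truncation of the support at 40 and the finite-difference step are all assumptions made for illustration.

```python
import math

def bp_pmf(x, y, lam1, lam2, phi):
    """Joint pmf (2) of BP(lam1, lam2, phi); zero outside the support."""
    if x < 0 or y < 0:
        return 0.0
    a, b = lam1 - phi, lam2 - phi
    series = sum(
        math.comb(x, i) * math.comb(y, i) * math.factorial(i) * (phi / (a * b)) ** i
        for i in range(min(x, y) + 1)
    )
    return (math.exp(-(lam1 + lam2 - phi))
            * a ** x / math.factorial(x)
            * b ** y / math.factorial(y)
            * series)

lam1, lam2, phi = 2.0, 3.0, 0.5   # illustrative values
grid = range(41)                  # truncated support; the tail mass is negligible here

total = sum(bp_pmf(x, y, lam1, lam2, phi) for x in grid for y in grid)
mean_x = sum(x * bp_pmf(x, y, lam1, lam2, phi) for x in grid for y in grid)
mean_y = sum(y * bp_pmf(x, y, lam1, lam2, phi) for x in grid for y in grid)
exy = sum(x * y * bp_pmf(x, y, lam1, lam2, phi) for x in grid for y in grid)
cov = exy - mean_x * mean_y       # should be close to phi

# Central finite differences compared with the right-hand sides of (3) and (5).
x0, y0, step = 3, 2, 1e-6
d_lam1 = (bp_pmf(x0, y0, lam1 + step, lam2, phi)
          - bp_pmf(x0, y0, lam1 - step, lam2, phi)) / (2 * step)
rhs3 = bp_pmf(x0 - 1, y0, lam1, lam2, phi) - bp_pmf(x0, y0, lam1, lam2, phi)
d_phi = (bp_pmf(x0, y0, lam1, lam2, phi + step)
         - bp_pmf(x0, y0, lam1, lam2, phi - step)) / (2 * step)
rhs5 = (bp_pmf(x0, y0, lam1, lam2, phi) - bp_pmf(x0 - 1, y0, lam1, lam2, phi)
        - bp_pmf(x0, y0 - 1, lam1, lam2, phi) + bp_pmf(x0 - 1, y0 - 1, lam1, lam2, phi))

print(total, cov, d_lam1 - rhs3, d_phi - rhs5)
```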

Applying the univariate binomial thinning operator $“ \circ ”$ given in (1) to the bivariate case with $X = {(X_{1}, X_{2})}^{⊤}$ leads to the bivariate binomial thinning operator:

$A \circ X = (\begin{matrix} α_{11} \circ X_{1} + α_{12} \circ X_{2} \\ α_{21} \circ X_{1} + α_{22} \circ X_{2} \end{matrix}) \quad \text{with} \quad A = {(α_{i j})}_{2 \times 2},$

where $α_{i j} \in (0, 1)$ , $i, j = 1, 2$ , $X_{1}$ and $X_{2}$ are non-negative integer-valued random variables, and all the thinnings are performed independently of each other.

By calculation, $E (A \circ X) = A E (X) .$ Denoting $V$ as the $2 \times 2$ variance matrix of the Bernoulli random variables $α_{i j} \circ X_{j}$ with ${(V)}_{i j} = α_{i j} (1 - α_{i j})$ , $i, j = 1, 2,$ we obtain that $E ((A \circ X) {(A \circ X)}^{⊤}) = A E (X X^{⊤}) A^{⊤} + diag (V E (X)) .$ Furthermore, if all the counting series of $A \circ X$ and $B \circ Y$ are independent, $E ((A \circ X) {(B \circ Y)}^{⊤}) = A E (X Y^{⊤}) B^{⊤}$ .
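The first two moments of $A \circ X$ can be checked by simulation. The sketch below conditions on a fixed count vector, so that $E (A \circ X) = A x$ and each component has variance $\sum_{j} α_{i j} (1 - α_{i j}) x_{j}$ ; the helper names and all numerical values are hypothetical.

```python
import random

def thin(alpha, x, rng):
    """Binomial thinning alpha ∘ x with independent Bernoulli(alpha) counting variables."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def bivariate_thin(A, x, rng):
    """A ∘ x for a 2x2 matrix A and a count vector x; all thinnings are independent."""
    return [thin(A[i][0], x[0], rng) + thin(A[i][1], x[1], rng) for i in range(2)]

rng = random.Random(1)
A = [[0.3, 0.1], [0.2, 0.4]]   # illustrative thinning matrix
x = [5, 8]                     # fixed (non-random) count vector
n = 40000

samples = [bivariate_thin(A, x, rng) for _ in range(n)]
mean = [sum(s[i] for s in samples) / n for i in range(2)]
var = [sum((s[i] - mean[i]) ** 2 for s in samples) / n for i in range(2)]

expected_mean = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
expected_var = [sum(A[i][j] * (1 - A[i][j]) * x[j] for j in range(2)) for i in range(2)]

print(mean, expected_mean)
print(var, expected_var)
```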

In the following, we give the definition of the new bivariate INAR(1) model, which not only includes the property of the models defined by Pedeli and Karlis [14,16] but also allows the innovation vectors ${ϵ_{t}}$ to be time-dependent.

Definition 2.

Let $X_{t} = {(X_{1 t}, X_{2 t})}^{⊤}$ be a non-negative integer-valued bivariate random vector. If the process ${X_{t}}$ satisfies

(6) $X_{t} = A \circ X_{t - 1} + ϵ_{t}, t \in Z,$

then ${X_{t}}$ is said to follow the extended bivariate INAR(1) process, where $A = {(α_{i j})}_{2 \times 2}$ with $0 < α_{i j} < 1$ for $i, j = 1, 2$ , and $ϵ_{t} = {(ϵ_{1 t}, ϵ_{2 t})}^{⊤} \sim BP (λ_{1 t}, λ_{2 t}, ϕ)$ with ${(λ_{1 t}, λ_{2 t})}^{⊤} = B X_{t - 1} + C$ , $B = {(b_{i j})}_{2 \times 2}$ , $C = {(c_{1}, c_{2})}^{⊤}$ , $0 < b_{i j} < 1$ , $c_{i} > 0$ , $i, j = 1, 2$ .

For simplicity, we denote the new model as the EBINAR(1) model. It is easy to see that the ith equation of model (6) is presented by:

(7) $X_{i t} = α_{i 1} \circ X_{1, t - 1} + α_{i 2} \circ X_{2, t - 1} + ϵ_{i t}, i = 1, 2 .$

Notice that the model given by (7) is similar to the one discussed in Weiß [11]; the main difference is that $X_{i t}$ involves two parallel groups of survivors, from $X_{1, t - 1}$ and $X_{2, t - 1}$ . The EBINAR(1) process ${X_{t}, t \in Z}$ has two parts: the first part consists of the survivors of the elements of the system at the preceding time $t - 1$ , denoted by $A \circ X_{t - 1}$ ; the other part is the time-dependent innovation vector $ϵ_{t}$ , whose mean increases linearly with the previous population size.

Remark 1. (1). If both $A$ and $B$ are diagonal matrices, the component equation given in (7) reduces to the one discussed by Weiß [11].

(2). If $A$ is diagonal and $B = 0$ , model (6) becomes the one discussed in Pedeli and Karlis [14], but it is worth mentioning that the autoregression matrix in Pedeli and Karlis [14] is diagonal, which means that it causes no cross-correlation in the counts.

(3). If $A$ is non-diagonal and $B = 0$ , model (6) becomes the one discussed in Pedeli and Karlis [16], which accounts for cross-correlation in the counts, but the innovations of its marginal models remain independent and identically distributed, so that the time dependence cannot be captured.
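A path of model (6) can be simulated by combining the thinning operator with the representation of the bivariate Poisson distribution through three independent Poisson variables. The following is a minimal sketch under stated assumptions: the function names, the Poisson sampler (Knuth's multiplication method) and the parameter values are illustrative choices, and the sketch requires $ϕ < min (c_{1}, c_{2})$ so that $λ_{i t} - ϕ > 0$ at every step.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def thin(alpha, x, rng):
    """Binomial thinning alpha ∘ x."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_ebinar(A, B, C, phi, T, rng, x0=(0, 0)):
    """Simulate T steps of X_t = A ∘ X_{t-1} + eps_t with eps_t ~ BP(lam_1t, lam_2t, phi)."""
    x, path = list(x0), [tuple(x0)]
    for _ in range(T):
        # Innovation means depend linearly on the previous state: lam_t = B x + C.
        lam = [B[i][0] * x[0] + B[i][1] * x[1] + C[i] for i in range(2)]
        z3 = rpois(phi, rng)                      # shared Poisson(phi) component
        eps = [rpois(lam[i] - phi, rng) + z3 for i in range(2)]
        surv = [thin(A[i][0], x[0], rng) + thin(A[i][1], x[1], rng) for i in range(2)]
        x = [surv[i] + eps[i] for i in range(2)]
        path.append(tuple(x))
    return path

rng = random.Random(2)
A = [[0.3, 0.1], [0.1, 0.3]]   # illustrative values; A + B has largest eigenvalue 0.7
B = [[0.2, 0.1], [0.1, 0.2]]
C = [1.0, 1.5]
phi = 0.5
T = 20000

path = simulate_ebinar(A, B, C, phi, T, rng)
burn = 500                     # discard the start-up phase from x0 = (0, 0)
kept = path[burn:]
sample_mean = [sum(p[i] for p in kept) / len(kept) for i in range(2)]
print(sample_mean)
```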

To derive the pmf of the EBINAR(1) process, we first denote by $h (k, m_{1}, m_{2}, α_{1}, α_{2}) : = P (X + Y = k)$ , $k \geq 0$ , the pmf of the convolution $X + Y$ of independent $X \sim Bin (m_{1}, α_{1})$ and $Y \sim Bin (m_{2}, α_{2})$ . By calculation, we obtain that

(8) $\begin{matrix} h (k, m_{1}, m_{2}, α_{1}, α_{2}) = \sum_{j = 0}^{s} P (X = j | m_{1}, α_{1}) P (Y = k - j | m_{2}, α_{2}) . \end{matrix}$

Furthermore, by using Lemma A3 in Li et al. [20],

(9) $\begin{matrix} \frac{\partial h (k, m_{1}, m_{2}, α_{1}, α_{2})}{\partial α_{1}} = m_{1} (h (k - 1, m_{1} - 1, m_{2}, α_{1}, α_{2}) - h (k, m_{1} - 1, m_{2}, α_{1}, α_{2})), \end{matrix}$

(10) $\begin{matrix} \frac{\partial h (k, m_{1}, m_{2}, α_{1}, α_{2})}{\partial α_{2}} = m_{2} (h (k - 1, m_{1}, m_{2} - 1, α_{1}, α_{2}) - h (k, m_{1}, m_{2} - 1, α_{1}, α_{2})) . \end{matrix}$
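The convolution (8) and the derivative identity (9) can be checked numerically. In the sketch below the upper summation limit is taken to be $min (k, m_{1})$ , which is an assumption of the sketch (terms with $k - j > m_{2}$ vanish because the binomial pmf is zero there); the function names and parameter values are hypothetical.

```python
import math

def binom_pmf(j, m, a):
    """Binomial(m, a) pmf at j; zero outside {0, ..., m}."""
    if j < 0 or j > m:
        return 0.0
    return math.comb(m, j) * a ** j * (1 - a) ** (m - j)

def h(k, m1, m2, a1, a2):
    """pmf of X + Y at k for independent X ~ Bin(m1, a1) and Y ~ Bin(m2, a2)."""
    return sum(binom_pmf(j, m1, a1) * binom_pmf(k - j, m2, a2)
               for j in range(min(k, m1) + 1))

m1, m2, a1, a2 = 4, 3, 0.3, 0.6   # illustrative values
mass = sum(h(k, m1, m2, a1, a2) for k in range(m1 + m2 + 1))

# Central finite difference in alpha_1 compared with the right-hand side of (9).
k, step = 3, 1e-6
numeric = (h(k, m1, m2, a1 + step, a2) - h(k, m1, m2, a1 - step, a2)) / (2 * step)
analytic = m1 * (h(k - 1, m1 - 1, m2, a1, a2) - h(k, m1 - 1, m2, a1, a2))

print(mass, numeric, analytic)
```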

Second, we denote $ς = {(ς_{1}, ς_{2})}^{⊤}$ , $ϑ = {(ϑ_{1}, ϑ_{2})}^{⊤}$ , $k = {(k_{1}, k_{2})}^{⊤}$ and let $x = ς_{1} - k_{1}$ and $y = ς_{2} - k_{2}$ . Then, the conditional probability distribution of the EBINAR(1) process takes the following form:

(11) $\begin{matrix} P (ς | ϑ) : = P (X_{t} = ς | X_{t - 1} = ϑ) = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} P (A \circ X_{t - 1} = k | ϵ_{t} = ς - k) P (ϵ_{t} = ς - k) \\ = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} (P (α_{11} \circ X_{1, t - 1} + α_{12} \circ X_{2, t - 1} = k_{1}) \times P (α_{21} \circ X_{1, t - 1} + α_{22} \circ X_{2, t - 1} = k_{2}) \\ \times P (ϵ_{1 t} = ς_{1} - k_{1}, ϵ_{2 t} = ς_{2} - k_{2})) \\ = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, ϑ_{1}, ϑ_{2}, α_{11}, α_{12}) h (k_{2}, ϑ_{1}, ϑ_{2}, α_{21}, α_{22}) f (x, y, λ_{1 t}, λ_{2 t}, ϕ), \end{matrix}$

where $g_{1} = min (ς_{1}, ϑ_{1})$ , $g_{2} = min (ς_{2}, ϑ_{2})$ and

$\begin{matrix} f (x, y, λ_{1 t}, λ_{2 t}, ϕ) = P (ϵ_{1 t} = x, ϵ_{2 t} = y) \\ \overset{(2)}{=} exp \{- (λ_{1 t} + λ_{2 t} - ϕ)\} \frac{{(λ_{1 t} - ϕ)}^{x}}{x!} \frac{{(λ_{2 t} - ϕ)}^{y}}{y!} \sum_{i = 0}^{min (x, y)} (\binom{x}{i}) (\binom{y}{i}) i! {[\frac{ϕ}{(λ_{1 t} - ϕ) (λ_{2 t} - ϕ)}]}^{i} \end{matrix}$

with $λ_{1 t} = b_{11} X_{1, t - 1} + b_{12} X_{2, t - 1} + c_{1}$ and $λ_{2 t} = b_{21} X_{1, t - 1} + b_{22} X_{2, t - 1} + c_{2}$ .
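The last line of (11) translates into a short routine for the one-step transition probability. This is a minimal sketch rather than the authors' implementation: the function names and parameter values are hypothetical, and, as an assumption of the sketch, the inner sums run up to $min (ς_{i}, ϑ_{1} + ϑ_{2})$ , the largest number of survivors that the two thinnings in row $i$ can produce, so that the probabilities add up to one when $A$ has non-zero off-diagonal entries.

```python
import math

def binom_pmf(j, m, a):
    if j < 0 or j > m:
        return 0.0
    return math.comb(m, j) * a ** j * (1 - a) ** (m - j)

def h(k, m1, m2, a1, a2):
    """Convolution (8) of Bin(m1, a1) and Bin(m2, a2) at k."""
    return sum(binom_pmf(j, m1, a1) * binom_pmf(k - j, m2, a2)
               for j in range(min(k, m1) + 1))

def bp_pmf(x, y, lam1, lam2, phi):
    """Bivariate Poisson pmf (2); zero outside the support."""
    if x < 0 or y < 0:
        return 0.0
    a, b = lam1 - phi, lam2 - phi
    series = sum(math.comb(x, i) * math.comb(y, i) * math.factorial(i)
                 * (phi / (a * b)) ** i for i in range(min(x, y) + 1))
    return (math.exp(-(lam1 + lam2 - phi)) * a ** x / math.factorial(x)
            * b ** y / math.factorial(y) * series)

def trans_pmf(s, v, A, B, C, phi):
    """P(X_t = s | X_{t-1} = v), following the last line of (11)."""
    lam1 = B[0][0] * v[0] + B[0][1] * v[1] + C[0]
    lam2 = B[1][0] * v[0] + B[1][1] * v[1] + C[1]
    top = v[0] + v[1]          # assumed upper bound on the survivors in each row
    total = 0.0
    for k1 in range(min(s[0], top) + 1):
        h1 = h(k1, v[0], v[1], A[0][0], A[0][1])
        for k2 in range(min(s[1], top) + 1):
            h2 = h(k2, v[0], v[1], A[1][0], A[1][1])
            total += h1 * h2 * bp_pmf(s[0] - k1, s[1] - k2, lam1, lam2, phi)
    return total

A = [[0.3, 0.1], [0.1, 0.3]]   # illustrative values
B = [[0.2, 0.1], [0.1, 0.2]]
C = [1.0, 1.5]
phi = 0.5
v = (2, 3)                     # previous state

grid = range(25)               # truncated support of X_t given X_{t-1} = v
probs = {(s1, s2): trans_pmf((s1, s2), v, A, B, C, phi) for s1 in grid for s2 in grid}
mass = sum(probs.values())
cond_mean_1 = sum(s[0] * p for s, p in probs.items())
print(mass, cond_mean_1)
```

The conditional mean of the first component printed here can be compared with $(α_{11} + b_{11}) ϑ_{1} + (α_{12} + b_{12}) ϑ_{2} + c_{1}$ , which follows from the means of the thinnings and of the innovation.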

If the largest eigenvalue of the non-negative matrix $A$ is less than 1, then the bivariate marginal distribution of model (6) can be expressed in terms of the bivariate innovation vectors:

(12) $X_{t} \overset{d}{=} A^{k} \circ X_{t - k} + \sum_{j = 0}^{k - 1} A^{j} \circ ϵ_{t - j} = A^{t} \circ X_{0} + \sum_{j = 0}^{t - 1} A^{j} \circ ϵ_{t - j}, k = 1, 2, \dots, t,$

where $A^{0}$ is the identity matrix and $X_{0}$ is the initial value of the process.

In what follows, we first discuss the stationarity and ergodicity of processes (6) and (7), respectively. Second, we obtain the first two moments of ${X_{t}}$ and ${ϵ_{t}}$ , respectively. Third, we give a necessary and sufficient condition for the existence of $E {(X_{1 t})}^{k}$ and $E {(X_{2 t})}^{k}$ for any fixed positive integer k. These properties are necessary to derive the asymptotic properties of the estimators.

Theorem 1.

Let ${X_{t} = {(X_{1 t}, X_{2 t})}^{⊤}}$ follow (6), $Γ = A + B = {(γ_{i j})}_{i, j = 1, 2}$ with $0 < γ_{i j} < 1$ . If the largest eigenvalue of Γ is less than 1, there exists a strictly stationary and ergodic process satisfying (7).
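The eigenvalue condition of Theorem 1 is easy to check for a given parameter set. The sketch below uses the closed-form eigenvalues of a 2×2 matrix with non-negative entries, for which the discriminant is non-negative and the eigenvalues are real; the function name and the parameter values are hypothetical.

```python
import math

def largest_eigenvalue(M):
    """Largest eigenvalue of a 2x2 matrix with non-negative entries."""
    (a, b), (c, d) = M
    half_trace = (a + d) / 2
    disc = ((a - d) / 2) ** 2 + b * c   # >= 0 when b, c >= 0, so the roots are real
    return half_trace + math.sqrt(disc)

A = [[0.3, 0.1], [0.1, 0.3]]   # illustrative values
B = [[0.2, 0.1], [0.1, 0.2]]
Gamma = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

rho = largest_eigenvalue(Gamma)
condition_holds = all(0 < g < 1 for row in Gamma for g in row) and rho < 1
print(rho, condition_holds)
```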

Proof.

Let $W_{t, k}$ , $V_{t, l}$ and $δ_{1 t}$ be independent of each other and each of them be independent and identically distributed, i.e., $W_{t, k} \sim Bin (1, α_{11}) + Poi (b_{11})$ , $V_{t, l} \sim Bin (1, α_{12}) + Poi (b_{12})$ and $δ_{1 t} \sim Poi (c_{1})$ , where $Bin (1, α_{1 j}) + Poi (b_{1 j})$ means the convolution of the distributions $Bin (1, α_{1 j})$ and $Poi (b_{1 j})$ , $k = 1, 2, \dots, X_{1, t - 1}$ and $l = 1, 2, \dots, X_{2, t - 1}$ ; see Weiß [11] for details. According to the concepts of bivariate binomial thinning and the additivity of binomial distribution and Poisson distribution, (7) can be rewritten as

(13) $\begin{matrix} X_{1 t} \overset{d}{=} W_{t, 1} + \dots + W_{t, X_{1, t - 1}} + V_{t, 1} + \dots + V_{t, X_{2, t - 1}} + δ_{1 t} . \end{matrix}$

Since $γ = max (α_{i j} + b_{i j}) < 1$ , we have $E (W_{t, k}) = α_{11} + b_{11} < 1$ and $E (V_{t, l}) = α_{12} + b_{12} < 1$ . Denote $H (n) = \sum_{k = 1}^{n} \frac{1}{k}$ and $H (0) = 0$ , then $E (H (δ_{1 t})) = \sum_{k = 1}^{+ \infty} \frac{1}{k} P (δ_{1 t} \geq k) .$ In addition, $H (δ_{1 t}) \leq δ_{1 t}$ and $E (δ_{1 t}) = c_{1} < \infty$ ; thus, $E H (δ_{1 t}) \leq E (δ_{1 t}) < \infty$ . Therefore, the conditions of the theorem of Heathcote [22] hold. Hence, there exists a stationary marginal distribution of (7), i.e., there exists a strictly stationary process satisfying (7). The same conclusion holds for $X_{2 t}$ . □
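Representation (13) can be illustrated by drawing $X_{1 t}$ for a fixed previous state as a sum of offspring-type variables plus an immigration term. This is a minimal sketch with hypothetical names and values; it uses the fact that the marginal distribution of $ϵ_{1 t}$ is Poisson with mean $λ_{1 t}$ , and compares the sample mean with $(α_{11} + b_{11}) x_{1} + (α_{12} + b_{12}) x_{2} + c_{1}$ .

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def draw_x1(x_prev, a11, a12, b11, b12, c1, rng):
    """One draw of X_{1t} given X_{t-1} = x_prev via representation (13)."""
    # W_{t,k} ~ Bin(1, a11) + Poi(b11), one term per individual counted in X_{1,t-1}
    w = sum((rng.random() < a11) + rpois(b11, rng) for _ in range(x_prev[0]))
    # V_{t,l} ~ Bin(1, a12) + Poi(b12), one term per individual counted in X_{2,t-1}
    v = sum((rng.random() < a12) + rpois(b12, rng) for _ in range(x_prev[1]))
    return w + v + rpois(c1, rng)   # delta_{1t} ~ Poi(c1)

rng = random.Random(3)
x_prev = (4, 6)                                    # illustrative previous state
a11, a12, b11, b12, c1 = 0.3, 0.1, 0.2, 0.1, 1.0   # illustrative parameters
n = 20000

draws = [draw_x1(x_prev, a11, a12, b11, b12, c1, rng) for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n

expected_mean = (a11 + b11) * x_prev[0] + (a12 + b12) * x_prev[1] + c1
expected_var = ((a11 * (1 - a11) + b11) * x_prev[0]
                + (a12 * (1 - a12) + b12) * x_prev[1] + c1)
print(mean, expected_mean, var, expected_var)
```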

To prove the stationarity of the EBINAR(1) process, we first introduce a sequence of random variables ${X_{t}^{(n)}}$ that can be regarded as approximations to ${X_{t}}$ , with

$X_{t}^{(n)} = \{\begin{matrix} 0, & n < 0, \\ R_{t}, & n = 0, \\ A \circ X_{t - 1}^{(n - 1)} + B X_{t - 1}^{(n - 1)} + R_{t}, & n > 0, \end{matrix}$

where the largest eigenvalues of the non-negative matrices $A$ , $B$ and $Γ : = A + B$ are less than 1, the matrices $A$ , $B$ and $I - Γ$ are all invertible, $R_{t} = {(R_{1 t}, R_{2 t})}^{⊤}$ , $R_{1 t}$ is independent of $R_{2 t}$ , and $R_{i t}$ follows a Poisson distribution with parameter $c_{i}$ , $i = 1, 2$ .

Theorem 2.

If the conditions of Theorem 1 hold, there exists a strictly stationary process satisfying (6).

Proof.

Because ${(R_{1}, \dots, R_{k})}^{⊤}$ and ${(R_{h + 1}, \dots, R_{h + k})}^{⊤}$ are identically distributed,

$(\begin{matrix} X_{1}^{(0)} \\ ⋮ \\ X_{k}^{(0)} \end{matrix}) = (\begin{matrix} R_{1} \\ ⋮ \\ R_{k} \end{matrix}) \quad \text{and} \quad (\begin{matrix} X_{h + 1}^{(0)} \\ ⋮ \\ X_{h + k}^{(0)} \end{matrix}) = (\begin{matrix} R_{h + 1} \\ ⋮ \\ R_{h + k} \end{matrix})$

are identically distributed. Thus, ${X_{t}^{(0)}}$ is strictly stationary. Now, we suppose ${X_{t}^{(n)}}$ is strictly stationary. Then,

(14) $\begin{matrix} (\begin{matrix} X_{1}^{(n + 1)} \\ ⋮ \\ X_{k}^{(n + 1)} \end{matrix}) = (\begin{matrix} R_{1} \\ ⋮ \\ R_{k} \end{matrix}) + (\begin{matrix} A & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & A \end{matrix}) \circ (\begin{matrix} X_{0}^{(n)} \\ ⋮ \\ X_{k - 1}^{(n)} \end{matrix}) + (\begin{matrix} B & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & B \end{matrix}) (\begin{matrix} X_{0}^{(n)} \\ ⋮ \\ X_{k - 1}^{(n)} \end{matrix}) \end{matrix}$

and

(15) $\begin{matrix} (\begin{matrix} X_{h + 1}^{(n + 1)} \\ ⋮ \\ X_{h + k}^{(n + 1)} \end{matrix}) = (\begin{matrix} R_{h + 1} \\ ⋮ \\ R_{h + k} \end{matrix}) + (\begin{matrix} A & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & A \end{matrix}) \circ (\begin{matrix} X_{h}^{(n)} \\ ⋮ \\ X_{h + k - 1}^{(n)} \end{matrix}) + (\begin{matrix} B & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & B \end{matrix}) (\begin{matrix} X_{h}^{(n)} \\ ⋮ \\ X_{h + k - 1}^{(n)} \end{matrix}) . \end{matrix}$

Because the joint distributions of the variables on the right-hand sides of (14) and (15) are identical, ${(X_{1}^{(n + 1)}, \dots, X_{k}^{(n + 1)})}^{⊤}$ and ${(X_{h + 1}^{(n + 1)}, \dots, X_{h + k}^{(n + 1)})}^{⊤}$ on the left-hand sides of the two equations are identically distributed. Hence, the process ${X_{t}^{(n + 1)}}$ is strictly stationary, and by induction ${X_{t}^{(n)}}$ is strictly stationary for every n. Therefore, ${X_{t}}$ is a strictly stationary process. □

Theorem 3.

If the conditions of Theorem 1 hold, ${X_{t}}$ is a null recurrent and ergodic Markov chain.

Proof.

First, we prove ${X_{t}}$ is null recurrent. Because $X_{t} \overset{d}{=} \sum_{j = 0}^{\infty} A^{j} \circ ϵ_{t - j},$ we have $P_{0, 0}^{t} = P (X_{t} = 0 | X_{0} = 0) = \prod_{j = 0}^{t - 1} P (A^{j} \circ ϵ_{t - j} = 0)$ . Let $A^{j} = (p_{i j})$ and $γ = max (p_{i j})$ , $\forall i = 1, 2, \forall j = 1, 2$ . Then, we obtain that

$\begin{matrix} P (A^{j} \circ ϵ_{t - j} \neq 0) = P ({p_{11} \circ ϵ_{1, t - j} + p_{12} \circ ϵ_{2, t - j} \geq 1} \cup {p_{21} \circ ϵ_{1, t - j} + p_{22} \circ ϵ_{2, t - j} \geq 1}) \\ \leq P (p_{11} \circ ϵ_{1, t - j} + p_{12} \circ ϵ_{2, t - j} \geq 1) + P (p_{21} \circ ϵ_{1, t - j} + p_{22} \circ ϵ_{2, t - j} \geq 1) \\ \leq P (p_{11} \circ ϵ_{1, t - j} \geq 1) + P (p_{12} \circ ϵ_{2, t - j} \geq 1) + P (p_{21} \circ ϵ_{1, t - j} \geq 1) + P (p_{22} \circ ϵ_{2, t - j} \geq 1) \\ \leq 2 [P (γ^{j} \circ ϵ_{1, t - j} \geq 1) + P (γ^{j} \circ ϵ_{2, t - j} \geq 1)] \\ \leq 2 [E (γ^{j} \circ ϵ_{1, t - j}) + E (γ^{j} \circ ϵ_{2, t - j})] = 2 γ^{j} (μ_{ϵ_{1}} + μ_{ϵ_{2}}) . \end{matrix}$

According to Theorem 2, there exists an $M > 0$ such that $μ_{ϵ_{i}} \leq M / 4, i = 1, 2 .$ Then, we have $P (A^{j} \circ ϵ_{t - j} = 0) \geq 1 - M γ^{j} .$ Hence,

$\begin{matrix} P_{0, 0}^{t} \geq \prod_{j = 0}^{t - 1} P (A^{j} \circ ϵ_{t - j} = 0) & \geq \prod_{j = 0}^{t - 1} (1 - M γ^{j}) = exp {\sum_{j = 0}^{t - 1} log (1 - M γ^{j})} \\ \geq exp {log (1 - M γ^{τ}) / (1 - r)} > 0, \forall t > τ . \end{matrix}$

Therefore, $lim_{t \to \infty} P_{0, 0}^{t} > 0$ . Thus, $\sum_{t = 0}^{\infty} P_{0, 0}^{t} = \infty$ , i.e., $0$ is a recurrent state.

Second, we illustrate the ergodicity. For all states $ς$ , $ϑ$ , $κ_{t - 2}, κ_{t - 3}, \dots$ , we have

$P (X_{t} = ς| X_{t - 1} = ϑ, X_{t - 2} = κ_{t - 2}, \dots) = P (X_{t} = ς| X_{t - 1} = ϑ) = P (ϑ, ς),$

where $P (ϑ, ς)$ denotes the transition probability from state $ϑ$ to state $ς$ . Thus, ${X_{t}}$ is a homogeneous Markov chain. Since $α_{i j}, b_{i j} \in (0, 1)$ , we have $P (ϵ_{1 t} = ς_{1} - k_{1}, ϵ_{2 t} = ς_{2} - k_{2}) > 0 .$ Denote the n-step transition probability from state $ς$ to state $ϑ$ by $P_{ς ϑ}^{n}$ . For a given $X_{t - 1}$ , the conditional probability function of the random vector $X_{t}$ is given by:

$\begin{matrix} P (X_{1 t} & = ς_{1}, X_{2 t} = ς_{2} | X_{1, t - 1} = ϑ_{1}, X_{2, t - 1} = ϑ_{2}) \\ = P (X_{1 t} = ς_{1} | X_{1, t - 1} = ϑ_{1}, X_{2, t - 1} = ϑ_{2}) P (X_{2 t} = ς_{2} | X_{1, t - 1} = ϑ_{1}, X_{2, t - 1} = ϑ_{2}, X_{1 t} = ς_{1}), \end{matrix}$

then $P_{τ υ}^{1} > 0$ for all $τ, υ \in N_{0}^{2}$ . According to (12), every state can analogously be reached from any other state with positive probability in a finite number of steps. Hence, ${X_{t}}$ is irreducible. By (12), the k-step transition probability $P_{0, 0}^{k}$ is obtained as:

$\begin{matrix} P_{0, 0}^{k} = P (X_{t} = 0 | X_{t - k} = 0) & = P (A^{k} \circ X_{t - k} + \sum_{j = 0}^{k - 1} A^{j} \circ ϵ_{t - j} = 0 | X_{t - k} = 0) \\ = \underset{(V)}{\underset{︸}{P (A^{k} \circ X_{t - k} = 0 | X_{t - k} = 0)}} \underset{(VI)}{\underset{︸}{\prod_{j = 0}^{k - 1} P (A^{j} \circ ϵ_{t - j} = 0)}} . \end{matrix}$

Note that the first multiplier (V) is positive, which can be obtained by a similar method to (11). Denoting $A^{j} = {(p_{i j})}_{i, j = 1, 2}$ , then we have:

$\begin{matrix} P (A^{j} \circ ϵ_{t - j} = 0) = P (p_{11} \circ ϵ_{1, t - j} + p_{12} \circ ϵ_{2, t - j} = 0, p_{21} \circ ϵ_{1, t - j} + p_{22} \circ ϵ_{2, t - j} = 0) \\ = \sum_{k = 0}^{\infty} \sum_{s = 0}^{\infty} {(1 - p_{11})}^{k} {(1 - p_{12})}^{s} {(1 - p_{21})}^{k} {(1 - p_{22})}^{s} P (ϵ_{1, t - j} = k, ϵ_{2, t - j} = s) > 0, \end{matrix}$

thus, the second factor, (VI), is also positive. Therefore, $P_{0, 0}^{k} > 0$ , i.e., ${X_{t}}$ is aperiodic. Hence, ${X_{t}}$ is an ergodic Markov chain. □

Note that $E (X_{t}^{(n)}) = {(I - A - B)}^{- 1} C < \infty$ and

$\begin{matrix} E (X_{t}^{(n)} {X_{t}^{(n)}}^{⊤}) = Γ E (X_{t}^{(n - 1)} {X_{t}^{(n - 1)}}^{⊤}) Γ^{⊤} + Ψ \\ = \dots = Γ^{n} E (X_{t}^{(0)} {X_{t}^{(0)}}^{⊤}) {(Γ^{n})}^{⊤} + Γ^{n - 1} Ψ {(Γ^{n - 1})}^{⊤} + \dots + Γ Ψ Γ^{⊤} + Ψ, \end{matrix}$

where $Ψ$ involves the first moments of $X_{t}^{(n)}$ and $R_{t}$ . Hence, the first two moments of $X_{t}^{(n)}$ are finite. Thus, ${X_{t}^{(n)}}$ is stationary and ergodic by Theorem 2, Theorem 3 and Shumway and Stoffer [23].

Theorem 4.

If the conditions of Theorem 1 hold, the first two moments and covariance matrix of ${X_{t}}$ exist and

(1). $E (X_{t} | X_{t - 1}) = (A + B) X_{t - 1} + C$ ;

(2). $E (X_{t}) = {(I - A - B)}^{- 1} C$ , if ${(I - A - B)}^{- 1}$ exists, where $I$ denotes the identity matrix;

(3). $R (k) = Cov (X_{t + k}, X_{t}) = {(A + B)}^{k} R (0)$ , $k = 1, 2, \dots$ .

In addition, if $k = 0$ , $R (0) = A R (0) A^{⊤} + H^{*} + A R (0) B^{⊤} + B R (0) A^{⊤} + Σ,$ where $H^{*} = diag (\sum_{j = 1}^{2} V_{i j} E (X_{j, t - 1}))$ , $V_{i j} = α_{i j} (1 - α_{i j})$ and $Σ = Cov (ϵ_{t}, ϵ_{t})$ . Specifically, if $A$ and $B$ are diagonal matrices, $R (0) = {(I - A A^{⊤} - 2 A B^{⊤})}^{- 1} Σ + H^{*} .$

Proof.

(1) and (2) are easy to prove by the moment property of $A \circ$ , and we omit them. Here, we only give the proof of (3):

$\begin{matrix} R (k) & = Cov (A \circ X_{t + k - 1} + ϵ_{t + k}, X_{t}) = Cov (A \circ X_{t + k - 1}, X_{t}) + Cov (ϵ_{t + k}, X_{t}) \\ = A Cov (X_{t + k - 1}, X_{t}) + Cov (B X_{t + k - 1} + C, X_{t}) = (A + B) Cov (X_{t + k - 1}, X_{t}) \\ = (A + B) R (k - 1) = \dots = {(A + B)}^{k} R (0) . \end{matrix}$

In fact, $Cov (X_{t - 1}, ϵ_{t}) = Cov (X_{t - 1}, B X_{t - 1} + C) = Cov (X_{t - 1}, X_{t - 1}) B^{⊤}$ and $Cov (ϵ_{t}, X_{t - 1}) = Cov (B X_{t - 1} + C, X_{t - 1}) = B Cov (X_{t - 1}, X_{t - 1})$ . Hence,

$\begin{matrix} R (0) & = Cov (X_{t}, X_{t}) = Cov (A \circ X_{t - 1} + ϵ_{t}, A \circ X_{t - 1} + ϵ_{t}) \\ = A Cov (X_{t - 1}, X_{t - 1}) A^{⊤} + H^{*} + A Cov (X_{t - 1}, X_{t - 1}) B^{⊤} + B Cov (X_{t - 1}, X_{t - 1}) A^{⊤} + Σ \\ = A R (0) A^{⊤} + H^{*} + A R (0) B^{⊤} + B R (0) A^{⊤} + Σ, \end{matrix}$

where $H^{*} = diag (\sum_{j = 1}^{2} V_{i j} E (X_{j, t - 1}))$ . Let $λ$ , $λ_{1}$ and $λ_{2}$ be the largest eigenvalues of $A A^{⊤} + 2 A B^{⊤}$ , $A$ and $B$ , respectively. If $A$ and $B$ are diagonal matrices,

$| λ | \leq | λ_{1}^{2} + 2 λ_{1} λ_{2} | \leq | λ_{1} (λ_{1} + λ_{2}) + λ_{1} λ_{2} | \leq λ_{1} | (λ_{1} + λ_{2}) | + λ_{2} | λ_{1} | \leq λ_{1} + λ_{2} < 1,$

then $I - A A^{⊤} - 2 A B^{⊤}$ is a nonsingular matrix. Hence, $R (0)$ is obtained. □
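Parts (2) and (3) of Theorem 4 can be made concrete with a small numerical example. The sketch below solves $(I - A - B) μ = C$ by Cramer's rule, confirms that $μ$ is a fixed point of the conditional mean in part (1), and computes powers of $A + B$ , which multiply $R (0)$ in the formula for $R (k)$ ; the helper names and the parameter values are hypothetical.

```python
def solve2(M, v):
    """Solve the 2x2 linear system M z = v by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]

def matmul(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0.3, 0.1], [0.1, 0.3]]   # illustrative values
B = [[0.2, 0.1], [0.1, 0.2]]
C = [1.0, 1.5]

Gamma = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
I_minus_Gamma = [[(1.0 if i == j else 0.0) - Gamma[i][j] for j in range(2)]
                 for i in range(2)]

mu = solve2(I_minus_Gamma, C)                       # E(X_t) = (I - A - B)^{-1} C
fixed_point = [Gamma[i][0] * mu[0] + Gamma[i][1] * mu[1] + C[i] for i in range(2)]

# Powers of Gamma: R(k) = Gamma^k R(0), so these entries govern the decay in k.
powers = [Gamma]
for _ in range(2):
    powers.append(matmul(powers[-1], Gamma))

print(mu, fixed_point)
print(powers[2])
```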

Theorem 5.

If the conditions of Theorem 1 hold, the first two moments and covariance matrices of ${ϵ_{t}}$ exist and:

(1). $E (ϵ_{t} | X_{t - 1}) = B X_{t - 1} + C$ ;

(2). $E (ϵ_{t}) = (I - A) {(I - A - B)}^{- 1} C,$ if ${(I - A - B)}^{- 1}$ exists;

(3). $R_{ϵ} (k) = Cov (ϵ_{t + k}, ϵ_{t}) = B {(A + B)}^{k} R (0) B^{⊤}$ , $k = 0, 1, 2, \dots$ .

Proof.

(1) is easy to prove by the distribution of $ϵ_{t}$ . We only need to prove (2) and (3).

(2). By the definition of $ϵ_{t}$ , $E (ϵ_{t}) = B μ_{t} + C = B {(I - A - B)}^{- 1} C + (I - A - B) {(I - A - B)}^{- 1} C = (I - A) E (X_{t})$ . Alternatively, $E (ϵ_{t})$ can be obtained directly from (6).

(3). According to the construction of the EBINAR(1) model, we have:

(16) $\begin{matrix} Cov (X_{t}, ϵ_{t}) = Cov (A \circ X_{t - 1}, B X_{t - 1} + C) + Cov (B X_{t - 1} + C, B X_{t - 1} + C) \\ = A Cov (X_{t - 1}, X_{t - 1}) B^{⊤} + B Cov (X_{t - 1}, X_{t - 1}) B^{⊤} = (A + B) R (0) B^{⊤}, \end{matrix}$

(17) $\begin{matrix} R_{ϵ} (k) = Cov (B X_{t + k - 1} + C, ϵ_{t}) = B Cov (X_{t + k - 1}, ϵ_{t}) \\ = B Cov (A \circ X_{t + k - 2} + ϵ_{t + k - 1}, ϵ_{t}) = BA Cov (X_{t + k - 2}, ϵ_{t}) + B Cov (ϵ_{t + k - 1}, ϵ_{t}) \\ = BA Cov (X_{t + k - 2}, ϵ_{t}) + B Cov (B X_{t + k - 2} + C, ϵ_{t}) \\ = BA Cov (X_{t + k - 2}, ϵ_{t}) + B^{2} Cov (X_{t + k - 2}, ϵ_{t}) = B (A + B) Cov (X_{t + k - 2}, ϵ_{t}) \\ = \dots = B {(A + B)}^{k - 1} Cov (X_{t}, ϵ_{t}), \end{matrix}$

then $R_{ϵ} (k)$ is obtained by substituting (16) into (17), i.e., $R_{ϵ} (k) = B {(A + B)}^{k} R (0) B^{⊤}$ . Note that $Cov (ϵ_{t}, ϵ_{t}) = Cov (B X_{t - 1} + C, B X_{t - 1} + C) = B R (0) B^{⊤},$ i.e., the formula holds for $k = 0$ . □
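The two expressions for the innovation mean in the proof of part (2), $B E (X_{t}) + C$ and $(I - A) E (X_{t})$ , can be compared numerically. This is a minimal sketch with hypothetical helper names and illustrative parameter values.

```python
def solve2(M, v):
    """Solve the 2x2 linear system M z = v by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]

def matvec(M, v):
    """Product of a 2x2 matrix and a vector of length 2."""
    return [M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]

A = [[0.3, 0.1], [0.1, 0.3]]   # illustrative values
B = [[0.2, 0.1], [0.1, 0.2]]
C = [1.0, 1.5]

identity = [[1.0, 0.0], [0.0, 1.0]]
I_minus_A = [[identity[i][j] - A[i][j] for j in range(2)] for i in range(2)]
I_minus_AB = [[identity[i][j] - A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

mu_x = solve2(I_minus_AB, C)                                # E(X_t)
mean_eps_a = [m + c for m, c in zip(matvec(B, mu_x), C)]    # B E(X_t) + C
mean_eps_b = matvec(I_minus_A, mu_x)                        # (I - A) E(X_t)

print(mean_eps_a, mean_eps_b)
```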

Theorem 6.

For any fixed positive integer k, a necessary and sufficient condition for $E {(X_{i t})}^{k} < \infty$ , $i = 1, 2$ , is $γ < 1$ .

Proof.

For convenience, let $A$ and $B$ be diagonal matrices.

Necessity. According to Lemma 2.1 of Silva and Oliveira [24], $E [{(α_{11} \circ X_{1, t - 1})}^{i} {(ϵ_{1 t})}^{k - i}] = α_{11}^{i} b_{11}^{k - i} E {(X_{1, t - 1})}^{k} + ψ_{1}$ , where $ψ_{1} = ψ_{1} (X_{1, t - 1})$ involves the moments of $X_{1, t - 1}$ of order $\leq (k - 1)$ and $i = 0, 1, 2, \dots, k$ . Then,

(18) $\begin{matrix} E {(X_{1 t})}^{k} = E {(α_{11} \circ X_{1, t - 1} + ϵ_{1 t})}^{k} = \sum_{i = 0}^{k} (\binom{k}{i}) E [{(α_{11} \circ X_{1, t - 1})}^{i} {(ϵ_{1 t})}^{k - i}] \\ = \sum_{i = 0}^{k} (\binom{k}{i}) α_{11}^{i} b_{11}^{k - i} E {(X_{1, t - 1})}^{k} + ψ_{1} = {(α_{11} + b_{11})}^{k} E {(X_{1, t - 1})}^{k} + ψ_{1} . \end{matrix}$

Thus, $E {(X_{1 t})}^{k} = \frac{ψ_{1}}{1 - {(α_{11} + b_{11})}^{k}}$ by (18). Hence, $1 - {(α_{11} + b_{11})}^{k} > 0$ if $E {(X_{1 t})}^{k} < \infty$ , i.e., $α_{11} + b_{11} < 1$ . Similarly, $α_{22} + b_{22} < 1$ if $E {(X_{2 t})}^{k} < \infty$ . Hence, $γ < 1$ if $E {(X_{i t})}^{k} < \infty$ , $i = 1, 2$ .

Sufficiency. We know that $E {(X_{i t})}^{k} < \infty$ holds for $k = 1, 2$ by Theorems 4 and 5. The sufficient condition can be proved by induction with respect to k. Now suppose that $E {(X_{i t})}^{k - 1} < \infty$ , $k \geq 3$ . According to (13), we define

$X_{1 t}^{(n)} = \{\begin{matrix} 0, & n < 0; \\ δ_{1 t}, & n = 0; \\ δ_{1 t} + \sum_{j = 1}^{X_{1, t - 1}^{(n - 1)}} W_{t, j}, & n > 0 \end{matrix} and X_{2 t}^{(n)} = \{\begin{matrix} 0, & n < 0; \\ δ_{2 t}, & n = 0; \\ δ_{2 t} + \sum_{s = 1}^{X_{2, t - 1}^{(n - 1)}} V_{t, s}, & n > 0, \end{matrix}$

where $W_{t, j}$ , $δ_{1 t}$ , $V_{t, s}$ and $δ_{2 t}$ are independent of each other and each of them is independent and identically distributed, i.e., $W_{t, j} \sim Bin (1, α_{11}) + Poi (b_{11})$ , $δ_{1 t} \sim Poi (c_{1})$ , $V_{t, s} \sim Bin (1, α_{22}) + Poi (b_{22})$ and $δ_{2 t} \sim Poi (c_{2})$ . Using the univariate binomial thinning operator, $X_{1 t}^{(n)}$ and $X_{2 t}^{(n)}$ admit the representations:

(19) $\begin{matrix} X_{1 t}^{(n)} = δ_{1 t} + (α_{11} \circ X_{1, t - 1}^{(n - 1)} + Z_{1 t}), \end{matrix}$

(20) $\begin{matrix} X_{2 t}^{(n)} = δ_{2 t} + (α_{22} \circ X_{2, t - 1}^{(n - 1)} + Z_{2 t}), \end{matrix}$

where $Z_{1 t} \sim Poi (b_{11} X_{1, t - 1}^{(n - 1)})$ and $Z_{2 t} \sim Poi (b_{22} X_{2, t - 1}^{(n - 1)})$ . It is easy to see that both ${X_{1 t}^{(n)}}_{n \in N}$ and ${X_{2 t}^{(n)}}_{n \in N}$ are non-decreasing. According to Lemma 2.1 of [24], we have:

$\begin{matrix} E {(α_{11} \circ X_{1, t - 1}^{(n - 1)} + Z_{1 t})}^{k} = {(α_{11} + b_{11})}^{k} E {(X_{1, t - 1}^{(n - 1)})}^{k} + ψ_{2} \leq {(α_{11} + b_{11})}^{k} E {(X_{1, t - 1}^{(n)})}^{k} + ψ_{4}, \\ E {(α_{22} \circ X_{2, t - 1}^{(n - 1)} + Z_{2 t})}^{k} = {(α_{22} + b_{22})}^{k} E {(X_{2, t - 1}^{(n - 1)})}^{k} + ψ_{3} \leq {(α_{22} + b_{22})}^{k} E {(X_{2, t - 1}^{(n)})}^{k} + ψ_{5}, \end{matrix}$

where $ψ_{2} = ψ_{2} (X_{1, t - 1}^{(n - 1)})$ and $ψ_{3} = ψ_{3} (X_{2, t - 1}^{(n - 1)})$ involve the moments of $X_{1, t - 1}^{(n - 1)}$ and $X_{2, t - 1}^{(n - 1)}$ of order $\leq (k - 1)$ , and $ψ_{4} = ψ_{4} (X_{1, t - 1}^{(n)})$ and $ψ_{5} = ψ_{5} (X_{2, t - 1}^{(n)})$ involve the moments of $X_{1, t - 1}^{(n)}$ and $X_{2, t - 1}^{(n)}$ of order $\leq (k - 1)$ , respectively. According to (19) and (20), we obtain:

(21) $\begin{matrix} E {(X_{1 t}^{(n)})}^{k} & = E {(δ_{1 t})}^{k} + E {(α_{11} \circ X_{1, t - 1}^{(n - 1)} + Z_{1 t})}^{k} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{1 t})}^{k - j} E {(α_{11} \circ X_{1, t - 1}^{(n - 1)} + Z_{1 t})}^{j} \\ \leq E {(δ_{1 t})}^{k} + {(α_{11} + b_{11})}^{k} E {(X_{1, t - 1}^{(n)})}^{k} + ψ_{4} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{1 t})}^{k - j} E {(α_{11} \circ X_{1, t - 1}^{(n)} + Z_{1 t})}^{j} \\ \leq c_{1}^{k} + γ^{k} E {(X_{1, t - 1}^{(n)})}^{k} + ψ_{4} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{1 t})}^{k - j} E {(α_{11} \circ X_{1, t - 1}^{(n)} + Z_{1 t})}^{j}, \end{matrix}$

(22) $\begin{matrix} E {(X_{2 t}^{(n)})}^{k} & = E {(δ_{2 t})}^{k} + E {(α_{22} \circ X_{2, t - 1}^{(n - 1)} + Z_{2 t})}^{k} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{2 t})}^{k - j} E {(α_{22} \circ X_{2, t - 1}^{(n - 1)} + Z_{2 t})}^{j} \\ \leq E {(δ_{2 t})}^{k} + {(α_{22} + b_{22})}^{k} E {(X_{2, t - 1}^{(n)})}^{k} + ψ_{5} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{2 t})}^{k - j} E {(α_{22} \circ X_{2, t - 1}^{(n)} + Z_{2 t})}^{j} \\ \leq c_{2}^{k} + γ^{k} E {(X_{2, t - 1}^{(n)})}^{k} + ψ_{5} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{2 t})}^{k - j} E {(α_{22} \circ X_{2, t - 1}^{(n)} + Z_{2 t})}^{j} . \end{matrix}$

Using (21) and (22),

(23) $\begin{matrix} E {(X_{1 t}^{(n)})}^{k} + E {(X_{2 t}^{(n)})}^{k} \leq \frac{\sum_{i = 1}^{2} [c_{i}^{k} + \sum_{j = 1}^{k - 1} (\binom{k}{j}) E {(δ_{i t})}^{k - j} E {(α_{i i} \circ X_{i, t - 1}^{(n)} + Z_{i t})}^{j}] + ψ_{6}}{1 - γ^{k}}, \end{matrix}$

where $ψ_{6} = ψ_{4} + ψ_{5}$ . Note that the numerator in (23) involves the moments of $X_{1, t - 1}^{(n)}$ and $X_{2, t - 1}^{(n)}$ of order $\leq k - 1$ and is finite; thus, $E {(X_{1 t}^{(n)})}^{k} + E {(X_{2 t}^{(n)})}^{k}$ is finite if $γ < 1$ . In addition, both $E (X_{1 t}^{(n)})$ and $E (X_{2 t}^{(n)})$ are non-negative; thus, $E (X_{1 t}^{(n)})$ and $E (X_{2 t}^{(n)})$ are finite. □

3. Parameter Estimation

In this section, we consider the conditional maximum likelihood estimation for model (6). Let $θ = {(α_{i j}, b_{i j}, c_{i}, ϕ)}^{⊤}$ , $i, j = 1, 2$ . Suppose that $X_{0}, X_{1}, \dots, X_{T}$ are generated by the EBINAR(1) model with the true parameter value $θ_{0}$ .

By (11), the conditional log-likelihood function can be written as:

(24) $ℓ (θ) = \sum_{t = 1}^{T} ln P_{θ} (X_{t}| X_{t - 1}),$

where

$\begin{matrix} P_{θ} (X_{t}| X_{t - 1}) = P (X_{1 t} = X_{1 t}, X_{2 t} = X_{2 t} | X_{1, t - 1} = X_{1, t - 1}, X_{2, t - 1} = X_{2, t - 1}) = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} \\ (h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) f (X_{1 t} - k_{1}, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ)) \end{matrix}$

with $λ_{1 t} = b_{11} X_{1, t - 1} + b_{12} X_{2, t - 1} + c_{1}$ , $λ_{2 t} = b_{21} X_{1, t - 1} + b_{22} X_{2, t - 1} + c_{2}$ , $g_{1} = min (X_{1 t}, X_{1, t - 1})$ and $g_{2} = min (X_{2 t}, X_{2, t - 1})$ ; $f (\cdot)$ and $h (\cdot)$ are given in (2) and (8), respectively.
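The conditional log-likelihood (24) can be evaluated by summing the logarithms of the one-step transition probabilities. The following is a minimal sketch, not the authors' estimation code: the function names, the short data series and the parameter values are hypothetical, the inner sums run up to $min (X_{i t}, X_{1, t - 1} + X_{2, t - 1})$ as an assumption of the sketch, and no numerical maximization is attempted.

```python
import math

def binom_pmf(j, m, a):
    if j < 0 or j > m:
        return 0.0
    return math.comb(m, j) * a ** j * (1 - a) ** (m - j)

def h(k, m1, m2, a1, a2):
    """Convolution (8) of Bin(m1, a1) and Bin(m2, a2) at k."""
    return sum(binom_pmf(j, m1, a1) * binom_pmf(k - j, m2, a2)
               for j in range(min(k, m1) + 1))

def bp_pmf(x, y, lam1, lam2, phi):
    """Bivariate Poisson pmf (2); zero outside the support."""
    if x < 0 or y < 0:
        return 0.0
    a, b = lam1 - phi, lam2 - phi
    series = sum(math.comb(x, i) * math.comb(y, i) * math.factorial(i)
                 * (phi / (a * b)) ** i for i in range(min(x, y) + 1))
    return (math.exp(-(lam1 + lam2 - phi)) * a ** x / math.factorial(x)
            * b ** y / math.factorial(y) * series)

def trans_pmf(s, v, A, B, C, phi):
    """P_theta(X_t = s | X_{t-1} = v) as in the display following (24)."""
    lam1 = B[0][0] * v[0] + B[0][1] * v[1] + C[0]
    lam2 = B[1][0] * v[0] + B[1][1] * v[1] + C[1]
    top = v[0] + v[1]          # assumed upper bound on the survivors in each row
    total = 0.0
    for k1 in range(min(s[0], top) + 1):
        h1 = h(k1, v[0], v[1], A[0][0], A[0][1])
        for k2 in range(min(s[1], top) + 1):
            h2 = h(k2, v[0], v[1], A[1][0], A[1][1])
            total += h1 * h2 * bp_pmf(s[0] - k1, s[1] - k2, lam1, lam2, phi)
    return total

def loglik(data, A, B, C, phi):
    """Conditional log-likelihood (24): sum over t = 1, ..., T of ln P(X_t | X_{t-1})."""
    return sum(math.log(trans_pmf(data[t], data[t - 1], A, B, C, phi))
               for t in range(1, len(data)))

# Hypothetical short series X_0, ..., X_7 and illustrative parameter values.
data = [(3, 4), (2, 5), (4, 4), (5, 6), (3, 3), (4, 5), (2, 2), (3, 4)]
A = [[0.3, 0.1], [0.1, 0.3]]
B = [[0.2, 0.1], [0.1, 0.2]]
C = [1.0, 1.5]
phi = 0.5                      # must stay below min(c_1, c_2)

ll = loglik(data, A, B, C, phi)
print(ll)
```

In practice this function would be passed, with its sign flipped, to a constrained numerical optimizer over the parameter vector $θ$ ; the choice of optimizer is left open here.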

By using (3)–(5), and (9) and (10), we can derive the score equation:

(25) $\begin{matrix} \frac{\partial ℓ (θ)}{\partial θ} = \sum_{t = 1}^{T} \frac{1}{P_{θ} (X_{t}| X_{t - 1})} \frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial θ} = 0, \end{matrix}$

where

$\begin{matrix}
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial α_{11}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} X_{1, t - 1} h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) f (X_{1 t} - k_{1}, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ) \\
\times (h (k_{1} - 1, X_{1, t - 1} - 1, X_{2, t - 1}, α_{11}, α_{12}) - h (k_{1}, X_{1, t - 1} - 1, X_{2, t - 1}, α_{11}, α_{12})), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial α_{12}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} X_{2, t - 1} h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) f (X_{1 t} - k_{1}, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ) \\
\times (h (k_{1} - 1, X_{1, t - 1}, X_{2, t - 1} - 1, α_{11}, α_{12}) - h (k_{1}, X_{1, t - 1}, X_{2, t - 1} - 1, α_{11}, α_{12})), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial α_{21}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} X_{1, t - 1} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) f (X_{1 t} - k_{1}, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ) \\
\times (h (k_{2} - 1, X_{1, t - 1} - 1, X_{2, t - 1}, α_{21}, α_{22}) - h (k_{2}, X_{1, t - 1} - 1, X_{2, t - 1}, α_{21}, α_{22})), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial α_{22}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} X_{2, t - 1} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) f (X_{1 t} - k_{1}, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ) \\
\times (h (k_{2} - 1, X_{1, t - 1}, X_{2, t - 1} - 1, α_{21}, α_{22}) - h (k_{2}, X_{1, t - 1}, X_{2, t - 1} - 1, α_{21}, α_{22})), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial b_{11}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times X_{1, t - 1} f (X_{1 t} - k_{1} - 1, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial b_{12}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times X_{2, t - 1} f (X_{1 t} - k_{1} - 1, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial c_{1}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times f (X_{1 t} - k_{1} - 1, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial b_{21}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times X_{1, t - 1} f (X_{1 t} - k_{1}, X_{2 t} - k_{2} - 1, λ_{1 t}, λ_{2 t}, ϕ), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial b_{22}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times X_{2, t - 1} f (X_{1 t} - k_{1}, X_{2 t} - k_{2} - 1, λ_{1 t}, λ_{2 t}, ϕ), \\
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial c_{2}} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times f (X_{1 t} - k_{1}, X_{2 t} - k_{2} - 1, λ_{1 t}, λ_{2 t}, ϕ),
\end{matrix}$

$\begin{matrix}
\frac{\partial P_{θ} (X_{t}| X_{t - 1})}{\partial ϕ} = \sum_{k_{1} = 0}^{g_{1}} \sum_{k_{2} = 0}^{g_{2}} h (k_{1}, X_{1, t - 1}, X_{2, t - 1}, α_{11}, α_{12}) h (k_{2}, X_{1, t - 1}, X_{2, t - 1}, α_{21}, α_{22}) \\
\times (f (X_{1 t} - k_{1}, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ) - f (X_{1 t} - k_{1} - 1, X_{2 t} - k_{2}, λ_{1 t}, λ_{2 t}, ϕ) \\
- f (X_{1 t} - k_{1}, X_{2 t} - k_{2} - 1, λ_{1 t}, λ_{2 t}, ϕ) + f (X_{1 t} - k_{1} - 1, X_{2 t} - k_{2} - 1, λ_{1 t}, λ_{2 t}, ϕ)) .
\end{matrix}$

The maximizer ${\hat{θ}}_{T}$ of (24) is the CML estimate of $θ_{0}$, which is obtained by numerically maximizing the log-likelihood (24) or by solving the score Equation (25). To study the asymptotic behaviour of the estimator, we make the following two assumptions about the parameter space and the underlying process.

Assumption 1.

The parameter space Θ is compact with $Θ = \{θ : θ = {(α_{i j}, b_{i j}, c_{1}, c_{2}, ϕ)}^{⊤}, i, j = 1, 2\}$, where $\underline{δ} \leq α_{i j}, b_{i j} \leq \bar{δ}$, $\underline{c} \leq c_{i} \leq \bar{c}$, $\underline{ϕ} \leq ϕ \leq \bar{ϕ}$ and $γ = max (α_{i j} + b_{i j}) < 1$; here $\underline{δ}$, $\bar{δ}$, $\underline{c}$, $\bar{c}$, $\underline{ϕ}$ and $\bar{ϕ}$ are finite positive constants, and $θ_{0}$ is an interior point of Θ.

Assumption 2.

If there exists a $t \geq 1$ , such that $X_{t} (θ_{0}) = X_{t} (θ)$ , $P_{θ_{0}}$ a.s., then $θ = θ_{0},$ where $P_{θ_{0}}$ is the probability measure under the true parameter $θ_{0} .$

To establish the identifiability of the EBINAR(1) model, we give the following two lemmas.

Lemma 1.

Let $g_{1} (x, y, b_{11}, b_{12}, c_{1}) = b_{11} x + b_{12} y + c_{1}$ with $b_{11}, b_{12}, c_{1} > 0$ for $(x, y) \in R^{+} \times R^{+}$. If $g_{1} (x, y, b_{11}, b_{12}, c_{1}) = g_{1} (x, y, b_{11}^{0}, b_{12}^{0}, c_{1}^{0})$ for all such $(x, y)$, then $b_{11} = b_{11}^{0}, b_{12} = b_{12}^{0}, c_{1} = c_{1}^{0}$.

Proof.

By the assumption,

$\begin{matrix} \frac{\partial g_{1} (x, y, b_{11}, b_{12}, c_{1})}{\partial x} = \frac{\partial g_{1} (x, y, b_{11}^{0}, b_{12}^{0}, c_{1}^{0})}{\partial x}, \\ \frac{\partial g_{1} (x, y, b_{11}, b_{12}, c_{1})}{\partial y} = \frac{\partial g_{1} (x, y, b_{11}^{0}, b_{12}^{0}, c_{1}^{0})}{\partial y}, \\ g_{1} (0, 0, b_{11}, b_{12}, c_{1}) = g_{1} (0, 0, b_{11}^{0}, b_{12}^{0}, c_{1}^{0}), \end{matrix}$

from which we obtain $b_{11} = b_{11}^{0}, b_{12} = b_{12}^{0}, c_{1} = c_{1}^{0}$. □

Similarly, we denote $g_{2} (x, y, b_{21}, b_{22}, c_{2}) = b_{21} x + b_{22} y + c_{2}$ . If $g_{2} (x, y, b_{21}, b_{22}, c_{2}) = g_{2} (x, y, b_{21}^{0}, b_{22}^{0}, c_{2}^{0})$ , then we have $b_{21} = b_{21}^{0}, b_{22} = b_{22}^{0}, c_{2} = c_{2}^{0}$ by the same method.

Lemma 2.

If $\{X_{t}\}$ is the strictly stationary and ergodic solution of model (6) and Assumptions 1 and 2 hold, then model (6) is identifiable.

Proof.

According to Lemma 1, we conclude that if $λ_{1 t} (b_{11}, b_{12}, c_{1}) = λ_{1 t} (b_{11}^{0}, b_{12}^{0}, c_{1}^{0})$ , then $b_{11} = b_{11}^{0}, b_{12} = b_{12}^{0}, c_{1} = c_{1}^{0}$ . Similarly, if $λ_{2 t} (b_{21}, b_{22}, c_{2}) = λ_{2 t} (b_{21}^{0}, b_{22}^{0}, c_{2}^{0})$ , then $b_{21} = b_{21}^{0}, b_{22} = b_{22}^{0}, c_{2} = c_{2}^{0}$ . Thus, if $ϵ_{t} (b_{i j}, c_{i}, ϕ) = ϵ_{t} (b_{i j}^{0}, c_{i}^{0}, ϕ^{0})$ , $t \geq 1, i, j = 1, 2$ , then $b_{i j} = b_{i j}^{0}$ , $c_{i} = c_{i}^{0}$ , $ϕ = ϕ^{0} .$ According to (7), we have $ϵ_{i t} = X_{i t} - α_{i 1} \circ X_{1, t - 1} - α_{i 2} \circ X_{2, t - 1}$ , $i = 1, 2 .$ If $ϵ_{i t} (θ) = ϵ_{i t} (θ_{0})$ , then we have $α_{i 1} = α_{i 1}^{0}, α_{i 2} = α_{i 2}^{0}$ , otherwise

$0 = E (ϵ_{i t} (θ)) - E (ϵ_{i t} (θ_{0})) = (α_{i 1} - α_{i 1}^{0}) E (X_{1 t}) + (α_{i 2} - α_{i 2}^{0}) E (X_{2 t}),$

then $E (X_{1 t}) = 0$ and $E (X_{2 t}) = 0$, which contradicts the fact that $E (X_{i t}) > 0$, $i = 1, 2$.

By Assumption 2, for given $X_{1, t - 1}$ and $X_{2, t - 1}$ , we have

$ϕ = Cov (X_{1 t} (θ), X_{2 t} (θ)) = Cov (X_{1 t} (θ_{0}), X_{2 t} (θ_{0})) = ϕ^{0} .$

Thus, $ϕ = ϕ^{0} .$ Hence, model (6) is identifiable. □

Theorem 7.

Suppose that $\{X_{t}\}$ is the strictly stationary and ergodic solution of model (6) and Assumptions 1 and 2 hold. As $T \to \infty$, there exists an estimator ${\hat{θ}}_{T}$ such that ${\hat{θ}}_{T} \overset{a . s .}{⟶} θ_{0}$.

Proof.

To prove the strong consistency of ${\hat{θ}}_{T}$, we need to check all the assumptions given in Theorems 4.1.2 and 4.1.3 in Amemiya [25]. Let $W_{t} (θ) = ln P_{θ} (X_{t} | X_{t - 1})$; then $ℓ (θ) = \sum_{t = 1}^{T} W_{t} (θ)$. We observe that $W_{t} (θ)$ is a measurable function of $X_{t}$ for all $θ \in Θ$ and is continuous in an open and convex neighborhood $N (θ_{0})$ of $θ_{0}$, so there exists at least one point $\bar{θ} \in N (θ_{0})$ at which $W_{t} (θ)$ attains its maximum value.

Thus,

$E (sup_{θ \in N (θ_{0})} W_{t} (θ)) = E (ln P_{\bar{θ}} (X_{t} | X_{t - 1})) \leq ln E (P_{\bar{θ}} (X_{t} | X_{t - 1})) < \infty .$

Note that $\{X_{t}\}$ is a stationary and ergodic time series; hence, by Theorem 4.1.2 in Amemiya [25], $\frac{1}{T} \sum_{t = 1}^{T} W_{t} (θ) \to E W_{t} (θ)$ in probability as $T \to \infty$. By Jensen’s inequality, we have:

(26) $E (W_{t} (θ)) - E (W_{t} (θ_{0})) = E ln \frac{P_{θ} (X_{t} | X_{t - 1})}{P_{θ_{0}} (X_{t} | X_{t - 1})} \leq ln E \frac{P_{θ} (X_{t} | X_{t - 1})}{P_{θ_{0}} (X_{t} | X_{t - 1})} = 0 .$

Thus, $E W_{t} (θ)$ attains a strict local maximum at $θ_{0}$ by (26) and Lemma 2. Hence, the conditions of Theorem 4.1.2 in Amemiya [25] are fulfilled; thus, there exists an estimator ${\hat{θ}}_{T}$ such that ${\hat{θ}}_{T} \to θ_{0}$ , $T \to \infty$ . □
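The final equality in (26) uses only the fact that a conditional pmf sums to one. The step can be spelled out as follows, where the sum runs over all $x \in N_{0}^{2}$, the expectation is taken under $θ_{0}$, and it is assumed (as holds for a bivariate Poisson innovation with full support) that $P_{θ_{0}} (x | X_{t - 1}) > 0$ for every $x$:

```latex
E\left[ \frac{P_{\theta}(X_t \mid X_{t-1})}{P_{\theta_0}(X_t \mid X_{t-1})} \,\Big|\, X_{t-1} \right]
  = \sum_{x} P_{\theta_0}(x \mid X_{t-1}) \,
    \frac{P_{\theta}(x \mid X_{t-1})}{P_{\theta_0}(x \mid X_{t-1})}
  = \sum_{x} P_{\theta}(x \mid X_{t-1})
  = 1 .
```

Taking the expectation over $X_{t - 1}$ then gives $E [P_{θ} (X_{t} | X_{t - 1}) / P_{θ_{0}} (X_{t} | X_{t - 1})] = 1$, so the logarithm on the right-hand side of (26) equals $ln 1 = 0$.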

Theorem 8.

If the conditions of Theorem 7 hold, as $T \to \infty$ ,

$\sqrt{T} ({\hat{θ}}_{T} - θ_{0}) \overset{d}{⟶} N (0, {(J (θ_{0}))}^{- 1} I (θ_{0}) {(J (θ_{0}))}^{- 1}),$

where $I (θ_{0}) = lim_{T \to \infty} T^{- 1} E (\frac{\partial ℓ (θ_{0})}{\partial θ} {(\frac{\partial ℓ (θ_{0})}{\partial θ})}^{⊤}), J (θ_{0}) = lim_{T \to \infty} T^{- 1} E (\frac{\partial^{2} ℓ (θ_{0})}{\partial θ \partial θ^{⊤}}) .$

Proof.

To prove the asymptotic normality of ${\hat{θ}}_{T}$ , we need to verify the assumptions of Theorem 4.1.3 in Amemiya [25].

First, by Proposition 1 in Freeland and McCabe [26], all the partial derivatives can be obtained in a similar way, i.e., $\frac{\partial W_{t} (θ)}{\partial θ_{i}}$ exist and are three times continuously differentiable in $Θ$; thus, $\frac{\partial^{2} W_{t} (θ)}{\partial θ_{i} \partial θ_{j}}$ exists and is continuous in $N (θ_{0})$, for any $i, j, k = 1, 2, \dots, 11$. Thus, there exists at least one point $\tilde{θ} \in N (θ_{0})$ at which $\frac{\partial^{2} W_{t} (θ)}{\partial θ_{i} \partial θ_{j}}$ attains its maximum value. Hence,

$E (sup_{θ \in N (θ_{0})} \frac{\partial^{2} W_{t} (θ)}{\partial θ_{i} \partial θ_{j}}) = E (\frac{\partial^{2} W_{t} (\tilde{θ})}{\partial θ_{i} \partial θ_{j}}) < \infty .$

For convenience, we denote $\frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{⊤}} = G (X_{t}, θ) = (g_{i j} (X_{t}, θ))$ and $E (\frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{⊤}}) = G (θ) = (g_{i j} (θ))$. We only need to prove that $g_{i j} (X_{t}, θ)$ converges to a finite and non-stochastic function $g_{i j} (θ) = E (g_{i j} (X_{t}, θ))$. Let $h (X_{t}, θ) = g_{i j} (X_{t}, θ) - E [g_{i j} (X_{t}, θ)]$; then $E h (X_{t}, θ) = 0$. Hence, the conditions of Theorem 4.1.3 in [25] are fulfilled. Thus, $T^{- 1} \sum_{t = 1}^{T} h (X_{t}, θ)$ converges to 0 in probability uniformly in $θ \in N (θ_{0})$. Furthermore, $T^{- 1} \sum_{t = 1}^{T} h (X_{t}, θ_{T}^{⋆})$ converges to 0 in probability when $θ_{T}^{⋆} \to θ_{0}$ as $T \to \infty$.

Second, it is easy to see $Cov (\partial W_{t} (θ_{0}) / \partial θ) = E [\partial W_{t} (θ_{0}) / \partial θ \partial W_{t} (θ_{0}) / \partial θ^{⊤}]$ because $E (\partial W_{t} (θ_{0}) / \partial θ) = 0$ .

Using the ergodic theorem,

$\frac{1}{T} \frac{\partial ℓ (θ_{0})}{\partial θ} \overset{p}{⟶} E \frac{1}{P_{θ_{0}} (X_{t} | X_{t - 1})} \frac{\partial P_{θ_{0}} (X_{t} | X_{t - 1})}{\partial θ} .$

Using the martingale central limit theorem and the Cramér–Wold device, it is straightforward to show that

$\frac{1}{\sqrt{T}} \partial ℓ (θ_{0}) / \partial θ \overset{d}{⟶} N (0, I (θ_{0}))$ with $I (θ_{0}) = {lim}_{T \to \infty} T^{- 1} E [\partial ℓ (θ_{0}) / \partial θ {(\partial ℓ (θ_{0}) / \partial θ)}^{⊤}] .$

Third, there exists an $H (X_{1 t}, X_{2 t})$ such that $|\frac{\partial^{3} ln ℓ (θ)}{\partial θ_{i} \partial θ_{j} \partial θ_{k}}| \leq H (X_{1 t}, X_{2 t})$ and $E [H (X_{1 t}, X_{2 t})]$ $< \infty$ by Theorem 4. By the Taylor expansion, we have

(27) $\frac{\partial ℓ ({\hat{θ}}_{T})}{\partial θ} = \frac{\partial ℓ (θ_{0})}{\partial θ} + \frac{\partial^{2} ℓ (θ_{T}^{⋆})}{\partial θ \partial θ^{⊤}} ({\hat{θ}}_{T} - θ_{0}),$

where $θ_{T}^{⋆}$ lies between ${\hat{θ}}_{T}$ and $θ_{0}$. Since ${\hat{θ}}_{T}$ solves the score Equation (25), the left-hand side of (27) vanishes, i.e., $\frac{\partial ℓ ({\hat{θ}}_{T})}{\partial θ} = 0$, and then

(28) $\sqrt{T} ({\hat{θ}}_{T} - θ_{0}) = - {[\frac{1}{T} \frac{\partial^{2} ℓ (θ_{T}^{⋆})}{\partial θ \partial θ^{⊤}}]}^{- 1} \frac{1}{\sqrt{T}} \frac{\partial ℓ (θ_{0})}{\partial θ} .$

Hence, the asymptotic normality of ${\hat{θ}}_{T}$ follows from (28). □

4. Simulation

In this section, we conduct a simulation study to illustrate the finite-sample properties of the CML estimate. The simulation is carried out in R, using the optim function to maximize the conditional log-likelihood function.

In the simulation, we generate data from the non-diagonal EBINAR(1) model and the diagonal EBINAR(1) model. The sample sizes are chosen to be 50, 100, 200, 500 and 1000 to reflect relatively small, small, moderate, large and relatively large samples, and we use 500 replications. For each sample size, the performance of the estimators is evaluated by the mean squared error (MSE) and the mean absolute deviation error (MADE), where $MSE = \frac{1}{m} \sum_{i = 1}^{m} {({\hat{φ}}_{i} - φ)}^{2}$, $MADE = \frac{1}{m} \sum_{i = 1}^{m} | {\hat{φ}}_{i} - φ |$, ${\hat{φ}}_{i}$ is the estimate of $φ$ in the $i$th replication and $m$ denotes the number of replications. The parameter combinations of $θ = {(α_{11}, α_{12}, α_{21}, α_{22}, b_{11}, b_{12}, b_{21}, b_{22}, c_{1}, c_{2}, ϕ)}^{⊤}$ used are listed as follows:

(1). For a non-diagonal model: $θ = {(0.3, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.6, 0.6, 0.5)}^{⊤}$ ;

(2). For a diagonal model: I: $θ = {(0.2, 0, 0, 0.3, 0.3, 0, 0, 0.2, 0.6, 0.6, 0.5)}^{⊤}$; II: $θ = {(0.2, 0, 0, 0.3, 0.3, 0, 0, 0.2, 2, 2, 1)}^{⊤}$; III: $θ = {(0.1, 0, 0, 0.4, 0.4, 0, 0, 0.1, 0.6, 0.6, 0.5)}^{⊤}$; IV: $θ = {(0.1, 0, 0, 0.4, 0.4, 0, 0, 0.1, 2, 2, 1)}^{⊤}$.
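A sample path for any of these parameter combinations can be generated along the following lines. This is a Python sketch under stated assumptions rather than the R code used for the study: the innovation pair is drawn as a bivariate Poisson with marginal means $λ_{1 t}$, $λ_{2 t}$ and covariance $ϕ$ through a shared Poisson component (trivariate reduction), which requires $λ_{i t} \geq ϕ$ and is guaranteed here because $c_{i} \geq ϕ$. The function names, the burn-in length and the seed are hypothetical choices.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate by Knuth's multiplication method
    (adequate for the small rates that occur in this sketch)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def thin(rng, alpha, x):
    """Binomial thinning alpha o x: successes in x Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_ebinar1(theta, T, seed=0, burn_in=200):
    """Simulate T observations (after burn-in) of the bivariate process."""
    a11, a12, a21, a22, b11, b12, b21, b22, c1, c2, phi = theta
    rng = random.Random(seed)
    x1, x2 = 0, 0
    path = []
    for t in range(T + burn_in):
        # Innovation means depend linearly on the previous state.
        lam1 = b11 * x1 + b12 * x2 + c1
        lam2 = b21 * x1 + b22 * x2 + c2
        if lam1 < phi or lam2 < phi:
            raise ValueError("this sketch requires lambda_it >= phi")
        # Assumed construction: a shared Poisson(phi) part induces covariance phi.
        common = poisson(rng, phi)
        e1 = poisson(rng, lam1 - phi) + common
        e2 = poisson(rng, lam2 - phi) + common
        new1 = thin(rng, a11, x1) + thin(rng, a12, x2) + e1
        new2 = thin(rng, a21, x1) + thin(rng, a22, x2) + e2
        x1, x2 = new1, new2
        if t >= burn_in:
            path.append((x1, x2))
    return path


# Diagonal parameter combination I from the list above.
theta_I = (0.2, 0.0, 0.0, 0.3, 0.3, 0.0, 0.0, 0.2, 0.6, 0.6, 0.5)
sample = simulate_ebinar1(theta_I, T=500, seed=1)
```

For combination I, the conditional mean of each component is $(α_{i i} + b_{i i}) X_{i, t - 1} + c_{i}$, so a long simulated path should have component means close to $0.6 / (1 - 0.5) = 1.2$.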

Tables 1–5 show that the MSE and MADE generally decrease as T increases for both the diagonal and non-diagonal models, which is in line with the consistency of the estimators.
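The two criteria are straightforward to compute once the replicated estimates are available. A short Python sketch follows; the function names and the four example estimates are hypothetical and serve only to illustrate the formulas.

```python
def mse(estimates, true_value):
    """Mean squared error of replicated estimates around the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def made(estimates, true_value):
    """Mean absolute deviation error of replicated estimates."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)


# Hypothetical estimates of a parameter whose true value is 0.3.
replicated = [0.28, 0.33, 0.31, 0.26]
mse_value = mse(replicated, 0.3)
made_value = made(replicated, 0.3)
```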

To illustrate the location and dispersion of the estimates, we present boxplots of the estimates for the non-diagonal combination and for diagonal combination I in Figure 1 and Figure 2; the others are similar.

Figure 1 and Figure 2 illustrate how the large-sample properties of the estimators emerge at finite sample sizes. In general, the medians of the estimates move closer to the true parameter values as the sample size increases. Regarding dispersion, both the interquartile ranges and the overall ranges of the estimates become narrower as the sample size increases.

5. Illustrative Examples

In this section, we apply the proposed model to two crime datasets from different car beats, where the car beat number is the unique ID of the geographic observation unit used by the Pittsburgh Police Department. The crime data are available online at “The Forecasting Principles” site (http://www.forecastingprinciples.com/index.php/crimedata) in the section about crime data and were downloaded on 23 September 2016.

According to Cohen and Gorr [27], the occurrence of criminal mischief may be accompanied by burglary, and the same holds for robbery. Hence, the monthly counts of burglary and criminal mischief (CMIS), or those of burglary and robbery, may exhibit dependence. In this section, we take the monthly counts of burglary and CMIS in beat 11 and the monthly counts of burglary and robbery in beat 26 as examples.

5.1. Monthly Counts of Burglary and CMIS in Beat 11

In this part, we consider the monthly numbers of burglaries and criminal mischief (CMIS) incidents from January 1990 to December 2001 in the beat with geographic ID = 11. Table 6 gives summary statistics of the counts of burglaries and CMIS.

Table 6 shows that both the counts of burglaries and CMIS are over-dispersed because their variances are greater than their means. In addition, the relationship between the two series can be illustrated by the cross-correlation graph of the samples, which is given in Figure 3.
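Over-dispersion can be quantified by the ratio of variance to mean, which exceeds one for over-dispersed counts. The small Python check below uses the means and variances reported in Table 6; the function and variable names are hypothetical. The resulting ratios are roughly 1.43 for burglaries and 1.58 for CMIS.

```python
def dispersion_index(mean, variance):
    """Variance-to-mean ratio; values above 1 indicate over-dispersion."""
    return variance / mean


# (mean, variance) pairs taken from Table 6.
table6 = {"Burglary": (2.8819, 4.1188), "CMIS": (6.3819, 10.0839)}
indices = {name: dispersion_index(m, v) for name, (m, v) in table6.items()}
overdispersed = {name: idx > 1.0 for name, idx in indices.items()}
```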

From Figure 3, the counts of burglaries are weakly dependent on those of CMIS. Their sample paths, autocorrelation functions (ACF) and partial autocorrelation functions (PACF) are plotted in Figure 4, which shows that the analyzed data set is a bivariate integer-valued time series with some characteristics of mutual influence.

To give quantitative results about cross-correlation, we compare our model with the following models:

Full BINAR-BP with $ϵ_{t}$ following $BP (λ_{1}, λ_{2}, ϕ)$ [16];
Full BINAR-NB with $ϵ_{t}$ following a bivariate negative binomial distribution with parameters $(λ_{1}, λ_{2}, β)$; see [14,16] for details.

As the goodness-of-fit criteria, we use the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the mean square error of the Pearson residuals (PRMS), which is equal to $\sum_{t = 1}^{n} Z_{t}^{2} / (n - p^{*})$ , where $p^{*}$ denotes the number of estimated parameters and $Z_{t}$ denotes standardized Pearson residuals.

The CML estimate and approximated standard error (SE) of the parameter, including the fitted values of PRMS, AIC, BIC and log-likelihood function (Log Lik), are summarized in Table 7, where the approximated standard error is computed by using the estimated version of the robust sandwich matrix ${(J (θ_{0}))}^{- 1} I (θ_{0}) {(J (θ_{0}))}^{- 1}$ ; see Theorem 8 for details.

Table 7 shows that the EBINAR(1) model attains the highest Log Lik value and the lowest AIC, BIC and PRMS for the monthly numbers of burglaries and CMIS. Hence, among the three models, the EBINAR(1) model is the most suitable for this data set.
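The information criteria in Table 7 can be recomputed from the reported log-likelihoods. The Python sketch below uses the standard definitions $AIC = - 2 ℓ + 2 p^{*}$ and $BIC = - 2 ℓ + p^{*} ln n$ together with the PRMS formula given above. The sample size $n = 144$ is an inference from the monthly range January 1990 to December 2001 rather than a figure stated in the text, and the function names are hypothetical.

```python
from math import log


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * loglik + n_params * log(n_obs)


def prms(pearson_residuals, n_params):
    """Mean square error of the Pearson residuals: sum(Z_t^2) / (n - p*)."""
    n = len(pearson_residuals)
    return sum(z * z for z in pearson_residuals) / (n - n_params)


# EBINAR(1) fit for beat 11: log-likelihood from Table 7, 11 parameters,
# and an assumed n = 144 monthly observations.
aic_ebinar = aic(-646.7310, 11)
bic_ebinar = bic(-646.7310, 11, 144)
# Full BINAR(1)-BP fit for beat 11: 7 parameters.
aic_bp = aic(-668.4744, 7)
bic_bp = bic(-668.4744, 7, 144)
```

Under these assumptions the recomputed values agree with the AIC and BIC entries of Table 7 to the reported precision.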

5.2. Monthly Counts of Burglaries and Robberies in Beat 26

In this part, we consider the monthly number of burglaries and robberies from January 1990 to December 2001 in the geographic ID = 26; see Table 8 for some of their statistics.

Table 8 shows that the monthly numbers of burglaries and robberies are over-dispersed. In addition, the relationship between the two series is illustrated by their cross-correlation graph given in Figure 5, which shows that the counts of burglaries are significantly dependent on those of robberies.

To further illustrate the monthly numbers of burglaries and robberies in beat 26, we present their sample paths, ACF and PACF plots in Figure 6, from which we can conclude that the analyzed data set exhibits some characteristics of mutual influence.

To give quantitative results about cross-correlation, we compare our model with the Full BINAR-BP and Full BINAR-NB models. The CML estimate and SE, including the fitted PRMS, AIC, BIC and Log Lik, are summarized in Table 9.

Table 9 shows that the EBINAR(1) model attains the highest Log Lik value and the lowest AIC, BIC and PRMS for burglaries and robberies in beat 26. Hence, among the three models, the EBINAR(1) model is the most suitable.

To sum up, our findings reveal some connection between burglary and CMIS in beat 11 and between burglary and robbery in beat 26, which agrees with the conclusion of Cohen and Gorr [27]. Of course, the counts of burglary may be affected by other criminal activities, such as simple assault, vagrancy and trespassing, which will be examined in a further study.

Remark 2.

For the two real datasets, our EBINAR(1) model gives the best in-sample fit, but this does not by itself show how well it predicts unseen data. To assess the predictive performance of the new model, one option is to divide the data into a training set and a test set and conduct a further experiment. Such an experiment will be considered in a future study of the crime data.

6. Concluding Remarks

This paper proposes a more flexible model for bivariate integer-valued time series data, i.e., the EBINAR(1) model, whose innovation vector is time-dependent. It is a generalization of the EINAR(1) model [11] to the two-dimensional case as well as a generalization of the BINAR(1) model [14,16], but with more flexibility. We discuss some necessary properties of the model, the CML estimators of the parameters and their large-sample properties. A simulation study was conducted to examine the finite-sample performance of the estimators. Real data examples are provided to illustrate the effectiveness of our model relative to existing models.

To make the bivariate INAR-type models more flexible with respect to real-data applications in some cases, it may be interesting to include explanatory covariates or periodicity in the model to account for dependence through thinning operations on several factors, which will be considered in another project: see Aknouche et al. [28] and Chen and Khamthong [29].

Author Contributions

Conceptualization, H.C. and F.Z.; methodology, H.C. and F.Z.; software, H.C., F.Z. and X.L.; validation, H.C., F.Z. and X.L.; formal analysis, H.C.; investigation, H.C.; resources, F.Z.; data curation, X.L.; writing—original draft preparation, H.C. and F.Z.; writing—review and editing, H.C. and F.Z.; visualization, H.C. and F.Z.; supervision, F.Z. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Crime data are available online at The Forecasting Principles site and were downloaded on 23 September 2016 from http://www.forecastingprinciples.com/index.php/crimedata.

Acknowledgments

We are very grateful to anonymous referees for providing several exceptionally helpful comments which led to a significant improvement in the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest in the publication of this paper.

Abbreviations

The following abbreviations are used in this manuscript:

$A^{- 1}$	inverse of matrix $A$ ;
$A^{⊤}$	transpose of the matrix or vector $A$ ;
$∥ \cdot ∥$	Euclidean norm of a matrix or vector;
$| \cdot |$	absolute value of a univariate variable;
$\overset{d}{⟶}$	convergence in distribution;
$\overset{p}{⟶}$	convergence in probability;
pmf	probability mass function;
CML	conditional maximum likelihood;
AIC	Akaike information criterion;
BIC	Bayesian information criterion;
SE	standard error;
PRMS	mean square error of the Pearson residual;
Para.	parameter.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures and Tables

Figure 1. Boxplots of the CML estimates for non-diagonal EBINAR(1) model.

Figure 2. Boxplots of the CML estimates for diagonal EBINAR(1) model with parameter I.

Figure 3. Cross-correlation between the monthly number of burglaries and CMIS in beat 11.

Figure 4. Beat 11: (1) monthly number of burglary, (2) monthly number of CMIS, (3) ACF of burglary, (4) ACF of CMIS, (5) PACF of burglary, (6) PACF of CMIS.

Figure 5. Cross-correlation between the monthly number of burglaries and robberies in beat 26.

Figure 6. Beat 26: (1) monthly number of burglaries, (2) monthly number of robberies, (3) ACF of burglaries, (4) ACF of robberies, (5) PACF of burglaries, (6) PACF of robberies.

Table 1

Results for non-diagonal EBINAR(1) model.

Size		$α_{11}$	$α_{12}$	$α_{21}$	$α_{22}$	$b_{11}$	$b_{12}$	$b_{21}$	$b_{22}$	$c_{1}$	$c_{2}$	$ϕ$
50	MSE	0.0095	0.0122	0.0082	0.0177	0.0048	0.0187	0.0100	0.0380	0.0172	0.0246	0.0160
50	MADE	0.0561	0.0537	0.0390	0.0460	0.0371	0.0487	0.0409	0.0750	0.0857	0.0991	0.0811
100	MSE	0.0069	0.0043	0.0033	0.0127	0.0063	0.0070	0.0099	0.0090	0.0128	0.0210	0.0150
100	MADE	0.0473	0.0269	0.0287	0.0417	0.0399	0.0355	0.0378	0.0578	0.0763	0.0913	0.0736
200	MSE	0.0044	0.0019	0.0034	0.0108	0.0058	0.0063	0.0054	0.0063	0.0178	0.0185	0.0170
200	MADE	0.0421	0.0281	0.0326	0.0291	0.0426	0.0363	0.0372	0.0532	0.0901	0.0876	0.0824
500	MSE	0.0033	0.0008	0.0011	0.0105	0.0044	0.0027	0.0008	0.0061	0.0029	0.0044	0.0063
500	MADE	0.0317	0.0161	0.0213	0.0469	0.0379	0.0293	0.0225	0.0556	0.0446	0.0529	0.0465
1000	MSE	0.0002	0.0001	0.0005	0.0041	0.0014	0.0007	0.0002	0.0006	0.0006	0.0015	0.0048
1000	MADE	0.0116	0.0079	0.0167	0.0413	0.0280	0.0225	0.0114	0.0190	0.0199	0.0345	0.0379

Table 2

Results for diagonal EBINAR(1) model with parameter I.

Size		$α_{11}$	$α_{22}$	$b_{11}$	$b_{22}$	$c_{1}$	$c_{2}$	$ϕ$
50	MSE	0.0031	0.0146	0.0205	0.0030	1.2188	0.6134	0.3256
50	MADE	0.0406	0.1021	0.1250	0.0398	0.7710	0.5880	0.5059
100	MSE	0.0020	0.0062	0.0134	0.0023	0.7887	0.4527	0.2778
100	MADE	0.0323	0.0665	0.0976	0.0330	0.6125	0.4978	0.4843
200	MSE	0.0015	0.0045	0.0088	0.0010	0.4832	0.3250	0.2572
200	MADE	0.0300	0.0524	0.0775	0.0217	0.5016	0.3995	0.4742
500	MSE	0.0007	0.0031	0.0043	0.0010	0.2240	0.1528	0.2198
500	MADE	0.0202	0.0396	0.0495	0.0227	0.3655	0.2670	0.4312
1000	MSE	0.0005	0.0020	0.0022	0.0004	0.1965	0.1147	0.1789
1000	MADE	0.0156	0.0352	0.0377	0.0142	0.3100	0.2084	0.3954

Table 3

Results for diagonal EBINAR(1) model with parameter II.

Size		$α_{11}$	$α_{22}$	$b_{11}$	$b_{22}$	$c_{1}$	$c_{2}$	$ϕ$
50	MSE	0.0154	0.0183	0.0122	0.0191	0.7510	1.1007	0.3183
50	MADE	0.0871	0.0988	0.0903	0.0949	0.6894	0.8269	0.5008
100	MSE	0.0059	0.0089	0.0059	0.0072	0.4333	0.6728	0.2336
100	MADE	0.0470	0.0742	0.0599	0.0582	0.4957	0.5889	0.4442
200	MSE	0.0042	0.0044	0.0041	0.0053	0.2876	0.4939	0.1983
200	MADE	0.0411	0.0499	0.0475	0.0470	0.3796	0.4866	0.4193
500	MSE	0.0027	0.0035	0.0036	0.0025	0.1038	0.3240	0.1899
500	MADE	0.0344	0.0400	0.0414	0.0326	0.2636	0.4216	0.4107
1000	MSE	0.0013	0.0022	0.0017	0.0011	0.0730	0.0855	0.1352
1000	MADE	0.0238	0.0303	0.0307	0.0204	0.1978	0.2221	0.3512

Table 4

Results for diagonal EBINAR(1) model with parameter III.

Size		$α_{11}$	$α_{22}$	$b_{11}$	$b_{22}$	$c_{1}$	$c_{2}$	$ϕ$
50	MSE	0.0027	0.0048	0.0473	0.0083	0.0078	0.0083	0.0013
50	MADE	0.0258	0.0428	0.1533	0.0546	0.0620	0.0586	0.0316
100	MSE	0.0036	0.0064	0.0429	0.0102	0.0089	0.0087	0.0017
100	MADE	0.0359	0.0485	0.1486	0.0640	0.0680	0.0638	0.0330
200	MSE	0.0060	0.0059	0.0380	0.0059	0.0054	0.0047	0.0017
200	MADE	0.0341	0.0469	0.1239	0.0469	0.0541	0.0507	0.0321
500	MSE	0.0018	0.0042	0.0082	0.0031	0.0046	0.0039	0.0016
500	MADE	0.0312	0.0429	0.0638	0.0380	0.0477	0.0426	0.0287
1000	MSE	0.0011	0.0030	0.0037	0.0020	0.0020	0.0016	0.0005
1000	MADE	0.0253	0.0395	0.0380	0.0331	0.0384	0.0341	0.0182

Table 5

Results for diagonal EBINAR(1) model with parameter IV.

Size		$α_{11}$	$α_{22}$	$b_{11}$	$b_{22}$	$c_{1}$	$c_{2}$	$ϕ$
50	MSE	0.0031	0.0146	0.0205	0.0030	1.2188	0.6134	0.3256
50	MADE	0.0406	0.1021	0.1250	0.0398	0.7710	0.5880	0.5059
100	MSE	0.0020	0.0062	0.0134	0.0023	0.7887	0.4527	0.2778
100	MADE	0.0323	0.0665	0.0976	0.0330	0.6125	0.4978	0.4843
200	MSE	0.0015	0.0045	0.0088	0.0010	0.4832	0.3250	0.2572
200	MADE	0.0300	0.0524	0.0775	0.0217	0.5016	0.3995	0.4742
500	MSE	0.0007	0.0031	0.0043	0.0010	0.2240	0.1528	0.2198
500	MADE	0.0202	0.0396	0.0495	0.0227	0.3655	0.2670	0.4312
1000	MSE	0.0005	0.0020	0.0022	0.0004	0.1965	0.1147	0.1789
1000	MADE	0.0156	0.0352	0.0377	0.0142	0.3100	0.2084	0.3954

Table 6

Summary statistics for the monthly number of burglaries and CMIS in beat 11.

Data	Mean	Variance	Minimum	Median	Maximum
Burglary	2.8819	4.1188	0	3	10
CMIS	6.3819	10.0839	1	6	22

Table 7

Estimates for the monthly numbers of burglaries and those of CMIS in beat 11.

	EBINAR(1)			Full BINAR(1)-NB			Full BINAR(1)-BP
Para.	Estimate	SE	Para.	Estimate	SE	Para.	Estimate	SE
${\hat{α}}_{11}$	0.1689	0.1559	${\hat{α}}_{11}$	0.2784	0.0665	${\hat{α}}_{11}$	0.2993	0.0838
${\hat{α}}_{12}$	0.0179	0.0411	${\hat{α}}_{12}$	0.0217	0.0092	${\hat{α}}_{12}$	0.0217	0.0215
${\hat{α}}_{21}$	0.0390	0.1447	${\hat{α}}_{21}$	0.1060	0.0550	${\hat{α}}_{21}$	0.1060	0.0719
${\hat{α}}_{22}$	0.1131	0.1236	${\hat{α}}_{22}$	0.5010	0.0295	${\hat{α}}_{22}$	0.1934	0.0551
${\hat{b}}_{11}$	0.0690	0.1460
${\hat{b}}_{12}$	0.0093	0.0559
${\hat{b}}_{21}$	0.1014	0.1809
${\hat{b}}_{22}$	0.1354	0.1414
${\hat{c}}_{1}$	1.0007	0.4372	${\hat{λ}}_{1}$	1.9814	0.0186	${\hat{λ}}_{1}$	1.5164	0.2561
${\hat{c}}_{2}$	3.3190	0.5478	${\hat{λ}}_{2}$	2.3137	0.0166	${\hat{λ}}_{2}$	4.5258	0.4493
$\hat{ϕ}$	0.5273	0.2628	$\hat{β}$	0.1374	0.9759	$\hat{ϕ}$	0.4044	0.2274
PRMS	0.0064			0.0245			0.0103
AIC	1315.4620			1387.8913			1350.9488
BIC	1348.1300			1408.6800			1371.7375
Log Lik	−646.7310			−686.9457			−668.4744

Table 8

Summary statistics for the monthly number of burglaries and robberies in beat 26.

Data	Mean	Variance	Minimum	Median	Maximum
Burglary	3.9306	9.7434	0	3	15
Robbery	3.0625	9.6394	0	2	17

Table 9

Estimates for the monthly number of burglaries and robberies in beat 26.

	EBINAR(1)			Full BINAR(1)-NB			Full BINAR(1)-BP
Para.	Estimate	SE	Para.	Estimate	SE	Para.	Estimate	SE
${\hat{α}}_{11}$	0.3117	0.0654	${\hat{α}}_{11}$	0.2314	0.3042	${\hat{α}}_{11}$	0.2765	0.0537
${\hat{α}}_{12}$	0.2086	0.0611	${\hat{α}}_{12}$	0.3172	0.2442	${\hat{α}}_{12}$	0.0927	0.0471
${\hat{α}}_{21}$	0.0900	0.0511	${\hat{α}}_{21}$	0.1099	0.2834	${\hat{α}}_{21}$	0.0001	0.0000
${\hat{α}}_{22}$	0.1906	0.1163	${\hat{α}}_{22}$	0.4361	0.2244	${\hat{α}}_{22}$	0.4249	0.0415
${\hat{b}}_{11}$	0.0671	0.0706
${\hat{b}}_{12}$	0.2280	0.0653
${\hat{b}}_{21}$	0.1233	0.0511
${\hat{b}}_{22}$	0.3358	0.1161
${\hat{c}}_{1}$	0.2043	0.2048	${\hat{λ}}_{1}$	2.2310	0.0026	${\hat{λ}}_{1}$	1.7652	0.2048
${\hat{c}}_{2}$	0.4139	0.1139	${\hat{λ}}_{2}$	1.1708	0.0076	${\hat{λ}}_{2}$	0.9604	0.1601
$\hat{ϕ}$	0.5599	0.1187	$\hat{β}$	0.4073	0.7189	$\hat{ϕ}$	0.7778	0.1494
PRMS	0.0087			0.0748			0.0992
AIC	1320.8092			1344.6968			1357.7718
BIC	1353.4771			1365.4855			1378.5604
Log Lik	−649.4046			−665.3484			−671.8859

References

1. Steutel, F.W.; van Harn, K. Discrete analogues of self-decomposability and stability. Ann. Probab.; 1979; 7, pp. 893-899. [DOI: https://dx.doi.org/10.1214/aop/1176994950]

2. Al-Osh, M.A.; Alzaid, A.A. First-order integer-valued autoregressive process. J. Time Ser. Anal.; 1987; 8, pp. 261-275. [DOI: https://dx.doi.org/10.1111/j.1467-9892.1987.tb00438.x]

3. McKenzie, E. Some simple models for discrete variate time series. Water Resour. Bull.; 1985; 21, pp. 645-650. [DOI: https://dx.doi.org/10.1111/j.1752-1688.1985.tb05379.x]

4. Du, J.; Li, Y. The integer valued autoregressive INAR(p) model. J. Time Ser. Anal.; 1991; 12, pp. 129-142.

5. Alzaid, A.A.; Omair, M.A. Poisson difference integer valued autoregressive model of order one. Bull. Malays. Math. Sci. Soc.; 2014; 37, pp. 465-485.

6. Chen, H.; Li, Q.; Zhu, F. Binomial AR(1) processes with innovational outliers. Commun. Stat. Theory Methods; 2021; 50, pp. 446-472. [DOI: https://dx.doi.org/10.1080/03610926.2019.1635704]

7. Weiß, C.H. Thinning operations for modeling time series of counts—A survey. Adv. Stat. Anal.; 2008; 92, pp. 319-341. [DOI: https://dx.doi.org/10.1007/s10182-008-0072-3]

8. Scotto, M.G.; Weiß, C.H.; Gouveia, S. Thinning-based models in the analysis of integer-valued time series: A review. Stat. Model.; 2015; 15, pp. 590-618. [DOI: https://dx.doi.org/10.1177/1471082X15584701]

9. Davis, R.A.; Fokianos, K.; Holan, S.H.; Joe, H.; Livsey, J.; Lund, R.; Pipiras, V.; Ravishanker, N. Count time series: A methodological review. J. Am. Stat. Assoc.; 2021; 116, pp. 1533-1547. [DOI: https://dx.doi.org/10.1080/01621459.2021.1904957]

10. Buckley, F.M.; Pollett, P.K. Limit theorems for discrete-time metapopulation models. Probab. Surv.; 2010; 7, pp. 53-83. [DOI: https://dx.doi.org/10.1214/10-PS158]

11. Weiß, C.H. A Poisson INAR(1) model with serially dependent innovations. Metrika; 2015; 78, pp. 829-851. [DOI: https://dx.doi.org/10.1007/s00184-015-0529-9]

12. Franke, J.; Rao, T.S. Multivariate First-Order Integer Valued Autoregressions; Technical Report Department of Mathematics, UMIST: Manchester, UK, 1995.

13. Latour, A. The multivariate GINAR(p) process. Adv. Appl. Probab.; 1997; 29, pp. 228-248. [DOI: https://dx.doi.org/10.2307/1427868]

14. Pedeli, X.; Karlis, D. A bivariate INAR(1) process with application. Stat. Model.; 2011; 11, pp. 325-349. [DOI: https://dx.doi.org/10.1177/1471082X1001100403]

15. Pedeli, X.; Karlis, D. On estimation of the bivariate Poisson INAR process. Commun. Stat. Simul. Comput.; 2013; 42, pp. 514-533. [DOI: https://dx.doi.org/10.1080/03610918.2011.639001]

16. Pedeli, X.; Karlis, D. Some properties of multivariate INAR(1) processes. Comput. Stat. Data Anal.; 2013; 67, pp. 213-225. [DOI: https://dx.doi.org/10.1016/j.csda.2013.05.019]

17. Ravishanker, N.; Serhiyenko, V.; Willig, M.R. Hierarchical dynamic models for multivariate times series of counts. Stat. Its Interface; 2014; 7, pp. 559-570. [DOI: https://dx.doi.org/10.4310/SII.2014.v7.n4.a11]

18. Popović, P.M. A bivariate INAR(1) model with different thinning parameters. Stat. Pap.; 2016; 57, pp. 517-538. [DOI: https://dx.doi.org/10.1007/s00362-015-0667-1]

19. Scotto, M.G.; Weiß, C.H.; Silva, M.E.; Pereira, I. Bivariate binomial autoregressive models. J. Multivar. Anal.; 2014; 125, pp. 233-251. [DOI: https://dx.doi.org/10.1016/j.jmva.2013.12.014]

20. Li, Q.; Chen, H.; Liu, X. A new bivariate random coefficient INAR(1) model with applications. Symmetry; 2022; 14, 39. [DOI: https://dx.doi.org/10.3390/sym14010039]

21. Kocherlakota, S.; Kocherlakota, K. Bivariate Discrete Distributions; Marcel Dekker: New York, NY, USA, 1992; pp. 87-97.

22. Heathcote, C.R. Corrections and comments on the paper “A branching process allowing immigration”. J. R. Stat. Soc. Ser. B; 1966; 28, pp. 213-217. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1966.tb00634.x]

23. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples; 3rd ed.; Springer: New York, NY, USA, 2011.

24. Silva, M.E.; Oliveira, V.L. Difference equations for the higher-order moments and cumulants of the INAR(1) model. J. Time Ser. Anal.; 2004; 25, pp. 317-333. [DOI: https://dx.doi.org/10.1111/j.1467-9892.2004.01685.x]

25. Amemiya, T. Advanced Econometrics; Harvard University Press: Cambridge, MA, USA, 1985; pp. 110-112.

26. Freeland, R.K.; McCabe, B.P.M. Analysis of low count time series data by Poisson autoregression. J. Time Ser. Anal.; 2004; 25, pp. 701-722. [DOI: https://dx.doi.org/10.1111/j.1467-9892.2004.01885.x]

27. Cohen, J.; Gorr, W.L. Development of Crime Forecasting and Mapping Systems for Use by Police; Inter-University Consortium for Political and Social Research: New York, NY, USA, 2005; [DOI: https://dx.doi.org/10.3886/ICPSR04545.v1]

28. Aknouche, A.; Bentarzi, W.; Demouche, N. On periodic ergodicity of a general periodic mixed Poisson autoregression. Stat. Probab. Lett.; 2018; 134, pp. 15-21. [DOI: https://dx.doi.org/10.1016/j.spl.2017.10.014]

29. Chen, C.W.S.; Khamthong, K. Bayesian modelling of nonlinear negative binomial integer-valued GARCHX models. Stat. Model.; 2020; 20, pp. 537-561. [DOI: https://dx.doi.org/10.1177/1471082X19845541]


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Recently, there has been growing interest in integer-valued time series models, especially multivariate ones. Motivated by the diversity of infinite-patch metapopulation models, we propose an extension of the popular bivariate INAR(1) model in which the innovation vector is time-dependent, in the sense that its mean increases linearly with the previous population size. We discuss the stationarity and ergodicity of the observed process and its subprocesses. We consider the conditional maximum likelihood estimators of the parameters of interest and establish their large-sample properties. The finite-sample performance of the estimators is assessed via simulations. Applications to crime data illustrate the model.

Details

Title

A New Bivariate INAR(1) Model with Time-Dependent Innovation Vectors

Author

Chen, Huaping¹; Zhu, Fukang²; Liu, Xiufang³

¹ School of Mathematics and Statistics, Henan University, Kaifeng 475004, China
² School of Mathematics, Jilin University, Changchun 130012, China
³ College of Mathematics, Taiyuan University of Technology, Taiyuan 030024, China

First page

819

Publication year

2022

Publication date

2022

Publisher

MDPI AG

e-ISSN

2571-905X

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/stats5030048

ProQuest document ID

2716602921
