A Multivariate Flexible Skew-Symmetric-Normal

1. Introduction

The multivariate normal (MN) distribution plays a central role in statistical modeling. However, there are situations where the data are not in agreement with the MN distribution. Departure from normality can take place in different ways, such as multimodality, lack of central symmetry, and positive or negative excess kurtosis. The class of scale mixtures of skew-normal (SMSN) distributions, whose general form was first introduced by Branco and Dey [1], includes many multivariate skew-symmetric (MSS) distributions with only one mode.

More formally, a scale mixture distribution can be constituted by mixing a base density over a scaling distribution. Its density can be expressed in the form of the integral given by

(1) $f (y) = \int_{0}^{\infty} g (y | κ (τ)) d H (τ; ν),$

where $g (y | κ (τ))$ is the conditional density of a $p \times 1$ random vector $Y$ given $κ (τ)$. Herein, $κ (\cdot)$ is a positive function of a scaling variable $τ$ with cumulative distribution function (cdf) $H (τ; ν)$, indexed by the parameter vector $ν$.

Using (1), the family of SMSN distributions can be generated by assuming a multivariate skew-normal (MSN) distribution [2] with location $ξ$ , scale covariance matrix $κ (τ) Σ$ , and shape parameter $λ$ for $g (y | κ (τ))$ . The marginal probability density function (pdf) of $Y$ can be obtained as follows:

(2) $\begin{matrix} f (y) = 2 \int_{0}^{\infty} ϕ_{p} (y; ξ, κ (τ) Σ) Φ (κ {(τ)}^{- 1 / 2} λ^{⊤} Σ^{- 1 / 2} (y - ξ)) d H (τ; ν), \end{matrix}$

where $ϕ_{p} (\cdot; ξ, Σ)$ denotes the pdf of the p-variate MN distribution with mean vector $ξ$ and covariance matrix $Σ$, and $Φ (\cdot)$ is the cdf of the standard normal distribution. Some simple scaling functions such as $κ (τ) = τ$ and $κ (τ) = 1 / τ$ lead to well-known distributions. As described by Branco and Dey [1], some remarkable examples are the multivariate skew-t (MST), multivariate skew-slash (MSSL), and multivariate skew-contaminated-normal (MSCN) distributions, to name just a few. The SMSN family collapses to the class of scale mixtures of normal distributions when the skewness parameter vanishes. The MSN distribution [2] with the parameterization given by Arellano-Valle and Genton [3] can be recovered when $H (τ; ν)$ is degenerate at $τ = 1$.

The class of multivariate skew-scale mixtures of normal (MSSMN) distributions [4] is obtained when $g (y | κ (τ))$ defined in (1) is given by

(3) $\begin{matrix} g (y | κ (τ) = τ^{- 1}) = 2 ϕ_{p} (y; ξ, τ^{- 1} Σ) Φ (λ^{⊤} Σ^{- 1 / 2} (y - ξ)) . \end{matrix}$

Therefore, the pdf of MSSMN distribution is obtained as follows:

(4) $\begin{matrix} f (y) = 2 Φ (λ^{⊤} Σ^{- 1 / 2} (y - ξ)) \int_{0}^{\infty} ϕ_{p} (y; ξ, τ^{- 1} Σ) d H (τ; ν) . \end{matrix}$

Recently, Arellano-Valle et al. [5] proposed a multivariate class of scale-shape mixtures of skew-normal (MSSMSN) distributions which provides alternative candidates for modeling asymmetric data. A convenient hierarchical representation of the MSSMSN distribution is given by

(5) $Y | τ = {(τ_{1}, τ_{2})}^{⊤} \sim S N_{p} (ξ, κ (τ_{1}) Σ, η (τ_{1}, τ_{2}) λ),$

where $τ = {(τ_{1}, τ_{2})}^{⊤}$ is a mixing vector with a joint cdf $H (τ_{1}, τ_{2}; ν)$, and $η (τ_{1}, τ_{2}) : (R^{+}, R^{+}) \to R$ is a real-valued shape mixing function which is not necessarily symmetric about zero. The family of MSSMSN distributions encapsulates several renowned unimodal asymmetric distributions generated by varying the scale and shape functions, $κ (τ_{1})$ and $η (τ_{1}, τ_{2})$, for a given distribution of $τ$, or alternatively by fixing $κ (τ_{1})$ and $η (τ_{1}, τ_{2})$ but varying the distribution of $τ$. A convenient setup for the mixing functions is to choose $κ (τ_{1}) = 1 / τ_{1}$ and $η (τ_{1}, τ_{2}) = \sqrt{τ_{2}}$, leading to a particular form of the shape mixture of SMSN distributions. Choosing $η (τ_{1}, τ_{2}) = {(τ_{1} / τ_{2})}^{- 1 / 2}$ instead produces another form of shape mixtures of MSSMN distributions. See [5] for a more comprehensive discussion and detailed investigation.

As shown by Azzalini and Capitanio [6], the pdf of the MSS distribution can be written as follows:

(6) $f (y; ξ, Σ) = 2 {| Σ |}^{- 1 / 2} f_{0} (Σ^{- 1 / 2} (y - ξ)) G_{0} (w \{Σ^{- 1 / 2} (y - ξ)\}),$

where $ξ$ and $Σ$ are, respectively, the location vector and scale covariance matrix; $f_{0} : R^{p} \to R^{+}$ is a p-variate centrally symmetric pdf with respect to the origin, i.e., $f_{0} (x) = f_{0} (- x)$; $G_{0} : R \to [0, 1]$ is a univariate cdf satisfying $G_{0} (- x) = 1 - G_{0} (x)$; and $w : R^{p} \to R$ is an odd real-valued function, namely $w \{- x\} = - w \{x\}$. The MSS class is equivalent to the one studied by Wang et al. [7] for which $G_{0} (w \{x\})$ is replaced by $π : R^{p} \to [0, 1]$ satisfying $π (- x) = 1 - π (x)$.

In light of (6), it is possible to generate a wide range of asymmetric families of unimodal and multimodal skew distributions depending on the specification of the function $w \{x\}$. This essential property induces greater flexibility in the available shapes. For example, Ma and Genton [8] proposed a flexible class of skew-symmetric distributions by choosing $w \{x\} = P_{K} (x)$, where $P_{K} (x)$ is an odd polynomial function of order K.

The Hadamard product, also known as the Schur product, is an entry-wise matrix multiplication that is commutative and simpler than the ordinary matrix product. See [9] for a comprehensive review and its applications to multivariate statistical analysis. The Hadamard product is advantageous in computations and algebraic manipulations because the products are entry-wise, the multiplication is commutative, and, in particular, the inverse and powers of a matrix are straightforward to compute. By convention, we use “⊙” to denote the Hadamard operations (product and power). Let $A = (a_{i j})$ and $B = (b_{i j})$ be two $p \times q$ matrices, not necessarily square. Then, the Hadamard product of these two matrices, denoted by $A ⊙ B$, is the element-wise product of $A$ and $B$, that is, a $p \times q$ matrix whose $(i, j)$ entry is $a_{i j} b_{i j}$. Accordingly, the nth Hadamard power of matrix $A$ is defined as $A^{⊙ n} = [a_{i j}^{n}]$. More key properties concerning the multiplication and partial derivatives of the Hadamard product are deferred to Appendix A.
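To make the entry-wise operations concrete, the following minimal Python sketch implements the Hadamard product and the nth Hadamard power for matrices stored as nested lists. The function names `hadamard_product` and `hadamard_power` and the example matrices are illustrative choices, not notation from the paper.

```python
from typing import List

Matrix = List[List[float]]


def hadamard_product(a: Matrix, b: Matrix) -> Matrix:
    """Entry-wise product of two matrices with identical dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hadamard_power(a: Matrix, n: int) -> Matrix:
    """n-th Hadamard power: every entry is raised to the integer power n."""
    return [[x ** n for x in row] for row in a]


# Two 2 x 3 matrices (illustrative values only).
A = [[1.0, -2.0, 3.0], [0.5, 4.0, -1.0]]
B = [[2.0, 2.0, 2.0], [4.0, 0.5, 3.0]]

product_ab = hadamard_product(A, B)   # entries a_ij * b_ij
cube_a = hadamard_power(A, 3)         # entries a_ij ** 3 (odd power keeps the sign)
```

Because an odd power preserves the sign of each entry, odd Hadamard powers of a centered vector remain odd functions of that vector, which is the property exploited by the skewing function introduced next.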

In the multivariate setup, the odd polynomial of order K has various combinations of the coefficients. For example, the bivariate case under an odd polynomial of order $K = 3$ is given by $P_{K} (x_{1}, x_{2}) = α_{1} x_{1} + α_{2} x_{2} + α_{3} x_{1}^{3} + α_{4} x_{2}^{3} + α_{5} x_{1}^{2} x_{2} + α_{6} x_{1} x_{2}^{2}$. As an alternative to that of Ma and Genton [8] in a more general setting, we introduce the multivariate flexible skew-symmetric-normal (MFSSN) distribution, denoted by $M F S S N_{p} (ξ, Σ, α)$, which has the following pdf:

(7) $\begin{matrix} f_{M F S S N} (y; ξ, Σ, α) = 2 ϕ_{p} (y; ξ, Σ) Φ (λ_{1}^{⊤} η_{1} + λ_{2}^{⊤} η_{3} + \dots + λ_{m}^{⊤} η_{2 m - 1}), \end{matrix}$

where $α = {(λ_{1}^{⊤}, \dots, λ_{m}^{⊤})}^{⊤}$ is a $p m \times 1$ multiple-scaled vector of shape parameters and $η_{2 k - 1} = {[Σ^{- 1 / 2} (y - ξ)]}^{⊙ 2 k - 1}$ remains a $p \times 1$ vector through an odd Hadamard power transformation of order $2 k - 1$, for $k = 1, \dots, m$. Notably, the MFSSN distribution encompasses the flexible generalized skew-normal (FGSN) distribution introduced by Ma and Genton [8] as the univariate case. Figure 1 illustrates the scatter-contour plots coupled with their marginal histograms of the bivariate MFSSN distribution under $ξ = 0$, $Σ = I_{2}$, and various specifications of shape parameters arising from two setups of m. As can be seen, many different non-elliptical distributional shapes with multiple modes and asymmetric patterns can be produced. Additional flexibility can be gained by thickening the tails. This motivates us to propose a more general class of scale-shape mixtures of MFSSN (SSMFSSN) distributions.
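As an illustration of (7), the sketch below evaluates the MFSSN density in the special case $Σ = I_{p}$, so that $Σ^{- 1 / 2} (y - ξ)$ reduces to $y - ξ$ and no matrix square root is needed. This restriction, the function name `mfssn_pdf_identity_scale`, and the numerical values are assumptions made for the example only.

```python
import math
from typing import Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal cdf Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mfssn_pdf_identity_scale(
    y: Sequence[float],
    xi: Sequence[float],
    lambdas: Sequence[Sequence[float]],
) -> float:
    """Density (7) under the simplifying assumption Sigma = I_p.

    lambdas[k - 1] holds the p-vector lambda_k, k = 1, ..., m.
    """
    z = [yi - xii for yi, xii in zip(y, xi)]          # Sigma^{-1/2} (y - xi) with Sigma = I_p
    p = len(z)
    phi_p = math.exp(-0.5 * sum(v * v for v in z)) / (2.0 * math.pi) ** (p / 2.0)
    skew_arg = 0.0
    for k, lam in enumerate(lambdas, start=1):
        order = 2 * k - 1                             # odd Hadamard power 2k - 1
        skew_arg += sum(l * v ** order for l, v in zip(lam, z))
    return 2.0 * phi_p * std_normal_cdf(skew_arg)


# Bivariate example with m = 2 (illustrative shape parameters).
lams = [[1.0, 2.0], [0.5, -1.0]]
origin = [0.0, 0.0]
point = [0.7, -1.2]
mirror = [-0.7, 1.2]
f_point = mfssn_pdf_identity_scale(point, origin, lams)
f_mirror = mfssn_pdf_identity_scale(mirror, origin, lams)
```

Since the argument of $Φ$ is an odd function of $y - ξ$, the values at a point and at its reflection through $ξ$ add up to twice the normal density there, which gives a quick sanity check of any implementation.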

The class of SSMFSSN distributions can be hierarchically represented by

(8) $\begin{matrix} Y ∣ τ = {(τ_{1}, τ_{2})}^{⊤} \sim M F S S N_{p} (ξ, τ_{1}^{- 1} Σ, ϑ), \end{matrix}$

where $ϑ = {(λ_{1}^{⊤} τ_{2}^{1 / 2} τ_{1}^{- 1 / 2}, \dots, λ_{m}^{⊤} τ_{2}^{1 / 2} τ_{1}^{- (2 m - 1) / 2})}^{⊤}$. Further, we denote the pdf of $τ = {(τ_{1}, τ_{2})}^{⊤}$ by $h_{τ} (τ_{1}, τ_{2}; ν)$. From (8), the marginal pdf of $Y$ is given by

(9) $\begin{matrix} f_{S S M F S S N} (y; ξ, Σ, α, ν) = 2 \int ϕ_{p} (y; ξ, τ_{1}^{- 1} Σ) Φ (τ_{2}^{1 / 2} \{λ_{1}^{⊤} η_{1} + λ_{2}^{⊤} η_{3} + \dots + λ_{m}^{⊤} η_{2 m - 1}\}) d H (τ; ν) . \end{matrix}$

Obviously, the MFSSN model is obtained by setting $τ_{1} = τ_{2} = 1$ in (9).

The family of SSMFSSN distributions introduced in (8) is quite vast, containing several subfamilies of asymmetric and multimodal distributions which have never been considered in the literature. Notice that the MSSMSN distribution described in (5) is a simple case of SSMFSSN by taking $κ (τ_{1}) = 1 / τ_{1}$ , $η (τ_{1}, τ_{2}) = {(τ_{1} / τ_{2})}^{- 1 / 2}$ , and $λ_{j} = 0$ , for $j = 2, \dots, m$ . More importantly, the scale-shape mixtures of flexible generalized skew-normal (SSMFGSN) distributions proposed very recently by Mahdavi et al. [10] can be thought of as a univariate case of SSMFSSN when the dimension $p = 1$ .

The EM algorithm [11] and some of its extraordinary variants such as the expectation conditional maximization (ECM) algorithm [12] and the expectation conditional maximization either (ECME) algorithm [13] are broadly applicable methods to carry out ML estimation for multivariate skew distributions in a complete-data framework. To the best of our knowledge, previous developments of the EM-type algorithm are based on the convolution-type representations, see, e.g., [14] for the MSN distribution, [15] for the MST distribution, [1] for the SMSN distribution, and [5] for the MSSMSN distribution. Since our proposed SSMFSSN model cannot be explicitly expressed by a convolution-type representation, we develop a novel EM-based procedure designed under the selection mechanism to compute the ML estimates.

The rest of the paper is organized as follows. Section 2 presents the formulation of the general SSMFSSN model and discusses how to deploy the ECME algorithm for ML estimation based on the selection-type mechanism. Section 3 exemplifies some particular cases of SSMFSSN distributions constructed by setting different mixing distributions for $τ$. The proposed techniques are illustrated by conducting two simulation studies in Section 4 and analyzing a real data example in Section 5. We conclude in Section 6 with a few remarks and offer directions for future research.

2. Methodology

2.1. The Family of SSMFSSN Distributions

A p-variate random vector $Y \sim S S M F S S N_{p} (ξ, Σ, α, ν)$ is said to follow the SSMFSSN distribution with location vector $ξ$, scale covariance matrix $Σ$, shape parameters $α = {(λ_{1}^{⊤}, \dots, λ_{m}^{⊤})}^{⊤}$, and flatness parameters $ν$ if it has the following selection-type representation:

(10) $\begin{matrix} Y \overset{d}{=} V ∣ (U > 0), \end{matrix}$

where $V = ξ + Σ^{1 / 2} τ_{1}^{- 1 / 2} Z_{1}$, $U = λ_{1}^{⊤} (τ_{1}^{- 1 / 2} Z_{1}) + \dots + λ_{m}^{⊤} (τ_{1}^{- (2 m - 1) / 2} Z_{1}^{⊙ 2 m - 1}) - τ_{2}^{- 1 / 2} Z_{2}$, and ${(Z_{1}^{⊤}, Z_{2})}^{⊤} \sim N_{p + 1} (0, I_{p + 1})$. Herein, ‘$\overset{d}{=}$’ stands for equality in distribution, and U is a continuous random variable symmetric about zero. Using this characterization, random samples from $S S M F S S N_{p} (0, I_{p}, α, ν)$ can be simulated through the following scheme:

(11) $\begin{matrix} X = \{\begin{matrix} τ_{1}^{- 1 / 2} Z_{1}, & \text{if } U > 0, \\ - τ_{1}^{- 1 / 2} Z_{1}, & \text{otherwise} . \end{matrix} \end{matrix}$

As a result, the random samples of the general $S S M F S S N_{p} (ξ, Σ, α, ν)$ can be obtained by the affine transformation $Y = ξ + Σ^{1 / 2} X$ .
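The sign-flip scheme (11) can be sketched in a few lines of Python. The code below draws from the standardized $S S M F S S N_{p} (0, I_{p}, α, ν)$ law only; the affine step $Y = ξ + Σ^{1 / 2} X$ is omitted because it needs a matrix square root. The function names, the seed, and the choice of a gamma mixing law with $τ_{2} = 1$ are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

TauSampler = Callable[[random.Random], Tuple[float, float]]


def sample_standard_ssmfssn(
    lambdas: Sequence[Sequence[float]],
    draw_tau: TauSampler,
    rng: random.Random,
) -> List[float]:
    """One draw X from SSMFSSN_p(0, I_p, alpha, nu) via the sign-flip scheme (11)."""
    p = len(lambdas[0])
    tau1, tau2 = draw_tau(rng)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(p)]
    z2 = rng.gauss(0.0, 1.0)
    x = [z / tau1 ** 0.5 for z in z1]                 # tau1^{-1/2} Z1
    # U = sum_k lambda_k^T (tau1^{-1/2} Z1)^{odd power 2k-1} - tau2^{-1/2} Z2
    u = -z2 / tau2 ** 0.5
    for k, lam in enumerate(lambdas, start=1):
        u += sum(l * v ** (2 * k - 1) for l, v in zip(lam, x))
    return x if u > 0 else [-v for v in x]


def gamma_tau_sampler(nu: float) -> TauSampler:
    """Mixing law with tau1 ~ Gamma(nu/2, rate nu/2) and tau2 = 1 (illustrative choice)."""
    def draw(rng: random.Random) -> Tuple[float, float]:
        # random.gammavariate takes (shape, scale); scale = 1 / rate = 2 / nu
        return rng.gammavariate(nu / 2.0, 2.0 / nu), 1.0
    return draw


rng = random.Random(2024)                              # fixed seed for reproducibility
lambdas = [[5.0], [0.0]]                               # univariate, m = 2, strong positive skew
draws = [sample_standard_ssmfssn(lambdas, gamma_tau_sampler(8.0), rng) for _ in range(2000)]
sample_mean = sum(d[0] for d in draws) / len(draws)
```

With a large positive first shape parameter, almost all draws end up on the positive side, so the sample mean is clearly positive even though the underlying symmetric law is centered at zero.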

For fitting the SSMFSSN model (10) within the complete-data framework via the EM-type algorithm, we introduce two latent variables $W \overset{d}{=} U ∣ (U > 0)$ and $γ = {(γ_{1}, γ_{2})}^{⊤} \overset{d}{=} {(τ_{1}, τ_{2})}^{⊤} ∣ (U > 0)$ . Then, ${(Y^{⊤}, W, γ^{⊤})}^{⊤} \overset{d}{=} {(V^{⊤}, U, τ^{⊤})}^{⊤} | (U > 0)$ has the following joint pdf:

(12) $\begin{matrix} f_{Y, W, γ} (y, W, γ) & = & \frac{1}{Pr (U > 0)} f_{V, U, τ} (y, W, γ) \\ & = & 2 f_{τ} (γ) f_{V | τ} (y) f_{U | V, τ} (W) \\ & = & 2 γ_{2}^{1 / 2} h_{τ} (γ_{1}, γ_{2}; ν) ϕ_{p} (y; ξ, γ_{1}^{- 1} Σ) ϕ (γ_{2}^{1 / 2} \{W - ζ\}), \end{matrix}$

where $ζ = λ_{1}^{⊤} η_{1} + \dots + λ_{m}^{⊤} η_{2 m - 1}$ and $h_{τ} (γ_{1}, γ_{2}; ν)$ is the pdf of $τ = {(τ_{1}, τ_{2})}^{⊤}$ evaluated at the point $γ = {(γ_{1}, γ_{2})}^{⊤}$.

Integrating out W from (12) gives the following joint pdf

(13) $\begin{matrix} f_{Y, γ} (y, γ_{1}, γ_{2}) & = & 2 h_{τ} (γ_{1}, γ_{2}; ν) ϕ_{p} (y; ξ, γ_{1}^{- 1} Σ) Φ (γ_{2}^{1 / 2} ζ) . \end{matrix}$

Therefore, the marginal pdf of $Y$ is given by

(14) $\begin{matrix} f_{Y} (y) & = & \int_{0}^{\infty} \int_{0}^{\infty} f_{Y, γ} (y, γ_{1}, γ_{2}) d γ_{1} d γ_{2} \\ & \overset{τ_{1} ⊥ τ_{2}}{=} & 2 \int ϕ_{p} (y; ξ, γ_{1}^{- 1} Σ) d H_{τ_{1}} (γ_{1}; ν_{1}) \int Φ (γ_{2}^{1 / 2} ζ) d H_{τ_{2}} (γ_{2}; ν_{2}), \end{matrix}$

where the second equality holds if we further assume that $τ_{1}$ and $τ_{2}$ are mutually independent. It is noteworthy that the shape mixtures of MSSMN distribution established by Arellano-Valle et al. [5] also belongs to the family of our proposed SSMFSSN model by imposing $λ_{2} = \dots = λ_{m} = 0$.

Dividing (12) by (13) yields the following relation

(15) $\begin{matrix} f (W ∣ y, γ_{1}, γ_{2}) & = & \frac{γ_{2}^{1 / 2} ϕ (γ_{2}^{1 / 2} (W - ζ))}{Φ (γ_{2}^{1 / 2} ζ)} \equiv f (W ∣ y, γ_{2}), \end{matrix}$

for which ‘≡’ in (15) means that W and $γ_{1}$ are conditionally independent given $y$ and $γ_{2}$. Moreover, it is straightforward to show that

(16) $W ∣ {(y^{⊤}, γ_{2})}^{⊤} \sim T N (ζ, γ_{2}^{- 1}) I_{(0, \infty)},$

where $T N (μ, σ^{2}) I_{A}$ represents a doubly truncated normal distribution confined within the interval $A = \{a_{1} < x < a_{2}\}$, and $I_{A}$ is an indicator function of set $A$. Using Lemma 2 of Lin et al. [16], we have the following conditional expectation:

(17) $E (W ∣ y, γ_{2}) = ζ + γ_{2}^{- 1 / 2} \frac{ϕ (γ_{2}^{1 / 2} ζ)}{Φ (γ_{2}^{1 / 2} ζ)} .$
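The conditional mean in (17) is the mean of a normal distribution with location $ζ$ and variance $γ_{2}^{- 1}$ truncated to the positive half-line. A minimal Python sketch follows; the function name `cond_mean_w` is hypothetical, and the cdf is written with the complementary error function because that form loses less precision in the lower tail.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    """Standard normal cdf Phi(x) via erfc, which is more accurate for negative x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def cond_mean_w(zeta: float, gamma2: float) -> float:
    """E(W | y, gamma2) from (17): mean of N(zeta, 1/gamma2) truncated to (0, infinity)."""
    a = math.sqrt(gamma2) * zeta
    return zeta + std_normal_pdf(a) / (math.sqrt(gamma2) * std_normal_cdf(a))


half_normal_mean = cond_mean_w(0.0, 1.0)      # zeta = 0 gives the half-normal mean sqrt(2 / pi)
negative_location = cond_mean_w(-3.0, 2.0)    # still positive, since W lives on (0, infinity)
large_location = cond_mean_w(10.0, 1.0)       # truncation is negligible, so close to zeta
```

For extremely negative values of $γ_{2}^{1 / 2} ζ$ the denominator underflows to zero; a production implementation would need an asymptotic expansion of the ratio in that regime, which this sketch does not attempt.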

By Bayes’ rule, the conditional pdf of $γ = {(γ_{1}, γ_{2})}^{⊤}$ given $Y = y$ is

(18) $\begin{matrix} f (γ_{1}, γ_{2} ∣ y) = \frac{2 h_{τ} (γ_{1}, γ_{2}; ν) ϕ_{p} (y; ξ, γ_{1}^{- 1} Σ) Φ (γ_{2}^{1 / 2} ζ)}{f_{Y} (y)} . \end{matrix}$

2.2. Parameter Estimation via the ECME Algorithm

The EM algorithm [11] is a widely used iterative technique to deal with ML estimation in models that involve incomplete data or latent variables. One primary virtue of EM lies in its attractive monotone convergence properties together with its simplicity and stability. In practice, a major limitation of EM is that some estimators in the M-step often cannot be obtained in closed form. To overcome this obstacle, the ECM algorithm proposed by Meng and Rubin [12] replaces the M-step of EM with a sequence of simpler conditional maximization (CM) steps, yet it enjoys the same convergence properties as EM. However, in certain problems, some of the CM-steps of ECM may become analytically intractable or suffer from slow convergence. As a further flexible extension, the ECME algorithm [13] allows each CM-step to maximize either the Q-function, called the CMQ-step, or the corresponding constrained actual log-likelihood function, called the CML-step. In what follows, we describe in greater detail how the proposed SSMFSSN model can be fitted by using the ECME algorithm.

Suppose that $y = {(y_{1}^{⊤}, \dots, y_{n}^{⊤})}^{⊤}$ constitutes a set of p-dimensional observed samples of size n arising from the SSMFSSN model. Under the EM framework, the latent variables $W = {(W_{1}, \dots, W_{n})}^{⊤}$ and $γ = {(γ_{1}^{⊤}, \dots, γ_{n}^{⊤})}^{⊤}$ introduced in the preceding subsection are treated as missing data. Then, the complete data are given by $y_{c} = {(y^{⊤}, W^{⊤}, γ^{⊤})}^{⊤}$ . Further, we let $θ = {(ξ^{⊤}, vech {(Σ)}^{⊤}, α^{⊤}, ν^{⊤})}^{⊤}$ denote the entire unknown parameters to be estimated in the SSMFSSN model, where $vech (\cdot)$ is the half-vectorization operator that stacks the lower triangular elements of a $p \times p$ symmetric matrix into a single $p (p + 1) / 2$ vector.

According to (12), the log-likelihood function of $θ$ corresponding to the complete-data $y_{c}$ , excluding additive constants and terms that do not involve parameters of the model, is given by

(19) $\begin{matrix} ℓ_{c} (θ ∣ y_{c}) & = & - \frac{n}{2} ln | Σ | - \frac{1}{2} \sum_{i = 1}^{n} \{γ_{1 i} {(y_{i} - ξ)}^{⊤} Σ^{- 1} (y_{i} - ξ) + γ_{2 i} {(W_{i} - ζ_{i})}^{2} - 2 ln h (γ_{1 i}, γ_{2 i}; ν)\}, \end{matrix}$

with

$\begin{matrix} ζ_{i} & = & λ_{1}^{⊤} η_{i, 1} + λ_{2}^{⊤} η_{i, 3} + \dots + λ_{m}^{⊤} η_{i, 2 m - 1} \\ & = & λ_{1}^{⊤} {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 1} + λ_{2}^{⊤} {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 3} + \dots + λ_{m}^{⊤} {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 2 m - 1} \\ & = & 1_{p}^{⊤} Λ_{1} {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 1} + 1_{p}^{⊤} Λ_{2} {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 3} + \dots + 1_{p}^{⊤} Λ_{m} {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 2 m - 1} \\ & = & 1_{p}^{⊤} {[Δ_{1} (y_{i} - ξ)]}^{⊙ 1} + 1_{p}^{⊤} {[Δ_{2} (y_{i} - ξ)]}^{⊙ 3} + \dots + 1_{p}^{⊤} {[Δ_{m} (y_{i} - ξ)]}^{⊙ 2 m - 1} \\ & = & 1_{p}^{⊤} \sum_{j = 1}^{m} {[Δ_{j} (y_{i} - ξ)]}^{⊙ 2 j - 1}, \end{matrix}$

where $1_{p}$ is a $p \times 1$ vector of ones, $η_{i, 2 j - 1} = {[Σ^{- 1 / 2} (y_{i} - ξ)]}^{⊙ 2 j - 1}$, $Λ_{j} = Diag \{λ_{j}\}$ is a $p \times p$ diagonal matrix containing the elements of $λ_{j}$ on the main diagonal, and $Δ_{j} = Λ_{j}^{1 / (2 j - 1)} Σ^{- 1 / 2}$ is a $p \times p$ reparameterized matrix.
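The reparameterization through $Δ_{j}$ relies on taking real odd roots of the entries of $λ_{j}$, which may be negative. The sketch below checks the identity $λ_{j}^{⊤} x^{⊙ 2 j - 1} = 1_{p}^{⊤} {[Λ_{j}^{1 / (2 j - 1)} x]}^{⊙ 2 j - 1}$ numerically, under the simplifying assumption $Σ = I_{p}$ so that $Δ_{j}$ is diagonal and $x = y_{i} - ξ$. All function names are hypothetical. Note that in Python a negative float raised to a fractional power yields a complex number, hence the explicit signed root.

```python
import math
from typing import List, Sequence


def signed_odd_root(value: float, order: int) -> float:
    """Real root value^(1/order) for odd order; valid for negative values too."""
    if order % 2 == 0:
        raise ValueError("order must be odd")
    return math.copysign(abs(value) ** (1.0 / order), value)


def zeta_direct(lambdas: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """zeta = sum_j lambda_j^T x^{odd power 2j-1} (x stands for y - xi with Sigma = I_p)."""
    return sum(
        l * v ** (2 * j - 1)
        for j, lam in enumerate(lambdas, start=1)
        for l, v in zip(lam, x)
    )


def zeta_reparam(lambdas: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Same quantity written as 1_p^T sum_j [Delta_j x]^{odd power 2j-1} with diagonal Delta_j."""
    total = 0.0
    for j, lam in enumerate(lambdas, start=1):
        order = 2 * j - 1
        delta_diag = [signed_odd_root(l, order) for l in lam]
        total += sum((d * v) ** order for d, v in zip(delta_diag, x))
    return total


def lambda_from_delta_diag(delta_diag: Sequence[float], order: int) -> List[float]:
    """Back-transformation: odd Hadamard power of the diagonal of Delta_j Sigma^{1/2}."""
    return [d ** order for d in delta_diag]


lambdas = [[1.5, -0.5], [-8.0, 27.0]]   # lambda_1 and lambda_2 for p = 2, m = 2
x = [0.5, -1.5]
direct_value = zeta_direct(lambdas, x)
reparam_value = zeta_reparam(lambdas, x)
```

Both evaluations agree up to rounding error, and raising the diagonal of $Δ_{j} Σ^{1 / 2}$ to the power $2 j - 1$ recovers $λ_{j}$, which is the back-transformation used after the update of $Δ_{j}$.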

On the kth iteration, the E-step requires the calculation of the so-called Q-function, which is the conditional expectation of (19) given the observed data $y$ and the current estimate ${\hat{θ}}^{(k)}$, where the superscript $^{(k)}$ denotes the updated estimates at iteration k. To evaluate the Q-function, we require the following conditional expectations:

(20) $\begin{matrix} {\hat{s}}_{1 i}^{(k)} = E (γ_{1 i} ∣ y_{i}, {\hat{θ}}^{(k)}), {\hat{s}}_{2 i}^{(k)} = E (γ_{2 i} ∣ y_{i}, {\hat{θ}}^{(k)}), {\hat{s}}_{3 i}^{(k)} = E (W_{i} γ_{2 i} ∣ y_{i}, {\hat{θ}}^{(k)}), \end{matrix}$

which have explicit expressions that are discussed in detail in a subsequent section along with

(21) ${\hat{s}}_{4 i}^{(k)} (ν) = E (ln h (γ_{1 i}, γ_{2 i}; ν) ∣ y_{i}, {\hat{θ}}^{(k)}),$

which may not have standard form for some subfamilies. Substituting (20) and (21) into (19) yields the following Q-function:

(22) $\begin{matrix} Q (θ ∣ {\hat{θ}}^{(k)}) & = & - \frac{n}{2} ln | Σ | - \frac{1}{2} \sum_{i = 1}^{n} \{{\hat{s}}_{1 i}^{(k)} {(y_{i} - ξ)}^{⊤} Σ^{- 1} (y_{i} - ξ) + {\hat{s}}_{2 i}^{(k)} ζ_{i}^{2} - 2 {\hat{s}}_{3 i}^{(k)} ζ_{i} - 2 {\hat{s}}_{4 i}^{(k)} (ν)\} . \end{matrix}$

The CM-steps are implemented to update estimates of $θ$ in the order of $ξ$ , $Σ$ , $α$ and $ν$ by maximizing, one by one, the Q-function obtained in the E-step. After some algebraic manipulations, they are summarized by the following CMQ and CML steps:

CMQ-Step 1: Fixing $Σ = {\hat{Σ}}^{(k)}$ and $α = {\hat{α}}^{(k)}$, we update ${\hat{ξ}}^{(k)}$ via Proposition A2 by taking the partial derivative of (22) with respect to $ξ$. Since the maximizer has no closed-form expression, ${\hat{ξ}}^{(k + 1)}$ is obtained by numerically solving for the root of the following equation:
(23) $\begin{matrix} \sum_{i = 1}^{n} \{{\hat{s}}_{1 i}^{(k)} {\hat{Σ}}^{{(k)}^{- 1}} (y_{i} - ξ) + {\hat{s}}_{2 i}^{(k)} ζ_{i}^{(k)} {\hat{a}}_{i} - {\hat{s}}_{3 i}^{(k)} {\hat{a}}_{i}\} = 0, \end{matrix}$
where the two terms $ζ_{i}^{(k)} = 1_{p}^{⊤} \sum_{j = 1}^{m} {[{\hat{Δ}}_{j}^{(k)} (y_{i} - ξ)]}^{⊙ 2 j - 1}$ and ${\hat{a}}_{i} = \sum_{j = 1}^{m} (2 j - 1) {\hat{Δ}}_{j}^{{(k)}^{⊤}} {[{\hat{Δ}}_{j}^{(k)} (y_{i} - ξ)]}^{⊙ 2 j - 2}$ are nonlinear functions of $ξ$ with ${\hat{Δ}}_{j}^{(k)} = Diag {\{{\hat{λ}}_{j}^{(k)}\}}^{1 / (2 j - 1)} {\hat{Σ}}^{{(k)}^{- 1 / 2}}$.
CMQ-Step 2: Fixing $ξ = {\hat{ξ}}^{(k + 1)}$ and then updating ${\hat{Σ}}^{(k)}$ by maximizing (22) over $Σ$ gives
(24) ${\hat{Σ}}^{(k + 1)} = \frac{1}{n} \sum_{i = 1}^{n} {\hat{s}}_{1 i}^{(k)} (y_{i} - {\hat{ξ}}^{(k + 1)}) {(y_{i} - {\hat{ξ}}^{(k + 1)})}^{⊤} .$
CMQ-Step 3: Fixing $ξ = {\hat{ξ}}^{(k + 1)}$, we update ${\hat{Δ}}_{j}^{(k)}$ via Proposition A3 by taking the partial derivative of (22) with respect to each $Δ_{j}$, $j = 1, \dots, m$. Since the resulting equations cannot be solved in closed form, ${\hat{Δ}}_{j}^{(k + 1)}$ is found as the numerical root of the following equation:
(25) $\begin{matrix} \sum_{i = 1}^{n} ({\hat{s}}_{3 i}^{(k)} - {\hat{s}}_{2 i}^{(k)} ζ_{i}^{(k + 1)} (Δ_{j})) {(Δ_{j} (y_{i} - {\hat{ξ}}^{(k + 1)}))}^{⊙ 2 j - 2} {(y_{i} - {\hat{ξ}}^{(k + 1)})}^{⊤} = 0, \end{matrix}$
where $ζ_{i}^{(k + 1)} (Δ_{j}) = 1_{p}^{⊤} \sum_{l = 1}^{m} {[Δ_{l} (y_{i} - {\hat{ξ}}^{(k + 1)})]}^{⊙ 2 l - 1}$ is a nonlinear function of $Δ_{j}$. After simplification, we can transform ${\hat{Δ}}_{j}^{(k + 1)}$ back to
(26) ${\hat{λ}}_{j}^{(k + 1)} = {({\hat{Δ}}_{j}^{(k + 1)} {\hat{Σ}}^{{(k + 1)}^{1 / 2}})}^{⊙ 2 j - 1} 1_{p}, j = 1, \dots, m .$

Collecting the above solutions gives ${\hat{α}}^{(k + 1)} = {({\hat{λ}}_{1}^{{(k + 1)}^{⊤}}, \dots, {\hat{λ}}_{m}^{{(k + 1)}^{⊤}})}^{⊤}$.

For some members of SSMFSSN, the calculation of ${\hat{s}}_{4 i}^{(k)} (ν)$ is not straightforward. An update of ${\hat{ν}}^{(k)}$ can be achieved by directly maximizing the constrained actual log-likelihood function. This gives rise to the following CML-step:

CML-Step: ${\hat{ν}}^{(k)}$ is updated by maximizing the following constrained log-likelihood function:
(27) ${\hat{ν}}^{(k + 1)} = arg max_{ν} \sum_{i = 1}^{n} ln f_{S S M F S S N} (y_{i}; {\hat{ξ}}^{(k + 1)}, {\hat{Σ}}^{(k + 1)}, {\hat{α}}^{(k + 1)}, ν) .$

Note that the maximization in the above CML-step requires a numerical search of the objective function (constrained log-likelihood) over $ν$, which can be easily accomplished by using, for example, the optim routine in R [17]. The above steps are alternately repeated until a suitable convergence rule is satisfied, e.g., the relative difference $| ℓ ({\hat{θ}}^{(k + 1)}) / ℓ ({\hat{θ}}^{(k)}) - 1 |$ is sufficiently small, say less than $10^{- 6}$ as used in our numerical experiments, where $ℓ (θ) = \sum_{i = 1}^{n} ln f_{S S M F S S N} (y_{i}; θ)$. To prevent an infinite loop under this criterion, the maximum number of iterations is set to 5000.
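The stopping rule described above can be sketched as a generic driver loop. The code below is not the ECME algorithm itself: the `update` and `loglik` callables stand in for one full E/CM sweep and for the observed log-likelihood, and the toy problem at the end is purely illustrative. The function name and signature are assumptions.

```python
from typing import Callable, Tuple, TypeVar

Theta = TypeVar("Theta")


def iterate_until_converged(
    theta0: Theta,
    update: Callable[[Theta], Theta],
    loglik: Callable[[Theta], float],
    tol: float = 1e-6,
    max_iter: int = 5000,
) -> Tuple[Theta, int]:
    """Repeat `update` until |l(theta_new) / l(theta_old) - 1| < tol or max_iter is reached.

    Assumes the log-likelihood is never exactly zero, otherwise the ratio is undefined.
    """
    theta = theta0
    ll_old = loglik(theta)
    for iteration in range(1, max_iter + 1):
        theta = update(theta)
        ll_new = loglik(theta)
        if abs(ll_new / ll_old - 1.0) < tol:
            return theta, iteration
        ll_old = ll_new
    return theta, max_iter


# Toy stand-ins: the "update" halves the distance to 3 and the "log-likelihood"
# is a concave function maximized at 3 (both hypothetical, for illustration only).
def toy_update(theta: float) -> float:
    return theta + 0.5 * (3.0 - theta)


def toy_loglik(theta: float) -> float:
    return -((theta - 3.0) ** 2) - 1.0


theta_hat, n_iter = iterate_until_converged(0.0, toy_update, toy_loglik)
```

A relative criterion of this kind is scale-free in the log-likelihood but can stop early when the log-likelihood is very large in absolute value, so the tolerance may need tightening for large samples.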

To start the algorithm, the location vector ${\hat{ξ}}^{(0)}$ and the scale covariance matrix ${\hat{Σ}}^{(0)}$ are initialized as the sample mean vector and sample covariance matrix, respectively. The initial values for the shape parameters ${\hat{λ}}_{1}^{(0)}$ are taken as the sample skewness of the p variables, while the remaining ${\hat{λ}}_{2}^{(0)}, \dots, {\hat{λ}}_{m}^{(0)}$ are fixed around 0. As for ${\hat{ν}}^{(0)} = {({\hat{ν}}_{1}^{(0)}, {\hat{ν}}_{2}^{(0)})}^{⊤}$, the initial values are chosen to be relatively small, depending on the parameter domain. To avoid getting stuck in one of the many local maxima of the likelihood function, a convenient method is to try a variety of different initial values with perturbations or to use the bootstrap resampling method [14]. The solution with the highest log-likelihood value is treated as the ML estimate, denoted by $\hat{θ} = {({\hat{ξ}}^{⊤}, vech {(\hat{Σ})}^{⊤}, {\hat{α}}^{⊤}, {\hat{ν}}^{⊤})}^{⊤}$.

3. Examples of SSMFSSN Distributions

We present some special cases of SSMFSSN distributions which are induced by setting different mixing distributions for $τ$ . For each case, additional conditional expectations are also derived for the implementation of ECME.

3.1. The Multivariate Flexible Skew-Symmetric-t-Normal Distribution

The multivariate flexible skew-symmetric-t-normal (MFSSTN) distribution, denoted by $Y$ ∼ $M F S S T N_{p} (ξ, Σ, α, ν)$ , is produced by taking $τ_{1} \sim Γ (ν / 2, ν / 2)$ and $τ_{2} = 1$ in (10). In this case, $h_{τ} (γ_{1}, γ_{2}; ν) = g (γ_{1}; ν / 2, ν / 2)$ , where $g (\cdot; α, β)$ represents the pdf of the gamma distribution with mean $α / β$ . The pdf of the MFSSTN distribution can be expressed as

(28) $f_{M F S S T N} (y; ξ, Σ, α, ν) = 2 t_{p} (y; ξ, Σ, ν) Φ (ζ),$

where $t_{p} (\cdot; ξ, Σ, ν)$ stands for the pdf of the p-dimensional multivariate t distribution with location vector $ξ$, scale covariance matrix $Σ$, and degrees of freedom (DOF) $ν$. One special case of the MFSSTN distribution is the multivariate skew-t-normal (MSTN) distribution of Lin et al. [18], obtained by letting $λ_{j} = 0$ for $j = 2, \dots, m$. In addition, the MFSSTN distribution reduces to MFSSN as $ν \to \infty$.

According to (18), it is easy to observe that $γ_{1} | Y = y$ ∼ $Γ ((ν + p) / 2, (ν + δ^{2}) / 2)$ , where $δ^{2} = {(y - ξ)}^{⊤} Σ^{- 1} (y - ξ)$ denotes the squared Mahalanobis distance. Therefore,

(29) $E (γ_{1} ∣ Y = y) = \frac{ν + p}{ν + δ^{2}} \quad \text{and} \quad E (ln γ_{1} ∣ Y = y) = ψ (\frac{ν + p}{2}) - ln (\frac{ν + δ^{2}}{2}),$

where $ψ (x) = Γ^{'} (x) / Γ (x)$ is the digamma function.

From (20) and (21), the E-step involves the calculation of ${\hat{s}}_{1 i}^{(k)} = E (γ_{1 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , ${\hat{s}}_{2 i}^{(k)} = 1$ , ${\hat{s}}_{3 i}^{(k)} = E (W_{i} ∣ y_{i}, {\hat{θ}}^{(k)})$ and ${\hat{s}}_{4 i}^{(k)} = E (ln γ_{1 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , which can be easily evaluated via the results of (17) and (29). As an alternative way of estimating $ν$ , the CML-Step for the MFSSTN distribution can be altered to the following CMQ-Step.

CMQ-Step 4: ${\hat{ν}}^{(k + 1)}$ is obtained by solving the root of the following equation:
(30) $1 + ln (\frac{ν}{2}) - ψ (\frac{ν}{2}) + \frac{1}{n} \sum_{i = 1}^{n} ({\hat{s}}_{4 i}^{(k)} - {\hat{s}}_{1 i}^{(k)}) = 0 .$
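A minimal sketch of how (29) and (30) might be evaluated with the Python standard library is given below. The digamma function is approximated by the usual recurrence plus asymptotic series, the root of (30) is located by bisection, and the squared Mahalanobis distance $δ^{2}$ is taken as an input. The function names, the search bracket, and the bisection solver are assumptions of this sketch, not choices stated in the paper.

```python
import math
from typing import Sequence, Tuple


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x            # psi(x) = psi(x + 1) - 1 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def mfsstn_weights(delta2: float, p: int, nu: float) -> Tuple[float, float]:
    """Conditional expectations (29): returns (E(gamma1 | y), E(ln gamma1 | y))."""
    s1 = (nu + p) / (nu + delta2)
    s4 = digamma((nu + p) / 2.0) - math.log((nu + delta2) / 2.0)
    return s1, s4


def solve_nu(s1: Sequence[float], s4: Sequence[float],
             lower: float = 1e-2, upper: float = 1e3, iters: int = 200) -> float:
    """Bisection root of (30); the left-hand side is decreasing in nu."""
    c = sum(b - a for a, b in zip(s1, s4)) / len(s1)

    def lhs(nu: float) -> float:
        return 1.0 + math.log(nu / 2.0) - digamma(nu / 2.0) + c

    if lhs(upper) > 0.0:             # no sign change: the data favor a very large nu
        return upper
    if lhs(lower) < 0.0:             # no sign change at the lower end of the bracket
        return lower
    lo, hi = lower, upper
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Construct inputs whose root is known to be nu = 5 (synthetic check, not real data).
c_target = -(1.0 + math.log(2.5) - digamma(2.5))
nu_root = solve_nu([1.0], [1.0 + c_target])
s1_example, s4_example = mfsstn_weights(2.0, 2, 4.0)
```

Returning the upper bracket when no sign change is found mimics the limiting MFSSN case $ν \to \infty$; the bracket endpoints themselves are arbitrary and would be tuned in practice.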

3.2. The Multivariate Flexible Skew-Symmetric-Slash-Normal Distribution

Let $Y$ be a p-dimensional random vector with the following representation

(31) $Y = ξ + τ^{- 1 / 2} Σ^{1 / 2} Z,$

where $Z \sim N_{p} (0, I_{p})$ is independent of $τ \overset{d}{=} U^{1 / ν} \sim B e t a (ν, 1)$, with U denoting a uniform random variable on the interval $(0, 1)$. From (31), the conditional distribution of $Y$ given $τ$ is $N_{p} (ξ, τ^{- 1} Σ)$. Then, $Y$ has a multivariate slash distribution with pdf given by

(32) $\begin{matrix} f_{S L} (y; ξ, Σ, ν) & = & ν \int_{0}^{1} τ^{ν - 1} ϕ_{p} (y; ξ, τ^{- 1} Σ) d τ \\ = & \{\begin{matrix} \frac{2^{ν} ν {| Σ |}^{- 1 / 2} Γ (ν + p / 2) G (δ^{2} / 2; ν + p / 2)}{π^{p / 2} δ^{2 ν + p}}, y \neq ξ \\ \frac{ν}{(ν + p / 2) {(2 π)}^{p / 2}} {| Σ |}^{- 1 / 2}, y = ξ \end{matrix} \end{matrix}$

where $G (\cdot; r)$ denotes the cdf of the gamma distribution with scale parameter 1 and shape parameter r. Using the law of iterated expectations, the mean vector and variance-covariance matrix of $Y$ are

(33) $E (Y) = ξ \quad \text{and} \quad cov (Y) = \frac{ν}{ν - 1} Σ, \quad ν > 1 .$

If we select $τ_{1}$ ∼ $B e t a (ν, 1)$ and $τ_{2} = 1$ for (10), this generates the multivariate flexible skew-symmetric-slash-normal (MFSSSLN) distribution, denoted by $Y$ ∼ $M F S S S L N_{p} (ξ, Σ, α, ν)$ , with the pdf taking the form of

(34) $\begin{matrix} f_{M F S S S L N} (y; ξ, Σ, α, ν) = 2 f_{S L} (y; ξ, Σ, ν) Φ (ζ) . \end{matrix}$

Note that the MFSSSLN distribution contains the MFSSN distribution as a limiting case for $ν \to \infty$ and encloses the multivariate skew-slash distribution considered by Wang and Genton [19] as a reduced case when $λ_{2} = \dots = λ_{m} = 0$ .

According to (18), the conditional distribution $γ_{1} ∣ y$ is given by

(35) $\begin{matrix} f (γ_{1} ∣ y) = \{\begin{matrix} \frac{{| δ |}^{2 ν + p} γ_{1}^{ν + p / 2 - 1} exp (- γ_{1} δ^{2} / 2)}{2^{ν + p / 2} Γ (ν + p / 2) G (δ^{2} / 2; ν + p / 2)}, y \neq ξ \\ (ν + \frac{p}{2}) γ_{1}^{ν + p / 2 - 1}, y = ξ . \end{matrix} \end{matrix}$

In addition, some algebraic manipulations yield

(36) $\begin{matrix} E (γ_{1} ∣ Y = y) = \{\begin{matrix} (\frac{2 ν + p}{δ^{2}}) \frac{G (δ^{2} / 2; ν + p / 2 + 1)}{G (δ^{2} / 2; ν + p / 2)}, y \neq ξ \\ \frac{2 ν + p}{2 ν + p + 2}, y = ξ . \end{matrix} \end{matrix}$

and

(37) $\begin{matrix} E (ln γ_{1} ∣ Y = y) = \{\begin{matrix} ln (\frac{2}{δ^{2}}) + \frac{\int_{0}^{δ^{2} / 2} ln (x) x^{ν + p / 2 - 1} e^{- x} d x}{Γ (ν + p / 2) G (δ^{2} / 2; ν + p / 2)}, y \neq ξ \\ \frac{- 2}{2 ν + p}, y = ξ . \end{matrix} \end{matrix}$

To implement the ECME procedure for fitting MFSSSLN, the conditional expectations involved in (20) and (21) can be easily evaluated via the results of (17), (36), and (37). Besides, the DOF ${\hat{ν}}^{(k)}$ can be alternatively updated by maximizing the Q-function over $ν$ , leading to the following CMQ-Step:

CMQ-Step 4:
(38) ${\hat{ν}}^{(k + 1)} = - \frac{n}{\sum_{i = 1}^{n} {\hat{s}}_{4 i}^{(k)}} .$
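The closed form (38) has the same structure as the ML estimator of $ν$ from a directly observed $B e t a (ν, 1)$ sample, for which $E (ln τ) = - 1 / ν$. The sketch below uses that analogy as a sanity check: it simulates $τ = U^{1 / ν}$ and plugs the observed $ln τ_{i}$ into the update in place of the conditional expectations ${\hat{s}}_{4 i}^{(k)}$. This substitution, the function name, the seed, and the sample size are assumptions for illustration.

```python
import math
import random
from typing import Sequence


def update_nu_slash(s4: Sequence[float]) -> float:
    """Update (38): nu = -n / sum_i s4_i, where s4_i plays the role of E(ln gamma_1i | y_i)."""
    return -len(s4) / sum(s4)


rng = random.Random(7)                 # fixed seed for reproducibility
nu_true = 3.0
# tau = U^{1/nu}, hence ln(tau) = ln(U) / nu; 1 - random() lies in (0, 1], so the log is finite
log_tau = [math.log(1.0 - rng.random()) / nu_true for _ in range(20000)]
nu_hat = update_nu_slash(log_tau)
```

In the actual ECME iteration the latent $γ_{1 i}$ are not observed, so the inputs come from (37) rather than from simulated values.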

3.3. The Multivariate Flexible Skew-Symmetric-Contaminated-Normal Distribution

The multivariate flexible skew-symmetric-contaminated-normal (MFSSCN) distribution, denoted by $Y$ ∼ $M F S S C N_{p} (ξ, Σ, α, ν_{1}, ν_{2})$ , arises when $τ_{2} = 1$ , while $τ_{1}$ has a discrete distribution taking value $ν_{2} \in (0, 1)$ with probability $ν_{1}$ and value 1 with probability $1 - ν_{1}$ . More precisely,

(39) $\begin{matrix} h (τ_{1}; ν) = ν_{1} I_{(τ_{1} = ν_{2})} + (1 - ν_{1}) I_{(τ_{1} = 1)}, \quad 0 < ν_{1} < 1 \text{ and } 0 < ν_{2} < 1, \end{matrix}$

where $ν_{1}$ is the proportion of outliers and $ν_{2}$ is an inflation parameter denoting the degree of contamination.

Using (14), the pdf of $Y$ is given by

(40) $\begin{matrix} f_{M F S S C N} (y; ξ, Σ, α, ν_{1}, ν_{2}) = 2 \{ν_{1} ϕ_{p} (y; ξ, ν_{2}^{- 1} Σ) + (1 - ν_{1}) ϕ_{p} (y; ξ, Σ)\} Φ (ζ) . \end{matrix}$

Obviously, the MFSSCN distribution reduces to the MFSSN distribution when $ν_{2} = 1$ and to the multivariate skew-contaminated-normal distribution [4] when $λ_{2} = \dots = λ_{m} = 0$, and it is said to follow the “MFSSCNe” distribution if $ν_{1}$ and $ν_{2}$ are restricted to be equal, namely $ν_{1} = ν_{2} = ν$.

To obtain ${\hat{s}}_{1 i}^{(k)}$, we require the following conditional expectation

(41) $E (γ_{1} ∣ Y = y) = \frac{1 - ν_{1} + ν_{1} ν_{2}^{1 + p / 2} exp {(1 - ν_{2}) δ^{2} / 2}}{1 - ν_{1} + ν_{1} ν_{2}^{p / 2} exp {(1 - ν_{2}) δ^{2} / 2}} .$

The resulting Q-function can be easily evaluated through (17) and (41) since ${\hat{s}}_{2 i}^{(k)} = 1$ . To estimate $ν_{1}$ and $ν_{2}$ , we perform the CML-Step, so the calculation of ${\hat{s}}_{4 i}^{(k)}$ can be omitted.
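The conditional expectation (41) can be sketched as follows. To avoid overflow of the exponential for large $δ^{2}$, the numerator and denominator of (41) are both multiplied by $exp \{- (1 - ν_{2}) δ^{2} / 2\}$, which leaves the ratio unchanged; this rearrangement and the function name are choices made for this sketch.

```python
import math


def cn_cond_mean_gamma1(delta2: float, p: int, nu1: float, nu2: float) -> float:
    """E(gamma1 | Y = y) from (41), written in an overflow-safe but equivalent form."""
    if not (0.0 < nu1 < 1.0 and 0.0 < nu2 < 1.0):
        raise ValueError("nu1 and nu2 must lie in (0, 1)")
    # damp = exp{-(1 - nu2) * delta2 / 2} underflows harmlessly to 0 for large delta2
    damp = math.exp(-(1.0 - nu2) * delta2 / 2.0)
    contaminated = nu1 * nu2 ** (p / 2.0)        # weight of the component with gamma1 = nu2
    regular = (1.0 - nu1) * damp                 # weight of the component with gamma1 = 1
    return (regular + nu2 * contaminated) / (regular + contaminated)


at_center = cn_cond_mean_gamma1(0.0, 2, 0.5, 0.5)     # 0.625 / 0.75
far_outlier = cn_cond_mean_gamma1(1e6, 2, 0.5, 0.5)   # tends to nu2 for extreme observations
```

The result always lies between $ν_{2}$ and 1, so observations far from $ξ$ are down-weighted towards $ν_{2}$ in the updates of $ξ$ and $Σ$.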

3.4. The Multivariate Flexible Skew-Symmetric-t Distribution

The multivariate flexible skew-symmetric-t (MFSST) distribution, denoted by $Y$ ∼ $M F S S T_{p} (ξ, Σ, α, ν)$ , is created by setting $τ_{1} = τ_{2} = τ$ with $τ \sim Γ (ν / 2, ν / 2)$ . Utilizing (8), the hierarchical representation for $Y$ can be simplified as

(42) $\begin{matrix} Y ∣ τ \sim M F S S N_{p} (ξ, τ^{- 1} Σ, ϑ), \end{matrix}$

where $ϑ = {(λ_{1}^{⊤}, τ^{- 1} λ_{2}^{⊤}, \dots, τ^{- m + 1} λ_{m}^{⊤})}^{⊤}$. Therefore, it can be verified that the MFSST distribution has the following pdf

(43) $f_{M F S S T} (y; ξ, Σ, α, ν) = 2 t_{p} (y; ξ, Σ, ν) T (ζ \sqrt{\frac{ν + p}{ν + δ^{2}}}; ν + p),$

where $T (\cdot; ν)$ denotes the cdf of the t distribution with DOF $ν$. The detailed proof of (43) is sketched in Appendix B. It is notable that the MFSST distribution includes the multivariate skew-t distribution of Azzalini and Capitanio [6] as a particular member by letting $λ_{2} = \dots = λ_{m} = 0$ and the MFSSN distribution as a limiting case when $ν$ grows to infinity.

Using (18) subject to $γ_{1} = γ_{2} = γ$ , it suffices to show that

(44) $\begin{matrix} f (γ ∣ y) & = & \frac{1}{T (M; ν + p)} g (γ; \frac{ν + p}{2}, \frac{ν + δ^{2}}{2}) Φ (γ^{1 / 2} ζ), \end{matrix}$

where $M = ζ \sqrt{\frac{ν + p}{δ^{2} + ν}}$ . With arguments similar to those of Lin et al. [20], it is straightforward to derive

(45) $\begin{matrix} E (γ ∣ Y = y) = (\frac{ν + p}{δ^{2} + ν}) \frac{T (M \sqrt{\frac{ν + p + 2}{ν + p}}; ν + p + 2)}{T (M; ν + p)} \end{matrix}$

and

(46) $\begin{matrix} E (ln γ ∣ Y = y) & = & ψ (\frac{ν + p}{2}) - ln (\frac{δ^{2} + ν}{2}) \\ + \frac{ν + p}{δ^{2} + ν} \{\frac{T (M \sqrt{\frac{ν + p + 2}{ν + p}}; ν + p + 2)}{T (M; ν + p)} - 1\} \\ + \frac{ζ (δ^{2} - 1)}{\sqrt{(ν + p) {(ν + δ^{2})}^{3}}} \frac{t (M; ν + p)}{T (M; ν + p)} \\ + \frac{1}{T (M; ν + p)} \int_{- \infty}^{M} κ_{ν} (x) t (x; ν + p) d x, \end{matrix}$

where

$\begin{matrix} κ_{ν} (x) = ψ (\frac{ν + p + 1}{2}) - ψ (\frac{ν + p}{2}) - ln (1 + \frac{x^{2}}{ν + p}) + \frac{(ν + p) x^{2} - ν - p}{(ν + p) (ν + p + x^{2})} . \end{matrix}$

Additionally, using the law of iterated expectations, we can deduce that

(47) $\begin{matrix} E (W γ ∣ Y = y) & = & \frac{1}{T (M; ν + p)} \{M \sqrt{\frac{ν + p}{δ^{2} + ν}} T (M \sqrt{\frac{ν + p + 2}{ν + p}}; ν + p + 2) \\ + {(δ^{2} + ν)}^{- 1 / 2} \frac{Γ ((ν + p + 1) / 2)}{\sqrt{π} Γ ((ν + p) / 2)} {(1 + \frac{ζ^{2}}{δ^{2} + ν})}^{- \frac{ν + p + 1}{2}}\} . \end{matrix}$

As a consequence, the E-step in (20) and (21) requires the calculation of ${\hat{s}}_{1 i}^{(k)} = {\hat{s}}_{2 i}^{(k)} = E (γ_{i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , ${\hat{s}}_{3 i}^{(k)} = E (W_{i} γ_{i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , and ${\hat{s}}_{4 i}^{(k)} = E (ln γ_{i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , which can be directly evaluated via (45)–(47). Moreover, the procedure of updating ${\hat{ν}}^{(k)}$ is the same as (30).

3.5. The Multivariate Flexible Skew-Symmetric-t-t Distribution

Consider two independent random variables $τ_{1}$ ∼ $Γ (ν_{1} / 2, ν_{1} / 2)$ and $τ_{2}$ ∼ $Γ (ν_{2} / 2, ν_{2} / 2)$ with joint pdf given by

(48) $\begin{matrix} h_{τ} (γ_{1}, γ_{2}; ν_{1}, ν_{2}) = g (γ_{1}; ν_{1} / 2, ν_{1} / 2) g (γ_{2}; ν_{2} / 2, ν_{2} / 2) . \end{matrix}$

Using (14), we thus generate the multivariate flexible skew-symmetric-t-t (MFSSTT) distribution, denoted by $Y$ ∼ $M F S S T T_{p} (ξ, Σ, α, ν_{1}, ν_{2})$ , whose pdf is of the form

(49) $f_{M F S S T T} (y; ξ, Σ, α, ν) = 2 t_{p} (y; ξ, Σ, ν_{1}) T (ζ; ν_{2}) .$

When the two DOFs are restricted to be equal, namely $ν_{1} = ν_{2} = ν$ , $Y$ is said to follow the ‘MFSSTTe’ distribution. One thing worth noting is that the MFSSTT distribution reduces to the MFSSN distribution when $(ν_{1}, ν_{2}) \to (\infty, \infty)$ , to the MFSSTN distribution by letting $ν_{1} = ν$ and $ν_{2} = \infty$ , and embeds the multivariate skew-t-t (MSTT) distribution introduced by Wang et al. [21] as a special case under the setting of $λ_{2} = \dots = λ_{m} = 0$ .
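To make (49) concrete, here is a small Python sketch of the MFSSTT density for $p = 1$ and $m = 2$ . It relies only on the standard library, so the t cdf is approximated by composite Simpson integration as a stand-in for a library routine. The function names and the assumed skewing argument $ζ = λ_{1} z + λ_{2} z^{3}$ , with $z = (y - ξ) / σ$ , are illustrative assumptions rather than the authors' code.

```python
import math

def t_pdf(x, nu):
    """Density of the standard univariate t distribution with nu DOF."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))

def t_cdf(x, nu, n=2000):
    """t cdf via composite Simpson integration of t_pdf over [0, |x|]."""
    if x == 0.0:
        return 0.5
    h = abs(x) / n  # n must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(abs(x), nu)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * t_pdf(j * h, nu)
    half = total * h / 3.0
    return 0.5 + half if x > 0 else 0.5 - half

def mfsstt_pdf_1d(y, xi, sigma2, lam1, lam2, nu1, nu2):
    """Equation (49) for p = 1 and m = 2 (illustrative sketch only)."""
    sigma = math.sqrt(sigma2)
    z = (y - xi) / sigma
    zeta = lam1 * z + lam2 * z ** 3  # assumed skewing argument (odd in z)
    return 2.0 * (t_pdf(z, nu1) / sigma) * t_cdf(zeta, nu2)
```

Setting `lam2 = 0.0` gives the univariate analog of the MSTT special case mentioned above, and large values of `nu1` and `nu2` approximate the MFSSN limit.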

According to (18), it is easy to see $γ_{1} ∣ Y = y$ ∼ $Γ ((ν_{1} + p) / 2, (ν_{1} + δ^{2}) / 2)$ . The conditional pdf of $γ_{2}$ given $Y = y$ is

(50) $\begin{matrix} f (γ_{2} ∣ y) = \frac{g (γ_{2}; ν_{2} / 2, ν_{2} / 2) Φ (γ_{2}^{1 / 2} ζ)}{T (ζ; ν_{2})} . \end{matrix}$

Further, we have the following conditional expectations:

(51) $\begin{matrix} E (γ_{2} ∣ Y = y) = \frac{T (ζ \sqrt{\frac{ν_{2} + 2}{ν_{2}}}; ν_{2} + 2)}{T (ζ; ν_{2})} \end{matrix}$

and

(52) $\begin{matrix} E (ln γ_{2} ∣ Y = y) & = & ψ (\frac{ν_{2} + 1}{2}) - ln (\frac{ν_{2}}{2}) + \frac{T (ζ \sqrt{\frac{ν_{2} + 2}{ν_{2}}}; ν_{2} + 2)}{T (ζ; ν_{2})} - 1 \\ - (\frac{ζ}{ν_{2}}) \frac{t (ζ; ν_{2})}{T (ζ; ν_{2})} - \frac{1}{T (ζ; ν_{2})} \int_{- \infty}^{ζ} ln (1 + \frac{x^{2}}{ν_{2}}) t (x; ν_{2}) d x . \end{matrix}$

Applying the law of iterated expectations to (15) and (50) gives

(53) $\begin{matrix} E (W γ_{2} ∣ Y = y) = \frac{1}{T (ζ; ν_{2})} \{ζ T (ζ \sqrt{\frac{ν_{2} + 2}{ν_{2}}}; ν_{2} + 2) + t (ζ; ν_{2})\} . \end{matrix}$

With slight modifications as defined in (20) and (21), the necessary conditional expectations in the E-step include ${\hat{s}}_{1 i}^{(k)} = E (γ_{1 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , ${\hat{s}}_{2 i}^{(k)} = E (γ_{2 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , ${\hat{s}}_{3 i}^{(k)} = E (W_{i} γ_{2 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , ${\hat{s}}_{4 i}^{(k)} = E (ln γ_{1 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , and ${\hat{s}}_{5 i}^{(k)} = E (ln γ_{2 i} ∣ y_{i}, {\hat{θ}}^{(k)})$ , which can be easily evaluated via the results given in (29) and (51)–(53), respectively. To numerically estimate $ν_{1}$ and $ν_{2}$ for the MFSSTT distribution, we resort to the following two root-finding equations:

CMQ-Step 4: ${\hat{ν}}_{1}^{(k + 1)}$ and ${\hat{ν}}_{2}^{(k + 1)}$ are obtained as the roots of the following two equations:
(54) $1 + ln (\frac{ν_{1}}{2}) - ψ (\frac{ν_{1}}{2}) + \frac{1}{n} \sum_{i = 1}^{n} ({\hat{s}}_{4 i}^{(k)} - {\hat{s}}_{1 i}^{(k)}) = 0$
and
(55) $1 + ln (\frac{ν_{2}}{2}) - ψ (\frac{ν_{2}}{2}) + \frac{1}{n} \sum_{i = 1}^{n} ({\hat{s}}_{5 i}^{(k)} - {\hat{s}}_{2 i}^{(k)}) = 0 .$
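Equations (54) and (55) share the same form, so one scalar root finder serves both. The Python sketch below uses bisection, assuming the sample average term (for example, the mean of ${\hat{s}}_{4 i}^{(k)} - {\hat{s}}_{1 i}^{(k)}$ in (54)) has already been computed and is passed as `c`. The digamma approximation, the bracket `[0.01, 200]`, and all function names are assumptions made for illustration; the authors' software may solve these equations differently.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    shift = 0.0
    while x < 6.0:  # move x upward until the series is accurate
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return shift + math.log(x) - 0.5 / x - series

def dof_equation(nu, c):
    """Left-hand side of (54) or (55); c is the sample average term."""
    return 1.0 + math.log(nu / 2.0) - digamma(nu / 2.0) + c

def solve_dof(c, lo=0.01, hi=200.0, tol=1e-10):
    """Bisection root of dof_equation on the assumed bracket [lo, hi]."""
    if dof_equation(lo, c) <= 0.0 or dof_equation(hi, c) >= 0.0:
        raise ValueError("no root inside the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dof_equation(mid, c) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since $ln (x) - ψ (x)$ is positive and decreasing in $x$ , the left-hand side decreases from large positive values toward $1 + c$ , so a finite root exists only when $c < - 1$ . When the function raises an error at the upper end of the bracket, the data are effectively indicating a very large DOF.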

4. Simulation Studies

4.1. Recovery of the True Underlying Parameters

The first experiment investigates the ability of the proposed ECME algorithm to recover the true underlying parameters. Monte Carlo samples of sizes $n = 100$ , 250, 500, and 1000 were generated from the MFSSN distribution specified in (7) and five examples of SSMFSSN distributions studied in Section 3. For ease of exposition, we considered the Hadamard power transformation of order three ( $K = 3$ ) that allows two shape parameters in the skewing function $Φ (\cdot)$ . Moreover, the flatness parameters for MFSSCN and MFSSTT were assumed to be equal, say $ν_{1} = ν_{2} = ν$ , referred to as the MFSSCNe and MFSSTTe distributions. The presumed true parameters were $ξ = {(1, - 1)}^{⊤}$ , $σ = vech (Σ) = {(1, 0.5, 4)}^{⊤}$ , $λ_{1} = {(- 2, 2)}^{⊤}$ , and $λ_{2} = {(1, - 1)}^{⊤}$ . Furthermore, $ν = 3$ was taken for the MFSSTN, MFSSSLN, MFSST, and MFSSTTe distributions, while $ν = 0.7$ was adopted for the MFSSCNe distribution since this parameter is restricted to the interval $(0, 1)$ . As an illustration, Figure 2 displays the scatter plots superimposed on the fitted contours for each type of data simulated from one trial.

For all scenarios, the accuracies of the parameter estimates are assessed by computing the mean absolute bias (MAB) and the root mean square error (RMSE) over $R = 100$ replications. For a vector of parameters $θ = {(θ_{1}, \dots, θ_{p})}^{⊤}$ , these measures are, respectively, defined as

(56) $MAB = \frac{1}{p R} \sum_{k = 1}^{p} \sum_{r = 1}^{R} | {\hat{θ}}_{k r} - θ_{k}^{A} | \quad \text{and} \quad RMSE = \sqrt{\frac{1}{p R} \sum_{k = 1}^{p} \sum_{r = 1}^{R} {({\hat{θ}}_{k r} - θ_{k}^{A})}^{2}},$

where ${\hat{θ}}_{k r}$ denotes the ML estimate of the $k$th parameter at the $r$th replication and $θ_{k}^{A}$ represents the actual value of $θ_{k}$ .

The experimental results are summarized in Table 1. It is readily seen that both MAB and RMSE values tend to approach zero as the sample size increases. While this study is limited to the simplest case $(p = 2; m = 2)$ , our developed ECME algorithm shows a favorable ability to recover the true parameter values when the data are generated exactly according to the model assumptions. Similar experiments have also been undertaken on more complex scenarios $(p = 3; m = 3)$ . These additional results are not reported here because the conclusions agree with those already presented.
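The two accuracy measures in (56) can be sketched in a few lines of Python. The function name and the input layout (a list of replications, each holding the estimates of one parameter vector) are assumptions chosen for illustration.

```python
import math

def mab_rmse(estimates, truth):
    """Mean absolute bias and root mean square error as in (56).

    estimates: list of R replications, each a list of p estimates.
    truth: list of the p actual parameter values.
    """
    errors = [est_k - true_k
              for est in estimates
              for est_k, true_k in zip(est, truth)]
    count = len(errors)  # equals p * R
    mab = sum(abs(e) for e in errors) / count
    rmse = math.sqrt(sum(e * e for e in errors) / count)
    return mab, rmse

# Toy usage with R = 2 replications of a two-dimensional location vector
mab, rmse = mab_rmse([[1.1, -1.2], [0.9, -0.8]], [1.0, -1.0])
```

Here the errors are 0.1, -0.2, -0.1, and 0.2, so the MAB is 0.15 and the RMSE is the square root of 0.025.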

4.2. Comparing the Proposed Procedure with Convolution-Type EM Algorithms

The second experiment aims to compare the performance of the proposed selection-type ECME procedure outlined in Section 2.2 with the traditional EM-based algorithms derived from convolution-type representations. As an illustration, we consider the fitting of MSN, MST, and MSCN distributions arising from the multivariate skew-normal independent (SNI) family studied by Cabral et al. [22]. As discussed above, they are special cases of our proposed MFSSN, MFSST, and MFSSCN distributions by setting $λ_{2} = \dots = λ_{m} = 0$ . Accordingly, the CMQ-Step 1 in Section 2 can be simplified as follows:

CMQ-Step 1: Fixing $Σ = {\hat{Σ}}^{(k)}$ and $λ_{1} = {\hat{λ}}_{1}^{(k)}$ , we obtain ${\hat{ξ}}^{(k + 1)}$ by
(57) $\begin{matrix} {\hat{ξ}}^{(k + 1)} & = & {[I_{p} \sum_{i = 1}^{n} {\hat{s}}_{1 i}^{(k)} + {\hat{Σ}}^{(k)} {\hat{Δ}}_{1}^{⊤ (k)} 1_{p} 1_{p}^{⊤} {\hat{Δ}}_{1}^{(k)} \sum_{i = 1}^{n} {\hat{s}}_{2 i}^{(k)}]}^{- 1} \{\sum_{i = 1}^{n} {\hat{s}}_{1 i}^{(k)} y_{i} \\ + {\hat{Σ}}^{(k)} {\hat{Δ}}_{1}^{⊤ (k)} 1_{p} 1_{p}^{⊤} {\hat{Δ}}_{1}^{(k)} \sum_{i = 1}^{n} {\hat{s}}_{2 i}^{(k)} y_{i} - {\hat{Σ}}^{(k)} {\hat{Δ}}_{1}^{⊤ (k)} 1_{p} \sum_{i = 1}^{n} {\hat{s}}_{3 i}^{(k)}\} . \end{matrix}$

In total, 100 Monte Carlo (MC) samples of sizes $n = 100$ , 500, and 1000 were generated from each of the three distributions. The true parameters were the same as those given in the previous experiment except for $λ_{2} = 0$ . Each simulated sample was fitted twice: once with the proposed selection-type ECME procedure and once with the EM-type algorithm based on convolution-type representations, as implemented in the mixsmsn R package [23]. For a fair comparison, we started the two algorithms from the same initial values as described at the end of Section 2. All computations were carried out by Microsoft R package 3.5.1 in a win64 environment on a desktop computer with a 2.80-GHz Intel Core(TM) i7-7700HQ CPU and 16.0 GB RAM. Performance was assessed by the execution CPU time and the converged log-likelihood maxima.

The box plots depicted in Figure 3 reveal that the selection-type algorithm demands much lower computational cost than the convolution-type algorithm. The difference is more apparent for the MST and MSCN distributions, particularly for larger n. The high efficiency of the selection-type algorithm can be ascribed to the simplified form of its E-step. Finally, it is worth mentioning that both algorithms achieve the same final log-likelihood, as demonstrated by the violin plots in Figure 4.

5. An Illustrative Example: The Wind Speed Data

We considered a trivariate dataset analyzed by Azzalini and Genton [24] for the study of the spatial distribution of wind speed by means of the MST and various MSSMSN distributions proposed by Azzalini and Capitanio [6] and Arellano-Valle et al. [5], respectively. This dataset contains 278 hourly average wind speeds collected at three meteorological towers, Goodnoe Hills (gh), Kennewick (kw), and Vansycle (vs), from 23 February to 30 November 2003, recorded at midnight when wind speeds tend to peak. The positive and negative signs of wind speed measurements represent a westerly wind direction and an easterly wind direction, respectively. The Ljung–Box test indicates weak serial correlation for observations measured at the three stations. For modeling these data, we followed Azzalini and Genton [24] in treating the observations as independent and identically distributed. Figure 5 presents histograms overlaid with kernel density curves obtained by using the R density() function for measurements collected at each tower.

Six SSMFSSN models of order 3 were considered to fit the wind speed data. For the sake of comparison, the MST, MSTN, and MSTC distributions belonging to the MSSMSN family [5] were also fitted as sub-models of SSMFSSN subject to the constraint of $λ_{2} = 0$ . To select an appropriate model from the candidates, we adopted the Akaike information criterion (AIC) [25] and the Bayesian information criterion (BIC) [26], which are the two most widely used model selection indices based on penalized likelihood and applicable for both nested and non-nested models. The two criteria are defined as

(58) $AIC = 2 d - 2 ℓ_{\max} \quad \text{and} \quad BIC = d \log n - 2 ℓ_{\max},$

where $d$ is the number of free parameters in the model and $ℓ_{\max}$ is the maximized log-likelihood value. A lower AIC or BIC value indicates a closer fit of the model to the data.
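The criteria in (58) are simple to compute once a model has been fitted. The Python sketch below uses a hypothetical helper name and, as a worked example, plugs in the MFSSTT values reported in Table 2 (maximized log-likelihood of -3138.2 with 17 free parameters) together with the sample size of 278 stated in Section 5.

```python
import math

def aic_bic(loglik_max, d, n):
    """AIC and BIC as defined in (58)."""
    aic = 2.0 * d - 2.0 * loglik_max
    bic = d * math.log(n) - 2.0 * loglik_max
    return aic, bic

# MFSSTT fit to the wind speed data, values taken from Table 2
aic, bic = aic_bic(-3138.2, 17, 278)
print(round(aic, 1), round(bic, 1))  # prints 6310.4 6372.1
```

The printed values agree with the AIC and BIC entries for MFSSTT in Table 2. Small discrepancies for other rows are expected because the tabulated log-likelihoods are rounded to one decimal place.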

Table 2 compares the ML estimation results for nine candidate models. As can be seen, our proposed SSMFSSN models perform favorably compared to the three MSSMSN analogs, because the latter lack the ability to capture the possibly bimodal behavior of the wind speed data (Figure 5). The MFSSTT distribution provides the best fit in terms of the lowest values of both AIC and BIC, followed by the MFSST distribution. Among the MSSMSN models, the MSTC and MST have the two smallest AIC and BIC values.

Table 3 summarizes the resulting ML estimates of the six considered SSMFSSN models together with their asymptotic standard errors obtained by performing the parametric bootstrap method [27]. Notably, the estimates of the shape and flatness parameters indicate the presence of skewed and leptokurtic characteristics toward different directions among the three variables. As an illustration, the fitted 3D contour densities of the MSTC, MST, MFSST, and MFSSTT distributions are depicted in Figure 6. It is interesting to see that the MFSSTT model, which has the smallest AIC and BIC, adapts to the shape of the wind speed data more closely than the other three competitors.

6. Conclusions

We introduce a novel family of SSMFSSN distributions as a generalization of the work of Mahdavi et al. [10] that can simultaneously capture the dependency among multivariate responses, skewness, heavy-tailedness, and, in particular, multimodal density shapes without resorting to finite mixtures [14,15,21]. Since the SSMFSSN model cannot be represented in a convolution-type form, we devise a feasible ECME algorithm for ML estimation under the selection-type mechanism. The effectiveness and efficiency of the algorithm are evaluated through two simulation studies. Numerical analysis of a real dataset highlights the potential of our proposed approach as a promising alternative tool for modeling multimodal multivariate data with asymmetrical behavior. Computer programs implementing our methods can be installed as an R package from GitHub via devtools::install_github("a-mahdavi/SSMFSSN.EM"). The current approach could be further developed into extensions of the factor analysis model, or finite mixtures thereof, with censored or possibly missing values, which were considered recently by the authors of [28,29,30,31,32,33,34,35]. One limitation of the SSMFSSN model is that it may not be suited to data whose modes are far apart. A worthwhile extension of this work is therefore to pursue a mixture modeling framework based on the current approach, which would be an effective way to resolve this problem.

Author Contributions

A.M. and T.-I.L. conceived the project, developed the statistical methods, designed the approach, and analyzed the results. All authors contributed to the development of the methodology and to writing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

TIL was partially supported by the Ministry of Science and Technology of Taiwan (Grant No. MOST 109-2118-M-005-005-MY3).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge three anonymous referees for their comments and suggestions that greatly improved this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The Hadamard Product

Below, we present three propositions concerning the multiplication and partial derivatives of the Hadamard product which are useful for our methodological developments.

Proposition A1.

Let $A$ be a $p \times q$ matrix and $D$ be a $p \times p$ diagonal matrix. Then, ${(D A)}^{⊙ n} = D^{⊙ n} A^{⊙ n}$ , where $n \in N$ .

Proof.

Multiplication of a Hadamard product by diagonal matrix $D$ satisfies the following equation:

(A1) $D (A ⊙ B) = (D A) ⊙ B = A ⊙ (D B) .$

By taking $B = A$ in (A1), we obtain

(A2) $D (A ⊙ A) = (D A) ⊙ A = A ⊙ (D A) .$

Now, consider $n = 2$ ; using (A2), we have

$D^{⊙ 2} A^{⊙ 2} = D [(D A) ⊙ A] = (D A) ⊙ (D A) = {(D A)}^{⊙ 2} .$

Similarly, for $n = 3$ , we have

$D^{⊙ 3} A^{⊙ 3} = D [{(D A)}^{⊙ 2} ⊙ A] = {(D A)}^{⊙ 2} ⊙ (D A) = {(D A)}^{⊙ 3} .$

Clearly, the desired statement can be established by mathematical induction on n. □
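Proposition A1 is easy to check numerically. The following Python sketch, with hypothetical helper names and arbitrarily chosen test matrices, compares both sides of the identity for $n = 3$ using plain nested lists.

```python
def hadamard_power(M, n):
    """Element-wise n-th power of a matrix stored as a list of rows."""
    return [[v ** n for v in row] for row in M]

def matmul(A, B):
    """Ordinary matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

D = [[2.0, 0.0], [0.0, -3.0]]             # 2 x 2 diagonal matrix
A = [[1.0, 2.0, -1.0], [0.5, 4.0, 3.0]]   # 2 x 3 matrix
n = 3

lhs = hadamard_power(matmul(D, A), n)                      # (D A)^{⊙ n}
rhs = matmul(hadamard_power(D, n), hadamard_power(A, n))   # D^{⊙ n} A^{⊙ n}
max_gap = max(abs(l - r) for lr, rr in zip(lhs, rhs) for l, r in zip(lr, rr))
```

For these inputs `max_gap` should be zero up to floating-point error. The identity relies on $D$ being diagonal; replacing `D` with a non-diagonal matrix generally breaks it.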

Proposition A2.

Let $A$ be a $p \times q$ matrix and $b \in R^{p}$ and $x \in R^{q}$ be two column vectors. Then,

$\frac{\partial}{\partial x} b^{⊤} {(A x)}^{⊙ n} = n A^{⊤} ({(A x)}^{⊙ n - 1} ⊙ b) .$

Proof.

Let $f = b^{⊤} {(A x)}^{⊙ n}$ . It is easy to show

$\begin{matrix} f = \mathrm{Trace} (b^{⊤} {(A x)}^{⊙ n}) = b : {(A x)}^{⊙ n}, \end{matrix}$

where “:” denotes the Frobenius product defined as $A : B = \mathrm{Trace} (A^{⊤} B)$ . Making use of the following facts

$\begin{matrix} (A ⊙ B) : C = A : (B ⊙ C), \\ A : B C = B^{⊤} A : C = A C^{⊤} : B, \\ \frac{\partial}{\partial x} (A ⊙ B) = (\frac{\partial A}{\partial x}) ⊙ B + A ⊙ (\frac{\partial B}{\partial x}), \end{matrix}$

the differential of the scalar function f yields

$\begin{matrix} d f & = & b : n {(A x)}^{⊙ n - 1} ⊙ A (d x) = n ({(A x)}^{⊙ n - 1} ⊙ b) : A (d x) \\ = & n A^{⊤} ({(A x)}^{⊙ n - 1} ⊙ b) : d x . \end{matrix}$

This completes the proof. □
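The gradient formula in Proposition A2 can be verified against central finite differences. The Python sketch below uses hypothetical function names and arbitrary test values; it is a numerical sanity check, not part of the proof.

```python
def objective(A, b, x, n):
    """f(x) = b^T (A x)^{⊙ n} for list-based A (p x q), b (p), x (q)."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(bi * v ** n for bi, v in zip(b, Ax))

def analytic_grad(A, b, x, n):
    """Gradient n A^T ((A x)^{⊙ (n - 1)} ⊙ b) from Proposition A2."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    w = [n * bi * v ** (n - 1) for bi, v in zip(b, Ax)]
    return [sum(A[i][j] * w[i] for i in range(len(A))) for j in range(len(x))]

def numeric_grad(A, b, x, n, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        down = list(x)
        up[j] += h
        down[j] -= h
        grad.append((objective(A, b, up, n) - objective(A, b, down, n)) / (2.0 * h))
    return grad

A = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
b = [1.0, -2.0, 0.5]
x = [0.3, -0.7]
g_exact = analytic_grad(A, b, x, 3)
g_approx = numeric_grad(A, b, x, 3)
```

The two gradients should agree to several decimal places. The same check adapts to Proposition A3 by perturbing the entries of the matrix argument instead of a vector.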

Proposition A3.

Let $X$ be a $p \times q$ matrix and $a \in R^{p}$ and $b \in R^{q}$ be two column vectors. Then,

$\frac{\partial}{\partial X} a^{⊤} {(X b)}^{⊙ n} = n ({(X b)}^{⊙ n - 1} ⊙ a) b^{⊤} .$

Proof.

Similar to the proof of Proposition A2, we define $f = a : {(X b)}^{⊙ n}$ and obtain its differential as below

$\begin{matrix} d f & = & a : n {(X b)}^{⊙ n - 1} ⊙ (d X) b = n ({(X b)}^{⊙ n - 1} ⊙ a) : (d X) b \\ = & n ({(X b)}^{⊙ n - 1} ⊙ a) b^{⊤} : d X, \end{matrix}$

which completes the proof. □

Appendix B. Proof of Equation (43)

Proof.

Without loss of generality, we assume that $ξ = 0$ and $Σ = I_{p}$ . Let $Y \overset{d}{=} (τ^{- 1 / 2}) Z_{1} | (τ^{- 1 / 2} Z_{2} < λ_{1}^{⊤} (τ^{- 1 / 2} Z_{1}) + \dots + λ_{m}^{⊤} {(τ^{- 1 / 2} Z_{1})}^{⊙ 2 m - 1})$ , where $Z_{1}$ ∼ $N_{p} (0, I_{p})$ and $Z_{2}$ ∼ $N (0, 1)$ are two independent random variables and $τ \sim Γ (ν / 2, ν / 2)$ . Clearly, ${(X_{1}, X_{2})}^{⊤} \overset{d}{=} τ^{- 1 / 2} {(Z_{1}, Z_{2})}^{⊤}$ ∼ $t_{p + 1} (0, I_{p + 1}, ν)$ . Therefore, it is easy to verify that $X_{1}$ ∼ $t_{p} (0, I_{p}, ν)$ , $X_{2}$ ∼ $t (0, 1, ν)$ and

$\sqrt{\frac{ν + p}{ν + x_{1}^{⊤} x_{1}}} X_{2} | (X_{1} = x_{1}) \sim t (0, 1, ν + p) .$

By Bayes’ theorem, the pdf of $Y \overset{d}{=} X_{1} | (X_{2} < λ_{1}^{⊤} X_{1} + \dots + λ_{m}^{⊤} X_{1}^{⊙ 2 m - 1})$ is

$\begin{matrix} f_{Y} (y) & = & \frac{f_{X_{1}} (y) Pr (X_{2} < λ_{1}^{⊤} X_{1} + \dots + λ_{m}^{⊤} X_{1}^{⊙ 2 m - 1} | X_{1} = y)}{Pr (X_{2} < λ_{1}^{⊤} X_{1} + \dots + λ_{m}^{⊤} X_{1}^{⊙ 2 m - 1})} \\ = & 2 f_{X_{1}} (y) Pr (\sqrt{\frac{ν + p}{ν + y^{⊤} y}} X_{2} < \sqrt{\frac{ν + p}{ν + y^{⊤} y}} (λ_{1}^{⊤} y + \dots + λ_{m}^{⊤} y^{⊙ 2 m - 1}) | X_{1} = y) \\ = & 2 t_{p} (y; 0, I_{p}, ν) T (\sqrt{\frac{ν + p}{ν + y^{⊤} y}} (λ_{1}^{⊤} y + \dots + λ_{m}^{⊤} y^{⊙ 2 m - 1}); ν + p) . \end{matrix}$

This completes the proof. □

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures and Tables

Figure 1. Four scatter-contour plots coupled with their marginal histograms of the bivariate MFSSN distribution: (a) m=2, λ1=(0,2)⊤ and λ2=(0,−2)⊤ (bimodal); (b) m=2, λ1=(−2,2)⊤ and λ2=(1,−1)⊤ (trimodal); (c) m=3, λ1=(−2,2)⊤, λ2=(1,−1)⊤ and λ3=(2,−2)⊤ (trimodal); (d) m=3, λ1=(0,2)⊤, λ2=(0,−2)⊤ and λ3=(2,−1)⊤ (trimodal).

Figure 2. Scatter-contour plots of one simulation case with 500 random samples generated from six subfamilies of the proposed model.

Figure 3. Box plots of CPU time for convergence of selection-type and convolution-type algorithms for fitting MSN, MST, and MSCN distributions under various sample sizes.

Figure 4. Violin plots of converged log-likelihood obtained by selection and convolution EM-type algorithms for fitting MSN, MST, and MSCN distributions under various sample sizes.

Figure 5. Histograms of univariate measurements overlaid with kernel density curves for the hourly average wind speed collected at three meteorological towers.

Figure 6. The 3-D contour densities for MSTC, MST, MFSST, and MFSSTT distributions fitted to the wind speed data.

Table 1

Simulation results based on 100 replications with different sample sizes.

Model	Parameter	MAB (n = 100)	RMSE (n = 100)	MAB (n = 250)	RMSE (n = 250)	MAB (n = 500)	RMSE (n = 500)	MAB (n = 1000)	RMSE (n = 1000)
MFSSN	$ξ$	0.090	0.122	0.057	0.074	0.037	0.048	0.027	0.033
	$σ$	0.257	0.382	0.143	0.209	0.115	0.168	0.081	0.123
	$λ_{1}$	0.476	0.625	0.203	0.270	0.111	0.155	0.079	0.112
	$λ_{2}$	0.339	0.489	0.160	0.209	0.104	0.131	0.064	0.082
MFSSTN	$ξ$	0.095	0.127	0.062	0.086	0.040	0.053	0.028	0.036
	$σ$	0.419	0.673	0.244	0.383	0.158	0.241	0.121	0.179
	$λ_{1}$	0.382	0.469	0.349	0.426	0.317	0.382	0.306	0.355
	$λ_{2}$	0.335	0.439	0.194	0.244	0.177	0.216	0.151	0.181
	$ν$	2.495	10.745	0.458	0.651	0.319	0.413	0.203	0.258
MFSSSLN	$ξ$	0.091	0.121	0.052	0.069	0.036	0.047	0.025	0.035
	$σ$	0.424	0.641	0.306	0.489	0.190	0.314	0.127	0.200
	$λ_{1}$	0.458	0.588	0.267	0.347	0.216	0.269	0.174	0.206
	$λ_{2}$	0.470	0.614	0.259	0.378	0.174	0.227	0.114	0.138
	$ν$	6.530	11.849	3.425	8.206	1.191	3.251	0.481	0.721
MFSSCNe	$ξ$	0.087	0.117	0.048	0.064	0.036	0.047	0.026	0.034
	$σ$	0.363	0.546	0.285	0.440	0.223	0.339	0.133	0.202
	$λ_{1}$	0.446	0.595	0.288	0.353	0.202	0.247	0.138	0.174
	$λ_{2}$	0.426	0.586	0.265	0.347	0.178	0.219	0.131	0.160
	$ν$	0.199	0.216	0.162	0.176	0.116	0.137	0.083	0.098
MFSST	$ξ$	0.114	0.152	0.069	0.091	0.042	0.055	0.032	0.040
	$σ$	0.393	0.586	0.250	0.386	0.160	0.238	0.132	0.216
	$λ_{1}$	0.453	0.559	0.265	0.329	0.207	0.253	0.194	0.228
	$λ_{2}$	0.355	0.468	0.215	0.273	0.137	0.175	0.110	0.139
	$ν$	1.152	2.210	0.430	0.621	0.266	0.366	0.207	0.265
MFSSTTe	$ξ$	0.114	0.146	0.070	0.094	0.047	0.061	0.029	0.038
	$σ$	0.400	0.654	0.213	0.323	0.165	0.255	0.115	0.188
	$λ_{1}$	0.437	0.546	0.411	0.476	0.406	0.453	0.414	0.438
	$λ_{2}$	0.313	0.409	0.216	0.260	0.183	0.217	0.167	0.193
	$ν$	1.332	2.733	0.424	0.573	0.300	0.408	0.210	0.279

Table 2

Summary results from fitting various models to the wind speed data. The model with the smallest value of AIC and BIC is displayed in bold.

Family	Model	$ℓ_{\max}$	d	AIC	BIC
MSSMSN	MSTC	–3178.7	13	6383.4	6430.5
MSSMSN	MST	–3180.7	13	6387.5	6434.6
MSSMSN	MSTN	–3180.9	13	6387.8	6434.9
SSMFSSN	MFSSN	–3171.7	15	6373.4	6427.8
SSMFSSN	MFSSTN	–3145.6	16	6323.1	6381.2
SSMFSSN	MFSSSLN	–3147.1	16	6326.2	6384.2
SSMFSSN	MFSSCN	–3145.6	16	6323.3	6381.3
SSMFSSN	MFSST	–3143.0	16	6318.1	6376.1
SSMFSSN	MFSSTT	–3138.2	17	6310.4	6372.1

Table 3

ML estimates of key parameters for six SSMFSSN models. The associated standard errors are shown in parentheses.

Parameter	MFSSN	MFSSTN	MFSSSLN	MFSSCN	MFSST	MFSSTT
$ξ_{1}$	23.0 (0.05)	21.3 (0.09)	21.1 (0.04)	21.5 (0.07)	19.9 (0.06)	18.6 (0.08)
$ξ_{2}$	14.8 (0.04)	15.6 (0.08)	15.2 (0.04)	15.0 (0.07)	15.1 (0.04)	15.0 (0.06)
$ξ_{3}$	14.6 (0.04)	14.9 (0.06)	14.8 (0.02)	15.4 (0.04)	13.4 (0.08)	12.6 (0.09)
$σ_{11}$	221.2 (0.26)	122.9 (0.49)	80.6 (0.24)	115.8 (0.31)	115.6 (0.21)	115.6 (0.25)
$σ_{21}$	138.8 (0.16)	96.4 (0.36)	64.6 (0.18)	91.5 (0.24)	92.8 (0.15)	93.2 (0.17)
$σ_{31}$	151.0 (0.13)	104.8 (0.26)	69.5 (0.13)	99.1 (0.16)	102.9 (0.18)	106.0 (0.19)
$σ_{22}$	181.3 (0.10)	134.5 (0.27)	90.3 (0.16)	128.0 (0.18)	131.2 (0.18)	132.1 (0.20)
$σ_{32}$	111.4 (0.07)	80.8 (0.18)	54.4 (0.09)	79.6 (0.12)	78.8 (0.15)	79.1 (0.15)
$σ_{33}$	296.4 (0.07)	203.4 (0.19)	134.0 (0.13)	187.2 (0.09)	204.4 (0.24)	206.7 (0.27)
$λ_{11}$	–1.7 (0.01)	–0.4 (0.04)	–0.2 (0.01)	–0.3 (0.02)	0.1 (0.02)	1.0 (0.02)
$λ_{12}$	1.6 (0.04)	1.1 (0.04)	1.0 (0.03)	1.3 (0.04)	1.3 (0.01)	3.2 (0.01)
$λ_{13}$	1.9 (0.04)	1.4 (0.04)	1.1 (0.02)	1.3 (0.02)	1.4 (0.01)	2.8 (0.02)
$λ_{21}$	–0.7 (1.99)	–0.1 (2.36)	0.0 (1.03)	–0.1 (1.53)	–0.2 (0.04)	1.2 (0.32)
$λ_{22}$	–0.7 (0.05)	–0.4 (0.25)	–0.2 (0.05)	–0.4 (0.19)	–0.5 (0.01)	–1.6 (0.21)
$λ_{23}$	–0.1 (1.52)	–0.2 (1.98)	–0.1 (0.78)	–0.2 (1.18)	–0.1 (0.02)	–2.3 (0.17)
$ν_{1}$	–	5.0 (0.18)	1.5 (0.03)	0.2 (0.04)	4.7 (0.06)	4.7 (0.08)
$ν_{2}$	–	–	–	0.2 (0.54)	–	1.0 (0.16)

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Multivariate skew-symmetric-normal (MSSN) distributions have been recognized as an appealing tool for modeling data with non-normal features such as asymmetry and heavy tails, rendering them suitable for applications in diverse areas. We introduce a richer class of MSSN distributions based on a scale-shape mixture of (multivariate) flexible skew-symmetric normal distributions, called the SSMFSSN distributions. This very general class of SSMFSSN distributions can capture various shapes of multimodality, skewness, and leptokurtic behavior in the data. We investigate some of its probabilistic characterizations and distributional properties which are useful for further methodological developments. An efficient EM-type algorithm designed under the selection mechanism is advocated to compute the maximum likelihood (ML) estimates of parameters. Simulation studies as well as applications to a real dataset are employed to illustrate the usefulness of the presented methods. Numerical results show the superiority of our proposed model in comparison to several existing competitors.

Details

Title

A Multivariate Flexible Skew-Symmetric-Normal Distribution: Scale-Shape Mixtures and Parameter Estimation via Selection Representation

Author

Mahdavi, Abbas¹; Amirzadeh, Vahid¹; Jamalizadeh, Ahad¹; Lin, Tsung-I²

¹ Department of Statistics, Faculty of Mathematics & Computer, Shahid Bahonar University of Kerman, Kerman 7616914111, Iran
² Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan; Department of Public Health, China Medical University, Taichung 404, Taiwan

First page

1343

Publication year

2021

Publication date

2021

Publisher

MDPI AG

e-ISSN

20738994

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/sym13081343

ProQuest document ID

2565720628
