1. Introduction
Arellano-Valle et al. [1] introduced the skew-generalized-normal (SGN) distribution. We say that a random variable Z follows a SGN distribution, denoted as Z ~ SGN(λ₁, λ₂), if its probability density function (pdf) takes the following form:
f(z; λ₁, λ₂) = 2φ(z)Φ(λ₁z / √(1 + λ₂z²)), z ∈ ℝ, (1)
where λ₁ ∈ ℝ, λ₂ ≥ 0, and φ and Φ denote the pdf and the cumulative distribution function (cdf) of the N(0, 1) distribution, respectively. For the case λ₁ = 0, the SGN distribution reduces to the standard normal distribution. On the other hand, non-zero values of the parameter λ₁ directly influence the skewness of the model. In particular, when λ₂ = 0 the SGN distribution reduces to the well-known skew-normal (SN) distribution introduced by Azzalini [2]. Moreover, it converges to the half-normal distribution when λ₁ → ∞.

The SGN model has been previously used in the literature. In this sense, Sever et al. [3] used this model in the context of discriminant analysis; Arnold et al. [4] considered the bivariate case; later, Gómez et al. [5] examined the skew-curved-normal distribution, which is a subfamily of the SGN distribution. The singularity of the Fisher information matrix was examined by Arellano-Valle et al. [6]; in that paper it was concluded that, for the SGN model with location and scale parameters, the Fisher information matrix is singular in the particular case when normality is restored (λ₁ = 0).
Arrué et al. [7] considered bias reduction of the ML estimate of the shape parameter in the SGN distribution when λ₂ = 1. This submodel was named the modified skew-normal (MSN) distribution. They also showed that the Fisher information matrix of the MSN model is non-singular when the shape parameter is null. However, the ML estimator of the shape parameter can still diverge, just as in the SN model (see Sartori [8]). Hence, they applied the method introduced by Firth [9] to reduce the bias of the estimator of the shape parameter.
The MSN model is a convenient alternative to the SN model because, in addition to regulating the asymmetry by means of a single parameter, it allows us to apply regular asymptotic theory to study the behavior of the MLE around normality. However, these two models are not flexible enough to fit data from asymmetric distributions with heavy tails. With the purpose of producing robust inferences in such situations, in this work we consider a distribution that is more flexible than the MSN distribution. Specifically, we consider the modified skew-t-normal (MStN) distribution, which extends the MSN distribution (the MSN is the limiting case when the degrees of freedom tend to infinity) and is consequently a more flexible model, useful for describing data with atypical observations. On the other hand, the problem with the estimation of the shape parameter persists in this new distribution. For that reason, the main goal of this work is to examine bias reduction via Firth's method in the MStN distribution, assuming that the degrees of freedom are known. It will be seen that this leads to results similar to those obtained for the MSN distribution.
The structure of the paper is as follows. In Section 2 the MStN distribution is introduced and some of its main properties are examined. In Section 3 likelihood-based inference and the singularity of the Fisher information matrix of this distribution are studied. Next, in Section 4, the bias reduction methodology introduced by Firth [9] is described. This approach was used by Sartori [8] for the skew-normal and skew-t distributions and by Arrué et al. [7] for the MSN distribution. The method is then applied to the shape parameter of the MStN distribution, and it is shown that the modified ML estimate is always finite. Furthermore, for the case with location, scale, and shape parameters, the methodology is applied by combining it with the ML estimates of the location and scale parameters. Two applications of this methodology to real datasets are considered in Section 5. Finally, conclusions are given in Section 6.
2. MStN Distribution
We say that a random variable Z follows a modified skew-t-normal distribution, denoted by Z ~ MStN(λ, ν), if its pdf can be expressed as
f(z; λ, ν) = 2 t(z; ν) Φ(λz / √(1 + z²)), z ∈ ℝ, (2)
where λ ∈ ℝ and ν > 0. Here, t(·; ν) and Φ(·) denote the pdf of the Student's-t distribution with ν degrees of freedom and the cdf of the N(0, 1) distribution, respectively. For the particular case λ = 0, the pdf of the MStN distribution given in (2) is equivalent to the pdf of the Student's-t distribution. On the other hand, non-zero values of λ directly affect the symmetry and kurtosis of the model. In particular, when λ → ∞ the model converges to the half-t distribution. Additionally, when ν → ∞, the MStN distribution approaches the MSN distribution. Figure 1 shows the behavior of the pdf of the MStN distribution for different values of λ and ν. The thick solid line corresponds to the case λ = 0, i.e., the Student's-t distribution. The two panels illustrate that skewness and kurtosis change with the values of λ and ν. Furthermore, in the limiting case ν → ∞ the pdf coincides with that of the MSN distribution, and the normal density is obtained when, in addition, λ = 0. We can also incorporate location and scale parameters, i.e., X = μ + σZ, with μ ∈ ℝ and σ > 0. Then, the new model is denoted by X ~ MStN(μ, σ, λ, ν), with pdf given by
f(x; μ, σ, λ, ν) = (2/σ) t(z; ν) Φ(λz / √(1 + z²)), with z = (x − μ)/σ. (3)
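As a quick numerical sanity check, the sketch below (Python, standard library only) assumes the standardized MStN pdf takes the form f(z; λ, ν) = 2 t(z; ν) Φ(λz/√(1 + z²)), consistent with the components named above, and verifies that it integrates to one and collapses to the Student's-t pdf when λ = 0:

```python
import math

def t_pdf(z, nu):
    """Student's-t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + z * z / nu) ** (-(nu + 1) / 2)

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def mstn_pdf(z, lam, nu):
    """Assumed MStN(lam, nu) density: 2 * t(z; nu) * Phi(lam*z/sqrt(1+z^2))."""
    return 2 * t_pdf(z, nu) * norm_cdf(lam * z / math.sqrt(1 + z * z))

def integrate(f, a, b, n=200_000):
    """Crude trapezoidal quadrature, adequate for smooth densities."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Total probability mass under the assumed density (lam = 3, nu = 5).
mass = integrate(lambda z: mstn_pdf(z, 3.0, 5.0), -200.0, 200.0)
```

The same helper can be reused to reproduce the density shapes displayed in Figure 1.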
The MStN distribution given by (3) provides a suitable parametric model for fitting empirical data from asymmetric and heavy-tailed distributions. An important property of the MStN distribution given by (3) is the existence and non-singularity of the Fisher information matrix when λ = 0. This means that regular asymptotic theory can be used to test whether the underlying distribution of the data is symmetric.

2.1. Properties
The MStN distribution has a series of interesting formal properties. The following proposition lists some of them for the standardized case with μ = 0 and σ = 1.
Proposition 1. The MStN distribution satisfies the following properties:
1. If , then .
2. If , then (with positive support); in particular, if , then .
3. .
4. (Modified Skew-Cauchy-Normal).
5. If , then ; in particular, if , then .
6. If , with conditional pdf .
7. If , with conditional pdf .
8. If , with conditional pdf .
Property 8 shows the genesis of the MStN distribution, indicating that it belongs to the family of scale-shape mixtures of SN distributions defined in Arellano-Valle et al. [10], with gamma and normal mixing distributions for the scale V and shape S random variables, respectively. The proof of this property is obtained by integrating the conditional pdf in , and then using the following well-known facts: (i) if , then ; (ii) if , then . Properties 6 and 7 are direct consequences of Property 8 upon removal of one of the mixing variables. Property 6 represents the MStN distribution as a skew-scale mixture of the MSN distribution (Ferreira et al. [11]), while Property 7 represents it as a shape mixture. The remaining properties are proved straightforwardly.
2.2. Moments
For k < ν, the moment of order k of Z ~ MStN(λ, ν) is given by
That is, we have , where by Property 2 the random variable has a half-t distribution with pdf , . This means that has the same distribution as , where and is independent of , and so of , the half-normal distribution with pdf , . Thus, we find for and each that , with

(4)
This is established by the following proposition.

Proposition 2. If Z ~ MStN(λ, ν) with k < ν, then the random variable Z has moment of order k given by
with and defined in (4) and . Note in Proposition 2 that , , and . In particular, when k is odd, the first of these relationships implies that . For k even, is constant and hence becomes an even function of λ.
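Proposition 2's closed-form expression relies on symbols defined in (4); as a hedged cross-check, the moments can also be computed by direct quadrature under the assumed pdf 2 t(z; ν) Φ(λz/√(1 + z²)). The sketch below verifies the parity relations noted above: odd moments flip sign with λ, while even moments do not depend on λ and therefore match the Student's-t values (e.g., E[Z²] = ν/(ν − 2)):

```python
import math

def mstn_pdf(z, lam, nu):
    """Assumed standardized MStN density: 2 * t(z; nu) * Phi(lam*z/sqrt(1+z^2))."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    t = c * (1 + z * z / nu) ** (-(nu + 1) / 2)
    return 2 * t * 0.5 * (1 + math.erf(lam * z / math.sqrt(2 * (1 + z * z))))

def moment(k, lam, nu, b=500.0, n=200_000):
    """E[Z^k] by trapezoidal quadrature on [-b, b]; requires k < nu."""
    h = 2 * b / n
    s = 0.5 * (((-b) ** k) * mstn_pdf(-b, lam, nu) + (b ** k) * mstn_pdf(b, lam, nu))
    for i in range(1, n):
        z = -b + i * h
        s += (z ** k) * mstn_pdf(z, lam, nu)
    return s * h

m1 = moment(1, 2.0, 5.0)       # first moment at lam = 2
m1_neg = moment(1, -2.0, 5.0)  # sign should flip with lam
m2 = moment(2, 2.0, 5.0)       # even moment: equals the t_5 value 5/3
```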
2.3. Skewness and Kurtosis
Assuming ν > 3, the skewness coefficient can be computed by using the results in Proposition 2 and the expression given by
Since when k is odd, is an odd function of λ. This can also be observed in the left panel of Figure 2. The range of its values can be determined in terms of its minimum and maximum for each value of ν, as can be observed in Table 1. These values are obtained from the expression
Similarly, for ν > 4 the kurtosis coefficient can be obtained from Proposition 2 and the expression given by . In this case, for k even does not depend on λ, and hence is an even function of λ, as displayed in the right-hand panel of Figure 2. Again, the minimum and maximum values allow us to compute the range of this coefficient. These values can be obtained from the expressions . The ranges of values of the skewness and kurtosis coefficients coincide with the ranges of the respective coefficients of the StN distribution, as displayed in Table 1 for .
In Figure 2 it is observed that the skewness coefficient is an odd function of λ and that the width of its range decreases as ν increases. When ν → ∞, we obtain the skewness range of the MSN and SN models, which is given by (−0.995, 0.995). On the other hand, the kurtosis coefficient is an even function of λ. It is observed that the lower and upper ends of its range, together with the width of the range, decrease with the value of ν. Once again, when ν → ∞ the kurtosis range of the MSN and SN models is obtained, whose interval corresponds to (3.000, 3.869).
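Under the same assumed pdf 2 t(z; ν) Φ(λz/√(1 + z²)), the skewness and kurtosis coefficients can be obtained numerically from the first four moments. The sketch below checks that, at λ = 0 and ν = 5, they reduce to the Student's-t values (0 and 9, the latter matching the lower end of the kurtosis range in Table 1), and that a skewed case stays within the tabulated ranges:

```python
import math

def mstn_pdf(z, lam, nu):
    """Assumed standardized MStN density."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    t = c * (1 + z * z / nu) ** (-(nu + 1) / 2)
    return 2 * t * 0.5 * (1 + math.erf(lam * z / math.sqrt(2 * (1 + z * z))))

def moment(k, lam, nu, b=2000.0, n=200_000):
    """E[Z^k] by trapezoidal quadrature; wide interval for the heavy tails."""
    h = 2 * b / n
    s = 0.5 * (((-b) ** k) * mstn_pdf(-b, lam, nu) + (b ** k) * mstn_pdf(b, lam, nu))
    for i in range(1, n):
        z = -b + i * h
        s += (z ** k) * mstn_pdf(z, lam, nu)
    return s * h

def skew_kurt(lam, nu):
    """Standardized third and fourth central moments (needs nu > 4)."""
    m1, m2, m3, m4 = (moment(k, lam, nu) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2

g1_sym, g2_sym = skew_kurt(0.0, 5.0)   # symmetric case
g1, g2 = skew_kurt(2.0, 5.0)           # skewed case
```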
3. Inference
In this section we use the MStN(μ, σ, λ, ν) model in order to obtain robust inference on the parameters (μ, σ, λ). In this sense, the degrees-of-freedom parameter ν will initially be considered known; in a second stage, from a grid of values for ν, we will select the value with the best likelihood and AIC.
Given a random sample x₁, …, xₙ from the random variable X ~ MStN(μ, σ, λ, ν), the log-likelihood function for the parameter vector θ = (μ, σ, λ) is
(5)
where and . The associated score functions are
where . From these equations, the ML estimates of θ must be obtained numerically. The MStN Fisher information matrix, computed for as , has entries given by
where , with for k odd, , and , which must be computed assuming that . In the symmetric case with λ = 0, we have that , and so the Fisher information matrix reduces to
The expressions for and , with , were computed using the Mathematica software [12].
On the other hand, as ν → ∞, the matrix converges to , where is the Fisher information matrix of the MSN model (see also Arrué et al. [7]).
Since the MStN model is a regular parametric model, we have for each that the ML estimator of θ is consistent and such that as . This fact also holds when , and hence for the ML estimator of obtained from the model.
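As noted above, the ML estimates must be found numerically. The following minimal, stdlib-only sketch treats the standardized case (μ = 0, σ = 1, ν known) under the assumed pdf 2 t(z; ν) Φ(λz/√(1 + z²)): data are generated by rejection from a t(ν) proposal, and the shape parameter is then estimated by a crude grid search over the log-likelihood:

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def mstn_loglik(lam, nu, data):
    """Log-likelihood for the standardized case (mu = 0, sigma = 1, nu known)."""
    c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    total = 0.0
    for z in data:
        lt = c - (nu + 1) / 2 * math.log1p(z * z / nu)
        # Guard against underflow of Phi for very negative arguments.
        p = max(norm_cdf(lam * z / math.sqrt(1 + z * z)), 1e-300)
        total += math.log(2) + lt + math.log(p)
    return total

def sample_mstn(n, lam, nu, rng):
    """Rejection sampler: propose z ~ t(nu) (normal over sqrt(chi2_nu/nu))
    and accept with probability Phi(lam * z / sqrt(1 + z^2))."""
    out = []
    while len(out) < n:
        chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(nu))
        z = rng.gauss(0, 1) / math.sqrt(chi2 / nu)
        if rng.random() < norm_cdf(lam * z / math.sqrt(1 + z * z)):
            out.append(z)
    return out

rng = random.Random(2023)
data = sample_mstn(500, 3.0, 5, rng)   # true shape parameter 3, nu = 5

# Crude grid search for the MLE of the shape parameter.
grid = [i * 0.05 for i in range(0, 601)]   # lam in [0, 30]
lam_hat = max(grid, key=lambda l: mstn_loglik(l, 5, data))
```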
4. Bias Reduction for the ML Estimates
For each value of ν, the ML estimate of λ obtained from the MStN model overestimates the true value of this parameter. This fact can be seen in Table 2 below.
In addition, the ML estimate of λ could be infinite, with a certain probability, when all observations have the same sign, e.g., when all of them are positive. This non-zero probability of divergence when estimating λ increases as the true values of λ and ν grow; however, it quickly declines with the sample size. This can be observed in Figure 4.
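The divergence mechanism can be quantified under the assumed pdf 2 t(z; ν) Φ(λz/√(1 + z²)): the probability that all n observations are positive, P(Z > 0)ⁿ, is a lower bound on the probability that the ML estimate of the shape parameter diverges, and the sketch below confirms that it increases with λ and declines with n:

```python
import math

def mstn_pdf(z, lam, nu):
    """Assumed standardized MStN density."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    t = c * (1 + z * z / nu) ** (-(nu + 1) / 2)
    return 2 * t * 0.5 * (1 + math.erf(lam * z / math.sqrt(2 * (1 + z * z))))

def prob_positive(lam, nu, b=200.0, n=100_000):
    """P(Z > 0) by trapezoidal quadrature on (0, b)."""
    h = b / n
    s = 0.5 * (mstn_pdf(0.0, lam, nu) + mstn_pdf(b, lam, nu))
    for i in range(1, n):
        s += mstn_pdf(i * h, lam, nu)
    return s * h

def prob_all_positive(lam, nu, n_obs):
    """P(z_1 > 0, ..., z_n > 0) = P(Z > 0)^n: lower bound on P(divergence)."""
    return prob_positive(lam, nu) ** n_obs

p = prob_positive(5.0, 5.0)
```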
For the case with location and scale parameters, the overestimation occurs only for the shape parameter λ. This can be noted in Table 3.
Since the bias of the ML estimates of μ and σ is virtually zero, it is sensible to apply Firth's method (see Ref. [9]) to reduce the O(n⁻¹) bias of the ML estimate of λ (see Cox and Snell [13]). By doing this, we obtain a new bias-corrected ML estimate of λ, i.e., unbiased to order n⁻¹. Since , it is verified that , and we only proceed for the case .
4.1. Preliminary Results
Consider a regular parametric model with log-likelihood function , and let us also suppose that the parameter is a scalar; for instance . In addition, let and , where and are the first and second derivatives of l, respectively, and consider the following expected quantities:
For a random sample of size n, we consider of order , while and are of order (see Sartori [8]). In order to obtain the bias-reduced ML estimator, denoted by , we must solve the following modified likelihood equation:

(6)
where and (see Sartori [8]). The quasi-likelihood function associated with (6) is given by
where c is an arbitrary real number. This function allows us to numerically find a reduced-bias estimate of , say . Moreover, it can be used to compute confidence intervals for by means of the likelihood ratio statistic. In fact, since is a penalized likelihood with a bounded penalty function of order , the log-likelihood ratio statistic based on the expression

(7)
has the usual asymptotic χ₁² distribution. It can be used to calculate confidence intervals for λ, since it captures the skewness of the log-likelihood better than the normal asymptotic approximation.

4.2. Shape Parameter Case
Consider now the baseline case with μ = 0 and σ = 1, and take a sample of size n, say z₁, …, zₙ, from Z ~ MStN(λ, ν), with λ the unknown parameter. In this case, the log-likelihood function is obtained from (5) by letting μ = 0 and σ = 1. Since ν is assumed to be known, it becomes proportional to the function
The ML estimate is infinite, with a certain probability, e.g., when all observations are positive, since then the log-likelihood is an increasing function of λ whose derivative goes to zero as λ → ∞. The score function and the observed information are given by and , respectively.
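This divergence is easy to reproduce. Under the assumed pdf, the part of the log-likelihood that depends on λ is Σ log Φ(λzᵢ/√(1 + zᵢ²)); with every observation positive, it only increases with λ, so no finite maximizer exists:

```python
import math

def shape_loglik(lam, data):
    """Shape-dependent part of the assumed MStN log-likelihood:
    sum of log Phi(lam * z / sqrt(1 + z^2))."""
    total = 0.0
    for z in data:
        p = 0.5 * (1 + math.erf(lam * z / math.sqrt(2 * (1 + z * z))))
        total += math.log(max(p, 1e-300))   # guard against underflow
    return total

positive_sample = [0.3, 0.9, 1.4, 2.2, 0.7, 1.1]   # every observation > 0
lls = [shape_loglik(l, positive_sample) for l in (0.0, 1.0, 5.0, 20.0, 100.0)]
```

The sequence `lls` is nondecreasing and approaches zero from below, illustrating why the score tends to zero as λ grows.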
Now, using the notation , the modified function takes the following expression:
(8)
Note that when k is odd, and when both k and h are even.
The left panel of Figure 5 below shows the graphs of the modified function for the SN model, the MStN model with and , and the MSN model. All of them are bounded, odd functions for all , and they tend to zero as (see Proposition 3). They attain their maximum value at , with , and , respectively. Furthermore, it is observed that the larger the value of ν, the more closely the modified function associated with the MStN distribution approaches that of the MSN model. The right panel of Figure 5 displays the graphs of the integrated modified function M. It is noticeable that this is a decreasing, even function with respect to .
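The bounded modification above is what forces a finite maximizer; the mechanism is easiest to see in the simplest possible example. In the binomial model, Firth's adjustment amounts to maximizing the Jeffreys-penalized log-likelihood l*(β) = l(β) + ½ log i(β) in the canonical (logit) parameterization, which yields the closed form (y + ½)/(n + 1); the estimate stays interior even when y = n and the plain MLE sits on the boundary, the analogue of a divergent shape estimate:

```python
import math

def penalized_loglik(beta, y, n):
    """Binomial log-likelihood in the logit parameterization plus
    Firth's penalty (1/2) * log i(beta), with i(beta) = n * p * (1 - p)."""
    p = 1 / (1 + math.exp(-beta))
    ll = y * math.log(p) + (n - y) * math.log(1 - p)
    return ll + 0.5 * math.log(n * p * (1 - p))

def firth_estimate(y, n):
    """Golden-section maximization of the (concave) penalized log-likelihood."""
    lo, hi = -20.0, 20.0
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(200):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if penalized_loglik(a, y, n) < penalized_loglik(b, y, n):
            lo = a
        else:
            hi = b
    beta = (lo + hi) / 2
    return 1 / (1 + math.exp(-beta))

p_hat = firth_estimate(10, 10)   # all "successes": plain MLE is the boundary value 1
```

The closed-form target (y + ½)/(n + 1) follows from setting the penalized score y + ½ − (n + 1)p to zero.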
In order to guarantee the existence of the estimator , the following proposition is needed.
Proposition 3. Let be the modification of the score function for the distribution. Then, whatever the value of , the rate of convergence of the tails of is of order .
See Appendix A. □
First Simulation Study
We have performed a simulation analysis with 5000 iterations for a random variable Z that follows a MStN distribution, by assuming that , and are known, for different sample sizes and different values of and .
It can be inferred from Table 4 that the parameter λ is overestimated and that there are cases where the estimate diverges to ∞. Naturally, this depends on the sample size, the degrees of freedom (ν), and the true value of the shape parameter (λ). After applying Firth's method to the shape parameter λ, we obtain a new estimate, , which always exists and is finite. This is consistent with Proposition 3. The bias reduction of is quite good, taking into account that the method is applied whether the ML estimate is finite or infinite. In addition, λ is underestimated when its value is large and the sample size is small. The empirical coverage is very close to the nominal value, although it is slightly lower when the sample size is small. This is because the coverage is affected by the percentage of estimates that diverge to infinity.
4.3. Location, Scale, and Shape Case
Similarly to the previous case (shape parameter only), the ML estimate of λ obtained from the log-likelihood function given by (5) could be infinite, with a certain probability, when the random sample satisfies , where is the respective ML estimate of . Since the bias of the ML estimates of μ and σ is virtually zero, it seems reasonable to apply the bias reduction method only to the shape parameter λ.
Let be the profile log-likelihood for λ, where and are the ML estimates for a known value of λ. We also define the profile modified log-likelihood equation as follows
(9)
where is the profile score function and M is the modified function given in (8). We also use the profile quasi-log-likelihood associated with (9), given by , where c is an arbitrary real number. As M is bounded, the likelihood ratio statistic has the usual asymptotic distribution. This is useful to calculate confidence intervals for λ.

Second Simulation Study
Again, we have performed a simulation analysis with 5000 iterations for a random variable , by assuming that and are unknown and ν is known, for different sample sizes and different values of and .
As in the scalar-parameter case, it is observed in Table 5 that the parameter λ is overestimated. Moreover, there are cases where the estimates diverge to ∞ (with a higher percentage than in the previous case). Nevertheless, the ML estimates of the location and scale parameters behave quite well: they always exist, they are finite, and their bias is close to zero. For that reason, we only apply Firth's method to the parameter λ. The new estimate, , always exists and is finite. Moreover, the bias is reduced in both situations, i.e., whether the ML estimate is finite or infinite. As in the previous case (without location and scale parameters), λ is underestimated for large values of λ and small sample sizes; however, the underestimation is of smaller magnitude. The empirical coverage is very close to the nominal value, although it is slightly lower when the sample size is small and the value of λ is large. This is because the coverage is affected by the percentage of estimates that diverge to infinity.
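The χ₁² cutoff mentioned above translates directly into a likelihood-ratio confidence interval. The sketch below illustrates the construction for the shape-only case, using the unpenalized shape log-likelihood under the assumed pdf and a small hypothetical sample containing both signs, so the resulting 95% interval is finite:

```python
import math

def shape_loglik(lam, data):
    """Shape-dependent part of the assumed MStN log-likelihood."""
    total = 0.0
    for z in data:
        p = 0.5 * (1 + math.erf(lam * z / math.sqrt(2 * (1 + z * z))))
        total += math.log(max(p, 1e-300))
    return total

# Hypothetical standardized sample with both positive and negative values.
data = [0.4, 1.2, -0.3, 2.1, 0.8, -0.6, 1.7, 0.2, 1.0, -0.1]

grid = [i * 0.01 for i in range(-500, 2001)]       # lam in [-5, 20]
ll = [shape_loglik(l, data) for l in grid]
i_hat = max(range(len(grid)), key=lambda i: ll[i])
lam_hat, ll_max = grid[i_hat], ll[i_hat]

# 95% likelihood-ratio interval: { lam : 2*(l(lam_hat) - l(lam)) <= 3.841 }.
inside = [grid[i] for i in range(len(grid)) if 2 * (ll_max - ll[i]) <= 3.841]
lo, hi = min(inside), max(inside)
```

Because log Φ is concave, the retained set is an interval, and mixed signs in the sample keep both endpoints finite.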
5. Applications
In this section we present two applications to real datasets; the first dataset is available as Supplementary Materials and the second at
5.1. First Application
We consider a dataset on the nickel concentration in 86 soil samples analyzed by the Mining Department of the Universidad de Atacama, Chile. Table 6 shows the descriptive statistics of this dataset, including the sample skewness coefficient () and the sample kurtosis coefficient ().
Now, an exploration of the ML estimates for the MStN distribution, assuming different known values for the parameter ν, is carried out to examine the behavior of the log-likelihood function. Table 7 illustrates the performance of the latter function. It is observed that the maximum value of this function (−338.260) occurs when ν = 3.
Table 8 shows the ML estimates of the parameters for the SN, MSN, and MStN models, respectively. Standard errors (shown in parentheses) were obtained by inverting the Fisher information matrix of each model. In addition, two model selection measures, the maximum of the log-likelihood function and Akaike's Information Criterion (AIC), are displayed in Table 8. It can be concluded that the model introduced in this paper provides a better fit to the data than the competing models.
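The AIC comparison follows the standard definition AIC = 2k − 2ℓ, where k is the number of estimated parameters and ℓ the maximized log-likelihood. A minimal sketch using the log-likelihood values reported in Table 8 (the parameter counts, three free parameters per model with ν fixed for the MStN fit, are an assumption):

```python
def aic(k, loglik):
    """Akaike information criterion: 2k - 2*loglik (smaller is better)."""
    return 2 * k - 2 * loglik

# Maximized log-likelihoods as reported in Table 8 (nickel data).
models = {"SN": (3, -344.762), "MSN": (3, -344.769), "MStN": (3, -338.260)}
best = min(models, key=lambda name: aic(*models[name]))
```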
Figure 6 exhibits the histogram of the nickel concentration dataset. Furthermore, we have superimposed the densities of the (dotted line) and (solid line) models.
QQ-plots for the MStN and MSN models, and the empirical cdf (ogive) together with the fitted MSN and MStN cdfs, are shown in Figure 7 and Figure 8, respectively, obtained from the parameter estimates given above. Both figures confirm the good fit of the model presented in this paper.
Table 9 shows the ML estimates , , and the modified ML estimates .
The modified ML estimate is lower than the ML estimate and, by construction, has a smaller bias.
Table 10 displays confidence intervals for the parameter λ at three different confidence levels. It is observed that the confidence intervals computed using the modified ML estimate are more precise than the intervals based on the ML estimate, since they are narrower.
5.2. Second Application
We present a second application of the proposed model, to a dataset related to beta-plasma. The data come from a study of n = 315 patients whose diet included betacarotene, with the aim of measuring the plasma concentration of betacarotene (ng/mL). The human body converts betacarotene into vitamin A, which acts as an antioxidant, preventing oxidative damage to cells.
Table 11 presents a summary of the descriptive statistics, reflecting high dispersion, asymmetry, and a marked kurtosis value.
We studied the ML estimates for the MStN distribution, assuming different known values for the parameter ν. This enabled us to analyze the behavior of the log-likelihood function. Table 12 illustrates the performance of the log-likelihood function. The maximum value of this function (−1917.631) occurs when ν = 2.
Table 13 shows the ML estimates of the parameters for the SN, MSN, and MStN models, respectively, together with the standard errors (in parentheses). The table also shows two model selection measures, the maximum log-likelihood and the Akaike information criterion (AIC). We may conclude that the model presented in this paper provides a better fit to the data than the SN and MSN models.
Figure 9 shows the histogram of the plasma concentration of betacarotene. It also presents the densities of the MSN model (dotted line) and the MStN model (continuous line).
Bias reduction was applied to the shape parameter λ of the MStN model, but the same estimated value was obtained as with ML. This result is not surprising, since when the sample size (n = 315) is relatively large the estimates tend to be more precise.
The left panel of Figure 9 visually shows the good fit of the MStN model with the plasma concentration of the betacarotene data. The right panel presents a close-up of the right tail of the distribution, showing how the proposed model is better at capturing the extreme values than the MSN model.
QQ plots for the MStN and MSN models, and the empirical cdf (ogive) together with the fitted MSN and MStN cdfs, are shown in Figure 10 and Figure 11, respectively, obtained from the parameter estimates given above. These graphs show the good fit of the proposed model for datasets with large extreme values.
6. Concluding Remarks
In this work, a modified maximum likelihood estimator is proposed to address the overestimation of the shape parameter in the MStN model. The problem is solved by obtaining a new estimator () via Firth's method. Although the ML estimate of the shape parameter can be finite or infinite, the bias of the new estimator is lower, and its existence is proved. A simulation study was carried out for the shape-parameter-only case and for the case where location, scale, and shape parameters are all considered. The conclusions of this simulation study are given below:
In both cases, the bias of the new ML estimator is reduced after applying Firth's method; the methodology satisfactorily reduces the bias even in situations where the ML estimate of the shape parameter is infinite;
For the second case, the method is not applied to the parameters μ and σ, since their bias is very close to zero;
For the first case, the empirical coverage is very close to the nominal value, and is slightly lower when the sample size is small; this is due to the higher percentage of infinite values of the ML estimate. For the second case, the empirical coverage is lower when the shape parameter is large and n is small;
For the first case, the shape parameter is underestimated when it takes large values and the sample size is small. For the second case, similar results are observed, but the degree of underestimation is of smaller magnitude.
Author Contributions: Conceptualization, J.A. and H.W.G.; methodology, R.B.A.-V. and H.W.G.; software, J.A. and E.C.-O.; validation, R.B.A.-V., O.V. and H.W.G.; formal analysis, R.B.A.-V., E.C.-O. and H.W.G.; investigation, J.A.; writing—original draft preparation, J.A. and R.B.A.-V.; writing—review and editing, R.B.A.-V., E.C.-O. and O.V.; funding acquisition, O.V. and H.W.G. All authors have read and agreed to the published version of the manuscript.
The authors declare no conflict of interest.
Figure 1. Plots of the pdf of the MStN distribution for different values of the shape parameter, for two values of the degrees of freedom (left and right panels).
Figure 4. Probability of divergence of the ML estimate of the shape parameter for the MStN model (left and right panels correspond to different parameter settings).
Figure 5. Modified function (left panel) and integrated modified function (right panel) for the SN (dashed line), MStN for two values of the degrees of freedom (solid and thick solid lines), and MSN (dotted line) models.
Figure 6. Histogram of the nickel concentration data together with the pdf of the MSN model (dotted) and the MStN model (solid).
Figure 8. Empirical cdf (ogive) versus theoretical cdf for the MStN and MSN models.
Figure 9. Histogram of the plasma concentration of betacarotene data with the pdf of the MSN (dotted) and MStN (continuous) models.
Figure 11. Empirical cdf (ogive) versus theoretical cdf for the MStN and MSN models.
Table 1. Skewness and kurtosis ranges for different values of ν.

ν | Skewness Range | Kurtosis Range
---|---|---
5 | (−2.550, 2.550) | (9.000, 23.109)
7 | (−1.798, 1.798) | (5.000, 9.461)
9 | (−1.539, 1.539) | (4.200, 7.054)
11 | (−1.407, 1.407) | (3.857, 6.082)
13 | (−1.326, 1.326) | (3.667, 5.561)
15 | (−1.272, 1.272) | (3.545, 5.237)
17 | (−1.233, 1.233) | (3.462, 5.017)
19 | (−1.204, 1.204) | (3.400, 4.857)
∞ | (−0.995, 0.995) | (3.000, 3.869)
Table 2. ML estimates of the shape parameter.
|
|
|
|||||
---|---|---|---|---|---|---|---|
|
|
|
|
|
|
|
|
5 | 3 | 6.99 | 70.68 (71.04) | 7.00 | 95.10 (95.49) | 5.82 | 99.74 (99.80) |
5 | 7.17 | 71.48 (72.28) | 7.00 | 95.36 (95.95) | 5.86 | 99.90 (99.84) | |
10 | 7.06 | 74.52 (73.24) | 6.97 | 96.24 (96.29) | 5.83 | 99.84 (99.86) | |
10 | 3 | 11.52 | 45.62 (45.09) | 14.58 | 77.32 (77.66) | 13.73 | 95.18 (95.01) |
5 | 12.31 | 46.44 (46.15) | 15.00 | 79.08 (78.72) | 13.64 | 96.16 (95.47) | |
10 | 12.67 | 47.26 (47.00) | 14.38 | 80.40 (79.55) | 13.88 | 95.54 (95.82) |
a Calculated when
Table 3. Bias of the ML estimates of the location and scale parameters, together with the ML estimate of the shape parameter.
n |
|
|
Bias ( |
Bias ( |
|
|
---|---|---|---|---|---|---|
50 | 5 | 3 | 0.003 | 0.006 | 6.860 | 83.78 |
100 | 5 | 3 | −0.005 | 0.010 | 6.797 | 96.98 |
200 | 5 | 3 | −0.001 | 0.003 | 5.665 | 99.88 |
50 | 10 | 3 | 0.016 | −0.010 | 11.052 | 61.80 |
100 | 10 | 3 | 0.004 | 0.001 | 13.499 | 85.24 |
200 | 10 | 3 | 0.000 | 0.002 | 13.211 | 97.60 |
50 | 5 | 5 | 0.003 | 0.004 | 6.840 | 85.54 |
100 | 5 | 5 | −0.001 | 0.000 | 6.637 | 97.52 |
200 | 5 | 5 | −0.001 | 0.003 | 5.666 | 99.96 |
50 | 10 | 5 | 0.016 | −0.011 | 11.254 | 61.96 |
100 | 10 | 5 | 0.004 | −0.003 | 13.645 | 86.08 |
200 | 10 | 5 | 0.001 | −0.001 | 12.845 | 98.00 |
50 | 5 | 10 | 0.008 | −0.006 | 6.798 | 87.10 |
100 | 5 | 10 | 0.000 | 0.001 | 6.525 | 98.22 |
200 | 5 | 10 | 0.000 | 0.002 | 5.641 | 99.94 |
50 | 10 | 10 | 0.017 | −0.015 | 11.404 | 65.28 |
100 | 10 | 10 | 0.002 | −0.001 | 13.621 | 87.80 |
200 | 10 | 10 | −0.001 | 0.000 | 13.200 | 98.20 |
a Calculated when
Table 4. Bias of the ML estimate and of the modified ML estimate of the shape parameter (first simulation study).
n |
|
|
Bias( |
Bias( |
|
|
---|---|---|---|---|---|---|
20 | 5 | 3 | 1.867 | −1.583 | 0.94 | 71.64 (71.04) |
50 | 5 | 3 | 1.754 | −0.298 | 0.95 | 95.06 (95.49) |
100 | 5 | 3 | 0.788 | −0.030 | 0.95 | 99.82 (99.80) |
20 | 10 | 3 | 1.991 | −6.034 | 0.90 | 44.78 (45.09) |
50 | 10 | 3 | 4.299 | −2.866 | 0.94 | 76.80 (77.66) |
100 | 10 | 3 | 3.856 | −0.694 | 0.94 | 94.84 (95.01) |
20 | 5 | 5 | 2.148 | −1.513 | 0.94 | 72.62 (72.28) |
50 | 5 | 5 | 1.802 | −0.293 | 0.95 | 96.48 (95.95) |
100 | 5 | 5 | 0.815 | −0.004 | 0.95 | 99.80 (99.84) |
20 | 10 | 5 | 2.197 | −5.949 | 0.90 | 46.46 (46.15) |
50 | 10 | 5 | 4.116 | −2.751 | 0.94 | 79.38 (78.72) |
100 | 10 | 5 | 3.862 | −0.626 | 0.95 | 95.38 (95.47) |
20 | 5 | 10 | 2.177 | −1.479 | 0.94 | 72.82 (73.24) |
50 | 5 | 10 | 2.103 | −0.236 | 0.96 | 96.64 (96.29) |
100 | 5 | 10 | 0.776 | 0.018 | 0.95 | 99.90 (99.86) |
20 | 10 | 10 | 2.274 | −5.888 | 0.91 | 47.42 (47.00) |
50 | 10 | 10 | 4.169 | −2.626 | 0.94 | 79.18 (79.55) |
100 | 10 | 10 | 4.338 | −0.600 | 0.95 | 95.88 (95.82) |
a Calculated when
Table 5. Bias of the ML estimates for the location, scale, and shape case (second simulation study).
n |
|
|
Bias ( |
Bias ( |
Bias ( |
Bias ( |
|
|
---|---|---|---|---|---|---|---|---|
50 | 5 | 3 | 0.004 | 0.005 | 1.889 | −0.898 | 0.93 | 84.24 |
100 | 5 | 3 | −0.002 | 0.005 | 1.617 | −0.306 | 0.94 | 97.42 |
200 | 5 | 3 | −0.001 | 0.004 | 0.712 | −0.106 | 0.95 | 99.84 |
50 | 10 | 3 | 0.017 | −0.008 | 1.097 | −3.971 | 0.87 | 61.86 |
100 | 10 | 3 | 0.004 | −0.003 | 3.301 | −1.699 | 0.91 | 85.40 |
200 | 10 | 3 | 0.001 | 0.000 | 3.188 | −0.534 | 0.94 | 97.96 |
50 | 5 | 5 | 0.004 | 0.002 | 1.828 | −0.855 | 0.94 | 86.70 |
100 | 5 | 5 | −0.002 | 0.005 | 1.755 | −0.255 | 0.94 | 97.58 |
200 | 5 | 5 | −0.001 | 0.002 | 0.628 | −0.098 | 0.95 | 99.78 |
50 | 10 | 5 | 0.016 | −0.013 | 1.185 | −3.842 | 0.87 | 63.74 |
100 | 10 | 5 | 0.005 | −0.002 | 3.736 | −1.508 | 0.92 | 86.72 |
200 | 10 | 5 | 0.000 | 0.000 | 3.053 | −0.401 | 0.94 | 97.78 |
50 | 5 | 10 | 0.006 | −0.003 | 1.832 | −0.813 | 0.92 | 86.84 |
100 | 5 | 10 | 0.000 | 0.002 | 1.523 | −0.245 | 0.95 | 98.00 |
200 | 5 | 10 | 0.000 | 0.000 | 0.578 | −0.108 | 0.94 | 99.94 |
50 | 10 | 10 | 0.014 | −0.013 | 1.530 | −3.689 | 0.88 | 64.18 |
100 | 10 | 10 | 0.004 | −0.002 | 3.622 | −1.311 | 0.92 | 86.40 |
200 | 10 | 10 | 0.002 | −0.002 | 2.700 | −0.450 | 0.93 | 98.06 |
Table 6. Descriptive statistics of the nickel concentration dataset.

Data | n | Mean | s | Skewness | Kurtosis
---|---|---|---|---|---
Nickel | 86 | | | |
Table 7. ML estimates of the parameters of the MStN model for different known values of ν (nickel concentration data).

μ̂ | σ̂ | λ̂ | ν | Log-Likelihood
---|---|---|---|---
13.092 | 6.284 | 0.954 | 1 | −343.057 |
8.676 | 11.182 | 2.235 | 2 | −338.775 |
7.083 | 13.767 | 2.994 | 3 | −338.260 |
6.335 | 15.345 | 3.492 | 4 | −338.483 |
5.858 | 16.458 | 3.875 | 5 | −338.864 |
21.439 | 11.410 | −0.518 | 6 | −349.399 |
Table 8. Parameter estimates, standard errors (in parentheses), log-likelihood, and AIC for the SN, MSN, and MStN models (nickel concentration data).

MLE | SN | MSN | MStN
---|---|---|---
μ̂ | 2.625 (1.136) | 2.571 (1.260) | 7.083 (1.402)
σ̂ | 24.968 (1.913) | 25.027 (2.153) | 13.767 (1.838)
λ̂ | 10.261 (4.751) | 10.619 (5.239) | 2.994 (0.789)
ν | - | - | 3
Log-likelihood | −344.762 | −344.769 | −338.260
AIC | 693.524 | 693.538 | 682.520
Table 9. ML estimates and modified ML estimates for the MStN model (nickel concentration data).

μ̂ | σ̂ | λ̂ (ML) | λ̂ (modified) | Log-Likelihood | Modified Log-Likelihood
---|---|---|---|---|---
7.083 (1.402) | 13.767 (1.838) | 2.994 (0.789) | - | −338.260 | -
7.083 (1.447) | 13.767 (1.843) | - | 2.838 (0.731) | - | −338.305
Table 10. Confidence intervals for the shape parameter at three confidence levels (nickel concentration data).

MLE | | |
---|---|---|---
IC | (1.696, 4.292) | (1.373, 4.615) | (1.158, 4.830)
IC* | (1.635, 4.040) | (1.336, 4.339) | (1.137, 4.539)
Table 11. Descriptive statistics of the plasma concentration of betacarotene.

Data | n | Mean | s | Skewness | Kurtosis
---|---|---|---|---|---
Betaplasma | 315 | | | |
Table 12. ML estimates of the parameters of the MStN model for different known values of ν (betacarotene data).

μ̂ | σ̂ | λ̂ | ν | Log-Likelihood
---|---|---|---|---
64.737 | 74.446 | 2.514 | 1 | −1931.939
52.120 | 107.392 | 3.946 | 2 | −1917.631
45.800 | 127.352 | 5.081 | 3 | −1918.405
42.250 | 140.472 | 5.930 | 4 | −1921.499
39.815 | 150.177 | 6.608 | 5 | −1924.907
Table 13. Parameter estimates, standard errors (in parentheses), log-likelihood, AIC, and BIC for the SN, MSN, and MStN models (betacarotene data).

MLE | SN | MSN | MStN
---|---|---|---
μ̂ | 21.332 (5.180) | 21.332 (5.170) | 52.120 (5.686)
σ̂ | 248.613 (10.510) | 248.594 (10.505) | 107.391 (8.519)
λ̂ | 18.834 (6.968) | 18.954 (6.971) | 3.945 (0.728)
ν | - | - | 2
Log-likelihood | −1976.317 | −1976.319 | −1917.631
AIC | 3956.634 | 3956.638 | 3839.262
BIC | 3969.896 | 3969.896 | 3852.520
Supplementary Materials
The following supporting information can be downloaded at:
References
1. Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. A New Class of Skew-Normal Distributions. Commun. Stat. Theory Methods; 2004; 33, pp. 1465-1480. [DOI: https://dx.doi.org/10.1081/STA-120037254]
2. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat.; 1985; 12, pp. 171-178.
3. Sever, M.; Lajovic, J.; Rajer, B. Robustness of the Fisher's discriminant function to skew-curved normal distribution. Metodološki Zvezki; 2005; 2, pp. 231-242.
4. Arnold, B.C.; Castillo, E.; Sarabia, J.M. Distributions with Generalized Skewed Conditionals and Mixtures of such Distributions. Commun. Stat. Theory Methods; 2007; 36, pp. 1493-1504. [DOI: https://dx.doi.org/10.1080/03610920601125862]
5. Gómez, H.W.; Castro, L.M.; Salinas, H.S.; Bolfarine, H. Properties and Inference on the Skew-curved-symmetric Family of Distributions. Commun. Stat. Theory Methods; 2010; 39, pp. 884-898. [DOI: https://dx.doi.org/10.1080/03610920902807887]
6. Arellano-Valle, R.B.; Gómez, H.W.; Salinas, H.S. A note on the Fisher information matrix for the skew-generalized-normal model. SORT; 2013; 37, pp. 19-28.
7. Arrué, J.; Arellano-Valle, R.B.; Gómez, H.W. Bias reduction of maximum likelihood estimates for a modified skew normal distribution. J. Stat. Comput. Simul.; 2016; 86, pp. 2967-2984. [DOI: https://dx.doi.org/10.1080/00949655.2016.1143471]
8. Sartori, N. Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. J. Stat. Plan. Inference; 2006; 136, pp. 4259-4275. [DOI: https://dx.doi.org/10.1016/j.jspi.2005.08.043]
9. Firth, D. Bias reduction of maximum likelihood estimates. Biometrika; 1993; 80, pp. 27-38. Erratum in Biometrika 1995, 82, 667. [DOI: https://dx.doi.org/10.1093/biomet/80.1.27]
10. Arellano-Valle, R.B.; Ferreira, C.S.; Genton, M.G. Scale and shape mixtures of multivariate skew-normal distributions. J. Multivar. Anal.; 2018; 166, pp. 98-110. [DOI: https://dx.doi.org/10.1016/j.jmva.2018.02.007]
11. Ferreira, C.S.; Bolfarine, H.; Lachos, V.H. Skew scale mixtures of normal distributions: Properties and estimation. Stat. Methodol.; 2011; 8, pp. 154-171. [DOI: https://dx.doi.org/10.1016/j.stamet.2010.09.001]
12. Wolfram Research, Inc. Mathematica, Version 10.0; Wolfram Research, Inc.: Champaign, IL, USA, 2014.
13. Cox, D.R.; Snell, E.J. A general definition of residuals. J. R. Stat. Soc. B Stat. Methodol.; 1968; 30, pp. 248-275. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1968.tb00724.x]
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In this paper, likelihood-based inference and bias correction based on Firth's approach are developed for the modified skew-t-normal (MStN) distribution. This model is more flexible than the modified skew-normal (MSN) distribution, since it can accommodate heavily skewed data and thick tails; the tails are controlled by the shape parameter and the degrees of freedom. We provide the density of this distribution and present some of its most important properties, including a general expression for the moments. Fisher's information matrix, together with the observed information matrix associated with the log-likelihood, is also given. Furthermore, the non-singularity of Fisher's information matrix for the MStN model is demonstrated when the shape parameter is zero. Since the MStN model presents an inferential problem for the shape parameter, Firth's method for bias reduction is applied in the scalar case and in the location-and-scale case.
Details
1 Departamento de Estadística y Ciencias de Datos, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
2 Departamento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
3 Centre for Actuarial Studies, Department of Economics, The University of Melbourne, Melbourne, VIC 3010, Australia
4 Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile