A New Multimodal Modification of the Skew Family

1. Introduction

Let h and G, respectively, be a probability density function (pdf) symmetric with respect to zero and a cumulative distribution function (cdf), such that the derivative of G is symmetric with respect to zero. Then,

(1) $\begin{matrix} f_{Y} (y; λ) = 2 h (y) G (λ y), - \infty < y < \infty \end{matrix}$

is a density function for all $λ \in ℜ$ (Azzalini [1]), where $λ$ is a skewness parameter; this is denoted by $Y \sim SK(λ)$. In the case that h and G are the pdf and cdf of the standard normal distribution in (1), the resulting distribution is called the skew-normal distribution, with density $f_{Y} (y; λ) = 2 ϕ (y) Φ (λ y)$, denoted by $Y \sim SN(λ)$. Furthermore, when a random variable follows a skew-normal distribution with location parameter $μ \in ℜ$, scale parameter $σ > 0$, and skewness parameter $λ \in ℜ$, it will be denoted by $Y \sim SN(μ, σ, λ)$.
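As a quick numerical illustration of (1), the following Python sketch evaluates $2 h (y) G (λ y)$ for the standard normal choice of h and G and checks that it integrates to one. The function names (`skew_pdf`, `simpson`), the value $λ = 3$, and the integration range are illustrative assumptions, not part of the original work.

```python
import math


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def skew_pdf(y, lam, h=phi, G=Phi):
    """Density (1): 2 h(y) G(lam * y); h and G default to the normal case."""
    return 2.0 * h(y) * G(lam * y)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


lam = 3.0  # illustrative skewness value
area = simpson(lambda y: skew_pdf(y, lam), -10.0, 10.0)
mean = simpson(lambda y: y * skew_pdf(y, lam), -10.0, 10.0)
# Known mean of the skew-normal: sqrt(2 / pi) * lam / sqrt(1 + lam^2)
exact_mean = math.sqrt(2.0 / math.pi) * lam / math.sqrt(1.0 + lam * lam)
print(area, mean, exact_mean)
```

Setting `lam = 0.0` recovers the symmetric density h, since $G (0) = 1 / 2$.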

Although the skew distribution (see [1]) performs well in a wide variety of settings where the data are unimodal, it does not perform well in the presence of multimodality, that is, when the empirical distribution has multiple modes or peaks. Multimodality can arise for different reasons, including the existence of multiple groups or subpopulations with distinct characteristics, or the existence of latent variables that significantly influence the distribution of the population. In such cases, a mixture distribution is one of the first alternatives considered for modeling; however, its use implies addressing the problem of non-identifiability. Various methods for introducing new flexible probability distributions can be found in the statistical literature. There are many examples that we could mention, but the approaches proposed in Elal-Olivero [2], Gómez et al. [3], Venegas et al. [4], and Bolfarine et al. [5] are especially attractive when trying to propose a new bimodal distribution. The objective of this article is to develop an alternative multimodal family for the skew-normal distribution, for which we propose a weighted version, in the sense of Fisher [6] and Rao [7], of the skew distribution that can present asymmetric shapes with up to three modes. We provide evidence that the new family, being flexible in both asymmetry and number of modes, can outperform some important distributions in the literature.

Gómez-Déniz et al. [8,9] present two extensions of the skew-normal family, to model bimodality and multimodality.

The first is defined by

(2) $\begin{matrix} f_{Y} (y; λ, a) = g_{Y} (y) [G_{X} (λ y + a) + G_{X} (λ y - a)], \end{matrix}$

where $g_{Y}$ is a density function that is symmetric about zero, $G_{X}$ is the cdf of a distribution also symmetric about zero, and $y \in ℜ$, $a \in ℜ$, $λ \in ℜ$.

The second is defined as follows: if f is a symmetric pdf around 0, defined by $f (w_{α} (x))$ , with $w_{α} (x) = x - \frac{α}{x}$ , where $α \geq 0$ and F is the corresponding cdf, then we have the following family of bimodal asymmetric distributions:

(3) $\begin{matrix} g (x; α, λ) = \{\begin{matrix} 2 F (λ x) f (w_{α} (x)) & , & x \neq 0 \\ f (0) & , & x = 0 . \end{matrix} \end{matrix}$

These models are more flexible than the skew family of distributions, since for different values of the parameters they provide a distribution that can be unimodal or bimodal. On the other hand, Reyes et al. [10,11] present bimodal distributions for the exponential and Birnbaum–Saunders cases, respectively. In this paper, we present a modification of the family of skew distributions given in Equation (1), which includes the Azzalini family of skew distributions (see Azzalini [1]) as a particular case. The methodology is based on multiplying Azzalini's density by a polynomial of degree 4, which adds a new parameter to the family. This new family is presented as an alternative to the families of Gómez-Déniz et al. [8,9].

This article is organized as follows: In Section 2, an expression is obtained for the pdf of the new family along with its most relevant properties: moments, kurtosis coefficient, and log-likelihood function. In Section 3, the particular case of the normal distribution is studied, including a simulation study that evaluates the behavior of the estimators for this case and two applications to real data, one related to medical data and the other to environmental data. In Section 4, the particular case of the Laplace distribution is studied, again with a simulation study of the estimators and an application to environmental data. Finally, Section 5 presents the discussion.

2. Modified Generalized Skew Distribution

2.1. Density Function

Let Y be a random variable, let h be a density function symmetric with respect to zero, and let G be a cumulative distribution function whose density is also symmetric with respect to zero. We say that Y follows a Modified Generalized Skew (MGS) distribution with parameters $α$, which controls the number of modes, and $λ$, which controls the skewness, denoted by $Y \sim M G S (α, λ)$ .

Theorem 1.

Let $Y \sim M G S (α, λ)$ ; then, the density function of Y is given by

(4) $\begin{matrix} f_{Y} (y, α, λ) = \frac{2}{1 + α ρ_{4}} (1 + α y^{4}) h (y) G (λ y), \end{matrix}$

where $y \in ℜ$ , $λ \in ℜ$ , $α \geq 0$ , and $ρ_{4}$ is the moment of order 4 of a random variable X with a skew distribution of parameter λ.

Proof.

$\begin{matrix} \int_{- \infty}^{\infty} f_{Y} (y) d y & = & \int_{- \infty}^{\infty} C_{0} (1 + α y^{4}) 2 h (y) G (λ y) d y \\ & = & C_{0} [\int_{- \infty}^{\infty} 2 h (y) G (λ y) d y + \int_{- \infty}^{\infty} α y^{4} 2 h (y) G (λ y) d y] \\ & = & C_{0} [1 + α E (X^{4})] \\ & = & C_{0} [1 + α ρ_{4}] \\ & = & 1, \end{matrix}$

where $C_{0} = \frac{1}{1 + α ρ_{4}}$ and $ρ_{4}$ is the moment of order 4 of a random variable X with a skew distribution of parameter $λ$. □
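The normalizing constant in (4) can be checked numerically. The sketch below, in Python, computes $ρ_{4}$ by quadrature for two kernels (standard normal and standard Laplace with unit scale) and verifies that the resulting MGS density integrates to one. The helper names, parameter values, and integration bounds are assumptions made for illustration only.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def laplace_pdf(y):
    """Standard Laplace pdf (location 0, scale 1)."""
    return 0.5 * math.exp(-abs(y))


def laplace_cdf(y):
    return 0.5 * math.exp(y) if y < 0.0 else 1.0 - 0.5 * math.exp(-y)


def rho4(h, G, lam, bound, n):
    """Fourth moment of the skew density 2 h(y) G(lam * y)."""
    return simpson(lambda y: y ** 4 * 2.0 * h(y) * G(lam * y), -bound, bound, n)


def mgs_pdf(y, alpha, lam, h, G, r4):
    """Density (4) with normalizing constant 1 / (1 + alpha * r4)."""
    return 2.0 * (1.0 + alpha * y ** 4) * h(y) * G(lam * y) / (1.0 + alpha * r4)


alpha, lam = 0.4, -0.5  # illustrative parameter values
r4_normal = rho4(normal_pdf, normal_cdf, lam, 12.0, 4800)
r4_laplace = rho4(laplace_pdf, laplace_cdf, lam, 60.0, 24000)
area_normal = simpson(
    lambda y: mgs_pdf(y, alpha, lam, normal_pdf, normal_cdf, r4_normal),
    -12.0, 12.0, 4800)
area_laplace = simpson(
    lambda y: mgs_pdf(y, alpha, lam, laplace_pdf, laplace_cdf, r4_laplace),
    -60.0, 60.0, 24000)
print(r4_normal, r4_laplace, area_normal, area_laplace)
```

Because $y^{4}$ is even and h is symmetric, $ρ_{4}$ does not depend on $λ$: it equals the fourth moment of h itself (3 for the standard normal, 24 for the unit-scale Laplace).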

2.2. Important Results

In this section, we present some results of the MGS distribution.

Let $Y \sim M G S (α, λ)$ , $λ \in ℜ$ , and $α \geq 0$ ; then:

1. $f_{Y} (y; 0, 0) = h (y)$ .
2. $f_{Y} (y; α, 0) = \frac{1}{1 + α ρ_{4}} (1 + α y^{4}) h (y)$ .
3. $f_{Y} (y; 0, λ) = 2 h (y) G (λ y)$ .

Item 1 indicates that if both parameters are zero then the family of symmetric density functions is recovered. Item 2 shows that when $λ = 0$ a family of uni or bimodal symmetric distributions is obtained. Finally, Item 3 indicates that if $α = 0$ then the family of skew distributions is obtained.

The above results are illustrated in the following diagram:

$M G S (α, λ) \begin{matrix} α = 0, h = N o r m a l ⟶ S N (0, λ) ⟶ λ = 0 ⟶ N (0, 1) \\ ↗ \\ ⟶ & α = 0, h = L o g i s t i c ⟶ S L O G (0, λ) ⟶ λ = 0 ⟶ L O G (0, 1) \\ ↘ \\ α = 0, h = L a p l a c e ⟶ S L P (0, λ) ⟶ λ = 0 ⟶ L P (0, 1) \end{matrix}$

2.3. Moments

The following statement shows the moments for the $M G S$ distribution. These depend on the moments of the skew distribution.

Proposition 1.

If $Y \sim M G S (α, λ)$ then for $r = 1, 2, . . .$ we have

$μ_{r} = E [Y^{r}] = \frac{1}{1 + α ρ_{4}} [ρ_{r} + α ρ_{r + 4}],$

where $y \in ℜ$ , $λ \in ℜ$ , $α \geq 0$ , and $ρ_{r}$ are the moments of order r of a random variable X with a skew distribution of parameter λ.

Proof.

$\begin{matrix} μ_{r} & = & E [Y^{r}] \\ & = & \frac{1}{1 + α ρ_{4}} \int_{- \infty}^{\infty} 2 y^{r} (1 + α y^{4}) h (y) G (λ y) d y \\ & = & \frac{1}{1 + α ρ_{4}} \int_{- \infty}^{\infty} 2 (y^{r} + α y^{r + 4}) h (y) G (λ y) d y \\ & = & \frac{1}{1 + α ρ_{4}} [ρ_{r} + α ρ_{r + 4}] . \end{matrix}$

□
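Proposition 1 can be illustrated numerically for the normal kernel ($h = ϕ$, $G = Φ$). The Python sketch below computes the skew-normal moments $ρ_{r}$ by quadrature, applies the formula of Proposition 1, and compares the even-order results with the values implied by the standard normal moments $ρ_{2} = 1$, $ρ_{4} = 3$, $ρ_{6} = 15$, and $ρ_{8} = 105$. Function names and parameter values are illustrative assumptions.

```python
import math


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def simpson(f, a, b, n=4800):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


def rho(r, lam):
    """r-th moment of the skew-normal density 2 phi(y) Phi(lam * y)."""
    return simpson(lambda y: y ** r * 2.0 * phi(y) * Phi(lam * y), -12.0, 12.0)


def mgs_moment(r, alpha, lam):
    """Proposition 1: (rho_r + alpha * rho_{r+4}) / (1 + alpha * rho_4)."""
    return (rho(r, lam) + alpha * rho(r + 4, lam)) / (1.0 + alpha * rho(4, lam))


alpha, lam = 0.4, -0.5  # illustrative parameter values
moments = [mgs_moment(r, alpha, lam) for r in (1, 2, 3, 4)]
# Even moments do not depend on lam and follow from the normal moments.
mu2_exact = (1.0 + 15.0 * alpha) / (1.0 + 3.0 * alpha)
mu4_exact = (3.0 + 105.0 * alpha) / (1.0 + 3.0 * alpha)
print(moments, mu2_exact, mu4_exact)
```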

The first four moments of Y are given in the following corollary:

Corollary 1.

If $Y \sim M G S (α, λ)$ then

$\begin{matrix} μ_{1} & = & \frac{1}{1 + α ρ_{4}} [ρ_{1} + α ρ_{5}] \\ μ_{2} & = & \frac{1}{1 + α ρ_{4}} [ρ_{2} + α ρ_{6}] \\ μ_{3} & = & \frac{1}{1 + α ρ_{4}} [ρ_{3} + α ρ_{7}] \\ μ_{4} & = & \frac{1}{1 + α ρ_{4}} [ρ_{4} + α ρ_{8}], \end{matrix}$

Proof.

Setting $r = 1, 2, 3, 4$ in Proposition 1 gives the results. □

Corollary 2.

If $Y \sim M G S (α, λ)$ then

$\begin{matrix} E (Y^{r}; α, λ) & = & - E (Y^{r}; α, - λ), & \text{if } r \text{ is odd}, \\ E (Y^{r}; α, λ) & = & \frac{1}{1 + α ρ_{4}} \int_{- \infty}^{\infty} 2 y^{r} (1 + α y^{4}) h (y) d y - E (Y^{r}; α, - λ), & \text{if } r \text{ is even} . \end{matrix}$

Proof.

$\begin{matrix} E [Y^{r}; α, λ] & = & \frac{1}{1 + α ρ_{4}} \int_{- \infty}^{\infty} 2 y^{r} (1 + α y^{4}) h (y) G (λ y) d y \\ & = & \frac{1}{1 + α ρ_{4}} \int_{- \infty}^{\infty} 2 y^{r} (1 + α y^{4}) h (y) [1 - G (- λ y)] d y \\ & = & \frac{1}{1 + α ρ_{4}} \int_{- \infty}^{\infty} 2 y^{r} (1 + α y^{4}) h (y) d y - E [Y^{r}; α, - λ] . \end{matrix}$

For r odd, the remaining integral vanishes because its integrand is an odd function, which gives the first identity; for r even, the last line is the second identity. □

Corollary 3.

If $Y \sim M G S (α, λ)$ then

$\begin{matrix} β_{1} (α, λ) & = & - β_{1} (α, - λ) \\ α_{2} (α, λ) & = & α_{2} (α, - λ) . \end{matrix}$

Proof.

Using Corollary 2 and substituting into the standardized skewness coefficients ( $β_{1}$ ) and kurtosis $(α_{2})$ given by

$\begin{matrix} β_{1} & = & \frac{μ_{3} - 3 μ_{2} μ_{1} + 2 μ_{1}^{3}}{{[(μ_{2} - μ_{1}^{2})]}^{3 / 2}}, \\ α_{2} & = & \frac{μ_{4} - 4 μ_{1} μ_{3} + 6 μ_{1}^{2} μ_{2} - 3 μ_{1}^{4}}{{(μ_{2} - μ_{1}^{2})}^{2}}, \end{matrix}$

respectively, the result is obtained. □
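The symmetry stated in Corollary 3 can be checked numerically. The following Python sketch assumes the normal kernel ($h = ϕ$, $G = Φ$, so that $ρ_{4} = 3$), computes the raw moments by quadrature, and evaluates $β_{1}$ and $α_{2}$ with the formulas above. The function names and the parameter values $α = 8$, $λ = \pm 2$ are illustrative choices.

```python
import math


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def simpson(f, a, b, n=4800):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


def raw_moment(r, alpha, lam):
    """E[Y^r] under density (4) with the normal kernel (rho_4 = 3)."""
    c0 = 1.0 / (1.0 + 3.0 * alpha)
    return simpson(
        lambda y: y ** r * c0 * 2.0 * (1.0 + alpha * y ** 4) * phi(y) * Phi(lam * y),
        -12.0, 12.0)


def skewness_kurtosis(alpha, lam):
    """Standardized skewness beta_1 and kurtosis alpha_2 from raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, alpha, lam) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    beta1 = (m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3) / var ** 1.5
    alpha2 = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / var ** 2
    return beta1, alpha2


b_pos, k_pos = skewness_kurtosis(8.0, 2.0)
b_neg, k_neg = skewness_kurtosis(8.0, -2.0)
print(b_pos, b_neg, k_pos, k_neg)  # skewness flips sign, kurtosis is unchanged
```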

2.4. $M G S$ Distribution with Location and Scale Parameters

The family of distributions $M G S (α, λ)$ can be extended by means of a linear transformation, introducing location and scale parameters, adding more flexibility to the model proposed in (4).

Let $Y \sim M G S (α, λ)$ ; then, $Z = μ + σ Y$ follows a Modified Generalized Skew model with location parameter $μ$ and scale parameter $σ$, denoted by $Z \sim M G S (μ, σ, α, λ)$ , and its density function is given by

(5) $f_{Z} (z; μ, σ, α, λ) = \frac{2}{σ (1 + α ρ_{4})} (1 + α {(\frac{z - μ}{σ})}^{4}) h (\frac{z - μ}{σ}) G (λ (\frac{z - μ}{σ})),$

where $z \in ℜ$, $λ \in ℜ$, $α \geq 0$, and $ρ_{4}$ is the moment of order 4 of a random variable X with a skew distribution of parameter $λ$.

The moments of the distribution of $Z \sim M G S (μ, σ, α, λ)$ are given by

Proposition 2.

Let $Z \sim M G S (μ, σ, α, λ)$ ; then,

$\begin{matrix} E (Z^{r}) = E [{(μ + σ Y)}^{r}] = \sum_{i = 0}^{r} \binom{r}{i} μ^{r - i} σ^{i} μ_{i} = \frac{1}{1 + α ρ_{4}} \sum_{i = 0}^{r} \binom{r}{i} μ^{r - i} σ^{i} [ρ_{i} + α ρ_{i + 4}], \end{matrix}$

where $ρ_{r}$ are the moments of order r of a random variable X with a skew distribution of parameter $λ$.

Proof.

Expanding ${(μ + σ Y)}^{r}$ with the binomial theorem and substituting the moments given in Proposition 1 into $E (Z^{r})$ , the result is obtained. □
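The binomial expansion in Proposition 2 translates directly into code. The Python sketch below is an illustration under stated assumptions: it uses the normal kernel with $λ = 0$, where the odd standardized moments vanish and the even ones follow from Proposition 1 with the normal moments 1, 3, 15, and 105, and it compares the expansion with direct numerical integration of the location-scale density. All names and parameter values are hypothetical.

```python
import math


def loc_scale_moment(r, mu, sigma, std_moments):
    """E[Z^r] for Z = mu + sigma * Y, given std_moments[i] = E[Y^i], i = 0..r."""
    return sum(math.comb(r, i) * mu ** (r - i) * sigma ** i * std_moments[i]
               for i in range(r + 1))


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def simpson(f, a, b, n=4800):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0


alpha, mu, sigma = 0.4, 1.0, 2.0  # illustrative values, with lam = 0
c = 1.0 + 3.0 * alpha
# Standardized moments E[Y^0..4] for the normal kernel with lam = 0.
std_moments = [1.0, 0.0, (1.0 + 15.0 * alpha) / c, 0.0, (3.0 + 105.0 * alpha) / c]


def pdf_z(z):
    """Location-scale density for the normal kernel with lam = 0."""
    t = (z - mu) / sigma
    return (1.0 + alpha * t ** 4) * phi(t) / (sigma * c)


formula = [loc_scale_moment(r, mu, sigma, std_moments) for r in range(5)]
numeric = [simpson(lambda z, r=r: z ** r * pdf_z(z), mu - 12.0 * sigma, mu + 12.0 * sigma)
           for r in range(5)]
print(formula)
print(numeric)
```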

2.5. Log-Likelihood Function

Let $z_{1}, z_{2}, \dots, z_{n}$ be a random sample of a variable Z, such that $Z \sim M G S (θ)$ with $θ = (μ, σ, α, λ)$ ; then, the log-likelihood function is

$\begin{matrix} l (θ; z) & = & n \log (2) - n \log (σ) - n \log (1 + α ρ_{4}) + \sum_{i = 1}^{n} \log \{1 + α {(\frac{z_{i} - μ}{σ})}^{4}\} \\ & & + \sum_{i = 1}^{n} \log \{h (\frac{z_{i} - μ}{σ})\} + \sum_{i = 1}^{n} \log \{G (λ (\frac{z_{i} - μ}{σ}))\} . \end{matrix}$

Taking partial derivatives of the log-likelihood function with respect to the parameters and solving the resulting system of equations numerically, we obtain the maximum likelihood estimators of the parameters $μ$ , $σ$ , $α$ , and $λ$ .

3. Normal Distribution Case

Let us consider the particular case in Equation (5) when $h = ϕ$ and $G = Φ$ . If a random variable follows a Modified Generalized Skew Normal (MGSN) distribution then we will denote it by $Z \sim M G S N (μ, σ, α, λ)$ , and its pdf is given by

(6) $f_{Z} (z; μ, σ, α, λ) = \frac{2}{σ (1 + 3 α)} (1 + α {(\frac{z - μ}{σ})}^{4}) ϕ (\frac{z - μ}{σ}) Φ (λ \frac{(z - μ)}{σ}),$

where $z \in ℜ$, $μ \in ℜ$, $σ > 0$, $λ \in ℜ$, and $α \geq 0$.

Figure 1 shows the density function of the proposed MGSN model for $μ = 0$ , $σ = 1$ , and different values of $α$ and $λ$ , compared to the Gómez-Déniz et al. [8] model for the normal case, called the Generalized Skew Normal (GSN) distribution. The figure shows the flexibility of the new distribution, which can model unimodal, bimodal, and trimodal data with only two shape parameters, while the GSN model is only unimodal using the same number of parameters.

Proposition 3.

If $Y \sim M G S N (μ, σ, α, λ)$ then its density function presents at most three modes.

Proof.

Without loss of generality, we consider $Y \sim M G S N (0, 1, α, λ)$ . Since the parameter $λ$ only affects the asymmetry, we can assume $λ = 0$ in the density given in (6); then,

$f_{Y} (y) = \frac{1}{(1 + 3 α)} (1 + α y^{4}) ϕ (y) .$

Differentiating and equating to zero, we have

$\frac{\partial f_{Y} (y)}{\partial y} = \frac{1}{1 + 3 α} [(1 + α y^{4}) (- y ϕ (y)) + 4 α y^{3} ϕ (y)] = 0,$

and, since $ϕ (y) > 0$ , this is equivalent to

$- y (1 + α y^{4}) + 4 α y^{3} = 0,$

a polynomial equation of degree 5, which has at most five real roots, so the density has at most three maxima. For the normal case with $λ = 0$ and $α \leq 0.25$ , the density is unimodal. Otherwise, it is trimodal when $α$ is finite or bimodal when $α \to \infty$ . □
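The mode count for $λ = 0$ can be worked out explicitly. Factoring the critical-point equation as $- y (α y^{4} - 4 α y^{2} + 1) = 0$ and writing $u = y^{2}$ gives $u = 2 \pm \sqrt{4 - 1 / α}$ , which is real only when $α \geq 1 / 4$ . The Python sketch below (function names and the tolerance `eps` are illustrative assumptions) lists the critical points and keeps those that are strict local maxima of the unnormalized density.

```python
import math


def kernel(y, alpha):
    """Unnormalized density for lam = 0: (1 + alpha * y^4) * exp(-y^2 / 2)."""
    return (1.0 + alpha * y ** 4) * math.exp(-0.5 * y * y)


def critical_points(alpha):
    """Real roots of -y * (alpha * y^4 - 4 * alpha * y^2 + 1) = 0."""
    roots = {0.0}
    if alpha > 0.0:
        disc = 4.0 - 1.0 / alpha  # discriminant term in u = y^2
        if disc >= 0.0:
            for sign in (1.0, -1.0):
                u = 2.0 + sign * math.sqrt(disc)  # positive because disc < 4
                roots.update((math.sqrt(u), -math.sqrt(u)))
    return sorted(roots)


def modes(alpha, eps=1e-3):
    """Critical points that are strict local maxima of the kernel."""
    return [c for c in critical_points(alpha)
            if kernel(c, alpha) > kernel(c - eps, alpha)
            and kernel(c, alpha) > kernel(c + eps, alpha)]


for a in (0.2, 0.25, 1.0):
    print(a, modes(a))
```

At $α = 1 / 4$ the two nonzero roots in u coincide at $u = 2$ , so $y = \pm \sqrt{2}$ are inflection points rather than modes and the density is still unimodal.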

In Figure 2, it can be observed that the MGSN model with $λ = 0$ is unimodal for $α \in [0, 1 / 4]$ , trimodal for $α > 1 / 4$ , and bimodal in the limit $α \to \infty$ .

3.1. Moments

The moments for the $M G S N (0, 1, α, λ)$ distribution are obtained by substituting into Corollary 1 the moments of the skew-normal distribution given by Henze [12]:

$\begin{matrix} μ_{1} & = & \frac{\sqrt{\frac{2}{π}} λ}{(1 + 3 α) {(λ^{2} + 1)}^{\frac{5}{2}}} [λ^{4} (8 α + 1) + 2 λ^{2} (10 α + 1) + 1 + 15 α] \\ μ_{2} & = & \frac{1 + 15 α}{1 + 3 α} \\ μ_{3} & = & \frac{\sqrt{\frac{2}{π}} λ}{(1 + 3 α) {(λ^{2} + 1)}^{\frac{7}{2}}} [2 λ^{6} (24 α + 1) + 7 λ^{4} (24 α + 1) + 2 λ^{2} (105 α + 4) + 3 + 105 α] \\ μ_{4} & = & \frac{3 + 105 α}{1 + 3 α} . \end{matrix}$

Figure 3 shows the graphs of the skewness and kurtosis coefficients of the MGSN distribution for $μ = 0$ , $σ = 1$ , and different values of $α$ and $λ$ . In the left panel, it can be seen that for a fixed value of $α$ the skewness coefficient is an odd function with respect to $λ$ . As an example, given $α = 8$ , the value of the skewness coefficient for $λ = - 2$ is $0.1234$ and for $λ = 2$ it is $- 0.1234$ . In the right panel, we can see that given a fixed value of $α$ the kurtosis coefficient is an even function with respect to $λ$ . For example, given $α = 8$ , the value of the kurtosis coefficient for $λ = - 2$ is $3.8962$ and for $λ = 2$ it is also $3.8962$ .

Figure 4 shows, in the left panel, the profile of the skewness coefficient for different values of $α$ . It can be seen that for $α = 0$ the profile coincides with the profile of the skewness coefficient of the skew-normal distribution. Furthermore, exploratory analysis suggests that if $α \to \infty$ and $λ \to 0.7923602$ then $β_{1}$ converges to $\pm 1.700501$ . Similarly, for $α = 0$ the profile of the kurtosis, shown in the right panel, coincides with the profile of the kurtosis coefficient of the skew-normal distribution. Exploratory analysis also suggests that if $α \to \infty$ and $λ \to 1.023191$ then the value of $α_{2}$ converges to $7.878286$ , and if $α \to \infty$ and $λ \to 0$ then the value of $α_{2}$ converges to $1.4$ .

The skewness and kurtosis values for fixed values of $α$ and $λ$ , given in Table 1, show numerically that the skewness and kurtosis coefficients are odd and even functions with respect to $λ$ , respectively.

3.2. Estimation

Let $z_{1}, z_{2}, \dots, z_{n}$ be a random sample of a variable Z, such that $Z \sim M G S N (θ)$ with $θ = (μ, σ, α, λ)$ ; then, the log-likelihood function is

$\begin{matrix} l (θ; \tilde{z}) & = & \sum_{i = 1}^{n} log \{1 + α {(\frac{z_{i} - μ}{σ})}^{4}\} - n log (1 + 3 α) - n log (σ) - \sum_{i = 1}^{n} {(\frac{z_{i} - μ}{\sqrt{2} σ})}^{2} \\ + & \sum_{i = 1}^{n} log \{Φ (λ (\frac{z_{i} - μ}{σ}))\} . \end{matrix}$

Differentiating the log-likelihood function with respect to each parameter and equating to zero, the likelihood equations are given by

$\begin{matrix} \frac{\partial ℓ (θ; \tilde{z})}{\partial μ} & = & - \frac{1}{σ} \sum_{i = 1}^{n} \frac{4 α {(\frac{z_{i} - μ}{σ})}^{3}}{1 + α {(\frac{z_{i} - μ}{σ})}^{4}} + \frac{1}{σ} \sum_{i = 1}^{n} (\frac{z_{i} - μ}{σ}) - \frac{λ}{σ} \sum_{i = 1}^{n} \frac{ϕ (λ (\frac{z_{i} - μ}{σ}))}{Φ (λ (\frac{z_{i} - μ}{σ}))} = 0, \\ \frac{\partial ℓ (θ; \tilde{z})}{\partial σ} & = & - \frac{1}{σ} \sum_{i = 1}^{n} \frac{4 α {(\frac{z_{i} - μ}{σ})}^{4}}{1 + α {(\frac{z_{i} - μ}{σ})}^{4}} - \frac{n}{σ} + \frac{1}{σ} \sum_{i = 1}^{n} {(\frac{z_{i} - μ}{σ})}^{2} - \frac{λ}{σ} \sum_{i = 1}^{n} (\frac{z_{i} - μ}{σ}) \frac{ϕ (λ (\frac{z_{i} - μ}{σ}))}{Φ (λ (\frac{z_{i} - μ}{σ}))} = 0, \\ \frac{\partial ℓ (θ; \tilde{z})}{\partial α} & = & \sum_{i = 1}^{n} \frac{{(\frac{z_{i} - μ}{σ})}^{4}}{1 + α {(\frac{z_{i} - μ}{σ})}^{4}} - \frac{3 n}{1 + 3 α} = 0, \\ \frac{\partial ℓ (θ; \tilde{z})}{\partial λ} & = & \sum_{i = 1}^{n} (\frac{z_{i} - μ}{σ}) \frac{ϕ (λ (\frac{z_{i} - μ}{σ}))}{Φ (λ (\frac{z_{i} - μ}{σ}))} = 0 . \end{matrix}$

The maximum likelihood estimators (MLEs) are obtained by solving these equations. They do not admit an analytical solution, so it is necessary to use iterative numerical methods.
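When the score of the MGSN log-likelihood is coded by hand, it is easy to slip on a sign, so a finite-difference check is a useful safeguard. The Python sketch below writes the log-likelihood (constants dropped) and its four partial derivatives, obtained by differentiating term by term with $t_{i} = (z_{i} - μ) / σ$ , and compares them with central differences. The data values, the parameter vector, and all function names are hypothetical.

```python
import math


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def loglik(theta, z):
    """MGSN log-likelihood, up to an additive constant."""
    mu, sigma, alpha, lam = theta
    total = 0.0
    for zi in z:
        t = (zi - mu) / sigma
        total += math.log(1.0 + alpha * t ** 4) - 0.5 * t * t + math.log(Phi(lam * t))
    n = len(z)
    return total - n * math.log(1.0 + 3.0 * alpha) - n * math.log(sigma)


def score(theta, z):
    """Analytic partial derivatives with respect to (mu, sigma, alpha, lam)."""
    mu, sigma, alpha, lam = theta
    d_mu = d_sigma = d_alpha = d_lam = 0.0
    for zi in z:
        t = (zi - mu) / sigma
        w = 4.0 * alpha * t ** 3 / (1.0 + alpha * t ** 4)
        ratio = phi(lam * t) / Phi(lam * t)
        d_mu += (-w + t - lam * ratio) / sigma
        d_sigma += (-w * t - 1.0 + t * t - lam * t * ratio) / sigma
        d_alpha += t ** 4 / (1.0 + alpha * t ** 4) - 3.0 / (1.0 + 3.0 * alpha)
        d_lam += t * ratio
    return [d_mu, d_sigma, d_alpha, d_lam]


def numeric_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


z = [-2.2, -1.3, -0.2, 0.4, 1.1, 2.5]  # hypothetical data
theta = [0.1, 1.2, 0.5, -0.7]          # hypothetical (mu, sigma, alpha, lam)
analytic = score(theta, z)
numeric = numeric_gradient(lambda th: loglik(th, z), theta)
print(analytic)
print(numeric)
```

In practice the log-likelihood would be passed to a numerical optimizer; the analytic score can be supplied as its gradient once it agrees with the finite-difference version.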

3.3. Simulation Study

Many programs provide built-in random number generators, but some probability distributions are not covered by such software. In the case of the MGSN distribution, we use the acceptance–rejection method to generate random numbers from the distribution $M G S N (μ, σ, α, λ)$ with the pdf defined in (6), according to the algorithm below. The resulting sequence of n random numbers is stored in a vector that we call the n-vector. Since the MGSN distribution has unbounded support, we use a constant $l_{1} > 0$ to limit the generated MGSN values. Furthermore, we consider another constant $l_{2} > 0$ corresponding to the maximum value of the MGSN pdf, which must be evaluated at the true parameter values.

3.3.1. Algorithm

To start the algorithm, we fix the parameters $μ$ , $σ$ , $α$ , and $λ$ of the MGSN distribution and use the following notation:

n: the length of the n-vector.
Y: a random variable with $M G S N (μ, σ, α, λ)$ distribution.
$f_{Y} (y)$ : the MGSN pdf with $y > 0$ .
$l_{1}$ : a truncation bound for the MGSN numbers to be generated, which are restricted to $(- l_{1}, l_{1})$ , with $l_{1} > 0$ .
$l_{2}$ : the maximum value of $f_{Y}$ with $l_{2} > 0$ .
$U_{1}$ : a random variable with a uniform distribution in $(- l_{1}, l_{1})$ , $U (- l_{1}, l_{1})$ , in short.
$U_{2}$ : a random variable with a $U (0, l_{2})$ distribution.

Acceptance–rejection algorithm to generate numbers from the $M G S N (μ, σ, α, λ)$ distribution:

Begin
Input: n, $μ$ , $σ$ , $α$ , $λ$
Output: n-vector
1. Set $l_{2} = \max_{y > 0} f_{Y} (y)$ ;
2. Generate a value $u_{1}$ from $U_{1} \sim U (- l_{1}, l_{1})$ ;
3. Obtain a value $u_{2}$ from $U_{2} \sim U (0, l_{2})$ ;
4. If $u_{2} \leq f_{Y} (u_{1})$ , set $y = u_{1}$ as a value from $Y \sim M G S N (μ, σ, α, λ)$ and append y to n-vector; otherwise, go back to step 2;
5. Repeat steps 2–4 until the length of n-vector is equal to n;
End
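The acceptance-rejection steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R implementation: the envelope height $l_{2}$ is approximated by the maximum of the pdf over a grid on the whole interval $(- l_{1}, l_{1})$ and inflated by 1%, and the defaults `l1=8.0`, `grid=4001`, and the function names are hypothetical choices.

```python
import math
import random


def mgsn_pdf(y, mu, sigma, alpha, lam):
    """MGSN density (6) written with the standardized value t = (y - mu) / sigma."""
    t = (y - mu) / sigma
    normal_pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    normal_cdf = 0.5 * (1.0 + math.erf(lam * t / math.sqrt(2.0)))
    return (2.0 * (1.0 + alpha * t ** 4) * normal_pdf * normal_cdf
            / (sigma * (1.0 + 3.0 * alpha)))


def sample_mgsn(n, mu, sigma, alpha, lam, l1=8.0, grid=4001, seed=None):
    """Draw n values by acceptance-rejection with a uniform proposal on (-l1, l1)."""
    rng = random.Random(seed)
    step = 2.0 * l1 / (grid - 1)
    # Assumption: grid maximum of the pdf, inflated slightly, bounds the pdf.
    l2 = 1.01 * max(mgsn_pdf(-l1 + k * step, mu, sigma, alpha, lam)
                    for k in range(grid))
    sample = []
    while len(sample) < n:
        u1 = rng.uniform(-l1, l1)   # candidate value
        u2 = rng.uniform(0.0, l2)   # height under the envelope
        if u2 <= mgsn_pdf(u1, mu, sigma, alpha, lam):
            sample.append(u1)       # accept the candidate
    return sample


draws = sample_mgsn(5000, 0.0, 1.0, 0.4, -0.5, seed=2024)
second_moment = sum(y * y for y in draws) / len(draws)
# For mu = 0 and sigma = 1 the second moment is (1 + 15 alpha) / (1 + 3 alpha).
print(second_moment, (1.0 + 15.0 * 0.4) / (1.0 + 3.0 * 0.4))
```

The bound `l1` must be large enough that the interval $(- l_{1}, l_{1})$ covers essentially all of the probability mass for the chosen $μ$ and $σ$ ; a rectangular envelope is simple but becomes inefficient as `l1` grows.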

Computational simulations were performed in the R programming language, using the “optim” function from the “stats” package with the quasi-Newton method “BFGS”. We used a computer with the following characteristics: (i) OS: Windows 10 Pro 64-bit; (ii) RAM: 8 GB; and (iii) Processor: Intel(R) Core(TM) i7-8550U CPU at 1.99 GHz. The algorithm above was run 2000 times with n = 50, 100, 200, and 500; the average processing time was 0.04565 s. Below, we show the MLEs obtained from the $M G S N (μ, σ, α, λ)$ model for different parameter values and sample sizes, using samples generated with the acceptance–rejection algorithm.

3.3.2. Simulation Results

Table 2 presents the results of the simulation study, illustrating the behavior of the MLEs for 2000 samples of sizes $n =$ 50, 100, 200, and 500 from a population with distribution $M G S N (μ, σ, α, λ)$ . The estimates of the parameters are quite close to the true values, and the standard deviations and average lengths of the intervals decrease as the sample size increases. These results show the expected asymptotic behavior. In addition, the empirical coverages are generally close to the nominal value of 95% confidence.
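The empirical coverage and average interval length reported in a simulation of this kind can be computed as in the Python sketch below. It assumes Wald-type intervals of the form estimate $\pm 1.96$ sd; this is an inference from Table 2, where the tabulated average lengths are close to $2 \times 1.96$ times the tabulated sd, and is not stated explicitly in the text. The function names and the three replicate values are hypothetical.

```python
def wald_interval(estimate, sd, z=1.96):
    """Approximate 95% interval: estimate +/- z * sd."""
    return estimate - z * sd, estimate + z * sd


def coverage_and_length(estimates, sds, true_value, z=1.96):
    """Empirical coverage (in percent) and average interval length."""
    hits = 0
    total_length = 0.0
    for est, sd in zip(estimates, sds):
        lower, upper = wald_interval(est, sd, z)
        hits += lower <= true_value <= upper
        total_length += upper - lower
    m = len(estimates)
    return 100.0 * hits / m, total_length / m


# Hypothetical replicates of an estimator of a parameter whose true value is 0.
coverage, avg_length = coverage_and_length([0.1, -0.2, 1.5], [0.5, 0.5, 0.5], 0.0)
print(coverage, avg_length)

# Consistency check against the first row of Table 2 (sd = 0.4781, Ali = 1.8743).
lower, upper = wald_interval(0.0018, 0.4781)
print(upper - lower)
```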

3.4. Applications for the Normal Case

In this section, we show two real data applications for the MGSN model given in (6) and compare their results with the proposed models given in [8,9] for the normal and skew-normal cases (GSN) and (GSN2), respectively, given in (2) and (3), considering location and scale parameters, as follows:

$f_{Y} (y; μ, σ, λ, α) = ϕ (\frac{(y - μ)}{σ}) [Φ (λ \frac{(y - μ)}{σ} + α) + Φ (λ \frac{(y - μ)}{σ} - α)]$

and

$\begin{matrix} g (y; μ, σ, α, λ) = \{\begin{matrix} 2 Φ (λ \frac{(y - μ)}{σ}) ϕ (w_{α} (\frac{(y - μ)}{σ})) & , & y \neq μ \\ ϕ (0) & , & y = μ \end{matrix} \end{matrix}$

3.4.1. Application 1

The data used in Application 1 correspond to the age and frequency of cases of a cancer called Kaposi's sarcoma. This is a type of cancer that can form masses in the skin, lymph nodes, or other organs. The data, which do not distinguish between subtypes, were collected from the website of the Office for National Statistics (ONS, Health Statistics section), and they can be seen in Table A1 in the Appendix (see Appendix A). It can be seen that there is a greater incidence in individuals aged around 25 years, as well as in those aged about 60 years. The records were taken during the years 1995 to 2016 and correspond to different regions of the UK.

Table 3 shows descriptive summary measures of the Kaposi's sarcoma data. Table 4 shows the maximum likelihood estimates and their corresponding standard deviations for the GSN2, GSN, and MGSN models. According to the Akaike Information Criterion (AIC) [13] and the Consistent Akaike Information Criterion (CAIC) [14], the MGSN model presents a better fit, since its values are lower. Figure 5 shows the histogram and the fitted GSN2, GSN, and MGSN densities for the Kaposi's sarcoma data set. The plot suggests that the MGSN model fits the data better.
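The text cites [13] and [14] for the information criteria without printing the formulas. The Python sketch below assumes the usual definitions, $AIC = 2 k - 2 ℓ$ and $CAIC = k (\log n + 1) - 2 ℓ$ , where $ℓ$ is the maximized log-likelihood, k the number of parameters, and n the sample size. The numerical inputs are hypothetical and are not taken from Table 4.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 * loglik."""
    return 2.0 * k - 2.0 * loglik


def caic(loglik, k, n):
    """Consistent AIC (assumed form): k * (log(n) + 1) - 2 * loglik."""
    return k * (math.log(n) + 1.0) - 2.0 * loglik


# Hypothetical fits: two four-parameter models on the same n = 200 observations.
n_obs = 200
fits = {"model_a": -250.0, "model_b": -262.5}  # maximized log-likelihoods
for name, ll in fits.items():
    print(name, aic(ll, 4), caic(ll, 4, n_obs))
```

With the same number of parameters and the same data, both criteria rank the models by their maximized log-likelihood, so the model with the larger log-likelihood has the smaller AIC and CAIC.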

3.4.2. Application 2

The second data set corresponds to the duration of the Old Faithful geyser eruption (see Appendix, Table A2) in Yellowstone National Park, WY, USA [15]. Table 5 shows the descriptive summary measures of the data related to the duration of the Old Faithful Geyser eruption. Table 6 shows the values of the maximum likelihood estimates and their corresponding standard deviations for the GSN2, GSN, and MGSN models. Using the AIC [13] and CAIC [14] criteria, it can be seen that the MGSN model presents a better fit, because its values are smaller. Figure 6 shows the histogram and graphical representation of the GSN2, GSN, and MGSN models for the eruption time data set. Through the graphical representation, it can be seen that the MGSN model apparently fits the data better.

4. Laplace Distribution Case

Let us consider the particular case in Equation (5) when h and G are, respectively, the density and cumulative distribution functions of the Laplace distribution. If a random variable follows a Modified Generalized Skew Laplace (MGSLP) distribution, we will denote it by $Z \sim M G S L P (μ, σ, α, λ)$ , and its pdf is given by

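(7) $f_{Z} (z; μ, σ, α, λ) = \frac{2}{σ (1 + α ρ_{4})} (1 + α {(\frac{z - μ}{σ})}^{4}) h (\frac{z - μ}{σ}) G (λ \frac{(z - μ)}{σ}),$

where $z \in ℜ$, $μ \in ℜ$, $σ > 0$, $α \geq 0$, $λ \in ℜ$, and $ρ_{4}$ is the fourth moment of the Laplace density h, as in Equation (5) (for example, $ρ_{4} = 24$ for the standard Laplace density $h (y) = e^{- | y |} / 2$).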

4.1. Simulation Study for the Case of the Laplace Distribution

Table 7 presents the results of the simulation study, illustrating the behavior of the MLEs for 2000 samples of sizes n = 50, 100, 200, and 500 from a population with distribution $M G S L P (μ, σ, α, λ)$ . The estimates of the parameters are quite close to the true values, and the standard deviations and average lengths of the intervals are small. These results show the expected asymptotic behavior. In addition, the empirical coverages are very close in all cases to the nominal value of 95% confidence.

4.2. Application for the Laplace Distribution Case

In this section, we show one real-data application for the MGSLP model given in (7) and compare the results with the models proposed in [8,9] for the Laplace and skew-Laplace cases (GSLP) and (GSLP2), respectively, given in (2) and (3), as follows:

$f_{Y} (y; μ, σ, λ, α) = f (\frac{(y - μ)}{σ}) [F (λ \frac{(y - μ)}{σ} + α) + F (λ \frac{(y - μ)}{σ} - α)]$

and

$\begin{matrix} g (y; μ, σ, α, λ) = \{\begin{matrix} 2 F (λ \frac{(y - μ)}{σ}) f (w_{α} (\frac{(y - μ)}{σ})) & , & y \neq μ \\ f (0) & , & y = μ \end{matrix} \end{matrix}$

where f and F correspond to the density and cumulative distribution of the Laplace distribution, respectively.

For the data corresponding to the duration of the Old Faithful geyser eruption (see Appendix A, Table A2) in Yellowstone National Park, Wyoming, USA [15], Table 8 shows the values of the maximum likelihood estimates and their corresponding standard deviations for the GSLP2, GSLP, and MGSLP models. Using the AIC [13] and CAIC [14] criteria, it can be seen that the MGSLP model presents a better fit because its values are smaller. Figure 7 shows the histogram and graphical representation of the GSLP2, GSLP, and MGSLP models for the eruption time data set. Through the graphical representation, it can be seen that the MGSLP model apparently best fits the eruption time data set.

5. Discussion

We have proposed a new family based on a weighted version of the skew distribution, which has a parameter, $α$ , that allows modeling data sets that present one, two, or three modes. That is, we have a family of models that are more flexible than the distributions proposed by Gómez-Déniz et al. [8,9], considering that these have the same number of parameters. Its density function, moments, and some properties were studied; it should be noted that the mathematical treatment is less complex than other distributions given in the current literature. In particular, when the parameter $α$ takes the value zero the new family recovers the family of skew distributions. Two particular cases of the new model were studied, one for the normal distribution and the other for the Laplace distribution. A simulation algorithm was developed, using the acceptance–rejection method, to obtain random samples of different sizes from the proposed model, for the two particular cases. Subsequently, 2000 iterations were carried out for each of these samples, obtaining the estimates through the maximum likelihood method, using the “optim” function of the R software, for different values of $μ$ , $σ$ , $α$ , and $λ$ . This study allowed us to observe the good asymptotic behavior of the parameter estimates. Two applications were carried out with real data, one related to medicine and the other to the environment, where it was empirically shown that the proposed family fits better than the families presented by Gómez-Déniz et al. [8,9]. This new model is a potential contribution for professionals who work in data analysis and/or users of statistics.

Author Contributions

Data curation, J.R.; formal analysis, J.R., M.A.R., P.L.C., and J.A.; investigation, J.R., M.A.R., and P.L.C.; methodology, J.R., M.A.R., P.L.C., and J.A.; writing—original draft, J.R., M.A.R., P.L.C., and J.A.; writing—review and editing, M.A.R., P.L.C., and J.A.; Funding Acquisition, J.R., M.A.R., and J.A. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1. Plot of MGSN pdf (solid line) and GSN pdf (dashed line) for different values of $α$ and $λ$.

Figure 2. Plot of the MGSN model for the case $λ = 0$ and different values of $α$.

Figure 3. Plots of the skewness (left) and kurtosis (right) of the MGSN distribution.

Figure 4. Profiles of the skewness (left) and kurtosis (right) coefficients of the MGSN distribution.

Figure 5. MGSN distribution (solid line), GSN distribution (dashed line), and GSN2 distribution (dotted line) for the Kaposi's sarcoma data.

Figure 6. Histogram and graphical representation of MGSN distribution (solid line), GSN distribution (dashed line), and GSN2 distribution (dotted line) for the eruption time data.

Figure 7. Histogram for the eruption time data set and the fit of the graphs for the MGSLP (solid line), GSLP (dashed line), and GSLP2 (dotted line) distributions.

Table 1

Skewness and kurtosis coefficients of the MGSN model for different values of $α$ and $λ$ .

	Coefficient Skewness					Coefficient Kurtosis
$α$	$λ = - 3$	$λ = - 2$	$λ = 0$	$λ = 2$	$λ = 3$	$λ = - 3$	$λ = - 2$	$λ = 0$	$λ = 2$	$λ = 3$
1	0.1047	0.2326	0	−0.2326	−0.1047	2.7460	2.9395	1.6875	2.9395	2.7460
2	0.1413	0.2907	0	−0.2907	−0.1413	3.1079	3.4253	1.5515	3.4253	3.1079
3	0.1029	0.2659	0	−0.2659	−0.1029	3.2536	3.6492	1.5028	3.6492	3.2536
4	0.0588	0.2307	0	−0.2307	−0.0588	3.3133	3.7625	1.4778	3.7625	3.3133
5	0.0195	0.1976	0	−0.1976	−0.0195	3.3367	3.8247	1.4626	3.8247	3.3367
6	−0.0139	0.1689	0	−0.1689	0.0139	3.3436	3.8609	1.4524	3.8609	3.3436
7	−0.0419	0.1444	0	−0.1444	0.0419	3.3425	3.8828	1.4450	3.8828	3.3425
8	−0.0656	0.1234	0	−0.1234	0.0656	3.3376	3.8962	1.4395	3.8962	3.3376
9	−0.0859	0.1055	0	−0.1055	0.0859	3.3310	3.9046	1.4351	3.9046	3.3310
10	−0.1033	0.0899	0	−0.0899	0.1033	3.3236	3.9098	1.4316	3.9098	3.3236
11	−0.1184	0.0764	0	−0.0764	0.1184	3.3160	3.9129	1.4288	3.9129	3.3160
12	−0.1316	0.0645	0	−0.0645	0.1316	3.3085	3.9146	1.4264	3.9146	3.3085
13	−0.1432	0.0540	0	−0.0540	0.1432	3.3014	3.9153	1.4244	3.9153	3.3014
14	−0.1536	0.0446	0	−0.0446	0.1536	3.2946	3.9154	1.4227	3.9154	3.2946
15	−0.1628	0.0362	0	−0.0362	0.1628	3.2882	3.9151	1.4212	3.9151	3.2882
16	−0.1711	0.0287	0	−0.0287	0.1711	3.2821	3.9145	1.4199	3.9145	3.2821
17	−0.1786	0.0219	0	−0.0219	0.1786	3.2765	3.9137	1.4187	3.9137	3.2765
18	−0.1854	0.0157	0	−0.0157	0.1854	3.2711	3.9128	1.4177	3.9128	3.2711
19	−0.1916	0.0100	0	−0.0100	0.1916	3.2662	3.9118	1.4167	3.9118	3.2662
20	−0.1973	0.0049	0	−0.0049	0.1973	3.2615	3.9107	1.4159	3.9107	3.2615

Table 2

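Simulation of 2000 iterations for parameter estimates for the model $M G S N (μ, σ, α, λ)$ by the maximum likelihood method. For each parameter, the columns give the mean estimate, its standard deviation (sd), the average length of the confidence intervals (Ali), and the empirical coverage in percent (C).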

n	$μ$	$σ$	$λ$	$α$	$\hat{μ}$	sd $(\hat{μ})$	Ali $(\hat{μ})$	C $(\hat{μ})$	$\hat{σ}$	sd $(\hat{σ})$	Ali $(\hat{σ})$	C $(\hat{σ})$	$\hat{λ}$	sd $(\hat{λ})$	Ali $(\hat{λ})$	C $(\hat{λ})$	$\hat{α}$	sd $(\hat{α})$	Ali $(\hat{α})$	C $(\hat{α})$
50	0	1	−0.5	0.4	0.0018	0.4781	1.8743	93.55	1.0014	0.1538	0.6028	94.10	−0.5577	0.4682	1.8354	96.30	0.5471	0.3396	1.3314	93.80
100	0	1	−0.5	0.4	0.0090	0.3758	1.4730	95.40	1.0045	0.1202	0.4712	95.30	−0.5226	0.2915	1.1427	96.85	0.4688	0.1826	0.7159	94.30
200	0	1	−0.5	0.4	0.0079	0.2796	1.0961	95.50	1.0007	0.0919	0.3602	95.45	−0.5153	0.2800	1.0977	98.30	0.4357	0.1102	0.4319	94.10
500	0	1	−0.5	0.4	0.0108	0.1640	0.6428	95.80	1.0035	0.0540	0.2118	95.35	−0.5087	0.1024	0.4016	95.90	0.4132	0.0619	0.2427	94.25
50	0	1	0.5	2	0.0015	0.2185	0.8564	95.50	1.0009	0.0866	0.3394	95.25	0.5146	0.1529	0.5995	95.75	2.4454	1.2979	5.0879	91.50
100	0	1	0.5	2	0.0016	0.1434	0.5621	94.60	0.9987	0.0574	0.2250	94.50	0.5043	0.1002	0.3928	94.90	2.3937	1.0487	4.1107	92.75
200	0	1	0.5	2	0.0000	0.0991	0.3883	94.50	0.9991	0.0406	0.1590	95.10	0.5013	0.0693	0.2715	94.95	2.2405	0.7663	3.0038	93.35
500	0	1	0.5	2	−0.0018	0.0611	0.2394	94.85	0.9998	0.0245	0.0962	95.05	0.5012	0.0426	0.1670	95.85	2.0869	0.4162	1.6316	94.15
50	0	1	1	0.5	0.0988	0.5310	2.0815	94.40	0.9694	0.1775	0.6956	95.50	1.2925	1.3780	5.4016	94.45	0.6736	0.5719	2.2417	95.80
100	0	1	1	0.5	0.0177	0.4679	1.8342	94.05	0.9923	0.1497	0.5870	95.40	1.2382	1.1636	4.5613	95.95	0.6286	0.4146	1.6253	94.75
200	0	1	1	0.5	0.0195	0.3561	1.3958	94.20	0.9921	0.1148	0.4500	94.50	1.0597	0.5470	2.1441	96.85	0.5683	0.2855	1.1191	96.20
500	0	1	1	0.5	0.0040	0.2340	0.9171	94.75	0.9979	0.0759	0.2974	94.60	1.0260	0.3869	1.5166	98.95	0.5257	0.1443	0.5657	96.05
50	1	2	−0.5	0.4	0.9502	0.9466	3.7106	94.10	1.9923	0.3061	1.2000	95.15	−0.5542	0.6802	2.6664	97.90	0.5401	0.3332	1.3062	94.60
100	1	2	−0.5	0.4	1.0117	0.7657	3.0015	94.75	2.0050	0.2456	0.9629	94.80	−0.5215	0.2852	1.1181	95.95	0.4692	0.1814	0.7111	94.15
200	1	2	−0.5	0.4	1.0322	0.5668	2.2219	95.80	2.0121	0.1784	0.6992	95.05	−0.5217	0.3113	1.2204	98.85	0.4307	0.1126	0.4415	95.60
500	1	2	−0.5	0.4	1.0152	0.3164	1.2401	95.45	2.0032	0.1039	0.4074	95.85	−0.5061	0.0981	0.3844	95.70	0.4127	0.0616	0.2414	94.15
50	−1	2	0.5	2	−0.9740	0.4348	1.7044	94.90	1.9900	0.1679	0.6583	94.75	0.5118	0.1479	0.5796	94.70	2.3971	1.2988	5.0915	91.85
100	−1	2	0.5	2	−0.9940	0.2864	1.1225	94.25	1.9954	0.1149	0.4502	95.25	0.5084	0.1007	0.3946	94.50	2.3808	1.0989	4.3078	92.35
200	−1	2	0.5	2	−0.9934	0.1992	0.7809	95.00	1.9963	0.0797	0.3125	95.05	0.5045	0.0707	0.2772	95.60	2.2129	0.7466	2.9267	93.75
500	−1	2	0.5	2	−0.9965	0.1254	0.4914	95.40	1.9993	0.0501	0.1965	94.90	0.5015	0.0435	0.1704	95.35	2.0710	0.4002	1.5687	94.45
50	−1	1	1	0.5	−0.9152	0.5354	2.0988	95.50	0.9668	0.1803	0.7069	95.20	1.2361	1.1948	4.6836	93.85	0.6921	0.6102	2.3919	95.60
100	−1	1	1	0.5	−0.9782	0.4719	1.8498	94.95	0.9921	0.1522	0.5966	95.30	1.1948	1.0436	4.0910	96.40	0.6291	0.4311	1.6900	95.55
200	−1	1	1	0.5	−0.9917	0.3801	1.4902	93.70	0.9959	0.1221	0.4786	93.90	1.0810	0.6083	2.3844	96.90	0.5725	0.2917	1.1433	96.10
500	−1	1	1	0.5	−1.0064	0.2245	0.8800	94.60	1.0007	0.0730	0.2861	94.90	1.0203	0.2172	0.8512	94.80	0.5274	0.1335	0.5234	94.80

In the above, sd is the standard deviation, Ali is the average length of the confidence intervals, and C is the empirical coverage (in %) of the $95\%$ confidence interval built from the maximum likelihood estimate (MLE) of the respective parameter.
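The Ali values in Table 2 are numerically consistent with symmetric Wald-type intervals of the form estimate ± z·sd, whose length is 2·z·sd with z ≈ 1.96. The note above does not state the interval form, so this is an assumption; the sketch below only recomputes the length from the sd values in the first row of Table 2 (n = 50) and compares it with the reported Ali. The names `wald_length` and `max_abs_gap` are illustrative.

```python
from statistics import NormalDist

# (sd, Ali) pairs copied from the first row of Table 2 (n = 50).
ROW_N50 = {
    "mu": (0.4781, 1.8743),
    "sigma": (0.1538, 0.6028),
    "lambda": (0.4682, 1.8354),
    "alpha": (0.3396, 1.3314),
}


def wald_length(sd, level=0.95):
    """Length of a symmetric interval estimate +/- z * sd (assumed form)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * sd


def max_abs_gap(row, level=0.95):
    """Largest |reported Ali - 2*z*sd| over the parameters in one row."""
    return max(abs(ali - wald_length(sd, level)) for sd, ali in row.values())


for name, (sd, ali) in ROW_N50.items():
    print(f"{name:7s} sd={sd:.4f}  2*z*sd={wald_length(sd):.4f}  Ali={ali:.4f}")
```

If the assumption holds, the two printed lengths should agree up to the rounding of the tabulated values.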

Table 3

Summary statistics for the Kaposi's sarcoma data set.

n	Mean	Variance	Asymmetry	Kurtosis
29,131	45.396	416.387	0.313	1.936

Table 4

Parameter estimates for GSN2, GSN, and MGSN distributions for the Kaposi's sarcoma data set.

Parameter Estimates	GSN2 (sd)	GSN (sd)	MGSN (sd)
$\hat{μ}$	37.6241 (0.03552)	37.029 (0.1313)	20.5880 (0.1293)
$\hat{σ}$	21.0537 (0.0808)	22.052 (0.1050)	18.1833 (0.0674)
$\hat{λ}$	0.4912 (0.0085)	4.8080 (0.1180)	3.9293 (0.0525)
$\hat{α}$	0.0754 (0.0017)	5.525 (0.1350)	0.2488 (0.00412)
AIC	256,212.1	253,832.6	249,300.9
CAIC	256,245.2	253,869.7	249,334.0
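Table 4 is read by comparing the AIC row: the model with the smallest value is preferred. A minimal sketch of that comparison, using only the AIC values printed in Table 4 (the helper name `rank_by_aic` is illustrative and not from the source):

```python
# AIC values copied from Table 4 (Kaposi's sarcoma data).
AIC_TABLE4 = {"GSN2": 256212.1, "GSN": 253832.6, "MGSN": 249300.9}


def rank_by_aic(aic):
    """Return (model, AIC, gap to the best AIC) tuples, best model first."""
    best = min(aic.values())
    ordered = sorted(aic.items(), key=lambda kv: kv[1])
    return [(model, value, round(value - best, 1)) for model, value in ordered]


ranking = rank_by_aic(AIC_TABLE4)
for model, value, delta in ranking:
    print(f"{model:5s} AIC={value:.1f}  delta={delta:.1f}")
```

The same helper can be applied to the AIC rows of the other estimation tables by swapping in their values.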

Table 5

Summary statistics for the eruption time data set.

n	Mean	Variance	Asymmetry	Kurtosis
272	70.897	184.8240	−0.414	1.844

Table 6

Parameter estimates for GSN2, GSN, and MGSN distributions for the eruption time data set.

Parameter Estimates	GSN2 (sd)	GSN (sd)	MGSN (sd)
$\hat{μ}$	65.1850 (0.2520)	75.5992 (0.1313)	57.5424 (1.5939)
$\hat{σ}$	13.088 (0.5570)	14.3610 (0.6651)	9.2529 (0.4983)
$\hat{λ}$	0.6760 (0.1160)	−6.2206 (1.9559)	1.7219 (0.3005)
$\hat{α}$	0.4660 (0.0380)	7.5214 (2.4209)	1.5183 (0.3818)
AIC	2248.85	2142.43	2077.92
CAIC	2266.74	2156.95	2092.34

Table 7

Simulation study with 2000 iterations: parameter estimates for the model $MGSLP(μ, σ, α, λ)$ obtained by the maximum likelihood method.

n	$μ$	$σ$	$λ$	$α$	$\hat{μ}$	sd $(\hat{μ})$	Ali $(\hat{μ})$	C $(\hat{μ})$	$\hat{σ}$	sd $(\hat{σ})$	Ali $(\hat{σ})$	C $(\hat{σ})$	$\hat{λ}$	sd $(\hat{λ})$	Ali $(\hat{λ})$	C $(\hat{λ})$	$\hat{α}$	sd $(\hat{α})$	Ali $(\hat{α})$	C $(\hat{α})$
50	1	1	0.5	0.1	1.0030	0.4398	1.7242	95.45	1.0072	0.1773	0.6948	99.25	0.6178	0.4746	1.8606	96.20	0.1181	0.0718	0.2814	96.40
100	1	1	0.5	0.1	0.9940	0.3180	1.2465	95.70	1.0058	0.1110	0.4353	99.25	0.5466	0.2024	0.7936	95.60	0.1082	0.0412	0.1615	96.40
200	1	1	0.5	0.1	1.0055	0.2534	0.9932	97.50	1.0069	0.1382	0.5418	99.60	0.5171	0.1196	0.4690	95.55	0.1035	0.0239	0.0936	95.15
500	1	1	0.5	0.1	1.0147	0.2726	1.0685	99.25	1.0169	0.1865	0.7313	99.20	0.5080	0.0854	0.3347	97.90	0.1007	0.0158	0.0619	96.86
50	2	1	0.5	0.9	2.1006	0.5210	2.0423	93.45	0.9929	0.1054	0.4132	95.20	0.5452	0.2426	0.9508	96.05	0.8656	0.5079	1.9910	98.80
100	2	1	0.5	0.9	2.0419	0.3774	1.4793	93.65	0.9948	0.0765	0.2997	94.15	0.5240	0.1396	0.5474	95.20	1.1725	0.8377	3.2837	91.60
200	2	1	0.5	0.9	2.0023	0.2710	1.0623	94.75	0.9998	0.0551	0.2158	95.10	0.5137	0.0883	0.3460	94.65	1.1924	0.8358	3.2763	93.56
500	2	1	0.5	0.9	2.0042	0.1546	0.6061	94.85	0.9988	0.0333	0.1304	95.20	0.5038	0.0514	0.2016	95.30	0.9972	0.3633	1.4239	94.46
50	0	1	1.2	0.9	0.2215	0.5439	2.1320	92.35	0.9627	0.1136	0.4453	93.45	1.0951	0.4731	1.8544	95.65	0.8088	0.5186	2.0329	99.20
100	0	1	1.2	0.9	0.0915	0.4323	1.6945	93.85	0.9854	0.0864	0.3387	94.20	1.2720	0.5473	2.1455	94.15	1.0433	0.7256	2.8445	91.65
200	0	1	1.2	0.9	0.0370	0.3138	1.2301	93.90	0.9939	0.0624	0.2448	94.75	1.3338	0.5349	2.0968	95.25	1.1157	0.7858	3.0803	94.40
500	0	1	1.2	0.9	0.0196	0.1928	0.7559	94.85	0.9965	0.0384	0.1503	94.55	1.2497	0.2424	0.9500	94.65	1.0032	0.4455	1.7464	95.30

In the above, sd is the standard deviation, Ali is the average length of the confidence intervals, and C is the empirical coverage (in %) of the $95\%$ confidence interval built from the maximum likelihood estimate (MLE) of the respective parameter.
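To make the columns sd, Ali, and C concrete, the sketch below runs the same kind of study on a deliberately simple stand-in: the MLE of a normal mean with a Wald interval. It is not the MGSLP model and not the authors' code; the function name `coverage_study`, the seed, and the interval form are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def coverage_study(n, mu, sigma, reps=2000, level=0.95, seed=1):
    """Toy stand-in: MLE of a normal mean, Wald interval mu_hat +/- z * se.

    Returns (mean of estimates, sd of estimates, average interval length,
    empirical coverage in percent).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimates, lengths, hits = [], [], 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_hat = sum(sample) / n
        # MLE of the variance (divisor n), then the standard error of mu_hat.
        var_hat = sum((x - mu_hat) ** 2 for x in sample) / n
        se = math.sqrt(var_hat / n)
        lo, hi = mu_hat - z * se, mu_hat + z * se
        estimates.append(mu_hat)
        lengths.append(hi - lo)
        hits += lo <= mu <= hi  # counts replicates whose interval covers mu
    mean_est = sum(estimates) / reps
    sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (reps - 1))
    return mean_est, sd_est, sum(lengths) / reps, 100.0 * hits / reps


mean_est, sd_est, ali, cov = coverage_study(n=50, mu=1.0, sigma=1.0)
print(f"mean={mean_est:.4f} sd={sd_est:.4f} Ali={ali:.4f} C={cov:.2f}")
```

For the models in the tables, the closed-form estimate would be replaced by a numerical maximization of the log-likelihood in each replicate; the bookkeeping for sd, Ali, and C stays the same.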

Table 8

Parameter estimates for GSLP2, GSLP, and MGSLP distributions.

Parameter Estimates	GSLP2 (sd)	GSLP (sd)	MGSLP (sd)
$\hat{μ}$	101.1921 (1.8598)	73.9999 (0.0278)	66.9997 (0.0543)
$\hat{σ}$	20.3161 (1.3864)	11.5685 (0.70151)	2.6399 (0.0748)
$\hat{λ}$	−8.5204 (4.4833)	−7.9486 (3.9974)	0.0583 (0.0149)
$\hat{α}$	1.0879 (0.0132)	10.9371 (6.0598)	3.6810 (1.7438)
AIC	2181.30	2148.54	2095.638
CAIC	2195.72	2162.96	2114.061

Appendix A

Table A1

Data corresponding to Kaposi's sarcoma.

Age	Number
1	1
5	89
10	342
15	718
20	2352
25	3593
30	3243
35	2533
40	2015
45	1747
50	1562
55	1662
60	1801
65	1915
70	1855
75	1611
80	1203
85	642
90	247
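
The grouped counts in Table A1 are enough to recompute the sample size and mean reported in Table 3. The sketch below does so with frequency-weighted moments; the divisor n for the central moments and the non-excess kurtosis convention are assumptions, so the higher moments may differ slightly from the reported values. The name `grouped_moments` is illustrative.

```python
# (age, number of cases) pairs copied from Table A1.
KAPOSI = [
    (1, 1), (5, 89), (10, 342), (15, 718), (20, 2352), (25, 3593),
    (30, 3243), (35, 2533), (40, 2015), (45, 1747), (50, 1562),
    (55, 1662), (60, 1801), (65, 1915), (70, 1855), (75, 1611),
    (80, 1203), (85, 642), (90, 247),
]


def grouped_moments(pairs):
    """Frequency-weighted summary statistics for (value, count) data."""
    n = sum(count for _, count in pairs)
    mean = sum(value * count for value, count in pairs) / n

    def central(k):
        # k-th central moment with divisor n (assumed convention)
        return sum(count * (value - mean) ** k for value, count in pairs) / n

    m2, m3, m4 = central(2), central(3), central(4)
    return {
        "n": n,
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis
    }


stats = grouped_moments(KAPOSI)
for key, value in stats.items():
    print(f"{key:9s} {value:.3f}")
```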

Table A2

Data corresponding to eruption time.

79	74	65	49	51	49	78	79
54	52	73	83	86	57	46	64
74	48	82	81	53	77	77	75
62	80	56	47	79	68	84	47
85	59	79	84	81	81	49	86
55	90	71	52	60	81	83	63
88	80	62	86	82	73	71	85
85	58	76	81	77	50	80	82
51	84	60	75	76	85	49	57
85	58	78	59	59	74	75	82
54	73	76	89	80	55	64	67
84	83	83	79	49	77	76	74
78	64	75	59	96	83	53	54
47	53	82	81	53	83	94	83
83	82	70	50	77	51	55	73
52	59	65	85	77	78	76	73
62	75	73	59	65	84	50	88
84	90	88	87	81	46	82	80
52	54	76	53	71	83	54	71
79	80	80	69	70	55	75	83
51	54	48	77	81	81	78	56
47	83	86	56	93	57	79	79
78	71	60	88	53	76	78	78
69	64	90	81	89	84	78	84
74	77	50	45	45	77	70	58
83	81	78	82	86	81	79	83
55	59	63	55	58	87	70	43
76	84	72	90	78	77	54	60
78	48	84	45	66	51	86	75
79	82	75	83	76	78	50	81
73	60	51	56	63	60	90	46
77	92	82	89	88	82	54	90
66	78	62	46	52	91	54	46
80	78	88	82	93	53	77	74

References

1. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica; 1986; 46, pp. 199-208.

2. Elal-Olivero, D. Alpha-skew-normal distribution. Proyecciones; 2010; 29, pp. 224-240. [DOI: https://dx.doi.org/10.4067/S0716-09172010000300006]

3. Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics; 2011; 22, pp. 50-62. [DOI: https://dx.doi.org/10.1002/env.1026]

4. Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul.; 2018; 88, pp. 156-181. [DOI: https://dx.doi.org/10.1080/00949655.2017.1381698]

5. Bolfarine, H.; Martínez-Flórez, G.; Salinas, H.S. Bimodal symmetric-asymmetric power-normal families. Commun. Stat. Theory Methods; 2018; 47, pp. 259-276. [DOI: https://dx.doi.org/10.1080/03610926.2013.765475]

6. Fisher, R.A. The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eugen.; 1934; 6, pp. 13-25. [DOI: https://dx.doi.org/10.1111/j.1469-1809.1934.tb02105.x]

7. Rao, C.R. On discrete distributions arising out of methods of ascertainment. Sankhyā Indian J. Stat. Ser. A; 1965; 27, pp. 311-324.

8. Gómez-Déniz, E.; Arnold, B.C.; Sarabia, J.M.; Gómez, H.W. Properties and Applications of a New Family of Skew Distributions. Mathematics; 2021; 9, 87. [DOI: https://dx.doi.org/10.3390/math9010087]

9. Gómez-Déniz, E.; Calderín-Ojeda, E.; Sarabia, J.M. Bimodal and Multimodal Extensions of the Normal and Skew Normal Distributions. Stat. J.; 2023; accepted and available on the internet.

10. Reyes, J.; Gómez-Déniz, E.; Gómez, H.W.; Calderín-Ojeda, E. A Bimodal Extension of the Exponential Distribution with Applications in Risk Theory. Symmetry; 2021; 13, 679. [DOI: https://dx.doi.org/10.3390/sym13040679]

11. Reyes, J.; Arrué, J.; Leiva, V.; Martin-Barreiro, C. A New Birnbaum-Saunders Distribution and Its Mathematical Features Applied to Bimodal Real-World Data from Environment and Medicine. Mathematics; 2021; 9, 1891. [DOI: https://dx.doi.org/10.3390/math9161891]

12. Henze, N. A probabilistic representation of the Skew-Normal distribution. Scand. J. Stat.; 1986; 13, pp. 271-275.

13. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control.; 1974; 19, pp. 716-723. [DOI: https://dx.doi.org/10.1109/TAC.1974.1100705]

14. Bozdogan, H. Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika; 1987; 52, pp. 345-370. [DOI: https://dx.doi.org/10.1007/BF02294361]

15. Owen, D. Tables for computing bivariate normal probabilities. Ann. Math. Stat.; 1956; 27, pp. 1075-1090. [DOI: https://dx.doi.org/10.1214/aoms/1177728074]


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The skew distribution is well suited to modeling asymmetric unimodal data. In practice, however, there are cases in which the data present more than one mode, and many authors have studied extensions of the skew distribution to model this type of data. This article introduces a new family, a multimodal modification of the family of skew distributions. Following the weighted-distribution approach, we multiply the density function of a family of skew distributions by a polynomial of degree 4, obtaining a more flexible model for data sets whose distribution has at most three modes. The density function, some properties, the moments, and the skewness and kurtosis coefficients of the new family are presented. The study focuses on the particular cases of the skew-normal and Laplace distributions, although the construction can be applied to any other distribution. A simulation study was carried out to examine the behavior of the parameter estimates. Illustrations with real medical and environmental data show the practical performance of the proposed model in the two cases considered.

Details

Title: A New Multimodal Modification of the Skew Family of Distributions: Properties and Applications to Medical and Environmental Data
Author: Reyes, Jimmy; Rojas, Mario A; Cortés, Pedro L; Arrué, Jaime
First page: 1224
Publication year: 2024
Publication date: 2024
Publisher: MDPI AG
e-ISSN: 20738994
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.3390/sym16091224
ProQuest document ID: 3110702134
