1. Introduction
Let h be a probability density function (pdf) symmetric about zero, and let G be a cumulative distribution function (cdf) whose derivative is symmetric about zero. Then,
$$f(x;\lambda) = 2\,h(x)\,G(\lambda x) \tag{1}$$
is a density function for all $x \in \mathbb{R}$ (Azzalini [1]), where $\lambda \in \mathbb{R}$ is a skewness parameter. When h and G in (1) are the pdf $\phi$ and cdf $\Phi$ of the standard normal distribution, the resulting distribution is called the skew-normal distribution, with density $2\phi(x)\Phi(\lambda x)$, denoted by $SN(\lambda)$. Furthermore, a random variable that follows a skew-normal distribution with location parameter $\mu$, scale $\sigma$, and skewness $\lambda$ will be denoted by $SN(\mu,\sigma,\lambda)$.
Although the skew distribution (see [1]) works well in a wide variety of settings where the data are unimodal, it does not perform well in the presence of multimodality, that is, when the empirical distribution has multiple modes or peaks. Multimodality can arise for different reasons, including the existence of multiple groups or subpopulations with unique characteristics, or the existence of latent variables that significantly influence the distribution of the population. In such cases, a mixture distribution is one of the first alternatives considered for modeling; however, its use implies addressing the problem of non-identifiability. Various methods for introducing new flexible probability distributions can be found in the statistical literature. Among the many examples, the approaches proposed in Elal-Olivero [2], Gómez et al. [3], Venegas et al. [4], and Bolfarine et al. [5] are especially attractive when proposing a new bimodal distribution. The objective of this article is to develop an alternative multimodal family for the skew-normal distribution, for which we propose a weighted version (Fisher [6] and Rao [7]) of the skew distribution that can present asymmetric shapes with up to three modes. We provide evidence that the new family, being flexible in both asymmetry and modality, can outperform some important distributions in the literature.
Gómez-Déniz et al. [8,9] present two extensions of the skew-normal family, to model bimodality and multimodality.
The first is defined by
(2)
where is a density function that is symmetric about zero, and where is a cdf of a distribution also symmetric about zero, .
The second is defined as follows: if f is a symmetric pdf around 0, defined by , with , where and F is the corresponding cdf, then we have the following family of bimodal asymmetric distributions:
(3)
These models are more flexible than the skew family of distributions, since for different values of the parameters they provide a distribution that can be unimodal or bimodal. On the other hand, Reyes et al. [10,11] present bimodal distributions for the exponential and Birnbaum–Saunders cases, respectively. In this paper, we present a modification of the family of skew distributions given in Equation (1), which also includes the Azzalini family of skew distributions (see Azzalini [1]) as a particular case. The methodology is based on multiplying Azzalini’s proposal by a polynomial of degree 4 and adding a new parameter to the family. This new family is presented as an alternative to the families proposed by Gómez-Déniz et al. [8,9].
This article is organized as follows: In Section 2, an expression is obtained for the pdf of the new family along with its most relevant properties: moments, kurtosis coefficient, and log-likelihood function. In Section 3, the particular case of the normal distribution is studied, including a simulation study that evaluates the behavior of the estimators of the proposed family for this case. Two applications to real data are shown, one related to medical data and the other to environmental data. In Section 4, the particular case of the Laplace distribution is studied, again with a simulation study that evaluates the behavior of the estimators, followed by an application to environmental data. Finally, Section 5 presents the discussion.
2. Modified Generalized Skew Distribution
2.1. Density Function
Let Y be a random variable, let h be a density function symmetric about zero, and let G be a cumulative distribution function whose density is also symmetric about zero. We say that Y follows a Modified Generalized Skew (MGS) distribution with parameters $\alpha$ and $\lambda$, which control the number of modes and the skewness, respectively, denoted by $Y \sim MGS(\lambda,\alpha)$.
Let $Y \sim MGS(\lambda,\alpha)$; then, the density function of Y is given by
$$f_Y(y;\lambda,\alpha) = \frac{2\,(1+\alpha y^4)}{1+\alpha\mu_4}\,h(y)\,G(\lambda y) \tag{4}$$
where $y \in \mathbb{R}$, $\lambda \in \mathbb{R}$, $\alpha \ge 0$, and $\mu_4$ is the moment of order 4 of a random variable X with a skew distribution of parameter λ.
2.2. Important Results
In this section, we present some results of the MGS distribution.
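Let $Y \sim MGS(\lambda,\alpha)$ with the density given in (4); then:
1. If $\lambda = 0$ and $\alpha = 0$, then $f_Y(y) = h(y)$.
2. If $\lambda = 0$, then $f_Y(y) = \frac{1+\alpha y^4}{1+\alpha\mu_4}\,h(y)$.
3. If $\alpha = 0$, then $f_Y(y) = 2\,h(y)\,G(\lambda y)$.
Item 1 indicates that if both parameters are zero then the family of symmetric density functions is recovered. Item 2 shows that when $\lambda = 0$ a family of unimodal or bimodal symmetric distributions is obtained. Finally, Item 3 indicates that if $\alpha = 0$ then the family of skew distributions is obtained.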
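The three special cases can be checked numerically. The following is a minimal sketch, assuming the MGS density has the weighted form $2(1+\alpha y^4)/(1+\alpha\mu_4)\,h(y)\,G(\lambda y)$ and taking the standard normal pdf and cdf as an illustrative choice of h and G; the function names `mgs_pdf` and `skew_pdf` are hypothetical and not part of the original article.

```python
import math

def phi(x):
    """Standard normal pdf, used here as the symmetric base density h."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf, used here as G."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mgs_pdf(y, lam, alpha, h=phi, G=Phi, mu4=3.0):
    """Assumed MGS density: 2 (1 + alpha y^4) / (1 + alpha mu4) h(y) G(lam y).

    mu4 is the fourth moment of the base density (3 for the standard normal).
    """
    weight = (1.0 + alpha * y ** 4) / (1.0 + alpha * mu4)
    return 2.0 * weight * h(y) * G(lam * y)

def skew_pdf(y, lam, h=phi, G=Phi):
    """Azzalini-type skew density 2 h(y) G(lam y)."""
    return 2.0 * h(y) * G(lam * y)

if __name__ == "__main__":
    for y in (-1.5, 0.0, 0.7, 2.0):
        # Item 1: lam = 0, alpha = 0 gives h; Item 3: alpha = 0 gives the skew density.
        print(y, mgs_pdf(y, 0.0, 0.0), phi(y), mgs_pdf(y, 1.2, 0.0), skew_pdf(y, 1.2))
```

With $\lambda = 0$ and $\alpha > 0$ (Item 2), the same function returns a density that is symmetric in y, since G(0) = 1/2 cancels the leading factor 2.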
The above results are illustrated in the following diagram:
2.3. Moments
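The following proposition gives the moments of the MGS distribution. These depend on the moments of the skew distribution.
If $Y \sim MGS(\lambda,\alpha)$ then for $r = 1, 2, \ldots$ we have
$$E(Y^r) = \frac{\mu_r + \alpha\,\mu_{r+4}}{1 + \alpha\,\mu_4}$$
where $\mu_r$, $\mu_{r+4}$, and $\mu_4$ are the moments of orders r, r + 4, and 4 of a random variable X with a skew distribution of parameter λ.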
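As a numerical illustration of how the MGS moments depend on the skew moments, the sketch below compares the ratio $(\mu_r + \alpha\mu_{r+4})/(1+\alpha\mu_4)$ with a direct numerical integral of $y^r$ against the weighted density. It assumes the density form $2(1+\alpha y^4)/(1+\alpha\mu_4)\,h(y)\,G(\lambda y)$ with normal h and G; the helper names and the Simpson integration grid are illustrative choices, not taken from the article.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, a=-12.0, b=12.0, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def sn_moment(r, lam):
    """Moment of order r of the skew-normal density 2 phi(x) Phi(lam x)."""
    return integrate(lambda x: x ** r * 2.0 * phi(x) * Phi(lam * x))

def mgs_moment_formula(r, lam, alpha):
    """Moment from the ratio (mu_r + alpha mu_{r+4}) / (1 + alpha mu_4)."""
    mu4 = sn_moment(4, lam)
    return (sn_moment(r, lam) + alpha * sn_moment(r + 4, lam)) / (1.0 + alpha * mu4)

def mgs_moment_direct(r, lam, alpha):
    """Moment obtained by integrating y^r against the assumed MGS density."""
    mu4 = sn_moment(4, lam)
    def pdf(y):
        return 2.0 * (1.0 + alpha * y ** 4) / (1.0 + alpha * mu4) * phi(y) * Phi(lam * y)
    return integrate(lambda y: y ** r * pdf(y))

if __name__ == "__main__":
    for r in (1, 2, 3, 4):
        print(r, mgs_moment_formula(r, 1.5, 0.4), mgs_moment_direct(r, 1.5, 0.4))
```

The even-order skew-normal moments do not depend on λ, so for the normal case the denominator reduces to 1 + 3α.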
The first four moments of Y are given in the following corollary:
If then
Replacing these expressions in Proposition 1, for the results are obtained. □
If then
For r even and odd we obtain what is required. □
If then
Using Corollary 2 and substituting into the standardized skewness coefficients () and kurtosis given by
respectively, the result is obtained. □
2.4. Distribution with Location and Scale Parameters
The family of distributions can be extended by means of a linear transformation, introducing location and scale parameters, adding more flexibility to the model proposed in (4).
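Let $Y \sim MGS(\lambda,\alpha)$; then, $Z = \mu + \sigma Y$ follows a Modified Generalized Skew model with location parameter $\mu$ and scale parameter $\sigma$, denoted by $Z \sim MGS(\mu,\sigma,\lambda,\alpha)$, and its density function is given by
$$f_Z(z;\mu,\sigma,\lambda,\alpha) = \frac{2}{\sigma}\,\frac{1+\alpha\left(\frac{z-\mu}{\sigma}\right)^4}{1+\alpha\mu_4}\,h\!\left(\frac{z-\mu}{\sigma}\right) G\!\left(\lambda\,\frac{z-\mu}{\sigma}\right) \tag{5}$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, and $\mu_4$ is the moment of order 4 of a random variable X with a skew distribution of parameter $\lambda$.
The moments of the distribution of Z are given by the following result.
Let $Z \sim MGS(\mu,\sigma,\lambda,\alpha)$; then,
$$E(Z^r) = \sum_{k=0}^{r}\binom{r}{k}\,\mu^{r-k}\,\sigma^{k}\,\frac{\mu_k+\alpha\,\mu_{k+4}}{1+\alpha\,\mu_4}$$
where $\mu_k$, with $\mu_0 = 1$, are the moments of order k of a random variable X with a skew distribution of parameter λ.
By expanding $(\mu+\sigma Y)^r$ with Newton's binomial theorem and substituting the moments given in Proposition 1, the result is obtained. □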
2.5. Log-Likelihood Function
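Let $z_1,\ldots,z_n$ be a random sample of a variable Z, such that $Z \sim MGS(\mu,\sigma,\lambda,\alpha)$, with $w_i = (z_i-\mu)/\sigma$; then, the log-likelihood function is
$$\ell(\mu,\sigma,\lambda,\alpha) = n\log 2 - n\log\sigma - n\log(1+\alpha\mu_4) + \sum_{i=1}^{n}\log\left(1+\alpha w_i^4\right) + \sum_{i=1}^{n}\log h(w_i) + \sum_{i=1}^{n}\log G(\lambda w_i).$$
Taking partial derivatives of the log-likelihood function with respect to the parameters and solving the resulting system of equations numerically, we obtain the maximum likelihood estimators of the parameters $\mu$, $\sigma$, $\lambda$, and $\alpha$.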
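A log-likelihood of this kind is straightforward to evaluate numerically. The sketch below is an illustration for the normal case only (h and G taken as the standard normal pdf and cdf, so that the fourth moment equals 3), assuming the density is the weighted skew density with factor $(1+\alpha w^4)/(1+3\alpha)$. The function names are hypothetical; the returned value is the negative log-likelihood, which is the quantity a numerical minimizer would work with.

```python
import math

def mgsn_logpdf(z, mu, sigma, lam, alpha):
    """Log-density of the assumed MGSN model at a single point z."""
    w = (z - mu) / sigma
    log_phi = -0.5 * w * w - 0.5 * math.log(2.0 * math.pi)
    # Normal cdf via erfc for better accuracy in the lower tail; the floor
    # avoids log(0) when the cdf underflows to zero.
    cdf = max(0.5 * math.erfc(-lam * w / math.sqrt(2.0)), 1e-300)
    return (math.log(2.0) - math.log(sigma)
            + math.log1p(alpha * w ** 4) - math.log1p(3.0 * alpha)
            + log_phi + math.log(cdf))

def neg_loglik(theta, data):
    """Negative log-likelihood for theta = (mu, sigma, lam, alpha)."""
    mu, sigma, lam, alpha = theta
    if sigma <= 0.0 or alpha < 0.0:
        return math.inf  # outside the assumed parameter space
    return -sum(mgsn_logpdf(z, mu, sigma, lam, alpha) for z in data)

if __name__ == "__main__":
    sample = [-1.0, 0.3, 2.0, 0.8, -0.4]
    print(neg_loglik((0.0, 1.0, 0.5, 0.4), sample))
```

Returning infinity outside the parameter space is one simple way to keep an unconstrained optimizer inside the region where σ > 0 and α ≥ 0; a reparameterization (for example, optimizing log σ) would be an alternative.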
3. Normal Distribution Case
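Let us consider the particular case of Equation (5) when $h = \phi$ and $G = \Phi$, the pdf and cdf of the standard normal distribution. If a random variable Z follows a Modified Generalized Skew Normal (MGSN) distribution then we will denote it by $Z \sim MGSN(\mu,\sigma,\lambda,\alpha)$, and its pdf is given by
$$f_Z(z;\mu,\sigma,\lambda,\alpha) = \frac{2}{\sigma}\,\frac{1+\alpha\left(\frac{z-\mu}{\sigma}\right)^4}{1+3\alpha}\,\phi\!\left(\frac{z-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{z-\mu}{\sigma}\right) \tag{6}$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\lambda \in \mathbb{R}$, and $\alpha \ge 0$; the constant 3 is the fourth moment $\mu_4$ of the skew-normal distribution.
Figure 1 shows the density function of the proposed MGSN model for fixed location and scale parameters and different values of $\lambda$ and $\alpha$, compared with the model of Gómez-Déniz et al. [8] for the normal case, called the Generalized Skew Normal (GSN) distribution. This representation shows the great flexibility of the new distribution, which can model unimodal, bimodal, and trimodal data with only two shape parameters, while the GSN model is only unimodal using the same number of parameters.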
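If $Z \sim MGSN(\mu,\sigma,\lambda,\alpha)$ then its density function presents at most three modes.
Without loss of generality, we consider $\mu = 0$ and $\sigma = 1$ and, since the parameter $\lambda$ only affects the asymmetry, we can assume $\lambda = 0$ in the density given in (6); then,
$$f(z) = \frac{1+\alpha z^4}{1+3\alpha}\,\phi(z).$$
Differentiating and equating to zero, we have
$$z\left(\alpha z^4 - 4\alpha z^2 + 1\right) = 0,$$
resulting in a polynomial of degree 5, which has at most five real roots, so the density has at most three maxima. For the normal case, $\lambda = 0$, and values of $\alpha \le 1/4$, the quartic factor is non-negative, $z = 0$ is the only extremum, and the density is unimodal. Otherwise, it is trimodal when $\alpha$ is finite or bimodal when $\alpha \to \infty$. □
In Figure 2, it can be observed that the MGSN density with $\lambda = 0$ is unimodal for small values of $\alpha$, trimodal for larger finite values of $\alpha$, and bimodal as $\alpha \to \infty$.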
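The mode count in the symmetric case can be illustrated with a short computation. This sketch assumes the density with $\lambda = 0$, $\mu = 0$, and $\sigma = 1$ is proportional to $(1+\alpha z^4)\phi(z)$, so that its stationary points solve $z(\alpha z^4 - 4\alpha z^2 + 1) = 0$; the threshold $\alpha = 1/4$ follows from the discriminant of the quadratic in $z^2$. The function names and the grid check are illustrative, not part of the article.

```python
import math

def symmetric_modes(alpha):
    """Modes of f(z) proportional to (1 + alpha z^4) phi(z), i.e. lambda = 0."""
    # Stationary points solve z (alpha z^4 - 4 alpha z^2 + 1) = 0.
    # The quadratic in u = z^2 has real roots only when alpha > 1/4.
    if alpha <= 0.25:
        return [0.0]
    disc = math.sqrt(4.0 - 1.0 / alpha)
    outer = math.sqrt(2.0 + disc)  # larger root in z^2: the two outer maxima
    return [-outer, 0.0, outer]

def count_local_maxima(alpha, lo=-6.0, hi=6.0, n=6001):
    """Brute-force check: count strict local maxima on a fine grid."""
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    fs = [(1.0 + alpha * x ** 4) * math.exp(-0.5 * x * x) for x in xs]
    return sum(1 for i in range(1, n - 1) if fs[i - 1] < fs[i] > fs[i + 1])

if __name__ == "__main__":
    for alpha in (0.1, 0.25, 1.0, 5.0):
        print(alpha, symmetric_modes(alpha), count_local_maxima(alpha))
```

The smaller root of the quadratic, $z^2 = 2 - \sqrt{4 - 1/\alpha}$, corresponds to the two local minima that separate the central mode from the outer ones.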
3.1. Moments
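The moments of the MGSN distribution are obtained by substituting into Corollary 1 the moments of the skew-normal distribution given by Henze [12]:
$$E(X^{2k}) = \frac{(2k)!}{2^{k}\,k!}, \qquad E(X^{2k+1}) = \sqrt{\frac{2}{\pi}}\;\frac{\lambda\,(2k+1)!}{2^{k}\,(1+\lambda^2)^{k+1/2}}\sum_{j=0}^{k}\frac{j!\,(2\lambda)^{2j}}{(2j+1)!\,(k-j)!}, \qquad k = 0, 1, 2, \ldots$$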
Figure 3 shows the graphs of the skewness and kurtosis coefficients of the MGSN distribution for , , and different values of and . In the left panel, it can be seen that for a fixed value of the skewness coefficient is an odd function with respect to . As an example, given , the value of the skewness coefficient for is and for it is . In the right panel, we can see that given a fixed value of the kurtosis coefficient is an even function with respect to . For example, given , the value of the kurtosis coefficient for is and for it is .
Figure 4 shows, in the left panel, the profile of the skewness coefficient for different values of $\alpha$. It can be seen that for $\alpha = 0$ the profile coincides with the profile of the skewness coefficient of the skew-normal distribution. Furthermore, through exploratory analysis we can conclude that if and then converges to . Similarly, for $\alpha = 0$ the profile of the kurtosis, shown in the right panel, coincides with the profile of the kurtosis coefficient of the skew-normal distribution. Also, through exploratory analysis, we can conclude that if and then the value of converges to and if and then the value of converges to .
The skewness and kurtosis values for fixed values of $\alpha$ and $\lambda$, shown in Table 1, confirm numerically that the skewness and kurtosis coefficients are odd and even functions with respect to $\lambda$, respectively.
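The odd and even behavior can be reproduced with a few lines of code. The sketch below assumes the MGSN moments take the form $(\mu_r + \alpha\mu_{r+4})/(1+3\alpha)$, where $\mu_r$ are skew-normal moments, and computes those through the standard representation $X = \delta|U_0| + \sqrt{1-\delta^2}\,U_1$ with $\delta = \lambda/\sqrt{1+\lambda^2}$ and independent standard normal $U_0$, $U_1$. The function names are hypothetical.

```python
import math

def abs_normal_moment(k):
    """E|U|^k for U ~ N(0, 1)."""
    return 2.0 ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)

def normal_moment(j):
    """E[U^j] for U ~ N(0, 1): zero for odd j, equal to E|U|^j for even j."""
    return 0.0 if j % 2 else abs_normal_moment(j)

def sn_moment(r, lam):
    """E[X^r] for X ~ SN(lam), expanding (d |U0| + c U1)^r binomially."""
    d = lam / math.sqrt(1.0 + lam * lam)
    c = math.sqrt(1.0 - d * d)
    return sum(math.comb(r, k) * d ** k * c ** (r - k)
               * abs_normal_moment(k) * normal_moment(r - k)
               for k in range(r + 1))

def mgsn_moment(r, lam, alpha):
    """Assumed MGSN raw moment with mu = 0 and sigma = 1."""
    return (sn_moment(r, lam) + alpha * sn_moment(r + 4, lam)) / (1.0 + 3.0 * alpha)

def skew_kurt(lam, alpha):
    """Standardized skewness and kurtosis coefficients of the MGSN model."""
    m1, m2, m3, m4 = (mgsn_moment(r, lam, alpha) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    c3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    c4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return c3 / var ** 1.5, c4 / var ** 2

if __name__ == "__main__":
    for lam in (-3.0, -2.0, 0.0, 2.0, 3.0):
        print(lam, skew_kurt(lam, 1.0))
```

Under these assumptions, $\alpha = 1$ and $\lambda = 0$ give a kurtosis of exactly 27/16 = 1.6875, and $\lambda = \pm 2$ and $\lambda = \pm 3$ give skewness and kurtosis values that agree with entries in the first row of Table 1 to about three decimals.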
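3.2. Estimation
Let $z_1,\ldots,z_n$ be a random sample of a variable Z, such that $Z \sim MGSN(\mu,\sigma,\lambda,\alpha)$, with $w_i = (z_i-\mu)/\sigma$; then, the log-likelihood function is
$$\ell(\mu,\sigma,\lambda,\alpha) = n\log 2 - n\log\sigma - n\log(1+3\alpha) + \sum_{i=1}^{n}\log\left(1+\alpha w_i^4\right) + \sum_{i=1}^{n}\log\phi(w_i) + \sum_{i=1}^{n}\log\Phi(\lambda w_i).$$
After differentiating the log-likelihood function, the normal equations are given by
$$\begin{aligned}
\sum_{i=1}^{n} w_i - 4\alpha\sum_{i=1}^{n}\frac{w_i^3}{1+\alpha w_i^4} - \lambda\sum_{i=1}^{n}\frac{\phi(\lambda w_i)}{\Phi(\lambda w_i)} &= 0,\\
\sum_{i=1}^{n} w_i^2 - 4\alpha\sum_{i=1}^{n}\frac{w_i^4}{1+\alpha w_i^4} - \lambda\sum_{i=1}^{n}\frac{w_i\,\phi(\lambda w_i)}{\Phi(\lambda w_i)} &= n,\\
\sum_{i=1}^{n}\frac{w_i\,\phi(\lambda w_i)}{\Phi(\lambda w_i)} &= 0,\\
\sum_{i=1}^{n}\frac{w_i^4}{1+\alpha w_i^4} &= \frac{3n}{1+3\alpha}.
\end{aligned}$$
Maximum likelihood estimators (MLEs) are obtained by solving the normal equations. These equations do not admit an analytical solution, so it is necessary to use iterative methods.
3.3. Simulation Study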
Many programs provide built-in random number generators, but some probability distributions are not covered by such software. For the MGSN distribution, we use the acceptance–rejection method to generate random numbers from the pdf defined in (6), according to the algorithm below. The resulting sequence of n random numbers is stored in a vector that we call the n-vector. Since the MGSN distribution has unbounded support, we use a constant to limit the range of the generated MGSN values. Furthermore, we consider another constant corresponding to the maximum value of the MGSN pdf, which must be evaluated at the true parameters.
3.3.1. Algorithm
To start the algorithm, we need to define the parameters , , , and of the MGSN distribution, as follows:
n: the length of the n-vector.
Y: a random variable with distribution.
: the MGSN pdf with .
: a lower limit for the MGSN numbers to be generated with .
: the maximum value of with .
: a random variable with a uniform distribution in , , in short.
: a random variable with a distribution.
Acceptance–rejection algorithm to generate numbers from the distribution:
Begin Input: n, , , ,
Output: n-vector,
Set ;
Generate a value from ;
Obtain a value from ;
Set from if , append y to n-vector; otherwise, go back to step 3;
Repeat steps 3–5 until the length of n-vector is equal to n;
end
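The steps above can be sketched in code. This is a minimal illustration of acceptance–rejection sampling, not the authors' R implementation: it assumes a uniform proposal on the truncated interval $[\mu - c\sigma,\ \mu + c\sigma]$, a grid-based upper bound M on the pdf, and the weighted MGSN density with factor $(1+\alpha w^4)/(1+3\alpha)$. The names `sample_mgsn`, `c`, and `grid` are hypothetical.

```python
import math
import random

def mgsn_pdf(z, mu, sigma, lam, alpha):
    """Assumed MGSN density with location mu and scale sigma."""
    w = (z - mu) / sigma
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-lam * w / math.sqrt(2.0))
    return 2.0 / sigma * (1.0 + alpha * w ** 4) / (1.0 + 3.0 * alpha) * phi * cdf

def sample_mgsn(n, mu, sigma, lam, alpha, c=8.0, seed=None, grid=4001):
    """Draw n values by acceptance-rejection with a uniform proposal."""
    rng = random.Random(seed)
    lo, hi = mu - c * sigma, mu + c * sigma  # truncation of the unbounded support
    # Upper bound on the pdf from a fine grid, inflated slightly for safety.
    m = 1.05 * max(mgsn_pdf(lo + i * (hi - lo) / (grid - 1), mu, sigma, lam, alpha)
                   for i in range(grid))
    out = []
    while len(out) < n:
        y = rng.uniform(lo, hi)   # candidate from the proposal
        u = rng.uniform(0.0, m)   # uniform height under the bound
        if u <= mgsn_pdf(y, mu, sigma, lam, alpha):
            out.append(y)         # accept the candidate
    return out

if __name__ == "__main__":
    draws = sample_mgsn(500, 0.0, 1.0, 0.5, 2.0, seed=123)
    print(len(draws), min(draws), max(draws))
```

A uniform proposal is simple but wasteful when the truncation interval is wide, because the acceptance rate is roughly 1/(M × interval length); a heavier-tailed proposal would be a natural refinement.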
Computational simulations were performed in the R programming language, using the quasi-Newton “BFGS” method of the “optim” function from the “stats” package. We used a computer with the following characteristics: (i) OS: Windows 10 Pro 64-bit; (ii) RAM: 8 GB; and (iii) processor: Intel(R) Core(TM) i7-8550U CPU at 1.99 GHz. The algorithm above was run 2000 times with n = 50, 100, 200, and 500; the average processing time was 0.04565 s. Below, we show the MLEs obtained from the model for different parameter values and sample sizes, using the acceptance–rejection algorithm.
3.3.2. Simulation Results
Table 2 presents the results of the simulation study, illustrating the behavior of the MLEs for 2000 samples of sizes 50, 100, 200, and 500 from a population with $MGSN(\mu,\sigma,\lambda,\alpha)$ distribution. The estimates of the parameters are quite close to the true values, and the standard deviations and average lengths of the intervals are small. These results show the expected asymptotic behavior. Moreover, the empirical coverages are very close in all cases to the nominal confidence level.
3.4. Applications for the Normal Case
In this section, we show two real-data applications of the MGSN model given in (6) and compare the results with the models proposed in [8,9] for the normal and skew-normal cases, (GSN) and (GSN2), respectively, given in (2) and (3), considering location and scale parameters.
3.4.1. Application 1
The data used in Application 1 correspond to the age and frequency of cases of a cancer called Kaposi's sarcoma, without distinguishing between subtypes. This is a type of cancer that can form masses in the skin, lymph nodes, or other organs. The data were collected from the website of the Office for National Statistics (ONS, Health Statistics section), and they can be seen in Table A1 (see Appendix A). It can be seen that there is a greater incidence in individuals aged around 25 years, as well as in those aged about 60 years. The records were taken during the years 1995 to 2016 and correspond to different regions of the UK.
Table 3 shows descriptive summary measures of the data related to Kaposi's sarcoma. Table 4 shows the values of the maximum likelihood estimates and their corresponding standard deviations for the GSN2, GSN, and MGSN models. Using the Akaike Information Criterion (AIC) [13] and the Consistent Akaike Information Criterion (CAIC) [14], it can be seen that the MGSN model presents a better fit, since its values are lower. Figure 5 shows the histogram and plot of the GSN2, GSN, and MGSN models for the Kaposi's sarcoma data set. The graphical representation suggests that the MGSN model fits the data better.
3.4.2. Application 2
The second data set corresponds to the duration of the Old Faithful geyser eruption (see Appendix, Table A2) in Yellowstone National Park, WY, USA [15]. Table 5 shows the descriptive summary measures of the data related to the duration of the Old Faithful Geyser eruption. Table 6 shows the values of the maximum likelihood estimates and their corresponding standard deviations for the GSN2, GSN, and MGSN models. Using the AIC [13] and CAIC [14] criteria, it can be seen that the MGSN model presents a better fit, because its values are smaller. Figure 6 shows the histogram and graphical representation of the GSN2, GSN, and MGSN models for the eruption time data set. Through the graphical representation, it can be seen that the MGSN model apparently fits the data better.
4. Laplace Distribution Case
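Let us consider the particular case of Equation (5) when h and G are, respectively, the density and cumulative distribution functions of the Laplace distribution. If a random variable Z follows a Modified Generalized Skew Laplace (MGSLP) distribution, we will denote it by $Z \sim MGSLP(\mu,\sigma,\lambda,\alpha)$, and its pdf is given by
$$f_Z(z;\mu,\sigma,\lambda,\alpha) = \frac{2}{\sigma}\,\frac{1+\alpha w^4}{1+\alpha\mu_4}\,h(w)\,G(\lambda w), \qquad w = \frac{z-\mu}{\sigma}, \tag{7}$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\lambda \in \mathbb{R}$, $\alpha \ge 0$, and $\mu_4$ is the fourth moment of the Laplace distribution.
4.1. Simulation Study for the Case of the Laplace Distribution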
Table 7 presents the results of the simulation study, illustrating the behavior of the MLEs for 2000 samples of sizes n = 50, 100, 200, and 500 from a population with $MGSLP(\mu,\sigma,\lambda,\alpha)$ distribution. The estimates of the parameters are quite close to the true values, and the standard deviations and average lengths of the intervals are small. These results show the expected asymptotic behavior. Moreover, the empirical coverages are very close in all cases to the nominal confidence level.
4.2. Application for the Laplace Distribution Case
In this section, we show one real-data application of the MGSLP model given in (7) and compare the results with the models proposed in [8,9] for the Laplace and skew-Laplace cases, (GSLP) and (GSLP2), respectively, given in (2) and (3), where f and F correspond to the density and cumulative distribution functions of the Laplace distribution, respectively.
For the data corresponding to the duration of the Old Faithful geyser eruption (see Appendix A, Table A2) in Yellowstone National Park, Wyoming, USA [15], Table 8 shows the values of the maximum likelihood estimates and their corresponding standard deviations for the GSLP2, GSLP, and MGSLP models. Using the AIC [13] and CAIC [14] criteria, it can be seen that the MGSLP model presents a better fit because its values are smaller. Figure 7 shows the histogram and graphical representation of the GSLP2, GSLP, and MGSLP models for the eruption time data set. The graphical representation suggests that the MGSLP model gives the best fit to the eruption time data set.
5. Discussion
We have proposed a new family based on a weighted version of the skew distribution, which has a parameter, $\alpha$, that allows modeling data sets that present one, two, or three modes. That is, we have a family of models that is more flexible than the distributions proposed by Gómez-Déniz et al. [8,9], which have the same number of parameters. Its density function, moments, and some properties were studied; it should be noted that the mathematical treatment is less complex than that of other distributions in the current literature. In particular, when the parameter $\alpha$ takes the value zero the new family recovers the family of skew distributions. Two particular cases of the new model were studied, one for the normal distribution and the other for the Laplace distribution. A simulation algorithm based on the acceptance–rejection method was developed to obtain random samples of different sizes from the proposed model for the two particular cases. Subsequently, 2000 iterations were carried out for each sample size, obtaining the estimates by maximum likelihood with the “optim” function of the R software, for different values of $\mu$, $\sigma$, $\lambda$, and $\alpha$. This study allowed us to observe the good asymptotic behavior of the parameter estimates. Two applications were carried out with real data, one related to medicine and the other to the environment, where it was empirically shown that the proposed family fits better than the families presented by Gómez-Déniz et al. [8,9]. This new model is a potential contribution for professionals who work in data analysis and for users of statistics.
Data curation, J.R.; formal analysis, J.R., M.A.R., P.L.C., and J.A.; investigation, J.R., M.A.R., and P.L.C.; methodology, J.R., M.A.R., P.L.C., and J.A.; writing—original draft, J.R., M.A.R., P.L.C., and J.A.; writing—review and editing, M.A.R., P.L.C., and J.A.; Funding Acquisition, J.R., M.A.R., and J.A. All authors have read and agreed to the published version of the manuscript.
Data are contained within the article.
The authors declare no conflicts of interest.
Figure 1. Plot of the MGSN pdf (solid line) and GSN pdf (dashed line) for different values of $\lambda$ and $\alpha$.
Figure 2. Plot of the MGSN model for the case $\lambda = 0$ and different values of $\alpha$.
Figure 3. Plots of the skewness (left) and kurtosis (right) of the MGSN distribution.
Figure 4. Profiles of the skewness (left) and kurtosis (right) coefficients of the MGSN distribution.
Figure 5. MGSN distribution (solid line), GSN distribution (dashed line), and GSN2 distribution (dotted line) for the Kaposi's sarcoma data.
Figure 6. Histogram and graphical representation of MGSN distribution (solid line), GSN distribution (dashed line), and GSN2 distribution (dotted line) for the eruption time data.
Figure 7. Histogram for the eruption time data set and the fit of the graphs for the MGSLP (solid line), GSLP (dashed line), and GSLP2 (dotted line) distributions.
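Table 1. Skewness and kurtosis coefficients of the MGSN model for different values of $\alpha$ (rows) and $\lambda$ (columns).
$\alpha$ | Skewness, $\lambda=-3$ | Skewness, $\lambda=-2$ | Skewness, $\lambda=0$ | Skewness, $\lambda=2$ | Skewness, $\lambda=3$ | Kurtosis, $\lambda=-3$ | Kurtosis, $\lambda=-2$ | Kurtosis, $\lambda=0$ | Kurtosis, $\lambda=2$ | Kurtosis, $\lambda=3$ |
---|---|---|---|---|---|---|---|---|---|---|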
1 | 0.1047 | 0.2326 | 0 | −0.2326 | −0.1047 | 2.7460 | 2.9395 | 1.6875 | 2.9395 | 2.7460 |
2 | 0.1413 | 0.2907 | 0 | −0.2907 | −0.1413 | 3.1079 | 3.4253 | 1.5515 | 3.4253 | 3.1079 |
3 | 0.1029 | 0.2659 | 0 | −0.2659 | −0.1029 | 3.2536 | 3.6492 | 1.5028 | 3.6492 | 3.2536 |
4 | 0.0588 | 0.2307 | 0 | −0.2307 | −0.0588 | 3.3133 | 3.7625 | 1.4778 | 3.7625 | 3.3133 |
5 | 0.0195 | 0.1976 | 0 | −0.1976 | −0.0195 | 3.3367 | 3.8247 | 1.4626 | 3.8247 | 3.3367 |
6 | −0.0139 | 0.1689 | 0 | −0.1689 | 0.0139 | 3.3436 | 3.8609 | 1.4524 | 3.8609 | 3.3436 |
7 | −0.0419 | 0.1444 | 0 | −0.1444 | 0.0419 | 3.3425 | 3.8828 | 1.4450 | 3.8828 | 3.3425 |
8 | −0.0656 | 0.1234 | 0 | −0.1234 | 0.0656 | 3.3376 | 3.8962 | 1.4395 | 3.8962 | 3.3376 |
9 | −0.0859 | 0.1055 | 0 | −0.1055 | 0.0859 | 3.3310 | 3.9046 | 1.4351 | 3.9046 | 3.3310 |
10 | −0.1033 | 0.0899 | 0 | −0.0899 | 0.1033 | 3.3236 | 3.9098 | 1.4316 | 3.9098 | 3.3236 |
11 | −0.1184 | 0.0764 | 0 | −0.0764 | 0.1184 | 3.3160 | 3.9129 | 1.4288 | 3.9129 | 3.3160 |
12 | −0.1316 | 0.0645 | 0 | −0.0645 | 0.1316 | 3.3085 | 3.9146 | 1.4264 | 3.9146 | 3.3085 |
13 | −0.1432 | 0.0540 | 0 | −0.0540 | 0.1432 | 3.3014 | 3.9153 | 1.4244 | 3.9153 | 3.3014 |
14 | −0.1536 | 0.0446 | 0 | −0.0446 | 0.1536 | 3.2946 | 3.9154 | 1.4227 | 3.9154 | 3.2946 |
15 | −0.1628 | 0.0362 | 0 | −0.0362 | 0.1628 | 3.2882 | 3.9151 | 1.4212 | 3.9151 | 3.2882 |
16 | −0.1711 | 0.0287 | 0 | −0.0287 | 0.1711 | 3.2821 | 3.9145 | 1.4199 | 3.9145 | 3.2821 |
17 | −0.1786 | 0.0219 | 0 | −0.0219 | 0.1786 | 3.2765 | 3.9137 | 1.4187 | 3.9137 | 3.2765 |
18 | −0.1854 | 0.0157 | 0 | −0.0157 | 0.1854 | 3.2711 | 3.9128 | 1.4177 | 3.9128 | 3.2711 |
19 | −0.1916 | 0.0100 | 0 | −0.0100 | 0.1916 | 3.2662 | 3.9118 | 1.4167 | 3.9118 | 3.2662 |
20 | −0.1973 | 0.0049 | 0 | −0.0049 | 0.1973 | 3.2615 | 3.9107 | 1.4159 | 3.9107 | 3.2615 |
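Table 2. Simulation of 2000 iterations for parameter estimates for the MGSN model.
n | $\mu$ | $\sigma$ | $\lambda$ | $\alpha$ | $\hat{\mu}$ | sd | Ali | C | $\hat{\sigma}$ | sd | Ali | C | $\hat{\lambda}$ | sd | Ali | C | $\hat{\alpha}$ | sd | Ali | C |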
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50 | 0 | 1 | −0.5 | 0.4 | 0.0018 | 0.4781 | 1.8743 | 93.55 | 1.0014 | 0.1538 | 0.6028 | 94.10 | −0.5577 | 0.4682 | 1.8354 | 96.30 | 0.5471 | 0.3396 | 1.3314 | 93.80 |
100 | 0 | 1 | −0.5 | 0.4 | 0.0090 | 0.3758 | 1.4730 | 95.40 | 1.0045 | 0.1202 | 0.4712 | 95.30 | −0.5226 | 0.2915 | 1.1427 | 96.85 | 0.4688 | 0.1826 | 0.7159 | 94.30 |
200 | 0 | 1 | −0.5 | 0.4 | 0.0079 | 0.2796 | 1.0961 | 95.50 | 1.0007 | 0.0919 | 0.3602 | 95.45 | −0.5153 | 0.2800 | 1.0977 | 98.30 | 0.4357 | 0.1102 | 0.4319 | 94.10 |
500 | 0 | 1 | −0.5 | 0.4 | 0.0108 | 0.1640 | 0.6428 | 95.80 | 1.0035 | 0.0540 | 0.2118 | 95.35 | −0.5087 | 0.1024 | 0.4016 | 95.90 | 0.4132 | 0.0619 | 0.2427 | 94.25 |
50 | 0 | 1 | 0.5 | 2 | 0.0015 | 0.2185 | 0.8564 | 95.50 | 1.0009 | 0.0866 | 0.3394 | 95.25 | 0.5146 | 0.1529 | 0.5995 | 95.75 | 2.4454 | 1.2979 | 5.0879 | 91.50 |
100 | 0 | 1 | 0.5 | 2 | 0.0016 | 0.1434 | 0.5621 | 94.60 | 0.9987 | 0.0574 | 0.2250 | 94.50 | 0.5043 | 0.1002 | 0.3928 | 94.90 | 2.3937 | 1.0487 | 4.1107 | 92.75 |
200 | 0 | 1 | 0.5 | 2 | 0.0000 | 0.0991 | 0.3883 | 94.50 | 0.9991 | 0.0406 | 0.1590 | 95.10 | 0.5013 | 0.0693 | 0.2715 | 94.95 | 2.2405 | 0.7663 | 3.0038 | 93.35 |
500 | 0 | 1 | 0.5 | 2 | −0.0018 | 0.0611 | 0.2394 | 94.85 | 0.9998 | 0.0245 | 0.0962 | 95.05 | 0.5012 | 0.0426 | 0.1670 | 95.85 | 2.0869 | 0.4162 | 1.6316 | 94.15 |
50 | 0 | 1 | 1 | 0.5 | 0.0988 | 0.5310 | 2.0815 | 94.40 | 0.9694 | 0.1775 | 0.6956 | 95.50 | 1.2925 | 1.3780 | 5.4016 | 94.45 | 0.6736 | 0.5719 | 2.2417 | 95.80 |
100 | 0 | 1 | 1 | 0.5 | 0.0177 | 0.4679 | 1.8342 | 94.05 | 0.9923 | 0.1497 | 0.5870 | 95.40 | 1.2382 | 1.1636 | 4.5613 | 95.95 | 0.6286 | 0.4146 | 1.6253 | 94.75 |
200 | 0 | 1 | 1 | 0.5 | 0.0195 | 0.3561 | 1.3958 | 94.20 | 0.9921 | 0.1148 | 0.4500 | 94.50 | 1.0597 | 0.5470 | 2.1441 | 96.85 | 0.5683 | 0.2855 | 1.1191 | 96.20 |
500 | 0 | 1 | 1 | 0.5 | 0.0040 | 0.2340 | 0.9171 | 94.75 | 0.9979 | 0.0759 | 0.2974 | 94.60 | 1.0260 | 0.3869 | 1.5166 | 98.95 | 0.5257 | 0.1443 | 0.5657 | 96.05 |
50 | 1 | 2 | −0.5 | 0.4 | 0.9502 | 0.9466 | 3.7106 | 94.10 | 1.9923 | 0.3061 | 1.2000 | 95.15 | −0.5542 | 0.6802 | 2.6664 | 97.90 | 0.5401 | 0.3332 | 1.3062 | 94.60 |
100 | 1 | 2 | −0.5 | 0.4 | 1.0117 | 0.7657 | 3.0015 | 94.75 | 2.0050 | 0.2456 | 0.9629 | 94.80 | −0.5215 | 0.2852 | 1.1181 | 95.95 | 0.4692 | 0.1814 | 0.7111 | 94.15 |
200 | 1 | 2 | −0.5 | 0.4 | 1.0322 | 0.5668 | 2.2219 | 95.80 | 2.0121 | 0.1784 | 0.6992 | 95.05 | −0.5217 | 0.3113 | 1.2204 | 98.85 | 0.4307 | 0.1126 | 0.4415 | 95.60 |
500 | 1 | 2 | −0.5 | 0.4 | 1.0152 | 0.3164 | 1.2401 | 95.45 | 2.0032 | 0.1039 | 0.4074 | 95.85 | −0.5061 | 0.0981 | 0.3844 | 95.70 | 0.4127 | 0.0616 | 0.2414 | 94.15 |
50 | −1 | 2 | 0.5 | 2 | −0.9740 | 0.4348 | 1.7044 | 94.90 | 1.9900 | 0.1679 | 0.6583 | 94.75 | 0.5118 | 0.1479 | 0.5796 | 94.70 | 2.3971 | 1.2988 | 5.0915 | 91.85 |
100 | −1 | 2 | 0.5 | 2 | −0.9940 | 0.2864 | 1.1225 | 94.25 | 1.9954 | 0.1149 | 0.4502 | 95.25 | 0.5084 | 0.1007 | 0.3946 | 94.50 | 2.3808 | 1.0989 | 4.3078 | 92.35 |
200 | −1 | 2 | 0.5 | 2 | −0.9934 | 0.1992 | 0.7809 | 95.00 | 1.9963 | 0.0797 | 0.3125 | 95.05 | 0.5045 | 0.0707 | 0.2772 | 95.60 | 2.2129 | 0.7466 | 2.9267 | 93.75 |
500 | −1 | 2 | 0.5 | 2 | −0.9965 | 0.1254 | 0.4914 | 95.40 | 1.9993 | 0.0501 | 0.1965 | 94.90 | 0.5015 | 0.0435 | 0.1704 | 95.35 | 2.0710 | 0.4002 | 1.5687 | 94.45 |
50 | −1 | 1 | 1 | 0.5 | −0.9152 | 0.5354 | 2.0988 | 95.50 | 0.9668 | 0.1803 | 0.7069 | 95.20 | 1.2361 | 1.1948 | 4.6836 | 93.85 | 0.6921 | 0.6102 | 2.3919 | 95.60 |
100 | −1 | 1 | 1 | 0.5 | −0.9782 | 0.4719 | 1.8498 | 94.95 | 0.9921 | 0.1522 | 0.5966 | 95.30 | 1.1948 | 1.0436 | 4.0910 | 96.40 | 0.6291 | 0.4311 | 1.6900 | 95.55 |
200 | −1 | 1 | 1 | 0.5 | −0.9917 | 0.3801 | 1.4902 | 93.70 | 0.9959 | 0.1221 | 0.4786 | 93.90 | 1.0810 | 0.6083 | 2.3844 | 96.90 | 0.5725 | 0.2917 | 1.1433 | 96.10 |
500 | −1 | 1 | 1 | 0.5 | −1.0064 | 0.2245 | 0.8800 | 94.60 | 1.0007 | 0.0730 | 0.2861 | 94.90 | 1.0203 | 0.2172 | 0.8512 | 94.80 | 0.5274 | 0.1335 | 0.5234 | 94.80 |
In the above, sd corresponds to the standard deviation, Ali corresponds to the average length of the intervals, and C corresponds to the empirical coverage based on a confidence interval of 95%.
Table 3. Summary statistics for the Kaposi's sarcoma data set.
n | Mean | Variance | Asymmetry | Kurtosis |
---|---|---|---|---|
29,131 | 45.396 | 416.387 | 0.313 | 1.936 |
Table 4. Parameter estimates for GSN2, GSN, and MGSN distributions for the Kaposi's sarcoma data set.
Parameter Estimates | GSN2 (sd) | GSN (sd) | MGSN (sd) |
---|---|---|---|
| 37.6241 (0.03552) | 37.029 (0.1313) | 20.5880 (0.1293) |
| 21.0537 (0.0808) | 22.052 (0.1050) | 18.1833 (0.0674) |
| 0.4912 (0.0085) | 4.8080 (0.1180) | 3.9293 (0.0525) |
| 0.0754 (0.0017) | 5.525 (0.1350) | 0.2488 (0.00412) |
AIC | 256,212.1 | 253,832.6 | 249,300.9 |
CAIC | 256,245.2 | 253,869.7 | 249,334.0 |
Table 5. Summary statistics for the eruption time data set.
n | Mean | Variance | Asymmetry | Kurtosis |
---|---|---|---|---|
272 | 70.897 | 184.8240 | −0.414 | 1.844 |
Table 6. Parameter estimates for GSN2, GSN, and MGSN distributions for the eruption time data set.
Parameter Estimates | GSN2 (sd) | GSN (sd) | MGSN (sd) |
---|---|---|---|
| 65.1850 (0.2520) | 75.5992 (0.1313) | 57.5424 (1.5939) |
| 13.088 (0.5570) | 14.3610 (0.6651) | 9.2529 (0.4983) |
| 0.6760 (0.1160) | −6.2206 (1.9559) | 1.7219 (0.3005) |
| 0.4660 (0.0380) | 7.5214 (2.4209) | 1.5183 (0.3818) |
AIC | 2248.85 | 2142.43 | 2077.92 |
CAIC | 2266.74 | 2156.95 | 2092.34 |
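Table 7. Simulation of 2000 iterations for parameter estimates for the MGSLP model.
n | $\mu$ | $\sigma$ | $\lambda$ | $\alpha$ | $\hat{\mu}$ | sd | Ali | C | $\hat{\sigma}$ | sd | Ali | C | $\hat{\lambda}$ | sd | Ali | C | $\hat{\alpha}$ | sd | Ali | C |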
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50 | 1 | 1 | 0.5 | 0.1 | 1.0030 | 0.4398 | 1.7242 | 95.45 | 1.0072 | 0.1773 | 0.6948 | 99.25 | 0.6178 | 0.4746 | 1.8606 | 96.20 | 0.1181 | 0.0718 | 0.2814 | 96.40 |
100 | 1 | 1 | 0.5 | 0.1 | 0.9940 | 0.3180 | 1.2465 | 95.70 | 1.0058 | 0.1110 | 0.4353 | 99.25 | 0.5466 | 0.2024 | 0.7936 | 95.60 | 0.1082 | 0.0412 | 0.1615 | 96.40 |
200 | 1 | 1 | 0.5 | 0.1 | 1.0055 | 0.2534 | 0.9932 | 97.50 | 1.0069 | 0.1382 | 0.5418 | 99.60 | 0.5171 | 0.1196 | 0.4690 | 95.55 | 0.1035 | 0.0239 | 0.0936 | 95.15 |
500 | 1 | 1 | 0.5 | 0.1 | 1.0147 | 0.2726 | 1.0685 | 99.25 | 1.0169 | 0.1865 | 0.7313 | 99.20 | 0.5080 | 0.0854 | 0.3347 | 97.90 | 0.1007 | 0.0158 | 0.0619 | 96.86 |
50 | 2 | 1 | 0.5 | 0.9 | 2.1006 | 0.5210 | 2.0423 | 93.45 | 0.9929 | 0.1054 | 0.4132 | 95.20 | 0.5452 | 0.2426 | 0.9508 | 96.05 | 0.8656 | 0.5079 | 1.9910 | 98.80 |
100 | 2 | 1 | 0.5 | 0.9 | 2.0419 | 0.3774 | 1.4793 | 93.65 | 0.9948 | 0.0765 | 0.2997 | 94.15 | 0.5240 | 0.1396 | 0.5474 | 95.20 | 1.1725 | 0.8377 | 3.2837 | 91.60 |
200 | 2 | 1 | 0.5 | 0.9 | 2.0023 | 0.2710 | 1.0623 | 94.75 | 0.9998 | 0.0551 | 0.2158 | 95.10 | 0.5137 | 0.0883 | 0.3460 | 94.65 | 1.1924 | 0.8358 | 3.2763 | 93.56 |
500 | 2 | 1 | 0.5 | 0.9 | 2.0042 | 0.1546 | 0.6061 | 94.85 | 0.9988 | 0.0333 | 0.1304 | 95.20 | 0.5038 | 0.0514 | 0.2016 | 95.30 | 0.9972 | 0.3633 | 1.4239 | 94.46 |
50 | 0 | 1 | 1.2 | 0.9 | 0.2215 | 0.5439 | 2.1320 | 92.35 | 0.9627 | 0.1136 | 0.4453 | 93.45 | 1.0951 | 0.4731 | 1.8544 | 95.65 | 0.8088 | 0.5186 | 2.0329 | 99.20 |
100 | 0 | 1 | 1.2 | 0.9 | 0.0915 | 0.4323 | 1.6945 | 93.85 | 0.9854 | 0.0864 | 0.3387 | 94.20 | 1.2720 | 0.5473 | 2.1455 | 94.15 | 1.0433 | 0.7256 | 2.8445 | 91.65 |
200 | 0 | 1 | 1.2 | 0.9 | 0.0370 | 0.3138 | 1.2301 | 93.90 | 0.9939 | 0.0624 | 0.2448 | 94.75 | 1.3338 | 0.5349 | 2.0968 | 95.25 | 1.1157 | 0.7858 | 3.0803 | 94.40 |
500 | 0 | 1 | 1.2 | 0.9 | 0.0196 | 0.1928 | 0.7559 | 94.85 | 0.9965 | 0.0384 | 0.1503 | 94.55 | 1.2497 | 0.2424 | 0.9500 | 94.65 | 1.0032 | 0.4455 | 1.7464 | 95.30 |
In the above, sd corresponds to the standard deviation, Ali corresponds to the average length of the intervals, and C corresponds to the empirical coverage based on a confidence interval of 95%.
Table 8. Parameter estimates for GSLP2, GSLP, and MGSLP distributions for the eruption time data set.
Parameter Estimates | GSLP2 (sd) | GSLP (sd) | MGSLP (sd) |
---|---|---|---|
| 101.1921 (1.8598) | 73.9999 (0.0278) | 66.9997 (0.0543) |
| 20.3161 (1.3864) | 11.5685 (0.70151) | 2.6399 (0.0748) |
| −8.5204 (4.4833) | −7.9486 (3.9974) | 0.0583 (0.0149) |
| 1.0879 (0.0132) | 10.9371 (6.0598) | 3.6810 (1.7438) |
AIC | 2181.30 | 2148.54 | 2095.638 |
CAIC | 2195.72 | 2162.96 | 2114.061 |
Appendix A
Table A1. Data corresponding to Kaposi's sarcoma.
Age | Number |
---|---|
1 | 1 |
5 | 89 |
10 | 342 |
15 | 718 |
20 | 2352 |
25 | 3593 |
30 | 3243 |
35 | 2533 |
40 | 2015 |
45 | 1747 |
50 | 1562 |
55 | 1662 |
60 | 1801 |
65 | 1915 |
70 | 1855 |
75 | 1611 |
80 | 1203 |
85 | 642 |
90 | 247 |
Table A2. Data corresponding to eruption time.
79 | 74 | 65 | 49 | 51 | 49 | 78 | 79 |
54 | 52 | 73 | 83 | 86 | 57 | 46 | 64 |
74 | 48 | 82 | 81 | 53 | 77 | 77 | 75 |
62 | 80 | 56 | 47 | 79 | 68 | 84 | 47 |
85 | 59 | 79 | 84 | 81 | 81 | 49 | 86 |
55 | 90 | 71 | 52 | 60 | 81 | 83 | 63 |
88 | 80 | 62 | 86 | 82 | 73 | 71 | 85 |
85 | 58 | 76 | 81 | 77 | 50 | 80 | 82 |
51 | 84 | 60 | 75 | 76 | 85 | 49 | 57 |
85 | 58 | 78 | 59 | 59 | 74 | 75 | 82 |
54 | 73 | 76 | 89 | 80 | 55 | 64 | 67 |
84 | 83 | 83 | 79 | 49 | 77 | 76 | 74 |
78 | 64 | 75 | 59 | 96 | 83 | 53 | 54 |
47 | 53 | 82 | 81 | 53 | 83 | 94 | 83 |
83 | 82 | 70 | 50 | 77 | 51 | 55 | 73 |
52 | 59 | 65 | 85 | 77 | 78 | 76 | 73 |
62 | 75 | 73 | 59 | 65 | 84 | 50 | 88 |
84 | 90 | 88 | 87 | 81 | 46 | 82 | 80 |
52 | 54 | 76 | 53 | 71 | 83 | 54 | 71 |
79 | 80 | 80 | 69 | 70 | 55 | 75 | 83 |
51 | 54 | 48 | 77 | 81 | 81 | 78 | 56 |
47 | 83 | 86 | 56 | 93 | 57 | 79 | 79 |
78 | 71 | 60 | 88 | 53 | 76 | 78 | 78 |
69 | 64 | 90 | 81 | 89 | 84 | 78 | 84 |
74 | 77 | 50 | 45 | 45 | 77 | 70 | 58 |
83 | 81 | 78 | 82 | 86 | 81 | 79 | 83 |
55 | 59 | 63 | 55 | 58 | 87 | 70 | 43 |
76 | 84 | 72 | 90 | 78 | 77 | 54 | 60 |
78 | 48 | 84 | 45 | 66 | 51 | 86 | 75 |
79 | 82 | 75 | 83 | 76 | 78 | 50 | 81 |
73 | 60 | 51 | 56 | 63 | 60 | 90 | 46 |
77 | 92 | 82 | 89 | 88 | 82 | 54 | 90 |
66 | 78 | 62 | 46 | 52 | 91 | 54 | 46 |
80 | 78 | 88 | 82 | 93 | 53 | 77 | 74 |
References
1. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica; 1986; 46, pp. 199-208.
2. Elal-Olivero, D. Alpha-skew-normal distribution. Proyecciones; 2010; 29, pp. 224-240. [DOI: https://dx.doi.org/10.4067/S0716-09172010000300006]
3. Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics; 2011; 22, pp. 50-62. [DOI: https://dx.doi.org/10.1002/env.1026]
4. Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul.; 2018; 88, pp. 156-181. [DOI: https://dx.doi.org/10.1080/00949655.2017.1381698]
5. Bolfarine, H.; Martínez-Flórez, G.; Salinas, H.S. Bimodal symmetric-asymmetric power-normal families. Commun. Stat. Theory Methods; 2018; 47, pp. 259-276. [DOI: https://dx.doi.org/10.1080/03610926.2013.765475]
6. Fisher, R.A. The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eugen.; 1934; 6, pp. 13-25. [DOI: https://dx.doi.org/10.1111/j.1469-1809.1934.tb02105.x]
7. Rao, C.R. On discrete distributions arising out of methods of ascertainment. Sankhyā Indian J. Stat. Ser. A; 1965; 27, pp. 311-324.
8. Gómez-Déniz, E.; Arnold, B.C.; Sarabia, J.M.; Gómez, H.W. Properties and Applications of a New Family of Skew Distributions. Mathematics; 2021; 9, 87. [DOI: https://dx.doi.org/10.3390/math9010087]
9. Gómez-Déniz, E.; Calderín-Ojeda, E.; Sarabia, J.M. Bimodal and Multimodal Extensions of the Normal and Skew Normal Distributions. Stat. J.; 2023; accepted and available on the internet.
10. Reyes, J.; Gómez-Déniz, E.; Gómez, H.W.; Calderín-Ojeda, E. A Bimodal Extension of the Exponential Distribution with Applications in Risk Theory. Symmetry; 2021; 13, 679. [DOI: https://dx.doi.org/10.3390/sym13040679]
11. Reyes, J.; Arrué, J.; Leiva, V.; Martin-Barreiro, C. A New Birnbaum-Saunders Distribution and Its Mathematical Features Applied to Bimodal Real-World Data from Environment and Medicine. Mathematics; 2021; 9, 1891. [DOI: https://dx.doi.org/10.3390/math9161891]
12. Henze, N. A probabilistic representation of the Skew-Normal distribution. Scand. J. Stat.; 1986; 13, pp. 271-275.
13. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control.; 1974; 19, pp. 716-723. [DOI: https://dx.doi.org/10.1109/TAC.1974.1100705]
14. Bozdogan, H. Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika; 1987; 52, pp. 345-370. [DOI: https://dx.doi.org/10.1007/BF02294361]
15. Owen, D. Tables for computing bivariate normal probabilities. Ann. Math. Stat.; 1956; 27, pp. 1075-1090. [DOI: https://dx.doi.org/10.1214/aoms/1177728074]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The skew distribution is well suited to modeling asymmetric unimodal data. However, in practice, there are several cases in which the data present more than one mode, and many authors have studied extensions of the skew distribution to model this type of data. In this article, a new family is introduced, consisting of a multimodal modification of the family of skew distributions. Using the weighted-distribution methodology, we multiply the density function of a family of skew distributions by a polynomial of degree 4, thus obtaining a more flexible model for data sets whose distribution has at most three modes. The density function, some properties, moments, and the skewness and kurtosis coefficients of this new family are presented. This study focuses on the particular cases of skew-normal and Laplace distributions, although the approach can be applied to any other distribution. A simulation study was carried out to assess the behavior of the model parameter estimates. Illustrations with real medical and environmental data show the practical performance of the proposed model in the two particular cases presented.