
Abstract

This study introduces the truncated (u,v)-half-normal distribution, a novel probability model defined on the bounded interval (u,v), with parameters σ and b. This distribution is designed to model processes with restricted domains, ensuring realistic and analytically tractable outcomes. Some key properties of the proposed model, including its cumulative distribution function, probability density function, survival function, hazard rate, and moments, are derived and analyzed. Parameter estimation of σ and b is achieved through a hybrid approach, combining maximum likelihood estimation (MLE) for σ and a likelihood-free-inspired technique for b. A sensitivity analysis highlights the dependence of σ on b, and an optimal estimation algorithm is proposed. The proposed model is applied to two real-world data sets, where it demonstrates superior performance over some existing models based on goodness-of-fit criteria, such as the AIC, BIC, CAIC, KS, AD, and CvM statistics. The results emphasize the model’s flexibility and robustness for practical applications in modeling data with bounded support.


1. Introduction

Truncated probability distributions (TPD) arise when the domain of a probability distribution is restricted to a specific range, effectively excluding certain values or intervals from the distribution. TPDs are motivated by the need to enhance sampling, parameter inference, and convergence within the region of interest, which leads to more robust estimation. These modifications alter the fundamental properties of the original distribution, such as its mean, variance, and moments, to reflect the effects of truncation. Existing models for bounded data, including unit-domain and truncated distributions, often suffer from restricted intervals, inflexible parameterizations, or normalization complexities. Truncated distributions are ubiquitous in real-world applications where the data naturally adhere to specific bounds, such as in censored data sets, physical measurements constrained by limitations of instruments, or predefined policy thresholds. For example, in medical statistics, ref. [1] used truncated distributions to model survival times when data collection is limited to a specific observation window. Similarly, environmental science leverages truncated models to study phenomena like precipitation levels, which naturally fall within bounded ranges [2]. In many other real-world scenarios, proportions, probabilities, financial decisions, or measurements within a specific range naturally occur within bounded domains [3].

The concept of truncation extends to a wide variety of probability distributions, including the normal, exponential, and Poisson distributions (see [4]). Among these, the truncated normal distribution holds particular importance due to the centrality of the normal distribution in statistical theory and applications. Early work by [5] laid the foundation for understanding the properties of truncated distributions, including their moments and implications. This was followed by [6], who provided comprehensive derivations for the mean and variance of the truncated normal distribution and explored their dependency on the truncation limits. For a review of the properties of the truncated normal distribution, we refer to [7]. The truncated normal distribution has been applied in fields ranging from genetics to finance. In reliability engineering, it is used to model lifetimes of components when failures outside a specific range are unobservable [2].

Truncated or unit-domain models effectively capture processes or behaviors constrained by known limits, ensuring realistic outcomes and avoiding impossible values. Models defined on bounded intervals provide meaningful parameter estimates that align with the domain’s restrictions. Using truncated domains prevents predictions from falling outside feasible or observable ranges, enhancing model reliability and decision-making in practical applications. There are fewer distributions on truncated domains than on (0,∞), for example, the uniform on [a,b], the U-quadratic on [a,b], the truncated normal on [a,b] (see [7]), the arc-sine on [a,b] (see [8]), and the Pareto distribution on (c,∞) (see [9]). Recently, the new generalized uniform distribution proposed by [10] and the Mustapha type III distribution of [11] have provided additional instances, among others.

This work is inspired by the significant impact of the half-normal distribution across various fields of study. For instance, [12] recently proposed a modified half-normal distribution, while [13] introduced an extension of the generalized half-normal distribution. The unit-half-normal distribution was explored by [14], and [15] developed a model based on the half-normal distribution to adapt financial risk measures, such as value-at-risk (VaR) and conditional value-at-risk (CVaR), for scenarios with solely negative impacts. Additionally,  ref. [16] proposed an estimation procedure for stress–strength reliability involving two independent unit-half-normal distributions, and [17] discussed objective Bayesian estimation for differential entropy under the generalized half-normal framework. Furthermore, ref. [18] examined inference for a constant-stress model using progressive type-I interval-censored data from the generalized half-normal distribution, among other contributions.

In this paper, we introduce the truncated (u,v)-half-normal distribution, a novel probability model defined on the bounded interval (u,v). The proposed distribution has analytically tractable statistical properties, making it versatile for modeling various density shapes on truncated domains. Parameter estimation is performed using a hybrid approach: maximum likelihood estimation (MLE) and a likelihood-free-like technique. We support the estimation method with a sensitivity analysis using simulated data and an optimal estimation algorithm to determine the best estimates. The exceptional performance of the proposed model is demonstrated through applications to real-world data.

The truncated (u,v)-half-normal distribution addresses these gaps by transforming the half-normal distribution via a new ratio mapping, eliminating normalization constants and enabling flexible modeling on any interval (u,v). Its parameters directly control the lower bound (u), the interval length (b), and the dispersion (σ), offering interpretability and analytical tractability. Unlike the Beta, Kumaraswamy, and Topp–Leone models, it inherits the asymmetry and tail behavior of the half-normal, proving superior in real-world data applications by yielding lower values of the goodness-of-fit statistics. Finally, although the support of the proposed distribution is defined over a general bounded interval (u,v) with u<v, in most empirical applications u ≥ 0 may be appropriate. However, allowing u<0 increases the generality of the model, facilitates applications that involve transformed or standardized data, and aligns with the theoretical origins of the model as a transformation of a symmetric normal variable.

The structure of the paper is as follows: Section 2 introduces the formulation of the proposed model and its statistical properties. Section 3 discusses maximum likelihood estimation method and provides sensitivity analysis on the parameters. Section 4 presents two illustrative real-world data applications. Finally, Section 5 concludes the paper with a summary of findings and implications.

2. Definition and Properties

Assume that the random variable X is normally distributed, with mean E(X) = 0 and variance V(X) = σ² as parameters. Note that the random variable defined as the absolute value of X, i.e., |X|, follows the half-normal distribution (HN), originally suggested by [19] for creating centile charts. Now, let F_X denote the cumulative distribution function (CDF) associated with X. The aim is to introduce a new random variable, denoted by Y, defined on (u,v), where v = u + b, with u ∈ ℝ and b > 0.

Thus, from the distribution of X, one can define the CDF of a “new” variable Y as follows

$$F_Y(a) = P(Y \le a) = \begin{cases} 0, & \text{if } a \le u, \\[4pt] F_{|X|}\!\left(\dfrac{b(a-u)}{b-(a-u)}\right), & \text{if } u < a < u+b, \\[4pt] 1, & \text{if } a \ge u+b. \end{cases}$$

Remark 1.

Using the symmetric properties of the normal distribution of X, for all u<a<u+b, one can see that

$$F_Y(a) = F_{|X|}\!\left(\frac{b(a-u)}{b-(a-u)}\right) = P\!\left(-\frac{b(a-u)}{b-(a-u)} \le X \le \frac{b(a-u)}{b-(a-u)}\right) = 2\,F_X\!\left(\frac{b(a-u)}{b-(a-u)}\right) - 1.$$

Furthermore, one can deduce that

$$\lim_{b\to+\infty} F_Y(a) = 2\,F_X(a-u) - 1 = 2\,F_Z(a) - 1,$$

where Z=X+u (which follows a normal distribution with u as mean and σ as standard deviation).

Proposition 1.

Based on the above results, one can see that

$$Y = b\,Z_b + u, \qquad \text{with} \qquad Z_b = \frac{|X|/b}{1+|X|/b}. \qquad (1)$$

Proof. 

Let y be a real value, such that u<y<u+b. Thus, we have

$$P(bZ_b + u \le y) = P\!\left(\frac{b|X|}{b+|X|} \le y-u\right) = P\big(b|X| \le (y-u)b + (y-u)|X|\big) = P\!\left(|X| \le \frac{b(y-u)}{b-(y-u)}\right) = F_{|X|}\!\left(\frac{b(y-u)}{b-(y-u)}\right) = F_Y(y).$$

   □

Remark 2.

Z_b defined in (1) is a novel transformation analogous to the odds-ratio concept, but constructed using the absolute value. Thus, one can easily see that Z_b has (0,1) as support; that is, the choice of Z_b maps |X| to (0,1), ensuring bounded support. Furthermore, for all 0<z<1, the CDF associated with Z_b is defined as follows

$$F_{Z_b}(z) = P(Z_b \le z) = P\!\left(|X| \le \frac{bz}{1-z}\right) = 2\,F_X\!\left(\frac{bz}{1-z}\right) - 1,$$

where FX(·) denotes the CDF associated with X.

The classic procedure for producing a truncated distribution is to form the conditional distribution that results from restricting the domain of another probability distribution. Explicitly, suppose that X is a random variable with support I ⊆ ℝ (in general, I is an open interval). Thus, in order to define a new random variable, denoted Y, with bounded support (u,v), from X, we define Y as follows

$$Y = \big(X \mid u \le X \le v\big).$$

Therefore, the PDF and CDF associated with Y are, respectively, given by

$$f_Y(x) = \frac{f_X(x)}{F_X(v) - F_X(u)}\,\mathbf{1}_{(u,v)}(x)$$

and

$$F_Y(x) = \frac{F_X(x) - F_X(u)}{F_X(v) - F_X(u)}, \qquad x \in (u,v),$$

where f_X (resp. F_X) denotes the PDF (resp. CDF) associated with X and 1(·) is an indicator function. One weak point of this classic procedure is that the normalization constant of the new distribution, given by 1/(F_X(v) − F_X(u)), depends on F_X (which may not have an explicit form in some cases, in particular when X is a normal random variable). This can have a negative impact, for example, on handling the maximum likelihood function and, consequently, on the expansions of likelihood ratio tests. To overcome these difficulties, we mixed two approaches: first, the composed CDF, originally proposed by [20] to create a distribution with support on (0,b) from a positive random variable; and second, the linear transformation of a unit-support random variable, usually used to change the support of a random variable. Indeed, based on Proposition 1, one can see that Y is an affine function of a unit-support random variable

$$Z_b = \frac{|\tilde{X}|}{1+|\tilde{X}|}, \qquad \text{where } \tilde{X} \sim N\!\left(0,\; \tilde{\sigma} = \frac{\sigma}{b}\right). \qquad (2)$$

Note that, even if the support of Z_b (i.e., (0,1)) is independent of b, this parameter plays an important role in the construction of Z_b, which makes a noticeable difference from the basic rescaling technique. Furthermore, one can say that |X̃| follows the HN distribution. Thus, Z_b can be seen as a “modified version” of the unit-HN distribution, originally introduced by [14] (indeed, for b=1, Z_b is exactly the unit-HN distribution).
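This representation also gives a direct way to simulate from the distribution of Y. The following is a minimal R sketch (the function name is ours, not from the paper): it draws X̃ ∼ N(0, σ/b), forms Z_b as in (2), and returns Y = bZ_b + u.

```r
# Sketch: simulate Y on (u, u + b) via the representation Y = b*Z_b + u of Proposition 1,
# where Z_b = |X_tilde| / (1 + |X_tilde|) and X_tilde ~ N(0, sigma/b); see (1) and (2).
ruvhn_rep <- function(n, u, b, sigma) {
  x_tilde <- abs(rnorm(n, mean = 0, sd = sigma / b))
  z_b <- x_tilde / (1 + x_tilde)
  b * z_b + u
}

set.seed(1)
summary(ruvhn_rep(1000, u = -pi, b = 2 * pi, sigma = 1))  # all values fall inside (-pi, pi)
```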

Remark 3.

Indeed, when b=1, the distribution of the introduced variable Y can be seen as a simple linear transformation of the unit-HN distribution. This transformation ensures that Y has (u, u+1) as support.

While the support of the proposed distribution on (u,v) may resemble that of a truncated normal, the two models differ substantially. The truncated normal relies on a renormalization of the classical normal distribution over a bounded interval, requiring computation of cumulative distribution values. In contrast, the proposed model is based on a nonlinear transformation of an HN variable, yielding a closed-form density that avoids normalization constants. Moreover, the shape of the distribution can be flexibly controlled via the ratio σ/b, enabling a wider class of behaviors than the truncated normal.

Finally, from Remark 1, one can see that

$$\lim_{b\to+\infty} F_Y(a) = 2\,F_Z(a) - 1 = F_{|Z|}(a),$$

where Z = X + u ∼ N(u, σ) (in other words, |Z| has a folded normal distribution; if u = 0, then |Z| has a half-normal distribution).

Based on Remark 1, one can write the PDF that associated with Y as follows

$$f_Y(y) = 2\left(\frac{b}{b-(y-u)}\right)^{2} f_X\!\left(\frac{b(y-u)}{b-(y-u)}\right)\mathbf{1}_{(u,\,u+b)}(y), \qquad (3)$$

where f_X denotes the PDF associated with X. The transformation of the support of X using the nonlinear mapping b(a−u)/(b−(a−u)) reshapes and stretches the half-normal density dynamically based on a, allowing the PDF of Y to adapt its shape within (u,v); additionally, it may exhibit heavier tails or sharper peaks near v, which provides more flexibility to the proposed model. The parameter b scales the proposed transformation, affecting the rate at which probability accumulates near v, unlike other truncated models such as the truncated normal on (0,b), in which the upper bound b acts as a hard cut-off. Now, using (3) and the results of Remark 1, the survival function (S.F.) and hazard rate function (H.R.F.) associated with Y are given, for all y ∈ (u, u+b), respectively, by

$$\tilde{F}_Y(y) = P(Y \ge y) = 2\left[1 - F_X\!\left(\frac{b(y-u)}{b-(y-u)}\right)\right] = 2\,\tilde{F}_X\!\left(\frac{b(y-u)}{b-(y-u)}\right),$$

and

$$h_Y(y) = \left(\frac{b}{b-(y-u)}\right)^{2} h_X\!\left(\frac{b(y-u)}{b-(y-u)}\right),$$

where F˜X and hX denote, respectively, the S.F. and the H.R.F. associated with X.

Proposition 2.

The PDF of the proposed distribution in (3) is unimodal, with mode at either
$$y_0 = \frac{4\sigma^2 b + b^3 + \sqrt{b^3\big(8\sigma^2 b + b^3\big)}}{4\sigma^2} + u, \quad \text{for all } b,\sigma>0, \qquad \text{or} \qquad y_0 = \frac{4\sigma^2 b + b^3 - \sqrt{b^3\big(8\sigma^2 b + b^3\big)}}{4\sigma^2} + u, \quad \text{for some } b,\sigma>0.$$

Proof. 

We show that f_Y′(y) = 0 has a real root. Recall that the function f_X is defined by

$$f_X\!\left(\frac{b(y-u)}{b-(y-u)}\right) = \phi\!\left(\frac{b(y-u)}{b-(y-u)}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2\sigma^{2}}\left(\frac{b(y-u)}{b-(y-u)}\right)^{2}\right).$$

Thus, one can find that

$$f_Y'(y) = \frac{4b^{2}}{\big(b-(y-u)\big)^{3}}\,\phi\!\left(\frac{b(y-u)}{b-(y-u)}\right) - \frac{2b^{5}(y-u)}{\sigma^{2}\big(b-(y-u)\big)^{5}}\,\phi\!\left(\frac{b(y-u)}{b-(y-u)}\right) = \frac{2b^{2}\,\phi\!\left(\frac{b(y-u)}{b-(y-u)}\right)}{\sigma^{2}\big(b-(y-u)\big)^{5}}\left[2\sigma^{2}\big(b-(y-u)\big)^{2} - b^{3}(y-u)\right]. \qquad (4)$$

We can solve 2σ²(b−(y−u))² − b³(y−u) = 0 directly from (4) to determine the possible roots of f_Y′(y) = 0 as follows. Let z = y − u; then

$$2\sigma^{2}(b-z)^{2} - b^{3}z = 2\sigma^{2}\big(b^{2} - 2bz + z^{2}\big) - b^{3}z = 2\sigma^{2}b^{2} - 4\sigma^{2}bz + 2\sigma^{2}z^{2} - b^{3}z = 2\sigma^{2}z^{2} - \big(4\sigma^{2}b + b^{3}\big)z + 2\sigma^{2}b^{2} = 0. \qquad (5)$$

By applying the quadratic formula to (5), we get

$$z = \frac{4\sigma^{2}b + b^{3} \pm \sqrt{\big(4\sigma^{2}b + b^{3}\big)^{2} - 16\sigma^{4}b^{2}}}{4\sigma^{2}} = \frac{4\sigma^{2}b + b^{3} \pm \sqrt{\big(4\sigma^{2}b + b^{3}\big)^{2} - \big(4\sigma^{2}b\big)^{2}}}{4\sigma^{2}} = \frac{b\left(4\sigma^{2} + b^{2} \pm b\sqrt{8\sigma^{2} + b^{2}}\right)}{4\sigma^{2}}.$$

Hence,

$$z = \frac{b\left(4\sigma^{2} + b^{2} + b\sqrt{8\sigma^{2} + b^{2}}\right)}{4\sigma^{2}}, \qquad \text{for all } b>0,\ \sigma>0,$$

or

$$z = \frac{b\left(4\sigma^{2} + b^{2} - b\sqrt{8\sigma^{2} + b^{2}}\right)}{4\sigma^{2}}, \qquad \text{for some } b>0,\ \sigma>0.$$

Finally, we can claim that there exists a root among y_0 = z + u, which implies that f_Y(y) is a unimodal density.    □
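As a numerical illustration of Proposition 2, the sketch below (function names and the (u, b, σ) values are ours) evaluates both candidate roots and compares the one falling inside (u, u+b) with a direct numerical maximization of the density in (3).

```r
# Sketch: compare the candidate modes of Proposition 2 with a numerical
# maximization of the PDF (3); names and the (u, b, sigma) values are illustrative.
mode_candidates <- function(u, b, sigma) {
  disc <- sqrt(b^3 * (8 * sigma^2 * b + b^3))
  c(4 * sigma^2 * b + b^3 + disc, 4 * sigma^2 * b + b^3 - disc) / (4 * sigma^2) + u
}

pdf_y <- function(y, u, b, sigma) {
  t <- b * (y - u) / (b - (y - u))
  2 * (b / (b - (y - u)))^2 * dnorm(t, mean = 0, sd = sigma)
}

u <- 0; b <- 5; sigma <- 0.5
mode_candidates(u, b, sigma)                    # only one candidate lies inside (u, u + b)
optimize(function(y) pdf_y(y, u, b, sigma),
         interval = c(u, u + b), maximum = TRUE)$maximum   # numerical mode for comparison
```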

Figure 1 illustrates some possible shapes of the PDF in (3) associated with the variable Y, when u = −π and b = 2π (i.e., Y has (−π, π) as support), for some selected values of the parameter σ (indeed, due to (2), on the right part of Figure 1 we choose σ values such that σ/b is rather small (≤1) and σ values such that σ/b is large (>1)). Figure 1 confirms the results mentioned above. In particular, these graphical results show the usefulness of the distribution of Y in terms of dispersion, asymmetry (left- and right-skewed), and flattening, which accommodates various phenomena across applications.

2.1. The rth Moment of Y

The rth moment of Y is defined to be E(Y^r). In this section, we present the calculation of this measure. Specifically, we focus on the first four moments, which are usually used to study the dispersion and the form (asymmetry and kurtosis) of the distribution.

Remark 4.

Using Newton’s Binomial formula and based on (1), one can write the rth moment of Y as follows

$$E(Y^{r}) = E\big[(bZ_b + u)^{r}\big] = \sum_{k=0}^{r} C_{r}^{k}\, u^{r-k} b^{k}\, E\big(Z_b^{k}\big).$$

Furthermore, based on the Proposition 2.1 of [14], one can write the expression of the rth moment of Zb as follows

$$E\big(Z_b^{r}\big) = \left(\frac{\sigma}{b}\right)^{r} m_r, \qquad \text{where } m_r = E\!\left[\left(\frac{Z}{1+\frac{\sigma}{b}Z}\right)^{r}\right], \text{ with } Z \sim \mathrm{HN}(1). \qquad (6)$$

Remark 5.

Based on the properties of the HN distribution (for details, see, e.g., [5,21]), for all r = 1, 2, …, we have

$$m_r = E\!\left[\left(\frac{Z}{1+\frac{\sigma}{b}Z}\right)^{r}\right] = \sqrt{\frac{2}{\pi}}\int_{0}^{+\infty}\left(\frac{z}{1+\frac{\sigma}{b}z}\right)^{r} e^{-\frac{1}{2}z^{2}}\,dz.$$

A closed-form expression of the rth raw moment of the proposed distribution in terms of elementary functions is not tractable due to the bounded transformation involved; it is nonetheless possible to express it symbolically using the Meijer G function, a highly general class of special functions encompassing many classical functions (e.g., exponential, logarithmic, Bessel, hypergeometric). Indeed, the Meijer G function is defined through a complex contour integral representation (see [22]). This expression allows general and symbolic computation of moments for a wide range of parameter values (for more details, see [23,24,25]). Explicitly, by introducing the substitution x = z/(1−z), the moment integral

$$E(Y^{r}) = \int_{0}^{1}\big(u + (v-u)z\big)^{r} f_{Z_b}(z)\,dz,$$

transforms into an expression involving the integrand $\left(\frac{x}{1+x}\right)^{r}\exp\!\left(-\frac{b^{2}x^{2}}{2\sigma^{2}}\right)$, for which Meijer G representations are available:

$$\left(\frac{x}{1+x}\right)^{r} = G^{1,1}_{1,1}\!\left(x \,\middle|\, \begin{matrix}1-r\\ 0\end{matrix}\right), \qquad \exp\!\left(-a x^{2}\right) = G^{1,0}_{0,1}\!\left(a x^{2} \,\middle|\, \begin{matrix}-\\ 0\end{matrix}\right).$$

Thus, the moment E(Y^r) can be formally written as a weighted integral involving Meijer G functions. In practice, the value of m_r, for r = 1, 2, …, can be computed numerically by using, for example, the integrate() function of the stats package of R (see [26]). This representation provides a useful foundation for further analytical generalizations and asymptotic studies.

Thus, based on Remark 4 and Equation (6), one can deduce that

$$E(Y^{r}) = E\big[(bZ_b + u)^{r}\big] = \sum_{k=0}^{r} C_{r}^{k}\, u^{r-k} b^{k}\left(\frac{\sigma}{b}\right)^{k} m_k = \sum_{k=0}^{r} C_{r}^{k}\, u^{r-k}\sigma^{k} m_k.$$

We can deduce that the moments depend on σ/b, enabling flexible tail control. Consequently, the mean, variance, coefficient of variation (CV), skewness (CS), and kurtosis (CK) coefficients are, respectively,

$$E(Y) = u + \sigma m_1,$$

$$V(Y) = E(Y^{2}) - \big[E(Y)\big]^{2} = \sigma^{2}\big(m_2 - m_1^{2}\big),$$

$$CV(Y) = \frac{\sigma_Y}{E(Y)} = \frac{\sqrt{m_2 - m_1^{2}}}{m_1 + u/\sigma},$$

$$CS(Y) = \frac{E\big[(Y - E(Y))^{3}\big]}{\sigma_Y^{3}} = \frac{E(Y^{3}) - 3E(Y)E(Y^{2}) + 2\big[E(Y)\big]^{3}}{\sigma^{3}\big(m_2 - m_1^{2}\big)^{3/2}} = \frac{m_3 + 2m_1^{3} - 3m_1 m_2}{\big(m_2 - m_1^{2}\big)^{3/2}},$$

and

$$CK(Y) = \frac{E\big[(Y - E(Y))^{4}\big]}{\sigma_Y^{4}} = \frac{E(Y^{4}) - 4E(Y)E(Y^{3}) + 6\big[E(Y)\big]^{2}E(Y^{2}) - 3\big[E(Y)\big]^{4}}{\sigma^{4}\big(m_2 - m_1^{2}\big)^{2}} = \frac{m_4 - 4m_1 m_3 + 6m_1^{2} m_2 - 3m_1^{4}}{\big(m_2 - m_1^{2}\big)^{2}}.$$

Based on Remark 5, Table 1 presents the numerical calculation of the coefficient of variation (CV), skewness (CS), and kurtosis (CK) coefficients of the variable Y for different sets of values of the parameters σ and b, with u=0. Note that the CS and CK coefficients are independent of u. However, u could play an important role for the CV measure (indeed, when u is sufficiently large, the CV is close to zero, which implies that the distribution of Y is concentrated around its mean, which will itself be very large).

Thus, for u=0, one can observe that when σ/b increases, the CV decreases (therefore, we have weak dispersion around the mean). Moreover, independently of the value of u, one can also see that when σ/b increases, we obtain a negative CS, which indicates a longer tail to the left of the distribution of Y. Finally, one can deduce that there exists an interval of positive values (denoted by (k1, k2)) such that if σ/b belongs to it, then the CK value is less than 3; otherwise, the CK value is greater than 3. Hence, the value of σ/b plays an important role in identifying the level of flattening of the distribution of Y (compared to the normal distribution).
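As a concrete illustration (a sketch; the function names are ours), the m_r values can be computed with integrate() as suggested above, and the CV, CS, and CK expressions can then be evaluated for any (σ, b); u = 0 is used here, as in Table 1.

```r
# Sketch: numerical m_r via integrate() (Remark 5) and the CV, CS, CK expressions above.
m_r <- function(r, sigma, b) {
  integrand <- function(z) (z / (1 + (sigma / b) * z))^r * exp(-z^2 / 2)
  sqrt(2 / pi) * integrate(integrand, lower = 0, upper = Inf)$value
}

indicators <- function(sigma, b, u = 0) {
  m  <- sapply(1:4, m_r, sigma = sigma, b = b)
  vr <- m[2] - m[1]^2
  cv <- sqrt(vr) / (m[1] + u / sigma)
  cs <- (m[3] + 2 * m[1]^3 - 3 * m[1] * m[2]) / vr^(3 / 2)
  ck <- (m[4] - 4 * m[1] * m[3] + 6 * m[1]^2 * m[2] - 3 * m[1]^4) / vr^2
  c(CV = cv, CS = cs, CK = ck)
}

indicators(sigma = 1, b = 10)   # one of the (sigma, b) settings used in Table 1
```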

Proposition 3.

The moment generating function of the variable Y is given by

$$M_Y(t) = e^{tu}\sum_{k=0}^{+\infty}\frac{t^{k}\sigma^{k}}{k!}\, m_k, \qquad \text{where } m_k = \int_{0}^{+\infty}\left(\frac{z}{1+\frac{\sigma}{b}z}\right)^{k} 2\,\phi(z)\,dz.$$

Proof. 

First, recall that

$$M_Y(t) = E\big(e^{tY}\big) = E\big(e^{t(bZ_b+u)}\big) = e^{tu}\,E\big(e^{tbZ_b}\big).$$

Based on the results of this section and the Proposition 2.2. of [14], we have

$$E\big(e^{tbZ_b}\big) = \sum_{k=0}^{+\infty}\frac{(tb)^{k}}{k!}\left(\frac{\sigma}{b}\right)^{k} E\!\left[\left(\frac{Z}{1+\frac{\sigma}{b}Z}\right)^{k}\right], \qquad \text{with } Z \sim \mathrm{HN}(1).$$

   □

2.2. Random Data Sampling

This section outlines the quantile function associated with the proposed model, which is essential for generating random samples. The quantile function is crucial for determining different percentiles of a distribution, such as the median, and is a key component in creating random variables that conform to a given distribution. Additionally, in non-parametric testing, the quantile function helps in identifying critical values for test statistics, thereby enhancing the robustness of statistical analysis.

Proposition 4.

The quantile function (QF) associated with Y is defined as follows

$$F_Y^{-1}(p) = \frac{b\,F_X^{-1}\!\left(\frac{1+p}{2}\right)}{F_X^{-1}\!\left(\frac{1+p}{2}\right) + b} + u, \qquad \text{with } p \in (0,1),$$

where F_X^{-1} denotes the QF related to X.

Proof. 

Let p ∈ (0,1) and u < y < u + b. Thus, we have

$$p = F_Y(y) = 2\,F_X\!\left(\frac{b(y-u)}{b-(y-u)}\right) - 1 \;\Longleftrightarrow\; F_X\!\left(\frac{b(y-u)}{b-(y-u)}\right) = \frac{1+p}{2} \;\Longleftrightarrow\; \frac{b(y-u)}{b-(y-u)} = F_X^{-1}\!\left(\frac{1+p}{2}\right) \;\Longleftrightarrow\; (y-u)\left(F_X^{-1}\!\left(\tfrac{1+p}{2}\right) + b\right) = b\,F_X^{-1}\!\left(\tfrac{1+p}{2}\right) \;\Longleftrightarrow\; y = \frac{b\,F_X^{-1}\!\left(\frac{1+p}{2}\right)}{F_X^{-1}\!\left(\frac{1+p}{2}\right) + b} + u.$$

   □
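Since F_X^{-1} is available in R as qnorm(), Proposition 4 can be implemented directly; a minimal sketch (with an illustrative function name) is:

```r
# Sketch: closed-form quantile function of Proposition 4, using qnorm() as F_X^{-1}.
qtuvhn_cf <- function(p, u, b, sigma) {
  q <- qnorm((1 + p) / 2, mean = 0, sd = sigma)   # F_X^{-1}((1 + p)/2)
  b * q / (q + b) + u
}

qtuvhn_cf(c(0.25, 0.50, 0.75), u = 0, b = 5, sigma = 0.5)   # example quartiles
```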

An alternative approach that involves root finding has also been developed and is very effective; the step-by-step procedure is provided below in Algorithm 1, and the R code can be found in Appendix A.

Algorithm 1 Step-by-step algorithm for random data sampling.
1. Define the parameters for the quantile function: u, b, σ.
2. Define the PDF as f_Y(y; u, b, σ) in (3).
3. Define the CDF as the numerical integral of the PDF from u to y:
$$F_Y(y; u, b, \sigma) = \int_{u}^{y} f_Y(t; u, b, \sigma)\,dt.$$
4. The quantile function finds the value of y such that:
$$F_Y(y; u, b, \sigma) - p = 0, \qquad 0 \le p \le 1.$$
5. Solve the equation in step 4 using a root-finding algorithm (e.g., uniroot in R).
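The following is a minimal R sketch of Algorithm 1 (the authors' Appendix A code is not reproduced here, and the function names are ours): the CDF is built by numerical integration of (3), the quantile is obtained by root finding with uniroot(), and sampling follows by inversion.

```r
# Sketch of Algorithm 1 with illustrative function names.
dtuvhn <- function(y, u, b, sigma) {
  t <- b * (y - u) / (b - (y - u))
  ifelse(y > u & y < u + b, 2 * (b / (b - (y - u)))^2 * dnorm(t, 0, sigma), 0)
}

ptuvhn <- function(y, u, b, sigma) {            # step 3: numerical CDF
  sapply(y, function(yi) {
    if (yi <= u) 0
    else if (yi >= u + b) 1
    else integrate(function(t) dtuvhn(t, u, b, sigma), lower = u, upper = yi)$value
  })
}

qtuvhn <- function(p, u, b, sigma) {            # steps 4-5: solve F_Y(y) - p = 0
  sapply(p, function(pp)
    uniroot(function(y) ptuvhn(y, u, b, sigma) - pp,
            lower = u + 1e-10, upper = u + b - 1e-10)$root)
}

rtuvhn <- function(n, u, b, sigma) qtuvhn(runif(n), u, b, sigma)   # inversion sampling

set.seed(1)
rtuvhn(5, u = 0, b = 5, sigma = 0.5)
```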

3. Parameter Estimation

Here, we provide an estimation process for the parameters of the proposed model, say Θ = (σ, b)^T. The parameter σ can be estimated by the maximum likelihood estimation (MLE) method, while the optimal estimate of the parameter b is obtained by a newly proposed technique, a likelihood-free-like approach. A demonstration using simulated data is provided.

3.1. MLE Technique

Let Y_(1) ≤ Y_(2) ≤ ⋯ ≤ Y_(n) be the order statistics of a sample from the proposed model. Therefore, u = Y_(1) is the minimum order statistic, and let d = Y_(n) − Y_(1) be the range. Then, the estimate of b will be b^ = d + ϵ, for some ϵ > 0.

Next, we compute the MLE of σ. The log-likelihood function of the proposed model for σ, given b and the observed data y1, y2, …, yn, is

$$L(\sigma \mid b) = \sum_{i=1}^{n}\left[\log(2) + 2\log\!\left(\frac{b}{b-(y_i-u)}\right) + \log\phi\!\left(\frac{b(y_i-u)}{b-(y_i-u)};\,0,\sigma\right)\right],$$

where $\phi\!\left(\frac{b(y_i-u)}{b-(y_i-u)};\,0,\sigma\right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\left(\frac{b(y_i-u)}{b-(y_i-u)}\right)^{2}\Big/\big(2\sigma^{2}\big)\right)$.

To maximize L(σ|b) to obtain σ^, we differentiate L(σ|b) with respect to σ and solve

$$\frac{\partial L(\sigma \mid b)}{\partial \sigma} = \sum_{i=1}^{n}\left[-\frac{1}{\sigma} + \frac{1}{\sigma^{3}}\left(\frac{b(y_i-u)}{b-(y_i-u)}\right)^{2}\right] = 0.$$

Thus,

$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{b}(y_i-u)}{\hat{b}-(y_i-u)}\right)^{2}}. \qquad (7)$$

The estimate of σ can be obtained analytically but depends on b^. The MLE σ^ is asymptotically normal; that is, for a very large sample size, σ^ − σ is approximately N(0, 1/I(σ)), where I(σ) is the expected Fisher information matrix, the negative expected value of the second derivative of L(σ|b) with respect to σ. The second derivative is

$$\frac{\partial^{2} L(\sigma \mid b)}{\partial \sigma^{2}} = \frac{n}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i=1}^{n}\left(\frac{b(y_i-u)}{b-(y_i-u)}\right)^{2}.$$

The Fisher information matrix I(σ) is

$$I(\sigma) = -E\!\left[\frac{n}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i=1}^{n}\left(\frac{b(y_i-u)}{b-(y_i-u)}\right)^{2}\right].$$

When n is sufficiently large, the first term dominates the expectation; therefore, I(σ) ≈ n/σ². Hence, the variance of σ^ satisfies Var(σ^) ≈ σ²/n. A (1 − α)100% confidence interval for σ is then

$$\left(\hat{\sigma} - Z_{\alpha/2}\sqrt{\frac{\hat{\sigma}^{2}}{n}},\;\; \hat{\sigma} + Z_{\alpha/2}\sqrt{\frac{\hat{\sigma}^{2}}{n}}\right),$$

where Z_{α/2} is the critical value from the standard normal distribution.
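For example, a short R sketch of Equation (7) and of the above confidence interval (the helper names are ours; u is taken as the sample minimum, as in the text):

```r
# Sketch: sigma-hat from Equation (7) given b-hat, and the normal-approximation CI.
sigma_mle <- function(y, b_hat, u = min(y)) {
  t <- b_hat * (y - u) / (b_hat - (y - u))
  sqrt(mean(t^2))
}

sigma_ci <- function(sigma_hat, n, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  c(lower = sigma_hat - z * sqrt(sigma_hat^2 / n),
    upper = sigma_hat + z * sqrt(sigma_hat^2 / n))
}
```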

3.2. Sensitivity Analysis and Algorithm for Optimal Estimates

This subsection explores the sensitivity analysis of the MLE σ^, emphasizing its strong dependency on b^. A simulation-based likelihood-free-like approach is presented to determine the optimal values of b^ and σ^, providing robust parameter estimates for practical applications.

3.2.1. Sensitivity Analysis

The MLE σ^ is obtained analytically but strongly depends on b^. We conducted a sensitivity analysis to study the influence of b^ on σ^. We consider several samples from standard normal and uniform distributions with various values of b, and study the effect of an increase in b on σ. Figure 2 illustrates that, in several scenarios, as b increases (i.e., increases by ϵ), σ starts with a high value and then decreases rapidly. This suggests that σ is highly sensitive to values of b near d (i.e., choosing very small ϵ), resulting in large variations. This indicates that, beyond a certain threshold, changes in b have a diminishing impact on σ. At values of b close to d, σ is highly sensitive to small changes in b, likely due to the dependence on (b − (y_i − u)) in the denominator, which can become very small. As b increases, σ converges towards a stable value, suggesting that for large values of b, the effect of changes in b on σ becomes negligible.
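A sketch of such a sweep is given below (it reuses the sigma_mle() helper above; the ϵ grid and the illustrative sample are arbitrary choices of ours):

```r
# Sketch of the sensitivity sweep: vary b = d + epsilon over a grid and record the
# resulting sigma-hat from Equation (7).
sensitivity_sigma <- function(y, eps_grid = seq(0.01, 5, by = 0.01)) {
  u <- min(y); d <- max(y) - min(y)
  data.frame(b = d + eps_grid,
             sigma_hat = sapply(eps_grid, function(eps) sigma_mle(y, d + eps, u)))
}

set.seed(123)
sens <- sensitivity_sigma(rnorm(100))          # standard normal sample, as in the text
plot(sens$b, sens$sigma_hat, type = "l", xlab = "b", ylab = expression(hat(sigma)))
```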

3.2.2. Algorithm for Optimal Estimates

The sensitivity analysis leads us to propose a simulation-based approach to obtain the optimal value of b^. A likelihood-free-like approach using the full data set, without relying on summary statistics, is proposed. This method provides a means of finding the best estimates for any boundary parameter that depends on the minimum or maximum order statistic or on the range, especially when the sensitivity of the other dependent parameters becomes negligible beyond a certain threshold. The algorithm is simulation based but provides a single estimate. For more about likelihood-free methods and their applications, see [27,28].

Let us assume b ∼ U(0,c) as an informative prior; then, we can apply the likelihood-free-like procedure to obtain an optimal estimate of b. A Cramér–von Mises statistic is suggested as the distance function, defined by:

$$\mathrm{CvM} = \frac{1}{12n} + \sum_{i=1}^{n}\left(\frac{2i-1}{2n} - F(y_i)\right)^{2}, \qquad (8)$$

where F(y_i) is the CDF of the proposed model. Let δ > 0 be a tolerance threshold, and let D* be a simulated data set. The algorithm for the determination of the optimal b and σ is given below in Algorithm 2. In addition, the flowchart in Figure 3 describes the steps given in Algorithm 2.

Algorithm 2 Likelihood-free-like algorithm for the optimal b^.
1. Let Y_(1) ≤ Y_(2) ≤ ⋯ ≤ Y_(n) be the order statistics of the observed data;
let u = Y_(1), and let d = Y_(n) − Y_(1) be their range. Choose a threshold δ > 0.
2. Compute the initial estimate b^0 = d + ϵ, ϵ > 0, to obtain the MLE of σ:
$$\hat{\sigma}_0 = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{b}_0(y_i-u)}{\hat{b}_0-(y_i-u)}\right)^{2}}.$$
3. Based on b^0, generate b^* = d + U(0,c), where U(0,c), c > 0, is a uniform distribution.
4. Compute σ^* using b^* in (7).
5. Simulate a data set D* = {x1, x2, …, xn} of size n from F(x; b^*, σ^*) using Algorithm 1.
6. Evaluate the acceptance measure ρ(D*) from (8) as:
$$\rho(D^{*}) = \frac{1}{12n} + \sum_{i=1}^{n}\left(\frac{2i-1}{2n} - F\big(x_i;\,\hat{\sigma}^{*},\hat{b}^{*}\big)\right)^{2}.$$
7. Accept b^* and σ^* if ρ < δ.
8. Return to step 3 and repeat N times, to get (b^*(1), σ^*(1)), …, (b^*(m), σ^*(m)); m ≤ N.
9. The optimal b^ and σ^ are the pair of estimates with the minimum value of ρ in step 8.

For a very large N, we obtain b^ as the accepted b* with the minimum ρ, and σ^ as the corresponding σ^*.
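A compact R sketch of Algorithm 2 is given below (the function names and default settings are ours; it reuses sigma_mle() and the Algorithm 1 sampler rtuvhn() from the earlier sketches, and uses the closed-form CDF 2F_X(·) − 1 of Remark 1 inside the CvM measure):

```r
# Sketch of Algorithm 2 (likelihood-free-like search for the optimal b-hat).
ptuvhn_cf <- function(y, u, b, sigma) {
  t <- b * (y - u) / (b - (y - u))
  2 * pnorm(t, mean = 0, sd = sigma) - 1
}

cvm_stat <- function(x, u, b, sigma) {          # measure (8) for a sample x
  x <- sort(x); n <- length(x)
  Fx <- ptuvhn_cf(x, u, b, sigma)
  1 / (12 * n) + sum(((2 * seq_len(n) - 1) / (2 * n) - Fx)^2)
}

lf_estimate <- function(y, c = 10, delta = 0.05, N = 1000) {   # the paper uses N = 10,000
  u <- min(y); d <- max(y) - min(y); n <- length(y)
  accepted <- data.frame()
  for (j in seq_len(N)) {
    b_star     <- d + runif(1, 0, c)                      # step 3
    sigma_star <- sigma_mle(y, b_star, u)                 # step 4, Equation (7)
    x_star     <- rtuvhn(n, u, b_star, sigma_star)        # step 5 (Algorithm 1 sampler)
    rho        <- cvm_stat(x_star, u, b_star, sigma_star) # step 6
    if (rho < delta)                                      # step 7
      accepted <- rbind(accepted, data.frame(b = b_star, sigma = sigma_star, rho = rho))
  }
  accepted[which.min(accepted$rho), ]                     # step 9: minimum-rho pair
}
```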

3.2.3. Testing Algorithm with a Simulated Data

We sampled data of size n=50 from the proposed model with parameters σ=0.5 and b=5. Algorithm 2 is applied to obtain the optimal value of b^ and then the optimal MLE σ^, using 10,000 iterations and δ=0.05. A simple way to identify the initial value of b is visual: guess the increment ϵ>0, plot the density to see how well it fits the histogram, and then use these values to set the range of the uniform distribution in the algorithm. The results are given in Table 2 and demonstrated in Figure 4.

4. Real-World Data Illustration

In this section, we evaluate the performance of the proposed model and compare it with several widely used models using real-world data. Parameters of the models were estimated using the maximum likelihood method, and their fit was assessed based on the Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC); in addition, the Kolmogorov–Smirnov (KS), Anderson–Darling (AD), and the Cramér–von Mises (CvM) statistics. The model with the smallest values for these criteria was deemed to provide the best fit. The benchmark models used for comparison are outlined as follows.

HN, F(x) = erf(x/(σ√2)).

Weibull Pareto (WP) [29], F(x) = 1 − e^{−(v log(x/θ))^w}, x > θ, v, w, θ > 0.

Power distribution (Pw), F(x) = (x/λ)^θ, λ, θ > 0, 0 ≤ x ≤ λ.

Transmuted exponentiated U-quadratic (TEUq) [30], F(x) = G(x)(1 + λ(1 − G(x))); where G(x) = ((α/3)((x−β)³ + (β−a)³))^γ with α = 12/(b−a)³, β = (a+b)/2, a < b, γ > 0, |λ| ≤ 1.

Mustapha type III (MuIII) [11], F(x)=1βlogxθα,α,β,θ>0,xθ.
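For reference, the comparison criteria listed above can be computed as in the following sketch (the helper is ours; the CAIC is taken here in its small-sample corrected form, and the KS, AD, and CvM statistics in their standard empirical-distribution forms, which are our assumptions, not the authors' code):

```r
# Sketch: goodness-of-fit criteria for a fitted model with k parameters, maximized
# log-likelihood logL, and a fitted CDF evaluated at the sorted data.
gof_criteria <- function(y, logL, k, cdf) {
  n <- length(y)
  Fhat <- cdf(sort(y))
  i <- seq_len(n)
  AIC  <- -2 * logL + 2 * k
  BIC  <- -2 * logL + k * log(n)
  CAIC <- AIC + 2 * k * (k + 1) / (n - k - 1)                      # corrected-AIC form (assumed)
  KS   <- max(pmax(i / n - Fhat, Fhat - (i - 1) / n))              # Kolmogorov-Smirnov
  AD   <- -n - mean((2 * i - 1) * (log(Fhat) + log(1 - rev(Fhat))))  # Anderson-Darling
  CvM  <- 1 / (12 * n) + sum(((2 * i - 1) / (2 * n) - Fhat)^2)     # Cramer-von Mises (8)
  c(AIC = AIC, BIC = BIC, CAIC = CAIC, KS = KS, AD = AD, CvM = CvM)
}
# e.g., for the proposed model: gof_criteria(y, logL, k = 2,
#   cdf = function(x) ptuvhn_cf(x, u = min(y), b = b_hat, sigma = sigma_hat))
```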

4.1. First Data Set

The first data set, provided by [31], examines the influence of soil fertility and the characterization of biological N2 fixation on the growth of Dimorphandra wilsonii Rizz. For 128 plants, measurements were taken of the phosphorus concentration in their leaves, encompassing 15 distinct observations. The numerical results of the estimates and model performance measures are given in Table 3. Evidently, from Table 3, the proposed model has the smallest AIC (−398.04), BIC (−392.34), and CAIC (−397.95), indicating the best trade-off between model fit and complexity. In addition, the proposed model has KS = 0.0833, AD = 0.1209, and CvM = 0.0156; among these metrics it attains the smallest AD and CvM, giving it the best overall performance, and it ranks first with the smallest values in six out of the seven metrics in Table 3. Furthermore, the WP model attains a slightly better KS statistic (0.0742) and ranks second. Other models, such as TEU, MuIII, and HN, perform poorly in this respect.

In support of these results, Figure 5 shows the plots of the fitted density (left) and fitted CDF (right) of the proposed model. Figure 6 presents the empirical and proposed model box plots (left) and the quantile-quantile plot (right) for the first data set. The plots provide visual support for the results, showing that the proposed model can represent the observed data with negligible deviations.

4.2. Second Data Set

The second data set consists of 50 observations of burr height (measured in millimeters) for holes with a diameter of 12 mm and a sheet thickness of 3.15 mm, as reported by [32]. The numerical results for parameter estimates and model performance metrics are presented in Table 4. Evidently, from Table 4, the proposed model achieves the smallest values for AIC (−114.36), BIC (−110.54), and CAIC (−114.10), indicating the best balance between model fit and complexity. Moreover, the proposed model exhibits the smallest values for the AD (0.1122) and CvM (0.0161) statistics, while also performing competitively in the KS statistic (0.1538). Overall, it ranks first in six out of the seven evaluated metrics, highlighting its superior performance. The Pw model, with a slightly better KS statistic (0.1463), ranks second. In contrast, models like TEU and WP exhibit relatively poor performance on this data set.

Supporting these findings, Figure 7 illustrates the fitted density (left) and CDF (right) for the proposed model. Additionally, Figure 8 shows the empirical and proposed model box plots (left) and the quantile–quantile plot (right) for this data set. These plots confirm the numerical outcomes visually, showing that the proposed model characterizes the observed data very well.

5. Conclusions

In this paper, we introduced the truncated (u,v)-half-normal distribution, a novel probability model for bounded domains, addressing the need for realistic and analytically tractable models in constrained scenarios. Several properties of the model, including its cumulative distribution function, probability density function and its possible shapes, survival function, hazard rate, and moments, were rigorously derived and analyzed. Parameter estimation was performed using a hybrid approach: MLE for σ and a likelihood-free-like technique for b. We supported the results using sensitivity analysis and an optimal estimation algorithm to determine the best estimates. The MLE of σ^ depends strongly on b^, as the sensitivity analysis reveals that σ initially starts high and decreases rapidly with increasing b, particularly near some threshold value, leading to large variations. Beyond the threshold, changes in b have a diminishing impact, and σ converges to a stable value, indicating reduced sensitivity for large b. Hence, the utilization of the proposed algorithm helps to determine the optimal estimates. In addition, simulated data are used to demonstrate the accuracy and importance of the proposed hybrid algorithm. Applications to two real-world data sets demonstrated the superior performance of the proposed model compared to widely used alternatives, with better fit across metrics like the AIC, BIC, CAIC, KS, AD, and CvM statistics. These results highlight the versatility and robustness of the (u,v)-half-normal distribution for modeling data with bounded support. We note that while the formulation permits negative lower bounds (u<0), its empirical relevance depends on the nature of the data. In contexts involving standardized or signed measurements, this generalization may prove useful. In strictly positive domains, u≥0 should naturally be enforced. An interesting avenue for future research is the extension of the proposed bounded half-normal distribution to a regression framework. In such a setting, the location (e.g., via a transformation of the interval (u,v)), the scale parameter σ, or the shape parameter b could be modeled as functions of covariates. This would allow practitioners to model bounded response variables that exhibit half-normal-like asymmetry while accounting for explanatory variables. Such an extension could be pursued within both classical and Bayesian inferential paradigms, potentially using maximum likelihood or Bayesian hierarchical models. The development of suitable link functions, estimation procedures, and diagnostics tailored to this bounded setting would form an important next step.

Author Contributions

Conceptualization, M.K., H.S.B., B.A. and M.M.; methodology, M.K., H.S.B., B.A. and M.M.; software, B.A. and M.M.; validation, M.K., H.S.B., B.A. and M.M.; formal analysis, M.K., H.S.B., B.A. and M.M.; investigation, M.K., H.S.B., B.A., M.M., S.M.A.A. and L.A.; resources, M.K., H.S.B., B.A., M.M., S.M.A.A. and L.A.; data curation, M.K., H.S.B., B.A. and M.M.; writing—original draft preparation, M.K., H.S.B., B.A. and M.M.; writing—review and editing, M.K., H.S.B., B.A., M.M., S.M.A.A. and L.A.; visualization, M.K., H.S.B., B.A., M.M., S.M.A.A. and L.A.; supervision, M.K., H.S.B., B.A., M.M., S.M.A.A. and L.A.; project administration, M.K., H.S.B., B.A., M.M., S.M.A.A. and L.A.; funding acquisition, S.M.A.A. and L.A. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The data supporting the findings of this study are cited throughout the manuscript.

Acknowledgments

The authors extend their appreciation to Umm Al-Qura University, Saudi Arabia for funding this research work through grant number: 25UQU4310037GSSR09.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 Plot of density function of Y distribution for different values of σ at u=1 and b=6.


Figure 2 Plots of the sensitivity analysis on σ as b increases in (i)–(iii). This indicates that, beyond a certain threshold, increases in b have a diminishing impact on σ.


Figure 3 Flowchart describing step by step of the likelihood-free-like algorithm.


Figure 4 Plots of the fitted density of the simulated data with initial guess (left) and optimal values (right) of the σ^ and b^.


Figure 5 Plots of the fitted density (left), and fitted CDF (right) of the proposed model for the first data set.


Figure 6 Plots of the empirical and proposed model box plots (left), and the quantile-quantile plot (right) of the proposed model for the first data set.


Figure 7 Plots of the fitted density (left), and fitted CDF (right) of the proposed model for the second data set.


Figure 8 Plots of the empirical and proposed model box plots (left), and the quantile-quantile plot (right) of the proposed model for the second data set.


Indicators of Y with different sets of parameters.

Indicator σ = 1 , b = 10 σ = 2 , b = 4 σ = 1.8 , b = 2 σ = π , b = π σ = 3 , b = 2 σ = 9 , b = 3 σ = 0.75 , b = 0.05
CV 0.6958630 0.5651201 0.4964827 0.4836054 0.4335308 0.3499644 0.192012
CS 0.7206297 0.1487482 −0.1613313 −0.2220628 −0.4703052 −0.9528335 −2.509737
CK 3.0353171 2.1437532 2.0757065 2.0940308 2.2733735 3.0905442 9.860044

Some possible real roots of z in a unit domain for some values of σ.

Sample Size Parameters Actual Values Initial Estimates of b0,σ0 Optimal Estimates of b^,σ^
50 σ 0.5 0.4331 0.4623
b 5.0 11.0094 5.5732
CvM 0.1222 0.0338

MLEs, L, AIC, BIC, CAIC, KS, AD, and CvM for the first data set.

Model MLEs L AIC BIC CAIC KS AD CvM
Proposed Model σ ^ = 0.1465 201.02 −398.04 −392.34 −397.95 0.0833 0.1209 0.0156
b ^ = 0.5479
WP v ^ = 0.3999 196.82 −387.64 −379.09 −387.45 0.0742 0.7414 0.1130
w ^ = 3.3299
θ ^ = 0.0400
Pw θ ^ = 1.3010 166.55 −329.10 −323.39 −329.01 0.2169 1.5757 0.2611
λ ^ = 0.2810
MuIII α ^ = 99.3839 123.60 −241.21 −232.65 −241.01 0.3181 0.8562 0.1568
β ^ = 0.0094
θ ^ = 0.3800
HN σ ^ = 0.1509 149.21 −296.41 −293.56 −296.38 0.6357 1.2301 0.2022
TEU λ ^ = 0.9722 75.3985 −142.79 −131.39 −142.47 0.3078 1.7936 0.2863
γ ^ = 1.6520
a ^ = 0.0490
b ^ = 0.2800

MLEs, L, AIC, BIC, CAIC, KS, AD, and CvM for the second data set.

Model MLEs L AIC BIC CAIC KS AD CvM
Proposed Model σ ^ = 0.3116 59.18 −114.36 −110.54 −114.10 0.1538 0.1122 0.0161
b ^ = 0.4929
WP v ^ = 0.0936 41.72 −77.44 −71.69 −76.91 0.2186 2.9353 0.4999
w ^ = 3.0749
θ ^ = 0.0190
Pw θ ^ = 1.1757 57.44 −110.87 −107.05 −110.62 0.1463 1.1898 0.1592
λ ^ = 0.3210
MuIII α ^ = 85.2215 43.52 −81.03 −75.39 −80.51 0.2627 0.5609 0.0949
β ^ = 0.0105
θ ^ = 0.4200
HN σ ^ = 0.1819 48.9371 −95.87 −93.96 −95.79 0.5471 1.0972 0.1689
TEU λ ^ = 0.7716 2.8104 2.3792 10.0273 3.2681 0.2770 3.5515 0.6581
γ ^ = 1.5599
a ^ = 0.0190
b ^ = 0.3200

Appendix A. Quantile Function R Code Using Algorithm 1

References

1. Lachos, V.H.; Ghosh, P.; Arellano-Valle, R.B. Likelihood based inference for skew-normal independent linear mixed models. Stat. Sin.; 2010; 20, pp. 303-322.

2. Meeker, W.Q.; Escobar, L.A.; Pascual, F.G. Statistical Methods for Reliability Data; John Wiley & Sons: Hoboken, NJ, USA, 2021.

3. Daher, W.; Aydilek, H.; Saleeby, E.G. Insider trading with different risk attitudes. J. Econ.; 2020; 131, pp. 123-147. [DOI: https://dx.doi.org/10.1007/s00712-020-00703-x]

4. Nadarajah, S. Some truncated distributions. Acta Appl. Math.; 2009; 106, pp. 105-123. [DOI: https://dx.doi.org/10.1007/s10440-008-9285-4]

5. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Multivariate Distributions; Wiley: New York, NY, USA, 1972; Volume 7.

6. Barr, D.R.; Sherrill, E.T. Mean and variance of truncated normal distributions. Am. Stat.; 1999; 53, pp. 357-361. [DOI: https://dx.doi.org/10.1080/00031305.1999.10474490]

7. Burkardt, J. The truncated normal distribution. Dep. Sci. Comput. Website Fla. State Univ.; 2014; 1, 58.

8. Chattamvelli, R.; Shanmugam, R. Arcsine Distribution. Continuous Distributions in Engineering and the Applied Sciences-Part I; Springer: Berlin/Heidelberg, Germany, 2021; pp. 57-68.

9. Arnold, B.C. Pareto distribution. Wiley StatsRef: Statistics Reference Online; John Wiley & Sons: Hoboken, NJ, USA, 2014; pp. 1-10.

10. González-Hernández, I.J.; Méndez-González, L.C.; Granillo-Macías, R.; Rodríguez-Muñoz, J.L.; Pacheco-Cedeño, J.S. A new generalization of the uniform distribution: Properties and applications to lifetime data. Mathematics; 2024; 12, 2328. [DOI: https://dx.doi.org/10.3390/math12152328]

11. Muhammad, M. A new three-parameter model with support on a bounded domain: Properties and quantile regression model. J. Comput. Math. Data Sci.; 2023; 6, 100077. [DOI: https://dx.doi.org/10.1016/j.jcmds.2023.100077]

12. Sun, J.; Kong, M.; Pal, S. The Modified-Half-Normal distribution: Properties and an efficient sampling scheme. Commun. Stat.—Theory Methods; 2023; 52, pp. 1591-1613. [DOI: https://dx.doi.org/10.1080/03610926.2021.1934700]

13. Olmos, N.M.; Varela, H.; Bolfarine, H.; Gómez, H.W. An extension of the generalized half-normal distribution. Stat. Pap.; 2014; 55, pp. 967-981. [DOI: https://dx.doi.org/10.1007/s00362-013-0546-6]

14. Bakouch, H.S.; Nik, A.S.; Asgharzadeh, A.; Salinas, H.S. A flexible probability model for proportion data: Unit-half-normal distribution. Commun. Stat.-Case Stud. Data Anal. Appl.; 2021; 7, pp. 271-288. [DOI: https://dx.doi.org/10.1080/23737484.2021.1882355]

15. Bosch-Badia, M.T.; Montllor-Serrats, J.; Tarrazon-Rodon, M.A. Risk analysis through the half-normal distribution. Mathematics; 2020; 8, 2080. [DOI: https://dx.doi.org/10.3390/math8112080]

16. De la Cruz, R.; Salinas, H.S.; Meza, C. Reliability estimation for stress-strength model based on Unit-half-normal distribution. Symmetry; 2022; 14, 837. [DOI: https://dx.doi.org/10.3390/sym14040837]

17. Ahmadi, K.; Akbari, M.; Raqab, M.Z. Objective Bayesian estimation for the differential entropy measure under generalized half-normal distribution. Bull. Malays. Math. Sci. Soc.; 2023; 46, 39. [DOI: https://dx.doi.org/10.1007/s40840-022-01435-5] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36471709]

18. Sief, M.; Liu, X.; Abd El-Raheem, A.E.R.M. Inference for a constant-stress model under progressive type-I interval censored data from the generalized half-normal distribution. J. Stat. Comput. Simul.; 2021; 91, pp. 3228-3253. [DOI: https://dx.doi.org/10.1080/00949655.2021.1925673]

19. Altman, D.G. Construction of age-related reference centiles using absolute residuals. Stat. Med.; 1993; 12, pp. 917-924. [DOI: https://dx.doi.org/10.1002/sim.4780121003] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/8337548]

20. Nik, A.S.; Chesneau, C.; Bakouch, H.S.; Asgharzadeh, A. A new truncated (0, b)-F family of lifetime distributions with an extensive study to a submodel and reliability data. Afr. Mat.; 2023; 34, 3. [DOI: https://dx.doi.org/10.1007/s13370-022-01037-1]

21. Tanner, M.A. Tools for Statistical Inference; Springer: Berlin/Heidelberg, Germany, 1993; Volume 3.

22. Luke, Y.L. Special Functions and Their Approximations: V. 2; Academic Press: Boca Raton, FL, USA, 1969; Volume 4.

23. Prudnikov, A.P.; Brychkov, I.A.; Marichev, O.I. Integrals and Series: Direct Laplace Transforms; CRC Press: Boca Raton, FL, USA, 1986.

24. Prudnikov, A. Integrals and Series; Routledge: London, UK, 2018.

25. Mathai, A.M.; Saxena, R.K.; Haubold, H.J. The H-Function: Theory and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.

26. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2010.

27. Sisson, S.A.; Fan, Y.; Beaumont, M. Handbook of Approximate Bayesian Computation; CRC Press: Boca Raton, FL, USA, 2018.

28. Palestro, J.J.; Sederberg, P.B.; Osth, A.F.; Van Zandt, T.; Turner, B.M. Likelihood-Free Methods for Cognitive Science; Springer: Berlin/Heidelberg, Germany, 2018.

29. Alzaatreh, A.; Famoye, F.; Lee, C. Weibull-Pareto distribution and its applications. Commun. Stat.—Theory Methods; 2013; 42, pp. 1673-1691. [DOI: https://dx.doi.org/10.1080/03610926.2011.599002]

30. Muhammad, M.; Suleiman, M.I. The transmuted exponentiated U-quadratic distribution for lifetime modeling. Sohag J. Math; 2019; 6, pp. 19-27.

31. Fonseca, M.; Franca, M. A Influência da Fertilidade do Solo e Caracterização da Fixação Biológica de N2 Para o Crescimento de Dimorphandra Wilsonii Rizz. Master’s Thesis; Universidade Federal de Minas Gerais: Belo Horizonte, Brazil, 2007.

32. Dasgupta, R. On the distribution of burr with applications. Sankhya B; 2011; 73, pp. 1-19. [DOI: https://dx.doi.org/10.1007/s13571-011-0015-y]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).