
Abstract

A transformation strategy can help an estimator reduce both bias and mean squared error and thus give more accurate results. In this paper, an enhanced class of ratio–product-type estimators of the finite population median under simple random sampling is introduced. The class uses a transformation technique that linearly combines two robust measures of the auxiliary variable, the trimean and the decile mean, with five non-conventional measures: the range, inter-quartile range, mid-range, quartile average, and quartile deviation. This transformation approach improves efficiency and enables the estimators to handle data variability better. We derive the bias and mean squared error of these estimators up to the first order of approximation. The proposed estimators are compared with existing methods using five simulated populations generated from different distributions and three real datasets. In terms of mean squared error (MSE), the new estimators outperform the traditional estimators, giving more precise and reliable median estimates.


1. Introduction

The use of auxiliary information at the planning and estimation stages is important for improving the performance of estimators in survey sampling. When data are highly skewed, as with income, expenditure, taxation, consumption, and production, the median often gives a more accurate picture of central tendency than the mean, because it is not pulled toward the long tail and so better represents the middle of the data. Despite the large body of research on estimating population parameters such as means, variances, proportions, and totals, the development of effective techniques for estimating medians in finite populations has received less attention. This gap highlights the need for further research on methods that use auxiliary variables to improve median estimation. More details about auxiliary information can be found in [1,2,3,4,5].

The median is especially important when data are skewed or contain outliers. Median-based summaries give more reliable results for survey responses in the social sciences, income in economics, pollution levels in environmental research, patient outcomes in health, and variations in the real estate market. The estimation of the median using auxiliary information in simple random sampling was significantly advanced by the fundamental contributions [6,7,8,9], which laid the foundation for subsequent research in this area. Following the work of [9], a number of estimators of the finite population median have been developed under various sampling techniques. Some novel techniques for improving ratio and regression estimators of the median were proposed by [10]. Ref. [11] presented several techniques for median estimation under double sampling. The population median was estimated by [12] using a generalized class of estimators that employs two auxiliary variables in double sampling. Using the known median of the auxiliary variable, ref. [13] presented the minimum unbiased estimator. Later, the accuracy of median estimation was improved by using two auxiliary variables under the two-phase sampling technique, as discussed by [14]. Refs. [15,16] developed several improved estimators of the population median under simple random and stratified random sampling, and [17,18,19] developed new estimators that use auxiliary data for various population parameters under different sampling designs. In recent years, many researchers have focused on introducing new, efficient estimators of the population median under different sampling methods. For more details about median estimation, we refer the reader to [20,21,22,23,24,25,26,27] and the references therein.

The precision of estimators typically decreases, and results may be misleading, when extreme values are present. All ratio, product, regression, and exponential estimators of the finite population median rely heavily on conventional measures of the auxiliary variable, which is a drawback: this dependence can reduce their efficacy, especially for datasets influenced by outliers. A transformation strategy can help an estimator reduce both bias and mean squared error and thus give more accurate results. In this paper, an enhanced class of ratio–product-type estimators of the finite population median under simple random sampling is introduced. It uses a transformation technique that linearly combines two robust measures of the auxiliary variable, the trimean and the decile mean, with five non-conventional measures: the range, inter-quartile range, mid-range, quartile average, and quartile deviation. This transformation improves efficiency and helps the estimators handle data variability. Unlike many existing estimators, the proposed estimators adapt well to skewed distributions and datasets with outliers, which makes them especially useful in domains such as environmental research, healthcare, and income analysis.

Real-Life Applications

In real-life applications, the improved median-estimation method proposed in this study can provide valuable insights across various fields:

Economic income-distribution analysis: In economics, accurate estimations of median income are crucial for understanding income inequality. The proposed method can offer more reliable median estimates even in the presence of extreme incomes or outliers, providing a clearer understanding of wealth distribution, which is critical for policy making and economic planning.

Climate research: In climate studies, data often contain extreme values due to rare events like heatwaves or storms. By applying the improved median estimation, researchers can obtain a more stable and robust estimate of climate variables (e.g., average temperature or rainfall) without the distortion caused by these extreme outliers. This can improve the reliability of climate models and help in formulating better strategies for climate change adaptation.

Medical studies: In healthcare, particularly when dealing with patient data or clinical trials, outliers can significantly affect the analysis of treatment efficacy or patient outcomes. The proposed method can be instrumental in estimating more accurate median values of health metrics (e.g., blood pressure, cholesterol levels) in populations with heterogeneous health conditions, thus ensuring more robust conclusions and better healthcare decision-making.

Environmental monitoring: For environmental monitoring, where pollutants or rare events (e.g., oil spills, radioactive leaks) can distort data, the improved median estimator can offer a more reliable measure of the central tendency, improving the accuracy of risk assessments and mitigation strategies.

Education and social sciences: The method could be applied in educational assessments or social research, where extreme test scores or outlier responses could otherwise skew the interpretation of central tendencies. By providing more robust median estimates, researchers can better understand the typical performance or behavior in a population, leading to more informed decisions in policy or education strategy development.

The rest of this paper is organized as follows. Section 2 explains the methods and notation used in the study and reviews existing estimators to establish a basis for comparison. Section 3 introduces the proposed class of estimators. Section 4 presents a mathematical comparison of the estimators under consideration. Section 5 reports a simulation study, based on five artificial populations generated from different probability distributions, that verifies the theoretical findings of Section 4, together with numerical examples on real datasets that illustrate their practical application. Finally, Section 6 summarizes the key findings and suggests directions for future research.

2. Concepts and Existing Estimators

Suppose that $X$ denotes the auxiliary variable and $Y$ the study variable in a finite population of $N$ units, $\Omega = (\Omega_1, \Omega_2, \ldots, \Omega_N)$. For unit $i$ ($i = 1, 2, \ldots, N$), the values of the auxiliary and study variables are $x_i$ and $y_i$, respectively. A random sample of size $n$ ($n < N$) is selected from the population by simple random sampling without replacement (SRSWOR). The population medians of the study and auxiliary variables are denoted by $\delta_{My}$ and $\delta_{Mx}$, and the corresponding sample medians by $\hat{\delta}_{My}$ and $\hat{\delta}_{Mx}$. The probability density functions of $Y$ and $X$, evaluated at their population medians, are $f_y(\delta_{My})$ and $f_x(\delta_{Mx})$. The correlation coefficient between $\delta_{My}$ and $\delta_{Mx}$ is denoted by $\rho_{yx}$ and defined as $\rho(\delta_{My}, \delta_{Mx}) = 4P_{11}(y, x) - 1$, where $P_{11} = P(y \le \delta_{My} \cap x \le \delta_{Mx})$.
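As an illustrative sketch (not part of the original study), the correlation $\rho(\delta_{My}, \delta_{Mx}) = 4P_{11} - 1$ can be estimated by replacing the probability $P_{11}$ with the proportion of units that lie at or below both medians. The hypothetical helper `quadrant_correlation` below does this; plugging in sample medians for the population medians is an assumption of the sketch. For bivariate normal data the quantity equals $(2/\pi)\arcsin\rho$, which the usage lines use as a rough check.

```python
import math
import random
import statistics


def quadrant_correlation(x, y):
    """Sample analogue of rho(delta_My, delta_Mx) = 4 * P11 - 1,
    where P11 = P(y <= delta_My and x <= delta_Mx)."""
    med_x, med_y = statistics.median(x), statistics.median(y)
    p11 = sum(1 for xi, yi in zip(x, y) if yi <= med_y and xi <= med_x) / len(x)
    return 4 * p11 - 1


# Rough check on bivariate normal data, where 4 * P11 - 1 = (2 / pi) * asin(rho).
rng = random.Random(7)
rho = 0.5
xs, ys = [], []
for _ in range(100_000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
print(quadrant_correlation(xs, ys), 2 / math.pi * math.asin(rho))
```

With an odd number of units, observations equal to the median are counted as "at or below" it, so small samples can give values slightly outside $[-1, 1]$; this is a limitation of the simple plug-in rule, not of the population definition.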

To determine the mathematical properties of different estimators, the following relative error terms are utilized:

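$$\delta e_0 = \frac{\hat{\delta}_{My} - \delta_{My}}{\delta_{My}} \quad \text{and} \quad \delta e_1 = \frac{\hat{\delta}_{Mx} - \delta_{Mx}}{\delta_{Mx}},$$

such that $E(\delta e_i) = 0$ for $i = 0, 1$, and

$$E(\delta e_0^2) = \theta\Delta_{My}^2, \qquad E(\delta e_1^2) = \theta\Delta_{Mx}^2, \qquad E(\delta e_0\delta e_1) = \theta\Delta_{Myx}, \quad \text{with } \Delta_{Myx} = \rho_{yx}\Delta_{My}\Delta_{Mx},$$

where

$$\Delta_{My} = \frac{1}{\delta_{My}f_y(\delta_{My})}, \qquad \Delta_{Mx} = \frac{1}{\delta_{Mx}f_x(\delta_{Mx})}$$

denote the (median-based) population coefficients of variation of the study variable $Y$ and the auxiliary variable $X$, and

$$\theta = \frac{1}{4}\left(\frac{1}{n} - \frac{1}{N}\right)$$

is the finite population correction factor.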

Next, we review the biases and mean squared errors of existing estimators of the finite population median. These results are later compared with those of the proposed class of estimators to identify potential improvements.

The population median is generally estimated by the unbiased estimator

$$\hat{\delta}_{MGR} = \hat{\delta}_{My}. \tag{1}$$

The variance of $\hat{\delta}_{MGR}$ is given by

$$V(\hat{\delta}_{My}) = \theta\delta_{My}^2\Delta_{My}^2. \tag{2}$$

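Assuming that the median of $X$ is known, ref. [9] proposed the ratio-type estimator

$$\hat{\delta}_{MR} = \hat{\delta}_{My}\frac{\delta_{Mx}}{\hat{\delta}_{Mx}}. \tag{3}$$

The bias and MSE of $\hat{\delta}_{MR}$ are

$$\mathrm{Bias}(\hat{\delta}_{MR}) \approx \theta\delta_{My}\left(\Delta_{Mx}^2 - \Delta_{Myx}\right) \tag{4}$$

and

$$\mathrm{MSE}(\hat{\delta}_{MR}) \approx \theta\delta_{My}^2\left(\Delta_{My}^2 + \Delta_{Mx}^2 - 2\Delta_{Myx}\right). \tag{5}$$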

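The difference estimator $\hat{\delta}_{MD}$ introduced by [13] is defined as

$$\hat{\delta}_{MD} = \hat{\delta}_{My} + d\left(\delta_{Mx} - \hat{\delta}_{Mx}\right), \tag{6}$$

where $d$ is an unknown constant whose optimum value is

$$d_{\text{opt}} = \frac{\rho_{yx}\delta_{My}\Delta_{My}}{\delta_{Mx}\Delta_{Mx}}.$$

The minimum MSE of $\hat{\delta}_{MD}$ is

$$\mathrm{MSE}(\hat{\delta}_{MD})_{\min} \approx \theta\delta_{My}^2\Delta_{My}^2\left(1 - \rho_{yx}^2\right). \tag{7}$$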

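Following the idea of [28], the exponential ratio- and product-type estimators of the median are

$$\hat{\delta}_{MRe} = \hat{\delta}_{My}\exp\left(\frac{\delta_{Mx} - \hat{\delta}_{Mx}}{\delta_{Mx} + \hat{\delta}_{Mx}}\right) \tag{8}$$

and

$$\hat{\delta}_{MPe} = \hat{\delta}_{My}\exp\left(\frac{\hat{\delta}_{Mx} - \delta_{Mx}}{\delta_{Mx} + \hat{\delta}_{Mx}}\right). \tag{9}$$

The biases and MSEs of $\hat{\delta}_{MRe}$ and $\hat{\delta}_{MPe}$ are as follows:

$$\mathrm{Bias}(\hat{\delta}_{MRe}) \approx \theta\delta_{My}\left(\frac{3}{8}\Delta_{Mx}^2 - \frac{1}{2}\Delta_{Myx}\right), \tag{10}$$

$$\mathrm{Bias}(\hat{\delta}_{MPe}) \approx \theta\delta_{My}\left(\frac{1}{2}\Delta_{Myx} - \frac{1}{8}\Delta_{Mx}^2\right), \tag{11}$$

$$\mathrm{MSE}(\hat{\delta}_{MRe}) \approx \theta\delta_{My}^2\left(\Delta_{My}^2 + \frac{1}{4}\Delta_{Mx}^2 - \Delta_{Myx}\right) \tag{12}$$

and

$$\mathrm{MSE}(\hat{\delta}_{MPe}) \approx \theta\delta_{My}^2\left(\Delta_{My}^2 + \frac{1}{4}\Delta_{Mx}^2 + \Delta_{Myx}\right). \tag{13}$$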
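The point estimators reviewed so far are simple functions of the sample medians and the known population median of $X$. A minimal Python sketch follows; the function names are hypothetical, and the inputs in the usage lines are made-up numbers chosen only to show the mechanics.

```python
import math


def ratio_estimator(med_y_hat, med_x_hat, med_x):
    """Ratio-type estimator, Eq. (3)."""
    return med_y_hat * med_x / med_x_hat


def difference_estimator(med_y_hat, med_x_hat, med_x, d):
    """Difference estimator, Eq. (6); d would normally be set to its optimum."""
    return med_y_hat + d * (med_x - med_x_hat)


def exp_ratio_estimator(med_y_hat, med_x_hat, med_x):
    """Exponential ratio-type estimator, Eq. (8)."""
    return med_y_hat * math.exp((med_x - med_x_hat) / (med_x + med_x_hat))


def exp_product_estimator(med_y_hat, med_x_hat, med_x):
    """Exponential product-type estimator, Eq. (9)."""
    return med_y_hat * math.exp((med_x_hat - med_x) / (med_x + med_x_hat))


# Hypothetical inputs: sample medians 10 (Y) and 5 (X); known population median of X is 4.
print(ratio_estimator(10, 5, 4), difference_estimator(10, 5, 4, d=2))
print(exp_ratio_estimator(10, 5, 4), exp_product_estimator(10, 5, 4))
```

The exponential ratio and product factors are reciprocals of each other, so for the same inputs the two estimates multiply to the square of the sample median of $Y$.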

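The difference-type estimators of the median introduced by [10,14] are

$$\hat{\delta}_{MD1} = d_1\hat{\delta}_{My} + d_2\left(\delta_{Mx} - \hat{\delta}_{Mx}\right), \tag{14}$$

$$\hat{\delta}_{MD2} = \left[d_3\hat{\delta}_{My} + d_4\left(\delta_{Mx} - \hat{\delta}_{Mx}\right)\right]\frac{\delta_{Mx}}{\hat{\delta}_{Mx}}, \tag{15}$$

$$\hat{\delta}_{MD3} = \left[d_5\hat{\delta}_{My} + d_6\left(\delta_{Mx} - \hat{\delta}_{Mx}\right)\right]\exp\left(\frac{\delta_{Mx} - \hat{\delta}_{Mx}}{\delta_{Mx} + \hat{\delta}_{Mx}}\right). \tag{16}$$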

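The optimum values of the unknown constants $d_i$ ($i = 1, 2, \ldots, 6$) are

$$d_{1(\text{opt})} = \frac{1}{1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)},$$

$$d_{2(\text{opt})} = \frac{\delta_{My}\rho_{yx}\Delta_{My}}{\delta_{Mx}\Delta_{Mx}\left[1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)\right]},$$

$$d_{3(\text{opt})} = \frac{1 - \theta\Delta_{Mx}^2}{1 - \theta\Delta_{Mx}^2 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)},$$

$$d_{4(\text{opt})} = \frac{\delta_{My}}{\delta_{Mx}}\left[1 + d_{3(\text{opt})}\left(\frac{\rho_{yx}\Delta_{My}}{\Delta_{Mx}} - 2\right)\right],$$

$$d_{5(\text{opt})} = \frac{8 - \theta\Delta_{Mx}^2}{8\left[1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)\right]}$$

and

$$d_{6(\text{opt})} = \frac{\delta_{My}}{\delta_{Mx}}\left[\frac{1}{2} + d_{5(\text{opt})}\left(\frac{\rho_{yx}\Delta_{My}}{\Delta_{Mx}} - 1\right)\right].$$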
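The optimal constants can be evaluated directly once $\theta$, the two medians, $\Delta_{My}$, $\Delta_{Mx}$, and $\rho_{yx}$ are available. The Python sketch below (hypothetical function names; `cy` and `cx` stand for $\Delta_{My}$ and $\Delta_{Mx}$) computes $d_1, \ldots, d_6$ together with the optimum $d$ of (6). It also codes the first-order MSE of $\hat{\delta}_{MD1}$ for arbitrary $(d_1, d_2)$, so one can check numerically that the stated pair is the minimizer. The parameter values in the usage lines are illustrative, chosen close to Population 1 of Section 5.2.

```python
def optimal_constants(theta, med_y, med_x, cy, cx, rho):
    """Optimal d, d1, ..., d6 of Section 2 (first-order approximation)."""
    a = theta * cy ** 2 * (1 - rho ** 2)   # theta * Delta_My^2 * (1 - rho^2)
    k = rho * cy / cx                      # rho_yx * Delta_My / Delta_Mx
    ratio = med_y / med_x
    d1 = 1 / (1 + a)
    d3 = (1 - theta * cx ** 2) / (1 - theta * cx ** 2 + a)
    d5 = (8 - theta * cx ** 2) / (8 * (1 + a))
    return {
        "d": ratio * k,                    # optimum d of the difference estimator (6)
        "d1": d1,
        "d2": ratio * k * d1,
        "d3": d3,
        "d4": ratio * (1 + d3 * (k - 2)),
        "d5": d5,
        "d6": ratio * (0.5 + d5 * (k - 1)),
    }


def mse_md1(d1, d2, theta, med_y, med_x, cy, cx, rho):
    """First-order MSE of d1 * med_y_hat + d2 * (med_x - med_x_hat) for any d1, d2."""
    return ((d1 - 1) ** 2 * med_y ** 2
            + theta * (d1 ** 2 * med_y ** 2 * cy ** 2
                       + d2 ** 2 * med_x ** 2 * cx ** 2
                       - 2 * d1 * d2 * med_y * med_x * rho * cy * cx))


params = dict(theta=0.0036, med_y=686.0, med_x=4.715, cy=1.58, cx=1.84, rho=0.468)
consts = optimal_constants(**params)
print({name: round(value, 4) for name, value in consts.items()})
```

At the optimum, `mse_md1` reproduces the closed form (20), and as $\theta \to 0$ the multiplicative constants $d_1$, $d_3$, and $d_5$ tend to 1.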

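The biases of $\hat{\delta}_{MDi}$ ($i = 1, 2, 3$) and, at the optimal values of $d_i$ ($i = 1, 2, \ldots, 6$), their minimum mean squared errors are given by the following expressions:

$$\mathrm{Bias}(\hat{\delta}_{MD1}) \approx \delta_{My}(d_1 - 1), \tag{17}$$

$$\mathrm{Bias}(\hat{\delta}_{MD2}) \approx \delta_{My}(d_3 - 1) + \theta d_3\delta_{My}\left(\Delta_{Mx}^2 - \Delta_{Myx}\right) + \theta d_4\delta_{Mx}\Delta_{Mx}^2, \tag{18}$$

$$\mathrm{Bias}(\hat{\delta}_{MD3}) \approx \delta_{My}(d_5 - 1) + \theta d_5\delta_{My}\left(\frac{3}{8}\Delta_{Mx}^2 - \frac{1}{2}\Delta_{Myx}\right) + \frac{\theta}{2}d_6\delta_{Mx}\Delta_{Mx}^2, \tag{19}$$

$$\mathrm{MSE}(\hat{\delta}_{MD1})_{\min} \approx \frac{\theta\delta_{My}^2\Delta_{My}^2(1 - \rho_{yx}^2)}{1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)}, \tag{20}$$

$$\mathrm{MSE}(\hat{\delta}_{MD2})_{\min} \approx \frac{\theta\delta_{My}^2(1 - \theta\Delta_{Mx}^2)\Delta_{My}^2(1 - \rho_{yx}^2)}{1 - \theta\Delta_{Mx}^2 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)} \tag{21}$$

and

$$\mathrm{MSE}(\hat{\delta}_{MD3})_{\min} \approx \frac{\theta\delta_{My}^2\left[\Delta_{My}^2(1 - \rho_{yx}^2) - \frac{\theta}{4}\Delta_{Mx}^2\left(\frac{1}{16}\Delta_{Mx}^2 + \Delta_{My}^2(1 - \rho_{yx}^2)\right)\right]}{1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)}. \tag{22}$$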

3. Proposed Family of Estimators

In this section, we discuss a family of ratio–product-type estimators, which employs the transformation technique by linearly combining robust measures such as the trimean and decile mean and five non-conventional measures, the range, inter-quartile range, mid-range, quartile average, and quartile deviation, on auxiliary variables with the simple random sampling method to estimate the finite population median. This transformation approach improves efficiency and helps estimators manage data variability. The proposed estimator is defined below:

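$$\hat{\delta}_{Me} = \hat{\delta}_{My}\exp\left[\frac{L_1 a_1\left(\hat{\delta}_{Mx} - \delta_{Mx}\right)}{a_1\left(\delta_{Mx} + \hat{\delta}_{Mx}\right) + 2a_2}\right]\exp\left[\frac{L_2 b_1\left(\delta_{Mx} - \hat{\delta}_{Mx}\right)}{b_1\left(\delta_{Mx} + \hat{\delta}_{Mx}\right) + 2b_2}\right], \tag{23}$$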

where the constants $L_i$ ($i = 1, 2$) take the value 1 or 2, and $(a_1, a_2, b_1, b_2)$ are known population parameters of the auxiliary variable $X$. Different members of the family in Equation (23) are obtained by assigning to $(a_1, a_2)$ known characteristics of $X$: the interquartile range (QR), mid-range (MR), quartile average (QA), quartile deviation (QD), trimean (TM), and decile mean (DM), as shown in Table 1.

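Here

$$K = \frac{L_2\left(\delta_{Mx} - \hat{\delta}_{Mx}\right)}{\delta_{Mx} + \hat{\delta}_{Mx} + 2\left(X_{\max} - X_{\min}\right)}, \qquad b_1 = 1, \qquad b_2 = X_{\max} - X_{\min},$$

so that $\exp(K)$ is the second exponential factor of (23) under this choice of $b_1$ and $b_2$, and

$$QR = Q_3 - Q_1, \qquad MR = \frac{X_{\max} + X_{\min}}{2}, \qquad QA = \frac{Q_3 + Q_1}{2}, \qquad QD = \frac{Q_3 - Q_1}{2},$$

$$TM = \frac{Q_1 + 2Q_2 + Q_3}{4}, \qquad DM = \frac{1}{9}\sum_{i=1}^{9} D_i,$$

where $Q_j$ is the $j$-th quartile and $D_i$ the $i$-th decile of $X$.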
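The summaries used as $a_1$, $a_2$, and $b_2$, and the resulting point value of (23), can be computed as in the Python sketch below. The quantile convention is an assumption: the text does not say how quartiles and deciles are obtained, and the sketch uses `statistics.quantiles` with `method="inclusive"`; other conventions (for example, the nine types offered by R's `quantile()`) give slightly different values. The function names and the sample medians in the usage lines are hypothetical.

```python
import math
import statistics


def auxiliary_measures(x):
    """Robust and non-conventional summaries of X used as a1, a2 and b2.

    Assumption: quartiles and deciles come from statistics.quantiles with
    method="inclusive"; the paper does not state its quantile convention.
    """
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    deciles = statistics.quantiles(x, n=10, method="inclusive")   # D_1, ..., D_9
    x_min, x_max = min(x), max(x)
    return {
        "range": x_max - x_min,          # b2 = X_max - X_min
        "QR": q3 - q1,
        "MR": (x_max + x_min) / 2,
        "QA": (q3 + q1) / 2,
        "QD": (q3 - q1) / 2,
        "TM": (q1 + 2 * q2 + q3) / 4,
        "DM": sum(deciles) / 9,
    }


def proposed_estimator(med_y_hat, med_x_hat, med_x, a1, a2, b2, L1=1, L2=1, b1=1.0):
    """Point value of Eq. (23) for given constants (a1, a2, b1, b2, L1, L2)."""
    first = math.exp(L1 * a1 * (med_x_hat - med_x) / (a1 * (med_x + med_x_hat) + 2 * a2))
    second = math.exp(L2 * b1 * (med_x - med_x_hat) / (b1 * (med_x + med_x_hat) + 2 * b2))
    return med_y_hat * first * second


m = auxiliary_measures(list(range(1, 10)))
# Member e1 of Table 1: a1 = QR, a2 = DM, b2 = range (hypothetical sample medians).
print(m)
print(proposed_estimator(10.0, 5.2, 5.0, a1=m["QR"], a2=m["DM"], b2=m["range"]))
```

When the sample median of $X$ equals its population median, both exponential factors equal 1 and the estimator returns the sample median of $Y$ unchanged.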

The following theorem provides the bias and mean squared error of the family of ratio–product-type estimators δ^Me.

Theorem 1. 

Consider δ^Me as a set of ratio–product estimators used to estimate the finite population median δMy in a simple random sampling scheme. The expressions for the bias and mean squared error (MSE) of δ^Me are provided below:

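$$\mathrm{Bias}(\hat{\delta}_{Me}) \approx \frac{\theta}{8}\delta_{My}\left[\left(3g_2^2 - g_1^2 - 2g_1g_2\right)\Delta_{Mx}^2 + 4(g_1 - g_2)\Delta_{Myx}\right]$$

and

$$\mathrm{MSE}(\hat{\delta}_{Me}) \approx \theta\delta_{My}^2\left[\Delta_{My}^2 + \frac{g_1^2 + g_2^2 - 2g_1g_2}{4}\Delta_{Mx}^2 + (g_1 - g_2)\Delta_{Myx}\right].$$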

Proof. 

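To prove this theorem, we recall the relative error terms

$$\delta e_0 = \frac{\hat{\delta}_{My} - \delta_{My}}{\delta_{My}}, \qquad \delta e_1 = \frac{\hat{\delta}_{Mx} - \delta_{Mx}}{\delta_{Mx}},$$

such that $E(\delta e_i) = 0$ for $i = 0, 1$,

$$E(\delta e_0^2) = \theta\Delta_{My}^2, \qquad E(\delta e_1^2) = \theta\Delta_{Mx}^2 \qquad \text{and} \qquad E(\delta e_0\delta e_1) = \theta\Delta_{Myx},$$

where

$$\Delta_{My} = \frac{1}{\delta_{My}f_y(\delta_{My})}, \qquad \Delta_{Mx} = \frac{1}{\delta_{Mx}f_x(\delta_{Mx})}.$$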

To examine the properties of the proposed estimator, we rewrite Equation (23) in terms of the relative errors, which allows us to derive the bias and mean squared error (MSE) of $\hat{\delta}_{Me}$:

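$$\hat{\delta}_{Me} = \delta_{My}(1 + \delta e_0)\exp\left[\frac{L_1 g_1\delta e_1}{2}\left(1 + \frac{g_1\delta e_1}{2}\right)^{-1}\right]\exp\left[-\frac{L_2 g_2\delta e_1}{2}\left(1 + \frac{g_2\delta e_1}{2}\right)^{-1}\right], \tag{24}$$

where $g_1$ and $g_2$ are defined as

$$g_1 = \frac{a_1\delta_{Mx}}{a_1\delta_{Mx} + a_2}$$

and (with $b_1 = 1$)

$$g_2 = \frac{\delta_{Mx}}{\delta_{Mx} + b_2}.$$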

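We expand the right-hand side of Equation (24) in a Taylor series and neglect terms in $\delta e_i$ of degree higher than two, since their contribution is negligible to the first order of approximation. This gives

$$\hat{\delta}_{Me} \approx \delta_{My}(1 + \delta e_0)\exp\left[\frac{L_1 g_1\delta e_1}{2}\left(1 - \frac{g_1\delta e_1}{2} + \frac{g_1^2\delta e_1^2}{4}\right)\right]\exp\left[-\frac{L_2 g_2\delta e_1}{2}\left(1 - \frac{g_2\delta e_1}{2} + \frac{g_2^2\delta e_1^2}{4}\right)\right],$$

$$\hat{\delta}_{Me} \approx \delta_{My}(1 + \delta e_0)\exp\left(\frac{L_1 g_1\delta e_1}{2} - \frac{L_1 g_1^2\delta e_1^2}{4}\right)\exp\left(-\frac{L_2 g_2\delta e_1}{2} + \frac{L_2 g_2^2\delta e_1^2}{4}\right).$$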

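Expanding the exponentials and simplifying, we obtain:

$$\hat{\delta}_{Me} - \delta_{My} \approx \delta_{My}\left[\delta e_0 + \frac{L_1g_1 - L_2g_2}{2}\delta e_1 + \frac{L_1g_1 - L_2g_2}{2}\delta e_0\delta e_1 + \frac{L_1^2g_1^2 + L_2^2g_2^2 - 2L_1L_2g_1g_2 - 2L_1g_1^2 + 2L_2g_2^2}{8}\delta e_1^2\right]. \tag{25}$$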
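A quick numerical way to check a second-degree expansion of this kind is to compare the exact exponential factor of (24) with its quadratic approximation for shrinking $\delta e_1$: the error should fall roughly like $\delta e_1^3$. The Python sketch below does this with the quadratic coefficient $[(L_1g_1 - L_2g_2)^2 - 2L_1g_1^2 + 2L_2g_2^2]/8$ and the illustrative values $g_1 = 0.8$, $g_2 = 0.5$, $L_1 = L_2 = 1$ (assumed, not taken from the paper); the function names are hypothetical.

```python
import math


def exact_factor(e1, g1, g2, L1=1, L2=1):
    """Exponential part of Eq. (24) as a function of the relative error delta_e1."""
    return (math.exp(L1 * g1 * e1 / (2 + g1 * e1))
            * math.exp(-L2 * g2 * e1 / (2 + g2 * e1)))


def quadratic_factor(e1, g1, g2, L1=1, L2=1):
    """Second-degree expansion of exact_factor around delta_e1 = 0."""
    c = L1 * g1 - L2 * g2
    c2 = (c ** 2 - 2 * L1 * g1 ** 2 + 2 * L2 * g2 ** 2) / 8
    return 1 + c * e1 / 2 + c2 * e1 ** 2


g1, g2 = 0.8, 0.5  # illustrative values only
for e1 in (1e-1, 1e-2, 1e-3):
    err = abs(exact_factor(e1, g1, g2) - quadratic_factor(e1, g1, g2))
    print(f"delta_e1 = {e1:g}: |error| = {err:.3e}")
```

Each tenfold reduction of $\delta e_1$ should shrink the error by roughly a factor of 1000; a factor near 100 would instead point to a wrong quadratic coefficient.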

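Taking expectations of both sides of Equation (25) and replacing $\delta e_0$, $\delta e_1$, $\delta e_1^2$, and $\delta e_0\delta e_1$ by their expected values gives the bias of $\hat{\delta}_{Me}$:

$$\mathrm{Bias}(\hat{\delta}_{Me}) \approx \frac{\theta}{8}\delta_{My}\left[\left(L_1^2g_1^2 + L_2^2g_2^2 - 2L_1L_2g_1g_2 - 2L_1g_1^2 + 2L_2g_2^2\right)\Delta_{Mx}^2 + 4\left(L_1g_1 - L_2g_2\right)\Delta_{Myx}\right]. \tag{26}$$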

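Squaring both sides of Equation (25), keeping terms up to the second degree, and taking expectations gives the MSE of $\hat{\delta}_{Me}$:

$$\mathrm{MSE}(\hat{\delta}_{Me}) \approx \theta\delta_{My}^2\left[\Delta_{My}^2 + \frac{L_1^2g_1^2 + L_2^2g_2^2 - 2L_1L_2g_1g_2}{4}\Delta_{Mx}^2 + \left(L_1g_1 - L_2g_2\right)\Delta_{Myx}\right]. \tag{27}$$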

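Substituting $L_1 = L_2 = 1$ into Equations (26) and (27) and simplifying, we obtain the results stated in the theorem:

$$\mathrm{Bias}(\hat{\delta}_{Me}) \approx \frac{\theta}{8}\delta_{My}\left[\left(3g_2^2 - g_1^2 - 2g_1g_2\right)\Delta_{Mx}^2 + 4(g_1 - g_2)\Delta_{Myx}\right] \tag{28}$$

and

$$\mathrm{MSE}(\hat{\delta}_{Me}) \approx \theta\delta_{My}^2\left[\Delta_{My}^2 + \frac{g_1^2 + g_2^2 - 2g_1g_2}{4}\Delta_{Mx}^2 + (g_1 - g_2)\Delta_{Myx}\right]. \tag{29}$$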
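The first-order bias and MSE of the proposed family can be coded directly in terms of $g_1$ and $g_2$. The Python sketch below is hypothetical helper code written from the second-degree expansion of (24). A useful consistency check: with $(g_1, g_2) = (0, 1)$ the expressions reduce to those of the exponential ratio estimator in (10) and (12), with $(1, 0)$ the MSE reduces to the product-type MSE (13), and with $g_1 = g_2$ the MSE reduces to the variance (2) of the sample median. The usage lines evaluate $g_1$ and $g_2$ for member $\hat{\delta}_{Me1}$ of Table 1 ($a_1 = QR$, $a_2 = DM$, $b_2 = X_{\max} - X_{\min}$) using the Population 1 summaries of Section 5.2.

```python
def g_values(med_x, a1, a2, b2, b1=1.0):
    """g1 and g2 as defined in the proof of Theorem 1 (b1 = 1 in the paper)."""
    g1 = a1 * med_x / (a1 * med_x + a2)
    g2 = b1 * med_x / (b1 * med_x + b2)
    return g1, g2


def first_order_bias_mse(theta, med_y, cy, cx, rho, g1, g2, L1=1, L2=1):
    """First-order bias and MSE of the proposed family; cy, cx = Delta_My, Delta_Mx."""
    cyx = rho * cy * cx                      # Delta_Myx
    c = L1 * g1 - L2 * g2
    bias = theta * med_y / 8 * ((c ** 2 - 2 * L1 * g1 ** 2 + 2 * L2 * g2 ** 2) * cx ** 2
                                + 4 * c * cyx)
    mse = theta * med_y ** 2 * (cy ** 2 + c ** 2 * cx ** 2 / 4 + c * cyx)
    return bias, mse


# Population 1 summaries: median of X = 4.715, QR = 4.618, DM = 5.378, X_max - X_min = 4.618.
g1, g2 = g_values(med_x=4.715, a1=4.618, a2=5.378, b2=4.618)
print(round(g1, 4), round(g2, 4))
```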

4. Mathematical Comparison

In this section, we derive efficiency conditions by comparing the MSE of the proposed family $\hat{\delta}_{Me}$ with the MSEs of the existing estimators $\hat{\delta}_{My}$, $\hat{\delta}_{MR}$, $\hat{\delta}_{MD}$, $\hat{\delta}_{MRe}$, $\hat{\delta}_{MPe}$, $\hat{\delta}_{MD1}$, $\hat{\delta}_{MD2}$, and $\hat{\delta}_{MD3}$.

(i) Comparing the MSE of the new family of estimators derived in (29) with the variance of the sample median in (2) gives the following condition:

$V(\hat{\delta}_{My}) > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2\right)\Delta_{Mx}^2 < -4(g_1 - g_2)\Delta_{Myx}.$$

(ii) Comparing the MSE of the new family of estimators derived in (29) with the MSE of the ratio estimator in (5) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MR}) > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2 - 4\right)\Delta_{Mx}^2 < -4(g_1 - g_2 + 2)\Delta_{Myx}.$$

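(iii) Comparing the MSE of the new family of estimators derived in (29) with the minimum MSE of the difference-type estimator in (7) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MD})_{\min} > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2\right)\Delta_{Mx}^2 < -4\left[(g_1 - g_2)\Delta_{Myx} + \rho_{yx}^2\Delta_{My}^2\right].$$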

(iv) Comparing the MSE of the new family of estimators derived in (29) with the MSE of the exponential ratio-type estimator in (12) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MRe}) > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2 - 1\right)\Delta_{Mx}^2 < -4(g_1 - g_2 + 1)\Delta_{Myx}.$$

(v) Comparing the MSE of the new family of estimators derived in (29) with the MSE of the exponential product-type estimator in (13) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MPe}) > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2 - 1\right)\Delta_{Mx}^2 < -4(g_1 - g_2 - 1)\Delta_{Myx}.$$

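(vi) Comparing the MSE of the new family of estimators derived in (29) with the minimum MSE of the difference estimator $\hat{\delta}_{MD1}$ in (20) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MD1})_{\min} > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2\right)\Delta_{Mx}^2 + 4(g_1 - g_2)\Delta_{Myx} < -\frac{4\Delta_{My}^2\left[\rho_{yx}^2 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)\right]}{1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)}.$$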

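(vii) Comparing the MSE of the new family of estimators derived in (29) with the minimum MSE of the difference estimator $\hat{\delta}_{MD2}$ in (21) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MD2})_{\min} > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$\left(g_1^2 + g_2^2 - 2g_1g_2\right)\Delta_{Mx}^2 + 4(g_1 - g_2)\Delta_{Myx} < -\frac{4\Delta_{My}^2\left[\rho_{yx}^2 + \theta\Delta_{My}^2 - \theta\rho_{yx}^2\left(\Delta_{My}^2 + \Delta_{Mx}^2\right)\right]}{1 - \theta\Delta_{Mx}^2 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)}.$$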

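(viii) Comparing the MSE of the new family of estimators derived in (29) with the minimum MSE of the difference estimator $\hat{\delta}_{MD3}$ in (22) gives the following condition:

$\mathrm{MSE}(\hat{\delta}_{MD3})_{\min} > \mathrm{MSE}(\hat{\delta}_{Me})$ if

$$4\Delta_{My}^2 + \left(g_1^2 + g_2^2 - 2g_1g_2\right)\Delta_{Mx}^2 + 4(g_1 - g_2)\Delta_{Myx} < \frac{\Delta_{My}^2(1 - \rho_{yx}^2)\left(4 - \theta\Delta_{Mx}^2\right) - \frac{\theta}{16}\Delta_{Mx}^4}{1 + \theta\Delta_{My}^2(1 - \rho_{yx}^2)}.$$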

5. Results and Discussion

To compare the effectiveness of the new family of estimators with the existing estimators, we generate five simulated populations from suitable distributions, most of them positively skewed. In addition, three real datasets are used to confirm the performance of the proposed estimators.

5.1. Simulation Study

The distribution best suited to median estimation depends on the statistical properties of the data; the median is particularly useful for skewed, outlier-containing, and non-normal data. The auxiliary variable X is generated from each of the following five distributions:

Population 1: X∼Moderate skew and spread Gamma distribution (α=5,β=2) with ρyx=0.5;

Population 2: X∼Slight skew Log-Normal distribution (μ=3,σ=2) with ρyx=0.35;

Population 3: X∼Heavy tails Cauchy distribution (γ0=6,γ=3) with ρyx=0.45;

Population 4: X∼Baseline Uniform distribution (a=6,b=19) with ρyx=0;

Population 5: X∼High skew Exponential (μ=12) with ρyx=0.75.

Together, these five distributions cover moderate skew, heavy tails, a uniform baseline, and strong skew, which makes them suitable for assessing the robustness of the proposed estimators under different scenarios. The study variable Y is then generated as

$$Y = \rho_{yx}X + e,$$

where $\rho_{yx}$ is the correlation parameter listed for each population and $e \sim N(0, 1)$ is the error term.
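The data-generating step can be sketched in Python as follows (the original study used R). The parameterizations are assumptions, since the text does not fix them: $\beta = 2$ is treated as a Gamma scale, $(\mu, \sigma)$ as the mean and standard deviation of $\log X$, $\gamma_0$ and $\gamma$ as Cauchy location and scale, and $\mu = 12$ as the exponential mean. One property worth keeping in mind: when $X$ has finite variance $\sigma_X^2$, the model $Y = \rho_{yx}X + e$ gives a Pearson correlation of $\rho_{yx}\sigma_X/\sqrt{\rho_{yx}^2\sigma_X^2 + 1}$, so the realized correlation generally differs from $\rho_{yx}$ (and the population correlation is undefined in the Cauchy case). The function names are hypothetical.

```python
import math
import random


def make_population(dist, rho, N=800, seed=1):
    """Generate (x, y) with Y = rho * X + e, e ~ N(0, 1), as described in Section 5.1."""
    rng = random.Random(seed)
    draw_x = {
        "gamma": lambda: rng.gammavariate(5, 2),         # assumed: shape 5, scale 2
        "lognormal": lambda: rng.lognormvariate(3, 2),   # mean 3, sd 2 of log X
        "cauchy": lambda: 6 + 3 * math.tan(math.pi * (rng.random() - 0.5)),  # location 6, scale 3
        "uniform": lambda: rng.uniform(6, 19),
        "exponential": lambda: rng.expovariate(1 / 12),  # assumed: mean 12
    }[dist]
    x = [draw_x() for _ in range(N)]
    y = [rho * xi + rng.gauss(0, 1) for xi in x]
    return x, y


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x, y = make_population("gamma", 0.5)
# Theory under the assumed scale parameterization: about 0.913, not 0.5.
print(round(pearson(x, y), 3))
```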

To evaluate robustness and efficiency, we compared the MSEs of the proposed and existing estimators for each distribution and correlation setting, following the procedures of [29,30,31] and using R (version 4.4.0). The steps are as follows.

Step 1: Using the models described above, simulate $N = 800$ observations of $X$ and $Y$.

Step 2: Draw a sample of size $n$ by simple random sampling without replacement (SRSWOR), for $n = 25, 100, 150$, and $200$.

Step 3: From the sampled data, compute the statistics required by each estimator (such as the sample median, mean, variance, or covariance). For the existing estimators that involve unknown constants, the optimal values of these constants are also determined.

Step 4: For every sample size, compute each estimator discussed in this article and its squared error.

Step 5: Repeat Steps 2 to 4 50,000 times and compute the empirical mean squared error as

$$\mathrm{MSE}(\hat{\delta}_{Mt}) = \frac{1}{50{,}000}\sum_{k=1}^{50{,}000}\left(\hat{\delta}_{Mt,k} - \delta_{My}\right)^2,$$

where $\hat{\delta}_{Mt,k}$ is the value of estimator $t$ in the $k$-th replication and $t \in \{R, D, Re, Pe, D1, D2, D3, e1, e2, \ldots, e8\}$ denotes the subscripts of the existing and proposed estimators.
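Steps 2 to 5 can be sketched as a Monte Carlo loop. The Python function below is hypothetical and deliberately small: it tracks only the sample median (1) and the ratio estimator (3), treats the median of the generated $X$ values as the known $\delta_{Mx}$, and, to keep the run short, uses far fewer replications than the 50,000 of Step 5. The other estimators would be added to the same loop.

```python
import random
import statistics


def monte_carlo_mse(x, y, n, reps, seed=2024):
    """Empirical MSE of two median estimators under SRSWOR (Steps 2 to 5)."""
    rng = random.Random(seed)
    med_y = statistics.median(y)   # delta_My, the target
    med_x = statistics.median(x)   # delta_Mx, assumed known
    units = range(len(x))
    sq_err = {"sample median": 0.0, "ratio": 0.0}
    for _ in range(reps):
        s = rng.sample(units, n)                  # SRSWOR sample of unit labels
        my = statistics.median([y[i] for i in s])
        mx = statistics.median([x[i] for i in s])
        sq_err["sample median"] += (my - med_y) ** 2
        sq_err["ratio"] += (my * med_x / mx - med_y) ** 2
    return {name: total / reps for name, total in sq_err.items()}


# Illustrative population (assumed Gamma shape 5, scale 2; rho = 0.5), N = 800.
pop_rng = random.Random(1)
x = [pop_rng.gammavariate(5, 2) for _ in range(800)]
y = [0.5 * xi + pop_rng.gauss(0, 1) for xi in x]
for n in (25, 100, 150, 200):
    print(n, monte_carlo_mse(x, y, n, reps=1000))
```

Seeding the sampler separately from the population generator keeps the population fixed while the replications vary, which mirrors the design in which one population of size $N$ is sampled repeatedly.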

5.2. Real-Life Application

We now assess the performance of the proposed family of estimators relative to the other estimators by comparing their MSE values on three real datasets. Each dataset is described below, together with a summary of its most important parameters and characteristics.

Population 1 

(Source: [32]). Y: Denotes the total number of households; X: Denotes the total area in square miles.

N=128,n=45,Xmin=5.961,Xmax=10.579,δMx=4.715,δMy=686,fx(δMx)=0.1154,fy(δMy)=0.00092,ρyx=0.468,TM=5.385,DM=5.378,QR=4.618,QA=5.316,QD=2.309,MR=8.270.

Population 2 

(Source: [33]). Y: Denotes the overall count of educators employed within educational institutions; X: Denotes the overall count of students enrolled in educational institutions in 2012.

N=923,n=180,Xmin=85,388.5,Xmax=93,671.5,δMx=4123,δMy=171,ρyx=0.155,fx(δMx)=0.00009409,fy(δMy)=0.002676,TM=7726,DM=7348.3,QR=8283,QA=5870.5,QD=4141.5,MR=89,530.

Population 3 

(Source: [13]). Y: Denotes the overall amount of fish harvested in 1995; X: Denotes the total number of fish harvested by recreational marine fishermen in 1994.

N=69,n=17,Xmin=15,055,Xmax=19,005,δMx=2007,δMy=2068,ρyx=0.314,fx(δMx)=0.00014,fy(δMy)=0.00014,TM=3777,DM=3615.2,QR=3936,QA=3002,QD=1975,MR=17,030.

We now compute the mean squared error values of all estimators for these three populations. The results, which highlight the performance of the proposed family of estimators, are given in Table 2.
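As a check on the real-data calculations, several first-order expressions from Section 2 can be evaluated directly from the summaries listed above. In the Python sketch below (hypothetical names), $\theta$, $\Delta_{My}$, and $\Delta_{Mx}$ are computed from $N$, $n$, the medians, and the densities at the medians. With the Population 1 values, the sample-median variance (2) comes to about 4256.18 and the difference-estimator MSE (7) to about 3323.97, in line with rows [1] and [3] of Table 2.

```python
from dataclasses import dataclass


@dataclass
class MedianPopulation:
    """Summary values reported for a real population (Section 5.2)."""
    N: int        # population size
    n: int        # sample size
    med_y: float  # delta_My, population median of Y
    med_x: float  # delta_Mx, population median of X
    f_y: float    # f_y(delta_My), density of Y at its median
    f_x: float    # f_x(delta_Mx), density of X at its median
    rho: float    # rho_yx


def first_order_mse(p: MedianPopulation) -> dict:
    """First-order MSE expressions of Section 2 evaluated at the given summaries."""
    theta = 0.25 * (1 / p.n - 1 / p.N)   # finite population correction factor
    cy = 1 / (p.med_y * p.f_y)           # Delta_My
    cx = 1 / (p.med_x * p.f_x)           # Delta_Mx
    cyx = p.rho * cy * cx                # Delta_Myx
    scale = theta * p.med_y ** 2
    a = cy ** 2 * (1 - p.rho ** 2)       # Delta_My^2 (1 - rho^2)
    return {
        "sample median": scale * cy ** 2,                        # Eq. (2)
        "ratio": scale * (cy ** 2 + cx ** 2 - 2 * cyx),          # Eq. (5)
        "difference": scale * a,                                 # Eq. (7)
        "exp ratio": scale * (cy ** 2 + cx ** 2 / 4 - cyx),      # Eq. (12)
        "exp product": scale * (cy ** 2 + cx ** 2 / 4 + cyx),    # Eq. (13)
        "D1": scale * a / (1 + theta * a),                       # Eq. (20)
        "D2": (scale * (1 - theta * cx ** 2) * a
               / (1 - theta * cx ** 2 + theta * a)),             # Eq. (21)
    }


pop1 = MedianPopulation(N=128, n=45, med_y=686, med_x=4.715,
                        f_y=0.00092, f_x=0.1154, rho=0.468)
for name, value in first_order_mse(pop1).items():
    print(f"{name:<14} {value:10.3f}")
```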

5.3. Discussion

We performed simulations using suitable distributions with different $\rho_{yx}$ values and sample sizes to assess the median estimators, and we also evaluated the proposed family of estimators on three real datasets. The mean squared error (MSE) criterion is used to compare the estimators. For the five simulated distributions, the MSE values of the new family and the existing estimators are shown in Table 3, Table 4, Table 5 and Table 6; the results for the real datasets are shown in Table 2. We draw the following main conclusions from these analyses:

The MSE values for all new estimators are less than those of the other existing estimators covered in Section 2, according to the findings of both simulated and real datasets, which are shown in Table 2, Table 3, Table 4, Table 5 and Table 6. This illustrates the better performance of the newly proposed estimators over existing ones.

Furthermore, the graphs in Figure 1 and Figure 2, for the five simulated distributions and the three real datasets, show that the MSE curves drop for the new estimators (labels 9 to 16) and that their MSE values are consistently lower than those of the existing estimators (labels 1 to 8). This consistent gap supports the conclusion that the new family of estimators outperforms existing methods.

The box plot given in Figure 3 illustrates the distribution of mean squared error (MSE) values for different sample sizes (25, 100, 150, and 200) across five distributions: Gamma, Lognormal, Cauchy, Uniform, and Exponential. As sample size increases, the MSE values tend to decrease, indicating improved estimation accuracy with larger samples. The plot also highlights variability in MSE across the different distributions, with some showing wider spreads and others demonstrating more consistent performance.

The bar chart displayed in Figure 4 shows MSE values for different estimators across five distributions and four sample sizes. MSE generally decreases with larger sample sizes, indicating better estimator performance. Distributions like Lognormal and Exponential show higher MSEs, suggesting more estimation difficulty compared to Gamma and Uniform.

6. Conclusions

In this work, we used robust measures of an auxiliary variable to construct a new family of estimators of the finite population median under simple random sampling. The first order of approximation provided a framework for deriving the biases and mean squared errors of both the existing estimators and the newly developed ones, clarifying their performance and potential areas for improvement. We compared the new and existing estimators under the mean squared error criterion through a simulation study based on five distributions, with several correlation levels and sample sizes, and through three real-life datasets. The results in Table 2, Table 3, Table 4, Table 5 and Table 6 show that the new family of estimators performs well and yields the most efficient estimators among those compared; in these comparisons, every new estimator was more efficient than the existing estimators.

The improved median estimation method has broad practical relevance. In economics, it enhances the accuracy of income-distribution analysis by reducing the impact of outliers. In climate research and environmental monitoring, it provides stable estimates despite extreme events, aiding in model reliability and risk assessment. Medical studies benefit from more robust estimates of patient health indicators, leading to sounder conclusions. In education and social sciences, it helps interpret central tendencies accurately, even in the presence of skewed data, thereby supporting better policy and decision-making.

Furthermore, our analysis focused on the properties of the new estimators under simple random sampling. A natural extension is to develop estimators based on these findings for systematic sampling and stratified random sampling, with the goal of achieving even lower MSE values. This topic offers an interesting direction for further investigation.

Author Contributions

Conceptualization, F.A.A. and A.S.A.; methodology, F.A.A. and A.S.A.; software, F.A.A. and A.S.A.; validation, F.A.A. and A.S.A.; formal analysis, F.A.A. and A.S.A.; investigation, F.A.A. and A.S.A.; resources, A.S.A.; data curation, F.A.A. and A.S.A.; writing-original draft preparation, F.A.A. and A.S.A.; writing-review and editing, F.A.A. and A.S.A.; visualization, A.S.A.; supervision, F.A.A.; project administration, A.S.A.; funding acquisition, F.A.A. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The real data are secondary, and their sources are given in the data section; the simulated data were generated using R (version 4.4.0).

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 A graphical display of the findings utilizing information obtained from different population distributions. In the figure, the vertical axis represents the mean squared errors (MSEs) of the estimators, while the horizontal axis denotes the corresponding estimators. To enhance clarity, each estimator is assigned a numerical label from 1 to 16. For further details, please refer to Table 3, Table 4, Table 5 and Table 6. (a) XGam(α=5,β=2) with ρyx=0.5. (b) XLN(μ=3,σ=2) with ρyx=0.35. (c) XC(γ0=6,γ=3) with ρyx=0.45. (d) XUni(a=6,b=19) with ρyx=0. (e) XExp(μ=12) with ρyx=0.75.


Figure 2 A graphical display of the findings utilizing information obtained from different datasets. In the figures, the vertical axis represents the mean squared errors (MSEs) of the estimators, while the horizontal axis denotes the corresponding estimators. To enhance clarity, each estimator is assigned a numerical label from 1 to 16. For further details, please refer to Table 2. (a) Source: [13]. (b) Source: [32]. (c) Source: [33].


Figure 3 A graphical display of the findings obtained from the simulated population distributions. The vertical axis represents the mean squared errors (MSEs) of the estimators, while the horizontal axis denotes the corresponding distributions and sample sizes.


Figure 4 A graphical display of the findings obtained from the simulated population distributions. The vertical axis represents the mean squared errors (MSEs) of the estimators, while the horizontal axis denotes the corresponding distributions and sample sizes.


Some classes of the proposed estimator.

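| Member of the proposed family $\hat{\delta}_{Me}$ | $a_1$ | $a_2$ |
|---|---|---|
| $\hat{\delta}_{Me1} = \hat{\delta}_{My}\exp\left[\frac{L_1 QR(\hat{\delta}_{Mx} - \delta_{Mx})}{QR(\delta_{Mx} + \hat{\delta}_{Mx}) + 2DM}\right]\exp(K)$ | $QR$ | $DM$ |
| $\hat{\delta}_{Me2} = \hat{\delta}_{My}\exp\left[\frac{L_1 MR(\hat{\delta}_{Mx} - \delta_{Mx})}{MR(\delta_{Mx} + \hat{\delta}_{Mx}) + 2TM}\right]\exp(K)$ | $MR$ | $TM$ |
| $\hat{\delta}_{Me3} = \hat{\delta}_{My}\exp\left[\frac{L_1 QA(\hat{\delta}_{Mx} - \delta_{Mx})}{QA(\delta_{Mx} + \hat{\delta}_{Mx}) + 2QD}\right]\exp(K)$ | $QA$ | $QD$ |
| $\hat{\delta}_{Me4} = \hat{\delta}_{My}\exp\left[\frac{L_1 QD(\hat{\delta}_{Mx} - \delta_{Mx})}{QD(\delta_{Mx} + \hat{\delta}_{Mx}) + 2QA}\right]\exp(K)$ | $QD$ | $QA$ |
| $\hat{\delta}_{Me5} = \hat{\delta}_{My}\exp\left[\frac{L_1 TM(\hat{\delta}_{Mx} - \delta_{Mx})}{TM(\delta_{Mx} + \hat{\delta}_{Mx}) + 2MR}\right]\exp(K)$ | $TM$ | $MR$ |
| $\hat{\delta}_{Me6} = \hat{\delta}_{My}\exp\left[\frac{L_1 DM(\hat{\delta}_{Mx} - \delta_{Mx})}{DM(\delta_{Mx} + \hat{\delta}_{Mx}) + 2QR}\right]\exp(K)$ | $DM$ | $QR$ |
| $\hat{\delta}_{Me7} = \hat{\delta}_{My}\exp\left[\frac{L_1 MR(\hat{\delta}_{Mx} - \delta_{Mx})}{MR(\delta_{Mx} + \hat{\delta}_{Mx}) + 2QA}\right]\exp(K)$ | $MR$ | $QA$ |
| $\hat{\delta}_{Me8} = \hat{\delta}_{My}\exp\left[\frac{L_1 QD(\hat{\delta}_{Mx} - \delta_{Mx})}{QD(\delta_{Mx} + \hat{\delta}_{Mx}) + 2TM}\right]\exp(K)$ | $QD$ | $TM$ |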

Mean squared error of various estimators using real populations.

Estimator Population-1 Population-2 Population-3
[1] δ^My 4256.180 156.129 565,443.600
[2] δ^MR 5361.578 316.274 799,889.900
[3] δ^MD 3323.974 152.378 509,693.100
[4] δ^MRe 3377.321 181.892 532,582.300
[5] δ^MPe 7998.154 238.983 898,473.600
[6] δ^MD1 3300.661 151.588 455,416.100
[7] δ^MD2 3300.375 151.582 447,631.900
[8] δ^MD3 4429.561 210.612 466,361.300
[9] δ^Me1 3011.078 142.217 388,892.200
[10] δ^Me2 2730.148 142.220 388,834.300
[11] δ^Me3 2640.630 142.219 388,857.600
[12] δ^Me4 2629.768 142.215 388,942.200
[13] δ^Me5 2619.683 142.178 389,190.600
[14] δ^Me6 2599.298 141.967 388,830.500
[15] δ^Me7 2725.291 142.220 388,618.400
[16] δ^Me8 2702.546 142.214 388,974.800

Mean squared error (MSE) values for n=25.

Estimator Gam ( 5 , 2 ) LN ( 3 , 2 ) C ( 6 , 3 ) Uni ( 6 , 19 ) Exp ( 0.5 )
[1] δ^My 5.3 × 10 3 6.1 × 10 2 7.0 × 10 3 3.0 × 10 2 4.0 × 10 2
[2] δ^MR 6.0 × 10 3 6.8 × 10 2 7.5 × 10 3 3.7 × 10 2 4.9 × 10 2
[3] δ^MD 4.1 × 10 3 4.9 × 10 2 6.1 × 10 3 2.7 × 10 2 3.5 × 10 2
[4] δ^MRe 3.6 × 10 3 4.7 × 10 2 5.4 × 10 3 2.5 × 10 2 3.2 × 10 2
[5] δ^MPe 4.5 × 10 3 5.4 × 10 2 6.3 × 10 3 2.9 × 10 2 3.6 × 10 2
[6] δ^MD1 2.8 × 10 3 3.7 × 10 2 4.7 × 10 3 2.0 × 10 2 2.6 × 10 2
[7] δ^MD2 2.3 × 10 3 3.5 × 10 2 4.4 × 10 3 1.9 × 10 2 2.4 × 10 2
[8] δ^MD3 3.5 × 10 3 4.4 × 10 2 4.9 × 10 3 2.4 × 10 2 3.0 × 10 2
[9] δ^Me1 1.7 × 10 3 2.8 × 10 2 3.4 × 10 3 1.5 × 10 2 1.8 × 10 2
[10] δ^Me2 1.8 × 10 3 2.9 × 10 2 3.6 × 10 3 1.6 × 10 2 1.9 × 10 2
[11] δ^Me3 1.9 × 10 3 3.0 × 10 2 3.9 × 10 3 1.7 × 10 2 1.9 × 10 2
[12] δ^Me4 1.7 × 10 3 2.5 × 10 2 3.3 × 10 3 1.2 × 10 2 1.7 × 10 2
[13] δ^Me5 1.5 × 10 3 2.4 × 10 2 3.2 × 10 3 1.1 × 10 2 1.5 × 10 2
[14] δ^Me6 1.7 × 10 3 2.6 × 10 2 3.1 × 10 3 1.3 × 10 2 1.6 × 10 2
[15] δ^Me7 1.4 × 10 3 2.3 × 10 2 2.9 × 10 3 1.0 × 10 2 1.4 × 10 2
[16] δ^Me8 1.5 × 10 3 2.4 × 10 2 3.0 × 10 3 1.2 × 10 2 1.5 × 10 2

Mean squared error (MSE) values for n=100.

Estimator Gam(5,2) LN(3,2) C(6,3) Uni(6,19) Exp(0.5)
[1] δ^My 4.7×10³ 5.6×10² 6.5×10³ 2.7×10² 3.6×10²
[2] δ^MR 5.5×10³ 6.4×10² 7.3×10³ 3.5×10² 4.5×10²
[3] δ^MD 3.6×10³ 4.5×10² 5.3×10³ 2.3×10² 3.0×10²
[4] δ^MRe 3.5×10³ 4.3×10² 5.1×10³ 2.2×10² 2.9×10²
[5] δ^MPe 4.2×10³ 5.1×10² 6.0×10³ 2.5×10² 3.4×10²
[6] δ^MD1 2.5×10³ 3.4×10² 4.2×10³ 1.7×10² 2.3×10²
[7] δ^MD2 2.0×10³ 3.0×10² 4.0×10³ 1.5×10² 2.1×10²
[8] δ^MD3 3.2×10³ 4.1×10² 4.5×10³ 2.1×10² 2.7×10²
[9] δ^Me1 1.5×10³ 2.5×10² 3.2×10³ 1.2×10² 1.6×10²
[10] δ^Me2 1.6×10³ 2.6×10² 3.3×10³ 1.3×10² 1.7×10²
[11] δ^Me3 1.7×10³ 2.7×10² 3.4×10³ 1.4×10² 1.7×10²
[12] δ^Me4 1.4×10³ 2.3×10² 3.1×10³ 1.1×10² 1.5×10²
[13] δ^Me5 1.3×10³ 2.2×10² 2.9×10³ 1.0×10² 1.4×10²
[14] δ^Me6 1.4×10³ 2.4×10² 3.0×10³ 1.2×10² 1.5×10²
[15] δ^Me7 1.2×10³ 2.1×10² 2.7×10³ 0.9×10² 1.3×10²
[16] δ^Me8 1.3×10³ 2.2×10² 2.8×10³ 1.0×10² 1.4×10²

Mean squared error (MSE) values for n=150.

Estimator Gam(5,2) LN(3,2) C(6,3) Uni(6,19) Exp(0.5)
[1] δ^My 4.5×10³ 5.4×10² 6.3×10³ 2.6×10² 3.5×10²
[2] δ^MR 5.3×10³ 6.2×10² 7.1×10³ 3.4×10² 4.4×10²
[3] δ^MD 3.5×10³ 4.4×10² 5.2×10³ 2.2×10² 2.9×10²
[4] δ^MRe 3.4×10³ 4.2×10² 5.0×10³ 2.1×10² 2.8×10²
[5] δ^MPe 4.1×10³ 5.0×10² 5.9×10³ 2.4×10² 3.3×10²
[6] δ^MD1 2.4×10³ 3.3×10² 4.1×10³ 1.6×10² 2.2×10²
[7] δ^MD2 1.9×10³ 2.9×10² 3.9×10³ 1.4×10² 2.0×10²
[8] δ^MD3 3.1×10³ 4.0×10² 4.4×10³ 2.0×10² 2.6×10²
[9] δ^Me1 1.4×10³ 2.4×10² 3.1×10³ 1.1×10² 1.5×10²
[10] δ^Me2 1.5×10³ 2.5×10² 3.2×10³ 1.2×10² 1.6×10²
[11] δ^Me3 1.6×10³ 2.6×10² 3.3×10³ 1.3×10² 1.6×10²
[12] δ^Me4 1.3×10³ 2.2×10² 3.0×10³ 1.0×10² 1.4×10²
[13] δ^Me5 1.2×10³ 2.1×10² 2.8×10³ 0.9×10² 1.3×10²
[14] δ^Me6 1.3×10³ 2.3×10² 2.9×10³ 1.1×10² 1.4×10²
[15] δ^Me7 1.1×10³ 2.0×10² 2.6×10³ 0.8×10² 1.2×10²
[16] δ^Me8 1.2×10³ 2.1×10² 2.7×10³ 0.9×10² 1.3×10²

Mean squared error (MSE) values for n=200.

Estimator Gam(5,2) LN(3,2) C(6,3) Uni(6,19) Exp(0.5)
[1] δ^My 4.3×10³ 5.2×10² 6.1×10³ 2.5×10² 3.4×10²
[2] δ^MR 5.1×10³ 6.0×10² 7.0×10³ 3.2×10² 4.2×10²
[3] δ^MD 3.3×10³ 4.2×10² 5.0×10³ 2.1×10² 2.8×10²
[4] δ^MRe 3.2×10³ 4.0×10² 4.8×10³ 2.0×10² 2.6×10²
[5] δ^MPe 3.9×10³ 4.8×10² 5.7×10³ 2.3×10² 3.2×10²
[6] δ^MD1 2.3×10³ 3.2×10² 4.0×10³ 1.5×10² 2.1×10²
[7] δ^MD2 1.8×10³ 2.8×10² 3.7×10³ 1.3×10² 1.9×10²
[8] δ^MD3 3.0×10³ 3.9×10² 4.3×10³ 1.9×10² 2.5×10²
[9] δ^Me1 1.3×10³ 2.3×10² 3.0×10³ 1.0×10² 1.4×10²
[10] δ^Me2 1.4×10³ 2.4×10² 3.1×10³ 1.1×10² 1.5×10²
[11] δ^Me3 1.5×10³ 2.5×10² 3.2×10³ 1.2×10² 1.6×10²
[12] δ^Me4 1.2×10³ 2.1×10² 2.9×10³ 0.9×10² 1.3×10²
[13] δ^Me5 1.1×10³ 2.0×10² 2.7×10³ 0.8×10² 1.2×10²
[14] δ^Me6 1.2×10³ 2.2×10² 2.8×10³ 1.0×10² 1.3×10²
[15] δ^Me7 1.0×10³ 1.9×10² 2.5×10³ 0.7×10² 1.1×10²
[16] δ^Me8 1.1×10³ 2.0×10² 2.6×10³ 0.8×10² 1.2×10²
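The simulated MSEs above come from repeated sampling of a finite population. The sketch below shows, under stated assumptions, how such Monte Carlo MSEs can be approximated for two of the simpler estimators. The first is the sample median, which we take δ^My to denote. The second is the classical ratio-type median estimator M̂_y·M_x/M̂_x, which we assume corresponds to δ^MR; this section does not define either estimator. The population model, the population size N, the replication count, and the function names `make_population` and `simulated_mse` are hypothetical. The proposed estimators δ^Me1 to δ^Me8 are not implemented here, and the printed numbers are not expected to reproduce the tables.

```python
import random
import statistics


def make_population(N, seed=1):
    """Hypothetical population (assumption): auxiliary x ~ Gamma(shape=5, scale=2)
    and study variable y = 3x + Gamma(shape=2, scale=2) noise, so x and y are
    positively related and both strictly positive."""
    rng = random.Random(seed)
    x = [rng.gammavariate(5.0, 2.0) for _ in range(N)]
    y = [3.0 * xi + rng.gammavariate(2.0, 2.0) for xi in x]
    return x, y


def simulated_mse(x, y, n, reps=2000, seed=2):
    """Monte Carlo MSEs of (sample median, ratio-type median estimator) of the
    population median of y under simple random sampling without replacement."""
    rng = random.Random(seed)
    N = len(y)
    M_y = statistics.median(y)  # target: finite population median of y
    M_x = statistics.median(x)  # auxiliary population median, assumed known
    sse_median = sse_ratio = 0.0
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # one SRSWOR sample of unit indices
        m_y = statistics.median([y[i] for i in idx])
        m_x = statistics.median([x[i] for i in idx])
        ratio_est = m_y * M_x / m_x  # classical ratio-type median estimator
        sse_median += (m_y - M_y) ** 2
        sse_ratio += (ratio_est - M_y) ** 2
    return sse_median / reps, sse_ratio / reps


if __name__ == "__main__":
    x, y = make_population(N=2000)
    for n in (25, 100, 150, 200):
        mse_median, mse_ratio = simulated_mse(x, y, n)
        print(f"n={n}: MSE(sample median)={mse_median:.4f}, MSE(ratio)={mse_ratio:.4f}")
```

Whether the ratio-type estimator beats the sample median in such a run depends on how strongly y and x are related in the assumed population; the loop over n mirrors the four sample sizes used in the tables.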

References

1. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat.-Theory Methods; 2023; 52, pp. 2610-2624. [DOI: https://dx.doi.org/10.1080/03610926.2021.1955388]

2. Zaman, T.; Bulut, H. A simulation study: Robust ratio double sampling estimator of finite population mean in the presence of outliers. Sci. Iran.; 2021; 31, pp. 1330-1341. [DOI: https://dx.doi.org/10.24200/SCI.2021.55813.4418]

3. Alghamdi, A.S.; Alrweili, H. New class of estimators for finite population mean under stratified double phase sampling with simulation and real-life application. Mathematics; 2025; 13, 329. [DOI: https://dx.doi.org/10.3390/math13030329]

4. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods; 2018; 17, 20. [DOI: https://dx.doi.org/10.22237/jmasm/1532007537]

5. Alomair, M.A.; Daraz, U. Dual transformation of auxiliary variables by using outliers in stratified random sampling. Mathematics; 2024; 12, 2839. [DOI: https://dx.doi.org/10.3390/math12182839]

6. Gross, S. Median estimation in sample surveys. Proceedings of the Section on Survey Research Methods; Houston, TX, USA, 11–14 August 1980; American Statistical Association Ithaca: Alexandria, VA, USA, 1980.

7. Sedransk, J.; Meyer, J. Confidence intervals for the quantiles of a finite population: Simple random and stratified simple random sampling. J. R. Stat. Soc. Ser. B (Methodol.); 1978; 40, pp. 239-252. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1978.tb01670.x]

8. Philip, S.; Sedransk, J. Lower bounds for confidence coefficients for confidence intervals for finite population quantiles. Commun. Stat.-Theory Methods; 1983; 12, pp. 1329-1344. [DOI: https://dx.doi.org/10.1080/03610928308828534]

9. Kuk, Y.C.A.; Mak, T.K. Median estimation in the presence of auxiliary information. J. R. Stat. Soc. Ser. B; 1989; 51, pp. 261-269. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1989.tb01763.x]

10. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods; 1991; 20, pp. 3325-3340. [DOI: https://dx.doi.org/10.1080/03610929108830705]

11. Singh, S.; Joarder, A.H.; Tracy, D.S. Median estimation using double sampling. Aust. N. Z. J. Stat.; 2001; 43, pp. 33-46. [DOI: https://dx.doi.org/10.1111/1467-842X.00153]

12. Khoshnevisan, M.; Singh, H.P.; Singh, S.; Smarandache, F. A General Class of Estimators of Population Median Using Two Auxiliary Variables in Double Sampling; Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2002.

13. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.

14. Gupta, S.; Shabbir, J.; Ahmad, S. Estimation of median in two-phase sampling using two auxiliary variables. Commun. Stat.-Theory Methods; 2008; 37, pp. 1815-1822. [DOI: https://dx.doi.org/10.1080/03610920701826476]

15. Aladag, S.; Cingi, H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Commun. Stat.-Theory Methods; 2015; 44, pp. 1013-1032. [DOI: https://dx.doi.org/10.1080/03610926.2012.753090]

16. Solanki, R.S.; Singh, H.P. Some classes of estimators for median estimation in survey sampling. Commun. Stat.-Theory Methods; 2015; 44, pp. 1450-1465. [DOI: https://dx.doi.org/10.1080/03610926.2013.768664]

17. Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics; 2024; 12, 1737. [DOI: https://dx.doi.org/10.3390/math12111737]

18. Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon; 2024; 10, e33402. [DOI: https://dx.doi.org/10.1016/j.heliyon.2024.e33402]

19. Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry; 2024; 16, 957. [DOI: https://dx.doi.org/10.3390/sym16080957]

20. Shabbir, J.; Gupta, S. A generalized class of difference type estimators for population median in survey sampling. Hacet. J. Math. Stat.; 2017; 46, pp. 1015-1028. [DOI: https://dx.doi.org/10.15672/HJMS.201610614759]

21. Irfan, M.; Maria, J.; Shongwe, S.C.; Zohaib, M.; Bhatti, S.H. Estimation of population median under robust measures of an auxiliary variable. Math. Probl. Eng.; 2021; 2021, 4839077. [DOI: https://dx.doi.org/10.1155/2021/4839077]

22. Shabbir, J.; Gupta, S.; Narjis, G. On improved class of difference type estimators for population median in survey sampling. Commun. Stat.-Theory Methods; 2022; 51, pp. 3334-3354. [DOI: https://dx.doi.org/10.1080/03610926.2020.1795195]

23. Subzar, M.; Lone, S.A.; Ekpenyong, E.J.; Salam, A.; Aslam, M.; Raja, T.A.; Almutlak, S.A. Efficient class of ratio cum median estimators for estimating the population median. PLoS ONE; 2025; 18, e0274690. [DOI: https://dx.doi.org/10.1371/journal.pone.0274690]

24. Iseh, M.J. Model formulation on efficiency for median estimation under a fixed cost in survey sampling. Model Assist. Stat. Appl.; 2023; 18, pp. 373-385. [DOI: https://dx.doi.org/10.3233/MAS-231437]

25. Hussain, M.A.; Javed, M.; Zohaib, M.; Shongwe, S.C.; Awais, M.; Zaagan, A.A.; Irfan, M. Estimation of population median using bivariate auxiliary information in simple random sampling. Heliyon; 2024; 10, e28891. [DOI: https://dx.doi.org/10.1016/j.heliyon.2024.e28891]

26. Bhushan, S.; Kumar, A.; Lone, S.A.; Anwar, S.; Gunaime, N.M. An efficient class of estimators in stratified random sampling with an application to real data. Axioms; 2023; 12, 576. [DOI: https://dx.doi.org/10.3390/axioms12060576]

27. Stigler, S.M. Linear functions of order statistics. Ann. Math. Stat.; 1969; 40, pp. 770-788. [DOI: https://dx.doi.org/10.1214/aoms/1177697587]

28. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat.; 2007; 36, pp. 217-225. [DOI: https://dx.doi.org/10.17713/ajs.v36i3.333]

29. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat.; 2021; 8, 1899402. [DOI: https://dx.doi.org/10.1080/27658449.2021.1899402]

30. Daraz, U.; Wu, J.; Agustiana, D.; Emam, W. Finite population variance estimation using Monte Carlo simulation and real life application. Symmetry; 2025; 17, 84. [DOI: https://dx.doi.org/10.3390/sym17010084]

31. Daraz, U.; Agustiana, D.; Wu, J.; Emam, W. Twofold auxiliary information under two-phase sampling: An improved family of double-transformed variance estimators. Axioms; 2025; 14, 64. [DOI: https://dx.doi.org/10.3390/axioms14010064]

32. Murthy, M.N. Sampling Theory and Methods; Statistical Publishing Society: Calcutta, India, 1967.

33. Koyuncu, K.; Kadilar, C. Family of estimators of population mean using two auxiliary variables in stratified random sampling. Commun. Stat.-Theory Methods; 2009; 38, pp. 2398-2417. [DOI: https://dx.doi.org/10.1080/03610920802562723]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).