Adaptive cluster sampling (ACS) is an efficient sampling technique for studying populations where the characteristic of interest is rare or spatially clustered. This method is widely applied in fields such as ecological studies, epidemiology, and resource management. ACS initially selects sampling units using simple random sampling without replacement. However, in some cases, the same network may be selected more than once through different initial units. To address this issue, a modified version of ACS was developed to ensure sampling without replacement at the network level, maintaining sampling symmetry and preventing the repeated inclusion of a network. Despite this adjustment, asymmetry may still occur when network formation is highly irregular. This issue can be mitigated by incorporating auxiliary variables, which help correct distortions in the sampling process. In many situations, auxiliary variables related to the variable of interest can be utilized to enhance the precision of population parameter estimates. This research proposes a multiplicative generalization of an estimator with two auxiliary variables under adaptive cluster sampling with networks selected without replacement. The bias and mean square error (MSE) are derived using a Taylor series expansion to determine the optimal conditions for minimizing the MSE. A simulation study is conducted to support the theoretical findings. The results show that the proposed estimator under the optimal values of
1. Introduction
Adaptive cluster sampling (ACS) is a data-driven method for efficiently estimating the abundance of rare and clustered populations. First introduced by Thompson in 1990 [1], ACS begins by selecting initial sample units using simple random sampling without replacement. If an initial unit satisfies a predefined condition C, its neighboring units are added to the sample. If any of those neighboring units also satisfy condition C, their respective neighborhoods are added in turn. This process continues until no additional units meet the condition. Conversely, if the initial unit does not satisfy condition C, no additional units are added, and the cluster remains a single unit. The initial set of sample units and all subsequently included neighborhoods that satisfy condition C are collectively referred to as networks. In this context, a “neighborhood” is defined as the four spatially adjacent units located at the top, bottom, left, and right (i.e., north, south, west, and east) of the selected unit (Figure 1). For instance, if a unit marked with a star is the initial selection, then the condition for adding neighboring units could be a value greater than or equal to one. The green units in the figure illustrate a single network formed under this sampling framework.
ACS has been widely used in survey applications, particularly in cases where the characteristic of interest is rare or spatially clustered. Research employing ACS includes studies on forest ecosystems [2], herpetofauna in tropical rainforests [3], larvae of the sea lamprey [4], freshwater mussel populations [5], hydroacoustic surveys [6], and assessments related to the COVID-19 pandemic [7,8,9]. Additionally, ACS has been explored in autonomous systems [10] and Internet of Things (IoT) applications [11].
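The network-growing rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_network`, the list-of-lists grid layout, and the default threshold of one (the example condition mentioned in the text) are assumptions made here, not part of the original study.

```python
from collections import deque


def build_network(grid, start, threshold=1):
    """Return the set of (row, col) cells in the network containing `start`.

    Assumptions (illustrative only): `grid` is a rectangular list of lists,
    condition C is `value >= threshold`, and the neighborhood is the four
    adjacent cells (north, south, west, east).
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = start
    # A unit that fails condition C forms a network of size one.
    if grid[r0][c0] < threshold:
        return {start}

    network = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            # Only neighbors that satisfy condition C join the network.
            if inside and (nr, nc) not in network and grid[nr][nc] >= threshold:
                network.add((nr, nc))
                queue.append((nr, nc))
    return network
```

A breadth-first search is used so that each unit is visited once; starting from any unit of a network returns the same set, which is the property the network-level estimators rely on.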
Thompson proposed an unbiased estimator for ACS under the condition that units are selected without replacement. The initial units were selected using simple random sampling. However, a network could occasionally be selected more than once, because several initial units could fall within the same network. Building on this framework, Salehi and Seber [12] introduced ACS without replacement at the network level and developed an estimator that leveraged prior work by Des Raj and Murthy.
The estimators discussed above were primarily designed to estimate a single variable of interest. However, in many situations, other variables are closely related to the variable of interest. Leveraging auxiliary information from these related variables is a well-established method to enhance the precision of estimation. Several researchers have developed estimators for adaptive cluster sampling without replacement that incorporate auxiliary information from such variables. Chao [13] introduced a ratio estimator, while Dryver and Chao [14] proposed modified ratio estimators. Chutiman and Kumphon [15] suggested regression, difference, and modified ratio estimators. Additionally, Chutiman [16] and Yadav et al. [17] proposed ratio estimators based on population parameters, including the coefficient of variation, kurtosis, skewness, and correlations with auxiliary variables. Chaudhry and Hanif [18] introduced a generalized exponential-cum-exponential estimator utilizing network averages, whereas Singh and Mishra [19] proposed transformed ratio-type estimators. Bhat et al. [20] developed a generalized class of ratio-type estimators. Finally, Mishra et al. [21] proposed combined ratio and product-type estimators.
Chutiman and Chiangpradit [22] developed a ratio estimator that utilized auxiliary variable information for adaptive cluster sampling, with networks selected without replacement. However, their approach was limited to the use of a single auxiliary variable. To address this limitation, this paper focuses on advancing adaptive cluster sampling estimators by incorporating information from two auxiliary variables, with networks still selected without replacement. Section 2 outlines key concepts of adaptive cluster sampling without the replacement of units, while Section 3 expands on sampling without the replacement of networks. The proposed estimators for adaptive cluster sampling without the replacement of networks are introduced in Section 4, followed by simulation studies presented in Section 5. Finally, the conclusions drawn from this study are discussed in Section 6.
2. Concept of ACS Without Replacement of Units
Consider a finite population, , of size N units. Let y denote the variable of interest taking the values on the unit , with representing the unknown total population of the variable of interest.
Let n denote the initial sample size and denote the final sample size. Let denote a network that includes unit i and as the number of units in that network. The initial sample of units is selected by simple random sampling without replacement. The Hansen–Hurwitz estimator of the total population for the variable of interest can be written as
(1)
where is the average of the variable of interest in the network that includes the unit of the initial sample, . The mean square error (MSE) of is
(2)
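As an illustration of the Hansen–Hurwitz estimator of the population total, a minimal Python sketch follows. It assumes the standard form from Thompson [1], in which the estimate is N/n times the sum of the network means attached to the n initial units, together with the usual sample-based variance estimator; the function names and argument layout are hypothetical.

```python
def hansen_hurwitz_total(N, network_means):
    """Estimate the population total from the network means of the initial units.

    Assumed form (standard ACS Hansen-Hurwitz estimator): N * mean(w_i),
    where w_i is the mean of y over the network containing initial unit i.
    """
    n = len(network_means)
    return N * sum(network_means) / n


def hansen_hurwitz_variance(N, network_means):
    """Sample-based variance estimate: N(N - n) / (n(n - 1)) * sum (w_i - w_bar)^2."""
    n = len(network_means)
    if n < 2:
        raise ValueError("at least two initial units are needed")
    w_bar = sum(network_means) / n
    ss = sum((w - w_bar) ** 2 for w in network_means)
    return N * (N - n) * ss / (n * (n - 1))
```

For example, with N = 16 and network means 0, 2, 0, and 5 for four initial units, the estimated total is 16 × 7 / 4 = 28.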
When the auxiliary variable is available, and this auxiliary variable has a positive relationship with the variable of interest, a ratio estimator is employed to enhance the efficiency of the estimator. Dryver and Chao [14] proposed a modified ratio estimator as
(3)
where is the Hansen–Hurwitz estimator of the population total for the auxiliary variable and is the population total of the auxiliary variable. The MSE of is
(4)
where .
3. Concept of ACS with Networks Selected Without Replacement
In adaptive cluster sampling, the number of distinct networks selected is inherently random. It is possible for multiple initial sampled units to fall within the same network, resulting in some units being selected more than once. Salehi and Seber [12] introduced a new sampling design as an adaptive cluster sampling with networks selected without replacement.
In this approach, the first sample unit is selected using simple random sampling from the population. A network is then formed based on this unit and subsequently removed from the population. The second sample unit is selected using simple random sampling from the remaining units, and a second network is formed. This process is repeated until n networks have been selected.
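The selection procedure just described can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes every population unit has already been assigned a network label (the list `labels`, a hypothetical input), and the function name `draw_networks` is invented for this example.

```python
import random


def draw_networks(labels, n, rng=None):
    """Select up to n distinct networks, one unit draw at a time.

    `labels[k]` is the (assumed, precomputed) network label of unit k.
    At each draw a unit is chosen with equal probability from the units
    still in the population; its whole network is then removed.
    """
    rng = rng or random.Random()
    remaining = list(range(len(labels)))
    selected = []
    for _ in range(n):
        if not remaining:  # population exhausted before n draws
            break
        unit = rng.choice(remaining)
        net = labels[unit]
        selected.append(net)
        # Remove every unit of the selected network from the population.
        remaining = [u for u in remaining if labels[u] != net]
    return selected
```

Because a network is removed as soon as one of its units is drawn, no network can appear twice in the returned list, and larger networks are more likely to be drawn early.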
Let be the first-draw probabilities for the network that includes unit i. Thus, , where is the number of units in the network that includes unit i. So, is the conditional ith draw probability for the ith network, which includes the unit i in the sample given the first network selection.
Building on work by Des Raj [23], Salehi and Seber [12] used a modified estimator, providing an unbiased estimator for the total population of the variable of interest as follows
(5)
where ,
The MSE of is
(6)
and an unbiased estimator of is
(7)
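To make the Des Raj-type estimator concrete, a minimal Python sketch is given here. It assumes the classical ordered Des Raj form [23] applied to networks, with the first-draw probability of a network taken as its size divided by N; the function name and the argument names are hypothetical, and the variance estimate is the usual one based on the spread of the ordered terms.

```python
def des_raj_total(network_totals, network_sizes, N):
    """Ordered Des Raj-type estimate of the population total and its variance.

    Assumed form: for the i-th selected network with total y_i and
    first-draw probability p_i = m_i / N,
        z_i = (y_1 + ... + y_{i-1}) + (y_i / p_i) * (1 - p_1 - ... - p_{i-1}),
    and the estimate is the mean of the z_i.
    Networks must be listed in the order in which they were drawn.
    """
    z = []
    cum_total = 0.0  # sum of totals of previously drawn networks
    cum_p = 0.0      # sum of first-draw probabilities of previous networks
    for y_star, m in zip(network_totals, network_sizes):
        p = m / N
        z.append(cum_total + (y_star / p) * (1.0 - cum_p))
        cum_total += y_star
        cum_p += p

    n = len(z)
    estimate = sum(z) / n
    variance = None
    if n > 1:
        variance = sum((zi - estimate) ** 2 for zi in z) / (n * (n - 1))
    return estimate, variance
```

For instance, with N = 10 and two networks drawn in order, the first of size 3 with total 6 and the second of size 1 with total 0, the ordered terms are 20 and 6, giving an estimate of 13.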
Meanwhile, Chutiman and Chiangpradit [22] presented a ratio estimator in adaptive cluster sampling without the replacement of networks.
(8)
where is the estimator of the population total for the auxiliary variable. The approximated MSE of is
(9)
4. Proposed Estimator in ACS Without Replacement of Networks
Motivated by Gupta and Shabbir [24] and Chutiman and Kumphon [15], the multiplicative generalization for the estimator of a population total can be written as
(10)
where x and u are two auxiliary variables, and and are the estimators of the population total for the auxiliary variables x and u, respectively. and are called ratio-type, product-type, ratio-cum-product-type, and product-cum-ratio-type estimators, respectively.
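A multiplicative estimator of this kind can be sketched in Python as below. The exact functional form is an assumption made for illustration: the estimated total of y is multiplied by the ratio of the known to the estimated total of each auxiliary variable, raised to exponents called `alpha1` and `alpha2` here. The exponent pairs listed for the four named special cases are likewise assumed from the usual ratio and product conventions, not taken from the source.

```python
def multiplicative_estimator(tau_y_hat, tau_x_hat, tau_u_hat,
                             tau_x, tau_u, alpha1, alpha2):
    """Assumed form: tau_y_hat * (tau_x / tau_x_hat)**alpha1 * (tau_u / tau_u_hat)**alpha2.

    tau_y_hat, tau_x_hat, tau_u_hat: estimated totals of y, x, and u.
    tau_x, tau_u: known population totals of the auxiliary variables.
    """
    return (tau_y_hat
            * (tau_x / tau_x_hat) ** alpha1
            * (tau_u / tau_u_hat) ** alpha2)


# Hypothetical exponent pairs for the named special cases (an assumption).
SPECIAL_CASES = {
    "ratio-type": (1, 1),
    "product-type": (-1, -1),
    "ratio-cum-product-type": (1, -1),
    "product-cum-ratio-type": (-1, 1),
}
```

Setting both exponents to zero returns the estimator that ignores the auxiliary variables, which makes the role of the exponents as tuning constants explicit.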
Let , , and ; and , , and . Thus, and a Taylor series expansion of is
(11)
where is third or higher order term in .
The approximate bias is given by
(12)
The approximate mean square error (MSE) of is
(13)
where . The values of and are derived by minimizing with respect to and so that and .
(14)
Therefore,
(15)
and
(16)
We substitute from Equation (15) into Equation (16). Then, the optimum values of and are
(17)
(18)
where and . The estimators of and are
(19)
(20)
where and ,
5. Results and Discussion
5.1. Simulation Study
The population of the variable of interest and the two auxiliary variables was based on the study by Chutiman and Kumphon [15], consisting of a population size of 20 rows and 20 columns, or 400 units (Figure A1, Figure A2 and Figure A3). The parameter values were , , , , and . For each iteration, the initial sample units were selected by simple random sampling. The condition for adding sample units was defined by . A total of 10,000 iterations were performed for each estimator. The number of networks was varied as n = 2, 5, 10, 15, 20, 25, and 50.
The estimated absolute relative bias was defined as
The estimated MSE of the estimator was defined as
The percentage relative efficiency of the proposed estimator compared with was defined as
The estimated absolute relative bias, estimated mean square error (MSE), and percentage relative efficiency of the estimators using two auxiliary variables under adaptive cluster sampling with networks selected without replacement were calculated. Figure 2 presents a flowchart outlining the steps of the simulation study, and the results are presented in Table 1, Table 2 and Table 3.
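The three performance measures can be computed from the simulated estimates as in the following minimal sketch. The formulas are assumptions based on their common definitions: absolute relative bias as the absolute difference between the mean estimate and the true total divided by the true total, MSE as the mean squared deviation from the true total, and percentage relative efficiency as 100 times the reference MSE divided by the candidate MSE. The function names are hypothetical.

```python
def simulation_metrics(estimates, true_total):
    """Return (absolute relative bias, MSE) over the simulated estimates."""
    reps = len(estimates)
    mean_estimate = sum(estimates) / reps
    arb = abs(mean_estimate - true_total) / true_total
    mse = sum((e - true_total) ** 2 for e in estimates) / reps
    return arb, mse


def percentage_relative_efficiency(mse_reference, mse_candidate):
    """PRE of a candidate estimator relative to a reference estimator."""
    return 100.0 * mse_reference / mse_candidate
```

The assumed efficiency definition agrees with the reported tables: for n = 2, dividing the MSE of 968,885.0341 by the MSE of 188,667.2977 and multiplying by 100 gives approximately 513.54, the value listed for the estimator with optimal values.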
5.2. Discussion
The data revealed that the variable of interest was positively correlated with both auxiliary variables. However, the correlation between the variable of interest and auxiliary variable x was stronger than its correlation with auxiliary variable u.
Our findings are summarized as follows:
The results in Table 1 demonstrate that, for all estimators, the estimated absolute relative bias decreased as the network sample size increased. Among the estimators, the product-type estimator consistently exhibited higher estimated absolute relative bias than the other estimators.
Table 2 presents the estimated mean square error (MSE) of the estimators. Here, represents the estimated mean square error of the modified Des Raj estimator, which did not rely on auxiliary variable information, while refers to the proposed estimator that incorporates two auxiliary variables. For all network sample sizes, the estimated mean square error of the proposed estimator was lower than that of the modified Des Raj estimator, , when , (ratio-cum-product-type estimator), and , (the proposed estimator with optimal values). The proposed estimator with the optimal values as , achieved the lowest estimated MSE compared to the settings , and , corresponding to ratio-type, product-type, ratio-cum-product-type, and product-cum-ratio-type estimators, respectively. The estimated MSE of the product-type estimator was particularly high because this type of estimator is suited to scenarios where the variable of interest and the auxiliary variables are related in opposing directions, whereas here both auxiliary variables were positively correlated with the variable of interest.
Table 3 presents the percentage relative efficiency (PRE) of the proposed estimator compared to the modified Des Raj estimator , where is set to 100. A PRE value greater than 100 indicates that the estimator is more efficient than . The results show that the product-type and product-cum-ratio-type estimators exhibited lower efficiency than across all network sample sizes. The ratio-type estimator demonstrated higher efficiency than when the network sample size was small. Meanwhile, the ratio-cum-product-type estimator and the proposed estimator with optimal values of and as had higher efficiency than for all network sample sizes. Among all the estimators, the proposed estimator with optimal values of and was the most efficient.
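The optimal exponents referred to above come from minimizing a first-order MSE approximation. As a hedged illustration only, suppose the estimator has the multiplicative form with exponents alpha1 and alpha2 on the two auxiliary ratios; the first-order relative MSE is then a quadratic in the exponents, and its minimizer solves a 2 × 2 linear system. The symbols `v0`, `v1`, `v2` (relative variances of the estimated totals of y, x, and u) and `c01`, `c02`, `c12` (relative covariances) are notation assumed here and need not match the notation of the source.

```python
def approx_relative_mse(v0, v1, v2, c01, c02, c12, alpha1, alpha2):
    """First-order relative MSE under the assumed multiplicative form."""
    return (v0
            + alpha1 ** 2 * v1 + alpha2 ** 2 * v2
            - 2 * alpha1 * c01 - 2 * alpha2 * c02
            + 2 * alpha1 * alpha2 * c12)


def optimal_exponents(v1, v2, c01, c02, c12):
    """Solve the normal equations obtained by setting both partial derivatives to zero:
        v1 * alpha1 + c12 * alpha2 = c01
        c12 * alpha1 + v2 * alpha2 = c02
    """
    det = v1 * v2 - c12 ** 2
    if det == 0:
        raise ValueError("auxiliary estimators are perfectly collinear")
    alpha1 = (c01 * v2 - c02 * c12) / det
    alpha2 = (c02 * v1 - c01 * c12) / det
    return alpha1, alpha2
```

Under these assumptions, an auxiliary variable that is only weakly related to the variable of interest receives an exponent near zero, which is consistent with the observation that fixed exponent choices can perform worse than using no auxiliary information at all.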
6. Conclusions
In adaptive cluster sampling, the initial units are selected using simple random sampling without replacement, but networks can be selected more than once. Salehi and Seber [12] proposed an adaptive cluster sampling with networks selected without replacement by introducing an estimator based on the Des Raj estimator, . In some situations, auxiliary information related to the variable of interest is utilized to improve the precision of the estimator. Chutiman and Chiangpradit [22] proposed a ratio estimator that uses a single auxiliary variable in adaptive cluster sampling with networks selected without replacement. This study presented a multiplicative generalization of the estimator, incorporating two auxiliary variables as
The bias and mean square error (MSE) of the proposed estimator were derived, and the optimum values of and were determined by minimizing the MSE. When inappropriate values of and were used, the proposed estimator was less efficient than the modified Des Raj estimator , which does not rely on auxiliary variable information. However, the optimal values of and yielded the lowest MSE of the estimator. The performance of the proposed estimator was further validated through numerical simulations. Table 2 and Table 3 reveal that, although the variable of interest was positively correlated with the two auxiliary variables, the ratio-type estimator did not outperform the modified Des Raj estimator at every network sample size; it was more efficient only for n = 2, 5, and 10. Conversely, the proposed estimator was the most efficient when and were set to their optimal values, i.e., and . The results demonstrate that the proposed estimator with optimum values of and achieved the lowest MSE and the highest percentage relative efficiency among the estimators examined. Future research should focus on assessing the robustness of the proposed estimator under different population structures and varying degrees of spatial clustering. Additionally, further studies should explore the integration of more than two auxiliary variables to enhance estimation efficiency in adaptive cluster sampling without the replacement of networks.
Author Contributions: Conceptualization, N.C. and A.N.; methodology, S.W.; software, N.C. and P.G.; investigation, S.W.; writing—original draft preparation, N.C. and A.N.; writing—review and editing, P.G. and S.W.; and funding acquisition, N.C. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: Data are contained within the article.
Acknowledgments: The authors would like to thank the editor and the referees for their valuable feedback and insightful suggestions.
Conflicts of Interest: The authors have no conflicts of interest to declare that are relevant to the content of this article.
Figure 1. Example of a network (shaded in green), where the unit marked with an asterisk represents the initial sampling unit.
Table 1. The estimated absolute relative bias of the estimators for the total population of the variable of interest.
| n | | | | |
|---|---|---|---|---|
| 2 | 0.9091 | 43.7092 | 0.0409 | 0.7103 |
| 5 | 0.5723 | 9.6476 | 0.0272 | 0.4120 |
| 10 | 0.4252 | 3.9173 | 0.0215 | 0.1237 |
| 15 | 0.3712 | 2.0932 | 0.0141 | 0.1210 |
| 20 | 0.3167 | 1.2775 | 0.0088 | 0.0960 |
| 25 | 0.3134 | 1.1377 | 0.0082 | 0.0765 |
| 50 | 0.1912 | 0.4281 | 0.0025 | 0.0143 |
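Table 2. The estimated MSE of the estimators for the total population of the variable of interest.
| n | | Modified Des Raj | Ratio-type | Product-type | Ratio-cum-product-type | Product-cum-ratio-type | Proposed (optimal values) |
|---|---|---|---|---|---|---|---|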
| 2 | 6.8466 | 968,885.0341 | 204,764.3145 | 26,322,915,632.8267 | 926,194.3208 | 1,445,829.5306 | 188,667.2977 |
| 5 | 16.1432 | 375,678.8942 | 132,903.3653 | 556,216,006.2863 | 375,147.0977 | 535,979.9843 | 121,855.7443 |
| 10 | 29.3040 | 185,513.8035 | 146,738.8306 | 68,892,932.1431 | 172,073.9448 | 291,531.9326 | 68,099.7147 |
| 15 | 39.5225 | 117,455.5717 | 245,276.4686 | 16,271,424.1794 | 111,666.6914 | 176,639.4646 | 45,295.0030 |
| 20 | 53.0735 | 80,580.0398 | 309,571.7174 | 4,919,034.4968 | 76,522.4944 | 116,484.1294 | 27,811.8563 |
| 25 | 59.9526 | 64,537.0999 | 272,468.2427 | 3,344,904.9521 | 59,331.5526 | 100,902.1119 | 23,029.4124 |
| 50 | 93.7806 | 25,465.4180 | 84,934.4510 | 630,026.2443 | 22,545.9512 | 41,470.0784 | 5,722.4797 |
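Table 3. The percentage relative efficiency of the estimators for the total population of the variable of interest.
| n | Modified Des Raj | Ratio-type | Product-type | Ratio-cum-product-type | Product-cum-ratio-type | Proposed (optimal values) |
|---|---|---|---|---|---|---|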
| 2 | 100 | 473.1708 | 0.0037 | 104.6093 | 67.0124 | 513.5416 |
| 5 | 100 | 282.6707 | 0.0675 | 100.1418 | 70.0920 | 308.2981 |
| 10 | 100 | 126.4245 | 0.2693 | 107.8105 | 63.6341 | 272.4149 |
| 15 | 100 | 47.8870 | 0.7219 | 105.1841 | 66.4945 | 259.3124 |
| 20 | 100 | 26.0295 | 1.6381 | 105.3024 | 69.1768 | 289.7327 |
| 25 | 100 | 23.6861 | 1.9294 | 108.7737 | 63.9601 | 280.2377 |
| 50 | 100 | 29.9824 | 4.0420 | 112.9490 | 61.4067 | 445.0067 |
Appendix A
The populations of the variable of interest and the two auxiliary variables are shown in Figure A1, Figure A2 and Figure A3.
Figure A1. The population of the variable of interest y, where unit neighborhoods are defined as four spatially adjacent units. The condition for adding units was defined by [formula omitted]. The areas shaded in different colors represent distinct networks.
Figure A2. The population of the auxiliary variable x. The position of the network is the same as the data y. The areas shaded in different colors represent distinct networks.
Figure A3. The population of the auxiliary variable u. The position of the network is the same as the data y. The areas shaded in different colors represent distinct networks.
References
1. Thompson, S.K. Adaptive cluster sampling. J. Am. Statist. Assoc.; 1990; 85, pp. 1050-1059. [DOI: https://dx.doi.org/10.1080/01621459.1990.10474975]
2. Magnussen, S.; Kurz, W.; Leckie, D.G.; Paradine, D. Adaptive cluster sampling for estimation of deforestation rates. Eur. J. For. Res.; 2005; 124, pp. 207-220. [DOI: https://dx.doi.org/10.1007/s10342-005-0074-6]
3. Noon, B.R.; Ishwar, N.M.; Vasudevan, K. Efficiency of adaptive cluster and random sampling in detecting terrestrial herpetofauna in a tropical rainforest. Wildl. Soc. Bull.; 2006; 34, pp. 59-68. [DOI: https://dx.doi.org/10.2193/0091-7648(2006)34[59:EOACAR]2.0.CO;2]
4. Sullivan, W.P.; Morrison, B.J.; Beamish, F.W.H. Adaptive cluster sampling: Estimating density of spatially autocorrelated larvae of the sea lamprey with improved precision. J. Great Lakes Res.; 2008; 34, pp. 86-97. [DOI: https://dx.doi.org/10.3394/0380-1330(2008)34[86:ACSEDO]2.0.CO;2]
5. Smith, D.R.; Villella, R.F.; Lemarié, D.P. Application of adaptive cluster sampling to low-density populations of freshwater mussels. Environ. Ecol. Stat.; 2003; 10, pp. 7-15. [DOI: https://dx.doi.org/10.1023/A:1021956617984]
6. Conners, M.E.; Schwager, S.J. The use of adaptive cluster sampling for hydroacoustic surveys. ICES J. Mar. Sci.; 2002; 59, pp. 1314-1325. [DOI: https://dx.doi.org/10.1006/jmsc.2002.1306]
7. Olayiwola, O.M.; Ajayi, A.O.; Onifade, O.C.; Wale-Orojo, O.; Ajibade, B. Adaptive cluster sampling with model based approach for estimating total number of Hidden COVID-19 carriers in Nigeria. Stat. J. IAOS; 2020; 36, pp. 103-109. [DOI: https://dx.doi.org/10.3233/SJI-200718]
8. Chandra, G.; Tiwari, N.; Nautiyal, R. Adaptive cluster sampling-based design for estimating COVID-19 cases with random samples. Curr. Sci.; 2021; 120, pp. 1204-1210. [DOI: https://dx.doi.org/10.18520/cs/v120/i7/1202-1210]
9. Stehlík, M.; Kiseľák, J.; Dinamarca, A.; Alvarado, E.; Plaza, F.; Medina, F.A.; Stehlíková, S.; Marek, J.; Venegas, B.; Gajdoš, A. et al. REDACS: Regional emergency-driven adaptive cluster sampling for effective COVID-19 management. Stoch. Anal. Appl.; 2022; 41, pp. 474-508. [DOI: https://dx.doi.org/10.1080/07362994.2022.2033126]
10. Hwang, J.; Bose, N.; Fan, S. AUV adaptive sampling methods: A Review. Appl. Sci.; 2019; 9, 3145. [DOI: https://dx.doi.org/10.3390/app9153145]
11. Giouroukis, D.; Dadiani, A.; Traub, J.; Zeuch, S.; Markl, V. A survey of adaptive sampling and filtering algorithms for the internet of things. Proceedings of the 14th ACM International Conference on Distributed and Event Based Systems; Montreal, QC, Canada, 13–17 July 2020; pp. 27-38. [DOI: https://dx.doi.org/10.1145/3401025.3403777]
12. Salehi, M.M.; Seber, G.A.F. Adaptive cluster sampling with networks selected without replacement. Biometrika; 1997; 84, pp. 209-219. [DOI: https://dx.doi.org/10.1093/biomet/84.1.209]
13. Chao, C.T. Ratio estimation on adaptive cluster sampling. J. Chin. Stat. Assoc.; 2004; 42, pp. 307-327. [DOI: https://dx.doi.org/10.29973/JCSA.200409.0006]
14. Dryver, A.L.; Chao, C.T. Ratio estimators in adaptive cluster sampling. Environmetrics; 2007; 18, pp. 607-620. [DOI: https://dx.doi.org/10.1002/env.838]
15. Chutiman, N.; Kumphon, B. Ratio estimator using two auxiliary variables for adaptive cluster sampling. Thail. Stat.; 2008; 6, pp. 241-256.
16. Chutiman, N. Adaptive cluster sampling using auxiliary variable. J. Math. Stat.; 2013; 9, pp. 249-255. [DOI: https://dx.doi.org/10.3844/jmssp.2013.249.255]
17. Yadav, S.K.; Misra, S.; Mishra, S. Efficient estimator for population variance using auxiliary variable. Am. J. Oper. Res.; 2016; 6, pp. 9-15. [DOI: https://dx.doi.org/10.1080/09720510.2017.1406643]
18. Chaudhry, M.S.; Hanif, M. Generalized exponential-cum-exponential estimator in adaptive cluster sampling. Pak. J. Stat. Oper. Res.; 2015; 11, pp. 553-574. [DOI: https://dx.doi.org/10.18187/pjsor.v11i4.1009]
19. Singh, R.; Mishra, R. Transformed ratio type estimators under adaptive cluster sampling an application to covid-19. J. Stat. Appl. Probab. Lett.; 2022; 9, pp. 63-70. [DOI: https://dx.doi.org/10.18576/jsapl/090201]
20. Bhat, A.A.; Sharma, M.; Shah, M.; Bhat, M. Generalized ratio type estimator under adaptive cluster sampling. J. Sci. Res.; 2022; 67, pp. 46-51. [DOI: https://dx.doi.org/10.37398/JSR.2023.670307]
21. Mishra, R.; Singh, R.; Raghav, Y.S. On combining ratio and product type estimators for estimation of finite population mean in adaptive cluster sampling design. Braz. J. Biom.; 2024; 42, pp. 412-420. [DOI: https://dx.doi.org/10.28951/bjb.v42i4.725]
22. Chutiman, N.; Chiangpradit, M. Ratio estimator in adaptive cluster sampling without replacement of networks. J. Probab. Stat.; 2014; 2014, 726398. [DOI: https://dx.doi.org/10.1155/2014/726398]
23. Raj, D. Some Estimators in sampling with varying probabilities without replacement. J. Am. Stat. Assoc.; 1956; 51, pp. 269-284. [DOI: https://dx.doi.org/10.1080/01621459.1956.10501326]
24. Gupta, S.; Shabbir, J. On the use of transformed auxiliary variables in estimating population mean by using two auxiliary variables. J. Stat. Plan. Inference; 2007; 137, pp. 1606-1611. [DOI: https://dx.doi.org/10.1016/j.jspi.2006.09.008]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.