Respondent-driven sampling (RDS) employs a variant of a link-tracing network sampling strategy to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample.
The current estimators of population averages make strong assumptions in order to treat the data as a probability sample. We evaluate three critical sensitivities of the estimators: (1) to bias induced by the initial sample, (2) to uncontrollable features of respondent behavior, and (3) to the without-replacement structure of sampling.
Our analysis indicates (1) that the convenience sample of seeds can induce bias, and that the number of sample waves typically used in RDS is likely insufficient for the type of nodal mixing required to obtain the reputed asymptotic unbiasedness; (2) that preferential referral behavior by respondents leads to bias; and (3) that when a substantial fraction of the target population is sampled, the current estimators can have substantial bias.
This paper sounds a cautionary note for the users of RDS. While current RDS methodology is powerful and clever, the favorable statistical properties claimed for the current estimates are shown to be heavily dependent on often unrealistic assumptions. We recommend ways to improve the methodology.
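The link-tracing mechanism and the concern about seed dependence can be illustrated with a toy simulation. The sketch below is not the simulation design used in this paper; the two-group network, the function names (`make_two_group_network`, `link_tracing_sample`), the coupon limit, and all parameter values are assumptions chosen only for illustration. It draws a without-replacement link-tracing sample starting from seeds in a single group and records the wave in which each node was recruited.

```python
import random
from collections import deque


def make_two_group_network(n_per_group=200, p_within=0.05, p_between=0.005, rng=None):
    """Hypothetical test network: two groups with denser within-group ties."""
    rng = rng or random.Random(0)
    n = 2 * n_per_group
    group = [0] * n_per_group + [1] * n_per_group
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            p = p_within if group[i] == group[j] else p_between
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return group, neighbors


def link_tracing_sample(neighbors, seeds, coupons=2, target_size=100, rng=None):
    """Without-replacement link tracing.

    Each sampled node recruits up to `coupons` not-yet-sampled neighbors,
    chosen uniformly at random. Returns a dict mapping node -> wave number,
    with seeds in wave 0.
    """
    rng = rng or random.Random(0)
    sampled = {}
    queue = deque()
    for s in seeds:
        if s not in sampled:
            sampled[s] = 0
            queue.append(s)
    while queue and len(sampled) < target_size:
        recruiter = queue.popleft()
        # Only neighbors not already in the sample are eligible (no replacement).
        candidates = sorted(v for v in neighbors[recruiter] if v not in sampled)
        rng.shuffle(candidates)
        for v in candidates[:coupons]:
            if len(sampled) >= target_size:
                break
            sampled[v] = sampled[recruiter] + 1
            queue.append(v)
    return sampled


if __name__ == "__main__":
    group, neighbors = make_two_group_network(rng=random.Random(1))
    seeds = [0, 1, 2, 3, 4]  # all seeds drawn from group 0 (a convenience sample)
    sample = link_tracing_sample(neighbors, seeds, rng=random.Random(2))
    # Share of group-1 members recruited in each wave.
    by_wave = {}
    for node, wave in sample.items():
        by_wave.setdefault(wave, []).append(group[node])
    for wave in sorted(by_wave):
        members = by_wave[wave]
        print(wave, len(members), sum(members) / len(members))
```

Running the script prints, for each wave, the number of recruits and the share belonging to group 1. In the network both groups are the same size, so a share that stays far from one half in later waves would indicate that the sample composition still reflects the choice of seeds.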
1. INTRODUCTION TO RESPONDENT-DRIVEN SAMPLING
Respondent-driven sampling (RDS, introduced by Heckathorn [1997, also 2002, 2007]; see also Salganik and Heckathorn [2004]; Volz and Heckathorn [2008]) is an approach to sampling design and inference in hard-to-reach populations. Hard-to-reach populations are characterized by the difficulty of sampling from them using standard probability methods. RDS is typically employed when a sampling frame for the target population is not available, and its members are rare or stigmatized in the larger population so that it is prohibitively expensive to contact them through available frames. It is often used in populations such as injecting drug users, men who have sex with men, and sex workers (Malekinejad et al. 2008), although it has also been used in other populations such as jazz musicians (Heckathorn and Jeffri 2001), unregulated workers (Bernhardt et al. 2009), and Native American subgroups (Walters and Simoni 2002).
RDS presents two main innovations for this setting: (1) a design for sampling from the target population and (2) a...