1. Introduction
Standard randomized response (RR) methods are mainly used in surveys that elicit a binary response to a sensitive question, in order to estimate the proportion of the study population presenting a given (sensitive) characteristic. Warner's pioneering study gave rise to a rapidly expanding body of research literature on alternative RR schemes for estimating such a population proportion ([1,2,3,4,5,6]).
Other studies have addressed situations in which the response to a sensitive question is a quantitative variable and the researcher wishes to estimate a linear parameter, such as the mean or the total of the sensitive variable under study. In the method proposed by [7], the interviewee used a randomization device to choose between two questions: one concerning the sensitive variable and the other concerning an unrelated variable of the same order of magnitude. Other important papers in this regard include [8,9,10,11,12,13,14,15,16,17,18,19,20,21], together with the contributions compiled by [22,23,24,25,26]. When dealing with quantitative sensitive variables, the idea is that respondents do not disclose the true value of the sensitive variable but instead provide a scrambled value, obtained by algebraically perturbing the true response. This is done by applying one or more scrambling random variables, independent of each other and of the sensitive variable, whose distributions are fully known to the researcher.
RR methods have also been applied to examine relationships between a qualitative sensitive variable and other variables. Thus, reference [27] showed that logistic regression may be performed with RR data, and [28] developed multivariate logistic regression techniques for four RR designs. In addition, reference [29] considered the univariate logistic regression model for binary RR response variables and presented this model as a generalized linear model. The same research group also developed a multivariate logistic regression model for RR response variables. Under simple random sampling, reference [30] considered a generalized linear model and generalized linear mixed models for RR designs in which the probability of obtaining a positive response can be written as a linear function of the answer to the sensitive question. Finally, reference [31] presented a logistic regression model for RR data when the covariates of some subjects are missing at random.
However, few studies have addressed regression techniques for quantitative randomized response variables. Reference [32] performed a linear regression analysis using the model presented in [10] for the simple random sampling case and derived the variance of the resulting estimate. In a related paper, reference [33] discussed the maximum likelihood estimation of an independently and identically distributed normal linear regression model when some of the covariates are subject to RR.
In this paper, we address the question of regression techniques for quantitative RR data under a general sampling design. Specifically, we consider a general class of RR methods ([34]) for quantitative variables and show how the RR can be used as the outcome in regression models.
The rest of this paper is organized as follows. Section 2 reviews the unified randomized response technique (RRT) approach described by [21], to establish the framework and clarify the notation. In Section 3, we show how RR can be used as the outcome in regression models, present estimators of the regression coefficients and investigate their theoretical properties. Based on the asymptotic variance, we propose a variance estimator and discuss two resampling methods, the jackknife and the bootstrap. Section 4 reports simulation experiments carried out to assess the finite-sample properties of the proposed estimators, and Section 5 applies the method to a real-world survey focused on sensitive characteristics. Finally, Section 6 summarizes the main findings and the conclusions drawn.
2. Randomized Response Survey Designs for Quantitative Variables
Let $U = \{1, \ldots, i, \ldots, N\}$ be a finite population consisting of $N$ different elements, and let $y_i$ be the value of the sensitive variable under study for the $i$-th population element.
In this case, $y$ is a sensitive variable that cannot be observed directly. We consider the unified approach given by [21] because some important RR techniques [8,10,11,13] can be viewed as particular cases of this approach.
The respondent performs a random experiment with three possible outcomes. If the first outcome is obtained, the respondent reports the true value of the variable; with the second outcome, the respondent reports the scrambled response $y_i S_{1i} + S_{2i}$; otherwise, the respondent reports the value of a variable $S_{3i}$. Here $S_1$, $S_2$ and $S_3$ are scrambling variables whose distributions are known. Under this randomization device, the response given by person $i$ is

$$z_i = \begin{cases} y_i & \text{with probability } p_1,\\ y_i S_{1i} + S_{2i} & \text{with probability } p_2,\\ S_{3i} & \text{otherwise, with probability } p_3 = 1 - p_1 - p_2. \end{cases}$$

$m_j$ and $\sigma_j^2$ denote the mean and the variance, respectively, of the variable $S_j$ ($j = 1, 2, 3$).
The sample $s$ of individuals is chosen according to a sampling design $p(\cdot)$, and $\pi_i = \sum_{s \ni i} p(s)$ and $\pi_{ij} = \sum_{s \ni i, j} p(s)$, with $i, j \in U$, are the first- and second-order inclusion probabilities. We assume that the sampling design and the randomization stage are independent of each other and that the randomization stage is performed on each selected individual independently ([35]).
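The main goal of the study is usually to estimate $\bar Y = \frac{1}{N}\sum_{i=1}^{N} y_i$. A design-unbiased estimator of the population mean $\bar Y$ is given by the Horvitz-Thompson (HT) estimator:

$$\bar y_{rrt} = \frac{1}{N}\sum_{i\in s} w_i\, r_i,$$

where $w_i = 1/\pi_i$ is the sampling weight and

$$r_i = \frac{z_i - p_2 m_2 - p_3 m_3}{p_1 + p_2 m_1}.$$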
The variance of this estimator and an estimator of this variance are given in [21]. When the population size $N$ is unknown, it is usual to consider the Hájek estimator (see [36,37]), which replaces $N$ with the estimated population size $\hat N = \sum_{i\in s} w_i$. The Hájek estimator is generally preferred to the Horvitz-Thompson estimator for the mean, although it is not considered in this paper.
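The device and the transformation $r_i$ can be illustrated with a short simulation. The following Python sketch is only an illustration under assumed settings: the helper names (`unified_rr`, `transform`, `ht_mean`), the scrambling distributions and the values of $p_1$ and $p_2$ are hypothetical choices, and simple random sampling without replacement (so $\pi_i = n/N$) is assumed for the HT step.

```python
import numpy as np

def unified_rr(y, p1, p2, draw_s1, draw_s2, draw_s3, rng):
    """Simulate the three-outcome device: report y_i w.p. p1,
    y_i*S1i + S2i w.p. p2, and S3i otherwise (w.p. p3 = 1 - p1 - p2).
    draw_s* are hypothetical callables mapping a size to random draws."""
    n = len(y)
    u = rng.random(n)
    scrambled = y * draw_s1(n) + draw_s2(n)
    z = np.where(u < p1, y, scrambled)           # outcomes 1 and 2
    z = np.where(u >= p1 + p2, draw_s3(n), z)    # outcome 3 overrides the rest
    return z

def transform(z, p1, p2, m1, m2, m3):
    """r_i = (z_i - p2*m2 - p3*m3) / (p1 + p2*m1), so that E_R(r_i) = y_i."""
    p3 = 1.0 - p1 - p2
    return (z - p2 * m2 - p3 * m3) / (p1 + p2 * m1)

def ht_mean(r, pi, N):
    """Horvitz-Thompson estimator (1/N) * sum_{i in s} r_i / pi_i."""
    return np.sum(r / pi) / N

# Illustrative setting (all values hypothetical): SRSWOR of n units from N.
rng = np.random.default_rng(1)
N, n = 5000, 1000
y_pop = rng.normal(10.0, 2.0, N)
s = rng.choice(N, size=n, replace=False)
p1, p2 = 0.3, 0.5
m1, m2, m3 = 1.0, 1.0, 10.0   # means of S1 ~ U(0.5, 1.5), S2 ~ N(1, 1), S3 ~ N(10, 3)
z = unified_rr(
    y_pop[s], p1, p2,
    draw_s1=lambda k: rng.uniform(0.5, 1.5, k),
    draw_s2=lambda k: rng.normal(1.0, 1.0, k),
    draw_s3=lambda k: rng.normal(10.0, 3.0, k),
    rng=rng,
)
r = transform(z, p1, p2, m1, m2, m3)
y_bar_rrt = ht_mean(r, np.full(n, n / N), N)
print(round(y_bar_rrt, 3), round(y_pop.mean(), 3))
```

Because $E_R(z_i) = y_i(p_1 + p_2 m_1) + p_2 m_2 + p_3 m_3$, the transformation recovers $y_i$ on average, which is what makes $\bar y_{rrt}$ design-unbiased.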
3. Regression for RR Models
Consider a regression problem in which the data collected on the $i$-th subject are the outcome variable $y_i$ and a vector $x_i = (x_{i1}, \ldots, x_{iK})'$ of $K$ covariates. Under this scenario, we can consider superpopulation models, in which the population under study $y = (y_1, \ldots, y_N)'$ is assumed to be a realization of the random variables $Y = (Y_1, \ldots, Y_N)'$ under a superpopulation model $M$. The value of the variable of interest associated with the $i$-th unit of the population has two terms, a deterministic element $\mu_i = g(x_i'\beta)$ and a random element:

$$Y_i = \mu_i + e_i, \qquad i = 1, \ldots, N,$$

where $g(\cdot)$ is a specified function and the random vector $e = (e_1, \ldots, e_N)'$ is assumed to have zero mean and independent components.
Now, our aim is to estimate the regression coefficients $\beta$. To do so, let $\mu_i = E_M(Y_i \mid x_i, \beta)$ denote the expectation under the model of $Y_i$ given the covariates and $\beta$.
Because the values of $Y_i$ cannot be observed directly, we need to relate the randomized response to the linear predictor of the sensitive question. This relation is given by:

$$E(Z_i \mid x_i, \beta) = E_M E_R(Z_i \mid x_i, \beta) = E_M\left(Y_i p_1 + (Y_i m_1 + m_2) p_2 + m_3 p_3 \mid x_i, \beta\right) = g(x_i'\beta)(p_1 + m_1 p_2) + m_2 p_2 + m_3 p_3,$$

where $E_R$ denotes the expectation under the RR mechanism.
A linear transformation of the observed values can then be performed:

$$r_i = \frac{z_i - m_2 p_2 - m_3 p_3}{p_1 + m_1 p_2},$$

which can be considered a realization of the variables

$$R_i = \frac{Z_i - m_2 p_2 - m_3 p_3}{p_1 + m_1 p_2}.$$

Thus, we consider the new regression model $R_i = g(x_i'\beta) + \epsilon_i$. The components of the random vector $\epsilon = (\epsilon_1, \ldots, \epsilon_N)'$ are assumed to be independent, with zero mean and a positive definite diagonal covariance matrix, $E(\epsilon_i^2 \mid x_i) = \sigma^2 v_i = \sigma_{Ri}^2$, where the $v_i$ are known constants depending on $x_i$. This model satisfies $E(R_i) = g(x_i'\beta) = E_M(Y_i)$.
3.1. Estimation of the Regression Coefficients
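Consider the population function:

$$U(\beta) = \frac{1}{N}\sum_{i\in U} d_i\,\frac{r_i - g(x_i'\beta)}{\sigma_{Ri}^2} = \frac{1}{N}\sum_{i\in U} u(r_i, x_i, \beta),$$

where $d_i = \partial g(x_i'\beta)/\partial\beta$.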
The population regression coefficient $\beta_N$ is obtained as the solution of the estimating equations $U(\beta) = 0$. If the census data set were known, $\beta_N$ would be an estimate of the model parameter $\beta$; when it is unknown, $\beta_N$ defines a descriptive parameter of the survey population.
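Given the values observed in the sample, we consider the weighted estimating function

$$\hat U(\beta) = \frac{1}{N}\sum_{i\in s} w_i\, d_i\,\frac{r_i - g(x_i'\beta)}{\sigma_{Ri}^2}.$$

Let $\hat\beta_W$ be a solution to $\hat U(\beta) = 0$. We study the properties of $\hat\beta_W$ as an estimator of $\beta_N$.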
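In practice, $\hat U(\beta) = 0$ has to be solved numerically. The sketch below uses Fisher scoring (Gauss-Newton); this solver, its name `solve_weighted_ee` and the step-halving safeguard are our illustrative choices rather than a procedure prescribed in the text. It assumes $\sigma_{Ri}^2 = \sigma^2 v_i$ with known $v_i$, so that $\sigma^2$ and the factor $1/N$ cancel, and it takes $g$ and its derivative as user-supplied callables.

```python
import numpy as np

def solve_weighted_ee(X, r, w, v, g, g_prime, beta0=None, tol=1e-10, max_iter=100):
    """Solve sum_s w_i d_i (r_i - g(x_i'b)) / v_i = 0 with d_i = g'(x_i'b) x_i.

    sigma_Ri^2 = sigma^2 v_i; the common factors sigma^2 and 1/N cancel,
    so only the known constants v_i are needed. Fisher scoring is combined
    with step halving on the weighted sum of squares, whose gradient is
    proportional to U_hat(beta) when the v_i are fixed.
    """
    X = np.asarray(X, float)
    beta = np.zeros(X.shape[1]) if beta0 is None else np.asarray(beta0, float)

    def wss(b):
        return np.sum(w * (r - g(X @ b)) ** 2 / v)

    for _ in range(max_iter):
        eta = X @ beta
        D = X * g_prime(eta)[:, None]              # rows are d_i'
        score = D.T @ (w * (r - g(eta)) / v)       # proportional to U_hat(beta)
        info = D.T @ (D * (w / v)[:, None])        # sum_s w_i d_i d_i' / v_i
        step = np.linalg.solve(info, score)
        if np.max(np.abs(step)) < tol:
            break
        t, f0 = 1.0, wss(beta)
        while wss(beta + t * step) > f0 and t > 1e-8:
            t /= 2.0                               # halve until the fit improves
        beta = beta + t * step
    return beta

# Hypothetical usage with an identity link and unequal design weights.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
w = rng.uniform(1, 3, n)
v = np.ones(n)
r = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, n)
b_lin = solve_weighted_ee(X, r, w, v, g=lambda e: e, g_prime=lambda e: np.ones_like(e))
```

With the identity link, the first step already lands on the closed-form design-weighted least squares solution; with a nonlinear link such as $g = \exp$, several iterations are needed.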
The usual asymptotic framework in survey sampling is adopted: the finite population $U$ and the sampling design $p(\cdot)$ are embedded in a sequence of populations and designs indexed by $\nu$, $\{U_\nu, p_\nu\}$, with $\nu \to \infty$. The stochastic order $O_p(\cdot)$ is with respect to this sequence of designs, and $\Theta$ denotes the parameter space. To derive our results, the following technical assumptions are made:
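- A.1. The survey design satisfies $\hat U(\beta) - U(\beta) = O_p(n^{-1/2})$ for any $\beta \in \Theta$.
- A.2. The survey design ensures that $\hat U(\beta)$ is asymptotically normally distributed with mean $U(\beta)$ and with the entries of its variance-covariance matrix of order $n^{-1}$, for any $\beta \in \Theta$.
- A.3. The survey design satisfies $\partial\hat U/\partial\beta = O_p(1)$ and $\partial^2\hat U/\partial\beta\,\partial\beta' = O_p(1)$ for any $\beta \in \Theta$.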
Theorem 1.
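Under assumptions A.1 and A.3, the solution to $\hat U(\beta) = 0$ provides a consistent estimator of the parameter $\beta_N$. If condition A.2 is also met, the weighted quasi-likelihood estimator $\hat\beta_W$ is asymptotically normally distributed with mean $\beta_N$ and variance-covariance matrix

$$V(\hat\beta_W) = J(\beta_N)^{-1}\, V\!\left(\frac{1}{N}\sum_{i\in s} w_i\, d_i\,\frac{r_i - g(x_i'\beta_N)}{\sigma_{Ri}^2}\right) J'(\beta_N)^{-1}, \qquad (2)$$

where $V$ is the design variance-covariance matrix and $J(\beta) = \frac{1}{N}\sum_{i\in U} \partial u(r_i, x_i, \beta)/\partial\beta$.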
Proof.
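The estimating function $u(r_i, x_i, \beta) = d_i\,(r_i - g(x_i'\beta))/\sigma_{Ri}^2$ is twice differentiable with respect to $\beta$. Reference [38] showed that, under these conditions, a general parameter $\theta_N$ given by the solution of the population equation $U(\theta) = 0$ is consistently estimated by $\hat\theta$, the solution to $\hat U(\theta) = 0$. In our case, $\theta_N = \beta_N$ and $U(\theta) = \frac{1}{N}\sum_{i\in U} d_i\,(r_i - g(x_i'\beta))/\sigma_{Ri}^2$.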
Consider the following Taylor series expansion:

$$\hat\beta_W = \beta_N - J(\beta_N)^{-1}\,\hat U(\beta_N) + O_p(n^{-1}).$$

Thus, $\hat\beta_W$ is asymptotically normally distributed because $\hat U(\beta_N)$ is asymptotically normally distributed under assumption A.2. The asymptotic variance-covariance matrix of $\hat\beta_W$ is easily derived:

$$J(\beta_N)^{-1}\, V(\hat U(\beta_N))\, J'(\beta_N)^{-1},$$

and thus expression (2) is obtained. □
Remark 1.
Note that in the RR setting there are two sources of randomness (if model variability is not taken into account): the sampling design and the randomization device that scrambles the variable of interest. Thus, the variances in $V(\hat U(\beta_N))$ are composed of two terms.
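Let $E_d$ and $V_d$ denote the expectation and variance operators for a sampling design $d$. Taking into account the two sources of variability induced by the sampling design and the randomization device, we have the variance decomposition formula:

$$\begin{aligned}
V\!\left(\frac{1}{N}\sum_{i\in s} w_i\,\frac{\partial g(x_i'\beta)}{\partial\beta_k}\,\frac{r_i - g(x_i'\beta)}{\sigma_{Ri}^2}\right)
&= \frac{1}{N^2}\,E_d V_R\!\left(\sum_{i\in s} w_i\,\frac{\partial g(x_i'\beta)}{\partial\beta_k}\,\frac{r_i - g(x_i'\beta)}{\sigma_{Ri}^2}\right) + \frac{1}{N^2}\,V_d E_R\!\left(\sum_{i\in s} w_i\,\frac{\partial g(x_i'\beta)}{\partial\beta_k}\,\frac{r_i - g(x_i'\beta)}{\sigma_{Ri}^2}\right)\\
&= \frac{1}{N^2}\left[E_d \sum_{i\in s} \frac{w_i^2}{\sigma_{Ri}^4}\left(\frac{\partial g(x_i'\beta)}{\partial\beta_k}\right)^2 V_R(r_i) + V_d \sum_{i\in s} w_i\,\frac{\partial g(x_i'\beta)}{\partial\beta_k}\,\frac{y_i - g(x_i'\beta)}{\sigma_{Ri}^2}\right]\\
&= \frac{1}{N^2}\left[\sum_{i\in U} \frac{w_i}{\sigma_{Ri}^4}\left(\frac{\partial g(x_i'\beta)}{\partial\beta_k}\right)^2 V_R(r_i) + \sum_{i,j\in U} \left(w_i w_j \pi_{ij} - 1\right) \frac{\partial g(x_i'\beta)}{\partial\beta_k}\,\frac{\partial g(x_j'\beta)}{\partial\beta_k}\,\frac{y_i - g(x_i'\beta)}{\sigma_{Ri}^2}\,\frac{y_j - g(x_j'\beta)}{\sigma_{Rj}^2}\right]
\end{aligned}$$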
where $E_R$ and $V_R$ are the expectation and variance operators over the RR device, and $E_R(r_i) = y_i$ has been used in the design term. A detailed expression for $V_R(r_i)$ can be found in [21] (formula (3)).
The covariance terms simplify because the randomization stage is performed independently on each selected individual, so that $\mathrm{cov}_R(r_i, r_j) = 0$ for $i \neq j$.
Remark 2.
Software packages such as the R package survey [39], through its function svyglm, can be used to fit linear and generalized linear models that incorporate the design weights, and thus to compute $\hat\beta_W$ from the randomized values $r_i$. However, the variances and covariances they report are incorrect. Accordingly, the standard significance tests based on these values are invalid and can lead to grossly misleading conclusions.
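From (2), we can construct a design-based estimator of the variance-covariance matrix of $\hat\beta_W$ through the plug-in method:

$$v(\hat\beta_W) = \hat J^{-1}\,\hat V\,\hat J'^{-1},$$

where

$$\hat J = \frac{1}{N}\sum_{i\in s} w_i \left.\frac{\partial u}{\partial\beta}\right|_{\beta = \hat\beta_W}$$

and

$$\hat V = \frac{1}{N^2}\sum_{i,j\in s} \tilde u_i\,\tilde u_j'\,\frac{w_i w_j \pi_{ij} - 1}{\pi_{ij}},$$

with $\tilde u_i = d_i\,(r_i - g(x_i'\hat\beta_W))/\hat\sigma_{Ri}^2$, where $\hat\sigma_{Ri}^2$ is an estimator of $\sigma_{Ri}^2$.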
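The plug-in estimator can be written compactly in matrix form. The Python sketch below restricts itself, as an assumption, to the linear case $g(x_i'\beta) = x_i'\beta$, takes the unit-level estimates $\hat\sigma_{Ri}^2$ and the full matrix of second-order inclusion probabilities as inputs and, like the estimator it mirrors, does not add the extra variability due to the randomization device. The helper `srswor_pi` (a hypothetical name) builds $\pi_{ij}$ for simple random sampling without replacement.

```python
import numpy as np

def srswor_pi(n, N):
    """Second-order inclusion probabilities for SRSWOR, with pi_ii = pi_i = n/N."""
    pi_mat = np.full((n, n), n * (n - 1) / (N * (N - 1)))
    np.fill_diagonal(pi_mat, n / N)
    return pi_mat

def plugin_variance(X, r, w, pi_mat, N, sigma2_hat):
    """Return beta_hat_W and v(beta_hat_W) = J^-1 V J'^-1 for g(x'b) = x'b."""
    W = w / sigma2_hat
    XtWX = X.T @ (X * W[:, None])
    beta = np.linalg.solve(XtWX, X.T @ (W * r))           # solves U_hat(beta) = 0
    u = X * ((r - X @ beta) / sigma2_hat)[:, None]        # rows: u~_i'
    J = -XtWX / N                                         # (1/N) sum_s w_i du_i/dbeta
    omega = (np.outer(w, w) * pi_mat - 1.0) / pi_mat      # (w_i w_j pi_ij - 1) / pi_ij
    V = u.T @ omega @ u / N**2
    J_inv = np.linalg.inv(J)
    return beta, J_inv @ V @ J_inv.T

# Hypothetical example: SRSWOR of n = 50 from N = 1000, intercept-only model.
rng = np.random.default_rng(3)
N, n = 1000, 50
r = rng.normal(5.0, 2.0, n)
X = np.ones((n, 1))
w = np.full(n, N / n)
beta, v = plugin_variance(X, r, w, srswor_pi(n, N), N, np.full(n, 2.5))
```

For an intercept-only model under simple random sampling without replacement, the result reduces to the familiar $(1 - n/N)\,s_r^2/n$, which is a convenient sanity check.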
The plug-in variance estimator $v(\hat\beta_W)$ is not unbiased, because it does not include the terms of variability induced by the randomization device; moreover, it is often difficult to compute, because an estimator of $\sigma_{Ri}^2$ is frequently unavailable. Furthermore, the estimator requires knowledge of the second-order inclusion probabilities, which are often impossible to compute or are not available for complex sampling designs.
From a practical viewpoint, therefore, it is better to use the jackknife ([40]) and bootstrap ([41]) techniques, which are readily applicable under diverse conditions.
The application of the jackknife method to the regression coefficient under simple random sampling is described in Section 4.4 of [42], and its use in stratified sampling in Section 4.5. We apply these methods to $r_i$ rather than $y_i$.
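A delete-one jackknife applied to the $r_i$ can be sketched as follows. This is a minimal Python illustration assuming (approximately) simple random sampling and the homoscedastic linear model discussed in Section 3.2; the function names are hypothetical, the optional finite population correction `fpc` is our addition, and the corrections for the randomization device proposed in [43,44] are not included.

```python
import numpy as np

def wls(X, r, w):
    """Design-weighted least squares: (sum_s w x x')^-1 sum_s w x r."""
    return np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * r))

def jackknife_variance(X, r, w, fpc=1.0):
    """Delete-one jackknife variance of beta_hat_W computed from the r_i."""
    n = len(r)
    reps = np.empty((n, X.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i
        # Rescaling the remaining weights by n/(n-1) would not change the estimate.
        reps[i] = wls(X[keep], r[keep], w[keep])
    dev = reps - reps.mean(axis=0)
    return fpc * (n - 1) / n * (dev.T @ dev)

# Hypothetical usage: SRSWOR of n = 40 from N = 1000, one covariate.
rng = np.random.default_rng(7)
n = 40
r = rng.normal(3.0, 1.5, n)
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
w = np.full(n, 25.0)
v_jk = jackknife_variance(X, r, w, fpc=1 - n / 1000)
```

For the sample mean (intercept-only model with equal weights), the delete-one jackknife reproduces $s_r^2/n$ exactly, which is an easy check of an implementation.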
Jackknife estimation of the variance of an estimator of the population mean based on RR survey data is considered in [43,44]. These authors show that the jackknife estimator underestimates the variance of the Horvitz-Thompson estimator of the population mean and propose modifications of the conventional jackknife estimator. These modifications include an additional term that estimates the variance due to the randomization device that scrambles the variable of interest.
The bootstrap method developed by [41] has been adapted to survey sampling, incorporating the sampling design, in several studies (see, e.g., [45,46,47]). Direct application of bootstrap methods to estimate the variance-covariance matrix (2) requires solving the equation $\hat U(\beta) = 0$ repeatedly, once for each bootstrap sample. A multiplier bootstrap with estimating functions was proposed by [48]. We use this method with the $r_i$ values to estimate the variance of the proposed estimator. See Section 10.3.1 of [49] for a detailed description of this bootstrap method.
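The following Python sketch shows one common linearized form of a multiplier bootstrap for estimating functions: each replicate reweights the unit contributions $w_i u_i$ by i.i.d. multipliers with mean 1 and variance 1 (here exponential, an assumed choice) and takes a single Newton step from $\hat\beta_W$, so the estimating equation is never re-solved. It assumes the homoscedastic linear model, ignores the finite population correction and makes no adjustment for the randomization device; the function name is hypothetical, and it is not claimed to reproduce the exact algorithm of [48] or [49].

```python
import numpy as np

def multiplier_bootstrap(X, r, w, N, B=2000, rng=None):
    """Linearized multiplier bootstrap for beta_hat_W (homoscedastic linear case).

    Replicates: beta* = beta_hat + A^-1 (1/N) sum_s w_i (xi_i - 1) u_i,
    with xi_i ~ Exp(1) i.i.d. (mean 1, variance 1) and u_i = x_i (r_i - x_i'beta_hat).
    """
    rng = np.random.default_rng() if rng is None else rng
    A = X.T @ (X * w[:, None]) / N                    # minus the Jacobian (sigma^2 = 1)
    beta = np.linalg.solve(A * N, X.T @ (w * r))      # beta_hat_W
    u = X * (r - X @ beta)[:, None]                   # unit contributions u_i
    xi = rng.exponential(1.0, size=(B, len(r)))
    pert = ((xi - 1.0) * w) @ u / N                   # B x K perturbed estimating functions
    reps = beta + np.linalg.solve(A, pert.T).T        # one Newton step per replicate
    return beta, np.atleast_2d(np.cov(reps, rowvar=False))

# Hypothetical usage: intercept-only model, equal weights.
rng = np.random.default_rng(11)
N, n = 5000, 100
r = rng.normal(4.0, 2.0, n)
X = np.ones((n, 1))
w = np.full(n, N / n)
beta, V_boot = multiplier_bootstrap(X, r, w, N, B=20000, rng=np.random.default_rng(5))
```

For an intercept-only model with equal weights, the bootstrap variance approaches $\sum_{i\in s}(r_i - \bar r)^2/n^2$ as the number of replicates grows.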
Obtaining jackknife and bootstrap estimators of the variance of $\hat\beta_W$ that take into account the randomness due to the RR process is considerably more complex than in the case of estimating means. Measuring the influence of the randomization mechanism on jackknife or bootstrap variance estimation remains an open problem that requires further investigation.
3.2. The Homoscedastic Linear Model
Let us now consider the homoscedastic linear model: $\mu_i = x_i'\beta$ and $\mathrm{var}(R_i \mid x_i) = \sigma^2$. In this case, the weighted quasi-likelihood estimator $\hat\beta_W$ reduces to the weighted least squares estimator, which is the solution to the equation:

$$\hat U(\beta) = \frac{1}{N}\sum_{i\in s} w_i\, x_i\,\frac{r_i - x_i'\beta}{\sigma^2} = 0.$$

The solution is given by the design-weighted estimator:

$$\hat\beta_W = \left(\sum_{i\in s} w_i\, x_i x_i'\right)^{-1} \sum_{i\in s} w_i\, x_i\, r_i.$$

This estimator is model-unbiased and design-consistent.
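For this linear model, the matrix $J$ takes the simple form

$$J = -\frac{1}{N}\sum_{i\in U} \frac{x_i x_i'}{\sigma^2},$$

whose sign cancels in the sandwich expression of Theorem 1. Thus, an estimator of the asymptotic variance of $\hat\beta_W$ is given by:

$$\widehat{\mathrm{var}}(\hat\beta_W) = \left(\frac{1}{N}\sum_{i\in s} w_i\,\frac{x_i x_i'}{\hat\sigma^2}\right)^{-1} \widehat{\mathrm{var}}\big(\hat U(\hat\beta_W)\big) \left(\frac{1}{N}\sum_{i\in s} w_i\,\frac{x_i x_i'}{\hat\sigma^2}\right)^{-1},$$

with $\hat\sigma^2 = \sum_{i\in s} w_i\,(r_i - x_i'\hat\beta_W)^2 / \sum_{i\in s} w_i$, and where $\widehat{\mathrm{var}}(\hat U(\hat\beta_W))$ is the estimated HT variance.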
3.3. The Ratio Model
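We now consider the case of a single auxiliary variable, $x$, and the following ratio model ([37]):

$$E(R_i) = \beta x_i \quad\text{and}\quad V(R_i) = \sigma^2 x_i.$$

The weighted quasi-likelihood estimator $\hat\beta_W$ then reduces to the solution of the simple equation:

$$\hat U(\beta) = \frac{1}{N}\sum_{i\in s} w_i\,\frac{r_i - x_i\beta}{\sigma^2} = 0.$$

This solution is given by the design-weighted ratio estimator:

$$\hat\beta_R = \frac{\sum_{i\in s} w_i r_i}{\sum_{i\in s} w_i x_i} = \frac{\bar y_{rrt}}{\bar x_{HT}},$$

where $\bar x_{HT}$ is the HT estimator of the population mean $\bar X$. An estimator of the variance of a ratio estimator is straightforwardly obtained by Taylor linearization (see, e.g., [42]):

$$\hat V(\hat\beta_R) = \frac{1}{\bar x_{HT}^2}\left(\hat V(\bar y_{rrt}) + \hat\beta_R^2\,\hat V(\bar x_{HT}) - 2\hat\beta_R\,\widehat{\mathrm{cov}}(\bar y_{rrt}, \bar x_{HT})\right),$$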
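where

$$\hat V(\bar y_{rrt}) = \frac{1}{N^2}\left(\sum_{i\in s} v_i\, w_i^2 + \sum_{i,j\in s} r_i r_j\,\frac{w_i w_j \pi_{ij} - 1}{\pi_{ij}}\right),$$

with $v_i = \frac{1}{(p_1 + p_2 m_1)^2}\left(r_i^2 A + r_i B + C\right)$, where the constants $A$, $B$ and $C$ are given in [21], and

$$\hat V(\bar x_{HT}) = \frac{1}{N^2}\sum_{i,j\in s} x_i x_j\,\frac{w_i w_j \pi_{ij} - 1}{\pi_{ij}}.$$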
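Since $\bar x_{HT}$ does not depend on the randomization device, the randomization covariance vanishes and

$$\mathrm{cov}(\bar y_{rrt}, \bar x_{HT}) = E_d\,\mathrm{cov}_R(\bar y_{rrt}, \bar x_{HT}) + \mathrm{cov}_d\big(E_R(\bar y_{rrt}), \bar x_{HT}\big) = 0 + \mathrm{cov}_d(\bar y_{HT}, \bar x_{HT}),$$

so an estimator of this covariance can be obtained as follows:

$$\widehat{\mathrm{cov}}(\bar y_{rrt}, \bar x_{HT}) = \frac{1}{N^2}\sum_{i,j\in s} x_i r_j\,\frac{w_i w_j \pi_{ij} - 1}{\pi_{ij}}.$$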
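The ratio estimator and its linearized variance translate directly into code. In the Python sketch below (function names hypothetical), the per-unit randomization terms $v_i$ must be supplied by the user, because the constants $A$, $B$ and $C$ are given in [21] and are not reproduced here; passing zeros drops that term. The helper `srswor_pi` assumes simple random sampling without replacement.

```python
import numpy as np

def srswor_pi(n, N):
    """Second-order inclusion probabilities for SRSWOR, with pi_ii = n/N."""
    pi_mat = np.full((n, n), n * (n - 1) / (N * (N - 1)))
    np.fill_diagonal(pi_mat, n / N)
    return pi_mat

def ht_quad(a, b, w, pi_mat):
    """sum_{i,j in s} a_i b_j (w_i w_j pi_ij - 1) / pi_ij."""
    omega = (np.outer(w, w) * pi_mat - 1.0) / pi_mat
    return a @ omega @ b

def ratio_estimator(r, x, w, pi_mat, N, v_rr):
    """beta_R = ybar_rrt / xbar_HT and its Taylor-linearization variance.
    v_rr: user-supplied estimates v_i of the randomization variance terms."""
    y_bar = np.sum(w * r) / N
    x_bar = np.sum(w * x) / N
    beta_r = y_bar / x_bar
    var_y = (np.sum(v_rr * w**2) + ht_quad(r, r, w, pi_mat)) / N**2
    var_x = ht_quad(x, x, w, pi_mat) / N**2
    cov_xy = ht_quad(x, r, w, pi_mat) / N**2
    var_beta = (var_y + beta_r**2 * var_x - 2 * beta_r * cov_xy) / x_bar**2
    return beta_r, var_beta

# Hypothetical data loosely following the ratio model (beta = 7).
rng = np.random.default_rng(8)
N, n = 2000, 60
x = rng.normal(30, 2, n)
r = 7 * x + rng.normal(0, 3, n)
w = np.full(n, N / n)
pm = srswor_pi(n, N)
b, vb = ratio_estimator(r, x, w, pm, N, np.zeros(n))
```

Under simple random sampling without replacement and with $v_i = 0$, the result coincides with the textbook form $(1 - n/N)\,s^2_{r - \hat\beta_R x}/(n\,\bar x^2)$.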
4. Simulation Study
This section describes an extensive simulation study implemented in R. In the first study, the variables were simulated using the R package simstudy ([50]), and the samples were selected with the sampling package ([51]).
The population size was $N = 2350$. The main variable $y$ and two auxiliary variables $x_1$ and $x_2$ were generated using the genCorData function. The means, standard deviations and correlation matrix were:

$$\mu = (3, 8, 15), \qquad \sigma = (1, 2.5, 3), \qquad \rho = \begin{pmatrix} 1.0 & 0.5 & 0.7\\ 0.5 & 1.0 & 0.2\\ 0.7 & 0.2 & 1.0 \end{pmatrix}.$$
The sampling design was stratified simple random sampling from a population with six strata of sizes $N_h$ = 1000, 500, 150, 250, 150 and 300. Three different sample-size allocations were drawn from the population, corresponding to the following numbers of units per stratum:
- $n_1 = (70, 35, 27, 38, 26, 54)$, total 250;
- $n_2 = (230, 100, 32, 55, 38, 45)$, total 500;
- $n_3 = (310, 215, 27, 65, 40, 93)$, total 750.
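The population and the stratified design can be mimicked in Python as follows. This is a stand-in, not a port of simstudy or sampling: it draws $(y, x_1, x_2)$ from a multivariate normal distribution with the stated means, standard deviations and correlations, and it assigns units to strata by position, which is an assumption since the text does not say how the strata were formed. The names `stratified_srs` and `allocations` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2021)

# Population: N = 2350 units, (y, x1, x2) multivariate normal.
mu = np.array([3.0, 8.0, 15.0])
sd = np.array([1.0, 2.5, 3.0])
rho = np.array([[1.0, 0.5, 0.7],
                [0.5, 1.0, 0.2],
                [0.7, 0.2, 1.0]])
cov = rho * np.outer(sd, sd)                 # covariance = D rho D
N_h = np.array([1000, 500, 150, 250, 150, 300])
N = N_h.sum()
pop = rng.multivariate_normal(mu, cov, size=N)
y, x1, x2 = pop.T
# Assumption: strata are formed by position (units are i.i.d., so this is arbitrary).
stratum = np.repeat(np.arange(len(N_h)), N_h)

allocations = {
    250: np.array([70, 35, 27, 38, 26, 54]),
    500: np.array([230, 100, 32, 55, 38, 45]),
    750: np.array([310, 215, 27, 65, 40, 93]),
}

def stratified_srs(n_h, rng):
    """SRSWOR of n_h[h] units in each stratum; returns indices and weights N_h/n_h."""
    idx, w = [], []
    for h, (Nh, nh) in enumerate(zip(N_h, n_h)):
        units = np.flatnonzero(stratum == h)
        idx.append(rng.choice(units, size=nh, replace=False))
        w.append(np.full(nh, Nh / nh))
    return np.concatenate(idx), np.concatenate(w)

s, w = stratified_srs(allocations[500], rng)
```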
Point estimators of the regression coefficients were computed using the Eichhorn and Hayre (EH) and the Bar-Lev, Bobovitch and Boukai (BBB) models. For both models, we let $S$ be an innocuous quantitative variable unrelated to the sensitive variable and assume that its distribution is known. In the Eichhorn and Hayre model, the $i$-th respondent reports the true value multiplied by a number $s_i$ generated from $S$. In the BBB model, the $i$-th respondent reports the true value of the sensitive variable with probability $p$, and the true value multiplied by a number $s_i$ generated from $S$ with probability $1 - p$. In this study, an $F_{20,20}$ distribution was used for the scrambling variable $S$, and $p = 0.5$ was assumed in the BBB model. The use of the $F_{n,n}$ distribution as a scrambling distribution is justified by [10], who highlighted the protection it gives the respondent. For this reason, it is commonly used as a scrambling variable in RRT simulation studies; see, e.g., [17,21].
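Both techniques fit the unified approach of Section 2 with $S_1 = S$, $S_2 = 0$ and $p_3 = 0$: EH corresponds to $p_1 = 0$, $p_2 = 1$, and BBB to $p_1 = p$, $p_2 = 1 - p$. The Python sketch below (hypothetical function names) simulates both and applies the corresponding transformation, using $E(S) = d/(d-2)$ for $S \sim F_{d,d}$ with $d > 2$.

```python
import numpy as np

def scramble(y, method, rng, p=0.5, df=20):
    """EH: report y*S; BBB: report y w.p. p, otherwise y*S. S ~ F(df, df)."""
    S = rng.f(df, df, size=len(y))
    if method == "EH":
        return y * S
    if method == "BBB":
        return np.where(rng.random(len(y)) < p, y, y * S)
    raise ValueError(f"unknown method: {method}")

def unscramble(z, method, p=0.5, df=20):
    """Unified-approach transform r = z / (p1 + p2*m1), with m1 = df/(df - 2)
    and m2 = m3 = 0; (p1, p2) = (0, 1) for EH and (p, 1 - p) for BBB."""
    m1 = df / (df - 2)
    p1, p2 = (0.0, 1.0) if method == "EH" else (p, 1.0 - p)
    return z / (p1 + p2 * m1)

# Hypothetical check of unbiasedness on simulated true values.
rng = np.random.default_rng(42)
y = rng.normal(3.0, 1.0, 200_000)
r_eh = unscramble(scramble(y, "EH", rng), "EH")
r_bbb = unscramble(scramble(y, "BBB", rng), "BBB")
```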
For each estimator $\hat\beta_W$ of the population regression coefficient $\beta_N$, we computed the relative bias $RB = E_{MC}(\hat\beta_W - \beta_N)/\beta_N \times 100\%$ and the relative mean squared error $RMSE = E_{MC}[(\hat\beta_W - \beta_N)^2]/\beta_N^2 \times 100\%$ (both in percent), where $E_{MC}$ denotes the average over 1000 simulation runs.
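The two Monte Carlo summaries are simple averages over the simulation runs; a minimal Python version (function names hypothetical) is:

```python
import numpy as np

def relative_bias(estimates, true_value):
    """RB = E_MC(beta_hat - beta_N) / beta_N * 100, in percent."""
    return np.mean(np.asarray(estimates) - true_value) / true_value * 100.0

def relative_mse(estimates, true_value):
    """RMSE = E_MC[(beta_hat - beta_N)^2] / beta_N^2 * 100, in percent."""
    return np.mean((np.asarray(estimates) - true_value) ** 2) / true_value**2 * 100.0

# Toy example with three hypothetical Monte Carlo estimates of beta_N = 1.
est = [1.1, 0.9, 1.2]
rb, rmse = relative_bias(est, 1.0), relative_mse(est, 1.0)
```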
The results for every possible combination are shown in Table 1.
The RMSE values in this table show that the estimators $\hat\beta_{W1}$ and $\hat\beta_{W2}$ obtained using the EH method are less efficient than those obtained with the BBB method. Moreover, comparing the estimators of $\beta_{W1}$ and $\beta_{W2}$, the estimates of the first parameter are worse.
The second simulation study examines the behaviour of the variance estimators. In this study, we computed the plug-in estimator based on the asymptotic variance formula (AV, described in Section 3.1), and the jackknife (JK) and bootstrap (BS) variance estimators. Table 2 shows the average length (L) of the 95% confidence intervals based on the normal distribution, the simulated coverage probability (Cov) for each method, the absolute relative bias (|RB|) and the relative mean squared error (RMSE), in percent. For each variance estimator (AV, JK and BS), RB and RMSE are computed with respect to a simulated variance obtained from 1000 independent runs.
The most important observation is that, in general, all the variance estimators and the associated confidence intervals perform well. The confidence intervals are short, and the coverage probabilities of the 95% confidence intervals are close to the nominal level. The jackknife variance estimator yields the shortest intervals, which leads to under-coverage (between 0.905 and 0.936 in Table 2). The bootstrap variance estimator also yields short intervals, with coverage very close to the nominal value. The percent relative bias of all the variance estimators is small (at most 0.667% in absolute value for estimator AV, 0.233% for estimator JK and 0.141% for estimator BS). The model used to randomize the response has little impact on the relative bias. For all models and sample sizes, the JK and BS estimators are similar in terms of relative mean squared error.
This study was then repeated with a sample size $n = 500$, also considering an $F_{5,5}$ distribution for the scrambling variable $S$. The dispersion of the $\hat\beta_{W1}$ and $\hat\beta_{W2}$ values obtained for each randomization method and number of degrees of freedom is shown in boxplots (Figure 1).
The figure shows that, for all randomization methods, the values of $\hat\beta_{W2}$ are higher and their dispersion is lower than those of $\hat\beta_{W1}$. Moreover, the dispersion increases with the variance of the scrambling variable.
Similarly, the plug-in estimates based on the asymptotic variance, the jackknife and bootstrap variance estimates, and their dispersion for each randomization method and number of degrees of freedom are shown in boxplots (Figure 2).
For each randomization method, the greater the variance of the scrambling variable $S$, the greater the dispersion. This behaviour is especially noticeable in the estimation of $\hat\beta_{W1}$. This result is expected, since adding more noise increases the dispersion; in practice, however, scrambling variables with little variance cannot be used, as they reduce the privacy protection obtained.
To compare the regression-based and ratio-based RR models, we conducted a third simulation study that includes both. The sampling design was simple random sampling from a population of size $N = 10{,}000$, and three sample sizes were drawn from the population: $n = 250, 500, 750$. As in the previous study, point estimators of the regression coefficient were computed using the Eichhorn and Hayre (EH) and the Bar-Lev, Bobovitch and Boukai (BBB) models. An $F_{20,20}$ distribution was used for the scrambling variable $S$, and $p = 0.5$ was assumed in the BBB model. The main variable $y$ and an auxiliary variable $x$ were generated using the model $y_i = \beta x_i + \epsilon_i$ with $V(\epsilon_i) = \sigma^2 x_i$; in this case, $x \sim N(30, 2)$, $\sigma = 0.5$, $\beta = 7$ and $\epsilon_i \sim N(0, \sigma^2 x_i)$.
For all randomization methods and in both models, regression and ratio, Table 3 shows that the relative bias and the relative mean squared error are small. The RMSE decreases as the sample size increases, as expected, and the ratio model performs slightly better than the regression model.
5. Real Application

As a real application of the methods described above, we conducted a survey using stratified random sampling at the University of *** to investigate the consumption of alcohol and drugs among the university population (in a sample of 754 students).
The sensitive question in this case was "Indicate the age at which you started drinking alcohol and using drugs", and the RR technique used was the model proposed by [11]. To apply this model, each student was asked to use the app "Baraja Española" as a randomizing device (a Spanish deck of 40 cards, divided into four suits, each with cards numbered one to seven plus three face cards). When the user touches the screen, a card is shown. If it is a face card, the sensitive question should be answered directly; otherwise, the true value should be given, multiplied by the number shown on the card. Thus, the design parameter of the Bar-Lev model was 3/10.
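The design parameters of this card device can be checked with exact arithmetic. In the unified notation, a face card gives $p_1 = 12/40 = 3/10$, and a number card gives $p_2 = 7/10$ with $S_1$ uniform on $\{1, \ldots, 7\}$, so $m_1 = 4$, $m_2 = 0$ and $p_3 = 0$; hence $r_i = z_i/(p_1 + p_2 m_1) = z_i/3.1$. The Python sketch below verifies this and simulates the device on hypothetical ages; it is an illustration, not the app's implementation.

```python
import numpy as np
from fractions import Fraction

# Spanish deck: 4 suits x (numbers 1..7 + 3 face cards) = 40 cards.
deck = [("face", None)] * 12 + [("number", k) for k in range(1, 8) for _ in range(4)]
numbers = [k for kind, k in deck if kind == "number"]

p1 = Fraction(len(deck) - len(numbers), len(deck))   # face card: report the true value
p2 = 1 - p1                                          # number card: report y * number
m1 = Fraction(sum(numbers), len(numbers))            # mean of the scrambling variable S1
denominator = p1 + p2 * m1                           # p1 + p2*m1, with m2 = 0 and p3 = 0

# Reporting y for a face card is the same as multiplying y by 1.
multiplier = np.array([1 if kind == "face" else k for kind, k in deck])

def respond(y, rng):
    """Each respondent draws one card at random and reports accordingly."""
    idx = rng.integers(len(deck), size=len(y))
    return y * multiplier[idx]

rng = np.random.default_rng(0)
ages = rng.integers(13, 20, size=200_000)   # hypothetical true starting ages
z = respond(ages, rng)
r = z / float(denominator)                   # r_i = z_i / 3.1
print(p1, m1, denominator)
```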
After the study data were compiled, a regression model was fitted in which the sensitive variable was the dependent variable and the variable "Indicate on a scale from 0 (very bad) to 10 (optimal) how you would rate your relationship with your parents" was the independent variable. After the parameter was estimated, its variance was estimated by the jackknife technique, and the corresponding 95% confidence interval was computed. This approach produced the following results:
$$\hat\beta = 2.392682, \qquad \hat v_J(\hat\beta) = 9.45795 \times 10^{-6}, \qquad \mathrm{CI} = [2.387;\ 2.399].$$
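As an arithmetic check, assuming a normal-based interval $\hat\beta \pm z_{0.975}\sqrt{\hat v_J(\hat\beta)}$, the reported interval is reproduced by:

```python
import math
from statistics import NormalDist

beta_hat = 2.392682
v_jack = 9.45795e-06
z = NormalDist().inv_cdf(0.975)          # standard normal 97.5% quantile, about 1.96
half_width = z * math.sqrt(v_jack)
ci = (beta_hat - half_width, beta_hat + half_width)
print(f"[{ci[0]:.3f}; {ci[1]:.3f}]")
```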
Since $\hat\beta > 0$, the better the students' relationship with their parents, the higher the age at which they began to consume alcohol and drugs.

6. Conclusions

Indirect interview techniques effectively reduce the bias caused by respondents deliberately giving untruthful answers to sensitive questions. In recent years, many new techniques have emerged for estimating proportions, means or totals of sensitive variables, but few studies have addressed dependence parameters. In this paper, we propose a general scheme, based on randomized response (RR) techniques, for estimating regression coefficients under a general sampling design. We study the theoretical properties of the proposed estimators and derive several estimators of their variances. To assess the accuracy of the proposed estimators, a simulation study was conducted using two RR techniques, in which the proposed estimators performed well in terms of relative bias and relative mean squared error. The application of the proposed technique to a real survey enabled us to relate the age at which young people begin to consume alcohol and drugs to the perceived quality of their relationship with their parents.
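Table 1. Absolute relative bias (|RB|) and relative mean squared error (RMSE), in percent, of $\hat\beta_{W1}$ and $\hat\beta_{W2}$ under the BBB and EH methods.

| n | BBB $\hat\beta_{W1}$ \|RB\| | BBB $\hat\beta_{W1}$ RMSE | BBB $\hat\beta_{W2}$ \|RB\| | BBB $\hat\beta_{W2}$ RMSE | EH $\hat\beta_{W1}$ \|RB\| | EH $\hat\beta_{W1}$ RMSE | EH $\hat\beta_{W2}$ \|RB\| | EH $\hat\beta_{W2}$ RMSE |
|---|---|---|---|---|---|---|---|---|
| 250 | 4.374 | 9.152 | 1.51 | 1.44 | 7.83 | 14.73 | 2.89 | 2.25 |
| 500 | 2.99 | 4.13 | 0.56 | 0.07 | 6.06 | 7.07 | 1.89 | 1.08 |
| 750 | 1.46 | 2.2 | 0.07 | 0.86 | 1.56 | 3.27 | 1.22 | 0.89 |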
Table 2. Performance of the asymptotic-variance (AV), jackknife (JK) and bootstrap (BS) variance estimators.

(a) Average length (L) and coverage (Cov) of the 95% confidence intervals.

| Method | n | AV $\hat\beta_{W1}$ L | AV $\hat\beta_{W1}$ Cov | AV $\hat\beta_{W2}$ L | AV $\hat\beta_{W2}$ Cov | JK $\hat\beta_{W1}$ L | JK $\hat\beta_{W1}$ Cov | JK $\hat\beta_{W2}$ L | JK $\hat\beta_{W2}$ Cov | BS $\hat\beta_{W1}$ L | BS $\hat\beta_{W1}$ Cov | BS $\hat\beta_{W2}$ L | BS $\hat\beta_{W2}$ Cov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BBB | 250 | 0.161 | 0.967 | 0.085 | 0.952 | 0.122 | 0.936 | 0.066 | 0.931 | 0.129 | 0.954 | 0.070 | 0.940 |
| BBB | 500 | 0.116 | 0.969 | 0.060 | 0.965 | 0.085 | 0.926 | 0.045 | 0.924 | 0.095 | 0.950 | 0.051 | 0.953 |
| BBB | 750 | 0.082 | 0.982 | 0.043 | 0.971 | 0.058 | 0.911 | 0.031 | 0.905 | 0.070 | 0.960 | 0.038 | 0.966 |
| EH | 250 | 0.189 | 0.952 | 0.101 | 0.956 | 0.153 | 0.922 | 0.083 | 0.930 | 0.163 | 0.933 | 0.089 | 0.939 |
| EH | 500 | 0.133 | 0.957 | 0.069 | 0.954 | 0.107 | 0.931 | 0.057 | 0.930 | 0.120 | 0.958 | 0.064 | 0.960 |
| EH | 750 | 0.092 | 0.976 | 0.049 | 0.958 | 0.072 | 0.912 | 0.039 | 0.920 | 0.087 | 0.964 | 0.047 | 0.964 |

(b) Absolute relative bias (|RB|) and relative mean squared error (RMSE), in percent.

| Method | n | AV $\hat\beta_{W1}$ \|RB\| | AV $\hat\beta_{W1}$ RMSE | AV $\hat\beta_{W2}$ \|RB\| | AV $\hat\beta_{W2}$ RMSE | JK $\hat\beta_{W1}$ \|RB\| | JK $\hat\beta_{W1}$ RMSE | JK $\hat\beta_{W2}$ \|RB\| | JK $\hat\beta_{W2}$ RMSE | BS $\hat\beta_{W1}$ \|RB\| | BS $\hat\beta_{W1}$ RMSE | BS $\hat\beta_{W2}$ \|RB\| | BS $\hat\beta_{W2}$ RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BBB | 250 | 0.667 | 1.023 | 0.616 | 1.017 | 0.076 | 0.082 | 0.062 | 0.093 | 0.039 | 0.099 | 0.061 | 0.118 |
| BBB | 500 | 0.616 | 0.619 | 0.530 | 0.546 | 0.143 | 0.077 | 0.139 | 0.074 | 0.081 | 0.094 | 0.091 | 0.095 |
| BBB | 750 | 0.562 | 0.450 | 0.484 | 0.382 | 0.228 | 0.070 | 0.231 | 0.071 | 0.126 | 0.075 | 0.130 | 0.071 |
| EH | 250 | 0.391 | 0.489 | 0.397 | 0.534 | 0.109 | 0.043 | 0.071 | 0.044 | 0.009 | 0.048 | 0.057 | 0.061 |
| EH | 500 | 0.353 | 0.251 | 0.303 | 0.238 | 0.129 | 0.042 | 0.119 | 0.039 | 0.094 | 0.052 | 0.109 | 0.053 |
| EH | 750 | 0.263 | 0.145 | 0.244 | 0.149 | 0.233 | 0.040 | 0.222 | 0.032 | 0.121 | 0.046 | 0.141 | 0.050 |
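Table 3. Absolute relative bias (|RB|) and relative mean squared error (RMSE), in percent, of the ratio estimator $\hat\beta_R$ and the regression estimator $\hat\beta_W$ under simple random sampling.

| n | BBB $\hat\beta_R$ \|RB\| | BBB $\hat\beta_R$ RMSE | BBB $\hat\beta_W$ \|RB\| | BBB $\hat\beta_W$ RMSE | EH $\hat\beta_R$ \|RB\| | EH $\hat\beta_R$ RMSE | EH $\hat\beta_W$ \|RB\| | EH $\hat\beta_W$ RMSE |
|---|---|---|---|---|---|---|---|---|
| 250 | 0.042 | 0.090 | 0.083 | 0.092 | 0.083 | 0.050 | 0.085 | 0.051 |
| 500 | 0.128 | 0.047 | 0.158 | 0.048 | 0.132 | 0.026 | 0.129 | 0.027 |
| 750 | 0.168 | 0.029 | 0.201 | 0.030 | 0.119 | 0.016 | 0.116 | 0.017 |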
Author Contributions
Conceptualization, M.d.M.R.; Data curation, B.C.; Formal analysis, A.A.; Funding acquisition, M.d.M.R.; Investigation, M.d.M.R.; Methodology, A.A.; Software, B.C.; Writing-original draft, M.d.M.R.; Writing-review & editing, B.C. All authors have read and agreed to the published version of the manuscript.
Funding
This work is partially supported by Ministerio de Ciencia e Innovación of Spain [grant PID2019-106861RB-I00].
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
© 2021. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Randomized response (RR) techniques are widely used in research involving sensitive variables, such as drugs, violence or crime, especially when a population mean or prevalence must be estimated. However, they are not generally applied to examine relationships between a sensitive variable and other characteristics. This type of technique was initially applied to qualitative variables, and studies later showed that a logistic regression may be performed with RR data. Since many of the variables considered in this context are quantitative, RR techniques were extended to these cases to estimate the values required. Regression analysis is a valuable statistical tool for exploring relationships among variables and for establishing associations between responses and covariates. In this article, we propose a design-based regression analysis for complex sample designs based on the unified RR approach. We present estimators of the regression coefficients, study their theoretical properties and consider different ways to estimate their variance. The properties of these estimation techniques were simulated using various quantitative randomized models. The method proposed was also used to analyse the findings from a real-world survey.