Abstract

This article evaluates the use of the multiple imputation framework to protect the confidentiality of respondents' answers in sample surveys. The basic proposal is to simulate multiple copies of the population from which these respondents have been selected and release a random sample from each of these synthetic populations. Users can analyze the synthetic sample data sets with standard complete-data software for simple random samples, then obtain valid inferences by combining the point and variance estimates using the methods in this article. Both parametric and nonparametric approaches for simulating these synthetic databases are discussed and evaluated. It is shown, using actual and simulated data sets in simple settings, that statistical inferences from these simulated research databases and the actual data sets are similar, at least for a class of analyses. Arguably, this class will be large enough for many users of public-use data. Users with more detailed demands may have to apply for special access to the confidential data.
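
The combining step mentioned above can be illustrated with a small sketch. The abstract does not state the combining rules themselves, so the code assumes the rules commonly cited for fully synthetic data: the point estimate is the average of the per-data-set estimates, and the variance estimate is (1 + 1/m) times the between-data-set variance minus the average within-data-set variance. The function name `combine_synthetic` and its arguments are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def combine_synthetic(point_estimates, variance_estimates):
    """Combine results from m synthetic data sets (illustrative sketch).

    Assumed rule (not quoted from the abstract):
        q_bar = average of the m point estimates
        b_m   = sample variance of the m point estimates (divisor m - 1)
        u_bar = average of the m within-data-set variance estimates
        T_s   = (1 + 1/m) * b_m - u_bar
    """
    m = len(point_estimates)
    if m < 2 or len(variance_estimates) != m:
        raise ValueError("need at least two data sets and matching lengths")

    q_bar = mean(point_estimates)
    b_m = variance(point_estimates)   # between-data-set variance
    u_bar = mean(variance_estimates)  # average within-data-set variance

    # T_s can come out negative when m is small; this sketch returns it
    # unchanged rather than applying any adjustment.
    t_s = (1 + 1 / m) * b_m - u_bar
    return q_bar, t_s


# Hypothetical numbers: three synthetic samples, each analyzed separately.
q_bar, t_s = combine_synthetic([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(q_bar, t_s)
```

The subtraction of the average within-data-set variance is what distinguishes this assumed rule from the usual combining rule for imputing missing data, where the within and between components are added.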

Details

Title
Multiple Imputation for Statistical Disclosure Limitation
Author
Raghunathan, TE; Reiter, JP; Rubin, DB
First page
1
Publication year
2003
Publication date
Mar 2003
Publisher
Statistics Sweden (SCB)
ISSN
0282-423X
e-ISSN
2001-7367
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1266794989
Copyright
Copyright Statistics Sweden (SCB) Mar 2003