Multiple Imputation for Statistical Disclosure Limitation

Abstract

This article evaluates the use of the multiple imputation framework to protect the confidentiality of respondents' answers in sample surveys. The basic proposal is to simulate multiple copies of the population from which these respondents have been selected and release a random sample from each of these synthetic populations. Users can analyze the synthetic sample data sets with standard complete-data software for simple random samples, then obtain valid inferences by combining the point and variance estimates using the methods in this article. Both parametric and nonparametric approaches for simulating these synthetic databases are discussed and evaluated. It is shown, using actual and simulated data sets in simple settings, that statistical inferences from these simulated research databases and the actual data sets are similar, at least for a class of analyses. Arguably, this class will be large enough for many users of public-use data. Users with more detailed demands may have to apply for special access to the confidential data.
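
The combining step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the article's code. The abstract does not state the formulas, so the sketch assumes the combining rule commonly cited for fully synthetic data: the variance estimate is (1 + 1/m) b - v_bar, where b is the between-data-set variance of the point estimates and v_bar is the average of the within-data-set variance estimates. The function name combine_synthetic and the input numbers are hypothetical.

```python
from statistics import mean, variance


def combine_synthetic(estimates, variances):
    """Combine results from m synthetic samples (hypothetical helper).

    Assumed rule, not quoted from the abstract:
        T = (1 + 1/m) * b - v_bar
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)      # combined point estimate
    b = variance(estimates)      # between-data-set variance (divisor m - 1)
    v_bar = mean(variances)      # average within-data-set variance
    t = (1 + 1 / m) * b - v_bar  # may be negative when m is small
    return q_bar, t


# Made-up estimates of a mean from m = 4 synthetic samples.
q_bar, t = combine_synthetic([10.0, 12.0, 11.0, 13.0], [0.5, 0.7, 0.6, 0.6])
```

Because the within-data-set term is subtracted, this estimate can come out negative when m is small, so a real analysis would need a fallback for that case.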

Details

Title

Multiple Imputation for Statistical Disclosure Limitation

Author

Raghunathan, TE; Reiter, JP; Rubin, DB

First page

Publication year

2003

Publication date

Mar 2003

Publisher

Statistics Sweden (SCB)

ISSN

0282-423X

e-ISSN

2001-7367

Source type

Scholarly Journal

Language of publication

English

ProQuest document ID

1266794989
