PSYCHOMETRIKA, VOL. 71, NO. 4, 713–732, DECEMBER 2006
DOI: 10.1007/s11336-005-1295-9
LIMITED INFORMATION GOODNESS-OF-FIT TESTING IN MULTIDIMENSIONAL CONTINGENCY TABLES
ALBERT MAYDEU-OLIVARES
UNIVERSITY OF BARCELONA AND INSTITUTO DE EMPRESA BUSINESS SCHOOL
HARRY JOE
UNIVERSITY OF BRITISH COLUMBIA
We introduce a family of goodness-of-fit statistics for testing composite null hypotheses in multidimensional contingency tables. These statistics are quadratic forms in marginal residuals up to order r. They are asymptotically chi-square under the null hypothesis when parameters are estimated using any asymptotically normal consistent estimator. For a widely used item response model, when r is small and multidimensional tables are sparse, the proposed statistics have accurate empirical Type I errors, unlike Pearson's $X^2$. For this model in nonsparse situations, the proposed statistics are also more powerful than $X^2$. In addition, the proposed statistics are asymptotically chi-square when applied to subtables, and can be used for a piecewise goodness-of-fit assessment to determine the source of misfit in poorly fitting models.
Key words: multivariate discrete data, categorical data analysis, multivariate multinomial distribution, composite likelihood, item response theory, Lisrel.
1. Introduction
Consider the problem of modeling N independent and identically distributed observations on n discrete random variables consisting, respectively, of $K_1, \ldots, K_n$ categories. This type of data arises, for instance, in surveys, educational tests, or social science questionnaires when the number of choices is not constant over items. The observed data can be gathered in an n-dimensional contingency table with $C = \prod_{i=1}^{n} K_i$ cells.
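Because $C$ is a product over items, it grows multiplicatively with n, which is why such tables become sparse even for moderate test lengths. The short sketch below computes $C$ for two hypothetical item configurations; the function name `num_cells` and the item counts are illustrative assumptions, not taken from the paper.

```python
from math import prod


def num_cells(categories):
    """Number of cells C = K_1 * K_2 * ... * K_n of the full contingency table.

    `categories` is a hypothetical list holding K_i for each item i.
    """
    return prod(categories)


# Hypothetical configurations: 20 binary items, and 5 items with 4 options each.
binary_test = [2] * 20
mixed_test = [4] * 5
print(num_cells(binary_test))  # 1048576
print(num_cells(mixed_test))   # 1024
```

With, say, N = 1,000 respondents and 20 binary items, at most 1,000 of the 1,048,576 cells can be nonempty, so the table is necessarily sparse.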
Now, consider a parametric model $\pi(\theta)$, where $\pi$ is the C-dimensional vector of cell probabilities, which depends on a q-dimensional parameter vector $\theta$ that is typically estimated from the data. For assessing the fit of the model, consider a composite null hypothesis $H_0: \pi = \pi(\theta)$ for some $\theta \in \Theta$ versus $H_1: \pi \neq \pi(\theta)$ for any $\theta \in \Theta$, where $\Theta$ denotes the parameter space. Researchers confronted with testing such a composite hypothesis face two problems: first, how to assess the overall goodness of fit of the hypothesized model and, second, how to determine the source of the misfit in poorly fitting models.
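To make the notation $\pi(\theta)$ concrete, the sketch below uses the mutual independence model for n binary items, chosen here only for simplicity; it is an assumption for illustration and not the item response model studied in the paper. Each $\theta_i$ is the probability of endorsing item i, so $q = n$, and each cell probability is $\pi_c(\theta) = \prod_i \theta_i^{y_{ci}} (1 - \theta_i)^{1 - y_{ci}}$ for response pattern $y_c$. The helper names and data are hypothetical.

```python
import itertools

import numpy as np


def response_patterns(n):
    """All 2^n binary response patterns, in lexicographic order (one per cell)."""
    return np.array(list(itertools.product([0, 1], repeat=n)))


def independence_probs(theta):
    """Cell probabilities pi(theta) under a (hypothetical) mutual independence model.

    pi_c = prod_i theta_i^{y_ci} * (1 - theta_i)^{1 - y_ci}
    """
    theta = np.asarray(theta, dtype=float)
    y = response_patterns(len(theta))
    return np.prod(np.where(y == 1, theta, 1.0 - theta), axis=1)


# Under H0 theta is estimated; for this model the ML estimate of theta_i is the
# sample proportion endorsing item i (hypothetical N = 4 respondents, n = 3 items).
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 1]])
theta_hat = Y.mean(axis=0)          # theta_hat = (0.5, 0.5, 0.75)
pi_hat = independence_probs(theta_hat)
print(pi_hat.shape)                 # (8,)
```

The composite null hypothesis asserts only that the true $\pi$ lies somewhere on the surface traced out by $\pi(\theta)$ as $\theta$ ranges over $\Theta$, which is why $\theta$ must be estimated before fit can be assessed.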
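The two most commonly used goodness-of-fit statistics for testing the overall goodness of fit of a parametric model in multivariate categorical data analysis are Pearson's $X^2 = N \sum_{c=1}^{C} (p_c - \hat{\pi}_c)^2 / \hat{\pi}_c$ and the likelihood ratio statistic $G^2 = 2N \sum_{c=1}^{C} p_c \ln(p_c / \hat{\pi}_c)$, where $p_c$ is the observed proportion in cell c and $\hat{\pi}_c$ is its estimated probability under the model. When the model holds, the two...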
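Both statistics compare the observed proportions $p_c$ with the fitted probabilities $\hat{\pi}_c$ cell by cell. The sketch below computes them directly from the two formulas; the function names and the 2 x 2 table are hypothetical, and the fitted model (all cells equiprobable) is an assumption chosen so the arithmetic is easy to check.

```python
import numpy as np


def pearson_x2(p, pi_hat, N):
    """Pearson's X^2 = N * sum_c (p_c - pi_hat_c)^2 / pi_hat_c.

    Every fitted probability pi_hat_c must be strictly positive.
    """
    p = np.asarray(p, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    return N * np.sum((p - pi_hat) ** 2 / pi_hat)


def likelihood_ratio_g2(p, pi_hat, N):
    """G^2 = 2N * sum_c p_c ln(p_c / pi_hat_c), using the convention 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    observed = p > 0  # empty cells contribute nothing to G^2
    return 2 * N * np.sum(p[observed] * np.log(p[observed] / pi_hat[observed]))


# Hypothetical 2 x 2 table (flattened) with N = 100 and equiprobable fitted cells.
counts = np.array([30, 20, 25, 25])
N = counts.sum()
p = counts / N
pi_hat = np.full(4, 0.25)
print(pearson_x2(p, pi_hat, N))           # approximately 2.0
print(likelihood_ratio_g2(p, pi_hat, N))  # approximately 2.01
```

In a sparse table most $p_c$ are zero and many $\hat{\pi}_c$ are very small, so single terms of $X^2$ can be dominated by cells with tiny fitted probabilities; this is the setting in which the abstract reports inaccurate empirical Type I errors for $X^2$.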