ABSTRACT
A greater variety of categorical data methods is in use today than 15 years ago. This article surveys categorical data methods widely applied in public health research. Whereas large-sample chi-square methods, logistic regression analysis, and weighted least squares modeling of repeated measures once constituted the primary analytic tools for categorical data problems, today's methodology comprises a much broader range of tools made available by increasing computational efficiency. These include computational algorithms for exact inference in small samples and sparsely distributed data, conditional logistic regression for modeling highly stratified data, and generalized estimating equations for cluster samples. Generalized estimating equations, in particular, have found wide use in modeling the marginal probabilities of correlated count, binary, and multinomial outcomes. The various methods are illustrated with examples including a study of the prevalence of cerebral palsy in very low birthweight infants and a study of cancer screening in primary care settings.
KEY WORDS: exact inference, conditional logistic regression, proportional odds, generalized estimating equations, weighted least squares
INTRODUCTION
Public health research is frequently concerned with the relationship between categorical response variables and one or more explanatory variables. A categorical response variable may have two or more possibly ordered categories. For example, in a study of the factors related to cerebral palsy in premature neonates, the outcome of interest is whether an infant has developed cerebral palsy by one year of age. Categorical data methods are used to describe trends in the rate of cerebral palsy in infants born at a particular North Carolina hospital. In a statewide study of cancer screening in primary care practices in Colorado, interest is in the physician and patient factors that predict the extent of breast cancer screening that women receive. The outcome for each woman takes a value of 0, 1, or 2 for the number of types of screening procedures (mammogram and clinical breast exam) received in the past year. The statistical analysis should take into account that the level of one woman's cancer screening may be correlated with that of another patient in the same medical practice. An array of categorical data methods is available for analyzing such data. Only 15 years ago, many of these methods were either nonexistent, not widely known, or not easily implemented because computational...





