The generalized estimating equation (GEE) approach of Zeger and Liang facilitates analysis of data collected in longitudinal, nested, or repeated measures designs. GEEs use the generalized linear model to estimate more efficient and unbiased regression parameters relative to ordinary least squares regression in part because they permit specification of a working correlation matrix that accounts for the form of within-subject correlation of responses on dependent variables of many different distributions, including normal, binomial, and Poisson. The author briefly explains the theory behind GEEs and their beneficial statistical properties and limitations and compares GEEs to suboptimal approaches for analyzing longitudinal data through use of two examples. The first demonstration applies GEEs to the analysis of data from a longitudinal lab study with a counted response variable; the second demonstration applies GEEs to analysis of data with a normally distributed response variable from subjects nested within branch offices of an organization.
Keywords: longitudinal regression; nested data analysis; generalized linear models; logistic regression; Poisson regression
Organizational researchers who investigate topics such as absenteeism, innovation, turnover intentions, and decision making have often been forced to rely on suboptimal methods of analyzing their data because responses are generally not normally distributed. Researchers may either transform the response variable prior to conducting data analysis or use a method of aggregating their response variable so as to make the distribution of responses approximately normal. But these approaches sacrifice both precision in analysis and clarity in interpreting results (Gardner, Mulvey, & Shaw, 1995; Harrison, 2002).
A separate challenge comes in analyzing data that are correlated within subject, such as those provided in longitudinal studies and other studies in which data are clustered within subgroups. Failure to incorporate correlation of responses can lead to incorrect estimation of regression model parameters, particularly when such correlations are large. The regression estimates (βs) are less efficient, that is, they are more widely scattered around the true population value than they would be if the within-subject correlation were incorporated in the analysis (Diggle, Heagerty, Liang, & Zeger, 2002; Fitzmaurice, 1995). To increase their confidence in regression results, researchers should use analytical methods that produce the most efficient parameter estimates that are also unbiased (cf. McCullagh & Nelder, 1989; Pindyck & Rubinfeld, 1998), that is, with an expected...