Homoskedasticity is an important assumption in ordinary least squares (OLS) regression. Although the OLS estimator of the regression parameters is unbiased even when the homoskedasticity assumption is violated, the usual estimator of the covariance matrix of the parameter estimates is biased and inconsistent under heteroskedasticity, which can make significance tests and confidence intervals either liberal or conservative. After a brief description of heteroskedasticity and its effects on inference in OLS regression, we discuss a family of heteroskedasticity-consistent standard error estimators for OLS regression and argue that investigators should routinely use one of these estimators when conducting hypothesis tests using OLS regression. To facilitate the adoption of this recommendation, we provide easy-to-use SPSS and SAS macros to implement the procedures discussed here.
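The article's macros are for SPSS and SAS; as an illustrative sketch only, the kind of heteroskedasticity-consistent ("sandwich") estimator the abstract refers to can be computed directly in Python with NumPy. The HC3 form shown here is one member of the family (the choice of HC3, and all function and variable names, are assumptions for illustration, not the article's code):

```python
import numpy as np

def hc3_standard_errors(X, y):
    """OLS fit with HC3 heteroskedasticity-consistent standard errors.

    X : n x (p + 1) design matrix whose first column is ones.
    y : n-vector of outcome observations.
    Returns (beta_hat, hc3_se).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                      # OLS estimates
    e = y - X @ beta                              # residuals
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages (hat-matrix diagonal)
    omega = np.diag(e**2 / (1.0 - h)**2)          # HC3 residual weighting
    cov = XtX_inv @ X.T @ omega @ X @ XtX_inv     # sandwich covariance estimate
    return beta, np.sqrt(np.diag(cov))
```

Unlike the conventional estimator, this covariance estimate does not assume a constant error variance, so the resulting standard errors remain consistent under heteroskedasticity.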
Ordinary least squares (OLS) regression is arguably the most widely used method for fitting linear statistical models. An OLS regression model takes the familiar form

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + … + β_p X_ip + ε_i,    (1)

where Y_i is case i's value on the outcome variable, β_0 is the regression constant, X_ij is case i's score on the jth of the p predictor variables in the model, β_j is predictor j's partial regression weight, and ε_i is the error for case i. Using matrix notation, Equation 1 can be represented as

y = Xβ + ε,    (2)

where y is an n × 1 vector of outcome observations, X is an n × (p + 1) matrix of predictor variable values (including a column of ones for the regression constant), and ε is an n × 1 vector of errors, where n is the sample size and p is the number of predictor variables. The p partial regression coefficients in β quantify each predictor variable's unique or partial relationship with the outcome variable. Researchers are often interested in testing the null hypothesis that a specific element of β is zero, or in constructing a confidence interval for that element using a sample-derived estimate combined with an estimate of its sampling variance.
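The estimate of β and its conventional covariance matrix described above can be written out in a few lines of NumPy. This is a minimal sketch of the textbook formulas b = (X′X)⁻¹X′y and s²(X′X)⁻¹ (function and variable names are mine, not from the article), whose covariance term is precisely the quantity that becomes biased and inconsistent under heteroskedasticity:

```python
import numpy as np

def ols_fit(X, y):
    """Classical OLS: b = (X'X)^-1 X'y with the conventional
    (homoskedasticity-assuming) covariance estimate s^2 (X'X)^-1."""
    n, k = X.shape                       # k = p + 1 columns, incl. the constant
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                # partial regression weights
    e = y - X @ b                        # residuals
    s2 = e @ e / (n - k)                 # residual variance estimate
    cov_b = s2 * XtX_inv                 # conventional covariance matrix of b
    return b, cov_b
```

The square roots of the diagonal of `cov_b` are the usual standard errors reported by most statistical packages; the t statistic for testing whether an element of β is zero is the estimate divided by its standard error.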
The validity of the hypothesis tests and confidence intervals as implemented in most statistical computing packages depends on the extent to which the model's assumptions are met. The assumptions of the OLS regression model include that (1) the Y_i's are generated according to the model specified in Equation...





