Abstract
This dissertation presents a multivariate survival model for clustered survival times such as those reflecting the survival experience of children in the same family. The model is multivariate not only in the usual sense of having multiple predictors or independent variables, but also in the sense of having multiple responses or dependent variables. Survival data are a particular type of data in which interest focuses on the occurrence of a well-defined event such as death and in which the subject is observed from the start of exposure to the event until either the event occurs or the subject is censored.
Chapter 1 provides statistical and sociological motivations for the multivariate survival model. Statistically, applying a conventional survival model, which assumes independent observations, to correlated survival data is incorrect. Sociologically, ignoring the correlation may lead to substantively different conclusions. Moreover, the multivariate survival model may be used to model the unobserved cluster effects in a substantive way. Chapter 1 discusses these issues in the context of many concrete cases of sociological and demographic research.
Chapter 2 describes and interprets the model. Chapter 3 uses the EM algorithm to obtain the maximum likelihood estimates of the parameters. The EM algorithm is numerically stable and particularly appropriate for estimating the unobserved cluster effects, which the algorithm treats as missing data. Chapter 4 extends the standard EM algorithm to obtain the standard errors of the parameter estimates. Chapter 5 speeds up the algorithm by taking advantage of the estimated standard errors. The accelerated algorithm is particularly important for social scientists, whose data analysis often involves several thousand observations and many covariates.
In Chapter 6, the model is applied to child survival data from Guatemala previously analyzed by Pebley and Stupp. The reanalysis of the data with the multivariate survival model shows that ignoring the correlation tends to produce underestimated standard errors and thus exaggerated z-ratios for the parameter estimates. In this particular case, the parameter estimates from the reanalysis remain largely identical to those obtained by Pebley and Stupp. The reanalysis also suggests that the unobserved familial factors, net of household socioeconomic status, are relatively unimportant to child survival, at least in this Guatemalan data set. It is argued that the size of the estimated familial effects can be viewed as an upper bound on the size of the familial genetic factors.
The concluding chapter suggests directions in which this work may be extended.