Abstract
Background: Missing data is a common problem in cancer research. Although simple methods, such as complete-case (C-C) analysis, are commonly employed to deal with this problem, several studies have shown that such methods lead to biased estimates. The aim of this study was to address the issues encountered in the development of a prognostic model when missing data exist.
Patients and Methods: A total of 310 breast cancer patients were recruited. Initially, patients with missing data for any of the 4 candidate variables were excluded. Then, the missing data were imputed 10 times. A Cox regression model was fitted to the C-C data and to the imputed data. The results were compared in terms of the variables retained in the model, discrimination ability, and goodness of fit.
Results: In the C-C analysis, some variables lost their significance because of a loss in power, but after imputation of the missing data, these variables reached statistical significance. The discrimination ability and goodness of fit of the model fitted to the imputed data sets were higher than those of the C-C model (C-index, 76% versus 72%; likelihood ratio test result, 51.19 versus 32.44).
Conclusions: The results indicate the inappropriateness of an ad hoc C-C analysis. This approach leads to a loss of power to detect the effects of the variables and to imprecise estimates. Application of multiple imputation techniques is recommended to avoid such problems.
Keywords: Prognostic model; Missing data; Multiple imputation; Breast cancer
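The two analyses compared in the abstract can be illustrated with a minimal Python sketch. It assumes that missing values are coded as None and that coefficients from the imputed data sets are combined with Rubin's rules; the abstract does not state the pooling method, so this is an assumption rather than the authors' procedure. The function names complete_cases and pool_rubin, the variable names, and all numbers are hypothetical.

```python
import math
from statistics import mean, variance


def complete_cases(rows, variables):
    """Keep only patients with no missing (None) value in any candidate variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's rules, assumed here).

    estimates: one coefficient estimate per imputed data set
    variances: the squared standard error of that estimate per data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)                  # average of the m estimates
    within = mean(variances)                  # within-imputation variance
    between = variance(estimates)             # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between    # total variance
    return pooled, math.sqrt(total)


# Hypothetical records: the second and third patients would be dropped in a C-C analysis.
patients = [
    {"age": 50, "grade": 2},
    {"age": None, "grade": 3},
    {"age": 61, "grade": None},
]
cc_sample = complete_cases(patients, ["age", "grade"])

# Hypothetical log hazard ratios and their variances from three imputed data sets.
pooled_beta, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty caused by the missing values into the pooled standard error; a single imputed data set analyzed on its own would understate that uncertainty.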
(ProQuest: ... denotes formulae omitted.)
Introduction
Prognostic models combine key patient characteristics (risk factors) to predict clinical outcomes such as recurrence of cancer. These models are excellent tools for investigating the contribution of variables to the course of a disease and for selecting the appropriate treatment approach (1). However, if the assumptions of a model are ignored during its development, the results may be misleading (2,3). One of the challenges in modeling practice is incomplete data. In survival analysis, a problem occurs when data on risk factors are missing (4). The traditional response to this problem is to exclude individuals with incomplete data for any prognostic factor from the analysis (such an analysis is known as complete-case analysis [C-C analysis]) (4). However, exclusion of patients with missing data reduces the sample size, which reduces the precision of estimates and can lead...