Introduction
For the past 30 years, the Canadian Cancer Society and the Government of Canada (Public Health Agency of Canada and Statistics Canada) have published an annual comprehensive report, Canadian Cancer Statistics (CCS). The report includes a series of population cancer incidence and mortality counts and rate projections that fill the gap between the latest available year of data and the year the report is released. These projections are a planning and prioritizing resource for stakeholders; they also keep the Canadian population informed on the considerable burden of cancer.
A few projection models have been used over the years to produce the CCS. Poisson regression,1 used from 2003 to 2012, was replaced by Nordpred in 2011/2012. Nordpred, an R-package developed in Norway, provides a single projection model, the age–period–cohort (APC) model with a drift component.2 Nordpred is a well-studied package that has been shown to improve the reliability of cancer projections.3 4 5 6 7
In an effort to further cancer projections, Qiu et al. developed Canproj, which is also an R-package.8 Canproj has three key advantages over Nordpred: 1) replacement of the Poisson distribution by the negative-binomial distribution when over-dispersion is present; 2) inclusion of an age–cohort model; and 3) a set of hybrid models that combine the strengths of Poisson or negative-binomial regression, the segmented regression method,9 and an average method for projections based on age-specific counts. Some of the features of Canproj were used for the 2017 CCS10 while the full package was utilized for the 2019 CCS.
Canproj is a relatively new cancer-projection tool that has neither been extensively used nor validated.11 12 The objective of our project was to validate the national short-term (up to 5 years) cancer incidence projections generated by the Canproj package using Canadian data. Specifically, we compared the outputs of the Canproj projection models to actual data using the holdout cross-validation method13 and graphical representation. We also evaluated the automatic model selection features of Canproj (decision trees) to assess the capacity of these functions to select the best model.
Methods
Data
Cancer incidence data from 1986 to 2014 from the National Cancer Incidence Reporting System (NCIRS) and Canadian Cancer Registry (CCR) were used for the analysis.14 Data from the province of Quebec were not included since the provincial cancer registry has not submitted new data to the CCR since 2010. The data file used the International Agency for Research on Cancer’s international rules for multiple primary cancers.15 Results were tabulated for all cancers combined, by cancer site (the same cancer sites as those included in the CCS annual reports) and sex.10 A dataset was created for each combination of sex (n = 2) and cancer type (19 common to males and females plus five sex-specific types: cervix, ovary, uterus, prostate and testis). Datasets contained information by years from 1986 to 2014 and eighteen 5-year age groups (0–4, 5–9, …, 85+). Annual population estimates by geography, age and sex were provided by Statistics Canada with post-censal population estimates based on the 2016 Canadian census.16 Inter- and post-censal estimates were adjusted by Statistics Canada for net under-coverage. Rates were age standardized using the direct method and the 2011 Canada population.17
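The direct method of age standardization mentioned above can be illustrated with a minimal Python sketch. The function name, the three age groups and all counts below are hypothetical and chosen only for illustration; the study itself used eighteen 5-year age groups and the 2011 Canada standard population.

```python
def age_standardized_rate(cases, population, standard_population, per=100_000):
    """Directly standardized rate: weight each age-specific rate by the
    share of the standard population that falls in that age group."""
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age group")
    total_standard = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(cases, population, standard_population):
        rate += (d / n) * (w / total_standard)
    return rate * per


# Hypothetical data for three age groups (illustration only).
cases = [10, 50, 200]
population = [100_000, 80_000, 40_000]
standard = [30_000, 50_000, 20_000]
asr = age_standardized_rate(cases, population, standard)
print(f"Age-standardized rate per 100,000: {asr:.2f}")
```

Because the weights come from a fixed standard population, rates computed this way can be compared across years even when the age structure of the actual population changes.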
Canproj
The Canproj R-package contains several models used to project cancer incidence or mortality data. These include the Nordpred model, which incorporates age, drift, period and cohort effects; the age–cohort model; three hybrid models that incorporate age and potentially period effects (age-specific or all ages); and the 5-year average model (Table 1).18
Table 1. Models available in the Canproj R-package and variables included in the models
Model [a] | Age | Period | Cohort | Drift |
---|---|---|---|---|
Nordpred | ✓ | ✓ | ✓ | ✓ |
Age-cohort | ✓ | – | ✓ | – |
Hybrid age-specific trend [b] | ✓ | ✓ | – | – |
Hybrid age-common trend [c] | ✓ | ✓ | – | – |
Hybrid age only (average) [d] | ✓ | – | – | – |
5-year average [e] | ✓ | – | – | – |

✓ This variable is included in this model.
– This variable is not included in this model.
[a] Poisson or negative-binomial distribution can be selected for Nordpred and the age-cohort and hybrid age-specific models.
[b] The period trend is calculated by age group.
[c] The period trend is common to all age groups.
[d] Rate average based on a number of years determined by the magnitude of the age-standardized rate.
[e] Rate average based on the most recent 5 years of data.
The Canproj package uses two decision trees to determine which model is the most appropriate based on the significance of the variables. Alternatively, models can be selected individually. At first, Canproj considers four variables, namely age, period, cohort and a drift parameter; this is the most complex model, and these are the variables Nordpred uses. Canproj first determines if the cohort variable is significant. If it is significant, Canproj determines if the drift parameter is significant. If the cohort variable and the drift parameter are both significant, Canproj selects the Nordpred model to make the projections. If the cohort variable is significant but the drift parameter is not, Canproj selects the age–cohort model.
If the cohort effect is not significant, Canproj selects one of the hybrid models. If the number of cases is too small to run a regression model, a 5-year average is calculated. If the number of cases is large enough, Canproj fits two models: an “age-common trend” model and an “age-specific trend” model. If the age-specific trend model has a better fit, it is selected; if not, the age-common trend model is selected. The slope of the common trend variable is then tested to determine whether it differs from zero. If it does not, only the age variable is used in the model; if it does, the age + common trend model is used.
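The selection logic described in the two paragraphs above can be summarized in a short Python sketch. This is not Canproj code and does not reflect its API: the function `select_model` and its boolean arguments are hypothetical stand-ins for the outcomes of the significance and fit tests, which are assumed to have been carried out elsewhere.

```python
def select_model(cohort_significant, drift_significant, enough_cases,
                 age_specific_fits_better, common_trend_nonzero):
    """Return the model name implied by the decision rules in the text.

    Each argument is a boolean standing in for a test result
    (hypothetical inputs; how the tests are run is not modelled here).
    """
    if cohort_significant:
        # Cohort effect present: Nordpred if drift is also significant.
        return "Nordpred" if drift_significant else "Age-cohort"
    if not enough_cases:
        # Too few cases to fit a regression model.
        return "5-year average"
    if age_specific_fits_better:
        return "Hybrid age-specific trend"
    if common_trend_nonzero:
        return "Hybrid age-common trend"
    # Common trend slope not different from zero: age variable only.
    return "Hybrid age only (average)"


print(select_model(True, True, True, False, False))    # Nordpred
print(select_model(False, False, True, False, True))   # Hybrid age-common trend
```

Writing the rules as one function makes the order of the tests explicit: the cohort test comes first, so the hybrid branch is reached only when the cohort effect is not significant.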
Validation
Cross-validation was used to estimate the accuracy of the Canproj-generated projections by using a subset of the data (the training data) and validating the results on the other subset (the independent testing data). This study used the holdout method13 to create the training and the independent testing datasets. Data from 1986 to 2010 (five 5-year periods) were used as the training data, and data from 2011 to 2014 (the last 4 years of data) were used as the independent testing data. The predictions from the training model and the actual data from the last 4 years were compared to evaluate the accuracy of the projection models.
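As a minimal sketch of the holdout split described above, the following Python snippet separates a yearly series into training (1986 to 2010) and testing (2011 to 2014) subsets. The function name `holdout_split` and the rate values are hypothetical and used only to show the mechanics.

```python
def holdout_split(rates_by_year, last_training_year=2010):
    """Split a {year: rate} mapping into training and testing subsets."""
    training = {y: r for y, r in rates_by_year.items() if y <= last_training_year}
    testing = {y: r for y, r in rates_by_year.items() if y > last_training_year}
    return training, testing


# Hypothetical age-standardized rates for 1986 to 2014 (illustration only).
rates_by_year = {year: 100.0 + 0.5 * (year - 1986) for year in range(1986, 2015)}
training, testing = holdout_split(rates_by_year)
print(len(training), "training years;", len(testing), "testing years")
```

The split is chronological rather than random, which matches the projection setting: the models only ever see years that precede the ones they are asked to predict.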
The validation measure we used, the relative bias (RB), compares the expected value generated by the projection models to the observed values in the testing dataset for diagnosis years 2011 to 2014. The RB measures the relative difference in percentage between the expected (or projected) value (E) and the observed value (O).
$$RB_t = \frac{\lvert E_t - O_t \rvert}{O_t} \times 100$$, where t = 2011 to 2014
In our case, the “value” investigated is the age-standardized rate.
The RBs were summarized by projection model, cancer type and sex.
We compared the mean and median RBs by model, cancer type and sex over the 4-year projected (testing) period. Median RB indicates the typical performance of a model, whereas mean RB (due to its sensitivity to extreme values) helps reveal models that are typically accurate but occasionally very inaccurate.
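To make the relative bias and the contrast between its mean and median concrete, here is a small Python sketch. The projected and observed rates are hypothetical; the final year is deliberately given a large miss to show how the mean reacts to an extreme value while the median does not.

```python
from statistics import mean, median


def relative_bias(expected, observed):
    """Relative bias (%) between a projected and an observed value."""
    return abs(expected - observed) / observed * 100


# Hypothetical projected and observed age-standardized rates.
projected = {2011: 50.0, 2012: 51.0, 2013: 52.0, 2014: 53.0}
observed = {2011: 50.0, 2012: 50.0, 2013: 50.0, 2014: 40.0}

rbs = [relative_bias(projected[t], observed[t]) for t in sorted(observed)]
mean_rb = mean(rbs)
median_rb = median(rbs)
print(f"mean RB = {mean_rb:.3f}%, median RB = {median_rb:.1f}%")
```

In this example three of the four years are projected closely, so the median stays low, while the single poorly projected year pulls the mean well above it.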
Joinpoint analyses
We used Joinpoint Trend Analysis Software version 4.5.0.1 (National Cancer Institute, Bethesda, MD, USA)19 to calculate trends in Canadian cancer incidence by type and sex between 1986 and 2010. Joinpoint model estimates were used to calculate the 1986 to 2010 RBs. This measure gives an estimate of the variability of the training data, which we compared to the RB measured on the projected data. The maximum number of joinpoints was set to 4; the minimum number of observations from a joinpoint to either end of the data was set to 3; and the minimum number of observations between two joinpoints was set to 4. Otherwise, the default joinpoint parameters were used. The log-transformed age-standardized rates and associated standard errors input into joinpoint were calculated in statistical package SAS version 9.3 (SAS Institute Inc., Cary, NC, USA). Canproj was run using R version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) and RStudio version 1.1.453 (RStudio Inc., Boston, MA, USA).
Performance indicators
Two indicators were used to highlight which models would likely project rates less reliably. The first was the identification of a joinpoint over the most recent 10-year period in the data used to train the projection models (2001–2010). Recent changes in the trend could indicate that the models will have more difficulty producing reliable projections. We divided the joinpoints into those that happened 6 to 10 years before the last year of training data available and those that happened 3 to 5 years before the last year of data. Joinpoints were not allowed to occur within 0 to 3 years of the last year of data. In Table 2, yellow cells (footnote c) indicate joinpoints that happened between 2001 and 2005 and orange cells (footnote d) indicate those that happened between 2006 and 2008.
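The first indicator amounts to sorting the most recent joinpoint year into one of the two windows used in Table 2. The Python sketch below shows this under the assumption that the windows are exactly 2001 to 2005 and 2006 to 2008, as stated for Table 2; the function name and return labels are hypothetical.

```python
def joinpoint_window(joinpoint_year):
    """Classify the most recent joinpoint year into the Table 2 windows."""
    if 2006 <= joinpoint_year <= 2008:
        return "2006-2008"
    if 2001 <= joinpoint_year <= 2005:
        return "2001-2005"
    # Earlier joinpoints (or none since the start of the series) are not flagged.
    return "not flagged"


for year in (2007, 2003, 1997):
    print(year, "->", joinpoint_window(year))
```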
Table 2. 2011–2014 median relative bias (%) by model and cancer type
Sex | Cancer type | Nordpred | Age–cohort | Hybrid: age-specific trend | Hybrid: age-common trend | Hybrid: age only | 5-year average | Diagnostic: JP [b] | Diagnostic: RB ratio [e] | Diagnostic: RB (%) |
---|---|---|---|---|---|---|---|---|---|---|
Male | All cancers | 11.0 | 10.1 | 5.9 | 5.6 [a] | 7.8 | 7.5 | 2007 [d] | 5.7 [g] | 1.0 |
Male | Oral | 8.8 | 13.4 | 14.8 | 12.7 | 1.0 [a] | 6.3 | 2003 [c] | 0.7 | 1.3 |
Male | Esophagus | 2.6 | 2.2 [a] | 3.0 | 2.7 | 5.9 | 2.2 | 2005 [c] | 0.8 | 2.8 |
Male | Stomach | 3.1 [a] | 3.8 | 3.9 | 3.9 | 25.2 | 8.5 | 1986 | 1.6 | 1.9 |
Male | Colorectal | 7.4 | 7.7 | 6.4 [a] | 8.1 | 10.2 | 7.8 | 2008 [d] | 16.0 [g] | 0.4 |
Male | Liver | 4.4 | 4.2 | 4.8 | 3.9 [a] | 26.2 | 10.0 | 1986 | 1.0 | 4.0 |
Male | Pancreas | 3.0 [a] | 6.9 | 6.5 | 5.4 | 3.1 | 3.5 | 1997 | 2.3 [f] | 1.3 |
Male | Larynx | 1.4 [a] | 3.6 | 2.4 | 2.0 | 44.0 | 18.2 | 1986 | 0.6 | 2.4 |
Male | Lung and bronchus | 3.1 | 1.8 | 1.9 | 1.7 [a] | 30.2 | 11.1 | 1986 | 1.3 | 1.3 |
Male | Melanoma | 1.4 [a] | 6.1 | 1.5 | 4.6 | 17.6 | 8.3 | 1986 | 0.7 | 2.0 |
Male | Breast | 4.9 | 4.3 [a] | 6.3 | 6.0 | 6.6 | 6.7 | 1986 | 0.6 | 6.9 |
Male | Prostate | 48.4 | 90.1 | 41.4 | 44.4 | 33.8 | 33.1 [a] | 2001 [c] | 11.2 [g] | 2.9 |
Male | Testis | 1.4 | 1.4 | 1.3 | 1.3 [a] | 13.9 | 7.2 | 1986 | 0.3 | 4.4 |
Male | Urinary bladder | 10.1 | 15.9 | 12.4 | 14.7 | 9.8 | 9.3 [a] | 1990 | 3.5 [f] | 2.7 |
Male | Kidney and renal pelvis | 5.0 | 2.8 | 2.0 | 2.0 [a] | 9.1 | 4.8 | 1998 | 1.0 | 1.9 |
Male | Brain/CNS | 4.1 | 2.9 [a] | 4.3 | 3.4 | 7.7 | 5.6 | 1986 | 1.5 | 2.0 |
Male | Thyroid | 3.7 [a] | 17.0 | 13.0 | 13.0 | 48.9 | 27.4 | 1997 | 0.8 | 4.6 |
Male | Hodgkin lymphoma | 1.4 | 1.4 | 2.1 | 1.6 | 3.4 | 1.3 [a] | 1986 | 0.5 | 2.6 |
Male | Non-Hodgkin lymphoma | 7.7 | 7.4 | 6.6 [a] | 7.3 | 8.5 | 7.8 | 2007 [d] | 4.4 [f] | 1.5 |
Male | Myeloma | 5.2 | 4.7 | 5.1 | 4.6 [a] | 10.0 | 6.8 | 1986 | 1.3 | 3.6 |
Male | Leukemia | 6.2 | 3.8 | 6.1 | 5.2 | 0.8 [a] | 3.5 | 1994 | 0.4 | 2.0 |
Male | All others | 4.0 | 4.1 | 2.8 | 2.8 [a] | 4.5 | 3.6 | 2003 [c] | 2.1 [f] | 1.3 |
Female | All cancers | 0.9 | 0.8 [a] | 0.8 | 0.8 | 3.3 | 0.9 | 1986 | 1.0 | 0.8 |
Female | Oral | 3.1 | 4.2 | 4.2 | 4.2 | 1.7 [a] | 2.8 | 1986 | 0.6 | 2.9 |
Female | Esophagus | 1.1 | 1.1 | 1.5 | 0.9 [a] | 6.8 | 1.3 | 1986 | 0.3 | 3.4 |
Female | Stomach | 1.3 [a] | 2.7 | 7.2 | 4.4 | 20.5 | 3.5 | 1992 | 0.6 | 2.0 |
Female | Colorectal | 4.0 | 3.7 | 2.7 [a] | 4.2 | 8.3 | 4.9 | 2000 | 5.5 [g] | 0.5 |
Female | Liver | 5.3 | 4.8 | 5.1 | 4.4 [a] | 21.2 | 8.9 | 1986 | 0.7 | 6.4 |
Female | Pancreas | 4.3 | 5.1 | 5.5 | 4.9 | 3.5 [a] | 4.2 | 1986 | 1.4 | 2.5 |
Female | Larynx | 10.0 [a] | 13.6 | 17.3 | 15.1 | 64.1 | 28.5 | 1986 | 1.8 | 5.5 |
Female | Lung and bronchus | 1.3 | 1.3 [a] | 9.0 | 4.9 | 3.9 | 1.9 | 2006 [d] | 2.1 [f] | 0.6 |
Female | Melanoma | 2.8 [a] | 8.4 | 3.5 | 3.4 | 16.3 | 9.3 | 1992 | 1.4 | 2.0 |
Female | Breast | 2.3 | 3.2 | 0.7 [a] | 1.3 | 1.9 | 1.6 | 1991 | 0.4 | 1.9 |
Female | Cervix uteri | 3.4 | 4.7 | 1.4 [a] | 1.4 | 22.3 | 8.6 | 2006 [d] | 0.7 | 2.1 |
Female | Uterus | 3.1 | 8.0 | 3.4 | 2.6 [a] | 10.0 | 10.0 | 2005 [c] | 1.8 | 1.5 |
Female | Ovary | 1.1 | 1.1 [a] | 1.2 | 1.7 | 9.9 | 4.7 | 1986 | 0.6 | 1.7 |
Female | Urinary bladder | 14.1 | 14.2 | 13.1 | 14.5 | 10.1 | 9.7 [a] | 1986 | 3.1 [f] | 3.1 |
Female | Kidney and renal pelvis | 12.0 | 4.3 | 4.6 | 4.3 | 6.1 | 4.2 [a] | 1986 | 1.8 | 2.4 |
Female | Brain/CNS | 5.6 | 5.3 [a] | 5.7 | 5.5 | 8.8 | 6.2 | 1986 | 2.2 [f] | 2.4 |
Female | Thyroid | 4.5 [a] | 6.3 | 5.4 | 5.9 | 50.1 | 21.7 | 2005 [c] | 2.7 [f] | 1.7 |
Female | Hodgkin lymphoma | 10.3 | 10.9 | 12.2 | 11.6 | 8.0 [a] | 10.4 | 1986 | 2.4 [f] | 3.4 |
Female | Non-Hodgkin lymphoma | 5.3 | 4.9 | 4.4 [a] | 4.9 | 5.8 | 6.0 | 1997 | 3.9 [f] | 1.1 |
Female | Myeloma | 3.8 [a] | 4.1 | 3.9 | 3.9 | 6.8 | 4.2 | 1986 | 1.0 | 3.7 |
Female | Leukemia | 14.3 | 5.0 | 4.7 | 6.8 | 2.6 [a] | 4.8 | 2001 [c] | 1.5 | 1.7 |
Female | All others | 3.1 | 4.2 | 3.6 | 4.1 | 3.0 [a] | 3.1 | 2004 [c] | 3.5 [f] | 0.8 |

Abbreviations: CNS, central nervous system; JP, joinpoints; RB, relative bias.
[a] Models with the smallest 2011–2014 median RB.
[b] Year of most recent joinpoint for rate trends.
[c] Joinpoints that happened between 2001 and 2005.
[d] Joinpoints that happened between 2006 and 2008.
[e] The RB ratio is the ratio of median RB for the 2011–2014 period to the median RB for the 1986–2010 period. In order to show the cancer sites that were more difficult to model, the continuous RB ratios were grouped as per footnotes f and g.
[f] The 2011–2014 median RB is 2 to 5 times higher than the 1986–2010 median RB.
[g] The 2011–2014 median RB is more than 5 times higher than the 1986–2010 median RB.
For the second indicator, we used the RB ratio, which is the ratio of the RB for the 2011–2014 period to the RB for the 1986–2010 period. We expected the bias in the projected rates to be at least as large as the bias in the rates that were used to build the projection models. To obtain the 1986 to 2010 RB, we used the output of the joinpoint analysis. In Table 2, if the 2011–2014 RB was 2 to 5 times higher than the 1986–2010 RB, table cells are in yellow (footnote f); if the 2011–2014 RB was more than 5 times higher than the 1986–2010 RB, the cells are in orange (footnote g). These cutoffs were arbitrarily determined after looking at the distribution of the results.
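A minimal Python sketch of the RB ratio and its grouping follows. It takes the ratio as the 2011–2014 median RB divided by the 1986–2010 median RB, as defined in the Table 2 footnotes. Two points are assumptions made here for illustration: the numerator uses the smallest 2011–2014 median RB among the models, which appears consistent with the values in Table 2, and a ratio of exactly 2 or exactly 5 is placed in the "2 to 5 times" group. The function names are hypothetical, and because the inputs are rounded table values, results can differ slightly from the published ratios.

```python
def rb_ratio(projection_rb, training_rb):
    """Ratio of the 2011-2014 median RB to the 1986-2010 median RB."""
    return projection_rb / training_rb


def rb_ratio_group(ratio):
    """Group a continuous RB ratio using the cutoffs described in the text."""
    if ratio > 5:
        return "more than 5 times higher"
    if ratio >= 2:
        return "2 to 5 times higher"
    return "not flagged"


# Rounded values from Table 2: male colorectal and male pancreas.
colorectal = rb_ratio(6.4, 0.4)
pancreas = rb_ratio(3.0, 1.3)
print(round(colorectal, 1), rb_ratio_group(colorectal))
print(round(pancreas, 1), rb_ratio_group(pancreas))
```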
Results
Canproj models
Five of the six models (Nordpred, the age–cohort model, the hybrid common trend model, the hybrid age-specific trend model and the 5-year average model) had mean RB between 5% and 10% and a median RB around 5% (Figure 1). Greater variation was observed in the mean and median RB when the accuracy of the projection models was compared by cancer site (Figure 2). None of the models were good at predicting prostate cancer and a greater predictive variability was apparent for cancers of the thyroid, larynx, bladder, liver and brain/central nervous system (CNS).
[Image removed - see PDF.]
Figure footnote
Note: See the following link for details about box plots: http://onlinestatbook.com/2/graphing_distributions/boxplots.html
[Image removed - see PDF.]
Figure 2 footnotes
Abbreviations: CNS, central nervous system; HL, Hodgkin lymphoma; NHL, non-Hodgkin lymphoma.
Note: See the following link for details about box plots: http://onlinestatbook.com/2/graphing_distributions/boxplots.html
A more detailed and slightly different picture emerges when models are graphically compared by type of cancer and sex (Figure 3, Table 2). The performance of all projection models was poor for male all cancer sites combined, male and female colorectal, prostate, male bladder, male and female brain/CNS, female Hodgkin lymphoma and male myeloma. The greater variation observed in Figure 2 for cancers of the liver, larynx and thyroid seems to be due to the inability of a few models to predict rates.
[Images removed - see PDF.]
Cancer sites that showed a recent change in trend, and for which projections could potentially be improved by changing the length of the data series, included male all cancers, male and female colorectal cancer and prostate cancer. We ran separate hybrid models on these cancer sites using the last 7 years of data only. It was possible to improve the fit of the projections substantially for all four cancer sites. For colorectal cancer, the RB ratio fell from 16.0 to 2.6 for males and from 5.5 to 1.9 for females. The RB ratio also fell from 5.7 to 5.4 for male all cancers combined and from 11.2 to 7.7 for prostate cancer.
Canproj model decision trees
As shown in Table 3, the cohort effect and the drift parameter were significant 79% of the time (34 out of 43 models), which makes Nordpred the model most often selected by Canproj. However, Nordpred was the model with the smallest RB only 24% of the time. Nevertheless, the mean RB was between 0 and 5% for at least one of the six models 76% of the time and between 6% and 10% for at least one model 20% of the time.
Table 3. Canproj decision tree: average relative bias by model, sex and cancer site
Sex | Cancer type | Nordpred | Age–cohort | Hybrid: age-specific trend | Hybrid: age-common trend | Hybrid: age only | 5-year average |
---|---|---|---|---|---|---|---|
Male | All cancers | 10.9 [a] | 10.1 | 5.9 | 5.7 [b] | 7.8 | 7.5 |
Male | Oral | 8.6 [a] | 13.2 | 14.8 | 12.6 | 1.6 [b] | 5.9 |
Male | Esophagus | 3.9 | 2.6 [b] | 4.2 [a] | 3.2 | 6.8 | 3.0 |
Male | Stomach | 3.4 [c] | 3.7 | 4.1 | 3.7 | 26.1 | 9.5 |
Male | Colorectal | 7.9 [a] | 8.2 | 6.9 [b] | 8.6 | 10.7 | 8.3 |
Male | Liver | 6.8 [a] | 6.5 | 7.6 | 6.2 [b] | 26.0 | 8.6 |
Male | Pancreas | 3.2 | 7.0 | 6.2 [a] | 5.0 | 3.2 [b] | 3.8 |
Male | Larynx | 2.1 [c] | 3.6 | 2.5 | 2.2 | 39.4 | 16.2 |
Male | Lung and bronchus | 3.2 [a] | 1.7 | 2.6 | 1.6 [b] | 32.4 | 12.9 |
Male | Melanoma | 2.1 [c] | 5.5 | 2.8 | 3.9 | 17.6 | 7.8 |
Male | Breast | 6.2 [b] | 6.8 | 8.6 | 8.3 [a] | 6.4 | 6.2 |
Male | Prostate | 45.9 [a] | 86.4 | 38.9 | 41.8 | 31.3 | 30.6 [b] |
Male | Testis | 2.4 | 2.4 [a] | 2.2 | 2.1 [b] | 14.3 | 7.1 |
Male | Urinary bladder | 12.3 [a] | 17.3 | 13.7 | 15.9 | 11.2 [b] | 11.6 |
Male | Kidney and renal pelvis | 5.5 [a] | 3.0 | 2.4 | 2.3 [b] | 9.4 | 5.3 |
Male | Brain/CNS | 6.7 [a] | 6.0 [b] | 7.0 | 6.3 | 10.0 | 8.0 |
Male | Thyroid | 3.3 [c] | 16.8 | 13.4 | 13.5 | 46.8 | 26.9 |
Male | Hodgkin lymphoma | 3.4 [b] | 3.4 | 3.5 | 3.4 | 5.5 [a] | 3.8 |
Male | Non-Hodgkin lymphoma | 7.5 [a] | 7.2 | 6.9 [b] | 7.0 | 8.5 | 7.5 |
Male | Myeloma | 6.6 | 6.3 [b] | 6.6 [a] | 6.4 | 10.7 | 7.9 |
Male | Leukemia | 6.0 | 3.9 | 6.0 [a] | 5.5 | 2.8 [b] | 3.6 |
Male | All others | 3.8 [a] | 4.8 | 3.8 | 3.6 | 4.6 | 3.6 [b] |
Female | All cancers | 1.3 [a] | 1.1 | 1.0 [b] | 1.1 | 3.4 | 1.1 |
Female | Oral | 3.4 | 4.1 [a] | 4.4 | 3.9 | 2.9 [b] | 3.3 |
Female | Esophagus | 3.6 | 3.5 | 3.2 | 3.0 [c] | 8.8 | 3.9 |
Female | Stomach | 2.7 [c] | 2.9 | 7.2 | 4.1 | 20.7 | 4.9 |
Female | Colorectal | 5.2 [a] | 4.9 | 3.7 [b] | 5.4 | 9.5 | 6.1 |
Female | Liver | 5.4 [a] | 4.9 [b] | 5.2 | 5.5 | 21.8 | 9.0 |
Female | Pancreas | 4.9 | 5.1 | 5.2 | 5.0 | 4.9 [a] | 4.7 [b] |
Female | Larynx | 10.2 [c] | 14.5 | 16.7 | 15.3 | 53.4 | 24.0 |
Female | Lung and bronchus | 1.6 [c] | 1.6 | 10.3 | 5.8 | 3.5 | 2.4 |
Female | Melanoma | 3.0 [c] | 8.4 | 4.2 | 4.1 | 16.9 | 9.1 |
Female | Breast | 2.3 [a] | 2.9 | 1.2 [b] | 1.5 | 1.8 | 1.7 |
Female | Cervix uteri | 3.4 [a] | 4.6 | 1.5 | 1.5 [b] | 20.9 | 8.3 |
Female | Uterus | 2.8 [c] | 7.9 | 3.0 | 3.6 | 10.1 | 10.1 |
Female | Ovary | 1.9 [a] | 1.8 [b] | 1.8 | 1.8 | 9.7 | 4.0 |
Female | Urinary bladder | 14.6 [a] | 14.7 | 13.7 | 15.0 | 10.7 [b] | 10.7 |
Female | Kidney and renal pelvis | 12.0 [a] | 5.4 | 6.0 | 5.8 | 7.7 | 4.1 [b] |
Female | Brain/CNS | 7.8 [a] | 7.1 [b] | 8.2 | 7.7 | 10.8 | 9.1 |
Female | Thyroid | 4.3 [c] | 5.7 | 5.5 | 5.3 | 47.9 | 21.6 |
Female | Hodgkin lymphoma | 10.1 | 10.5 [a] | 11.5 | 11.0 | 7.9 [b] | 10.0 |
Female | Non-Hodgkin lymphoma | 5.6 [a] | 5.3 | 5.4 | 5.3 [b] | 5.8 | 5.7 |
Female | Myeloma | 3.7 [b] | 3.9 | 3.8 [a] | 3.8 | 6.4 | 4.4 |
Female | Leukemia | 13.2 [a] | 5.6 | 5.2 | 6.4 | 4.1 [b] | 5.1 |
Female | All others | 3.0 [c] | 4.2 | 3.7 | 4.0 | 3.1 | 3.2 |

Abbreviations: CNS, central nervous system; RB, relative bias.
[a] The projection models selected by Canproj.
[b] The models with smallest RB.
[c] The Canproj selection is the model with the smallest RB.
Discussion
Our aim was to validate short-term projections generated by Canproj using Canadian cancer incidence data. The results show that the range of models Canproj offers supports making reliable projections for most of the cancer sites investigated. When variations in rates were identified within the last 10 years of training data, it was possible to use the recent, shorter time period as the projection base for the hybrid models to improve the accuracy of the projected rates.
The large jump in bladder cancer rates in 2013/2014 is due to changes in reporting rules in Ontario;20 starting in 2013, Ontario added in situ bladder cancers to malignant bladder cancers in its registry.
Rates for brain/CNS, colorectal cancer, female Hodgkin lymphoma, prostate and male all cancers combined declined faster than the models predicted, while male myeloma rates increased faster than the models predicted. The poor performance in predicting these cancer rates is related to recent and rapid changes in their rates that either were not part of the training dataset or happened in the last few years of the training dataset.
We evaluated the automatic model selection feature of Canproj (decision trees) to assess the capacity of these functions to select the best model. For the national dataset, Nordpred was the model most often selected by the Canproj decision tree, although it was the model with the smallest RB only 24% of the time. Other models can outperform Nordpred when analyzing data from smaller populations.21 Our own and others’ experience with Canproj suggests that the decision tree selection should be used in combination with the individual outputs of each model and expert advice to select the best projection model.22
The results of this project build on prior Canadian studies that examined different cancer projection methods. Lee et al. (2011) compared the accuracy of 16 models and model variations for projecting short-term cancer mortality rates.23 They found that no single method was able to consistently provide accurate forecasts for a wide range of cancer sites and that a choice of models is preferable. Qiu et al. (2010) compared the Nordpred model, the generalized additive model and the Bayesian model.8 They concluded that when the age, drift and cohort effects are present, the Nordpred method is the preferred approach; when the age and cohort effects are present, an age–cohort model is the best approach; and when the cohort effect is not present, a hybrid method should be used. They also found that for small cancer sites, data aggregation is required to apply the hybrid method. In 2010, the Canadian Cancer Projections Network (C-Proj) released a report in which they evaluated Nordpred, hybrid, age–cohort and Bayesian models using Markov chain Monte Carlo cancer incidence projection methods with data from the Nova Scotia Cancer Registry.21 They suggested that the age–cohort method should be used for cancer projections for provinces with small and stable populations.
Although cancer incidence projections are routinely performed, only a few studies describe the evaluation of alternative methods; the recommendations depend on the population included and projection time frame. Stock et al. (2018) used a Bayesian approach to project cancer incidence rates to 2030 using data from the German cancer registry.24 They found that this method offered advantages in terms of flexibility, interpretability, transparency and level of detail, but they did not recommend using it for short-term data. Pesola et al. (2017) compared a number of models (null, age–drift, age–period, age–cohort and APC) to predict pediatric and adolescent cancer incidence in England to 2030.25 The model fit results showed that the age–drift model offered as good a fit to the data as more complex models for all cancers in children. An APC model with natural cubic splines was evaluated when predicting cancer incidence and mortality in the United Kingdom until 2035.26 The basis of the APC model is that past trends will continue into the future. If vaccines or new treatments that change cancer incidence and mortality are developed, the model will not anticipate these changes, reinforcing the importance of using recent data and completing projections at regular intervals.27 Katanoda et al. (2014) examined three projection models’ ability to project short-term cancer incidence in Japan: generalized linear model with age and period as independent variables (A+P linear); generalized linear model with age, period and their interactions (A*P linear); and generalized additive model with age, period and their interactions smoothed by spline (A*P spline).28 They used Nordpred in their preliminary analysis and it failed to predict the peak in liver cancer in the mid-1990s.
Strengths and limitations
This project has several limitations. In all the models Canproj uses, the variables age, period and cohort encompass all the changes and improvements in risk factors, demography and ethnic profile of the population, prevention, early detection and treatment. More details on these cancer rate determinants would improve the capacity for making more reliable projections. However, the level of information needed may be hard to obtain in some jurisdictions and, for most of the cancer sites investigated in the project, the age, period and cohort information has proven sufficient for making reliable projections.
We did not conduct a detailed Canadian provincial data analysis in the present exercise, but we expect that as provincial populations get smaller, models other than Nordpred would become the most frequently selected through the decision tree.
The data used in the models did not include data from the province of Quebec and consequently do not represent the entire country.
Finally, as with all methods, projections rely on the assumption that past trends will continue into the future, which may not always be the case.
Conclusions
Health care planners and policy makers need to know about the future burden of cancer to help them prioritize cancer control strategies, allocate resources and evaluate treatments and interventions. The Canproj package can provide reliable cancer projections to support them in this task.
Conflicts of interest
The authors have no conflicts of interest to declare.
Authors’ contributions and statement
AD, ZQ and AS were involved in the design and conceptualization of the work.
AD and ZQ were involved in the analysis of the data.
AD drafted the paper.
All authors provided input for the interpretation of the results and revision of the paper.
The content and views expressed in this article are those of the authors and do not necessarily reflect those of the Government of Canada.
1 Canadian Cancer Society’s Advisory Committee on Cancer Statistics. Canadian cancer statistics 2011. Ottawa (ON): Government of Canada; 2011.
2 Fekjær H, Møller B. Nordpred software package. Majorstuen (NO): Cancer Registry of Norway; [cited 2017 Jul 20]. Available from: https://www.kreftregisteret.no/en/Research/Projects/Nordpred/Nordpred-software/
3 Møller B, Fekjaer H, Hakulinen T, et al. Prediction of cancer incidence in the Nordic countries: empirical comparison of different approaches. Stat Med. 2003;22(17):2751-66. doi:10.1002/sim.1481.
4 Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-period-cohort models. Stat Med. 1987;6(4):469-81. doi:10.1002/sim.4780060406.
5 Weir HK, White MC. Cancer incidence and mortality through 2020. Prev Chronic Dis. 2016;13:E48. doi:10.5888/pcd13.160024.
6 Rapiti E, Guarnori S, Pastoors B, Miralbell R, Usel M. Planning for the future: cancer incidence projections in Switzerland up to 2019. BMC Public Health. 2014;14(1):102. doi:10.1186/1471-2458-14-102.
7 Nowatzki J, Moller B, Demers A. Projection of future cancer incidence and new cancer cases in Manitoba, 2006-2025. Chronic Dis Can. 2011;31(2):71-8.
8 Qiu Z, Jiang Z, Wang M, Hatcher J. Long-term projection methods: comparison of age-period-cohort model-based approaches. Technical report for Cancer Projections Network (C-Proj). Edmonton (AB): Alberta Health Services; 2010.
9 Kim HJ, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med. 2000;19:335-51. doi:10.1002/(sici)1097-0258(20000215)19:3<335::aid-sim336>3.0.co;2-z.
10 Canadian Cancer Society’s Advisory Committee on Cancer Statistics. Canadian cancer statistics 2017. Ottawa (ON): Government of Canada; 2017.
11 Fritschi L, Chan J, Hutchings SJ, Driscoll TR, Wong AY, Carey RN. The future excess fraction model for calculating burden of disease. BMC Public Health. 2016;16(1):386. doi:10.1186/s12889-016-3066-1.
12 Alberta Health Services, Surveillance and Reporting. The 2017 report on cancer statistics in Alberta. Edmonton (AB): Alberta Health Services; 2017.
13 Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Statist Surv. 2010;4:40-79. doi:10.1214/09-SS054.
14 Statistics Canada. Canadian Cancer Registry (CCR). Detailed information for 2017. 2019 [cited 2019 May 3]. Available from: http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=3207
15 Working Group Report. International rules for multiple primary cancers (ICD-O third edition). Eur J Cancer Prev. 2005;14(4):307-8. doi:10.1097/00008469-200508000-00002.
16 Statistics Canada. Data products, 2016 Census. Ottawa (ON): Statistics Canada; 2019 [cited 2019 Mar 6]. Available from: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/index-eng.cfm
17 Statistics Canada. Canadian Cancer Registry: Age-standardization. Ottawa (ON): Statistics Canada; 2019 [cited 2019 Mar 5]. Available from: https://www.statcan.gc.ca/eng/statistical-programs/document/3207_D12_V3
18 Qiu Z, Hatcher J; Cancer Projection Analytical Network Working Team. Canproj: the R package of cancer projection methods based on generalized linear models for age, period, and/or cohort: Version I. 2013.
19 National Cancer Institute, Division of Cancer Control & Population Sciences, Surveillance Research Program. Joinpoint trends analysis software. Bethesda (MD): National Institutes of Health; 2019 [cited 2019 Mar 6]. Available from: https://surveillance.cancer.gov/joinpoint/
20 Cancer Care Ontario. Ontario cancer statistics 2018. Toronto (ON): Cancer Care Ontario; 2018.
21 Qiu Z, Wang H, Wang M, Dewar R, Hatcher J; Cancer Projection Analytical Network Working Team. Comparison of projection methods: validation analysis using Nova Scotia cancer registry database. Edmonton (AB): Alberta Health Services and Cancer Care Nova Scotia; 2011.
22 Poirier AE, Ruan Y, Walter SD, et al.; ComPARe Study Team. The future burden of cancer in Canada: Long-term cancer incidence projections 2013-2042. Cancer Epidemiol. 2019;59:199-207. doi:10.1016/j.canep.2019.02.011.
23 Lee TC, Dean CB, Semenciw R. Short-term cancer mortality projections: a comparative study of prediction methods. Stat Med. 2011;30(29):3387-402. doi:10.1002/sim.4373.
24 Stock C, Mons U, Brenner H. Projection of cancer incidence rates and case numbers until 2030: a probabilistic approach applied to German cancer registry data (1999-2013). Cancer Epidemiol. 2018;57:110-9. doi:10.1016/j.canep.2018.10.011.
25 Pesola F, Ferlay J, Sasieni P. Cancer incidence in English children, adolescents and young people: past trends and projections to 2030. Br J Cancer. 2017;117(12):1865-73. doi:10.1038/bjc.2017.341.
26 Smittenaar CR, Petersen KA, Stewart K, Moitt N. Cancer incidence and mortality projections in the UK until 2035. Br J Cancer. 2016;115(9):1147-55. doi:10.1038/bjc.2016.304.
27 De Souza Giusti AC, De Oliveira Salvador PT, Dos Santos J, et al. Trends and predictions for gastric cancer mortality in Brazil. World J Gastroenterol. 2016;22(28):6527-38. doi:10.3748/wjg.v22.i28.6527.
28 Katanoda K, Kamo K, Saika K, et al. Short-term projection of cancer incidence in Japan using an age-period interaction model with spline smoothing. Jpn J Clin Oncol. 2014;44(1):36-41. doi:10.1093/jjco/hyt163.
© 2020. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the associated terms available at https://www.canada.ca/en/public-health/services/reports-publications/health-promotion-chronic-disease-prevention-canada-research-policy-practice/about-health-promotion-chronic-disease-prevention-canada-research-policy-practice.html
Abstract
Introduction: Cancer projections can provide key information to help prioritize cancer control strategies, allocate resources and evaluate current treatments and interventions. Canproj is a cancer-projection tool that builds on the Nordpred R-package by adding a selection of projection models. The objective of this project was to validate the Canproj R-package for the short-term projection of cancer rates.
Methods: We used national cancer incidence data from 1986 to 2014 from the National Cancer Incidence Reporting System and the Canadian Cancer Registry. Cross-validation was used to estimate the accuracy of the projections generated by Canproj, with relative bias (RB) as the validation measure. The Canproj automatic model selection decision tree was also assessed.
Results: Five of the six models had mean RB between 5% and 10% and median RB around 5%. For some of the cancer sites that were more difficult to project, a shorter time period improved reliability. Canproj's automatic model selection chose the Nordpred model 79% of the time, although that model had the smallest RB only 24% of the time.
Conclusions: The Canproj package was able to provide projections that closely matched the real data for most cancer sites.
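As an illustration of the validation measure named in the Methods, the following minimal Python sketch computes a relative bias for held-out years and summarizes it across cancer sites by mean and median. It is not the Canproj implementation. It assumes RB is the percent difference between projected and observed values relative to the observed value, and that the summary uses absolute RB; the exact definition is not given in this abstract. The function names, site labels and counts are all hypothetical.

```python
from statistics import mean, median


def relative_bias(projected, observed):
    """Percent difference of a projection from the observed value.

    Assumed definition: 100 * (projected - observed) / observed.
    """
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return 100.0 * (projected - observed) / observed


def summarize_abs_rb(projected_by_site, observed_by_site):
    """Return (mean, median) of absolute relative bias across sites."""
    abs_rb = [
        abs(relative_bias(projected_by_site[site], observed_by_site[site]))
        for site in observed_by_site
    ]
    return mean(abs_rb), median(abs_rb)


# Hypothetical counts for one held-out year (not data from the study).
observed = {"site_a": 1000, "site_b": 200, "site_c": 50}
projected = {"site_a": 1050, "site_b": 190, "site_c": 56}

mean_rb, median_rb = summarize_abs_rb(projected, observed)
print(f"mean |RB| = {mean_rb:.1f}%, median |RB| = {median_rb:.1f}%")
```

In a holdout design, the projected values would come from a model fitted only to the years before the held-out period, so that the comparison with observed values reflects genuine out-of-sample accuracy.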