Introduction
As one of the world’s largest primary care databases, the Clinical Practice Research Datalink (CPRD) contains anonymised patient-level data captured at consenting general practitioner (GP) practices throughout the United Kingdom. Covering approximately 7% of the UK population, CPRD contains information on demographics, clinical results, medication use, hospital admissions, referrals, registration details and death [1]. CPRD has been shown to be representative with respect to ethnicity, to record deaths with sufficient accuracy, and to be comparable to other populations with regard to age and sex distribution [2–4].
A common area of Electronic Health Record (EHR) research, including research using CPRD, is the effect of disease on mortality; it is therefore imperative to understand how mortality rates in a selected CPRD population compare with general population rates. Selecting cohorts on the requirement that individuals have been registered at a contributing GP practice for a specified length of time is commonplace in EHR research [5–10]. This requirement, sometimes referred to as research-quality follow-up or a lookback window, is an observation period before the start of a subject’s at-risk follow-up, ending at a date often referred to as the index date. The lookback period may be used to assess comorbid conditions or diagnoses, or to identify medication history. The selection effect of these delayed-entry conditions on estimated mortality rates is unknown.
To assess mortality rates in CPRD and the effect of requiring a lookback window, Standardised Mortality Ratios (SMRs) were estimated over two time scales, calendar year and follow-up period, using CPRD data for the period 2000 to 2018.
Materials and methods
CPRD cohort and patient timelines
The data comprised CPRD GOLD patients deemed to have research-acceptable data and with linkage to both the Office for National Statistics (ONS) death registration data and the Hospital Episode Statistics (HES) secondary care admission data. These commonly applied linkages restrict CPRD to its English data contribution. A random sample of 1 million patients was drawn without replacement from research-acceptable patients with both HES and ONS linkage who were aged ≥18 years and alive with CPRD follow-up after 1 January 2000. Details of the random sample and the associated Stata code are given in S1 File. This defined the cohort entry or index date, I(0), from which mortality follow-up started (Fig 1).
[Figure omitted. See PDF.]
Lookback window (w) and at-risk follow-up period displayed.
A composite start date, S, was defined for each patient as the later of the date of registration at their GP practice (first or current registration date) and the date the practice’s data were deemed to be of research quality, or “up-to-standard” [11]. An end date, E, was defined as the earliest of the practice’s last data collection date, the patient’s date of transfer out of their GP practice (including transfer out due to death), the ONS death date, and the administrative censoring date, 31 December 2018 (Fig 1). Four sub-cohorts were selected to have a lookback window, W, of at least 1, 2, 5 or 10 years. For each, a new cohort index date, I(w), was defined, marking the start of at-risk follow-up, such that W ≥ w for w = 1, 2, 5, 10. Patients with a lookback window of less than w years were omitted from the corresponding sub-cohort. The at-risk period for each individual was the end date, E, minus the cohort index date, I(w), in years, and a crude death rate was calculated for each sub-cohort as the number of deaths divided by the total person-time at risk, expressed per 1000 person-years. A Charlson Comorbidity Index (CCI) [12] score was calculated for each patient from comorbid conditions identified in HES in the 10 years before the cohort index date I(w) (baseline). Scores were classified into four groups: a baseline CCI score of zero, one, two, and three or more.
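As a rough illustration of the date logic above, the following Python sketch derives S, E, the lookback-shifted index date I(w), the at-risk time and a crude death rate per 1000 person-years. It is not the authors’ Stata code (which is in S1 File). The field names are hypothetical, the 365.25-day year is an assumption, and so is the rule I(w) = max(I(0), S + w years); the text states only that I(w) marks the start of at-risk follow-up with W ≥ w.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

ADMIN_CENSOR = date(2018, 12, 31)  # administrative censoring date from the text
DAYS_PER_YEAR = 365.25             # assumption: year length used to convert days


@dataclass
class Patient:
    # Hypothetical field names, not CPRD variable names.
    reg_date: date                  # registration at current GP practice
    uts_date: date                  # practice "up-to-standard" date
    index0: date                    # cohort entry date I(0)
    last_collection: date           # practice last data collection date
    transfer_out: Optional[date] = None
    ons_death: Optional[date] = None


def start_date(p: Patient) -> date:
    """Composite start S: later of registration and up-to-standard dates."""
    return max(p.reg_date, p.uts_date)


def end_date(p: Patient) -> date:
    """Composite end E: earliest of last collection, transfer out,
    ONS death and the administrative censoring date."""
    candidates = [p.last_collection, ADMIN_CENSOR]
    candidates += [d for d in (p.transfer_out, p.ons_death) if d is not None]
    return min(candidates)


def index_date(p: Patient, w: int) -> date:
    """Assumed rule: I(w) = max(I(0), S + w years)."""
    shift = timedelta(days=round(w * DAYS_PER_YEAR))
    return max(p.index0, start_date(p) + shift)


def at_risk_years(p: Patient, w: int) -> Optional[float]:
    """E - I(w) in years, or None if the patient is excluded from sub-cohort w."""
    i_w, e = index_date(p, w), end_date(p)
    if e <= i_w:
        return None
    return (e - i_w).days / DAYS_PER_YEAR


def died_in_follow_up(p: Patient, w: int) -> bool:
    """A death counts only if the patient is included and the ONS death
    date is their end date (i.e. not censored earlier)."""
    return (at_risk_years(p, w) is not None
            and p.ons_death is not None
            and p.ons_death == end_date(p))


def crude_rate_per_1000(patients: Iterable[Patient], w: int) -> float:
    """Deaths divided by total person-years at risk, per 1000 person-years."""
    deaths, person_years = 0, 0.0
    for p in patients:
        t = at_risk_years(p, w)
        if t is None:
            continue
        person_years += t
        deaths += died_in_follow_up(p, w)
    if person_years == 0:
        raise ValueError("no person-time at risk")
    return 1000 * deaths / person_years


def cci_group(score: int) -> str:
    """Baseline Charlson score grouped as 0, 1, 2 or 3+."""
    return "3+" if score >= 3 else str(score)
```

Under this assumed rule, a patient who dies within w years of their start date never enters sub-cohort w, which is the mechanism by which a lookback requirement can drop early deaths.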
Reference mortality rates are derived from ONS life tables for England [13]. These published tables are based on population estimates and deaths over three consecutive years. The population mortality rates used (published September 2021) covered the periods 1980–1982 to 2018–2020, with the mid-year chosen to represent each period; for example, the 2016–2018 life table was assigned to 2017. Life tables are stratified by age and calendar year and published separately for each sex.
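The mid-year convention can be made concrete with a small lookup builder. This is a sketch under stated assumptions: the input rows, the hyphen-separated period labels and the function names are hypothetical, and the rate column is assumed to be an annual central death rate usable as λ* in the SMR calculation.

```python
from typing import Dict, Iterable, Tuple

RateKey = Tuple[str, int, int]  # (sex, age, calendar year)


def mid_year(period: str) -> int:
    """Map a three-year life-table period label such as '2016-2018' to its mid-year."""
    first, last = (int(part) for part in period.split("-"))
    if last - first != 2:
        raise ValueError(f"expected a three-year period, got {period!r}")
    return first + 1


def build_reference_rates(
    rows: Iterable[Tuple[str, str, int, float]]
) -> Dict[RateKey, float]:
    """Index reference rates lambda* by (sex, age, calendar year).

    Each row is (period label, sex, age, rate); the period is represented
    by its mid-year, so '2016-2018' supplies the rates for 2017.
    """
    rates: Dict[RateKey, float] = {}
    for period, sex, age, rate in rows:
        key = (sex, age, mid_year(period))
        if key in rates:
            raise ValueError(f"duplicate reference rate for {key}")
        rates[key] = rate
    return rates
```

With periods from 1980–1982 to 2018–2020, the mid-years run from 1981 to 2019, which covers every calendar year of the 2000 to 2018 follow-up.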
Standardised mortality ratios
The SMR is an indirect standardisation measure giving an estimate of the relative increase or decrease in mortality in a study population compared with a reference population. It is calculated as the ratio of the observed number of deaths in the study cohort to the number of deaths expected, E, had the cohort experienced the reference population’s mortality rates:
$$\text{SMR} = \frac{\sum_{i=1}^{N} d_i}{E},$$
where $d_i = 1$ if individual $i$ dies and 0 otherwise, $i = 1,\dots,N$. The expected number of deaths is defined as $E = \sum_k \lambda_k^{*} t_k$, where $\lambda_k^{*}$ is the mortality rate in the reference population for stratum $k$, defined by unique combinations of sex, age and calendar year, and $t_k$ is the cohort’s total time at risk (in person-years) in that stratum. The reference mortality rates are obtained from national actuarial life tables published by ONS [13]. These provide precise estimates of mortality rates in the reference population, based on mid-year population estimates and recorded death counts. An estimate of the overall SMR is obtained by modelling the number of observed deaths in stratum $k$ of the cohort, $d_k$, as $d_k \sim \text{Poisson}(\mu_k)$, where $\mu_k = \mathbb{E}[d_k] = \lambda_k t_k$ and $\lambda_k$ is the cohort mortality rate in stratum $k$. To incorporate the expected number of deaths we use Poisson regression with a log link and two offsets, $\log(t_k)$ and $\log(\lambda_k^{*})$, to obtain
$$\log(\mu_k) = \beta_0 + \log(t_k) + \log(\lambda_k^{*}).$$
This gives $\exp(\beta_0)$, the common ratio $\lambda_k/\lambda_k^{*}$, as the overall SMR, accounting for the stratum-specific reference mortality rates. The model can be extended to estimate stratum-specific SMRs by including explanatory variables in the Poisson regression model [14–16]. For example, we obtained estimates of calendar-year-specific SMRs from data grouped by strata using the model
$$\log(\mu_{asy}) = \beta_y + \log(t_{asy}) + \log(\lambda_{asy}^{*}),$$
where $\exp(\beta_y)$ is the SMR for calendar year $y$ and the subscripts $a$, $s$, $y$ denote the stratum defined by attained age $a$ (in years), sex $s$ and calendar year $y$. The individual patient data are split by age and calendar year into one-year epochs and then aggregated by unique combination of sex, age and calendar year to give the total number of deaths and person-years at risk in each stratum. The aggregated data are matched with the ONS published rates for the same stratum, and the SMRs are estimated.
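For a Poisson model whose only covariates are group indicators and whose offset is log(t_k) + log(λ*_k), the maximum-likelihood estimate has a closed form. Setting the score to zero gives exp(β_g) = O_g/E_g, observed over expected deaths in group g, and the Fisher information at the estimate equals O_g, so the log-scale standard error is 1/√O_g. The sketch below uses that identity rather than a general GLM fit. The stratum dictionary keys are hypothetical, and the Wald interval on the log scale is assumed to match the kind of interval reported.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple

Estimate = Tuple[float, float, float]  # (SMR, lower 95% limit, upper 95% limit)


def smr_with_ci(observed: float, expected: float, z: float = 1.96) -> Estimate:
    """SMR = O/E with a Wald interval exp(log SMR +/- z / sqrt(O)).

    Equivalent to exp(beta0) from an intercept-only Poisson model with
    offset log(t_k) + log(lambda*_k)."""
    if observed <= 0 or expected <= 0:
        raise ValueError("need at least one death and positive expected deaths")
    smr = observed / expected
    half_width = z / math.sqrt(observed)  # on the log scale
    return smr, smr * math.exp(-half_width), smr * math.exp(half_width)


def smr_by_group(strata: Iterable[dict],
                 key: Callable[[dict], Hashable]) -> Dict[Hashable, Estimate]:
    """Group-specific SMRs (e.g. by calendar year), i.e. exp(beta_g) from a
    model with one indicator per group and the same two offsets."""
    observed: Dict[Hashable, float] = defaultdict(float)
    expected: Dict[Hashable, float] = defaultdict(float)
    for s in strata:
        g = key(s)
        observed[g] += s["deaths"]
        expected[g] += s["person_years"] * s["ref_rate"]  # lambda*_k * t_k
    return {g: smr_with_ci(observed[g], expected[g]) for g in observed}
```

As an arithmetic check against the Results, the 78 729 deaths in the full cohort give a log-scale half-width of 1.96/√78 729 ≈ 0.0070. An SMR of 0.980 then has interval 0.980 × exp(±0.0070) ≈ (0.973, 0.987), matching the reported overall CI.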
SMR by follow-up period
For the full cohort of 1 million randomly sampled CPRD GOLD patients, time-since-entry, defined as the time in years since the index date (Fig 1), was included in the estimation model to provide estimates of SMRs by follow-up period. When estimating SMRs by follow-up period f, the data are additionally split on a third timescale, time-since-entry, defined as
$$t - I(0),$$
where $t$ denotes calendar time and both quantities are measured in years.
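Splitting follow-up on three timescales at once is easy to get wrong, so here is a minimal sketch of the idea (Stata’s `stsplit` performs this kind of episode splitting). The function names are hypothetical. Times are decimal calendar years, and `birth` is assumed to be a decimal date; because CPRD provides only a birth year, imputing a day within that year would be an extra assumption. Each piece is classified by its midpoint, and a death is attributed to the final piece.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (attained age, calendar year, follow-up year, person-years, death indicator)
Piece = Tuple[int, int, int, float, int]


def lexis_split(entry: float, exit_: float, birth: float,
                index0: float, died: bool) -> List[Piece]:
    """Split follow-up (entry, exit_] at whole years of calendar time,
    attained age and time since index0. All times are decimal years."""
    if exit_ <= entry:
        return []
    cuts = {entry, exit_}
    # Origins of the three timescales: calendar (0), age (birth), follow-up (index0).
    for origin in (0.0, birth, index0):
        k = math.floor(entry - origin) + 1   # first whole-year boundary after entry
        while origin + k < exit_:
            cuts.add(origin + k)
            k += 1
    points = sorted(cuts)
    pieces: List[Piece] = []
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2                  # classify each piece by its midpoint
        pieces.append((math.floor(mid - birth), math.floor(mid),
                       math.floor(mid - index0), hi - lo,
                       int(died and hi == exit_)))  # death falls in the final piece
    return pieces


def aggregate(people: Iterable[Tuple[str, float, float, float, float, bool]]
              ) -> Dict[Tuple[str, int, int, int], List[float]]:
    """Total deaths and person-years by (sex, age, calendar year, follow-up year).

    Each person is (sex, entry, exit, birth, index0, died)."""
    cells: Dict[Tuple[str, int, int, int], List[float]] = defaultdict(lambda: [0, 0.0])
    for sex, entry, exit_, birth, index0, died in people:
        for age, year, fu, py, death in lexis_split(entry, exit_, birth, index0, died):
            cells[(sex, age, year, fu)][0] += death
            cells[(sex, age, year, fu)][1] += py
    return dict(cells)
```

Summing the aggregated cells over the follow-up year recovers the sex, age and calendar-year strata used for the calendar-year SMRs. Each cell is then matched to its reference rate.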
Including age group (18–59, 60–69, 70–79, 80–89 and 90–99 years) as an interaction with follow-up period allowed the SMRs to vary by age group across follow-up periods.
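For models beyond simple group indicators, a general Poisson fit with an offset is needed. The sketch below fits log(μ) = Xβ + log(t) + log(λ*) by iteratively reweighted least squares, which is Newton-Raphson for the canonical log link, using NumPy. It is illustrated with a hypothetical cell-means design that has one indicator per (age group, follow-up year) cell. The authors’ exact parameterisation of the interaction in Stata is not given, so this is one possible coding, not theirs. Each cell needs at least one death for the fit to converge.

```python
import math

import numpy as np


def fit_poisson_offset(X, y, offset, max_iter=50, tol=1e-10):
    """Fit log(mu) = X @ beta + offset by IRLS; return (beta, standard errors)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.asarray(offset, dtype=float)
    beta = np.zeros(X.shape[1])              # beta = 0 means SMR = 1 in every cell
    for _ in range(max_iter):
        mu = np.exp(X @ beta + offset)
        z = X @ beta + (y - mu) / mu         # working response, offset removed
        xtw = X.T * mu                       # X' W with W = diag(mu)
        new_beta = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta + offset)
    se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))
    return beta, se


def smr_by_cell(strata, z=1.96):
    """SMR and Wald 95% CI for each (age group, follow-up year) cell.

    Hypothetical stratum keys: age_group, fu, deaths, person_years, ref_rate."""
    labels = [(s["age_group"], s["fu"]) for s in strata]
    levels = sorted(set(labels))
    # Cell-means coding: no intercept, so exp(beta_j) is the SMR in cell j.
    X = [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    y = [s["deaths"] for s in strata]
    offset = [math.log(s["person_years"]) + math.log(s["ref_rate"]) for s in strata]
    beta, se = fit_poisson_offset(X, y, offset)
    return {lev: (math.exp(b), math.exp(b - z * e), math.exp(b + z * e))
            for lev, b, e in zip(levels, beta, se)}
```

With a fully saturated cell-means design, the fitted SMR in each cell equals observed over expected deaths. The general IRLS routine becomes useful when the interaction is restricted or smoothed.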
All analysis and modelling procedures were performed in Stata 16.
This research was approved by the Independent Scientific Advisory Committee (ISAC) for Medicines and Healthcare products Regulatory Agency Database Research (19_253RA). Generic ethical approval for observational research using the CPRD with approval from ISAC has been granted by a Health Research Authority Research Ethics Committee. Individual patient consent is not required.
Results
Over the 19-year period (1 January 2000 to 31 December 2018), there were 78 729 deaths (7.9%) in the full CPRD random sample cohort (n = 1 000 000) (Table 1). Requiring a lookback window W ≥ w (w = 1, 2, 5, 10) reduced the cohort size: to n = 876 048 for the sub-cohort with at least 1 year of lookback, n = 771 175 for W ≥ 2 years, n = 568 114 for W ≥ 5 years and n = 370 780 for W ≥ 10 years. There was some evidence of geographical variation between the sub-cohorts, with the relative contribution of patients and practices from the London region decreasing for sub-cohorts with longer lookback windows. Pre-index CPRD history (index date minus start date, in years) averaged 1.84 years for those with no lookback requirement, ranging from zero to over 18 years. Mean pre-index CPRD history increased as the lookback requirement increased. The sex ratio, mean age at start date and mean age at death remained consistent across all sub-cohorts, whereas mean age at index date and at end date increased with lookback, reflecting an older population in the sub-cohorts. Despite this, the percentage of deaths during follow-up remained relatively consistent across sub-cohorts, while total follow-up decreased from over 6.5 million person-years with no lookback to 2.2 million person-years with ten years of lookback. Mean follow-up per individual remained roughly constant at around 6 years.
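The sub-cohort sizes above translate into simple retention proportions. The short sketch below computes them, plus an approximate person-years-per-patient check. The person-year totals are the rounded figures quoted in the text (“over 6.5 million” and “2.2 million”), so those ratios are approximate.

```python
FULL_COHORT = 1_000_000
SUB_COHORT_SIZES = {1: 876_048, 2: 771_175, 5: 568_114, 10: 370_780}  # from the text

# Rounded person-year totals quoted in the text; ratios are approximate.
APPROX_PERSON_YEARS = {0: (6.5e6, FULL_COHORT), 10: (2.2e6, SUB_COHORT_SIZES[10])}


def retention(sizes, full=FULL_COHORT):
    """Proportion of the full random sample retained for each W >= w."""
    return {w: n / full for w, n in sizes.items()}


for w, share in retention(SUB_COHORT_SIZES).items():
    print(f"W >= {w:2d} years: {share:.1%} of the full cohort")

for w, (py, n) in APPROX_PERSON_YEARS.items():
    print(f"W >= {w:2d} years: about {py / n:.1f} person-years per patient")
```

This prints retention of 87.6%, 77.1%, 56.8% and 37.1%, and roughly 6.5 and 5.9 person-years per patient, in line with a mean follow-up of around 6 years.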
[Figure omitted. See PDF.]
The crude death rate remained relatively stable, increasing only slightly in the ten-year lookback sub-cohort. The large majority of subjects had no comorbidity at baseline in all sub-cohorts. The proportion with a CCI score of zero at baseline decreased as lookback increased, while the proportions in all other comorbidity groups increased, consistent with a greater comorbidity burden in an older population. In those with ten years of lookback, the proportion with no comorbidity fell to 88%, compared with 91% in the sub-cohort with five years of lookback. The mean CCI score also increased slightly.
For patients in the full CPRD random sample (n = 1 000 000), the registration history of their practice in CPRD, from the date the practice was deemed to provide up-to-standard data to the date of last data collection, had a mean of 16.65 (SD = 7.03) years. The longest was 31.6 years and the shortest 68 days.
Fig 2 shows CPRD practice histories, ordered from the earliest to the latest registered practices, with the number of active contributing CPRD practices overlaid. The vertical red lines and shaded area demarcate the follow-up period from 01/01/2000 to 31/12/2018. The number of practices actively providing data to CPRD rose to a peak in 2008 (n = 361) before decreasing sharply, by the end of 2018, to levels similar to those seen in 1990.
[Figure omitted. See PDF.]
The shaded region shows the follow-up period with the number of active practices by calendar year overlaid (right-hand y-axis).
Lookback window and effect on SMR
The overall SMR for the 1 million CPRD random sample was 0.980 [95% confidence interval (CI) 0.973–0.987]. Consistent with the overall SMR, the cohort with no lookback requirement (w = 0) had SMRs that tended to be just below one. Longer lookback windows were associated with lower SMRs. Requiring at least one year of lookback resulted in an SMR of 0.905 (0.898–0.912). Further increases in lookback showed a trend of decreasing overall SMRs: an SMR of 0.881 (0.874–0.888) for two years of lookback (W ≥ 2), 0.849 (0.841–0.857) for five years (W ≥ 5) and 0.837 (0.827–0.847) for ten years (W ≥ 10) (S1 Table in S1 File). Across the sub-cohorts there was some evidence that the SMRs decreased slightly over calendar time (Fig 3).
[Figure omitted. See PDF.]
Reference line of SMR = 1 in red.
Mortality by follow-up in CPRD
In the full cohort there was evidence of an initially high SMR in the first two years after entry (Fig 4; S2 Table in S1 File). After the second year of follow-up, mortality rates fell below national background rates. Across all follow-up periods combined, the mortality rate in the cohort was just below that of the general population, with overall SMR = 0.980 (0.973–0.987).
[Figure omitted. See PDF.]
Reference line of SMR = 1 in red.
Mortality by follow-up and age group in CPRD
SMRs were estimated by follow-up period and age group (Fig 5). This confirmed that the initially high SMR seen overall (Fig 4) was present in all age groups, although the effect was smallest in the youngest age group (18–59 years). Older age groups had higher initial SMRs and lower SMRs later in follow-up, and in all age groups the SMR fell below one after the third year of follow-up. This pattern continued up to 19 years after study entry (index date).
[Figure omitted. See PDF.]
Split to show initial high mortality rate trend (5a) and lower mortality rate after year 2 (5b). Reference line of SMR = 1 in red.
Discussion
Overall, mortality rates in the unrestricted CPRD GOLD random sample of 1 million patients are similar to those in the general English population. After accounting for age and sex, requiring a lookback window of even a single year resulted in a significantly lower mortality rate in the sub-cohort than in the English population. This implies that a healthier population is being selected, a form of selection bias. A lookback requirement may inadvertently remove high-risk patients, or simply select a more “stable” patient population. Longer registration with a single primary care provider may also indicate more medically vigilant and compliant patients, all characteristics of a healthier patient subgroup.
As in many EHR studies, the end date of a patient’s follow-up is a composite measure combining data specific to the individual with data contributed by their registered GP practice. Here the end date is the earliest of the patient’s date of transfer out (which can be due to death), date of death, the date of last data collection from their GP practice and the administrative censoring date. As the lookback requirement increases, so does the proportion of patients whose end date is defined by the date of last data collection from their registered GP practice. This form of censoring, though likely to be uninformative, should be examined, and the impact of selecting practices that no longer contribute to CPRD should be considered. Similarly, as lookback increases, more patients reach administrative censoring and fewer transfer out of their registered GP practice. This supports the “stable” population narrative, but these explanations may oversimplify the mechanisms at play and need further investigation.
The way in which CPRD data are anonymised may be a driving factor in the high initial SMRs. Each patient in CPRD is represented by a unique record. If a patient transfers out of their GP practice and registers at a new practice (for any of many reasons, such as at their own request or because of a change of residential address), a “new” patient record is created in CPRD on registration with the new primary care provider. It is therefore possible for CPRD to contain multiple patient records that in fact belong to the same individual. At present, using CPRD alone as a data source, there is no mechanism to link these records. We hypothesise that patients who transfer out of one GP practice and die shortly after re-registering with a new GP practice may account for a portion of the high initial SMRs seen in the first two years of follow-up.
As a hypothetical example, consider an elderly patient who transfers out of their longstanding GP practice, moves into assisted care housing, registers at the nearest GP practice or one associated with the care home, and dies 10 months after re-registration. Within the available data, this would appear as two individual records in CPRD. The first is a long CPRD record with no mortality event, because the patient transferred out, and the second has a death within 10 months of registration. This hypothesis is partly supported by the finding that younger patients have lower initial SMRs than older patients. Further investigation is needed to assess whether subjects re-registering at a new GP practice (with previous CPRD registration history) are at higher risk than new CPRD patients.
This research has several limitations. It was performed on a random sample of patients from CPRD and so does not represent the entirety of CPRD GOLD. Additionally, the data were derived only from an English population. The generalisability of these results to CPRD Aurum, to other geographical areas within the United Kingdom and to other large-scale primary care EHRs is unknown. The lack of a full date of birth for each patient, with only a birth year provided, could have a marginal effect on results, while the absence of a mechanism to link de-registered and re-registered patients is considerably more problematic. Strengths of the study include the size of the sample (1 million patients) and the use of a robust statistical model. Poisson regression accounted for changes over calendar year and follow-up and was modelled on multiple time scales (age and calendar year).
Conclusions
Regardless of the mechanism behind the selection effect or the high initial mortality rates relative to the general population, two findings are important and should be noted by all who use CPRD to study mortality: mortality rates fall as lookback windows lengthen, and initial mortality rates in CPRD are high. Lookback periods are commonly used, and the implicit assumption that CPRD is representative of mortality in the general population must be carefully considered. If the lookback requirement is applied consistently to both the study population and the control group, comparisons between groups may retain internal validity. However, when the results of a study are to be generalised to the wider population, the representativeness of the CPRD cohort should be questioned. In addition, the higher mortality rates in the first two years after entry into CPRD, relative to adjusted general population rates, also need to be considered when addressing research questions using CPRD.
Supporting information
S1 File.
https://doi.org/10.1371/journal.pone.0265709.s001
(DOCX)
Acknowledgments
The author gratefully acknowledges Leicester Real-World Evidence Unit (LRWE) for providing CPRD data. The interpretation and conclusions contained in this report/article do not necessarily reflect those of the LRWE.
This study is based in part on data from the Clinical Practice Research Datalink GOLD database obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. However, the interpretation and conclusions contained in this article are those of the authors alone.
Citation: Schmidt JCF, Lambert PC, Gillies CL, Sweeting MJ (2022) Patterns of rates of mortality in the Clinical Practice Research Datalink. PLoS ONE 17(8): e0265709. https://doi.org/10.1371/journal.pone.0265709
About the Authors:
James C. F. Schmidt
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing
Affiliation: Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, United Kingdom
https://orcid.org/0000-0001-9470-0654
Paul C. Lambert
Roles: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing
Affiliations: Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, United Kingdom; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Clare L. Gillies
Roles: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing
Affiliation: Leicester Diabetes Centre, Leicester General Hospital, University of Leicester, Leicester, United Kingdom
Michael J. Sweeting
Roles: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing
Affiliation: Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, United Kingdom
1. CPRD GOLD Data Specification. Version 2.0 September 2017. Padmanabhan, S. https://cprdcw.cprd.com/_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf (June 2021, date last accessed).
2. Mathur R, Bhaskaran K, Chaturvedi N, Leon DA, van Staa T, Grundy E, et al. Completeness and usability of ethnicity data in UK-based primary care and hospital databases. Journal of public health (Oxford, England) 2014 Dec;36(4):684–692. pmid:24323951
3. Gallagher AM, Dedman D, Padmanabhan S, Leufkens HGM, de Vries F. The accuracy of date of death recording in the Clinical Practice Research Datalink GOLD database in England compared with the Office for National Statistics death registrations. Pharmacoepidemiology and drug safety 2019 May;28(5):563–569. pmid:30908785
4. de Jong RGPJ, Gallagher AM, Herrett E, Masclee AAM, Janssen-Heijnen MLG, de Vries F. Comparability of the age and sex distribution of the UK Clinical Practice Research Datalink and the total Dutch population. Pharmacoepidemiology and drug safety 2016 Dec;25(12):1460–1464. pmid:27465256
5. Strongman H, Gadd S, Matthews A, Mansfield KE, Stanway S, Lyon AR, et al. Medium and long-term risks of specific cardiovascular diseases in survivors of 20 adult cancers: a population-based cohort study using multiple linked UK electronic health records databases. The Lancet (British edition) 2019 Sep 21;394(10203):1041–1054. pmid:31443926
6. Jaggi A, Nazir J, Fatoye F, Quelen C, Tu X, Ali M, et al. Anticholinergic Burden and Associated Healthcare Resource Utilization in Older Adults with Overactive Bladder. Drugs Aging 2021;38(10):911–920. pmid:34386936
7. Al-Hamed F, Kouniaris S, Tamimi I, Lordkipanidzé M, Madathil SA, Kezouh A, et al. Acetylcholinesterase inhibitors and risk of bleeding and acute ischemic events in non-hypertensive Alzheimer’s patients. Alzheimer’s Dement (N Y) 2021;7(1):e12184. pmid:34458554
8. Husemoen LLN, Mørch LS, Christensen PK, Hartvig NV, Feher MD. All-Cause and Cardiovascular Mortality Among Insulin-Naïve People With Type 2 Diabetes Treated With Insulin Detemir or Glargine: A Cohort Study in the UK. Diabetes therapy: research, treatment and education of diabetes and related disorders 2021;12(5):1299–1311. pmid:33721211
9. Sarmanova A, Doherty M, Kuo C, Wei J, Abhishek A, Mallen C, et al. Statin use and risk of joint replacement due to osteoarthritis and rheumatoid arthritis: a propensity-score matched longitudinal cohort study. Rheumatology (Oxford) 2020;59(10):2898–2907.
10. Blackwell J, Alexakis C, Saxena S, Creese H, Bottle A, Petersen I, et al. Association between antidepressant medication use and steroid dependency in patients with ulcerative colitis: a population-based study. BMJ Open Gastro 2021;8(1):e000588. pmid:34045238
11. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). International journal of epidemiology 2015;44(3):827–836. pmid:26050254
12. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. Journal of Chronic Diseases 1987;40(5):373–383. pmid:3558716
13. Office for National Statistics. National life tables: England, September 2021. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesenglandreferencetables/current (November 2021, date last accessed).
14. Breslow NE, Day NE. Statistical Methods in Cancer Research Volume II: The Design and Analysis of Cohort Studies. Lyon: IARC Sci Publ; 1987.
15. Clayton D, Hills M. Statistical models in epidemiology. Reprint ed. Oxford: Oxford University Press; 1995.
16. Tom BDM, Farewell VT. Statistical Methods for Individual-Level Data in Cohort Mortality Studies of Rheumatic Diseases. Communications in Statistics – Theory and Methods: Advances in Statistical Methodology for Analyzing Rheumatic Diseases 2009 Sep 21;38(18):3472–3487.
© 2022 Schmidt et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The Clinical Practice Research Datalink (CPRD) is a widely used data resource, representative in demographic profile and with accurate death recording, but it is unclear whether mortality rates within CPRD GOLD are similar to rates in the general population. Rates may additionally be affected by selection bias caused by the requirement that a cohort have a minimum lookback window, i.e. observation time before the start of at-risk follow-up. Standardised Mortality Ratios (SMRs) were calculated by Poisson regression, comparing rates in CPRD GOLD with published population reference rates from the Office for National Statistics (ONS), stratified by age, calendar year and sex. An overall SMR was estimated, along with SMRs for cohorts with different lookback windows (1, 2, 5 and 10 years). SMRs were stratified by calendar year, length of follow-up and age group. Mortality rates in a random sample of 1 million CPRD GOLD patients were slightly lower than in the national population [SMR = 0.980, 95% confidence interval (CI) (0.973, 0.987)]. Cohorts with observational lookback had SMRs below one [1 year of lookback: SMR = 0.905 (0.898, 0.912); 2 years: SMR = 0.881 (0.874, 0.888); 5 years: SMR = 0.849 (0.841, 0.857); 10 years: SMR = 0.837 (0.827, 0.847)]. Mortality rates in the first two years after patient entry into CPRD were higher than in the general population, while SMRs dropped below one thereafter. Mortality rates in CPRD, using simple entry requirements, are similar to rates seen in the English population. Requiring at least a single year of lookback results in lower mortality rates than national estimates.