An Overview of Cancer in the First 315,000 All of Us Participants

Introduction

Cancer was the second leading cause of death in the United States in 2020, after cardiovascular diseases, with over 600,000 cancer-related deaths and a further 1.8 million expected diagnoses [1–5]. Although treatments are improving and personalized medicine promises advancements, cancer diagnoses are expected to increase substantially over the next decade, due mainly to the aging population in the US and modifiable behavioral/lifestyle factors [1,2,6]. The risk of developing cancer depends on the complex interplay of factors including genes, age, and gender; lifestyle and behavioral factors such as diet, energy balance, physical activity, and tobacco and alcohol use; endogenous factors such as hormones and growth factors; medication and drug use; infectious agents; and environmental exposures [1,6]. Precision medicine and precision health, which consider the patient as an individual, hold promise for cancer research [7–10]. For instance, individuals with similar diagnoses often receive the same treatment despite observations that efficacy varies by patient. Additionally, new approaches to precision prevention and early detection, informed by an enriched understanding of the etiology and natural history of cancer, could improve clinical interventions.

With over one million participants, the All of Us Research Program will have the scale to enable research on myriad diseases, especially cancer [11–13]. The program’s focus on diversity and inclusion promises to shed light on US cancer inequities, as fewer than 2% of cancer studies have been powered to consider race/ethnicity [14,15]. Given its diversity and large sample size, All of Us may have the statistical power to answer questions about the causes of cancer and drivers of disparities and identify opportunities for precision prevention.

Researchers currently have access to data from over 315K All of Us participants through the Researcher Workbench. Although the program does not target enrollment by health status, the sample to date includes a sufficient number of participants with a history of cancer, prevalent cancers, and incident cancers to enable systematic studies of cancer risk, outcomes, medication effects, and therapeutic approaches across environmental, social, genomic, and economic contexts. This demonstration project examines the distribution and characterization of cancer in All of Us and compares these numbers to expected national rates reported by the Surveillance, Epidemiology, and End Results (SEER) Program [16] and distribution in the US population.

Materials and methods

All of Us research projects

The goals, recruitment methods and sites, and scientific rationale for All of Us have been described previously [17]. Demonstration projects were designed to establish the value of the cohort by describing the cohort and replicating previous findings for validation [18]. The work described here was proposed by Consortium members, reviewed and overseen by the program’s Science Committee, and was confirmed as meeting criteria for non-human subjects research by the All of Us Institutional Review Board. The initial release of data and tools used in this work was published in 2020 [18].

This work was performed using the All of Us Researcher Workbench, a cloud-based platform where approved researchers can access and analyze All of Us data. At the time of analysis, the All of Us data included survey responses, electronic health records (EHR), and physical measurements (PM). These three types of data are collected either at an All of Us affiliated health care provider organization (HPO) or through a “direct-volunteer” mechanism. HPOs include regional medical centers, federally qualified health centers, and the Veterans Health Administration. HPOs recruit the majority of program participants, mainly persons affiliated with their center. The direct-volunteer route allows those who are not HPO patients to enroll online and visit a designated health clinic, blood bank, laboratory, or health care provider organization to have their PM collected. All three data types (survey, PM, and EHR) were mapped to the Observational Medical Outcomes Partnership (OMOP) common data model v5.2 maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaborative.

To protect participant privacy, a series of data transformations was applied. These included data suppression of codes with a high risk of identification, such as military status; generalization of categories, including age, sex at birth, gender identity, sexual orientation, and race; and date shifting by a random number of days (less than one year), implemented consistently across each participant record. Documentation on privacy implementation and creation of the Curated Data Repository (CDR) is available in the All of Us Registered Tier CDR Data Dictionary [19]. The Researcher Workbench currently offers tools with a user interface (UI) built for selecting groups of participants (Cohort Builder), creating datasets for analysis (Dataset Builder), and Workspaces with Jupyter Notebooks (Notebooks) to analyze data. The Notebooks enable use of saved datasets and direct query using the R and Python 3 programming languages.
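The date-shifting step can be illustrated with a minimal sketch. It is not the program's implementation: it assumes a per-participant offset of 1 to 364 days derived from a keyed hash of a hypothetical participant identifier (a substitute for drawing and storing a random number), and it assumes dates are shifted backward. The key, the identifiers, and the function names are all hypothetical. The point it shows is that one offset per participant preserves the intervals between that participant's events.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret held by the data custodian (an assumption for this sketch).
SECRET_KEY = b"example-key-not-a-real-secret"


def participant_offset_days(participant_id: str) -> int:
    """Return a stable pseudo-random offset of 1 to 364 days for one participant."""
    digest = hmac.new(SECRET_KEY, participant_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % 364 + 1


def shift_date(participant_id: str, original: date) -> date:
    """Shift a date backward by the participant's offset (direction is an assumption)."""
    return original - timedelta(days=participant_offset_days(participant_id))


# Two events for the same hypothetical participant.
diagnosis = shift_date("P001", date(2018, 3, 14))
follow_up = shift_date("P001", date(2019, 6, 2))

# The interval between the two shifted dates equals the original interval.
print((follow_up - diagnosis).days)
```

Because the offset depends only on the participant, every record for that person moves by the same amount, which is what keeps time-from-diagnosis calculations meaningful after shifting.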

Study population

Participant-provided information for our analysis was derived from the surveys described above. The full text of these surveys is available in the Survey Explorer found in the All of Us Research Hub, a publicly available website designed to support researchers [20]. The Basics survey elicits demographic information including age, race/ethnicity, education, marital status, household income, and geography. The Lifestyle survey collects tobacco use data. Personal Medical History collects self-reported cancer history, including cancer type(s), life stage at diagnosis, and whether the participant is currently seeing a health care provider and/or receiving cancer treatment. The Basics and Lifestyle surveys are collected at baseline, whereas Personal Medical History is collected during retention efforts 3 months after enrollment.

Cancer diagnosis data were also derived from participant EHR. Diagnoses were determined using SNOMED CT codes and mapped to OMOP concept IDs by the All of Us Data and Research Center (DRC). SNOMED CT codes for cancers and subtypes were combined to reflect the categories used for national reporting, including SEER and the North American Association of Central Cancer Registries (NAACCR). EHR data also include procedures, medications, laboratory tests, and health care provider visits. We used the following cancers/cancer sites in our analysis: bladder (93689003), leukemia (93143009), non-Hodgkin’s lymphoma (118601006), myeloma (109989006), bone (93725000), brain (93727008), breast (372137005), cervix (372024009), colon (93761005), endocrine system (371983001), endometrium (10708511000119100), esophagus (371984007), eye (371986009), head/neck (372123001), kidney (93849006), lung (93880001), oral cavity (372001002), ovary (93934004), pancreas (372003004), prostate (93974005), rectum (93984006), stomach (372014001), and thyroid (94098005).
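The combination of site codes into reporting categories might look like the following sketch. The SNOMED CT codes are the ones listed above, but only a subset is shown, and the two groupings used here (leukemia, non-Hodgkin’s lymphoma, and myeloma into "blood"; colon and rectum into "colorectal") are assumptions chosen to match the categories named in the Results. The record layout and function name are hypothetical, and the sketch matches codes exactly rather than resolving descendant concepts as a real OMOP query would.

```python
from collections import defaultdict

# Subset of the site codes listed in the text (SNOMED CT code -> site).
SITE_BY_SNOMED = {
    "93143009": "leukemia",
    "118601006": "non-Hodgkin's lymphoma",
    "109989006": "myeloma",
    "93761005": "colon",
    "93984006": "rectum",
    "372137005": "breast",
    "93974005": "prostate",
}

# Assumed roll-up of sites into reporting categories; other sites map to themselves.
REPORTING_GROUP = {
    "leukemia": "blood",
    "non-Hodgkin's lymphoma": "blood",
    "myeloma": "blood",
    "colon": "colorectal",
    "rectum": "colorectal",
}


def participants_by_group(records):
    """Count distinct participants per reporting category.

    `records` is an iterable of (participant_id, snomed_code) pairs.
    Codes outside SITE_BY_SNOMED are ignored.
    """
    groups = defaultdict(set)
    for participant_id, code in records:
        site = SITE_BY_SNOMED.get(code)
        if site is None:
            continue
        groups[REPORTING_GROUP.get(site, site)].add(participant_id)
    return {group: len(people) for group, people in groups.items()}


# Hypothetical records: participant "a" has two blood cancer codes but counts once.
example = [
    ("a", "93143009"), ("a", "109989006"), ("b", "118601006"),
    ("c", "93761005"), ("c", "93984006"), ("d", "372137005"),
    ("e", "0000000"),
]
counts = participants_by_group(example)
```

Counting sets of participant identifiers rather than rows avoids double counting a person who carries several codes within one category.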

Categories of time from diagnosis were taken from the Personal Medical History survey, which asks: “About how old were you when you were first told you had this condition?” Response categories were child (0–11), adolescent (12–17), adult (18–64), older adult (65–74), and elderly (75+).

Time from diagnosis in the EHR was calculated as the current date minus the date of diagnosis, reported in years (mean, SD, and median).
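A minimal sketch of this calculation follows. It assumes a year of 365.25 days, the sample standard deviation, and a reference date passed in explicitly rather than read from the system clock; the function names and example dates are hypothetical.

```python
from datetime import date
from statistics import mean, median, stdev

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def years_since(diagnosis: date, as_of: date) -> float:
    """Time from diagnosis to the reference date, in years."""
    return (as_of - diagnosis).days / DAYS_PER_YEAR


def summarize(diagnosis_dates, as_of: date) -> dict:
    """Mean, sample SD, and median of time from diagnosis in years."""
    years = [years_since(d, as_of) for d in diagnosis_dates]
    return {"mean": mean(years), "sd": stdev(years), "median": median(years)}


# Hypothetical diagnosis dates, 4, 8, and 12 years before the reference date.
summary = summarize(
    [date(2017, 1, 1), date(2013, 1, 1), date(2009, 1, 1)],
    as_of=date(2021, 1, 1),
)
```

Because each participant's dates are shifted by a single offset of less than one year, a value computed against the true current date can differ from the real time since diagnosis by up to that offset.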

Treatment type was reported for persons with a history of cancer from the EHR using the following SNOMED codes: surgery (1623, 11600, 11601, 11602, 11603, 11604, 11606, 11620, 11621, 11622, 11624, 11626, 11640, 11641, 11642, 11643, 11644, 11646, 17260, 17261, 17262, 17263, 17264, 17266, 17270, 17272, 17273, 17274, 17276, 17280, 17281, 17282, 17283, 17284, 17286, 370612006), radiotherapy (108290001), chemotherapy (38216008), immunotherapy (64644003), hormone therapy (10324, 72143, L02BB, L02BG), and stem cell transplant (41.04, 41.05, 41.06, 41.07, 41.08).

National comparison

We compared the observed frequency of cancer reported in All of Us to the National Cancer Institute’s SEER 18 Registries Database, November 2018 submission [21], to analyze cancer frequency overall and by site. The comparison is based on cases diagnosed in 2016 among residents of the areas included in the 18 registries, which cover ∼28% of the United States population. We reported the frequency of diagnosis in 2016 using the limited-duration 26-year cancer prevalence, taking prevalence data for the first invasive tumor site to determine the relative frequency and percent contribution of each cancer type to all cancers in the population. Limited-duration prevalence represents the proportion of people alive on a certain day who had a diagnosis of the disease within the past x years (e.g. x = 5, 10 or 20 years). We chose the most recent year of diagnosis given the period for which All of Us has been conducting enrollment. Skin cancer (melanoma of the skin) was excluded from the “total cancer” calculation for SEER cancers and from the analysis since the All of Us survey data do not differentiate between melanoma and non-melanoma skin cancer. Invasive cancer was coded using the International Classification of Diseases for Oncology, third edition (ICD-O-3) [22].
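Written as a formula, the definition of limited-duration prevalence given above reads as follows. The symbols are introduced here for illustration only and are not SEER notation: t is the prevalence date and x is the look-back window in years (x = 26 in this analysis).

```latex
P_x(t) \;=\;
\frac{\#\{\text{people alive on day } t \text{ who were diagnosed within the } x \text{ years before } t\}}
     {\#\{\text{people alive on day } t\}}
```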

Data analysis

We generated descriptive statistics and prevalence for the most common cancers and used Chi-square tests to test the difference in the categorical distribution of data source types (survey data, EHR, and both) across the key demographic and lifestyle categories. The percent distribution of cancer types was calculated as the number of cases per site/total number of cancer cases in each respective dataset. Results are stratified by race/ethnicity and sex at birth to consider the demographic-specific distributions in cancer types. Cancer frequency was calculated using SEER*Stat 8.3.9 [23].
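The two calculations described here can be sketched as below. This is an illustration, not the analysis code used in the study: the function names are hypothetical, and only the test statistic and degrees of freedom are computed (a p-value would additionally need the chi-square distribution, which the Python standard library does not provide). The EHR counts in the first example are those reported in the Results for Table 2; the contingency table in the second example is invented purely to show the arithmetic.

```python
def percent_distribution(site_counts, total_cases):
    """Percent contribution of each site: cases per site / total cancer cases."""
    return {site: 100 * n / total_cases for site, n in site_counts.items()}


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# EHR counts reported in the Results (23,520 EHR cancer cases in total).
ehr_pct = percent_distribution(
    {"breast": 6474, "blood": 4841, "prostate": 3971}, total_cases=23520
)

# Invented 2 x 2 table (rows: data source, columns: demographic category).
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```

Passing the dataset total explicitly keeps the denominator equal to all cancer cases in that dataset even when only a few sites are listed.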

Results

Table 1 shows the distribution of the baseline characteristics of all participants (N = 315,297), and by those with a cancer outcome as captured from the EHR (N = 203,813 participants with EHR; including N = 23,520 cancer cases), via self-report in the survey database (N = 89,261 completed Personal Medical History survey; including N = 13,298 cancer cases), and from participants with both survey and EHR data (N = 62,497 participants with both data types; including N = 7,123 cancer cases). Personal Medical History survey completion varies considerably, with older, female, and non-Hispanic White participants more likely to provide data than the population with available EHR (which more closely reflects the larger All of Us participant population). Differences across key demographic factors in data availability (survey data and/or EHR) are reflected in the distribution of cancer from the different data sources. Specifically, 84.8% of cancers from the Personal Medical History survey were reported by non-Hispanic Whites, 5.0% by Blacks, and 4.7% by Hispanics compared to 67.1%, 14.3%, and 12.2% respectively captured from the EHR. Non-Hispanic Whites are overwhelmingly represented among those with both self-report and EHR data (75.8%) compared to 51.5% representation in the overall All of Us study population. All p-values for the chi-square tests comparing the distributions are <0.001 except the comparison of EHR versus total (which is 0.002).

[Figure omitted. See PDF.]

Table 2 shows that All of Us participants’ EHR data indicate a history of breast cancer most frequently (N = 6,474; 27.5% of cases), followed by blood cancers (N = 4,841; 20.6%) and prostate cancer (N = 3,971; 16.9%). This mirrors the most common self-reported cancers (from the survey) for breast cancer (N = 4,062; 30.5%) and prostate cancer (N = 2,165; 16.3%) but not for blood cancer (N = 483; 9.9%). Breast cancer is documented in both the survey and EHR data sources for N = 2,499 individuals, followed by prostate cancer (N = 1,304) and blood cancer (N = 657). Prevalence is broken down by cancer site, showing the difference in contribution to disease burden by data source.

[Figure omitted. See PDF.]

Table 3 presents cancer type distribution from each data source by race and ethnicity, with N = 6,125 cancer cases detected in both data sources for non-Hispanic Whites compared to N = 328 cancer cases in African Americans and N = 294 cancer cases in Hispanics. Differences in the distribution of cancer types between survey data and EHR are observed by race/ethnicity, both within and between groups, comparing non-Hispanic Whites, Blacks, and Hispanics (p<0.001). The prevalence of cancer therefore also varies by race/ethnicity within each data source, as reported here.

[Figure omitted. See PDF.]

Table 4 compares the distribution of cancer sites from All of Us survey data and EHR to the expected distribution nationally, based on recent SEER reports of the 26-year limited-duration prevalence in 2018. The most common cancer types in SEER (based on contribution to total cancers) are breast cancer (19.9%), prostate cancer (17.6%), blood cancers (11.4%), and colorectal cancers (8.4%). The percent contribution to the cancer burden nationally (as illustrated by SEER data) from each cancer site differs significantly from the EHR site distribution (p<0.001) and the self-reported distribution (p<0.001). As expected, All of Us participants, who are enrolled largely through medical centers, have a higher proportion of prevalent cancer (11.54% in EHR and 14.90% in survey) than the US population reported by SEER (4.43%).
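The EHR and survey proportions quoted here appear to follow directly from the counts given with Table 1, assuming the denominators are participants with EHR data and participants who completed the Personal Medical History survey. A short check of that arithmetic (the function name is hypothetical):

```python
def prevalence_percent(cases: int, participants: int) -> float:
    """Prevalent cancer cases as a percentage of participants with that data type."""
    return 100 * cases / participants


# Counts reported with Table 1.
ehr_prevalence = prevalence_percent(23_520, 203_813)
survey_prevalence = prevalence_percent(13_298, 89_261)

print(f"EHR: {ehr_prevalence:.2f}%  survey: {survey_prevalence:.2f}%")
```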

[Figure omitted. See PDF.]

Table 5 presents a description of the time from cancer diagnosis as reported in the EHR and survey database. The cancer with the shortest time from diagnosis in the EHR is lung cancer (mean = 5.85 years; SD = 4.46), and the longest time from diagnosis is for head and neck cancer (mean = 11.75 years; SD = 6.59). Across all cancer types, the most common period of diagnosis was adult, followed by older adult.

[Figure omitted. See PDF.]

Table 6 presents treatment types for cancer overall and by site. The most common treatment from the EHR is radiation (N = 7,422; 31.56%), followed by surgery (N = 5,975; 25.54%), hormone therapy (N = 3,962; 16.84%), chemotherapy (N = 842; 3.58%), immunotherapy (N = 470; 1.2%), and stem cell transplant (N = 127; 0.54%). Treatment type utilization varied by cancer site.

[Figure omitted. See PDF.]

Conclusions

In this preliminary analysis of data from the All of Us Research Program, we report that the first 315K+ persons comprise a diverse population with a large number of prevalent cancer cases. As the goal of this effort is to inform studies on a variety of health conditions, including cancer, and to delineate information on risk factors and treatments, an early evaluation of cancers represented in the study population is warranted. Our findings have some key implications for cancer prevention, control, treatment, and outcomes research in the All of Us study population.

Our most notable finding is simple: although a diverse cohort is being enrolled, self-reported cancers are not being ascertained as frequently through the survey modules among underrepresented participants. As validation of diagnosis from EHR using manual verification or self-report is the gold standard to ensure accurate classification and minimize measurement error, the difference in valid case ascertainment by key factors like race is relevant for All of Us cancer research. The drop in cancer data detected from the survey or validated with survey data is associated with racial/ethnic differences in longitudinal retention. Although surveys are completed by a relatively older population, age does not appear to be a key factor influencing differences in data collection. History of cancer is collected through a survey completed at least 90 days after enrollment in All of Us, and the overall medical history survey completion rate across the program is 22% among participants who are underrepresented in biomedical research (UBR) compared to 42% among non-UBR participants. Factors previously noted in the literature [24] that could be relevant to differences in retention by race/ethnicity include language, literacy, cultural appropriateness, flexibility, ongoing incentives, and communication; the digital divide is of growing importance as survey data collection becomes increasingly electronic. This has research implications for the cancer history data collected at follow-up as well as other key risk factor information including health care utilization, personal medical history, and family history. Our investigation shows that the impact of these factors on cancer disparities will be underreported even if cancer history can be obtained from the EHR of most underrepresented participants. All of Us leadership has changed the survey module timeline and made Personal Medical History available at baseline, addressing some of the limitations noted here for prospective enrollees.

Furthermore, the difference in cancer ascertainment between survey modules and EHR modalities in underrepresented participants highlights the importance of technologies to integrate the medical records of direct volunteers. Sync for Science, for obtaining EHRs from direct volunteers, and non-digital methods of collecting survey data could offer utility beyond the ability to confer medical record information for direct volunteers, as there are implications for inclusion and equity in the investigation of all diseases, including cancer.

The distribution of cancer sites between the two data sources when compared to SEER national statistics is impacted by exclusion of skin cancer from the All of Us cancer analyses. Skin cancer cases account for approximately half of the total cases reported in the survey data. These cases likely include both malignant and non-malignant skin cancers, which would introduce significantly different relative proportions of other cancers if included in the analysis. As restriction to malignant cases was not possible, we excluded all skin cancer cases from analysis.

Another point to consider is the grouping of blood cancers. Because the survey module asks about blood cancers generically, it is impossible to differentiate between myeloma, lymphoma, and leukemia in survey responses. This distinction can be deciphered from the EHR when available. The ability to distinguish these types will be crucial to many cancer researchers.

We further report on the time from diagnosis and the life stage to consider opportunities to collect incident cases or investigate hypotheses for more recent diagnoses. The utility of the life stage questions in etiology or outcomes research is unclear, as the age ranges in the survey are quite broad: child (0–11), adolescent (12–17), adult (18–64), older adult (65–74), and elderly (75+). A more refined or consistent metric, such as date of cancer diagnosis, would aid investigation of various cancer-related hypotheses (for example, stratifying by pre- and post-menopausal breast cancer). Presenting these data side by side highlights how distinct these metrics of diagnosis timing really are.

The All of Us Research Program is set to become one of the largest scientific efforts in U.S. history, and its emphasis on inclusion presents key opportunities to advance precision health and medicine and address disparities in research [25]. Despite the limitations noted in this report, this unprecedented depth of inclusion will confer an important resource for cancer research. All of Us was conceived to support studies of disease outcomes, medication effects, and other therapeutic approaches across various environmental, social, genomic, and economic contexts [26]. The scale and scope of its current cancer data will support extensive investigation of cancer-related hypotheses and enhance the pace of discovery and generalizability. The cohort’s expansion to 1 million participants will create further opportunities. Furthermore, feedback from demonstration projects such as this one will directly inform edits to existing surveys and development of reassessment modules.

In summary, the All of Us Research Program has collected significant cancer data from its first 315K participants. This preliminary investigation notes the most common cancers that will confer sufficient study power for research, especially once whole genome data are available for all participants. In light of our findings, the program might consider the implications of lower retention through survey completion among underrepresented participants for the resource’s utility for research on cancer and other diseases.

Acknowledgments

Past and Present All of Us Research Program Principal Investigators: Brian Ahmedani¹; Christine D Cole Johnson¹; Briseis Aschebrook-Kilfoy²; Habibul Ahsan²; Donna Antoine-LaVigne³; Glendora Singleton*³; Pamelia Watson-McGee³; Arnita Ford Norwood³; Hoda Anton-Culver⁴; Eric Topol⁵; Katie Baca-Motes⁵; Julia Moore-Vogel⁵; Steven Steinhubl⁵; Praduman Jain⁶; Mark Begale⁶; Neeta Jain⁶; David Klein⁶; Scott Sutherland⁶; James Wade*⁶; Bruce Korf⁷; Mona Fouad⁷; Beth Lewis⁷; David B Goldstein⁸; Louise Bier⁸; Ali G Gharavi⁸; George Hripcsak⁸; Eric Boerwinkle⁹; Murray H Brilliant¹⁰; Narayana Murali¹⁰; Scott Joseph Hebbring¹⁰; Elizabeth Burnside¹¹; Dorothy Farrar-Edwards¹¹; Yashoda Sharma¹²; Amy Taylor¹²; Carmen Chinea¹³; Liliana Lombardi Desa¹³; Nancy Jenks¹³; Steve Thibodeau¹⁴; Mine Cicek¹⁴; Eric Schlueter¹⁵; Beverly Wilson Holmes¹⁵; Maria Argos¹⁶; Martha Daviglus¹⁶; Robert Winn¹⁶; Paul Harris¹⁷; Consuelo Wilkins¹⁷; Dan Roden¹⁷; Joshua Denny¹⁷; Kim Doheny¹⁸; Debbie Nickerson¹⁹; Evan Eichler¹⁹; Gail Jarvik¹⁹; Gretchen Funk²⁰; Sallie Hussey²⁰; Anthony Philippakis²¹; Heidi Rehm²¹; Stacey Gabriel²¹; Richard Gibbs²²; Edgar M Gil Rico²³; David Glazer²⁴; Jessica Burke²⁵; Joyce Ho²⁶; Philip Greenland²⁶; Elizabeth Shenkman²⁷; William R Hogan²⁷; Priscilla Igho-Pemu²⁸; W Karlson²⁹; Jordan Smoller²⁹; Shawn N Murphy²⁹; Margaret Elizabeth Ross³⁰; Rainu Kaushal³⁰; Eboni Winford³¹; Febe Wallace³¹; Parinda Khatri³¹; Vik Kheterpal³²; Monica Kraft³³; Francisco A Moreno³³; Irving Kron³³; Rachele Peterson³³; Patricia Watkins Lattimore³⁴; Cheryl Thomas³⁴; Mitchell Lunn³⁵; Juno Obedin-Maliver³⁵; Oscar Marroquin³⁶; Shyam Visweswaran³⁶; Steven Reis³⁶; Patrick McGovern³⁷; Fatima Munoz³⁸; Gregory Talavera³⁸; George T O’Connor³⁹; Christopher O’Donnell⁴⁰; Lucila Ohno-Machado⁴¹; Greg Orr⁴²; Fornessa Randal⁴³; Andreas A Theodorou⁴⁴; Eric Reiman⁴⁴; Mercedita Roxas-Murray⁴⁵; Louisa Stark⁴⁶; Ronnie Tepp⁴⁷; Alicia Zhou⁴⁸; Scott Topper⁴⁸; Rhonda Trousdale⁴⁹; Phil Tsao⁵⁰; Scott T Weiss⁵¹; David Wellis⁵²; Jeffrey Whittle⁵³; Amanda Wilson⁵⁴; Stephan Zuchner⁵⁵; Olveen Carrasquillo⁵⁵; Margaret Pericak-Vance⁵⁵; Michael E Zwick⁵⁶; Megan Lewis⁵⁷; Jen Uhrig⁵⁷; May Okihiro⁵⁸

Note

This is the list of individuals who were Principal Investigators or equivalent with the All of Us Research Program during the period that this paper was in development, October 1, 2019 to July 31, 2021.

+ Principal Investigator/Lead Author for the All of Us Research Program protocol ([email protected])

Affiliations:

1. Henry Ford Health System, Detroit, Michigan, United States of America

2. University of Chicago Medical Center, Chicago, Illinois, United States of America

3. Jackson-Hinds Comprehensive Health Center, Jackson, Mississippi, United States of America

4. University of California, Irvine, Irvine, California, United States of America

5. Scripps Research Translational Institute, La Jolla, California, United States of America

6. Vibrent Health, Fairfax, Virginia, United States of America

7. University of Alabama at Birmingham, Birmingham, Alabama, United States of America

8. Columbia University, New York, New York, United States of America

9. University of Texas Health Science Center at Houston, Houston, Texas, United States of America

10. Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America

11. University of Wisconsin at Madison, Madison, Wisconsin, United States of America

12. Community Health Center, Inc., Middletown, Connecticut, United States of America

13. Sun River Health, New York, New York, United States of America

14. Mayo Clinic and Foundation, Rochester, Rochester, Minnesota, United States of America

15. Cooperative Health, Columbia, South Carolina, United States of America

16. University of Illinois at Chicago, Chicago, Illinois, United States of America

17. Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

18. Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America

19. University of Washington, Seattle, Washington, United States of America

20. FiftyForward, Nashville, Tennessee, United States of America

21. Broad Institute, Cambridge, Massachusetts, United States of America

22. Baylor University, Waco, Texas, United States of America

23. National Alliance for Hispanic Health, Washington, DC, United States of America

24. Verily Life Sciences, San Francisco, California, United States of America

25. MITRE Corporation, McLean, Virginia, United States of America

26. Northwestern University, Chicago, Illinois, United States of America

27. University of Florida, Gainesville, Florida, United States of America

28. Morehouse School of Medicine, Atlanta, Atlanta, Georgia, United States of America

29. Partners Health Care, Boston, Massachusetts, United States of America

30. Cornell University, Weill Medical College, Ithaca, New York, United States of America

31. Cherokee Health Systems, Knoxville, Tennessee, United States of America

32. CareEvolution, Inc., Ann Arbor, Michigan, United States of America

33. University of Arizona, Tucson, Tucson, Arizona, United States of America

34. Delta Research and Educational Foundation, Washington, DC, United States of America

35. Stanford University, Palo Alto, California, United States of America

36. University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America

37. Wondros, Los Angeles, California, United States of America

38. San Ysidro Health Center, San Ysidro, California, United States of America

39. Boston Medical Center, Boston, Massachusetts, United States of America

40. VA All of Us Coordinating Center, Boston, Boston, Massachusetts, United States of America

41. University of California, San Diego, La Jolla, California, United States of America

42. Walgreen Co., Deerfield, Illinois, United States of America

43. Asian Health Coalition, Chicago, Illinois, United States of America

44. Banner Health, Phoenix, Arizona, United States of America

45. Montage Marketing Group, Bethesda, Maryland, United States of America

46. University of Utah, Salt Lake City, Utah, United States of America

47. HCM Strategists, Austin, Texas, United States of America

48. Color Genomics, Inc., Burlingame, California, United States of America

49. NYC Health + Hospitals, New York, New York, United States of America

50. VA All of Us Coordinating Center—Palo Alto, Palo Alto, California, United States of America

51. Brigham and Women’s Hospital, Boston, Massachusetts, United States of America

52. San Diego Blood Bank, San Diego, California, United States of America

53. Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America

54. National Library of Medicine (NLM), Bethesda, Maryland, United States of America

55. University of Miami School of Medicine, Miami, Florida, United States of America

56. Emory University, Atlanta, Georgia, United States of America

57. Research Triangle Institute, Research Triangle Park, North Carolina, United States of America

58. Waianae Coast CHC, Waianae, Hawaii, United States of America

Citation: Aschebrook-Kilfoy B, Zakin P, Craver A, Shah S, Kibriya MG, Stepniak E, et al. (2022) An Overview of Cancer in the First 315,000 All of Us Participants. PLoS ONE 17(9): e0272522. https://doi.org/10.1371/journal.pone.0272522

About the Authors:

Briseis Aschebrook-Kilfoy

Roles: Conceptualization, Data curation, Formal analysis, Investigation, Supervision, Writing – original draft

Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America; Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America

https://orcid.org/0000-0003-1918-7816

Paul Zakin

Roles: Conceptualization

Andrew Craver

Roles: Data curation, Methodology, Writing – review & editing

Sameep Shah

Roles: Data curation, Formal analysis

Muhammad G. Kibriya

Roles: Conceptualization, Writing – review & editing

Elizabeth Stepniak

Roles: Project administration, Supervision

Andrea Ramirez

Roles: Funding acquisition, Project administration, Resources, Writing – review & editing

Affiliation: Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

Cheryl Clark

Roles: Project administration, Resources, Writing – review & editing

Affiliation: Brigham and Women’s Hospital, Boston, Massachusetts, United States of America

Elizabeth Cohn

Roles: Methodology, Resources, Writing – review & editing

Affiliation: Hunter College City University of New York, New York, New York, United States of America

Lucila Ohno-Machado

Roles: Project administration, Resources, Writing – review & editing

Affiliation: University of California San Diego Health, La Jolla, California, United States of America

Mine Cicek

Roles: Project administration, Resources, Writing – review & editing

Affiliation: Mayo Clinic, Rochester, Minnesota, United States of America

Eric Boerwinkle

Roles: Project administration, Resources, Writing – review & editing

Affiliation: The University of Texas Health Science Center at Houston, Houston, Texas, United States of America

Sheri D. Schully

Roles: Funding acquisition, Project administration, Resources, Writing – review & editing

Affiliation: National Institutes of Health, Bethesda, Maryland, United States of America

Stephen Mockrin

Roles: Project administration, Resources, Writing – review & editing

Affiliation: National Institutes of Health, Leidos, Inc, Frederick, Maryland, United States of America

Kelly Gebo

Roles: Project administration, Resources, Writing – review & editing

Affiliation: Johns Hopkins University School of Medicine, Bethesda, Maryland, United States of America

https://orcid.org/0000-0003-4010-398X

Kelsey Mayo

Roles: Project administration, Resources, Writing – review & editing

Affiliation: National Institutes of Health, Bethesda, Maryland, United States of America

Francis Ratsimbazafy

Roles: Resources, Software, Writing – review & editing

Affiliation: National Institutes of Health, Bethesda, Maryland, United States of America

Alan Sanders

Roles: Methodology, Project administration, Writing – review & editing

Affiliation: Northshore University Health System, Evanston, Illinois, United States of America

https://orcid.org/0000-0001-6629-4011

Raj C. Shah

Roles: Resources, Writing – review & editing

Affiliation: Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America

Maria Argos

Roles: Conceptualization, Writing – review & editing

Affiliation: Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, Illinois, United States of America

Joyce Ho

Roles: Project administration, Resources, Writing – review & editing

Affiliation: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America

https://orcid.org/0000-0003-4191-0054

Karen Kim

Roles: Resources, Supervision, Writing – review & editing

Affiliations: Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America; Department of Medicine, University of Chicago, Chicago, Illinois, United States of America

Martha Daviglus

Roles: Funding acquisition, Resources, Supervision, Writing – review & editing

Affiliation: Institute for Minority Health Research, College of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America

Philip Greenland

Roles: Conceptualization, Funding acquisition, Investigation, Resources, Writing – review & editing

Affiliation: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America

Habibul Ahsan

Roles: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – review & editing

On behalf of the All of Us Research Program Investigators

¶ A full list is noted in the acknowledgments.

References

1. Ward EM, Sherman RL, Henley SJ, Jemal A, Siegel DA, Feuer EJ, et al. Annual Report to the Nation on the Status of Cancer, Featuring Cancer in Men and Women Age 20–49 Years. J Natl Cancer Inst. 2019;111(12):1279–97. pmid:31145458; PubMed Central PMCID: PMC6910179.

2. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66(1):7–30. Epub 20160107. pmid:26742998.

3. American Cancer Society. Cancer Facts & Figures 2020. Atlanta: American Cancer Society, 2020.

4. Bauer UE, Briss PA, Goodman RA, Bowman BA. Prevention of chronic disease in the 21st century: elimination of the leading preventable causes of premature death and disability in the USA. Lancet. 2014;384(9937):45–52. Epub 20140701. pmid:24996589.

5. Pickens CM, Pierannunzi C, Garvin W, Town M. Surveillance for Certain Health Behaviors and Conditions Among States and Selected Local Areas—Behavioral Risk Factor Surveillance System, United States, 2015. MMWR Surveill Summ. 2018;67(9):1–90. Epub 20180629. pmid:29953431; PubMed Central PMCID: PMC6023179.

6. Jemal A, Center MM, DeSantis C, Ward EM. Global patterns of cancer incidence and mortality rates and trends. Cancer Epidemiol Biomarkers Prev. 2010;19(8):1893–907. Epub 20100720. pmid:20647400.

7. Pauli C, Hopkins BD, Prandi D, Shaw R, Fedrizzi T, Sboner A, et al. Personalized In Vitro and In Vivo Cancer Models to Guide Precision Medicine. Cancer Discov. 2017;7(5):462–77. Epub 20170322. pmid:28331002; PubMed Central PMCID: PMC5413423.

8. Werner RJ, Kelly AD, Issa JJ. Epigenetics and Precision Oncology. Cancer J. 2017;23(5):262–9. pmid:28926426; PubMed Central PMCID: PMC5708865.

9. Amin MB, Greene FL, Edge SB, Compton CC, Gershenwald JE, Brookland RK, et al. The Eighth Edition AJCC Cancer Staging Manual: Continuing to build a bridge from a population-based to a more "personalized" approach to cancer staging. CA Cancer J Clin. 2017;67(2):93–9. Epub 20170117. pmid:28094848.

10. Paolillo C, Londin E, Fortina P. Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine. Scand J Clin Lab Invest Suppl. 2016;245:S84–91. Epub 20160817. pmid:27542004.

11. National Institutes of Health. The Precision Medicine Initiative Cohort Program–Building a Research Foundation for 21st Century Medicine. National Institutes of Health, U.S. Department of Health and Human Services, 2015.

12. National Institutes of Health. PMI Working Group of the Advisory Committee to the Director: National Institutes of Health, U.S. Department of Health and Human Services 2015 [cited 2021 April 30]. Available from: https://allofus.nih.gov/about/who-we-are/pmi-working-group-advisory-committee-director.

13. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015;372(9):793–5. Epub 20150130. pmid:25635347; PubMed Central PMCID: PMC5101938.

14. Oh SS, Galanter J, Thakur N, Pino-Yanes M, Barcelo NE, White MJ, et al. Diversity in Clinical and Biomedical Research: A Promise Yet to Be Fulfilled. PLoS Med. 2015;12(12):e1001918. Epub 20151215. pmid:26671224; PubMed Central PMCID: PMC4679830.

15. American Cancer Society. Cancer Facts and Figures for African Americans 2019–2021. Atlanta: American Cancer Society, 2021.

16. National Cancer Institute. Overview of the SEER Program: National Cancer Institute 2020. Available from: https://seer.cancer.gov/about/overview.html.

17. Denny JC, Devaney SA, Gebo KA. The "All of Us" Research Program. Reply. N Engl J Med. 2019;381(19):1884–5. pmid:31693826.

18. Ramirez A, Sulieman L, Schlueter D, Halvorson A, Qian J, Ratsimbazafy F, et al. The All of Us Research Program: data quality, utility, and diversity. medRxiv; 2020.

19. National Institutes of Health. All of Us Research Hub: Data Methods: National Institutes of Health, U.S. Department of Health and Human Services; 2020 [cited 2020 April 23]. Available from: https://www.researchallofus.org/methods/.

20. National Institutes of Health. All of Us Research Hub: National Institutes of Health, U.S. Department of Health and Human Services; 2020 [cited 2020 April 30]. Available from: https://www.researchallofus.org/.

21. SEER*Stat Database: Incidence—SEER 9 Regs Research Data, Nov 2018 Sub (1975–2016) <Katrina/Rita Population Adjustment>—Linked To County Attributes—Total U.S., 1969–2017 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2019, based on the November 2018 submission. [Internet]. National Cancer Institute 2020. Available from: www.seer.cancer.gov.

22. World Health Organization. International classification of diseases for oncology (ICD-O). 3rd ed., 1st revision. Geneva: World Health Organization; 2013.

23. National Cancer Institute. National Cancer Institute SEER*Stat software: National Cancer Institute; 2019. Available from: https://seer.cancer.gov/seerstat/.

24. Wallace DC, Bartlett R. Recruitment and retention of African American and Hispanic girls and women in research. Public Health Nurs. 2013;30(2):159–66. Epub 20121122. pmid:23452110; PubMed Central PMCID: PMC4040954.

25. Mapes BM, Foster CS, Kusnoor SV, Epelbaum MI, AuYoung M, Jenkins G, et al. Diversity and inclusion for the All of Us research program: A scoping review. PLoS One. 2020;15(7):e0234962. Epub 20200701. pmid:32609747; PubMed Central PMCID: PMC7329113.

26. Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, Jenkins G, et al. The “All of Us” Research Program. New England Journal of Medicine. 2019;381(7):668–76. pmid:31412182.


This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication: https://creativecommons.org/publicdomain/zero/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract


Introduction

The NIH All of Us Research Program will have the scale and scope to enable research on a wide range of diseases, including cancer. The program’s focus on diversity and inclusion promises a better understanding of the unequal burden of cancer. This study considers preliminary cancer ascertainment in the All of Us cohort from two data sources: self-reported surveys and electronic health records (EHR).

Materials and methods

This work used data collected from the 315,297 participants enrolled in the All of Us Research Program to date, accessed through the Researcher Workbench, where approved researchers can access and analyze All of Us data on cancer and other diseases. Cancer case ascertainment was performed using data from EHR and self-reported surveys across key factors. The distribution of cancer types and the concordance of data sources by cancer site and demographics were analyzed.

Results and discussion

Data collected from 315,297 participants resulted in 13,298 cancer cases detected in the survey (in 89,261 participants), 23,520 cancer cases detected in the EHR (in 203,813 participants), and 7,123 cancer cases detected across both sources (in 62,497 participants). Key differences in survey completion by race/ethnicity impacted the makeup of cohorts when compared to cancer in the EHR and national NCI SEER data.

Conclusions

This study provides key insight into cancer detection in the All of Us Research Program and points to the existing strengths and limitations of All of Us as a platform for cancer research now and in the future.

Details

Title: An Overview of Cancer in the First 315,000 All of Us Participants

Author: Aschebrook-Kilfoy, Briseis; Zakin, Paul; Craver, Andrew; Shah, Sameep; Kibriya, Muhammad G; Stepniak, Elizabeth; Ramirez, Andrea; Clark, Cheryl; Cohn, Elizabeth; Ohno-Machado, Lucila; Cicek, Mine; Boerwinkle, Eric; Schully, Sheri D; Mockrin, Stephen; Gebo, Kelly; Mayo, Kelsey; Ratsimbazafy, Francis; Sanders, Alan; Shah, Raj C; Argos, Maria; Ho, Joyce; Kim, Karen; Daviglus, Martha; Greenland, Philip; Ahsan, Habibul; ¶ a full list is noted in the acknowledgments.

First page: e0272522

Section: Research Article

Publication year: 2022

Publication date: Sep 2022

Publisher: Public Library of Science

e-ISSN: 19326203

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1371/journal.pone.0272522

ProQuest document ID: 2708995775
