Introduction
Cancer was the second leading cause of death in the United States in 2020, after cardiovascular disease, with over 600,000 cancer-related deaths and a further 1.8 million expected diagnoses [1–5]. Although treatments are improving and personalized medicine promises further advances, cancer diagnoses are expected to increase substantially over the next decade, due mainly to the aging of the US population and to modifiable behavioral and lifestyle factors [1,2,6]. The risk of developing cancer depends on a complex interplay of factors, including genes, age, and gender; lifestyle and behavioral factors such as diet, energy balance, physical activity, and tobacco and alcohol use; endogenous factors such as hormones and growth factors; medication and drug use; infectious agents; and environmental exposures [1,6]. Precision medicine and precision health, which consider the patient as an individual, hold promise for cancer research [7–10]. For instance, individuals with similar diagnoses often receive the same treatment even though efficacy is observed to vary by patient, a gap that precision approaches aim to close. In addition, new approaches to precision prevention and early detection, informed by a better understanding of the etiology and natural history of cancer, could improve clinical interventions.
With a planned enrollment of over one million participants, the All of Us Research Program will have the scale to support research on many diseases, including cancer [11–13]. The program’s focus on diversity and inclusion promises to shed light on cancer inequities in the US, as fewer than 2% of cancer studies have been powered to examine race/ethnicity [14,15]. Given its diversity and large sample size, All of Us may have the statistical power to answer questions about the causes of cancer and the drivers of disparities, and to identify opportunities for precision prevention.
Researchers currently have access to data from over 315,000 All of Us participants through the Researcher Workbench. Although the program does not target enrollment by health status, the sample to date includes enough participants with a history of cancer, prevalent cancers, and incident cancers to enable systematic studies of cancer risk, outcomes, medication effects, and therapeutic approaches across environmental, social, genomic, and economic contexts. This demonstration project describes the distribution and characteristics of cancer in All of Us and compares them with the national figures reported by the Surveillance, Epidemiology, and End Results (SEER) Program [16] and with the distribution of cancer in the US population.
Materials and methods
All of Us research projects
The goals, recruitment methods and sites, and scientific rationale for All of Us have been described previously [17]. Demonstration projects were designed to establish the value of the cohort by describing the cohort and replicating previous findings for validation [18]. The work described here was proposed by Consortium members, reviewed and overseen by the program’s Science Committee, and was confirmed as meeting criteria for non-human subjects research by the All of Us Institutional Review Board. The initial release of data and tools used in this work was published in 2020 [18].
This work was performed using the All of Us Researcher Workbench, a cloud-based platform where approved researchers can access and analyze All of Us data. At the time of analysis, the All of Us data included survey responses, electronic health records (EHR), and physical measurements (PM). These three types of data are collected either through an All of Us-affiliated health care provider organization (HPO) or through a “direct-volunteer” mechanism. HPOs include regional medical centers, federally qualified health centers, and the Veterans Health Administration, and they recruit the majority of program participants, mainly persons affiliated with their centers. The direct-volunteer route allows people who are not HPO patients to enroll online and visit a designated health clinic, blood bank, laboratory, or health care provider organization to have their PM collected. All three data types (survey, PM, and EHR) were mapped to the Observational Medical Outcomes Partnership (OMOP) common data model v5.2, maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaborative.
To protect participant privacy, a series of data transformations was applied. These included suppression of codes with a high risk of identification, such as military status; generalization of categories, including age, sex at birth, gender identity, sexual orientation, and race; and date shifting by a random number of days (less than one year), applied consistently across each participant’s record. Documentation on privacy implementation and creation of the Curated Data Repository (CDR) is available in the All of Us Registered Tier CDR Data Dictionary [19]. The Researcher Workbench currently offers user-interface (UI) tools for selecting groups of participants (Cohort Builder) and creating datasets for analysis (Dataset Builder), as well as Workspaces with Jupyter Notebooks (Notebooks) for analysis. The Notebooks support use of saved datasets and direct queries in the R and Python 3 programming languages.
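Applying the date shift consistently within a participant matters because it obscures calendar dates while preserving intervals inside that participant’s record (for example, from diagnosis to treatment). The Python sketch below illustrates the idea under stated assumptions: the helper names, the backward direction of the shift, and the 1 to 364 day range are illustrative choices, not the program’s documented implementation.

```python
import random
from datetime import date, timedelta


def build_offsets(participant_ids, max_days=364, seed=None):
    """Draw one random shift (in days) per participant.

    Hypothetical helper: assumes a backward shift of 1 to max_days days.
    The program's actual direction and range are not described here.
    """
    rng = random.Random(seed)
    return {pid: rng.randint(1, max_days) for pid in participant_ids}


def shift_records(records, offsets):
    """Shift every date in (participant_id, event, date) rows by that
    participant's single offset, so within-person intervals are preserved."""
    return [
        (pid, event, when - timedelta(days=offsets[pid]))
        for pid, event, when in records
    ]


records = [
    ("p1", "diagnosis", date(2016, 3, 1)),
    ("p1", "surgery", date(2016, 4, 15)),
    ("p2", "diagnosis", date(2017, 1, 10)),
]
offsets = build_offsets(["p1", "p2"], seed=7)
shifted = shift_records(records, offsets)

# The diagnosis-to-surgery interval for p1 is unchanged by the shift.
print((shifted[1][2] - shifted[0][2]).days)  # 45
```

Because every date for a participant moves by the same amount, derived quantities such as time between two EHR events are unaffected, whereas independent per-date noise would distort them.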
Study population
Participant-provided information for our analysis was derived from the following surveys. The full text of these surveys is available in the Survey Explorer on the All of Us Research Hub, a publicly available website designed to support researchers [20]. The Basics survey elicits demographic information, including age, race/ethnicity, education, marital status, household income, and geography. The Lifestyle survey collects data on tobacco use. The Personal Medical History survey collects self-reported cancer history, including cancer type(s), life stage at diagnosis, and whether the participant is currently seeing a health care provider and/or receiving treatment for cancer. The Basics and Lifestyle surveys are collected at baseline, whereas Personal Medical History is collected during retention efforts, 3 months after enrollment.
Cancer diagnosis data were also derived from participant EHR. Diagnoses were identified using SNOMED CT codes, mapped to OMOP concept IDs by the All of Us Data and Research Center (DRC). SNOMED CT codes for cancers and their subtypes were combined to reflect the categories used for national reporting, including by SEER and the North American Association of Central Cancer Registries (NAACCR). EHR data also include procedures, medications, laboratory tests, and health care provider visits. We used the following cancers/cancer sites (SNOMED CT codes) in our analysis: bladder (93689003), leukemia (93143009), non-Hodgkin’s lymphoma (118601006), myeloma (109989006), bone (93725000), brain (93727008), breast (372137005), cervix (372024009), colon (93761005), endocrine system (371983001), endometrium (10708511000119100), esophagus (371984007), eye (371986009), head/neck (372123001), kidney (93849006), lung (93880001), oral cavity (372001002), ovary (93934004), pancreas (372003004), prostate (93974005), rectum (93984006), stomach (372014001), and thyroid (94098005).
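Combining parent codes with their subtypes amounts to a roll-up: a diagnosis code is assigned to a reporting category if the code itself, or one of its ancestors, is a listed site code. The sketch below assumes a precomputed ancestor lookup (`ancestors`, a hypothetical stand-in for the role OMOP’s concept_ancestor table plays) and counts each participant at most once per site; only a subset of the listed codes is included, and `SUBTYPE-A` is a made-up placeholder code.

```python
from collections import defaultdict

# Reporting categories and their parent SNOMED CT codes, copied from the
# Methods text (subset shown for brevity).
SITE_CODES = {
    "bladder": "93689003",
    "breast": "372137005",
    "colon": "93761005",
    "lung": "93880001",
    "prostate": "93974005",
    "rectum": "93984006",
}
CODE_TO_SITE = {code: site for site, code in SITE_CODES.items()}


def sites_for(code, ancestors):
    """Return every reporting site that `code` or one of its ancestors maps to.

    `ancestors` is a hypothetical dict {code: set of ancestor codes}.
    """
    candidates = {code} | ancestors.get(code, set())
    return {CODE_TO_SITE[c] for c in candidates if c in CODE_TO_SITE}


def participants_per_site(diagnoses, ancestors):
    """Count distinct participants per site from (participant_id, code) rows."""
    people = defaultdict(set)
    for pid, code in diagnoses:
        for site in sites_for(code, ancestors):
            people[site].add(pid)
    return {site: len(pids) for site, pids in people.items()}


# "SUBTYPE-A" stands in for a breast cancer subtype code (hypothetical).
ancestors = {"SUBTYPE-A": {"372137005"}}
diagnoses = [
    ("p1", "372137005"),
    ("p1", "SUBTYPE-A"),  # same person, subtype code: counted once
    ("p2", "SUBTYPE-A"),
    ("p3", "93880001"),
    ("p4", "UNRELATED"),
]
print(participants_per_site(diagnoses, ancestors))  # {'breast': 2, 'lung': 1}
```

Returning a set of sites (rather than the first match) keeps the result deterministic if a code ever rolls up to more than one category.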
Life stage at diagnosis was taken from the Personal Medical History survey, which asks: “About how old were you when you were first told you had this condition?” Response categories were child (0–11 years), adolescent (12–17), adult (18–64), older adult (65–74), and elderly (75+).
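To compare survey-reported life stage with ages derived from the EHR, an age at diagnosis can be binned into the same categories. This hypothetical helper assumes ages in whole years; it also makes plain how wide the adult category (47 years) is.

```python
def life_stage(age_at_diagnosis):
    """Map an age in whole years to the survey's life-stage category."""
    if age_at_diagnosis < 0:
        raise ValueError("age must be non-negative")
    if age_at_diagnosis <= 11:
        return "child"
    if age_at_diagnosis <= 17:
        return "adolescent"
    if age_at_diagnosis <= 64:
        return "adult"
    if age_at_diagnosis <= 74:
        return "older adult"
    return "elderly"


print([life_stage(a) for a in (8, 16, 45, 70, 80)])
```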
Time from diagnosis in the EHR was calculated as the current date (at the time of analysis) minus the date of diagnosis and is reported in years (mean, SD, and median).
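The calculation can be sketched with the standard library. Two choices here are assumptions rather than details given in the text: days are converted to years using 365.25-day years, and the SD is the sample standard deviation. The reference date stands in for “the current date.”

```python
import statistics
from datetime import date


def years_since(diagnosis_date, reference_date):
    """Elapsed time in years, assuming 365.25-day years."""
    return (reference_date - diagnosis_date).days / 365.25


def summarize_time_from_diagnosis(diagnosis_dates, reference_date):
    """Mean, sample SD, and median of time from diagnosis (needs >= 2 dates)."""
    years = [years_since(d, reference_date) for d in diagnosis_dates]
    return {
        "mean": statistics.mean(years),
        "sd": statistics.stdev(years),
        "median": statistics.median(years),
    }


summary = summarize_time_from_diagnosis(
    [date(2011, 1, 1), date(2016, 1, 1), date(2019, 1, 1)],
    reference_date=date(2021, 1, 1),  # stand-in for "the current date"
)
print({k: round(v, 2) for k, v in summary.items()})
```

Reporting the median alongside the mean is useful here because time from diagnosis is typically right-skewed by long-term survivors.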
Treatment type was reported for persons with a history of cancer from the EHR using the following codes, which are drawn from several vocabularies (including SNOMED CT concepts, CPT procedure codes, ATC drug classes, and ICD-9-CM procedure codes) rather than from SNOMED CT alone: surgery (1623, 11600, 11601, 11602, 11603, 11604, 11606, 11620, 11621, 11622, 11624, 11626, 11640, 11641, 11642, 11643, 11644, 11646, 17260, 17261, 17262, 17263, 17264, 17266, 17270, 17272, 17273, 17274, 17276, 17280, 17281, 17282, 17283, 17284, 17286, 370612006), radiotherapy (108290001), chemotherapy (38216008), immunotherapy (64644003), hormone therapy (10324, 72143, L02BB, L02BG), and stem cell transplant (41.04, 41.05, 41.06, 41.07, 41.08).
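One way to apply these code lists is a map from treatment type to code set, counting a participant under every type whose set overlaps their codes. The sketch below copies the codes from the text; keying on bare code strings is a simplification (a real query would key on vocabulary and code together to avoid collisions across vocabularies), and the function names are hypothetical.

```python
from collections import Counter

# Code sets copied from the Methods text.
TREATMENT_CODES = {
    "surgery": {
        "1623", "11600", "11601", "11602", "11603", "11604", "11606",
        "11620", "11621", "11622", "11624", "11626",
        "11640", "11641", "11642", "11643", "11644", "11646",
        "17260", "17261", "17262", "17263", "17264", "17266",
        "17270", "17272", "17273", "17274", "17276",
        "17280", "17281", "17282", "17283", "17284", "17286",
        "370612006",
    },
    "radiotherapy": {"108290001"},
    "chemotherapy": {"38216008"},
    "immunotherapy": {"64644003"},
    "hormone therapy": {"10324", "72143", "L02BB", "L02BG"},
    "stem cell transplant": {"41.04", "41.05", "41.06", "41.07", "41.08"},
}


def treatments_for(codes):
    """Return every treatment type whose code set overlaps `codes`."""
    codes = set(codes)
    return {t for t, code_set in TREATMENT_CODES.items() if codes & code_set}


def count_by_treatment(patient_codes):
    """Count participants per treatment type from {participant_id: codes}."""
    counts = Counter()
    for codes in patient_codes.values():
        counts.update(treatments_for(codes))
    return counts


patient_codes = {
    "p1": ["11600", "108290001"],  # surgery and radiotherapy
    "p2": ["L02BB"],               # hormone therapy
    "p3": ["99999"],               # no listed treatment code
}
print(count_by_treatment(patient_codes))
```

Because each participant contributes at most once per type but may fall under several types, treatment percentages computed this way need not sum to 100%.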
National comparison
We compared the observed frequency of cancer in All of Us with the National Cancer Institute’s SEER 18 Registries Database, November 2018 submission [21], to analyze cancer frequency overall and by site for cases diagnosed in 2016 among residents of the areas covered by the 18 registries (∼28% of the US population). We used 26-year limited-duration prevalence to determine the relative frequency and percent contribution of each cancer type to all cancers in the population, counting only the first invasive tumor site. Limited-duration prevalence is the proportion of people alive on a given date who were diagnosed with the disease within the past x years (e.g., x = 5, 10, or 20). We chose the most recent year of diagnosis available, given the period during which All of Us has been enrolling participants. Skin cancer (melanoma of the skin) was excluded from the “total cancer” calculation for SEER and from the analysis overall, because the All of Us survey does not differentiate between melanoma and non-melanoma skin cancer. Invasive cancer was coded using the International Classification of Diseases for Oncology, third edition (ICD-O-3) [22].
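The definition of limited-duration prevalence can be written out on individual-level records. The sketch below illustrates only the definition: SEER*Stat estimates prevalence from registry case counts and population estimates, which this code does not replicate, and the record fields (`dx_date`, `death_date`) are hypothetical.

```python
from datetime import date


def years_before(d, years):
    """Same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def limited_duration_prevalence(people, index_date, years):
    """Proportion of people alive on index_date who were diagnosed in the
    `years` years up to and including index_date.

    Each person is a dict with hypothetical keys "dx_date" and "death_date"
    (either may be None).
    """
    window_start = years_before(index_date, years)
    alive = [p for p in people
             if p["death_date"] is None or p["death_date"] > index_date]
    cases = [p for p in alive
             if p["dx_date"] is not None
             and window_start < p["dx_date"] <= index_date]
    return len(cases) / len(alive)


people = [
    {"dx_date": date(2000, 6, 1), "death_date": None},  # within 26 years: case
    {"dx_date": date(1985, 6, 1), "death_date": None},  # diagnosed too early
    {"dx_date": date(2010, 6, 1), "death_date": date(2012, 1, 1)},  # died
    {"dx_date": None, "death_date": None},              # never diagnosed
]
print(limited_duration_prevalence(people, date(2016, 1, 1), years=26))
```

In this toy example three people are alive on the index date and one of them was diagnosed within the 26-year window, giving a prevalence of one third.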
Data analysis
We generated descriptive statistics and prevalence estimates for the most common cancers and used chi-square tests to compare the distribution of data source (survey, EHR, or both) across key demographic and lifestyle categories. The percent distribution of cancer types was calculated as the number of cases at each site divided by the total number of cancer cases in the respective dataset. Results are stratified by race/ethnicity and sex at birth to examine demographic-specific distributions of cancer types. SEER cancer frequencies were calculated using SEER*Stat 8.3.9 [23].
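The percent-distribution formula and the chi-square comparison can be sketched with the standard library alone. The chi-square function returns the uncorrected Pearson statistic and its degrees of freedom and assumes no empty rows or columns; a p-value would come from a chi-square distribution (for example, `scipy.stats.chi2_contingency`, which by default applies Yates’ continuity correction when there is one degree of freedom, so its statistic differs from this one for 2×2 tables). The usage lines reproduce the EHR percentages reported in the Results from the counts given there.

```python
def percent_distribution(site_counts, total_cases):
    """Cases at each site as a percentage of all cancer cases in a dataset."""
    return {site: 100 * n / total_cases for site, n in site_counts.items()}


def chi_square(table):
    """Uncorrected Pearson chi-square statistic and degrees of freedom
    for an r x c table of observed counts (no empty rows or columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# EHR counts from the Results; 23,520 is the total number of EHR cancer cases.
pct = percent_distribution(
    {"breast": 6474, "blood": 4841, "prostate": 3971}, total_cases=23520
)
print({site: round(p, 1) for site, p in pct.items()})
print(chi_square([[10, 20], [20, 10]]))
```

Passing the dataset total explicitly matters: dividing by the sum of only the listed sites would inflate each site’s share.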
Results
Table 1 shows the distribution of baseline characteristics for all participants (N = 315,297) and for those with a cancer outcome captured from the EHR (N = 203,813 participants with EHR, including N = 23,520 cancer cases), via self-report in the survey (N = 89,261 who completed the Personal Medical History survey, including N = 13,298 cancer cases), and from both survey and EHR data (N = 62,497 participants with both data types, including N = 7,123 cancer cases). Completion of the Personal Medical History survey varies considerably: older, female, and non-Hispanic White participants were more likely to provide survey data than the population with available EHR, which more closely reflects the overall All of Us participant population. These differences in data availability (survey and/or EHR) across key demographic factors are reflected in the distribution of cancer across data sources. Specifically, 84.8% of cancers from the Personal Medical History survey were reported by non-Hispanic Whites, 5.0% by Blacks, and 4.7% by Hispanics, compared with 67.1%, 14.3%, and 12.2%, respectively, of cancers captured from the EHR. Non-Hispanic Whites are heavily overrepresented among those with both self-report and EHR data (75.8%), compared with 51.5% of the overall All of Us study population. All chi-square tests comparing these distributions had p < 0.001, except the comparison of EHR versus the total population (p = 0.002).
[Figure omitted. See PDF.]
Table 2 shows that the EHR data of All of Us participants most frequently indicate a history of breast cancer (N = 6,474; 27.5% of cases), followed by blood cancers (N = 4,841; 20.6%) and prostate cancer (N = 3,971; 16.9%). The most common self-reported cancers (from the survey) follow the same pattern for breast cancer (N = 4,062; 30.5%) and prostate cancer (N = 2,165; 16.3%), but not for blood cancer (N = 483; 9.9%). Among cancers documented in both the survey and the EHR, breast cancer is most common (N = 2,499 individuals), followed by prostate cancer (N = 1,304) and blood cancer (N = 657). Table 2 also breaks down prevalence by cancer site, showing how each site’s contribution to the disease burden differs by data source.
[Figure omitted. See PDF.]
Table 3 presents the distribution of cancer types from each data source by race and ethnicity, with N = 6,125 cancer cases detected in both data sources for non-Hispanic Whites, compared with N = 328 cancer cases in African Americans and N = 294 in Hispanics. The distribution of cancer types differs between survey and EHR data by race/ethnicity, both within and between the non-Hispanic White, Black, and Hispanic groups (p < 0.001). Consequently, as reported here, cancer prevalence also varies by race/ethnicity within each data source.
[Figure omitted. See PDF.]
Table 4 compares the distribution of cancer sites from All of Us survey and EHR data with the expected national distribution, based on recent SEER reports of 26-year limited-duration prevalence in 2018. The most common cancer types in SEER (by contribution to total cancers) are breast cancer (19.9%), prostate cancer (17.6%), blood cancers (11.4%), and colorectal cancers (8.4%). The percent contribution of each cancer site to the national cancer burden (as measured by SEER) differs significantly from both the EHR site distribution (p < 0.001) and the self-reported distribution (p < 0.001). As expected, because All of Us participants are enrolled largely through medical centers, the proportion with prevalent cancer is higher (11.54% in the EHR and 14.90% in the survey) than in the US population reported by SEER (4.43%).
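The All of Us prevalence figures follow directly from the case and participant counts reported for Table 1. A minimal check, using only those counts; note that the two proportions have different denominators (participants with EHR versus Personal Medical History survey completers) and are not age-standardized, which is one reason they are not directly comparable with SEER’s population-based estimate.

```python
def crude_prevalence(cases, participants):
    """Crude prevalence as a percentage of participants in a data source."""
    return 100 * cases / participants


# (cancer cases, participants) as reported in the Results for Table 1.
SOURCES = {
    "EHR": (23_520, 203_813),
    "Survey": (13_298, 89_261),
}
for name, (cases, n) in SOURCES.items():
    print(f"{name}: {crude_prevalence(cases, n):.2f}%")
```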
[Figure omitted. See PDF.]
Table 5 describes the time from cancer diagnosis as reported in the EHR and the survey. In the EHR, the cancer with the shortest time from diagnosis is lung cancer (mean = 5.85 years; SD = 4.46), and the cancer with the longest is head and neck cancer (mean = 11.75 years; SD = 6.59). Across all cancer types, the most commonly reported life stage at diagnosis was adult, followed by older adult.
[Figure omitted. See PDF.]
Table 6 presents treatment types for cancer overall and by site. The most common treatment from the EHR is radiation (N = 7,422; 31.56%), followed by surgery (N = 5,975; 25.54%), hormone therapy (N = 3,962; 16.84%), chemotherapy (N = 842; 3.58%), immunotherapy (N = 470; 1.2%), and stem cell transplant (N = 127; 0.54%). Treatment type utilization varied by cancer site.
[Figure omitted. See PDF.]
Conclusions
In this preliminary analysis of data from the All of Us Research Program, we report that the first 315K+ persons comprise a diverse population with a large number of prevalent cancer cases. As the goal of this effort is to inform studies on a variety of health conditions, including cancer, and to delineate information on risk factors and treatments, an early evaluation of cancers represented in the study population is warranted. Our findings have some key implications for cancer prevention, control, treatment, and outcomes research in the All of Us study population.
Our most notable finding is simple: although a diverse cohort is being enrolled, cancers are less frequently ascertained through the survey modules among underrepresented participants. Because validating EHR diagnoses through manual review or self-report is the gold standard for ensuring accurate classification and minimizing measurement error, differences in valid case ascertainment by key factors such as race are relevant for All of Us cancer research. The lower yield of cancer data detected from, or validated with, the survey is associated with racial/ethnic differences in longitudinal retention. Although surveys are completed by a relatively older population, age does not appear to be a key factor driving differences in data collection. Cancer history is collected through a survey completed at least 90 days after enrollment in All of Us, and program-wide completion of the medical history survey is 22% among participants underrepresented in biomedical research (UBR), compared with 42% among non-UBR participants. Factors noted previously in the literature [24] that could explain differences in retention by race/ethnicity include language, literacy, cultural appropriateness, flexibility, ongoing incentives, and communication, as well as the digital divide, which is increasingly important as survey data collection becomes more electronic. This has implications for research using the cancer history data collected at follow-up and for other key risk factor information, including health care utilization, personal medical history, and family history. Our investigation suggests that the impact of these factors on cancer disparities will be underreported, even if cancer history can be obtained from the EHR of most underrepresented participants. All of Us leadership has changed the survey module timeline and made Personal Medical History available at baseline, addressing some of the limitations noted here for prospective enrollees.
Furthermore, the difference in cancer ascertainment between survey modules and EHR among underrepresented participants highlights the importance of technologies that integrate the medical records of direct volunteers. Sync for Science, which obtains EHRs from direct volunteers, and non-digital methods of collecting survey data could offer utility beyond supplying medical record information for direct volunteers, with implications for inclusion and equity in the investigation of all diseases, including cancer.
When compared with SEER national statistics, the distribution of cancer sites in the two data sources is affected by the exclusion of skin cancer from the All of Us cancer analyses. Skin cancer cases account for approximately half of all cases reported in the survey data. These cases likely include both malignant and non-malignant skin cancers, and including them would substantially change the relative proportions of the other cancers. Because restricting the analysis to malignant cases was not possible, we excluded all skin cancer cases.
Another consideration is the grouping of blood cancers. Because the survey module asks about blood cancers generically, survey responses cannot distinguish among myeloma, lymphoma, and leukemia. This distinction can be made from the EHR when available, and the ability to distinguish these types will be crucial for many cancer researchers.
We also report time from diagnosis and life stage at diagnosis to identify opportunities to capture incident cases or to investigate hypotheses about more recent diagnoses. The utility of the life-stage question for etiologic or outcomes research is unclear, because the survey categories (child, 0–11; adolescent, 12–17; adult, 18–64; older adult, 65–74; and elderly, 75+) are quite broad. A more refined or consistent metric, such as date of cancer diagnosis, would aid investigation of various cancer-related hypotheses (for example, stratifying breast cancer by pre- and postmenopausal diagnosis). Presenting these data side by side highlights how distinct these two measures of diagnosis timing are.
The All of Us Research Program is set to become one of the largest scientific efforts in U.S. history, and its emphasis on inclusion presents key opportunities to advance precision health and medicine and to address disparities in research [25]. Despite the limitations noted in this report, this unprecedented depth of inclusion will make All of Us an important resource for cancer research. All of Us was conceived to support studies of disease outcomes, medication effects, and other therapeutic approaches across various environmental, social, genomic, and economic contexts [26]. The scale and scope of its current cancer data will support extensive investigation of cancer-related hypotheses and enhance both the pace of discovery and the generalizability of findings. The cohort’s expansion to 1 million participants will create further opportunities. Furthermore, feedback from demonstration projects such as this one will directly inform edits to existing surveys and the development of reassessment modules.
In summary, the All of Us Research Program has collected substantial cancer data from its first 315,000 participants. This preliminary investigation identifies the most common cancers, for which case numbers should provide sufficient statistical power for research, especially once whole genome data are available for all participants. Given our findings, the program might consider how lower survey completion (retention) among underrepresented participants affects the resource’s utility for research on cancer and other diseases.
Acknowledgments
Past and Present All of Us Research Program Principal Investigators: Brian Ahmedani1; Christine D Cole Johnson1; Briseis Aschebrook-Kilfoy2; Habibul Ahsan2; Donna Antoine-LaVigne3; Glendora Singleton*3; Pamelia Watson-McGee3; Arnita Ford Norwood3; Hoda Anton-Culver4; Eric Topol5; Katie Baca-Motes5; Julia Moore-Vogel5; Steven Steinhubl5; Praduman Jain6; Mark Begale6; Neeta Jain6; David Klein6; Scott Sutherland6; James Wade*6; Bruce Korf7; Mona Fouad7; Beth Lewis7; David B Goldstein8; Louise Bier8; Ali G Gharavi8; George Hripcsak8; Eric Boerwinkle9; Murray H Brilliant10; Narayana Murali10; Scott Joseph Hebbring10; Elizabeth Burnside11; Dorothy Farrar-Edwards11; Yashoda Sharma12; Amy Taylor12; Carmen Chinea13; Liliana Lombardi Desa13; Nancy Jenks13; Steve Thibodeau14; Mine Cicek14; Eric Schlueter15; Beverly Wilson Holmes15; Maria Argos16; Martha Daviglus16; Robert Winn16; Paul Harris17; Consuelo Wilkins17; Dan Roden17; Joshua Denny17; Kim Doheny18; Debbie Nickerson19; Evan Eichler19; Gail Jarvik19; Gretchen Funk20; Sallie Hussey20; Anthony Philippakis21; Heidi Rehm21; Stacey Gabriel21; Richard Gibbs22; Edgar M Gil Rico23; David Glazer24; Jessica Burke25; Joyce Ho26; Philip Greenland26; Elizabeth Shenkman27; William R Hogan27; Priscilla Igho-Pemu28; W Karlson29; Jordan Smoller29; Shawn N Murphy29; Margaret Elizabeth Ross30; Rainu Kaushal30; Eboni Winford31; Febe Wallace31; Parinda Khatri31; Vik Kheterpal32; Monica Kraft33; Francisco A Moreno33; Irving Kron33; Rachele Peterson33; Patricia Watkins Lattimore34; Cheryl Thomas34; Mitchell Lunn35; Juno Obedin-Maliver35; Oscar Marroquin36; Shyam Visweswaran36; Steven Reis36; Patrick McGovern37; Fatima Munoz38; Gregory Talavera38; George T O’Connor39; Christopher O’Donnell40; Lucila Ohno-Machado41; Greg Orr42; Fornessa Randal43; Andreas A Theodorou44; Eric Reiman44; Mercedita Roxas-Murray45; Louisa Stark46; Ronnie Tepp47; Alicia Zhou48; Scott Topper48; Rhonda Trousdale49; Phil Tsao50; Scott T Weiss51; David Wellis52; Jeffrey Whittle53; Amanda Wilson54; Stephan Zuchner55; Olveen Carrasquillo55; Margaret Pericak-Vance55; Michael E Zwick56; Megan Lewis57; Jen Uhrig57; May Okihiro58
Note
This is the list of individuals who were Principal Investigators or equivalent with the All of Us Research Program during the period that this paper was in development, October 1, 2019–July 31, 2021.
+ Principal Investigator/Lead Author for the All of Us Research Program protocol
Affiliations:
1. Henry Ford Health System, Detroit, Michigan, United States of America
2. University of Chicago Medical Center, Chicago, Illinois, United States of America
3. Jackson-Hinds Comprehensive Health Center, Jackson, Mississippi, United States of America
4. University of California, Irvine, Irvine, California, United States of America
5. Scripps Research Translational Institute, La Jolla, California, United States of America
6. Vibrent Health, Fairfax, Virginia, United States of America
7. University of Alabama at Birmingham, Birmingham, Alabama, United States of America
8. Columbia University, New York, New York, United States of America
9. University of Texas Health Science Center at Houston, Houston, Texas, United States of America
10. Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America
11. University of Wisconsin at Madison, Madison, Wisconsin, United States of America
12. Community Health Center, Inc., Middletown, Connecticut, United States of America
13. Sun River Health, New York, New York, United States of America
14. Mayo Clinic and Foundation, Rochester, Rochester, Minnesota, United States of America
15. Cooperative Health, Columbia, South Carolina, United States of America
16. University of Illinois at Chicago, Chicago, Illinois, United States of America
17. Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
18. Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
19. University of Washington, Seattle, Washington, United States of America
20. FiftyForward, Nashville, Tennessee, United States of America
21. Broad Institute, Cambridge, Massachusetts, United States of America
22. Baylor University, Waco, Texas, United States of America
23. National Alliance for Hispanic Health, Washington, DC, United States of America
24. Verily Life Sciences, San Francisco, California, United States of America
25. MITRE Corporation, McLean, Virginia, United States of America
26. Northwestern University, Chicago, Illinois, United States of America
27. University of Florida, Gainesville, Florida, United States of America
28. Morehouse School of Medicine, Atlanta, Atlanta, Georgia, United States of America
29. Partners Health Care, Boston, Massachusetts, United States of America
30. Cornell University, Weill Medical College, Ithaca, New York, United States of America
31. Cherokee Health Systems, Knoxville, Tennessee, United States of America
32. CareEvolution, Inc., Ann Arbor, Michigan, United States of America
33. University of Arizona, Tucson, Tucson, Arizona, United States of America
34. Delta Research and Educational Foundation, Washington, DC, United States of America
35. Stanford University, Palo Alto, California, United States of America
36. University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
37. Wondros, Los Angeles, California, United States of America
38. San Ysidro Health Center, San Ysidro, California, United States of America
39. Boston Medical Center, Boston, Massachusetts, United States of America
40. VA All of Us Coordinating Center, Boston, Boston, Massachusetts, United States of America
41. University of California, San Diego, La Jolla, California, United States of America
42. Walgreen Co., Deerfield, Illinois, United States of America
43. Asian Health Coalition, Chicago, Illinois, United States of America
44. Banner Health, Phoenix, Arizona, United States of America
45. Montage Marketing Group, Bethesda, Maryland, United States of America
46. University of Utah, Salt Lake City, Utah, United States of America
47. HCM Strategists, Austin, Texas, United States of America
48. Color Genomics, Inc., Burlingame, California, United States of America
49. NYC Health + Hospitals, New York, New York, United States of America
50. VA All of Us Coordinating Center—Palo Alto, Palo Alto, California, United States of America
51. Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
52. San Diego Blood Bank, San Diego, California, United States of America
53. Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
54. National Library of Medicine (NLM), Bethesda, Maryland, United States of America
55. University of Miami School of Medicine, Miami, Florida, United States of America
56. Emory University, Atlanta, Georgia, United States of America
57. Research Triangle Institute, Research Triangle Park, North Carolina, United States of America
58. Waianae Coast CHC, Waianae, Hawaii, United States of America
Citation: Aschebrook-Kilfoy B, Zakin P, Craver A, Shah S, Kibriya MG, Stepniak E, et al. (2022) An Overview of Cancer in the First 315,000 All of Us Participants. PLoS ONE 17(9): e0272522. https://doi.org/10.1371/journal.pone.0272522
About the Authors:
Briseis Aschebrook-Kilfoy
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Supervision, Writing – original draft
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America; Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America
https://orcid.org/0000-0003-1918-7816
Paul Zakin
Roles: Conceptualization
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
Andrew Craver
Roles: Data curation, Methodology, Writing – review & editing
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
Sameep Shah
Roles: Data curation, Formal analysis
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
Muhammad G. Kibriya
Roles: Conceptualization, Writing – review & editing
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
Elizabeth Stepniak
Roles: Project administration, Supervision
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America
Andrea Ramirez
Roles: Funding acquisition, Project administration, Resources, Writing – review & editing
Affiliation: Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
Cheryl Clark
Roles: Project administration, Resources, Writing – review & editing
Affiliation: Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
Elizabeth Cohn
Roles: Methodology, Resources, Writing – review & editing
Affiliation: Hunter College City University of New York, New York, New York, United States of America
Lucila Ohno-Machado
Roles: Project administration, Resources, Writing – review & editing
Affiliation: University of California San Diego Health, La Jolla, California, United States of America
Mine Cicek
Roles: Project administration, Resources, Writing – review & editing
Affiliation: Mayo Clinic, Rochester, Minnesota, United States of America
Eric Boerwinkle
Roles: Project administration, Resources, Writing – review & editing
Affiliation: The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Sheri D. Schully
Roles: Funding acquisition, Project administration, Resources, Writing – review & editing
Affiliation: National Institutes of Health, Bethesda, Maryland, United States of America
Stephen Mockrin
Roles: Project administration, Resources, Writing – review & editing
Affiliation: National Institutes of Health, Leidos, Inc, Frederick, Maryland, United States of America
Kelly Gebo
Roles: Project administration, Resources, Writing – review & editing
Affiliation: Johns Hopkins University School of Medicine, Bethesda, Maryland, United States of America
https://orcid.org/0000-0003-4010-398X
Kelsey Mayo
Roles: Project administration, Resources, Writing – review & editing
Affiliation: National Institutes of Health, Bethesda, Maryland, United States of America
Francis Ratsimbazafy
Roles: Resources, Software, Writing – review & editing
Affiliation: National Institutes of Health, Bethesda, Maryland, United States of America
Alan Sanders
Roles: Methodology, Project administration, Writing – review & editing
Affiliation: Northshore University Health System, Evanston, Illinois, United States of America
https://orcid.org/0000-0001-6629-4011
Raj C. Shah
Roles: Resources, Writing – review & editing
Affiliation: Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
Maria Argos
Roles: Conceptualization, Writing – review & editing
Affiliation: Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, Illinois, United States of America
Joyce Ho
Roles: Project administration, Resources, Writing – review & editing
Affiliation: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
https://orcid.org/0000-0003-4191-0054
Karen Kim
Roles: Resources, Supervision, Writing – review & editing
Affiliations: Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America; Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
Martha Daviglus
Roles: Funding acquisition, Resources, Supervision, Writing – review & editing
Affiliation: Institute for Minority Health Research, College of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America
Philip Greenland
Roles: Conceptualization, Funding acquisition, Investigation, Resources, Writing – review & editing
Affiliation: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
Habibul Ahsan
Roles: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – review & editing
Affiliations: Department of Public Health Sciences, University of Chicago, Chicago, Illinois, United States of America; Institute for Population and Precision Health, University of Chicago, Chicago, Illinois, United States of America; Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America
On behalf of the All of Us Research Program Investigators
¶ A full list is provided in the acknowledgments.
1. Ward EM, Sherman RL, Henley SJ, Jemal A, Siegel DA, Feuer EJ, et al. Annual Report to the Nation on the Status of Cancer, Featuring Cancer in Men and Women Age 20–49 Years. J Natl Cancer Inst. 2019;111(12):1279–97. pmid:31145458; PubMed Central PMCID: PMC6910179.
2. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66(1):7–30. Epub 20160107. pmid:26742998.
3. American Cancer Society. Cancer Facts & Figures 2020. Atlanta: American Cancer Society, 2020.
4. Bauer UE, Briss PA, Goodman RA, Bowman BA. Prevention of chronic disease in the 21st century: elimination of the leading preventable causes of premature death and disability in the USA. Lancet. 2014;384(9937):45–52. Epub 20140701. pmid:24996589.
5. Pickens CM, Pierannunzi C, Garvin W, Town M. Surveillance for Certain Health Behaviors and Conditions Among States and Selected Local Areas—Behavioral Risk Factor Surveillance System, United States, 2015. MMWR Surveill Summ. 2018;67(9):1–90. Epub 20180629. pmid:29953431; PubMed Central PMCID: PMC6023179.
6. Jemal A, Center MM, DeSantis C, Ward EM. Global patterns of cancer incidence and mortality rates and trends. Cancer Epidemiol Biomarkers Prev. 2010;19(8):1893–907. Epub 20100720. pmid:20647400.
7. Pauli C, Hopkins BD, Prandi D, Shaw R, Fedrizzi T, Sboner A, et al. Personalized In Vitro and In Vivo Cancer Models to Guide Precision Medicine. Cancer Discov. 2017;7(5):462–77. Epub 20170322. pmid:28331002; PubMed Central PMCID: PMC5413423.
8. Werner RJ, Kelly AD, Issa JJ. Epigenetics and Precision Oncology. Cancer J. 2017;23(5):262–9. pmid:28926426; PubMed Central PMCID: PMC5708865.
9. Amin MB, Greene FL, Edge SB, Compton CC, Gershenwald JE, Brookland RK, et al. The Eighth Edition AJCC Cancer Staging Manual: Continuing to build a bridge from a population-based to a more "personalized" approach to cancer staging. CA Cancer J Clin. 2017;67(2):93–9. Epub 20170117. pmid:28094848.
10. Paolillo C, Londin E, Fortina P. Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine. Scand J Clin Lab Invest Suppl. 2016;245:S84–91. Epub 20160817. pmid:27542004.
11. National Institutes of Health. The Precision Medicine Initiative Cohort Program–Building a Research Foundation for 21st Century Medicine. National Institutes of Health, U.S. Department of Health and Human Services, 2015.
12. National Institutes of Health. PMI Working Group of the Advisory Committee to the Director: National Institutes of Health, U.S. Department of Health and Human Services; 2015 [cited 2021 April 30]. Available from: https://allofus.nih.gov/about/who-we-are/pmi-working-group-advisory-committee-director.
13. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015;372(9):793–5. Epub 20150130. pmid:25635347; PubMed Central PMCID: PMC5101938.
14. Oh SS, Galanter J, Thakur N, Pino-Yanes M, Barcelo NE, White MJ, et al. Diversity in Clinical and Biomedical Research: A Promise Yet to Be Fulfilled. PLoS Med. 2015;12(12):e1001918. Epub 20151215. pmid:26671224; PubMed Central PMCID: PMC4679830.
15. American Cancer Society. Cancer Facts and Figures for African Americans 2019–2021. Atlanta: American Cancer Society, 2021.
16. National Cancer Institute. Overview of the SEER Program: National Cancer Institute 2020. Available from: https://seer.cancer.gov/about/overview.html.
17. Denny JC, Devaney SA, Gebo KA. The "All of Us" Research Program. Reply. N Engl J Med. 2019;381(19):1884–5. pmid:31693826.
18. Ramirez A, Sulieman L, Schlueter D, Halvorson A, Qian J, Ratsimbazafy F, et al. The All of Us Research Program: data quality, utility, and diversity. medRxiv; 2020.
19. National Institutes of Health. All of Us Research Hub: Data Methods: National Institutes of Health, U.S. Department of Health and Human Services; 2020 [cited 2020 April 23]. Available from: https://www.researchallofus.org/methods/.
20. National Institutes of Health. All of Us Research Hub: National Institutes of Health, U.S. Department of Health and Human Services; 2020 [cited 2020 April 30]. Available from: https://www.researchallofus.org/.
21. SEER*Stat Database: Incidence—SEER 9 Regs Research Data, Nov 2018 Sub (1975–2016) <Katrina/Rita Population Adjustment>—Linked To County Attributes—Total U.S., 1969–2017 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2019, based on the November 2018 submission. [Internet]. National Cancer Institute 2020. Available from: www.seer.cancer.gov.
22. World Health Organization. International classification of diseases for oncology (ICD-O). 3rd ed., 1st revision. Geneva: World Health Organization; 2013.
23. National Cancer Institute. National Cancer Institute SEER*Stat software: National Cancer Institute; 2019. Available from: https://seer.cancer.gov/seerstat/.
24. Wallace DC, Bartlett R. Recruitment and retention of African American and Hispanic girls and women in research. Public Health Nurs. 2013;30(2):159–66. Epub 20121122. pmid:23452110; PubMed Central PMCID: PMC4040954.
25. Mapes BM, Foster CS, Kusnoor SV, Epelbaum MI, AuYoung M, Jenkins G, et al. Diversity and inclusion for the All of Us research program: A scoping review. PLoS One. 2020;15(7):e0234962. Epub 20200701. pmid:32609747; PubMed Central PMCID: PMC7329113.
26. Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, Jenkins G, et al. The "All of Us" Research Program. N Engl J Med. 2019;381(7):668–76. pmid:31412182.
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication: https://creativecommons.org/publicdomain/zero/1.0/.
Abstract
Introduction
The NIH All of Us Research Program will have the scale and scope to enable research on a wide range of diseases, including cancer. The program’s focus on diversity and inclusion promises a better understanding of the unequal burden of cancer. This study examines preliminary cancer ascertainment in the All of Us cohort, comparing two data sources: self-reported surveys and electronic health records (EHR).
Materials and methods
This work used data from the 315,297 participants enrolled in the All of Us Research Program to date, accessed through the Researcher Workbench, where approved researchers can access and analyze All of Us data on cancer and other diseases. Cancer cases were ascertained from EHR data and self-reported surveys and characterized across key factors. We analyzed the distribution of cancer types and the concordance of the two data sources by cancer site and demographics.
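The concordance comparison described above can be illustrated with a minimal Python sketch. It assumes that, for a single cancer site, each data source yields a set of participant IDs flagged as cases, and that concordance is assessed only among participants who have both survey and EHR data. The helper names (`concordance`, `Concordance`) are hypothetical, and percent agreement and Cohen's kappa are illustrative choices of agreement statistic, not necessarily the ones used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concordance:
    """2x2 agreement table between survey and EHR case flags (hypothetical helper)."""
    both: int
    survey_only: int
    ehr_only: int
    neither: int

    @property
    def total(self) -> int:
        return self.both + self.survey_only + self.ehr_only + self.neither

    def percent_agreement(self) -> float:
        # Share of participants on whom the sources agree (case in both, or in neither)
        return (self.both + self.neither) / self.total

    def cohens_kappa(self) -> float:
        # Agreement corrected for chance, using each source's marginal case rate
        n = self.total
        p_obs = self.percent_agreement()
        p_survey = (self.both + self.survey_only) / n
        p_ehr = (self.both + self.ehr_only) / n
        p_exp = p_survey * p_ehr + (1 - p_survey) * (1 - p_ehr)
        if p_exp == 1:
            raise ValueError("kappa is undefined when expected agreement is 1")
        return (p_obs - p_exp) / (1 - p_exp)


def concordance(eligible: set[int], survey_cases: set[int], ehr_cases: set[int]) -> Concordance:
    """Cross-tabulate case flags among participants who have both data sources (assumed design)."""
    s = survey_cases & eligible  # drop IDs lacking both data sources
    e = ehr_cases & eligible
    both = len(s & e)
    survey_only = len(s - e)
    ehr_only = len(e - s)
    neither = len(eligible) - both - survey_only - ehr_only
    return Concordance(both, survey_only, ehr_only, neither)


# Toy example with made-up participant IDs (not All of Us data)
table = concordance(eligible=set(range(1, 11)), survey_cases={1, 2, 3, 11}, ehr_cases={2, 3, 4})
print(table)
print(round(table.percent_agreement(), 2), round(table.cohens_kappa(), 2))
```

In this toy example, participant 11 is excluded because it lacks both sources, leaving 2 participants flagged in both sources, 1 in each source alone, and 6 in neither, for 80% raw agreement. Repeating the cross-tabulation per cancer site or demographic stratum would give the site- and group-level breakdown the abstract describes.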
Results and discussion
Among the 315,297 participants, 13,298 cancer cases were detected in the survey data (among 89,261 participants), 23,520 in the EHR (among 203,813 participants), and 7,123 in both sources (among 62,497 participants). Key differences in survey completion by race/ethnicity affected the makeup of the resulting cohorts when compared with cancer in the EHR and in national NCI SEER data.
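As a rough sense of scale, the reported counts can be converted into crude ratios of detected cancer cases per 100 participants for each data source. This minimal Python sketch assumes that the parenthetical participant counts are the denominators for each source and that a participant may contribute more than one case, so the ratios are not prevalence estimates; the function name `cases_per_100` is hypothetical and this is not the authors' analysis.

```python
# Counts as reported in the abstract: (cancer cases detected, participants for that source).
# Assumption: the parenthetical participant counts serve as denominators.
reported = {
    "survey": (13_298, 89_261),
    "ehr": (23_520, 203_813),
    "both": (7_123, 62_497),
}


def cases_per_100(cases: int, participants: int) -> float:
    """Crude detected cases per 100 participants (not a prevalence: one person may have several cases)."""
    return 100 * cases / participants


ratios = {source: round(cases_per_100(c, n), 1) for source, (c, n) in reported.items()}
print(ratios)
```

Under these assumptions the survey yields roughly 14.9 detected cases per 100 participants, compared with about 11.5 for the EHR and 11.4 for the both-sources figures, a gap that differential survey completion could plausibly contribute to.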
Conclusions
This study provides key insight into cancer detection in the All of Us Research Program and points to the existing strengths and limitations of All of Us as a platform for cancer research now and in the future.