Introduction
Healthcare providers, clinical trialists, and electronic healthcare record (EHR) system users collect patient and study participant self-reported medical histories, systemic medical problem lists, laboratory test results, and medications. These databases are used by healthcare providers, clinical trialists, and data scientists to facilitate patient care and to determine eligibility or stratification in clinical trials. Government agencies and other organizations also focus on reporting of specific medical illnesses, inferring prevalence and related metrics for relevant diseases using a variety of methods, including surveys. Currently, there is no universal method to verify the veracity and completeness of these data in clinical trial databases, surveys, and EHRs in the United States. Yet decisions are made daily based on these information sources.
Prospectively collected data are considered best for avoiding the bias that occurs in retrospective record review and should therefore be most useful for determining concordance or the prevalence of disease or comorbidities for specific illnesses. But medical illness reporting still depends on the degree to which each component is linked in each database. A prospective study conducted at an inner-city emergency department reported that, of 114 patients, only 71 (62%) provided accurate histories concerning major health issues such as cardiac, pulmonary, neurological, and sickle cell disease when compared with the information in existing medical records.[1] Another prospective study performed in an emergency department showed that medication use was the variable most prone to discrepancy: comparing pharmacist determination with patient self-reporting in 138 patients revealed that approximately 20% had at least one discrepancy for a major medication.[2]
Studies that target reporting reliability show increased accuracy if patients are queried about a single medical disorder or are told in advance that the study's purpose is to test the accuracy of their responses. When patients were asked about an orthopedic issue, accuracy increased to 90%.[3] A prospective study in a cancer population showed 90% concordance between patient self-reported diseases and recorded medical histories when the patients were informed that the issue under evaluation was their self-reporting.[4] These reports suggest that regimented data collection at study enrollment, together with the nature of individuals who enroll in a clinical trial, could yield more accurate identification of medical illnesses. Participants in clinical trials are generally recruited at a particular stage of an illness and usually have uniformly collected medical information during the trial. Thus, a comprehensively structured clinical trial database should provide an excellent source with which to explore the veracity of disease prevalence data and to develop methods to improve accuracy.[5]
As we explored the database for a recently completed clinical trial of non-arteritic anterior ischemic optic neuropathy (NAION), we noted discrepancies between the medical history prevalence data and the medications, examination, and laboratory data. This database provided an opportunity to investigate potential discrepancies in the reported frequency of major illnesses in this defined cohort. We hypothesized that ancillary study data, such as medications, physical examination findings, and laboratory test results collected at screening and enrollment for this clinical trial, would verify medical record and patient-reported medical illnesses in this study population.
Methods
Study cohort
We utilized the reported medical history, physical examination, blood and other clinical test results, and medications from the Quark Pharmaceuticals (trial sponsor) Qrk 207 NAION study database, collected at the screening and enrollment visits, to explore the veracity of reported medical illnesses.[6] The clinical trial was registered at ClinicalTrials.gov (Identifier: NCT02341560) and conducted after Institutional Review Board (IRB) approval according to the tenets of the Declaration of Helsinki. The trial began 02/24/16 and was completed 07/01/19. The data were received and accessed for research purposes starting 01/04/21. The study enrolled 729 subjects with acute NAION, at 80 study sites in eight countries, who met entry criteria (detailed in the supplement); some of these criteria excluded an estimated 12% of patients who had exclusionary medical disorders. The analysis of these collected data for the purpose of the current study required no additional IRB approval according to the Icahn School of Medicine at Mount Sinai IRB, as the data were de-identified. Quark Pharmaceuticals provided a complete, secure, de-identified database of all study data collected, including a complete standardized medical and ophthalmic history, physical examination, blood and urine laboratory tests, electrocardiogram (ECG), ocular exam findings, and medications. All participants were 50–80 years old and had acute unilateral NAION of no longer than 14 days' duration at enrollment.
Data collection and categorization
The information in the Qrk 207 NAION study was collected and entered by certified study coordinators at each site. Medical illness reports (provided by participants, taken from site EHRs at some but not all sites, or both), clinical tests, physical examination findings (performed by site investigators), and medications were recorded in accordance with the study protocol. In this report, we focused on the occurrence of 11 medical disorders of interest (Table 1) for which potentially verifying evidence was available in the database. The evidence included medications, fasting basic blood laboratory test results, other clinical tests, and physical examinations. The categories of reported medical disorders were mostly obvious, e.g., diabetes mellitus, but some were grouped into one entity; e.g., cerebrovascular and peripheral venous and arterial diseases were termed vascular disorders. Cardiac disorders, most of which were ischemic heart disease, were separated from vascular disorders (see supplement for complete list). We were also aware that the exclusion criteria for the clinical trial would limit the frequency of active or severe systemic diseases, but this would not alter the purpose of this report: to investigate methods of recording medical illness.
[Table 1 omitted. See PDF.]
Laboratory testing
We standardized the blood laboratory values using the normal values and ranges from the 2021 Mount Sinai Beth Israel Laboratory standard operating procedures. We categorized values above or below the reference ranges as abnormally high or low, respectively. We used laboratory, physical exam, or test result criteria recommended in the literature or by specialty societies or organizations to categorize each participant as having each disorder (Table 1).
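To make the categorization rule concrete, the minimal sketch below (in Python, for illustration only) classifies a laboratory value against a reference range; the ranges shown are hypothetical placeholders, not the Mount Sinai Beth Israel 2021 values.

```python
# Minimal sketch of the lab-value categorization step described above.
# Reference ranges here are illustrative placeholders, NOT the Mount
# Sinai Beth Israel 2021 standard operating procedure values.

REFERENCE_RANGES = {
    "fasting_glucose_mg_dl": (70.0, 99.0),   # placeholder range
    "hemoglobin_g_dl": (12.0, 17.5),         # placeholder range
    "creatinine_mg_dl": (0.6, 1.3),          # placeholder range
}

def categorize(test: str, value: float) -> str:
    """Return 'low', 'normal', or 'high' relative to the reference range."""
    low, high = REFERENCE_RANGES[test]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

print(categorize("fasting_glucose_mg_dl", 132.0))  # -> 'high'
```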
Medication indications
We expanded the sponsor's list of indications for each medication, noted in the database, using the Anatomical Therapeutic Chemical (ATC) codes [17] and the FDA and National Library of Medicine databases [18] for additional indications. We grouped medications as indicators of specific medical illnesses, recognizing that some have multiple potential indications. Thus, specific medications were categorized for more than one clinical indication or disorder.
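A minimal sketch of this grouping step follows. The ATC prefix-to-disorder assignments below are simplified illustrations, not the study's actual mapping (which drew on the sponsor's list and the FDA/NLM databases); the point it demonstrates is that one medication can count toward several disorder categories.

```python
# Sketch of grouping medications into disorder indications via ATC code
# prefixes. A single prefix may map to more than one disorder, so a
# medication can count toward several categories. The prefix-to-disorder
# assignments are a simplified illustration, not the study's mapping.

ATC_TO_DISORDERS = {
    "A10": {"diabetes"},                  # drugs used in diabetes
    "C10": {"hyperlipidemia"},            # lipid modifying agents
    "C07": {"hypertension", "cardiac"},   # beta blockers: multiple uses
    "C09": {"hypertension", "kidney"},    # RAS-acting agents: multiple uses
    "N06A": {"psychiatric"},              # antidepressants
}

def disorders_for(atc_code: str) -> set[str]:
    """Collect every disorder whose ATC prefix matches the code."""
    hits: set[str] = set()
    for prefix, disorders in ATC_TO_DISORDERS.items():
        if atc_code.startswith(prefix):
            hits |= disorders
    return hits

# An ACE-inhibitor-like code maps to two disorders at once.
print(disorders_for("C09AA05"))  # e.g. {'hypertension', 'kidney'} (set order may vary)
```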
Statistical analysis
We measured the frequency and concordance of diagnoses using four ascertainment methods: M1 - participant- and health record-reported medical history alone; M2 - a combination of physical examination findings and clinical or laboratory test results; M3 - medications grouped by indication; and M4 - a combination of all the results of methods 2 and 3.
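Read literally, M4 treats a participant as positive for a disorder when either the exam/lab evidence (M2) or the medication evidence (M3) is positive. A minimal sketch of that logic, under our reading of "combining" as a logical OR, follows.

```python
# Sketch of the four ascertainment flags for one participant and one
# disorder. M4 is modeled as a logical OR of M2 and M3 — our reading of
# "combining all the results of methods 2 and 3".

from dataclasses import dataclass

@dataclass
class Ascertainment:
    reported_history: bool      # M1: participant/health-record report
    exam_or_lab_positive: bool  # M2: physical exam or lab/clinical test
    indicated_medication: bool  # M3: medication with matching indication

    @property
    def m4(self) -> bool:
        """M4: positive if either M2 or M3 is positive."""
        return self.exam_or_lab_positive or self.indicated_medication

p = Ascertainment(reported_history=False,
                  exam_or_lab_positive=True,
                  indicated_medication=False)
print(p.m4)  # True: ascertained by M4 despite no reported history
```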
The prevalence of reported medical disorders and the determinants or verifiable measures of medical illnesses, including laboratory measures, medications, and clinical exam findings, were calculated as frequencies with proportions (%). The determinants were stratified and compared across country of residence and sex using Chi-square tests and, when appropriate, Fisher exact tests. We used Cohen's kappa (K) statistic to analyze the relative observed agreement, corrected for the probability of chance agreement, between the reported medical history for the 11 medical illnesses and the determinants used in methods 2, 3, and 4. The K statistic ranges from -1 to 1, such that a negative value indicates less-than-chance agreement, zero indicates chance agreement, and a positive value indicates greater-than-chance agreement.[19] K statistics with 95% confidence intervals were calculated across country of residence and sex. We also examined potential effect modification by sex and by country using Chi-square tests to assess whether the stratum-level K statistics were equal.[20] For analyses stratified by country, we included 725 participants, excluding the four participants from Singapore. For all other analyses, we included all 729 participants. For each of the 11 diseases and their corollary determinants, we conducted power analyses. Assuming a null hypothesis of K = 0.81, which indicates substantial agreement, and an alpha of 0.05, we found that a sample size ranging from a minimum of 101 to a maximum of 275 produced 98% power. When we assumed a null hypothesis of K = 0.6, indicating moderate agreement, a sample size ranging from a minimum of 197 to a maximum of 551 produced 98% power. Consequently, our sample size appeared to be sufficient to test our hypotheses. Statistical analyses were performed using SAS software version 9.4 (SAS Institute Inc., Cary, NC).[21]
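For reference, Cohen's kappa is computed as K = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the marginal totals.[19] The sketch below applies this to a 2x2 table of reported history (M1) versus one verification method; the counts are invented for illustration and are not study data.

```python
# Sketch of Cohen's kappa for a 2x2 agreement table between reported
# history (M1) and one verification method, using the standard formula
# kappa = (p_o - p_e) / (1 - p_e).

def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """a = both positive, b = M1+ only, c = method+ only, d = both negative."""
    n = a + b + c + d
    p_o = (a + d) / n                       # observed agreement
    p_pos = ((a + b) / n) * ((a + c) / n)   # chance both positive
    p_neg = ((c + d) / n) * ((b + d) / n)   # chance both negative
    p_e = p_pos + p_neg                     # expected chance agreement
    return (p_o - p_e) / (1 - p_e)

# Invented counts depicting under-reporting: raw agreement is ~80%,
# yet kappa is only ~0.37 after correcting for chance.
print(round(cohens_kappa(a=60, b=5, c=140, d=524), 3))
```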
Funding Source: None of the funding sources had a role in the study design, conduct, and reporting.
Results
The study cohort included 500 males (mean age 61 ± 7.8 years) and 229 females (mean age 62.1 ± 7.4 years). The race composition of the study participants, who came from eight countries, was 570 White, 149 Asian, and three other races.
Primary analyses
By history alone, the reported prevalence of medical disorders for all study participants was variable (Table 2). None of the reported disorders occurred in more than 30% of participants, except for systemic hypertension and vascular disorders. However, laboratory and other tests and the physical examination (M2), or the medications (M3), were abnormal in more than 24% of participants for cardiac disorders, diabetes, hyperlipidemia, systemic hypertension, and obesity (supplement Table 1). For eight disorders, a reported medical history was only uncommonly unaccompanied by at least one of the other determinants of disease (supplement Table 2). Remarkably, for three medical disorders that affected at least 100 participants and could be verified by laboratory tests or physical examination (M2) or medications (M3), medical reporting with no verifying determinants occurred in 11% for hyperlipidemia, 14% for high blood pressure, and 19% for prior NAION in the fellow eye. The agreement between the reported medical disorder (M1) and the M2 determinants was, in general, minimal or poor for all medical disorders except prior fellow eye NAION. The medical disorders with the lowest agreement with medications (M3) were kidney disease [-0.003 (-0.007, 0.002)], obesity [-0.02 (-0.05, 0.01)], and cardiac disorders [0.07 (0.04, 0.11)]. The medical disorder histories with the lowest agreement with laboratory test results were hepatic disorders [0.01 (-0.03, 0.04)], hyperlipidemia [0.07 (0.03, 0.11)], and anemia [0.06 (0.01, 0.11)]. The medical disorders with the lowest agreement with the combination of all three verification methods (M4) were anemia [0.06 (0.01, 0.11)], cardiac disorders [0.07 (0.05, 0.09)], hyperlipidemia [0.06 (0.04, 0.09)], and kidney disease [0.05 (0.01, 0.1)]. However, there were some instances of moderate agreement between a reported medical disorder and disorder-specific medication use, including diabetes [0.59 (0.52, 0.66)], psychiatric disorders [0.52 (0.44, 0.61)], and systemic hypertension [0.39 (0.32, 0.46)]. The combination of disorder-specific medications, laboratory test, and physical examination evaluations (M4) yielded a frequency that was markedly greater than that of each reported medical disorder alone.
[Table 2 omitted. See PDF.]
Secondary analyses
In general, the prevalence of reported medical disorders (M1, previously reported [6]), laboratory test abnormalities and abnormal physical findings (M2), and medications (M3) were similar across sex for most characteristics (Table 3). Females and males had a similar degree of agreement between test and exam abnormalities plus medications (M4) and reported cardiac disorder [0.1 (0.06, 0.12) for males and 0.1 (0.05, 0.14) for females], diabetes mellitus [0.6 (0.52, 0.69) for males and 0.53 (0.38, 0.68) for females], and systemic hypertension [0.27 (0.21, 0.33) for males and 0.35 (0.24, 0.46) for females]. Moderate concordance between medications (M3) and psychiatric disorder was similar for males [0.57 (0.45, 0.68)] and females [0.45 (0.31, 0.58)]. Except for prior NAION in the fellow eye, the reported medical disorders did not have moderate or strong agreement with the laboratory or physical exam findings when stratified by sex.
[Table 3 omitted. See PDF.]
The prevalence of reported medical disorders (M1), laboratory abnormalities and abnormal physical exam findings (M2), and medications (M3) varied widely among the seven countries with enough participants to analyze (supplement Table 3). There was a striking difference in the reporting of psychiatric disorders and the use of medication to treat them: no participants in China had either, and one participant (4.4%) in Italy took such medication, while in the USA, 15% had a reported history of a psychiatric disorder and 23% took medication for one. The prevalence of anemia and renal dysfunction was too low to compare among countries.
The agreement between the M2 determinants and M1 was fair to poor across all countries, except for prior fellow eye NAION in Australia, Israel, and the USA (Table 4). Concordance between medications (M3) and M1 was moderate to high for diabetes in all countries except Israel, for systemic hypertension only in China and India, and for psychiatric disorders only in Australia and the USA. Agreement between M1 and disorder-specific medication or laboratory test results and physical findings (M4) was moderate for diabetes in China and Italy and for systemic hypertension in China, but for all other disorders agreement was generally poor. Anemia, hepatic disorders, and renal disorders were too infrequent for analysis by country.
[Table 4 omitted. See PDF.]
Key points
Question: Are medical illnesses reported in clinical trials verifiable and accurate?
Findings: Analysis of the records of a clinical trial shows significant under-reporting of 11 major medical disorders. Verification methods using laboratory test results, physical examination findings, and medication indications show that study participants have a much greater prevalence of these illnesses.
Meaning: Verifiable objective methods can be used to improve the accuracy of medical disorder prevalence in all medical records.
Discussion
Our study results show that self-reporting, or medical history information collected from an EHR alone, even when collected in a clinical trial, is an unreliable method for determining the prevalence of major medical disorders. This finding has major implications for clinical decision making as well as research, as under-reporting of illnesses is common. The reduced reporting can also slow or decrease recruiting to new clinical trials that depend on extracting research-quality phenotypes from the EHR. It also suggests that some attempts at personalized medicine may be biased, and that research based on such data, including large genetic databases linked to EHR or self-reported data [22,23], may lead to false findings, bias, and wrong treatment recommendations. We were surprised that even when rigorously collected clinical trial data were compared with three objective verification methods, the observed reporting of common medical disorders was deficient. Our methods included diagnosis determination based on medication indications or on physical examination findings and laboratory tests collected at study entry. The third approach, combining the results of the latter two methods, substantially increased identification of medical disorders. Our study is novel in that we used a database prospectively collected at a uniform time point prior to administration of a study intervention. The trial collected the data using a standardized protocol to evaluate medical illnesses (inclusive of psychiatric disorders) as well as a targeted illness, acute NAION. Our report highlights that even with patients who have high medical acumen (given the understanding and willingness required to participate in a clinical trial), medical histories, particularly when self-reported, are frequently inaccurate. Our combination method, using laboratory test results, physical exam findings, and medications coupled with medical histories, can be used by sponsors conducting clinical trials to obtain more complete assessments of systemic disease and possible confounding illnesses.
Precise assessment of all illnesses and comorbidities is needed for clinical trial management and for recommendations to each study participant based on baseline illness status. Administrative, data management, and sponsor issues notwithstanding, improved accuracy is needed for future considerations, for optimal management of participants during a study, and for improving clinical trial results.[24] Machine learning and other artificial intelligence methods are currently being applied to clinical trial datasets in a post hoc manner.[25] Multiple variables are often considered in the evaluation of trial results, but if not all of the variables directly associated with the target or associated disease are included, the results of analyses could be flawed or not reproducible.[26] This also has implications for other patient management and research projects that explore associations between EHR-derived data and genetic studies.
Our results show variation across countries, but there was generally under-reporting of medical disorders and poor agreement with the medications and the laboratory and physical exam findings for each medical disorder. However, for systemic hypertension, hyperlipidemia, and prior fellow eye NAION, all of which are verifiable by one or more of our determinants, the participants over-reported these conditions. Female and male participants were similar, showing good to very good concordance only between medications or exam findings and diabetes mellitus, prior fellow eye NAION, and psychiatric disease. The generally strong agreement between reported prior fellow eye NAION and the physical exam findings is due in part to the clinical trial specifically targeting this disorder, so participants were motivated to be screened and enrolled in the treatment trial. We did not explore the relationship with age, given that the study entry criteria limited participants to older adults. We cannot comment on medical history reporting from EHRs across countries or sites, as the trial database did not detail how often they were used by site investigators or coordinators.
This cohort study suggests that cultural issues may play a role in the reporting and treatment of illnesses such as psychiatric disorders. It is also important to note that there are no confirmatory laboratory tests to adequately evaluate the presence of psychiatric disorders.
Our findings, drawn from data collected under controlled conditions, also have important implications for “big data” approaches utilizing healthcare data sources, which are not collected under optimal conditions. EHR reviews frequently under-report the prevalence of medical illness, and results rarely include medications, medication indications, or diagnostic test results.[27–29] Comparison studies suggest that self-reported surveys yield higher prevalence rates than electronic medical data extraction, except for severe illnesses.[30] There are few studies on the actual sensitivity and specificity of procedures used to identify diagnoses from surveys or medical records.[31] Additionally, prevalence reports have not used consistent definitions or evidence criteria.[32] We suggest that population health surveys and ‘big data’ evaluations from healthcare system records need verifying evidence. Few reports describe the use of all the available information to describe the prevalence of disease.[27] Even the prevalence of comorbidity data for targeted diseases varies widely depending on the type of records searched.[33] Reasons for this variability include incomplete records, non-uniform recruitment criteria or stage of illness, and medical records data collection driven by patients seeking care or by the severity of illness.[34] These issues are not easily overcome, but including all the data in the medical records will likely improve diagnostic precision. Machine learning is currently in use to evaluate ‘big data’ [35], but artificial intelligence methods cannot overcome training on only part of the variables associated with each disease and may yield models that incompletely identify risk factors or provide spurious correlations.[36,37] Our results suggest that such analyses should incorporate multiple types of medical information to enhance accuracy.
Although we might assume that symptomatic or clinically obvious disorders would have a higher reported prevalence than asymptomatic conditions [29], our study did not show this. For example, obesity, cardiac, and ischemic heart disease prevalence were markedly under-reported by the medical history. However, one symptomatic medical illness, psychiatric disorder, showed excellent concordance between medical history and medication. The variability of patient reporting of symptomatic illnesses, and medical history under-reporting, particularly of minimally symptomatic or asymptomatic conditions, are known.[4] We assumed that our study participants with a cardiac disorder did not have severe disease, due to the clinical trial exclusion criteria. The under-reporting of less symptomatic medical conditions in our study participants is in concert with what is recognized when EHRs are interrogated.
Poor documentation of all aspects of medical information further limits accuracy and increases discrepancies when comparing the medical history with other methods of determining disease prevalence.[38] Using billing codes or problem lists as the gold standard for identifying chronic conditions and multi-morbidity yields only moderate to good agreement with other methods, in part because of the variability of healthcare provider evaluations and documentation.[39] Similar issues occur during a clinical trial if the priority for data entry and monitoring is focused solely on issues known to be associated with the study outcomes and safety.
Our report has limitations. Our method of listing multiple indications for drugs could have overstated the disparity between the medical history information and disorder-specific medications. For example, medications used to treat ischemic heart disease could instead have been prescribed to treat systemic hypertension or cerebrovascular and peripheral vascular disorders. Thus, we might have overestimated the prevalence of some disorders based on the medication indications. We did not consider clinical disorders for which the clinical trial lacked data for one of our verifying methods. We did not consider all issues related to obesity, and most healthcare system EHRs do not contain all of the relevant types of data. Medical disorders with a low frequency in the study cohort might have had too few participants to report accurately. It is also possible that potential participants might omit information that could exclude them from enrollment in a study. Last, the study cohort was restricted by entry criteria to ages 50–80 years and stable medical illnesses. This study does not purport to report the prevalence of these medical illnesses beyond the study cohort.
Despite the study limitations, we conclude that health surveys, EHR analyses, and clinical trial analyses need to consider all potential indicators in determining the prevalence of medical illnesses. More complete data that include the patient's illness and the findings for which a medication is prescribed may reduce the artificially increased prevalence that can result from drug indication databases. Using these and additional measures should improve the accuracy of reporting.
Data access: Others who seek access to the dataset used in this study must contact Quark Pharmaceuticals, Inc. for a data sharing agreement. Quark Pharmaceuticals, Inc. allowed MJK access to the data for academic use. The binding agreement with the company prevents distribution of the data, as the database contains proprietary information that could be used by competitor drug companies. For data inquiries, contact [email protected].
Strengths and limitations
* Prospective, rigorously collected data were utilized.
* Cohort of participants with acute non-arteritic anterior ischemic optic neuropathy for a trial that excluded some active medical illnesses.
* Use of medication indications might lead to overestimation of medical illnesses.
References
1. Neugut AI, Neugut RH. How accurate are patient histories? J Community Health. 1984;9(4):294–301. pmid:6480893
2. Wai A, Salib M, Aran S, Edwards J, Patanwala AE. Accuracy of patient self-administered medication history forms in the emergency department. Am J Emerg Med. 2020;38(1):50–4. pmid:31005394
3. Boissonnault WG, Badke MB. Collecting health history information: the accuracy of a patient self-administered questionnaire in an orthopedic outpatient setting. Phys Ther. 2005;85(6):531–43. pmid:15921474
4. Ye F, Moon DH, Carpenter WR, Reeve BB, Usinger DS, Green RL, et al. Comparison of Patient Report and Medical Records of Comorbidities: Results From a Population-Based Cohort of Patients With Prostate Cancer. JAMA Oncol. 2017;3(8):1035–42. pmid:28208186
5. Fort D, Weng C, Bakken S, Wilcox AB. Considerations for using research data to verify clinical data accuracy. AMIA Jt Summits Transl Sci Proc. 2014;2014:211–7.
6. Kupersmith M, Fraser C, Morgenstern R, Miller N, Levin L, Jette N. Ophthalmic and systemic risk factors of acute nonarteritic ischemic optic neuropathy in the Quark 207 treatment trial. Ophthalmology. 2024;131(1):S0161-6420(24)00033-2.
7. National Heart, Lung and Blood Institute. https://www.nhlbi.nih.gov/health/anemia/diagnosis
8. National Heart, Lung and Blood Institute. https://www.nhlbi.nih.gov/health/anemia/diagnosis
9. American Heart Association. https://www.heart.org/en/healthy-living/fitness/fitness-basics/target-heart-rates
10. Centers for Disease Control and Prevention. Getting tested for diabetes. n.d.
11. Kwo PY, Cohen SM, Lim JK. ACG Clinical Guideline: Evaluation of Abnormal Liver Chemistries. Am J Gastroenterol. 2017;112(1):18–35. pmid:27995906
12. Weng CY. Data Accuracy in Electronic Medical Record Documentation. JAMA Ophthalmol. 2017;135(3):232–3. pmid:28125748
13. Centers for Disease Control and Prevention. https://www.cdc.gov/cholesterol/about.htm
14. Centers for Disease Control and Prevention. https://www.cdc.gov/cholesterol/about.htm
15. Centers for Disease Control and Prevention. https://www.cdc.gov/kidney-disease/testing/?CDC_AAref_Val=https://www.cdc.gov/kidneydisease/publications-resources/kidney-tests.html
16. Centers for Disease Control and Prevention; National Institutes of Health. https://www.ncbi.nlm.nih.gov/books/NBK507821/
17. de Groot R, Glaser S, Kogan A, Medlock S, Alloni A, Gabetta M, et al. ATC-to-RxNorm mappings – A comparison between OHDSI Standardized Vocabularies and UMLS Metathesaurus. Int J Med Inform. 2025;195:105777.
18. US Food and Drug Administration. FDALabel: full-text search of drug product labeling. https://www.fda.gov/science-research/bioinformatics-tools/fdalabel-full-text-search-drug-product-labeling
19. Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas. 1960;20(1):37–46.
20. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2003.
21. Kruse RL, Mehr DR. Data management for prospective research studies using SAS software. BMC Med Res Methodol. 2008;8:6.
22. 23andMe. https://www.23andme.com/dna-and-personalized-healthcare/
23. UK Biobank. https://www.ukbiobank.ac.uk/
24. Kupersmith MJ, Jette N. Specific recommendations to improve the design and conduct of clinical trials. Trials. 2023;24(1):263. pmid:37038147
25. Weissler EH, Naumann T, Andersson T, Ranganath R, Elemento O, Luo Y, et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials. 2021;22(1):537. pmid:34399832
26. McDermott M, Wang S, Marinsek N, Ranganath R, Foschini L, Ghassemi M. Reproducibility in machine learning for health research: Still a ways to go. Sci Transl Med. 2021;13(586):eabb1655.
27. Banerjee D, Chung S, Wong EC, Wang EJ, Stafford RS, Palaniappan LP. Underdiagnosis of hypertension using electronic health records. Am J Hypertens. 2012;25(1):97–102. pmid:22031453
28. Chase HS, Radhakrishnan J, Shirazian S, Rao MK, Vawdrey DK. Under-documentation of chronic kidney disease in the electronic health record in outpatients. J Am Med Inform Assoc. 2010;17(5):588–94. pmid:20819869
29. Violán C, Foguet-Boreu Q, Hermosilla-Pérez E, Valderas JM, Bolíbar B, Fàbregas-Escurriola M, et al. Comparison of the information provided by electronic health records data and a population health survey to estimate prevalence of selected health conditions and multimorbidity. BMC Public Health. 2013;13:251. pmid:23517342
30. Bagley SC, Altman RB. Computing disease incidence, prevalence and comorbidity from electronic medical records. J Biomed Inform. 2016;63:108–11. pmid:27498067
31. Rotily M, Roze S. What is the impact of disease prevalence upon health technology assessment? Best Pract Res Clin Gastroenterol. 2013;27(6):853–65. pmid:24182606
32. Borges Migliavaca C, Stein C, Colpani V, Barker TH, Munn Z, Falavigna M, et al. How are systematic reviews of prevalence conducted? A methodological study. BMC Med Res Methodol. 2020;20(1):96. pmid:32336279
33. Sheriffdeen A, Millar JL, Martin C, Evans M, Tikellis G, Evans SM. (Dis)concordance of comorbidity data and cancer status across administrative datasets, medical charts, and self-reports. BMC Health Serv Res. 2020;20(1):858. pmid:32917193
34. Daskivich T, Abedi G, Kaplan S, Skarecky D, Ahlering T, Spiegel B, et al. Electronic health record problem lists: accurate enough for risk adjustment? Am J Manag Care. 2018;24(1):e24–9.
35. Ninomiya K, Kageyama S, Garg S, Masuda S, Kotoku N, Revaiah P, et al. Can machine learning unravel unsuspected, clinically important factors predictive of long-term mortality in complex coronary artery disease? A call for ‘big data’. Eur Heart J Digit Health. 2023;4(3):275–8.
36. D’Amour A, Heller K, Moldovan D, Adlam B, Alipanahi B, Beutel A, et al. Underspecification presents challenges for credibility in modern machine learning. J Mach Learn Res. 2022;23(1):10237–97.
37. Chen D, Liu S, Kingsbury P, Sohn S, Storlie CB, Habermann EB, et al. Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit Med. 2019;2:43. pmid:31304389
38. Weiner SJ, Wang S, Kelly B, Sharma G, Schwartz A. How accurate is the medical record? A comparison of the physician’s note with a concealed audio recording in unannounced standardized patient encounters. J Am Med Inform Assoc. 2020;27(5):770–5. pmid:32330258
39. King BL, Meyer ML, Chari SV, Hurka-Richardson K, Bohrmann T, Chang PP, et al. Accuracy of the electronic health record’s problem list in describing multimorbidity in patients with heart failure in the emergency department. PLoS One. 2022;17(12):e0279033. pmid:36512600
Citation: Morgenstern R, Reichenberg A, Kummer B, Jette N, Kupersmith MJ (2025) The reliability of medical illness reporting in a randomized clinical trial. PLoS ONE 20(4): e0320759. https://doi.org/10.1371/journal.pone.0320759
About the Authors:
Rachelle Morgenstern
Roles: Data curation, Formal analysis
Affiliation: Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
Avi Reichenberg
Roles: Methodology, Writing – review & editing
Affiliation: Icahn School of Medicine at Mount Sinai, Psychiatry and Environmental Medicine and Public Health, New York, New York, United States of America
Benjamin Kummer
Roles: Writing – original draft, Writing – review & editing
Affiliation: Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
Nathalie Jette
Roles: Formal analysis, Methodology, Writing – review & editing
Affiliation: Departments of Clinical Neurosciences and Community Health Sciences, O’Brien Institute for Public Health, University of Calgary, Calgary, Canada
Mark J. Kupersmith
Roles: Formal analysis, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: Icahn School of Medicine at Mount Sinai, Ophthalmology and Neurosurgery, New York, New York, United States of America
ORCID: https://orcid.org/0000-0003-0461-8839
© 2025 Morgenstern et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Background/Objective
Medical disorders reported in population surveys, medical records, and clinical trials may not be accurate, and methods are needed to improve their confirmation. We report the accuracy of the reported prevalence of medical disorders in a clinical trial and compare it with potential verification methods.
Methods
We report the prevalence of 11 medical disorders, utilizing prospectively collected data from 729 participants in an eight-country multicenter clinical treatment trial of non-arteritic anterior ischemic optic neuropathy (NAION). We chose disorders for which the medical history was potentially verifiable. We determined the prevalence using four methods: Method (M)1: participant and medical health record reporting; M2: physical examination and clinical tests; M3: medication indications; M4: combining M2 and M3. We estimated concordance between M1 and the other methods using Cohen’s kappa (K) statistic.
Results
The prevalence of the medical disorders based on M1 was lower than that based on either M2 or M3, depending on the disorder, and was consistently lower than that based on M4. Between M1 and M4, moderate concordance (K ≥ 0.50) was observed only for psychiatric disorders (K = 0.52) and prior NAION (K = 0.67). Anemia, hypertension, diabetes, and psychiatric disease were the only disorders for which M1 and M4 prevalence and concordance differed between females and males. For all methods, the prevalence varied widely across countries; concordance between M1 and M4 also varied by country, with moderate concordance occurring for psychiatric disorders and prior NAION.
Conclusion
Even with prospective, rigorously collected data, medical histories do not reliably identify all medical disorders. Adding the results of physical examination, laboratory tests, and medications increases the accuracy of reporting. This strategy could be adapted for clinical trials and electronic medical record disease-prevalence data mining.