Background
Alzheimer’s disease (AD) and related dementias (ADRD) are common in older adults, and their prevention and management remain challenging. Dietary supplements (DS) have emerged as a promising approach to preventing or delaying ADRD; however, the effect of DS usage on disease progression in patients with cognitive impairment remains unclear. Little clinical trial evidence is available, but substantial information is contained in electronic health records (EHR), including structured and unstructured data about patients’ DS usage and disease status. The objectives of this study were to (1) develop accurate natural language processing (NLP) methods to extract DS usage for patients with Mild Cognitive Impairment (MCI) and ADRD, (2) examine the coverage of DS in structured data versus unstructured data, and (3) compare DS usage information in EHR with National Health and Nutrition Examination Survey (NHANES) data.
Methods
We collected EHR data for patients with MCI and ADRD and developed a pipeline to extract DS usage information from both structured data and unstructured clinical notes. For structured data, we used the medication table to identify DS; for unstructured clinical notes, we fine-tuned Bidirectional Encoder Representations from Transformers (BERT) models to extract DS usage status.
Results
The best named entity recognition model for DS achieved an F1-score of 0.964 and the PubMedBERT-based use status classifier had a weighted F1-score of 0.879. We applied these models to extract DS usage information from unstructured clinical notes, then compared and combined it with the information from structured medication orders. In total, 125 unique DS were identified for patients with MCI and 108 unique DS were identified for patients with ADRD.
Conclusions
In this study, we developed an NLP-based pipeline to extract DS use information from structured medication data and clinical notes in EHR for patients with MCI and ADRD. Our method could help clarify how patients with MCI and ADRD use DS and how these DS might influence the course of disease.
Background
Preventing or delaying Alzheimer’s disease (AD) and related dementias (ADRD) remains challenging [1]. ADRD affected 50 million people worldwide in 2019, and this number is expected to reach 75 million by 2030 and 152 million by 2050 [2]. However, no drugs have been shown to prevent, cure, or even slow the progression of AD, including the recently FDA-approved aducanumab [3]. Given the current evidence that AD pathogenesis takes decades to develop, prevention strategies implemented in the preclinical phase or in the prodromal phase of AD, known as Mild Cognitive Impairment (MCI), are critical [4].
In the quest to prevent or delay onset of ADRD, dietary supplements (DS) including vitamins, minerals, herbs and botanicals, and probiotics have emerged as a promising treatment. Evidence for the role of DS in ADRD prevention has been increasing but is mixed [5, 6]. On the one hand, epidemiological studies suggest antioxidants, vitamins, polyphenols, polyunsaturated fatty acids, fish, fruits, vegetables, tea, and light-to-moderate consumption of alcohol are beneficial for cognition [6]. Daily folic acid plus vitamin B12, a combination of EGCG, DHA and ALA (EDA), and Korean thistle have shown some promising effects [7, 8]. Omega-3 fatty acid supplementation in mild AD has shown beneficial effects at disease onset, when impairment in brain function is slight [9]. Among marine n-3 fatty acids, only DHA was protective against the development of AD, and consumption of α-linolenic acid was protective in people with the APOE-ϵ4 genotype [10]. One study reported that folic acid and vitamin B12 can slow down the conversion from MCI to dementia [11].
Other studies have concluded small or no effect of DS for ADRD prevention. For example, the strength of evidence of ω-3 fatty acids, soy, ginkgo biloba, folic acid alone or with other B vitamins, β-carotene, vitamin C, vitamin D plus calcium, and multivitamins or multi-ingredient supplements on reducing cognitive decline or ADRD was insufficient, low, or none [7, 12, 13]. Moderate strength evidence showed that vitamin E had no benefit on cognition and in delaying or preventing the onset of AD [7]. The LipiDiDiet trial using Fortasyn Connect (includes uridine monophosphate; choline; phospholipids; eicosapentaenoic acid; docosahexaenoic acid, vitamins E, C, B12, and B6; folic acid; selenium) did not slow cognitive decline in persons with mild-to-moderate AD [14].
About 77% of US adults took at least one DS according to the 2019 Council for Responsible Nutrition survey [15]. Given the prevalence of DS use among US adults, and especially older adults, it is critical to establish the efficacy and safety of DS for brain health promotion and AD prevention. However, traditional gold-standard approaches such as clinical trials have limitations. Sample sizes in DS clinical trials are generally small, ranging from 100 to 500 based on clinicaltrials.gov. As such, the power is low to detect modest but clinically meaningful effect sizes. Further, clinical trials are resource-intensive and hence have only evaluated a small number of DS. Lastly, the enrolled population, dosing strategy, and compliance behavior may not reflect the way in which DS are used in the real-world population [16]. The limitations of clinical trials can be partially overcome by electronic health records (EHR), which contain real-world, longitudinal clinical data from a large number of patients. EHR have been increasingly used for a variety of clinical studies, including pharmacovigilance studies [17, 18]. EHR store structured data on drugs, DS, drug interactions, adverse events, and signs and symptoms, as well as unstructured clinical notes that contain individualized, detailed information that structured data do not capture [19]. Furthermore, a previous study by our group found that a substantial percentage (~ 40%) of total medications listed in EHR are DS [20]. However, data quality and consistency in EHR can vary, so it is important to benchmark findings against existing high-quality data sources such as national surveys and longitudinal cohort studies.
Hence, the purposes of this study were to (1) develop an accurate natural language processing (NLP) pipeline to extract DS usage for patients with MCI and ADRD, (2) examine the coverage of DS in structured data versus unstructured data, and (3) compare DS usage information in EHR with National Health and Nutrition Examination Survey (NHANES) data. Findings from this study will inform future clinical studies on the roles of DS on AD progression. To our knowledge, this is the first study to explore Transformer-based language models to identify DS use status in clinical notes and explore the role of clinical notes to document DS information in EHR. We also demonstrate that DS representation in EHR is comparable to the NHANES data, establishing the external validity of our methods.
Methods
Data source
The data used in this study were obtained from the EHR of the University of Minnesota (UMN) Clinical Data Repository. We used diagnosis codes to identify 25,669 patients with MCI or ADRD from 2001 to 2018. This study was approved by the UMN Institutional Review Board.
We defined two patient groups (MCI and ADRD) based on International Classification of Diseases (ICD) codes for MCI (331.83, 294.9, G31.84, F09) and ADRD (290.40, 290.41, 331.0, 331.11, 331.19, 331.82, G30.0, G30.1, G30.8, G30.9, G31.01, G31.09, G31.83, F01.50, F01.51). Since MCI and ADRD are two clinical phases of AD, the inclusion criterion for the MCI group was that patients had at least one MCI diagnosis but no ADRD diagnosis. The inclusion criteria for the ADRD group were that (1) the patient had an ADRD diagnosis, (2) if an MCI diagnosis existed, it occurred before the ADRD diagnosis, and (3) if an MCI diagnosis existed, the first MCI diagnosis appeared at least 6 months before the first ADRD diagnosis.
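The inclusion rules above can be sketched as a small function. This is only an illustration under stated assumptions, not the study's code: the function name `assign_group`, the date-list inputs, and the approximation of 6 months as 183 days are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days; the paper does not
# give an exact day count.
SIX_MONTHS = timedelta(days=183)


def assign_group(mci_dates, adrd_dates):
    """Return "MCI", "ADRD", or None for one patient.

    mci_dates and adrd_dates are lists of diagnosis dates (hypothetical inputs
    derived from the ICD code lists).
    """
    if not adrd_dates:
        # MCI group: at least one MCI diagnosis and no ADRD diagnosis.
        return "MCI" if mci_dates else None
    if not mci_dates:
        # An ADRD diagnosis without any MCI diagnosis meets criterion (1) alone.
        return "ADRD"
    first_mci, first_adrd = min(mci_dates), min(adrd_dates)
    # Criteria (2) and (3): first MCI precedes first ADRD by at least 6 months.
    if first_adrd - first_mci >= SIX_MONTHS:
        return "ADRD"
    return None  # patient excluded from both groups
```

For example, a patient whose first MCI diagnosis precedes the first ADRD diagnosis by only two months would be excluded under this reading of the criteria.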
Identify DS mentions from EHR
[IMAGE OMITTED: SEE PDF]
Figure 1 shows the overview of the study. DS usage information came mainly from two sources: the structured medication orders and the unstructured clinical notes. To compare DS usage consistently between the MCI and ADRD groups, we restricted DS extraction to the 6 months before and after the first MCI diagnosis.
Identify DS usage from structured medication orders
To extract DS information from the structured medication orders, we used a list of DS identified in our prior work [20]. The categories of DS in the list and representative examples are shown in Table 1.
[IMAGE OMITTED: SEE PDF]
All medication orders containing the Medication_id in the DS list were identified. Each Medication_id represents a medication brand name. Since different brand names may contain the same medication ingredient, we merged the identified medication orders of each patient by the Ingredient_id, and summarized the patients’ DS usage by ingredient level.
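A minimal sketch of this merge step, assuming the orders are available as (patient, medication) pairs and that a lookup from Medication_id to Ingredient_id exists for the DS list. The function name, the input shapes, and the example identifiers are hypothetical.

```python
from collections import defaultdict


def summarize_by_ingredient(orders, med_to_ingredient):
    """Collapse brand-level orders into per-patient sets of DS ingredients.

    orders: iterable of (patient_id, medication_id) pairs.
    med_to_ingredient: dict mapping each Medication_id on the DS list to its
        Ingredient_id; medications not on the list are absent from the dict.
    """
    usage = defaultdict(set)
    for patient_id, medication_id in orders:
        ingredient_id = med_to_ingredient.get(medication_id)
        if ingredient_id is not None:  # skip orders that are not DS
            usage[patient_id].add(ingredient_id)
    return dict(usage)


# Two brand names (M1, M2) share one ingredient (I_VITD); M9 is not a DS.
example_orders = [("p1", "M1"), ("p1", "M2"), ("p1", "M9"), ("p2", "M3")]
example_map = {"M1": "I_VITD", "M2": "I_VITD", "M3": "I_CALCIUM"}
example_usage = summarize_by_ingredient(example_orders, example_map)
```

Using a set per patient means that two brands with the same ingredient count once, which matches the ingredient-level summary described above.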
Identify DS usage from unstructured clinical notes
We collected each patient’s clinical notes and used a keyword search followed by a fine-tuned BERT classifier to capture DS usage information. The keyword list, which includes synonyms and misspellings, was assembled from three sources. First, DS ingredients identified in the medication orders, including the DS ingredient names from the Ingredient_name field, were used as keywords. Second, we used the DS knowledge base iDISK [21], which has been demonstrated to have better DS coverage than the Unified Medical Language System [22], to add synonyms to the DS keyword list. Third, we leveraged our prior work [23], which trained word embeddings on a clinical corpus, to add misspellings that actually occur in our corpus. All sentences containing any DS ingredient mention were extracted for use status classification with the fine-tuned BERT model.
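The keyword search step might look like the sketch below. It is an assumption-laden illustration: the paper does not describe its sentence splitter or matching rules, so the naive regular-expression splitter, case-insensitive whole-word matching, and the function names are all hypothetical.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive pattern from DS names, synonyms, and misspellings."""
    # Longest keywords first so that "vitamin b12" wins over "vitamin b".
    ordered = sorted(set(keywords), key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def split_sentences(note):
    """Naive splitter on ., !, ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def sentences_with_ds(note, pattern):
    """Return (sentence, [matched keywords]) for sentences with any DS mention."""
    hits = []
    for sentence in split_sentences(note):
        mentions = [m.group(0).lower() for m in pattern.finditer(sentence)]
        if mentions:
            hits.append((sentence, mentions))
    return hits


pattern = build_pattern(["ginkgo", "gingko", "vitamin e"])  # includes a misspelling
note = "Patient started Gingko last week. No acute distress. Continue vitamin E daily."
hits = sentences_with_ds(note, pattern)
```

Only the matched sentences would then be passed on to the use status classifier.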
Development of BERT-based NLP algorithms to classify DS use status in unstructured data
A corpus of 2,500 sentences [24], used in a prior study to assess machine learning models for classifying use status, was used for fine-tuning. The corpus covers 25 different dietary supplements (e.g., black cohosh, ginger, glucosamine, vitamin E) annotated with one of four use statuses: “started”, “continuing”, “discontinued”, and “uncertain”. The dataset was split into training, development, and test sets (80/10/10) for training and evaluation.
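As a quick check of the split sizes, an 80/10/10 split of 2,500 sentences gives 2,000 training, 250 development, and 250 test sentences. The sketch below assumes a simple seeded random shuffle; the paper does not state whether the split was random or stratified, and `split_corpus` is a hypothetical name.

```python
import random


def split_corpus(sentences, seed=0, percents=(80, 10, 10)):
    """Shuffle and split into train/dev/test by integer percentages."""
    items = list(sentences)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * percents[0] // 100
    n_dev = len(items) * percents[1] // 100
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test


train, dev, test = split_corpus(range(2500))
```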
BERT, BioBERT, ClinicalBERT, and PubMedBERT [25,26,27,28] were fine-tuned to perform the named entity recognition (NER) and use status classification tasks simultaneously. We chose these models based on our experience with various BERT models in prior work [29] and because they demonstrated strong results in a variety of biomedical NLP tasks at the time our experiments were run. The HuggingFace transformers package [30] (v4.0.0) was used with PyTorch (v1.6.0) for training and inference. The architecture consisted of a single BERT model that provided contextual embeddings of the input sentences, which were then passed to two linear layers. One linear layer took the contextual embeddings as is and was trained to perform the NER task; the other took the pooled output and performed use status classification. The losses from the two layers were summed and the total was propagated back through the model. A single forward pass through the BERT model and a single backward pass of the combined loss were chosen to reduce the number of computations. Training used AdamW [31] for 3 epochs with a learning rate of 2e-5, a batch size of 16, a maximum sequence length of 256, and cosine annealing with no warmup for learning rate decay.
The PubMedBERT model demonstrated the highest performance and was selected for extracting DS mentions and determining use status. The extracted sentences containing DS mentions were passed to the fine-tuned PubMedBERT model for use status classification at the sentence level rather than the token level. When a sentence contained multiple DS mentions, a separate window was constructed for each DS and classified independently. For example, if the sequence “…started black cohosh and vitamin d last week while…” was found in a larger sentence, two windows would be constructed, one for each DS, and each window would be classified based on the DS it was constructed around.
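The per-mention windowing can be illustrated with a short sketch. The paper does not say how the windows were built, so the whitespace tokenization, the fixed token `radius`, and the function name `build_windows` are assumptions made only to show the idea of one window per DS mention.

```python
def build_windows(tokens, mention_spans, radius=5):
    """Build one context window per DS mention.

    tokens: list of whitespace tokens from one sentence.
    mention_spans: list of (start, end) token indices, end exclusive.
    radius: number of context tokens kept on each side (hypothetical choice).
    """
    windows = []
    for start, end in mention_spans:
        left = max(0, start - radius)
        right = min(len(tokens), end + radius)
        windows.append({
            "target": " ".join(tokens[start:end]),  # the DS this window is about
            "text": " ".join(tokens[left:right]),   # context passed to the classifier
        })
    return windows


tokens = "she started black cohosh and vitamin d last week while traveling".split()
windows = build_windows(tokens, [(2, 4), (5, 7)], radius=2)
```

Each window would then be classified separately, so the two supplements in the example can receive different use status labels.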
Integrating DS usage from structured and unstructured data
After the DS were identified from the structured medication orders and unstructured clinical notes for the MCI and ADRD groups, a list of single DS ingredients was compiled from all identified DS. All identified DS for each patient were normalized to the ingredient level; for example, both ascorbic acid and vitamin C were normalized to ascorbic acid. DS usage timelines for each patient in the MCI and ADRD groups were compiled for further analysis. The DS usage timelines include all DS ingredients identified at all time points for each patient, along with the DS source, i.e., medication orders or clinical notes.
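Normalization and timeline assembly could be sketched as follows. The synonym table holds only the example given in the text, and the record layout (patient, ISO date string, raw name, source) is an assumed shape rather than the study's schema.

```python
# Illustrative synonym table; a real one would come from the DS list and iDISK.
SYNONYMS = {"vitamin c": "ascorbic acid"}


def normalize(name, synonyms=SYNONYMS):
    """Map a raw DS mention to its ingredient-level name."""
    key = name.strip().lower()
    return synonyms.get(key, key)


def build_timelines(records):
    """Group records into per-patient, date-ordered DS timelines.

    records: iterable of (patient_id, iso_date, raw_name, source), where source
        is "orders" or "notes" (hypothetical labels).
    """
    timelines = {}
    for patient_id, iso_date, raw_name, source in records:
        entry = (iso_date, normalize(raw_name), source)
        timelines.setdefault(patient_id, []).append(entry)
    for entries in timelines.values():
        entries.sort()  # ISO dates sort chronologically as strings
    return timelines


records = [
    ("p1", "2015-03-02", "Vitamin C", "notes"),
    ("p1", "2015-01-10", "ascorbic acid", "orders"),
]
timelines = build_timelines(records)
```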
Patient level summary
All DS used by each patient were summarized. The DS usage timeline data for each patient was merged by patient_id, and all DS and associated counts were calculated for each patient. The average number of DS used by each patient and the distribution of number of DS used were summarized.
DS level summary
For the DS-level summary, the unique DS ingredients used and the number of patients using each DS ingredient were summarized. The most frequently consumed DS for both the ADRD and MCI groups were identified. For the most frequently consumed DS shared by the two groups, Chi-square tests were performed to assess whether the proportion of patients using each DS differed significantly between the MCI and ADRD groups.
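For a single DS, the comparison reduces to a Chi-square test on a 2×2 table (group by user/non-user). The sketch below computes the Pearson statistic without continuity correction and uses the fact that, with one degree of freedom, the p-value equals erfc(sqrt(χ²/2)). The counts in the example are made up, and whether the study applied a continuity correction is not stated.

```python
import math


def chi_square_2x2(users_a, total_a, users_b, total_b):
    """Pearson chi-square test (no continuity correction) for two proportions.

    Assumes every row and column total is non-zero.
    """
    observed = [
        [users_a, total_a - users_a],
        [users_b, total_b - users_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 30 of 100 patients in one group vs. 10 of 100 in the other.
statistic, p_value = chi_square_2x2(30, 100, 10, 100)
```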
Comparing the DS extracted from medication orders and clinical notes
We also compared the DS information extracted from the structured medication orders and unstructured clinical texts. The numbers and proportions of DS extracted from medication orders and clinical notes were summarized for patients in MCI and ADRD groups.
DS usage and cognitive status identified from the NHANES
We extracted DS usage of survey respondents aged 60 years and older with poor cognitive performance from the NHANES. NHANES is a cross-sectional survey conducted in the USA by the National Center for Health Statistics to assess the health and nutritional status of the US population [32]. We obtained a dataset from the publicly available part of NHANES that contains the 2011–2012 and 2013–2014 survey cycles. The DS information was derived from the NHANES Dietary Supplement Database.
While NHANES participants do not undergo the clinical testing needed to establish a diagnosis of MCI or ADRD, we derived an indicator of cognitive impairment from NHANES data using a previously published method [33,34,35,36]. In the 2011 to 2014 NHANES cycles, three cognitive tests were conducted: the Consortium to Establish a Registry for Alzheimer’s disease Word Learning (CERAD W-L) sub-test for assessing immediate word learning and recall ability [37]; the Animal Fluency test for examining categorical verbal fluency (range: 1–40, 25% Q1 = 12); and the Digit Symbol Substitution Test (DSST) (range: 0–105, 25% Q1 = 33). Of note, the CERAD W-L test consists of three consecutive learning trials and a delayed recall; the score of each trial and of the delayed recall ranges from 0 to 10. We derived the sum of the three immediate trial scores (range: 0–30, 25% Q1 = 15) and the delayed recall score (range: 0–10, 25% Q1 = 4). We used the lowest quartile (25% Q1) of each variable in the study group as the cutoff value to identify those with cognitive impairment. Respondents with any missing value in the impairment-related variables were excluded. The information extracted from NHANES was compared with the results extracted from the EHR data.
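The quartile-based indicator can be sketched with the cutoffs reported above. Two details are assumptions, because the text does not state them: a respondent is flagged when a score is at or below the cutoff, and a low score on any one of the four variables is enough. The variable names are hypothetical.

```python
# Lowest-quartile (Q1) cutoffs reported in the text.
CUTOFFS = {
    "cerad_immediate": 15,  # sum of three immediate recall trials (0-30)
    "cerad_delayed": 4,     # delayed recall (0-10)
    "animal_fluency": 12,
    "dsst": 33,
}


def is_impaired(scores, cutoffs=CUTOFFS):
    """Return True/False, or None when the respondent must be excluded.

    Assumptions: "at or below Q1" counts as low, and a low score on ANY
    variable flags cognitive impairment.
    """
    if any(scores.get(name) is None for name in cutoffs):
        return None  # missing impairment-related value: exclude respondent
    return any(scores[name] <= cutoff for name, cutoff in cutoffs.items())


low_dsst = {"cerad_immediate": 20, "cerad_delayed": 7, "animal_fluency": 18, "dsst": 30}
all_high = {"cerad_immediate": 20, "cerad_delayed": 7, "animal_fluency": 18, "dsst": 50}
incomplete = {"cerad_immediate": 20, "cerad_delayed": 7, "animal_fluency": 18}
```

Changing `any` to `all` in the final line of the function would give the stricter reading in which every score must fall in the lowest quartile.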
Results
Patient cohorts for the MCI and ADRD groups
We identified 11,884 patients with MCI and 947 patients with ADRD in the EHR data. Table 2 summarizes the demographics of the two groups of patients.
[IMAGE OMITTED: SEE PDF]
Overall summary of DS extracted from EHR
Table 3 summarizes the number of medication orders and clinical texts containing DS identified for the MCI and ADRD groups, as well as the number of DS identified from each source. We found substantially more DS usage information documented in the clinical notes than in structured medication orders, with this gap being larger for the ADRD group than for the MCI group. Furthermore, while the number of DS identified from each source was proportional to the number of sources analyzed in the MCI group, structured medication orders were relatively “information-poor” in the ADRD group, accounting for 11% of sources analyzed but only 5% of DS identified.
[IMAGE OMITTED: SEE PDF]
BERT fine-tuning for use status classification
Table 4 shows the performance of the BERT models on the DS named entity recognition task. The PubMedBERT model performed best, with a strict-match F1-score of 0.967 and a lenient-match F1-score of 0.969.
[IMAGE OMITTED: SEE PDF]
Table 5 shows the performance on the use status classification task. The PubMedBERT model performed best, with a micro F1-score of 0.879.
[IMAGE OMITTED: SEE PDF]
DS usage and cognitive status
We extracted DS usage information from EHR for 10,135 patients with MCI and 935 with ADRD. In total, 108 unique DS were identified for patients with ADRD from both medication orders and clinical notes, while 125 were identified for the patients with MCI. The number of DS taken by patients ranged from 1 to 29; the distributions of total number of DS taken by patients with MCI and ADRD are shown in Fig. 2.
[IMAGE OMITTED: SEE PDF]
On average, patients with MCI took 6.4 DS compared to 8.8 DS for patients with ADRD (p < 0.001). This difference of 2.4 DS did not change (and remained highly significant) after adjustment for age and gender in a linear regression model. Several DS were taken much more frequently by those in the MCI group than in the ADRD group, and vice versa. Figure 3 compares the percentage of takers in the MCI and ADRD groups for the 22 DS taken by more than 5% of both groups.
[IMAGE OMITTED: SEE PDF]
Table 6 presents the age- and gender-adjusted odds ratios comparing the probability of taking each of these commonly taken DS between the MCI and ADRD groups, derived from a logistic regression model. Notably, 17 of the 22 most commonly taken DS had odds ratios significantly different from 1: 13 were taken significantly more frequently by patients with ADRD, and 4 were taken significantly more frequently by patients with MCI. When the analysis was expanded to include DS taken by at least 0.1% of both groups, 38 DS had significant age- and gender-adjusted odds ratios, with 31 being taken more frequently by patients with ADRD and 7 being taken more frequently by patients with MCI. The largest odds ratios in this expanded analysis were for Caffeine (OR = 104, 95% CI 72.1 to 149) and Lipase (OR = 25.4, 95% CI 18.5 to 34.8), and the smallest was for Pantothenate (OR = 0.31, 95% CI 0.10 to 0.98).
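To make the odds ratio scale concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table. It is not the adjusted logistic regression used in the study, the counts are invented, and the orientation (ADRD relative to MCI, so that OR > 1 means more frequent use in the ADRD group) is an assumption inferred from the text.

```python
import math


def odds_ratio(adrd_takers, adrd_nontakers, mci_takers, mci_nontakers, z=1.96):
    """Unadjusted odds ratio (ADRD vs. MCI) with a Wald confidence interval.

    Assumes all four cell counts are positive.
    """
    estimate = (adrd_takers * mci_nontakers) / (adrd_nontakers * mci_takers)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / adrd_takers + 1 / adrd_nontakers + 1 / mci_takers + 1 / mci_nontakers
    )
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts: 20 of 100 ADRD patients vs. 10 of 100 MCI patients take a DS.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
```

An interval that excludes 1 corresponds to a statistically significant difference at the 5% level; adjusting for age and gender, as in the study, requires fitting a regression model instead.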
[IMAGE OMITTED: SEE PDF]
Comparison of DS documentation in structured and unstructured EHR data
We also examined the DS information available in the structured medication orders and in the clinical notes. In the MCI group, we identified on average 0.44 more DS per patient from unstructured clinical notes than from structured medication orders. On average, for each patient, 53.5% of DS information was extracted only from the clinical notes and 39.4% only from medication orders, whereas 7.1% came from both sources.
For each patient in the ADRD group, we identified on average 5.6 more DS from unstructured clinical notes than from structured medication orders. DS identified only from the clinical notes constituted 63.7% of all identified DS for each patient, DS identified only from the medication orders constituted 8.9%, and 27.4% came from both sources. Among the 11,884 patients with MCI, 8,150 had medication orders containing DS and 9,368 had clinical notes mentioning DS. In the ADRD group, 616 of 947 patients had medication orders containing DS and 929 had clinical notes mentioning DS. Figure 4 shows a bar plot of the number of patients in each group with DS information in the different source categories. Clinical notes contributed a substantial portion of DS usage information for both groups, especially the ADRD group.
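The per-patient source breakdown reported above amounts to simple set arithmetic. A minimal sketch, assuming each patient's DS from the two sources are available as sets of normalized ingredient names (the function name and example values are hypothetical):

```python
def source_breakdown(order_ds, note_ds):
    """Fractions of one patient's DS found only in notes, only in orders, or in both."""
    order_ds, note_ds = set(order_ds), set(note_ds)
    all_ds = order_ds | note_ds
    if not all_ds:
        return None  # patient has no DS from either source
    n = len(all_ds)
    return {
        "notes_only": len(note_ds - order_ds) / n,
        "orders_only": len(order_ds - note_ds) / n,
        "both": len(order_ds & note_ds) / n,
    }


breakdown = source_breakdown(
    order_ds={"calcium", "vitamin d"},
    note_ds={"vitamin d", "ginkgo", "melatonin"},
)
```

Averaging these fractions over all patients in a group would give figures comparable to the percentages reported for the MCI and ADRD groups.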
[IMAGE OMITTED: SEE PDF]
Some DS appear mostly in medication orders, while others are more likely to be found in clinical notes. For each DS, we calculated the ratio of sources (number of patients with the DS identified from medication orders : number of patients with the DS identified from clinical notes). Table 7 shows the top 10 DS with the highest ratios toward either medication orders or clinical notes for the two patient groups.
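The source ratio for each DS could be computed as in the sketch below. The input shape (a dict from DS name to the set of patients in whom it was found, one dict per source) and the handling of a zero denominator are assumptions; the example values are invented.

```python
def source_ratios(patients_from_orders, patients_from_notes):
    """Ratio of patient counts (orders : notes) for every DS seen in either source."""
    ratios = {}
    for ds in set(patients_from_orders) | set(patients_from_notes):
        n_orders = len(patients_from_orders.get(ds, ()))
        n_notes = len(patients_from_notes.get(ds, ()))
        # Assumption: a DS never seen in notes gets an infinite ratio.
        ratios[ds] = n_orders / n_notes if n_notes else float("inf")
    return ratios


ratios = source_ratios(
    patients_from_orders={"calcium": {"p1", "p2", "p3"}, "ginkgo": {"p1"}},
    patients_from_notes={"calcium": {"p1"}, "ginkgo": {"p1", "p2", "p3", "p4"}},
)
```

Sorting the result in descending order gives the DS most tied to medication orders, and sorting in ascending order gives those most tied to clinical notes.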
[IMAGE OMITTED: SEE PDF]
DS data summary for 2011 to 2014 NHANES cycles
We qualitatively compared the DS extracted from EHR with DS from NHANES data. Among the 125 DS ingredients from the MCI group, 89 (71.2%) overlapped with DS in the NHANES data. Among the 108 DS from the ADRD group, 78 (72.2%) were in the NHANES data.
Among the 1,042 respondents identified with cognitive impairment from NHANES, 43.38% were Non-Hispanic White and 27.54% were Non-Hispanic Black. The average age was 72, and the proportions of females (52.59%) and males (47.41%) were similar. The 15 most frequently mentioned DS in NHANES data were Vitamin D, Calcium, Vitamin B12, Vitamin C, Vitamin E, Vitamin B6, Folic acid, Magnesium, Niacin, Zinc, Riboflavin, Pantothenic acid, Thiamin, Vitamin A and Biotin.
Discussion
Summary of DS usage for MCI and ADRD groups
In this project, we identified DS usage from EHR for patients with MCI and ADRD within a defined time period and compared the results between the two groups. DS usage information was extracted from both structured medication orders and unstructured clinical notes, and our results show that our methods could extract DS usage information for most patients in the MCI (85.3%) and ADRD (98.8%) groups. On average, each patient with ADRD took 2.4 more DS than each patient with MCI.
The most frequently mentioned DS in the two groups largely overlap. However, for the 13 overlapping top-frequency DS, the percentages of users (number of patients who used the DS / total number of patients in the group) differ significantly between the MCI and ADRD groups (p < 0.01). These findings suggest that patients continued to use DS, possibly in the hope of slowing their cognitive decline, and that both the extent of DS use and the choice of DS changed after MCI progressed to AD.
Comparison of DS identified from the EHR and NHANES data
In this study, the DS extracted from EHR were also compared with NHANES national survey data, which record comprehensive DS use information for people with cognitive impairment at the national level. The results show that the DS extracted from EHR overlap substantially with the NHANES data: 71.2% and 72.2% of the DS identified in EHR for the MCI and ADRD groups, respectively, also appear in the NHANES data. Among the 15 most frequently used DS in NHANES, 5 are among the 15 most frequent DS in the MCI group and 4 are among those in the ADRD group. All top 20 DS in NHANES data were also found in EHR data by our method. Our findings indicate that DS information documented in EHR is broadly comparable to NHANES data. The consistency between the NHANES DS use data and the DS use extracted from EHR supports the effectiveness of our DS extraction pipeline and the feasibility of using EHR as an observational data source for DS clinical research.
DS distribution in structured data and unstructured data for patients with MCI and ADRD
We developed an NLP pipeline that combines keyword search with BERT-based DS use status classification to identify DS use information from clinical notes. The results show the effectiveness of our method. Compared with the number of DS extracted from structured medication orders, our method identified on average 0.44 and 5.6 more DS per patient from clinical notes for the MCI and ADRD groups, respectively. In the MCI group, 16.8% of patients had DS information only from clinical notes, while in the ADRD group the percentage was 33.7%, which indicates that clinical notes are an important source of DS usage information. Table 7 lists the DS most likely to be identified from medication orders and from clinical notes. Medication orders can only record the DS that were prescribed at the hospital, while clinical notes may provide information about other DS that patients took on their own.
BERT fine-tuning for DS use status identification
A qualitative error analysis of a random sample of the notes revealed that the model tended to assign DS labels to items in lab results and then tried to predict use status for those tokens as well. For example, creatinine or sodium appearing in metabolic panel results would often be labeled as DS by the NER head, and the use-status head would try to predict whether the patient was using sodium. Although the training set included an ‘uncertain’ class, in the examples we examined the model more often made seemingly random guesses in these cases. This is likely because the ‘uncertain’ use statuses in the training set often occurred in the context of recommendations to the patient, e.g., “recommended she start omega-3 supplements” or “advised against continuing ginkgo supplements”. In the clinical notes, by contrast, lab results appear simply as a term followed by a numerical value, surrounded by similar term-value pairs, a pattern that was not represented in the training set.
The other failure mode that we observed for the NER task was in sequences with multiple DS mentions. Because the training set was primarily sequences containing a single mention of a single DS or, in a minority of sequences, a single mention of two DS, sequences containing mentions of more than two DS were inconsistently labeled by the model. This also affected the use-status head and produced inconsistent labeling of use status, although, qualitatively, ‘uncertain’ tended to appear more frequently than other labels for windows containing multiple DS mentions.
We found that the model was able to identify some DS that were not explicitly in the training set. Notably, having seen only vitamin E during training, the model was able to label other vitamins, such as vitamins B, C, and K.
Limitations and future directions
To train more accurate deep learning models, we will further expand the annotated corpus to cover more DS. We will also consider using the weak supervision approach from our prior work [38] to expand the corpus. In this study, only DS identified within a fixed time range (6 months before and after the diagnosis of MCI) were included in the analysis for the MCI and ADRD groups. In the future, we plan to analyze DS data across the whole patient timeline.
Conclusion
In this study, we developed an NLP-based pipeline to extract DS use information from medication orders and clinical notes in EHR for patients with MCI and ADRD. To extract DS from clinical notes, we developed a PubMedBERT use status classification model, which achieved an F1-score of 0.879. In total, 125 unique DS were identified for patients with MCI and 108 for patients with ADRD. DS use was summarized at both the DS level and the patient level for patients with MCI and ADRD. Clinical notes were a major source of DS usage information. The public NHANES data were also summarized and compared with the results from the EHR data. The results support the effectiveness of our methods for DS extraction and the feasibility of using EHR as an additional data source to investigate DS usage.
Data availability
The datasets generated and/or analyzed during the current study are not publicly available because they contain protected health information (PHI), but are available from the corresponding author on reasonable request.
Abbreviations
AD: Alzheimer’s disease
ADRD: Alzheimer’s disease and related dementias
DS: Dietary supplements
EHR: Electronic health records
NLP: Natural language processing
BERT: Bidirectional Encoder Representations from Transformers
NHANES: National Health and Nutrition Examination Survey
UMN: University of Minnesota
ICD: International Classification of Diseases
CERAD W-L: Consortium to Establish a Registry for Alzheimer’s disease Word Learning
World Health Organization. Risk reduction of cognitive decline and dementia. 2018. https://www.who.int/mental_health/neurology/dementia/english_foreward_executive_summary_dementia_guidelines.pdf?ua=1. Accessed 28 Oct 2019.
World Health Organization. Dementia. 2017. https://www.who.int/news-room/fact-sheets/detail/dementia. Accessed 25 Mar 2019.
Walsh S, Merrick R, Milne R, Brayne C. Aducanumab for Alzheimer’s disease? BMJ. 2021;374:n1682.
Jack CR Jr, Bennett DA, Blennow K, Carrillo MC, Dunn B, Haeberlein SB, Holtzman DM, Jagust W, Jessen F, Karlawish J, Liu E. NIA-AA research framework: toward a biological definition of Alzheimer’s disease. Alzheimer’s Dement. 2018;14(4):535–62.
Rasmussen J. The LipiDiDiet trial: what does it add to the current evidence for Fortasyn Connect in early Alzheimer’s disease? Clin Interv Aging. 2019;14:1481–92.
Jacobs IS, Bean CP. Fine particles, thin films and exchange anisotropy. In: Rado GT, Suhl H, editors. Magnetism. Vol. III. New York: Academic Press; 1963. p. 271–350.
Butler M, Nelson VA, Davila H, et al. Over-the-counter supplement interventions to prevent cognitive decline, mild cognitive impairment, and clinical Alzheimer-type dementia: a systematic review. Ann Intern Med. 2018;168(1):52–62.
Wagle A, Seong SH, Shrestha S, Jung HA, Choi JS. Korean thistle (Cirsium japonicum var. maackii (Maxim.) Matsum.): a potential dietary supplement against diabetes and Alzheimer’s disease. Molecules. 2019;24(3).
Canhada S, Castro K, Perry IS, Luft VC. Omega-3 fatty acids’ supplementation in Alzheimer’s disease: a systematic review. Nutr Neurosci. 2018;21(8):529–38.
Morris MC, Evans DA, Bienias JL, et al. Consumption of fish and n-3 fatty acids and risk of incident Alzheimer disease. Arch Neurol. 2003;60(7):940–6.
Blasko I, Hinterberger M, Kemmler G, et al. Conversion from mild cognitive impairment to dementia: influence of folic acid and vitamin B12 use in the VITA cohort. J Nutr Health Aging. 2012;16(8):687–94.
Dwyer J, Donoghue MD. Is risk of Alzheimer disease a reason to use dietary supplements? Am J Clin Nutr. 2010;91(5):1155–6.
Vernarelli JA, Roberts JS, Hiraki S, Chen CA, Cupples LA, Green RC. Effect of Alzheimer disease genetic risk disclosure on dietary supplement use. Am J Clin Nutr. 2010;91(5):1402–7.
Soininen H, Solomon A, Visser PJ, et al. 24-month intervention with a specific multinutrient in people with prodromal Alzheimer’s disease (LipiDiDiet): a randomised, double-blind, controlled trial. Lancet Neurol. 2017;16(12):965–75.
2019 CRN consumer survey on dietary supplements: consumer intelligence to enhance business outcomes. 2019. https://www.crnusa.org/resources/2019-crn-consumer-survey-dietary-supplements-consumer-intelligence-enhance-business. Accessed 28 Oct 2020.
He Z, Tang X, Yang X, Guo Y, George TJ, Charness N, Quan Hem KB, Hogan W, Bian J. Clinical trial generalizability assessment in the big data era: a review. Clin Transl Sci. 2020;13(4):675–84.
Harpaz R, DuMouchel W, Shah NH. Big data and adverse drug reaction detection. Clin Pharmacol Ther. 2016;99(3):268–70.
Ventola CL. Big data and pharmacovigilance: data mining for adverse drug events and interactions. Pharm Ther. 2018;43(6):340.
van Puijenbroek EP, Egberts AC, Heerdink ER, Leufkens HG. Detecting drug–drug interactions using a database for spontaneous adverse drug reactions: an example with diuretics and non-steroidal anti-inflammatory drugs. Eur J Clin Pharmacol. 2000;56(9–10):733–8.
Zhang R, Manohar N, Arsoniadis E, et al. Evaluating term coverage of herbal and dietary supplements in electronic health records. AMIA Annu Symp Proc. 2015;2015:1361–70.
Rizvi RF, Vasilakes J, Adam TJ, Melton GB, Bishop JR, Bian J, Tao C, Zhang R. iDISK: the integrated dietary supplements knowledge base. J Am Med Inform Assoc. 2020;27(4):539–48.
Vasilakes J, Bompelli A, Bishop J, Adam T, Bodenreider O, Zhang R. Assessing the enrichment of dietary supplement coverage in the UMLS. J Am Med Inform Assoc. 2020;ocaa128. https://doi.org/10.1093/jamia/ocaa128
Fan Y, Pakhomov S, McEwan R, Zhao W, Lindemann E, Zhang R. Using word embeddings to expand terminology of dietary supplements on clinical notes. JAMIA Open. 2019;2(2):246–53.
Fan Y, Zhang R. Using natural language processing methods to classify use status of dietary supplements in clinical notes. BMC Med Inform Decis Mak. 2018;18(2):15–22.
Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Domain-specific language model pretraining for biomedical natural language processing. arXiv preprint arXiv:2007.15779. 2020 Jul 31.
Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018 Oct 11.
Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D, Naumann T, McDermott M. Publicly available clinical BERT embeddings. arXiv preprint arXiv:1904.03323. 2019 Apr 6.
Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234–40.
Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H. Drug repurposing for COVID-19 via knowledge graph completion. J Biomed Inform. 2021;115:103696.
Wolf T, Chaumond J, Debut L, Sanh V, Delangue C, Moi A, Cistac P, Funtowicz M, Davison J, Shleifer S, Louf R. Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations; 2020 Oct. p. 38–45.
Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101. 2017 Nov 14.
National Health and Nutrition Examination Survey. https://www.cdc.gov/nchs/data/nhanes/nhanes_13_14/NHANES_Overview_Brochure.pdf. Accessed 15 Jan 2021.
Chen SP, Bhattacharya J, Pershing S. Association of vision loss with cognition in older adults. JAMA Ophthalmol. 2017;135(9):963–70.
Baker ML, Wang JJ, Rogers S, Klein R, Kuller LH, Larsen EK, Wong TY. Early age-related macular degeneration, cognitive function, and dementia: the cardiovascular health study. Arch Ophthalmol. 2009;127(5):667–73.
Rosano C, Newman AB, Katz R, Hirsch CH, Kuller LH. Association between lower digit symbol substitution test score and slower gait and greater risk of mortality and of developing incident disability in well-functioning older adults. J Am Geriatr Soc. 2008;56(9):1618–25.
Swindell WR, Cummings SR, Sanders JL, Caserotti P, Rosano C, Satterfield S, Strotmeyer ES, Harris TB, Simonsick EM, Cawthon PM. Data mining identifies digit symbol substitution test score and serum cystatin C as dominant predictors of mortality in older men and women. Rejuvenation Res. 2012;15(4):405–13.
Morris JC, Heyman A, Mohs RC, et al. The Consortium to Establish a Registry for Alzheimer’s Disease (CERAD). Part 1. Clinical and neuropsychological assessment of Alzheimer’s disease. Neurology. 1989;39:1159–65.
Shen Z, Yi Y, Bompelli A, Yu F, Wang Y, Zhang R. Extracting lifestyle factors for Alzheimer’s disease from clinical notes using deep learning with weak supervision. arXiv preprint arXiv:2101.09244. 2021 Jan 22.
© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”).