Background
The clinical information contained in unstructured electronic health records (EHRs) has the potential to advance cancer research. The National Cancer Center Hospital (NCCH) is widely recognized as a leading institution for the treatment of thoracic malignancies in Japan. Information on medical treatment, particularly the characteristics of patients' malignant tumors, tumor response evaluations, and adverse events, has been compiled from EHRs into the databases of each NCCH department. However, there have been few opportunities for integrated analysis of data from both the hospital and the research institute.
Methods
We developed a method that uses natural language processing to predict the tumor response evaluation and survival curves for drug therapy from the EHRs of lung cancer patients. First, we developed a rule-based algorithm to predict treatment duration using a dictionary of the anticancer drugs and regimens used for lung cancer treatment. We then applied supervised learning to the radiology reports from each treatment period and constructed a classification model to predict the tumor response to the anticancer drugs and the date on which progressive disease (PD) was determined. The predicted response and PD date can be used to draw a progression-free survival curve.
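The final step, drawing a progression-free survival curve from predicted PD dates, can be illustrated with a minimal Kaplan-Meier sketch. It assumes the standard Kaplan-Meier estimator, which the abstract does not name, and treats treatments without a predicted PD date as censored at the last follow-up. The function names, the tuple layout, and the toy dates are hypothetical and are not taken from the study.

```python
from datetime import date, timedelta


def pfs_observations(treatments):
    """Convert (start, predicted_pd_date_or_None, last_followup) into (days, event)."""
    observations = []
    for start, pd_date, last_followup in treatments:
        if pd_date is not None:
            # Progression was predicted: an event at the PD date.
            observations.append(((pd_date - start).days, True))
        else:
            # No PD predicted: censored at the last follow-up (an assumption).
            observations.append(((last_followup - start).days, False))
    return observations


def kaplan_meier(observations):
    """Return [(time_in_days, survival_probability)] at each event time."""
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({days for days, _ in observations}):
        events = sum(1 for days, event in observations if days == t and event)
        leaving = sum(1 for days, _ in observations if days == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= leaving
    return curve


# Toy data (illustrative only): four treatments starting on the same day.
start = date(2020, 1, 1)
treatments = [
    (start, start + timedelta(days=30), start + timedelta(days=30)),  # PD at day 30
    (start, None, start + timedelta(days=50)),                        # censored at day 50
    (start, start + timedelta(days=60), start + timedelta(days=60)),  # PD at day 60
    (start, None, start + timedelta(days=90)),                        # censored at day 90
]
curve = kaplan_meier(pfs_observations(treatments))
print(curve)  # [(30, 0.75), (60, 0.375)]
```

Running the same function on the curated PD dates and on the predicted PD dates would give two curves that can be compared directly.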
Results
We used the EHRs of 716 lung cancer treatments at the NCCH, with structured data for the same cases serving as labels for training and testing the supervised learning model. The structured data were manually curated by physicians and clinical research coordinators (CRCs). We investigated the results and performance of the proposed method. The accuracy of the individual predictions of tumor response evaluation and PD date was not particularly high. However, the final predicted survival curves closely resembled the actual survival curves.
Conclusions
Although it is difficult to construct a fully automated system using our method, we believe that it performs well enough to support physicians and CRCs in constructing the database and to provide clinical information that helps researchers identify opportunities for clinical studies.
Details
Tumors;
Supervised learning;
Patients;
Prescription drugs;
Survival;
Machine learning;
Drug development;
Electronic medical records;
Physicians;
Medical treatment;
Chemotherapy;
Medical prognosis;
Hospitals;
Antineoplastic drugs;
Thorax;
Algorithms;
Dictionaries;
Clinical medicine;
Cancer therapies;
Lung cancer;
Oncology;
Drug therapy;
Health services;
Drugs;
Survival analysis;
Electronic health records;
Radiology;
Structured data;
Natural language processing;
Unstructured data;
Malignancy