INTRODUCTION
In medical research, clinical trials are an important means of advancing human health [1]. Since 2019, coronavirus disease 2019 (COVID-19), an acute respiratory infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become an unprecedented public health crisis and a serious threat to human life [2, 3]. Because of its severity, the World Health Organization raised its risk assessment to the highest level and declared COVID-19 a global pandemic. Clinically, its typical symptoms include dry cough, dyspnoea, headache and fever; other symptoms include muscle pain, confusion, chest pain and diarrhoea. Severe cases may progress to acute respiratory distress syndrome and septic shock, eventually leading to multiple organ failure and even death [4].
In clinical trial abstracts [5], entities are linked by many medically relevant semantic relationships. Once the entities have been recognised, the relationships between them can be extracted. These relationships not only reflect the latest diagnostic methods, drug designs, treatment plans, preventive measures and trial purposes that clinical researchers have developed for the diseases under study, but also contain rich clinical trial knowledge and rules. Therefore, performing entity relationship extraction on abstracts to construct a clinical knowledge graph [6] is of great significance for improving trial efficiency, summarising trial rules, customising personalised plans, understanding the latest clinical information, improving clinical trial design and saving clinical resources.
This study takes COVID-19 as an example. It applies an entity relationship extraction model to the text of a large number of relevant clinical trial registrations and extracts the clinical entities and relationships they contain. The extracted knowledge lays a foundation for follow-up research on drug recommendation, disease prediction, adverse drug reaction detection, intelligent medical question answering systems and similar applications.
The main contributions of this paper are summarised as follows:
First, based on the Unified Medical Language System and previous work, the relationship types for the clinical trial text in this study were determined, and a COVID-19 clinical entity relationship extraction corpus was constructed.
Second, a pre-trained model is used to extract semantics and obtain dynamic word vectors, and the hidden deep features in the input vectors are extracted through a hierarchical bidirectional gated recurrent unit network.
In addition, an attention mechanism is introduced to capture the feature information of the sentence.
Finally, the features are fed into a Conditional Random Field (CRF) model to obtain a more accurate conditional probability of the entity relationship. The experimental results show that the proposed model performs well on the COVID-19 clinical entity relationship extraction task.
RELATED WORKS
Long short-term memory
To address the exploding and vanishing gradients that a traditional recurrent neural network (RNN) [7] may suffer when trained on long sentences, Hochreiter et al. [8] proposed Long Short-Term Memory (LSTM), which can capture long-distance dependencies in long texts. Compared with the standard RNN model, LSTM adds two modules: a gating mechanism and a memory unit. The memory unit stores text features, and the gating mechanism filters the information stored in the memory unit. Zhao et al. [9] proposed a fault diagnosis method based on the LSTM neural network that can directly classify raw process data without specific feature extraction and classifier design. Li et al. [10] used an LSTM neural network to predict tourist flows and experimentally demonstrated that it outperformed the autoregressive integrated moving average model and the back-propagation neural network. This was reported as the first application of LSTM to tourism flow forecasting.
The LSTM model uses an input gate, a forget gate and an output gate, which accumulate and update information and thereby avoid the problems the RNN model may encounter when processing long texts. Its unit structure is shown in Figure 1.
[IMAGE OMITTED. SEE PDF]
The LSTM unit at time t is composed of the input word Xt, the cell state Ct, the temporary cell state C̃t, the hidden state ht, the forget gate ft, the input gate it and the output gate ot. The forget gate determines which information from the previous step is retained or discarded; the input gate processes the input at the current sequence position; the output gate determines the next hidden state.
The forget gate is calculated according to formula (1), where the inputs are the hidden state ht−1 at the previous moment and the input word Xt at the current moment.
The input gate value it and the temporary cell state C̃t are calculated as shown in formulas (2) and (3), where tanh is the hyperbolic tangent activation function.
The cell state Ct at time t is then calculated from the input gate value it, the forget gate value ft, the temporary cell state C̃t and the cell state Ct−1 at the previous time. The calculation formula is as follows:
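In the standard LSTM notation, which is consistent with the description above, formulas (1) to (4) can be written as shown below. The weight matrices W, the biases b and the concatenation [ht−1, Xt] are assumed symbols taken from the common textbook formulation, and σ is the sigmoid function.

```latex
\begin{align}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \tag{1}\\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \tag{2}\\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, X_t] + b_C\right) \tag{3}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}
\end{align}
```

Formula (4) shows why the gates matter: ft scales how much of the previous cell state is kept, while it scales how much of the new candidate is written.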
In natural language processing (NLP) tasks, especially sequence labelling, context is particularly important, whether for words, phrases or characters. A standard LSTM unit propagates information in the forward direction only, so it cannot use the context that follows the current position, which limits what the model can learn and affects its final performance. Bidirectional LSTM (BiLSTM) [11] captures both the preceding and the following context. By obtaining outputs in both directions at the same time, it memorises information from both directions and improves the performance of the whole NLP model [12]. Xu et al. [13] used BiLSTM for sentiment analysis; their comparative experiments show that the proposed method achieves higher accuracy, recall and F1 scores than LSTM, RNN and other models. The structure of the BiLSTM model is shown in Figure 2 [14].
[IMAGE OMITTED. SEE PDF]
The procedure is as follows: the sequence is processed from the front and from the back by two separate LSTMs, and the outputs of the two directions are combined to obtain the BiLSTM. The forward LSTM carries the past information of the input sequence, and the backward LSTM carries its future information. The hidden state Ht of the BiLSTM at time t therefore includes the forward hidden state and the backward hidden state. The specific formula is as follows:
Gated recurrent neural network
The gated recurrent unit (GRU) is a gating mechanism for RNNs [15], similar to other gating mechanisms such as LSTM. It aims to mitigate the gradient explosion and vanishing problems of the standard RNN while retaining the long-term information of the sequence. The difference is that GRU keeps only the elements of LSTM that are essential for learning: it combines the forget gate and input gate of LSTM into a single update gate, introduces a reset gate and removes the cell state, which reduces the number of model parameters and speeds up training. In practice, GRU and LSTM often perform similarly well, and other gated RNN variants rarely outperform these two structures across a wide range of tasks. Hafiz et al. [16] used an attention-based GRU-LSTM statement-level defect prediction method to address the problem that software defect prediction cannot accurately predict failures. Zhou et al. [17] used the GRU model for time series prediction of air pollutants, and their comparative experiments showed that the GRU-based model achieved higher prediction accuracy. The general structure of GRU is shown in Figure 3.
Specifically, assume the number of hidden units is h. Given the mini-batch input Xt of size n × d at time step t (n is the number of samples, d is the number of inputs) and the hidden state Ht−1 of size n × h from the previous time step, the reset gate and the update gate are calculated as follows:
[IMAGE OMITTED. SEE PDF]
The candidate hidden state, calculated using the reset gate, is shown in formula (10), where the W terms are weight matrices, the b term is the bias and ⊙ denotes element-wise multiplication.
The reset gate determines how the candidate hidden state at the current time depends on the hidden state at the previous time, which may contain the complete historical information of the time series. Therefore, the reset gate can be used to discard historical information that is irrelevant to the prediction.
Finally, the hidden state at time step t is updated based on the update gate, as shown in formula (11):
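For reference, the commonly used mini-batch form of the GRU, which matches the quantities named in this section, is sketched below. The symbols Rt, Zt, the weight matrices W and the biases b are assumed notation from the standard formulation; only the numbering of formulas (10) and (11) is taken from the text.

```latex
\begin{align*}
R_t &= \sigma\left(X_t W_{xr} + H_{t-1} W_{hr} + b_r\right) \\
Z_t &= \sigma\left(X_t W_{xz} + H_{t-1} W_{hz} + b_z\right) \\
\tilde{H}_t &= \tanh\left(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h\right) \tag{10}\\
H_t &= Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t \tag{11}
\end{align*}
```

When an entry of Zt is close to 1, the previous hidden state is carried over almost unchanged; when it is close to 0, the candidate state replaces it.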
Conditional random field
The Conditional Random Field is a discriminative undirected probabilistic graphical model that builds on the maximum entropy model and the hidden Markov model. It is commonly used as a conditional probability model for labelling and segmenting ordered data. Using the CRF method, Li et al. [18] made full use of the temporal characteristics of music audio features to classify music regions. The general definition of CRF is as follows:
Let X and Y be random variables, and let P(Y|X) be the conditional probability distribution of Y given X. Suppose that Y is a set of random variables Yv indexed by the nodes v of an undirected graph G = (V, E). Given X, if each random variable Yv obeys the Markov property, that is, formula (12) holds for any vertex v, then the conditional probability distribution P(Y|X) is called a CRF.
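In the usual statement of this definition, the Markov property of formula (12) takes the following form, where w ∼ v means that w and v are neighbours in G. The notation is assumed from the standard CRF definition.

```latex
\begin{equation}
P\left(Y_v \mid X, Y_w, w \neq v\right) = P\left(Y_v \mid X, Y_w, w \sim v\right) \tag{12}
\end{equation}
```

In words, once X and the neighbouring labels are known, the remaining labels carry no further information about Yv.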
Attention mechanism
The attention mechanism was first applied in computer vision and was then gradually adopted in speech recognition, natural language processing and other fields because of its excellent performance [19–21]. Yan et al. [22] used a novel spatiotemporal attention mechanism in an encoder-decoder neural network for video captioning. The spatiotemporal attention mechanism takes into account both the spatial and the temporal structure of the video, enabling the decoder to automatically select the most relevant time segments of important regions for word prediction.
Its working principle is to calculate, through a scoring function, the similarity between the current input unit and the information of the entire input sentence, and then to assign the result to the input sentence as a weight. Let αi be the attention distribution (i.e. a probability distribution) and let s be the attention scoring function. Common scoring functions include the additive model, the dot product model, the scaled dot product model and the bilinear model. The specific formula is shown in Formula (13):
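As a minimal illustration of this principle, the sketch below turns similarity scores into an attention distribution with softmax, using the scaled dot product model as the scoring function. The function names and the toy vectors are assumptions made for this example and are not taken from the model in this paper.

```python
import math


def scaled_dot_product_score(x, q):
    """Scaled dot product score between an input vector x and a query q."""
    return sum(a * b for a, b in zip(x, q)) / math.sqrt(len(q))


def attention_weights(inputs, query, score=scaled_dot_product_score):
    """Softmax over the scores, giving one weight per input vector."""
    scores = [score(x, query) for x in inputs]
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(inputs, query):
    """Weighted sum of the input vectors under the attention distribution."""
    weights = attention_weights(inputs, query)
    dim = len(inputs[0])
    return [sum(w * x[d] for w, x in zip(weights, inputs)) for d in range(dim)]


# Toy example: three 2-dimensional inputs and one query.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights = attention_weights(inputs, query)
context = attend(inputs, query)
```

Inputs that are more similar to the query receive larger weights, and the weights always sum to one.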
MPNet
MPNet is a pre-trained language model that builds on the respective characteristics of BERT and XLNet and was jointly proposed by Nanjing University of Science and Technology and Microsoft Research in 2020 [23]. Its main contribution is to integrate the advantages of the Masked Language Model (MLM) and the Permuted Language Model (PLM): it makes up for the inability of MLM to learn the dependency between predicted tokens and overcomes the problem that PLM cannot see the complete sentence information that is visible in downstream tasks. The authors' experiments on various tasks show that MPNet substantially outperforms MLM and PLM, as well as previous strong pre-trained models such as BERT, XLNet and RoBERTa.
The attention mask mechanism of the MPNet model works as follows. First, let the input sequence of length n = 6 be (x1, x2, x3, x4, x5, x6). If the randomly permuted sequence is (x5, x4, x2, x6, x3, x1) and the tokens to be predicted are x6, x3 and x1, then the non-predicted part is expressed as (x5, x4, x2, [mask], [mask], [mask]), corresponding to the position sequence (P5, P4, P2, P6, P3, P1). Second, to enable the [mask] tokens of the prediction part to see previously predicted tokens, MPNet uses the two-stream self-attention mechanism of PLM to perform autoregressive generation and sets different masking mechanisms for the content stream and the query stream. For example, when MPNet predicts x3 in the above sequence, it can see (x5 + P5, x4 + P4, x2 + P2) in the non-prediction part and (x6 + P6) in the prediction part, which avoids the missing-dependency problem of MLM. In addition, to keep the input seen during pre-training consistent with the input seen in downstream tasks, MPNet adds mask symbols with position information ([mask] + P6, [mask] + P3, [mask] + P1) to the non-prediction part, so that the model can see the positions of the complete sentence. When predicting x3, the model therefore sees the original (x5 + P5, x4 + P4, x2 + P2) and the additional ([mask] + P3, [mask] + P1) in the non-prediction part, and the previously predicted (x6 + P6) in the prediction part. Compensating the positions of the query stream and the content stream in this way can greatly reduce the input inconsistency between pre-training and fine-tuning.
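The visibility rule in this example can be traced step by step with a few lines of Python. This is only a sketch of which inputs each prediction step can see, not of the attention masks as implemented in MPNet; the function name and the string representation of tokens are hypothetical.

```python
def mpnet_visible_inputs(permutation, num_predicted):
    """List what each prediction step can see under MPNet-style masking.

    permutation   -- permuted position indices, e.g. [5, 4, 2, 6, 3, 1]
    num_predicted -- how many trailing positions of the permutation are predicted
    """
    split = len(permutation) - num_predicted
    non_predicted, predicted = permutation[:split], permutation[split:]
    steps = []
    for t, target in enumerate(predicted):
        # Real tokens: the non-predicted part plus tokens predicted in earlier steps.
        tokens = [f"x{i}+P{i}" for i in non_predicted + predicted[:t]]
        # Position compensation: [mask] + position for the current and later targets.
        masks = [f"[mask]+P{i}" for i in predicted[t:]]
        steps.append({"target": f"x{target}", "tokens": tokens, "masks": masks})
    return steps


# The permutation from the text, with the last three positions predicted.
steps = mpnet_visible_inputs([5, 4, 2, 6, 3, 1], num_predicted=3)
step_x3 = steps[1]
```

For the step that predicts x3, the sketch lists x5, x4, x2 and the previously predicted x6 as visible tokens, plus the mask placeholders at positions P3 and P1, which matches the walk-through above.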
ENTITY RELATION EXTRACTION MODEL BASED ON MPNet
Entity relationship extraction is the basis for establishing the COVID-19 clinical knowledge graph. This section proposes an entity relationship extraction model suited to COVID-19 clinical medical texts by incorporating the pre-trained language model MPNet, which has performed well recently.
This section first introduces the basic structure and theoretical background of the model, then determines the entity relationship types and relationship extraction tasks, completes the annotation of the experimental data sets and constructs a relationship extraction corpus based on COVID-19 clinical trial texts. To further improve the expressiveness of the model, the Dropout overfitting mitigation strategy is used to improve its generalisation ability. A number of comparative experiments then verify that the model performs better on the clinical medical entity relationship extraction task.
Design of entity relation extraction model for clinical trials
Relation extraction (RE) is one of the most studied sub-tasks of information extraction. Its purpose is to extract the semantic relationship between two or more entities from text, so as to build a knowledge graph of the related field. Consider the RE training set D, where S is the sample set, E1 and E2 are two entity sets and Re represents the entity relationship set. For any di,j,k,l ∈ D, there exist e1i ∈ E1 and e2j ∈ E2 corresponding to rk ∈ Re in the sentence sl ∈ S. The relation extraction task learns this relational mapping by training the model on D and maximises the proportion of correctly mapped prediction samples in a given verification set V and test set D′ that have the same data distribution as the training set D.
Yang et al. [24] first proposed a hierarchical attention network structure. By using two attention layers to encode words and sentences, it can distinguish high-quality information from low-quality features, improving on previous model architectures. Following this idea, the coding layer of this paper first extracts the forward and backward features of sequences through a bidirectional GRU (BiGRU) to capture a context representation of entity relationships that contains semantic dependency and hierarchical structure information. Then, the multi-level attention mechanism (MAtt, comprising word-level attention and sentence-level attention) is introduced: the word vectors are spliced, the self-attention weights are obtained through the self-attention mechanism and the two are multiplied to obtain the sentence-level vector representation. Next, the semantic features between sentences are obtained through the sentence-level attention layer, and the weights are spliced. Finally, the output vectors of the coding layer are weighted and summed to generate the sentence-level feature representation, and these features are fed into the CRF model of the output layer to complete the RE task.
The input unit includes the text to be trained and the word vector representation of the corpus obtained through the embedding layer (MPNet). After the training samples are input, the word representation layer uses the MPNet pre-trained language model to represent words as vectors. Consider a sentence in the training corpus. For a word xi in this sentence, the corresponding word vector is obtained through the mapping matrix E ∈ R^(D×|V|), so that the model converts the sentence into a sequence of word vectors, where D is the dimension of the word vector and |V| is the vocabulary size of the training corpus. The word vector dimension and the position vector dimension of this model are both set to 768, following the BERT_Base configuration. Text information that combines relative and absolute position information is obtained by splicing the two vectors. The input accepted by the transformer is calculated as follows, where pos is the position index, dmodel is the vector dimension and i is the dimension index.
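The roles of pos, dmodel and i match the sinusoidal position encoding of the original Transformer. Assuming that this is the encoding meant here, a minimal sketch is given below; the function name is hypothetical.

```python
import math


def positional_encoding(pos, d_model):
    """Sinusoidal position encoding vector for one position.

    Component 2i is sin(pos / 10000^(2i / d_model)) and
    component 2i + 1 is cos(pos / 10000^(2i / d_model)).
    """
    encoding = [0.0] * d_model
    for even_index in range(0, d_model, 2):  # even_index plays the role of 2i
        angle = pos / (10000 ** (even_index / d_model))
        encoding[even_index] = math.sin(angle)
        if even_index + 1 < d_model:
            encoding[even_index + 1] = math.cos(angle)
    return encoding


# Encoding of the first position with the 768-dimensional setting used in this model.
pe_first = positional_encoding(0, 768)
# A small 4-dimensional example for position 1.
pe_small = positional_encoding(1, 4)
```

Each pair of components shares one wavelength, so nearby positions receive similar vectors while distant positions differ.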
Formulas (16) and (17) give the 2i-th and (2i + 1)-th components of the encoding vector of position pos. The embed term in formula (18) is the embedding process of the transformer. Embedding digitises all the information that needs to be given to the model, here mainly the position information, and can be understood as a function (see Figure 4).
[IMAGE OMITTED. SEE PDF]
The RE model proposed in this study uses the BiGRU neural network in both the word-level and the sentence-level encoding layers. BiGRU retains the long-term memory ability of LSTM [25] for long texts and can also learn bidirectional encodings of text, while having a simpler internal structure than BiLSTM. By constructing a forward GRU and a reverse GRU, the hidden deep features in the input vector are extracted, and the contextual semantic relations in the input sequence are fully learnt and encoded. Given the current time t, BiGRU calculates the forward hidden state from the hidden state of the previous time t − 1 and the current sequence input It. At the same time, the reverse hidden state is calculated from the hidden state at time t − 1 and the current input It. The two state vectors are then combined by weighted summation to obtain the hidden layer state ht at the current moment. The forward and reverse calculations are shown in formulas (19) and (20), and the encoding output ht of the word-level BiGRU model is calculated as shown in formula (21):
Similarly, given the input sentence Si, it is encoded by the sentence-level BiGRU coding layer, and the output hi is calculated as follows:
Based on the hidden state output sequence of the BiGRU module, the model introduces a self-attention mechanism to learn word-level features and merges them with the sentence-level feature vectors. For the spliced information, the sentence-level attention layer calculates the attention [26] weight vector, which is then normalised to obtain the weight probability distribution assigned by the sentence-level attention mechanism. Finally, the feature vector output by the attention layer is obtained as the weighted sum of the output vectors of the BiGRU layer.
Given the word-level hidden state output sequence hw = {hw1, hw2, …, hwM} of the BiGRU layer, let the corresponding feature representation vector be Hw = {hw1, hw2, …, hwM}, where M is the character length of the sequence. Hw is passed into the self-attention module to calculate the weight vector α of the hidden state. The calculation process is as follows:
By multiplying the weight vector α and Hw, the weighted sum m of the output vector is calculated as follows:
The given sentence sequence is encoded by BiGRU to obtain the sentence-level vector mi, and then the sentence-level attention mechanism and the context vector mk are introduced. The calculation process is as follows:
After BiGRU feature extraction and MAtt weighting, the CRF layer automatically adds constraint rules to the final predicted labels (the output of the BiGRU layer): (1) the first word of an entity should start with “B-” or “O”; (2) valid patterns are “O B-[Label]” or “B-[Label] I-[Label]”, whereas “O I-[Label]” is regarded as invalid. Finally, in the decoding stage, the Viterbi algorithm is used to obtain the label sequence with the highest total predicted score, which is used as the entity relationship classification result for the COVID-19 clinical trial registration text.
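The effect of these constraints during decoding can be sketched with a small Viterbi implementation. The sketch is a simplification: it applies the label constraints as hard rules and uses only per-position scores, whereas a trained CRF also learns transition scores. The tag names and the scores are hypothetical.

```python
NEG_INF = float("-inf")


def is_valid_transition(prev_tag, curr_tag):
    """An I-[Label] tag may only follow B-[Label] or I-[Label]."""
    if curr_tag.startswith("I-"):
        label = curr_tag[2:]
        return prev_tag in (f"B-{label}", f"I-{label}")
    return True


def viterbi_decode(emissions, tags):
    """Return the highest-scoring tag sequence that respects the constraints.

    emissions -- one dict per position, mapping each tag to its score
    """
    # A sequence may not start with an I- tag.
    best = {t: (NEG_INF if t.startswith("I-") else emissions[0][t]) for t in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for curr in tags:
            candidates = [(best[prev], prev) for prev in tags
                          if is_valid_transition(prev, curr)]
            prev_score, prev_tag = max(candidates)
            new_best[curr] = prev_score + scores[curr]
            pointers[curr] = prev_tag
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


tags = ["O", "B-Disease_Drug", "I-Disease_Drug"]
emissions = [
    {"O": 0.1, "B-Disease_Drug": 0.2, "I-Disease_Drug": 0.9},
    {"O": 0.2, "B-Disease_Drug": 0.1, "I-Disease_Drug": 0.8},
    {"O": 0.9, "B-Disease_Drug": 0.0, "I-Disease_Drug": 0.3},
]
decoded = viterbi_decode(emissions, tags)
```

Choosing the highest-scoring tag at each position independently would start this sequence with an I- tag, which the constraints forbid; the constrained decoder instead opens the span with a B- tag.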
Overfitting mitigation strategies
In practical applications, sampling errors are often present in the training data, and the model fits these errors during training, so that it performs well only on the training set. When such a model is applied to test, validation or other datasets, its performance degrades; this poor generalisation is called overfitting. The main reasons for overfitting can be summarised as follows: (1) the model has too many parameters, its structure is too complex and its learning capacity is too strong; (2) there are too few training samples, so the model cannot fully learn the relevant features; (3) the content of the training samples and the test samples is imbalanced, which leads to unsatisfactory outputs on real data.
In theory, overfitting can be avoided by increasing the number of training samples, so that the model learns more features and performs better on the test samples. In practice, however, the workload of entity labelling is extremely large. In particular, the overall number of COVID-19 clinical trial abstracts is not large, and they contain various complex medical terms, which makes labelling inefficient and ultimately leads to insufficient training samples. Although the model can achieve good performance on the training set, its classification results on the test set and validation set may be inaccurate. To address overfitting on small samples, this paper adopts the Dropout overfitting mitigation strategy proposed by Hinton et al. [27]. By ignoring some neurons with a certain probability in each training step, the model avoids relying too much on local features and its generalisation ability improves. The Dropout value is normally selected in the range of 0.2–0.5. The model in this study uses the Dropout strategy in both the word representation layer and the encoding layer.
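The idea of ignoring neurons with a given probability can be sketched in a few lines. The example uses the inverted variant, which rescales the surviving activations during training so that no rescaling is needed at inference; this choice, the function name and the seed are assumptions for illustration and do not describe the implementation used in this study.

```python
import random


def dropout(values, p, rng, training=True):
    """Zero each value with probability p and rescale the rest by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # at inference time the input passes through unchanged
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, p=0.3, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
unchanged = dropout(activations, p=0.3, rng=rng, training=False)
```

With p = 0.3, roughly 30% of the activations are zeroed in each training step, and a different subset is chosen every time.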
EXPERIMENTS
Introduction to experimental data
This article uses the COVID-19-related clinical trial registration data in the US clinical trials registry (CT). CT contains privately and publicly funded clinical research projects carried out by clinical researchers around the world, including information about medical studies in human volunteers, such as the disease, intervention measures, study title, experimental design, inclusion/exclusion criteria and study location. Many researchers have already used this database for investigation and research.
Krishna et al. [28] evaluated the evidence characteristics and expected strength of the COVID-19 research registered on the platform and identified problems in the relevant clinical trials and directions for improving them. Amy et al. [29] analysed the deaths reported in clinical trials, compared the published papers with the registry data and found that the records were inconsistent. Pradhan et al. [30] developed a Python-based software application (EXACT) that automatically extracts the data required for meta-analysis from the database in spreadsheet format. EXACT extracted data with 100% accuracy and saved 60% of the time compared with manually extracting data from journal articles. Federer et al. [31] used a Python script to extract information from ClinicalTrials and then used regular expressions and drug dictionaries to process and structure the information into a relational database, on which they conducted data mining and pattern analysis of adverse drug events. The database can be used as a tool to help researchers find relationships between drugs and adverse events, so as to develop and reposition drugs.
Entity relationship definition
The relationship extraction task based on COVID-19 clinical trial registration data is to determine the semantic relationship between every two medical entities in the abstract. When there is an association between two entities, the relationship extraction task is treated as a classification task. Before this, the most critical thing is to determine the type of clinical entity relationship, and then define the entity relationship in the sentence according to the meaning of the sentence.
Clinical medical records contain rich medical information. Meystre et al. [32] suggested adopting “problem oriented medical records” to collect the information that guides diagnosis and nursing plans, as well as the series of medical actions taken for patients' problems. Uzuner et al. [33] classified the semantic relationships in clinical medical abstracts, created an entity relationship system for problem-oriented records according to the characteristics of the data and defined a series of semantic relationships involving patients' medical problems. Based on this previous work, this paper defines six clinical trial data relationship types, including the Disease-Treatment, Disease-Test, Disease-Biopharmaceutical and Disease-Extent relationship types. All six entity relationship types and their related concepts are listed in Table 1.
TABLE 1 Clinical trial data entity relationship type definitions.
No. | Entity relationship type | Description of related meanings |
1 | Disease_Procedure | Relationship between disease name and corresponding treatment |
2 | Disease_Item | The relationship between the disease name and various inspection items and physiological indicators involved in the test |
3 | Disease_Drug | Relationship between disease names and related biologics used in intervention trials |
4 | Disease_Severity | The relationship between the disease name and the severity of the patient's infection |
5 | Severity_Drug | Relationship between severity of patient infection and related biologics used in intervention trials |
6 | Symptom_Drug | The relationship between clinical manifestations of disease and corresponding use of related biological agents |
Finally, the samples are annotated with the Brat tool, and the annotation data are exported as ann format files. Table 2 [34] shows an example of the annotation results in an ann file. After labelling, the annotation file is matched with the content of the text file and the data are preprocessed, focussing on removing blank lines and illegal characters. The samples are divided using a combination of punctuation marks and sliding windows, the entities are paired with each other to segment sentence sequences that contain both entities of a pair, and overly long sentences are filtered out to avoid noise that would affect model training.
TABLE 2 Example of ann file annotation results.
Relationship number | Entity relationship | Entity pair |
R1 | Disease_Drug | Arg1:T4 Arg2:T6 |
R2 | Disease_Severity | Arg1:T4 Arg2:T7 |
R3 | Disease_Drug | Arg1:T13 Arg2:T15 |
R4 | Disease_Item | Arg1:T13 Arg2:T23 |
R5 | Severity_Drug | Arg1:T8 Arg2:T7 |
R6 | Symptom_Drug | Arg1:T17 Arg2:T19 |
R7 | Disease_Drug | Arg1:T21 Arg2:T20 |
R8 | Disease_Item | Arg1:T21 Arg2:T26 |
R9 | Disease_Item | Arg1:T29 Arg2:T32 |
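Relation records such as those in Table 2 can be read from an ann file with a short parser. The sketch below assumes the tab-separated standoff layout that Brat writes, with entity lines starting with T and relation lines starting with R; the sample text, entity types, offsets and surface strings are hypothetical.

```python
def parse_ann(text):
    """Parse entity (T) and relation (R) lines of a Brat ann file."""
    entities, relations = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if line.startswith("T") and len(fields) == 3:
            # Example: T4<TAB>Disease 0 8<TAB>COVID-19
            entity_id, type_and_span, surface = fields
            entity_type = type_and_span.split(" ", 1)[0]
            entities[entity_id] = (entity_type, surface)
        elif line.startswith("R") and len(fields) >= 2:
            # Example: R1<TAB>Disease_Drug Arg1:T4 Arg2:T6
            relation_type, arg1, arg2 = fields[1].split()
            relations.append((fields[0], relation_type,
                              arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


sample_ann = (
    "T4\tDisease 0 8\tCOVID-19\n"
    "T6\tDrug 27 37\tremdesivir\n"
    "\n"
    "R1\tDisease_Drug Arg1:T4 Arg2:T6\n"
)
entities, relations = parse_ann(sample_ann)
```

Joining each relation with the entity table gives the (entity, relation, entity) triples that the later preprocessing steps work with.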
Evaluation indicators
The RE task [35] usually uses precision (P), recall (R) and the F1 value as the evaluation indicators of the model. The corresponding calculation formulas are as follows:
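Under the usual definitions, these three indicators are computed from the numbers of true positives (TP), false positives (FP) and false negatives (FN). The sketch below uses those standard definitions; the function name and the example counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Example: 6 correct relations, 2 spurious predictions, 4 missed relations.
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
```

F1 is the harmonic mean of P and R, so it is high only when both are high.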
Experimental parameter configuration
This experiment is built and run on Ubuntu 16.04 with Python 3.7.0 and PyTorch 1.7.1, and the server graphics card is an NVIDIA RTX3080 (16G). The hyperparameters of the MPNet pre-trained model are consistent with BERT_Base: 12 Transformer layers, a hidden layer dimension of 768 and 12 attention heads, with GELU as the activation function. In the training phase, the maximum sequence length is 256, the batch_size is 128, the MPNet learning rate is 3e-5 and the Dropout is 0.1, and the model is trained with the Adam optimisation algorithm [36].
Parameter optimisation
To alleviate model overfitting, a Dropout selection experiment compares the evaluation indicators obtained with different Dropout values, so as to select the value best suited to the model. The experiment uses five Dropout values: 0.1, 0.2, 0.3, 0.4 and 0.5. The results are shown in Figure 5, where the ordinate is the score of each indicator and the abscissa is the Dropout value set in the experiment [37]. The F1 score first increases and then decreases as the Dropout value increases, reaching its highest point at a Dropout value of 0.3, where the model obtains its best F1 score.
[IMAGE OMITTED. SEE PDF]
Experimental results and analysis
To verify that the proposed medical RE model effectively improves the classification of COVID-19 clinical trial registration data, the following five methods are designed for comparison:
(1) The BiGRU-CRF benchmark model implements word embedding through a GloVe layer on the SemEval 2014 dataset and uses a BiGRU module as the backbone network, which reduces model complexity and alleviates the overfitting that BiLSTM may cause. A CRF module is then connected to jointly train the attention vector and the label vector and obtain the entity relationship label corresponding to the text.
(2) The BiGRU-Softmax model removes the CRF inference layer of the previous model and uses the Softmax output directly as the result, in order to verify the effectiveness of the CRF sequence labelling layer. BiGRU-Softmax is also one of the main relationship extraction methods widely studied before.
(3) The BiGRU-Att-CRF model uses Word2vec word embeddings as the input representation of the text. Building on the BiGRU-CRF benchmark model, it introduces a multi-layer attention mechanism to learn the impact of different features on relationship classification from the two dimensions of words and sentences, and improves training efficiency and recognition by reducing model parameters and setting different weight values.
(4) The BERT-BiGRU-CRF model achieves better context dependency and parallelism through the BERT pre-trained language model. BiGRU is then used to fine-tune the results generated by the upper layer, which extracts the effective features of the text more accurately.
(5) The XLNet-BiGRU-CRF fusion model [38] uses the well-performing XLNet model to integrate contextual features and obtain the semantic representation of the input sequence, then completes feature extraction through the BiGRU-CRF network and calculates the label sequence probability to output the final predicted label.
A number of comparative experiments were carried out in the same experimental environment. The results are shown in Table 3 and Figure 6. Compared with the other methods, the relationship extraction method based on the MPNet language model and the MAtt mechanism improves every evaluation metric. The specific comparative analysis is as follows:
- In the sequence annotation layer, the benchmark model BiGRU-CRF is compared with BiGRU-Softmax. Softmax's label prediction is not as good as that of the CRF model: the F1 value of BiGRU-Softmax is 58.53%, while that of BiGRU-CRF is 61.85%. Introducing the CRF layer therefore raises the F1 value of the entity relationship extraction model by 3.32 percentage points. This shows that the CRF algorithm plays an important role in the text relationship classification of COVID-19 clinical trials.
- The BiGRU-Att-CRF model improves overall performance by introducing an Attention layer into the base structure of the benchmark model. Comparative experiments show that this method improves the precision by 2.04 percentage points and the recall by 2.01 percentage points. The F1 value increases from 61.85% to 63.88%, which indicates that the Attention mechanism captures the relationship dependencies between contexts. It dynamically adjusts the weight of each element to focus on those most similar to the input elements, considers both global and local connections, and supports parallel computation, thereby improving the expressiveness of the model.
- In the text representation layer, BERT-BiGRU-CRF and XLNet-BiGRU-CRF, which incorporate pre-trained language models, are compared with the benchmark models. Table 3 shows that all indicators improve after the introduction of a language model, and that the XLNet model has certain advantages in transfer learning. The MPNet-BiGRU-MAtt-CRF model proposed in this study is then compared with these two models. The RE model using the MPNet pre-trained model and the multi-layer Attention structure scores higher on every metric, and its F1 value is 5.42 and 3.18 percentage points higher than that of BERT-BiGRU-CRF and XLNet-BiGRU-CRF, respectively, which indicates that the fusion model proposed in this study performs better.
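The difference between a Softmax output layer and a CRF layer discussed in the first point above can be sketched with a toy decoder. The example below contrasts per-token argmax decoding with Viterbi decoding over emission and transition scores. The label set (`O`, `B-REL`, `I-REL`) and all scores are hypothetical and chosen only to show the mechanism. It is not the tagging scheme or the trained model of this study.

```python
from typing import Dict, List

Scores = Dict[str, float]


def greedy_decode(emissions: List[Scores], labels: List[str]) -> List[str]:
    """Softmax-style decoding: choose the best label for each token alone."""
    return [max(labels, key=lambda y: step[y]) for step in emissions]


def viterbi_decode(emissions: List[Scores],
                   transitions: Dict[str, Scores],
                   labels: List[str]) -> List[str]:
    """CRF-style decoding: best whole sequence under emission + transition scores."""
    # best[y] is the score of the best path that ends in label y so far.
    best = {y: emissions[0][y] for y in labels}
    backpointers = []
    for step in emissions[1:]:
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + step[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical label set and scores for a two-token input.
LABELS = ["O", "B-REL", "I-REL"]
emissions = [
    {"O": 1.0, "B-REL": 0.9, "I-REL": 0.0},
    {"O": 0.1, "B-REL": 0.2, "I-REL": 1.0},
]
# Transition scores: reward B-REL -> I-REL, penalise O -> I-REL, else neutral.
transitions = {a: {b: 0.0 for b in LABELS} for a in LABELS}
transitions["B-REL"]["I-REL"] = 1.0
transitions["O"]["I-REL"] = -5.0

greedy = greedy_decode(emissions, LABELS)
best_path = viterbi_decode(emissions, transitions, LABELS)
print("greedy :", greedy)      # ['O', 'I-REL'], an inconsistent sequence
print("viterbi:", best_path)   # ['B-REL', 'I-REL']
```

Greedy decoding accepts an `I-REL` tag directly after `O`, whereas the transition scores let Viterbi decoding recover a consistent sequence. This is one plausible reason for the gap between BiGRU-Softmax and BiGRU-CRF, although the experiments here do not isolate it.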
TABLE 3 Comparison results of each experiment (%).
Model | Precision | Recall | F1 value |
BiGRU-CRF | 58.32 | 65.83 | 61.85 |
BiGRU-Softmax | 54.51 | 63.20 | 58.53 |
BiGRU-Att-CRF | 60.36 | 67.84 | 63.88 |
BERT-BiGRU-CRF | 64.19 | 68.07 | 66.07 |
XLNet-BiGRU-CRF | 66.03 | 70.75 | 68.31 |
MPNet-BiGRU-MAtt-CRF (ours) | 70.16 | 72.88 | 71.49 |
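As a consistency check on Table 3, the F1 values can be recomputed as the harmonic mean of the first two numeric columns. The sketch below treats the first column as precision, which is an assumption based on the precision gains quoted in the analysis. The variable names are illustrative.

```python
# Rows of Table 3: model -> (precision, recall, reported F1), all in percent.
# Treating the first numeric column as precision is an assumption.
TABLE_3 = {
    "BiGRU-CRF": (58.32, 65.83, 61.85),
    "BiGRU-Softmax": (54.51, 63.20, 58.53),
    "BiGRU-Att-CRF": (60.36, 67.84, 63.88),
    "BERT-BiGRU-CRF": (64.19, 68.07, 66.07),
    "XLNet-BiGRU-CRF": (66.03, 70.75, 68.31),
    "MPNet-BiGRU-MAtt-CRF": (70.16, 72.88, 71.49),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    model: round(f1_score(p, r), 2) for model, (p, r, _) in TABLE_3.items()
}
for model, (_, _, reported) in TABLE_3.items():
    print(f"{model:22s} reported={reported:.2f} recomputed={recomputed[model]:.2f}")

# F1 gains of the proposed model in percentage points.
ours = TABLE_3["MPNet-BiGRU-MAtt-CRF"][2]
gain_over_bert = round(ours - TABLE_3["BERT-BiGRU-CRF"][2], 2)
gain_over_xlnet = round(ours - TABLE_3["XLNet-BiGRU-CRF"][2], 2)
print("gain over BERT-BiGRU-CRF :", gain_over_bert)
print("gain over XLNet-BiGRU-CRF:", gain_over_xlnet)
```

The recomputed values agree with the reported F1 column to two decimal places, and the gains match the 5.42 and 3.18 points quoted in the analysis.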
[IMAGE OMITTED. SEE PDF]
CONCLUSIONS
This paper proposes a deep learning RE model for COVID-19 clinical trial data. The model integrates the MPNet model, a BiGRU network, the MAtt mechanism and a CRF reasoning layer. The pre-trained language model addresses the problem that static word vectors cannot represent ambiguity. The BiGRU network replaces the commonly used BiLSTM structure to obtain feature vectors from the input and make full use of the preceding and following token information of each word. Because the GRU simplifies the LSTM network structure, it also improves the training efficiency of the model. In addition, word-level and sentence-level attention mechanisms are introduced so that the model learns different features and improves relation classification. Comparative experiments show that the model performs well on entity-relationship classification of COVID-19 clinical texts.
AUTHOR CONTRIBUTIONS
Su Qianmin conceived the idea. Pan Wei and Cai Xiaoqiong designed the study. Cai Xiaoqiong did the analyses. Su Qianmin and Pan Wei wrote the main manuscript text. Cai Xiaoqiong prepared Tables 1 to 3 and Figures 4–6. Pan Wei prepared Figures 1–3. Huang Jihan provided expertise and guidance in neural networks. Ling Hongxing and Huang Jihan assisted in the research, including searching for papers and doing preliminary reading of papers. All authors reviewed the manuscript.
ACKNOWLEDGEMENT
This work was supported by Science and Technology Innovation 2030 - Major Project of “New Generation Artificial Intelligence” (2020AAAA0109300).
CONFLICT OF INTEREST STATEMENT
The authors report no conflicts of interest in this work.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Yu, H., Liu, J.: Overview of international clinical trial registration. J. Integr. Chin. West. Med. 5(003), 234–242 (2007). [DOI: https://dx.doi.org/10.3736/jcim20070302]
Rongen, Y., Jiang, X., Dang, D.: Named entity recognition by using xlnet‐bilstm‐crf. Neural Process. Lett. 53(5), 3339–3356
© 2023. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
With the rapid development of biomedical research and information technology, the volume of clinical medical literature has increased exponentially. At present, COVID-19 clinical text research suffers from problems such as a lack of corpora and poor annotation quality. Clinical medical literature contains many medically related semantic relationships between entities, so after entity recognition it becomes critical to extract these relationships efficiently and accurately. In this study, a deep learning relationship extraction model for COVID-19 clinical trial data is proposed. The model integrates the MPNet model, a bidirectional GRU (BiGRU) network, the MAtt mechanism and a Conditional Random Field inference layer, and uses the pre-trained language model to address the problem that static word vectors cannot represent ambiguity. The BiGRU network replaces the commonly used bidirectional long short-term memory structure. Because it simplifies the long short-term memory network structure, it improves the training efficiency of the model. Comparative experiments show that the proposed method performs well on the COVID-19 clinical text entity relation extraction task.

1 School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
2 Shanghai Business and Information College, Shanghai, China
3 Center for Drug Clinical Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China