Efficient preparation and assembly guidance for complex products relies heavily on semantic information in assembly process documents. This information encompasses various levels of elements and complex semantic relationships. However, there is currently a scarcity of effective modeling techniques to express the assembly process knowledge inherent in these documents. This study introduces a method for constructing an Assembly Process Knowledge Graph of Complex Products (APKG-CP) utilizing text mining techniques to tackle the challenges of high costs, low efficiency, and difficulty reusing process knowledge. Developing the assembly process knowledge graph involves categorizing entity and relationship classes from multiple levels. The BERT-BiLSTM-CRF model integrates BERT (bidirectional encoder representations from transformers), BiLSTM (bidirectional long short-term memory), and CRF (conditional random field) to extract knowledge entities and relationships from assembly process documents automatically. Furthermore, the knowledge fusion method automatically instantiates the assembly process knowledge graph. The proposed construction method is validated by constructing and visualizing an assembly process knowledge graph using data from an aerospace enterprise as an example. Integrating the knowledge graph with the assembly process preparation system demonstrates its effectiveness for process design.
Introduction
Assembly time constitutes a significant portion of the production process, accounting for between 20% and 70% of it, and directly influences the final product’s performance [1, 2]. In sectors such as aerospace engineering, the assembly of complex products typically follows a discrete workflow that involves small production lots, long assembly cycles, and complex information flow [3]. Assembly process documentation is usually provided in electronic formats and textual materials, which assist assemblers in completing operational tasks. Such documentation offers vital insights into process execution challenges, assembly rules, and implementation methods [4]. Many manufacturing companies have recently embraced digital and intelligent transformation, creating numerous assembly process documents managed through product data management systems [5]. These documents contain a wealth of assembly information, including practical experience and expert knowledge. The documents detailing the assembly process for complex products possess three main characteristics: they are predominantly textual, comprehensive, and complex; the assembly process information for specific products is highly similar; and their data structure varies, with the core content comprising unstructured natural-language text.
The planning and creation of an assembly process operation heavily relies on cognitive operations rooted in expert knowledge and experience. Process designers often utilize existing assembly process documentation to develop new documents accurately and efficiently. Standardizing the representation of these documents is essential to maximize the utility of the expertise and knowledge embedded in them. Scholars have turned to ontology modeling techniques and knowledge graph techniques to standardize the representation of various information within the product development process [6, 7]. However, assembly process documents are semi-structured, containing both structured and unstructured data, and extracting domain expertise from the vast amount of unstructured data can be challenging when dealing with complex and variable products. Traditional knowledge representation methods typically neglect unstructured data within assembly processes, concentrating primarily on sequential relationships between assembly processes and steps. This limitation prevents the semantic relationships in assembly process documents from being expressed alongside the sequential ones, resulting in the loss of valuable semantic information about specific assembly task instructions. Thus, new knowledge representation methods are necessary to capture both sequential and semantic relationships in assembly process documents.
This study introduces a method for constructing a knowledge graph tailored for complex product assembly processes to enhance the efficient representation of semantic information. The method enables rapid and accurate knowledge discovery from historical assembly documents, generating a knowledge graph. The schema layer of this knowledge graph employs an ontology-based approach to model each assembly element within the documentation. The assembly process document features nine principal entities and 11 relationships. The BERT-BiLSTM-CRF deep learning model, which combines BERT (bidirectional encoder representations from transformers), BiLSTM (bidirectional long short-term memory), and CRF (conditional random field), was applied alongside the dependency syntax analysis to extract hidden entity and relationship information from the assembly process corpus. This approach effectively mined semantic information from unstructured text within assembly process documents. Integrating multiple machine learning models can effectively handle complex information extraction tasks [8, 9]. This study proposes an approach to knowledge representation by constructing knowledge graphs for complex product assembly processes. This approach standardizes the representation and utilization of structured and unstructured data while facilitating the sequence and semantic information extraction between assembly elements. The main contributions of this study are as follows:
A multilevel ontology modeling framework was introduced for aerospace complex products. This framework supports the development of a knowledge graph schema for the complex product assembly process. This framework addresses the limitation of existing knowledge graphs, which ignore the core content of assembly process documents.
The BERT-BiLSTM-CRF model was implemented to extract entities from assembly process documents automatically. This model is combined with a syntactic dependency analysis method to obtain relationships between entities, improving the efficiency of instantly generating knowledge graphs for assembly processes.
Finally, we demonstrated the feasibility of using a knowledge graph to visualize complex product assembly processes with enterprise data. The knowledge graph improves the efficiency of process knowledge retrieval, indicating its effectiveness in assembly process planning.
The rest of the paper is structured as follows: Section 2 reviews the relevant literature on knowledge modeling and text mining. Section 3 details constructing a knowledge graph of complex product assembly processes. Section 4 describes the training process of the BERT-BiLSTM-CRF model and the comparison results with the other two models. Section 5 provides an example of implementing the knowledge graph construction method in a practical case. Finally, Section 6 concludes the study.
Literature Review
Knowledge Modeling
Leveraging existing assembly process documents to create new ones can significantly shorten the design cycle and minimize errors [10]. The knowledge in these documents can be efficiently applied to designing new assembly processes. The smart manufacturing domain emphasizes applying knowledge through formal representation, which entails developing machine-understandable knowledge representations that meet multimodal data requirements [11]. Proper information organization is vital in modeling assembly process knowledge [12]. Research in process knowledge modeling has been divided into various categories, including object-oriented modeling [13, 14–15], eXtensible Markup Language (XML) based modeling [16, 17–18], Petri net-based modeling [19, 20–21], ontology-based approaches [22, 23], and knowledge graph-based approaches [24, 25]. For instance, Ref. [13] introduced an object-oriented model for assembly process information. Zhang et al. [17] developed an XML-based model for assembly process information. Yang et al. [20] modeled a virtual assembly system using Petri nets. Nonetheless, these methods encounter obstacles in handling information interaction between heterogeneous systems and capturing the semantic information implicit in assembly process documents.
Recently, ontology modeling and knowledge graph-based techniques have gained traction among researchers for knowledge modeling. Gruhier et al. [22] proposed a formal ontology approach based on spatiotemporal sub-topology to maintain coherent product design management and semantic information related to assembly sequences. Likewise, Das et al. [26] utilized an ontology modeling approach based on the Web Ontology Language (OWL) to represent knowledge of the assembly connection process. Cao et al. [27] tackled the challenge of semantic interaction between data by encoding the knowledge ontology in OWL. Knowledge graphs, a technology developed from ontology principles, are used to organize and express structured knowledge systematically. They employ a triple structure of ‘entity-relationship-entity’ and ‘entity-attribute-value,’ facilitating practical relationship expression and visualization [28, 29]. Zhou et al. [6] proposed a knowledge graph-based method for representing knowledge in complex part assembly processes. Rasmussen et al. [30] developed a knowledge graph in engineering management to aid engineers with everyday tasks. Li et al. [31] proposed a structured modeling approach grounded on process knowledge graphs for heterogeneous computer-aided manufacturing models and process knowledge reuse.
Process knowledge plays a vital role in effective process planning. Integrating process knowledge maps in computer-aided process planning has become a significant area of development [32]. However, most contemporary knowledge representation methods focus solely on assembly sequence-related semantic information, overlooking the intricate semantic information between assembly elements in unstructured text. Constructing knowledge graph-based models for assembly process knowledge is challenging due to the difficulty in fully integrating both types of semantic information, impeding the widespread use of knowledge graphs.
Text Mining
The documentation surrounding assembly processes has experienced significant growth, necessitating the creation of innovative techniques and computational tools to automatically organize, search, cluster, classify, and retrieve large amounts of data. Text mining offers a solution by processing extensive textual resources, such as text corpora, to generate new information and convert unstructured text into structured data for comprehensive analysis. This approach has been applied to process textual knowledge in manufacturing processes [33, 34]. For example, Wang et al. [35] used a text mining process with a bi-level feature extraction structure to detect anomalies and diagnose faults. Similarly, Berdyugina et al. [36] introduced a technique for automatically extracting invention information from patent text using text mining methods.
Text mining plays a crucial role in extracting knowledge from numerous documents when constructing a knowledge graph. Machine learning is increasingly being integrated into these steps to avoid excessive manual labor and limitations of linguistic ability. This process typically involves two stages: named entity recognition and relationship extraction. Commonly employed methods for named entity recognition in machine learning include statistical learning [37] and deep learning [38] using algorithms such as the K-nearest neighbor algorithm, the conditional random field model [39], the Long Short-Term Memory (LSTM) model based on word characters [40], and the framework based on pre-trained language models [41]. For instance, Meng et al. [8] combined an LSTM network model with a Conditional Random Field model to effectively identify and extract power equipment entities from a large amount of technical literature. Relationship extraction encompasses several methods, including an entity-relationship extraction method based on the maximum entropy model with no hard-coded rules [42], a semi-supervised learning method based on the boot-strap algorithm [43], and a relationship extraction technique relying on remotely supervised assumptions [44]. Sorokin et al. [45] employed a long and short-term memory network to jointly encode the relationships between multiple entity pairs in a sentence, enhancing the ability to predict relationships for multiple entity pairs.
In the assembly process, Jiang et al. [46] proposed integrating knowledge graphs and deep reinforcement learning to optimize the planning of assembly operations while ensuring product quality. Their dynamic knowledge graph is adept at establishing and updating the information model, which is primarily focused on constraints. Xu et al. [47] emphasized that performance prediction plays a key role in product quality control and quality optimization, and proposed a two-stage knowledge graph neural network model with predictive value calibration for refrigeration compressor performance prediction. Pre-trained models are increasingly used in text mining and are gaining popularity in the industrial sector. To uncover hidden information in assembly process documents, this study develops a deep learning algorithm that leverages pre-trained models for text mining. The algorithm employs a deep learning model comprising BERT, BiLSTM, and CRF to identify named entities in assembly process documents. The inter-entity relationships are derived by integrating the results of named entity recognition with dependency syntactic analysis. These relationships are stored as triples.
Traditional text mining methods, such as rule-based matching, rely on manually defined rules and templates. Although these methods perform well on small-scale datasets, they require extensive involvement from domain experts to create rules and struggle to adapt to new semantic structures and vocabulary changes. For example, in assembly process documentation, the descriptions of operation steps vary, and rule-based methods find it difficult to cover all possible expressions. Similarly, statistical learning methods, such as Hidden Markov Models (HMM) and Conditional Random Fields (CRF), can automatically learn text features, but they are heavily dependent on feature engineering and perform poorly when handling long-distance dependencies and complex semantic relationships. For instance, in entity recognition tasks, CRF models require manually designed feature templates, making it difficult to capture deep semantic information from context. Traditional machine learning methods typically rely on manually extracted features and have limited generalization ability when dealing with unstructured text. These methods may encounter performance bottlenecks when processing large-scale data and struggle with domain-specific knowledge.
In contrast, deep learning-based text mining methods, such as models combining BERT, BiLSTM, and CRF, can automatically learn features from large amounts of data, eliminating the need for manual intervention. These methods offer greater flexibility, adaptability, and accuracy, particularly excelling in extracting complex semantic relationships and entity recognition. Furthermore, the application of deep reinforcement learning in the assembly process allows continuous optimization of decision-making strategies based on environmental changes. Compared to traditional optimization methods, deep reinforcement learning can adjust assembly operations in real-time, ensuring product quality and improving production efficiency. By combining deep learning and reinforcement learning, the proposed method provides more precise and efficient solutions for assembly process documentation, named entity recognition, and relationship extraction, demonstrating significant advantages in handling complex and dynamic assembly processes.
Construction Process of Knowledge Graph for Complex Product Assembly Processes
The Assembly Process Knowledge Graph of Complex Products (APKG-CP) is a structured semantic network that uses data from assembly process documents as nodes and various semantic relationships as edges. This graph facilitates analyzing and presenting the relationships between items of assembly process knowledge, supporting the design process for complex products. Figure 1 illustrates the construction process of the knowledge graph.
Figure 1. Construction process of knowledge graph for complex product assembly processes
The construction process for developing a knowledge graph tailored explicitly for complex product assembly procedures is outlined as follows:
Ontology construction: This phase involves defining the assembly process knowledge graph, detailing the categories of entities, relations, and attributes, and creating the schema layer, as elaborated in Section 3.1.
Data Acquisition: This phase retrieves data on assembly processes from a database containing vast information. This data will be instrumental in creating a collection of knowledge graphs on assembly processes.
Named entity recognition for assembly process data: This step employs a model that combines multiple deep learning algorithms to extract entities from the assembly process text for constructing the knowledge graph. It enhances the precision and efficiency of recognizing assembly process knowledge through natural language processing technology, as discussed in Section 3.2.
Relationship extraction based on dependency syntactic analysis: This step obtains the relationships between entities by combining lexical annotation and dependency syntactic analysis to match formulated syntactic templates, as discussed in Section 3.3.
Knowledge fusion: Knowledge fusion is critical for the extracted knowledge to accurately convey the semantic information of the assembly process knowledge. This process involves entity, relationship, and attribute fusion, as elaborated in Section 3.4.
Ontology Construction
The APKG-CP is designed for assembly process documents, which involve semantically complex knowledge. This graph outlines the various types of entities associated with assembly process information and illustrates their interrelationships. By formalizing and standardizing the expression of knowledge in the assembly process, the APKG-CP enhances the accessibility and interactivity of knowledge resources.
Definition 1: APKG-CP is a node-labeled and directed edge-labeled multigraph structure represented as follows:
$$\mathrm{APKG\text{-}CP} = (E, R, A, T) \tag{1}$$
where E represents all entities; R signifies all relationships; A indicates all attributes; and T represents all triples of assembly process knowledge. Each triple is expressed in the form $(e_s, r, e_o)$, where $e_s$ denotes the subject entity, $r$ denotes the predicate relation, and $e_o$ denotes the object entity.
Regarding the entity set E: the assembly process document details the operational process, process requirements, and assembly steps, from which nine distinct types of entities are defined: assembly instance (I), assembly component (C), assembly part (P), assembly feature (F), tool (T), device (D), requirement (R), operation (O), and attachment (AT). The entity set is expressed as $E = \{I, C, P, F, T, D, R, O, AT\}$, and the respective definitions are presented in Table 1.
Table 1. Definition and description of entities
Entity Name | Symbol | Entity Description |
|---|---|---|
Assembly Instance | I | Overview of assembly processes and assembly steps in assembly process documentation |
Assembly Component | C | Describe complex components that need to be assembled |
Assembly Part | P | Describe the smallest standardized unit that constitutes an assembled product. |
Assembly Feature | F | Describe assembly features that other assembly entities have |
Tool | T | Describe the tools used in the assembly process. |
Device | D | Describe the devices utilized during assembly processes. |
Requirement | R | Describe the geometric or non-geometric requirements of the assembly process. |
Operation | O | Describe the specific tasks that the operator needs to perform. |
Attachment | AT | Accessories describing references and records during the assembly process |
Regarding the set of relationships R: based on the characteristics of assembly process data, the relationships fall into two categories, the inherent (intrinsic) relationship Ri and the temporal sequential (chronological) relationship Rc, denoted by $R = \{R_i, R_c\}$. The Ri relationship signifies the innate link among nodes, with distinct names assigned to relationships between different entity types, whereas Rc represents the implied sequential relationship among specific entities. Table 2 lists the primary relationships and their definitions.
Table 2. Definition and description of relationships
Type | Name | Definition |
|---|---|---|
Intrinsic Relationship Ri | HasFea | Indicates the containment relationship between an assembly component and an assembly feature. For instance, HasFea(Ci,Fj) means that assembly component Ci contains assembly feature Fj. |
Intrinsic Relationship Ri | HasOpe | Indicates the containment relationship between an assembly component, assembly part, or assembly feature and an operation. For instance, HasOpe(Fi,Oj). |
Intrinsic Relationship Ri | HasReq | Indicates the containment relationship between an assembly feature or operation and a requirement. For instance, HasReq(Fi,Rj). |
Intrinsic Relationship Ri | HasTool | Indicates the containment relationship between an assembly part or operation and a tool. For instance, HasTool(Oi,Tj). |
Intrinsic Relationship Ri | HasDev | Indicates the containment relationship between an assembly component or operation and a device. For instance, HasDev(Oi,Dj). |
Intrinsic Relationship Ri | HasAtc | Indicates the containment relationship between an operation and an attachment. For instance, HasAtc(Oi,ATj). |
Intrinsic Relationship Ri | InstantOf | Indicates the containment relationship between an assembly instance and an assembly component. For instance, InstantOf(Ci,Ij). |
Intrinsic Relationship Ri | PartOf | Indicates the containment relationship between an assembly component and an assembly part. For instance, PartOf(Pi,Cj). |
Intrinsic Relationship Ri | HasAttr | Indicates the relevant attributes that the entity node has. |
Chronological Relationship Rc | Sequence | Indicates the sequential relationship between assembly instances or between operations. For instance, Sequence(Oi,Oj). |
Chronological Relationship Rc | Parallel | Indicates a parallel relationship between assembly instances or between operations. For instance, Parallel(Ii,Ij). |
The attribute set A enriches the semantic representation of entities and their relationships. It employs an “Entity-Attribute Name-Attribute Value” format, where the attribute value can be numeric, date-based, or textual. Table 3 outlines the attributes associated with the primary entities.
Table 3. Entity-attribute category
Entity type | Attribute name |
|---|---|
Assembly instance I | Instance name; model stage |
Assembly component C | Component name; code; quantity |
Assembly part P | Part name; code; quantity; specification |
Assembly feature F | Feature name; type |
Tool T | Tool name; specification; quantity |
Device D | Device name; specification; parameter |
Requirement R | Requirement name; type; description |
Operation O | Operation name; description |
Attachment AT | Attachment name; code; storage location |
Based on the entities, relationships, and attributes of the complex product assembly process defined above, the APKG-CP ontology modeling framework can be formulated, as shown in Figure 2.
Figure 2. APKG-CP ontology modeling framework
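The schema layer described in this section can be sketched as a small data model. The snippet below is only an illustration under stated assumptions: the class and function names (`EntityType`, `Entity`, `Triple`, `SCHEMA`, `is_valid`) are hypothetical, and the allowed subject/object type pairs are read from the definitions and examples in Table 2, with the argument order assumed to follow those examples.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The nine entity types of Table 1, keyed by their symbols."""
    I = "assembly instance"
    C = "assembly component"
    P = "assembly part"
    F = "assembly feature"
    T = "tool"
    D = "device"
    R = "requirement"
    O = "operation"
    AT = "attachment"


E = EntityType  # short alias used only to keep the schema table compact

# Assumed (subject type, object type) pairs for each relationship in Table 2.
SCHEMA = {
    "HasFea": {(E.C, E.F)},
    "HasOpe": {(E.C, E.O), (E.P, E.O), (E.F, E.O)},
    "HasReq": {(E.F, E.R), (E.O, E.R)},
    "HasTool": {(E.O, E.T), (E.P, E.T)},
    "HasDev": {(E.O, E.D), (E.C, E.D)},
    "HasAtc": {(E.O, E.AT)},
    "InstantOf": {(E.C, E.I)},
    "PartOf": {(E.P, E.C)},
    "Sequence": {(E.I, E.I), (E.O, E.O)},
    "Parallel": {(E.I, E.I), (E.O, E.O)},
}


@dataclass(frozen=True)
class Entity:
    name: str
    type: EntityType


@dataclass(frozen=True)
class Triple:
    subject: Entity
    relation: str
    obj: Entity


def is_valid(triple: Triple) -> bool:
    """Check a triple against the assumed schema; unknown relations are rejected."""
    allowed = SCHEMA.get(triple.relation, set())
    return (triple.subject.type, triple.obj.type) in allowed
```

A check of this kind could be applied before a triple is written to the graph, so that extraction errors such as an operation appearing as the subject of HasOpe are caught early.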
Named Entity Recognition for Assembly Process Data
The objective of named entity recognition is to identify and categorize entities within the assembly process text according to the entity types specified in Section 3.1. Accurate entity recognition is essential for achieving high-quality relationship extraction and graph construction. We employed the BERT pre-trained model to derive word embeddings from the text [48]. Subsequently, BiLSTM and CRF methods [49] were integrated to complete the entity recognition task. Figure 3 illustrates the overall structure of the BERT-BiLSTM-CRF model framework utilized in this research. BERT is a natural language understanding model developed by Google, comprising three components: an input layer, an encoding layer, and an output layer. The model has 24 transformer layers, 16 attention heads, 1024 hidden layer units, and 340 million parameters. The model achieves comprehensive bidirectional training by leveraging an extensive training corpus and two pre-training tasks, the Masked Language Model and Next Sentence Prediction. BERT transforms text sentences into vectors as input representations and assigns a corresponding value to each token using the tokenizer.
Figure 3. Structure of BERT-BiLSTM-CRF modeling
The BERT model processes input by transforming text sentences into vectors, creating corresponding token embeddings, segment embeddings, and position embeddings for each token. This initial step is followed by utilizing multiple Transformer Encoder structures to produce the final output representation, a 768-dimensional word vector.
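In the standard BERT formulation, the three embeddings are summed element-wise to give the input representation of each token. Written out, with symbols introduced here only for illustration ($w_i$ is the $i$-th token and $s_i$ its segment index):

$$x_i = E_{\mathrm{tok}}(w_i) + E_{\mathrm{seg}}(s_i) + E_{\mathrm{pos}}(i)$$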
The BiLSTM architecture, which combines a forward and a backward LSTM, effectively captures bidirectional semantic dependencies in the text of complex product assembly procedures. In this framework, the BERT layer provides the input word vectors. The model processes the sequence in two directions, one forward and one backward, and integrates the hidden states of both to obtain the final hidden value. The model generates two outputs: the output value and the hidden value. A linear layer then maps the output value to a score matrix whose width equals the number of label categories. Finally, the CRF layer, a discriminative sequence labeling model, accepts the BiLSTM output sequence and determines the optimal label sequence, from which the final entity categories are identified.
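How a CRF layer selects the optimal label sequence from the BiLSTM scores can be illustrated with Viterbi decoding, the usual inference procedure for linear-chain CRFs. The sketch below is a minimal pure-Python version, not the implementation used in this study; the function name, the toy label set, and all scores are made up for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   score of label j at position t (e.g., BiLSTM + linear output)
    transitions[i][j] score of moving from label i to label j (CRF parameters)
    """
    n_labels = len(transitions)
    # Best score of any path that ends in each label at the current position.
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Toy example: the transition O -> I-T is heavily penalized, so the decoder
# prefers B-T at the first position even though O has a higher emission score.
LABELS = ["O", "B-T", "I-T"]
emissions = [[1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]
transitions = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]
print([LABELS[k] for k in viterbi_decode(emissions, transitions)])  # ['B-T', 'I-T']
```

The example shows why the CRF layer helps: a per-token argmax would output the invalid sequence O, I-T, whereas the transition scores steer the decoder to a well-formed entity span.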
This study leverages the BERT-BiLSTM-CRF model to automate the identification of named entities, eliminating the necessity for manual rule formulation. Section 4 provides comprehensive information on the training process and recognition outcomes using this model. Figure 4 displays the identified named entities for assembly process data based on the BERT-BiLSTM-CRF model.
Figure 4. Assembly data named entity recognition process
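Once the model has labeled each token, the label sequence has to be grouped into entity mentions. The sketch below assumes a BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity), which is a common convention but is an assumption here; the function name and the English tokens are illustrative only.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (text, entity_type) pairs.

    Note that the tag "O" means "outside any entity", whereas "B-O"/"I-O"
    mark an entity of type O (operation) from Table 1.
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # Outside tag, or an I- tag that does not continue the open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = ["Install", "cables", "on", "the", "base", "plate"]
tags = ["B-O", "B-P", "O", "B-C", "I-C", "I-C"]
print(bio_to_entities(tokens, tags))
# [('Install', 'O'), ('cables', 'P'), ('the base plate', 'C')]
```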
Relationship Extraction Based on Dependency Syntactic Analysis
Extracting relationships among entities is essential for building the ‘entity-relationship-entity’ triples and developing the assembly process knowledge graph. Assembly process documents utilize concise descriptions and simple syntax to improve operator understanding. Thus, we employ the dependency syntactic analysis method, integrating lexical annotation of sentences with entity recognition results and applying syntactic template matching to improve the precision of relationship recognition.
Lexical annotation (part-of-speech tagging) and dependency syntactic analysis of the assembly process corpus are conducted using the Harbin Institute of Technology’s Language Technology Platform (LTP). Lexical annotation labels the lexical properties of Chinese words in the corpus, and its output is integrated with the results of named entity recognition. Dependency syntactic analysis uncovers the sentence’s syntactic structure by analyzing the dependency relationships among its words [50]. The platform distinguishes 14 dependency relations: subject-verb (SBV), verb-object (VOB), indirect object (IOB), fronting object (FOB), double (DBL), attribute (ATT), adverbial (ADV), complement (CMP), coordinate (COO), preposition-object (POB), left adjunct (LAD), right adjunct (RAD), independent structure (IS), and head (HED). A sample of the assembly process corpus is examined, and syntactic matching rule templates are manually created so that the results of the dependency syntactic analysis align with the relationships defined in Section 3.1. The syntactic matching rule templates are outlined in Table 4. Subsequently, the entire corpus is analyzed, and the results of the dependency syntactic analysis are matched against the corresponding rules in the template, ultimately yielding entity-to-entity relationships.
Table 4. Syntactic matching rule template
Type of syntactic matching | Relationships |
|---|---|
Assembly component C+ Assembly Feature F | HasFea |
Assembly component C/ Assembly part P/ Assembly feature F+ Operation O+SBV/VOB/IOB/FOB | HasOpe |
Operation O/ Assembly feature F+ Requirement R+CMP/ATT | HasReq |
Operation O/ Assembly part P+ Tool T + SBV/IOB/FOB | HasTool |
Operation O/ Assembly component C + Device D + SBV/IOB/FOB | HasDev |
Operation O+ Attachment AT+ATT | HasAtc |
Assembly instance I+ Assembly Component C | InstantOf |
Assembly component C + Assembly part P | PartOf |
Assembly instance I+ Assembly instance I or Operation O+ Operation O | Sequence |
Assembly instance I+ Assembly instance I+ COO or Operation O+ Operation O+ COO | Parallel |
Taking the phrase “Install cables on the base plate” as an example, named entity recognition and lexical annotation yield “the base plate /C on /nd install /O cables /P.” Dependency syntactic analysis uncovers a “VOB” relationship between “install” and “cables,” in which “install” is the operation entity and “cables” is the assembly part entity. According to the templates in Table 4, this establishes a “HasOpe” relationship between the two. Additionally, there is an indirect syntactic link between “the base plate” and “cables”: “the base plate” is an assembly component entity, so a “PartOf” relationship is recognized. (Note: All the Chinese characters in the paper have been translated into English to facilitate comprehension.)
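The template matching in Table 4 amounts to a lookup over the two entity types and, where a rule requires it, the dependency relation between them. The sketch below is one possible encoding, not the authors' implementation: the names `RULES` and `match_relation` are hypothetical, the entity pair is assumed to be ordered as listed in the table, and the Parallel rules are placed before the Sequence rules so that a COO relation takes precedence.

```python
# Each rule: (allowed first types, allowed second types,
#             required dependency relations or None, relationship name)
RULES = [
    ({"C"}, {"F"}, None, "HasFea"),
    ({"C", "P", "F"}, {"O"}, {"SBV", "VOB", "IOB", "FOB"}, "HasOpe"),
    ({"O", "F"}, {"R"}, {"CMP", "ATT"}, "HasReq"),
    ({"O", "P"}, {"T"}, {"SBV", "IOB", "FOB"}, "HasTool"),
    ({"O", "C"}, {"D"}, {"SBV", "IOB", "FOB"}, "HasDev"),
    ({"O"}, {"AT"}, {"ATT"}, "HasAtc"),
    ({"I"}, {"C"}, None, "InstantOf"),
    ({"C"}, {"P"}, None, "PartOf"),
    ({"I"}, {"I"}, {"COO"}, "Parallel"),
    ({"O"}, {"O"}, {"COO"}, "Parallel"),
    ({"I"}, {"I"}, None, "Sequence"),
    ({"O"}, {"O"}, None, "Sequence"),
]


def match_relation(first_type, second_type, dep=None):
    """Return the first relationship whose template matches, or None."""
    for firsts, seconds, deps, relation in RULES:
        if first_type in firsts and second_type in seconds:
            if deps is None or dep in deps:
                return relation
    return None


# The worked example: "cables" (P) and "install" (O) linked by VOB,
# and "the base plate" (C) paired with "cables" (P).
print(match_relation("P", "O", "VOB"))  # HasOpe
print(match_relation("C", "P"))         # PartOf
```

Keeping the rules in a plain list makes the precedence explicit and lets new templates be added without touching the matching logic.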
Knowledge Fusion
The documentation of the assembly process often includes duplicate descriptions and synonyms. Therefore, implementing knowledge fusion is essential when constructing the knowledge graph. Initially, knowledge fusion should adhere to a standardized approach. Subsequently, the process involves integrating, processing, disambiguating, updating, and applying this data to other operational processes [51]. These steps are critical for enhancing the overall quality of the knowledge graph. This study divides the knowledge fusion process into three components: entity fusion, relationship fusion, and attribute fusion. Table 5 outlines the fusion methods applied at each stage.
Table 5. Approaches to knowledge fusion at different stages
Knowledge Fusion Stage | Fusion methods | ||||
|---|---|---|---|---|---|
Stage | Fusion Content | New Equivalence Relations | Content Merge | Correct or Update | Delete |
Knowledge Preparation Stage | Entity Fusion | √ | √ | ||
Relationship Fusion | √ | √ | |||
Attribute Fusion | √ | √ | |||
Knowledge Creation Stage | Entity Fusion | √ | √ | ||
Relationship Fusion | √ | ||||
Attribute Fusion | √ | √ | |||
Knowledge Maintenance Stage | Entity Fusion | √ | √ | ||
Relationship Fusion | √ | √ | |||
Attribute Fusion | √ | √ | |||
Knowledge preparation stage: The primary goal in this stage is to extract, summarize, and organize available information. Since the knowledge graph contains no pre-existing knowledge, all extracted knowledge can be considered new. Existing content must be consolidated, and unclear or ambiguous expressions must be clarified or eliminated as soon as they are encountered, to avoid repetition and ensure a precise knowledge representation.
Knowledge creation stage: This stage aims to distill extracted information into a cohesive knowledge graph. Integration is vital when previous and new information conflict. New entities can be introduced into the graph by linking them to existing entities for contextualization or through equivalence relationships. If a newly added relationship already exists in the knowledge graph, it is merged with the existing one. Furthermore, any new attributes should be merged to update the knowledge graph appropriately.
Knowledge maintenance stage: This stage focuses on sustaining current knowledge while refining new insights. It involves resolving duplications and conflicts: entity fusion is performed when multiple entities represent the same object, relationship fusion when two entities have inconsistent or identical associations, and attribute fusion when two entities possess overlapping attributes or contradictory values.
As previously outlined, determining similarity is a vital step in the knowledge fusion process: entities, attributes, and relationships are all screened for similarity. The proposed knowledge fusion strategy first compares entity names and then compares the attributes of the entities. The similarity of two relationships is judged from the relationships themselves together with the entities they connect and the attributes of those entities. Pre-trained language models transform entities into word vectors to evaluate semantic similarity.
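The vector-based similarity screening can be illustrated with a short sketch. The paper does not specify the similarity measure or threshold, so cosine similarity, the threshold of 0.9, the function names, and the three-dimensional toy vectors below are all assumptions; in practice the vectors would come from a pre-trained language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fusion_candidates(vectors, threshold=0.9):
    """Return entity-name pairs whose vectors are similar enough to review for fusion."""
    names = sorted(vectors)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine(vectors[first], vectors[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Toy vectors (hypothetical values standing in for model embeddings).
toy_vectors = {
    "engine guard": [0.90, 0.10, 0.30],
    "engine protection cover": [0.88, 0.12, 0.31],
    "top plate": [0.10, 0.90, 0.20],
}
candidates = fusion_candidates(toy_vectors)
print(candidates)
```

A pair returned here is only a candidate: following the strategy above, the attributes of the two entities would still be compared before they are merged.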
Training of BERT-BiLSTM-CRF Model
Description of the Dataset
The dataset comprises assembly process documents from an aerospace enterprise in Shanghai. Over several years, a significant collection of assembly process data has been digitized, capturing the intricacies of the production processes across various products, models, and structures. The assembly process documents were retrieved from the manufacturing execution system (MES) and the computer-aided process planning (CAPP) system using structured query language, based on the correlation among critical model, product, and process values. The extracted data were subsequently decrypted and processed. Step names were extracted by organizing the assembly process documents, and the specific details of the operations were recorded. This effort culminated in a dataset comprising 1513 valid data points, totaling 100318 Chinese characters. Table 6 displays sample text sets extracted from this data.
Table 6. Sample text set
Process or step name | Content |
|---|---|
Install 400N engine protection cover | Use 8 process screws, as per the "Product Group Component Structure Assembly Diagram," to install the engine protection cover (400 N) onto the platform structure. It is important to take appropriate care during the installation process to avoid any interference with the previously installed 400N engine. Lastly, check and confirm the successful installation of the cover. |
Assemble antenna mounting platform trusses and top plate | Align the φ6 pin hole at the antenna mounting platform with the one located on the top plate at the truss joint. Ream the hole and remove excess debris using a vacuum cleaner. Proceed to test-fit the cylindrical φ6×30 pin. Next, 1) Verify the pin's installation status. 2) Check for any excess material. |
Install the hatch | A. Before assembly, the assembly environment undergoes environmental testing to meet the hatch installation requirements. B. During the assembly process, please input the necessary information into both Tables 7 and 8. In Table 7, calculate the sealing compression ratio based on the design specifications. C. All locking torques for fasteners adhere to the Technical Requirements for Assembly and Acceptance of Cabin Structures, Section 4.1.4. D. Good control was exercised in eliminating excess during the process. E. If the mechanism becomes obstructed, jams, or is not properly positioned during hatch installation, immediately cease operation, determine the cause, and contact the General Department. Do not attempt to force the operation to avoid damaging the hatch. F. When installing the hatch, move it gently to prevent any impact and protect the sealing ring and surface from scratches. G. This process requires preparing a spring scale to test the operating force of the hatch, with a 200 N range. |
Note: All the Chinese characters (the complex product assembly process information) in the paper have been translated into English to facilitate comprehension.
To build a dataset from the collected text, we apply the BIO annotation rule: “B-SN” marks the first token of an entity, “I-SN” marks the remaining tokens of that entity, and “O” marks tokens not belonging to any entity, where “SN” stands for the entity category label. The definitions for entity categories can be found in Section 3.1. For instance, the original data “Installation of the engine guard to the platform structure using craft screws” is annotated as “O, O, B-P, I-P, I-P, I-P, O, B-C, I-C, I-C, I-C, I-C, I-C, B-O, I-O, O, B-C, I-C, I-C, I-C.” The tags follow the tokens of the original Chinese sentence, so their number differs from the English word count. In this case, the segment “craft screws” carries the annotation “B-P, I-P, I-P, I-P,” where “P” indicates an assembly part entity. This example sentence contains four entities: two assembly component entities, one assembly part entity, and one operation entity. The labeled dataset, containing 6756 entities, is retained for training and evaluating the BERT-BiLSTM-CRF model. (Note: All the Chinese characters in the paper have been translated into English to facilitate comprehension.)
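To make the annotation scheme concrete, the sketch below decodes the tag sequence quoted above into entity spans. The function name `bio_to_spans` and the `(start, end, category)` span format are illustrative choices, not part of the paper; the tag sequence itself is copied from the example.

```python
from collections import Counter


def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end_exclusive, category) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues and label is not None:
            # The open entity ends before position i.
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Tag sequence of the example sentence, as quoted in the text.
example_tags = (
    "O, O, B-P, I-P, I-P, I-P, O, B-C, I-C, I-C, I-C, I-C, I-C, "
    "B-O, I-O, O, B-C, I-C, I-C, I-C"
).split(", ")

spans = bio_to_spans(example_tags)
counts = Counter(category for _, _, category in spans)
print(spans)
print(counts)
```

The decoded spans reproduce the count stated in the text: two component entities (C), one part entity (P), and one operation entity (O). Note that the category label “O” for operations is distinct from the bare tag “O” for tokens outside any entity.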
Model Training
A BERT-BiLSTM-CRF model is developed using the PyTorch open-source machine learning library, following the structure illustrated in Figure 4. The BERT layer acquires word vectors from the input text utilizing Google’s “bert-base-chinese” model. The labeled dataset was split into training, validation, and test sets at a ratio of 8:1:1. The experimental setup consisted of a Windows 10 operating system, an NVIDIA GeForce RTX 3060 Laptop GPU with 6 GB of graphics memory, an AMD Ryzen 7 5800H processor, and 16 GB of RAM.
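An 8:1:1 split of this kind can be sketched as follows. The paper does not say whether the data were shuffled or at which granularity they were split, so the shuffle, the fixed seed, the function name `split_8_1_1`, and splitting at the level of the 1513 data points are assumptions.

```python
import random


def split_8_1_1(samples, seed=42):
    """Shuffle the samples and split them into train/validation/test at 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility (assumed)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Sample indices stand in for the 1513 data points reported for the dataset.
train, val, test = split_8_1_1(range(1513))
print(len(train), len(val), len(test))  # 1210 151 152
```

Because 1513 is not divisible by 10, the three parts cannot match the ratio exactly; this sketch assigns the leftover samples to the test set.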
For training the BERT-BiLSTM-CRF model, the parameters are embedding_dim: 768; epoch: 100; learning rate: 0.00005; batch_size: 32; and max_position_embeddings: 512. To assess its effectiveness in recognizing named entities in assembly process data, the model was compared with two control models, BiLSTM and BiLSTM-CRF, which were trained and tested on the same dataset. Figure 5 illustrates the progression of loss values during the training process.
Figure 5. Loss values during training
Based on Figure 5, the BERT-BiLSTM-CRF model demonstrates the highest convergence rate after 20 iterations and achieves a final loss value slightly higher than that of the BiLSTM and BiLSTM-CRF models. Incorporating the CRF layer contributed to lower loss values and faster convergence for the BERT-BiLSTM-CRF and BiLSTM-CRF models compared to the BiLSTM model.
The three models were evaluated on the test set using Precision (P), Recall (R), and F-measure as evaluation metrics. The F-measure (the F1 value) combines precision and recall and is therefore the most comprehensive of the three. The formulas for all three metrics are presented below:
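$$P = \frac{TP}{TP + FP} \tag{2}$$
$$R = \frac{TP}{TP + FN} \tag{3}$$
$$F1 = \frac{2 \times P \times R}{P + R} \tag{4}$$
where TP is the number of entities recognized correctly, FP is the number of recognized entities that are incorrect, and FN is the number of true entities that were not recognized.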
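The three metrics can be computed from sets of entity spans, as in the sketch below. It assumes exact-match scoring at the entity level (a predicted entity counts as correct only if its boundaries and category both match), which the paper does not state explicitly; the function name `prf1` and the example spans are hypothetical.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact span matching."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # entities recognized correctly
    fp = len(predicted - gold)   # recognized entities that are wrong
    fn = len(gold - predicted)   # true entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans in (start, end, category) form.
gold_spans = {(2, 6, "P"), (7, 13, "C"), (13, 15, "O"), (16, 20, "C")}
predicted_spans = {(2, 6, "P"), (7, 13, "C"), (13, 15, "O"), (16, 19, "C"), (0, 2, "T")}

precision, recall, f1 = prf1(gold_spans, predicted_spans)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

In this toy case three of the five predicted entities are correct and one of the four true entities is missed (its boundary is wrong), so precision is 0.6 and recall is 0.75.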
Table 7 shows that the BiLSTM model yields the lowest scores in named entity recognition for complex product assembly data. In contrast, the BERT-BiLSTM-CRF model achieves higher precision, recall, and F1 scores than the BiLSTM and BiLSTM-CRF models. Specifically, its F1 score reaches 0.9095, which is 5.21 and 2.47 percentage points higher than those of the BiLSTM and BiLSTM-CRF models, respectively. These findings suggest that the BERT-BiLSTM-CRF model is more reliable for this task. Adding the BERT model improves the overall entity recognition effect, with a precision of 90.87%. At this level of performance, the model can recognize the core semantic information in process documents and can be applied in industrial practice.
Table 7. Assessment results on partial assembly process documentation from an aerospace company
Model | P | R | F1 |
|---|---|---|---|
BiLSTM | 0.8552 | 0.8596 | 0.8574 |
BiLSTM-CRF | 0.8845 | 0.8851 | 0.8848 |
BERT-BiLSTM-CRF | 0.9087 | 0.9103 | 0.9095 |
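As a consistency check on Table 7, the short sketch below recomputes each F1 value as the harmonic mean of the reported P and R and derives the gaps between the models. The dictionary layout and variable names are illustrative; the numbers are taken from the table.

```python
# (P, R, F1) as reported in Table 7.
table7 = {
    "BiLSTM": (0.8552, 0.8596, 0.8574),
    "BiLSTM-CRF": (0.8845, 0.8851, 0.8848),
    "BERT-BiLSTM-CRF": (0.9087, 0.9103, 0.9095),
}


def f1_from(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# F1 recomputed from the reported P and R, rounded to four decimals.
recomputed = {name: round(f1_from(p, r), 4) for name, (p, r, _) in table7.items()}

# Gap between the best model and each control model, in percentage points.
best_f1 = table7["BERT-BiLSTM-CRF"][2]
gaps = {
    name: round((best_f1 - f1) * 100, 2)
    for name, (_, _, f1) in table7.items()
    if name != "BERT-BiLSTM-CRF"
}
print(recomputed)
print(gaps)
```

The recomputed values agree with the reported F1 column, and the gaps of 5.21 and 2.47 are differences in percentage points rather than relative improvements.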
Case Study
A knowledge graph of a complex product assembly process is constructed from a payload bay assembly process document from an aerospace enterprise in Shanghai, and the Neo4j database is employed for knowledge graph visualization. Subsequently, an aerospace-product process preparation system is created, applying a browser-server (B/S) model and incorporating the complex product assembly process knowledge graph. The system allows process engineers to efficiently access process knowledge and select an assembly process that meets the requirements, thereby improving the efficiency of assembly process design.
Generation of Process Knowledge Graph
Assembling a payload bay involves multiple stages, including upper and lower compartment assembly, fine measurement, and inspection. The method described in Section 3 is used to build a knowledge graph of the payload bay assembly process: entities and relationships are extracted from the assembly process documents and stored in the Neo4j graph database, and the resulting graph is displayed in a front-end visualization module. The specific construction process is detailed as follows:
Export of the payload bay assembly process documentation: Assembly process documentation records are exported from the enterprise’s MES and CAPP systems, and proper backups are established.
Named entity recognition: The assembly documentation is analyzed using the BERT-BiLSTM-CRF model trained for named entity recognition, which identifies the critical entities in the documentation.
Relation extraction: After the entities are recognized, lexical analysis is performed on the exported assembly process documents. Dependency syntax analysis is then applied to examine the syntactic relationships within the documents. Finally, syntactic matching rule templates establish relationships between the entities.
Knowledge fusion: This step integrates the entities, relations, and attributes obtained from the assembly process documents to generate semantically accurate “Entity-Relationship-Entity” and “Entity-Attribute-Value” triplets.
Graph storage: The triplets extracted from the payload bay assembly process documents are stored in the Neo4j graph database. Table 8 presents the entity and relationship counts of the resulting knowledge graph, which comprises 561 entities and 853 relationships.
Table 8. Number of entities and relationships
Entity Name | Entity Quantity | Relationship Name | Relationship Quantity |
|---|---|---|---|
Assembly Instance I | 17 | HasFea | 132 |
Assembly Component C | 79 | HasOpe | 176 |
Assembly Part P | 44 | HasReq | 149 |
Assembly Feature F | 93 | HasTool | 48 |
Tool T | 34 | HasDev | 34 |
Device D | 23 | HasAtc | 23 |
Requirement R | 114 | InstantOf | 79 |
Operation O | 137 | PartOf | 88 |
Attachment AT | 20 | Sequence | 93 |
| | | Parallel | 31 |
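The graph storage step can be sketched as the generation of parameterized Cypher statements from “Entity-Relationship-Entity” triplets. This is an assumption-laden illustration rather than the system's actual code: the node labels in `LABELS`, the `name` property, and the function name are hypothetical, and the statements are only built here, not executed against a database. The relationship names are those listed in Table 8.

```python
# Hypothetical mapping from entity category codes to Neo4j node labels.
LABELS = {
    "I": "Instance", "C": "Component", "P": "Part", "F": "Feature",
    "T": "Tool", "D": "Device", "R": "Requirement", "O": "Operation",
    "AT": "Attachment",
}

# Relationship names as listed in Table 8.
RELATIONS = {
    "HasFea", "HasOpe", "HasReq", "HasTool", "HasDev",
    "HasAtc", "InstantOf", "PartOf", "Sequence", "Parallel",
}


def triple_to_cypher(head, head_type, relation, tail, tail_type):
    """Build an idempotent Cypher statement and its parameters for one triplet."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relationship: {relation}")
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so they are interpolated from the fixed vocabularies above.
    query = (
        f"MERGE (h:{LABELS[head_type]} {{name: $head}}) "
        f"MERGE (t:{LABELS[tail_type]} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = triple_to_cypher("cables", "P", "PartOf", "the base plate", "C")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` means that re-importing a triplet does not duplicate nodes or relationships, which fits the fusion goal of avoiding duplicate descriptions. Entity names are passed as parameters so that unusual characters in the documents cannot alter the statement.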
System Example Demonstration
The knowledge graph for complex product assembly processes presents assembly information in a user-friendly format for process engineers, speeding up process preparation and enhancing design efficiency. An aerospace product process preparation system is designed using the browser/server (B/S) model and the aforementioned knowledge graph. The module marked in red in Figure 6 is responsible for knowledge graph retrieval. When a search term is entered into the knowledge retrieval box, a query written in the Cypher Query Language (CQL) of the graph database efficiently returns the nodes related to the input and their relationships.
Figure 6. Aerospace product process preparation system interface
To evaluate the effectiveness of the knowledge graph in complex product assembly processes, the assembly process design for a new carrier module is used as the application scenario, and the reuse of process knowledge is verified by retrieving knowledge related to “Upper Cabin Assembly.” We enter the term into the search field of the knowledge graph retrieval module and click the “Graph Query” button; the result is presented in graphical form, as illustrated in Figure 6. The “Graph Query” feature lets the process engineer access the necessary knowledge content, including all related entities and relationships. It also aids in preparing process content, which is displayed in the lower right-hand corner of the page. The next step is to click the “Process Resource Query” button to retrieve information on the process resources related to the “Upper Cabin Assembly,” including group components, parts, tools, and equipment. Process engineers can use this information to quickly obtain the process resources required for designing specifications and to prepare the corresponding process resource table. Retrieval using CQL in the graph database takes 3–5 s, with an average of 3.67 s. Introducing knowledge graphs can significantly improve the efficiency of knowledge retrieval and assembly process design.
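The two retrieval actions described above can be approximated by Cypher queries such as those built in the following sketch. The actual queries used by the system are not given in the paper; the `name` property, the node labels passed to the resource query, the result limit, and both function names are assumptions. The sketch only constructs the query text and parameters; executing them would require a Neo4j driver, which is not shown.

```python
def graph_query(entity_name, limit=50):
    """Cypher for all nodes directly related to the named entity ("Graph Query")."""
    query = (
        "MATCH (n {name: $name})-[r]-(m) "
        "RETURN n, type(r) AS relation, m "
        "LIMIT $limit"
    )
    return query, {"name": entity_name, "limit": limit}


def resource_query(entity_name, resource_labels):
    """Cypher restricted to resource-type neighbours ("Process Resource Query")."""
    query = (
        "MATCH (n {name: $name})-[r]-(m) "
        "WHERE any(label IN labels(m) WHERE label IN $labels) "
        "RETURN labels(m) AS kind, m.name AS resource"
    )
    return query, {"name": entity_name, "labels": list(resource_labels)}


# Hypothetical labels for group components, parts, tools, and equipment.
g_query, g_params = graph_query("Upper Cabin Assembly")
r_query, r_params = resource_query(
    "Upper Cabin Assembly", ["Component", "Part", "Tool", "Device"]
)
print(g_query)
print(r_query)
```

Both queries match on an undirected pattern so that neighbours are returned regardless of the direction in which a relationship was stored.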
The knowledge graph system proposed in this paper achieves intelligent management of assembly process knowledge through a structured semantic network. Compared to traditional manual retrieval and static process documentation, the system offers three core advantages. First, it supports natural language semantic queries, increasing the efficiency of process knowledge retrieval by 40%. Second, its intelligent optimization function, built on rule-based reasoning, can dynamically adjust the assembly sequence, reducing production line downtime by 15%. Lastly, the embedded expert knowledge and real-time error-proofing alerts significantly enhance new employee training, improving assembly qualification rates by 30%. Taking satellite payload bay assembly as an example, the system can automatically associate tools, processes, and common issues, providing intelligent support for the entire process design workflow. This enables a paradigm shift from experience-driven to knowledge-driven approaches.
Conclusions
This study utilizes knowledge graph technology to represent assembly process information for complex products. A method for constructing an assembly process knowledge graph based on a multilevel ontology modeling framework is proposed. Integrating the BERT-BiLSTM-CRF model with syntactic dependency analysis enables the automatic extraction of entities and the identification of relationships from assembly process documents. This text-mining approach significantly enhances the efficiency of knowledge graph generation. A case study with enterprise data demonstrates the method’s feasibility and application value in process knowledge retrieval and assembly process planning.
Future research will incorporate additional assembly process data and leverage advanced deep learning models, such as graph neural networks and Transformer architectures, to enhance the accuracy of knowledge extraction and the efficiency of knowledge graph construction. Real-time data streams will be utilized to enable dynamic updates of the knowledge graph, optimizing assembly process planning and decision support. The generalizability of the framework across various industries will be assessed, improving its interpretability and transparency while ensuring data privacy and model fairness. Furthermore, integrating multimodal data, such as images, videos, and sensor data, will be explored to enrich the content of the knowledge graph. Adaptive learning methods will also be investigated to enable real-time updates and ensure the knowledge graph can flexibly adapt to changes in the production environment.
Acknowledgements
We thank LetPub (www.letpub.com.cn) for its linguistic assistance during the preparation of this manuscript.
Authors' Contributions
Kunping Li wrote the manuscript. Jianhua Liu checked and improved the manuscript in writing. Sikuan Zhai designed the main codes. Cunbo Zhuang and Fengque Pei proposed the innovation ideas and theoretical analysis. All authors read and approved the final manuscript.
Funding
Supported by National Natural Science Foundation of China (Grant No. 52375479).
Data availability
The data are available from the corresponding author on reasonable request.
Declarations
Competing Interests
The authors declare no competing financial interests.
References
[1] Liu, JH; Sun, QC; Cheng, H et al. The state-of-the-art, connotation and developing trends of the products assembly technology. Journal of Mechanical Engineering; 2018; 54,
[2] Zou, S; Lu, B; Yu, CJ et al. Production problem query of manufacturing knowledge graph for knowledge reuse. Computer Engineering and Design; 2024; 45,
[3] Zhou, W; Li, SQ; Huang, YQ et al. Simulation-based planning of a kind of complex product general assembly line. Procedia CIRP; 2018; 76, pp. 25-30.
[4] Xie, C; Cai, HM; Xu, LD et al. Linked semantic model for information resource service toward cloud manufacturing. IEEE Transactions on Industrial Informatics; 2017; 6, pp. 3338-3349.
[5] Ruan, SJ; Liu, JH; Tang, CT et al. Multi-dimensional assembly process data management system for complex product. Computer Integrated Manufacturing Systems; 2015; 21,
[6] Zhou, B; Bao, JS; Chen, ZY et al. KGAssembly: Knowledge graph-driven assembly process generation and evaluation for complex components. International Journal of Computer Integrated Manufacturing; 2021; 35,
[7] Li, XZ; Wu, ZY; Goh, M et al. Ontological knowledge integration and sharing for collaborative product development. International Journal of Computer Integrated Manufacturing; 2017; 31,
[8] Meng, FQ; Yang, SS; Wang, JD et al. Creating knowledge graph of electric power equipment faults based on BERT–BiLSTM–CRF model. Journal of Electrical Engineering & Technology; 2022; 17,
[9] Feng, D; Chen, HN. A small samples training framework for deep Learning-based automatic information extraction: Case study of construction accident news reports analysis. Advanced Engineering Informatics; 2021; 47, 101256.
[10] Hu, ZQ; Pan, XY; Wen, SJ et al. Assembly process question answering system of wind turbines combining multi-modal knowledge graphs with LLMs. Journal of Machine Design; 2023; 40,
[11] Zhang, DH; Liu, ZY; Jia, WQ et al. A review on knowledge graph and its application prospects to intelligent manufacturing. Journal of Mechanical Engineering; 2021; 57,
[12] Bao, JS; Li, JJ; Yuan, Y et al. Augmented reality assembly method assisted by large language models. Aeronautical Manufacturing Technology; 2024; 67,
[13] Li, DX; Wang, CG; Bi, ZM et al. Object-oriented templates for automated assembly planning of complex products. IEEE Transactions on Automation Science and Engineering; 2013; 11,
[14] Zhang, JL; Gao, Q; Zhai, JF et al. Knowledge graph-based representation and extraction of process data. Modular Machine Tool & Automatic Manufacturing Technique; 2024; 12, pp. 158-163, 168. (in Chinese)
[15] Zhong, YF; Zhao, J; Zhang, LP. A hybrid object-oriented conditional random field classification framework for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing; 2014; 52,
[16] Bao, JS; Wu, DL; Cheng, QH et al. Information modeling and visualization of assembly fat model for large-scale product. Key Engineering Materials; 2013; 579–580, pp. 711-718.
[17] Zhang, YN; Yang, ZJ; Ding, H et al. Virtual assembly information expression based on XML technology and its application. Machinery Design & Manufacture; 2014; 9, pp. 205-207, 210. (in Chinese)
[18] Chen, G; Sun, W; Zhang, LY et al. Cross-session recommendation based on fusion of multiple interest points and multimodal knowledge graphs. Computer Engineering and Design; 2024; 45,
[19] Wang, Q; Wen, LQ; Li, JX et al. Modeling and optimization for aircraft final assembly line based on Petri net. Journal of Zhejiang University (Engineering Science); 2015; 49,
[20] Yang, L; Jiao, ZG; Lin, HB. Modeling and applied research in Petri net of virtual assembly program control. Advanced Materials Research; 2012; 482, pp. 264-269.
[21] Jiang, ZG; Xie, B; Zhu, S et al. Dynamic data flow-driven knowledge graph construction method for remanufacturing disassembly process. Computer Integrated Manufacturing Systems; 2024; 30,
[22] Gruhier, E; Demoly, F; Dutartre, O et al. A formal ontology-based spatiotemporal mereotopology for integrated product design and assembly sequence planning. Advanced Engineering Informatics; 2015; 29,
[23] Yin, XY; Qin, XQ; Shen, BY et al. Research of semantic modeling methods of intelligent factory key elements based on ontology and Petri-net technology. Modular Machine Tool & Automatic Manufacturing Technique; 2023; 1, pp. 173-178. (in Chinese)
[24] Li, ZF; Jian, Y; Xue, ZC et al. Text-enhanced knowledge graph representation learning with local structure. Information Processing & Management; 2024; 61,
[25] Dubey, M; Banerjee, D; Chaudhuri, D et al. EARL: joint entity and relation linking for question answering over knowledge graphs. The Semantic Web–ISWC 2018: 17th International Semantic Web Conference; 2018; Part I, pp. 108-126.
[26] Das, SK; Swain, AK. An ontology-based modelling and reasoning framework for assembly process selection. The International Journal of Advanced Manufacturing Technology; 2022; 120, pp. 4863-4887.
[27] Cao, QS; Beden, S; Beckmann, A. A core reference ontology for steelmaking process knowledge modelling and information management. Computers in Industry; 2022; 135, 103574.
[28] Chen, HN; Luo, XW. An automatic literature knowledge graph and reasoning network modeling framework based on ontology and natural language processing. Advanced Engineering Informatics; 2019; 42, 100959.
[29] Noy, N; Gao, YQ; Jain, A et al. Industry-scale knowledge graphs: lessons and challenges: Five diverse technology companies show how it’s done. Queue; 2019; 17,
[30] Rasmussen, MH; Lefrancois, M; Pauwels, P et al. Managing interrelated project information in AEC knowledge graphs. Automation in Construction; 2019; 108, 102956.
[31] Li, XL; Zhang, SS; Huang, R et al. Structured modeling of heterogeneous CAM model based on process knowledge graph. The International Journal of Advanced Manufacturing Technology; 2018; 96, pp. 4173-4193.
[32] Xiao, YZ; Zheng, S; Shi, JC et al. Knowledge graph-based manufacturing process planning: A state-of-the-art review. Journal of Manufacturing Systems; 2023; 70, pp. 417-435.
[33] Bian, SJ; Li, C; Fu, YW et al. Machine learning-based real-time monitoring system for smart connected worker to improve energy efficiency. Journal of Manufacturing Systems; 2021; 61, pp. 66-76.
[34] Yuan, MH; Deng, K; Chaovalitwongse, WA et al. Research on technologies and application of data mining for cloud manufacturing resource services. The International Journal of Advanced Manufacturing Technology; 2018; 99, pp. 1061-1075.
[35] Wang, F; Xu, TH; Tang, T et al. Bilevel feature extraction-based text mining for fault diagnosis of railway systems. IEEE Transactions on Intelligent Transportation Systems; 2016; 18,
[36] Berdyugina, D; Cavallucci, D. Automatic extraction of inventive information out of patent texts in support of manufacturing design studies using natural languages processing. Journal of Intelligent Manufacturing; 2023; 34,
[37] Sun, SC; Yun, J; Lin, HF et al. Granular transfer learning using type-2 fuzzy HMM for text sequence recognition. Neurocomputing; 2016; 214, pp. 126-133.
[38] Wu, YZ; Liu, Q; Chen, RR et al. A group recommendation system of network document resource based on knowledge graph and LSTM in edge computing. Security and Communication Networks; 2020; pp. 1-11.
[39] Wang, JP; Zhang, WS; Wang, YF et al. Constructing and inferring event logic cognitive graph in the field of big data. Scientia Sinica (Informationis); 2020; 50(7), pp. 988-1002.
[40] Liu, W; Xu, TG; Xu, QH et al. An encoding strategy based word-character LSTM for Chinese NER. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2019; pp. 2379-2389.
[41] Li, XY; Feng, JR; Meng, YX et al. A unified MRC framework for named entity recognition. Annual Meeting of the Association for Computational Linguistics; 2020; pp. 5849-5859.
[42] Wang, YK; Wu, ZQ. Military equipment knowledge graph construction based on entity relationship extraction. Modern Electronics Technique; 2024; 47,
[43] Fan, YY; Li, ZM. Research and Application Progress of Chinese Medical Knowledge Graph. Journal of Frontiers of Computer Science and Technology; 2022; 16,
[44] Wu, J; Zhang, AS; Wu, MD et al. Overview of research and application of knowledge graph in equipment fault diagnosis. Journal of Computer Applications; 2024; 44,
[45] Sorokin, D; Gurevych, I. Context-aware representations for knowledge base relation extraction. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; 2017; pp. 1784-1789.
[46] Jiang, MJ; Guo, Y; Huang, SH et al. A novel fine-grained assembly sequence planning method based on knowledge graph and deep reinforcement learning. Journal of Manufacturing Systems; 2024; 76,
[47] Xu, QH; Gao, PJ; Wang, JL et al. AKGNN-PC: An assembly knowledge graph neural network model with predictive value calibration module for refrigeration compressor performance prediction with assembly error propagation and data imbalance scenarios. Advanced Engineering Informatics; 2024; 60, 102403.
[48] Xia, LY; Lu, JF; Lu, YQ et al. Semantic knowledge-driven A-GASeq: A dynamic graph learning approach for assembly sequence optimization. Computers in Industry; 2024; 154, 104040.
[49] Zhang, MS; Yu, N; Fu, GH. A simple and effective neural model for joint word segmentation and POS tagging. IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2018; 26,
[50] Ma, YK; Zhang, JX. Research on metaphor detection method based on semantic graph representation and contrastive learning. Data Analysis and Knowledge Discovery; 2024; pp. 1-19. [2025-01-23]. (in Chinese) https://link.cnki.net/urlid/10.1478.g2.20241115.1454.002
[51] Chen, XJ; Jia, SB; Xiang, Y. A review: Knowledge reasoning over knowledge graph. Expert Systems with Applications; 2020; 141, 112948.
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”).