Authentication of ancient Chinese paintings is crucial to protecting and preserving cultural heritage. However, traditional authentication methods rely heavily on expert knowledge and experience and struggle to handle unstructured and multimodal information. In addition, the lack of interactive tools hinders humanities scholars from effectively applying advanced technologies. In this study, we present ACPAS, an intelligent authentication system that enables an expert-led, AI-assisted authentication model. The system uses large language models (LLMs) to interpret experts' needs and route them to the corresponding tool modules for processing. It integrates image processing, text retrieval, and structured databases, and employs interactive visualizations to support reasoning. Case studies and user evaluations demonstrate that ACPAS improves efficiency and the interpretability of results. It provides a new paradigm for cultural heritage protection and digital humanities research and promotes the deep integration of artificial intelligence and the humanities.
Introduction
The authentication of ancient paintings is an important part of art history research, bearing the responsibility of transmitting artistic traditions and protecting cultural heritage. It uses technical methods to analyze the pictorial and textual elements of artworks, infer their creators and era, and judge their authenticity, thereby safeguarding their artistic value and cultural integrity. Owing to differences in subject matter, materials, and artistic techniques between ancient Chinese paintings and Western paintings (artistic traditions originating in Europe and North America, particularly forms such as oil painting and watercolor), the authentication of ancient Chinese paintings has gradually formed a unique academic system.
Currently, there are two main authentication approaches for ancient Chinese paintings. The first is visual element comparison, which compares the brushwork, color, and other visual elements of the target painting with works of the same background (e.g., the same painter or the same period) to determine its authenticity. The other is historical material tracing, which starts from information such as inscriptions and seals in the work and deduces the painting's authenticity by analyzing a large body of historical materials. In practice, the two methods are usually carried out in parallel and complement each other. Traditional authentication work is typically carried out by experts1. However, authentication involves many factors and is complex, placing high demands on experts' knowledge and experience. In addition, the subjective judgment of experts can easily affect the accuracy of the results2. With the development of interdisciplinary integration, technical methods from other fields have gradually been applied to the authentication of ancient paintings, such as physical and chemical analyses of painting paper3 and pigments4,5. Although these methods provide a more scientific basis, they risk damaging the physical objects6.
Computer technology offers significant advantages in both the depth and breadth of analysis, making it a promising alternative. It is applied across various stages of the ancient painting authentication process. At the initiation stage, large-scale labeled image datasets are constructed7 to support subsequent rapid retrieval8 and comprehensive comparison9 of image information. During authentication, computer vision techniques can automatically perform image segmentation10, feature recognition11, and classification12,13 along multiple dimensions such as texture14 and style15. However, computer technology is currently applied mainly as an image analysis and processing tool. These cross-disciplinary methods are difficult for humanities scholars to apply, as they are not integrated with the actual working model and lack systematic analysis of the knowledge contained in ancient Chinese paintings.
Experts in the digital humanities need more efficient ways to express their needs and obtain information. The recent development of artificial intelligence, especially large language models (LLMs), has promoted the transfer and application of cross-domain knowledge by improving information retrieval and text generation capabilities16. Intuitive multimodal dialog systems have significantly improved the efficiency of human-computer collaboration17,18. In addition, the integration of retrieval-augmented generation (RAG) technology has further improved the interpretability and accuracy of the system19. Autonomous agents based on LLMs have shown significant advantages in intent understanding and reasoning and can provide advanced task orchestration capabilities, especially showing great potential in dealing with complex and dynamic tasks7. Using LLMs as a mediating tool for human-computer dialog has become an effective solution to break through domain barriers8.
Artificial intelligence (AI) agents9 have brought new ideas to the innovation of ancient Chinese painting authentication systems. By applying AI agents, we aim to use LLMs to understand user needs, convert them into instructions, and work with external models to complete tasks7. However, general AI agents lack dedicated methods and data for handling professional tasks and face many challenges. Data on ancient Chinese paintings are scattered, uneven in quality, and inherently multimodal (including seals, inscriptions, and images). In addition, existing encoding methods can detect visual components but often fail to capture deep semantics beyond the visual presentation, such as a painting's background and symbolic meaning. Fusing heterogeneous data and reasoning about complex cultural and historical knowledge remain difficult. Therefore, we designed an AI agent system specifically for the authentication of ancient Chinese paintings. We integrate multimodal data, build a knowledge representation and training framework adapted to the characteristics of this field, and provide systematic auxiliary tools and smooth interaction modes. The system promotes collaboration between experts and intelligent technology in actual work, leveraging the strengths of both.
In this study, we propose an ancient Chinese painting authentication system (ACPAS), which aims to help authentication experts work efficiently with the support of intelligent technology. We first sorted out the workflow of ancient Chinese painting authentication and clarified the key requirements of each step. On this basis, we integrated relevant functional modules and structured databases to build a system prototype. With the help of AI agents, expert intentions are decomposed into specific tasks that the modules process collaboratively, realizing an efficient authentication model led by experts and assisted by intelligent technology. We evaluated the system and verified its effectiveness. This system not only promotes the application of intelligent technology in digital humanities but also provides a new direction for innovation in ancient Chinese calligraphy and painting authentication methods.
In summary, the contributions of this paper are as follows: (1) We propose an intelligent authentication workflow oriented to practical needs. We systematically extract the key steps of the existing process and design and integrate corresponding technical modules to optimize them; (2) We built a proof-of-concept prototype system (ACPAS). The system provides an intuitive interface that supports experts in communicating through natural language. A visual component is introduced to help experts clarify the complex relationships among clues and provide authoritative citation evidence, improving the reliability and interpretability of authentication results; (3) We validated the system through case studies and user studies. The results show that the system helps experts conduct more comprehensive, fine-grained analysis and improves the efficiency and balance of human-computer collaboration in artwork authentication. Expert feedback provides valuable suggestions for the future optimization of the system.
Methods
This section introduces the design strategy and overall architecture of ACPAS. First, through expert interviews, we refined the authentication workflow and summarized the key steps for which intelligent technology can provide effective support. On this basis, we defined the core requirements of the authentication task and converted them into the functional specifications required for system design. Finally, we explain the architecture of ACPAS, outlining its main modules, implementation methods, and role in solving the practical challenges faced by authentication experts.
Workflow sorting
Over the past three years, we have worked closely with 10 experts (E1-E10) in the field of traditional Chinese culture to gain an in-depth understanding of the model and details of the authentication work and summarize a set of authentication requirements. The participating experts are aged between 20 and 55, with a balanced gender ratio. Among them, four experts focus on ancient Chinese painting research, one focuses on Chinese calligraphy research, and the remaining five are engaged in digital humanities research. All experts have more than two years of research experience.
Specifically, we invited these 10 experts to share their practices in the authentication of ancient Chinese paintings and conducted follow-up interviews. We fully recorded the steps and methods involved, and gained insights into the experts’ thought processes. We focused on discussing two issues: (1) the focus of the authentication work; (2) the problems encountered in previous authentication work. In addition, we also discussed the differences and common problems of each expert’s work through collective meetings, and finally determined the key links and challenges of the current authentication work.
We divide the authentication work into three stages, as shown in Fig. 1.
[See PDF for image]
Fig. 1
Authentication workflow and optimization space.
Step 1: Determine the object. This step involves multimodal information, including seals, images, inscriptions, and other endogenous information in paintings. Step 2: Evidence collection. Experts need to consult ancient books and images to filter valuable information out of massive data. Step 3: Reasoning and judgment. The final judgment requires integrating all previous information; differences in experts' early focus often lead to divergent conclusions. R1-R6 indicate where computer technology can optimize each stage.
Step 1: Determine the object. This step identifies the target elements for authentication. A reasonable entry point has a significant impact on the efficiency of subsequent work. Eight experts believe that this step places high demands on the authenticators' knowledge and experience.
Step 2: Evidence collection. This involves collecting relevant textual and visual materials. All ten experts consider this the most complex stage and the one most prone to information loss. Experts pointed out that existing retrieval systems lack structured resources for ancient Chinese paintings.
Step 3: Reasoning and judgment. This stage tests an expert's ability to analyze multiple lines of evidence in parallel. We found that experts use methods such as mind maps and color cards to summarize and organize clues, with no unified pattern.
Design requirements
Based on analysis and structuring of the key steps in the authentication process, we extracted the design requirements (R1-R6) of ACPAS to ensure that the system functions fit the actual work scenarios. The structure and process design of the system are detailed in Fig. 1.
Authentication requirement parsing (Step 1)
R1: Image co-segmentation. The system should support experts in autonomously adjusting image segmentation strategies to improve annotation accuracy and retrieval efficiency.
R2: LLM-driven task parsing. The system should leverage LLMs to interpret expert needs and assist in task allocation and tool activation.
Multimodal entity retrieval (Step 2)
R3: Image similarity retrieval. The system should use the database to match similar images and assist experts in performing similarity analysis.
R4: Text content query. The information in a painting should be cross-validated with the literature, including relationships among people, places, time periods, and events. These clues are retrieved from the document corpus via expert-driven queries to provide references and evidence.
Evidence link reasoning (Step 3)
R5: Information filtering. With the support of LLM interaction and tool analysis, experts can screen and refine the chain of authentication evidence to assist judgment.
R6: Visualization interface. Multidimensional data should be integrated to assist expert judgment, and authentication materials and results should be clearly presented with the help of icons and colors.
System design
This section introduces ACPAS, an LLM-based AI agent system designed to assist experts through intelligent task decomposition and multimodal reasoning in the authentication workflow. Its interface is shown in Fig. 2.
[See PDF for image]
Fig. 2
ACPAS interface design.
Experts can upload paintings for authentication in the original painting view (Fig. 2A) and perform free-form segmentation. We integrate interactive tools, based on SAM, for refining the segmentation results. Users can apply scribble hints10 (scribbles, clicks, or bounding boxes; Fig. 2a1) to adjust structures that the algorithm fails to recognize, helping experts refine the image content more accurately (R1). The authentication element candidates view allows experts to temporarily store elements of interest during the authentication process (Fig. 2C). Elements from the target painting are saved in Fig. 2c1, while Fig. 2c2 shows nodes selected from the graph view for further investigation. These stored items support subsequent multimodal interactive queries. We encapsulate a two-level agent architecture based on the DeepSeek-V3 (671B) API. The authentication dialog view (Fig. 2D) enables experts to submit text queries or combine them with image inputs. The system performs query parsing and task decomposition, dispatching subtasks to appropriate modules (R2). Results are then aggregated and presented as textual and graphical responses in the graph view (R6). The authentication graph view (Fig. 2B) shows the interrelated entities extracted from the dedicated database, and experts can click on them to view details (Fig. 2b1). The system allows authentication experts to filter and review historical question-answering rounds (Fig. 2b2). We also designed an interactive panel that helps experts adjust the similarity threshold (Fig. 2b3) to narrow the scope of matching (R5).
The technical route and functional module design of ACPAS are shown in Fig. 3.
[See PDF for image]
Fig. 3
ACPAS system design.
The system supports experts in raising requirements through dialog, and uses AI agents to interpret and assign tasks. Relying on relevant databases, the system builds functional modules such as digital image matching and text retrieval to provide technical support for experts, and assists experts in reasoning and judgment through interactive visualization solutions.
Data sources
The data sources for this study are ACP (Ancient Chinese Paintings), CBDB (China Biographical Database)11, CTP (China Text Project)12 and CHFSD (Chinese Historical Figure Seal Data)13. ACP is based on the 62 volumes and 232 books of painting collections published in the “Series of Ancient Chinese Paintings” cultural project14, and refers to the list of painting collections and related art reviews included therein. Currently, ACP has collected 3596 paintings by 1529 artists, and it is expected that 10–30 segments can be extracted from each painting. The database is constantly expanding as more works are processed and collected; CBDB contains biographical information about 641,568 people, including Kinship, Non-kinship associations, Status, Writings (referring to all forms of a person’s creative output recorded in the CBDB) and other information; CTP includes digital ancient Chinese texts from the pre-Qin period (approximately the 21st century BCE to 221 BCE) to the Republic of China; CHFSD contains more than 2000 people and more than 30,000 seals. It covers the names and Chinese pinyin of the people, their pen names, their birth and death years, their native places, the eras they belonged to, the names of libraries (rooms) or studios, brief introductions, seal graphics, seal explanations, and seal sources.
Image processing module
In ancient Chinese paintings, image data includes segments and seals. Segments reflect the artist’s style and technique, while seals, which are often reused and uniform in color and shape, may come from the artist or later collectors. Seals are easier to trace and useful for inferring a painting’s context via owner identification. The image processing module supports the matching of segments and seals to assist in authentication (R3).
The matching process includes full-image matching and segment-based matching methods. To improve efficiency and accuracy, we use semantically meaningful segments pre-extracted from each painting as matching units15. Although this increases the number of objects in the vector space, this method is more consistent with the authentication workflow and enables more accurate retrieval (Fig. 4).
[See PDF for image]
Fig. 4
Image segment matching function module.
Each image is segmented into multiple semantically meaningful segments, which are then encoded for similarity retrieval.
Initial segmentation is performed via the segment anything model (SAM)20, which employs a Vision Transformer (ViT)21 to extract global and detailed features. In a human study, SAM was evaluated on 23 unseen datasets and consistently received high-quality ratings (mean scores 7–9), demonstrating strong zero-shot generalization. We then use multimodal fusion22 to embed both traditional and deep features, guided by expert-defined criteria including texture, color, and edge characteristics, and convert them into vector representations.
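For concreteness, the following is a minimal sketch of how automatic mask generation with SAM can be invoked to produce candidate segments. It assumes the publicly released segment-anything package and a locally downloaded ViT-H checkpoint; the file names and the area threshold are illustrative assumptions, not the system's actual configuration.

```python
# Minimal sketch: automatic segment extraction with SAM.
# Assumes the `segment_anything` package and a local ViT-H checkpoint;
# the checkpoint path and the post-filtering threshold are illustrative.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("painting.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # dicts with 'segmentation', 'area', 'bbox'

# Keep only reasonably large regions as candidate segments (threshold is a guess).
segments = [m for m in masks
            if m["area"] > 0.001 * image.shape[0] * image.shape[1]]
```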
To extract texture features, we apply the Local Binary Pattern (LBP)23 with uniform mapping (P = 8, R = 1), which effectively captures the image's texture structure:

$$\mathrm{LBP}_{P,R} = \sum_{i=0}^{P-1} s(p_i - p_c)\,2^{i}, \qquad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{1}$$

where pc represents the grayscale value of the central pixel, pi denotes the grayscale value of the i-th neighbor of pc, and P is the number of neighboring pixels. The LBP operator compares each pixel with its neighbors to generate a binary code based on intensity differences, which is then converted to a decimal label. We compute a 59-dimensional normalized histogram over the entire image to represent its texture.

Color features are obtained via K-Means clustering (k = 8) in Lab color space24, yielding the pixel ratio of each cluster. For structural features, the Canny edge detector is used to produce a binary edge map25, from which we compute edge densities over a 4 × 4 grid, forming a 16-dimensional vector.
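A minimal sketch of these three handcrafted descriptors is given below, assuming OpenCV, scikit-image, and scikit-learn. Parameters stated in the text (P = 8, R = 1, k = 8, 4 × 4 grid) are used directly; the Canny thresholds are assumptions.

```python
# Sketch of the handcrafted descriptors described above. Parameters follow
# the text where given; the Canny thresholds are illustrative assumptions.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.cluster import KMeans

def handcrafted_features(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)

    # Texture: uniform LBP (P=8, R=1) -> 59-bin normalized histogram.
    lbp = local_binary_pattern(gray, P=8, R=1, method="nri_uniform")
    tex, _ = np.histogram(lbp, bins=59, range=(0, 59), density=True)

    # Color: K-Means (k=8) in Lab space -> pixel ratio of each cluster.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab).reshape(-1, 3).astype(np.float32)
    labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(lab)
    col = np.bincount(labels, minlength=8) / labels.size

    # Structure: Canny edge map -> edge density over a 4x4 grid (16-dim).
    edges = cv2.Canny(gray, 100, 200)
    h, w = edges.shape
    dens = [edges[i*h//4:(i+1)*h//4, j*w//4:(j+1)*w//4].mean() / 255.0
            for i in range(4) for j in range(4)]

    return np.concatenate([tex, col, np.array(dens)])  # 59 + 8 + 16 dims
```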
To complement these low-level visual attributes, we extract high-level semantic features using a pre-trained ResNet-101 network26, which learns abstract representations via its deep residual architecture. Finally, the traditional features and the 2048-dimensional deep features from ResNet-101 are concatenated into a unified segment-level representation. This fused vector encodes both low-level and high-level visual cues, allowing for accurate similarity computation across image segments and robust matching between paintings:

$$\mathrm{sim}(f_1, f_2) = \frac{f_1 \cdot f_2}{\|f_1\|\,\|f_2\|} \tag{2}$$

where f1 and f2 are the feature vectors of two segments, and ||f1|| and ||f2|| are their respective norms. In this way, we can calculate the similarity between each pair of segments. Subsequent modules also use the same method during the seal matching process.

We evaluate the performance of image matching in a classification task. Paintings by four prolific Chinese artists (Shi Tao, Zhu Da, Wang Hui, and Wen Zhengming) were segmented into 2934, 1915, 1578, and 1244 slices, respectively. For each artist, 300 segments were used as the test set, with the remainder used to fine-tune ResNet-101. As shown in the confusion matrix (Fig. 5), the classification accuracy ranges from 75% to 85%, demonstrating the model's effectiveness in capturing artistic style and visual features, and its potential to support expert-level image retrieval.
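As a concrete illustration of this fused representation and the similarity computation in Eq. (2), the sketch below extracts 2048-dimensional features with a torchvision ResNet-101 (final classification layer removed) and compares concatenated vectors by cosine similarity. The preprocessing constants are the standard ImageNet values, an assumption rather than the paper's reported setting.

```python
# Sketch: 2048-d deep features from ResNet-101 (fc layer dropped), concatenated
# with the handcrafted vector, then compared by cosine similarity (Eq. 2).
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V2)
encoder = nn.Sequential(*list(backbone.children())[:-1]).eval()  # -> 2048-d

prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats (assumed)
                         std=[0.229, 0.224, 0.225]),
])

def deep_features(path):
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return encoder(x).flatten().numpy()  # shape (2048,)

def cosine(f1, f2):
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

# Segment representation = [handcrafted | deep]; e.g.
# sim = cosine(np.concatenate([hc1, df1]), np.concatenate([hc2, df2]))
```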
[See PDF for image]
Fig. 5
The confusion matrix for image matching shows the model’s precision and recall performance in classifying painting segments by painter.
Seal matching
Seal variations result from stamping pressure, wear, and tilt, with occasional forgeries. Matching similar seals helps provide origin clues for expert analysis. To support this, all seals are standardized during database construction (Fig. 6).
[See PDF for image]
Fig. 6
Seal matching function module.
Seal images from different paintings are pre-processed and standardized, and then compared using cosine similarity for retrieval.
The entire process involves denoising, super-resolution enhancement, rotation correction, color calibration, binarization, and morphological processing. We first apply Gaussian filtering27 to remove noise from the seal image:

$$G(x, y) = \sum_{i=-k}^{k}\sum_{j=-k}^{k}\frac{1}{2\pi\sigma^{2}}\,e^{-\frac{i^{2}+j^{2}}{2\sigma^{2}}}\,f(x-i,\, y-j) \tag{3}$$

where G(x, y) represents the value of the filtered image at position (x, y), f(i, j) is the original pixel value at (i, j), σ is the Gaussian standard deviation, and k is the radius of the Gaussian kernel. We apply ESPCN-based super-resolution (OpenCV) to enhance seal resolution28. Then, the Hough Transform29 is used to estimate the seal's rotation angle θ, which is corrected based on the line parameterization:

$$\rho = x\cos\theta + y\sin\theta \tag{4}$$

where (x, y) represents the edge points in the image; the Hough Transform maps the lines passing through these edge points into polar coordinates, with ρ the shortest distance from the line to the origin and θ the angle between the line's normal and the x-axis. The most probable angle is then selected for correction.

For color calibration, the seal image is converted to Lab space, where histogram equalization is applied to the L channel to enhance contrast30. The adjusted L channel is merged with the original A and B channels, and the result is converted back to RGB. Binarization is then performed using Otsu's thresholding method31:
$$\sigma_B^{2} = \omega_{0}\,\omega_{1}\,(\mu_{0} - \mu_{1})^{2} \tag{5}$$

where ω0 and ω1 represent the weights of the two pixel classes after image segmentation, μ0 and μ1 are the means of the two classes, and σB² is the between-class variance, which Otsu's method maximizes over candidate thresholds. Morphological processing includes erosion and dilation operations to optimize the fine details of the seal32. Specifically, let B be the binary seal mask, where (x, y) indexes pixel coordinates, and S a flat 5 × 5 structuring element. We perform:

$$(B \ominus S)(x, y) = \min_{(i, j) \in S} B(x+i,\, y+j) \tag{6}$$

$$(B \oplus S)(x, y) = \max_{(i, j) \in S} B(x+i,\, y+j) \tag{7}$$

$$B \circ S = (B \ominus S) \oplus S \tag{8}$$

where Eq. (6) is erosion, Eq. (7) is dilation, and Eq. (8) is the opening obtained by applying them in sequence to remove small artifacts while preserving the seal's overall shape.
Similar to segment matching, the processed seal images are encoded and matched using cosine similarity. We selected 30 frequently used seals (≥2 occurrences) from 377 Shi Tao seal images and randomly chose one image per seal for retrieval. A match was considered successful if a homologous seal appeared in the top 3 (excluding the query). The method achieved 93.33% accuracy, confirming its effectiveness on simple seal images.
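The pipeline below sketches the whole pre-processing chain in OpenCV, in the order described above. The ESPCN model file, the Canny/Hough parameters, and the kernel sizes are illustrative assumptions; the cv2.dnn_superres module requires the opencv-contrib-python build.

```python
# Sketch of the seal pre-processing chain described above. Parameters and the
# ESPCN model path are illustrative assumptions, not the paper's exact values.
import cv2
import numpy as np

def preprocess_seal(bgr):
    # 1. Gaussian denoising (Eq. 3); the 5x5 kernel is an assumed radius.
    img = cv2.GaussianBlur(bgr, (5, 5), 0)

    # 2. ESPCN super-resolution via OpenCV's dnn_superres module
    #    (needs opencv-contrib-python and a pre-trained ESPCN_x2.pb file).
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel("ESPCN_x2.pb")
    sr.setModel("espcn", 2)
    img = sr.upsample(img)

    # 3. Rotation correction: estimate the dominant line angle with the
    #    Hough Transform (Eq. 4) and rotate the image back.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    lines = cv2.HoughLines(cv2.Canny(gray, 50, 150), 1, np.pi / 180, 100)
    if lines is not None:
        theta = lines[0][0][1]               # angle of the most-voted line
        angle = np.degrees(theta) - 90.0     # deviation from upright (heuristic)
        h, w = gray.shape
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        img = cv2.warpAffine(img, M, (w, h))

    # 4. Color calibration: equalize the L channel in Lab space.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)
    l, a, b = cv2.split(lab)
    img = cv2.cvtColor(cv2.merge([cv2.equalizeHist(l), a, b]), cv2.COLOR_Lab2BGR)

    # 5. Otsu binarization (Eq. 5) and 5x5 morphological opening (Eqs. 6-8).
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```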
Text retrieval module
The text retrieval module processes non-image queries, often based on expert interpretation of inscriptions or exploratory inquiries. Upon receiving a query, the system uses a Classifier with prompt guidance (Fig. 8) to perform semantic analysis and extract keywords, including entities (e.g., artist names) and relations (e.g., “create”, “collect”)33. Entity matching is conducted by searching structured attributes (e.g., ID, author, date) and relevant long-form literature entries (R4) (Fig. 7).
[See PDF for image]
Fig. 7
Text retrieval function module.
User queries are processed by named entity recognition (NER) to extract key nouns, which are then used to retrieve relevant records from the structured database and related literature. Retrieved results are converted into visualized entities for expert interpretation.
This process relies on the BM25 retrieval algorithm34, which assigns a score to each entry related to the queried entity, based on factors such as term frequency, document frequency, and other parameters:

$$\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)} \tag{9}$$

The matching score between a document D and a query Q depends on the term frequency f(t, D) of each keyword t in Q, the length of document D denoted as |D|, and the average length avgdl of all documents in the database. The parameters k1 and b are tuning constants related to these factors. Additionally, the inverse document frequency (IDF) of a term t is defined as follows:

$$\mathrm{IDF}(t) = \log \frac{N - n_t + 0.5}{n_t + 0.5} \tag{10}$$

where N is the total number of documents in the database, and nt is the number of documents that contain the term t.

BM25 is first used to filter reference records containing key entities. To address semantic variations in long-form descriptive texts (e.g., provenance, historical background), we adopt a Chinese-specific BERT-wwm model35 for fuzzy relation matching. Both the expert query and reference texts are embedded using the BERT architecture36, where the CLS token vector represents the overall sequence semantics. This allows the system to capture meaning beyond literal phrasing, including lexical and contextual relationships. Cosine similarity is then computed between vectors to retrieve semantically relevant records, overcoming BM25's limitations in string-based matching.
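A compact sketch of this two-stage retrieval is shown below, using the rank_bm25 package for Eqs. (9) and (10) and the hfl/chinese-bert-wwm checkpoint from Hugging Face for CLS embeddings. Character-level tokenization, the example records, and the candidate cutoff k are simplifying assumptions.

```python
# Sketch: BM25 filtering followed by CLS-embedding re-ranking with a Chinese
# whole-word-masking BERT. Corpus entries and cutoffs are illustrative.
import numpy as np
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoModel, AutoTokenizer

docs = ["石涛晚年定居扬州，筑大涤草堂。",      # illustrative literature records
        "大涤草堂的建造年代，诸家考证不一。"]
bm25 = BM25Okapi([list(d) for d in docs])      # character-level tokens

tok = AutoTokenizer.from_pretrained("hfl/chinese-bert-wwm")
bert = AutoModel.from_pretrained("hfl/chinese-bert-wwm").eval()

def cls_vec(text):
    with torch.no_grad():
        out = bert(**tok(text, return_tensors="pt",
                         truncation=True, max_length=512))
    return out.last_hidden_state[0, 0].numpy()  # [CLS] token embedding

def retrieve(query, k=20, top=5):
    # Stage 1: BM25 keeps the k candidates containing the key entities.
    cand = np.argsort(bm25.get_scores(list(query)))[::-1][:k]
    # Stage 2: re-rank candidates by cosine similarity of CLS embeddings.
    q = cls_vec(query)
    sims = [(i, float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
            for i, v in ((i, cls_vec(docs[i])) for i in cand)]
    return sorted(sims, key=lambda s: -s[1])[:top]
```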
We selected the document records corresponding to 312 paintings by Shi Tao, and used DeepSeek-V3 to generate 100 query–document pairs, which were manually verified. An ablation study was then conducted using BM25, BERT-wwm, and their combined approach (BM25 + BERT-wwm). The results, shown in Table 1, indicate that the hybrid method achieves the best performance in terms of both accuracy and coverage for retrieving relevant documents.
Table 1. Ablation study results on text retrieval
| Model | Precision | Recall | F1 |
|---|---|---|---|
| BM25 | 0.512 | 0.647 | 0.572 |
| BERT-wwm | 0.649 | 0.837 | 0.731 |
| BM25 + BERT-wwm | 0.728 | 0.932 | 0.817 |
Prompt-based guidance
As shown in Fig. 8, we encapsulate a two-level LLM pipeline as an agent to solve the problem7,8,37. In this framework, expert queries are decomposed into tasks, each handled by a dedicated submodule. This design enables multimodal feedback to provide supporting evidence for authentication. Each submodule follows strict input-output rules. The image processing and text retrieval modules serve as the Matcher and Retriever, returning matched images and associated records (R2).
[See PDF for image]
Fig. 8
Prompt-based guidance.
Through prompt engineering with tailored prompts and case examples38, we ensure that the Classifier and Analyzer produce outputs in a standardized JSON format. The expert’s query is first input to the Classifier, which interprets intent and decomposes it into one or more tasks assigned to submodules (Matcher, Retriever, Analyzer). In the generated JSON, ‘id’ indicates the dialog round, ‘task’ specifies the assigned module(s), and ‘query’ and ‘data’ carry the necessary inputs39. If assigned to the Retriever, keywords are extracted into the keyword field; if an image is present, the Classifier determines its relevance and includes it in the data field for the Matcher. Task-specific instructions are stored in the task description field.
The Analyzer will receive one or more outputs from the preceding modules as inputs to provide specific and reliable responses to the expert’s queries. If the Classifier assigns a task to the Matcher or Retriever, the information retrieved from the database (e.g., literature records or details of specific paintings) will be encapsulated in the feedback field and passed as input to the Analyzer. To ensure contextual coherence, the Analyzer also incorporates task descriptions, case examples, and a history of prior queries and responses into its prompt, enabling a more accurate understanding of the identification scenario40.
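To make the task routing concrete, the sketch below shows a hypothetical Classifier output in the standardized JSON format and the dispatch logic that forwards subtasks to the Matcher and Retriever. Field names follow the description above; the handler functions and values are placeholders, not the system's actual implementation.

```python
# Illustrative sketch of the Classifier-to-submodule dispatch. JSON field
# names follow the text ('id', 'task', 'query', 'keyword', 'data',
# 'task description'); handlers and values are hypothetical placeholders.
import json

classifier_output = json.loads("""
{
  "id": 3,
  "task": ["Retriever", "Matcher"],
  "query": "When was the Dadi Thatched Cottage built?",
  "keyword": ["Shi Tao", "Dadi Thatched Cottage"],
  "data": "uploaded_seal_segment.png",
  "task description": "Cross-check inscription entities against literature."
}
""")

def dispatch(msg, matcher, retriever):
    feedback = []
    if "Retriever" in msg["task"]:
        feedback.append(retriever(msg["keyword"]))   # literature records
    if "Matcher" in msg["task"]:
        feedback.append(matcher(msg["data"]))        # similar images/seals
    # The Analyzer receives the aggregated feedback plus dialog history.
    return {"id": msg["id"], "query": msg["query"], "feedback": feedback}
```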
To evaluate the effectiveness of prompt design41, we conducted an ablation study on the two major prompts used in our system (Table 2). For the Classifier, we tested how prompt components such as role specification, structural formatting, and few-shot examples affect tool assignment accuracy across 20 expert queries. The full prompt achieved 100% structural validity and 95% task assignment accuracy, while removing structural constraints reduced validity to 30%. For the Analyzer, we assessed output quality on 20 multimodal queries rated by two experts. The full prompt yielded 100% valid outputs with an average rating of 4.28/5, while ablated prompts showed noticeable declines in both structure and content.
Table 2. Prompt ablation results for tool assignment and answer generation.
| Module | Prompt setting | JSON structure accuracy | Task assignment accuracy | Expert rating |
|---|---|---|---|---|
| Module 1 (tool assignment) | Full prompt | 100% | 95% | / |
| | No role specification | 100% | 90% | / |
| | No structure constraints | 45% | 30% | / |
| | No examples (few-shot) | 75% | 60% | / |
| Module 2 (answer generation) | Full prompt | 100% | / | 4.28 |
| | No role specification | 100% | / | 3.45 |
| | No structure constraints | 60% | / | 3.82 |
| | No examples (few-shot) | 80% | / | 4.12 |
Visual interface
The image and text information obtained by the Matcher and Retriever are converted into nodes and edges and visualized in the system's graph module, providing multimodal auxiliary authentication support to the expert in both graphical and textual formats. We designed a visualization scheme to present the entities obtained by the system. After the expert submits a query, the relevant nodes are activated. Figure 9 shows the detailed information presented when the mouse hovers over a node.
[See PDF for image]
Fig. 9
Four types of entities.
(a) the painting entity; (b) the literature entity; (c) the seal entity; (d) the character entity.
The edges between nodes are divided into three categories (Fig. 10): (1) the relationship of similarity, such as between images or seals, visualized as pie-chart glyphs that display similarity scores and rankings on hover; (2) the relationship of subordination, such as painting-painter or seal-user associations, represented by dedicated icons; and (3) the relationship based on literature research, including scholarly commentary or recorded artist opinions, shown using a literature icon that reveals detailed literature content on hover. To enhance clarity, each relationship type is color-coded. This view enables experts to intuitively explore multimodal relationships through both visual and interactive elements.
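As an illustration, hypothetical node and edge records behind this graph view might look as follows; the field names are assumptions, while the entity types, edge categories, and colors follow Figs. 9 and 10.

```python
# Hypothetical records behind the graph view; field names are illustrative,
# entity types, edge categories, and colors follow Figs. 9-10.
node = {"id": "seal_0421",
        "type": "seal",              # painting | literature | seal | character
        "label": "Dadizi",
        "detail": "shown on hover"}

edge = {"source": "seal_0421",
        "target": "seal_0007",
        "type": "similarity",        # similarity | subordination | literature
        "score": 0.83,               # pie-chart glyph on similarity edges
        "color": "#FF1A1A"}          # red = seal-related relationship
```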
[See PDF for image]
Fig. 10
Three types of relationships between entities.
(a) the relationship of similarity; (b) the relationship of subordination; (c) the relationship based on literature research. Blue (#4B80FA) indicates relationships related to paintings; red (#FF1A1A) represents relationships related to seals; green (#C4D6A0) represents relationships related to literature.
Results
We invited the 10 experts mentioned in the "Methods" section to complete testing tasks and provide feedback. We demonstrated the functionality and usage of ACPAS through two case studies and conducted user studies to evaluate the system along three dimensions.
Case studies
We randomly selected a painting from the image database as the authentication target and asked the experts to use the system to identify it. We documented this process in detail and selected two representative case studies, described below, to illustrate the system's capabilities and usage.
Case 1: E2 initially suspected the painting might be by Shi Tao based on the signature but, being unfamiliar with his work, began by examining the seals and inscriptions. She searched for seal information and retrieved paintings with seals that had a similarity of over 70%. By comparing seals and creation times of related paintings, E2 formed a preliminary estimate of the painting’s date. She then noticed that the inscription mentioned “Dadi Thatched Cottage (大滌草堂)” and asked the system for specific information about Dadi Thatched Cottage. The system fed back relevant records of Shi Tao and “Dadi Thatched Cottage” in Li Lun’s “Biography of Dadizi” through literature and historical data matching. She further asked about the construction time of Dadi Thatched Cottage, hoping to narrow the creation time of the painting. This question involves the textual research conclusions of different scholars. The system integrated the textual research views of five scholars, including Wang Shiqing and Chen Guoping, through literature retrieval, and provided the textual research documents for E2 to read and refer to in detail. Finally, E2 combined the above analysis and believed that this painting was most likely created by Shi Tao when he settled in Yangzhou in his later years, and the creation time may be between 1696 and 1707. E2’s inquiry process and the system’s feedback are shown in Fig. 11.
[See PDF for image]
Fig. 11
Record of the process of E2 using ACPAS to authenticate the target painting.
Case 2: E5, an expert in landscape painting familiar with Shi Tao's work, initially judged the target painting to be from Shi Tao's later years in Yangzhou based on the inscriptions. However, she noted a lack of strong personal characteristics and requested additional works for comparison. She retrieved four other late-period paintings by Shi Tao and compared their techniques and styles. Intrigued by the use of colored ink dots on the rocks, E5 asked whether this stylistic feature appeared in Shi Tao's other works. Through image similarity and literature retrieval, the system returned a visually and technically similar painting. She then inquired about Shi Tao's compositional features. The system retrieved relevant content from Shi Tao's treatise "Ku Gua He Shang Hua Yu Lu", which provided E5 with Shi Tao's insights on composition and summarized his commonly used composition methods. After comparison and analysis, E5 concluded that the target painting was most consistent with the "three-fold and two-section" composition, and she wanted to obtain more paintings by Shi Tao with this composition. The system returned five paintings related to this composition mentioned in relevant art reviews for her to review, along with research literature on the composition style of Shi Tao's landscape paintings. In the end, E5 judged that this work matched Shi Tao's creative ideas and style in technique, composition, and other respects, and that it was plausibly an authentic work by Shi Tao. E5's inquiry process and the system's feedback are shown in Fig. 12.
[See PDF for image]
Fig. 12
Record of the process of E5 using ACPAS to authenticate the target painting.
User studies
In this section, we aim to clarify the interactive experience of ancient Chinese painting authentication experts using ACPAS and the interpretability of the visual design. We hope to verify three hypotheses (H1-H3) through user studies:
H1: Interactive experience. The system's interactive tools can support experts in expressing their needs smoothly and efficiently and in obtaining corresponding functional support.
H2: Visual cognitive compatibility. The visual design of the system conforms to the cognitive mode of experts and has good interpretability, which makes it easy for experts to obtain meaningful information and form their own opinions.
H3: System usability. The system performance and functional design can help experts complete their tasks efficiently, and the overall process is reasonable and practical.
After the ten experts completed the system exploration, we invited them to evaluate and provide feedback on the system based on the three dimensions in H1-3. The process was divided into two stages: (1) filling out the System Usability Scale (SUS)42; (2) obtaining feedback through interviews. All expert quotes included in this section were drawn from themes that emerged repeatedly across participants and were selected to illustrate typical rather than exceptional responses. Full interview notes are available in the supplementary materials.
Evaluation results
An efficient interactive experience (H1). E4 and E6 found the system easy to use, noting that they quickly grasped the interaction mechanisms of each view after a brief trial. E7 further emphasized the system’s intuitive design, noting that “the system’s interaction panel takes into account the user’s prior knowledge. When I actually used the interface, I found that many interaction modes were designed that authentication researchers were familiar with, so they were easier to understand and use”.
E4 echoed this sentiment, highlighting the system’s ability to bridge technical functionality and user accessibility, saying that “the best contribution of this system is that it can apply corresponding advanced technologies in a simple way of question and answer. For many technologies, I don’t need to look up materials to understand what they can do specifically”. These responses affirm the validity of H1, demonstrating that the system supports an efficient and user-friendly interaction experience.
Clear and cognitively compatible visual design (H2). Experts agreed that the visual graphics displayed in the system intuitively conveyed the multimodal information involved in the authentication process. E1, E2, and E5 mentioned that the visual presentation of the information network unified a large amount of trivial information in the authentication work into the associated network view. This information integration greatly improved the efficiency of the authentication work. E5 said that “through the visual presentation of nodes and relationships, combined with the screening of inquiry rounds, it is possible to sort out authentication clues from massive data and assist in the reasoning process in a new visual mode. I can quickly and accurately lock the information I need from these view windows. These visual designs make my thinking clearer”. These reflections support the validation of H2.
Excellent system usability (H3). The results of the SUS questionnaire are shown in Fig. 13. The SUS consists of 10 questions, with odd-numbered items phrased positively and even-numbered items phrased negatively. To compute the SUS score, responses are first rescaled: for odd-numbered items, scaled score = original score − 1; for even-numbered items, scaled score = 5 − original score. The scaled scores are then summed and multiplied by 2.5 to yield a score out of 100. The resulting score is 85.3, indicating a good level of usability. E3 recognized the system's ability to respond quickly to user needs and to adapt flexibly to changing situations, which is particularly valuable given the variety and quantity of content in authentication work. E2 believed that the system improved the efficiency of authentication work, explaining that "in the past, I needed to filter out potentially valuable comparative materials based on my experience and memory. This took a lot of time and effort. With this system, I only need to express my needs to obtain the corresponding materials, which reduces a lot of work".
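For reference, the scoring rule can be expressed in a few lines; the responses below are illustrative, not actual study data.

```python
# Worked example of the SUS scoring rule described above.
def sus_score(responses):                 # responses: 10 Likert ratings, 1-5
    scaled = [(r - 1) if i % 2 == 0 else (5 - r)   # items 1,3,5,... positive
              for i, r in enumerate(responses)]
    return sum(scaled) * 2.5              # 0-100 scale

print(sus_score([5, 2, 4, 1, 5, 2, 5, 1, 4, 2]))   # -> 87.5
```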
[See PDF for image]
Fig. 13
Rating of the SUS questionnaire.
All the experts said that the system reduces the burden of authentication work compared with the traditional workflow, and that its visual design and interactive mode promote in-depth exploration of authentication content. E6 highly valued the system, describing it as "a close union and perfect collaboration between the field of humanities and modern technology." These findings support the validation of H3.
In summary, all experts who participated in the test gave high praise to ACPAS.
Discussion
Based on the traditional authentication process, this study proposes an interactive intelligent system, ACPAS. The system combines multimodal analysis (visual features, inscriptions, and seals) with knowledge-assisted reasoning to help experts infer the authenticity of Chinese paintings. It demonstrates the potential of intelligent technology in promoting cultural heritage protection and artwork authentication.
One of the core challenges of traditional authentication is over-reliance on expert intuition and prior knowledge, which often leads to slow progress, inconsistent results, and a lack of traceability. ACPAS addresses this problem by providing an integrated platform that supports cross-modal evidence discovery. The system enables experts to efficiently trace historical information, visualize complex semantic relationships, and obtain support from structured literature and image databases. Through this process, the system improves the efficiency and interpretability of expert decision-making. Case studies and user evaluations confirmed that the system can reduce cognitive burden and personal bias. The most valuable contributions are: (1) improving information retrieval efficiency; (2) integrating scattered information into a coherent knowledge flow through visualization tools; (3) breaking down barriers between disciplines.
Although the system has received positive feedback overall, it still has certain limitations. First, the system’s recommendations depend heavily on the quality and coverage of its underlying database. When the data is sparse or unbalanced, matching results may become unreliable. Second, the system’s ability to interpret latent or implicit semantic connections remains limited; it can effectively capture explicit features but may miss subtle stylistic cues that only trained human experts can perceive.
Participating experts also expressed concerns about the system's interpretability. They said that while the system provides some support for judgments, its opacity reduces its persuasiveness when its output conflicts with an expert's intuition. In addition, some experts pointed out that the system is driven by user-initiated queries; if experts are unsure how to ask effective questions, the system may not fully realize its potential.
To address these challenges, future work will focus on the following aspects: (1) Expanding the coverage and diversity of the database, especially covering different painting schools and periods, to improve the representativeness and robustness of evidence retrieval; (2) Enhancing semantic reasoning capabilities by integrating multimodal knowledge graphs and improving the system’s ability to identify implicit visual semantic associations; (3) Improving system transparency through explainable AI technology so that users can understand the principles behind the suggestions; (4) Developing proactive system behaviors, such as guided questions or suggestion prompts, to support users in scenarios with strong ambiguity.
In addition, we explored the potential application of the system in other art authentication fields. Although the core framework (such as multimodal retrieval and interactive reasoning) demonstrates broad portability, specific functions need to be adapted to different art forms. For example, due to differences in materials and styles, Western oil painting authentication focuses more on brushstroke patterns, pigment layers, and material composition. In this case, components need to be redesigned for fine-grained texture extraction and signature recognition, while existing segmentation and auxiliary reasoning modules can still be reused.
Finally, we considered the flexibility and extensibility of ACPAS. As the convergence of cultural heritage research and intelligent technologies deepens, it becomes increasingly valuable to explore how such systems can support scalable, interpretable, and context-aware applications across different artistic domains. Designing cross-cultural adaptation modules and validating their effectiveness in diverse heritage contexts will be important directions for future research.
Acknowledgements
This work was supported by the National Social Science Fund of China (No. 19ZDA046) and the National Natural Science Foundation of China (No. 52205290).
Author contributions
Conceptualization: C.X., C.W., and W.X.; Writing: C.X., L.Y., and C.Y.; Investigation: C.Y., T.T., W.R., and W.Y.; Methodology: C.X., L.Y., C.Y., and F.Y. Software: C.Y., T.T., and F.Y. Visualization: L.Y. and C.Y. All authors reviewed the manuscript.
Data availability
The dataset used and analyzed during the current study is available from the corresponding author upon reasonable request.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s40494-025-02093-z.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Yang, F. Visual analysis in painting and calligraphy authentication. Gugong Stud.; 2012; 01, pp. 293-309.
2. Guan, X; Pan, G; Wu, Z; Wu, G. Computer-aided authentication of traditional Chinese painting. Comput. Appl. Softw.; 2007; 24, pp. 103-105+144.
3. Na, N; Ouyang, Q; Ma, H; Ouyang, J; Li, Y. Non-destructive and in situ identification of rice paper, seals and pigments by FT-IR and XRD spectroscopy. Talanta; 2004; 64, pp. 1000-1008. [DOI: https://dx.doi.org/10.1016/j.talanta.2004.04.025] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18969703]
4. Guo, X; Zhang, L; Wu, T; Zhang, H; Luo, X. Hidden information extraction from the ancient painting using hyperspectral imaging technology. Image Graph.; 2017; 22, pp. 1428-1435.
5. Li, J et al. In situ identification of pigment composition and particle size on wall paintings using visible spectroscopy as a noninvasive measurement method. Appl. Spectrosc.; 2016; 70, pp. 1900-1909. [DOI: https://dx.doi.org/10.1177/0003702816645608] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27461462]
6. Sun, T., Ma, Y. The use of spectroscopy for non-destructive determination of the authenticity of Chinese calligraphy and painting. In Proceedings of the International Conference on Sensor Networks and Signal Processing (SNSP) 403–408 (IEEE, 2018).
7. Zhang, J; Xiang, R; Kuang, Z; Wang, B; Li, Y. ArchGPT: harnessing large language models for supporting renovation and conservation of traditional architectural heritage. Herit. Sci.; 2024; 12, 220. [DOI: https://dx.doi.org/10.1186/s40494-024-01334-x]
8. Wang, Z., Yuan, L. P., Wang, L., Jiang, B., Zeng, W. VirtuWander: enhancing multi-modal interaction for virtual tour guidance through large language models. In Proceedings of the Conference on Human Factors in Computing Systems (CHI), Article 612 (2024).
9. Wang, L et al. A survey on large language model-based autonomous agents. Front. Comput. Sci.; 2024; 18, 186345. [DOI: https://dx.doi.org/10.1007/s11704-024-40231-1]
10. Wong, H. E., Rakic, M., Guttag, J., Dalca, A. V. ScribblePrompt: fast and flexible interactive segmentation for any biomedical image. arXiv preprint arXiv:2312.07381 (2023).
11. Harvard University, Academia Sinica, Peking University. China Biographical Database (CBDB) https://projects.iq.harvard.edu/cbdb (2021).
12. Sturgeon, D. China Text Project (CTP) http://ctext.org/zhs (2006).
13. Zhejiang Library. Chinese Historical Figure Seal Data (CHFSD) http://diglweb.zjlib.cn:8081/zjtsg/zgjcj/index.htm (2011).
14. Zhejiang University. The Series of Ancient Chinese Paintings. https://www.zju.edu.cn/2022/1111/c73244a2677590/page.htm (2005).
15. Uijlings, JRR; Van De Sande, KEA; Gevers, T; Smeulders, AWM. Selective search for object recognition. Int. J. Comput. Vis.; 2013; 104, pp. 154-171. [DOI: https://dx.doi.org/10.1007/s11263-013-0620-5]
16. Pavlick, E. Symbols and grounding in large language models. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci.; 2023; 381, 2251, 20220041.
17. Xu, L; Lu, L; Liu, M; Song, C; Wu, L. Nanjing Yunjin intelligent question-answering system based on knowledge graphs and retrieval augmented generation technology. Herit. Sci.; 2024; 12, 118. [DOI: https://dx.doi.org/10.1186/s40494-024-01231-3]
18. Wan, J et al. WuMKG: a Chinese painting and calligraphy multimodal knowledge graph. Herit. Sci.; 2024; 12, 159. [DOI: https://dx.doi.org/10.1186/s40494-024-01268-4]
19. Yan, Y; Hou, Y; Xiao, Y; Zhang, R; Wang, Q. KnowNet: guided health information seeking from LLMs via knowledge graph integration. IEEE Trans. Vis. Comput. Graph. (TVCG); 2025; 31, pp. 547-557. [DOI: https://dx.doi.org/10.1109/TVCG.2024.3456364]
20. Kirillov, A. et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 3992–4003 (IEEE, 2023).
21. Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv e-prints arXiv:2010.11929 (2020).
22. Kalamkar, S; Geetha, MA. Multimodal image fusion: a systematic review. Decis. Anal. J.; 2023; 9, 100327. [DOI: https://dx.doi.org/10.1016/j.dajour.2023.100327]
23. Ojala, T; Pietikäinen, M; Harwood, D. A comparative study of texture measures with classification based on feature distributions. Pattern Recognit.; 1996; 29, pp. 51-59. [DOI: https://dx.doi.org/10.1016/0031-3203(95)00067-4]
24. MacQueen, J et al. Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. Math. Stat. Probab.; 1967; 1, pp. 281-297.
25. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell.; 1986; PAMI-8, pp. 679-698. [DOI: https://dx.doi.org/10.1109/TPAMI.1986.4767851]
26. He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR) 770-778 (IEEE, 2016).
27. Mafi, M et al. A comprehensive survey on impulse and Gaussian denoising filters for digital images. Signal Process.; 2019; 157, pp. 236-260. [DOI: https://dx.doi.org/10.1016/j.sigpro.2018.12.006]
28. Shi, W. et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1874-1883 (IEEE, 2016).
29. Mukhopadhyay, P; Chaudhuri, BB. A survey of Hough Transform. Pattern Recognit.; 2015; 48, pp. 993-1010. [DOI: https://dx.doi.org/10.1016/j.patcog.2014.08.027]
30. Gonzalez, R. C., Woods, R. E. Digital Image Processing. (Prentice Hall, 2008).
31. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern.; 1979; SMC-9, pp. 62-66. [DOI: https://dx.doi.org/10.1109/TSMC.1979.4310076]
32. Haralick, RM; Sternberg, SR. Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. Mach. Intell.; 1987; PAMI-9, pp. 532-550. [DOI: https://dx.doi.org/10.1109/TPAMI.1987.4767941]
33. Brown, T. B. et al. Language Models are Few-Shot Learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20) 1877–1901 (2020).
34. Robertson, S; Zaragoza, H. The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr.; 2009; 3, pp. 333-389. [DOI: https://dx.doi.org/10.1561/1500000019]
35. Cui, Y; Che, W; Liu, T; Qin, B; Yang, Z. Pre-training with whole word masking for Chinese BERT. IEEE/ACM Trans. Audio Speech Lang. Process; 2021; 29, pp. 3504-3514. [DOI: https://dx.doi.org/10.1109/TASLP.2021.3124365]
36. Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4171-4186 (2019).
37. Park, J. S. et al. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23), Article 2 (2023).
38. Reynolds, L. & McDonell, K. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI EA '21), Article 314 (2021).
39. Schick, T. et al. Toolformer: Language Models Can Teach Themselves to Use Tools. Adv. Neural Inf. Process. Syst. 36 (2023).
40. Wei, J. et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22), Article 1800 (2022).
41. Zhu, K. et al. PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts. In Proc. The 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis (LAMPS'24) 57-68 (2024).
42. Brooke, J. SUS: a quick and dirty usability scale. In Usability Evaluation in Industry (eds Jordan, P. W. et al.) 252 (Taylor & Francis, 1996).
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License").