1. Introduction

The exponential advancement of digital technologies has fundamentally transformed the landscape of public discourse, creating complex systems of human–technology interaction that defy traditional analytical approaches. This complexity is particularly evident in the realm of emerging technologies such as large language models (LLMs), where public responses form intricate patterns of sentiment, engagement, and thematic evolution. The simulation and prediction of these discourse patterns represent a significant frontier in computational social science and digital humanities, requiring innovative methodological frameworks that can capture both micro-level behaviors and macro-level emergent phenomena [1,2].

Digital twins—defined as high-fidelity virtual replicas that mirror physical entities or systems in real time—have transformed practices in manufacturing, healthcare, and urban planning by enabling precise monitoring and optimization. However, researchers have rarely extended this concept to model social and communication phenomena. Our research bridges this gap by integrating digital twins with agent-based modeling (ABM) and natural language processing to create virtual representations of public discourse ecosystems. This integration creates computational experimental spaces where researchers can observe how technological innovations trigger specific patterns of public discussion, emotional responses, and shared interpretations of new technologies [3,4]. Unlike traditional discourse analysis that examines historical data, these digital representations enable the simulation of alternative scenarios and the prediction of likely discourse trajectories before they unfold in reality.

The emergence of DeepSeek, an advanced indigenous LLM characterized by cost-efficiency and sophisticated reasoning capabilities, provides an exceptional case study for implementing and validating such an intelligent digital twin approach. Upon its release in January 2025, DeepSeek triggered an intense burst of public discourse, generating approximately 250,000 social media interactions during a compressed 13-day period. This concentrated engagement event created an ideal natural experiment for examining how discourse forms, evolves, and stabilizes around technological innovations [5].

The theoretical significance of modeling technology discourse through digital twins extends beyond mapping content evolution to understanding the underlying socio-cognitive processes. User interaction behaviors in response to technological innovation reflect complex processes of interpretation, evaluation, and integration that cannot be adequately captured through traditional content analysis or survey methodologies. Digital twins, by contrast, can simulate the dynamic interplay between individual agency, social influence, and technological affordances that collectively shape discourse patterns [6].

By constructing an intelligent digital twin framework to analyze how technological discourse forms, this study not only fills a methodological gap in computational social science research on technology communication but also provides substantive cognitive tools and decision support for multiple stakeholders. For technology developers, the framework provides prospective situational awareness of public opinion, enabling them to simulate how public perception and acceptance evolve under multiple scenarios before a technology is released, so as to optimize product design, adjust dissemination strategies, and prevent potential disputes. For policy makers, the framework builds a cognitive laboratory for technology governance, enabling them to assess the evolution of public attitudes under different regulatory frameworks and to predict the social acceptance and potential controversies of regulatory policies. For corporate strategists, the digital twin provides a precise tool for designing communication strategies, enabling them to develop differentiated strategies for different user groups, optimize the construction of technological narratives, and acquire social legitimacy more efficiently.

The theoretical contributions and application value of this study are as follows: first, it transcends the static content analysis paradigm to capture the micro-mechanisms and emergent patterns in the dynamic evolution of technological discourse; second, it overcomes the passive, reactive character of traditional public opinion monitoring and enables proactive prediction of, and prospective intervention in, technological discourse; and third, it provides a computational experimental framework for studying technological socialization, so that the abstract process by which public cognition forms becomes a research object that can be simulated and intervened upon. Together, these contributions constitute the theoretical significance and practical value of the digital twin of technological discourse.

To clarify the conceptual positioning and methodological innovations of our proposed framework, we present Table 1, which compares our discourse-oriented digital twin with traditional digital twin paradigms widely used in industrial domains.

Previous approaches to modeling public discourse have been constrained by methodological limitations in capturing behavioral complexity and sentiment dynamics. Content analysis techniques often struggle to represent the multidimensional nature of public engagement, while sentiment analysis frequently fails to account for the contextual nuances and evolutionary patterns in emotional responses. The integration of LLM-enhanced topic modeling with agent-based simulation addresses these limitations by enabling a more comprehensive representation of both semantic structures and behavioral dynamics [7,8].

The innovative methodological framework we propose combines three complementary components: (1) LLM-BERTopic integration for semantic analysis and thematic clustering, (2) agent-based modeling for simulating user behaviors and interaction patterns, and (3) network analysis for mapping the relational structures that shape information diffusion. This integrated approach enables the construction of an intelligent digital twin capable of simulating discourse evolution and predicting potential trajectories with enhanced accuracy [9,10].

Unlike previous studies that merely visualize online discussions post hoc, our framework enables the forward-looking simulation of technology-related public discourse by integrating three novel aspects: (1) empirically grounded agent behavior parameterization from LLM-extracted sentiments; (2) a bidirectional feedback mechanism linking user behavior and discourse structure; and (3) modeling of cross-topic affective diffusion to capture sentiment spillover across semantic clusters. Together, these innovations support scenario-based public opinion forecasting, enabling stakeholders to test narrative strategies before technology release.

Our research addresses several interrelated dimensions of technology discourse patterns through our intelligent digital twin framework. First, we examine the temporal evolution patterns and thematic clustering in DeepSeek discourse, identifying how topics emerge, transform, and stabilize during the observation period. Second, we develop an agent-based simulation of public opinion formation processes, modeling how individual user behaviors aggregate into collective sentiment dynamics. Third, we analyze network effects and interaction patterns among different user communities, revealing how structural positions influence discourse contributions [11,12,13].

The structure of this paper is as follows: After reviewing relevant literature on digital twin paradigms, agent-based modeling of user interactions, and computational approaches to discourse analysis, we describe our methodological framework. We then present simulation results concerning temporal evolution patterns, thematic clustering, sentiment dynamics, and network effects in the DeepSeek discourse case. Following a discussion of theoretical implications, we conclude the study by identifying limitations and directions for future research in digital twin applications for discourse modeling.

2. Literature Review

2.1. Digital Twin Paradigms for Modeling Public Discourse Dynamics

Digital twins have traditionally been conceptualized as virtual replicas of physical entities that enable the real-time monitoring, simulation, and optimization of operational processes. While initially developed for manufacturing and industrial applications, the digital twin paradigm has gradually expanded to encompass more complex socio-technical systems, including public discourse environments [14,15]. This conceptual evolution reflects growing recognition that discourse dynamics—particularly those surrounding emerging technologies—can be modeled, simulated, and analyzed using frameworks originally designed for physical systems.

A social phenomenon digital twin refers to a high-fidelity virtual replication of a specific social interaction system (such as a public discourse environment) capable of capturing the system’s content characteristics, behavioral dynamics, and structural evolution through computational simulation. Compared to industrial digital twins, social phenomenon digital twins possess four distinctive features: multidimensional representation, simultaneously simulating content (semantic), subject (behavioral), and structure (network) dimensions; emergent properties, capturing macro-patterns emerging from micro-interactions; feedback adaptability, where simulated subjects can respond to environmental changes and adjust behavioral strategies; and predictive interventionability, enabling the simulation of possible evolutionary trajectories under different intervention strategies. These characteristics enable social phenomenon digital twins to transcend the limitations of traditional social simulation, forming a more comprehensive and dynamic method for the virtual representation of social systems.

The application of digital twin paradigms to public discourse modeling represents a significant departure from traditional content analysis approaches. Rather than merely documenting and categorizing discourse content, digital twins aim to create functioning simulations that capture the dynamic processes through which discourse emerges, evolves, and stabilizes over time [16]. These simulations incorporate multiple dimensions of discourse systems, including content characteristics, user behaviors, network structures, and temporal patterns.

In terms of theoretical foundations, social phenomenon digital twins integrate three disciplinary traditions: complex systems theory, focusing on non-linear interactions and emergent patterns among system components; computational social science, providing data-driven formal analysis methods for social phenomena; and cognitive science, offering cognitive processing and decision mechanism models for participant behavior simulation. This interdisciplinary integration forms the theoretical cornerstone of social phenomenon digital twins, enabling simultaneous consideration of the content structure, behavioral dynamics, and cognitive mechanisms in social systems, forming a more comprehensive framework for understanding social phenomena.

Several key paradigmatic developments have facilitated this extension of digital twin applications to discourse domains. First, the emergence of “social physics” approaches established conceptual foundations for modeling social and communicative phenomena using frameworks derived from physical systems theory [17]. Second, advances in computational linguistics and natural language processing enabled a more sophisticated representation of semantic content and linguistic patterns [7]. Third, developments in complexity science provided analytical frameworks for understanding how micro-level interactions generate macro-level discourse patterns [18].

The conceptualization of digital twins for discourse modeling encompasses several distinctive architectural approaches. Oliveira proposed a conceptual framework emphasizing the importance of developing digital twins for complex communication systems [19], highlighting the value of 3D visualizations and real-time data integration for analyzing public discourse patterns. Zografos and Madas further extended this framework to incorporate decision support capabilities, enabling predictive analysis of discourse trajectories based on historical interaction patterns [20].

Recent research has identified several core components of digital twins for discourse modeling. Liu et al. developed a hierarchical semantic network approach that enables a multi-level representation of discourse structures, from individual semantic units to comprehensive conceptual networks [21]. Ren et al. complemented this approach with a temporal dimension, introducing dynamic link prediction algorithms that model how discourse structures evolve over time [22]. Wu et al. further expanded these frameworks to encompass multi-layer network representations, capturing how different discourse dimensions (e.g., thematic, emotional, and relational) interact within complex communication environments [5].

The distinctive value proposition of digital twin approaches to discourse modeling lies in their capacity to integrate multiple analytical dimensions within coherent simulation environments. Rather than examining discourse characteristics in isolation, digital twins enable researchers to observe and analyze how content patterns, user behaviors, and network structures co-evolve through continuous interaction [2]. This integrative capability is particularly valuable for understanding technology-triggered discourse, where rapid information flows and complex feedback mechanisms generate non-linear patterns that defy conventional analytical approaches.

Despite these advances, significant challenges remain in developing effective digital twins for public discourse dynamics. These include issues of data granularity and representativeness, computational complexity in simulating large-scale discourse environments, and methodological questions regarding the appropriate level of abstraction for discourse models [13]. Addressing these challenges requires interdisciplinary collaboration across computational linguistics, network science, and social media analytics.

2.2. Agent-Based Modeling of User Interaction Patterns in Technology-Triggered Environments

Agent-based modeling (ABM) has emerged as a particularly powerful approach for simulating user interaction patterns in technology-triggered discourse environments. By representing individual users as autonomous agents with distinctive behavioral characteristics, ABM enables researchers to examine how micro-level interactions generate macro-level discourse patterns [23,24]. This approach is especially valuable for technology discourse contexts, where heterogeneous user populations with diverse behavioral tendencies engage with novel information in distinctive ways.

The conceptual foundations of ABM in technology discourse contexts draw from several theoretical traditions. The diffusion of innovations theory provides frameworks for understanding how users encounter, evaluate, and adopt new technological concepts [25]. Social influence models explain how interpersonal connections shape individual responses to technological information [26]. Cognitive processing theories illuminate how users interpret and integrate novel technological information into existing knowledge structures [27].

Recent applications of ABM to technology discourse contexts have demonstrated the approach’s versatility and analytical power. Christos et al. developed agent-based simulations of supply chain communication patterns, identifying how technological innovations trigger distinctive interaction sequences among industry stakeholders [28]. Anand et al. applied ABM frameworks to model multi-stakeholder communication in urban technology domains, demonstrating how diverse institutional actors negotiate conflicting interpretations of technological developments [29]. Wibowo et al. further extended these applications to simulate how communication patterns evolve in response to infrastructure expansions, revealing non-linear relationships between physical system changes and discursive responses [30].

The design of agent attributes and behaviors represents a critical dimension of ABM applications to technology discourse. Several researchers have developed sophisticated frameworks for representing user characteristics in technology discourse contexts. Zhuang and Zhang identified distinctive cognitive processing patterns that shape how users interpret technological developments, including framing tendencies, evidential standards, and inferential practices [11]. Gao and Li mapped emotional response patterns to technological narratives, documenting how affective reactions condition subsequent engagement behaviors [12]. Wang et al. characterized participation tendencies across different user segments, distinguishing between passive consumption, active engagement, and content generation behaviors [13].

Beyond individual agent characteristics, ABM approaches to technology discourse emphasize the importance of interaction rules and environmental conditions. Several key interaction mechanisms have been identified in technology discourse contexts: information sharing, opinion influence, behavioral imitation, and network formation [9]. These interactions occur within structured environments that shape discourse patterns through platform affordances, algorithmic filtering mechanisms, and community norms [10].

The methodological advantages of ABM for technology discourse analysis are numerous. First, ABM enables researchers to observe emergent phenomena that arise from interactions among heterogeneous agents, revealing how simple behavioral rules can generate complex discourse patterns [4]. Second, ABM facilitates the exploration of counterfactual scenarios, allowing researchers to examine how discourse might have evolved under alternative conditions [3]. Third, ABM provides natural frameworks for integrating empirical data at multiple levels, from individual behavioral tendencies to aggregate discourse patterns [5].
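The first advantage, that simple behavioral rules can produce structured collective patterns, can be illustrated with a minimal sketch. The example below uses the Deffuant bounded-confidence model, a standard opinion-dynamics model from the ABM literature; it is not the agent model developed in this study, and the parameter names (`epsilon`, `mu`) and values are illustrative assumptions.

```python
import random


def simulate(n_agents=50, steps=5000, epsilon=0.2, mu=0.5, seed=0):
    """Deffuant bounded-confidence dynamics on opinions in [0, 1].

    At each step two distinct agents meet; if their opinions differ by
    less than `epsilon`, each moves toward the other by a fraction `mu`
    of the difference. All values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_agents)]
    initial = list(opinions)
    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)
        diff = opinions[j] - opinions[i]
        if abs(diff) < epsilon:
            opinions[i] += mu * diff
            opinions[j] -= mu * diff
    return initial, opinions


def count_clusters(opinions, gap=0.05):
    """Count opinion groups separated by more than `gap` (hypothetical helper)."""
    ordered = sorted(opinions)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > gap)


initial, final = simulate()
print("clusters before:", count_clusters(initial))
print("clusters after: ", count_clusters(final))
```

Each agent follows one local rule, yet the population typically settles into a small number of opinion groups whose count depends on `epsilon`. Rerunning with a different `epsilon` is a small-scale version of the counterfactual exploration described in the second advantage.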

Despite these advantages, ABM approaches to technology discourse face several methodological challenges. These include questions of the appropriate abstraction level, validation strategies for behavioral assumptions, and computational complexity in simulating large-scale discourse environments [1]. Addressing these challenges requires both methodological innovation and rigorous empirical grounding to ensure that agent-based simulations accurately represent the behavioral dynamics of technology discourse.

2.3. Multi-Agent Systems in Simulating Sentiment Evolution and Thematic Diffusion Processes

Multi-agent systems (MASs) provide specialized frameworks for modeling the interdependent processes of sentiment evolution and thematic diffusion in technology discourse. While agent-based modeling offers general principles for simulating user behaviors, MAS approaches emphasize the interactions among intelligent agents with sophisticated decision-making capabilities, cognitive models, and adaptive strategies [31]. This orientation makes MASs particularly valuable for modeling how sentiment patterns and thematic structures co-evolve through complex feedback mechanisms in technology discourse environments.

The conceptual foundations of MAS approaches to sentiment evolution draw from several theoretical domains. Affective computing frameworks provide models for how emotional responses are generated, expressed, and recognized in digital communication contexts [32]. Sentiment diffusion theories explain how emotional reactions propagate through social networks, creating distinctive patterns of collective sentiment [33]. Appraisal theories illuminate how individuals evaluate technological developments based on personal values, perceived implications, and contextual factors [34].

Recent applications of MASs to sentiment evolution in technology discourse have yielded valuable insights into emotional response patterns. Dong and Song developed multi-agent simulations of sentiment diffusion in response to technological announcements, demonstrating how emotional cascades can rapidly transform discourse environments [35]. Kuzmicz and Pesch modeled how sentiment patterns evolve differently across various technological domains, identifying distinctive emotional trajectories for different innovation categories [36]. Suttmeier et al. further examined how sentiment patterns differ across stakeholder groups, revealing how institutional positioning shapes emotional responses to technological developments [37].

The representation of thematic diffusion processes represents a complementary focus of MAS approaches to technology discourse. Several researchers have developed sophisticated frameworks for modeling how thematic structures evolve through agent interactions. Müller and Tierney identified distinctive patterns of thematic convergence and divergence in technology discourse, showing how initial topic diversity tends to consolidate around dominant narratives over time [38]. Liu and Wan examined how thematic structures evolve at different temporal scales, distinguishing between rapid convergence around breaking developments and the gradual evolution of foundational concepts [39]. Jeon et al. further explored how thematic evolution varies across different technological domains, documenting distinctive patterns for incremental versus disruptive innovations [40].

A central focus of MAS approaches to technology discourse is the interdependence between sentiment evolution and thematic diffusion. Several key interaction mechanisms have been identified: emotional priming of attention to specific topics, thematic framing of emotional responses, sentiment-based network formation, and emotion-driven thematic elaboration [13]. These mechanisms create complex feedback cycles in which sentiment patterns shape thematic structures, which in turn influence emotional responses in ongoing discourse processes [11].

The methodological advantages of MASs for modeling these interdependent processes are significant. First, an MAS enables the representation of heterogeneous computational intelligence across different agent types, reflecting the diverse cognitive models that shape discourse contributions [41]. Second, an MAS facilitates the modeling of sophisticated emotional architectures that capture the multidimensional nature of affective responses to technological developments [42]. Third, an MAS provides frameworks for representing complex decision processes that incorporate both cognitive evaluation and emotional response [43].

Recent methodological innovations have further enhanced the capacity of MASs to model sentiment and thematic processes in technology discourse. The integration of transformer-based language models has enabled more sophisticated representations of semantic content and emotional expressions [8]. The development of hybrid modeling approaches combining agent-based simulation with machine learning has facilitated more accurate prediction of discourse trajectories [21]. The incorporation of reinforcement learning mechanisms has enabled more realistic modeling of how agents adapt communication strategies based on discourse outcomes [5].

Despite these advances, challenges remain in applying MAS approaches to sentiment evolution and thematic diffusion in technology discourse. These include questions of appropriate representational complexity for emotional processes, validation strategies for cognitive models, and integration methods for connecting micro-level sentiment dynamics with macro-level discourse patterns [12]. Addressing these challenges requires continued interdisciplinary collaboration across affective computing, cognitive science, and computational linguistics.

3. Data and Methods

3.1. Data Collection

This study aims to develop an intelligent digital twin for predicting technology discourse patterns by systematically analyzing the thematic narrative characteristics and user interaction dynamics surrounding DeepSeek’s emergence. Through this digital twin construction, we seek to reveal the internal mechanisms and patterns of technological public opinion evolution, thereby establishing a robust framework for simulating discourse dynamics in similar contexts. Based on the research objectives, we employed an integrated approach combining LLM-enhanced BERTopic modeling with agent-based simulation, establishing a comprehensive framework for data collection, preprocessing, and digital twin development.

Weibo, one of China’s most influential social media platforms, features a broad user base, rapid topic updates, and strong interactivity. It has become an important channel for public expression and discussion, as well as a key data source for public opinion research. The selection of Weibo as the data collection source is justified for several reasons: (1) Weibo’s user demographic structure is diverse, encompassing netizens of different ages, occupations, and educational backgrounds, providing a relatively comprehensive reflection of public perceptions and evaluations of DeepSeek across social sectors; (2) Weibo’s real-time and interactive features make it both an origination point and distribution hub for technological hot topics, with a high sensitivity for capturing public opinion peaks; (3) Weibo’s open comment mechanism lowers participation barriers, allowing the public to express opinions more freely, providing a rich corpus for the diversified construction of public opinion themes.

This research focuses on short-term public opinion following the public release of the DeepSeek-R1 model, with a specific data collection timeframe established from 20 January to 1 February 2025, totaling 13 days. The time window was determined based on preliminary research observing DeepSeek’s public opinion lifecycle, starting from “DeepSeek-R1 release” and ending on the eve of “DeepSeek topping major app stores”. This period encompasses a complete cycle from initial attention to intense discussion to opinion cooling, facilitating the capture of the entire process of technological public opinion theme evolution. Data collection was carried out by employing the “Zhiwei Data” company’s proprietary crawler in conjunction with Weibo’s open API, using keywords such as “DeepSeek,” “深度求索” (DeepSeek in Chinese), and “国产大模型” (domestic large model) as search criteria to retrieve Weibo posts containing these terms.

To ensure data completeness and representativeness, the collection process followed these strategies: (1) an incremental collection method was employed with data crawling every 4 h to capture real-time changes in public opinion; (2) deep crawling was conducted for popular Weibo posts (those with more than 1000 reposts, comments, or likes) to obtain their complete textual content; (3) data underwent deduplication and cleaning processes to remove advertisements, duplicate content published by bot accounts, and insubstantial emoji-only comments. Ultimately, 253,280 valid Weibo text entries were obtained, constituting the core dataset for this research.
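The deduplication and cleaning step can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the record fields (`id`, `text`, `reposts`, `comments`, `likes`) are hypothetical, the popularity rule is read as "any single metric above 1000", and advertisement or bot detection is omitted because the source does not describe how it was done.

```python
import re

# Hypothetical record layout; the source does not specify field names.
posts = [
    {"id": 1, "text": "DeepSeek-R1 is impressive", "reposts": 1500, "comments": 20, "likes": 300},
    {"id": 2, "text": "DeepSeek-R1  is impressive", "reposts": 0, "comments": 0, "likes": 1},
    {"id": 3, "text": "😀😀👍", "reposts": 0, "comments": 0, "likes": 0},
    {"id": 4, "text": "国产大模型 加油", "reposts": 3, "comments": 1, "likes": 12},
]

POPULARITY_THRESHOLD = 1000  # assumed reading of "more than 1000 reposts, comments, or likes"


def has_substance(text):
    """True if the text contains at least one letter, digit, or CJK character."""
    return any(ch.isalnum() for ch in text)


def clean(records):
    """Drop emoji-only and duplicate texts; flag popular posts for deep crawling."""
    seen = set()
    kept = []
    for rec in records:
        normalized = re.sub(r"\s+", " ", rec["text"]).strip()
        if not has_substance(normalized):
            continue  # emoji-only or empty entry
        if normalized in seen:
            continue  # exact duplicate after whitespace normalization
        seen.add(normalized)
        popular = max(rec["reposts"], rec["comments"], rec["likes"]) > POPULARITY_THRESHOLD
        kept.append({**rec, "text": normalized, "popular": popular})
    return kept


cleaned = clean(posts)
for rec in cleaned:
    print(rec["id"], rec["popular"], rec["text"])
```

Exact-match deduplication is the simplest possible choice; near-duplicate detection (for example, for lightly edited bot reposts) would need a similarity threshold that the source does not specify.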

The collected data served as the empirical foundation for our digital twin construction, providing the necessary inputs for semantic analysis, user behavior profiling, network structure mapping, and temporal pattern identification. This comprehensive dataset enabled us to create a high-fidelity virtual representation of the DeepSeek discourse ecosystem, capturing both content characteristics and interaction dynamics with sufficient granularity and temporal resolution to simulate discourse evolution processes. The data preprocessing pipeline included text normalization, user attribute extraction, interaction pattern identification, and temporal segmentation, ensuring that all necessary dimensions were properly represented in the digital twin architecture.

To ensure that data collection, semantic analysis, and agent modeling constitute an integrated research framework, we designed a multi-tiered methodological architecture that allows components to operate independently as well as synergistically. The framework consists of three interrelated modules: the data acquisition and preprocessing module, the semantic analysis and topic identification module, and the agent modeling and network analysis module. The data acquisition module first extracts interactive texts and user attributes to construct an initial corpus; the semantic analysis module then applies the LLM-enhanced BERTopic method to thematically cluster the corpus and identify key discourse domains and their evolution trajectories. These thematic clustering results and temporal patterns are fed directly into the agent modeling module to provide an empirical basis for the behavioral rules and parameter settings of different types of agents. Through this modular design, we achieve a systematic transformation from raw social media data to an actionable simulation environment, where the output of each methodological component becomes the input of the next, forming a complete data–semantic–behavioral analysis chain.

Given the interdisciplinary nature of this study and the involvement of cross-domain concepts, a glossary of key terms used in the paper is provided in Appendix A to facilitate readers’ understanding.

3.2. Conducting Semantic Analysis Through LLM-BERTopic Integration

The semantic analysis component of our digital twin framework employs an innovative integration of large language models (LLMs) with BERTopic methodology to enable more sophisticated representation of discourse content. This integrated approach addresses limitations of traditional topic modeling techniques by enhancing semantic understanding capabilities, improving topic coherence, and enabling more accurate identification of thematic evolution patterns. The LLM-BERTopic integration serves as the foundational semantic engine for our digital twin, providing critical inputs for subsequent agent-based simulation.

Our methodological framework, as illustrated in Figure 1, comprises three interconnected modules: data processing, representation clustering, and topic analysis. This architecture enables comprehensive semantic parsing of technology discourse while maintaining computational efficiency and representational accuracy.

3.2.1. Data Processing Module Enhancement

The data processing module performs dual operations on the raw textual data to maximize semantic information extraction. First, we implement conventional preprocessing using Python 3.10.0 libraries (pandas and jieba) to prepare texts for TF-IDF calculation, including deduplication, tokenization, and stopword removal. This creates a clean corpus for initial statistical analysis.
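The conventional preprocessing path (deduplication, tokenization, stopword removal, then TF-IDF) can be sketched with the standard library alone. The following is an assumption-laden illustration: whitespace splitting stands in for jieba word segmentation, the stopword list is a toy example, and the weighting shown (relative term frequency times log inverse document frequency) is one common TF-IDF variant, not necessarily the one used in the study.

```python
import math
from collections import Counter

STOPWORDS = {"the", "is", "a", "of", "and"}  # illustrative list only


def tokenize(text):
    """Stand-in for jieba segmentation: lowercase whitespace split minus stopwords."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def tf_idf(docs):
    """Return deduplicated docs and one {term: weight} dict per document."""
    unique_docs = list(dict.fromkeys(docs))  # order-preserving deduplication
    tokenized = [tokenize(d) for d in unique_docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        weights.append(
            {tok: (c / total) * math.log(n_docs / doc_freq[tok]) for tok, c in counts.items()}
        )
    return unique_docs, weights


docs = [
    "the model is cheap",
    "the model is open",
    "the model is cheap",  # duplicate, removed before weighting
    "reasoning of the model",
]
unique_docs, weights = tf_idf(docs)
for doc, w in zip(unique_docs, weights):
    print(doc, w)
```

A term that appears in every document (here "model") receives zero weight, while terms that distinguish one post from the others receive positive weight, which is why TF-IDF is useful for surfacing topic-specific vocabulary.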

Concurrently, we introduce LLM-based semantic enhancement to extract deeper contextual meaning. We deploy locally hosted LLMs (Qwen2.5-7B and DeepSeek-R1-7B) to distill core viewpoints and opinions from original texts. The LLM implementation utilizes a specially designed instruction template to guide semantic extraction (shown in Figure 2).

After comparative testing, we selected Qwen2.5-7B as our primary semantic processing model due to its optimal balance of performance and computational efficiency. To further enhance processing speed, we implemented a Ray-based distributed computing framework that achieved approximately 15× acceleration over single-threaded processing.

The LLM component significantly enhances semantic representation by ① identifying latent concept associations and thematic orientations; ② expanding text with relevant semantic information; ③ filtering noise from unstructured social media content; and ④ improving contextual understanding of ambiguous terminology.

This dual-processing approach provides complementary semantic representations that capture both statistical patterns and contextual meanings within the discourse corpus.

3.2.2. Representation Clustering Module

The representation clustering module transforms the processed textual data into a high-dimensional semantic space where thematic clusters can be identified. We implement this through a three-stage process: embedding generation, dimensionality reduction, and density-based clustering.

To transform text data into a format suitable for machine processing, we use Sentence-BERT (SBERT) to convert each social media post about DeepSeek into a numerical representation called an embedding vector. This process translates human language into a 768-dimensional vector whose dimensions jointly encode the semantic features of the text [44]. In practical terms, SBERT analyzes each post to generate a mathematical representation (vector $\vec{v}_{i}$) of its meaning using the following formula:

(1) $\vec{v}_{i} = \mathrm{SBERT}(t_{i})$

where $\vec{v}_{i}$ represents the embedding vector for text entry $t_{i}$. These vectors position semantically similar posts closer together in mathematical space, much like placing related books on nearby library shelves. While these detailed vectors effectively capture subtle meaning differences between posts, their high dimensionality (768 features) creates computational challenges for identifying topic clusters, necessitating dimension reduction techniques.

For the embedding step, we loaded the paraphrase-multilingual-MiniLM-L12-v2 pretrained model using the SentenceTransformer library. This multilingual model converts sentences or paragraphs from different languages into fixed-length semantic vectors that reflect semantic similarity in vector space, while achieving a good balance between speed and performance.

Building on this foundation, we constructed a topic model using the BERTopic library, with the SentenceTransformer model serving as its embedding model. For the model parameters, we set the number of topics to auto-determination mode (nr_topics = "auto"), allowing the model to adjust the number of topics based on data characteristics and clustering results. The 15 most representative words were extracted for each topic (top_n_words = 15) to present its main content clearly. We also required each topic to contain at least 10 documents (min_topic_size = 10) to avoid generating topics that are too small to be meaningful. Because the data might include multilingual text, we set the language parameter to "multilingual". Finally, we enabled probability calculation (calculate_probabilities = True) to estimate the probability of each document belonging to each topic, which supports the assessment of document–topic relevance.

To address high dimensionality, we employed the Uniform Manifold Approximation and Projection (UMAP) algorithm for dimensionality reduction [45]. UMAP preserves both local and global semantic structures while reducing computational complexity:

(2) $\vec{u}_{i} = \mathrm{UMAP}(\vec{v}_{i},\ \mathrm{n\_neighbors} = 15,\ \mathrm{n\_components} = 5,\ \mathrm{min\_dist} = 0.1)$

where $\vec{u}_{i}$ represents the reduced dimension vector in 5-dimensional space. The parameters were optimized through ablation studies to maintain semantic integrity while enabling efficient clustering.

For the final clustering operation, we implemented the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm with the following configuration:

(3) $C = \mathrm{HDBSCAN}(\{\vec{u}_{1}, \vec{u}_{2}, \dots, \vec{u}_{n}\},\ \mathrm{min\_cluster\_size} = 10,\ \mathrm{min\_samples} = 5)$

where $C$ represents the resulting cluster assignments. HDBSCAN offers several advantages for discourse clustering, including automatic identification of optimal cluster numbers, effective noise handling, and detection of non-spherical cluster structures [46]. This allows our model to identify natural thematic groupings without imposing artificial structural constraints.
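
The embedding, reduction, and clustering settings reported above can be collected in one place. The following is a minimal sketch, assuming the umap-learn, hdbscan, and bertopic packages; the constant names and the `build_topic_model` helper are illustrative, and `prediction_data=True` is an assumption added here because BERTopic needs it to compute per-document topic probabilities with a custom HDBSCAN model.

```python
# Hyperparameters as reported in the text; key names follow the
# constructor arguments of umap-learn, hdbscan, and BERTopic.
EMBEDDING_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"

UMAP_PARAMS = {"n_neighbors": 15, "n_components": 5, "min_dist": 0.1}

HDBSCAN_PARAMS = {
    "min_cluster_size": 10,
    "min_samples": 5,
    # Assumption (not stated in the text): required so that topic
    # probabilities can be computed for each document.
    "prediction_data": True,
}

BERTOPIC_PARAMS = {
    "language": "multilingual",
    "top_n_words": 15,
    "min_topic_size": 10,
    "nr_topics": "auto",
    "calculate_probabilities": True,
}


def build_topic_model():
    """Assemble a BERTopic model from the settings above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without the
    # third-party dependencies installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    return BERTopic(
        embedding_model=EMBEDDING_MODEL,
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
        **BERTOPIC_PARAMS,
    )
```

With the dependencies installed, `topics, probs = build_topic_model().fit_transform(docs)` would run the full pipeline on a list of strings `docs`. When a custom HDBSCAN model is supplied, BERTopic takes the minimum cluster size from that model, so `min_topic_size` is kept here only to mirror the reported setting.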

3.2.3. Topic Analysis Module

The topic analysis module constructs semantic representations of identified clusters and tracks their evolution over time. We employed a class-based TF-IDF approach (c-TF-IDF) to extract representative terms for each cluster, calculating the importance of term $t$ in cluster $c$ as

(4) $\text{c-TF-IDF}_{t, c} = \frac{f_{t, c}}{\sum_{t' \in c} f_{t', c}} \times \log\left(1 + \frac{N}{n_{t}}\right)$

where $f_{t, c}$ is the frequency of term $t$ in cluster $c$, $N$ is the total number of documents, and $n_{t}$ is the number of documents containing term $t$ [7].
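
To make Equation (4) concrete, the sketch below computes the score exactly as the formula is written above, using only the Python standard library. The function name, the toy clusters, and the tokens are hypothetical and serve only as an illustration; this is not the BERTopic implementation.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score terms per cluster following Equation (4).

    clusters: dict mapping cluster id -> list of tokenized documents.
    Returns: dict mapping cluster id -> {term: score}.
    """
    n_docs = sum(len(docs) for docs in clusters.values())  # N
    doc_freq = Counter()                                    # n_t
    for docs in clusters.values():
        for doc in docs:
            doc_freq.update(set(doc))

    scores = {}
    for cid, docs in clusters.items():
        term_freq = Counter(tok for doc in docs for tok in doc)  # f_{t,c}
        total = sum(term_freq.values())
        scores[cid] = {
            term: (freq / total) * math.log(1 + n_docs / doc_freq[term])
            for term, freq in term_freq.items()
        }
    return scores


# Toy data (hypothetical): two clusters with two short documents each.
toy = {
    "TC": [["model", "open", "source"], ["model", "china"]],
    "FM": [["nvidia", "stock"], ["stock", "model"]],
}
toy_scores = c_tf_idf(toy)
top_fm_term = max(toy_scores["FM"], key=toy_scores["FM"].get)
```

In this toy example, "stock" receives the highest score in the FM cluster because it is frequent within the cluster and appears in only two of the four documents.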

To address the challenge of topic interpretation, we further enhance the analysis by implementing an LLM-based topic refinement process. Following the approach proposed in [8], we prompted the LLM to synthesize and summarize the BERTopic clusters using an instruction template (shown in Figure 3).

This LLM-enhanced topic interpretation significantly improves the coherence and interpretability of the identified thematic structures, addressing a common limitation of purely statistical topic modeling approaches.

For temporal analysis, we constructed a topic-time matrix $M$ where each element $M_{c, t}$ represents the relative frequency of cluster c at time interval t. This enables tracking of thematic evolution patterns throughout the observation period:

(5) $M_{c, t} = \frac{|D_{c, t}|}{|D_{t}|}$

where $|D_{c, t}|$ is the number of documents in cluster $c$ at time $t$, and $|D_{t}|$ is the total number of documents at time $t$.
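
The topic-time matrix in Equation (5) can be sketched as a simple counting procedure. The function name, cluster labels, and date strings below are illustrative assumptions; cluster and time combinations with no documents are simply absent from the result and can be read as zero.

```python
from collections import Counter, defaultdict


def topic_time_matrix(records):
    """Build M[c][t] = |D_{c,t}| / |D_t| from (cluster, time_bin) pairs."""
    per_time = Counter()          # |D_t|
    per_cluster_time = Counter()  # |D_{c,t}|
    for cluster, time_bin in records:
        per_time[time_bin] += 1
        per_cluster_time[(cluster, time_bin)] += 1

    matrix = defaultdict(dict)
    for (cluster, time_bin), count in per_cluster_time.items():
        matrix[cluster][time_bin] = count / per_time[time_bin]
    return dict(matrix)


# Toy records (hypothetical): one (cluster, day) pair per document.
records = [("TC", "01-27"), ("TC", "01-27"), ("FM", "01-27"), ("FM", "01-28")]
M = topic_time_matrix(records)
```

Each column of the resulting matrix sums to 1 over the clusters observed in that time interval, so the rows can be compared directly across days with very different post volumes.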

The integration of these three modules—data processing, representation clustering, and topic analysis—creates a comprehensive semantic engine that forms the foundation of our digital twin. This engine enables accurate identification of thematic structures, tracking of discourse evolution patterns, and analysis of semantic relationships that inform the agent-based simulation component of our framework.

3.3. Agent-Based Model Design for User Behavior Simulation

Building upon the semantic structures identified through LLM-BERTopic analysis of the empirical DeepSeek dataset, we designed an agent-based simulation that could reproduce and predict the emergent patterns of discourse evolution (shown in Figure 4). This approach enabled us to bridge micro-level user behaviors with macro-level discourse phenomena, providing mechanistic explanations for observed patterns and supporting counterfactual scenario testing.

The simulation employs an object-oriented programming paradigm to conceptualize the DeepSeek discourse ecosystem as a complex system of heterogeneous agents that interact in a structured social network environment. Based on the statistical analysis of the empirical dataset, we define three agent types, namely general users, domain experts, and institutional accounts, each with differentiated behavioral characteristics and influence capabilities [24]. For each type, we set an initial interval around its share of users in the actual data; each experiment randomly draws a proportion from within each interval, subject to the constraint that all proportions sum to 1. Multiple experiments were conducted, and the resulting agent characteristics are as follows.

General user agents (about 85% of the total) exhibit specific behavioral patterns: (1) a power-law distribution of topic engagement, with a stronger inclination toward non-technical topics such as Technological Competition and Social Influence; (2) a stronger tendency to express emotions, generating higher emotional responses to stimulating content; (3) lower influence weights (0.1–0.3), although the group size effect allows them to collectively shape the overall discourse direction; and (4) communication behavior based on heuristic rules, i.e., content retweeting occurs when the emotional intensity exceeds the threshold of 0.5 and the engagement level exceeds 0.7.

Domain expert agents (about 10% of the total) exhibit distinct behavioral characteristics: (1) higher topic focus, concentrating on specialized topics such as Technological Breakthrough and Information Security; (2) more balanced expression of emotions with a smaller range of fluctuations; (3) medium influence weights (0.4–0.6), with significant discourse on specific technical topics; and (4) communication behaviors based on professional judgments considering content quality and technical accuracy.

Institutional account agents (about 5% of the total) exhibit unique systematic communication behaviors: (1) a highly balanced cross-topic engagement pattern; (2) the most neutral and stable expression of emotions; (3) the highest influence weights (0.7–0.9), but with a lower frequency of content retweeting; and (4) communication behaviors based on strategic considerations to maintain brand image and message integrity.
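
The three agent types and the proportion sampling described above can be sketched as follows. The influence ranges and the retweet thresholds come from the text; the share intervals, the class and function names, and the normalization step are assumptions made for illustration, since the exact intervals and sampling procedure are not reported.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentType:
    name: str
    share_interval: tuple   # hypothetical interval around the reported share
    influence_range: tuple  # influence weight range reported in the text


AGENT_TYPES = [
    AgentType("general", (0.80, 0.90), (0.1, 0.3)),
    AgentType("expert", (0.08, 0.12), (0.4, 0.6)),
    AgentType("institution", (0.03, 0.07), (0.7, 0.9)),
]


def sample_shares(rng):
    """Draw one share per agent type and rescale so the shares sum to 1."""
    raw = [rng.uniform(*t.share_interval) for t in AGENT_TYPES]
    total = sum(raw)
    return {t.name: r / total for t, r in zip(AGENT_TYPES, raw)}


def general_user_reposts(emotional_intensity, engagement):
    """Heuristic retweet rule for general users (thresholds from the text)."""
    return emotional_intensity > 0.5 and engagement > 0.7


shares = sample_shares(random.Random(0))
```

Rescaling after sampling is one simple way to enforce the sum-to-1 constraint; it can move a share slightly outside its interval, so a rejection-sampling variant may be preferable if the intervals must hold strictly.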

The agent behavioral architecture integrates both cognitive and affective dimensions. In the cognitive dimension, agents exhibit topic-specific engagement tendencies that determine their likelihood of engaging with different topics; in the affective dimension, agents maintain topic-specific affective orientations that shape their affective responses to discourse content. These behavioral tendencies are initialized based on the statistical distribution of the empirical dataset, ensuring that the simulation reflects the real behavioral diversity observed in the DeepSeek case. Agent engagement behavior is determined by the contact probability function P(topic, agent), which combines three factors: topic heat (α = 0.3), user engagement (β = 0.4), and authentication weight (γ = 0.3). Meanwhile, the decision to repost is jointly determined by emotional intensity and engagement, forming a differentiated information diffusion model.
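
As an illustration of how the three factors might be combined, the sketch below assumes a weighted linear sum with the reported weights. The functional form, the function name, and the assumption that all inputs are scaled to [0, 1] are not stated in the text.

```python
# Weights reported in the text.
ALPHA, BETA, GAMMA = 0.3, 0.4, 0.3


def engagement_probability(topic_heat, user_engagement, auth_weight):
    """Weighted linear combination of the three factors, clipped to [0, 1].

    Assumption: all inputs are already scaled to [0, 1].
    """
    p = ALPHA * topic_heat + BETA * user_engagement + GAMMA * auth_weight
    return min(1.0, max(0.0, p))


# Example: a hot topic, a moderately engaged user, low authentication weight.
p_example = engagement_probability(0.8, 0.5, 0.2)
```

Because the weights sum to 1, inputs in [0, 1] always yield a value in [0, 1]; the clipping only guards against out-of-range inputs.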

The key innovation of this model is that it captures cross-topic affective effects. Using a correlation matrix derived from empirical topic co-occurrence patterns, we simulated how participation in one discourse dimension (e.g., “Technological Breakthrough”) affects perceptions of related dimensions (e.g., “Technological Competition” or “Information Security”). For example, the correlation coefficient between Technological Breakthrough and Technological Competition is 0.8, meaning that positive perceptions of Technological Breakthrough enhance perceptions of Technological Competition with a strength of 0.8. This association mechanism allows the model to simulate how emotion and engagement patterns diffuse across topic boundaries, capturing the multidimensional nature of technology perceptions.
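
One way to read this mechanism is as a correlation-scaled spillover of sentiment changes. The sketch below assumes an additive, symmetric update rule, which the text does not specify; only the 0.8 coefficient between Technological Breakthrough (TB) and Technological Competition (TC) is taken from the text, and the starting sentiment values are hypothetical.

```python
# Only the TB-TC coefficient is reported; other pairs are omitted here.
TOPIC_CORRELATION = {("TB", "TC"): 0.8}


def propagate(sentiment, source, delta, correlation):
    """Apply a sentiment change to `source` and spill it over to related topics.

    Assumption: the spillover is additive and symmetric, scaled by the
    correlation coefficient between the two topics.
    """
    updated = dict(sentiment)
    updated[source] += delta
    for (a, b), rho in correlation.items():
        if a == source:
            updated[b] += rho * delta
        elif b == source:
            updated[a] += rho * delta
    return updated


before = {"TB": 0.10, "TC": 0.10, "IS": 0.10}  # hypothetical starting values
after = propagate(before, "TB", 0.05, TOPIC_CORRELATION)
```

Here a sentiment increase of 0.05 on TB raises TC by 0.04, while IS is unchanged because no coefficient is defined for it in this toy matrix.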

The network structure of user interactions is designed to reflect the complex patterns of connectivity observed in social media environments. We integrated several network structures, including small-world clustering (Watts–Strogatz model), preferential attachment mechanisms (Barabási–Albert model), and institutional hierarchies (hierarchical networks), to construct a hybrid network architecture. This multi-layered network structure provides a more realistic characterization of information flow patterns in a technological discourse environment. The weight of each edge is the mean of the influence weights of the two agents it connects, capturing the critical role of influence in information dissemination.
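
The edge-weighting rule can be sketched independently of how the hybrid topology is generated. The node names, influence values, and edge list below are hypothetical; only the rule that an edge weight equals the mean of its endpoints' influence weights comes from the text.

```python
def edge_weights(influence, edges):
    """Weight each undirected edge by the mean influence of its endpoints."""
    return {(u, v): (influence[u] + influence[v]) / 2 for u, v in edges}


# Hypothetical agents: one general user, one expert, one institution.
influence = {"u1": 0.2, "e1": 0.5, "i1": 0.8}
edges = [("u1", "e1"), ("e1", "i1"), ("u1", "i1")]
weights = edge_weights(influence, edges)
```

Under this rule, a tie between a general user and an institutional account carries more weight than a tie between two general users, so content reaching high-influence nodes spreads along stronger edges.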

Temporal dynamics are another key dimension of the model. The simulation integrates endogenous processes (such as emotional decay and network evolution) and exogenous triggering factors (such as news events and platform algorithm changes) to jointly drive discourse evolution. The emotional value decays at a rate of 0.1 per step (with a 20% higher decay rate for negative emotions), reflecting the natural decay of public attention. The network structure also evolves over time: at each time step, highly active agents can reorganize their connections by deleting existing links and establishing new ones. In addition, the system incorporates a random hotspot triggering mechanism, which increases the popularity of a randomly chosen topic with a probability of 0.1 at each time step, simulating unexpected events and hot topics in the real world.
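
The decay and hotspot rules can be illustrated with a short sketch. The rates and the trigger probability are taken from the text; the multiplicative form of the decay, the size of the popularity boost, and the function names are assumptions.

```python
import random

DECAY_RATE = 0.1           # per-step decay rate from the text
NEGATIVE_MULTIPLIER = 1.2  # negative emotions decay 20% faster
HOTSPOT_PROB = 0.1         # per-step hotspot probability from the text


def decay(emotion):
    """Shrink an emotion value toward zero (assumed multiplicative decay)."""
    rate = DECAY_RATE * NEGATIVE_MULTIPLIER if emotion < 0 else DECAY_RATE
    return emotion * (1 - rate)


def maybe_trigger_hotspot(heat, rng, boost=0.2):
    """With probability HOTSPOT_PROB, raise one random topic's popularity.

    `boost` is a hypothetical value; the text does not report its size.
    """
    if rng.random() < HOTSPOT_PROB:
        topic = rng.choice(sorted(heat))
        heat = {**heat, topic: heat[topic] + boost}
    return heat


positive_next = decay(0.5)
negative_next = decay(-0.5)
```

With these settings a positive emotion of 0.5 falls to 0.45 after one step, while a negative emotion of -0.5 moves to -0.44, so negative affect fades slightly faster.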

The integration of agent simulation and semantic analysis components forms a comprehensive digital twin of the DeepSeek discourse ecosystem. Semantic analysis provides thematic structures and content features that shape agent behavior, while agent simulation reproduces the dynamic evolution of these structures over time. This integration method enables us to conduct a more complex analysis of discourse dynamics, surpassing the analytical depth that can be achieved by a single component.

Unlike traditional agent-based simulation, this study’s intelligent digital twin framework achieves methodological innovation in three key dimensions: The first is multidimensional integration. Traditional agent-based simulations typically employ simplified behavioral rules and interaction environments, whereas digital twins integrate LLM-enhanced semantic analysis, parameterized agent behavioral models, and evolutionary network structures, achieving full-dimensional simulation of the content–behavior–structure system. The second is data-driven parameterization. Traditional simulations often rely on theoretical assumptions to set parameters, while our framework uses semantic structures, sentiment distributions, and network characteristics extracted from real data to provide an empirical foundation for agent behaviors, significantly improving the reality fit of the simulation. The third is dynamic feedback mechanisms. Traditional simulations mostly adopt fixed environmental settings, while digital twins construct bidirectional influence mechanisms among discourse content, participant behavior, and network structure, where agents are influenced by topics and reshape topic structures, more realistically reflecting the complex dynamics of social interaction. These innovations enable digital twins to transcend pure behavioral simulation, constructing a comprehensive virtual environment capable of capturing the entire process of technological discourse formation.

4. Results

4.1. Temporal Evolution Patterns and Thematic Clustering in DeepSeek Discourse

The analysis of temporal evolution patterns in the DeepSeek discourse reveals distinctive engagement dynamics throughout the 13-day observation period. As illustrated in Figure 5, the discourse exhibited a clear three-phase developmental trajectory. The initial phase (20–24 January) was characterized by limited engagement, with daily post volumes remaining below 500 and interaction counts showing minimal fluctuation. This represents the incubation period when awareness of DeepSeek was confined primarily to specialized technological communities.

The discourse entered an explosive growth phase beginning on 25 January, with both post volumes and interaction metrics experiencing dramatic acceleration. This phase culminated in a peak in discourse intensity on 28 January, when daily post counts exceeded 4000 and interaction metrics surpassed 50,000. This rapid intensification coincided with broader media coverage and the entry of influential opinion leaders into the discourse. The temporal distribution of engagement metrics demonstrates the cascading nature of technology discourse diffusion, as initial specialized discussions triggered progressively broader public engagement.

In the post-peak phase (29 January–2 February), the discourse exhibited a pulsating pattern characterized by regular fluctuations in both post volumes and interaction metrics. Rather than showing simple decay, the discourse maintained substantial engagement levels with recurring daily peaks exceeding 2000 posts. This pattern suggests the formation of a sustained interest community around DeepSeek, with regular engagement cycles driven by platform algorithms, content creator schedules, and ongoing developments.

The analysis of emotional responses in the DeepSeek discussion reveals interesting patterns over time. During the first few days (initial phase), public reactions were mostly neutral and balanced, with people primarily sharing information rather than expressing strong opinions. However, when public attention surged between 25 and 28 January (explosive growth phase), emotional responses became highly unstable, rapidly shifting between strongly positive and strongly negative responses—sometimes within hours. This emotional volatility occurred because different groups began actively promoting competing perspectives on DeepSeek’s significance: some celebrated it as a national technological achievement, while others questioned its capabilities compared to international alternatives. These conflicting evaluations created a sentiment fluctuation in the overall discourse as different interpretations gained temporary dominance in the conversation. The post-peak phase exhibited a gradual sentiment stabilization trend, suggesting the emergence of more balanced and nuanced technological evaluations as the discourse matured.

The frequency distribution histograms for various engagement metrics reveal highly skewed participation patterns characteristic of social media environments. The distribution of shares, interactions, likes, and comments all demonstrate classic power-law distributions, with a small number of highly engaged posts accounting for a disproportionate share of total engagement. This pattern highlights the critical role of influential content in shaping discourse dynamics, as relatively few viral posts drive substantial portions of the overall engagement.

The thematic clustering analysis uncovered six distinct discourse domains that collectively structured the DeepSeek discussion, as shown in Figure 6. These clusters represent coherent interpretive frameworks through which participants engaged with different aspects of technological development. The “Technological Competition” cluster focused primarily on comparative evaluations between DeepSeek and competing models, with particular emphasis on model architecture, open-source strategies, and international competitive positioning. Terms such as “模型” (model), “开源” (open-source), “美国” (United States), and “中国” (China) emerged as central organizing concepts within this discourse domain, reflecting the geopolitical framing of technological development.

The “Technological Breakthrough” cluster centered on technical capabilities and performance characteristics, with keywords such as R1 model, performance, “强化学习” (reinforcement learning), and “低成本” (low cost) highlighting the technical aspects that attracted participant attention. This cluster represents the technically oriented segment of the discourse, where specialized knowledge communities engaged in a detailed assessment of DeepSeek’s capabilities and architecture.

The “User Feedback” cluster captured experiential evaluations from early adopters, focusing on usability, functionality, and comparative performance in specific application contexts. Keywords such as “有趣” (interesting), “认可” (recognition), “功能” (functionality), and “友好” (user-friendly) indicate predominantly positive user experiences, though “失望” (disappointment) also appeared among the prominent terms, suggesting mixed reception among different user segments.

The “Financial Market” cluster reflected market-oriented interpretations of DeepSeek’s significance, with particular focus on implications for technology companies and market dynamics. Keywords such as “英伟达” (Nvidia), “股价” (stock price), “市场波动” (market volatility), and “泡沫” (bubble) indicate participants’ attempts to situate DeepSeek within broader market narratives and evaluate its potential economic impact.

The “Social Influence” cluster addressed broader societal implications, with an emphasis on international competition, technological sovereignty, and industry transformation. Keywords such as “中美竞争” (China-US competition), “开源策略” (open-source strategy), “科技崛起” (technological rise), and “硅谷霸权” (Silicon Valley hegemony) highlight the geopolitical frames through which many participants interpreted DeepSeek’s significance beyond its technical capabilities.

The “Information Security” cluster focused on privacy, data protection, and system security considerations. Keywords such as “隐私” (privacy), “数据挖掘” (data mining), “泄露风险” (leakage risk), and “网络安全” (cybersecurity) reflect growing public awareness of security implications associated with large language models, representing a counterpoint to the predominantly capability-focused discussions in other clusters.

The distribution of keywords across these thematic clusters demonstrates how DeepSeek simultaneously occupied multiple interpretive domains within public discourse. Rather than being understood through a single dominant frame, the technology stimulated diverse forms of engagement spanning technical evaluation, market analysis, geopolitical interpretation, and security assessment. This multidimensional framing reflects the complex positioning of advanced AI technologies at the intersection of technical, economic, and geopolitical domains.

These temporal and thematic patterns provide more than descriptive insight; they reveal how different interpretive frames compete and stabilize in early-stage technology discourse. The multi-peaked pattern of attention in the post-peak phase implies not a single burst followed by decay, but rather a dynamic system sustained by periodic stimuli—suggesting that media events, influencer interventions, or algorithmic amplifications can trigger emotional reverberations even after the initial climax. The coexistence of multiple themes (e.g., technological capability, geopolitical competition, and market speculation) reflects the fragmentation of public meaning-making and calls for narrative strategies that can integrate or reconcile conflicting frames. For practitioners, this means anticipating not only what topics might emerge but also when and how they interact.

4.2. Agent-Based Simulation of Public Opinion Formation and Sentiment Dynamics

The parametric construction of sentiment dynamics in the agent simulation is the core technical component of the digital twin framework, providing the methodological bridge from empirical sentiment analysis to computational simulation. We transformed the sentiment results of the semantic analysis into parametric inputs for the agent simulation through a five-step procedure. First, we classified the sentiment polarity of each comment in the original corpus (using the pre-trained LLM), extracted its topic label (from the BERTopic clustering results), and constructed a triple data structure of {User ID, Topic, Sentiment Value}. Second, we aggregated each user’s sentiment expressions within each topic and calculated the topic-specific sentiment mean and standard deviation, forming an empirically grounded user–topic sentiment distribution mapping. Third, we performed a hierarchical clustering analysis of the sentiment distributions by user authentication type (ordinary users, domain experts, and institutional accounts) and extracted the sentiment feature vectors of each user group, including the sentiment mean, the polarity bias, and the fluctuation amplitude. Fourth, we mapped these sentiment feature vectors onto initial emotion parameters and dynamic update rules in the agent model, such as the initial emotion mean of ordinary users on the Technological Competition topic or the range of emotion fluctuation of institutional accounts on the Information Security topic. Finally, we designed the cross-topic emotion diffusion mechanism, constructing topic correlation strengths from the topic co-occurrence frequency matrix so that an emotion change on one topic affects the emotional state of related topics with different intensities.
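
The second and third steps of this procedure amount to two levels of aggregation, which can be sketched as follows. The function names, user IDs, and sentiment values are hypothetical, and the group-level step is simplified here to a plain average by user type rather than the hierarchical clustering described in the text.

```python
from collections import defaultdict
from statistics import mean, pstdev


def user_topic_stats(triples):
    """Aggregate (user_id, topic, sentiment) triples into mean and std per pair."""
    grouped = defaultdict(list)
    for user, topic, sentiment in triples:
        grouped[(user, topic)].append(sentiment)
    return {key: (mean(vals), pstdev(vals)) for key, vals in grouped.items()}


def group_profiles(stats, user_type):
    """Average the user-level means within each (user type, topic) pair."""
    grouped = defaultdict(list)
    for (user, topic), (avg, _) in stats.items():
        grouped[(user_type[user], topic)].append(avg)
    return {key: mean(vals) for key, vals in grouped.items()}


# Toy data (hypothetical users and sentiment values).
triples = [
    ("u1", "TC", 0.4), ("u1", "TC", 0.2),
    ("u2", "TC", 0.6),
    ("e1", "IS", -0.1),
]
user_type = {"u1": "general", "u2": "general", "e1": "expert"}
stats = user_topic_stats(triples)
profiles = group_profiles(stats, user_type)
```

The resulting group-level values are the kind of quantity that could seed the initial emotion parameters of each agent type, for example the mean sentiment of general users on the TC topic.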

This approach of parameterizing emotions based on empirical data ensures a realistic basis for agent behavior simulation and enables the model to reproduce the complex emotional patterns observed in DeepSeek discourse. During the execution of the simulation, the agent’s affective state is further influenced by three types of dynamics: the network proximity effect (affective assimilation between connected nodes), the temporal decay effect (natural decay of affective intensity over time), and the hotspot triggering effect (a specific event that leads to the reinforcement of a topic’s affect). Together, these mechanisms constitute a multidimensional affective dynamics system capable of capturing the complex patterns of affective evolution in technological discourse.

The agent-based simulation of DeepSeek discourse dynamics provides insights into the temporal evolution of thematic participation and sentiment patterns across different user segments. Figure 7 illustrates the predicted trajectories of both participation levels and sentiment intensity across the six thematic domains identified in our semantic analysis: Technological Competition (TC), Technological Breakthrough (TB), User Feedback (UF), Financial Market (FM), Social Influence (SI), and Information Security (IS).

The simulation results for thematic participation reveal distinctive evolutionary patterns across different topics. All themes demonstrate a consistent upward trajectory, indicating growing engagement throughout the simulation period. However, the growth rates and participation levels vary significantly across topics. Financial Market (FM) exhibits the highest final participation level, closely followed by Technological Competition (TC), reflecting the predominant economic and competitive framing of DeepSeek among discourse participants. In contrast, Technological Breakthrough (TB) shows consistently lower participation levels throughout the simulation, suggesting that technical discussions remained relatively specialized even as broader discourse expanded.

Notably, participation growth follows a sigmoid pattern across all themes, with rapid acceleration in the early stages (steps 0–5), followed by a gradual approach to saturation after step 10. This pattern aligns with classic diffusion models, where initial adoption by early participants triggers cascading engagement that eventually reaches natural limits. The final participation levels stabilize between 0.8 and 0.85 for most themes, indicating substantial but not universal engagement across the simulated population.

The sentiment evolution patterns shown in the lower panel of Figure 7 reveal more complex dynamics with significant fluctuations throughout the simulation period. All themes show a rapid initial increase in positive sentiment during the first five simulation steps, coinciding with the early adoption phase of participation growth. This reflects the initial enthusiasm and positive framing that typically accompanies emerging technological innovations. Between steps 5 and 60, sentiment patterns exhibit notable volatility with multiple peaks and troughs, particularly in the Technological Competition and Financial Market domains where sentiment values oscillate between 0.15 and 0.25.

After step 60, a gradual sentiment decline is observed across all themes, with final sentiment values converging between 0.1 and 0.15. This pattern suggests a normalization process where initial enthusiasm gives way to more measured evaluations as discourse matures. The Technological Breakthrough theme consistently maintains lower sentiment values than other domains throughout most of the simulation, indicating more critical or skeptical assessments of technical capabilities compared to other aspects of DeepSeek.

Figure 8 provides further insights by disaggregating participation and sentiment patterns across three agent types: general users, domain experts, and institutional accounts. The participation analysis reveals that general users maintain the highest engagement levels across all thematic domains, with particularly strong participation in Technological Competition (TC) at approximately 0.88. Domain experts show a similar pattern but with slightly lower overall engagement, particularly in technical domains. Institutional accounts demonstrate consistently lower participation levels across all themes, with values hovering around 0.82, indicating more selective engagement strategies.

The sentiment analysis in the lower panel of Figure 8 reveals striking differences in affective orientations across agent types. General users exhibit the most positive sentiment across all themes, with particularly strong positive effects on Technological Competition (TC) and Social Influence (SI). This suggests that non-specialist participants view DeepSeek primarily through nationalistic and competitive frames, emphasizing its significance in international technology competition. Domain experts demonstrate more moderate positive sentiment, particularly toward technical themes (TB and IS), reflecting more measured professional assessments.

Institutional accounts show distinctively different sentiment patterns compared to other agent types, with consistently moderate levels of positive sentiment across all themes. Their sentiment values cluster narrowly between 0.11 and 0.15, indicating a more neutral or balanced communicative approach. This restrained sentiment expression aligns with institutional communication strategies that typically emphasize objectivity and avoid strong emotional framing.

The clear differences in emotional expression between different user groups reveal how professional and institutional roles shape technology discussions. General users (everyday social media participants) showed the most enthusiasm when discussing DeepSeek as a competitive achievement for China, expressing national pride and excitement about technological advancement. Domain experts (AI researchers, computer scientists, etc.) were more measured in their responses, focusing primarily on technical capabilities and security implications with more balanced assessments. Institutional accounts (companies, media organizations, and government entities) maintained notably neutral and consistent messaging across all topics, avoiding strong positive or negative positions—a communication strategy typical of organizations seeking to maintain credibility. These patterns demonstrate how a person’s professional background and organizational affiliation directly influence both which aspects of a type of technology they find most important and how emotionally they respond to technological developments.

These simulation results demonstrate how our agent-based digital twin can reproduce complex patterns of discourse evolution while providing mechanistic explanations for observed phenomena. The model captures both the gradual saturation of participation and the non-linear dynamics of sentiment evolution, offering insights into how different user segments contribute to the collective construction of technology discourse. The simulated divergence in sentiment trajectories across themes and user types reveals deep cognitive and emotional asymmetries in how technological narratives are received. General users’ high emotional responsiveness to themes like “Technological Competition” indicates the salience of national identity and symbolic pride in shaping public perception—an insight aligned with theories of technonationalism. Meanwhile, domain experts’ lower, more stable sentiment curves suggest that informational quality and technical rigor are more critical than emotional resonance. This divergence implies that communication strategies targeting lay audiences may require emotional amplification, while those for experts require epistemic precision. Moreover, the declining sentiment across all groups after peak engagement suggests a saturation point, or “emotional fatigue,” which underscores the need for narrative reactivation to sustain attention in long-term technological campaigns.

4.3. Network Effects and Interaction Patterns in Digital Twin Simulation of User Interactions

The network analysis component of our digital twin simulation provides critical insights into the structural dynamics of user interactions throughout the DeepSeek discourse evolution. Figure 9 presents the temporal evolution of network properties, specifically focusing on clustering coefficients and degree distributions at different stages of the simulation. These metrics reveal how the underlying social structure of the discourse community evolved as participants engaged with different aspects of the DeepSeek phenomenon.

The evolution of the clustering level exhibits a distinctive pattern characterized by initial fluctuation followed by stabilization. In the early stages of discourse evolution (steps 0–20), the clustering coefficient demonstrates significant volatility, oscillating between values of 0.25 and 0.35. This instability reflects the rapid reconfiguration of network structures during the initial phase when discourse participants were still developing stable communication patterns and opinion communities were in the process of formation. As the simulation progresses to the middle phase (steps 20–60), the clustering coefficient gradually stabilizes around 0.32, indicating the emergence of more consistent interaction patterns and the consolidation of discourse communities.

During the later stages of the simulation (steps 60–100), the clustering coefficient exhibits a subtle but consistent upward trend, ultimately reaching values near 0.35. This gradual increase suggests a progressive densification of local network structures, as participants increasingly engage with others who share similar viewpoints or thematic interests. The heightened clustering in the mature phase of discourse indicates the formation of distinctive opinion communities with strong internal connectivity, a phenomenon often observed in online discourse around polarizing technological innovations.
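For readers unfamiliar with the metric, the clustering coefficient discussed above can be computed as follows. This is a minimal pure-Python sketch of the standard average local clustering coefficient for an undirected network. The adjacency dictionary and the toy network are illustrative assumptions and do not reproduce the simulated DeepSeek network.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy undirected network: a triangle (a, b, c) plus a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_average = average_clustering(toy)
```

Tracking this value at each simulation step would yield a curve of the kind described for Figure 9, where higher values indicate that an agent's contacts are increasingly connected to one another.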

The sequence of degree distributions captured at different time points throughout the simulation provides further insights into the evolving network structure. In the initial stages (steps 0–10), the degree distribution exhibits characteristics of a random network, with a relatively symmetrical bell-shaped curve centered around the average degree value. This reflects the early phase of discourse when connection patterns were still largely unstructured and opinion communities had not yet crystallized.

As the simulation progresses to the middle phase (steps 30–50), the degree distribution begins to skew rightward, developing a longer tail that indicates the emergence of higher-degree nodes. This transformation suggests the progressive formation of hub-like structures within the network, as certain participants began to accumulate disproportionate numbers of connections. These emerging hubs likely represent influential opinion leaders or particularly engaging content sources that attracted greater attention within the discourse community.

In the mature phase of discourse (steps 70–100), the degree distribution exhibits clear characteristics of a scale-free network, with a power-law distribution featuring many low-degree nodes and a small number of very high-degree nodes. This structural transformation reflects the Matthew effect in online discourse, where a small number of highly influential participants capture a disproportionate share of engagement, while most participants maintain relatively limited connection networks.
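The contrast between a random-like and a hub-dominated degree distribution can be illustrated with a generic growth model. The Python sketch below grows one network by preferential attachment (newcomers link to existing nodes with probability proportional to degree) and one by uniform attachment, then compares their largest degrees. It is a textbook-style illustration under assumed parameters (2000 nodes, 2 links per newcomer) and is not the link-formation rule of the simulation reported here.

```python
import random


def grow_preferential(n_nodes, m, seed=0):
    """Each newcomer links to m distinct existing nodes, chosen proportionally to degree."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes, so every seed node has degree m.
    degree = {i: m for i in range(m + 1)}
    # A node appears in `stubs` once per edge endpoint, so a uniform draw
    # from the list is a degree-proportional draw over nodes.
    stubs = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        degree[new] = m
        for t in targets:
            degree[t] += 1
            stubs.extend([t, new])
    return degree


def grow_uniform(n_nodes, m, seed=0):
    """Baseline: each newcomer links to m distinct existing nodes chosen uniformly."""
    rng = random.Random(seed)
    degree = {i: m for i in range(m + 1)}
    for new in range(m + 1, n_nodes):
        degree[new] = m
        for t in rng.sample(range(new), m):
            degree[t] += 1
    return degree


pref = grow_preferential(2000, 2)
unif = grow_uniform(2000, 2)
max_pref, max_unif = max(pref.values()), max(unif.values())
```

Both networks have the same number of nodes and edges, so any difference in the largest degree comes from the attachment rule alone. Under preferential attachment, early well-connected nodes keep attracting links, which produces the long right tail associated with scale-free structure.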

Particularly notable is the emergence of nodes with degrees exceeding 30 in the final stages of the simulation, representing super-connectors within the discourse network. These high-centrality nodes likely function as bridges between different discourse communities, facilitating information flow across thematic boundaries and influencing sentiment dynamics throughout the network. The progressive emergence of these influential nodes demonstrates how network effects amplify the impact of certain participants, creating opinion leaders whose influence extends far beyond their immediate connections.

The evolution of network structures revealed in our simulation closely mirrors empirical patterns observed in online discourse communities. The transition from relatively homogeneous initial connection patterns to increasingly hierarchical structures with pronounced hub formation represents a common developmental trajectory in technology discourse networks. This structural evolution has significant implications for information diffusion, sentiment propagation, and opinion formation, as network structure shapes both the speed and patterns of content spread.

These findings demonstrate how our digital twin can capture the complex structural dynamics underlying technology discourse formation. By simulating not only individual agent behaviors but also their evolving connection patterns, our model provides insights into how network effects shape collective sense-making processes around emerging technologies. The structural analysis complements our thematic and sentiment analyses, offering a multidimensional perspective on the DeepSeek discourse evolution that integrates content, affect, and relational dimensions. The evolution of the interaction network from random to scale-free topology has significant theoretical and practical implications. Structurally, it reflects the Matthew effect in discourse power—where influential nodes (opinion leaders or viral accounts) progressively dominate meaning-making and information spread. This aligns with preferential attachment models in complex systems theory and highlights the inevitable centralization of discursive influence in open platforms. Practically, this suggests that early identification and engagement with high-centrality users is crucial for shaping public perception. Furthermore, the rising clustering coefficient implies the crystallization of “opinion communities” or echo chambers, which may lead to interpretive segmentation or polarization. For platform designers and policy communicators, this finding warns of the risk of cognitive fragmentation even in technologically unified spaces.

These network analysis findings reveal not only the structural dynamics of technical discourse formation but also the micro-mechanisms of information diffusion and opinion formation. The evolution of the network from an initial random connection pattern to a scale-free structure with pronounced centralization reflects the Matthew effect in technical discourse: a small number of high-influence nodes gradually come to dominate discourse construction through preferential attachment, which in turn reinforces the uneven distribution of information flow. This structural evolution shapes technology perception in three ways. First, central nodes (usually opinion leaders or highly engaging content) become key mediators of meaning construction, and their choice of frames and emotional expression has a disproportionate impact on the overall tone of the discourse. Second, as the clustering coefficient stabilizes and then rises, the boundaries of opinion communities gradually solidify into relatively independent interpretive communities, which may lead to differentiated and fragmented perceptions of the technology. Third, high-centrality nodes connect multiple thematic communities at the same time and become hubs for cross-topic information flow, which explains why certain topic pairs (e.g., Technological Competition and Financial Market) exhibit stronger co-evolutionary patterns. These network dynamics account for the structural distribution of power in technological discourse and help explain why certain explanatory frameworks quickly dominate public discourse while others struggle to gain sufficient visibility.
At the practical level, these findings suggest that technology communication strategies should pay special attention to identifying and engaging with high-centrality nodes, as these nodes constitute structural fulcrums for the construction of technological meaning and can significantly amplify or diminish the social impact of particular technological narratives.
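One simple way to operationalize the identification of high-centrality nodes is to rank accounts by degree centrality, that is, the share of other nodes to which an account is directly connected. The Python sketch below does this for an undirected edge list. The function names and the toy edge list are hypothetical, and degree centrality is only one of several possible centrality measures; the choice here is an assumption made for brevity.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_connectors(edges, k=3):
    """Return the k highest-centrality nodes, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical interaction edges between accounts.
toy_edges = [
    ("hub", "u1"), ("hub", "u2"), ("hub", "u3"), ("hub", "u4"),
    ("u1", "u2"), ("u4", "u5"),
]
ranking = top_connectors(toy_edges, k=2)
```

Applied to interaction data from the early phase of a discourse event, a ranking of this kind would flag candidate opinion leaders before the network has fully centralized.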

4.4. Practical Scenario Demonstration: Simulated Decision-Making Based on Manus Controversy

To concretize the application potential of the proposed digital twin framework, we introduce a real-world inspired scenario involving the Manus AI controversy—a rapidly unfolding technical discourse event that occurred in March 2025. This case provides a representative context for demonstrating how different stakeholders can utilize our system to simulate public discourse evolution and inform strategic decisions.

The Manus incident began with the release of an AI agent system that initially received overwhelming public support on social media. However, within days, skepticism and backlash emerged in response to technical criticisms and concerns about originality and code transparency. The resulting public discourse exhibited patterns of stance reversal, emotional contagion, and opinion polarization, making it a highly relevant use case for discourse simulation.

Using empirical data from over 36,000 Weibo interactions, we constructed an agent-based simulation replicating the sentiment and stance dynamics of the controversy. As shown in Figure 10, the simulation reveals that initial support gradually decreases as skepticism rises—until a critical inflection point triggers a rebound in supportive sentiment, consistent with a counter-spiral of silence effect. Concurrently, emotional intensities evolve asymmetrically, with surprise emerging as the dominant emotion, followed by growing anger and sadness.

From a decision-making perspective, such simulations allow stakeholders to test how public reactions might evolve under different framing strategies or disclosure sequences. For instance, consider the following illustrative applications.

Technology firms may use the system to simulate public response before launching a product. By modeling the impact of various announcement tones (e.g., emphasizing innovation vs. transparency), they can anticipate potential sentiment volatility and prepare mitigation plans.

Policy makers regulating controversial technologies (e.g., AI regulation and data security laws) can simulate public perception under different regulatory narratives and timeframes. System outputs help predict whether a proposed policy could trigger defensive backlash or foster consensus.

Media and public relations teams can deploy the digital twin to identify peak emotional windows and decide on optimal communication strategies, such as whether to engage directly with emerging skepticism or allow supportive communities to self-organize.

These user-oriented processes are supported by the digital twin’s simulation outputs of both emotional and structural discourse evolution. Figure 11 illustrates the network transformation throughout the simulation. Initial random-like networks evolve into highly clustered, emotionally and ideologically polarized communities. Supporters of Manus form tightly connected hubs, while skeptics remain fragmented. This demonstrates that emotional dynamics and discourse structure co-evolve, informing users not only of “what” will be said but also “how” it will be spread.

In summary, this case scenario highlights how our digital twin can function as a decision support system for diverse actors navigating high-stakes technological controversies. It operationalizes simulation outputs into actionable insights, enabling anticipation of discourse trajectories, adaptive framing of communications, and targeted engagement strategies.

5. Discussion

Our computational simulation of DeepSeek-related discussions provides important insights into how public opinion forms around new technologies. By combining text analysis methods (that identify what people are discussing) with agent-based modeling (that replicates how users interact), we created a virtual representation that shows how online conversations about technology develop and spread. This simulation reveals the specific mechanisms driving public reaction to new AI systems: individual social media users interact within specific platform structures (like Weibo’s recommendation algorithms), discussing particular themes (like national competition or technical capabilities), which collectively produce observable patterns of public engagement that would be difficult to predict through conventional research methods.

The temporal evolution patterns revealed through our analysis suggest that technology discourse follows distinctive developmental trajectories that differ from other forms of public opinion. Unlike crisis-triggered discourse, which typically exhibits rapid onset followed by monotonic decay, the DeepSeek discourse demonstrated a multi-phase pattern with sustained engagement levels and recurring attention cycles. This pattern reflects the complex nature of technological sense-making, where initial interpretations are continuously revised as participants develop more sophisticated understanding and encounter diverse perspectives.

The thematic clustering results illuminate the multidimensional nature of public engagement with advanced technologies. Rather than being understood through a single dominant frame, DeepSeek simultaneously occupied multiple interpretive domains spanning technical evaluation, economic analysis, geopolitical positioning, and security assessment. This thematic multiplicity reflects the complex positioning of cutting-edge AI technologies at the intersection of technical, economic, political, and social domains. The prominence of geopolitical framing (particularly China–US competition) highlights how technological developments are increasingly interpreted through nationalistic frames that transcend purely technical assessments.

Our agent-based simulation results reveal how different user segments contribute distinctively to discourse formation. The notable differences in both participation patterns and sentiment expression across general users, domain experts, and institutional accounts demonstrate how professional identities and institutional positioning shape discourse contributions. The stronger positive sentiment expressed by general users compared to specialists suggests that technological enthusiasm may be inversely related to domain expertise, a finding that aligns with previous research on technology perception.

The network analysis findings highlight the critical role of structural dynamics in shaping discourse evolution. The progressive transformation from random-like initial connection patterns to scale-free structures with pronounced hubs reflects the emergent hierarchy that characterizes mature discourse communities. This structural evolution has significant implications for information diffusion and opinion formation as network structure increasingly concentrates influence among a small subset of participants who function as opinion leaders.

These insights demonstrate the value of digital twin methodologies for understanding complex socio-technical phenomena. By creating a comprehensive virtual representation of the DeepSeek discourse ecosystem, we were able to identify behavioral mechanisms and structural dynamics that would have been difficult to discern through conventional analytical approaches. The integration of semantic analysis with agent-based simulation provides a more comprehensive understanding of discourse dynamics than either approach could achieve in isolation.

6. Conclusions

6.1. Theoretical Contributions

This research makes several significant theoretical contributions to our understanding of technology discourse dynamics and digital twin methodologies. First, we demonstrated the feasibility of constructing comprehensive digital twins for social phenomena, extending the paradigm beyond its traditional applications in physical and industrial systems. By integrating semantic analysis with agent-based modeling, we showed how digital twins can capture both content characteristics and behavioral dynamics in technology discourse, providing a more holistic representation than conventional analytical approaches.

Second, our findings advance the understanding of how different user segments contribute to technology discourse formation. The distinctive participation patterns and sentiment orientations observed across general users, domain experts, and institutional accounts illuminate how professional identities and institutional positioning shape discourse contributions. This multi-agent perspective offers a more nuanced understanding of discourse formation than approaches that treat public opinion as a homogeneous construct.

Third, our research contributes to theoretical models of technology perception by demonstrating the multidimensional nature of public engagement with advanced AI technologies. The identification of six distinct thematic domains highlights how technologies like DeepSeek simultaneously occupy multiple interpretive frames, challenging simplistic narratives about public technology reception. In particular, our findings regarding the prominence of geopolitical framing contribute to emerging scholarship on the nationalization of technology discourse in contemporary contexts.

Finally, the network analysis component of our research advances the understanding of structural evolution in discourse communities. The documented transformation from random-like initial connection patterns to scale-free structures with pronounced hubs provides empirical support for theoretical models of preferential attachment in online discourse networks while illuminating the progressive concentration of influence that shapes collective sense-making processes.

6.2. Practical Implications

While primarily designed as a methodological framework, the smart digital twin approach proposed in this study also holds potential practical relevance for stakeholders concerned with technology communication and governance. By integrating semantic analysis, agent-based modeling, and network simulation, the framework offers a structured means to observe how different themes, user groups, and sentiment patterns co-evolve in the early stages of technological discourse.

Rather than functioning as a finalized tool or deployable system, this framework is best understood as a simulation-based research infrastructure that can support reflection, strategic foresight, and stakeholder engagement. For instance, technology developers may use such simulations to better understand how certain features or narratives resonate across different segments of the public, especially when multiple interpretive frames—such as national competitiveness, economic prospects, or ethical concerns—intersect and compete in shaping public perception.

To illustrate this practical relevance, we conducted an experimental scenario simulation based on the Manus Controversy (Section 4.4), a real-world-inspired case involving sudden shifts in public stance and emotional polarization following the release of a contested AI product. In this scenario, developers simulated alternative communication strategies to observe how shifts in disclosure timing or framing might influence discourse dynamics. The results provided insights into the trajectory of sentiment reversal, emotional contagion, and the role of opinion leaders—offering developers a conceptual environment for evaluating potential communication risks before actual deployment. This case does not suggest operational deployment of the framework but demonstrates how the method can support deliberative and exploratory tasks in practice-oriented contexts.

Communication strategists may also find value in the observed variations in emotional expression and topic engagement across user types. Our simulation results suggest that message reception is not uniform: general users tend to respond more strongly to socially and nationally framed narratives, while professional users engage more critically with technical and security issues. Recognizing such differences can inform more nuanced communication designs, even if these insights emerge within a controlled modeling environment rather than real-world validation.

Similarly, for policy makers, our findings on the network evolution of discourse communities underscore the significance of highly connected individuals or accounts—so-called “super-connectors”—in shaping information diffusion. While not offering predictive precision, the simulation illustrates how structural influence accumulates and how discourse centralization may affect the spread and reception of technological narratives. These observations may inform the timing, targeting, or framing of policy communication in complex or sensitive technology governance contexts.

Overall, this study’s digital twin framework should not be interpreted as a predictive tool per se but as a methodologically grounded simulation model that offers conceptual and empirical scaffolding for exploring how public discourse around emerging technologies may unfold under different hypothetical conditions. As such, it contributes to the growing repertoire of computational methods that can support anticipatory thinking and communication planning in an increasingly complex socio-technical landscape.

6.3. Future Directions and Limitations

While our research demonstrates the value of digital twin approaches for understanding technology discourse dynamics, several limitations should be acknowledged. First, our simulation was based on a single case study (DeepSeek) within a specific cultural context, limiting the generalizability of our findings to other technologies and cultural settings. Future research should apply similar methodologies to diverse technology cases across multiple cultural contexts to identify both common patterns and contextual variations in discourse dynamics.

Second, our agent-based model relied on simplified representations of user characteristics and behaviors that may not fully capture the complexity of real-world discourse participation. Future work should incorporate more sophisticated behavioral models that account for factors such as cognitive biases, identity-based motivations, and strategic communication objectives. The integration of empirical behavioral data from experimental studies could enhance the psychological realism of agent representations.

Third, the temporal scope of our analysis was limited to the initial discourse formation period following DeepSeek’s release. Extended longitudinal studies would provide valuable insights into how discourse patterns evolve over longer timeframes, potentially revealing cyclical patterns, maturation effects, and responses to external events. Such longitudinal perspectives would enhance understanding of both short-term dynamics and long-term evolutionary trajectories in technology discourse.

Finally, future research should explore the integration of additional data modalities into digital twin frameworks. While our model incorporated textual content, network structures, and behavioral patterns, it did not account for visual content, platform-specific affordances, or cross-platform dynamics. Multimodal approaches that integrate diverse data types could provide more comprehensive representations of contemporary technology discourse ecosystems.

Author Contributions

Conceptualization, K.Z. and C.D.; methodology, K.Z.; software, Y.G.; validation, K.Z., C.D. and G.Y.; formal analysis, K.Z.; investigation, C.D.; resources, Y.G.; data curation, C.D.; writing—original draft preparation, K.Z. and C.D.; writing—review and editing, G.Y. and Y.G.; visualization, K.Z.; supervision, G.Y. and J.M.; project administration, J.M. and Y.G.; funding acquisition, Y.G. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

This article encompasses this study’s original contributions; further inquiries can be addressed to the corresponding author.

Acknowledgments

The authors would like to thank the National Natural Science Foundation of China and the “HIT-China Mobile” 5G Application Innovation Joint Research Institute for providing the funding to support this research. They would also like to thank Zhiwei Research Institute for providing the data collection platform.

Conflicts of Interest

Author Yifeng Guo was employed by the company China Mobile System Integration Co. Ltd. The remaining authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Table

Figure 1 LLM-BERTopic public opinion analysis methodology.

Figure 2 LLM processing of data processing module.

Figure 3 LLM processing of topic clustering modules.

Figure 4 Agent-based simulation for emergent patterns of discourse evolution.

Figure 5 Trends in the number of tweets, interactions, and sentiments for DeepSeek.

Figure 6 Topic clustering and its main keywords.

Figure 7 Prediction of theme participation and emotional change trend of agents over time.

Figure 8 Thematic engagement and affective states of the three types of agents in the final homeostatic situation.

Figure 9 Evolution of clustering coefficients and degree distributions for DeepSeek opinion agent networks.

Figure 10 Evolution of stance and sentiment obtained by ABM of Manus controversy.

Figure 11 Network visualization for ABM simulation of Manus controversy. (a) Initial network structure; (b) evolved network structure (by sentiment); (c) evolved network structure (by stance).

Table 1

Comparison between traditional digital twin and our proposed discourse-oriented digital twin.

Dimension	Traditional Digital Twin	Proposed Digital Twin for Technology Discourse
Core Concept	High-fidelity virtual replicas of physical systems	Computational simulation of socio-semantic dynamics in public technology discourse
Modeling Object	Physical assets (e.g., machines, factories, and vehicles)	Social cognitive ecosystems (users, topics, emotions, and networks in LLM-related discourse)
Modeling Techniques	Physics-based modeling + sensor-driven data integration	LLM-enhanced semantic clustering + agent-based simulation + dynamic network evolution
System Functions	Real-time monitoring, predictive maintenance, and operational optimization	Forecasting sentiment evolution, simulating narrative strategies, and informing stakeholder action
Key Innovations	Focused on engineering efficiency and control	(1) Empirical parameterization of discourse agents; (2) simulation of cross-topic emotional diffusion; (3) predictive experimentation in narrative framing scenarios
Temporal Orientation	Real-time mirroring of physical status	Scenario-based forecasting of opinion dynamics before real events unfold
Typical Application	Smart manufacturing, intelligent transportation, and energy grids	Public communication strategy, technology governance, and AI risk perception

Appendix A

Table A1

Glossary of key terms.

Term	Definition	Application Example in This Study
Digital Twin (for discourse)	A computational simulation model that virtually replicates the dynamic structure and evolution of public discourse systems.	Used to simulate how public sentiment and engagement evolve during the 13-day DeepSeek LLM discourse burst.
Agent-Based Modeling (ABM)	A simulation technique that models individual entities (agents) with defined rules and observes their interactions and emergent system behavior.	Agents represent different user types (general users, domain experts, and institutions) with distinct sentiment and communication rules.
Cross-Topic Affective Diffusion	The process by which emotional responses in one topic domain influence or spill over into related thematic areas.	Positive sentiment in “Technological Breakthrough” influences sentiment in “Technological Competition” with a correlation coefficient of 0.8.
Influence Weight	A numerical indicator representing an agent’s relative ability to shape discourse through information dissemination.	Institutional accounts are assigned high influence weights (0.7–0.9), making their posts more likely to affect network sentiment.
Power-Law Distribution	A statistical pattern in which a small number of elements account for most of the activity (long tail distribution).	Most Weibo posts had low engagement, but a few viral posts accounted for the majority of shares and likes.
Scale-Free Network	A type of network where a few nodes (users) have a disproportionately high number of connections, forming central hubs.	The simulation showed the discourse network evolving into a scale-free form, with influential users acting as hubs.
Clustering Coefficient	A measure of the degree to which nodes in a network tend to cluster together.	Used to track community formation during the discourse simulation; values increased over time, indicating stronger opinion group formation.
Semantic Clustering	The grouping of text data based on semantic similarity to identify thematic structures in discourse.	Implemented via LLM-BERTopic integration, identifying six dominant themes in DeepSeek discourse such as “User Feedback” and “Social Impact.”
LLM-BERTopic Integration	A hybrid method combining large language models and BERTopic for semantic representation and thematic modeling of textual data.	Applied to preprocess and cluster 253,280 Weibo posts into semantically coherent discourse topics.
Homeostatic Equilibrium	The stabilized emotional state reached by agents after iterative sentiment updates across interactions.	Describes the final emotional convergence of different user groups after 100 steps in the agent-based simulation.
Topic Participation Rate	The proportion of agents engaging with a particular discourse topic at a given time.	Tracked for six topics over time; “Financial Market” had the highest simulated participation rate by step 100.
Narrative Scenario Simulation	The simulation of discourse evolution under different framing or messaging strategies to predict public response.	Proposed as a practical application: policy makers can test alternative regulatory narratives before releasing new AI policies.
Network Centrality	A measure of a node’s structural importance within a network based on its position and connectivity.	High-centrality nodes emerged in simulation and were interpreted as opinion leaders or viral content hubs.
Sentiment Polarity	The direction of emotional expression—positive, negative, or neutral—toward a specific topic or message.	Extracted from social media posts and used to initialize agent emotions in simulation.
Temporal Sentiment Evolution	The change in collective sentiment over time as a discourse progresses.	Simulated sentiment curves showed early enthusiasm followed by stabilization and partial decline.

References

1. Sadowski, J.; Bendor, R. Selling Smartness: Corporate Narratives and the Smart City as a Sociotechnical Imaginary. Sci. Technol. Hum. Values; 2019; 44, pp. 540-563. [DOI: https://dx.doi.org/10.1177/0162243918806061]

2. Bareis, J.; Katzenbach, C. Talking AI into Being: The Narratives and Imaginaries of National AI Strategies and Their Performative Politics. Sci. Technol. Hum. Values; 2022; 47, pp. 855-881. [DOI: https://dx.doi.org/10.1177/01622439211030007]

3. Zhang, D.; Wu, X.; Liu, P.; Huang, K.; Wei, W.; Liu, X. Identification of Product Innovation Path Incorporating the FOS and BERTopic Model from the Perspective of Invalid Patents. Appl. Sci.; 2023; 13, 7987. [DOI: https://dx.doi.org/10.3390/app13137987]

4. Wittmayer, J.M.; Backhaus, J.; Avelino, F.; Pel, B.; Strasser, T.; Kunze, I.; Zuijderwijk, L. Narratives of Change: How Social Innovation Initiatives Construct Societal Transformation. Futures; 2019; 112, 102433. [DOI: https://dx.doi.org/10.1016/j.futures.2019.06.005]

5. Wu, Y.; Ji, Y.; Gu, F. Identifying Firm-Specific Technology Opportunities in a Supply Chain: Link Prediction Analysis in Multilayer Networks. Expert Syst. Appl.; 2023; 213, 119053. [DOI: https://dx.doi.org/10.1016/j.eswa.2022.119053]

6. Brown, O.; Davison, R.M.; Decker, S.; Ellis, D.A.; Faulconbridge, J.; Gore, J.; Greenwood, M.; Islam, G.; Lubinski, C.; MacKenzie, N.; et al. Theory-Driven Perspectives on Generative Artificial Intelligence in Business and Management. Br. J. Manag.; 2024; 35, pp. 3-23. [DOI: https://dx.doi.org/10.1111/1467-8551.12788]

7. Grootendorst, M. BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv; 2022; arXiv: 2203.05794

8. Mu, Y.; Dong, C.; Bontcheva, K.; Collier, N. Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling. arXiv; 2024; arXiv: 2403.16248

9. Hu, Z.; Lou, S.; Xing, Y.; Wang, X.; Cao, D.; Lv, C. Review and Perspectives on Driver Digital Twin and Its Enabling Technologies for Intelligent Vehicles. IEEE Trans. Intell. Veh.; 2022; 7, pp. 417-440. [DOI: https://dx.doi.org/10.1109/TIV.2022.3195635]

10. Bruch, E.; Atwell, J. Agent-Based Models in Empirical Social Research. Sociol. Methods Res.; 2015; 44, pp. 186-221. [DOI: https://dx.doi.org/10.1177/0049124113506405]

11. Zhuang, X.; Zhang, C. The Discourse Construction of Technology Issues and Their Social Technical Imagination: Taking Weibo Discussion of Pfizer COVID-19 Vaccine as an Example. Journalist; 2022; 3, pp. 14-23+85. (In Chinese)

12. Gao, X.; Li, N. The “Media Mythical Narrative” of ChatGPT from the Perspective of Social Technical Imagination: A Computer-Assisted Content Analysis Based on WeChat Public Platform. Shanghai J. Rev.; 2023; pp. 28-44. (In Chinese) [DOI: https://dx.doi.org/10.16057/j.cnki.31-1171/g2.2023.10.009]

13. Wang, J.; Yang, L.; Li, X.; Niu, R. Multi-Dimensional Evolution Analysis of ChatGPT Network Public Opinion Characteristics. J. Intell.; 2024; 43, pp. 138-145. (In Chinese)

14. Bibri, S.E.; Huang, J.; Jagatheesaperumal, S.K.; Krogstie, J. The Synergistic Interplay of Artificial Intelligence and Digital Twin in Environmentally Planning Sustainable Smart Cities: A Comprehensive Systematic Review. Environ. Sci. Ecotechnol.; 2024; 20, 100433. [DOI: https://dx.doi.org/10.1016/j.ese.2024.100433]

15. Tao, F.; Qi, Q.; Wang, L.; Nee, A.Y.C. Digital Twins and Cyber-Physical Systems Toward Smart Manufacturing and Industry 4.0: Correlation and Comparison. Engineering; 2019; 15, pp. 2460-2472. [DOI: https://dx.doi.org/10.1016/j.eng.2019.01.014]

16. El Saddik, A. Digital Twins: The Convergence of Multimedia Technologies. IEEE Multimed.; 2018; 25, pp. 87-92. [DOI: https://dx.doi.org/10.1109/MMUL.2018.023121167]

17. San Miguel, M.; Toral, R. Introduction to the Chaos Focus Issue on the Dynamics of Social Systems. Chaos Interdiscip. J. Nonlinear Sci.; 2020; 30, 120401. [DOI: https://dx.doi.org/10.1063/5.0037137]

18. Talib, N.; Fitzgerald, R. Micro–Meso–Macro Movements; a Multi-Level Critical Discourse Analysis Framework to Examine Metaphors and the Value of Truth in Policy Texts. Crit. Discourse Stud.; 2016; 13, pp. 531-547. [DOI: https://dx.doi.org/10.1080/17405904.2016.1182932]

19. Oliveira, P.P. Digital Twin Development for Airport Management. J. Airpt. Manag.; 2018; 14, pp. 246-259. [DOI: https://dx.doi.org/10.69554/PZMM9316]

20. Zografos, K.G.; Madas, M.A. Development and Demonstration of an Integrated Decision Support System for Airport Performance Analysis. Transp. Res. Part C Emerg. Technol.; 2006; 14, pp. 1-17. [DOI: https://dx.doi.org/10.1016/j.trc.2006.04.001]

21. Liu, Z.; Feng, J.; Uden, L. Technology Opportunity Analysis Using Hierarchical Semantic Networks and Dual Link Prediction. Technovation; 2023; 128, 102872. [DOI: https://dx.doi.org/10.1016/j.technovation.2023.102872]

22. Ren, R.; Xiang, L.; Xu, W.; Zhao, M. Discovery of New Energy Vehicle Technology Opportunities Based on BERTopic Method. J. Intell.; 2025; 44, pp. 147-155. (In Chinese)

23. Macal, C.M.; North, M.J. Tutorial on Agent-Based Modelling and Simulation. J. Simul.; 2010; 4, pp. 151-162. [DOI: https://dx.doi.org/10.1057/jos.2010.3]

24. Mou, X.; Wei, Z.; Huang, X. Unveiling the Truth and Facilitating Change: Towards Agent-Based Large-Scale Social Movement Simulation. arXiv; 2024; arXiv: 2402.16333

25. Dearing, J.W.; Cox, J.G. Diffusion of Innovations Theory, Principles, and Practice. Health Aff.; 2018; 37, pp. 183-190. [DOI: https://dx.doi.org/10.1377/hlthaff.2017.1104]

26. Lewis, W.; Agarwal, R.; Sambamurthy, V. Sources of Influence on Beliefs about Information Technology Use: An Empirical Study of Knowledge Workers. MIS Q.; 2003; 27, pp. 657-678. [DOI: https://dx.doi.org/10.2307/30036552]

27. Moreau, C.P.; Lehmann, D.R.; Markman, A.B. Entrenched Knowledge Structures and Consumer Response to New Products. J. Mark. Res.; 2001; 38, pp. 14-29. [DOI: https://dx.doi.org/10.1509/jmkr.38.1.14.18836]

28. Christos, K.; Dimitrios, B.; Dimitrios, A. Agent-Based Simulation for Modeling Supply Chains: A Comparative Case Study. Int. J. New Technol. Res.; 2016; 2, 263416.

29. Anand, N.; Van Duin, J.R.; Tavasszy, L. Framework for Modelling Multi-Stakeholder City Logistics Domain Using the Agent-Based Modelling Approach. Transp. Res. Procedia; 2016; 16, pp. 4-15. [DOI: https://dx.doi.org/10.1016/j.trpro.2016.11.002]

30. Wibowo, R.A.; Hidayatno, A.; Moeis, A.O. Simulating Port Expansion Plans Using Agent-Based Modelling. Int. J. Technol.; 2015; 6, pp. 864-871. [DOI: https://dx.doi.org/10.14716/ijtech.v6i5.1900]

31. Busetta, P.; Rönnquist, R.; Hodgson, A.; Lucas, A. Jack Intelligent Agents-Components for Intelligent Agents in Java. AgentLink News Lett.; 1999; 2, pp. 2-5.

32. Zhang, P. The Affective Response Model: A Theoretical Framework of Affective Concepts and Their Relationships in the ICT Context. MIS Q.; 2013; 37, pp. 247-274. [DOI: https://dx.doi.org/10.25300/MISQ/2013/37.1.11]

33. Stieglitz, S.; Dang-Xuan, L. Emotions and Information Diffusion in Social Media—Sentiment of Microblogs and Sharing Behavior. J. Manag. Inf. Syst.; 2013; 29, pp. 217-248. [DOI: https://dx.doi.org/10.2753/MIS0742-1222290408]

34. Scherer, K.R. The Dynamic Architecture of Emotion: Evidence for the Component Process Model. Cogn. Emot.; 2009; 23, pp. 1307-1351. [DOI: https://dx.doi.org/10.1080/02699930902928969]

35. Dong, J.-X.; Song, D.-P. Container Fleet Sizing and Empty Repositioning in Liner Shipping Systems. Transp. Res. Part E Logist. Transp. Rev.; 2009; 45, pp. 860-877. [DOI: https://dx.doi.org/10.1016/j.tre.2009.05.001]

36. Kuzmicz, K.A.; Pesch, E. Approaches to Empty Container Repositioning Problems in the Context of Eurasian Intermodal Transportation. Omega; 2019; 85, pp. 194-213. [DOI: https://dx.doi.org/10.1016/j.omega.2018.06.004]

37. Suttmeier, R.P.; Yao, X.; Tan, A.Z. Standards of Power? Technology, Institutions, and Politics in the Development of China’s National Standards Strategy. Geopolit. Hist. Int. Relat.; 2009; 1, pp. 46-84.

38. Müller, D.; Tierney, K. Decision Support and Data Visualization for Liner Shipping Fleet Repositioning. Inf. Technol. Manag.; 2017; 18, pp. 203-221. [DOI: https://dx.doi.org/10.1007/s10799-016-0259-3]

39. Liu, L.; Wan, F. Unveiling Temporal and Spatial Research Trends in Precision Agriculture: A BERTopic Text Mining Approach. Heliyon; 2024; 10, e36808. [DOI: https://dx.doi.org/10.1016/j.heliyon.2024.e36808]

40. Jeon, E.; Yoon, N.; Sohn, S.Y. Exploring New Digital Therapeutics Technologies for Psychiatric Disorders Using BERTopic and PatentSBERTa. Technol. Forecast. Soc. Chang.; 2023; 186, 122130. [DOI: https://dx.doi.org/10.1016/j.techfore.2022.122130]

41. Calegari, R.; Ciatto, G.; Mascardi, V.; Omicini, A. Logic-Based Technologies for Multi-Agent Systems: A Systematic Literature Review. Auton. Agents Multi-Agent Syst.; 2021; 35, 1. [DOI: https://dx.doi.org/10.1007/s10458-020-09478-3]

42. Marsella, S.C.; Gratch, J. Computationally Modeling Human Emotion. Commun. ACM; 2014; 57, pp. 56-67. [DOI: https://dx.doi.org/10.1145/2631912]

43. Moreno-Jiménez, J.M.; Vargas, L.G. Cognitive Multiple Criteria Decision Making and the Legacy of the Analytic Hierarchy Process. Stud. Appl. Econ.; 2018; 36, pp. 67-80. [DOI: https://dx.doi.org/10.25115/eea.v36i1.2516]

44. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv; 2019; arXiv: 1908.10084

45. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv; 2018; arXiv: 1802.03426

46. McInnes, L.; Healy, J.; Astels, S. hdbscan: Hierarchical Density Based Clustering. J. Open Source Softw.; 2017; 2, 205. [DOI: https://dx.doi.org/10.21105/joss.00205]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Understanding user interaction patterns during technology-triggered public discourse provides critical insights into how emerging technologies gain social meaning. This study develops an intelligent digital twin framework for modeling discourse dynamics around DeepSeek, an indigenous large language model that generated approximately 250,000 social media interactions during a 13-day period. By integrating LLM-enhanced semantic analysis with agent-based modeling, we create a comprehensive virtual representation that captures both content characteristics and behavioral dynamics. Our analysis identifies six distinct thematic domains that structure public engagement: Technological Competition, Technological Breakthrough, User Feedback, Financial Market, Social Influence, and Information Security. The agent-based simulation reveals distinctive participation and sentiment patterns across different user segments, with general users expressing stronger positive sentiments than domain experts and institutional accounts. Network analysis demonstrates the evolution from random-like initial connection patterns to scale-free structures with pronounced influence hubs. The simulation results illuminate how individual behaviors aggregate to produce complex discourse patterns, offering insights into the micro-mechanisms underlying technology reception. This research advances digital twin methodologies beyond physical systems into social phenomena, providing a framework for anticipating public responses to technological innovations and informing more effective communication strategies.

Details

Title

Intelligent Digital Twin for Predicting Technology Discourse Patterns: Agent-Based Modeling of User Interactions and Sentiment Dynamics in DeepSeek Discourse Case

Author

Zhang Kaihang¹; Dong Changqi¹; Guo Yifeng²; Yu Guang³; Mi Jianing⁴

¹ School of Management, Harbin Institute of Technology, Harbin 150001, China; Harbin Institute of Technology-China Mobile Limited 5G Application Innovation Joint Research Institute, Harbin 150006, China
² Harbin Institute of Technology-China Mobile Limited 5G Application Innovation Joint Research Institute, Harbin 150006, China
³ School of Management, Harbin Institute of Technology, Harbin 150001, China
⁴ Harbin Institute of Technology-China Mobile Limited 5G Application Innovation Joint Research Institute, Harbin 150006, China; School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China

First page

451

Publication year

2025

Publication date

2025

Publisher

MDPI AG

e-ISSN

2079-8954

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/systems13060451

ProQuest document ID

3223942078
