Data asset valuation model based on generative


1. Introduction

In the digital economy era, data has emerged as a crucial competitive advantage for businesses [1–3], and its asset value has become increasingly prominent [4–7]. With the rapid advancement of Generative Artificial Intelligence (Generative AI), the ways in which data is acquired, processed, and utilized are undergoing fundamental transformations [8]. Generative AI produces new samples based on existing data and redefines both the intrinsic value of data and its application scenarios [9,10]. For instance, it can enhance the effectiveness of model training through synthetic data or utilize generated content to meet personalized user needs [8,11,12], thereby creating significant commercial value for enterprises [13,14].

The “Database Development Research Report (2024)” released by the China Communications Standards Association indicates that the global database market surpassed $100 billion for the first time in 2023, reaching approximately $101 billion. China’s database market reached $7.41 billion, accounting for 7.34% of the global market. By 2028, the Chinese database market is expected to reach 93.029 billion RMB, with a compound annual growth rate (CAGR) of 12.23%. However, despite the explosive growth in data volume, the valuation of data assets has lagged behind [15,16]. Traditional data assessment methods have often emphasized static characteristics of data [17], such as quantity, quality, and historical value, and fail to account for the dynamic changes and innovative potential brought about by Generative AI [8,14,18]. The value of data assets relies not only on their intrinsic quality and relevance but also on their application scenarios, conversion capabilities, and market demand within the framework of Generative AI [10,15,19–21]. The application value of a particular dataset may be extremely high in one scenario, yet close to zero in another. Furthermore, the widespread application of Generative AI has significantly increased data usage frequency and scenario diversity, presenting new challenges for the valuation of data assets [9,12,22,23].

In this context, it becomes especially important to establish a model for data asset valuation based on Generative AI [24,25]. This model should comprehensively consider data generation characteristics, market dynamics, and the strategic objectives of enterprises, providing a scientific pricing basis for data assets through a systematic evaluation framework [20,26]. Effective data asset valuation not only helps businesses manage and utilize their data assets but also makes decision-making more rigorous and forward-looking, thereby securing a relative advantage in intense market competition [27–29].

The primary contribution of this paper lies in bridging the theoretical gap between Generative AI and data asset valuation [30] and deepening the understanding of the relationship between them. Specifically, the paper examines how Generative AI reshapes the value attributes of data and, in turn, influences the valuation standards for data assets within enterprises [24,25]. By constructing a valuation model that jointly considers the generative characteristics of data, market dynamics, and corporate strategic goals [22,31], it aims to provide new perspectives and methodologies for research and practice. It also examines how Generative AI alters assessment metrics and methods for data assets across different application scenarios [8,18,32]. Through empirical analysis, the efficacy and applicability of the proposed model are validated, offering theoretical support and practical guidance for data asset management in the digital transformation of businesses [20,33]. Additionally, this research provides an empirical foundation for policymakers seeking to promote the sustainable development of the data economy [34–36].

2. Model introduction

This paper proposes a valuation model based on generative AI to assess the value of data assets effectively. The model consists of three main components: feature extraction, value generation algorithms, and market adaptability assessment. It aims to provide accurate and dynamic evaluation results, thereby filling gaps in the existing literature and enhancing the scientific rigor and foresight of data-driven decision-making in enterprises.

2.1 Data feature extraction

Data feature extraction was the preliminary step of the model, primarily tasked with ensuring that the input data were of high quality and had been validated. This process included the following key steps:

Data Quality Assessment: To ensure the integrity, accuracy, and consistency of the data, three key indicators were established:

Completeness:

Completeness = N_valid / N_total    (1)

where N_valid denotes the number of valid data records, and N_total denotes the total number of records.

Accuracy:

Accuracy = N_correct / N_total    (2)

where N_correct represents the number of correct data records.

Consistency: This was confirmed by comparing the same data across multiple data sources.
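
As a minimal sketch of the three indicators, the Python snippet below computes completeness as N_valid / N_total and accuracy as N_correct / N_total, and measures consistency as the share of records on which two sources agree. The accuracy denominator, the agreement measure, the validity rule (a non-missing value), the correctness rule (a non-negative value), and the toy records are all illustrative assumptions rather than details taken from the study.

```python
def completeness(n_valid: int, n_total: int) -> float:
    """Share of records that are valid: N_valid / N_total."""
    return n_valid / n_total if n_total else 0.0


def accuracy(n_correct: int, n_total: int) -> float:
    """Share of records that are correct: N_correct / N_total (assumed denominator)."""
    return n_correct / n_total if n_total else 0.0


def consistency(source_a: dict, source_b: dict) -> float:
    """Share of record IDs present in both sources whose values agree (assumed measure)."""
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return 0.0
    return sum(source_a[k] == source_b[k] for k in shared) / len(shared)


# Hypothetical records: record ID -> reported revenue (None marks a missing value).
records = {1: 120.0, 2: None, 3: 95.5, 4: -1.0}
second_source = {1: 120.0, 3: 95.0, 4: -1.0}

n_total = len(records)
# Assumed validity rule: the value is present.
n_valid = sum(v is not None for v in records.values())
# Assumed correctness rule: the value is present and non-negative.
n_correct = sum(v is not None and v >= 0 for v in records.values())

print(completeness(n_valid, n_total))       # 0.75
print(accuracy(n_correct, n_total))         # 0.5
print(consistency(records, second_source))  # 2 of the 3 shared IDs agree
```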

Type Classification: Data was categorized according to its characteristics, including:

Structured Data: Data that can be represented in tabular form, such as in SQL databases.

Semi-Structured Data: Data in formats like JSON or XML.

Unstructured Data: Data such as text, images, etc.

Feature Engineering: In this stage, Generative Adversarial Networks (GANs) were employed to generate new samples, thereby enhancing the diversity and representativeness of the dataset. The generative model was represented as:

(3)

The discriminative model was represented as:

(4)

Through adversarial training, the discriminator was trained to maximize, and the generator to minimize, the following objective function:

(5)
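
To make the adversarial objective concrete, the sketch below evaluates the standard GAN value function, V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], on a handful of discriminator outputs. It assumes the study follows this standard formulation; the function name and the probability values are hypothetical.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated samples well ...
confident = gan_value(d_real=[0.9, 0.8, 0.95], d_fake=[0.1, 0.2, 0.05])
# ... versus one that the generator has fooled completely (D outputs 0.5 everywhere).
fooled = gan_value(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(confident > fooled)  # True: the discriminator pushes V up, the generator pushes it down
print(fooled)              # equals -2 * log(2), the value at the theoretical equilibrium
```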

2.2 Value generation algorithm

The value generation algorithm constituted the core of the model, aiming to establish a mapping between data features and market value. The structure could be formalized as follows:

Input Layer:

(6)

Hidden Layers: A multi-layer neural network was employed, setting the weight matrix W_i for each layer and utilizing an activation function:

(7)

where f represents the activation function, commonly ReLU or Sigmoid.

Output Layer: The predicted value V could be expressed as:

(8)

where L represents the total number of hidden layers, and H^(L) denotes the activation output of the last hidden layer.
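
The layer structure described above can be sketched as a plain forward pass: each hidden layer applies a weight matrix and an activation function, and the output layer maps the last hidden activation to a scalar value V. The bias vectors, the linear output layer, and all numeric weights below are assumptions made for illustration; the text only names the weight matrices W_i and the activation f.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in vector]


def affine(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_value(x, hidden_layers, output_layer, activation=relu):
    """Forward pass: repeated f(W_i h + b_i), then a linear read-out to a scalar."""
    h = x
    for weights, bias in hidden_layers:
        h = activation(affine(weights, bias, h))
    out_weights, out_bias = output_layer
    return affine(out_weights, out_bias, h)[0]


# Hypothetical two-feature input, one hidden layer with two units, scalar output.
x = [1.0, 2.0]
hidden_layers = [([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5])]
output_layer = ([[2.0, 1.0]], [0.25])

# Hidden pre-activation is [-1.5, 3.5]; ReLU gives [0.0, 3.5]; output is 3.5 + 0.25.
print(predict_value(x, hidden_layers, output_layer))  # 3.75
```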

2.3 Market adaptability assessment

To ensure that the model’s evaluation results aligned with market dynamics, the following strategies were employed:

Market Feedback Mechanism:

(9)

where V_pred and V_actual represent the model’s predicted and actual market values, respectively. This comparison was used to optimize the model parameters.

Scenario Analysis: For different market scenarios S_k, the simulated analysis was defined as:

V(S_k) = V_base + ΔV(S_k)    (10)

where V_base represents the baseline prediction, and ΔV(S_k) indicates the adjustment in a specific scenario.

Sustainability Assessment: Monitoring the evolution trend of data asset value:

V_{t+1} = V_t + ΔV_t    (11)

where V_t represents the asset value at the current moment, and ΔV_t reflects value adjustments due to external environmental changes.
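
The three strategies can be illustrated with a short sketch. The feedback measure is assumed here to be the mean squared gap between V_pred and V_actual, while the scenario and sustainability steps are taken to be additive, as the symbol definitions suggest (a baseline or current value plus an adjustment). Function names, scenario labels, and numbers are hypothetical.

```python
def feedback_error(v_pred, v_actual):
    """Mean squared gap between predicted and actual market values (assumed form)."""
    return sum((p - a) ** 2 for p, a in zip(v_pred, v_actual)) / len(v_pred)


def scenario_value(v_base, delta_v):
    """Scenario-adjusted value: baseline prediction plus the scenario adjustment."""
    return v_base + delta_v


def value_path(v_start, adjustments):
    """Track asset value over time by adding each period's adjustment in turn."""
    path = [v_start]
    for delta in adjustments:
        path.append(path[-1] + delta)
    return path


# Hypothetical values in arbitrary currency units.
print(feedback_error([100.0, 110.0], [90.0, 115.0]))  # (100 + 25) / 2 = 62.5

scenario_adjustments = {"downturn": -12.0, "expansion": 8.0}
by_scenario = {name: scenario_value(100.0, dv) for name, dv in scenario_adjustments.items()}
print(by_scenario)  # {'downturn': 88.0, 'expansion': 108.0}

print(value_path(100.0, [5.0, -3.0, 2.0]))  # [100.0, 105.0, 102.0, 104.0]
```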

2.4 Formal description of the model

Integrating the aforementioned components, the model could be formally described by the following comprehensive expression:

(12)

where V denotes the predicted value of the data asset, encompassing the generative model G, the discriminative model D, the neural network output H^(L), the market feedback function F_market, and the actual market value V_actual.

3. Research design

This section presents the processes and notations used in the article, such as sample selection, dependent and predictive variables, model evaluation, and hyperparameter ranges, thereby ensuring the model’s efficacy and scientific rigor.

3.1 Sample selection

The sample selection focused on representative industries and enterprises, encompassing a variety of data asset conditions. The sample was primarily collected from Chinese A-share listed companies and covered 2015 to 2023. This timeframe was selected because enterprises experienced significant changes in the utilization and evaluation of data assets during these years, particularly as the ongoing development of Generative Artificial Intelligence technologies led to an increased emphasis on data assets.

To ensure the representativeness and comparability of the sample, this study first extracted a preliminary sample of 6,250 observations from Chinese A-share listed companies. Companies with anomalous financial data or incomplete information were then excluded, leaving a final sample of 5,720 observations. Because strict rules were applied to eliminate problematic data (observations with missing values and outliers were removed entirely), the final dataset could be used directly in the modeling process. Table 1 reports the annual distribution of the sample and provides a foundation for subsequent analyses.

[Figure omitted. See PDF.]

3.2 Dependent variable

The dependent variable was the value of a company’s data assets. Specifically, a quantitative evaluation was conducted using data asset-related items reported by the enterprises, such as user data, market data, and operational data. The calculation of data asset value was based on corporate financial statements and industry analysis reports, combined with the general perception of data asset value in the market. The evaluation model produced a specific numerical value, which served as the core variable to be predicted.

3.3 Selection of predictive variables

A systematic variable framework was constructed based on the existing literature and data availability to ensure that the selected predictive variables were comprehensive and accurate. This study selected a total of 50 predictive variables, grouped into the following five categories:

1) Performance indicators.

The performance of a company directly reflected its effective utilization of data assets. The existing literature suggested that financial metrics such as operating revenue and net profit are key indicators of corporate performance [37–39]. This study selected core performance indicators, including Return on Assets (ROA), Return on Equity (ROE), and Operating Cash Flow (OCF). Additionally, the revenue growth rate and profit growth rate were included to reflect the effectiveness of data asset management [40–42].

2) Company characteristics.

Company characteristics significantly influenced data asset valuation. Related research indicated that factors such as company size, industry attributes, and management background could affect the efficiency of data asset utilization [43–45]. This study considered variables such as total assets, industry classification, and employee count, while also paying attention to the company’s capital structure (e.g., debt ratio) [46,47], as this could influence the firm’s willingness to invest in data assets [48–50].

3) Management motivation.

The motivations and decision-making behaviors of management had a direct impact on the management and valuation of data assets. Studies indicated that strategic decisions made by corporate management often reflected their understanding of the value of data assets [51,52]. This study selected variables such as management ownership percentage, industry experience, and turnover frequency of management to analyze the potential impact of these factors on data asset valuation [53–56].

4) Corporate governance.

A sound corporate governance structure could effectively constrain managerial behavior and promote transparency and efficiency regarding data assets [57,58]. This study examined variables such as the structure of the board of directors, the proportion of independent directors, and corporate governance ratings to assess the impact of governance on data asset management [59,60]. Furthermore, the presence or absence of an audit committee was also regarded as an important consideration due to its key role in overseeing managerial decisions [61–63].

5) External environmental factors.

External environmental factors, such as industry growth rates and market competition levels, were also critical to evaluating the value of corporate data assets [64,65]. The literature indicated that rapid industry development might foster more efficient utilization of data assets by enterprises [66,67]. This study selected variables including industry growth rate, changes in market share, and competitor performance within the industry to capture the impact of the external environment on data asset value.

3.4 Model evaluation

To comprehensively assess the effectiveness of the data asset valuation model based on Generative Artificial Intelligence, this study employed various evaluation metrics. These metrics not only reflected the accuracy of the model but also considered economic practicality and model stability.

1) Selection of evaluation metrics.

Given the sample characteristics and data structure of this study, the following primary evaluation metrics were utilized:

Accuracy: This measured the proportion of correctly classified samples out of the total samples. Due to potential imbalances in the dataset, relying solely on accuracy could lead to biased model evaluations.

Precision and Recall: These two metrics provided insights into the model’s performance across different categories, which was particularly crucial when identifying economically significant samples.

F1-score: This metric combines precision and recall, making it suitable for imbalanced datasets and seeking a balance between accuracy and completeness.

2) Addressing sample imbalance issues.

Because the samples used for the valuation of data assets were potentially imbalanced, particularly given the scarcity of samples in certain value ranges, it was essential to select evaluation metrics carefully [68,69]. The study incorporated the Receiver Operating Characteristic (ROC) curve and the Area Under Curve (AUC) as supplementary evaluation standards. AUC summarized the model’s performance across all classification thresholds; higher values indicated a stronger ability to distinguish positive from negative samples.
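
The metrics named above can be computed directly from labels and scores, as in the following sketch. It assumes a binary labelling of samples (the text does not state how continuous asset values were mapped to classes), and it computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The labels and scores are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 2 positives among 10 observations.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
always_negative = [0] * 10
model_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
model_scores = [0.9, 0.1, 0.2, 0.6, 0.3, 0.1, 0.2, 0.4, 0.5, 0.3]

naive = classification_metrics(y_true, always_negative)
model = classification_metrics(y_true, model_pred)
print(naive["accuracy"], naive["f1"])  # 0.8 0.0
print(model["accuracy"], model["f1"])  # 0.8 0.5
print(auc(y_true, model_scores))       # 0.9375
```

Both classifiers reach the same accuracy here, yet only F1 and AUC reveal that the first one never identifies a positive sample, which is the bias the text warns about.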

3) Cross-validation and model stability.

This study implemented k-fold cross-validation to ensure the model’s robustness across different dataset partitions. By conducting multiple training and validation runs, the model’s consistency and stability in practical applications were effectively assessed. This method not only reduced the model’s dependency on a specific training set but also enhanced its generalization capability [70,71].
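
A minimal sketch of the k-fold partitioning follows. The value k = 5, the random shuffling, and the function name are assumptions; the text does not report the number of folds or whether the folds were drawn at random.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=10, k=5))
for train, validation in splits:
    print(len(train), len(validation))  # 8 2 on every fold
```

Each observation appears in exactly one validation fold, so every sample contributes to the evaluation once and to training k − 1 times.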

4) Feature importance analysis.

During the model evaluation process, attention was given to the impact of each predictive variable on the model’s predictive capability. Feature selection techniques, such as Least Absolute Shrinkage and Selection Operator (LASSO) regression, were employed to analyze the contribution of the various features, thereby clarifying which variables played a crucial role in the valuation of data assets.
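
To show how LASSO performs feature selection, the sketch below fits a LASSO regression by cyclic coordinate descent with soft-thresholding. It assumes the common objective (1 / 2n)·‖y − Xw‖² + α‖w‖₁, centered data with no intercept, and a hypothetical two-feature dataset in which only the first feature drives the target; it is not the study’s implementation.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction using every feature except j.
                partial = sum(w[k] * X[i][k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return w


# Hypothetical centered data: the target depends on feature 0 only (y = 2 * x0).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
print(weights)  # feature 0 is kept (shrunk from 2.0 to about 1.9); feature 1 is set to 0.0
```

The L1 penalty sets the coefficient of the irrelevant feature to exactly zero, which is why LASSO can be read as a variable selection method.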

3.5 Hyperparameter range

In this study, the selection of hyperparameters significantly influenced the performance of the model. Therefore, a systematic approach to hyperparameter optimization was adopted to ensure the optimal performance of the Generative Artificial Intelligence model in data asset valuation.

1) Hyperparameter selection methods.

This study utilized methods such as Grid Search and Random Search to explore various combinations of hyperparameters. Each of these methods had its advantages and disadvantages. Grid Search systematically covered all possible combinations within a specified range but incurred higher computational costs; on the other hand, Random Search reduced computational complexity to some extent but risked missing some potentially optimal combinations.

2) Setting hyperparameter ranges.

Based on the characteristics of the Generative Artificial Intelligence model, focus was placed on the following hyperparameters:

Learning Rate: This controlled the step size for updating model weights, influencing convergence speed and final results.

Batch Size: This affected both the stability of model training and the convergence speed, typically chosen within the range of 32 to 256.

Number of Layers and Number of Nodes: These two parameters determined the complexity and capacity of the model, requiring reasonable settings based on actual model needs.
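
The two search strategies can be contrasted with a small sketch. Only the batch size range of 32 to 256 comes from the text; the other candidate values, the number of random trials, and the scoring function (a stand-in for validation performance) are hypothetical.

```python
import itertools
import math
import random

# Candidate values; only the batch size range (32 to 256) is taken from the text.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128, 256],
    "n_layers": [2, 3, 4],
    "n_nodes": [64, 128],
}


def grid_candidates(space):
    """Every combination of the listed values (exhaustive, but costly)."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]


def random_candidates(space, n_trials, seed=0):
    """A fixed budget of randomly drawn combinations (cheaper, may miss the optimum)."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]


def toy_score(cfg):
    """Hypothetical stand-in for validation performance; higher is better."""
    return -(abs(math.log10(cfg["learning_rate"]) + 3)
             + abs(cfg["batch_size"] - 64) / 64
             + abs(cfg["n_layers"] - 3)
             + abs(cfg["n_nodes"] - 128) / 128)


grid = grid_candidates(search_space)
trials = random_candidates(search_space, n_trials=10)
best_grid = max(grid, key=toy_score)
best_random = max(trials, key=toy_score)

print(len(grid), len(trials))  # 72 10
print(best_grid)
print(best_random)
```

In practice `toy_score` would be replaced by training the model with each configuration and measuring its validation performance, which is what makes the 72-point grid so much more expensive than 10 random trials.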

3) Balancing computational cost and accuracy.

During hyperparameter optimization, a trade-off between model accuracy and computational cost was necessary. For instance, increasing the number of layers and nodes might enhance performance but would also significantly raise computational demands. Hence, reasonable hyperparameter ranges were set to ensure that the model achieved high accuracy without wasting computational resources.

4) Consideration of imbalanced samples.

In addressing potential issues with imbalanced samples, adjustments to hyperparameters were considered to incorporate weighting mechanisms, thereby increasing attention to minority class samples. By establishing class weights, the model could better learn the important features that appeared less frequently in the training set.
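
One common way to set such class weights is inverse-frequency weighting, sketched below. The formula n_samples / (n_classes × class_count) is an assumption about how the weights could be chosen (it is the heuristic scikit-learn applies when class_weight="balanced"); the text does not specify the weighting scheme, and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 8 majority-class and 2 minority-class observations.
labels = [0] * 8 + [1] * 2
class_weights = balanced_class_weights(labels)
print(class_weights)  # {0: 0.625, 1: 2.5}
```

With these weights each class contributes the same total weight to the loss (8 × 0.625 = 2 × 2.5 = 5), so errors on the minority class are no longer drowned out by the majority class.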

Through these methods, this study aimed to achieve scientific optimization of the hyperparameters of the Generative Artificial Intelligence model to attain optimal results in data asset valuation. This systematic approach to hyperparameter adjustment would provide a solid foundation for the model’s accuracy and generalization capability, further advancing related research.

3.6 Sample partitioning and model training

In the research on the data asset valuation model, sample partitioning and model training were key steps in ensuring model effectiveness and reliability. This section describes the following aspects.

1) Sample partitioning strategy.

The dataset was divided into training, validation, and test sets to evaluate the model’s generalizability and stability. Specifically, the entire dataset was partitioned in a 70:15:15 ratio:

Training Set (70%): Used for model training and parameter tuning, ensuring that the model learns the underlying patterns in the data.

Validation Set (15%): Used for adjusting and selecting hyperparameters, and evaluating the effect of different hyperparameter combinations on model performance.

Test Set (15%): Utilized for the final evaluation of the model, testing its performance on unseen data.
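
Applied to the final sample of 5,720 observations, the 70:15:15 ratio can be sketched as follows. The random shuffling and the seed are assumptions; the text does not state whether the split was random or chronological.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices and cut them into training, validation and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains, here the last 15%
    return train, validation, test


train_idx, val_idx, test_idx = split_indices(5720)
print(len(train_idx), len(val_idx), len(test_idx))  # 4004 858 858
```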

2) Training process.

During the model training stage, the following steps were undertaken:

Data Preprocessing: This involved data cleaning, normalization, and feature selection to enhance the training effectiveness of the model.

Model Construction: Based on the results of hyperparameter optimization, a Generative Artificial Intelligence model was built, ensuring that its architecture aligned with the characteristics of the data.

Training Algorithm: Suitable optimization algorithms (e.g., Adam or SGD) were used for multiple iterations of training, dynamically adjusting model parameters to minimize the value of the loss function.

3) Cross-validation.

Throughout the training process, k-fold cross-validation was employed to further verify the model’s stability and generalization capability. This method partitions the training set into k subsets and uses each subset in turn as the validation set while the remaining k − 1 subsets serve as the training set, thus producing more reliable model evaluation results.

4) Model evaluation metrics.

Alongside the aforementioned model evaluation methods, this study employed Mean Squared Error (MSE), Mean Absolute Error (MAE), R², and the cross-validation results to assess model performance comprehensively. Additionally, special attention was given to evaluation metrics suited to imbalanced samples, such as the AUC (Area Under Curve) value, to ensure the model’s recognition capabilities across different sample categories.
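
For reference, the three regression metrics can be computed as in the sketch below, using their standard definitions. The actual and predicted values are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error and coefficient of determination."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Hypothetical actual and predicted data asset values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

metrics = regression_metrics(actual, predicted)
print(metrics["mse"], metrics["mae"])  # 4.5 2.0
print(round(metrics["r2"], 3))         # 0.964
```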

5) Training efficiency and resource management.

During the training process, the batch size and learning rate were set to control the use of computational resources and to avoid overfitting caused by excessive model complexity. Moreover, the model state was periodically saved during training to facilitate subsequent tuning and evaluation.

4. Empirical results

4.1 Model prediction performance

In this study, the prediction performance of the data asset valuation model based on Generative Artificial Intelligence was comprehensively evaluated using multiple metrics. We conducted detailed tests on the trained model, analyzing its performance across different years and datasets to ensure its efficacy and accuracy. Table 2 shows that the prediction performance of the models improved significantly over time.

[Figure omitted. See PDF.]

Particularly between 2017 and 2023, the predictive accuracy and other evaluation metrics of the Generative Artificial Intelligence models showed a steady upward trend, which suggests that as Generative Artificial Intelligence technologies advanced, the effectiveness and reliability of the models in valuing data assets were considerably enhanced.

Comparative Analysis of Model Types: The performance of the baseline model was relatively low, while the predictive performance of both the generative and optimized models significantly increased, showcasing the advantages of Generative Artificial Intelligence in data asset valuation. In particular, the optimized model achieved the highest levels of precision and AUC value in 2022 and 2023, demonstrating its enhanced capability to capture market dynamics and data characteristics.

Relationship Between Accuracy and Market Demand: The improvement in accuracy was closely related to the increasing emphasis on data assets in the market. As companies increased their investment in data assets, Generative Artificial Intelligence was able to more effectively reflect the market value of data.

Model Practicality: Through comparisons across different years, we observed that the model continuously adapted to market changes, providing more precise evaluation results for enterprises. This offers significant guidance for companies in reasonably utilizing data assets during their digital transformation processes.

4.2 Variable importance

In the data asset valuation model, various predictive variables significantly influenced the model’s predictive capability. By performing feature importance analysis, we identified the most impactful variables in data asset valuation. This analysis not only serves as a basis for subsequent research but also guides enterprises in strategizing their data asset management.

Feature Selection Methods: This study employed LASSO regression and Random Forest methods for feature selection, assessing the contribution of each variable to the model’s predictive performance. LASSO regression effectively handled high-dimensional data, automatically selecting the most meaningful variables for prediction by introducing an L1 penalty term. The Random Forest method, in turn, provided variable importance rankings by calculating the frequency of feature usage in decision trees.

Variable Importance Analysis Results: Table 3 illustrates the variable importance scores calculated through LASSO regression and Random Forest:

[Figure omitted. See PDF.]

Comparative Results Analysis: By comparing the results of LASSO regression and Random Forest, the following observations were made:

Consistency in Variable Importance: Both methods identified “Return on Assets (ROA)” and “Industry Growth Rate” as the most important variables, indicating that profitability and market environment play crucial roles in data asset valuation.

Score Differences: While the scores differed between the two methods, the rankings were generally consistent. For instance, LASSO regression assigned higher scores to “Return on Equity (ROE)” and “Management Ownership Percentage,” whereas Random Forest assigned a slightly higher score to “Operating Cash Flow (OCF).” This suggests that the two methods are sensitive to different aspects of variable importance.

Sparsity and Complexity: LASSO regression tended to produce sparse models when selecting features, potentially eliminating some unimportant variables while emphasizing key variables. In contrast, Random Forest, due to its ensemble learning characteristics, retained more variable information, aiding in revealing potential interaction effects.

To summarize the comparative analysis of LASSO regression and Random Forest, the following conclusions were drawn:

Methodological Complementarity: The two methods provided different perspectives in assessing variable importance. LASSO regression is suitable for variable selection and model simplification, whereas Random Forest is more effective in capturing complex relationships among variables.

Practical Implications: Enterprises managing data assets should consider the importance of different variables in formulating more precise strategies. The combined results of these two methods can help businesses identify critical drivers and optimize the utilization of data assets.

4.3 Model fusion

In data asset valuation, the predictive capability of a single model can be limited by various factors; therefore, adopting a model fusion strategy can effectively enhance prediction accuracy and robustness. Model fusion, by combining the strengths of different models, can more comprehensively capture complex patterns in the data, thus improving overall predictive performance.

Fusion Methods: This study employed two fusion methods: weighted averaging and stacking, to achieve effective integration of different models.

Weighted Averaging: This method weighted the predictions of LASSO regression, Random Forest, and other auxiliary models based on their performance on the validation set. The weights were dynamically adjusted according to each model’s performance. This simple method balanced the influence of each model.

Stacking: In the stacking approach, multiple base models such as LASSO regression, Random Forest, and Support Vector Machine were trained, and the predictions from these base models were used as new features in a meta-model, such as Linear Regression or XGBoost, for final predictions. This approach captured interactions among the base models’ predictions and thereby improved prediction accuracy.
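
The weighted-averaging step can be sketched as follows, assuming that each model’s weight is proportional to its validation score; the text only says that the weights were based on validation performance. The model names match those listed above, while the scores and predictions are hypothetical.

```python
def performance_weights(validation_scores):
    """Normalize validation scores so that the weights sum to one (assumed scheme)."""
    total = sum(validation_scores.values())
    return {name: score / total for name, score in validation_scores.items()}


def weighted_average(predictions, weights):
    """Combine per-model predictions sample by sample using the given weights."""
    n_samples = len(next(iter(predictions.values())))
    return [sum(weights[name] * preds[i] for name, preds in predictions.items())
            for i in range(n_samples)]


# Hypothetical validation scores and predictions for two data assets.
validation_scores = {"lasso": 0.6, "random_forest": 0.8, "svm": 0.6}
predictions = {
    "lasso": [100.0, 200.0],
    "random_forest": [110.0, 190.0],
    "svm": [90.0, 210.0],
}

weights = performance_weights(validation_scores)
fused = weighted_average(predictions, weights)
print(weights)  # approximately {'lasso': 0.3, 'random_forest': 0.4, 'svm': 0.3}
print(fused)    # approximately [101.0, 199.0]
```

Stacking differs in that the combination rule is itself learned: the base-model predictions would become the input features of a meta-model fitted on held-out data, rather than being averaged with fixed weights.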

Model Fusion Effectiveness: Table 4 compares the performance of single models and fused models in predicting data asset values across different periods.

[Figure omitted. See PDF.]

The following conclusions are drawn:

Single Model Performance: Among single models, XGBoost and Random Forest exhibited relatively high predictive performances, with average accuracies of 69.2% and 68.2%, respectively. In contrast, the Logit model demonstrated a considerably lower average performance at 55.3%.

Enhanced Performance of Fusion Models: Fusion models that combined several strong predictors, for example, Random Forest and XGBoost, performed on par with or better than the strongest single models. The average accuracies of Random Forest + XGBoost and Random Forest + XGBoost + Logit reached 68.8% and 69.4%, respectively. The three-model fusion thus exceeded the best single model (XGBoost, 69.2%), while the two-model fusion fell between Random Forest (68.2%) and XGBoost (69.2%).

Yearly Improvement Trend: From 2015 to 2023, the predictive performance of different models generally displayed an upward trend, indicating that with data accumulation and model optimization, the accuracy of predictions in the long term improved.

Optimal Model: In this study, the fusion model consisting of Random Forest + XGBoost + Logit performed the best, with an average accuracy of 69.4%, suggesting the potential of model fusion in the valuation of data assets.

The model fusion strategy demonstrated significant advantages in predicting data asset values, especially as combinations of multiple models were better able to comprehensively capture data features and market trends, thus providing more precise support for enterprises in data-driven decision-making.

4.4 Market testing

To validate the applicability and reliability of the proposed fusion model in a real market, this study conducted a market test using actual market data. The market test aimed to evaluate the correlation between the model’s predicted results and actual market data to determine whether the model could effectively support enterprises in the valuation of their data assets. We selected data from listed companies in various industries within the A-share market and compared the predicted value of the model with actual market performance.

Testing Methods: This study employed two methods to assess the market performance of the model: correlation analysis and bias analysis.

Correlation Analysis: This involved calculating the correlation coefficient between the model’s predicted values and the actual market values to gauge the consistency between the two.

Bias Analysis: This analyzed the average deviation between the model’s predicted values and actual market values to understand the extent of the model’s bias.
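
Both checks can be expressed in a few lines, as sketched below. The Pearson coefficient is assumed as the correlation measure, and the average deviation is assumed to be the mean absolute percentage deviation, which matches the percentage figures reported later in this section; the values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_abs_pct_deviation(predicted, actual):
    """Average of |predicted - actual| / |actual|, expressed in percent."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical market values for three firms.
actual_values = [100.0, 200.0, 400.0]
predicted_values = [110.0, 190.0, 380.0]

r = pearson_r(predicted_values, actual_values)
deviation = mean_abs_pct_deviation(predicted_values, actual_values)
print(round(r, 3))          # close to 1: predictions move together with actual values
print(round(deviation, 2))  # 6.67: deviations of 10%, 5% and 5% average to 6.67%
```

The two measures are complementary: a model can be highly correlated with market values while still being systematically off in level, which only the deviation measure would reveal.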

Testing Results: Table 5 displays the correlation coefficients and average deviations between the model’s predicted market values and actual market values from 2015 to 2023, grouped by industry. The data covered various industries to ensure the comprehensiveness and representativeness of the tests.

[Figure omitted. See PDF.]

Several conclusions can be drawn.

High Correlation: The correlation coefficients in most industries ranged from 0.76 to 0.85, with the financial services and information technology sectors exhibiting particularly high correlations of 0.85 and 0.82, respectively. This indicates a strong consistency between the model’s predicted results and actual market values, accurately reflecting market trends.

Bias Analysis: The average deviation ranged between 5.8% and 7.5%, with the financial services industry’s average deviation being the lowest at 5.8%, while the consumer goods industry showed a slightly higher deviation of 7.5%. Overall, the relatively low bias levels indicate that the model’s predictions were within a reasonable range of the actual market data.

Industry Differences: There were notable differences in predictive performance among different industries. For instance, the financial services and information technology sectors performed particularly well, likely due to their greater emphasis on data assets and more developed data management practices. In contrast, the consumer goods and telecommunications industries exhibited slightly larger predictive deviations, potentially influenced by market volatility and data uncertainties.

The results of the market test suggested that the constructed fusion model demonstrated strong applicability and reliability in actual market environments, particularly in data-intensive industries such as information technology and financial services. The bias analysis also indicated that the model’s predictions were closely aligned with real market data, with bias levels remaining within acceptable limits. This finding suggests that the model could provide effective reference support for enterprises in assessing the value of their data assets, enabling more precise decision-making in the market.

4.5 Training set size

In constructing the data asset valuation model, the size of the training set significantly influenced the model’s predictive performance. Different training set sizes directly impacted the model’s capability to learn data features and its generalization capabilities. Therefore, it was necessary to analyze the effect of training set size on model performance to identify the optimal training set size, providing a reference for model training.

Experimental Design: This study controlled the size of the training set to test the model’s performance across different training set sizes. Specifically, we used 25%, 50%, 75%, and 100% of the dataset for model training and evaluated how different training set sizes affected predictive accuracy. The evaluation metrics included prediction accuracy and F1-score to comprehensively assess the model’s performance.
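The design described above can be sketched as follows. This is an illustrative sketch only: a toy nearest-centroid classifier and synthetic two-class data stand in for the study's models and sample, and all function names (`make_toy_data`, `fit_centroids`, `evaluate_training_fractions`) are hypothetical. The smaller training subsets are nested inside the larger ones, which is an assumption of this sketch and not a detail confirmed by the text.

```python
import random


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_toy_data(n, seed):
    """Synthetic two-feature, two-class data (a stand-in for real firm data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
        y.append(label)
    return X, y


def fit_centroids(X, y):
    """Toy stand-in model: one mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each row to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], row)) for row in X]


def evaluate_training_fractions(X_train, y_train, X_test, y_test,
                                fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Train on nested subsets of the training data; score on a fixed test set."""
    order = list(range(len(X_train)))
    random.Random(seed).shuffle(order)
    results = {}
    for frac in fractions:
        k = max(1, int(round(frac * len(order))))
        idx = order[:k]
        model = fit_centroids([X_train[i] for i in idx], [y_train[i] for i in idx])
        y_pred = predict(model, X_test)
        results[frac] = (accuracy(y_test, y_pred), f1_score(y_test, y_pred))
    return results


X_train, y_train = make_toy_data(400, seed=1)
X_test, y_test = make_toy_data(200, seed=2)
results = evaluate_training_fractions(X_train, y_train, X_test, y_test)
for frac, (acc, f1) in results.items():
    print(f"{int(frac * 100):3d}% of training data: accuracy={acc:.3f}, F1={f1:.3f}")
```

Keeping the test set fixed across all fractions means that differences in the scores reflect only the amount of training data, not changes in the evaluation sample.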

Experimental Results: Table 6 displays the predictive performance of various models at different training set sizes (measured in terms of accuracy and F1-score). The training set proportions were set at 25%, 50%, 75%, and 100%, and the performance of each model was averaged over multiple experiments to ensure the robustness of the results.

[Figure omitted. See PDF.]

Several patterns can be observed.

Relationship Between Training Set Size and Model Performance: As the size of the training set increased, all models demonstrated improved prediction accuracy and F1 scores. This indicates that a larger training set helps the model better learn the data features, thus enhancing its predictive performance.

Differences Between Single Models and Fusion Models: Single models such as Logit and Decision Tree exhibited relatively lower performance, while fusion models such as Random Forest + XGBoost and Random Forest + XGBoost + Logit performed better, particularly with larger training sets, where the improvement for fusion models was more pronounced.

Optimal Training Set Size: When using 100% of the training set, the fusion model Random Forest + XGBoost + Logit achieved the highest accuracy and F1-score of 77.0% and 0.76, respectively, indicating that training with the maximum training set size yielded the best model performance. This suggests that for the data asset valuation model, utilizing the entire dataset for training contributes significantly to achieving optimal results.

Marginal Effect: Although the model performance improved significantly with larger training sets, the rate of increase gradually diminished, suggesting that beyond a certain scale, further increases in the training data yield diminishing returns.
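The marginal effect noted above can be checked by comparing the gain in a metric between consecutive training set sizes. The sketch below is illustrative only. The intermediate accuracies are hypothetical placeholders, not the values from Table 6, and the helper names are assumptions; the gains are compared per step, which presupposes equally spaced training fractions.

```python
def marginal_gains(scores_by_fraction):
    """Score improvement between consecutive training-set fractions."""
    fractions = sorted(scores_by_fraction)
    return {
        (lo, hi): scores_by_fraction[hi] - scores_by_fraction[lo]
        for lo, hi in zip(fractions, fractions[1:])
    }


def has_diminishing_returns(scores_by_fraction):
    """True if each successive gain is no larger than the one before it."""
    gains = list(marginal_gains(scores_by_fraction).values())
    return all(later <= earlier for earlier, later in zip(gains, gains[1:]))


# Hypothetical accuracies for illustration; NOT the values reported in Table 6.
hypothetical_accuracy = {0.25: 0.70, 0.50: 0.74, 0.75: 0.76, 1.00: 0.77}

for (lo, hi), gain in marginal_gains(hypothetical_accuracy).items():
    print(f"{int(lo * 100)}% -> {int(hi * 100)}%: gain of {gain * 100:.1f} points")
print("diminishing returns:", has_diminishing_returns(hypothetical_accuracy))
```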

The size of the training set had a significant impact on the performance of the data asset valuation model, particularly for fusion models, where a larger training set notably enhanced predictive performance. This experiment indicates that training on the largest available training set (here, 100% of the data) improves the model’s accuracy and stability. In practical applications, enterprises should strive to use complete datasets for model training to obtain more accurate data asset valuation results.
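To make the notion of a fusion model such as Random Forest + XGBoost + Logit more concrete, the sketch below combines the predicted probabilities of several models by weighted averaging (often called soft voting). This is an assumption made for illustration: the fusion rule actually used in the study may differ, and the model names, probabilities, and weights shown here are hypothetical.

```python
def fuse_probabilities(model_probs, weights=None):
    """Weighted average of per-model positive-class probabilities."""
    names = list(model_probs)
    if not names:
        raise ValueError("at least one model is required")
    n = len(model_probs[names[0]])
    if any(len(model_probs[m]) != n for m in names):
        raise ValueError("all models must score the same number of samples")
    if weights is None:
        weights = {m: 1.0 for m in names}  # equal weights by default
    total = sum(weights[m] for m in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(weights[m] * model_probs[m][i] for m in names) / total
        for i in range(n)
    ]


def fused_labels(model_probs, weights=None, threshold=0.5):
    """Convert fused probabilities into 0/1 class labels."""
    return [1 if p >= threshold else 0
            for p in fuse_probabilities(model_probs, weights)]


# Hypothetical positive-class probabilities from three already-trained models.
probs = {
    "random_forest": [0.80, 0.40, 0.30],
    "xgboost": [0.70, 0.60, 0.20],
    "logit": [0.55, 0.45, 0.50],
}

print("equal weights:", fused_labels(probs))
print("xgboost weighted 3x:",
      fused_labels(probs, {"random_forest": 1.0, "xgboost": 3.0, "logit": 1.0}))
```

Giving a weaker model a smaller weight is one simple way to limit its influence on the fused prediction while still letting it contribute.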

5. Conclusion and implications

This study developed a comprehensive valuation model for data assets based on Generative Artificial Intelligence (Generative AI), employing a multi-layered framework that integrated data feature extraction, value generation algorithms, and market adaptability assessments to conduct dynamic and precise evaluations of data assets. Using sample data from non-financial listed companies in China’s A-share market from 2015 to 2023, various models, including LASSO regression, Random Forest, and XGBoost, along with model fusion techniques, were applied to systematically analyze the performance of Generative AI in data asset valuation. The experimental results indicated that the average predictive accuracy of the XGBoost and Random Forest models significantly surpassed that of the traditional Logit model. Furthermore, the fused models excelled across all evaluation metrics, particularly the Random Forest + XGBoost + Logit fusion model, which achieved a 77.0% accuracy rate in market testing, further validating its market applicability and robustness. Additionally, the analysis revealed that larger training set sizes contributed to improved model predictive performance, although the gains exhibited a diminishing marginal trend.

The main implications include the following points:

1) The need for dynamic updates in data asset valuation models

With the rapid advancement of Generative AI, traditional data asset valuation methods have become inadequate in addressing the dynamic changes and diversity of data. The valuation model constructed based on Generative AI provides new tools for the dynamic management and pricing of data assets. In particular, the superior performance of the fused models in capturing complex data relationships and market dynamics demonstrates the potential of Generative AI in data asset valuation. These findings offer evidence that companies can draw on when making data-driven strategic decisions, thereby enhancing the commercial value and applicability of data assets.

2) The profound impact of Generative AI on data asset management

The research illustrated that Generative AI exhibited unique advantages in data feature extraction and value generation, particularly in enhancing data diversity and expanding application scenarios. By utilizing Generative Adversarial Networks (GANs) to generate high-quality data samples, companies can enrich their datasets, bolstering the generalization capabilities of their models. Moreover, the introduction of a market feedback mechanism effectively enhanced the model’s adaptability to market dynamics. This indicates that Generative AI not only aids in the management of data assets but also helps companies maintain a competitive edge through more accurate valuation models, especially in fast-changing market environments.

3) Advantages and limitations of model fusion

Through multi-model fusion experiments, the study demonstrated that combining strong models such as XGBoost and Random Forest improved predictive performance. Integrating these strong models with a weaker model such as Logit yielded only modest additional gains. Fusion models outperformed single models in terms of accuracy and stability; however, they also incurred higher computational costs and demanded more data and computational resources, limiting their practical application to some extent. Therefore, companies should select fusion strategies that match their available resources and needs, balancing predictive accuracy against computational cost.

4) The validity of market testing and future applications

The research validated the applicability of the constructed valuation model in data-intensive sectors such as information technology and financial services through market testing, revealing a high correlation between the model’s predictions and actual market values. As the importance of data assets grows in corporate operations and the market, precise data asset valuation will enhance market awareness of these assets, thereby influencing corporate valuations and investment decisions. This finding offers new perspectives for market participants and policymakers, promoting a more accurate reflection of data asset values in the market, thus supporting the healthy development of the data economy.

5) Expandability and improvement directions for the data asset valuation model

The study discovered that as the training set size increased, the predictive performance of the model significantly improved. However, the benefits diminished once a certain training set size was reached. This phenomenon suggests that future research in data asset valuation could explore more efficient feature extraction and sample selection methods to reduce dependence on data and computational resources. Additionally, future studies may consider incorporating time-series data and multi-dimensional external market information to further enhance the model’s timeliness and adaptability, better addressing the ever-changing market environment.

In summary, the data asset valuation model constructed based on Generative AI technology offers enterprises and the market a systematic pricing tool and supports more rigorous, forward-looking data asset management. Generative AI not only expands the application scenarios for data assets but also provides robust support for a data-driven future. It is hoped that this research will provide useful insights for subsequent studies and offer both theoretical and practical support for enterprises’ digital transformation and data asset management.

References

1. Kubina M, Varmus M, Kubinova I. Use of Big Data for Competitive Advantage of Company. Procedia Economics and Finance. 2015;26:561–5.

2. Medeiros MMD, Maçada ACG, Freitas Junior JCDS. The effect of data strategy on competitive advantage. The Bottom Line. 2020;33(2):201–16.

3. Tsiu S, Ngobeni M, Mathabela L, Thango B. Applications and Competitive Advantages of Data Mining and Business Intelligence in SMEs Performance: A Systematic Review. Preprints. 2024. 2024090940. https://www.preprints.org/manuscript/202409.0940/v1

4. Campos J, Sharma P, Gabiria UG, Jantunen E, Baglee D. A Big Data Analytical Architecture for the Asset Management. Procedia CIRP. 2017;64:369–74.

5. Corrado C, Haskel J, Iommi M, Jona-Lasinio C. Measuring data as an asset: Framework, methods, and preliminary estimates. 2022. https://www.oecd.org/content/dam/oecd/en/publications/reports/2022/11/measuring-data-as-an-asset_cf2c5025/b840fb01-en.pdf

6. Stander JB. The modern asset: big data and information valuation. Doctoral dissertation, Stellenbosch: Stellenbosch University. 2015. https://core.ac.uk/download/pdf/37439869.pdf

7. Stein SS, Stein S. Data as an asset. Blockchain, Artificial Intelligence, and Financial Services: Implications and Applications for Finance and Accounting Professionals. 2020. p. 213–39.

8. Ooi KB, Tan GWH, Al-Emran M, Al-Sharafi MA, Capatina A, Chakraborty A. The potential of generative artificial intelligence across disciplines: perspectives and future directions. Journal of Computer Information Systems. 2023:1–32.

9. Banh L, Strobel G. Generative artificial intelligence. Electronic Markets. 2023;33(1):63.

10. Dhoni P. Exploring the synergy between generative AI, data, and analytics in the modern age. Preprints. 2023. https://d197for5662m48.cloudfront.net/documents/publicationstatus/171558/preprint_pdf/b19276e1b30b325c2ad2563b2bf1c229.pdf

11. Cao Y, Li S, Liu Y, Yan Z, Dai Y, Yu PS, Sun L. A comprehensive survey of AI-generated content (aigc): A history of generative AI from GAN to ChatGPT. arXiv Preprint arXiv:2303.04226. 2023.

12. Nah F F-H, Zheng R, Cai J, Siau K, Chen L. Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration. Journal of Information Technology Case and Application Research. 2023;25(3):277–304.

13. Chui M, Roberts R, Yee L. Generative AI is here: how tools like ChatGPT could change your business. Quantum Black AI by McKinsey. 2022.

14. Kanbach DK, Heiduk L, Blueher G, Schreiter M, Lahmann A. The GenAI is out of the bottle: generative artificial intelligence from a business model innovation perspective. Rev Manag Sci. 2023;18(4):1189–220.

15. Birch K, Cochrane D, Ward C. Data as asset? The measurement, governance, and valuation of digital personal data by Big Tech. Big Data & Society. 2021;8(1).

16. Xiong F, Xie M, Zhao L, Li C, Fan X. Recognition and Evaluation of Data as Intangible Assets. Sage Open. 2022;12(2).

17. Batini C, Cappiello C, Francalanci C, Maurino A. Methodologies for data quality assessment and improvement. ACM Comput Surv. 2009;41(3):1–52.

18. Feuerriegel S, Hartmann J, Janiesch C, Zschech P. Generative AI. Bus Inf Syst Eng. 2023;66(1):111–26.

19. Fernandez RC, Subramaniam P, Franklin MJ. Data market platforms: Trading data assets to solve data problems. arXiv preprint arXiv:2002.01047. 2020.

20. Hannila H, Silvola R, Harkonen J, Haapasalo H. Data-driven Begins with DATA; Potential of Data Assets. Journal of Computer Information Systems. 2019;62(1):29–38.

21. Tsai C-F, Lu Y-H, Yen DC. Determinants of intangible assets value: The data mining approach. Knowledge-Based Systems. 2012;31:67–77.

22. Bandi A, Adapa PVSR, Kuchi YEVPK. The Power of Generative AI: A Review of Requirements, Models, Input–Output Formats, Evaluation Metrics, and Challenges. Future Internet. 2023;15(8):260.

23. Sun J, Liao QV, Muller M, Agarwal M, Houde S, Talamadupula K, Weisz JD. Investigating explainability of generative AI for code through scenario-based design. In Proceedings of the 27th International Conference on Intelligent User Interfaces. 2022. 212–28.

24. Moro-Visconti R. Artificial Intelligence Valuation. Books: Springer. 2024.

25. Moro-Visconti R. The valuation of intangible assets: an introduction. Artificial Intelligence Valuation: The Impact on Automation, BioTech, ChatBots, FinTech, B2B2C, and Other Industries. Cham: Springer Nature Switzerland. 2024. p. 41–129.

26. Yanlin W, Haijun Z. Data Asset Value Assessment Literature Review and Prospect. J Phys: Conf Ser. 2020;1550(3):032133.

27. Côrte-Real N, Oliveira T, Ruivo P. Assessing business value of Big Data Analytics in European firms. Journal of Business Research. 2017;70:379–90.

28. Popovič A, Hackney R, Tassabehji R, Castelli M. The impact of big data analytics on firms’ high value business performance. Inf Syst Front. 2016;20(2):209–22.

29. Warren JD, Moffitt KC, Byrnes P. How big data will change accounting? Accounting Horizons. 2005;29(2):397–407.

30. Yoshioka T. Valuation of intangible fixed assets using generative artificial intelligence and machine learning. Journal of Management Science. 2024;13:27–36.

31. Ganguli D, Hernandez D, Lovitt L, Askell A, Bai Y, Chen A, et al. Predictability and surprise in large generative models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 2022. 1747–64.

32. Brynjolfsson E, Li D, Raymond LR. Generative AI at work (No. w31161). National Bureau of Economic Research. 2023. https://www.nber.org/system/files/working_papers/w31161/w31161.pdf

33. Re Cecconi F, Dejaco MC, Moretti N, Mannino A, Blanco Cadena JD. Digital Asset Management. Research for Development. Springer International Publishing. 2019. p. 243–53.

34. Bickley SJ, Macintyre A, Torgler B. Artificial Intelligence and Big Data in Sustainable Entrepreneurship. Journal of Economic Surveys. 2024;39(1):103–45.

35. Bag S, Pretorius JHC, Gupta S, Dwivedi YK. Role of institutional pressures and resources in the adoption of big data analytics-powered artificial intelligence, sustainable manufacturing practices, and circular economy capabilities. Technological Forecasting and Social Change. 2021;163:120420.

36. Loukis EN, Maragoudakis M, Kyriakou N. Artificial intelligence-based public sector data analytics for economic crisis policymaking. TG. 2020;14(4):639–62.

37. Delen D, Kuzey C, Uyar A. Measuring firm performance using financial ratios: A decision tree approach. Expert Systems with Applications. 2023;40(10):3970–83.

38. Demydyuk G. Optimal financial key performance indicators: evidence from the airline industry. Accounting & Taxation. 2020;3(2):39–51.

39. Ittner CD, Larcker DF. Are non-financial measures leading indicators of financial performance? An analysis of customer satisfaction. Journal of Accounting Research. 2020;36:1–35.

40. Hag ADA, Firmansyah D. Profit growth: the impact of total asset turnover and firm size. Journal Ekonomi, Manajemen Dan Akuntansi. 2020;2(04):531–44.

41. Sharpe WF. Asset allocation: Management style and performance measurement. Journal of Portfolio Management. 1992;18(2):7–19.

42. Shi W. Analyzing enterprise asset structure and profitability using cloud computing and strategic management accounting. PLoS One. 2021;16(9):e0257826. pmid:34591883

43. Chen DQ, Preston DS, Swink M. How the Use of Big Data Analytics Affects Value Creation in Supply Chain Management. Journal of Management Information Systems. 2015;32(4):4–39.

44. Kwon O, Lee N, Shin B. Data quality management, data usage experience and acquisition intention of big data analytics. International Journal of Information Management. 2014;34(3):387–94.

45. Teece DJ. Strategies for managing knowledge assets: the role of firm structure and industrial context. Long-Range Planning. 2020;33(1):35–54.

46. Frank MZ, Goyal VK. Capital structure decisions: which factors are reliably important? Financial Management. 2009;38(1):1–37.

47. Rajan RG, Zingales L. What Do We Know about Capital Structure? Some Evidence from International Data. The Journal of Finance. 1995;50(5):1421–60.

48. Fisher T. The data asset: how smart companies govern their data for business success. John Wiley & Sons. 2009.

49. Hu C, Li Y, Zheng X. Data assets, information uses, and operational efficiency. Applied Economics. 2022;54(60):6887–900.

50. Redman TC. Data-driven: profiting from your most important business asset. Harvard Business Press. 2008.

51. Grover V, Chiang RHL, Liang T-P, Zhang D. Creating Strategic Business Value from Big Data Analytics: A Research Framework. Journal of Management Information Systems. 2018;35(2):388–423.

52. Kaplan RS, Norton DP. Measuring the strategic readiness of intangible assets. Harv Bus Rev. 2004;82(2):52–63, 121. pmid:14971269

53. Denis DJ, Denis DK, Sarin A. Ownership structure and top executive turnover. Journal of Financial Economics. 1997;45(2):193–221.

54. Kayode BG, Omirin M. An assessment of the relative impact of factors influencing inaccuracy in valuation. International Journal of Housing Markets and Analysis. 2012;5(2):145–60.

55. Khorana A. Performance changes following top management turnover: evidence from open-end mutual funds. J Financ Quant Anal. 2001;36(3):371–93.

56. Wilkins J, Van Wegen B, De Hoog R. Understanding and valuing knowledge assets: Overview and method. Expert Systems with Applications. 1997;13(1):55–72.

57. Hsieh MY. An empirical investigation into the enhancement of decision-making capabilities in corporate sustainability leadership through Internet of Things (IoT) integration. Internet of Things. 2024;28:101382.

58. Uhlaner L, Massis A de, Jorissen A, Du Y. Are outside directors on the small and medium-sized enterprise board always beneficial? Disclosure of firm-specific information in board-management relations as the missing mechanism. Human Relations. 2020;74(11):1781–819.

59. Post C, Rahman N, Rubow E. Green Governance: Boards of Directors’ Composition and Environmental Corporate Social Responsibility. Business & Society. 2011;50(1):189–223.

60. Roberts J, McNulty T, Stiles P. Beyond Agency Conceptions of the Work of the Non‐Executive Director: Creating Accountability in the Boardroom. British J of Management. 2005;16(s1).

61. Free C, Trotman AJ, Trotman KT. How Audit Committee Chairs Address Information-Processing Barriers. The Accounting Review. 2020;96(1):147–69.

62. García-Meca E, Ramón-Llorens M-C, Martínez-Ferrero J. Are narcissistic CEOs more tax aggressive? The moderating role of internal audit committees. Journal of Business Research. 2021;129:223–35.

63. Wu CY-H, Hsu H-H, Haslam J. Audit committees, non-audit services, and auditor reporting decisions prior to failure. The British Accounting Review. 2016;48(2):240–56.

64. Horak S, Cui J. Financial performance and risk behavior of gender-diversified boards in the Chinese automotive industry. PR. 2017;46(4):847–66.

65. Yu Z, Li J, Yang J. Does corporate governance matter in competitive industries? Evidence from China. Pacific-Basin Finance Journal. 2017;43:238–55.

66. Shimura H, Masuda S, Kimura H. Research and development productivity map: visualization of industry status. J Clin Pharm Ther. 2014;39(2):175–80. pmid:24438433

67. Wang C-N, Yang F-C, Vo NTM, Duong C-T, Nguyen VTT. Enhancing Operational Efficiency in Industrial Systems: A DEA-Grey Integration. IEEE Access. 2024;12:58532–50.

68. Hasanin T, Khoshgoftaar TM, Leevy JL, Bauder RA. Severely imbalanced Big Data challenges: investigating data sampling approaches. J Big Data. 2019;6(1).

69. Rathi SC, Misra S, Colomo-Palacios R, Adarsh R, Neti LBM, Kumar L. Empirical evaluation of the performance of data sampling and feature selection techniques for software fault prediction. Expert Systems with Applications. 2023;223:119806.

70. Panda JP, Warrior HV. Evaluation of machine learning algorithms for predictive Reynolds stress transport modeling. Acta Mechanica Sinica. 2022;38(4):321544.

71. Lao Z, He D, Jin Z, Liu C, Shang H, He Y. Few-shot fault diagnosis of turnout switch machine based on semi-supervised weighted prototypical network. Knowledge-Based Systems. 2023;274:110634.

Citation: Tang Y, Liu Y, Liu D (2025) Data asset valuation model based on generative artificial intelligence. PLoS One 20(8): e0328926. https://doi.org/10.1371/journal.pone.0328926

About the Authors:

Yungang Tang

Roles: Conceptualization, Data curation, Methodology, Writing – original draft

E-mail: [email protected]

Affiliation: School of Economics and Management, Quanzhou University of Information Engineering, Quanzhou, Fujian, China

ORCID: https://orcid.org/0000-0002-7061-2841

Yaoqian Liu

Roles: Conceptualization, Methodology, Writing – original draft

Affiliation: Faculty of Humanities, Arts and Social Sciences, University of Exeter, Exeter, United Kingdom

ORCID: https://orcid.org/0009-0003-0310-7233

Daxin Liu

Roles: Data curation, Methodology

Affiliation: China Construction Materials Industrial Geology Reconnaissance Center, Beijing, China

[/RAW_REF_TEXT]

References

1. Kubina M, Varmus M, Kubinova I. Use of Big Data for Competitive Advantage of Company. Procedia Economics and Finance. 2015;26:561–5.

2. Medeiros MMD, Maçada ACG, Freitas Junior JCDS. The effect of data strategy on competitive advantage. The Bottom Line. 2020;33(2):201–16.

3. Tsiu S, Ngobeni M, Mathabela L. Thango B. Applications and Competitive Advantages of Data Mining and Business Intelligence in SMEs Performance: A Systematic Review. Preprints.2024. 2024090940. https://www.preprints.org/manuscript/202409.0940/v1

4. Campos J, Sharma P, Gabiria UG, Jantunen E, Baglee D. A Big Data Analytical Architecture for the Asset Management. Procedia CIRP. 2017;64:369–74.

5. Corrado C, Haskel J, Iommi M, Jona-Lasinio C. Measuring data as an asset: Framework, methods, and preliminary estimates. 2022. https://www.oecd.org/content/dam/oecd/en/publications/reports/2022/11/measuring-data-as-an-asset_cf2c5025/b840fb01-en.pdf

6. Stander JB. The modern asset: big data and information valuation. Doctoral dissertation, Stellenbosch: Stellenbosch University. 2015. https://core.ac.uk/download/pdf/37439869.pdf

7. Stein SS, Stein S. Data as an asset. Blockchain, Artificial Intelligence, and Financial Services: Implications and Applications for Finance and Accounting Professionals. 2020. p. 213–39.

8. Ooi KB, Tan GWH, Al-Emran M, Al-Sharafi MA, Capatina A, Chakraborty A. The potential of generative artificial intelligence across disciplines: perspectives and future directions. Journal of Computer Information Systems. 2023:1–32.

9. Banh L, Strobel G. Generative artificial intelligence. Electronic Markets. 2023;33(1):63.

10. Dhoni P. Exploring the synergy between generative AI, data, and analytics in the modern age. Preprints. 2023. https://d197for5662m48.cloudfront.net/documents/publicationstatus/171558/preprint_pdf/b19276e1b30b325c2ad2563b2bf1c229.pdf

11. Cao Y, Li S, Liu Y, Yan Z, Dai Y, Yu PS, Sun L. A comprehensive survey of AI-generated content (aigc): A history of generative AI from GAN to ChatGPT. arXiv Preprint arXiv:2303.04226. 2023.

12. Nah F F-H, Zheng R, Cai J, Siau K, Chen L. Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration. Journal of Information Technology Case and Application Research. 2023;25(3):277–304.

13. Chui M, Roberts R, Yee L. Generative AI is here: how tools like ChatGPT could change your business. Quantum Black AI by McKinsey. 2022.

14. Kanbach DK, Heiduk L, Blueher G, Schreiter M, Lahmann A. The GenAI is out of the bottle: generative artificial intelligence from a business model innovation perspective. Rev Manag Sci. 2023;18(4):1189–220.

15. Birch K, Cochrane D, Ward C. Data as asset? The measurement, governance, and valuation of digital personal data by Big Tech. Big Data & Society. 2021;8(1).

16. Xiong F, Xie M, Zhao L, Li C, Fan X. Recognition and Evaluation of Data as Intangible Assets. Sage Open. 2022;12(2).

17. Batini C, Cappiello C, Francalanci C, Maurino A. Methodologies for data quality assessment and improvement. ACM Comput Surv. 2009;41(3):1–52.

18. Feuerriegel S, Hartmann J, Janiesch C, Zschech P. Generative AI. Bus Inf Syst Eng. 2023;66(1):111–26.

19. Fernandez RC, Subramaniam P, Franklin MJ. Data market platforms: Trading data assets to solve data problems. arXiv preprint arXiv:2002.01047. 2020.

20. Hannila H, Silvola R, Harkonen J, Haapasalo H. Data-driven Begins with DATA; Potential of Data Assets. Journal of Computer Information Systems. 2019;62(1):29–38.

21. Tsai C-F, Lu Y-H, Yen DC. Determinants of intangible assets value: The data mining approach. Knowledge-Based Systems. 2012;31:67–77.

22. Bandi A, Adapa PVSR, Kuchi YEVPK. The Power of Generative AI: A Review of Requirements, Models, Input–Output Formats, Evaluation Metrics, and Challenges. Future Internet. 2023;15(8):260.

23. Sun J, Liao QV, Muller M, Agarwal M, Houde S, Talamadupula K, Weisz JD. Investigating explainability of generative AI for code through scenario-based design. In Proceedings of the 27th International Conference on Intelligent User Interfaces. 2022. 212–28.

24. Moro-Visconti R. Artificial Intelligence Valuation. Books: Springer. 2024.

25. Moro-Visconti R. The valuation of intangible assets: an introduction. Artificial Intelligence Valuation: The Impact on Automation, BioTech, ChatBots, FinTech, B2B2C, and Other Industries. Cham: Springer Nature Switzerland. 2024. p. 41–129.

26. Yanlin W, Haijun Z. Data Asset Value Assessment Literature Review and Prospect. J Phys: Conf Ser. 2020;1550(3):032133.

27. Côrte-Real N, Oliveira T, Ruivo P. Assessing business value of Big Data Analytics in European firms. Journal of Business Research. 2017;70:379–90.

28. Popovič A, Hackney R, Tassabehji R, Castelli M. The impact of big data analytics on firms’ high value business performance. Inf Syst Front. 2016;20(2):209–22.

29. Warren JD, Moffitt KC, Byrnes P. How big data will change accounting? Accounting Horizons. 2005;29(2), 397–407.

30. Yoshioka T. Valuation of intangible fixed assets using generative artificial intelligence and machine learning. Journal of Management Science. 2024;13:27–36.

31. Ganguli D, Hernandez D, Lovitt L, Askell A, Bai Y, Chen A, et al. Predictability and surprise in large generative models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 2022. 1747–64.

32. Brynjolfsson E, Li D, Raymond LR. Generative AI at work (No. w31161). National Bureau of Economic Research. 2023. https://www.nber.org/system/files/working_papers/w31161/w31161.pdf

33. Re Cecconi F, Dejaco MC, Moretti N, Mannino A, Blanco Cadena JD. Digital Asset Management. Research for Development. Springer International Publishing. 2019. p. 243–53.

34. Bickley SJ, Macintyre A, Torgler B. Artificial Intelligence and Big Data in Sustainable Entrepreneurship. Journal of Economic Surveys. 2024;39(1):103–45.

35. Bag S, Pretorius JHC, Gupta S, Dwivedi YK. Role of institutional pressures and resources in the adoption of big data analytics-powered artificial intelligence, sustainable manufacturing practices, and circular economy capabilities. Technological Forecasting and Social Change. 2021;163:120420.

36. Loukis EN, Maragoudakis M, Kyriakou N. Artificial intelligence-based public sector data analytics for economic crisis policymaking. TG. 2020;14(4):639–62.

37. Delen D, Kuzey C, Uyar A. Measuring firm performance using financial ratios: A decision tree approach. Expert Systems with Applications. 2023;40(10):3970–83.

38. Demydyuk G. Optimal financial key performance indicators: evidence from the airline industry. Accounting & Taxation. 2020;3(2):39–51.

39. Ittner CD, Larcker DF. Are non-financial measures leading indicators of financial performance? An analysis of customer satisfaction. Journal of Accounting Research. 2020;36:1–35.

40. Hag ADA, Firmansyah D. Profit growth: the impact of total asset turnover and firm size. Journal Ekonomi, Manajemen Dan Akuntansi. 2020;2(04):531–44.

41. Sharpe WF. Asset allocation: Management style and performance measurement. Journal of Portfolio Management. 1992;18(2):7–19.

42. Shi W. Analyzing enterprise asset structure and profitability using cloud computing and strategic management accounting. PLoS One. 2021;16(9):e0257826. pmid:34591883

43. Chen DQ, Preston DS, Swink M. How the Use of Big Data Analytics Affects Value Creation in Supply Chain Management. Journal of Management Information Systems. 2015;32(4):4–39.

44. Kwon O, Lee N, Shin B. Data quality management, data usage experience and acquisition intention of big data analytics. International Journal of Information Management. 2014;34(3):387–94.

45. Teece DJ. Strategies for managing knowledge assets: the role of firm structure and industrial context. Long-Range Planning. 2020;33(1):35–54.

46. Frank MZ, Goyal VK. Capital structure decisions: which factors are reliably important?. Financial Management. 2009;38(1):1–37.

47. Rajan RG, Zingales L. What Do We Know about Capital Structure? Some Evidence from International Data. The Journal of Finance. 1995;50(5):1421–60.

48. Fisher T. The data asset: how smart companies govern their data for business success. John Wiley & Sons. 2009.

49. Hu C, Li Y, Zheng X. Data assets, information uses, and operational efficiency. Applied Economics. 2022;54(60):6887–900.

50. Redman TC. Data-driven: profiting from your most important business asset. Harvard Business Press. 2008.

51. Grover V, Chiang RHL, Liang T-P, Zhang D. Creating Strategic Business Value from Big Data Analytics: A Research Framework. Journal of Management Information Systems. 2018;35(2):388–423.

52. Kaplan RS, Norton DP. Measuring the strategic readiness of intangible assets. Harv Bus Rev. 2004;82(2):52–63, 121. pmid:14971269

53. Denis DJ, Denis DK, Sarin A. Ownership structure and top executive turnover. Journal of Financial Economics. 1997;45(2):193–221.

54. Kayode BG, Omirin M. An assessment of the relative impact of factors influencing inaccuracy in valuation. International Journal of Housing Markets and Analysis. 2012;5(2):145–60.

55. Khorana A. Performance changes following top management turnover: evidence from open-end mutual funds. J Financ Quant Anal. 2001;36(3):371–93.

56. Wilkins J, Van Wegen B, De Hoog R. Understanding and valuing knowledge assets: Overview and method. Expert Systems with Applications. 1997;13(1):55–72.

57. Hsieh MY. An empirical investigation into the enhancement of decision-making capabilities in corporate sustainability leadership through Internet of Things (IoT) integration. Internet of Things. 2024;28:101382.

58. Uhlaner L, Massis A de, Jorissen A, Du Y. Are outside directors on the small and medium-sized enterprise board always beneficial? Disclosure of firm-specific information in board-management relations as the missing mechanism. Human Relations. 2020;74(11):1781–819.

59. Post C, Rahman N, Rubow E. Green Governance: Boards of Directors’ Composition and Environmental Corporate Social Responsibility. Business & Society. 2011;50(1):189–223.

60. Roberts J, McNulty T, Stiles P. Beyond Agency Conceptions of the Work of the Non‐Executive Director: Creating Accountability in the Boardroom. British J of Management. 2005;16(s1).

61. Free C, Trotman AJ, Trotman KT. How Audit Committee Chairs Address Information-Processing Barriers. The Accounting Review. 2020;96(1):147–69.

62. García-Meca E, Ramón-Llorens M-C, Martínez-Ferrero J. Are narcissistic CEOs more tax aggressive? The moderating role of internal audit committees. Journal of Business Research. 2021;129:223–35.

63. Wu CY-H, Hsu H-H, Haslam J. Audit committees, non-audit services, and auditor reporting decisions prior to failure. The British Accounting Review. 2016;48(2):240–56.

64. Horak S, Cui J. Financial performance and risk behavior of gender-diversified boards in the Chinese automotive industry. PR. 2017;46(4):847–66.

65. Yu Z, Li J, Yang J. Does corporate governance matter in competitive industries? Evidence from China. Pacific-Basin Finance Journal. 2017;43:238–55.

66. Shimura H, Masuda S, Kimura H. Research and development productivity map: visualization of industry status. J Clin Pharm Ther. 2014;39(2):175–80. pmid:24438433

67. Wang C-N, Yang F-C, Vo NTM, Duong C-T, Nguyen VTT. Enhancing Operational Efficiency in Industrial Systems: A DEA-Grey Integration. IEEE Access. 2024;12:58532–50.

68. Hasanin T, Khoshgoftaar TM, Leevy JL, Bauder RA. Severely imbalanced Big Data challenges: investigating data sampling approaches. J Big Data. 2019;6(1).

69. Rathi SC, Misra S, Colomo-Palacios R, Adarsh R, Neti LBM, Kumar L. Empirical evaluation of the performance of data sampling and feature selection techniques for software fault prediction. Expert Systems with Applications. 2023;223:119806.

70. Panda JP, Warrior HV. Evaluation of machine learning algorithms for predictive Reynolds stress transport modeling. Acta Mechanica Sinica. 2022;38(4):321544.

71. Lao Z, He D, Jin Z, Liu C, Shang H, He Y. Few-shot fault diagnosis of turnout switch machine based on semi-supervised weighted prototypical network. Knowledge-Based Systems. 2023;274:110634.

© 2025 Tang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In the digital economy era, the significance of data assets has become increasingly evident, particularly against the backdrop of the rapid development of Generative Artificial Intelligence (Generative AI). This paper constructs a data asset valuation model based on Generative AI that dynamically assesses the commercial value of data assets. The model integrates data feature extraction, a value generation algorithm, and market adaptability assessment to address the shortcomings of traditional valuation methods in dynamic market environments. The validity and applicability of the model are verified through an empirical analysis of data from Chinese A-share listed companies from 2015 to 2023. The results indicate that the integrated model has a significant advantage over individual models in accuracy and stability, especially in data-intensive industries such as information technology and financial services. This research provides new perspectives and methodologies for enterprises undergoing digital transformation and managing data assets, thereby promoting the sustainable development of the data economy.

Details

Title

Data asset valuation model based on generative artificial intelligence

Author

Tang, Yungang; Liu, Yaoqian; Liu, Daxin

First page

e0328926

Section

Research Article

Publication year

2025

Publication date

Aug 2025

Publisher

Public Library of Science

e-ISSN

1932-6203

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1371/journal.pone.0328926
