With the development of blockchain technology, the security of blockchain data verification has received increasing attention. This study proposes a fuzzy encryption search scheme and a data verification mechanism. First, privacy protection of the raw data is achieved through a hierarchical Mini Batch K-Means clustering algorithm and locality-sensitive hash functions. Second, the study proposes a blockchain data verification mechanism that employs the fuzzy encryption search scheme in conjunction with cloud storage and a fuzzy encryption algorithm; this mechanism is designed to ensure data integrity and confidentiality. When the number of attributes was 30, the initialization time cost of the proposed fuzzy encryption search scheme was 98 ms, about 51.6% lower than that of the blockchain-assisted sorting method, and the encryption time cost of the proposed search method was 61 ms, about 35.6% lower than that of the multi-permission data access method. When the number of parallel transactions reached 1000, the transaction duration for the smart contract Upload was 23.68 s, while that for Search was 24.36 s. The proposed fuzzy encryption search scheme and data verification mechanism not only protect data privacy and ensure data integrity and confidentiality, but also offer high search efficiency and low time overhead, providing better security for blockchain data.
Introduction
The rapid development and wide application of blockchain technology have led to a steady increase in the amount of data stored on blockchains, and this data carries important information and value. However, as blockchains continue to scale, searching and verifying this data efficiently becomes increasingly challenging [1]. Traditional methods typically rely on keyword-based search, which requires the data to be labeled or classified beforehand; in a blockchain, where the volume and complexity of data are enormous, keyword search yields unsatisfactory results. Moreover, traditional methods often require traversing the entire blockchain to verify the integrity and authenticity of the data, which is inefficient and time-consuming. Data privacy protection is another important issue in blockchain [2]. Because of the transparency of the blockchain, data stored on it may be open to the public, which gives rise to privacy disclosure. Although encryption techniques can protect data, decryption is still necessary in certain instances for search and verification, which limits privacy protection [3]. To address these issues, some researchers are developing AI-based methods that classify and search blockchain data automatically, without human intervention, while others are working on blockchain-based privacy protection technologies [4, 5]. In general, data search and verification on the blockchain is a complex problem, and traditional methods are limited in both efficiency and privacy protection: keyword-based search is ineffective on large and intricate blockchain data and requires cumbersome pre-labeling; verifying integrity and authenticity is time-consuming because it traverses the entire chain; and the transparency inherent to blockchain creates significant privacy challenges, since even encrypted data may eventually be decrypted and reveal private information.
Based on this, the paper proposes a blockchain-based fuzzy encryption search scheme and data verification mechanism. By using a K-means locality sensitive hashing (KMLSH) hybrid index together with the fuzzy encryption search scheme, the approach is expected to improve search efficiency while ensuring search accuracy. The study hypothesizes that combining fuzzy encryption algorithms with blockchain technology can improve the efficiency of data search and the accuracy of verification while protecting data privacy. The research question focuses on how to achieve efficient and secure blockchain data search and verification while ensuring data integrity and confidentiality. Through the design and application analysis of a fuzzy encryption search scheme and a blockchain data verification mechanism, the advantages of the proposed scheme in initialization time, search efficiency, verification accuracy, and other aspects are verified, providing an effective guarantee for blockchain data security.
Blockchain-native data privacy relies on encrypted storage and anonymous addresses to achieve confidentiality, but the transparent architecture still exposes data traceability and correlation risks, a static mode of “passive encryption, public verification”. The fuzzy encryption search scheme proposed in this paper constructs a double-layer barrier of “dynamic encryption, ciphertext-domain computing”. First, the KMLSH hybrid index reduces dimensionality and fuzzes features, breaking the direct mapping between plaintext and encrypted data. Second, a verifiable encryption search mechanism performs verification in the ciphertext domain and realizes dynamic protection through zero-knowledge proofs, eliminating the privacy disclosure window. This double reconstruction upgrades privacy protection to irreversible privacy computing and remedies the defect of native blockchain schemes, which expose metadata whenever the data is used. The study consists of four sections. Section 1 summarizes relevant research. Sections 2 and 3 design and validate a blockchain data search and verification mechanism. Section 4 summarizes the entire study. This study aims to provide effective solutions for the search and verification of blockchain data and better support and guarantees for the application of blockchain technology.
Related Works
Locality sensitive hashing (LSH) is a technique for approximate similarity search in high-dimensional space: it accelerates the search for similar items by mapping similar data to the same hash bucket. Li et al. proposed a global low-density LSH search algorithm, global low-density locality-sensitive hashing (GLDH), which constructed a new global low-density hyperplane candidate set using a graph cutting method and selected hyperplanes with minimum information gain. The results confirmed that GLDH outperformed state-of-the-art methods at the same hash encoding length [6]. Orliński and Jankowski proposed an acceleration algorithm based on stochastic neighbor embedding, which used a forest of balanced trees and LSH to remove the need for low dimensionality. The results confirmed that the algorithm had a good acceleration factor and achieved better performance through parallelization [7]. Wu et al. used LSH and importance sampling to estimate the number of points within a specified distance threshold of a given query in a high-dimensional vector representation. The results confirmed the effectiveness of this method on standard word embedding datasets [8]. Aumueller et al. proposed an LSH-based algorithm that maintained fairness in data structures without significant efficiency loss. The results confirmed that the method could maintain fairness and was applicable to radius nearest neighbor search problems in high-dimensional spaces [9]. Ferrada et al. proposed a simple and effective algorithm for computing approximate K-nearest-neighbor self-similarity joins without exhaustively comparing every pair of objects in the dataset. The results confirmed that this algorithm achieved an empirical accuracy of 46% on real-world datasets [10].
Blockchain is a distributed ledger technology whose core features are decentralized storage and a consensus verification mechanism: a chained data structure makes the data immutable, and cryptography ensures trusted collaboration between network nodes. Its essence is to build a transparent, mutually trusted data management system through mathematical rules, and it is typically used in scenarios such as financial transactions and supply chain traceability. On this basis, the blockchain security verification mechanism forms a triple defense system through the continuity verification of the hash chain, the distributed control of data modification rights by the consensus algorithm, and the automated execution of business rules by smart contracts. For large-scale network environments, Zhang et al. proposed caching nodes and a three-level caching mechanism based on smart contracts to improve content retrieval speed and reduce network burden. The results confirmed that the request response time of the three-level caching mechanism was better than that of ordinary first-in, first-out caching [11]. Liu et al. proposed an improved DPoS consensus mechanism based on probabilistic linguistic term sets for intelligent autonomous multi-robot systems. This mechanism calculated the degree of deviation for each node by adding voting options and scores to improve consensus efficiency and flexibility. Instance verification confirmed that the effectiveness of the improved DPoS consensus mechanism could reach 85% [12]. Bhattacharya et al. proposed an auxiliary two-layer blockchain framework that leveraged blockchain’s distributed storage and consensus mechanism and its avoidance of single points of failure. This framework addressed issues such as limited computing power and the high bandwidth requirements of data exchange. These studies confirmed that smart contracts could reliably communicate and aggregate smart grid measurements [13].
Fuzzy search is a search method that does not require the input to match the target data exactly. In traditional precise search, the input must be identical to the record in the database to find a match, whereas fuzzy search takes data similarity, spelling errors, synonyms, and so on into account. Fuzzy search is often used in search engines, database queries, text processing, and other scenarios, improving the flexibility and accuracy of search. Gardas et al. proposed a blockchain-based edge IoT node selection method that processes data through fuzzy logic and uses preference ranking techniques for multi-criteria decision-making. The experimental results showed that this method had high accuracy and reliability in handling IoT edge tasks, effectively alleviating problems such as high cloud server load and slow response times [14]. Lai et al. proposed a new Fermatean fuzzy number scoring function to handle discrete evaluation information and the uncertain information that may exist in the decision-making process. The effectiveness and practicality of this method were verified through case studies on blockchain platform selection [15]. Jiang et al. proposed a fuzzy keyword searchable encryption scheme based on blockchain. The scheme used the edit distance to generate fuzzy keyword sets and built a security index with verification labels for them, and the authors also introduced a punishment mechanism through the blockchain. Security analysis showed that the scheme achieved non-adaptive semantic security.
Performance analysis and functional comparison showed that the scheme was effective and could meet the requirements of search applications in cloud environments [16]. Zou et al. proposed a blockchain-assisted multi-keyword fuzzy search encryption scheme. This scheme mapped attributes in ciphertext-policy attribute-based encryption to attribute tokens on the blockchain, built a keyword search scheme based on searchable symmetric encryption, realized authorization and search with the help of smart contracts, and supported multi-keyword fuzzy search. Extensive experiments on real datasets verified the effectiveness and efficiency of the scheme [17]. Zhou et al. designed a verifiable accumulator prefix tree as the core authentication data structure for text data validation and fuzzy query. Formal security analysis and performance simulation showed that the scheme was secure, applicable to a variety of blockchains, and able to resolve reliance on trusted third parties and the multiple false positives of fuzzy query [18]. A summary of the related works is given in Table 1.
Table 1. Summary table for related works
| Field | Researchers | Method | Advantage | Disadvantage |
|---|---|---|---|---|
| LSH | Li et al. [6] | GLDH method | Uses minimum information gain to select hyperplanes, with good performance | The process may be time-consuming |
| | Orliński and Jankowski [7] | Acceleration algorithm based on stochastic neighbor embedding | Removes the need for low dimensionality and has a good acceleration factor | Has certain requirements for the distribution of the dataset |
| | Wu et al. [8] | LSH and importance sampling method | Effective on standard word embedding datasets | Computational complexity may be significant for large-scale datasets |
| | Aumueller et al. [9] | Algorithm based on LSH | Maintains data structure fairness | Not applicable to all datasets |
| | Ferrada et al. [10] | K-nearest-neighbor self-similarity join algorithm | High empirical accuracy | High computational complexity |
| Blockchain | Zhang et al. [11] | Three-level caching mechanism | Relatively short request response time | Cache management is relatively complex |
| | Liu et al. [12] | Improved DPoS consensus mechanism based on probabilistic linguistic term sets | Improved consensus efficiency and flexibility | Has certain requirements for node communication and data processing capabilities |
| | Bhattacharya et al. [13] | Auxiliary two-layer blockchain framework | Utilizes blockchain’s distributed storage and consensus mechanism and avoids single points of failure | Not applicable to all scenarios |
| Fuzzy search | Gardas et al. [14] | Method for selecting edge IoT nodes | High accuracy and reliability | The decision-making process is quite time-consuming |
| | Lai et al. [15] | Fermatean fuzzy number scoring function | Handles discrete evaluation information and uncertain information in the decision-making process | Rating results are unstable across scenarios |
| | Jiang et al. [16] | Fuzzy keyword searchable encryption scheme based on blockchain | Implements fair payment and ciphertext search, with more complete functionality and higher system availability | Universality across different blockchain environments needs to be verified |
| | Zou et al. [17] | Blockchain-assisted multi-keyword fuzzy search encryption for secure data sharing | Supports fine-grained access control and attribute revocation | Complex technical implementation and high deployment and maintenance costs |
| | Zhou et al. [18] | Verifiable searchable encryption scheme with blockchain supporting fuzzy query | Verifiable fuzzy queries for text data | Challenges and potential risks to efficiency and accuracy in complex data scenarios |
In summary, existing studies have improved locality-sensitive hashing from the perspectives of algorithm improvement and application expansion, enhancing the search performance and acceleration of similar items, but they overlook data privacy protection. For blockchain security verification mechanisms, numerous studies optimize the mechanism for diverse scenarios but have not yet fully integrated efficient search technology. Work combining fuzzy search with blockchain does exist, but it falls short in the comprehensive optimization of data privacy, integrity, and confidentiality protection. The blockchain-based fuzzy encryption search scheme and data verification mechanism proposed in this study protect the privacy of the original data through a hierarchical clustering algorithm and locality-sensitive hash functions. The blockchain data verification mechanism, which integrates cloud storage with a fuzzy encryption algorithm, is engineered to markedly enhance search efficiency and reduce time expenditure while upholding data integrity and confidentiality, addressing the existing gap in comprehensive data privacy, search efficiency, and security.
Design of Blockchain Data Search and Verification Mechanism
This section proposes a fuzzy keyword data search method based on the KMLSH hybrid index and designs a blockchain data verification mechanism based on the fuzzy encryption search scheme, which uses blockchain technology to encrypt and cross-check data on the cloud storage end, ensuring the integrity and confidentiality of data files. The data owner obtains a dynamic identity key through the smart contract, encrypts the original document with the group public and private key system, and anonymizes the signature with a ring signature so that identity privacy is preserved and the signature cannot be forged. The encrypted data and feature index are uploaded to the cloud server, and the blockchain network distributes metadata such as file hashes on the chain. During a search, the data user is authenticated by the smart contract, and the system generates an encrypted trapdoor that fuzzes the query keywords. The cloud server matches candidates with the KMLSH hybrid index and sends the candidate results to the blockchain verification nodes. The validation module compares the on-chain metadata hash value with the value computed in real time to ensure data integrity. When multiple users operate concurrently, the blockchain coordinates parallel processing through smart contract nodes and cross-validates the verification results with the consensus algorithm, achieving encrypted protection and efficient verification of data.
Blockchain Data Search Method Based on KMLSH Hybrid Index
The extant fuzzy keyword encryption search methods are relatively elementary: they support only single-keyword fuzzy searches and cannot resolve issues such as spelling errors. Therefore, the study proposes a fuzzy keyword data search method based on the KMLSH hybrid index, which roughly segments the original dataset through a hierarchical Mini Batch K-Means clustering algorithm. The LSH function is used to map multiple similar keywords to the same location, achieving privacy protection for the original text. Mini Batch K-Means is an improved version of the K-means clustering algorithm. K-means is a commonly used unsupervised learning algorithm that divides a dataset into K distinct clusters, but the traditional algorithm incurs heavy computation and memory consumption on large-scale datasets. Mini Batch K-Means updates the cluster centers using only part of the data at a time (small batches) instead of the entire dataset, which significantly reduces computation and memory consumption while largely preserving clustering accuracy, making it suitable for cluster analysis of large-scale datasets. The hierarchical Mini Batch K-Means algorithm is chosen because it handles large-scale raw datasets effectively: conventional clustering algorithms require a substantial computational investment on voluminous data, whereas updating the clustering centers with a limited number of samples per iteration significantly reduces the computational demand and expedites clustering, and the hierarchical operation roughly segments the data, laying the foundation for subsequent processing. LSH is used to map similar keywords efficiently: it can map multiple similar keywords to the same position, which is crucial for tolerating spelling errors in fuzzy keyword search while also protecting the privacy of the original text. Compared with other clustering algorithms such as traditional K-Means, Mini Batch K-Means is better suited to large-scale data and computationally more efficient. For similarity search, compared with ordinary hash functions, LSH maps similar data more effectively, solving the data similarity matching problem in fuzzy search and greatly improving the accuracy and efficiency of blockchain data search. The K initial cluster centers are represented by Eq. (1).
$$C = \{c_1, c_2, \ldots, c_K\} \quad (1)$$
In Eq. (1), the initial set of cluster centers is $C$, and the number of cluster centers is $K$. The nearest cluster center of a sample is represented by Eq. (2).
$$c(x) = \arg\min_{c_j \in C} \left\| x - c_j \right\|^2 \quad (2)$$
In Eq. (2), the cluster center closest to sample $x$ is $c(x)$. The position update of the cluster center is represented by Eq. (3).
$$c_j' = (1 - \eta)\, c_j + \eta\, x \quad (3)$$
In Eq. (3), the updated cluster center is $c_j'$, and the learning rate is $\eta$ (typically the reciprocal of the number of samples assigned to that center). In the Mini Batch K-Means algorithm, only one small batch of samples is used to update the cluster centers in each iteration, which reduces computational complexity and accelerates the clustering process. LSH is a technique used for approximate nearest neighbor search, represented by Eq. (4).
$$h_{a,b}(x) = (a \cdot x + b) \bmod P \quad (4)$$
In Eq. (4), the hash value of data point $x$ is $h_{a,b}(x)$. The randomly selected parameters are $a$ and $b$. $P$ is a constant greater than the number of data points. Equation (5) gives the condition under which the hash function family is $(r_1, r_2, p_1, p_2)$-sensitive.
$$\begin{cases} \Pr\left[h(x) = h(y)\right] \ge p_1, & d(x, y) \le r_1 \\ \Pr\left[h(x) = h(y)\right] \le p_2, & d(x, y) \ge r_2 \end{cases} \quad (5)$$
In Eq. (5), the distance between points $x$ and $y$ is $d(x, y)$. $(r_1, r_2, p_1, p_2)$ is a parameter tuple: $p_1$ and $p_2$ are collision probabilities, and $r_1$ and $r_2$ are distance thresholds. Figure 1 shows a blockchain data search algorithm based on the KMLSH hybrid index [19].
[See PDF for image]
Fig. 1
A blockchain data search method based on KMLSH hybrid index
First, the data are input and text feature words are extracted through the improved Bigram algorithm, such as extracting the words “blockchain” and “data search” from an article. These words are then organized into a keyword collection, and the keywords are initialized in preparation for subsequent processing. Each keyword is then converted into a binary vector, with “blockchain”, for example, corresponding to a specific binary code. Next, the hybrid index is generated, integrating multiple indexing methods, and the dataset is roughly segmented to group the data coarsely. Finally, keyword mapping locates the relevant data, which is output. For example, when searching for “blockchain technology”, the relevant article paragraphs are found through the above process and returned as results.
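To make the coarse-segmentation step concrete, the following Python sketch (an illustration under assumed parameters, not the paper's implementation) uses scikit-learn's MiniBatchKMeans to build a two-level clustering of keyword vectors; the cluster counts, batch size, and toy vectors are all assumptions.

```python
# Hypothetical sketch of the coarse-segmentation step in Fig. 1: Mini Batch
# K-Means groups keyword vectors into clusters so that later LSH lookups only
# scan one cluster instead of the whole set. Cluster counts and batch size
# are illustrative, not values from the paper.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(seed=0)
keyword_vectors = rng.random((10_000, 64))  # toy feature vectors

# Level 1: coarse partition of the whole dataset; the center updates of
# Eqs. (1)-(3) happen inside partial_fit, one mini-batch at a time.
top = MiniBatchKMeans(n_clusters=16, batch_size=256, random_state=0)
for start in range(0, len(keyword_vectors), 256):
    top.partial_fit(keyword_vectors[start:start + 256])

# Level 2: each coarse cluster is clustered again, giving the hierarchy
# that the index tree is later built on.
labels = top.predict(keyword_vectors)
subclusters = {}
for c in range(16):
    members = keyword_vectors[labels == c]
    if len(members) >= 32:  # skip tiny clusters
        sub = MiniBatchKMeans(n_clusters=4, batch_size=128, random_state=0)
        subclusters[c] = sub.fit(members)
```

Each second-level model confines subsequent LSH lookups to a small cluster, which is what makes the later keyword mapping fast.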
The fuzzy keyword query method based on the KMLSH hybrid index first preprocesses the basic data text and uses an improved Bigram algorithm to extract text feature words. Bigram is then used to obtain strings adjacent to keywords, recording the content, frequency, and position information of the words and adding them to the keyword set. Next, Mini Batch K-Means is used to filter and initialize keywords. A notable disadvantage of LSH is that hash conflicts can occur: data points with high similarity may be hashed into different buckets, while data points with low similarity may be hashed into the same bucket. In addition, LSH is inefficient on high-dimensional data and overly sensitive to small changes in the data. To overcome these disadvantages, a Mini Batch K-Means hierarchical tree structure is introduced. This structure improves hashing resolution and reduces conflicts by building a multi-level hash table and mapping data points to different levels. At the same time, the Mini Batch K-Means algorithm expedites the filtering of representative keywords during initialization, so that the hierarchical structure organizes the data effectively. This improves the efficiency of processing high-dimensional data and strengthens robustness to minor variations in the data. In data preprocessing, the original data documents are represented by Eq. (6).
$$D = \{d_1, d_2, \ldots, d_n\} \quad (6)$$
In Eq. (6), the set of original data documents is $D$, and the quantity of documents is $n$. After obtaining adjacent strings, Bigram adds them to the candidate word set, which is represented by Eq. (7).
$$W = \{w_1, w_2, \ldots, w_m\} \quad (7)$$
In Eq. (7), the candidate word set is $W$, and its word count is $m$. The probability of segmenting a sequence is represented by Eq. (8).
$$P(S) = \prod_{i=1}^{m} P\left(w_i \mid w_1 w_2 \cdots w_{i-1}\right) \quad (8)$$
In Eq. (8), the probability of splitting the sequence $S$ is $P(S)$, and the feature words $w_i$ are drawn from the candidate set $W$. To improve the efficiency of the Bigram-based method for calculating the probability of splitting sequences, a new hypothesis is proposed: the appearance of a word is related only to the n words before it. This hypothesis simplifies and optimizes the original computational logic. Limiting each word’s dependence to the n words in front of it reduces the number of factors that must be considered, decreases the computational complexity, and makes the computation of segmentation sequence probabilities more efficient. At the same time, the hypothesis imposes reasonable constraints on the original data relationships, making data processing more targeted and avoiding unnecessary correlation calculations, thereby improving the efficiency and performance of the entire text processing pipeline. When the collection is created, each detected word is compared against it: a new word is added to the collection, while an existing word has its occurrence count increased. Setting aside special symbols, English letters, and punctuation, the next word is combined directly to obtain the set of feature words. The Bigram vector keyword representation is sensitive to spelling errors, including misspelled letters; with this representation, even if a keyword contains several spelling errors, its vector still approximates the correct one, enabling fuzzy keyword queries. In constructing the permutation index, the Bloom filter maps high-dimensional points to a low-dimensional space and uses hash functions to convert high-dimensional points into unit vectors. The Bloom filter is a data structure that converts keywords into binary bit vectors and maps these vectors to specific index positions via hash functions. This conversion from high-dimensional data to a low-dimensional space significantly reduces the complexity of the mapping process and enables the Bloom filter to quickly locate the index area of a keyword within a large volume of data, achieving fast keyword retrieval. The Bloom filter also facilitates fuzzy queries, enabling users to retrieve relevant information accurately even when the precise form of the keyword is not fully known, which greatly enhances the efficiency and flexibility of data querying. Figure 2 shows the process of keyword conversion to Bloom filters. In the text preprocessing phase, which occurs before the improved Bigram algorithm extracts text feature words, stop words are removed: common stop words such as “of”, “and”, and “yes” are dropped. For the stop word list, the study uses the stop word table provided by the English-Chinese Dictionary.
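As an illustration of the preprocessing just described, the sketch below (with an assumed toy stop-word list standing in for the dictionary-based table) removes stop words and then collects Bigram candidate words with their content, frequency, and position information, as in Eq. (7).

```python
# Illustrative sketch of the preprocessing step: remove stop words, then
# slide a Bigram window to collect candidate feature words with their
# frequency and positions (Eq. (7)). The stop-word list is a toy stand-in.
from collections import defaultdict

STOP_WORDS = {"of", "and", "the", "is", "a", "to"}  # assumed toy list

def bigram_candidates(text: str):
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    candidates = defaultdict(lambda: {"freq": 0, "positions": []})
    for i in range(len(tokens) - 1):
        bigram = f"{tokens[i]} {tokens[i + 1]}"
        candidates[bigram]["freq"] += 1
        candidates[bigram]["positions"].append(i)
    return dict(candidates)

print(bigram_candidates("search and verification of blockchain data search"))
# yields candidates such as 'blockchain data' and 'data search'
```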
[See PDF for image]
Fig. 2
The process of converting keywords to Bloom filters
The process of converting keywords to Bloom filters represents each keyword as multiple binary bit vectors and maps them to the same index position using the characteristics of the Bloom filter. Building the index tree involves dividing the original Bloom-filter-based index vector into multiple sub-vectors, each of which becomes a node of the index tree. During tree building, each node is split as needed so that the keywords in a node do not exceed the threshold: if they do not, the insertion succeeds; otherwise the node is split. During a search, the generated trapdoor is matched against leaf nodes in the same way, and the final similarity score is obtained by summing the scores of each level. After the search token is sent to the cloud server, the correlation between the query and the index vector is represented by Eq. (9).
$$\mathrm{Rel}(T_Q, I_d) = T_Q \cdot I_d \quad (9)$$
In Eq. (9), the correlation between the query trapdoor $T_Q$ and the index vector $I_d$ is $\mathrm{Rel}(T_Q, I_d)$. The most relevant search results are represented by Eq. (10).
$$R = \operatorname{Top}\text{-}k\left( \alpha \cdot \lambda \cdot \mathrm{Rel}(T_Q, I_d) + \beta \cdot w_c \right) \quad (10)$$
In Eq. (10), the search result for the query over the encrypted documents is $R$. The weight coefficient is $\lambda$. The importance of the raw data in category $c$ is $w_c$. The constants are $\alpha$ and $\beta$, respectively. By calculating the inner product of the encrypted index and the trapdoor, as well as the inner product of the original index and the query, well-matched keyword results are obtained within a certain range. Finally, the file is returned.
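The matching step of Eqs. (9) and (10) can be illustrated with a minimal sketch: keywords are hashed into Bloom-filter bit vectors, and relevance is the inner product between the trapdoor vector and each document's index vector. The bit length m = 8000 mirrors the experimental setting reported later; the hash count, the toy documents, and the omission of the category-importance term are simplifying assumptions.

```python
# Minimal sketch of Eqs. (9)-(10): Bloom-filter bit vectors plus
# inner-product scoring. Bit length, hash count, and data are illustrative.
import hashlib
import numpy as np

M_BITS, K_HASHES = 8000, 4  # m = 8000 matches the experiment; k is assumed

def bloom_vector(keywords):
    v = np.zeros(M_BITS, dtype=np.int32)
    for kw in keywords:
        for k in range(K_HASHES):
            digest = hashlib.sha256(f"{k}:{kw}".encode()).hexdigest()
            v[int(digest, 16) % M_BITS] = 1
    return v

docs = {"d1": ["blockchain", "fuzzy", "search"], "d2": ["cloud", "storage"]}
index = {doc: bloom_vector(kws) for doc, kws in docs.items()}
trapdoor = bloom_vector(["blockchain", "search"])

# Eq. (9): inner-product relevance; Eq. (10): keep the top-k documents
# (the per-category importance weighting is omitted in this toy version).
scores = {doc: int(vec @ trapdoor) for doc, vec in index.items()}
top_k = sorted(scores, key=scores.get, reverse=True)[:1]
print(scores, top_k)  # d1 scores highest
```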
The proposed KMLSH hybrid index mechanism ensures the security and efficiency of data search through multi-level optimization. The hash conflict problem of LSH is alleviated by the Mini Batch K-Means hierarchical tree structure, which uses the hierarchical mapping of multi-level hash tables and cluster pre-grouping to significantly reduce the collision probability of a single hash function family, so that dissimilar data are less likely to fall into the same bucket at any level. The false positive rate of the Bloom filter is kept below 1% by dynamically adjusting the number of hash functions and the bit vector length; false positives are filtered out in the subsequent precise matching stage, so they mainly affect query efficiency rather than security. The binary vectorization of keywords, combined with homomorphic encryption, ensures that the data remain ciphertext throughout the search process, so that even if LSH collisions or Bloom filter false positives occur, an attacker cannot infer the original information. The blockchain’s distributed verification mechanism provides a trusted environment for index building and query operations, preventing malicious nodes from tampering with hash mapping rules or Bloom filter parameters. Differential privacy adds an appropriate amount of noise to the clustering process, further reducing the possibility of inferring sensitive information through statistical analysis. Access control policies implemented by smart contracts ensure that only authorized users can trigger specific search operations, preventing privacy breaches caused by unauthorized access. In summary, the mechanism combines LSH optimization, Bloom filter tuning, encryption protection, blockchain verification, and access control to build a robust security protection system while maintaining search efficiency.
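The Bloom-filter tuning claim above can be sanity-checked with the standard false-positive estimate p ≈ (1 − e^(−kn/m))^k and the optimal hash count k = (m/n)·ln 2; the check below assumes a workload of n inserted keywords, which the text does not specify.

```python
# Back-of-the-envelope check of the Bloom-filter tuning claim, using the
# standard false-positive estimate with the optimal hash count.
import math

def bloom_fp_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

m, n = 8000, 600                       # m from the experiment; n assumed
k_opt = round((m / n) * math.log(2))   # about 9 hash functions
print(k_opt, bloom_fp_rate(m, n, k_opt))  # roughly 0.17%, well below 1%
```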
Design of Blockchain Data Verification Mechanism Based on Fuzzy Encryption Search Scheme
Compared with pure cloud storage, blockchain decentralization can reduce network overhead, achieve encrypted data queries without complete downloads, and record upload and search operations to support data integrity verification [20]. Based on this, a blockchain data verification mechanism based on the fuzzy encryption search scheme is proposed, which uses blockchain technology to encrypt and cross-check data on the cloud storage end, ensuring the integrity and confidentiality of data files. In the storage module, the data owner first obtains an identity key through the blockchain smart contract and encrypts the data document using the group private and public keys. Ring signatures are generated from these public and private keys to ensure that signatures cannot be forged. Subsequently, the data are encrypted, document indexes are established, and the encrypted documents are stored on a cloud server; this series of steps ensures the confidentiality and integrity of data on the blockchain. By executing search operations for multiple users synchronously, fast querying of massive data can be achieved at reduced computational cost. Figure 3 shows the blockchain fuzzy encryption search scheme.
[See PDF for image]
Fig. 3
Blockchain fuzzy encryption search scheme
First, the data owner uploads the data to cloud storage through file processing; a ring signature is applied during upload to ensure the security and privacy of the data. Cloud storage is responsible for holding the data. When a data user needs to access the data, an access request carrying a ring signature is issued to the retrieval module. Access requests are routed through a central authority, which processes them and verifies the information using a node verification process recorded on the blockchain. After verification, the data user is permitted to download the required data from cloud storage. Through ring signatures, node verification, and related mechanisms, the whole system ensures the security, privacy, and integrity of data during storage and access.
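A hypothetical walk-through of this storage path is sketched below. Fernet (from the cryptography library) stands in for the scheme's group-key encryption, the ring signature is a mocked placeholder rather than the construction of Eq. (12), and the cloud server and ledger are toy dictionaries; only the file hash is recorded on chain, matching the metadata-on-chain design.

```python
# Hypothetical walk-through of the storage path in Fig. 3: the owner
# encrypts the document, attaches a (placeholder) ring signature, ships the
# ciphertext to cloud storage, and records only the file hash on chain.
import hashlib
from cryptography.fernet import Fernet

cloud_storage, chain = {}, []   # toy stand-ins for server and ledger

def ring_sign(message: bytes, member_id: str) -> str:
    # placeholder: a real ring signature would hide which member signed
    return hashlib.sha256(member_id.encode() + message).hexdigest()

key = Fernet.generate_key()     # stands in for the group key from Eq. (11)
document = b"original data document"
ciphertext = Fernet(key).encrypt(document)

file_id = hashlib.sha256(ciphertext).hexdigest()
cloud_storage[file_id] = (ciphertext, ring_sign(ciphertext, "owner-3"))
chain.append({"file_id": file_id, "op": "Upload"})  # metadata only on chain
```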
The blockchain fuzzy encryption search scheme mainly involves three entities: cloud servers, the blockchain, and data users. Data users comprise data owners and data consumers, whose operations on the data differ somewhat. To achieve data privacy protection, file privacy, data owner privacy, and data user privacy must all be considered. Cloud servers store data and respond to data user requests, while blockchain smart contracts maintain user authentication and verification during data storage and search services, constructing blockchain-oriented storage and query operations. The design of the blockchain smart contract begins with a requirements analysis that clarifies the functional objectives of data storage, retrieval, and verification and defines the permissions and interaction rules of each participant. The logical architecture of the contract is then designed, including an identity management module, an encryption algorithm module, a query processing module, and a verification and audit module, ensuring that the data flow and control flow between modules are clear and secure. Next, an appropriate programming language and development framework are selected, and the smart contract code is written to implement identity key generation, data encryption, fuzzy search and matching, integrity verification, and other core functions. The code follows the principle of modular design so that each functional unit can be tested and maintained independently. Finally, the smart contract is tested comprehensively in a simulation environment, covering functional, security, and performance testing, to ensure stable operation under various boundary conditions, and the correctness and security of the contract logic are checked with formal verification tools. The scheme comprises three modules: storage, retrieval, and verification. In the storage module, the data owner performs identity verification, obtains the identity key through a blockchain smart contract, encrypts the data document, and uploads it to the cloud storage end [21]. During the storage phase, the generation of the group private key and public key is represented by Eq. (11).
$$(sk, pk) \leftarrow \mathrm{KeyGen}\left(1^{\lambda}\right) \quad (11)$$
In Eq. (11), the security parameter is $\lambda$. The private key is $sk$. The public key is $pk$. The generation of ring signatures is represented by Eq. (12).
$$\sigma \leftarrow \mathrm{RingSign}\left(m, sk_i, \{pk_j\}_{j \neq i}\right) \quad (12)$$
In Eq. (12), the event to be executed is $m$. The private key of the $i$-th member is $sk_i$. The public keys of the other members are $pk_j$ ($j \neq i$). The event signature is $\sigma$. Data encryption is represented by Eq. (13).
$$C = \mathrm{Enc}(k, D) \quad (13)$$
In Eq. (13), the key of the data owner is $k$. The data to be encrypted are $D$. The encrypted document is $C$. The document index is established using Eq. (14).
$$I_d = \mathrm{BuildIndex}(d), \quad d \in D \quad (14)$$
In Eq. (14), the index of each data document $d$ is $I_d$. Cloud server storage is represented by Eq. (15).
$$CS \leftarrow \mathrm{Store}\left(\sigma, C, I\right) \quad (15)$$
In Eq. (15), the cloud service storage is $CS$, used to store the uploaded signature events, encrypted documents, and constructed indexes. In the retrieval module, the data user first obtains authentication on the blockchain, then enters search keywords, performs retrieval through the blockchain smart contract, and receives the returned results. In the verification module, users perform identity verification and verify data integrity through a smart contract after data retrieval. Figure 4 shows the validation of blockchain data [22].
[See PDF for image]
Fig. 4
Blockchain data validation process
In Fig. 4, the user agent first sends the user’s authorization information to the blockchain, granting the system operation rights. The relevant data are signed by the authority to ensure the reliability and accuracy of the data source. The signed data are then recorded on the blockchain, whose tamper-proof properties keep the data secure. The verifier checks the certificate to confirm the authenticity and integrity of the data. Users are also involved, benefiting from the traceability and privacy that the blockchain provides. Throughout the process, the participants cooperate with the blockchain as the underlying support, so that data verification proceeds in a safe and transparent environment, effectively preventing malicious tampering and ensuring the credibility of the data.
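The integrity check at the heart of this process reduces to comparing the digest recorded on chain with one freshly computed over the fetched ciphertext, as in the minimal sketch below (the hash function choice and byte strings are illustrative).

```python
# Minimal sketch of the integrity check in Fig. 4: recompute the hash of the
# ciphertext fetched from cloud storage and compare it with the hash recorded
# on chain; any tampering changes the digest and fails the check.
import hashlib

def verify_integrity(fetched_ciphertext: bytes, onchain_hash: str) -> bool:
    return hashlib.sha256(fetched_ciphertext).hexdigest() == onchain_hash

ciphertext = b"...encrypted bytes from the cloud server..."
onchain_hash = hashlib.sha256(ciphertext).hexdigest()  # written at upload

assert verify_integrity(ciphertext, onchain_hash)             # untampered
assert not verify_integrity(ciphertext + b"x", onchain_hash)  # tampered copy
```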
The blockchain data verification mechanism based on the fuzzy encryption search scheme leverages the collaboration among cloud servers, the blockchain, and data users to provide efficient data storage and search services. For data privacy protection, data owners and data users sign privacy agreements to ensure data privacy during use. The data server stores data and guarantees the security and reliability of storage. The blockchain smart contract maintains user authentication and verification in data storage and search services, constructing blockchain-oriented storage and query operations to provide efficient data search. In data storage and search, fuzzy encryption algorithms encrypt the data to ensure privacy and security. Data access control restricts user access through authentication and permission control, protecting data security and privacy. Finally, the blockchain audit mechanism records and regulates data access and operations to ensure that data usage complies with relevant regulations and standards. Figure 5 shows the blockchain data verification mechanism based on the fuzzy encryption search scheme.
[See PDF for image]
Fig. 5
Process of blockchain data verification mechanism based on fuzzy encryption search scheme
Application Analysis of Blockchain Data Search and Verification Mechanism
This section analyzes the blockchain data search method based on the KMLSH hybrid index and compares it with the multi-permission data access and blockchain-assisted sorting methods. It also analyzes the deployment cost, on-chain gas consumption, and smart contract efficiency of the blockchain data verification mechanism based on the fuzzy encryption search scheme.
Application Analysis of Blockchain Data Search Method Based on KMLSH Hybrid Index
A workstation with an Intel CPU @ 3.90 GHz and 64 GB of RAM, running the Windows operating system, was used. 4000 web resource files served as the test dataset, with around 150 keywords per file. To comprehensively evaluate and compare the performance of different blockchain data search and verification mechanisms, a comparative analysis was conducted in which a variety of methods were used as comparison objects and each method’s performance was examined in depth. The performance metrics encompassed initialization time, key generation time, encryption and decryption cost, query time, and search accuracy. This comparison helps reveal the advantages of the KMLSH hybrid indexing approach when working with blockchain data, such as efficient data organization and fast access. In addition, the experiment simulated a real blockchain environment, creating nodes and smart contracts on Ganache to further validate the blockchain data verification mechanism based on the fuzzy encryption search scheme. The evaluation indexes mainly included initialization time cost, key generation time cost, encryption and decryption cost, query time cost, search accuracy, deployment cost, transaction time, transaction throughput, and data verification accuracy. These dimensions and indicators comprehensively reflect the performance, efficiency, and security of the different methods, providing a basis for selection in practical applications. The experiment selected l = 30 and m = 8000 as the Bloom filter parameters and used the parameter settings (√3, 2, 0.56, 0.28) to construct the LSH hash function family. To validate the proposed blockchain data search method based on the KMLSH hybrid index (labeled S1), the blockchain-assisted sorting method (labeled S2) and the multi-permission data access method (labeled S3) were used as comparison methods. Figure 6 shows the initialization and key generation time costs for different methods.
[See PDF for image]
Fig. 6
The initialization and key generation time cost of different methods
Figure 6a shows a comparison of initialization time overhead. As the number of attributes increases, the time consumption of S1 remains constant at 98 ms, S2 remains constant at 200 ms, and S3 grows linearly, reaching 149 ms at 30 attributes. The proposed method reduces initialization time overhead by approximately 51.6% compared with the blockchain-assisted sorting method. Figure 6b shows the time cost of key generation. As the number of attributes increases, the time cost of all three methods grows linearly. When the number of attributes is 30, the time consumption of S1 is 21 ms, that of S2 is 162 ms, and that of S3 is 156 ms. Compared with the blockchain-assisted sorting method, the key generation time of the blockchain data search method based on the KMLSH hybrid index is reduced by about 78.3%. In summary, the method shows a clear advantage in initialization and key generation time overhead. The main reason is that it integrates the benefits of KMLSH indexing, enabling efficient organization of and fast access to blockchain data and avoiding substantial computation and storage overhead during initialization. At the same time, key generation can exploit the characteristics of the index structure to reduce redundant calculation, improving generation efficiency and significantly lowering the time cost. Figure 7 shows a comparison of the time cost of the encryption and decryption stages using different methods.
[See PDF for image]
Fig. 7
Comparison of the time cost of encryption and decryption stages using different methods
Figure 7a shows a comparison of the time overhead during the encryption phase. When the number of attributes is 30, the encryption time overhead for S1, S2, and S3 is 61 ms, 100 ms, and 138 ms, respectively. Compared with the blockchain-assisted sorting method and the multi-permission data access method, the encryption time cost of the blockchain data search method based on the KMLSH hybrid index is reduced by about 29.5% and 35.6%, respectively. Figure 7b shows a comparison of the time overhead during the decryption phase. As the number of attributes increases, the decryption time overhead of S2 and S3 gradually increases, while that of S1 remains at around 0.65 ms. The blockchain data search method based on the KMLSH hybrid index therefore has significant advantages when dealing with large amounts of data and provides faster encryption and decryption. The KMLSH hybrid index effectively reduces the data dimensionality and the computational complexity of encryption and decryption, and its reasonable allocation of computing resources enables efficient processing of large amounts of data, ensuring data security while greatly improving processing speed. Figure 8 shows the comparison of keyword and document query time costs for different methods.
[See PDF for image]
Fig. 8
Comparison of keyword and document query time costs using different methods
Figure 8a shows a comparison of keyword query time costs. With 30 keywords, the query time cost of the blockchain data search method based on the KMLSH hybrid index is the lowest of the three methods, at about 20 ms. Figure 8b shows a comparison of document query time costs. With 600 documents, the query time cost of the method is again the lowest, at about 1100 ms. This is mainly because the KMLSH hybrid index accelerates the query process through efficient key lookup and data comparison. Figure 9 compares the search accuracy of different methods.
[See PDF for image]
Fig. 9
Comparison of search accuracy of different methods
Figure 9a shows a comparison of precise search accuracy. Compared with S2 and S3, S1 had the highest precise search accuracy, reaching 98.2%. Figure 9b shows a comparison of fuzzy search accuracy. S2 and S3 did not support fuzzy search, so their accuracy was low and fluctuated. As the number of keywords increased, the fuzzy search accuracy of the blockchain data search method based on the KMLSH hybrid index gradually increased; with 30 keywords, it was about 94.6%. These results confirm that S1 performs well in both precise and fuzzy search, with high accuracy and good overall performance. The statistical results of the algorithm comparison are shown in Table 2, with 30 attributes, 30 keywords, and 600 files.
Table 2. Statistical results of the algorithm comparison
| Index | S1 | S2 | S3 |
|---|---|---|---|
| Initialization time overhead/ms | 98.1 ± 12.8 | 200.2 ± 25.3** | 149.2 ± 20.1** |
| Key generation time cost/ms | 21.2 ± 2.8 | 162.1 ± 9.7** | 156.2 ± 8.5** |
| Encryption overhead/ms | 61.2 ± 3.7 | 100.6 ± 8.2** | 138.2 ± 9.1** |
| Decryption cost/ms | 0.65 ± 0.12 | 87.6 ± 6.1** | 122.7 ± 8.2** |
| Keyword query time cost/ms | 21.2 ± 3.8 | 27.3 ± 4.8** | 35.4 ± 5.1** |
| Document query time cost/ms | 1124.6 ± 99.5 | 1602.3 ± 105.8** | 1983.7 ± 121.5** |
| Precise search accuracy/% | 98.2 ± 1.2 | 94.5 ± 2.1** | 87.2 ± 2.8** |
| Fuzzy search accuracy/% | 94.6 ± 1.3 | 70.3 ± 8.6** | 64.9 ± 9.2** |
Note: “**” indicates a strong significant difference compared to S1, P < 0.001
In Table 2, S1 outperforms S2 and S3 in initialization time, key generation time, encryption cost, decryption cost, and keyword query time, with S2 and S3 showing strongly significant differences from S1 (P < 0.001). The differences are especially pronounced for initialization and key generation: S2’s initialization overhead is about twice that of S1, and the key generation times of S2 and S3 are roughly seven times that of S1. In precise search and fuzzy search accuracy, S1 also performs strongly, reaching 98.2% and 94.6%, significantly higher than S2 (94.5% and 70.3%) and S3 (87.2% and 64.9%). Document query time differs considerably among the three algorithms, with S1 taking the least time and S3 the longest.
Application Analysis of Blockchain Data Verification Mechanism Based on Fuzzy Encryption Search Scheme
The experiment first simulated the Ethereum blockchain environment on Ganache and created 10 nodes. Through Ganache, the current status of all accounts could be quickly viewed, including their addresses, private keys, transactions, and balances. During the experiment, the log output of Ganache’s internal blockchain was analyzed, and the debugging information of responses and other transaction calls was checked. To verify the effectiveness of the proposed blockchain data verification mechanism based on the fuzzy encryption search scheme (marked P1), the experiment compared it with a blockchain-based data integrity verifiable sharing scheme (marked P2), a decentralized IoT device data verification scheme (marked P3), and a provable fully dynamic multi-replica data verification mechanism (marked P4). Table 3 shows the performance comparison results of the different schemes.
Table 3. Performance comparison results of different schemes
| Scheme | Fuzzy search | Smart contracts | Verifiable | Blockchain | Client type |
|---|---|---|---|---|---|
| P1 | Support | Support | Support | Support | Multi-user |
| P2 | Not supported | Support | Support | Support | Multi-user |
| P3 | Not supported | Support | Support | Not supported | Single user |
| P4 | Support | Support | Not supported | Not supported | Single user |
In Table 3, P1 supports fuzzy search and smart contracts, providing more flexible and convenient data querying and automation. The characteristics of blockchain ensure the immutability and credibility of the data, providing assurance for data verification. Figure 10 shows the deployment cost of P1.
[See PDF for image]
Fig. 10
The deployment cost of blockchain data verification mechanism based on fuzzy encryption search scheme
Figure 10a shows the deployment cost of smart contracts, which gradually increases with the number of smart contracts: deploying two smart contracts on the simulated chain costs approximately 0.019 Ether, and deploying eight costs approximately 0.069 Ether. Figure 10b shows gas consumption during on-chain operations. As the amount of stored data increases, gas consumption for on-chain operations rises. With 8 files, the search, index creation, and storage operations consume 0.62 Gas, 0.51 Gas, and 0.33 Gas, respectively. When the amount of data is large, the time and resources required to create and maintain indexes and to perform data comparison and search increase significantly, driving up gas consumption. Even so, the cost of storage and search operations is relatively low, so in practical applications these costs are acceptable to users. This also indicates that the blockchain data verification mechanism based on the fuzzy encryption search scheme is feasible and economical. Figure 11 shows the efficiency test results of the smart contracts.
[See PDF for image]
Fig. 11
Efficiency detection results of smart contracts
Figure 11a shows the transaction execution time of the smart contracts, with transaction duration increasing in proportion to the number of transactions. When the number of parallel transactions reached 1000, the transaction duration for Upload was 23.68 s, while that for Search was 24.36 s. In Fig. 11b, transaction throughput fluctuated slightly as parallel transactions increased, but the overall trend was stable: despite the growing number of parallel transactions, the average throughput remained at about 43 TPS (transactions per second). This indicates that, in the given system environment, the smart contracts are efficient and can handle a large number of parallel transactions. The evaluation results on a real test network with 1 million blocks are shown in Table 4. The deployment cost of P1 was 0.045 Ether, relatively low, and its transaction throughput of 50 TPS led the other schemes. Its data verification accuracy of 99.4% was also the highest. In summary, the proposed blockchain data verification mechanism based on the fuzzy encryption search scheme has clear advantages in cost, throughput, and accuracy.
Table 4. Evaluation results on the real test network
| Index | P1 | P2 | P3 | P4 |
|---|---|---|---|---|
| Deployment cost (Ether) | 0.045 | 0.049 | 0.052 | 0.056 |
| Transaction time (s) | 0.025 | 0.035 | 0.038 | 0.030 |
| Transaction throughput (TPS) | 50 | 45 | 40 | 48 |
| Search delay (ms) | 153 | 182 | 170 | 163 |
| Data validation accuracy (%) | 99.4 | 98.8 | 97.6 | 98.3 |
In summary, compared with the other methods, the proposed method shows significant time cost advantages in initialization, key generation, encryption and decryption, and querying, and its precise and fuzzy search accuracy is also higher. The blockchain data verification mechanism founded on the fuzzy encryption search scheme supports both fuzzy search and smart contracts; its deployment cost and on-chain gas consumption remain within an acceptable range, and its smart contracts run efficiently. However, the mechanism also has potential security vulnerabilities. For example, although the blockchain itself is immutable, the system can still face external malicious attacks: attackers may try to obtain encryption keys through network vulnerabilities in order to steal data. To mitigate this, the encryption algorithm can be updated regularly to strengthen key complexity and security, while network access is strictly controlled, safeguarding data security and the stable operation of the mechanism on multiple fronts.
In real-world blockchain applications, the approach has a wide range of potential applications. In medical care, it facilitates rapid and precise retrieval of specific disease-related keywords within extensive patient records, thereby enabling efficient medical information retrieval. At the same time, the blockchain data verification mechanism based on the fuzzy encryption search scheme can ensure the safe storage and verification of patient privacy data and ensure the integrity and confidentiality of medical data. In the financial industry, its fast encryption and decryption and accurate search function help to quickly process financial transaction data, achieve efficient retrieval and verification of transaction records, and improve the security and efficiency of financial services. In the context of the IoT, the utilization of these methods facilitates the effective management of a substantial volume of device data. The immutability characteristic of blockchain technology ensures the reliable storage and secure transmission of IoT device data, thereby providing a robust foundation for the stable operation of IoT applications.
Conclusion
Aiming at the problem of blockchain data search and verification, this study used a hierarchical Mini Batch K-Means clustering algorithm to roughly segment the original dataset and the LSH function to map multiple similar keywords to the same position, achieving privacy protection of the original text. At the same time, blockchain technology was used to encrypt and cross-check data on cloud storage, ensuring the integrity and confidentiality of data files. The experimental results showed that when the number of attributes was 30, the key generation time of the proposed data search method was 21 ms. Compared with the blockchain-assisted sorting method and the multi-permission data access method, the encryption time cost of this method was reduced by about 29.5% and 35.6%, respectively. In deploying the blockchain data verification mechanism, the cost of eight smart contracts was approximately 0.069 Ether. With 8 files, the search, index creation, and storage operations consumed 0.62 Gas, 0.51 Gas, and 0.33 Gas, respectively, operating costs that are acceptable to users. The results indicate that the proposed blockchain data search method performs excellently in search speed, accuracy, and efficiency, while its verification mechanism offers good feasibility and economy in data privacy protection and cost. These results are of great significance for promoting the application of blockchain technology in data search and verification and are expected to provide users with more efficient and secure data storage and query services. A limitation of the method is that it may in some cases be constrained by factors such as hash code length and dataset size; future research should optimize the algorithms to further improve search efficiency.
Author Contribution
K.L. wrote the main manuscript text.
Funding
There are no sponsorship or funding to report.
Data Availability Statement
No datasets were generated or analysed during the current study.
Declarations
Conflict of Interest
The authors declare no competing interests.
Abbreviations
KMLSH: K-means locality sensitive hashing
LSH: Locality sensitive hashing
GLDH: Global low-density locality-sensitive hashing
TPS: Transactions per second
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Singh, MP; Chopra, AK. Computational governance and violable contracts for blockchain applications. Computer; 2020; 53,
2. Zhang, Q; Zhao, Z. Distributed storage scheme for encryption speech data based on blockchain and IPFS. Journal of Supercomputing; 2023; 79,
3. Li, X; Wang, Z; Leung, VCM; Ji, H; Liu, Y; Zhang, H. Blockchain-empowered data-driven networks: a survey and outlook. ACM Comput. Surv.; 2022; 54,
4. Yuan, H; Chen, X; Wang, J; Yuan, J; Yan, H; Susilo, W. Blockchain-based public auditing and secure deduplication with fair arbitration. Inf. Sci.; 2020; 541,
5. Rodrigues, EO. An efficient and locality-oriented Hausdorff distance algorithm: proposal and analysis of paradigms and implementations. Pattern Recogn.; 2021; 117,
6. Li, Y; Xiao, R; Wen, X; Liu, H; Zhang, S; Du, X. GLDH: Toward more efficient global low-density locality-sensitive hashing for high dimensions. Inf. Sci.; 2020; 533,
7. Orliński, M; Jankowski, N. Fast t-SNE algorithm with forest of balanced LSH trees and hybrid computation of repulsive forces. Knowl.-Based Syst.; 2020; 206,
8. Wu, X; Charikar, M; Natchu, V. Local density estimation in high dimensions. Math. Oper. Res.; 2022; 47,
9. Aumueller, M; Har-Peled, S; Mahabadi, S; Pagh, R; Silvestri, F. Sampling near neighbors in search for fairness. Commun. ACM; 2022; 65,
10. Ferrada, S; Bustos, B; Reyes, N. An efficient algorithm for approximated self-similarity joins in metric spaces. Inf. Syst.; 2020; 91,
11. Zhang, Q; Li, C; Du, T; Luo, Y. Multi-level caching and data verification based on Ethereum blockchain. Wireless Netw.; 2023; 29,
12. Liu, J; Xie, M; Chen, S; Ma, C; Gong, Q. An improved DPoS consensus mechanism in blockchain based on PLTS for the smart autonomous multi-robot system. Inf. Sci.; 2021; 575,
13. Bhattacharya, P; Ghafouri, M; Soeanu, A; Kassouf, M; Debbabi, M. Security enhancement of time synchronization and fault identification in WAMS using a two-layer blockchain framework. Appl. Energy; 2022; 315,
14. Gardas, BB; Heidari, A; Navimipour, NJ; Unal, M. A fuzzy-based method for objects selection in blockchain-enabled edge-IoT platforms using a hybrid multi-criteria decision-making model. Appl. Sci.; 2022; 12,
15. Lai, H; Liao, H; Long, Y; Zavadskas, EK. A hesitant Fermatean fuzzy CoCoSo method for group decision-making and an application to blockchain platform evaluation. Int. J. Fuzzy Syst.; 2022; 24,
16. Jiang, Y; Lu, J; Feng, T. Fuzzy keyword searchable encryption scheme based on blockchain. Information; 2022; 13,
17. Zou, Y; Peng, T; Wang, G; Luo, E; Xiong, J. Blockchain-assisted multi-keyword fuzzy search encryption for secure data sharing. J. Syst. Architect.; 2023; 144, [DOI: https://dx.doi.org/10.1016/j.sysarc.2023.102984] 102984.
18. Zhou, F; Jiao, Z; Wang, Q; Sun, J. BCVSE: Verifiable searchable encryption scheme with blockchain supporting fuzzy query. Arab. J. Sci. Eng.; 2024; 49,
19. Joshua, KP; Prabhu, AJ. Efficient data search and retrieval in cloud assisted IoT environment. International Journal of Data Science and Artificial Intelligence; 2024; 2,
20. Takefuji, Y. Security protection mechanisms must be embedded in blockchain applications. J. Chem. Educ.; 2020; 97,
21. Aryavalli, SNG; Kumar, GH. Futuristic vigilance: empowering chipko movement with cyber-savvy IoT to safeguard forests. Archives of Advanced Engineering Science; 2023; 2,
22. Moya, F; Quesada, FJ; Martínez, L; Estrella, FJ. Phonendo: a platform for publishing wearable data on distributed ledger technologies. Wireless Netw.; 2024; 30,
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”).