Content area
With the distributed and decentralized nature of blockchain, and with its sequential data access, query processing emerges as a challenging issue in the blockchain systems. These features hinder efficient query processing and make it difficult to guarantee the validity and privacy-preserving of query results. Several solutions have been proposed to tackle the efficiency, reliability, and privacy challenges of query processing in blockchain systems. There has been rarely a comprehensive solution addressing all of these issues. In addition, the existing solutions often assume that the blockchain nodes are homogeneous in terms of their capabilities and available resources, while the blockchain nodes can have heterogeneous computational, communication, and storage resources, and can also contribute to the blockchain network in different manners. This work, considering the heterogeneity of network nodes, introduces a multi-level and score-based sharding solution for query processing where the nodes are organized into a hierarchical tree-like structure based on their score and store a proportion of transaction data in a DAG-based data structure resulting in an efficient query time. Additionally, the nodes reach a consensus over the query results from the bottom to the top of the hierarchical structure enabling reliable and fast query processing. The experiments conducted during the evaluation show that the efficiency of the proposed work is near that of relational databases in terms of query response time. It also provides a high validity rate taking advantage of its hierarchical consensus mechanism and preserves the privacy of query results using a delegation-based integration method where the final query result is integrated by the client’s representative.
Introduction
With the increasing popularity of blockchain technology, many industries—such as supply chain management, energy management, healthcare, and more—are interested in integrating this technology into their applications and taking its advantage. Due to being event-driven, one of the preliminary requirements of these industrial applications is to be capable of processing users’ queries, while query processing is a challenging issue in blockchain systems as a result of its decentralized nature. Indeed, data are stored on the blockchain in a distributed manner and are spread across the whole data ledger preventing efficient query processing, e.g., range query and aggregate query. Therefore, there is a strong need for devising a solution for efficient and fast query processing capability in blockchain systems. In addition, validity guarantee (reliability) and privacy protection are the other main concerns of blockchain query processing technologies.
Efficient query processing solutions [1, 2, 3, 4, 5, 6–7] are aimed to provide a well-structured storing system where data access is possible in a fast and high-performance way. On the other hand, verifiable query processing solutions [8, 9] ensure users about the correctness and validity of the query results by some cryptographic-based techniques such as digital signature and trusted hardware environments. Privacy-preserving query processing solutions [10, 11, 12–13] refer to the approaches aiming to avoid data leakage during query processing and prevent disclosure of query results in front of attackers.
Efficient query processing solutions in blockchain usually utilize three main approaches: (1) implementing smart contract-based query processing technologies, (2) storing blockchain data on external databases, and (3) indexing blockchain data. The smart contract-based solutions [14, 15] have some limitations including gas limitations, amount limitations, and not being generalizable. In addition, storing blockchain data on an external database [16, 17, 18–19] has the following drawbacks: (1) external databases are potentially vulnerable and not tamper-resistant, and (2) synchronizing the external database with the blockchain data is time-consuming and hinders the real-time query. Furthermore, implementing an index structure on the blockchain data [20, 21, 22–23] is inconvenient and it also is not scalable enough.
Additionally, verifiable query processing methods verify query results based on techniques such as authenticated data structure (ADS) [24, 25, 26, 27–28], trusted hardware [29, 30, 31–32], as well as some cryptographic mechanisms containing multi-signature [9], and database fingerprint [33].
To prevent data leakage during query result transmission, privacy-preserving solutions mostly use searchable encryption or private information retrieval methods [34, 35, 36–37]. Moreover, to protect the identity of the user [10, 38, 39–40] and query results [41, 42–43], some mechanisms such as encryption, anonymity methods, and server proxy technologies are utilized.
A number of solutions have been proposed in the literature to address the efficiency, reliability, and security of the query results in blockchain systems, but most of them focus only on one of these challenging issues and barely propose a comprehensive solution.
As mentioned before, one of the key features of the blockchain systems hindering efficient query processing is its distributed and decentralized data structure where the transaction data are scattered across the whole data ledger and the network is not under the control of a centralized authority. Therefore, a well-designed structure is required for both organizing network nodes and the data-storing system, which causes a quick query processing time. In addition, the above-mentioned features harden the guarantee for the validity and security of the query results. Hence, this work presents a new structure model for organizing network nodes based on multi-level sharding. Sharding is a technique used to split up the network workload across multiple subsets referred to as “Shards” [44, 45]. In the proposed work, the nodes first are organized into multiple shards then each shard’s nodes is divided into multiple levels. Exploiting this multi-level sharding, the query processing can be parallelized both inter-shard and intra-shard resulting in a significant improvement in the efficiency of the query processing. In this new structure, the nodes are considered to be heterogeneous in terms of the following aspects: (1) their capabilities and available resources, (2) their contribution to the network, and (3) the transaction type that the node processes. However, in most existing query processing solutions for blockchains, the heterogeneity of the nodes has not been considered. It causes the node capabilities not to be used efficiently and the functionality and contribution of the nodes in the network to be disregarded. Considering the heterogeneity of the nodes, the network workload can be delegated to the nodes based on their capabilities (e.g., storage, computational, and communication resources) and their functionality in the network. In addition, the decentralization of blockchains turns the validity of the query results into a challenging issue. To address this issue, this work uses consensus-based query processing where query results are validated in a hierarchical manner. Moreover, privacy protection of the query process and query result is one of the concerns of this study addressed by an encryption and delegation-based method. Overall, the theoretical contributions of this work are highlighted as follows:
Presenting an efficient, reliable and privacy-preserving query processing solution.
Considering the heterogeneity of the network nodes for efficient and reliable query processing.
Providing a storage-efficient blockchain system.
The detailed novelties of this work to implement the above-mentioned contributions are as follows:
Multi-dimensional data storage there exist different types of transactions and each transaction type is represented with a keyword. The transaction data are stored based on a sharding-based data structure as each shard is responsible for storing and processing one of the transaction types (or more) each of which corresponds to a keyword. Therefore, processing a multi-dimensional query can be parallelized among different shards leading to a fast and efficient query processing.
Multi-level query processing the participating nodes in each shard are categorized into multiple levels based on their contribution and resource capabilities during protocol execution. Then, the members of each level are divided into several groups, the so-called level-groups. After that, the level-groups are organized into a hierarchical structure where each vertex denotes a level-group and members of each level-group are responsible for processing a subset of its parent level-group’s workload. Therefore, the query processing can be parallelized among different level-groups, in addition to parallelizing among the shards, resulting in an efficient and high-performance query processing.
Consensus-based query processing to ensure query validity, the nodes process the queries from the bottom to the top of the hierarchical structure, at the same time, they make agreement on the query results. In fact, query results are validated at the different levels of the hierarchical structure.
Delegation-based integration method to preserve query result security, a delegation method is used to integrate or aggregate the query results received from different shards. To that end, the query client declares a representative to calculate the final query result on its behalf.
The content of this paper is arranged as follows: Sect. 2 provides an overview of some related work. Section 3 introduces the system models and definitions. The proposed work is presented in Sect. 4. In Sect. 5, the evaluation of the proposed work and experimental results are discussed. Finally, the conclusions are given in Sect. 6.
Related work
This section summarizes some existing work on query processing in the blockchain.
Authors in [1] introduce SE-Chain a scalable storage blockchain that provides an efficient and fast retrieval model using a new data structure so-called AB-M tree (adaptive balanced Merkle tree). The AB-M tree speeds up the queries while reducing query processing overhead. Meanwhile, in SE-Chain the full nodes only store a portion of the whole data ledger determined by a duplicate ratio regulation algorithm.
A spatiotemporal blockchain was presented in [2], providing an efficient query processing in a block directed acyclic graph (block-DAG)-based data structure. Indeed, it integrates a Merkle tree with a spatial-index to facilitate the processing of spatiotemporal queries without any need for local indexing. In addition, the integrity of the queries is guaranteed by cryptographically signed history in the block-DAG.
In [3], the authors introduce CHAINSQL, an open-source system that integrates blockchain with distributed databases to make use of both the integrity of blockchain systems and the fast query processing of distributed databases.
An efficient index structure based on a subchain account transaction chain (SCATC) is introduced in [4] where the account transaction chain is divided into subchains and the last block of each subchain includes all hash pointers to the account branch nodes and all information about the corresponding account. Therefore, the queries can be processed in a fast and high-performance way due to subchain-by-subchain query mode.
In [5], the authors design an Ethereum smart contract-based solution for query processing on the pharmacogenomic data. It uses the uniqueness of the attribute values to enable mapping searches and provide efficient key-value queries.
In [6], authors introduce a secure storage scheme with an efficient and traceable data access mechanism for blockchain-based systems. To that end, the work at [6] utilizes an attribute-based access control (ABAC) model and implements a smart contract-based solution that guarantees high-speed data storage, retrieval, query, and fine-grained access control.
Authors in [7] propose a solution for the blockchain top-k transaction path query (BCTkPQ) problem. This problem aims to retrieve first k transactions within a specified query path, based on the conditions specified by the user. To ensure the secure and efficient execution of BCTkPQ, authors introduce a unique collaborative query model (CQM) which consists of a set of collaborative peers including three main components: a parser, an indexer, and an executor.
Authors in [8] offer Vchain + that provides an efficient verifiable solution for Boolean and range queries through building a sliding window accumulator (SWA) with size k denoting k last blocks. Vchain + uses a trie index for keyword queries and B + -tree for range queries. Moreover, to enable public key management an object registration index is proposed.
In [9] a blockchain-based query framework for endpoint users enabling privacy protection and query verification is proposed. To preserve privacy, data are stored on the on-chain and off-chain in an encrypted form, and to provide query verification, a multi-signature mechanism is utilized where each individual node is allowed to endorse the query result, then the user verifies the endorsement of the query result.
Authors in [10] present a blockchain-based search framework for cloud-outsourced encrypted data where an index of the encrypted data is stored on the Ethereum blockchain [46], using it the user is enabled to verify query result integrity.
In [11], the authors propose a secure heuristic semantic searching scheme that utilizes a privacy-preserving word nonlinear method and retrieves ranked search results with high accuracy. In addition, a blockchain-based verification mechanism was designed where the blockchain nodes reached a consensus over the correctness of the search results using the proofs generated during the matching process.
In [12] a solution for a trusted data-sharing framework is presented that utilizes attribute-based encryption (ABE), searchable encryption, and blockchain. The implemented searchable encryption improves search results accuracy via a multiple keywords ciphertext retrieval method.
Authors in [13] propose a blockchain-based data-sharing and privacy-preserving eHealth system called SPChain. To protect the privacy of patients’ medical histories, SPChain utilizes the proxy re-encryption scheme so that only trusted medical institutions can retrieve patient information.
The system models and definitions
System models
Transaction model
In the underlying blockchain system of this work, it is assumed that there are different types of transactions, each of which is denoted by a keyword. Each shard is responsible for one type of transaction, and a node is a member of a specific shard if it is a part of at least one transaction of the type that the shard is in charge of. Therefore, a node is allowed to be a member of one or more shards. The transactions related to a specific entity which are of different types are linked together using a pointer. That is, in comparison with a relational database, a shard equals the many-to-many or many-to-one tables and a transaction equals a record of the tables. Moreover, the function of the pointers is similar to the foreign key in a relational database.
Query model
Each query submitted by a client is a triplet ) where is a set of transaction types each of which is denoted by a keyword, is a set of triples of [property, value, operation] determining the user-specified conditions of a given query that are required to be met by the relevant transactions, and is a set of output properties (fields) which have to be included in the query result. For deeper understanding, an example of the query model with descriptive information has been provided in Sect. 4.5.
System architecture model
The network nodes are assumed to be heterogeneous in terms of their capabilities and available resources (e.g., computational, communication, and storage capabilities). During the protocol execution, the participating nodes are scored dynamically, according to their contributions in the network and also their available resources at the moment. The nodes in each shard are categorized into levels based on their score. Afterward, the nodes belonging to each level are divided into smaller groups (so-called “level-groups”). Then, the shard level-groups are organized into a hierarchical tree-like structure where the workload delegated to the nodes in each level-group is a portion of the parent level-group’s workload. The hierarchical tree structure is a full tree of depth and degree . Figure 1 shows an illustration of the hierarchical full tree structure where and . As is illustrated, the topmost level-group is the root level-group consisting of the strongest nodes in the network (shown with the darkest color) while the leaf level-groups consist of the weakest nodes in the network (shown with the lightest color). The branches of the tree structure represent the hierarchy in the network and indicate that the workload of the child level-groups is a subset of their parent’s workload and the contribution of the child level-groups needs to be validated by the parent level-group.
[See PDF for image]
Fig. 1
An example of the hierarchical full tree structure with depth = 4 and degree = 2
According to full tree structures, the number of the level-groups in each shard equals , and in each level , there are level-groups. Let be a representation of the tree structure in the underlying system architecture, then:
Note that the optimal type of the introduced hierarchical tree structure is a perfect tree, that is a full tree in which all the leaf nodes (level-groups) are at the same level and all the non-leaf nodes have children. The hierarchical perfect tree in which the tree has maximum branches results in a more efficient data structure since each child level-group reduces the workload of its parent level-group by and each branch in level of the hierarchical tree structure improves the efficiency approximately by . Hence, a perfect and full tree is a more optimal type of the proposed tree structure. In this paper, the term “full tree” has been used generally instead of “perfect and full tree.”
Data structure model
The data structure used in this work is a DAG-based structure where transactions related to different nodes are stored on separate chains. Each transaction is linked to the previous transaction in the node’s transaction chain using a hash value [47]. Each node stores an individual transaction chain for each node and |. That is, each node belonging to a specific level-group stores and processes the transactions related to the members of the same and child level-groups.
Threat model
In this work, it is assumed that the participating nodes can be honest or malicious and at any time of the protocol execution, the majority of the nodes are honest. Malicious nodes may collude with each other or act individually trying to cause protocol failure. To disrupt the functionality of the query processing protocol, malicious nodes can do different activities such as sending no or delayed responses to a query, sending invalid query results, disclosing the information under question, and causing data leakage [24].
Definitions
In the following, some important definitions of the underlying blockchain system are described.
Directory committee
In each level-group, nodes are selected as consensus members. Consensus members of the root level-group in all the existing shards form a committee named “Directory committee.” Members of the directory committee are in charge of the following tasks: (1) distribution of the subqueries among their child level-groups, (2) making an agreement over the final list of the relevant transactions’ ID (TXID), and sending it to the client (or client’s representative).
Representative
Representatives are the full nodes in the underlying blockchain system that maintain the whole data ledger and have enough resources for the computation of the final query result from the final list of TXIDs. Each client or account node must choose a representative or declare itself as its representative. Representative nodes gain a higher fee from each query processing. The higher the score of a full node, the more likely that it will be selected as a representative.
Query operations
There are two main groups of the operation used by the representatives to compute the final query result. These operations are as follows: (1) integration operations, including JOIN, UNION, INTERSECTION, and DIFFERENCE; this group of operations is used for integrating the final list of TXIDs sent by the directory committee members, and (2) aggregation operations, including MAX, MIN, COUNT, SUM, and AVERAGE; these operations are used by the representatives for calculating the final query result from the final list of TXIDs. To avoid complexity and make the concepts easier to understand, we use the “Join function” through this paper to mention the integration or aggregation operations that are used by a representative for computing the final query result.
The proposed work
In this section, the proposed work is discussed in detail. In addition, at the end of the section, an example scenario will be explained for a better understanding of both system models and workflow of the proposed work.
Bootstraping phase
The bootstrapping phase runs once at the beginning of the protocol. At this phase, the different shards of the underlying system, which are responsible for a given transaction type, are set up and their initial participating nodes are organized into a hierarchical tree structure after assigning a score to each of them.
For this purpose, first, a list consisting of proof of work (PoW) puzzles is published in the network. These PoW puzzles are sorted in ascending order by their difficulty. The more difficult a PoW puzzle is, the more scores the nodes can obtain by solving it.
To achieve an initial score, the initial set of nodes announces the transaction types that they are aiming to process, their available storage space, and a list of the nonces that they have generated by solving the first PoW puzzles included in . Then, the nodes processing the same transaction type calculate the score of their co-members, by a hard code, where the following parameters are considered into account: (a) the available storage space, (b) the total score that the node gained by solving PoW puzzles, and (c) the receive time of the node announcement.
Meanwhile, it is necessary that the nodes first concatenate the puzzle string with their public key and then solve it. This results in the valid nonces of a PoW puzzle solved by different participating nodes being different.
After time , during which the nodes are required to prepare their announcement message and then spread it into the network, the nodes volunteering to process the same transaction type (co-members) calculate a score for each other and then exchange the scores with each other. Afterward, the honest nodes consider the average of the scores calculated by all the group members as the final score of each node. Then, the co-member nodes are placed into the same shard and are organized into a hierarchical tree-like structure based on their scores. More detail is presented in the next section.
The system runs in epochs and joining and leaving the network take place whenever the system starts a new epoch. At the beginning of each epoch, the list is updated and the nodes that volunteer to join into the network send their announcement message to the nodes processing the target transaction type that they are volunteered to process. After that, same as at the bootstrapping phase, the nodes that process the same transaction type reach an agreement over the newly joined nodes’ score. Then, the nodes belonging to each shard are reorganized into the levels and level-groups. Therefore, the nodes that are new at a level-group need to download the history of the transactions corresponding to the level-group they are included in. To download the transaction data, each node randomly sends a request to one member of the last consensus group (in the previous epoch) of its level-group and verifies the integrity of the received transaction data by asking the other previous consensus members to send a cumulative hash of the last transaction of the stored transaction chains. If the hash received from the majority of the consensus group is matched to that of the downloaded transaction chains, the integrity of the downloaded transaction data will be verified.
Sharding
Sharding is a mechanism that divides the blockchain network into several smaller partitions, referred to as “shards,” to portion the blockchain network’s workload among them. This mechanism is usually used to enhance the scalability of the blockchain systems [44, 45]. In this work, sharding is the main technique used to improve the efficiency of the query processing in the blockchain.
The underlying assumption is that the nodes are heterogeneous from different aspects as follows: (1) their capabilities and available resources (e.g., computational, communication, and storage capabilities), (2) their contribution to the network’s functionality which, in this work, is contributing in the consensus process over the query results, and (3) the type of the transactions that they process. However, a main drawback of the existing sharding-based blockchain systems [44, 45, 48, 49–50] is that network nodes are assumed to be homogeneous and nodes with different capabilities are not distinguished during the sharding process. Therefore, the proposed solution introduces a multi-level sharding based on the heterogeneity of network nodes and provides a comprehensive solution for query processing in blockchain.
Firstly, the network nodes are divided into multiple shards based on the transaction type they are responsible for. Thus, the transactions and queries are parallelly processed in different shards. Meanwhile, the member nodes of each shard are split into multiple levels based on their assigned score, and transaction and query processing are parallelized among the shard levels. Therefore, the queries can be parallelized both inter-shard and intra-shard.
For intra-shard query processing, the shard nodes are organized into a hierarchical tree-like structure and make consensus over the query (subquery) results. To organize nodes into a hierarchical structure, they are firstly divided into levels based on their contribution and available resources. Then, the nodes in each level are partitioned into several smaller groups—the so-called level-groups (LGs). After forming the level-groups, the level-groups are arranged into a full tree of depth and degree where the number of the level-groups is (see Fig. 1).
In this hierarchical structure, the level-group members are responsible for a portion of their parent level-group’s workload. Each level-group member maintains the transactions related to its co-members in the corresponding level-group as well as the transactions related to the members of its child level-groups. In fact, each node stores a separate transaction chain for the members of the same and child level-groups.
Query processing
Each query submitted by a client is first converted into subqueries by the application, based on the transaction types that are involved in the query. Then, each subquery is submitted to the directory committee of the corresponding shard. Upon receiving the subquery, the directory committee distributes it among the consensus committee of the shard which is in charge of making consensus over the subquery result.
To form a consensus committee for each shard, a set of nodes is chosen among members of the level-groups forming the shard hierarchical structure. Accordingly, the number of the consensus members in each shard equals where is the number of shard level-groups. To choose the consensus members of each level-group, the proposed solution uses the election network introduced by Rapidchain [45] where the majority of the selected nodes in each group are honest.
Upon receiving a subquery, each consensus member is responsible for calculating the subquery result according to its workload. Then, the subquery results are aggregated from the bottom to the top of the hierarchical tree structure by the consensus members.
To process a subquery, the consensus members search across their stored transaction chains for the transactions that meet the query conditions. The consensus members search the transactions based on their receive time and continue searching until when all the transaction chains stored by them are explored or they face resource limitations. In that case, consensus members of the leaf level-groups send the list of the found transactions’ IDs to the consensus members of their parent level-group, while consensus members of the non-leaf level-groups first combine their found transactions’ IDs with the transaction IDs included in the majority of the lists received from consensus members of their child level-groups and then send the combined list to their parent level-group.
Furthermore, after aggregation of the transaction IDs, the consensus members can also query the transactions whose receive time is after the last accepted transaction from child level-groups. Therefore, the list of the relevant transactions’ IDs is completed from the bottom to the top of the hierarchical tree structure. Eventually, the final list is sent to the client’s representative by the consensus members of the root level-group that are a member of the directory committee. Algorithm 1 summarizes the described algorithm.
It has to be said that, during the consensus process, the consensus members reach also an agreement over the nodes’ new score based on their contribution to the consensus process. A negative contribution can cause the node to be replaced with another node. More detail will be given in the Score Calculation section.
To sum up, Fig. 2 illustrates a simplified workflow for the proposed work.
[See PDF for image]
Fig. 2
Workflow of the proposed work
In the following, the efficiency, reliability, and privacy-preserving of the proposed query processing solution are investigated in detail:
Efficiency
The efficiency of the proposed query processing is achieved by parallelizing the query processing inter-shard and intra-shard. In fact, the subqueries are distributed among the shards which store the transactions of the types included in the query. Thus, the shards participating in the query processing process subqueries in parallel. On the other hand, within each shard, the level-groups, forming the hierarchical tree structure, search through the transaction chains to find relevant transactions. In fact, in each shard, there are level-groups whose members process the subquery in parallel. However, the consensus members of the non-leaf level-groups need to spend additional time aggregating the list of the relevant transaction IDs.
Reliability
In this work, reliability (validity of the query results) is provided using a consensus algorithm. That is, the consensus members in each shard reach an agreement over the transactions that meet the query conditions. Each consensus member accepts the transactions that are returned by the majority of its child level-groups’ consensus members, because taking advantage of the election network proposed by the Rapidchain [45], it is assumed that the majority of the consensus members in each level-group are honest. At last, each consensus member combines the transactions returned by the majority of the child consensus members with the relevant transactions found by itself. Therefore, each relevant transaction is validated from the bottom to the top of the hierarchical structure.
Privacy of query result
The relevant transactions’ ID list is completed from the bottom to the top of the hierarchical tree structure by the consensus members of the shard. Therefore, the final relevant transactions’ ID list is prepared by the consensus members of the root level-group in the shard and then it is sent to the client’s representative. Upon receiving the final list of the relevant transactions’ IDs from all the shards that participate in the processing of a given query, the representative confirms the valid transactions received from the majority of the participating shards’ directory committee members. Afterward, it performs required query operations on the final relevant transactions to obtain the final query result and then sends the encrypted query result to the client ensuring privacy-preserving.
Score calculation
The score of the participating nodes determines the level at which the node can cooperate and is calculated dynamically during the protocol execution, based on the node’s contribution to the protocol and also its available resources.
The score calculation phase runs through the consensus process when the consensus members agree on the subquery results. In fact, while aggregating the list of the relevant transactions’ IDs found by themself with those received from their child level-groups, consensus members reach an agreement on the new score of the child consensus members according to their contribution to the consensus process. The new scores are obtained from Eq. (1) inspired by the reputation score calculation method of [51]:
1
where is the current score of node , and and are the number of the relevant transactions that the node found correctly and the weight considered for them. This is while and are the number of incorrect transactions (irrelevant transactions) found by the node and their weight in the score calculation phase. On the other hand, and are the number of relevant transactions that the node did not find, because of the resource limitations or malicious activity, but are discovered by the majority of the consensus members at the level-group the node belongs to, and the weight that is considered for not found relevant transactions. In addition, and are the times when the list of the relevant transactions’ ID is received from the node , and the time when the current consensus procedure starts, respectively. The difference between these two times has been taken into account in Eq. (1) in order to the effect of the communication capabilities of the nodes to be considered in their score.Simultaneously with calculating the score of its child consensus members at level , each consensus member belonging to level calculates the score of its successor consensus nodes at level to by computing the average of the node’s scores calculated by its child consensus members at level .
Finally, the new score of the consensus member of a level-group is shared with the level-group members that voted for that consensus member (during the election network process introduced by Rapidchain [45]) depending on the current proportion of the voter’s score and the score of the consensus member that the voter voted for.
Example
In this section, aiming to give a deeper understanding of the proposed mechanism and its fundamentals, a simplified model of the underlying blockchain for a healthcare system is described. This healthcare system maintains a medical history of the patients. It is assumed that these medical records are shared in the blockchain with the permission of the patient. It is presumed that there are four types of transactions that are as follows: (1) patient: it includes some of the patient’s characteristics such as age, height, weight, last blood pressure, anatomic configuration of the spinal canal, and last blood glucose measurement, (2) disease: this type of transaction shows the diagnosed disease a patient (represented by a PatientID) is affected by, (3) symptoms: this transaction type stores the signs and symptoms of a patient, these symptoms can be self-reported by the patient, diagnosed by a physician, or observed in results of medical tests, and (4) treatment process: it includes the steps followed by a patient in the treatment process. Therefore, this blockchain-based healthcare system can consist of four shards tagged by “patient,” “disease,” “symptoms,” and “treatment process” keywords.
Figure 3 demonstrates an example of a query that can be submitted to the blockchain by a client (e.g., a research organization). This query is looking up “the age and symptoms observed in the medical tests of the patients over 60 years old affected by Alzheimer’s disease.” Therefore, the query submitted by the application is a triplet (, , ) where denotes a set of target transaction types, indicates a set of triples of [property, value, operation] determining the user-specified conditions of the query, and includes a set of fields as the output of the query.
[See PDF for image]
Fig. 3
An example of the proposed query model
According to the transaction types involved in a query, the query is turned into subqueries each of which is submitted to the shard that is responsible for processing that type of transaction.
Figure 4 shows an overview of the system architecture for the example scenario. As Fig. 4 demonstrates, the shards that are involved in this example query are the ones tagged by “patient,” “disease,” and “symptoms.” As it can be seen, the that is submitted to the shard “patient” requests for the patients over 60 years old, that is distributed in the shard “disease” looks up the patients affected by Alzheimer’s disease, and that is sent to the shard “symptoms” inquires the symptoms that are observed in the result of patients’ medical tests.
[See PDF for image]
Fig. 4
An overview of the system architecture for the example scenario
At first, the subqueries are submitted to the directory committee. Upon receiving a subquery, the members of the directory committee distribute the subquery among the consensus members. After that, the consensus members organized into a hierarchical tree structure can reach an agreement over the subquery result from the bottom to the top of the structure. After making a consensus over the subquery result, the directory committee of the shard consisting of the consensus members of the root level-group sends the final list of the relevant transactions’ ID to the representative of the client for calculating the final query result and sending the encrypted result to the client. Therefore, in this query example, the representative receives three lists of TXIDs as follows: (1) : it consists of the transactions of type “Patient” that meet the condition age > 60, (2): : it includes the transactions of type “disease” recorded as Alzheimer’s disease, and (3) : this list contains the disease symptoms whose type of diagnosis is “medical test result.”
To achieve the final query result, upon receiving the lists of TXIDs, the representative that is a full node first validates and fetches the transactions corresponding to each TXID included in and then joins the other lists of TXIDs based on the patient ID and the pointers that link the transactions together. Indeed, in each shard, the transaction chains are created per each patient ID. Afterward, the representative encrypts the query result with the public key of the client and sends the encrypted result to the client.
Figure 5 demonstrates an overview of the contributions of the representative in the current example scenario and the process that the representative goes through to calculate the final query result. In addition, a generalized flowchart of the Join function is shown in Fig. 5.
[See PDF for image]
Fig. 5
An overview of the contributions of the representative in the example scenario
To give a clear insight about the benefits of the privacy protection of the proposed solution, consider a health information service [52] where privacy protection of user health topic is essential. The implementation of this service under the proposed solution can preserve the security of the user’s health data. A query requested by a client (for example, the query illustrated in Fig. 3) is processed in a secure and privacy protection way, and the final query result will not be exposed against other nodes due to the following reasons:
Delegation-based integration method a query requested by a client is broken down into multiple subqueries each of which corresponds to a specific transaction type (e.g., patient, disease, symptoms, treatment process, and healthcare centers). Moreover, in each shard, the subquery is processed by different level-groups in parallel. Therefore, each node processes a part of the transaction data related to the requested query and the final query results are combined by the client’s representative.
Encryption of the query results the final query result integrated by the client’s representative is sent to the client in an encrypted form.
Evaluation and analysis
In this section, an evaluation of the proposed query processing solution from the perspectives of efficiency, reliability, and privacy-preserving is presented. To assess the proposed work, we conducted the experiments by simulating a blockchain-based healthcare system using Python language. The query model and the transaction model described in Sect. 4.5 have been used as the basis of the experiments. To simulate the transaction data and queries, we used random functions that generate the transactions and queries randomly based on the system transaction types, the transaction properties, and predefined domains for valid values of the transaction properties. In order to achieve more stable results, each simulation experiment was repeated 200 times and the results denote the average values in all the experiments. In the simulation, the network size is set to nodes and the arrival rate of the queries is set to queries per second. To simulate the heterogeneity of the network nodes, the processing, communication, and storage capabilities of the participating nodes are modeled as random variables. The processing speed follows a normal random variable with a mean of 2 Gigahertz (GHz) and a standard deviation of 1.8 GHz, i.e., , with a lower bound of 1 GHz. The number of cores for each node is generated randomly between 2 and 20. The download and upload speeds in terms of Megabits per second (Mbps) are also modeled as a normal distribution with parameters and , respectively, and a lower bound of 5 Mbps. In addition, the available storage space of the nodes has a uniform distribution and is generated uniformly in the range of 10–2000 GB. It has to be said that the nodes can be of different operating systems. Table 1 outlines the mentioned variables and their corresponding parameters. Additionally, in Table 2, a list of the important and most used abbreviations and definitions in this section is presented.
Table 1. Parameters corresponding to the distribution of the random variables in the simulation experiments
Variable | Distribution | Parameters |
|---|---|---|
Query arrival rate (Queries per second) | Exponential | |
Processing speed (GHz) | Normal | |
Number of cores | Uniform | U (2, 20) |
Download speed (Mbps) | Normal | N (50,160) |
Upload speed (Mbps) | Normal | N (30, 140) |
Available storage space (GB) | Uniform | U (10, 2000) |
Table 2. List of abbreviations and definitions
Abbreviation | Definition |
|---|---|
Number of levels | |
Degree of the hierarchical tree structure | |
Number of participating nodes | |
Shard size | |
Number of shards | |
(or ) | Number of the level-groups in each shard |
Number of consensus members in each level-group | |
Total number of transactions in the blockchain | |
Number of the subqueries for a given query | |
Failure probability of each level-group | |
Epoch number |
Efficiency
In order to study the efficiency of the proposed approach, five different experiments are conducted as follows:
Experiment I in this experiment, the proposed work is compared with a scenario where an honest full node processes the query (full-node scenario).
Experiment II in this experiment, the proposed work is compared with the scenario where the same queries are processed in a relational database which is an optimal scheme for query processing and results in an optimal query response time. The relational database used for this comparison is a centralized Oracle database (relational database scenario).
Experiment III in this experiment, the impact of the number of levels () and the degree () of the hierarchical tree structure on the query response time are assessed.
Experiment IV in this experiment, the efficiency of the subquery processing during the consensus processing and Join function that is executed by the representative for computation of the final query result is assessed.
Experiment V in this experiment, the proposed work is compared with SCATC [4], a query processing solution that introduces an efficient subchain-based query method.
In the following, the experiments, environmental variables, and evaluation results are explained in detail.
Experiment I: comparing the proposed work with the full-node scenario
The efficiency metrics for this comparison are the average query response time and average processing time that each node spends for query processing. The experiments are executed under different level numbers and different degrees . Figure 6a and b illustrates the query processing time and query response time, respectively, in comparison with the scenario where the queries are processed by a full node. In this experiment, the number of transactions is set to 500,000.
[See PDF for image]
Fig. 6
A comparison of the a processing time and b response time of the proposed work with the scenario where a full node processes the queries alone
According to the experiment results for different values of and , the proposed solution has gained a considerable improvement in both query response time and query processing time in comparison with where the query processing is conducted by a full node without reaching agreement over the query result. The reason is that, in the proposed solution, the query processing is parallelized both inter-shard and intra-shard, meaning that the query is processed among the shards in parallel. Furthermore, within each shard, the query processing is parallelized among the level-groups. As shown in Fig. 6, with increasing the number of levels and the degree of the tree structure, the improvement of the processing time and response time of the proposed solution is more considerable. The reason is that the greater numbers of and result in an increment in the number of the level-groups that process the queries in parallel.
Experiment II: comparing the proposed work with the relational database scenario
The efficiency metrics for this comparison are the average query response time and average processing time that each node spends for query processing. In this experiment, the number of transactions is set to 500,000.
According to the experiments conducted for comparing the proposed work with the relational databases that are an optimal solution for query processing, the time each node spends for query processing in the proposed solution is less than the time a relational database instance spends for processing a query (see Fig. 7a). On the other hand, the query response time in the proposed solution is a bit longer than relational databases as shown in Fig. 7b, since the nodes need to make consensus on the query results. Therefore, despite being distributed and consensus-based, the proposed solution leads to a reasonable query response time that is comparable with relational databases. In fact, it is a blockchain-based database that is near to the traditional databases in terms of query processing efficiency.
[See PDF for image]
Fig. 7
A comparison of the a processing time and b response time of the proposed work with the scenario where the queries are processed on a relational database
Experiment III: studying the impact of the level number and degree of the hierarchical tree structure on the query efficiency
In this section, the impact of the number of levels () and degree () of the hierarchical tree structure on the query processing time and response time is investigated. Figure 8 shows the results under different values of and while the number of transactions is set to 500,000. As shown in Fig. 8, as the value of and increases the query processing time and response time improve since the greater numbers of and result in an increment in the number of the level-groups in the shard, and as a consequence, the transaction data that each node needs to explore for processing a query is reduced. But, as and continue to grow, the percent of the improvement of the query response time decreases. The reason is that increasing the number of the level-groups raises the number of messages that are required for making consensus over the query result and leads to an incremented time for aggregating the query result from the bottom to the top of the hierarchical tree structure.
[See PDF for image]
Fig. 8
Studying the impact of the level number (l) and degree (d) of the hierarchical tree structure on a the query processing time and b response time
Experiment IV: studying the efficiency of subquery processing and Join function
In this experiment, the efficiency of the subquery processing and Join function that are, respectively, performed by the consensus members and the client’s representative is analyzed.
The processing time of a given subquery by a consensus member depends on number of the transactions stored on its local data ledger. If denotes the total number of transactions in the blockchain, then, with this assumption that all the shards maintain equal numbers of transactions, the transaction data that are maintained by each shard are where c is the number of shards. Therefore, the number of the transactions stored by a consensus member at level and level-group of a shard, approximately equals , where is the degree of the tree structure and so is the number of level-groups in each level . Therefore, the time complexity of processing a subquery by a consensus member is . Since the consensus members of each shard agree on the subquery result from the bottom to the top of the hierarchical structure of depth , the time complexity of the consensus process in a shard is .
On the other hand, to figure out the final query result, the representative needs to execute the Join function to join the results of subqueries where each subquery result contains TXIDs corresponding to the transactions of a specific type. To combine the results of subqueries, the representative first validates the transactions corresponding to the TXIDs included in the result of the first subquery, then fetches them from its stored transaction chains. If represents the number of transaction types (or shards) in the blockchain, the number of transactions of each type is assumed to be . Therefore, the time complexity of validating and fetching the transactions corresponding to the first type is . After that, the representative needs to perform the JOIN operation times based on the pointers that link different transaction types together (as shown in Fig. 5). The th JOIN operation requires, at most, searching through transaction chains consuming time . Subsequently, JOIN operations consume time that is less than . Hence, it can be expressed that the time complexity of the Join function is .
To study the efficiency of the subquery processing (consensus process) and Join function in the proposed work, we conduct an experiment to compare the average time of the subquery and Join function in milliseconds (ms) under different numbers of transactions () when and . The results are illustrated in Table 3.
Table 3. Average time of the subquery processing and Join function (, )
Average time of subquery processing (ms) | Average time of Join function (ms) | |
|---|---|---|
2.8 | 0.5 | |
3.3 | 0.6 | |
3.8 | 0.7 | |
4.1 | 0.9 | |
4.6 | 0.1 | |
5.3 | 0.12 |
Experiment V: comparing the proposed work with SCATC
In the following section, the efficiency of the proposed work is evaluated in comparison with SCATC [4] which introduces an efficient subchain-based query method. In SCATC, the account transaction chain is divided into multiple subchains whose last blocks are connected using a pointer. SCATC uses a subchain-by-subchain method to process a query while the proposed work uses a separate transaction chain for each entity (e.g., patient account in a blockchain-based healthcare system) that is parallelly processed both inter-shard and intra-shard by using a multi-level sharding solution.
Figure 9a and b demonstrates, respectively, the results of comparing the query response time and the throughput (in terms of queries per second QPS) of the proposed work with the SCATC method where for the proposed work and , and for SCATC, the number of subchains is set to 100. As can be seen, the proposed work has achieved a significant improvement in the query time and throughput compared to the SCATC under different numbers of transactions. The main reasons are as follows:
Inter-shard and intra-shard sharding different shards responsible for different transaction types process the queries in parallel. Therefore, processing a multi-dimensional query can be parallelized among different shards leading to a fast query processing. Moreover, in each shard, the query processing is parallelized among multiple levels and level-groups.
Efficient storage space the transaction data are stored based on a sharding-based data structure as each shard is responsible for storing and processing one of the transaction types (or more). In addition, in each shard, the nodes organized into multiple level-groups store only a portion of the transaction data in a DAG-based data structure where transactions related to different nodes are stored on separate chains leading to a reduced time for searching the related transactions.
[See PDF for image]
Fig. 9
A comparison of the a query response time and b throughput of the proposed work with the SCATC
Reliability
In this experiment, the reliability of the proposed solution in terms of the validity of the query result is evaluated. In this work, the validity is gained by a hierarchical consensus process where the relevant transactions are proposed and then validated by the consensus members of the level-groups from the bottom to the top of the hierarchical full tree structure. In fact, the transactions proposed by the consensus members of a level-group at level are validated by its ancestor level-groups at levels to . In addition, as mentioned before, to select consensus members of a level-group, we use the election network introduced by Rapidchain [45] where selected nodes are honest by majority with high probability. For smaller group sizes, the election network is resilient to malicious nodes up to a 1/3 fraction of its members. While for larger group size, it can tolerate up to a 1/2 fraction of the malicious nodes. Accordingly, making use of the election network algorithm, the failure probability of each level-group occurring when the majority of the selected members are malicious is a low probability close to zero represented by p. Therefore, the query processing fails whenever one or some irrelevant transactions are proposed by the majority of a level-group’s consensus members and are validated by all the ancestor level-groups. This occurs when the majority of the consensus members in all the level-groups in at least one path from the bottom to the top of the hierarchical full tree structure are malicious. Thus, the probability that the validity of the query processing in each level-group at level is violated equals where is the state under which all the levels from to have a malicious majority. Therefore, the failure probability of the consensus process at each is negligible and close to zero. To evaluate the impact of the percentage of malicious nodes on the failure probability of query results’ validity, the experiment was iterated 200 times when and and the number of transactions was set to 500,000. To simulate malicious nodes during each consensus-based query processing, we generate malicious nodes that randomly do one of the malicious activities during the consensus process based on the threat model introduced in Sect. 3.1.5 (e.g., sending no or delayed response to a query, sending invalid query results).
Figure 10 shows the average number of the level-groups that fail to output valid and relevant transactions and the number of queries whose processing fails during the processing of 500 queries. Indeed, the failure of the level-groups does not necessarily lead to the failure of query processing because when invalid and irrelevant transactions are proposed by the majority of the consensus members of a level-group, they can be rejected by the honest nodes in the ancestor level-groups. As shown in Fig. 10, only for 40, 50, and 60% of the malicious nodes, respectively, processing of 6, 62, and 224 queries fail.
[See PDF for image]
Fig. 10
The number of level-group and query processing failures in the proposed work per different percentages of malicious nodes
At the following, Table 4 provides an overview of the average score of the nodes at different levels for epochs e = 1, 2, 4, 8, and 12 (1 h long epochs) that are outputted from the current experiment. Moreover, the average score achieved by the consensus members of each level during the simulation of the protocol execution can be observed in the last column of Table 4.
Table 4. Average score of the nodes at different levels (l) during epochs e = 1, 2, 4, 8, and 12
Average of the scores achieved by the consensus members | ||||||
|---|---|---|---|---|---|---|
286 | 386 | 723 | 1598 | 1960 | 44 | |
92 | 193 | 489 | 858 | 1406 | 39 | |
57 | 109 | 221 | 513 | 1112 | 21 | |
39 | 81 | 131 | 291 | 502 | 12 |
Privacy of query processing
In this work, to reach the privacy of the query result, each client selects a full node as its representative. If the client is a full node, it can declare itself as its representative. After calculating the result of the subquery, the root consensus members of the shards participating in processing a specific query send the final list of the relevant transactions’ IDs to the client’s representative. The representative, after validating all the relevant transactions, calculates the final query result based on the operations described in Sect. 3.2.3 and then sends the encrypted final result to the client. Therefore, only the client and its representative are aware of the final query result and privacy of the query result is protected. To evaluate the privacy of the query results, the percentage of the malicious nodes that are selected as representative was analyzed. The simulation was run 200 times for the different percentages of the malicious nodes while , and . As Fig. 11 shows, the percentage of malicious representatives is insignificant because the selection of the representatives is based on their score. Meaning that, the higher the score of a full node, the more likely that it will be selected as a representative. To ensure selecting the honest representatives, the clients can pay more fees and select more than one representative.
[See PDF for image]
Fig. 11
The number of malicious representatives in the proposed work per different percentages of malicious nodes
Communication complexity
In this work, each level-group i at level j of the hierarchical tree has members, where is the number of level members that equals , because assuming the different levels have equal number of the members, members of each shard (shard size), are divided into l levels. After that, each level j in a shard having members is partitioned into level-groups in turn. Since each transaction is sent to the level-group, it belongs to as well as its ancestor level-groups at level to of tree structure, the relevant transactions related to are sent to . In addition, assuming that all the level-groups in each shard need to participate in query processing, the number of the subquery results that are transferred in each shard is calculated as follows: the consensus members in the hierarchical tree structure send the subquery result to all of the consensus members of their parent level-group. Since each hierarchical tree has level-groups consisting of consensus members, there are consensus members, which send the subquery results to consensus members of their parent level-group. Therefore, messages are transferred in each shard per each query.
Overall comparison with other works
In the proposed work, query time is dependent on two main parameters: (1) the number of transactions that a node stores on its local data ledger and needs to search through to process a query; and (2) the total number of level-groups across all the shards, since the query processing is parallelized both inter-shard and intra-shards. If denotes the total number of transactions in the blockchain, the complexity of number of transactions a node needs to trace for processing a given query is where is the number of shards and and are, respectively, the number of levels and the degree of the hierarchical tree structure in the blockchain system. Furthermore, each shard has level-groups, then the number of level-groups in all shards equals . Consequently, the complexity of the query time can be expressed as , because query time is directly proportional to the number of transactions and inversely proportional to the number of level-groups.
In addition, the storage space used by each node is proportional to the level number the node belongs to. If it is assumed that all the shards have the same number of nodes () and refers to the size of the transaction data in the blockchain, then the transaction data that is maintained by each shard is = where refers to the number of shards. Moreover, according to the hierarchical tree structure, the size of the transaction data that each node belonging to level and level-group of a shard stores equals ), where is the degree of the tree structure and so is the number of level-groups in each level . Subsequently, the complexity of the storage space that a node requires to store the transaction data can be expressed as .
Table 5 summarizes the comparison between the proposed work with some state-of-the-art query processing solutions in the blockchain. The comparison is conducted in terms of the contributions of these query processing solutions, their underlying techniques, query time complexity, and storage usage per node.
Table 5. Overall comparison of the proposed work with other query processing methods in the blockchain
Underlying techniques | Efficient | Verifiable | Privacy-preserving | Query time Complexity | Storage space complexity | |
|---|---|---|---|---|---|---|
SE-Chain [1] | AB-M tree | ✓ | × | × | ||
SCATC [4] | Subchain-based index structure | ✓ | × | × | * | |
vchain [24] | Intra-block Index, Inter-block index, and Ip-tree | ✓ | ✓ | × | ||
The proposed work | Multi-level sharding | ✓ | ✓ | ✓ |
*h represents the block height in the SCATC [4]
Conclusion
In this paper, an efficient query processing approach is presented that guarantees query validity and preserves the privacy of query results. At its core, this work designs a multi-level and score-based sharding solution where the participating nodes of each shard are organized into a hierarchical tree-like structure based on their assigned score and process the queries from the bottom to the top of the hierarchical structure and make a consensus over the query results. Taking advantage of this hierarchical structure and the consensus-based query processing, the proposed solution processes the queries in a high-performance and reliable manner ensuring validity of the query results. Additionally, to preserve the privacy of query results, the proposed work uses a delegation-based method and an encryption mechanism. Finally, the evaluation results show that the proposed work provides a considerable enhancement in query time and its efficiency is comparable with the relational databases, while highly resistant against the malicious nodes that intend to cause invalid query results and data leakage. In addition, this work provides a storage-efficient approach for storing transaction data. For future work, a machine-learning-based solution can be used for scoring the nodes based on their contributions.
Algorithm 1
The consensus algorithm for query processing.
Author contributions
Alemeh Matani was responsible for writing—original draft, investigation, conceptualization, methodology, and validation. Amir Sahafi and Ali Broumandnia took part in review and supervision.
Declarations
Conflict of interest
The authors declare no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Jia, D-Y; Xin, J-C; Wang, Z-Q; Lei, H; Wang, G-R. SE-chain: a scalable storage and efficient retrieval model for blockchain. J Comput Sci Technol; 2021; 36,
2. Qu, Q; Nurgaliev, I; Muzammal, M; Jensen, CS; Fan, J. On spatio-temporal blockchain query processing. Futur Gener Comput Syst; 2019; 98, pp. 208-218. [DOI: https://dx.doi.org/10.1016/j.future.2019.03.038]
3. Muzammal, M; Qu, Q; Nasrulin, B. Renovating blockchain with distributed databases: an open source system. Futur Gener Comput Syst; 2019; 90, pp. 105-117. [DOI: https://dx.doi.org/10.1016/j.future.2018.07.042]
4. Xing, X; Chen, Y; Li, T; Xin, Y; Sun, H. A blockchain index structure based on subchain query. J Cloud Comput; 2021; 10, pp. 1-11. [DOI: https://dx.doi.org/10.1186/s13677-021-00268-0]
5. Gürsoy, G; Brannon, CM; Gerstein, M. Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts. BMC Med Genomics; 2020; 13,
6. Li, D; Han, D; Crespi, N; Minerva, R; Li, K-C. A blockchain-based secure storage and access control scheme for supply chain finance. J Supercomput; 2023; 79,
7. Hao, K; Xin, J; Wang, Z; Yao, Z; Wang, G. On efficient top-k transaction path query processing in blockchain database. Data Knowl Eng; 2022; 141, 102079. [DOI: https://dx.doi.org/10.1016/j.datak.2022.102079]
8. Wang H, Xu C, Zhang C, Xu J, Peng Z, Pei J (2022) vchain+: optimizing verifiable blockchain boolean range queries. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). pp 1927–1940
9. Rahman, MS; Khalil, I; Moustafa, N; Kalapaaking, AP; Bouras, A. A blockchain-enabled privacy-preserving verifiable query framework for securing cloud-assisted industrial internet of things systems. IEEE Trans Ind Informatics; 2021; 18,
10. Jiang, S; Liu, J; Chen, J; Liu, Y; Wang, L; Zhou, Y. Query integrity meets blockchain: a privacy-preserving verification framework for outsourced encrypted data. IEEE Trans Serv Comput; 2021; 16,
11. Yang, W; Sun, B; Zhu, Y; Wu, D. A secure heuristic semantic searching scheme with blockchain-based verification. Inf Process Manag; 2021; 58,
12. Ma, X; Wang, C; Chen, X. Trusted data sharing with flexible access control based on blockchain. Comput Stand Interfaces; 2021; 78, 103543. [DOI: https://dx.doi.org/10.1016/j.csi.2021.103543]
13. Zou, R; Lv, X; Zhao, J. SPChain: Blockchain-based medical data sharing and privacy-preserving eHealth system. Inf Process Manag; 2021; 58,
14. Abuhashim A, Tan CC (2020) Smart contract designs on blockchain applications. In: 2020 IEEE Symposium on Computers and Communications (ISCC). pp 1–4
15. Chishti, MS; Sufyan, F; Banerjee, A. Decentralized on-chain data access via smart contracts in ethereum blockchain. IEEE Trans Netw Serv Manag; 2021; 19,
16. Li Y, Zheng K, Yan Y, Liu Q, Zhou X (2017) EtherQL: a query layer for blockchain system. In: Database Systems for Advanced Applications: 22nd International Conference, DASFAA 2017, Suzhou, China, March 27–30, Proceedings, Part II 22. pp 556–567
17. Pratama FA, Mutijarsa K, (2018) Query support for data processing and analysis on ethereum blockchain. In: 2018 International Symposium on Electronics and Smart Devices (ISESD). pp 1–5
18. Wang S, et al. (2018) Forkbase: an efficient storage engine for blockchain and forkable applications. arXiv Prepr. arXiv:1802.04949
19. Zhang Z, Zhong Y, Yu X (2021) Blockchain storage middleware based on external database. In: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP). pp 1301–1304
20. Zeng L, Qiu W, Wang X, Wang H, Yao Y, Yu Z (2021) Transaction-based static ındexing method to ımprove the efficiency of query on the blockchain. In: 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). pp 780–784
21. Huang T-L, Huang J, (2020) An efficient data structure for distributed ledger in blockchain systems. In: 2020 International Computer Symposium (ICS). pp 175–178
22. Ruan, P; Anh Dinh, TT; Lin, Q; Zhang, M; Chen, G; Chin Ooi, B. Revealing every story of data in blockchain systems. ACM Sigmod Rec; 2020; 49,
23. Xu Y, Zhao S, Kong L, Zheng Y, Zhang S, Li Q (2017) ECBC: a high performance educational certificate blockchain with efficient query. In: Theoretical Aspects of Computing–ICTAC 2017: 14th International Colloquium, Hanoi, Vietnam, October 23–27, Proceedings 14. pp 288–304
24. Xu C, Zhang C, Xu J (2019) Vchain: enabling verifiable boolean range queries over blockchain databases. In: Proceedings of the 2019 International Conference on Management of Data. pp 141–158
25. Zhu Y, Zhang Z, Jin C, Zhou A (2020) Enabling generic verifiable aggregate query on blockchain systems. In: 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS). pp 456–465
26. Loporchio, M; Bernasconi, A; Maesa, DDF; Ricci, L. Authenticating spatial queries on blockchain systems. IEEE Access; 2021; 9, pp. 163363-163378. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3132990]
27. Tian G, Wei J, Kutyłowski M, Susilo W, Huang X, Chen X (2022) VRBC: a verifiable redactable blockchain with efficient query and integrity auditing. IEEE Trans Comput 72(7):1928–1942
28. Peng, Z et al. Vfchain: enabling verifiable and auditable federated learning via blockchain systems. IEEE Trans Netw Sci Eng; 2021; 9,
29. Shao Q, Pang S, Zhang Z, and Jing C (2020) Authenticated range query using SGX for blockchain light clients. In: Database Systems for Advanced Applications: 25th International Conference, DASFAA 2020, Jeju, South Korea, September 24–27, Proceedings, Part III 25. pp 306–321
30. Pang S, Shao Q, Zhang Z, Jin C (2020) Authqx: enabling authenticated query over blockchain via intel sgx. In: Database Systems for Advanced Applications: 25th International Conference, DASFAA 2020, Jeju, South Korea, September 24–27, Proceedings, Part III 25. pp 727–731
31. Niu Y, Zhang C, Wei L, Xie Y, Zhang X, Fang Y (2019) An efficient query scheme for privacy-preserving lightweight bitcoin client with Intel SGX. In: 2019 IEEE Global Communications Conference (GLOBECOM). pp 1–6
32. Zhou W, Cai Y, Peng Y, Wang S, Ma K, Li F (2021) Veridb: an sgx-based verifiable database. In: Proceedings of the 2021 International Conference on Management of Data. pp 2182–2194
33. Wu, H; Peng, Z; Guo, S; Yang, Y; Xiao, B. VQL: efficient and verifiable cloud query services for blockchain systems. IEEE Trans Parallel Distrib Syst; 2021; 33,
34. Cai, C; Weng, J; Yuan, X; Wang, C. Enabling reliable keyword search in encrypted decentralized storage with fairness. IEEE Trans Dependable Secur Comput; 2018; 18,
35. Tahir S, Rajarajan M (2018) Privacy-preserving searchable encryption framework for permissioned blockchain networks. In: 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). pp 1628–1633
36. Hu, S; Cai, C; Wang, Q; Wang, C; Wang, Z; Ye, D. Augmenting encrypted search: a decentralized service realization with enforced execution. IEEE Trans Dependable Secur Comput; 2019; 18,
37. Chen, C-L; Yang, J; Tsaur, W-J; Weng, W; Wu, C-M; Wei, X. Enterprise data sharing with privacy-preserved based on hyperledger fabric blockchain in IIOT’s application. Sensors; 2022; 22,
38. Chen Y, Bai J, Hao Y, Liao S, Yi Z, Zhang H, (2020) Blockchain-based dynamic group management for multiple keywords searchable encryption technology. In: 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). pp 1–6
39. Linoy S, Mahdikhani H, Ray S, Lu R, Stakhanova N, Ghorbani A (2019) Scalable privacy-preserving query processing over ethereum blockchain. In: Proceedings—2019 2nd IEEE International Conference Blockchain, Blockchain 2019, pp 398–404. https://doi.org/10.1109/Blockchain.2019.00061
40. Ge, L; Jiang, T. A privacy protection method of lightweight nodes in blockchain. Secur Commun Networks; 2021; 2021, pp. 1-17. [DOI: https://dx.doi.org/10.1155/2021/2067137]
41. Yang, M; Margheri, A; Hu, R; Sassone, V. Differentially private data sharing in a cloud federation with blockchain. IEEE Cloud Comput; 2018; 5,
42. Zhao, Y et al. A blockchain-based approach for saving and tracking differential-privacy cost. IEEE Internet Things J; 2021; 8,
43. Xu, L; Bao, T; Zhu, L. Blockchain empowered differentially private and auditable data publishing in industrial IoT. IEEE Trans Ind Informatics; 2020; 17,
44. Luu L, Narayanan V, Zheng C, Baweja K, Gilbert S, Saxena P (2016) A secure sharding protocol for open blockchains. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. pp 17–30
45. Zamani M, Movahedi M, Raykova M (2018) Rapidchain: Scaling blockchain via full sharding. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. pp 931–948
46. Buterin V (2013) Ethereum whitepaper. https://ethereum.org/en/whitepaper/ Accessed 27 Mar 2021
47. LeMahieu C (2018) Nano: a feeless distributed cryptocurrency network. Nano https://nano.org/en/whitepaper Accessed 24 March 2018
48. Liu Y, Liu J, Li D, Yu H, Wu Q (2020) Fleetchain: a secure scalable and responsive blockchain achieving optimal sharding. In: International Conference on Algorithms and Architectures for Parallel Processing. pp 409–425
49. Wang J, Wang H (2019) Monoxide: scale out blockchains with asynchronous consensus zones. In: 16th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 19). pp 95–112
50. Hong Z, Guo S, Li P, Chen W (2021) Pyramid: a layered sharding blockchain system. In: IEEE INFOCOM 2021-IEEE Conference on Computer Communications. pp 1–10
51. Huang, C et al. RepChain: a reputation-based secure, fast, and high incentive blockchain system via sharding. IEEE Internet Things J; 2020; 8,
52. Wu, Z; Liu, H; Xie, J; Xu, G; Li, G; Lu, C. An effective method for the protection of user health topic privacy for health information services. World Wide Web; 2023; 26,
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.