The success of Bitcoin has brought tremendous interest to its underlying technology, the blockchain. A blockchain is a decentralized system that orchestrates mutually distrusting parties towards a unanimous agreement on a ledger of the transaction history, without resorting to third parties. For years, blockchains found applications solely in cryptocurrencies, serving monetary transfers between anonymous transactors. This situation has now changed radically: smart contracts enable arbitrary data transitions beyond cryptocurrency transfers, and demand is running high to tap blockchains’ potential for use cases where accountability overrides anonymity. Blockchains have thus shifted away from their original purpose and entered the data-processing domain, where the decades of experience accumulated by the database community are too substantial to overlook. In this thesis, we focus on the optimization of blockchains from the perspective of data systems. In particular, we target permissioned blockchains, which, unlike Bitcoin, run with identifiable parties and exhibit wider applicability.
Firstly, we treat a blockchain (either permissioned or permissionless) as a generic distributed system, which as such shares some similarities with distributed databases. Existing works that compare the two kinds of systems focus mainly on high-level properties and stop short of showing how the underlying designs contribute to the overall differences. This work fills that gap. We perform a twin study of blockchains and distributed databases as two types of transactional systems. We propose a taxonomy that illustrates the dichotomy across four dimensions, namely replication, concurrency, storage, and sharding. Within each dimension, we discuss how the designs are driven by two goals: security for blockchains, and performance for distributed databases. To expose the impact of different design choices on the overall performance, we conduct an in-depth performance analysis of two permissioned blockchains, namely Quorum and Hyperledger Fabric, and two distributed databases, namely TiDB and etcd. Lastly, we propose a framework for back-of-the-envelope performance forecasting of blockchain-database hybrids.
Secondly, with a tamper-evident ledger that records the transactions modifying some global states, a blockchain system captures the entire evolution history of those states. The management of that history, also known as data provenance or lineage, has been studied extensively in database systems. However, querying data history in existing blockchains can only be done by replaying all transactions. This approach is applicable to large-scale, offline analysis, but is not suitable for online transaction processing. We hence present FabricSharp, a fine-grained, secure, and efficient provenance system for blockchains. FabricSharp exposes provenance information to smart contracts via simple interfaces, captures provenance during contract execution, and stores it efficiently in a Merkle tree. It also provides a novel skip list index designed for efficient provenance queries. We have implemented FabricSharp on Hyperledger Fabric v2.2 and a blockchain-optimized storage called ForkBase. Our evaluation of FabricSharp demonstrates its benefits to the new class of blockchain applications, its efficient queries, and its small storage overhead.
Thirdly, to cater for emerging business requirements, Hyperledger Fabric has proposed a new architecture called execute-order-validate, which supports parallel transactions and improves the blockchain’s throughput.