Content area
Full Text
Hope springs eternal in the database business. While we’re still hearing about data warehouses (fast analysis databases, typically featuring in-memory columnar storage) and tools that improve the ETL step (extract, transform, and load), we’re also hearing about improvements in data lakes (which store data in its native format) and data federation (on-demand data integration of heterogeneous data stores).
Presto keeps coming up as a fast way to perform SQL queries on big data that resides in data lake files. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes. Presto allows querying data where it lives, including Hive, Cassandra, relational databases, and proprietary data stores. A single Presto query can combine data from multiple sources. Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse.
The Presto Foundation is the organization that oversees the development of the Presto open source project. Facebook, Uber, Twitter, and Alibaba founded the Presto Foundation. Additional members now include Alluxio, Ahana, Upsolver, and Intel.
Ahana Cloud for Presto, the subject of this review, is a managed service that simplifies Presto for the cloud. As we’ll see, Ahana Cloud for Presto runs on Amazon, has a fairly simple user interface, and has end-to-end cluster lifecycle management. It runs in Kubernetes and is highly scalable. It has a built-in catalog and easy integration with data sources, catalogs, and dashboarding tools.
Competitors to Ahana Cloud for Presto include Databricks Delta Lake, Qubole, and BlazingSQL. I will draw comparisons at the end of the article.
Presto and Ahana architecture
Presto is not a general-purpose relational database. Rather, it is a tool designed to efficiently query vast amounts of data using distributed SQL queries. While it can replace tools that query HDFS using pipelines of MapReduce jobs such as Hive or Pig, Presto has been extended to operate over different kinds of data sources including traditional relational databases and other data sources such as Cassandra.
In short, Presto is not designed for online transaction processing (OLTP), but for online analytical processing (OLAP) including data analysis, aggregating large amounts of data, and producing reports. It can query...